It seems that by 2020 there will be 5,200 gigabytes* of data for every man, woman and child on the planet. That’s 40 zettabytes, or 40 trillion gigabytes, in total, for anyone who can get their head around that kind of figure. That’s a lot of data.
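For anyone who wants to check those figures, here is a quick back-of-the-envelope sketch. It assumes decimal units (a gigabyte as 10^9 bytes, a zettabyte as 10^21 bytes), and the variable names are only illustrative.

```python
# Back-of-the-envelope check of the figures above.
# Assumption: decimal (SI) units, not binary ones.
GB = 10**9    # bytes in a gigabyte
ZB = 10**21   # bytes in a zettabyte

total_bytes = 40 * ZB            # 40 zettabytes in total
per_person_bytes = 5_200 * GB    # 5,200 gigabytes per person

# How many gigabytes is 40 zettabytes?
total_in_gb = total_bytes // GB

# How many people would that total be shared between?
implied_population = total_bytes / per_person_bytes

print(f"40 ZB = {total_in_gb:,} GB")
print(f"Implied population: {implied_population / 1e9:.2f} billion")
```

The total works out at 40 trillion gigabytes, and dividing it by 5,200 gigabytes a head implies a world population of a little under 7.7 billion, so the two figures hang together.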
We are already used to being hit by targeted ads as soon as we go online: somehow every website you visit knows you are on the lookout for those limited edition trainers or a replacement ink cartridge for your printer. Almost all of this data is machine generated, with algorithms furiously chatting to each other and sharing data points as they manage billions of uploads, downloads, shares, transactions, connections, correlations and calculations.
The personal data you generate is substantial. Imagine creating a cache that stored every digital detail of the life you lead, fully coded: every email, text, call, transaction, share, notification, web search, site visit, GPS data point, transfer made, medicine taken, clinic visited, every Bluetooth connection, every smartwatch exercise session, every profile page, every chatroom visited, PDF stored, film viewed, platform utilised, shopping delivery taken and product selected, even those you popped in the shopping basket and then dumped.
The scale of data directly attributable to you is vast and increasing. BUT. It is still not what we might call personal. It is about you but not of you.
When we say personal data we mean actually personal: what makes you you. Your genome holds the instructions for making and maintaining you. It’s written in a chemical code called DNA and it’s made up of 3.2 billion letters. You have a copy in almost every one of your 30 trillion cells. The file size of just one human genome is roughly 200 gigabytes (a gigabyte is a billion bytes, so 9 zeros).
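To get a feel for that 200 gigabyte figure, here is a rough sketch comparing it with the bare letters alone. It assumes the DNA alphabet of four letters (A, C, G and T), so each letter can be packed into 2 bits; the packing scheme and names are illustrative, not how any particular sequencing lab stores its files.

```python
# Rough size of the bare genome text versus the quoted file size.
LETTERS = 3_200_000_000   # letters in one genome, from the text
BITS_PER_LETTER = 2       # assumption: 4 possible letters -> 2 bits each
GB = 10**9                # assumption: decimal gigabytes

# Size of the letters alone, packed as tightly as possible.
packed_bytes = LETTERS * BITS_PER_LETTER // 8
packed_gb = packed_bytes / GB

# The file size quoted in the text.
quoted_gb = 200
ratio = quoted_gb * GB / packed_bytes

print(f"Letters alone: about {packed_gb:.1f} GB")
print(f"Quoted file is about {ratio:.0f} times larger")
```

On these assumptions the letters themselves fit in under a gigabyte, so the 200 gigabytes presumably describes the raw output of a sequencing machine, which reads each stretch of DNA many times over and records more than just the letters. That reading is an assumption, not something the figure itself spells out.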
Rummaging around in how many films you uploaded, the journey you take from one shop to another or what algorithms can predict about your purchasing habits might deliver the odd reward card benefit or a better customer experience. But rummage in your genomic data and the benefits could be much more profound.
Reading a person’s genome (called sequencing) can have significant benefits for healthcare by improving our understanding of disease and bringing about more personalised treatments for patients. But scanning through billions of letters in your genetic code takes time and sizeable computing power.
We all have millions of differences in our genomes, around 3 to 4 million each. These differences make us unique and mean that some people have dark hair whilst others are blond. Most are harmless, but some could be responsible for diseases. Scientists have to filter and interpret all these differences to pinpoint the ones that are likely to cause disease and be potential targets for drugs.
The tricky part comes when we start to discuss or consider how and where we store the data we collect: How secure is it? Who has access to it? How anonymous is it really?
The 100,000 Genomes Project has exactly that conundrum to contend with. It demands the collection, storage, sharing, analysis and reporting of huge amounts of data. Over the project’s lifetime it is estimated to generate about 20 petabytes of information (a petabyte is a 1 followed by 15 zeros, in bytes), which will need 500 million computing hours to process. If you converted this to MP3 files, it would take 40,000 years to listen to from start to finish.
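Those numbers can be sanity-checked with a little arithmetic. The sketch below assumes decimal units and, for the listening time, an MP3 bitrate of 128 kilobits per second; the bitrate is our assumption, since the estimate does not say which one it used.

```python
# Sanity check of the project's data figures.
PB = 10**15   # bytes in a petabyte (decimal units assumed)
GB = 10**9    # bytes in a gigabyte

total_bytes = 20 * PB
genomes = 100_000
computing_hours = 500_000_000

# Share of the data and processing time per genome.
per_genome_gb = total_bytes // genomes // GB
hours_per_genome = computing_hours // genomes

# Listening time if the same number of bytes were MP3 audio.
MP3_BITS_PER_SECOND = 128_000   # assumption: 128 kbps MP3
seconds = total_bytes * 8 / MP3_BITS_PER_SECOND
years = seconds / (365.25 * 24 * 3600)

print(f"Per genome: {per_genome_gb} GB, {hours_per_genome:,} computing hours")
print(f"Listening time: roughly {years:,.0f} years")
```

Spread over 100,000 genomes, 20 petabytes comes to 200 gigabytes each, matching the single-genome file size quoted earlier, with about 5,000 computing hours per genome. At the assumed bitrate the listening time lands a little under 40,000 years, in line with the estimate.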
Data in the 100,000 Genomes Project is kept in a highly secure government facility. But beyond the project, what could happen if Big DNA Data, millions of gigabytes of it, fell into the hands of hackers, like an episode of Black Mirror? Could identity fraud one day move from cloning your bank details to bumping into your actual clone? In an age of information sharing, would you be up for posting your DNA code on Instagram alongside your favourite holiday snaps? Whose responsibility should it be to govern these vast datasets, and what are the implications for future society?
Where do we draw the line between open shared digital living and an invasion of personal data space?
Let us know what you think, tweet us @vergemagonline with the hashtags #DNAgeYes and #DNAgeNo.
Check out this animation to follow the data journey in the 100,000 Genomes Project:
*Digital Universe Study, 2012