A dilemma for our future with Digital Data, and how turning to biology may solve it
I am in the process of writing a paper for my CIS 395 Cryptography class. We were allowed to pick any topic we wanted, so I researched this one. The paper is actually formatted in 2 column APA style, but copying it here destroyed the layout, so I just put it in some kind of order to be read.
Mind you, this is nowhere near finished. This is more or less a brain dump: a gathering of information with some organizing. The final paper won’t be done until the beginning of December, and by then I will have revised it over and over. I have not yet had a writing tutor proofread it to check that the sentences make sense. I hate English and writing, it’s so not me. Anyway, this is basically for those who are interested in a dilemma we are facing and how tech science is going to try to solve it.
Enjoy.
ABSTRACT
We are coming to a point in our world where most information has gone digital. The demand for storage will eventually come to a point where the amount of data being generated will exceed the capacity to store it. Computer scientists are looking towards DNA to help solve this problem.
The Idea
The idea of putting data into DNA has been around since the 1960s, but it wasn’t until 2012 that George Church and Sri Kosuri of Harvard University stored data at a density of 5.5 petabits, or about 700 terabytes, per gram of DNA [1]. Five years later, in 2017, researchers from Columbia University and the New York Genome Center developed a process for storing 215 petabytes, or 215,000 terabytes, of data per gram of DNA [6].
Data is stored by first deciding the order of letters going into the DNA. Chemical reactions are then used to manufacture the letters. The letters are mixed in a bottle with other chemical solutions to build the exact strand needed to represent the data [5]. A benefit of this procedure is that many identical copies of the same strand are made at the same time before the next strand is made. DNA data is sensitive to light, temperature, and moisture, so these conditions need to be controlled to preserve the DNA over the long term.
To retrieve the data from DNA storage, a sequencing machine is used, the same kind used to analyze genomic DNA in cells. The machine identifies the molecules and reads out their letter sequences. Each sequence is then decoded back into a binary sequence of 0’s and 1’s in its proper order. The DNA strand is destroyed during extraction, but because many copies of each strand are made, as noted earlier, no data is ever entirely lost.
The first person to map the ones and zeroes of digital data onto the four bases of DNA was artist Joe Davis, in a 1988 collaboration with researchers from Harvard. The DNA sequence, which they inserted into E. coli, encoded just 35 bits [2].
How DNA is used
Each DNA nucleotide has three parts: a phosphate group, a sugar group, and a nitrogen base. There are four types of nitrogen base: Adenine (A), Thymine (T), Guanine (G) and Cytosine (C). Originally, data was encoded one bit per base: A and C represented 0, and T and G represented 1. Since 2012 this has changed, and data is now encoded two bits per base: 00 maps to A, 01 to C, 10 to T and 11 to G. So an 8-bit string such as 01111000 is broken into pairs, 01 11 10 00, which translates to C-G-T-A [5]. Because building DNA strands is complex, each strand is limited to about 20 bytes, which is tiny compared with typical file sizes. A file is therefore split across many short strands, and each strand carries an indicator that records its position so the data can be put back in order.
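The two-bit mapping and the splitting into short, ordered strands can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this example, the 20-byte payload comes from the paragraph above, and the plain integer index stands in for the ordering indicator, which real systems encode into the DNA itself.

```python
# Two-bit mapping from the text: 00 -> A, 01 -> C, 10 -> T, 11 -> G
BITS_TO_BASE = {"00": "A", "01": "C", "10": "T", "11": "G"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_bits(bits: str) -> str:
    """Translate a bit string of even length into a string of DNA bases."""
    if len(bits) % 2 != 0:
        raise ValueError("bit string must have an even number of bits")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_bases(bases: str) -> str:
    """Translate a string of DNA bases back into a bit string."""
    return "".join(BASE_TO_BITS[base] for base in bases)


def split_into_strands(data: bytes, payload_bytes: int = 20) -> list[tuple[int, str]]:
    """Split data into (index, bases) pairs of at most payload_bytes each.

    The integer index is a hypothetical stand-in for the ordering indicator.
    """
    strands = []
    for index, start in enumerate(range(0, len(data), payload_bytes)):
        chunk = data[start:start + payload_bytes]
        bits = "".join(f"{byte:08b}" for byte in chunk)
        strands.append((index, encode_bits(bits)))
    return strands


def reassemble(strands: list[tuple[int, str]]) -> bytes:
    """Sort strands by index and decode them back into the original bytes."""
    bits = "".join(decode_bases(bases) for _, bases in sorted(strands))
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


print(encode_bits("01111000"))  # the C-G-T-A example, printed as CGTA
```

Because every strand carries its index, `reassemble` gives back the original bytes even if the strands are read in a scrambled order.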
Advantages
The main advantage of using DNA to store data is the amount of information it holds relative to its size. It is estimated that by 2020 all global data will total around 44 trillion gigabytes. By 2040, storing the global archive on chips would consume 10 to 100 times the expected supply of microchip-grade silicon [4]. The dilemma resembles the tension between Moore’s Law, under which the density of a silicon chip doubles every 18 months, and Rock’s Law, under which the cost of a chip factory doubles every four years: one of them will eventually fail. A data center using magnetic tape could be built to store an exabyte (one billion gigabytes), but it would require US $1 billion over 10 years to build and maintain, as well as hundreds of megawatts of power [4]. DNA, by contrast, can store 215 petabytes of data per gram. Basically, all the world’s data could be stored in the back of a vehicle.
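As a rough sanity check on the "back of a vehicle" claim, the arithmetic can be worked out directly. This sketch assumes decimal units (1 gigabyte = 10^9 bytes) and uses the 215 petabytes per gram density reported for the 2017 work; it ignores packaging, redundancy, and everything else a real archive would need.

```python
GIGABYTE = 10**9    # bytes, decimal units assumed
PETABYTE = 10**15   # bytes

global_data_bytes = 44 * 10**12 * GIGABYTE    # 44 trillion gigabytes
density_bytes_per_gram = 215 * PETABYTE       # 215 petabytes per gram

grams_needed = global_data_bytes / density_bytes_per_gram
print(f"{grams_needed / 1000:.0f} kg of DNA")  # roughly 205 kg
```

At that density the estimated 2020 global archive would weigh on the order of a couple hundred kilograms, which is indeed something a vehicle could carry.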
Most storage media become unreliable after a certain amount of time. A typical hard drive may last up to 10 years, and magnetic tape about 30 if kept in a cool environment. DNA, however, can last thousands of years if kept in a cool, dry and dark environment. This helps address the problem of holding huge and growing amounts of data for very long periods of time.
When it comes to power consumption, DNA is the clear winner. Most traditional storage devices use about 0.04 watts per gigabyte of data, but DNA uses about 10^-10 watts per gigabyte [4].
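To put the two per-gigabyte figures side by side, here is a small calculation for one exabyte (one billion gigabytes) of stored data. It is a sketch that simply scales the quoted figures linearly, which is an assumption; real facilities also spend power on cooling and other overhead.

```python
TRADITIONAL_W_PER_GB = 0.04   # watts per gigabyte, figure quoted from [4]
DNA_W_PER_GB = 1e-10          # watts per gigabyte, figure quoted from [4]
EXABYTE_IN_GB = 10**9         # one exabyte = one billion gigabytes

traditional_watts = TRADITIONAL_W_PER_GB * EXABYTE_IN_GB  # about 40 megawatts
dna_watts = DNA_W_PER_GB * EXABYTE_IN_GB                  # about 0.1 watts
ratio = TRADITIONAL_W_PER_GB / DNA_W_PER_GB               # about 400 million

print(f"traditional: {traditional_watts / 1e6:.0f} MW")
print(f"DNA: {dna_watts:.1f} W")
print(f"ratio: {ratio:.0e}")
```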
Challenges
As with any new technology, DNA storage comes with challenges that need to be overcome to make it as efficient as possible.
In 2012, encoding a megabyte of data into DNA cost about $12,500, and extracting it cost about $3,000. The price has dropped since then: it currently costs about $3,500 to write a megabyte of data and $1,000 to read it back. As the technology advances, the price should fall further, making DNA storage affordable for more of the people who want to use it.
Because of how data is stored when the DNA is made, a specific location cannot simply be accessed when extracting data. In other words, if you stored 1000 gigabytes of movies and wanted to access just one specific movie, all 1000 gigabytes would have to be read just to get that one file. This is an obstacle that will need to be overcome to make DNA storage efficient [5].
Because data is written into DNA through chemical reactions, the speed at which data can be transferred is limited. DNA synthesis manages only a few hundred bytes per second. That is extremely slow compared with today’s hard drives, which write hundreds of millions of bytes per second [5].
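To get a feel for that gap, the sketch below estimates how long writing one gigabyte would take at each speed. The two rates are assumptions picked from inside the ranges quoted above (300 bytes per second for "a few hundred", 200 million bytes per second for "hundreds of millions"), not measured values.

```python
GIGABYTE = 10**9                 # bytes, decimal units assumed

DNA_BYTES_PER_SEC = 300          # assumed value for "a few hundred bytes per second"
HDD_BYTES_PER_SEC = 200 * 10**6  # assumed value for "hundreds of millions of bytes per second"

dna_seconds = GIGABYTE / DNA_BYTES_PER_SEC
hdd_seconds = GIGABYTE / HDD_BYTES_PER_SEC

print(f"DNA synthesis: about {dna_seconds / 86400:.1f} days")
print(f"Hard drive: about {hdd_seconds:.0f} seconds")
```

Under these assumptions a gigabyte takes more than a month to write into DNA and a few seconds to write to a hard drive.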
Latest Breakthrough
Up until March of 2017, data retrieval had not been 100% error free; some data was lost with older schemes for encoding the T, A, C, and G nucleotides. The newest method is known as DNA Fountain. Dr. Yaniv Erlich and Dina Zielinski of Columbia University and the New York Genome Center created this technique and used it to store and retrieve a French movie, an Amazon gift card, a computer virus, a Pioneer plaque, a study by Claude Shannon, and an operating system called KolibriOS, at a density equivalent to 215 petabytes per gram of DNA. Retrieval was 100% accurate, and the technique approaches the Shannon capacity of DNA storage, achieving 85% of the theoretical limit. Shannon capacity is the maximum rate at which information can be sent over a channel with a given noise level.
In an ideal world each nucleotide could carry 2 bits, but noise keeps DNA storage below that. Writing DNA is basically transmitting information over a channel by synthesizing DNA oligos (molecules). The goal is to get the highest information rate with the smallest error probability. Erlich and Zielinski stated in their paper, “The channel is noisy due to various experimental factors, including DNA synthesis imperfections, PCR dropout, stutter noise, degradation of DNA molecules over time, and sequencing errors” [7]. The resulting information density was 1.57 bits per nucleotide, which is safely shy of the Shannon capacity. That density corresponds to storing 215 petabytes in a gram of DNA.
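The figures quoted in this section can be related to each other with a little arithmetic. In the sketch below, the implied capacity is back-calculated from the 1.57 bits per nucleotide and 85% figures given in this post; it is an inference from those two numbers, not a value taken from the paper.

```python
IDEAL_BITS_PER_NT = 2.0      # four bases, so at most 2 bits per nucleotide
ACHIEVED_BITS_PER_NT = 1.57  # density reported for DNA Fountain
FRACTION_OF_CAPACITY = 0.85  # share of the theoretical limit quoted above

# Share of the noise-free ideal that was reached
fraction_of_ideal = ACHIEVED_BITS_PER_NT / IDEAL_BITS_PER_NT

# Capacity implied by the two quoted figures (an inference, not a cited value)
implied_capacity = ACHIEVED_BITS_PER_NT / FRACTION_OF_CAPACITY

print(f"fraction of ideal 2 bits: {fraction_of_ideal:.1%}")
print(f"implied capacity: {implied_capacity:.2f} bits per nucleotide")
```

So the achieved density is a bit under 80% of the noise-free ideal, and the quoted 85% implies a noisy-channel capacity of roughly 1.85 bits per nucleotide.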
How Will It Help
Think of how much energy and space will be saved as DNA storage becomes more practical. At only 10^-10 watts per gigabyte, DNA storage uses far less energy than traditional drives at 0.04 watts per gigabyte. The space needed to store data will also be substantially smaller, which translates to much less electricity being used. The Range International Information Group runs a data center in Langfang, China, that covers 6.3 million square feet, about the size of the Pentagon in the United States. That size is needed to house all the machines that store the data, which require a lot of cooling power. One day all the world’s information could be stored in a small building the size of an outhouse.
DNA storage may also help with a particular problem in quantum computing. According to a WIRED article, quantum computers by their very nature can’t save or duplicate information. Quantum data can be converted and put on traditional storage devices, but this would require huge amounts of storage space. Because of the data density of quantum computing, any storage device paired with it will need a similar density. A classical computer reads, stores, and manipulates bits: 1’s and 0’s. A quantum computer uses qubits: tiny quantum objects that can be in two states, both 1 and 0, at the same time, as long as you’re not looking at them. And if you control a quantum particle in a superposition of two states, you can perform tasks in parallel, which speeds up certain computational tasks exponentially [8].
When quantum computing and DNA storage finally come together, it will launch us into an entirely new age of technological innovation.
REFERENCES
[1] Anthony, Sebastian. “Harvard Cracks DNA Storage, Crams 700 Terabytes of Data into a Single Gram.” ExtremeTech, 17 Aug. 2012, www.extremetech.com/extreme/134672-harvard-cracks-dna-storage-crams-700-terabytes-of-data-into-a-single-gram.
[2] Agapakis, Christina. “Communicating with Aliens through DNA.” Scientific American Blog Network, 18 Aug. 2012, blogs.scientificamerican.com/oscillator/dna-code/.
[3] De Silva, Pavani Yashodha, and Gamage Upeksha Ganegoda. “New Trends of Digital Data Storage in DNA.” BioMed Research International, vol. 2016, 05 Sept. 2016, pp. 1-14. EBSCOhost, doi:10.1155/2016/8072463.
[4] Extance, Andy. “How DNA Could Store All the World’s Data.” Nature, vol. 537, no. 7618, 2016, pp. 22–24, doi:10.1038/537022a.
[5] Ceze, Luis, and Karin Strauss. “Storing Data in DNA Brings Nature into the Digital Universe.” Scientific American (from The Conversation US), 29 July 2017, www.scientificamerican.com/article/storing-data-in-dna-brings-nature-into-the-digital-universe/.
[6] Service, Robert. “DNA Could Store All of the World's Data in One Room.” Science | AAAS, 2 Mar. 2017, www.sciencemag.org/news/2017/03/dna-could-store-all-worlds-data-one-room.
[7] Tung, Liam. “DNA Data Storage Landmark: Now It's 215 Petabytes per Gram or over 100 Million Movies.” ZDNet, 7 Mar. 2017, www.zdnet.com/article/dna-data-storage-landmark-now-its-215-petabytes-per-gram-or-over-100-million-movies/.
[8] Chen, Sophia. “What If Quantum Computers Used Hard Drives Made of DNA?” Wired, Conde Nast, 3 June 2017, www.wired.com/2017/03/quantum-computers-used-hard-drives-made-dna/.
Interesting. Do you plan on posting updates as you reach milestones on this paper?
Thank you.
The really cool thing about using something like this is the data can easily be erased simply by denaturing the DNA strands by chemical agent, heat, radiation, etc. No more hours of 1s & 0s written over and over again on a hard drive.
But that also brings up a question: how do you keep the "memory" stable for long periods?
This is way cool research. I wonder if it will have practical applications in the health fields as in programming genes to cure diseases.
E.A So far, each time Humans Played God, how has it turned out?
Thanks for providing this article.
It is inevitable that we drive into lower representations of information. What we have now will be laughable and considered as crude as we now consider vacuum tubes and memory cores. Biology logically is the next frontier for encoding information. After that we get into Chemistry where information might be recorded in general chemical structures - breaking free of the limitations imposed by the DNA structure. Then we move to Physics where information might be recorded in energy levels of atoms, etc. Finally (for now) we move into Quantum physics where information might be recorded as configurations of hadrons or maybe even quarks.
Reality is fascinating