The rapidly growing amount of unstructured data, multi-level virtualization, and progressive digitization and consolidation are the driving forces behind data centers all around the world. With the arrival of the IoT, the volume of data has already outpaced the capacity of some systems.
Because the amount of data generated and stored worldwide is skyrocketing every year, companies are looking for the best way to store it: on dedicated servers, in colocation facilities, in the cloud, in a hybrid cloud, and so on. Data storage should not be a huge problem for small and medium-sized companies. Things get more complicated for scientific organizations such as CERN (which keeps on disk only a tenth of the 15 PB of data it gets from the Large Hadron Collider each year), for bigger businesses, and for the IoT, all of which need far more storage. Even so, we are still far from running out of storage capacity.
The new generation of data storage
With so many ways to store our data, we now need to think about expanding our storage and, even more importantly, about its longevity. Data longevity creates a new challenge for the IT industry, which has come up with a new and quite revolutionary idea: data storage on artificial DNA.
Already in 2012, a bioengineer and a geneticist at Harvard’s Wyss Institute successfully stored 5.5 petabits of data (around 700 terabytes) in a single gram of DNA, smashing the previous DNA data density record by a factor of a thousand.
The work, carried out by George Church and Sri Kosuri, basically treats DNA as just another digital storage device. Instead of binary data being encoded as magnetic regions on a hard drive platter, strands of DNA that store 96 bits are synthesized, with each of the bases (TGAC) representing a binary value (T and G = 1, A and C = 0). To read the data stored in DNA, you simply sequence it, just as if you were sequencing the human genome, and convert each of the TGAC bases back into binary. To aid with sequencing, each strand of DNA has a 19-bit address block at the start, so a whole vat of DNA can be sequenced out of order and then sorted into usable data using the addresses.
Scientists have been eyeing up DNA as a potential storage medium for a long time, for three very good reasons: it’s incredibly dense (you can store one bit per base, and a base is only a few atoms large); it’s volumetric (beaker) rather than planar (hard disk); and it’s incredibly stable. Where other bleeding-edge storage media need to be kept in sub-zero vacuums, DNA can survive for hundreds of thousands of years in a box in your garage.
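The scheme described above can be sketched in a few lines of Python. This is only an illustration, not the researchers’ actual code: the function names are made up for this example, the 19-bit address is assumed to come in addition to the 96 data bits, and the rule for picking between the two bases that share a bit value (never repeat the previous base) is an assumption chosen to keep the example deterministic.

```python
# Illustrative sketch only: names and the base-selection rule are assumptions.
ZERO_BASES = "AC"   # per the article: A and C = 0
ONE_BASES = "TG"    # per the article: T and G = 1
ADDRESS_BITS = 19   # address block at the start of each strand
DATA_BITS = 96      # payload bits per strand (assumed separate from the address)


def bits_to_bases(bits: str) -> str:
    """Map each bit to a base, switching base whenever it would repeat."""
    bases = []
    for bit in bits:
        options = ONE_BASES if bit == "1" else ZERO_BASES
        base = options[0]
        if bases and bases[-1] == base:
            base = options[1]
        bases.append(base)
    return "".join(bases)


def bases_to_bits(bases: str) -> str:
    """Convert sequenced bases back into a bit string."""
    return "".join("1" if base in ONE_BASES else "0" for base in bases)


def encode(data: bytes) -> list[str]:
    """Split data into addressed strands of ADDRESS_BITS + DATA_BITS bases."""
    bits = "".join(f"{byte:08b}" for byte in data)
    strands = []
    for index, start in enumerate(range(0, len(bits), DATA_BITS)):
        if index >= 2 ** ADDRESS_BITS:
            raise ValueError("too much data for a 19-bit address space")
        payload = bits[start:start + DATA_BITS].ljust(DATA_BITS, "0")
        address = f"{index:0{ADDRESS_BITS}b}"
        strands.append(bits_to_bases(address + payload))
    return strands


def decode(strands: list[str], length: int) -> bytes:
    """Rebuild `length` bytes from strands given in any order."""
    blocks = {}
    for strand in strands:
        strand_bits = bases_to_bits(strand)
        blocks[int(strand_bits[:ADDRESS_BITS], 2)] = strand_bits[ADDRESS_BITS:]
    bits = "".join(blocks[index] for index in sorted(blocks))
    return bytes(int(bits[i:i + 8], 2) for i in range(0, length * 8, 8))


if __name__ == "__main__":
    message = b"DNA as a storage device!"
    strands = encode(message)
    # Strands can be read back in any order, like a vat of DNA.
    print(decode(list(reversed(strands)), len(message)))
```

Because every strand carries its own address, the decoder does not care in which order the strands come out of the sequencer. The original byte length is passed to `decode` only so that the zero padding on the last strand can be dropped.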
The new solution is modeled on organic DNA, whose microscopic molecules can store exabytes of data; the same can be true for a hi-tech DNA solution. It is basically the storage of digital data in the base sequence of DNA. Humans have always been, and will always be, interested in decoding human and non-human genomes, which makes this option even more interesting: no matter what happens in the future, someone will be able to read that information one day. What is more interesting, though, is that, just like organic DNA, the information can be stored for thousands of years.
Compared with that, data on our current storage media is quite difficult to preserve: even microfilm lasts only about 500 years. Actually, if we really think about it, paper documents can last longer; aren’t archaeologists still discovering them at ancient sites? Tests at higher temperatures show that DNA will stay readable for 2,000 years if it’s stored at ten degrees centigrade (and for up to 2 million years if it’s frozen); encapsulating it in spheres of silica means that humidity doesn’t affect it. DNA storage molecules have already been developed at the Twist Bioscience lab, where researchers have been able to write and read texts, photos, videos, and other files with 100 percent accuracy.
In 2016, Twist Bioscience announced that Microsoft was buying 10 million strands of DNA from it for data storage. With a single gram of DNA capable of storing close to 1 zettabyte (1 billion terabytes) of data, it makes sense to embrace the cutting-edge technology.
“We could encode and recover 100% of the digital data from synthetic DNA,” said Doug Carmean, a partner architect in Microsoft’s Technology and Research organization. Microsoft took a chunk of data normally stored on a hard drive and translated it into a genetic code made up of DNA’s chemical building blocks: As, Cs, Gs and Ts. The company then asked Twist Bioscience to manufacture 10 million strands of DNA based on the sequences it provided.
The biggest challenge with DNA storage is reading and writing the code, but the biology start-up has a machine that can produce custom strings of DNA. Microsoft only needs to supply the DNA sequence, and Twist Bioscience will make the DNA from scratch. Only Microsoft has the decoder key for the sequences it supplied to Twist, according to Twist Bioscience CEO Emily Leproust; the information is kept as a trade secret. After the data has been transformed into invisible molecules in a test tube, Twist sends it back to Microsoft for testing. There are ways to read the data back out of the test tubes, and there are also methods to simulate the passing of millennia.
The current cost of a custom DNA sequence is about $0.10 per base, and Twist hopes to bring it down to $0.02. The cost of the genetic sequencing used to read the data has already dropped significantly, which adds to the appeal of DNA storage. The start-up admits that it will be years before DNA storage becomes commercially viable. But based on initial tests, the technology is capable of substantially increasing “the density and durability of data storage”.
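To put the per-base prices in perspective, here is a rough back-of-the-envelope calculation in Python. It is a sketch under loud assumptions: it uses the one-bit-per-base density mentioned earlier in this article and ignores addressing and any redundancy. Microsoft’s real encoding is not public, so the true figures will differ, and the function name is made up for this example.

```python
def synthesis_cost_usd(num_bytes: int, cost_per_base_usd: float,
                       bits_per_base: float = 1.0) -> float:
    """Estimate the cost of synthesizing DNA for `num_bytes` of data.

    Assumes `bits_per_base` bits are stored in every base and that no
    extra bases are spent on addresses or error correction.
    """
    bases_needed = num_bytes * 8 / bits_per_base
    return bases_needed * cost_per_base_usd


if __name__ == "__main__":
    megabyte = 1_000_000  # bytes
    for price in (0.10, 0.02):  # current and target price per base
        cost = synthesis_cost_usd(megabyte, price)
        print(f"${price:.2f} per base: about ${cost:,.0f} per megabyte")
```

Under these assumptions, a single megabyte needs 8 million bases, which is roughly $800,000 at $0.10 per base and $160,000 at $0.02. That is why the technology is still years away from being commercially viable.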
This brilliant new storage solution may brighten the future of data centers. Just imagine being able to put one of those molecules in your own data center.
Some steps are still manual today; however, Microsoft and Twist are working with the University of Washington to turn the process into a fully automated system. “There’s software to do the first step of translating the bits to what bases we want; the next step is manufacturing the molecules. There’s a manual interface there because we send Twist the file, and we get back the molecules; internally they have an automated process but they still need somebody to remove the DNA from the machine and ship us the molecules. The sequencers are all automated; you throw the molecules in, and it spits out the data. And then we have the rest of the data pipeline to decode the data.”
When will this technology be available to the public?
The cost of a DNA molecule and of its sequencing is decreasing; however, its use is justified only when you need to store data for a very long time rather than a few years. So who would be interested in this type of product? Data centers, cloud providers, health and scientific organizations, and banks, of course.
“The type of workload is definitely archival, at least at first,” said Microsoft researcher Karin Strauss. “The type of users we’ve been seeing where this would make sense, are where you need to keep the data by mandate, like hospitals and clinics, or there’s legal data, pension data. They’re applications where you want to keep the data for a long time and put it away and not read it very repetitively. In the end, it’s bits you’re storing, and we can store any kind of bits.” She added: “People ship hard drives today; in the future, it might be DNA. You have trucks and planes moving hard drives around; it’s done that way because you get better throughput. With DNA you can expect it to be even better, because it’s a lot more compact, and you can easily make copies for distribution.”
“We think there is a good roadmap to getting this into operation, and we see no fundamental reasons why we wouldn’t be able to put it together. It’s not going to be next year, but it’s not going to be in ten years either; I think it will be somewhere in between,” said Strauss, suggesting that we may see DNA storage in use in the fairly near future.