February 24, 2007

Bookish Bacteria

A ScienceDaily story, "Storing Digital Data In Living Organisms", tells of a paper describing how to store information in living organisms.

Converting a message into DNA isn't hard or new, and has been demonstrated a number of times. Perhaps the most impressive demonstrations are Eduardo Kac's installations Genesis (where mutating bacteria evolved a Bible sentence), and Move 36 (where a plant contains a gene coding the sentence "Cogito, ergo sum"). Signatures or watermarks can be added. Messages can be hidden in DNA microdots, where the secrecy of the message is protected both by the problem of discovering the microdot and by the complex background of human DNA in it: to find the message the right PCR primer sequence has to be known.

Tomita's team inserted a very short message into the genome of Bacillus subtilis. Rather than just insert it once, they inserted four versions. The first was a straight translation of the binary into nucleotides (each 4 bits turned into two nucleotides). The second had the bits shifted one step to the right before being translated, the third two steps and the fourth three steps. These cassettes of DNA form a nice error-correcting, RAID-like code, enabling long-term storage.
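As a rough sketch of the scheme as I described it above (not the paper's actual code), here is how four shifted copies could be produced and read back. The base table (00=A, 01=C, 10=G, 11=T), the use of a circular rotation for the "shift", and the whole-message majority vote are all my own assumptions for illustration; the real encoding and alignment method may well differ.

```python
from collections import Counter

# Hypothetical 2-bit-per-base table; the paper's real mapping may differ.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def text_to_bits(message: str) -> str:
    """ASCII text -> string of '0'/'1' characters, 8 bits per character."""
    return "".join(f"{byte:08b}" for byte in message.encode("ascii"))


def bits_to_text(bits: str) -> str:
    """Inverse of text_to_bits."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("ascii")


def rotate_right(bits: str, steps: int) -> str:
    """Circularly rotate a bit string (assumed meaning of 'shifted')."""
    if not bits:
        return bits
    steps %= len(bits)
    return bits[-steps:] + bits[:-steps] if steps else bits


def bits_to_dna(bits: str) -> str:
    """Translate each pair of bits into one nucleotide."""
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bits(dna: str) -> str:
    """Translate each nucleotide back into its pair of bits."""
    return "".join(BASE_TO_BITS[base] for base in dna)


def encode_cassettes(message: str, copies: int = 4) -> list[str]:
    """Copy k is the message bits rotated right by k steps, then translated."""
    bits = text_to_bits(message)
    return [bits_to_dna(rotate_right(bits, k)) for k in range(copies)]


def decode_cassette(dna: str, shift: int) -> str:
    """Undo the rotation of one cassette and recover its text."""
    return bits_to_text(rotate_right(dna_to_bits(dna), -shift))


def recover(cassettes: list[str]) -> str:
    """Decode every copy and return the reading most copies agree on."""
    readings = []
    for shift, dna in enumerate(cassettes):
        try:
            readings.append(decode_cassette(dna, shift))
        except (KeyError, UnicodeDecodeError):
            continue  # copy too damaged to decode at all
    if not readings:
        raise ValueError("no cassette could be decoded")
    return Counter(readings).most_common(1)[0][0]


cassettes = encode_cassettes("E=mc2")
damaged = ["A" + cassettes[0][1:]] + cassettes[1:]  # point mutation in copy 0
print(cassettes)
print(recover(damaged))
```

Because the odd shifts move the reading frame, the copies look like different sequences even though they carry the same bits, so a single mutation only spoils one of the four readings.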

They say:

"Hence, adopting and developing further codes and experimental methods or inserting plural fragments into the partial volumes of multiple-species metagenomes will enable the storage of huge volumes of data in heritable media. We suggest that this simple, flexible, and robust method offers a practical solution to data storage and retrieval challenges in combination with other, previously published techniques"

Given that the 3.5Mb genome of Synechocystis PCC6803 has been put into B. subtilis 168, there ought to be space for messages of at least 218,750 bytes in it (3.5 million bases at 2 bits each, divided among four copies). Assume a spore is 1.2 microns in diameter. That gives a data density around 1.9*10^24 bits/m^3. That beats the 5.5*10^17 bits/m^3 in existing holographic storage and the 10^18 bits/m^3 of two-photon dyes, and is of the same order as suggested nanotechnological storage.
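The arithmetic behind these numbers can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes 2 bits per base, four redundant copies of the message, and a perfectly spherical spore, and the function names are mine.

```python
import math

GENOME_BASES = 3.5e6        # size of the inserted Synechocystis genome
BITS_PER_BASE = 2           # four possible nucleotides per position
COPIES = 4                  # assumed: message stored in four shifted versions
SPORE_DIAMETER_M = 1.2e-6   # assumed spore diameter, in metres


def message_capacity_bytes() -> float:
    """Bytes of unique message that fit once redundancy is accounted for."""
    return GENOME_BASES * BITS_PER_BASE / COPIES / 8


def spore_volume_m3() -> float:
    """Volume of a sphere with the assumed spore diameter."""
    return (4.0 / 3.0) * math.pi * (SPORE_DIAMETER_M / 2) ** 3


def density_bits_per_m3() -> float:
    """Unique message bits per cubic metre of densely packed spores."""
    return message_capacity_bytes() * 8 / spore_volume_m3()


print(f"capacity: {message_capacity_bytes():,.0f} bytes")
print(f"density:  {density_bits_per_m3():.2e} bits/m^3")
```

Counting all four copies as data would quadruple the figure, but it stays within the same order of magnitude.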

But of course that is a big overestimate. There is no practical way to transform every spore individually with a 0.2 MB message, it would be useful to have redundant spores, and reading a genome currently takes 24 hours, so it would be an excessively slow and expensive form of storage for large volumes of data. Even with far faster sequencers and better plasmid vectors (and working in parallel) it seems unlikely that read-in and readout would take less than tens of minutes.

Storing a library this way "just in case" would require a very expensive effort to encode the data, purify the spores and decode them again. It might be a way of hiding away data in a seedbank, but it seems so much easier to laser-engrave it on inert metal. Maybe it is better for spies smuggling data in microdots (or as stomach bacteria!), but unless bacterial transformation and sequencing become everyday activities it seems less effective than (say) steganographing the data into a collection of tourist photos in a camera.

I think the real use of this kind of information storage will be when the bacteria themselves can exchange and process the information. There are obvious links to synthetic biology and wet nanotech. But maybe somebody will eventually try out Stanislaw Lem's idea of eruntics - teaching bacteria to write themselves.

Posted by Anders3 at February 24, 2007 03:53 PM