Cassettes and DNA to cope with the explosion of our digital data


Researchers, industries and individuals are accumulating more and more digital data, so much that hard drives and other storage media will soon be overwhelmed. To head off future shortages, an old object keeps evolving: the magnetic cassette. It will hold the line until a cutting-edge technology based on DNA is ready.

An Instagram photo, videos from a day out, emails… Each of us accumulates a considerable amount of digital data. That amount keeps growing with the new technologies at our disposal (4K video, Netflix streaming), and it is stored not on a hard drive but in the “cloud”, sometimes hundreds of kilometers away. Yet this familiar data is not the largest share of “Big Data”, or massive data.

Research is a far bigger contributor, because scientific experiments generate enormous volumes of data. Since its creation, the European Organization for Nuclear Research (CERN), near Geneva, has accumulated more than 100 petabytes (PB) of images, raw data and information. All of it is to be preserved for future generations who will want to study it. 100 PB is the equivalent of approximately 102,400 of the 1 terabyte (TB) hard drives sold to consumers.

The first image of the black hole M87* required an immense amount of data. Event Horizon Telescope (EHT)/National Science Foundation/Handout

The first photo of a black hole required about 5 PB, the equivalent of 5,000 hard drives of 1 TB. Industry is another contributor to Big Data: companies such as Twitter or EDF, and indeed any business with even minimal digitization.
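
The two drive counts above rest on different unit conventions. The 102,400 figure assumes binary units (1 PB = 1,024 TB), while 5,000 assumes decimal units (1 PB = 1,000 TB). Below is a minimal Python sketch of the conversion; the function name `tb_drives` is purely illustrative.

```python
# Decimal (SI) convention: 1 PB = 1,000 TB.
# Binary convention: 1 PiB = 1,024 TiB, often loosely written "PB".
DECIMAL = 1000
BINARY = 1024

def tb_drives(petabytes: float, base: int) -> float:
    """Number of 1 TB drives needed to hold `petabytes`, in the given base."""
    return petabytes * base

cern = tb_drives(100, BINARY)   # CERN archive, binary convention
m87 = tb_drives(5, DECIMAL)     # black-hole image, decimal convention
print(f"CERN: {cern:,.0f} drives, M87*: {m87:,.0f} drives")
```

Neither count is wrong. They simply follow different conventions, which is why "1 TB" drives can seem to hold less than advertised.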

Physical limits

Between 2010 and 2020, the amount of data making up Big Data grew about 30-fold, from 2 zettabytes (ZB, or 2 million PB) to 60 zettabytes. The pace is also accelerating: by 2025, humanity is expected to produce 175 zettabytes of data.

François Képès is a cell biologist who led a forward-looking working group on digital data storage from 2018 to 2021. He explains: “In 2018, one millionth of the planet’s land surface was occupied by data centers. At this exponential rate, by 2060 all land masses will be covered in data centers.”
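
Képès's extrapolation can be checked with a little arithmetic, assuming steady exponential growth. The sketch below derives two figures. The first is the growth rate implied by the rise from 2 ZB to 60 ZB. The second is the doubling period implied by going from one millionth of the land surface in 2018 to all of it in 2060. Both come out at roughly two years. The variable names are illustrative.

```python
import math

# Big Data volume, 2010 -> 2020 (figures from the article): 2 ZB -> 60 ZB.
growth = 60 / 2
annual_rate = growth ** (1 / 10) - 1                   # compound yearly growth
doubling_years = 10 * math.log(2) / math.log(growth)   # time to double

# Data centers: 1e-6 of land in 2018, all land by 2060 (42 years).
doublings_needed = math.log2(1e6)                      # about 20 doublings
implied_doubling = (2060 - 2018) / doublings_needed

print(f"data: +{annual_rate:.0%}/year, doubling every {doubling_years:.1f} years")
print(f"land: doubling every {implied_doubling:.1f} years")
```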

Construction of a Facebook data center on October 5, 2021 in Eagle Mountain, Utah. Getty Images via AFP – GEORGE FREY

For 70 years, researchers have kept shrinking storage systems, from floppy disks to hard drives, to increase capacity. But in its conclusions, the working group's report, published in 2020, points out that the limits of Moore's law for semiconductors also apply to magnetic and electronic storage systems. “It is not possible to miniaturize and optimize indefinitely. For several decades, capacity doubled and price halved every two years, but this optimization is slowing down. We are reaching hard physical limits, and the gains we can still expect are relatively small,” explains François Képès.

The cassette, an emergency solution

While electronic storage systems are reaching their limits, the cassette keeps breaking records. Yes, we really do mean the cassette: the kind you put in your old camcorder or cassette player, whose tape could spill out in every direction when it rewound badly. But today's cassettes have nothing in common with those of yesterday. The latest record, set by Fujifilm and IBM, stands at 580 TB. That is the equivalent of 76 million 1990s audio cassettes (60 Mb per cassette). At the previous record, in 2017, the figure was 330 TB.
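
The 76 million figure only works if "60 Mb" means megabits (7.5 megabytes) rather than megabytes. At 60 megabytes per cassette, 580 TB would fill fewer than 10 million cassettes. Here is a quick check in Python, assuming decimal units throughout.

```python
# Record cartridge capacity in bytes (decimal terabytes).
record_bytes = 580e12

# Assumption: "60 Mb" means 60 megabits, i.e. 7.5 megabytes per cassette.
cassette_bytes = 60e6 / 8
as_megabits = record_bytes / cassette_bytes

# Alternative reading, 60 megabytes per cassette, for comparison.
as_megabytes = record_bytes / 60e6

print(f"megabits: {as_megabits / 1e6:.1f} million cassettes")
print(f"megabytes: {as_megabytes / 1e6:.1f} million cassettes")
```

The megabit reading gives about 77 million, close to the 76 million cited. The small gap presumably comes from rounding or unit conventions.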

Its tape is twenty times thinner than a hair and more than a kilometer long, yet the cartridge fits in the palm of a hand, and it still has a few good years ahead of it. Mark Lantz, a magnetic tape researcher at IBM, says: “This really demonstrates the possibility of continuing to scale tape technology, essentially at the historic rate of doubling cartridge capacity every two years, for at least the next ten years.”

The next ten years… and after that? By setting this time horizon, Mark Lantz shows that, like many engineers working in storage, he is well aware of the limits of electronic and magnetic storage. Both consume enormous resources, in energy and in space.

IBM scientist Mark Lantz holds a tape cartridge of several hundred TB in his hand. © Photo courtesy of IBM Research

That said, the magnetic cassette has the advantage of requiring less electronics: a single drive can read many cassettes, whereas each hard drive has its own read mechanism. A tape also lasts for decades, unlike a hard drive, and is more energy efficient.

However powerful it is, a tape still takes up too much physical space and will not be able to absorb the volumes of data to come. We therefore need to shift up a gear, which is what François Képès's working group set out to do. “Logically, we considered alternatives such as etching on glass or crystal, or storage in polymers such as DNA. It seemed likely to us that polymer storage was the only technology that could be developed in time and had enough room for improvement,” the researcher summarizes.

Waiting for DNA

DNA? Don't panic: this is not about storing information in living beings, or about modifying anyone's DNA. Storage in bacteria or spores was once considered, but this is no longer the main avenue.

DNA is a long molecule that carries the instructions for the reproduction and development of living things. Here, it is the word “instruction” that matters. DNA is a chain of four kinds of monomers, whose bases, A, C, G and T, pair up to form the “rungs” connecting the two strands of the helix. The order of these letters (AAGTTCCGATAT, for example) carries the information, exactly like… the binary system of 1s and 0s that underlies every computer.

A DNA sequence is made up of four different monomers: A, C, T, G. Getty Images / Alan Phillips

First, we must decide which sequence of monomers will represent the digital file. Let's imagine that A stands for 00, C for 01, G for 11 and T for 10. Take a deliberately made-up example: to store a photo encoded as 01 11, the computer must translate 01 11 into CG. This is the encoding step. Then CG has to be written “chemically” into DNA, which is stored until it needs to be retrieved.
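
The toy mapping above can be written as a lookup table. The minimal Python sketch below encodes a bit string two bits at a time, using the article's deliberately made-up assignment. Real DNA storage schemes use different codes, with sequence constraints and error correction. The name `encode_bits` is hypothetical.

```python
# The article's illustrative mapping: each base stands for two bits.
BITS_TO_BASE = {"00": "A", "01": "C", "11": "G", "10": "T"}

def encode_bits(bits: str) -> str:
    """Translate a bit string (spaces allowed) into a sequence of bases."""
    bits = bits.replace(" ", "")
    if len(bits) % 2:
        raise ValueError("need an even number of bits")
    pairs = (bits[i:i + 2] for i in range(0, len(bits), 2))
    return "".join(BITS_TO_BASE[pair] for pair in pairs)

print(encode_bits("01 11"))  # CG
```

Because each of the four bases carries two bits, a strand needs half as many symbols as the bit string it encodes.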

When the data is read back, software translates the sequence of letters into binary code, reconstituting the photo on screen. In short, there are five stages: encoding, writing, storage, reading and decoding.
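
The five stages can be sketched end to end, again with the article's toy mapping. In this Python sketch, a hypothetical `write_store_read` function stands in for the chemical stages (writing, storage, reading) and simply returns the strand unchanged. Real synthesis and sequencing introduce errors, which this sketch ignores.

```python
# Toy mapping from the article: two bits per base.
BITS_TO_BASE = {"00": "A", "01": "C", "11": "G", "10": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(data: bytes) -> str:
    """Stage 1, encoding: bytes -> bits -> bases (four bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def write_store_read(strand: str) -> str:
    """Stages 2-4 (writing, storage, reading), hypothetical placeholder:
    modeled as a perfect copy, whereas real chemistry introduces errors."""
    return strand

def decode(strand: str) -> bytes:
    """Stage 5, decoding: bases -> bits -> bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

photo = b"Hi"                      # stand-in for a photo file
strand = encode(photo)
print(strand)                      # CATACTTC
assert decode(write_store_read(strand)) == photo
```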

Why store our information in DNA? It can hold a great deal of information (its information density), it uses little energy, and it lasts. Unlike the contents of data centers, DNA does not need cooling. Using the encapsulation technique of the French company Imagene, it can be kept at room temperature for up to 52,000 years.

Each of its capsules can hold up to 0.8 g of DNA, or 1.4 exabytes (EB) of data. As a reminder, one exabyte is one million 1 TB hard drives. So 0.8 g of DNA would contain as much information as 150 tons of hard drives! To store the 175 zettabytes of Big Data expected in 2025, only 175 kilos of DNA would be needed. The US agency DARPA believes that DNA could cut the energy consumption of data storage by a factor of a thousand.
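
The orders of magnitude can be recomputed from the capsule figures. At 1.4 EB per 0.8 g, 175 ZB comes to about 100 kg of DNA. That is the same order of magnitude as the 175 kg cited; the exact mass depends on the density assumed. Below is a minimal sketch, assuming decimal units (1 ZB = 1,000 EB).

```python
CAPSULE_GRAMS = 0.8
CAPSULE_EB = 1.4
eb_per_gram = CAPSULE_EB / CAPSULE_GRAMS     # about 1.75 EB per gram

big_data_2025_eb = 175 * 1000                # 175 ZB in exabytes
dna_kg = big_data_2025_eb / eb_per_gram / 1000

# One exabyte = one million 1 TB drives (decimal units).
drives_per_capsule = CAPSULE_EB * 1_000_000

print(f"DNA for 175 ZB: about {dna_kg:.0f} kg")
print(f"one capsule: {drives_per_capsule:,.0f} drives of 1 TB")
```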

Development potential?

The main advantage of DNA is that we know it very well, notes François Képès: “Biomedicine has driven the development of DNA technology, which is already very advanced. This means that all the methods needed for storing and archiving digital data have already been demonstrated. It does not mean that they are at a commercial level, far from it.”

Still, the technology is advancing very quickly. “The cost of sequencing a human genome [that is, reading it, editor's note] has become extraordinarily low. We were at 3 billion dollars in 2003; we are at 500 today,” the researcher enthuses. But limits remain. At $500 and at 2022 speeds, reading DNA is still 1,000 times too expensive and 1,000 times too slow compared with a hard drive. For writing, it is as much as 100 million times too slow and too expensive.

“Some people told us to come back and talk about it at the end of the century. No way! DNA-related technologies improve by a factor of two roughly every six months: four times faster than electronics did between 1976 and 2011. At this rate, the factor of 1,000 for reading will be closed within five years, around 2025, and the factor of 100 million for writing around 2035!”
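
The researcher's timeline follows from the doubling period: closing a gap of factor g at one doubling every six months takes log2(g) × 0.5 years. The small Python sketch below assumes constant doubling rates, using the values quoted in the article (0.5 years for DNA, 2 years for electronics). `years_to_close` is an illustrative name.

```python
import math

DNA_DOUBLING_YEARS = 0.5           # x2 about every six months (quoted)
ELECTRONICS_DOUBLING_YEARS = 2.0   # x2 about every two years (quoted earlier)

def years_to_close(gap: float, doubling_years: float = DNA_DOUBLING_YEARS) -> float:
    """Years needed to improve by a factor `gap` at a constant doubling period."""
    return math.log2(gap) * doubling_years

read_years = years_to_close(1_000)
write_years = years_to_close(100_000_000)
speedup = ELECTRONICS_DOUBLING_YEARS / DNA_DOUBLING_YEARS

print(f"reads: {read_years:.1f} years, writes: {write_years:.1f} years")
print(f"DNA improves {speedup:.0f}x faster than electronics")
```

This gives about 5 years for reads and about 13 years for writes. Both figures are consistent with the five-year horizon for reading and the 2035 horizon for writing quoted above.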

Some applications of DNA are already possible before 2035, because not all data needs to be read or written regularly. For example, INA, the French institute in charge of archiving audiovisual productions, accumulates an additional 20 PB of data each year. This data does not need to be retrieved quickly, hence the interest in encoding it in DNA. Similarly, the banking sector could make use of this new storage technology, since banks must keep their customers' bank details, sometimes for decades.
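
To give a sense of scale, the sketch below estimates how much DNA INA's yearly 20 PB would need at the capsule density quoted earlier (1.4 EB per 0.8 g). It assumes decimal units and ignores encapsulation overhead and redundancy.

```python
CAPSULE_EB_PER_GRAM = 1.4 / 0.8     # capsule density quoted in the article
ina_yearly_eb = 20 / 1000           # 20 PB per year, in exabytes
dna_mg = ina_yearly_eb / CAPSULE_EB_PER_GRAM * 1000
print(f"about {dna_mg:.1f} mg of DNA per year")
```

Under these assumptions, a year of INA archives would fit in about a hundredth of a gram of DNA.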

A sign of how high the stakes are: the US DARPA has invested hundreds of millions of euros in DNA technologies. France, for its part, is starting to get moving, notably thanks to François Képès's working group, with 20 million euros of government funding for DNA storage research.
