In 2001, sequencing one person’s genome cost a whopping $100 million, and a genome sequencing factory would cover the length and breadth of a factory floor, said Dr Warren Kaplan, Chief of Informatics at the Garvan Institute and the Kinghorn Centre for Clinical Genomics, speaking at CeBIT 2016. Today, with the Illumina HiSeq X-Ten, a set of ten machines with the capacity to sequence 150 genomes every three days, it costs just $1,000. That makes genome sequencing a far more accessible diagnostic tool for doctors and patients.
But each genome generates over 100 gigabytes of data, which adds up to an astounding volume: more than 5 terabytes a day. This is a goldmine of information for medical professionals. Genomes are essentially the source code of what defines us, and they can provide valuable insights into many areas of medicine. To give just a few examples, genomic data has helped raise the rate of correct diagnosis for inherited diseases from 5% to 50%. It has been fundamental to advances in pharmacogenomics, which may eventually allow doctors to prescribe personalised medicine based on our genetics. It has also yielded a wealth of new information about cancer and the mutations seen in various types of cancer.
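As a rough check on the daily figure, assuming decimal units (1 TB = 1,000 GB) and taking 100 GB as the output per genome, the X-Ten's stated throughput of 150 genomes every three days works out as:

$$\frac{150\ \text{genomes} \times 100\ \text{GB/genome}}{3\ \text{days}} = \frac{15{,}000\ \text{GB}}{3\ \text{days}} = 5{,}000\ \text{GB/day} = 5\ \text{TB/day}$$

Because each genome produces somewhat more than 100 GB, the real total is a little above this baseline.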
However, data storage and computing capacity have not kept pace with improvements in sequencing, so creative solutions are needed to bridge the gap. One such solution is DreamLab, a free app created by the Garvan Institute in collaboration with Vodafone, which uses the processing power of idle smartphones to help crunch huge datasets for cancer research.
It is also becoming imperative to find ways to store and share such large amounts of data so that it can be used as effectively as possible, said Kaplan. One such initiative is the Global Alliance for Genomics and Health, which was set up to find responsible ways of sharing genomic data among hundreds of institutions worldwide. Researchers at the University of California, Berkeley have also published a white paper proposing a ‘Million Cancer Genome Warehouse’, a large-scale information commons for cancer genomic data. Eventually, Kaplan hopes, it will become possible to select patients from variant databases, query them to isolate the genomic data of interest, and then compare that data with electronic health records, putting even more information at researchers’ fingertips.
Genomics is at the forefront not only of medical research but also of big data processing and storage. Other industries could learn many lessons from it, both about finding creative ways to process data and about collaborating on data sharing, so that we make the most of the data being collected.