2.2 Biology as a Mature Data Science

The completion of the first draft of the human genome ushered in a revolution in how we understand ourselves as humans, from our evolutionary history and ancestry to our traits and our health. It provided fundamentally new, empirical tools and approaches for human genetic and biomedical research, and the technologies and techniques developed to complete the draft sequence also formed the foundation for genetic research in non-human systems.

Biological Data Timeline - Human Genome Era

While the focus of the Human Genome Project was determining the DNA sequence of the human genome, this sequence and the technologies used to obtain it also let us learn many other properties of genomes and biological systems by analyzing the data in different ways. For example, knowing the complete sequence of a genome provides information on how many genes it contains and how repetitive the sequence is, and, when combined with the sequences of other individuals or organisms, on how closely related individual genes, or even whole organisms, are. Thanks to the central dogma of molecular biology (DNA is transcribed into RNA, which is translated into protein), gene sequences also tell us about the intermediate RNA molecules and the resulting proteins encoded by a genome, creating opportunities for new ideas, hypotheses, experiments, and even new data-generating assays and approaches. These advances are driving exponential growth in both the variety and the volume of biological data, a trend that shows no signs of slowing and demands ever more powerful and sophisticated computational resources and analytical methods.
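To make the central-dogma step concrete, the sketch below derives an mRNA and a protein sequence from a DNA sequence, and adds a crude measure of how repetitive a sequence is. It is a minimal illustration under stated assumptions, not a production tool: it assumes the input is the DNA coding (sense) strand containing only A, C, G and T, uses the standard genetic code, ignores introns and splicing, and translates only from the first AUG to the first stop codon. The helper names (`transcribe`, `translate`, `distinct_kmer_fraction`) and the example sequence are hypothetical, chosen for illustration only.

```python
# Standard genetic code (NCBI translation table 1); codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # '*' marks a stop codon


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding (sense) strand by replacing T with U.

    Assumption: the input is the coding strand, so no reverse complement is needed.
    """
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG start codon up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so no protein in this simplified model
    protein = []
    # Step through complete codons only (i + 3 <= len(mrna)).
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def distinct_kmer_fraction(seq: str, k: int) -> float:
    """Fraction of k-mers that are distinct; values well below 1 suggest repeated sequence.

    A crude, hypothetical stand-in for genome repetitiveness, for illustration only.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    return len(set(kmers)) / len(kmers)


if __name__ == "__main__":
    dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"  # arbitrary toy sequence
    mrna = transcribe(dna)
    print(mrna)
    print(translate(mrna))  # MAIVMGR (translation stops at the UGA stop codon)
    print(distinct_kmer_fraction("ATATATAT", 2))  # 2 distinct of 7 total, about 0.2857
```

Real gene models require locating genes, handling both strands, and splicing out introns before translation, which is part of why these analyses call for the dedicated computational methods the text describes.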

The biochemical instruments that produce biological data are continually improving: their output is becoming more precise, more accurate, and higher in throughput, while the cost of operating them and generating that output keeps falling.

The Biologist’s Tools