The field of genomics is an incredibly complex and compelling area of scientific research. As new discoveries about the human genome emerge, so too do insights into the role that genes play in the onset of disease — opening up the possibility for the development of new treatments. And with the assistance of machine learning, the potential reach of genomics is huge.
The world of science is overflowing with data. A lot of data. From clinical trials to epidemiology, decades of scientific research have generated billions of data points that remain unstructured, remain in silos, and thus are difficult to harness. If such valuable data were to remain in this format, we would likely never experience the benefits of data-led science.
Thankfully, exponential advances in computational power and statistical analysis have allowed scientists across a number of fields to exploit the previously untapped potential of the data they collect.
One such area that is inherently data-driven is genomics, an interdisciplinary branch of molecular biology that deals with the structure, function, evolution, and mapping of genomes.
Today, artificial intelligence (AI) and machine learning are playing a crucial role in the evolution of genomics, allowing science to better analyse DNA and come up with novel treatments that have the potential to manage and cure disease. Thanks to increasing automation, the cost of genome sequencing continues to fall year-on-year, too.
This article will explore how technology is playing such an instrumental role in accelerating genetic research — and examine the widespread ethical implications of such rapid progress.
What is genomics?
Genomics is a field of science that focuses on the study of the genome: an organism’s complete set of DNA, including all its genes.
The human genome contains roughly 3.2 billion base pairs (the building blocks of the DNA double helix) and around 20,000 genes. These base pairs are built from four types of nucleotide, identified by their bases: adenine (A), cytosine (C), guanine (G), and thymine (T). A pairs with T, while C pairs with G.
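The pairing rule is simple enough to express directly in code. The sketch below, in plain Python with a function name of our own choosing, builds the opposite strand of a DNA sequence. Because the two strands of the double helix run in opposite directions, the complementary strand is conventionally written reversed, giving the "reverse complement".

```python
# Watson-Crick base pairing: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of a DNA sequence, read 5' to 3'."""
    seq = seq.upper()
    unknown = set(seq) - set(PAIRS)
    if unknown:
        raise ValueError(f"unexpected bases: {sorted(unknown)}")
    # Complement every base, then reverse, because the two strands run antiparallel.
    return "".join(PAIRS[base] for base in reversed(seq))


if __name__ == "__main__":
    print(reverse_complement("ATGCGT"))  # ACGCAT
```

Applying the function twice returns the original sequence, which is a handy sanity check.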
Genes make up roughly 1-5% of the genome. They contain instructions that tell your cells to make proteins, which are essential for health.
How can machine learning improve genomic research?
In order to extract the genetic information needed for genomics, DNA must be sampled and sequenced.
Such a process — especially with rapid advances in genome-sequencing technologies — produces vast amounts of data that are too large for traditional applied statistical techniques. In addition, the most valuable signals in genomics datasets are often tiny and masked by technical noise, meaning more complex analytical techniques are needed.
Increasingly sophisticated machine learning technologies, therefore, can cut through the noise and help researchers draw clinically useful information from cross-disciplinary genomics datasets. This is why the application of machine learning to datasets generated from genome sequencing has been so successful.
Machine learning also enables researchers to combine ever-larger datasets. To take one example, Rampášek and Goldenberg (2017) developed a variational autoencoder (VAE) that was capable of predicting drug responses in cancer by combining two different datasets (“Genomics of Drug Sensitivity in Cancer” and “Cancer Cell Line Encyclopedia”).
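The model from that study is not reproduced here. As a generic illustration of what makes an autoencoder "variational", the sketch below implements two ingredients common to VAEs: the reparameterisation trick used to sample a latent vector, and the KL-divergence penalty that pulls each encoded distribution towards a standard normal prior. It is plain Python with function names of our own choosing; a real model would use a deep-learning framework with learned encoder and decoder networks.

```python
import math
import random


def reparameterise(mu, logvar, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    Writing the sample this way is what lets gradients flow through mu and
    logvar when the same code runs inside an autodiff framework.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL divergence between N(mu, diag(exp(logvar))) and N(0, I)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def vae_loss(x, x_recon, mu, logvar):
    """Negative ELBO, up to scale and constants, for a fixed-variance Gaussian decoder:
    squared reconstruction error plus the KL penalty."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return reconstruction + kl_to_standard_normal(mu, logvar)


if __name__ == "__main__":
    # When the encoder outputs exactly the prior (mu = 0, variance = 1), the KL term vanishes.
    print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0
    z = reparameterise([0.5, -1.0], [0.0, 0.0], random.Random(42))
    print(len(z))  # 2
```

In a multi-dataset setting, the shared latent vector `z` is what lets a single model learn from samples drawn from more than one source; how the original authors wired this up is beyond the scope of this sketch.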
Precision medicine and genomics
Genomics is closely related to precision medicine, an approach to patient care that is based on targeted therapies. Also known as personalised medicine, this medical model incorporates genetics, behaviour, and environment with the aim of tailoring treatment to a specific patient or population, offering an alternative to the one-size-fits-all approach of traditional medicine.
Technological progress and decreasing costs of genome sequencing have made precision medicine more accessible than ever — for health professionals and patients alike. Though several GPs have raised concerns about privacy and discrimination risks, NHS England’s 10-year plan outlines its commitment to increasingly personalised healthcare provision.
The potential for further research and development within the precision medicine space is massive. Thanks to the technology behind precision medicine, the UK life sciences sector is now thought to be worth as much as £70 billion. The global market size of precision medicine itself is predicted to reach $87 billion by 2023.
Current applications of machine learning in genomics
Genomics is a broad field that encompasses the life sciences, research and development, and business. It has even led to the development of a direct-to-consumer genomics industry.
As technology and data expand, the use of machine learning methods to decode and derive meaning from various datasets is becoming increasingly commonplace in a number of subfields of genomics.
Whole genome sequencing (WGS) is an emergent area of research within medical diagnostics. This has been revolutionised by next-generation sequencing (NGS), a DNA sequencing technology that allows researchers to sequence the human genome in a single day.
The pace of this advance has been remarkable. For context, decoding the first human genome with the earlier Sanger sequencing method (developed by British biochemist Frederick Sanger) took over a decade.
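A quick way to see how much data a single genome generates is the standard coverage calculation from the Lander-Waterman model: mean coverage C = N × L / G, where N reads of length L base pairs are spread over a genome of G base pairs. The sketch below uses the article's 3.2 billion base-pair figure; the 150 bp read length and 30× target are our assumptions, chosen as values typical of short-read whole-genome sequencing.

```python
import math


def mean_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average sequencing depth C = N * L / G."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(genome_size_bp: int, read_length_bp: int, target_coverage: float) -> int:
    """Smallest number of reads whose mean coverage reaches target_coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


def expected_uncovered_fraction(coverage: float) -> float:
    """Lander-Waterman estimate of the fraction of bases covered by no read, e^(-C)."""
    return math.exp(-coverage)


if __name__ == "__main__":
    # Assumed values: 3.2 billion bp genome (from the text), 150 bp reads, 30x target.
    print(reads_needed(3_200_000_000, 150, 30))  # 640000000
```

That is roughly 640 million reads for one genome. Real coverage is uneven, so this idealised model tends to be optimistic about how many bases end up well covered.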
The speed at which researchers can now sequence DNA has led to an explosion of innovation within the private sector. The medical company Deep Genomics, for example, uses machine learning to unlock datasets and help researchers interpret genetic variation. Its AI-powered discovery platform uses neural networks to analyse genomic data — allowing scientists to identify the genes that cause diseases and to develop drugs to treat the symptoms.
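Deep Genomics' platform is proprietary and is not described here. As a generic illustration, neural networks that read DNA (convolutional models, for example) commonly take sequences in one-hot encoded form, where each base becomes a vector of four numbers. A minimal sketch, with names of our own choosing:

```python
BASES = "ACGT"
BASE_INDEX = {base: i for i, base in enumerate(BASES)}


def one_hot(seq: str) -> list[list[int]]:
    """Encode DNA as a len(seq) x 4 matrix with columns A, C, G, T.

    Ambiguous bases such as 'N' become an all-zero row.
    """
    matrix = []
    for base in seq.upper():
        row = [0] * len(BASES)
        if base in BASE_INDEX:
            row[BASE_INDEX[base]] = 1
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    for row in one_hot("ACGTN"):
        print(row)
```

Mapping ambiguous bases to an all-zero row is one convention; another is to spread the probability evenly (0.25 in each column).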
Genome editing (also called gene editing) refers to a group of technologies that give scientists the ability to make specific changes to an organism's DNA. These technologies, increasingly supported by machine learning, allow genetic material to be added, removed, or altered at specific locations in the genome.
CRISPR-Cas9 is a powerful gene-editing technology that allows scientists to alter DNA sequences and modify gene function with relative ease. To use the system, scientists first need to select a target sequence and design a guide RNA (gRNA), a process that involves many choices and can lead to unpredictable, potentially unsafe outcomes (including unintended mutations). As such, the ethics of using CRISPR-Cas9 on humans has been called into question.
Despite these legitimate safety concerns, researchers at the Wellcome Sanger Institute claim to have used machine learning to develop a prediction tool that makes CRISPR-Cas9 editing more reliable. Evidence suggests that the technology can turn off a gene related to the risk of heart attacks, and scientists hope to trial gene therapy on people with a rare disorder in the next three years.
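As a simplified sketch of the first step described above (choosing a target sequence), the code below scans DNA for candidate sites for the widely used Streptococcus pyogenes Cas9 (SpCas9), which needs a 20-nucleotide protospacer followed immediately by an "NGG" protospacer-adjacent motif (PAM). This is not the Wellcome Sanger Institute's tool: real guide-design software also scores on-target efficiency and off-target risk, which this sketch ignores.

```python
import re

PROTOSPACER_LEN = 20  # SpCas9 guides typically match a 20-nt protospacer
# The lookahead makes each match zero-width, so overlapping candidate sites are all reported.
SITE = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % PROTOSPACER_LEN)


def find_spcas9_sites(seq: str) -> list[tuple[int, str, str]]:
    """Return (start, protospacer, PAM) for each 20-nt stretch followed by an NGG PAM.

    Only the strand given is scanned; run it on the reverse complement to cover the other strand.
    """
    return [(m.start(), m.group(1), m.group(2)) for m in SITE.finditer(seq.upper())]


if __name__ == "__main__":
    for start, protospacer, pam in find_spcas9_sites("TTGACCTGAAGCTAGCTAGGATCCAGG"):
        print(start, protospacer, pam)
```

Every hit is only a candidate; choosing among them is exactly where the "many choices" mentioned above come in, and where predictive models can help.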
In the UK, fractured care pathways waste hundreds of hours of GP and patient time and cost the NHS billions each year. The jobs of healthcare workers are made yet more difficult by substantial gaps in patient health records, a problem that renders clinical data more difficult to interpret and compare.
Such hurdles have prompted a growing interest among health professionals in the potential of machine learning to improve the efficiency of clinical workflows.
Despite generating controversy over privacy concerns, an NHS trust in London partnered with Google's DeepMind in a data-sharing arrangement to develop new clinical mobile apps linked to electronic patient records. Though this work is still very much at the discovery stage, many argue that further data integration could prevent thousands of deaths a year and relieve pressure on NHS staff.
In the United States, meanwhile, Intel designed an Analytics Toolkit to integrate genetic data into clinical workflows. In partnership with Intermountain Healthcare’s Transformation Lab in Salt Lake City, this machine learning technology led to the development of a workflow model that improved data accessibility. Researchers also developed an algorithm to measure a patient’s risk of developing multiple cancers.
Further integration of machine learning into the clinical workflow can help make important genomic data available to healthcare professionals in a more centralised and uniform manner. Moreover, linking genetic information with other clinical datasets could prove crucial in enhancing the value of medical databases, allowing healthcare systems to optimise patient treatment and care pathways.
Conclusion: what does the future hold?
Machine learning is one of the most exciting fields in development today. Its potential to quickly carry out complex analyses of rich, unstructured healthcare datasets, standardise clinical systems, and empower individuals to understand their own genomes points to a future for genomics driven by AI and machine learning.
Though its burgeoning technologies have yet to reach their full potential, we are already seeing their benefits in a number of sectors.
In the emerging field of pharmacogenomics, an extension of precision medicine that examines how genetics affects an individual’s response to certain drugs, a February 2017 study applied machine learning methods to predict the stable dose of a drug for renal transplant recipients. The drug in question, tacrolimus, is usually administered to patients following an allogeneic organ transplant to prevent “acute rejection” of the new organ.
Meanwhile, analysts predict that newborn genetic screening will become commonplace across mainstream healthcare over the next decade. Through machine learning techniques, genomic data collected at birth can be easily integrated into a patient’s electronic health record (EHR). Such data can give paediatricians insights into which drugs a baby can — and can’t — metabolise, allowing them to make better-informed, data-driven prescribing decisions.
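To make the idea concrete, the sketch below shows how a stored genotype might be turned into a metaboliser label that a prescribing system could flag. It follows the general activity-score approach used in some pharmacogenomic guidelines (CPIC's guideline for CYP2D6, for example), but the gene, allele values, and cut-offs here are all hypothetical placeholders, not clinical values.

```python
# HYPOTHETICAL activity values for alleles of an imaginary gene, "GENE_X".
# Real values and cut-offs come from published guidelines and differ between genes.
ALLELE_ACTIVITY = {"*1": 1.0, "*2": 0.5, "*3": 0.0}

# HYPOTHETICAL (lower bound, label) cut-offs on the summed score, highest first.
PHENOTYPE_CUTOFFS = [
    (1.5, "normal metaboliser"),
    (0.5, "intermediate metaboliser"),
    (0.0, "poor metaboliser"),
]


def metaboliser_status(allele_a: str, allele_b: str) -> str:
    """Map a GENE_X diplotype to a metaboliser label via a summed activity score."""
    score = ALLELE_ACTIVITY[allele_a] + ALLELE_ACTIVITY[allele_b]  # KeyError if an allele is unknown
    for lower_bound, label in PHENOTYPE_CUTOFFS:
        if score >= lower_bound:
            return label
    raise ValueError(f"no phenotype defined for score {score}")


if __name__ == "__main__":
    # A genotype stored in an EHR could be converted into a prescribing flag like this.
    print(metaboliser_status("*1", "*3"))  # intermediate metaboliser
```

Keeping the lookup tables separate from the logic means they can be updated as guidelines change without touching the code that reads the patient record.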
Finally, the San Francisco-based startup Trace Genomics has even used machine learning techniques to give farmers accurate insights into soil health, carbon sequestration, and sustainable agricultural production. By building AI-enabled diagnostic tools aimed at predicting and preventing diseases in crops, innovative companies can directly influence agricultural productivity.
Though valid ethical concerns around the widespread implementation of data-driven genomics leave a lot for scientists and policymakers to ponder, the possibilities for the application of machine learning in the field seem limitless.
For more fascinating insights into the ever-changing world of the life sciences sector, stay tuned to the SRG Science Blog.