
Deep Learning for Genomics

By: Upendra Kumar Devisetty

Overview of this book

Deep learning has shown remarkable promise in genomics; however, the discipline lacks a workforce skilled in deep learning. This book will help researchers and data scientists stand out from the crowd and solve real-world problems in genomics by developing the necessary skill set. Starting with an introduction to the essential concepts, it highlights the power of deep learning in handling big data in genomics. First, you'll learn about conventional genomics analysis, then transition to state-of-the-art machine learning-based genomics applications, and finally dive into deep learning approaches for genomics. The book covers the important deep learning algorithms commonly used by the research community, explaining what they are, how they work, and how they are applied in genomics. An entire section is dedicated to operationalizing deep learning models, with hands-on tutorials that show researchers and deep learning practitioners how to build, tune, interpret, deploy, evaluate, and monitor deep learning models trained on large genomics datasets. By the end of this book, you'll have learned about the challenges, best practices, and pitfalls of deep learning for genomics.
Table of Contents (18 chapters)
Part 1 – Machine Learning in Genomics
Part 2 – Deep Learning for Genomic Applications
Part 3 – Operationalizing Models

Why machine learning for genomics?

The completion of the human genome sequence in 2003 was one of the most important events in biology and a major milestone in genomics. Since then, genomics has evolved rapidly from research into clinical practice at scale, especially in oncology and infectious diseases. Because genomics can trace the root causes of diseases to tiny changes in the genome, it has fueled the discovery of many important disease genes, particularly rare disease genes, bringing clinical decision-making one step closer to personalized medicine. As a result, sequencing efforts have exploded globally, and the amount of genomics data being generated has risen sharply. At the same time, biological techniques have grown in complexity and number, adding further to this large-scale genomics data. It is estimated that between 2 and 40 exabytes of genomics data will be generated in the next decade (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4494865/). This is a vast amount of data, and current computational and bioinformatics tools struggle to handle it, interpret it, and extract biological insights from it. ML, which learns from experience, holds great promise for analyzing this large and complex genomic data. Because ML algorithms detect patterns in data automatically, they are well suited to interpreting this large trove of genomic data.

ML has a strong place in genomics because it applies mathematical and data analysis techniques to complex, multi-dimensional datasets, such as genomic datasets, to build predictive models and uncover insights from them. ML can transform heterogeneous and large-scale genomic datasets into biological insights. ML approaches rely on sophisticated statistical and computational algorithms to make biological predictions. Supervised methods do this by learning the complex association between input features and labels, while unsupervised methods find complex patterns in the input features alone, for example by grouping samples based on their similarity. Both can learn useful, new patterns from data that are hard for experts to find. Because ML has been so successful in other domains, there is now a huge demand for applying it to genomic datasets.
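To make the supervised/unsupervised distinction concrete, here is a minimal pure-Python sketch on made-up data. The feature values (imagined as expression levels of three genes), the `tumor`/`normal` labels, and the helper names `fit_nearest_centroid`, `predict`, and `kmeans` are all hypothetical, chosen only for illustration. A nearest-centroid classifier stands in for supervised learning and k-means stands in for unsupervised clustering; neither is presented as the method used in this book.

```python
import math
from collections import defaultdict

# Hypothetical toy data: each row is one sample's feature vector, e.g. expression
# levels of three genes (made-up numbers, for illustration only).
samples = [
    [1.0, 0.9, 0.1],
    [1.2, 1.1, 0.0],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.1],
    [0.0, 0.1, 0.9],
    [0.2, 0.0, 1.0],
]
# Hypothetical labels; only the supervised method sees them.
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]


def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# Supervised: learn a mapping from input features to labels.
# A nearest-centroid classifier stores one mean vector per label.
def fit_nearest_centroid(X, y):
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    return {label: mean(vecs) for label, vecs in groups.items()}


def predict(centroids, x):
    """Assign x the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


# Unsupervised: no labels; group samples by similarity with k-means.
def kmeans(X, k, n_iter=10):
    # Farthest-first initialization: start from the first sample, then
    # repeatedly add the sample farthest from all centers chosen so far.
    centers = [list(X[0])]
    while len(centers) < k:
        far = max(X, key=lambda x: min(math.dist(c, x) for c in centers))
        centers.append(list(far))
    for _ in range(n_iter):
        # Assign each sample to its nearest center, then recompute the centers.
        clusters = [[] for _ in range(k)]
        for x in X:
            nearest = min(range(k), key=lambda i: math.dist(centers[i], x))
            clusters[nearest].append(x)
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    # Return a cluster index for every sample (index values are arbitrary).
    return [min(range(k), key=lambda i: math.dist(centers[i], x)) for x in X]


centroids = fit_nearest_centroid(samples, labels)
print(predict(centroids, [1.1, 1.0, 0.1]))  # tumor
print(kmeans(samples, k=2))  # [0, 0, 0, 1, 1, 1]
```

The supervised model needs the labels to learn what separates the groups, whereas k-means recovers the same two groups from the features alone; it cannot name them, and its cluster indices carry no meaning beyond "same group" or "different group".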