Book Image

Deep Learning for Genomics

By : Upendra Kumar Devisetty
Book Image

Deep Learning for Genomics

By: Upendra Kumar Devisetty

Overview of this book

Deep learning has shown remarkable promise in the field of genomics; however, there is a lack of a skilled deep learning workforce in this discipline. This book will help researchers and data scientists to stand out from the rest of the crowd and solve real-world problems in genomics by developing the necessary skill set. Starting with an introduction to the essential concepts, this book highlights the power of deep learning in handling big data in genomics. First, you’ll learn about conventional genomics analysis, then transition to state-of-the-art machine learning-based genomics applications, and finally dive into deep learning approaches for genomics. The book covers all of the important deep learning algorithms commonly used by the research community and goes into the details of what they are, how they work, and their practical applications in genomics. The book dedicates an entire section to operationalizing deep learning models, which will provide the necessary hands-on tutorials for researchers and any deep learning practitioners to build, tune, interpret, deploy, evaluate, and monitor deep learning models from genomics big data sets. By the end of this book, you’ll have learned about the challenges, best practices, and pitfalls of deep learning for genomics.
Table of Contents (18 chapters)
1
Part 1 – Machine Learning in Genomics
5
Part 2 – Deep Learning for Genomic Applications
11
Part 3 – Operationalizing models

What is machine learning?

Before we talk about ML, let’s understand what AI is. In the simplest terms, AI is the ability of a machine to mimic human intelligence and iteratively improve itself based on the information it collects. The goal of AI is to build systems to perform actions that are routinely done by humans such as problem-solving, pattern matching, image recognition, knowledge acquisition, and so on. ML, a subset of AI, is the process of training a model to learn and improve from experience. Deep learning (DL), in turn, is a subfield of ML, in which we leverage artificial neural networks (ANNs) to mimic the human brain and find the nonlinear relationships between the input and output to generate predictions (Figure 1.1):

Figure 1.1 – AI versus ML versus DL – how they are related

Figure 1.1 – AI versus ML versus DL – how they are related

In ML, a model is built based on input data and an underlying algorithm to make useful predictions from real-world data. In a simplified ML, “features” that represent an individual measurable property of the data are provided as input, and “labels” are returned as the predictions. Suppose we want to predict whether a particular sequence of DNA has a binding site for a transcription factor (TF) of your interest or not. Using the traditional approach, we would use a positional weight matrix (PWF) to scan the sequence and identify the potential motifs that are overrepresented. Even though this works, this is extremely difficult, manual, scalable, and so on. Using an ML-based approach, we would give an ML model plenty of DNA sequences until the ML model learns the mathematical relationship between the features from those DNA sequences that either have or don’t have binding sites (labels) based on experimental results. It then uses this knowledge to make decisions on new data and make informed predictions. For example, we could give the ML model an unknown DNA sequence, and it would predict the correct binding site motif if present. This is one such example of why ML is a good fit for genomics problems. Some other ways in which ML can be used in genomics include identifying genetic disorders, predicting the type of cancer from genetic variants, improving disease prognosis, and so on.