Deep Learning for Genomics

By: Upendra Kumar Devisetty

Overview of this book

Deep learning has shown remarkable promise in genomics, yet the field lacks practitioners skilled in deep learning. This book helps researchers and data scientists develop the skill set needed to solve real-world problems in genomics. Starting with an introduction to the essential concepts, it highlights the power of deep learning in handling big data in genomics. First, you’ll learn about conventional genomics analysis, then transition to state-of-the-art machine learning-based genomics applications, and finally dive into deep learning approaches for genomics. The book covers the important deep learning algorithms commonly used by the research community and explains what they are, how they work, and how they are applied in genomics. An entire section is dedicated to operationalizing deep learning models, with hands-on tutorials that show researchers and deep learning practitioners how to build, tune, interpret, deploy, evaluate, and monitor deep learning models on large genomic datasets. By the end of this book, you’ll have learned about the challenges, best practices, and pitfalls of deep learning for genomics.
Table of Contents (18 chapters)
Part 1 – Machine Learning in Genomics
Part 2 – Deep Learning for Genomic Applications
Part 3 – Operationalizing models

ML challenges in genomics

ML is the backbone of large-scale genomic data analysis. ML algorithms can mine biological insights from large genomic datasets and discover patterns that may be hard for experts to extract. However, current ML algorithms face a few challenges in the analysis of genomic data:

  • Although the amount of data coming out of biological systems and genome sequencing is huge and ever-growing, integrating these diverse datasets from multiple sources, platforms, and technologies into ML algorithms is not trivial.
  • Because of this huge variation in the training data, models tend to overfit and generalize poorly to new data that differs from the training data. Methods such as L1 and L2 regularization can address this poor generalization, as we will see in future chapters.
  • The nature of ML models, which are mostly “black boxes”, may bring new challenges to biological applications in...
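
As a rough illustration of the L1 and L2 regularization mentioned in the second point above, the following NumPy sketch fits a linear model to synthetic data with far more features than samples, a shape that is common in genomics. Everything here is an assumption made for illustration and is not taken from the book: the data is randomly generated, the helper names `ridge_fit`, `soft_threshold`, and `lasso_fit` are hypothetical, and the penalty strengths are arbitrary. The L1 fit uses a simple proximal gradient (ISTA) loop rather than a production solver.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """L2 penalty: minimize ||Xw - y||^2 / (2n) + (lam / 2) * ||w||_2^2 (closed form)."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)


def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t; entries smaller than t become exactly 0."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_fit(X, y, lam, n_iter=2000):
    """L1 penalty: minimize ||Xw - y||^2 / (2n) + lam * ||w||_1 with proximal gradient (ISTA)."""
    n, p = X.shape
    # Step size 1/L, where L is the largest eigenvalue of X^T X / n.
    step = n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w


# Synthetic stand-in for a genomics-style problem: 40 samples, 200 features,
# of which only the first 5 actually influence the response (an assumption).
rng = np.random.default_rng(0)
n_samples, n_features = 40, 200
X = rng.normal(size=(n_samples, n_features))
true_w = np.zeros(n_features)
true_w[:5] = [2.0, -1.5, 1.0, -1.0, 0.5]
y = X @ true_w + 0.1 * rng.normal(size=n_samples)

# Smallest L1 penalty at which every weight is driven to zero.
lam_max = np.max(np.abs(X.T @ y)) / n_samples

w_l2_weak = ridge_fit(X, y, lam=0.01)
w_l2_strong = ridge_fit(X, y, lam=1.0)
w_l1 = lasso_fit(X, y, lam=0.3 * lam_max)

print("L2 weight norm (weak penalty):  ", np.linalg.norm(w_l2_weak))
print("L2 weight norm (strong penalty):", np.linalg.norm(w_l2_strong))
print("Non-zero weights under L1:      ", np.count_nonzero(w_l1), "of", n_features)
```

The L2 penalty shrinks all weights toward zero without removing any, while the L1 penalty sets many weights to exactly zero and so acts as a form of feature selection. In both cases the penalty strength would normally be chosen by validation on held-out data rather than fixed by hand as it is here.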