Deep Learning for Genomics

By: Upendra Kumar Devisetty

Overview of this book

Deep learning has shown remarkable promise in genomics; however, the discipline lacks a workforce skilled in deep learning. This book helps researchers and data scientists stand out from the crowd and solve real-world problems in genomics by developing the necessary skill set. Starting with an introduction to the essential concepts, it highlights the power of deep learning in handling big data in genomics. First, you’ll learn about conventional genomics analysis, then transition to state-of-the-art machine learning-based genomics applications, and finally dive into deep learning approaches for genomics. The book covers the important deep learning algorithms commonly used by the research community and explains what they are, how they work, and how they are applied in genomics. An entire section is dedicated to operationalizing deep learning models, with hands-on tutorials that show researchers and deep learning practitioners how to build, tune, interpret, deploy, evaluate, and monitor deep learning models trained on large genomics datasets. By the end of this book, you’ll have learned about the challenges, best practices, and pitfalls of deep learning for genomics.

Preface

Who is this book for?

What this book covers

To get the most out of this book

Download the example code files

Conventions used

Get in touch

Reviews

Share Your Thoughts

Download a free PDF copy of this book

Part 1 – Machine Learning in Genomics

Chapter 1: Introducing Machine Learning for Genomics

What is machine learning?

Why machine learning for genomics?

Machine learning for genomics in life sciences and biotechnology

Summary

Chapter 2: Genomics Data Analysis

Technical requirements

What is a genome?

Genome sequencing

Analysis of genomic data

Introduction to Biopython for genomic data analysis

Chapter 3: Machine Learning Methods for Genomic Applications

Technical requirements

Genomics big data

Supervised and unsupervised ML

ML for genomics

An ML use case for genomics – Disease prediction

ML challenges in genomics

Summary

Part 2 – Deep Learning for Genomic Applications

Chapter 4: Deep Learning for Genomics

Understanding what deep learning is and how it works

Anatomy of deep neural networks

DNNs for genomics

Introducing deep learning algorithms and Python libraries

Summary

Chapter 5: Introducing Convolutional Neural Networks for Genomics

Introduction to CNNs

CNNs for genomics

Applications of CNNs in genomics

Summary

Chapter 6: Recurrent Neural Networks in Genomics

What are RNNs?

Introducing RNNs

Different RNN architectures

Applications and use cases of RNNs in genomics

Summary

Chapter 7: Unsupervised Deep Learning with Autoencoders

What is unsupervised DL?

Types of unsupervised DL

What are autoencoders?

Autoencoders for genomics

Summary

Chapter 8: GANs for Improving Models in Genomics

What are GANs?

Challenges working with genomics datasets

How can GANs help improve models?

Practical applications of GANs in genomics

Summary

Part 3 – Operationalizing models

Chapter 9: Building and Tuning Deep Learning Models

Technical requirements

Use case – Predicting the binding site location of the JunD TF

Summary

Chapter 10: Model Interpretability in Genomics

What is model interpretability?

Unlocking business value from model interpretability

Model interpretability methods in genomics

Use case – Model interpretability for genomics

Summary

Chapter 11: Model Deployment and Monitoring

Technical requirements

Introducing model deployment

Monitoring models using advanced tools

Summary

Chapter 12: Challenges, Pitfalls, and Best Practices for Deep Learning in Genomics

Deep learning challenges regarding genomics

Common pitfalls for applying deep learning to genomics

Best practices for applying deep learning to genomics

Summary

Index

Machine learning for genomics in life sciences and biotechnology

Because ML has shown such promise for genomics applications such as drug discovery, diagnostics, precision medicine, agriculture, and biological research, more and more life science and biotech organizations are using it to analyze genomic data for population health and predictive analytics. According to a MarketsandMarkets market research study that takes into account technology, functionality, application, and region, the global AI in genomics market is forecast to grow from $202 million in 2020 to $1.671 billion by 2025 (https://www.marketsandmarkets.com/Market-Reports/artificial-intelligence-in-genomics-market-36649899.html). The main drivers of this growth are the need to control spiraling drug costs, increasing public and private investment, and, most importantly, the adoption of AI solutions in precision medicine. The COVID-19 pandemic has also accelerated the adoption of AI for genomics (https://www.jmir.org/2021/3/e22453/). Although the outlook for ML in genomics is exciting, there is a shortage of skilled people who can develop, manage, and apply these ML methodologies. Additionally, integrating ML systems into existing systems is challenging and requires a proper understanding of the underlying concepts and techniques. To stand out from the crowd and contribute to their company’s data-driven decisions, researchers must have the necessary skill set.

This book addresses the skill gap that currently exists in the market. It is a Swiss Army knife for any research professional, data scientist, or manager who is getting started with genomic data analysis using ML. It highlights the power of ML approaches in handling genomics big data by introducing key concepts and drawing on real-life business examples, use cases, and best practices to help fill the gaps in both technical skills and general mindset within the field.

Exploring machine learning software

Before we start the tutorials, we will need some tools. So that readers can follow along regardless of their operating system, we will use ML software that runs on Windows, macOS, and Linux. We will be using the Python programming language and Python libraries such as Biopython for genomic data analysis, scikit-learn for building ML models, and Keras for training our DL models. Let’s take a closer look at these pieces of ML software.
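As a quick sanity check before starting, the sketch below reports which of the tools named above are importable in the current environment. It is a minimal illustration, not part of the book's code: the helper names `installed_version` and `environment_report` are hypothetical, and the import names (`Bio`, `sklearn`, `keras`) are the usual ones for these packages.

```python
import importlib
import sys

# Import names (not pip package names) of the libraries mentioned above.
LIBRARIES = {"Biopython": "Bio", "scikit-learn": "sklearn", "Keras": "keras"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # A missing package raises ImportError; a broken install can raise
        # other errors, so treat any failure as "not available".
        return None
    return getattr(module, "__version__", "unknown")


def environment_report():
    """Collect the Python version and the version of each library, if present."""
    report = {"Python": "{}.{}.{}".format(*sys.version_info[:3])}
    for label, module_name in LIBRARIES.items():
        report[label] = installed_version(module_name)
    return report


report = environment_report()
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Any library reported as not installed can be added later with the installation methods covered in the relevant chapters.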

Python programming language

We will be using the Python programming language throughout this book. Python is popular among researchers because of its user-friendliness and the many available packages that support all types of data analysis. More importantly, the ML, DL, and genomics communities routinely use Python for their own analysis needs. Throughout this book, we will use Python version 3.7 and look at a few ways of installing Python using pip, Conda, and Anaconda.

Visualization

We will be using the Matplotlib and Seaborn Python packages, two of the most popular visualization libraries in Python. They are quick to install, easy to use, and easy to import into a Python script. Both come with a variety of functions and methods for plotting data. Throughout this book, we will use Matplotlib version 3.5.1 and Seaborn version 0.11.2. We will look at a few ways of installing these libraries in the subsequent chapters.
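To give a feel for how the two libraries work together, here is a minimal sketch that draws a bar chart of base counts with Seaborn on a Matplotlib figure. The DNA string is a made-up toy sequence, and the figure is rendered into an in-memory buffer so that the example runs without a display.

```python
import io
from collections import Counter

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import seaborn as sns

# Toy sequence used purely for illustration (an assumption, not real data).
sequence = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"

# Count each nucleotide and keep a fixed A/C/G/T order for the x-axis.
counts = Counter(sequence)
bases = ["A", "C", "G", "T"]
values = [counts[base] for base in bases]

# Seaborn draws onto a regular Matplotlib Axes object.
fig, ax = plt.subplots(figsize=(4, 3))
sns.barplot(x=bases, y=values, ax=ax)
ax.set_xlabel("Nucleotide")
ax.set_ylabel("Count")
ax.set_title("Base composition of a toy sequence")
fig.tight_layout()

# Render the figure as PNG bytes in memory.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

To write the plot to disk instead, pass a filename such as `"base_composition.png"` to `fig.savefig`.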

Biopython

We will also be using Biopython, a Python package that provides a collection of tools for processing genomic data. It offers high-quality, reusable modules and classes for analyzing complex genomic data. It has built-in modules for connecting to databases such as Swiss-Prot, NCBI, and Ensembl. We will use Biopython version 1.78 and look at separate ways of installing Biopython using pip, Conda, and Anaconda.
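As a small taste of what Biopython offers, the following sketch uses its `Seq` class to take the reverse complement of a short DNA sequence, transcribe it, and translate it. The nine-base sequence is a toy example chosen for illustration, and the GC content is computed by hand rather than with a helper function.

```python
from Bio.Seq import Seq

# Toy coding sequence (an assumption for illustration): ATG GCC TAA
coding_dna = Seq("ATGGCCTAA")

# Reverse complement of the DNA strand.
reverse_complement = coding_dna.reverse_complement()

# Transcription replaces T with U to give the mRNA.
messenger_rna = coding_dna.transcribe()

# Translation with the standard genetic code; "*" marks a stop codon.
protein = coding_dna.translate()

# GC content computed directly from base counts.
gc_content = (coding_dna.count("G") + coding_dna.count("C")) / len(coding_dna)

print("Reverse complement:", reverse_complement)
print("mRNA:", messenger_rna)
print("Protein:", protein)
print(f"GC content: {gc_content:.2f}")
```

`Seq` objects behave much like Python strings, so slicing, `len`, and `count` work as expected.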

Scikit-learn

Scikit-learn is a Python package built specifically for ML and is one of the most popular ML libraries among data scientists. It has a rich collection of ML algorithms, extensive tutorials, good documentation, and, most importantly, an excellent user community. For this introductory chapter, we will use scikit-learn for developing ML models in Python. Wherever applicable, we will use scikit-learn version 1.0.2 and look at separate ways of installing scikit-learn in the subsequent chapters.
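The typical scikit-learn workflow of splitting data, extracting features, fitting a model, and scoring it can be sketched as follows. This is a toy setup, not the book's use case: the sequences are randomly generated, the task (separating GC-rich from AT-rich sequences) is deliberately easy, and the helper `random_sequence` is hypothetical.

```python
import random

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = random.Random(0)  # fixed seed so the toy data is reproducible


def random_sequence(length, gc_fraction):
    """Draw a toy DNA sequence with roughly the requested GC fraction."""
    at_weight = (1 - gc_fraction) / 2
    gc_weight = gc_fraction / 2
    weights = [at_weight, gc_weight, gc_weight, at_weight]  # order: A, C, G, T
    return "".join(rng.choices("ACGT", weights=weights, k=length))


# Synthetic data: label 1 = GC-rich, label 0 = AT-rich.
sequences = [random_sequence(60, 0.7) for _ in range(100)]
sequences += [random_sequence(60, 0.3) for _ in range(100)]
labels = [1] * 100 + [0] * 100

# Hold out a test set before any feature extraction.
train_seqs, test_seqs, y_train, y_test = train_test_split(
    sequences, labels, test_size=0.25, random_state=0, stratify=labels
)

# Represent each sequence by its 3-mer (trinucleotide) counts.
vectorizer = CountVectorizer(analyzer="char", ngram_range=(3, 3), lowercase=False)
X_train = vectorizer.fit_transform(train_seqs)
X_test = vectorizer.transform(test_seqs)

# Fit a simple linear classifier and measure held-out accuracy.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

The same fit, transform, and score pattern applies to most scikit-learn estimators, which makes it easy to swap one model for another.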
