Machine Learning in Biotechnology and Life Sciences

By: Saleh Alkhalifa

Overview of this book

The booming fields of biotechnology and life sciences have seen drastic changes over the last few years. With competition growing in every corner, companies around the globe are looking to data-driven methods such as machine learning to optimize processes and reduce costs. This book helps lab scientists, engineers, and managers develop a data scientist's mindset through a hands-on approach to applying machine learning to increase productivity and efficiency. You'll start with a crash course in Python, SQL, and data science, then develop and tune sophisticated models from scratch to automate processes and make predictions in the biotechnology and life sciences domain. As you advance, the book covers a number of advanced techniques in machine learning, deep learning, and natural language processing using real-world data. By the end of this machine learning book, you'll be able to build and deploy your own machine learning models to automate processes and make predictions using AWS and GCP.
Table of Contents (17 chapters)
Section 1: Getting Started with Data
Section 2: Developing and Training Models
Section 3: Deploying Models to Users

Tutorial – protein sequence classification via LSTMs using Keras and MLflow

Deep learning has surged in popularity in recent years, prompting many scientists to turn to the field as a new means of solving and optimizing scientific problems. One of the most popular applications of deep learning within the biotechnology space involves protein sequence data. So far in this book, we have focused on developing predictive models for structured data. We will now turn our attention to sequential data, in which each element of a sequence bears some relation to the element before it. In this tutorial, we will develop a protein sequence classification model that classifies protein sequences by their known family accession, using the Pfam (https://pfam.xfam.org/) dataset.
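To make the idea of sequential input concrete, the sketch below shows one common way to prepare protein sequences for an LSTM: each residue is mapped to an integer, sequences are truncated or padded to a fixed length, and family accessions are turned into class indices. This is only an illustration under stated assumptions, not the tutorial's own code. The helper names (`encode_sequence`, `encode_labels`, `build_lstm_classifier`), the toy sequences, the placeholder accession strings, and the layer sizes are all hypothetical. The Keras model is wrapped in a function that imports TensorFlow only when called, so the encoding part runs without it.

```python
# Illustrative sketch only: names, toy data, and layer sizes are assumptions,
# not taken from the book's tutorial.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# Index 0 is reserved for padding; index 1 covers any other symbol (e.g. X, B, Z).
PAD, UNKNOWN = 0, 1
RESIDUE_TO_INDEX = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(sequence, max_length):
    """Map residues to integers, then truncate or right-pad to max_length."""
    indices = [RESIDUE_TO_INDEX.get(aa, UNKNOWN) for aa in sequence.upper()]
    indices = indices[:max_length]
    return indices + [PAD] * (max_length - len(indices))


def encode_labels(family_accessions):
    """Give each distinct family accession a class index (sorted for stability)."""
    classes = sorted(set(family_accessions))
    class_to_index = {name: i for i, name in enumerate(classes)}
    return [class_to_index[name] for name in family_accessions], classes


def build_lstm_classifier(num_classes, embedding_dim=32, lstm_units=64):
    """Embedding -> LSTM -> softmax. Requires TensorFlow, imported lazily."""
    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Embedding(input_dim=len(AMINO_ACIDS) + 2,
                               output_dim=embedding_dim,
                               mask_zero=True),  # mask padded positions
        keras.layers.LSTM(lstm_units),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# Toy data: made-up sequences paired with placeholder family labels.
sequences = ["MKVLA", "ACDXY", "MK"]
accessions = ["FAM_B", "FAM_A", "FAM_B"]

encoded = [encode_sequence(s, max_length=4) for s in sequences]
labels, classes = encode_labels(accessions)
```

With real data, `encoded` and `labels` would be converted to arrays and passed to `model.fit`. The integer labels match the `sparse_categorical_crossentropy` loss, which avoids one-hot encoding a potentially large number of families.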

Important note

Pfam dataset: Pfam: The protein families database in 2021 J. Mistry, S. Chuguransky, L. Williams, M. Qureshi...