
Machine Learning in Biotechnology and Life Sciences

By: Saleh Alkhalifa

Overview of this book

The booming fields of biotechnology and life sciences have changed drastically over the last few years. With competition growing in every corner, companies around the globe are turning to data-driven methods such as machine learning to optimize processes and reduce costs. This book helps lab scientists, engineers, and managers develop a data scientist's mindset, taking a hands-on approach to applying machine learning to increase productivity and efficiency. You'll start with a crash course in Python, SQL, and data science, and then learn to develop and tune sophisticated models from scratch to automate processes and make predictions in the biotechnology and life sciences domain. As you advance, the book covers a number of advanced techniques in machine learning, deep learning, and natural language processing using real-world data. By the end of this book, you'll be able to build and deploy your own machine learning models on AWS and GCP to automate processes and make predictions.
Table of Contents (17 chapters)
Section 1: Getting Started with Data
Section 2: Developing and Training Models
Section 3: Deploying Models to Users

Working with unstructured data

In the previous section, we explored some of the most common tasks and processes involved in handling text-based data. More often than not, however, you will find that the data you work with is not structured, or perhaps not even digital. Take, for example, a company that has decided to digitize all of its printed documents, or a company that maintains a large repository of documents, none of which are structured or organized. For tasks such as these, we can rely on several AWS products to come to our rescue. In the next few sections, we will explore two of the most useful NLP tools.

OCR using AWS Textract

In my opinion, one of the most useful tools available within AWS is an Optical Character Recognition (OCR) tool known as AWS Textract. The main idea behind this tool is to enable users to extract text, tables, and other useful items from images or static PDF documents using pre-built machine learning...
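To make the idea concrete, here is a minimal sketch of calling Textract from Python; it is not the author's code. It assumes boto3 is installed and that AWS credentials with Textract permissions are already configured. The helper names (`analyze_image`, `extract_lines`, `extract_tables`), the default region, and the file name in the usage note are hypothetical. The sketch calls Textract's `AnalyzeDocument` operation with the `TABLES` feature, then walks the returned `Blocks` list: `LINE` blocks carry detected text, and each `TABLE` block links through `CHILD` relationships to `CELL` blocks (with 1-based `RowIndex`/`ColumnIndex`), which in turn link to `WORD` blocks.

```python
"""Minimal sketch: extract lines and tables from a scanned image with Amazon Textract."""


def analyze_image(path, region_name="us-east-1"):
    """Send a local image to Textract's AnalyzeDocument API (region is an assumption)."""
    import boto3  # imported lazily so the parsing helpers below work without boto3

    client = boto3.client("textract", region_name=region_name)
    with open(path, "rb") as f:
        return client.analyze_document(
            Document={"Bytes": f.read()}, FeatureTypes=["TABLES"]
        )


def extract_lines(response):
    """Return the text of every LINE block, in the order Textract lists them."""
    return [b["Text"] for b in response.get("Blocks", []) if b["BlockType"] == "LINE"]


def extract_tables(response):
    """Rebuild each TABLE block as a list of rows, each row a list of cell strings."""
    blocks = response.get("Blocks", [])
    by_id = {b["Id"]: b for b in blocks}

    def child_ids(block):
        # Collect the Ids of all CHILD relationships of a block.
        ids = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                ids.extend(rel["Ids"])
        return ids

    def cell_text(cell):
        # A cell's text is the space-joined text of its WORD children.
        return " ".join(
            by_id[i]["Text"] for i in child_ids(cell) if by_id[i]["BlockType"] == "WORD"
        )

    tables = []
    for table in (b for b in blocks if b["BlockType"] == "TABLE"):
        cells = {}
        for cid in child_ids(table):
            cell = by_id[cid]
            if cell["BlockType"] == "CELL":
                cells[(cell["RowIndex"], cell["ColumnIndex"])] = cell_text(cell)
        if not cells:
            tables.append([])
            continue
        n_rows = max(r for r, _ in cells)
        n_cols = max(c for _, c in cells)
        # Missing cells become empty strings so every row has the same width.
        tables.append(
            [[cells.get((r, c), "") for c in range(1, n_cols + 1)]
             for r in range(1, n_rows + 1)]
        )
    return tables
```

Usage might look like `response = analyze_image("scanned_page.png")` followed by `extract_lines(response)` and `extract_tables(response)`. Keeping the API call separate from the parsing helpers makes the parsing logic testable against a saved JSON response without making AWS calls.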