Bioinformatics with Python Cookbook - Second Edition

By: Tiago Antao

Overview of this book

Bioinformatics is an active research field that uses a range of simple-to-advanced computations to extract valuable information from biological data. This book covers next-generation sequencing, genomics, metagenomics, population genetics, phylogenetics, and proteomics. You'll learn modern programming techniques to analyze large amounts of biological data. With the help of real-world examples, you'll convert, analyze, and visualize datasets using various Python tools and libraries. This book will also give you a better understanding of working with a Galaxy server, the most widely used web-based bioinformatics pipeline system. This updated edition includes advanced next-generation sequencing filtering techniques, and it explores topics such as SNP discovery using statistical approaches under high-performance computing frameworks such as Dask and Spark. By the end of this book, you'll be able to use and implement modern programming techniques and frameworks to deal with the ever-increasing deluge of bioinformatics data.

Introduction


Next-generation sequencing (NGS) is one of the fundamental technological developments of the decade in the life sciences. Whole genome sequencing (WGS), RAD-Seq, RNA-Seq, ChIP-Seq, and several other technologies are routinely used to investigate important biological problems. These are also called high-throughput sequencing technologies, and with good reason: they generate vast amounts of data that need to be processed. NGS is the main reason that computational biology has become a big-data discipline. More than anything else, this is a field that requires strong bioinformatics techniques.

Here, we will not discuss each individual NGS technique per se (this would require a whole book on its own). We will use an existing WGS dataset and the 1000 Genomes Project to illustrate the most common steps necessary to analyze genomic data. The recipes presented here will be easily applicable to other genomic sequencing approaches. Some of them can also be used for transcriptomic analysis...