Bioinformatics with Python Cookbook - Second Edition

By: Tiago Antao

Overview of this book

Bioinformatics is an active research field that uses a range of simple-to-advanced computations to extract valuable information from biological data. This book covers next-generation sequencing, genomics, metagenomics, population genetics, phylogenetics, and proteomics. You'll learn modern programming techniques to analyze large amounts of biological data. With the help of real-world examples, you'll convert, analyze, and visualize datasets using various Python tools and libraries. This book will help you get a better understanding of working with a Galaxy server, which is the most widely used bioinformatics web-based pipeline system. This updated edition also includes advanced next-generation sequencing filtering techniques. You'll also explore topics such as SNP discovery using statistical approaches under high-performance computing frameworks such as Dask and Spark. By the end of this book, you'll be able to use and implement modern programming techniques and frameworks to deal with the ever-increasing deluge of bioinformatics data.

Using generic pipelines with bioinformatics data


Galaxy is mostly geared toward users who are less inclined to program. Because Galaxy is so pervasive, knowing how to work with it is important even if you prefer a more programmer-friendly environment. Reassuringly, an API exists for interacting with Galaxy. If you want a more programmer-friendly pipeline, however, many alternatives are available.

Here, we will explore Airflow, which originated at Airbnb and, at the time of writing, is incubating under the Apache umbrella. Airflow sits at the other end of the pipeline world from Galaxy: it is completely subject-agnostic (its development has nothing to do with bioinformatics), and it is geared entirely toward programming.
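
To make the idea of a pipeline that is "geared toward programming" concrete, here is a minimal sketch that uses only the Python standard library (the `graphlib` module, Python 3.9+). It is not Airflow's API: the task names, the toy read data, and the `run_pipeline` helper are all hypothetical, chosen purely for illustration. It only shows the general pattern that code-first pipeline tools build on, where tasks are ordinary functions, dependencies between them are declared as data, and execution follows the dependency order. In Airflow, the analogous objects are DAGs and operators.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Hypothetical tasks: each one reads from and writes to a shared context dict.
def download_reads(ctx):
    # Toy stand-in for fetching sequencing reads.
    ctx["reads"] = ["ACGT", "ACGN", "GGCC"]


def filter_reads(ctx):
    # Drop reads that contain an ambiguous base call (N).
    ctx["filtered"] = [read for read in ctx["reads"] if "N" not in read]


def count_reads(ctx):
    # Summarize how many reads survived filtering.
    ctx["count"] = len(ctx["filtered"])


# Map task names to the functions that implement them.
TASKS = {
    "download_reads": download_reads,
    "filter_reads": filter_reads,
    "count_reads": count_reads,
}

# Map each task to the set of tasks that must finish before it starts.
DEPENDENCIES = {
    "download_reads": set(),
    "filter_reads": {"download_reads"},
    "count_reads": {"filter_reads"},
}


def run_pipeline(tasks, dependencies):
    """Run every task once, in an order that respects the dependencies."""
    ctx = {}
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name](ctx)
    return order, ctx


if __name__ == "__main__":
    order, ctx = run_pipeline(TASKS, DEPENDENCIES)
    print("Execution order:", order)
    print("Reads kept after filtering:", ctx["count"])
```

A real pipeline system adds scheduling, retries, logging, and distributed execution on top of this pattern, which this sketch deliberately leaves out.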

Getting ready

Be careful about where you install Airflow from, as some sources might not be up to date. At the time of writing, this applies to some of the available conda packages.

At the time of writing, the best approach is probably to use a standard Python installation that either comes...