Book Image

Bioinformatics with Python Cookbook - Second Edition

By : Tiago Antao
Book Image

Bioinformatics with Python Cookbook - Second Edition

By: Tiago Antao

Overview of this book

Bioinformatics is an active research field that uses a range of simple-to-advanced computations to extract valuable information from biological data. This book covers next-generation sequencing, genomics, metagenomics, population genetics, phylogenetics, and proteomics. You'll learn modern programming techniques to analyze large amounts of biological data. With the help of real-world examples, you'll convert, analyze, and visualize datasets using various Python tools and libraries. This book will help you get a better understanding of working with a Galaxy server, which is the most widely used bioinformatics web-based pipeline system. This updated edition also includes advanced next-generation sequencing filtering techniques. You'll also explore topics such as SNP discovery using statistical approaches under high-performance computing frameworks such as Dask and Spark. By the end of this book, you'll be able to use and implement modern programming techniques and frameworks to deal with the ever-increasing deluge of bioinformatics data.
Table of Contents (16 chapters)
Title Page
About Packt
Contributors
Preface
Index

Installing the required software with Docker


Docker is the most widely-used framework for implementing operating system-level virtualization. This technology allows you to have an independent container: a layer that is lighter than a virtual machine, but still allows you to compartmentalize software. This mostly isolates all processes, making it feel like each container is a virtual machine. Docker works quite well at both extremes of the development spectrum: it's an expedient way to set up the content of this book for learning purposes, and may become your platform of choice for deploying your applications in complex environments. This recipe is an alternative to the previous recipe.

However, for long-term development environments, something along the lines of the previous recipe is probably your best route, although it can entail a more laborious initial setup.

Getting ready

If you are on Linux, the first thing you have to do is install Docker. The safest solution is to get the latest version from https://www.docker.com/. While your Linux distribution may have a Docker package, it may be too old and buggy (remember the "advancing at breakneck speed" thing we mentioned?).

If you are on Windows or macOS, do not despair; take a look at the Docker site. Various options are available there to save you, but there is no clear-cut formula, as Docker advances quite quickly on those platforms. A fairly recent computer is necessary to run our 64-bit virtual machine. If you have any problems, reboot your machine and make sure that on the BIOS, VT-X or AMD-V is enabled. At the very least, you will need 6 GB of memory, preferably more.

Note

This will require a very large download from the internet, so be sure that you have plenty of bandwidth. Also, be ready to wait for a long time.

How to do it...

Follow these steps to get started:

  1. Use the following command on your Docker shell:
docker build -t bio https://raw.githubusercontent.com/PacktPublishing/Bioinformatics-with-Python-Cookbook-Second-Edition/master/docker/Dockerfile

On Linux, you will either need to have root privileges or be added to the Docker Unix group.

  1. Now, you are ready to run the container, as follows:
docker run -ti -p 9875:9875 -v YOUR_DIRECTORY:/data bio

Replace YOUR_DIRECTORY with a directory on your operating system. This will be shared between your host operating system and the Docker container. YOUR_DIRECTORY will be seen in the container on /data and vice versa.-p 9875:9875 will expose the container TCP port 9875 on the host computer port 9875. Especially on Windows (and maybe on macOS), make sure that your directory is actually visible inside the Docker shell environment. If not, check the Docker documentation on how to expose directories.

  1. You are now ready to use the system. Point your browser to http://localhost:9875 and you should get the Jupyter environment.

If this does not work on Windows, check the Docker documentation (https://docs.docker.com/) on how to expose ports.

See also

  • Docker is the most widely used containerization software and has seen enormous growth in usage in recent times. You can read more about it at https://www.docker.com/.
  • A security-minded alternative to Docker is rkt, which can be found at https://coreos.com/rkt/.
  • If you are not able to use Docker; for example, if you do not have permissions, as will be the case on most computer clusters, then take a look at Singularity at https://www.sylabs.io/singularity/.