Book Image

Bioinformatics with Python Cookbook

By : Tiago R Antao, Tiago Antao
Book Image

Bioinformatics with Python Cookbook

By: Tiago R Antao, Tiago Antao

Overview of this book

Table of Contents (16 chapters)
Bioinformatics with Python Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Preparing the Ebola dataset


Here, we will download and prepare the dataset to be used for our analysis. The dataset contains complete genomes of the Ebola virus. We will use DendroPy to download and prepare the data.

Getting ready

We will download complete genomes from Genbank; these genomes were collected from various Ebola outbreaks, including several from the 2014 outbreak. Note that there are several virus species that cause the Ebola virus disease; the species involved in the 2014 outbreak (the EBOV virus, formally known as the Zaire Ebola virus) is the most common, but this disease is caused by more species of the genus Ebola virus; four others are also available in sequenced form. You can read more at https://en.wikipedia.org/wiki/Ebolavirus. For scientific references of all downloaded entries, check the GenBank records.

If you have dealt with previous chapters, you may panic looking at the potential data sizes involved here; this is not a problem at all because these are genomes of...