Bioinformatics with Python Cookbook

By : Tiago R Antao, Tiago Antao

Comparing sequences


Here, we will compare aligned sequences, performing both gene-level and genome-wide comparisons.
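At its simplest, comparing two aligned sequences means walking them column by column and counting the positions where they differ. The following minimal sketch does not use DendroPy and is not the recipe's own code. It computes the proportion of differing sites (the p-distance) and skips any column where either sequence has a gap or an ambiguous base. The helper name `p_distance` and the toy sequences are hypothetical.

```python
def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has something other than an
    unambiguous nucleotide (A, C, G, T), such as a gap '-' or an 'N',
    are skipped (a simplifying assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError('sequences must be aligned (same length)')
    valid = set('ACGT')
    compared = 0
    differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in valid or b not in valid:
            continue  # ignore gaps and ambiguous bases
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError('no comparable sites')
    # float() keeps true division under Python 2 as well as Python 3
    return float(differences) / compared


print(p_distance('ACGT-ACGTA', 'ACGTTACCTA'))
```

Here the gap column is skipped, so nine columns are compared, one of them differs, and the function returns 1/9. Dropping gapped columns is only one possible convention; other tools may treat gaps differently.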

Getting ready

We will use DendroPy and the results of the previous two recipes. As usual, the code is available in the corresponding notebook at 05_Phylo/Comparison.ipynb.

How to do it...

Take a look at the following steps:

  1. Let's start by analyzing the gene data. For simplicity, we will only use data from two other species of the genus Ebolavirus that are available in the extended dataset: the Reston virus (RESTV) and the Sudan virus (SUDV):

    from __future__ import print_function
    import os
    from collections import OrderedDict
    import dendropy
    from dendropy import popgenstat
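    # Per-gene data, kept in insertion order
    genes_species = OrderedDict()
    my_species = ['RESTV', 'SUDV']
    my_genes = ['NP', 'L', 'VP35', 'VP40']
    
    for name in my_genes:
        # Drop anything after a dot (a no-op for these gene names)
        gene_name = name.split('.')[0]
        # Load this gene's FASTA alignment into a DNA character matrix
        char_mat = dendropy.DnaCharacterMatrix.get_from_path(
            '%s_align.fasta' % name, 'fasta')
        genes_species[gene_name] = {}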
        
        for...
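The code above imports DendroPy's `popgenstat` module, which provides population-genetics summary statistics such as the number of segregating sites and the average number of pairwise differences. To illustrate what those two statistics measure, here is a minimal, library-free sketch over a plain list of aligned sequences. The function names and toy sequences are hypothetical. Unlike DendroPy, which works on a character matrix and may treat gaps and ambiguous states differently, this sketch simply drops every column that contains a gap or ambiguous base, so its numbers need not match popgenstat's.

```python
from itertools import combinations

VALID = set('ACGT')


def clean_columns(seqs):
    """Yield alignment columns that contain only unambiguous bases.

    Columns with a gap ('-'), an 'N' or any other symbol in any
    sequence are skipped (a simplifying assumption of this sketch).
    """
    lengths = set(len(s) for s in seqs)
    if len(lengths) != 1:
        raise ValueError('sequences must be aligned (same length)')
    for column in zip(*(s.upper() for s in seqs)):
        if all(base in VALID for base in column):
            yield column


def segregating_sites(seqs):
    """Number of clean columns where at least two bases differ."""
    return sum(1 for column in clean_columns(seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Average number of differing clean sites over all sequence pairs."""
    columns = list(clean_columns(seqs))
    pairs = list(combinations(range(len(seqs)), 2))
    if not pairs:
        raise ValueError('need at least two sequences')
    total = 0
    for i, j in pairs:
        total += sum(1 for column in columns if column[i] != column[j])
    # float() keeps true division under Python 2 as well as Python 3
    return float(total) / len(pairs)


toy = ['ACGTACGT', 'ACGAACGT', 'ACGAACTT']
print(segregating_sites(toy), mean_pairwise_differences(toy))
```

For the toy alignment, columns 4 and 7 are variable, so there are two segregating sites. The three sequence pairs differ at 1, 2 and 1 sites, giving an average of 4/3 pairwise differences.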