Bioinformatics with Python Cookbook

By : Tiago R Antao, Tiago Antao

Performing Principal Components Analysis


Principal Components Analysis (PCA) is a statistical procedure that reduces a large number of variables to a smaller set of linearly uncorrelated variables, the principal components. Its practical application in population genetics is to help visualize the relationships among the individuals being studied.
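To make the idea of "linearly uncorrelated" concrete, here is a minimal sketch of PCA computed by hand with NumPy. It is not the recipe's method, only an illustration: the genotype matrix is made-up toy data, and all variable names are our own.

```python
import numpy as np

# Toy genotype matrix (made-up data, for illustration only):
# rows are individuals, columns are markers,
# values are counts of one allele (0, 1 or 2).
genotypes = np.array([
    [0, 0, 1, 2, 2],
    [0, 1, 1, 2, 2],
    [1, 0, 0, 2, 1],
    [2, 2, 1, 0, 0],
    [2, 1, 2, 0, 1],
    [1, 2, 2, 0, 0],
], dtype=float)

# Center each marker so that every column has mean zero.
centered = genotypes - genotypes.mean(axis=0)

# Singular value decomposition of the centered matrix;
# the rows of vt are the principal axes.
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# Keep the first two components: each individual becomes a 2D point.
n_components = 2
scores = centered @ vt[:n_components].T

# Fraction of the total variance captured by each component.
explained = (s ** 2) / np.sum(s ** 2)

# The component scores are linearly uncorrelated:
# the off-diagonal covariance is zero up to rounding error.
covariance = np.cov(scores, rowvar=False)

print(scores.shape)
print(np.round(explained[:n_components], 3))
print(np.isclose(covariance[0, 1], 0.0))
```

Five markers per individual are reduced to two coordinates, which is what allows individuals to be drawn on a simple scatter plot.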

Most of the recipes in this chapter use Python as a "glue language" (Python calls external applications that actually do most of the work). With PCA, we have a choice: we can either use an external application (for example, EIGENSOFT smartpca) or use scikit-learn and perform everything in Python. We will do both.
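As a rough preview of the scikit-learn route, the following sketch runs `sklearn.decomposition.PCA` on simulated genotypes. The two populations, their allele frequencies, and the sample sizes are all hypothetical values chosen for illustration; the recipe itself works with the HapMap PLINK data, not with this simulation.

```python
import numpy as np
from sklearn.decomposition import PCA

# Simulated data (an assumption for this sketch, not the HapMap data):
# two hypothetical populations with different allele frequencies.
rng = np.random.default_rng(42)
n_per_pop, n_markers = 20, 50
freqs_a = rng.uniform(0.1, 0.9, n_markers)
freqs_b = rng.uniform(0.1, 0.9, n_markers)

# Each genotype is the number of copies (0, 1 or 2) of one allele.
pop_a = rng.binomial(2, freqs_a, size=(n_per_pop, n_markers))
pop_b = rng.binomial(2, freqs_b, size=(n_per_pop, n_markers))
genotypes = np.vstack([pop_a, pop_b])

# Reduce the 50 markers to 2 principal components.
pca = PCA(n_components=2)
coords = pca.fit_transform(genotypes)

print(coords.shape)
print(pca.explained_variance_ratio_)
```

Each row of `coords` is one individual, so plotting the first column against the second (for example, with matplotlib) gives the usual PCA scatter plot.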

Getting ready

You will need to run the first recipe in order to use the hapmap10_auto_noofs_ld_12 PLINK file (with alleles recoded as 1 and 2). PCA requires LD-pruned markers, and we will not risk including the offspring here because they would probably bias the result. We will use the recoded PLINK file with alleles as...