Book Image

Mastering Predictive Analytics with R

By : Rui Miguel Forte, Rui Miguel Forte
Book Image

Mastering Predictive Analytics with R

By: Rui Miguel Forte, Rui Miguel Forte

Overview of this book

Table of Contents (19 chapters)
Mastering Predictive Analytics with R
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Preface
Index

Predicting promoter gene sequences


The first application we will study in detail comes from the field of biology. There, we learn that the basic building blocks of DNA molecules are actually four fundamental molecules known as nucleotides. These are called Thymine, Cytosine, Adenine, and Guanine, and it is the order in which these molecules appear in a DNA strand that encodes the genetic information carried by the DNA.

An interesting problem in molecular biology is finding promoter sequences within a larger DNA strand. These are special sequences of nucleotides that play an important role in regulating a genetic process known as gene transcription. This is the first step in the mechanism by which information in the DNA is read.

The molecular biology (promoter gene sequences) data set, hosted by the UCI Machine Learning repository at https://archive.ics.uci.edu/ml/datasets/Molecular+Biology+(Promoter+Gene+Sequences) contains a number of gene sequences from DNA belonging to the bacterium E....