Mastering Predictive Analytics with R

By: Rui Miguel Forte

Predicting chemical biodegradation


In this section, we will use R's e1071 package to try out the models we've discussed on a real-world data set. As our first example, we have chosen the QSAR biodegradation data set, which can be found at https://archive.ics.uci.edu/ml/datasets/QSAR+biodegradation#. This data set contains 41 numerical variables that describe the molecular composition and properties of 1055 chemicals. The modeling task is to predict whether a particular chemical will be biodegradable based on these properties. Example properties are the percentages of carbon, nitrogen, and oxygen atoms, as well as the number of heavy atoms in the molecule. Because the features are highly specialized and numerous, we won't list them all here; the complete list and further details of the quantities involved can be found on the website. For now, we've downloaded the data into a bdf data frame:

> bdf <- read.table("biodeg.csv", sep = ";", quote = "\"")
> head...
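
Before modeling, it can help to confirm that the file was parsed as intended. The following is a minimal sketch, not part of the original text: check_shape is a hypothetical helper, and the toy file it reads is a small synthetic stand-in rather than the real data. It assumes the file has no header row and that a single class label column follows the feature columns.

```r
# Hypothetical helper (not from the book): sanity-check a freshly loaded
# data frame. Assumes the features come first and are followed by exactly
# one class label column.
check_shape <- function(df, n_rows, n_features) {
  stopifnot(is.data.frame(df))
  # Look only at the columns expected to hold features
  feature_cols <- df[seq_len(min(n_features, ncol(df)))]
  list(
    rows_ok     = nrow(df) == n_rows,
    cols_ok     = ncol(df) == n_features + 1,
    all_numeric = all(vapply(feature_cols, is.numeric, logical(1))),
    complete    = !any(is.na(df))
  )
}

# Synthetic stand-in: three rows, three numeric features, one label column,
# written with the same ";" separator used when reading the real file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("3.9;2.6;0;RB", "4.1;2.0;1;NRB", "3.9;1.7;0;RB"), tmp)

# With no header row, read.table names the columns V1, V2, ...
toy <- read.table(tmp, sep = ";", quote = "\"")
result <- check_shape(toy, n_rows = 3, n_features = 3)
print(result)
```

On the real data, the equivalent call would be check_shape(bdf, n_rows = 1055, n_features = 41), using the counts quoted above. If any element of the result is FALSE, the separator, quoting, or header settings passed to read.table are the first things to revisit.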