Machine Learning with R

By: Brett Lantz

Overview of this book

Machine learning, at its core, is concerned with transforming data into actionable knowledge. This fact makes machine learning well-suited to the present-day era of "big data" and "data science". Given the growing prominence of R, a cross-platform, zero-cost statistical programming environment, there has never been a better time to start applying machine learning. Whether you are new to data science or a veteran, machine learning with R offers a powerful set of methods for quickly and easily gaining insight from your data.

"Machine Learning with R" is a practical tutorial that uses hands-on examples to step through real-world applications of machine learning. Without shying away from the technical details, we will explore machine learning with R using clear and practical examples. The book is well-suited to machine learning beginners as well as those with experience.

How can we use machine learning to transform data into action? Using practical examples, we will explore how to prepare data for analysis, choose a machine learning method, and measure the success of the process. We will learn how to apply machine learning methods to a variety of common tasks including classification, prediction, forecasting, market basket analysis, and clustering. By applying the most effective machine learning methods to real-world problems, you will gain hands-on experience that will transform the way you think about data. "Machine Learning with R" will provide you with the analytical tools you need to quickly gain insight from complex data.

Using R for machine learning


Many of the algorithms needed for machine learning in R are not included as part of the base installation. Instead, they are available as add-ons to base R, contributed by a large community of experts. Thanks to R being free open source software, there is no additional charge for this functionality. A collection of R functions that can be shared among users is called a package. Free packages exist for each of the machine learning algorithms covered in this book. In fact, this book covers only a small portion of the more popular machine learning packages.

If you are interested in the breadth of R packages (4,209 packages were available at the time of writing), you can view a list at the Comprehensive R Archive Network (CRAN), a collection of web and FTP sites located around the world that provides the most up-to-date versions of R software and R packages for download. If you obtained the R software via download, it was most likely from CRAN. The CRAN website is available at:

http://cran.r-project.org/index.html

Tip

If you do not already have R, the CRAN website also provides installation instructions and information on where to find help if you have trouble.

The Packages link on the left side of the page will take you to a page where you can browse the packages in alphabetical order or sorted by publication date. Perhaps even better, the CRAN Task Views provide organized lists of packages by subject area. The task view for machine learning, which lists the packages covered in this book (and many more), is available at:

http://cran.r-project.org/web/views/MachineLearning.html

Installing and loading R packages

Despite the vast set of available R add-ons, the package format makes installation and use a virtually effortless process. To demonstrate the use of packages, we will install and load the RWeka package, which was developed by Kurt Hornik, Christian Buchta, and Achim Zeileis (see Open-Source Machine Learning: R Meets Weka in Computational Statistics 24: 225-232 for more information). The RWeka package provides a collection of functions that give R access to the machine learning algorithms in the Java-based Weka software package by Ian H. Witten and Eibe Frank. For more information on Weka, see:

http://www.cs.waikato.ac.nz/~ml/weka/

Tip

To use the RWeka package, you will need to have Java installed (many computers come with Java preinstalled). Java is a set of programming tools, available for free, that allows cross-platform applications such as Weka to be used. For more information and to download Java for your system, visit: http://java.com.
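
One quick way to check for Java from within R is to look for the java executable on the system search path. The following is a minimal sketch using base R's Sys.which(); it assumes that Java, if installed, is on the PATH, and finding it there does not by itself guarantee that RWeka will be able to use it.

```r
# Look for a "java" executable on the system search path.
# Sys.which() returns an empty string when nothing is found.
java_path <- Sys.which("java")

if (nzchar(java_path)) {
  message("Java found at: ", java_path)
} else {
  message("Java was not found on the PATH; see http://java.com to install it.")
}
```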

Installing an R package

The most direct way to install a package is via the install.packages() function. To install the RWeka package, at the R command prompt simply type:

> install.packages("RWeka")

R will then connect to CRAN and download the package in the correct format for your operating system. Some packages such as RWeka require additional packages to be installed before they can be used (these are called dependencies). By default, the installer will automatically download and install any dependencies.
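
To get a feel for what a dependency listing looks like without connecting to CRAN, the sketch below inspects a package that ships with R. It uses tools::package_dependencies() on the stats package, which is chosen here only for illustration because it is always installed. Passing db = available.packages() instead should look up packages on CRAN, such as RWeka, but that requires a network connection.

```r
# Build the dependency database from the locally installed packages
# (no network access needed).
local_db <- installed.packages()

# List the packages that "stats" declares under Depends, Imports, and
# LinkingTo (the default set of dependency types).
deps <- tools::package_dependencies("stats", db = local_db)
print(deps)
```

The result is a named list with one entry per queried package, each holding a character vector of package names.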

Tip

The first time you install a package, R may ask you to choose a CRAN mirror. If this happens, choose the mirror residing at a location close to you. This will generally provide the fastest download speed.
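
If you would rather not be prompted, the mirror can also be recorded ahead of time through the repos option. This is a small sketch, not the only way to do it; the mirror URL shown is an assumed example and can be replaced with any CRAN mirror near you.

```r
# Record a default CRAN mirror for this session so install.packages()
# does not need to prompt. The URL below is an assumed example mirror.
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Confirm the setting.
current_mirror <- getOption("repos")[["CRAN"]]
print(current_mirror)
```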

The default installation options are appropriate for most systems. However, in some cases, you may want to install a package to another location. For example, if you do not have root or administrator privileges on your system, you may need to specify an alternative installation path. This can be accomplished using the lib option, as follows:

> install.packages("RWeka", lib="/path/to/library")
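
A package installed to a non-default location also has to be found again when it is loaded. The sketch below shows one way to set up such a location; the directory name is hypothetical, and a temporary directory is used so that the example can be run safely. The install and load calls are left as comments because they require a network connection and Java.

```r
# Hypothetical personal library location; a temporary directory is used
# here so the example is safe to run anywhere.
lib_dir <- file.path(tempdir(), "r-library")
dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)

# Put the new library at the front of R's library search path.
.libPaths(c(lib_dir, .libPaths()))
print(.libPaths())

# With the library in place, installation and loading would look like:
# install.packages("RWeka", lib = lib_dir)
# library(RWeka, lib.loc = lib_dir)
```

Because .libPaths() only affects the current session, a permanent setup would need the same call in a startup file or an equivalent environment setting.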

The installation function also provides additional options for installing from a local file, installing from source, or using experimental versions. You can read about these options in the help file by using the following command:

> ?install.packages
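
For a quick overview before reading the help file, the argument names of the function can be listed directly. This is a small sketch using base R's formals(); the commented call at the end illustrates installing from a local source file, and its path is hypothetical.

```r
# List the names of the arguments that install.packages() accepts.
arg_names <- names(formals(utils::install.packages))
print(arg_names)

# Installing from a local source archive (hypothetical path) would look like:
# install.packages("/path/to/package_file.tar.gz", repos = NULL, type = "source")
```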

Installing a package using the point-and-click interface

As an alternative to typing the install.packages() command, R provides a graphical user interface (GUI) for package installation. On a Microsoft Windows system, this can be accessed from the Install package(s) command item under the Packages menu, as shown in the following screenshot. On Mac OS X, the command is labeled Package Installer and is located under the Packages & Data menu.

On Windows, after launching the package installer (and choosing a CRAN mirror location if you haven't already), a large list of packages will appear. Simply scroll to the RWeka package and click on the OK button to install the package and all dependencies to the default location.

On Mac OS X, the package installer menu provides additional options. To load the list of packages, click on the Get List button. Scroll to the RWeka package (or use the Package Search feature) and click on Install Selected. Note that by default, the Mac OS X Package Installer does not install dependencies unless the Install Dependencies checkbox is selected, as shown in the following screenshot:

Loading an R package

In order to conserve memory, R does not load every installed package by default. Instead, packages are loaded by users as they are needed using the library() function.

Tip

The name of this function leads some people to incorrectly use the terms library and package interchangeably. However, to be precise, a library refers to the location where packages are installed and never to a package itself.
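
The distinction can be seen directly in R. In this brief sketch, .libPaths() reports the libraries (directories), while installed.packages() reports the packages found inside them; the variable names are illustrative only.

```r
# Libraries: the directories R searches for installed packages.
libs <- .libPaths()
print(libs)

# Packages: the add-ons installed inside those libraries.
pkgs <- installed.packages()

# Count how many packages live in each library.
print(table(pkgs[, "LibPath"]))
```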

To load the RWeka package we installed previously, you would type the following:

> library(RWeka)

Aside from RWeka, there are several other R packages that will be used in later chapters. Installation instructions will be provided as additional packages are used.
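
When a script relies on a package that may not be installed yet, it can be helpful to check before loading. The following is a minimal sketch; load_if_available() is a hypothetical helper written for this illustration and is not part of R or RWeka.

```r
# Hypothetical helper: load a package only if it is installed.
# Returns TRUE when the package was loaded and FALSE otherwise.
load_if_available <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    library(pkg, character.only = TRUE)
    TRUE
  } else {
    message("Package '", pkg, "' is not installed; try install.packages(\"", pkg, "\")")
    FALSE
  }
}

# "stats" ships with R, so this call succeeds on any installation.
load_if_available("stats")

# The same call works for RWeka once it has been installed:
# load_if_available("RWeka")
```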