Big Data Analytics with R and Hadoop

By: Vignesh Prajapati

Overview of this book

Big data analytics is the process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations, and other useful information. Such information can provide competitive advantages over rival organizations and result in business benefits, such as more effective marketing and increased revenue. New methods of working with big data, such as Hadoop and MapReduce, offer alternatives to traditional data warehousing.

Big Data Analytics with R and Hadoop is focused on the techniques of integrating R and Hadoop by various tools such as RHIPE and RHadoop. A powerful data analytics engine can be built, which can process analytics algorithms over a large-scale dataset in a scalable manner. This can be implemented through data analytics operations of R, MapReduce, and HDFS of Hadoop.

You will start with the installation and configuration of R and Hadoop. Next, you will discover information on various practical data analytics examples with R and Hadoop. Finally, you will learn how to import/export from various data sources to R. Big Data Analytics with R and Hadoop will also give you an easy understanding of the R and Hadoop connectors RHIPE, RHadoop, and Hadoop streaming.
Table of Contents (16 chapters)
Big Data Analytics with R and Hadoop
Credits
About the Author
Acknowledgment
About the Reviewers
www.PacktPub.com
Preface
Index

Unsupervised machine learning algorithm


In machine learning, unsupervised learning is used to find hidden structure in an unlabeled dataset. Because the data are not labeled, there is no error signal with which to evaluate a potential solution.

Unsupervised machine learning includes several algorithms, some of which are as follows:

  • Clustering

  • Artificial neural networks

  • Vector quantization

We will consider popular clustering algorithms here.

Clustering

Clustering is the task of grouping a set of objects so that objects with similar characteristics fall into the same group, while dissimilar objects fall into other groups. In clustering, the input dataset is not labeled; the groups have to be derived from the similarity of the data points themselves.
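The idea of grouping unlabeled points by similarity can be sketched in a few lines. The example below is a minimal illustration only, written in Python rather than R for brevity. It uses k-means as one possible clustering algorithm (an assumption made here, not something the text above prescribes), Euclidean distance as the similarity measure, and a naive initialization that takes the first k points as starting centroids. The function names and the sample points are hypothetical.

```python
import math


def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            # An empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        dims = len(members[0])
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(dims))
        )
    return new_centroids


def kmeans(points, k, max_iter=100):
    """Cluster points into k groups; returns (labels, centroids)."""
    # Assumption: naive initialization with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = assign_clusters(points, centroids)
    for _ in range(max_iter):
        centroids = update_centroids(points, labels, centroids)
        new_labels = assign_clusters(points, centroids)
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
    return labels, centroids


# Hypothetical unlabeled data: two visually separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

No labels are supplied as input; the group index for each point is produced purely from distances between points. Real implementations usually choose starting centroids more carefully, since the naive initialization used here can give poor groupings on less cleanly separated data.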

In supervised machine learning, classification performs the same kind of mapping from data to categories, but with the help of a provided set of training data. The corresponding procedure is known as...