Sign In Start Free Trial
Account

Add to playlist

Create a Playlist

Modal Close icon
You need to login to use this feature.
  • Book Overview & Buying Rapid - Apache Mahout Clustering designs
  • Table Of Contents Toc
Rapid - Apache Mahout Clustering designs

Rapid - Apache Mahout Clustering designs

By : Ashish Gupta
5 (1)
close
close
Rapid - Apache Mahout Clustering designs

Rapid - Apache Mahout Clustering designs

5 (1)
By: Ashish Gupta

Overview of this book

As more and more organizations are discovering the use of big data analytics, interest in platforms that provide storage, computation, and analytic capabilities has increased. Apache Mahout caters to this need and paves the way for the implementation of complex algorithms in the field of machine learning to better analyse your data and get useful insights into it. Starting with the introduction of clustering algorithms, this book provides an insight into Apache Mahout and different algorithms it uses for clustering data. It provides a general introduction of the algorithms, such as K-Means, Fuzzy K-Means, StreamingKMeans, and how to use Mahout to cluster your data using a particular algorithm. You will study the different types of clustering and learn how to use Apache Mahout with real world data sets to implement and evaluate your clusters. This book will discuss about cluster improvement and visualization using Mahout APIs and also explore model-based clustering and topic modelling using Dirichlet process. Finally, you will learn how to build and deploy a model for production use.
Table of Contents (11 chapters)
close
close
10
Index

Launching the Mahout job on the cluster


Mahout has a script under the bin folder of the installation. Notice line 120 onwards of the following script:

# CLASSPATH initially contains $MAHOUT_CONF_DIR, or defaults to $MAHOUT_HOME/src/conf
CLASSPATH=${CLASSPATH}:$MAHOUT_CONF_DIR

if [ "$MAHOUT_LOCAL" != "" ]; then
echo "MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath."
elif [ -n "$HADOOP_CONF_DIR"  ] ; then
echo "MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath."
  CLASSPATH=${CLASSPATH}:$HADOOP_CONF_DIR
fi

We can set HADOOP_HOME and HADOOP_CONF_DIR to launch the Mahout job (algorithm) on the Hadoop cluster.

Just before the algorithm command, set the two previously mentioned parameters using the export command:

export  HADOOP_HOME=<ur hadoop location>
export HADOOP_CONF_DIR=$HADOOP_HOME/conf

The Mahout launcher script helps to launch the job locally or on a cluster.

CONTINUE READING
83
Tech Concepts
36
Programming languages
73
Tech Tools
Icon Unlimited access to the largest independent learning library in tech of over 8,000 expert-authored tech books and videos.
Icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Icon 50+ new titles added per month and exclusive early access to books as they are being written.
Rapid - Apache Mahout Clustering designs
notes
bookmark Notes and Bookmarks search Search in title playlist Add to playlist font-size Font size

Change the font size

margin-width Margin width

Change margin width

day-mode Day/Sepia/Night Modes

Change background colour

Close icon Search
Country selected

Close icon Your notes and bookmarks

Confirmation

Modal Close icon
claim successful

Buy this book with your credits?

Modal Close icon
Are you sure you want to buy this book with one of your credits?
Close
YES, BUY

Submit Your Feedback

Modal Close icon
Modal Close icon
Modal Close icon