Rapid - Apache Mahout Clustering designs


Installing Mahout


We can install Mahout using different methods. Each method is independent of the others, so you can choose any one of these:

  • Building the Mahout code using Maven

  • Setting up the development environment using Eclipse

  • Setting up Mahout for Windows users

Before performing any of the steps, make sure the following prerequisites are in place:

Building Mahout code using Maven

The Mahout build and release system is based on Maven. To install Maven:

  1. Create the folder /usr/local/maven:

    mkdir /usr/local/maven
    
  2. Download the distribution apache-maven-x.y.z-bin.tar.gz from the Maven site (http://maven.apache.org/download.cgi), and move this to /usr/local/maven:

    mv apache-maven-x.y.z-bin.tar.gz /usr/local/maven
    
  3. Change to /usr/local/maven and unpack the archive there:

    cd /usr/local/maven
    tar -xvf apache-maven-x.y.z-bin.tar.gz
    
  4. Add the following lines to the ~/.bashrc file, then reload it with source ~/.bashrc:

    export M2_HOME=/usr/local/maven/apache-maven-x.y.z
    export M2=$M2_HOME/bin
    export PATH=$M2:$PATH
    
    

    Note

    For the Eclipse IDE, go to Help and select Install New Software. Click the Add button, enter M2Eclipse as the name in the pop-up, provide the http://download.eclipse.org/technology/m2e/releases link, and click OK.
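Before building Mahout, it can help to confirm that the Maven setup took effect. The following is a minimal sketch, assuming the ~/.bashrc exports from step 4 have been loaded; the function name check_maven_env is hypothetical and not part of Maven. It checks that M2_HOME is set, that $M2_HOME/bin/mvn is executable, and that the mvn found on PATH is that same file.

```shell
# Hypothetical helper: verifies the Maven environment from step 4.
check_maven_env() {
    # M2_HOME must be set and point at the unpacked distribution.
    if [ -z "${M2_HOME:-}" ]; then
        echo "M2_HOME is not set" >&2
        return 1
    fi
    # The Maven distribution ships its launcher script as bin/mvn.
    if [ ! -x "$M2_HOME/bin/mvn" ]; then
        echo "no executable mvn under $M2_HOME/bin" >&2
        return 1
    fi
    # The mvn resolved through PATH should be the one under M2_HOME,
    # not some other copy that appears earlier in PATH.
    if [ "$(command -v mvn)" != "$M2_HOME/bin/mvn" ]; then
        echo "mvn on PATH is not $M2_HOME/bin/mvn" >&2
        return 1
    fi
    echo "Maven found at $M2_HOME"
}
```

Run check_maven_env in a new terminal; if it succeeds, mvn -version should also report the Maven version you unpacked.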

Building the Mahout code

By default, Mahout assumes that Hadoop is already installed on the system. Mahout uses the HADOOP_HOME and HADOOP_CONF_DIR environment variables to access Hadoop cluster configurations. To set up Mahout, follow the steps given here:

  1. Download the Mahout distribution file mahout-distribution-0.9-src.tar.gz from http://archive.apache.org/dist/mahout/0.9/.

  2. Choose an installation directory for Mahout (/usr/local/Mahout) and place the downloaded source in the folder. Extract the source code and ensure that the folder contains the pom.xml file:

    tar -xvf mahout-distribution-0.9-src.tar.gz
    
  3. Change to the extracted mahout-distribution-0.9 folder, then install the Mahout Maven project, skipping the test cases:

    cd mahout-distribution-0.9
    mvn install -Dmaven.test.skip=true
    
  4. Set the MAHOUT_HOME environment variable in the ~/.bashrc file and update the PATH variable with the Mahout bin directory:

    export MAHOUT_HOME=/usr/local/Mahout/mahout-distribution-0.9
    export PATH=$PATH:$MAHOUT_HOME/bin
    
  5. To test the Mahout installation, run the mahout command. It lists the programs available in the distribution bundle, as shown in the following screenshot.
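Because Mahout reads HADOOP_HOME and HADOOP_CONF_DIR, and the steps above add MAHOUT_HOME, a quick check of all three can save a confusing failure later. This is a sketch under the assumption that the variables are exported from ~/.bashrc; check_mahout_env is a hypothetical helper name. It uses printenv, which only reports exported variables, and that matches what the mahout launcher script (a child process) can actually see.

```shell
# Hypothetical helper: checks the variables the Mahout setup relies on.
check_mahout_env() {
    missing=""
    # HADOOP_HOME and HADOOP_CONF_DIR are read by Mahout; MAHOUT_HOME
    # comes from step 4. printenv prints nothing for unexported names.
    for var in HADOOP_HOME HADOOP_CONF_DIR MAHOUT_HOME; do
        if [ -z "$(printenv "$var")" ]; then
            missing="$missing $var"
        fi
    done
    if [ -n "$missing" ]; then
        echo "not exported:$missing" >&2
        return 1
    fi
    # The distribution ships its launcher script as bin/mahout.
    if [ ! -f "$MAHOUT_HOME/bin/mahout" ]; then
        echo "no bin/mahout under $MAHOUT_HOME" >&2
        return 1
    fi
    echo "Mahout environment looks complete"
}
```

If the function reports a missing variable, add the corresponding export line to ~/.bashrc and reload it before running mahout again.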

Setting up the development environment using Eclipse

For this setup, you need Maven installed on the system and the Maven plugin for Eclipse (M2Eclipse). Refer to the Maven installation steps in the previous section.

  1. Download the Mahout distribution file mahout-distribution-0.9-src.tar.gz from http://archive.apache.org/dist/mahout/0.9/ and extract it:

    tar xzf mahout-distribution-0.9-src.tar.gz
    
  2. Create a folder named workspace at /usr/local/workspace:

    mkdir /usr/local/workspace
    
  3. Move the extracted distribution from the downloads folder to this folder:

    mv mahout-distribution-0.9 /usr/local/workspace/
    
  4. Change to the /usr/local/workspace/mahout-distribution-0.9 folder and generate the Eclipse project files (this command can take up to one hour):

    cd /usr/local/workspace/mahout-distribution-0.9
    mvn eclipse:eclipse

  5. Set MAHOUT_HOME in the ~/.bashrc file as explained earlier, this time pointing it to /usr/local/workspace/mahout-distribution-0.9.

  6. Open Eclipse and select File > Import. Under Maven, select Existing Maven Projects, browse to the mahout-distribution-0.9 folder, and click Finish.
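Mahout's source tree is a multi-module Maven project, and the Existing Maven Projects wizard discovers projects by their pom.xml files. As a hypothetical pre-check before importing, the sketch below lists each directory, up to two levels deep, that contains a pom.xml. The helper name list_maven_modules and the depth limit of 2 are assumptions for illustration, not anything Eclipse or Mahout defines.

```shell
# Hypothetical helper: prints each directory under a source tree
# (the tree itself plus one level of subfolders) that has a pom.xml.
list_maven_modules() {
    root=${1:-.}
    # -maxdepth is supported by GNU and BSD find but is not strict POSIX.
    # sed strips the trailing /pom.xml so only the directory is printed.
    find "$root" -maxdepth 2 -type f -name pom.xml \
        | sed 's|/pom\.xml$||' \
        | sort
}
```

For example, list_maven_modules /usr/local/workspace/mahout-distribution-0.9 should print the top-level folder followed by its module folders, which roughly corresponds to what the import wizard offers to import.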

Setting up Mahout for Windows users

Windows users can use Cygwin to set up their environment. There is also an easier alternative:

Download the Hortonworks Sandbox for VirtualBox (http://hortonworks.com/products/hortonworks-sandbox/#install). It runs Hadoop in pseudo-distributed mode on your system. Log in to the console and enter the following command:

    yum install mahout

Now, you will see the following screen:

Enter y to start the Mahout installation. Once it is done, you can test it by running the mahout command, which shows the same screen as in the preceding figure.
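If you prefer to skip the interactive prompt, yum's -y option answers yes automatically. The sketch below wraps that in a hypothetical ensure_mahout function that installs Mahout only when no mahout command is already on PATH. It assumes you are logged in to the sandbox console as root, or otherwise have the privileges yum needs.

```shell
# Hypothetical helper: installs Mahout via yum only if it is missing.
ensure_mahout() {
    # command -v succeeds if an executable named mahout is on PATH.
    if command -v mahout >/dev/null 2>&1; then
        echo "mahout already installed at $(command -v mahout)"
        return 0
    fi
    # -y answers "y" to yum's confirmation prompt.
    yum install -y mahout
}
```

Running ensure_mahout twice is harmless: the second call only reports where the existing mahout command lives.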