We can install Mahout using different methods. Each method is independent of the others, so you can choose any one of these:
Building the Mahout code using Maven
Setting up the development environment using Eclipse
Setting up Mahout for Windows users
Before performing any of these steps, make sure that the following prerequisites are met:
Having Java installed on your system
Having Hadoop installed on your system (http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleNodeSetup.html)
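If you want to confirm these prerequisites from a script, the Python sketch below checks that the java and hadoop commands resolve on PATH and that the HADOOP_HOME and HADOOP_CONF_DIR variables, which Mahout uses to reach the Hadoop configuration, are set. The function name missing_prerequisites is hypothetical, and treating a hadoop launcher on PATH as evidence of a Hadoop installation is an assumption about a typical setup.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def missing_prerequisites(
    env: Mapping[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of problems; an empty list means the basics look fine."""
    problems = []
    # Assumption: a working install exposes both launchers on PATH.
    for tool in ("java", "hadoop"):
        if which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    # Mahout reads these variables to locate the Hadoop cluster configuration.
    for var in ("HADOOP_HOME", "HADOOP_CONF_DIR"):
        if not env.get(var):
            problems.append(f"{var} is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites(os.environ):
        print("missing:", problem)
```

Passing the environment and the lookup function as parameters keeps the check easy to try against a fake setup before touching the real system.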
The Mahout build and release system is based on Maven. To install Maven:
Create the folder /usr/local/maven:
mkdir /usr/local/maven
Download the distribution apache-maven-x.y.z-bin.tar.gz from the Maven site (http://maven.apache.org/download.cgi), and move it to /usr/local/maven:
mv apache-maven-x.y.z-bin.tar.gz /usr/local/maven
Unpack it in /usr/local/maven:
cd /usr/local/maven
tar -xvf apache-maven-x.y.z-bin.tar.gz
Edit the ~/.bashrc file as follows, then reload it with source ~/.bashrc:
export M2_HOME=/usr/local/maven/apache-maven-x.y.z
export M2=$M2_HOME/bin
export PATH=$M2:$PATH
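These exports only work if M2 points at the bin folder inside M2_HOME and that folder ends up on PATH. As a minimal sketch, assuming a POSIX-style PATH, the hypothetical check_maven_env function below inspects an environment mapping for exactly those two conditions; after reloading ~/.bashrc (or opening a new terminal) it should report no problems.

```python
import os
import shutil
from typing import List, Mapping


def check_maven_env(env: Mapping[str, str]) -> List[str]:
    """Report problems with the M2_HOME / M2 / PATH exports."""
    m2_home = env.get("M2_HOME", "")
    if not m2_home:
        return ["M2_HOME is not set"]
    problems = []
    expected_bin = os.path.normpath(os.path.join(m2_home, "bin"))
    # M2 should be the bin directory inside M2_HOME.
    if os.path.normpath(env.get("M2", "")) != expected_bin:
        problems.append(f"M2 should be {expected_bin}")
    # That bin directory must be a PATH entry so that `mvn` resolves.
    entries = [os.path.normpath(p) for p in env.get("PATH", "").split(os.pathsep) if p]
    if expected_bin not in entries:
        problems.append(f"{expected_bin} is not on PATH")
    return problems


if __name__ == "__main__":
    issues = check_maven_env(os.environ)
    print(issues or "Maven environment looks consistent")
    print("mvn resolves to:", shutil.which("mvn"))
```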
Note
For the Eclipse IDE, go to Help and select Install New Software. Click Add, and in the pop-up enter the name M2Eclipse and the location http://download.eclipse.org/technology/m2e/releases, then click OK.
Building the Mahout code:
By default, Mahout assumes that Hadoop is already installed on the system. Mahout uses the HADOOP_HOME and HADOOP_CONF_DIR environment variables to access the Hadoop cluster configuration. To set up Mahout, follow these steps:
Download the Mahout distribution file mahout-distribution-0.9-src.tar.gz from http://archive.apache.org/dist/mahout/0.9/. Choose an installation directory for Mahout (/usr/local/mahout) and place the downloaded archive in that folder. Extract the source code there and make sure that the extracted folder contains the pom.xml file:
cd /usr/local/mahout
tar -xvf mahout-distribution-0.9-src.tar.gz
From the extracted folder (the one that contains pom.xml), install the Mahout Maven project, skipping the test cases during installation:
mvn install -Dmaven.test.skip=true
Set the MAHOUT_HOME environment variable in the ~/.bashrc file and add the Mahout bin directory to the PATH variable:
export MAHOUT_HOME=/usr/local/mahout/mahout-distribution-0.9
export PATH=$PATH:$MAHOUT_HOME/bin
To test the Mahout installation, open a new terminal (or run source ~/.bashrc) and execute the mahout command. It lists the programs available in the distribution bundle, as shown in the following screenshot.
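The extract-and-build steps above can also be scripted. The following Python sketch is one way to do it, assuming the archive is a compressed tarball whose top-level folder contains pom.xml and that mvn is already on PATH; extract_source and maven_install are hypothetical names. It unpacks the archive with the standard tarfile module, locates the folder holding pom.xml, and runs mvn install -Dmaven.test.skip=true from there.

```python
import subprocess
import tarfile
from pathlib import Path
from typing import List


def extract_source(archive: Path, install_dir: Path) -> Path:
    """Unpack the source tarball into install_dir and return the folder holding pom.xml."""
    install_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters can reject unsafe members.
            tar.extractall(install_dir, filter="data")
        else:
            tar.extractall(install_dir)
    # The Maven build has to run in the directory that contains pom.xml.
    candidates = [install_dir / "pom.xml", *sorted(install_dir.glob("*/pom.xml"))]
    for pom in candidates:
        if pom.is_file():
            return pom.parent
    raise FileNotFoundError(f"no pom.xml found under {install_dir}")


def maven_install(project_dir: Path, skip_tests: bool = True, dry_run: bool = False) -> List[str]:
    """Run `mvn install` in project_dir, or only return the command when dry_run is set."""
    cmd = ["mvn", "install"]
    if skip_tests:
        # Same flag as in the steps above: the test cases are skipped.
        cmd.append("-Dmaven.test.skip=true")
    if not dry_run:
        # check=True raises CalledProcessError if the build fails.
        subprocess.run(cmd, cwd=project_dir, check=True)
    return cmd


if __name__ == "__main__":
    # Assumed locations; adjust them to where you downloaded and want to install Mahout.
    source_archive = Path("mahout-distribution-0.9-src.tar.gz")
    if source_archive.exists():
        maven_install(extract_source(source_archive, Path("/usr/local/mahout")))
    else:
        print(f"{source_archive} not found; download it first")
```

The dry_run flag is there so the command can be inspected without starting a build that takes a while.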
To set up the development environment using Eclipse, you need Maven and the Maven plugin for Eclipse (M2Eclipse) installed on the system. Refer to the Maven installation steps described earlier.
Download the Mahout distribution file mahout-distribution-0.9-src.tar.gz from http://archive.apache.org/dist/mahout/0.9/ and unpack it:
tar xzf mahout-distribution-0.9-src.tar.gz
Create a folder named workspace under /usr/local:
mkdir /usr/local/workspace
Move the extracted distribution from your downloads folder to this workspace folder:
mv mahout-distribution-0.9 /usr/local/workspace/
Change to the /usr/local/workspace/mahout-distribution-0.9 folder and generate the Eclipse project files (this command can take up to an hour):
cd /usr/local/workspace/mahout-distribution-0.9
mvn eclipse:eclipse
Set MAHOUT_HOME in the ~/.bashrc file, as explained earlier, this time pointing it to /usr/local/workspace/mahout-distribution-0.9.
Now open Eclipse, select File and click Import. Under Maven, select Existing Maven Projects, browse to the mahout-distribution-0.9 folder, and click Finish.
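The workspace preparation in these Eclipse steps can be sketched in Python as well. This is a minimal illustration, not part of the Mahout distribution: prepare_eclipse_project is a hypothetical helper, the ~/Downloads location is an assumption, and the run_maven flag exists only so the file moves can be tried without waiting for mvn eclipse:eclipse. The import itself still has to be done in the Eclipse IDE.

```python
import shutil
import subprocess
from pathlib import Path


def prepare_eclipse_project(extracted: Path, workspace: Path, run_maven: bool = True) -> Path:
    """Move the extracted sources into the workspace and generate Eclipse project files."""
    workspace.mkdir(parents=True, exist_ok=True)
    target = workspace / extracted.name
    if not target.exists():
        # Same effect as: mv mahout-distribution-0.9 /usr/local/workspace/
        shutil.move(str(extracted), str(target))
    if run_maven:
        # Same effect as running `mvn eclipse:eclipse` inside the project folder; it can be slow.
        subprocess.run(["mvn", "eclipse:eclipse"], cwd=target, check=True)
    return target


if __name__ == "__main__":
    # Assumed download location; adjust it to where you unpacked the archive.
    source = Path.home() / "Downloads" / "mahout-distribution-0.9"
    if source.exists():
        print(prepare_eclipse_project(source, Path("/usr/local/workspace")))
```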
Windows users can use Cygwin to set up their environment. There is also an easier way:
Download the Hortonworks Sandbox for VirtualBox (http://hortonworks.com/products/hortonworks-sandbox/#install). It runs Hadoop in pseudo-distributed mode on your system. Log in to the sandbox console and enter the following command:
yum install mahout
Now, you will see the following screen:
Enter y, and Mahout will start installing. Once it is done, you can test the installation by typing the mahout command; it shows the same program list as the earlier screenshot.
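Whichever installation method you use, the final check is the same: the mahout launcher has to be reachable on PATH (for the Maven build, through $MAHOUT_HOME/bin). A small Python sketch of that check follows; find_mahout is a hypothetical name, and because the exact format of the program listing is not described here, the script only prints whatever the launcher writes instead of parsing it.

```python
import shutil
import subprocess
from typing import Optional


def find_mahout(search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the `mahout` launcher, or None if it cannot be found."""
    # search_path=None means "use the current PATH", like typing `mahout` in a shell.
    return shutil.which("mahout", path=search_path)


if __name__ == "__main__":
    launcher = find_mahout()
    if launcher is None:
        print("mahout not found: check that $MAHOUT_HOME/bin is on PATH")
    else:
        # Run without arguments; the launcher should then list the available programs.
        result = subprocess.run([launcher], capture_output=True, text=True)
        print(result.stdout + result.stderr)
```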