Canopy clustering in Mahout runs in Hadoop's MapReduce mode. The algorithm is implemented as a series of map and reduce steps, and it takes its input in the Hadoop sequence file format. The steps are as follows:
Convert the data into a form that can be used as input. This is called data massaging.
Each mapper runs Canopy clustering on the portion of the input it receives and outputs the Canopy centers it finds.
The reducer receives these Canopy centers and clusters them in turn to produce the final Canopy centers.
Data points are assigned to these Canopies.
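The steps above can be sketched in a few lines of plain Python. This is only an in-memory illustration of the map, reduce, and assignment flow, not Mahout's actual code or API: the function names (find_canopy_centers, mapper, reducer, assign_points), the sample points, and the threshold values are all hypothetical. It assumes the usual Canopy setup with a loose distance threshold t1 and a tight threshold t2 (t1 > t2), Euclidean distance, and assignment of each point to its nearest final center.

```python
import math


def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def find_canopy_centers(points, t1, t2):
    """Single-pass canopy generation (illustrative; assumes t1 > t2).

    A point within t1 of a canopy's seed is counted as a member of it.
    A point within t2 of any seed is "strongly bound" and does not
    start a new canopy; otherwise it becomes the seed of a new one.
    """
    canopies = []  # each canopy: {"seed": point, "members": [points]}
    for p in points:
        strongly_bound = False
        for canopy in canopies:
            d = math.dist(p, canopy["seed"])
            if d < t1:
                canopy["members"].append(p)
            if d < t2:
                strongly_bound = True
        if not strongly_bound:
            canopies.append({"seed": p, "members": [p]})
    # The center reported for each canopy is the mean of its members.
    return [centroid(c["members"]) for c in canopies]


def mapper(split, t1, t2):
    """Each mapper clusters only its own split of the input."""
    return find_canopy_centers(split, t1, t2)


def reducer(centers_from_mappers, t1, t2):
    """The reducer clusters the mappers' centers into final centers."""
    return find_canopy_centers(centers_from_mappers, t1, t2)


def assign_points(points, centers):
    """Map each point to the index of its nearest final center (an assumption)."""
    return {
        p: min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        for p in points
    }


# Hypothetical input, already split as Hadoop would split it across mappers.
splits = [
    [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0)],
    [(0.2, 0.1), (10.5, 10.0)],
]
T1, T2 = 3.0, 1.5  # illustrative thresholds only

mapper_output = [c for split in splits for c in mapper(split, T1, T2)]
final_centers = reducer(mapper_output, T1, T2)
all_points = [p for split in splits for p in splits_point] if False else [
    p for split in splits for p in split
]
assignments = assign_points(all_points, final_centers)
```

In this sketch the reducer reuses the same routine as the mappers, which mirrors the description above: the mappers' centers are themselves treated as points to be clustered. A real job would read the points from sequence files and run the mappers in parallel rather than in a list comprehension.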
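The whole process can be understood as two phases: the Canopy generation phase, in which the mappers and the reducer produce the final Canopy centers, and the Canopy clustering phase, in which the data points are assigned to those Canopies. The process is described at https://mahout.apache.org/users/clustering/canopy-clustering.html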