K-means is a clustering algorithm. A clustering algorithm takes data points defined in an N-dimensional space and groups them into clusters based on the distances between those data points. A cluster is a set of data points such that the distance between data points inside the cluster is much less than the distance from data points within the cluster to data points outside it. More details about K-means clustering can be found in lecture 4 (http://www.youtube.com/watch?v=1ZDybXl212Q) of the Cluster Computing and MapReduce lecture series by Google.
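To make the idea of grouping points by distance concrete, the following is a minimal single-machine sketch of the standard K-means iteration (Lloyd's algorithm) in plain Python. It is not the implementation used in this recipe. The function name `kmeans`, the `max_iters` parameter, and the sample points are hypothetical, and seeding the centroids with the first `k` points is a simplification chosen to keep the example deterministic.

```python
import math


def kmeans(points, k, max_iters=100):
    """Group points into k clusters using Euclidean distance.

    Returns (centroids, assignments), where assignments[i] is the index
    of the centroid closest to points[i].
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Assumption: seed the centroids with the first k points.
    # Real implementations usually pick random or k-means++ seeds.
    centroids = [tuple(p) for p in points[:k]]
    assignments = None

    for _ in range(max_iters):
        # Assignment step: attach every point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )

    return centroids, assignments


# Hypothetical two-dimensional data with two visually separate groups.
sample = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.2, 0.8), (8.5, 9.0)]
centroids, assignments = kmeans(sample, k=2)
print(assignments)  # [0, 0, 1, 1, 0, 1]
```

Each pass has an assignment step and an update step that depend only on the current centroids, which is why the algorithm maps naturally onto repeated MapReduce jobs: assignment can run per point in parallel, and the centroid update is an aggregation per cluster.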
In this recipe, we will use a dataset that contains Human Development Report (HDR) data by country. The HDR describes different countries based on several human development measures. You can find the dataset at http://hdr.undp.org/en/statistics/data/. A sample of this dataset is available in the chapter7/resources/hdi-data.csv file in the sample source code repository. This recipe will use...