In this example, we will look at a cluster-finding algorithm in Scikit-learn called DBSCAN. DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. It is a clustering algorithm that groups points lying in dense regions and labels points that fall outside every such group (cluster) as noise (outliers). As with the linear machine learning methods, Scikit-learn makes it very easy to work with. We first read in the data from Chapter 5, Clustering, with Pandas' read_pickle function:
TABLE_FILE = 'data/test.pick'
mycat = pd.read_pickle(TABLE_FILE)
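Before turning to the galaxy data, it may help to see what "clusters plus noise" means in practice. The following is a minimal sketch on synthetic points, not on the catalogue loaded above: the two blobs, the three isolated points, and the values chosen for eps and min_samples are all assumptions made purely for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic stand-in data (NOT the galaxy catalogue): two tight blobs
# plus three isolated points placed far away from both blobs.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(50, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.1, size=(50, 2))
isolated = np.array([[2.5, 2.5], [10.0, -10.0], [-8.0, 8.0]])
points = np.vstack([blob_a, blob_b, isolated])

# eps (neighbourhood radius) and min_samples (points needed to form a
# dense region) are illustrative values, not tuned recommendations.
db = DBSCAN(eps=0.5, min_samples=5).fit(points)
labels = db.labels_  # one integer label per point; -1 marks noise

# Count the clusters, ignoring the noise label, and the noise points.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print("clusters found:", n_clusters)
print("noise points:", n_noise)
```

Each dense blob receives its own integer label, while the isolated points have too few neighbours within eps to join any cluster and are therefore given the label -1. On real data, both parameters usually need some experimentation.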
As with the previous dataset, we plot the data to refresh your memory. It contains a slice of the mapped nearby Universe, that is, galaxies with determined positions (direction and distance from us). As before, we scale the color with the Z-magnitude, as found in the data table:
fig,ax = plt.subplots(1,2, figsize=(10,2.5))
plt.subplot(121)
plt.scatter(mycat['Y'], -1*mycat['X'], s...