Locating regions of high density via DBSCAN
Although we can't cover the vast number of clustering algorithms in this chapter, let's at least introduce one more approach to clustering: Density-based Spatial Clustering of Applications with Noise (DBSCAN), which does not assume spherical clusters as k-means does, nor does it partition the dataset into hierarchies that require a manual cut-off point. As its name implies, density-based clustering assigns cluster labels based on dense regions of points. In DBSCAN, the notion of density is defined as the number of points within a specified radius ε.
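To make this notion of density concrete, here is a minimal sketch that counts, for each point, how many other points fall within a given radius. The function name `neighbor_counts`, the parameter name `eps`, and the toy points are our own illustrative choices, not part of any library. Note that this sketch does not count a point as its own neighbor; conventions differ on this, so treat it as an assumption:

```python
import math

def neighbor_counts(points, eps):
    """For each point, count the other points within distance eps (hypothetical helper)."""
    counts = []
    for i, p in enumerate(points):
        # Euclidean distance to every other point; the point itself is skipped
        n = sum(
            1
            for j, q in enumerate(points)
            if i != j and math.dist(p, q) <= eps
        )
        counts.append(n)
    return counts

# Toy data: three points close together and one far away
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
counts = neighbor_counts(points, eps=0.5)
print(counts)  # [2, 2, 2, 0]
```

The three nearby points each have two neighbors within the radius, while the distant point has none, so the first three lie in a denser region than the last. The brute-force loop compares every pair of points and is only meant for illustration on small datasets.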
According to the DBSCAN algorithm, a special label is assigned to each sample (point) using the following criteria:
- A point is considered a core point if at least a specified number (MinPts) of neighboring points fall within the specified radius ε
- A border point is a point that has fewer neighbors than MinPts within ε, but lies within the ε radius of a core point
All other points that...