Clojure for Data Science

By: Henry Garner


Clustering text


Clustering is the process of finding groups of objects that are similar to each other. The goal is that objects within a cluster should be more similar to each other than to objects in other clusters. Like classification, clustering is not a specific algorithm so much as a class of algorithms that address a common problem.
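To make that goal concrete, here is a minimal sketch in Python. The language and the function name `within_and_between` are illustrative assumptions, not the book's code. The function takes a proposed assignment of points to clusters and reports two numbers: the mean distance between pairs in the same cluster and the mean distance between pairs in different clusters. A good clustering should make the first noticeably smaller than the second. For simplicity the sketch uses 1-D points with absolute difference as the distance.

```python
from itertools import combinations
from statistics import mean

def within_and_between(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance).

    Hypothetical helper for illustration. Distance is the absolute difference
    between 1-D values; any distance measure could be substituted.
    Requires at least one same-cluster pair and one cross-cluster pair,
    otherwise statistics.mean raises StatisticsError on an empty list.
    """
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = abs(points[i] - points[j])
        if labels[i] == labels[j]:
            within.append(d)   # pair shares a cluster
        else:
            between.append(d)  # pair spans two clusters
    return mean(within), mean(between)

points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
good = [0, 0, 0, 1, 1, 1]  # groups nearby values together
bad = [0, 1, 0, 1, 0, 1]   # interleaves distant values

print(within_and_between(points, good))
print(within_and_between(points, bad))
```

With the `good` labelling, the within-cluster mean (about 0.67) is far below the between-cluster mean (8.0). The interleaved `bad` labelling reverses that relationship, which is exactly what a clustering algorithm tries to avoid.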

Although there are a variety of clustering algorithms, all rely to some extent on a distance measure. For an algorithm to decide whether two objects belong in the same cluster or in different clusters, it must be able to compute a quantitative measure of the distance (or, if you prefer, the similarity) between them. In other words, it needs a numeric measure of distance: the smaller the distance, the greater the similarity between the two objects.
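As an illustration of a numeric distance measure and its inverse relationship to similarity, the sketch below uses Euclidean distance between numeric vectors. The helper names are hypothetical. Converting distance to similarity with 1 / (1 + d) is one common convention, chosen here as an assumption. It maps a distance of 0 to a similarity of 1 and pushes similarity toward 0 as distance grows.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity(a, b):
    """Map a distance in [0, inf) to a similarity in (0, 1].

    Assumed convention: identical vectors score 1.0, and similarity
    decreases toward 0 as the distance between the vectors grows.
    """
    return 1.0 / (1.0 + euclidean_distance(a, b))

p, q, r = [0.0, 0.0], [3.0, 4.0], [1.0, 1.0]
print(euclidean_distance(p, q))              # 5.0
print(similarity(p, r) > similarity(p, q))   # True: r is closer to p than q is
```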

Since clustering is a general technique that can be applied to diverse data types, there are a large number of possible distance measures. Nonetheless, most data can be represented by one of a handful of common abstractions: a set, a...