#### Overview of this book

There are many algorithms for data analysis, and it is not always possible to quickly choose the best one for each case. Implementing these algorithms yourself takes a lot of time. With Mathematica, you can quickly get a result from a particular method because the system ships with a large library of built-in data analysis algorithms. If you are not a programmer but need to analyze data, this book shows you what Mathematica can do when just a few lines of readable code solve large tasks, from statistical problems to pattern recognition. If you are a programmer, this book shows you how to use Mathematica's library of algorithms in your own programs and how to write procedures for testing algorithms. With each chapter, you will become more immersed in the world of Mathematica. Along with intuitive queries for data processing, we highlight the nuances and features of the system that let you build effective analysis systems. You will also learn how to optimize computations by combining your own libraries with the Mathematica kernel.
Mathematica Data Analysis

## Data clustering

Clusters are groups of data elements that are close or similar to one another. For example, a group of people can be divided into clusters according to age, height, sex, social status, and so on. Clustering helps us understand the input data better: if we know the properties of one element of a cluster, the other elements are likely to share those properties. Clusters can be found without a teacher (an unsupervised learning technique), and the search can be based on two functions. The distance function gives the distance between elements: the closer two elements are to each other, the more likely they are to belong to the same cluster. The dissimilarity function returns the degree of dissimilarity between elements.
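To make the idea of a distance function concrete, here is a minimal sketch using two built-in Wolfram Language distance functions. The two "people" and their `{age, height}` values are hypothetical and chosen only so that the arithmetic is easy to check by hand.

```mathematica
(* Two hypothetical people described by {age in years, height in cm} *)
p1 = {30, 170};
p2 = {33, 174};

(* Straight-line distance: Sqrt[3^2 + 4^2] = 5 *)
dEuclid = EuclideanDistance[p1, p2]

(* City-block distance: 3 + 4 = 7 *)
dManhattan = ManhattanDistance[p1, p2]
```

The same pair of elements can be closer or farther apart depending on the distance function, so the choice of function can influence which elements end up in the same cluster.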

To cluster data, we'll use the `FindClusters` function. First, let's consider its application in simple examples:
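As a minimal sketch of how such calls can look (the points below are made up for illustration and are not the book's dataset), `FindClusters` can be applied to a list of elements, optionally with a requested number of clusters and an explicit `DistanceFunction` option:

```mathematica
(* Made-up 2D points forming two visibly separated groups *)
data = {{1, 1}, {2, 1}, {1, 2}, {10, 10}, {11, 10}, {10, 11}};

(* Let FindClusters choose the number of clusters itself *)
autoClusters = FindClusters[data]

(* Request a partition into two clusters *)
twoClusters = FindClusters[data, 2]

(* Supply the distance function explicitly through an option *)
manhattanClusters = FindClusters[data, 2, DistanceFunction -> ManhattanDistance]
```

Each result is a list of sublists, one per cluster. With groups this far apart, one would expect the points near `{1, 1}` and the points near `{10, 10}` to land in different sublists, although the exact grouping depends on the method Mathematica selects.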

By default, the `FindClusters` function finds clusters on the basis of the...