#### Overview of this book

Scala for Machine Learning

## Profiling data

The choice of a preprocessing, clustering, or classification algorithm depends heavily on the quality and profile of the input data (observations and, whenever available, expected values). The section *Step 3 – preprocessing the data*, under *Let's kick the tires* in Chapter 1, *Getting Started*, introduced the `MinMax` class, which normalizes a dataset using its minimum and maximum values.
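As a reminder of what min-max normalization does, here is a minimal sketch. It is not the `MinMax` class from Chapter 1: the `MinMaxSketch` object and its `normalize` method are hypothetical names introduced for illustration. It assumes a non-empty vector of `Double` values and, as an arbitrary choice, maps a constant series to the lower bound.

```scala
object MinMaxSketch {
  // Rescale values linearly into [low, high] using the observed minimum and maximum.
  // Hypothetical helper for illustration only; assumes `values` is non-empty.
  def normalize(
      values: Vector[Double],
      low: Double = 0.0,
      high: Double = 1.0): Vector[Double] = {
    require(values.nonEmpty, "Cannot normalize an empty vector")
    val min = values.min
    val max = values.max
    val range = max - min

    if (range == 0.0) values.map(_ => low) // constant series: avoid dividing by zero
    else values.map(x => low + (x - min) * (high - low) / range)
  }
}
```

For instance, `MinMaxSketch.normalize(Vector(2.0, 4.0, 6.0))` maps the smallest value to 0.0, the largest to 1.0, and the midpoint to 0.5.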

### Immutable statistics

The mean and standard deviation are the most commonly used statistics.

### Note

Mean and variance

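For a dataset of $n$ observations $x_1, \dots, x_n$, the arithmetic mean is defined as:

$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Variance is defined as:

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$$

Variance adjusted for a sampling bias is defined as:

$$\hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu)^2$$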
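These three definitions translate almost literally into code. The sketch below computes them directly, one pass for the mean and another for the squared deviations. The `MomentsSketch` object and its method names are hypothetical and are not part of the book's library.

```scala
object MomentsSketch {
  // Arithmetic mean: sum of the observations divided by their count.
  def mean(xs: Vector[Double]): Double = {
    require(xs.nonEmpty, "Mean is undefined for an empty dataset")
    xs.sum / xs.size
  }

  // Sum of squared deviations from the mean, shared by both variance flavors.
  private def squaredDeviations(xs: Vector[Double]): Double = {
    val mu = mean(xs)
    xs.map(x => (x - mu) * (x - mu)).sum
  }

  // Variance: divides by n.
  def variance(xs: Vector[Double]): Double =
    squaredDeviations(xs) / xs.size

  // Variance adjusted for sampling bias: divides by n - 1.
  def sampleVariance(xs: Vector[Double]): Double = {
    require(xs.size > 1, "Sample variance needs at least two observations")
    squaredDeviations(xs) / (xs.size - 1)
  }
}
```

With `Vector(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)`, the mean is 5.0 and the squared deviations sum to 32, so the variance is 32/8 = 4.0 and the bias-adjusted variance is 32/7.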

Let's extend the `MinMax` class with some basic statistics capabilities using `Stats`:

```scala
class Stats[T <: AnyVal](
    values: Vector[T])(implicit f: T => Double)
  extends MinMax[T](values) {

  val zero = (0.0, 0.0)
  // Accumulate the sum and the sum of squares in a single pass over the values
  val sums = values./:(zero)((s, x) => (s._1 + x, s._2 + x*x)) //1

  lazy val mean = sums._1/values.size  //2
  lazy val variance =
    (sums._2 - mean*mean*values...
```
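The fold in `Stats` relies on the identity $\sum_i (x_i - \mu)^2 = \sum_i x_i^2 - n\mu^2$, which makes it possible to derive the variance from a running sum and a running sum of squares. The following standalone sketch illustrates that idea without depending on `MinMax`. The `SinglePassStats` object is a hypothetical name, and the sketch assumes the bias-adjusted ($n - 1$) form of the variance, since the listing above is truncated. It uses `foldLeft`, the named equivalent of the `/:` operator, which is deprecated as of Scala 2.13.

```scala
object SinglePassStats {
  // Returns (mean, bias-adjusted variance) from one traversal of the data.
  // Hypothetical helper; assumes at least two observations.
  def meanAndVariance(values: Vector[Double]): (Double, Double) = {
    require(values.size > 1, "Need at least two observations")
    val n = values.size

    // Accumulate the sum and the sum of squares together.
    val (sum, sumSq) = values.foldLeft((0.0, 0.0)) {
      case ((s, sq), x) => (s + x, sq + x * x)
    }

    val mean = sum / n
    // Sum of squared deviations rewritten as sumSq - n * mean^2.
    val variance = (sumSq - mean * mean * n) / (n - 1)
    (mean, variance)
  }
}
```

For `Vector(1.0, 2.0, 3.0, 4.0)`, the sum is 10 and the sum of squares is 30, giving a mean of 2.5 and a variance of (30 - 25)/3 = 5/3. One caveat of this formulation is that subtracting two large, nearly equal quantities can lose precision when the mean is large relative to the spread of the data.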