# Quantiles

The median is one way to find the *middle* value of a list, while the variance measures the spread of the data about the mean. If the entire spread of data were represented on a scale of zero to one, the median would be the value at 0.5.

For example, consider the following sequence of numbers:

[10 11 15 21 22.5 28 30]

There are seven numbers in the sequence, so the median is the fourth value, 21. This is also referred to as the 0.5 quantile. We can get a richer picture of a sequence of numbers by looking at the 0, 0.25, 0.5, 0.75, and 1.0 quantiles. Taken together, these numbers not only show the median, but also summarize the range of the data and how the numbers are distributed within it. They're sometimes referred to as the *five-number summary*.

One way to calculate the five-number summary for the UK electorate data is shown as follows:

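    ;; Return the value at quantile q (between 0 and 1) of xs.
    (defn quantile [q xs]
      (let [n (dec (count xs))          ; index of the last element
            i (-> (* n q)               ; fractional position in the sorted data
                  (+ 1/2)               ; add a half so that int rounds to nearest
                  (int))]
        (nth (sort xs) i)))

    (defn ex-1-10 []
      (let [xs (->> (load-data :uk-scrubbed)
                    (i/$ "Electorate"))
            f (fn [q] (quantile q xs))]
        (map f [0 1/4 1/2 3/4 1])))

    ;; (21780.0 66219.0 70991.0 75115.0 109922.0)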
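As a quick check, the same nearest-index approach can be applied to the seven-number sequence from earlier. The sketch below repeats the `quantile` definition so that it runs on its own; `five-number-summary` is a name introduced here purely for illustration.

```clojure
;; Nearest-index quantile, as defined in this section.
(defn quantile [q xs]
  (let [n (dec (count xs))
        i (int (+ (* n q) 1/2))]
    (nth (sort xs) i)))

;; Hypothetical helper: the 0, 1/4, 1/2, 3/4 and 1 quantiles of xs.
(defn five-number-summary [xs]
  (map #(quantile % xs) [0 1/4 1/2 3/4 1]))

(five-number-summary [10 11 15 21 22.5 28 30])
;; (10 15 21 28 30)
```

The first and last values are the minimum and maximum, and the middle value is the median, 21, found by hand above.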

Quantiles can also be calculated in Incanter directly with the `s/quantile` function. A sequence of desired quantiles is passed as the keyword argument `:probs`.
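A minimal sketch of such a call, assuming Incanter is on the classpath and `incanter.stats` is aliased as `s`, might look like the following. It uses the small seven-number sequence rather than the electorate data so that it does not depend on `load-data`.

```clojure
(require '[incanter.stats :as s])

;; The seven-number sequence from earlier in this section.
(def xs [10 11 15 21 22.5 28 30])

;; Ask for the five quantiles that make up the five-number summary.
(s/quantile xs :probs [0.0 0.25 0.5 0.75 1.0])
```

Because Incanter may interpolate between neighbouring values, the quartiles it returns need not coincide with those of the hand-written `quantile` function.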

### Note

Incanter's `quantile` function uses a variant of the algorithm shown earlier called the **phi-quantile**, which performs linear interpolation between consecutive numbers in certain cases. There are many alternative ways of calculating quantiles; consult https://en.wikipedia.org/wiki/Quantile for a discussion of the differences.
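To illustrate what linear interpolation means here, the following sketch implements one common interpolation scheme in plain Clojure. The name `interpolated-quantile` is made up for this example, and the scheme is an assumption chosen for illustration; it is not claimed to reproduce Incanter's results exactly.

```clojure
;; Sketch of a linearly interpolated quantile (one of several conventions).
;; q is assumed to lie between 0 and 1, and xs to be non-empty.
(defn interpolated-quantile [q xs]
  (let [sorted (vec (sort xs))
        last-i (dec (count sorted))
        pos    (* q last-i)            ; fractional index into the sorted data
        lo     (long pos)              ; index at or just below pos
        hi     (min (inc lo) last-i)   ; index just above, clamped to the end
        frac   (- pos lo)              ; how far pos lies between lo and hi
        a      (nth sorted lo)
        b      (nth sorted hi)]
    (double (+ a (* frac (- b a))))))

(def xs [10 11 15 21 22.5 28 30])

(interpolated-quantile 1/4 xs) ; 13.0, halfway between 11 and 15
(interpolated-quantile 1/2 xs) ; 21.0
(interpolated-quantile 3/4 xs) ; 25.25, halfway between 22.5 and 28
```

For this sequence the median is unchanged, but the quartiles fall between data points, whereas the nearest-index `quantile` function always returns a value that appears in the data.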

When quantiles divide the data into four equal-sized groups, as above, they are called **quartiles**. The difference between the lower and upper quartiles is referred to as the **interquartile range**, often abbreviated to **IQR**. Like the variance about the mean, the IQR gives a measure of the spread of the data about the median.
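The IQR can be sketched in a few lines of Clojure. The helper names `quartiles` and `iqr` are hypothetical, and the quartiles are picked with the same nearest-index rule as the `quantile` function in this section; an interpolating method would generally give a somewhat different figure.

```clojure
;; Hypothetical helper: lower and upper quartiles by the nearest-index rule.
(defn quartiles [xs]
  (let [sorted (vec (sort xs))
        n      (dec (count sorted))
        pick   (fn [q] (nth sorted (int (+ (* n q) 1/2))))]
    {:lower (pick 1/4)
     :upper (pick 3/4)}))

;; Interquartile range: upper quartile minus lower quartile.
(defn iqr [xs]
  (let [{:keys [lower upper]} (quartiles xs)]
    (- upper lower)))

(iqr [10 11 15 21 22.5 28 30])
;; 13, the gap between the quartiles 15 and 28
```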