## Computing the median in a large dataset

As you saw in the first recipe, computing the median requires having all the values available at once. A mean, by contrast, can be computed incrementally with just an accumulator (a running sum) and a counter. The fundamental point of this recipe is to introduce the idea of approximate computing: with big data, computing the exact value is not always the best strategy (of course, this should be evaluated on a case-by-case basis).
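The contrast between the two statistics can be illustrated with a small sketch (plain Python, not the notebook's code; the function names are illustrative). The mean needs only two numbers of state however many values it sees, while the exact median has to keep and sort every value:

```python
def streaming_mean(values):
    """Exact mean in a single pass with O(1) memory: a running sum and a counter."""
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    return total / count


def exact_median(values):
    """Exact median: every value must be held in memory (O(n)) and sorted."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    # Even number of values: average the two middle ones
    return (data[mid - 1] + data[mid]) / 2


values = [5, 1, 4, 2, 3, 10]
print(streaming_mean(values))  # 4.166666666666667
print(exact_median(values))    # 3.5
```

`streaming_mean` would work just as well on a generator that never materializes the data. `exact_median` cannot, which is why it becomes a problem as the dataset grows.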

### Getting ready

This recipe requires the first recipe to have been run in full.

Here, we will use two different strategies to compute the median: approximating the data points in a way that allows the data to be compressed, and subsampling the data.
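To give an idea of the subsampling strategy, here is a minimal sketch assuming the values fit in a Python list. It is not the notebook's implementation; `subsample_median` and `median_of_sorted` are hypothetical helpers. The median is computed on a uniform random sample drawn without replacement:

```python
import random


def median_of_sorted(data):
    """Median of an already-sorted, non-empty list."""
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


def subsample_median(values, sample_size, seed=None):
    """Approximate the median using a uniform random subsample of `values`."""
    if sample_size >= len(values):
        # Nothing to gain from subsampling: return the exact median
        return median_of_sorted(sorted(values))
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)  # sampling without replacement
    return median_of_sorted(sorted(sample))


values = list(range(1000001))  # the exact median is 500000
approx = subsample_median(values, 10001, seed=42)
print(approx)  # close to, but not necessarily equal to, 500000
```

Only the sample is sorted, so the sorting cost depends on `sample_size` rather than on the full dataset. If the data arrives as a stream that cannot be held in a list, reservoir sampling would be the usual way to draw the sample.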

As usual, this is available in the `08_Advanced/Median.ipynb` notebook.

### How to do it...

Take a look at the following steps:

Our first approach will be to approximate all the values, starting by creating a dictionary. This code should be run in the same environment where the first recipe was run:

```python
from __future__ import division, print_function
import...
```
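The notebook code above is truncated, so the exact approximation scheme is not reproduced here. As a hedged illustration of the general idea, the following sketch assumes that values are approximated by rounding them and that the dictionary maps each rounded value to how many times it occurs. The helpers `rounded_counts` and `median_from_counts` are hypothetical. Because many values collapse onto the same key, the dictionary can be much smaller than the raw data, and the median can still be read off the cumulative counts:

```python
from collections import Counter


def rounded_counts(values, ndigits=0):
    """Compress values by rounding them and counting each rounded value."""
    counts = Counter()
    for v in values:
        counts[round(v, ndigits)] += 1
    return counts


def median_from_counts(counts):
    """Approximate median from a {value: count} dictionary."""
    total = sum(counts.values())
    # 0-based positions of the middle element(s) in the sorted data
    lower_pos = (total - 1) // 2
    upper_pos = total // 2
    lower = upper = None
    seen = 0
    for value in sorted(counts):
        seen += counts[value]
        if lower is None and seen > lower_pos:
            lower = value
        if seen > upper_pos:
            upper = value
            break
    return (lower + upper) / 2


values = [0.12, 0.48, 0.51, 0.97, 1.49, 2.2, 3.8]
counts = rounded_counts(values, ndigits=1)
print(len(counts))                 # 6
print(median_from_counts(counts))  # 1.0 (the exact median is 0.97)
```

The `ndigits` parameter controls the trade-off: coarser rounding gives a smaller dictionary but a less precise median.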