Learning Pandas

By: Michael Heydt


Discretization and Binning


Although it does not directly use grouping constructs, discretization of continuous data is closely related and worth explaining in a chapter on grouping. Discretization slices continuous data into a set of "bins", where each bin represents a range of the continuous sample, and each item is placed into the appropriate bin (hence the term "binning"). Discretization in pandas is performed using the pd.cut() and pd.qcut() functions.
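As a rough illustration of the difference between the two functions, the following minimal sketch bins a small, made-up array both ways. The sample values and variable names are assumptions chosen only for this example; pd.cut() is asked for four bins of equal width, while pd.qcut() is asked for four bins based on quantiles:

```python
import numpy as np
import pandas as pd

# Hypothetical sample values (not from the text), with one large outlier
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0])

# pd.cut: split the range of the data into 4 bins of equal width
equal_width = pd.cut(values, 4)

# pd.qcut: split at the quartiles, so the bins hold about the same number of items
equal_count = pd.qcut(values, 4)

# Count how many items landed in each bin, keeping the bins in order
width_counts = pd.Series(equal_width).value_counts(sort=False)
count_counts = pd.Series(equal_count).value_counts(sort=False)

print(width_counts)
print(count_counts)
```

With this data, the single outlier stretches the equal-width bins so that most values land in the first bin, whereas the quantile-based bins each receive the same number of values.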

We will look at discretization by generating a large set of normally distributed random numbers, cutting them into bins in several ways, and analyzing the contents of the bins. The following generates 10000 numbers and reports their mean and standard deviation, which we expect to approach 0 and 1, respectively, as the sample size gets larger:

In [48]:
   # generate 10000 normally distributed random numbers
   np.random.seed(123456)
   dist = np.random.normal(size=10000)

   # show the mean and standard deviation
   "{0} {1}".format(dist.mean(), dist.std())

Out[48...