Mastering Python for Data Science

By: Samir Madhavan

Box plots


A box plot is a good way to see the spread, median, and outliers of a dataset:

The various parts of the preceding figure are explained as follows:

  • Q3: This is the 75th percentile value of the data. It's also called the upper hinge.

  • Q1: This is the 25th percentile value of the data. It's also called the lower hinge.

  • Box: The height of the box is also called a step. It's the difference between the upper hinge and the lower hinge (Q3 - Q1), which is the interquartile range.

  • Median: This is the midpoint of the data.

  • Max: This is the upper inner fence. It lies 1.5 times the step above Q3, that is, at Q3 + 1.5 × step.

  • Min: This is the lower inner fence. It lies 1.5 times the step below Q1, that is, at Q1 - 1.5 × step.

Any value that is greater than Max or less than Min is called an outlier, which is also known as a flier.
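The definitions above can be sketched in a few lines of NumPy. This is only a minimal illustration, not code from the book: the function name `box_plot_summary` and the sample values are hypothetical, and `np.percentile` is assumed to use its default linear interpolation, so the hinges may differ slightly under other quartile conventions.

```python
import numpy as np


def box_plot_summary(values):
    """Return the hinges, step, fences, and outliers of a 1-D dataset.

    Hypothetical helper for illustration; it follows the definitions
    given in the list above (fences at 1.5 steps beyond the hinges).
    """
    data = np.asarray(values, dtype=float)

    # Lower hinge (Q1), median, and upper hinge (Q3).
    q1, median, q3 = np.percentile(data, [25, 50, 75])

    # The step is the height of the box: Q3 - Q1.
    step = q3 - q1

    # Inner fences sit 1.5 steps beyond each hinge.
    lower_fence = q1 - 1.5 * step
    upper_fence = q3 + 1.5 * step

    # Anything outside the fences is an outlier (flier).
    outliers = data[(data < lower_fence) | (data > upper_fence)]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "step": step,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Made-up sample in which 50 lies far above the rest of the values.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(summary)
```

Plotting libraries often draw the whiskers only as far as the most extreme data points that still lie inside the fences, so the whisker ends in a rendered plot may sit closer to the box than the fence values computed here.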

The following code creates some data and then uses the boxplot function to draw box plots:

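>>> import numpy as np
>>> ## Creating some data
>>> np.random.seed(10)  # fix the seed so the samples are reproducible
>>> box_data_1 = np.random.normal(100, 10, 200)  # mean 100, std 10, 200 samples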
>>> box_data_2 = np.random.normal(80, 30, 200)...