Book Image

Jupyter for Data Science

By : Dan Toomey
Book Image

Jupyter for Data Science

By: Dan Toomey

Overview of this book

Jupyter Notebook is a web-based environment that enables interactive computing in notebook documents. It allows you to create documents that contain live code, equations, and visualizations. This book is a comprehensive guide to getting started with data science using the popular Jupyter notebook. If you are familiar with Jupyter notebook and want to learn how to use its capabilities to perform various data science tasks, this is the book for you! From data exploration to visualization, this book will take you through every step of the way in implementing an effective data science pipeline using Jupyter. You will also see how you can utilize Jupyter's features to share your documents and codes with your colleagues. The book also explains how Python 3, R, and Julia can be integrated with Jupyter for various data science tasks. By the end of this book, you will comfortably leverage the power of Jupyter to perform various tasks in data science successfully.
Table of Contents (17 chapters)
Title Page
Credits
About the Author
About the Reviewers
www.PacktPub.com
Customer Feedback
Preface

Analyzing changes in college admissions


We can look at trends in college admissions acceptance rates over the last few years. For this analysis, I am using the data on https://www.ivywise.com/ivywise-knowledgebase/admission-statistics.

First, we read in our dataset and show the summary points, from head to validate:

df <- read.csv("Documents/acceptance-rates.csv")summary(df)head(df)

We see the summary data for school acceptance rates as follows:

It's interesting to note that the acceptance rate varies so widely, from a low of 5 percent to a high of 41 percent in 2017.

Let us look at the data plots, again, to validate that the data points are correct:

plot(df)

From the correlation graphics shown, it does not look like we can use the data points from 2007. The graphs show a big divergence between 2007 and the other years, whereas the other three have good correlations.

So, we have 3 consecutive years of data from 25 major US universities. We can convert the data into a time series using a few steps...