Jupyter for Data Science

By: Dan Toomey

Overview of this book

Jupyter Notebook is a web-based environment that enables interactive computing in notebook documents. It allows you to create documents that contain live code, equations, and visualizations. This book is a comprehensive guide to getting started with data science using the popular Jupyter Notebook. If you are familiar with Jupyter Notebook and want to learn how to use its capabilities to perform various data science tasks, this is the book for you! From data exploration to visualization, this book will take you through every step of implementing an effective data science pipeline using Jupyter. You will also see how you can use Jupyter's features to share your documents and code with your colleagues. The book also explains how Python 3, R, and Julia can be integrated with Jupyter for various data science tasks. By the end of this book, you will be able to comfortably leverage the power of Jupyter to perform a variety of data science tasks.

Reading another CSV file


We can look at another CSV file in the same dataset to see what kinds of issues we run into. Using the yearly batting records for all Major League Baseball players that we previously downloaded from the same site, we can use code like the following to start analyzing the data:

players <- read.csv(file="Documents/baseball.csv", header=TRUE, sep=",")
head(players)

This produces the following head display:

There are many statistics for baseball players in this dataset. There are also many NA values. R copes well with NA values: summary(), for example, reports them as a separate NA's count for numeric columns rather than failing. Let us first look at the statistics for the data using:

summary(players)

This generates statistics on all the fields involved (there are several more that are not in this display):

Several points in the preceding display are worth noting:

  • We have about 30 data points per player
  • It is interesting that the player data goes back to 1871
  • There are about 1,000 data points per team
  • American League and National League...
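
The per-player counts, per-team counts, and earliest year noted above can also be checked directly rather than read off the summary display. The following is a minimal sketch on a small made-up data frame. The column names playerID, yearID, and teamID and all of the values are assumptions for illustration, since the actual columns of baseball.csv are not shown here:

```r
# Toy stand-in for the batting data.
# Column names and values are hypothetical, not taken from baseball.csv.
players <- data.frame(
  playerID = c("a01", "a01", "b02", "b02", "b02", "c03"),
  yearID   = c(1871, 1872, 1901, 1902, NA, 1950),
  teamID   = c("BOS", "BOS", "NYA", "NYA", "BOS", "CHN"),
  stringsAsFactors = FALSE
)

# Number of rows (data points) for each player and each team
per_player <- table(players$playerID)
per_team   <- table(players$teamID)

# Average number of rows per player and per team
mean_per_player <- mean(as.vector(per_player))
mean_per_team   <- mean(as.vector(per_team))

# Earliest season on record; na.rm = TRUE skips missing years,
# without it min() would return NA here
first_year <- min(players$yearID, na.rm = TRUE)

print(per_player)
print(mean_per_team)
print(first_year)
```

On the real data frame, the same calls would be applied to whatever columns hold the player, team, and year identifiers. Note that aggregate functions such as min() and mean() only skip NA values when na.rm = TRUE is passed.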