Jupyter for Data Science

By: Dan Toomey

Overview of this book

Jupyter Notebook is a web-based environment that enables interactive computing in notebook documents. It allows you to create documents that contain live code, equations, and visualizations. This book is a comprehensive guide to getting started with data science using the popular Jupyter Notebook. If you are familiar with Jupyter Notebook and want to learn how to use its capabilities to perform various data science tasks, this is the book for you! From data exploration to visualization, this book walks you through every step of implementing an effective data science pipeline using Jupyter. You will also see how you can use Jupyter's features to share your documents and code with your colleagues. The book also explains how Python 3, R, and Julia can be integrated with Jupyter for various data science tasks. By the end of this book, you will be able to use Jupyter comfortably for a range of data science tasks.

Manipulating data with dplyr


The dplyr package for R describes itself as a grammar of data manipulation. It gathers the functions you would expect for wrangling a data frame, such as filtering, selecting, sorting, and summarizing, into one package. We will use dplyr on the baseball player statistics we used earlier in this chapter.

We read in the player data and show the first few rows:

players <- read.csv(file="Documents/baseball.csv", header=TRUE, sep=",")
head(players)

We will be using the dplyr package, so we need to pull the package into our notebook:

library(dplyr)
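
To make the "grammar" idea concrete, here is a minimal sketch of the most common dplyr verbs chained with the pipe operator. The toy data frame and its column names (name, team, hits, at_bats) are made up for illustration and are not the actual columns of baseball.csv:

```r
library(dplyr)

# Hypothetical stand-in for the player data; the column names are
# assumptions, not taken from baseball.csv.
toy <- data.frame(
  name    = c("A", "B", "C", "D"),
  team    = c("X", "X", "Y", "Y"),
  hits    = c(120, 95, 160, 80),
  at_bats = c(400, 380, 500, 320),
  stringsAsFactors = FALSE
)

# Each verb takes a data frame and returns a new one, so they chain.
result <- toy %>%
  filter(at_bats >= 350) %>%         # keep rows that meet a condition
  mutate(avg = hits / at_bats) %>%   # add a derived column
  select(name, team, avg) %>%        # keep only the listed columns
  arrange(desc(avg))                 # sort, highest average first

# Grouped summary: one row per team.
by_team <- toy %>%
  group_by(team) %>%
  summarise(total_hits = sum(hits))

print(result)
print(by_team)
```

The same verbs should apply to the players data frame once the toy column names are replaced with the real ones.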

Converting a data frame to a dplyr table

The dplyr package has functions to convert your data object into a dplyr table (a tibble). A dplyr table is still a data frame, but it prints in a compact format: only the rows and columns that fit on screen are shown, along with each column's type. Most of the other dplyr functions can operate directly on the table as well.

We can convert our data frame to a table using:

playerst <- tbl_df(players)
playerst
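
Note that tbl_df() has been deprecated since dplyr 1.0.0, and as_tibble() is the current function for the same conversion. The following is a small sketch of that conversion on a made-up data frame (the toy object and its columns are assumptions for illustration), with a check that the result is still an ordinary data frame underneath:

```r
library(dplyr)

# Hypothetical two-row data frame standing in for the player data.
toy <- data.frame(
  name = c("A", "B"),
  hits = c(120, 95),
  stringsAsFactors = FALSE
)

# as_tibble() is re-exported by dplyr from the tibble package.
toy_tbl <- as_tibble(toy)

class(toy_tbl)          # includes "tbl_df" as well as "data.frame"
is.data.frame(toy_tbl)  # the table still behaves as a data frame
toy_tbl                 # printing shows the dimensions and column types
```

Because the result still inherits from data.frame, code written for plain data frames generally continues to work on it.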

This results in a very similar display pattern:

Getting a quick overview of the data value ranges

Another...