Jupyter for Data Science

By: Dan Toomey

Overview of this book

Jupyter Notebook is a web-based environment that enables interactive computing in notebook documents. It allows you to create documents that contain live code, equations, and visualizations. This book is a comprehensive guide to getting started with data science using the popular Jupyter Notebook. If you are familiar with Jupyter Notebook and want to learn how to use its capabilities to perform various data science tasks, this is the book for you! From data exploration to visualization, this book takes you through every step of implementing an effective data science pipeline using Jupyter. You will also see how to use Jupyter's features to share your documents and code with your colleagues. The book also explains how Python 3, R, and Julia can be integrated with Jupyter for various data science tasks. By the end of this book, you will be able to use Jupyter comfortably for a range of data science tasks.

Title Page

Credits

About the Author

About the Reviewers

www.PacktPub.com

Customer Feedback

Preface

Jupyter and Data Science

Jupyter concepts

A first look at the Jupyter user interface

Summary

Working with Analytical Data on Jupyter

Data scraping with a Python notebook

Using heavy-duty data processing functions in Jupyter

Using SciPy in Jupyter

Expanding on pandas data frames in Jupyter

Summary

Data Visualization and Prediction

Make a prediction using scikit-learn

Make a prediction using R

Interactive visualization

Plotting using Plotly

Creating a human density map

Draw a histogram of social data

Plotting 3D data

Summary

Data Mining and SQL Queries

Special note for Windows installation

Using Spark to analyze data

Another MapReduce example

Using SparkSession and SQL

Combining datasets

Loading JSON into Spark

Using Spark pivot

Summary

R with Jupyter

How to set up R for Jupyter

R data analysis of the 2016 US election demographics

Analyzing 2016 voter registration and voting

Analyzing changes in college admissions

Predicting airplane arrival time

Summary

Data Wrangling

Reading a CSV file

Reading another CSV file

Manipulating data with dplyr

Sampling a dataset

Tidying up data with tidyr

Summary

Jupyter Dashboards

Visualizing glyph ready data

Publishing a notebook

Creating a Shiny dashboard

Building standalone dashboards

Summary

Statistical Modeling

Converting JSON to CSV

Evaluating Yelp reviews

Using Python to compare ratings

Visualizing average ratings by cuisine

Arbitrary search of ratings

Determining relationships between number of ratings and ratings

Machine Learning Using Jupyter

Naive Bayes

Nearest neighbor estimator

Decision trees

Neural networks

Random forests

Summary

Optimizing Jupyter Notebooks

Deploying notebooks

Optimizing your script

Monitoring Jupyter

Caching your notebook

Securing a notebook

Scaling Jupyter Notebooks

Sharing Jupyter Notebooks

Converting a notebook

Versioning a notebook

Summary

Another MapReduce example

We can apply MapReduce to another example: getting the word counts from a file. This is a standard problem, but MapReduce does most of the heavy lifting. We can use the source code for this example as the input file. A script similar to this counts the word occurrences in a file:

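import pyspark
# Reuse the SparkContext if the notebook already has one
if 'sc' not in globals():
    sc = pyspark.SparkContext()
# Load the file as an RDD of lines
text_file = sc.textFile("Spark File Words.ipynb")
# Split lines into words, pair each word with 1, then sum per word
counts = text_file.flatMap(lambda line: line.split(" ")) \
             .map(lambda word: (word, 1)) \
             .reduceByKey(lambda a, b: a + b)
# Collect the (word, count) pairs and print them
for x in counts.collect():
    print(x)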

The script starts with the same preamble as before and then loads the text file into memory.

Note: text_file is a Spark RDD (Resilient Distributed Dataset), not a data frame. It is assumed to be massive, with its contents distributed over many handlers.
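As a rough illustration of what "distributed" means here, the following plain-Python sketch deals a list of lines into partitions, counts the words in each partition independently, and then merges the partial counts. It does not use Spark. The function names (partition, count_partition) and the sample lines are hypothetical, chosen only for this illustration.

```python
from collections import Counter

def partition(lines, num_partitions):
    """Deal lines round-robin into num_partitions lists (a stand-in for Spark's partitioning)."""
    parts = [[] for _ in range(num_partitions)]
    for i, line in enumerate(lines):
        parts[i % num_partitions].append(line)
    return parts

def count_partition(lines):
    """Count the words within a single partition, as one worker might."""
    counts = Counter()
    for line in lines:
        counts.update(line.split(" "))
    return counts

# Hypothetical sample data standing in for the lines of a large file
lines = ["a b a", "b c", "a c c"]

# Each partition is counted on its own, with no knowledge of the others
partials = [count_partition(p) for p in partition(lines, 2)]

# Partial counts can be merged in any order because addition is
# associative and commutative
total = sum(partials, Counter())
```

The merge step only works because the combining function is addition, so the partial results can be joined in any order. The same property is what lets a function such as lambda a, b: a + b be applied across separate pieces of the data.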

Once the file is loaded, we split each line into words and then use a lambda function to tick off each occurrence of a word. The code is truly creating a new record...
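To see what each stage of the pipeline produces without a Spark installation, here is a minimal sketch that mimics the three steps (flatMap, map, reduceByKey) in plain Python. The helper reduce_by_key and the sample lines are hypothetical stand-ins for illustration, not part of PySpark.

```python
from itertools import chain

# Hypothetical stand-in for the lines of the input file
lines = ["to be or", "not to be"]

# flatMap: split every line and flatten the results into one list of words
words = list(chain.from_iterable(line.split(" ") for line in lines))

# map: pair each word with a count of 1
pairs = [(word, 1) for word in words]

def reduce_by_key(func, pairs):
    """Combine the values of identical keys using func (illustrative helper)."""
    grouped = {}
    for key, value in pairs:
        grouped[key] = func(grouped[key], value) if key in grouped else value
    return grouped

# reduceByKey: sum the 1s recorded for each distinct word
counts = reduce_by_key(lambda a, b: a + b, pairs)

for item in counts.items():
    print(item)
```

Each printed item is a (word, count) tuple, the same shape the Spark script prints after collect(). In this sample, "to" and "be" each end up with a count of 2.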
