Jupyter for Data Science

By: Dan Toomey

Overview of this book

Jupyter Notebook is a web-based environment for interactive computing in notebook documents. It lets you create documents that contain live code, equations, and visualizations. This book is a comprehensive guide to getting started with data science using the popular Jupyter Notebook. If you are familiar with Jupyter Notebook and want to learn how to use its capabilities to perform data science tasks, this is the book for you. From data exploration to visualization, the book walks you through each step of implementing an effective data science pipeline using Jupyter. You will also see how to use Jupyter's features to share your documents and code with your colleagues. The book also explains how Python 3, R, and Julia can be integrated with Jupyter for data science tasks. By the end of this book, you will be able to use Jupyter comfortably for a range of data science tasks.
Table of Contents (17 chapters)
Title Page
Credits
About the Author
About the Reviewers
www.PacktPub.com
Customer Feedback
Preface

Special note for Windows installation


Spark (really Hadoop) needs a temporary storage location for its working set of data. Under Windows, this defaults to the \tmp\hive directory. If the directory does not exist when Spark/Hadoop starts, it is created automatically. Unfortunately, under Windows, the installation does not have the correct tools built in to set the access privileges on that directory.

You should be able to run chmod under winutils to set the access privileges for the hive directory. However, I have found that the chmod function does not work correctly.

A more reliable approach is to create the \tmp\hive directory yourself in admin mode, and then grant all users full privileges on the hive directory, again in admin mode.
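As a rough illustration of that workaround, the following Python sketch creates the directory and then grants full control to all users. It is only one possible way to do this, not a procedure from the book: the helper names `build_grant_command` and `prepare_hive_dir` are hypothetical, the use of the Windows `icacls` tool is my assumption about how to grant the privileges, and the `Everyone` group name assumes an English-language Windows installation. The script would need to be run from an elevated (administrator) prompt.

```python
import os
import subprocess
from pathlib import Path


def build_grant_command(hive_dir):
    """Build the icacls argument list that grants Everyone full control.

    (OI)(CI) makes the grant inherit to files and subfolders, F is full
    control, and /T applies it to anything already inside the directory.
    Assumption: the group is named "Everyone" (English-language Windows).
    """
    return ["icacls", str(hive_dir), "/grant", "Everyone:(OI)(CI)F", "/T"]


def prepare_hive_dir(hive_dir, run=subprocess.run):
    """Create hive_dir if needed and, on Windows only, open up its ACL.

    `run` is injectable so the function can be exercised without
    actually invoking icacls.
    """
    hive_dir = Path(hive_dir)
    hive_dir.mkdir(parents=True, exist_ok=True)
    if os.name == "nt":
        # Requires an elevated prompt; raises CalledProcessError on failure.
        run(build_grant_command(hive_dir), check=True)
    return hive_dir


# Example call (hypothetical location on the C: drive):
# prepare_hive_dir(r"C:\tmp\hive")
```

On non-Windows systems the function only creates the directory, since the access-privilege problem described here is specific to Windows.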

Without this change, Hadoop fails right away. When you start pyspark, the output (including any errors) is displayed in the command line window. One of the errors will report insufficient access to this directory.
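Before starting pyspark, it may help to confirm that the directory is actually writable. The sketch below is a simple check of my own, not something the book prescribes: the helper name `is_writable_dir` is hypothetical, and it tests access by trying to create a scratch file, because a plain permission query can be misleading on Windows.

```python
import tempfile
from pathlib import Path


def is_writable_dir(path):
    """Return True if a scratch file can be created inside `path`."""
    path = Path(path)
    if not path.is_dir():
        return False
    try:
        # The scratch file is deleted automatically when the block exits.
        with tempfile.TemporaryFile(dir=path):
            pass
    except OSError:
        # Covers PermissionError and other filesystem-level failures.
        return False
    return True


# Example call (hypothetical location on the C: drive):
# print(is_writable_dir(r"C:\tmp\hive"))
```

Note that this only checks the account running the script; Spark may run under a different account, so a passing check is a hint rather than a guarantee.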