
Learning Jupyter

By: Dan Toomey

Overview of this book

Jupyter Notebook is a web-based environment that enables interactive computing in notebook documents. It allows you to create and share documents that contain live code, equations, visualizations, and explanatory text. The Jupyter Notebook system is extensively used in domains such as data cleaning and transformation, numerical simulation, statistical modeling, machine learning, and much more. This book starts with a detailed overview of the Jupyter Notebook system and its installation in different environments. Next, you will learn to integrate the Jupyter system with different programming languages such as R, Python, JavaScript, and Julia, and explore the various versions and packages that are compatible with the Notebook system. Moving ahead, you will master interactive widgets, namespaces, and working with Jupyter in multiuser mode. Towards the end, you will use Jupyter with a big dataset and apply all the functionalities learned throughout the book.

Python data access in Jupyter


Now that we have seen how Python works in Jupyter, including the underlying encoding, how does Python access a large dataset in Jupyter?

I started another notebook for the pandas example, using Python Data Access as its name. From here, we will read in a large dataset and compute some standard statistics on the data. We are interested in seeing how we use pandas in Jupyter, how well the script performs, and what information is stored in the notebook's metadata (especially if it is a larger dataset).

Our script accesses the iris dataset that is built into one of the Python packages. All we are looking to do is read in a reasonably large number of items and run some basic operations on the dataset. We are really interested in seeing how much of the data ends up cached in the .ipynb file.

The Python code is as follows:

# import the datasets package
from sklearn import datasets
# pull in the iris data
iris_dataset = datasets.load_iris()
# grab the first two columns of data
X = iris_dataset...
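
The listing above is cut off after X = iris_dataset..., so here is a minimal sketch of how the rest of the script might look, assuming the intent matches the comments (keep the first two columns of the iris data) and that the standard statistics are computed with pandas. The DataFrame column names and the describe() call are illustrative assumptions, not the book's exact code:

# import the packages we need
from sklearn import datasets
import pandas as pd

# pull in the iris data
iris_dataset = datasets.load_iris()

# grab the first two columns of data (the sepal measurements)
X = iris_dataset.data[:, :2]

# wrap the columns in a pandas DataFrame so we can use pandas statistics
df = pd.DataFrame(X, columns=iris_dataset.feature_names[:2])

# compute some standard statistics on the dataset
print(df.describe())   # count, mean, std, min, quartiles, max

When this cell runs, only the printed summary table is stored as cell output in the notebook, not the full dataset, which is exactly what we want to confirm. One way to check what actually lands in the saved file (assuming the notebook was saved as Python Data Access.ipynb; the filename here is an assumption) is to inspect the notebook JSON directly:

import json, os

# the saved notebook is just JSON; check its size on disk
print(os.path.getsize('Python Data Access.ipynb'))

# look at the cached outputs of each code cell
with open('Python Data Access.ipynb') as f:
    nb = json.load(f)
print([cell.get('outputs', []) for cell in nb['cells'] if cell['cell_type'] == 'code'])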