Jupyter for Data Science

By: Dan Toomey

Overview of this book

Jupyter Notebook is a web-based environment that enables interactive computing in notebook documents. It allows you to create documents that contain live code, equations, and visualizations. This book is a comprehensive guide to getting started with data science using the popular Jupyter Notebook. If you are familiar with Jupyter Notebook and want to learn how to use its capabilities to perform various data science tasks, this is the book for you! From data exploration to visualization, this book takes you through every step of implementing an effective data science pipeline using Jupyter. You will also see how you can use Jupyter's features to share your documents and code with your colleagues. The book also explains how Python 3, R, and Julia can be integrated with Jupyter for various data science tasks. By the end of this book, you will be able to use Jupyter comfortably for a range of data science tasks.

Make a prediction using scikit-learn


scikit-learn is a machine learning toolkit built in Python. Part of the package covers supervised learning, where each sample data point comes with attributes and a known outcome (a class label, or a numeric value such as a price). We use an estimator that learns from these samples and then makes predictions for other data points with similar attributes. In scikit-learn, an estimator provides two functions, fit() and predict(): fit() trains the model on the sample data, and predict() uses the trained model to predict the outcome for other data points.
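As a minimal sketch of the fit() and predict() pattern, the snippet below trains a LinearRegression estimator on a handful of made-up points. The toy data (y = 2x + 1) and the variable names are assumptions chosen purely for illustration; they are not taken from the housing example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy samples invented for illustration: the target is exactly 2 * x + 1.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

estimator = LinearRegression()
estimator.fit(X, y)  # learn the relationship from the labelled samples

# Ask the trained estimator about a point it has not seen before.
predicted = estimator.predict(np.array([[5.0]]))
print(predicted)
```

Other scikit-learn estimators, including classifiers, follow the same two-call pattern, so swapping in a different model usually means changing only the line that constructs the estimator.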

As an example, we will use the housing data from https://uci.edu/ (I think this is data for the Boston area). The dataset contains a number of factors, including a price factor.

We will take the following steps:

  • We will break up the dataset into a training set and a test set
  • From the training set, we will produce a model
  • We will then use the model against the test set and evaluate how well our model fits the actual data for predicting housing prices
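The three steps above can be sketched as follows. This is an illustration only, not the book's own code. It assumes a synthetic dataset generated with make_regression in place of the housing data, an arbitrary choice of 13 features, a 75/25 split, and LinearRegression as the model.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data: 13 numeric factors and a
# price-like target. These values are generated, not real housing prices.
X, y = make_regression(n_samples=500, n_features=13, noise=10.0, random_state=0)

# Step 1: break the dataset into a training set and a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 2: produce a model from the training set.
model = LinearRegression()
model.fit(X_train, y_train)

# Step 3: run the model against the test set and compare its
# predictions with the actual target values.
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean squared error: {mse:.2f}")
print(f"R^2 score: {r2:.3f}")
```

Holding the test set back until the final step matters: scoring the model on the rows it was trained on would overstate how well it predicts prices for unseen houses.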

The...