Learning Jupyter 5 - Second Edition

Overview of this book

The Jupyter Notebook allows you to create and share documents that contain live code, equations, visualizations, and explanatory text. The Jupyter Notebook system is extensively used in domains such as data cleaning and transformation, numerical simulation, statistical modeling, and machine learning. Learning Jupyter 5 will help you get to grips with interactive computing using real-world examples. The book starts with a detailed overview of the Jupyter Notebook system and its installation in different environments. Next, you will learn to integrate the Jupyter system with different programming languages such as R, Python, Java, JavaScript, and Julia, and explore various versions and packages that are compatible with the Notebook system. Moving ahead, you will master interactive widgets and namespaces and work with Jupyter in a multi-user mode. By the end of this book, you will have used Jupyter with a big dataset and be able to apply all the functionalities you’ve explored throughout the book. You will also have learned all about the Jupyter Notebook and be able to start performing data transformation, numerical simulation, and data visualization.

R machine learning


In this section, we will take the following approach to machine learning:

  • Partition the dataset into a training and testing set
  • Generate a model of the data
  • Test the efficiency of our model

 

Dataset

Machine learning starts from a dataset that we split into a training set and a testing set. We use the training data to build a model, and then evaluate that model against the testing set, which it has not seen before.
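The three steps can be sketched in a few lines of base R. This is only an illustrative sketch: the data is synthetic and made up for the example, and the 75/25 split, the linear model (lm), and the RMSE measure are assumptions chosen for simplicity, not necessarily what we will use for the housing data.

```r
set.seed(1)  # make the random numbers reproducible

# Synthetic data (made up for illustration): y is linear in x plus noise
n <- 300
toy <- data.frame(x = runif(n, 0, 10))
toy$y <- 3 + 2 * toy$x + rnorm(n, sd = 1)

# 1. Partition: 75% of the rows for training, the rest for testing
train_idx <- sample(seq_len(n), size = floor(0.75 * n))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# 2. Generate a model using the training rows only
fit <- lm(y ~ x, data = train)

# 3. Test the model on rows it has not seen
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$y - pred)^2))
rmse  # root mean squared error on the testing set
```

A low error on the testing set suggests that the model generalizes beyond the rows it was fitted on. An error that is much higher than on the training set suggests overfitting.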

For a dataset to be usable, we need at least a few hundred observations. I am using the housing data from http://uci.edu. Let's load the dataset by using the following command:

housing <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data") 
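By default, read.table assumes that there is no header row and names the columns V1, V2, and so on. The following sketch shows that behavior and how to replace the defaults. It uses two made-up rows in place of the real file so that it runs without a network connection, and it assigns only three names as an illustration.

```r
# Two made-up, whitespace-separated rows standing in for the real file
sample_text <- "0.1 18.0 2.3
0.2  0.0 7.1"

toy <- read.table(text = sample_text)  # header = FALSE is the default
names(toy)  # "V1" "V2" "V3"
nrow(toy)   # 2

# Replace the default names with descriptive ones (illustrative subset)
colnames(toy) <- c("CRIM", "ZN", "INDUS")
str(toy)
```

The same calls (names, nrow, colnames) can be applied to the housing data frame once it has been loaded.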

The site documents the names of the variables as follows:

Variables   Description
CRIM        Per capita crime rate
ZN          Residential zone rate percentage
INDUS       Proportion of non-retail business in town
CHAS        Proximity to Charles River (Boolean)
NOX         Nitric oxide concentration

RM...