Learn Python by Building Data Science Applications

By: Philipp Kats, David Katz

Overview of this book

Python is the most widely used programming language for building data science applications. Complete with step-by-step instructions, this book contains easy-to-follow tutorials to help you learn Python and develop real-world data science projects. The “secret sauce” of the book is its curated list of topics and solutions, put together using a range of real-world projects, covering initial data collection, data analysis, and production. This Python book starts by taking you through the basics of programming, right from variables and data types to classes and functions. You’ll learn how to write idiomatic code and test and debug it, and discover how you can create packages or use the range of built-in ones. You’ll also be introduced to the extensive ecosystem of Python data science packages, including NumPy, Pandas, scikit-learn, Altair, and Datashader. Furthermore, you’ll be able to perform data analysis, train models, and interpret and communicate the results. Finally, you’ll get to grips with structuring and scheduling scripts using Luigi and sharing your machine learning models with the world as a microservice. By the end of the book, you’ll have learned not only how to implement Python in data science projects, but also how to maintain and design them to meet high programming standards.

Understanding the basics of ML

As its name implies, machine learning (ML) is the science of building machines (algorithms) that can learn from data. In other words, this class of algorithms generates outcomes (predictions) based on relations inferred from the training data rather than on hardcoded, predetermined rules. ML is usually described as having two main branches: supervised and unsupervised learning.
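To make the contrast between hardcoded rules and learned relations concrete, here is a minimal sketch in plain Python. The temperature readings, the labels, and the function names `rule_based` and `fit_threshold` are all hypothetical and chosen only for illustration. The "model" is deliberately trivial (a single learned threshold), not a method the book itself uses.

```python
def rule_based(temp_c):
    """Hardcoded rule: the programmer picks the threshold up front."""
    return temp_c >= 25.0


def fit_threshold(values, labels):
    """Learn a threshold from labeled examples.

    Tries the midpoint between each pair of neighboring values and keeps
    the one that misclassifies the fewest training examples.
    """
    candidates = sorted(set(values))
    midpoints = [(a + b) / 2 for a, b in zip(candidates, candidates[1:])]
    best_threshold, best_errors = None, len(values) + 1
    for t in midpoints:
        errors = sum((v >= t) != y for v, y in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


# Hypothetical training data: temperatures and whether people called them "hot"
temps = [12.0, 15.5, 18.0, 21.0, 23.5, 26.0, 29.0, 31.5]
is_hot = [False, False, False, False, True, True, True, True]

threshold = fit_threshold(temps, is_hot)


def learned(temp_c):
    """Prediction based on the threshold inferred from the data."""
    return temp_c >= threshold


print(threshold)          # 22.25
print(rule_based(24.0))   # False: the fixed rule disagrees with the data
print(learned(24.0))      # True: the learned rule follows the data
```

If the labels change, rerunning `fit_threshold` yields a different threshold with no code changes, which is the essential difference from the fixed rule.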

Unsupervised models attempt to find structure in the data itself, without any supervision or target to focus on. A typical task is to find clusters of similar records (for example, users) in order to understand the underlying latent logic (for example, the target audiences and their corresponding use cases for the service).
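As an illustration of clustering similar records, the following sketch implements a bare-bones k-means in plain Python. The user records (sessions per week, minutes per session) are invented, and the function `k_means` is a hypothetical helper. It starts from the first `k` points instead of using random initialization, a simplification that keeps the result reproducible. In practice you would more likely reach for a library implementation such as scikit-learn's `KMeans`.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def k_means(points, k, n_iter=10):
    """Group points into k clusters; no target labels are supplied."""
    # Simplification: use the first k points as the initial centroids.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return labels, centroids


# Hypothetical users: (sessions per week, minutes per session)
users = [
    (1.0, 20.0), (9.0, 3.0), (1.5, 22.0),
    (0.5, 18.0), (8.0, 2.0), (10.0, 4.0),
]

labels, centroids = k_means(users, k=2)
print(labels)     # [0, 1, 0, 0, 1, 1]
print(centroids)  # [(1.0, 20.0), (9.0, 3.0)]
```

The algorithm only returns the two groups. Reading them as, say, "rare but long sessions" versus "frequent but short sessions" is the interpretive step left to the analyst.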

Supervised learning is all about training the model by feeding it pairs of independent features and the correct values of...