Learn Python by Building Data Science Applications

By: Philipp Kats, David Katz


Overview of this book

Python is the most widely used programming language for building data science applications. Complete with step-by-step instructions, this book contains easy-to-follow tutorials to help you learn Python and develop real-world data science projects. The “secret sauce” of the book is its curated list of topics and solutions, put together using a range of real-world projects, covering initial data collection, data analysis, and production. This Python book starts by taking you through the basics of programming, right from variables and data types to classes and functions. You’ll learn how to write idiomatic code and test and debug it, and discover how you can create packages or use the range of built-in ones. You’ll also be introduced to the extensive ecosystem of Python data science packages, including NumPy, Pandas, scikit-learn, Altair, and Datashader. Furthermore, you’ll be able to perform data analysis, train models, and interpret and communicate the results. Finally, you’ll get to grips with structuring and scheduling scripts using Luigi and sharing your machine learning models with the world as a microservice. By the end of the book, you’ll have learned not only how to implement Python in data science projects, but also how to maintain and design them to meet high programming standards.


Optimizing the hyperparameters

There are probably many other features we could add, but let's now shift our attention to the model itself. So far, we have used the model's default, static parameters, restricting only its max_depth parameter to an arbitrary number. Now, let's try to fine-tune those parameters. Done properly, this process can add a few percentage points to the model's accuracy, and sometimes even a small gain in performance metrics can be a game-changer.

To do this, we'll use RandomizedSearchCV, another wrapper around cross-validation, but this time one that iterates over the model's parameters, trying to find the best-performing values. A simpler approach, GridSearchCV, takes a finite set of candidate values for each parameter, builds every combination of them, and evaluates each one in turn: essentially a brute-force search.
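The difference between the two searches can be sketched in a few lines. This is only an illustration under stated assumptions, not the book's own code: the text mentions max_depth but does not name the estimator here, so a DecisionTreeClassifier stands in for the model, the data is synthetic (generated with make_classification), and min_samples_leaf is a second parameter added purely for demonstration.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in data, not the dataset used in the book.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Assumption: a decision tree stands in for the chapter's model.
model = DecisionTreeClassifier(random_state=0)

# Grid search: every combination of the listed values is evaluated.
param_grid = {
    "max_depth": [2, 4, 6, 8],
    "min_samples_leaf": [1, 5, 10],
}
grid_search = GridSearchCV(model, param_grid, cv=3)
grid_search.fit(X, y)

# Randomized search: n_iter candidates are sampled from the distributions.
param_distributions = {
    "max_depth": randint(2, 12),         # integers from 2 to 11
    "min_samples_leaf": randint(1, 20),  # integers from 1 to 19
}
random_search = RandomizedSearchCV(
    model, param_distributions, n_iter=5, cv=3, random_state=0
)
random_search.fit(X, y)

# Number of candidate settings each search actually tried.
n_grid_candidates = len(grid_search.cv_results_["params"])      # 4 * 3 combinations
n_random_candidates = len(random_search.cv_results_["params"])  # equals n_iter

print("grid:", n_grid_candidates, grid_search.best_params_)
print("random:", n_random_candidates, random_search.best_params_)
```

The grid search cost grows multiplicatively with every parameter added, whereas the randomized search cost is fixed by n_iter regardless of how wide the distributions are.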

Randomized...
