Learn Python by Building Data Science Applications

By: Philipp Kats, David Katz

Overview of this book

Python is the most widely used programming language for building data science applications. Complete with step-by-step instructions, this book contains easy-to-follow tutorials to help you learn Python and develop real-world data science projects. The “secret sauce” of the book is its curated list of topics and solutions, put together using a range of real-world projects, covering initial data collection, data analysis, and production. This Python book starts by taking you through the basics of programming, right from variables and data types to classes and functions. You’ll learn how to write idiomatic code, test and debug it, and discover how you can create packages or use the range of built-in ones. You’ll also be introduced to the extensive ecosystem of Python data science packages, including NumPy, Pandas, scikit-learn, Altair, and Datashader. Furthermore, you’ll be able to perform data analysis, train models, and interpret and communicate the results. Finally, you’ll get to grips with structuring and scheduling scripts using Luigi and sharing your machine learning models with the world as a microservice. By the end of the book, you’ll have learned not only how to use Python in data science projects, but also how to design and maintain them to meet high programming standards.
Table of Contents (26 chapters)

  • Section 1: Getting Started with Python
  • Section 2: Hands-On with Data
  • Section 3: Moving to Production

To get the most out of this book

This book is designed for complete beginners and people who have just started to learn to code. It does not require any specific knowledge besides basic computer literacy.

The execution of the code examples provided in this book requires an installation of Python 3.7.3 or later on macOS, Linux, or Microsoft Windows. The code presented throughout the book makes use of many Python libraries. In each chapter, a list of required libraries is given at the beginning. A full list of libraries is stored in the GitHub repository, in the environment.yaml file. The same file can be used to install Python and all of the required libraries in bulk—full instructions are given in Chapter 1, Preparing the Workspace.
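Before working through the chapters, it can help to confirm that your interpreter meets the 3.7.3 minimum and that the libraries a chapter lists are importable. The following is a minimal sketch using only the standard library. The helper names (`check_python`, `missing_packages`) and the sample package list are hypothetical, so substitute the libraries listed at the start of the chapter you are working on. Note that a library's import name can differ from its package name (scikit-learn, for example, is imported as `sklearn`).

```python
import importlib.util
import sys

# Minimum interpreter version stated in the book's requirements.
MIN_VERSION = (3, 7, 3)

# Hypothetical example list: use the *import* names of the libraries
# listed at the start of each chapter (scikit-learn imports as sklearn).
REQUIRED = ["numpy", "pandas", "sklearn", "altair"]


def check_python(minimum=MIN_VERSION):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:3] >= minimum


def missing_packages(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version OK:", check_python())
    missing = missing_packages(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy libraries.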

The code for this book was developed in, and makes extensive use of, two development environments: the VS Code editor with its Python bundle, and Jupyter. We recommend using both so that your setup matches the book's narrative.

The code for Chapter 6, First Script – Geocoding with Web APIs, Chapter 7, Scraping Data from the Web with Beautiful Soup 4, Chapter 11, Data Cleaning and Manipulation, and Chapter 16, Data Pipelines with Luigi, requires an internet connection.

The first chapter will provide you with step-by-step instructions and some useful tips for setting up your Python environment, the core libraries, and all the necessary tools.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

  1. Log in or register at www.packt.com.
  2. Select the SUPPORT tab.
  3. Click on Code Downloads & Errata.
  4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR/7-Zip for Windows
  • Zipeg/iZip/UnRarX for Mac
  • 7-Zip/PeaZip for Linux
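As an alternative to the tools listed above (not a step from the book), Python's standard-library `zipfile` module can also extract the archive. This is a minimal sketch; the helper `extract_archive` and the file names are hypothetical placeholders for the archive you actually downloaded.

```python
import zipfile
from pathlib import Path


def extract_archive(archive, destination):
    """Extract every member of a ZIP archive into `destination`.

    Returns the list of member names so you can see what was unpacked.
    """
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(destination)
        return zf.namelist()


if __name__ == "__main__":
    # Hypothetical file names: replace them with your downloaded archive.
    names = extract_archive("code-bundle.zip", "code-bundle")
    print(f"Extracted {len(names)} entries")
```

`extractall` recreates the archive's folder structure under the destination directory.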

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Learn-Python-by-Building-Data-Science-Applications. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

Code in Action

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "As you can see, pi is a float, name is a string, age is an integer, and sky_is_blue is a Boolean."

A block of code is set as follows:

import pandas as pd

for word in 'Hello World!'.split():
    print(word)

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

pi = 3.14159265359  # Decimal
name = 'Philipp'  # Text
age = 31  # Integer
sky_is_blue = True  # Boolean
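To confirm the types described under CodeInText (a float, a string, an integer, and a Boolean), you can pass each of these variables to the built-in `type()` function. A short sketch reusing the same four variables:

```python
pi = 3.14159265359  # Decimal
name = 'Philipp'  # Text
age = 31  # Integer
sky_is_blue = True  # Boolean

variables = [("pi", pi), ("name", name), ("age", age), ("sky_is_blue", sky_is_blue)]

# type() returns the class of each value; __name__ gives a readable label.
for label, value in variables:
    print(f"{label}: {type(value).__name__}")
# pi: float
# name: str
# age: int
# sky_is_blue: bool
```

Note that `bool` is a subclass of `int`, so `isinstance(sky_is_blue, int)` is also `True`; `type()` reports the exact class.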

Code is often shown as a printout of an interactive console session, with input and output mixed together. In this case, every input line starts with a triple "greater than" sign (>>>), and lines without that prefix are output:

>>> import pandas as pd
>>> for word in 'Hello World!'.split():
>>>     print(word)

Hello
World

Any command-line input or output is written as follows:

> conda install <mypackage>
> conda install -c <mychannel> <mypackage>

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Just use the Clone or download button on the right-hand side (1), and select Download ZIP (2)."

Warnings or important notes appear like this.
Tips and tricks appear like this.