Book Image

Python Real-World Projects

By : Steven F. Lott
5 (1)
Book Image

Python Real-World Projects

5 (1)
By: Steven F. Lott

Overview of this book

In today's competitive job market, a project portfolio often outshines a traditional resume. Python Real-World Projects empowers you to get to grips with crucial Python concepts while building complete modules and applications. With two dozen meticulously designed projects to explore, this book will help you showcase your Python mastery and refine your skills. Tailored for beginners with a foundational understanding of class definitions, module creation, and Python's inherent data structures, this book is your gateway to programming excellence. You’ll learn how to harness the potential of the standard library and key external projects like JupyterLab, Pydantic, pytest, and requests. You’ll also gain experience with enterprise-oriented methodologies, including unit and acceptance testing, and an agile development approach. Additionally, you’ll dive into the software development lifecycle, starting with a minimum viable product and seamlessly expanding it to add innovative features. By the end of this book, you’ll be armed with a myriad of practical Python projects and all set to accelerate your career as a Python programmer.
Table of Contents (20 chapters)
19
Index

2.1 General data acquisition

All data analysis processing starts with the essential step of acquiring the data from a source.

The above statement seems almost silly, but failures in this effort often lead to complicated rework later. It’s essential to recognize that data exists in these two essential forms:

  • Python objects, usable in analytic programs. While the obvious candidates are numbers and strings, this includes using packages like Pillow to operate on images as Python objects. A package like librosa can create objects representing audio data.

  • A serialization of a Python object. There are many choices here:

    • Text. Some kind of string. There are numerous syntax variants, including CSV, JSON, TOML, YAML, HTML, XML, etc.

    • Pickled Python Objects. These are created by the pickle module.

    • Binary Formats. Tools like Protobuf can serialize native Python objects into a stream of bytes. Some YAML extensions, similarly, can serialize an object in a binary format that isn’t text. Images and audio samples are often stored in compressed binary formats.

The format for the source data is — almost universally — not fixed by any rules or conventions. Writing an application based on the assumption that source data is always a CSV-format file can lead to problems when a new format is required.

It’s best to treat all input formats as subject to change. The data — once acquired — can be saved in a common format used by the analysis pipeline, and independent of the source format (we’ll get to the persistence in Clean, validate, standardize, and persist).

We’ll start with Project 1.1: ”Acquire Data”. This will build the Data Acquisition Base Application. It will acquire CSV-format data and serve as the basis for adding formats in later projects.

There are a number of variants on how data is acquired. In the next few chapters, we’ll look at some alternative data extraction approaches.