Python Real-World Projects

By: Steven F. Lott

Overview of this book

In today's competitive job market, a project portfolio often outshines a traditional resume. Python Real-World Projects empowers you to get to grips with crucial Python concepts while building complete modules and applications. With two dozen meticulously designed projects to explore, this book will help you showcase your Python mastery and refine your skills. Tailored for beginners with a foundational understanding of class definitions, module creation, and Python's inherent data structures, this book is your gateway to programming excellence. You’ll learn how to harness the potential of the standard library and key external projects like JupyterLab, Pydantic, pytest, and requests. You’ll also gain experience with enterprise-oriented methodologies, including unit and acceptance testing, and an agile development approach. Additionally, you’ll dive into the software development lifecycle, starting with a minimum viable product and seamlessly expanding it to add innovative features. By the end of this book, you’ll be armed with a myriad of practical Python projects and all set to accelerate your career as a Python programmer.

9.1 Description

We need to build an application that validates, cleans, and standardizes data. A data inspection notebook is a handy starting point for this design work. The goal is a fully automated application that reflects the lessons learned from inspecting the data.

A data preparation pipeline has the following conceptual tasks:

  • Validate the acquired source text to be sure it’s usable and to mark invalid data for remediation.

  • Clean any invalid raw data where necessary; this expands the available data in those cases where sensible cleaning can be defined.

  • Convert the validated and cleaned source data from text (or bytes) to usable Python objects.

  • Where necessary, standardize the codes or ranges of source data. The requirements here vary with the problem domain.
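
The four tasks above can be sketched as a small chain of functions. This is only an illustrative outline, not the design the chapter goes on to build. The field names (x, y, series), the Sample class, the cleaning rule (strip whitespace, drop thousands separators), and the SERIES_CODES mapping are all assumptions made for the example.

```python
from dataclasses import dataclass, replace

# Hypothetical record layout: three text fields named x, y, and series.
NUMERIC_FIELDS = ("x", "y")
REQUIRED_FIELDS = ("x", "y", "series")

# Hypothetical standardization table for the series code.
SERIES_CODES = {"i": "I", "1": "I", "ii": "II", "2": "II"}


@dataclass(frozen=True)
class Sample:
    x: float
    y: float
    series: str


def validate(row: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the row is usable."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if name not in row]
    for name in NUMERIC_FIELDS:
        if name in row:
            try:
                float(row[name])
            except ValueError:
                problems.append(f"{name} is not a number: {row[name]!r}")
    return problems


def clean(row: dict[str, str]) -> dict[str, str]:
    """Assumed cleaning rule: strip whitespace, drop thousands separators."""
    cleaned = {name: value.strip() for name, value in row.items()}
    for name in NUMERIC_FIELDS:
        if name in cleaned:
            cleaned[name] = cleaned[name].replace(",", "")
    return cleaned


def convert(row: dict[str, str]) -> Sample:
    """Turn validated text into a Python object."""
    return Sample(x=float(row["x"]), y=float(row["y"]), series=row["series"])


def standardize(sample: Sample) -> Sample:
    """Map variant series codes onto one spelling; unknown codes pass through."""
    code = SERIES_CODES.get(sample.series.lower(), sample.series)
    return replace(sample, series=code)


def prepare(rows):
    """Run all four tasks; return (usable samples, rejected rows with reasons)."""
    good, rejected = [], []
    for row in rows:
        if validate(row):
            # Only rows that fail validation are cleaned, then checked again.
            row = clean(row)
        problems = validate(row)
        if problems:
            rejected.append((row, problems))
        else:
            good.append(standardize(convert(row)))
    return good, rejected
```

Rows that still fail validation after cleaning are returned with the reasons attached rather than silently dropped, so they stay available for remediation.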

The goal is to create clean, standardized data for subsequent analysis. Surprises occur all the time. There are several sources:

  • Technical problems with file formats of the upstream software. The intent of the acquisition...