Building ETL Pipelines with Python
We will use open source data for the CSV, Parquet, and API examples, and manually prepare data for the relational database (RDBMS) and HTML examples. All of it is based on public safety data from NYC Open Data (available at https://data.cityofnewyork.us).
Within your PyCharm terminal, verify that your pipenv virtual environment has been activated and open the Jupyter notebook associated with Chapter 4. In the first cell, import the pandas module into your notebook, like so:
# Import modules
import pandas as pd
Not surprisingly, stored data files are commonly used as an input data source for an extract, transform, load (ETL) pipeline. Data files can be sourced from anywhere, from locally stored files on your device to cloud storage filesystems. Even when you work primarily with databases or external APIs, saving timestamped copies of the data to physical files is a simple way to keep a usable snapshot on hand, which helps during temporary connection issues.
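As a rough illustration of the timestamped-file idea, the following sketch writes a DataFrame to a CSV file whose name carries a UTC timestamp and then reads the most recent snapshot back. The helper names (save_snapshot, load_latest_snapshot), the "snapshot" prefix, and the sample rows are assumptions made for this example; they are not taken from the chapter's notebook or from NYC Open Data.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path

import pandas as pd


def save_snapshot(df, directory, prefix="snapshot"):
    """Write df to a CSV file named with a UTC timestamp (hypothetical helper)."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = Path(directory) / f"{prefix}_{stamp}.csv"
    df.to_csv(path, index=False)
    return path


def load_latest_snapshot(directory, prefix="snapshot"):
    """Read the newest snapshot; the timestamp format sorts chronologically."""
    candidates = sorted(Path(directory).glob(f"{prefix}_*.csv"))
    if not candidates:
        raise FileNotFoundError(f"No {prefix}_*.csv files in {directory}")
    return pd.read_csv(candidates[-1])


# Made-up sample rows standing in for data pulled from a database or API
sample = pd.DataFrame(
    {
        "incident_id": [1, 2, 3],
        "borough": ["BRONX", "QUEENS", "BROOKLYN"],
    }
)

# A temporary directory keeps the example self-contained
with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_snapshot(sample, tmp)
    restored = load_latest_snapshot(tmp)

print(saved_path.name)
print(restored)
```

In a pipeline, a step like load_latest_snapshot could serve as a fallback when the primary source is unreachable, at the cost of working with slightly older data.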