Python Data Cleaning Cookbook - Second Edition
There may be times when we want to persist data without overwriting a prior version of the data file. One way to do this is to append a timestamp or a unique identifier to the filename. However, more elegant solutions are available. One such solution is the Delta Lake library, which we will explore in this recipe.
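For comparison, the filename approach can be sketched with the standard library alone. The helper names, the `stem_YYYYMMDDTHHMMSS` naming pattern, and the `temps` stem are illustrative assumptions, not part of the recipe:

```python
import uuid
from datetime import datetime


def timestamped_filename(stem: str, suffix: str, when: datetime) -> str:
    # Hypothetical helper: encode the save time in the name, e.g. temps_20240101T093000.csv
    return f"{stem}_{when:%Y%m%dT%H%M%S}{suffix}"


def uuid_filename(stem: str, suffix: str) -> str:
    # Hypothetical helper: a random identifier avoids collisions when two saves
    # happen within the same second
    return f"{stem}_{uuid.uuid4().hex}{suffix}"


print(timestamped_filename("temps", ".csv", datetime(2024, 1, 1, 9, 30)))
# temps_20240101T093000.csv
```

Each save then lands in a new file, so nothing is overwritten, but keeping track of which file is current, and cleaning up old ones, is left entirely to us.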
We will work with the land temperature data again in this recipe. We will load the data, save it to a data lake, and then save an altered version to the same data lake.
We will be using the Delta Lake library in this recipe, which can be installed with pip install deltalake. We will also need the os library so that we can make a directory for the data lake.
You can get started with the data and version it as follows. First, import the libraries we need; we use os to create a directory, temps_lake, for our data versions:
```python
import pandas as pd
from deltalake.writer import write_deltalake
...
```