Modern Data Architectures with Python
Now that the setup work is done, let’s jump right into organizing and running our workloads in Databricks. We will cover a variety of topics, the first of which is incrementally processing new data as it arrives in files.
Spark Streaming isn’t new, and many data platforms already use it. It does have rough edges, though, and Autoloader resolves them. Autoloader is an efficient way to have Databricks detect new files and process them. It works within Spark Structured Streaming, so once it is set up, usage differs little from any other streaming source.
To create a streaming DataFrame using Autoloader, use the cloudFiles format along with the needed options. In the following case, we set the schema, the delimiter, and the file format for a CSV load:
spark.readStream.format("cloudFiles") \
.option("cloudFiles...
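A minimal sketch of a complete CSV load of this kind, not the book's own listing, might look as follows. It assumes a Databricks runtime where the cloudFiles source is available. The helper names, the header setting, and the schema passed as a DDL string are illustrative assumptions; only the option keys (cloudFiles.format, delimiter, header) are real reader options.

```python
# Hypothetical helpers for illustration; only the option keys are real
# Autoloader / CSV reader options. Requires a Databricks runtime to run
# the stream itself, because "cloudFiles" is a Databricks-only source.


def autoloader_csv_options(delimiter=","):
    """Return the reader options for a CSV load through Autoloader."""
    return {
        "cloudFiles.format": "csv",  # file format Autoloader should parse
        "delimiter": delimiter,      # CSV field separator
        "header": "true",            # assumption: files carry a header row
    }


def read_csv_stream(spark, source_path, schema_ddl, delimiter=","):
    """Build a streaming DataFrame over files arriving under source_path.

    spark       -- an active SparkSession (predefined in Databricks notebooks)
    source_path -- directory that Autoloader should watch for new files
    schema_ddl  -- explicit schema as a DDL string, e.g. "id INT, amount DOUBLE"
    """
    return (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_csv_options(delimiter))
        .schema(schema_ddl)
        .load(source_path)
    )
```

In a Databricks notebook this could be called as read_csv_stream(spark, "/mnt/landing/sales/", "id INT, amount DOUBLE"), where the path and column names are placeholders. Supplying the schema explicitly avoids schema inference, which would otherwise need a cloudFiles.schemaLocation option.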