Distributed Data Systems with Azure Databricks

Example on Structured Streaming

In this example, we will apply what we have learned about Structured Streaming in the previous sections. We will simulate an incoming stream by using one of the example datasets, which consists of small JSON files; in a real scenario, files like these could be the incoming data that we want to process. We will use these files to compute metrics such as counts and windowed counts on a stream of timestamped actions. Let's take a look at the contents of the structured-streaming example dataset, as follows:

%fs ls /databricks-datasets/structured-streaming/events/

You will find that there are about 50 JSON files in the directory. You can see some of these in the following screenshot:

Figure 6.3 – The structured-streaming dataset's JSON files

We can see what one of these JSON files contains by using the %fs head command, as follows:

%fs head /databricks-datasets...
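To make the two metrics named at the start of this example concrete, here is a minimal sketch in plain Python of what a count and a windowed count mean for timestamped actions. It does not use the Structured Streaming API; it only illustrates the grouping logic on a handful of in-memory records. The field names `time` (epoch seconds) and `action`, the sample values, and the one-hour window are all assumptions made for illustration and are not taken from the dataset.

```python
from collections import Counter


def window_start(epoch_seconds, window_seconds):
    """Return the start (in epoch seconds) of the tumbling window that contains the event."""
    return epoch_seconds - (epoch_seconds % window_seconds)


def windowed_counts(events, window_seconds=3600):
    """Count events per (window start, action) pair using fixed, non-overlapping windows."""
    counts = Counter()
    for event in events:
        start = window_start(event["time"], window_seconds)
        counts[(start, event["action"])] += 1
    return counts


# Hypothetical events; the "time" and "action" field names are assumptions.
events = [
    {"time": 0, "action": "Open"},
    {"time": 1800, "action": "Open"},
    {"time": 3599, "action": "Close"},
    {"time": 3600, "action": "Open"},
    {"time": 7300, "action": "Close"},
]

# Plain count: how many times each action occurs overall.
total_counts = Counter(event["action"] for event in events)

# Windowed count: how many times each action occurs within each one-hour window.
hourly_counts = windowed_counts(events, window_seconds=3600)

for (start, action), n in sorted(hourly_counts.items()):
    print(start, action, n)
```

A plain count collapses the whole stream into one number per action, whereas a windowed count keeps one number per action per time bucket, which is what lets us see how activity changes over time. In a streaming engine the same grouping is applied incrementally as new files arrive, rather than over a complete list as in this sketch.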