In this section, we will use Spark DSL to build queries for structured data operations:
- In the following command, we run the same query as before, this time expressed in the Spark DSL, to show how the DSL differs from the SQL shown in the previous section while achieving the same goal:
df.select("duration").filter(df.duration>2000).filter(df.protocol=="tcp").show()
In this command, we first take the df object that we created in the previous section. We then select the duration column by calling the select function and passing in the column name, "duration".
- Next, in the preceding code snippet, we call the filter function twice, first with a condition on df.duration, and the second time with a condition on df.protocol. The first call keeps only the rows whose duration is larger than 2000, and...