So far, we have worked with data sources that have built-in support in Spark. However, Spark Streaming can receive data from any arbitrary source, provided we implement a receiver for that custom data source.
In this section, we will define a custom source for the public APIs available from the Transport for London (TfL) site. This site makes a unified API available for each mode of transportation in London. These APIs provide access to real-time data, for instance, rail arrivals. The output is available in XML and JSON formats. We will use the API for current arrival predictions of Underground trains on a specific line.
Note
The reference site for TfL is https://tfl.gov.uk; register on this site to generate an application key for accessing the APIs.
We will start by extending the abstract class Receiver and implementing the onStart() and onStop() methods. In the onStart() method, we start the threads responsible for receiving...
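As a rough illustration of this pattern, here is a minimal sketch of a receiver that polls an HTTP endpoint on a background thread. The class name PollingHttpReceiver, its url and intervalMs parameters, and the choice to store each response body as one String record are all assumptions made for this example; only Receiver, onStart(), onStop(), store(), isStopped(), and restart() come from Spark's API. The TfL URL and application key are deliberately not hard-coded and would have to be supplied by the caller.

```scala
import java.io.{BufferedReader, InputStreamReader}
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Hypothetical receiver (not from the original text): polls `url` every
// `intervalMs` milliseconds and stores each response body as one record.
class PollingHttpReceiver(url: String, intervalMs: Long = 5000L)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  // onStart() must return quickly, so the polling loop runs on its own thread.
  override def onStart(): Unit = {
    val worker = new Thread("polling-http-receiver") {
      override def run(): Unit = poll()
    }
    worker.setDaemon(true)
    worker.start()
  }

  // Nothing to release here: the polling thread checks isStopped()
  // on every iteration and exits by itself.
  override def onStop(): Unit = ()

  private def poll(): Unit = {
    try {
      while (!isStopped()) {
        store(fetch(url))          // hand the record to Spark
        Thread.sleep(intervalMs)
      }
    } catch {
      case e: Exception =>
        // Ask Spark to restart this receiver after a failure.
        restart(s"Error polling $url", e)
    }
  }

  // Reads the whole response body as a UTF-8 string.
  private def fetch(address: String): String = {
    val conn = new URL(address).openConnection().asInstanceOf[HttpURLConnection]
    conn.setConnectTimeout(5000)
    conn.setReadTimeout(5000)
    val reader = new BufferedReader(
      new InputStreamReader(conn.getInputStream, StandardCharsets.UTF_8))
    try Iterator.continually(reader.readLine()).takeWhile(_ != null).mkString("\n")
    finally {
      reader.close()
      conn.disconnect()
    }
  }
}
```

Such a receiver would typically be attached to a StreamingContext with ssc.receiverStream(new PollingHttpReceiver(apiUrl)), where apiUrl is a placeholder for the TfL arrivals URL including your application key. Storing the raw response and parsing it downstream keeps the receiver simple, at the cost of shipping unparsed text through the stream.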