With Spark Streaming, the parameters of a linear model can be updated online as new data arrives. Spark Streaming's linear regression solution works much like the streaming k-means solution.
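To make the idea of an online update concrete, here is a minimal sketch in plain Scala of a single mini-batch gradient step for a linear model under squared-error loss. It is an illustration only, not Spark's internal implementation; the object name `OnlineSgdSketch`, the `update` function, and its parameters are hypothetical names chosen for this example.

```scala
// Illustrative only: one online update step for a linear model without an intercept.
// This is NOT Spark's internal code; all names here are hypothetical.
object OnlineSgdSketch {
  // Applies one gradient step using the mean gradient of 0.5 * (w.x - y)^2
  // over the batch. Each batch element is a (label, features) pair.
  def update(weights: Array[Double],
             batch: Seq[(Double, Array[Double])],
             stepSize: Double): Array[Double] = {
    if (batch.isEmpty) {
      weights // nothing arrived in this batch, so the weights stay unchanged
    } else {
      val grad = Array.fill(weights.length)(0.0)
      for ((label, features) <- batch) {
        // Prediction is the dot product of the weights and the features.
        val prediction = weights.zip(features).map { case (w, x) => w * x }.sum
        val error = prediction - label
        for (j <- weights.indices) grad(j) += error * features(j)
      }
      // Move the weights against the averaged gradient.
      weights.indices.map(j => weights(j) - stepSize * grad(j) / batch.size).toArray
    }
  }
}
```

Calling `update` once per incoming batch, and feeding each result into the next call, mimics how a streaming model keeps refining the same weight vector instead of retraining from scratch.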
We will use the StreamingLinearRegressionWithSGD class, which is provided as part of Spark MLlib. To initialize a StreamingLinearRegressionWithSGD object, do the following:
- Instantiate the object with its constructor, new StreamingLinearRegressionWithSGD()
- Set the initial weights, supplying a vector with one entry per feature
The result is a model that can be trained in a streaming fashion and used to make predictions.
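The initialization described above can be sketched as follows in a Spark shell. This is a minimal sketch, not a complete recipe: the feature count `numFeatures` is a hypothetical value, and the stream names in the comments (`trainingStream`, `testStream`) are placeholders for DStreams you would create yourself.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD

// Hypothetical feature count for illustration; use the dimension of your own data.
val numFeatures = 3

// Create the streaming model and give it a zero weight vector,
// one entry per feature.
val model = new StreamingLinearRegressionWithSGD()
  .setInitialWeights(Vectors.zeros(numFeatures))

// With a StreamingContext and DStreams in place (not shown here), the model
// would then be wired up roughly like this (stream names are placeholders):
//   model.trainOn(trainingStream)   // DStream[LabeledPoint]
//   model.predictOnValues(testStream.map(lp => (lp.label, lp.features)))
```

Starting from zero weights is a common default; any vector of the right length would work as a starting point, and the model adjusts it as each batch arrives.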
Let's explore this solution in a Spark shell by going through the following steps:
- Start a Spark shell in your Terminal as follows:
- Stop the current Spark session using the following...