Twitter is a well-known microblogging platform. It generates a massive amount of data, with around 500 million tweets sent each day. Twitter exposes its data through APIs, which makes it the poster child for testing any big data streaming application.
In this recipe, we will see how to live stream data into Spark using Twitter streaming libraries. Twitter is just one of many sources of streaming data for Spark and has no special status. Therefore, there are no built-in libraries for Twitter. Spark does, however, provide some APIs to facilitate integration with Twitter libraries.
One example use of a live Twitter data feed is finding the tweets that are trending in the last 5 minutes.
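To make the 5-minute trending idea concrete, here is a minimal sketch in plain Python of the sliding-window counting behind it. It is not Spark code and not the recipe's implementation: the `TrendingWindow` class and its `add` and `top` methods are hypothetical names introduced only for illustration, hashtags stand in for "trending" items, and timestamps are assumed to arrive in non-decreasing order. In a real Spark job, Spark Streaming's windowed operations would play this role.

```python
from collections import Counter, deque

WINDOW_SECONDS = 5 * 60  # the 5-minute window mentioned in the text


class TrendingWindow:
    """Hypothetical helper: counts hashtags seen within a sliding time window."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, hashtag) pairs in arrival order
        self.counts = Counter()  # hashtag -> occurrences inside the window

    def add(self, timestamp, text):
        """Record every hashtag in a tweet's text (assumes ordered timestamps)."""
        for word in text.split():
            if word.startswith("#") and len(word) > 1:
                tag = word.lower()
                self.events.append((timestamp, tag))
                self.counts[tag] += 1

    def top(self, now, n=3):
        """Drop events older than the window, then return the n most common tags."""
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] < cutoff:
            _, tag = self.events.popleft()
            self.counts[tag] -= 1
            if self.counts[tag] == 0:
                del self.counts[tag]
        return self.counts.most_common(n)


if __name__ == "__main__":
    window = TrendingWindow()
    window.add(0, "Learning #Spark today")
    window.add(100, "#spark and #Scala")
    window.add(350, "more #scala and #Scala")
    print(window.top(now=350, n=2))
```

The deque keeps events in arrival order, so expiring old tweets only ever touches the front of the queue. The tokenization here is deliberately naive (whitespace splitting, no punctuation handling); a production job would need something more careful.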
- Create a Twitter account if you do not already have one.
- Go to http://apps.twitter.com.
- Click on Create New App.
- Fill out the Name, Description, Website, and Callback URL fields and then click on Create your Twitter Application. You will receive a screen like this:
- You will reach...