In this chapter, we will develop a data pipeline to fetch, store, and later analyze bitcoin transaction data.
After an introduction to Apache Spark, we will see how to call a REST API to fetch transactions from a cryptocurrency exchange. A cryptocurrency exchange allows customers to trade digital currencies, such as bitcoin, for fiat currencies, such as the US dollar. The transaction data will allow us to track the price and quantity exchanged at a certain point in time.
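To make the idea of "price and quantity at a certain point in time" concrete, here is a minimal sketch of what a single transaction record might look like once parsed from a JSON response. The field names (`date`, `price`, `amount`), the `Transaction` class, and the sample payload are assumptions made for illustration; they are not taken from any particular exchange's API, and the values are made up.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class Transaction:
    """One trade on the exchange (hypothetical field names)."""
    timestamp: datetime  # when the trade happened, in UTC
    price: float         # US dollars paid per bitcoin
    amount: float        # quantity of bitcoin exchanged


def parse_transactions(payload: str) -> List[Transaction]:
    """Turn a JSON array of trades into Transaction records."""
    return [
        Transaction(
            # Assumes the timestamp arrives as Unix seconds in a string.
            timestamp=datetime.fromtimestamp(int(item["date"]), tz=timezone.utc),
            price=float(item["price"]),
            amount=float(item["amount"]),
        )
        for item in json.loads(payload)
    ]


# Made-up sample payload, not real market data.
sample_payload = """[
  {"date": "1500000000", "price": "2500.50", "amount": "0.10"},
  {"date": "1500000060", "price": "2501.00", "amount": "0.25"}
]"""

transactions = parse_transactions(sample_payload)
for tx in transactions:
    print(tx.timestamp.isoformat(), tx.price, tx.amount)
```

Each record answers the same three questions: when the trade happened, at what price, and for how much bitcoin.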
We will then introduce the Parquet format, a columnar data format that is widely used for big data analytics. After that, we will build a standalone application that produces a history of bitcoin/USD transactions and saves it in Parquet format. In the following chapter, we will use Apache Zeppelin to query and analyze the data interactively.
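The word "columnar" can be illustrated without any Parquet tooling. The sketch below is not Parquet itself, only a plain-Python picture of the layout idea: values of the same field are stored together, so an analytical query can read just the columns it needs. The `to_columns` helper and the sample rows are hypothetical and the numbers are made up.

```python
from typing import Any, Dict, List

# Row-oriented layout: one dictionary per transaction (made-up values).
rows: List[Dict[str, Any]] = [
    {"timestamp": 1500000000, "price": 2500.0, "amount": 0.5},
    {"timestamp": 1500000060, "price": 2510.0, "amount": 1.5},
    {"timestamp": 1500000120, "price": 2505.0, "amount": 1.0},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Regroup row records so that each field's values sit together."""
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one field only touches that field's list;
# the timestamp and amount columns are never read here.
average_price = sum(columns["price"]) / len(columns["price"])

# Likewise, the total traded quantity only reads the amount column.
total_amount = sum(columns["amount"])

print(average_price, total_amount)
```

Real columnar formats add compression, encoding, and metadata on top of this layout, but the access pattern is the part that matters for analytics.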
The volume of data that we will deal with is not very large, but the tools and techniques used will be the same if the data...