As datasets grow, new challenges appear. Large datasets bring problems of excessive processing time and high memory consumption. These problems can turn data analysis into a painful process or even make it impossible.
In this chapter, we will create an application capable of processing huge datasets efficiently. We will revise our code, introducing new tools and techniques that will not only make our analysis run faster but also make better use of the computer's hardware, allowing virtually any amount of data to be processed.
To achieve those goals, we will learn how to use databases and how to stream data into them, so that the use of computing resources stays constant and stable regardless of the amount of data.
These tools will also enable us to perform more advanced searches and calculations and to cross-reference information from different sources, allowing us to mine the data for valuable information.
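To make the idea of streaming concrete, here is a minimal sketch that feeds records into a database in fixed-size batches, so that only one batch is held in memory at a time. It assumes Python's built-in `sqlite3` module as a stand-in for whichever database is used later; the `points` table, the `generate_records` generator, and the `batch_size` value are hypothetical and exist only for illustration.

```python
import sqlite3
from itertools import islice


def generate_records(count):
    """Yield hypothetical (name, value) records one at a time."""
    for i in range(count):
        yield (f"point_{i}", i * 0.5)


def stream_into_database(connection, records, batch_size=1000):
    """Insert records in fixed-size batches and return how many were inserted."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS points (name TEXT, value REAL)")
    iterator = iter(records)
    total = 0
    while True:
        # Pull at most batch_size records; the rest stay ungenerated.
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        with connection:  # commit each batch as its own transaction
            connection.executemany(
                "INSERT INTO points (name, value) VALUES (?, ?)", batch)
        total += len(batch)
    return total


# An in-memory database keeps the sketch self-contained.
connection = sqlite3.connect(":memory:")
inserted = stream_into_database(connection, generate_records(10_000))

# Once the data is in the database, the database can do the searching.
row_count, max_value = connection.execute(
    "SELECT COUNT(*), MAX(value) FROM points").fetchone()
print(inserted, row_count, max_value)
```

Because the records come from a generator and are consumed batch by batch, the peak memory used by the Python process depends on `batch_size` rather than on the total number of records.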
This chapter will...