Summary
In this chapter, we looked at open source data collection systems that can be used to implement a wide variety of use cases.
We looked at NiFi, a highly scalable and user-friendly system for defining data flows. We looked at Sqoop, which addresses the very specific use case of transferring data between HDFS and relational databases. We also discussed the ELK stack, which is very popular in the industry for collecting and visualizing large amounts of data.
The next chapter will focus on another aspect of handling data: data processing.
We will discuss the requirements of data processing and look at the challenges of processing data at scale. Stay tuned.