Enterprises today consume data from a wide variety of sources. These sources not only deliver data in different formats (CSV, text, Excel, and so on) but may also expose different mechanisms for consuming it. For example, some sources place files at a particular location on a shared filesystem, while others provide data streams (https://en.wikipedia.org/wiki/Data_stream) or queuing-based systems.
Although there are tools and technologies that handle individual complexities of data consumption, the real challenge is to have a single platform that addresses all of them. Enterprises have therefore been focusing on developing or deploying a single platform that is flexible and extensible enough to handle the complexities of data consumption and processing, and to deliver the data in a unified format.
Spark, with its extensions, is emerging as a one-stop solution to meet these enterprise requirements...