The Elastic Data Processing (EDP) facility exposed by Sahara greatly simplifies running tasks and jobs on a provisioned Hadoop cluster. Now that we have a big data cluster running on OpenStack, we can submit a workload to it and check the results.
To run a job in Sahara, the following objects must be defined before the job is executed:
Data location: The paths to the input and output data
Code: The code that will be executed as the job
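As a sketch of these prerequisites, the commands below register a Swift input/output pair, a job binary, and a job template using the `openstack dataprocessing` plugin supplied by python-saharaclient, then launch the job on an existing cluster. All names, URLs, and credentials here are placeholders, and the exact flags can vary between releases, so verify them with `openstack dataprocessing --help` on your deployment:

```shell
# Register the input and output locations (placeholder Swift URLs and credentials).
openstack dataprocessing data source create demo-input \
    --type swift --url swift://demo-data.sahara/input \
    --username demo --password secret
openstack dataprocessing data source create demo-output \
    --type swift --url swift://demo-data.sahara/output \
    --username demo --password secret

# Register the code to run as a job binary, then wrap it in a job template.
openstack dataprocessing job binary create demo-pig-script \
    --url swift://demo-data.sahara/code/wordcount.pig \
    --username demo --password secret
openstack dataprocessing job template create --name demo-job \
    --type Pig --mains demo-pig-script

# Launch the job on a running cluster, wiring up the data sources.
openstack dataprocessing job execute --job-template demo-job \
    --cluster demo-cluster --input demo-input --output demo-output
```

The data sources, the job binary, and the job template are reusable objects: the same template can be executed repeatedly against different clusters or input/output pairs.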
First, it is important to decide where Sahara should read the data it will process and where it should store the results. The next diagram shows a sample overview of data sources using Swift within Sahara:
The data location can be configured in Sahara using one of the following storage backends:
Swift: OpenStack object storage
HDFS: The native Hadoop Distributed File System
Manila: Network filesystem shares in OpenStack
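When Swift is used, Sahara addresses objects with URLs of the form `swift://<container>.sahara/<path>`, where the `.sahara` suffix names the provider configuration that Hadoop's Swift driver should use. The helper below is a small illustrative sketch of this URL convention (it is not part of any Sahara API; the container and path names are placeholders):

```python
from urllib.parse import urlparse


def swift_data_source_url(container, path):
    """Build a Swift URL in the form Sahara expects for data sources.

    The '.sahara' suffix on the container tells Hadoop's Swift
    filesystem driver which provider configuration to use.
    """
    return "swift://{0}.sahara/{1}".format(container, path.lstrip("/"))


def split_swift_url(url):
    """Split a Sahara-style Swift URL back into (container, object path)."""
    parsed = urlparse(url)
    if parsed.scheme != "swift":
        raise ValueError("not a Swift data source URL: %s" % url)
    container = parsed.netloc.rsplit(".sahara", 1)[0]
    return container, parsed.path.lstrip("/")


input_url = swift_data_source_url("demo-data", "input/books.txt")
print(input_url)             # swift://demo-data.sahara/input/books.txt
print(split_swift_url(input_url))
```

HDFS and Manila data sources follow the same pattern with `hdfs://` and `manila://` URLs respectively.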