The scheduler/executor component of the streaming architecture
Whenever we want to execute a highly distributed set of interconnected tasks whose life cycle needs to be managed, we run into a set of complexities. If we look at the nonfunctional requirements of these tasks, they boil down to the following:
- The tasks should be executed in the order in which they are defined
- If a task fails, it should be automatically restarted, bringing fault tolerance into the overall system
- If a machine fails, all the tasks running on that machine should be restarted on a different machine
- If there are more tasks to execute than available resources, higher-priority tasks should be executed first, lower-priority tasks should be queued, and resources should be divided among tasks in some defined order
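To make the requirements above concrete, here is a minimal, hypothetical in-process sketch: tasks are drained from a priority queue, and a failed task is re-queued up to a retry limit. All names (`Task`, `run`, `max_retries`) are illustrative; machine-failure handling and real resource accounting are deliberately out of scope.

```python
import heapq

class Task:
    """A unit of work with a priority and a retry budget."""

    def __init__(self, name, priority, fn, max_retries=2):
        self.name = name
        self.priority = priority      # lower number = higher priority
        self.fn = fn
        self.max_retries = max_retries
        self.attempts = 0

    def __lt__(self, other):          # lets heapq order tasks by priority
        return self.priority < other.priority

def run(tasks):
    """Execute tasks highest-priority first, retrying failures."""
    queue = list(tasks)
    heapq.heapify(queue)
    results = {}
    while queue:
        task = heapq.heappop(queue)
        task.attempts += 1
        try:
            results[task.name] = task.fn()
        except Exception:
            if task.attempts <= task.max_retries:
                heapq.heappush(queue, task)   # restart the failed task
            else:
                results[task.name] = None     # give up after the retry budget
    return results
```

For example, a task that fails once with a transient error is simply pushed back onto the queue and succeeds on its next attempt, while higher-priority tasks always drain first. A real scheduler would add what this sketch omits: persisting task state so work survives a machine failure, and modeling resources (CPU, memory) instead of assuming one task runs at a time.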
These are just some of the requirements, but I hope you can see the problem: it is not just about executing a set of tasks, but about handling resource management. Resources here could be CPU, memory, time, and so on.