Introducing workload groups
I remember working on a big data project where a wide range of end users and applications shared our clusters. At one end of the spectrum, engineers ran ad hoc queries to analyze application logs. At the other end, product management and customer support teams ran complex reports through integrations with third-party tools, such as Power BI, to gain insights into usage patterns and statistics. At the end of each month, the team would start receiving phone calls and tickets about query and job performance. Users complained that their jobs were either not running or timing out. It turned out that the customer support team was running jobs and reports to generate billing information. These jobs were resource-intensive and consumed all available resources, so other jobs were queued or timed out. The only way to resolve the issue was to log into the cluster and kill the long-running tasks.
Managing...