In YARN, whenever a job is submitted, a distributed cache is set up for that job's jars and configuration files. This means that the jars are cached only for the execution life cycle of that job. Often, however, the jars or the code do not change across different users of the cluster.
To avoid loading the same jars for every job, which consumes network bandwidth, there is a proposal to implement a shared cache across the cluster that all users can use.
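The difference between a per-job cache and a shared cache can be illustrated with a minimal sketch. This is a toy in-memory model, not YARN code: the names `SharedCache` and `submit_job` are hypothetical, and identifying a jar by the SHA-256 checksum of its contents is an assumption made purely for illustration.

```python
import hashlib
from typing import Dict, List


class SharedCache:
    """Toy in-memory stand-in for a cluster-wide cache of job resources."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}  # checksum -> resource bytes
        self.uploads = 0  # how many times a resource was actually transferred

    def localize(self, resource: bytes) -> str:
        """Return the cache key for a resource, uploading only on a cache miss."""
        # Assumption: a resource is identified by the checksum of its contents.
        key = hashlib.sha256(resource).hexdigest()
        if key not in self._store:
            self._store[key] = resource
            self.uploads += 1
        return key


def submit_job(cache: SharedCache, jars: List[bytes]) -> List[str]:
    """Hypothetical job submission: resolve every jar through the shared cache."""
    return [cache.localize(jar) for jar in jars]


cache = SharedCache()
common_jar = b"contents of a library jar"
submit_job(cache, [common_jar])  # first user: the jar is uploaded
submit_job(cache, [common_jar])  # second user: the cached copy is reused
print(cache.uploads)  # prints 1
```

With a per-job cache, each of the two submissions would transfer the jar again. In this sketch the second submission finds the jar already present, which is where the saving in network bandwidth would come from.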
You will need a running cluster with HDFS and YARN set up properly, so that you can run test jobs such as the pi or wordcount examples on it.