Like MapReduce, Spark uses speculation to spawn additional copies of a task if it suspects the task is running on a straggler node. A typical scenario is when 95 or 99 percent of your job finishes quickly and the remaining tasks get stuck (we have all been there).
There are a few settings you can use to control speculation. The examples below only show how to change values; in most cases, simply turning on speculation is good enough:
- Setting `spark.speculation` (the default is `false`):
  $ spark-shell --conf spark.speculation=true
- Setting `spark.speculation.interval` (the default is `100` milliseconds; this is how often Spark checks running tasks to see whether speculation is needed):
  $ spark-shell --conf spark.speculation.interval=200
- Setting `spark.speculation.multiplier` (the default is `1.5`; this is how many times slower than the median a task must be before it is considered for speculation):
  $ spark-shell --conf spark.speculation.multiplier=1.5
- Setting spark...
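
If you would rather set these values in application code than on the spark-shell command line, a minimal sketch is shown below. The object name and app name are illustrative, and the values simply mirror the examples above; only `spark.speculation=true` is usually needed.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: enabling speculation programmatically instead of via
// spark-shell --conf. Values mirror the command-line examples above.
object SpeculationExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("speculation-demo")                    // illustrative app name
      .config("spark.speculation", "true")            // turn speculation on
      .config("spark.speculation.interval", "200")    // check for stragglers every 200 ms
      .config("spark.speculation.multiplier", "1.5")  // a task must be 1.5x slower than the median
      .getOrCreate()

    // Run your job as usual; Spark launches speculative copies of slow tasks.
    spark.stop()
  }
}
```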