In this recipe, we explore Spark 2.0's implementation of survival regression, which is not the typical proportional hazards model but the Accelerated Failure Time (AFT) model. Keep this distinction in mind while running the recipe; otherwise, the results will not make sense.
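For reference, the distinction can be written out as follows. This is the standard textbook formulation rather than notation taken from this recipe, so the symbols are assumptions introduced here: T is the survival time, x the covariate vector, beta the coefficient vector, h_0 the baseline hazard, sigma a scale parameter, and epsilon a random error term. A proportional hazards model lets the covariates scale the hazard rate, whereas an AFT model lets them act on the survival time itself:

```latex
% Proportional hazards: covariates multiply the baseline hazard
h(t \mid x) = h_0(t)\, \exp\left(\beta^{\top} x\right)

% Accelerated failure time: covariates act linearly on the log survival time
\log T = \beta^{\top} x + \sigma \epsilon
```

Read this way, a positive coefficient in an AFT fit lengthens the expected survival time, while a positive coefficient in a proportional hazards fit raises the hazard and so shortens it. According to the Spark ML documentation, the AFT implementation is based on a Weibull distribution for the survival time, which corresponds to an extreme-value distribution for epsilon.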
Survival regression analysis concerns itself with time-to-event models, which are common in medicine, insurance, and any setting where the survivability of the subject is of interest. One of my coauthors happens to be a fully trained medical doctor (in addition to being a computer scientist), so we use a real dataset, the HMO-HIV+ study, from a well-respected book in the field so that we obtain reasonable output.
We currently use this technique for drought modeling at scale, to forecast the price impact on agricultural commodities over long-range time frames.