Optimizing training costs with Managed Spot Training
In the previous chapter, we trained the Image Classification algorithm on the ImageNet dataset. The job ran for a little less than 5 hours. At $350 per hour, it cost $1,680. That sounds like a lot of money, but is it really?
Before you throw your arms up in the air yelling "What is he thinking?", please consider how much it would cost your organization to own and run this training cluster. A back-of-the-envelope calculation for capital expenditure (servers, storage, GPUs, 100 Gbit/s networking equipment) says at least $1.5M. As far as operational expenditure is concerned, hosting costs won't be cheap, as each equivalent server will require 4-5 kW of power. That's enough to fill one rack at your typical hosting company, so even if high-density racks are available, you'll need several. Add bandwidth, cross connects, and so on: my gut feeling says it would cost about $15K per month (much more in...