Implementing adaptive model training in the cloud
First, we need to install the adaptdl Python package:
# Installation
python3 -m pip install adaptdl
Once the package is successfully installed, we can use it for adaptive and distributed DNN training, as follows:
# Import the package (the PyTorch integration lives in adaptdl.torch)
import adaptdl.torch

# Initialize the process group
adaptdl.torch.init_process_group("nccl")  # use "gloo" on CPU-only nodes

# Wrap the model in its AdaptDL version
model = adaptdl.torch.AdaptiveDataParallel(model, optimizer)

# Wrap the data loader in its AdaptDL version
dataloader = adaptdl.torch.AdaptiveDataLoader(dataset, batch_size=128)

# Start adaptive DNN training
remaining_epochs = 200
for epoch in adaptdl.torch.remaining_epochs_until(remaining_epochs):
    ...
    train(model)
    ...
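The epoch loop above relies on remaining_epochs_until so that a job rescheduled by the cluster can continue from where it stopped instead of starting again at epoch 0. The following is a minimal pure-Python sketch of that idea only; it is not AdaptDL's implementation. The names remaining_epochs, run, and state_path are hypothetical, and the JSON progress file is an assumption made purely for illustration.

```python
import json
import os
import tempfile


def remaining_epochs(total_epochs, state_path):
    """Yield the epoch indices that have not been completed yet.

    Hypothetical helper: progress is kept in a small JSON file so that a
    restarted job continues from the first unfinished epoch.
    """
    start = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            start = json.load(f)["next_epoch"]
    for epoch in range(start, total_epochs):
        yield epoch
        # Reached only after the caller has finished the loop body for `epoch`.
        with open(state_path, "w") as f:
            json.dump({"next_epoch": epoch + 1}, f)


def run(total_epochs, state_path, stop_after=None):
    """Run the epoch loop, optionally simulating an interruption.

    Returns the list of epochs that were started in this run.
    """
    started = []
    for epoch in remaining_epochs(total_epochs, state_path):
        started.append(epoch)  # stand-in for train(model)
        if stop_after is not None and len(started) == stop_after:
            break  # simulate the job being rescheduled mid-training
    return started


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "state.json")
        print(run(5, path, stop_after=2))  # first launch, interrupted
        print(run(5, path))                # relaunch, resumes
```

In this sketch an epoch is recorded as complete only when the loop asks for the next one, so an epoch that was interrupted partway through is repeated after the restart rather than skipped.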