Using datasets in Azure Machine Learning
In the previous sections of this chapter, we discussed how to get data into the cloud, store it in a datastore, and connect it to an Azure Machine Learning workspace through a datastore and dataset. We put this effort into managing the data and data access centrally so that the data can be used across all compute environments, whether for experimentation, training, or inferencing. In this section, we will focus on how to create, explore, and access these datasets during training.
Once the data is managed as datasets, we can track the data that was used for each experiment or training run in Azure Machine Learning. This gives us visibility into the data used for a specific training run and for the resulting trained model, which is an essential step in creating reproducible end-to-end machine learning workflows.
Another benefit of organizing your data into datasets is that you can easily pass a managed dataset to your experimentation or...