Microsoft Azure Machine Learning

By: Sumit Mund, Christina Storm


Data exploration and preparation


In your experiment, drag in the Flight Delays Data sample dataset and click on the Visualize option to explore it. You will find that some columns have many missing values. You can fill these in with the Clean Missing Data module by setting the cleaning mode to replace them using MICE (Multivariate Imputation by Chained Equations).
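Outside the Studio, the same kind of check that Visualize offers can be sketched in a few lines of Python. The following is only an illustration, not part of the Azure ML workflow: the sample rows are made up, the helper name missing_fraction is hypothetical, and it only reports how much of each column is missing. It does not perform MICE imputation.

```python
import csv
import io

# Hypothetical sample rows; the values are made up for illustration and
# only the column names are borrowed from the Flight Delays Data text.
SAMPLE_CSV = """Year,DayOfWeek,OriginAirportID,DestAirportID,DepDelay,ArrDelay
2013,1,12478,12892,5,
2013,2,14057,14869,,12
2013,3,11193,12478,-3,-8
2013,4,12892,14057,,
"""


def missing_fraction(csv_text):
    """Return {column: fraction of rows whose value is empty}."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    return {
        col: sum(1 for r in rows if (r[col] or "").strip() == "") / len(rows)
        for col in rows[0]
    }


fractions = missing_fraction(SAMPLE_CSV)
for col, frac in fractions.items():
    print(f"{col}: {frac:.0%} missing")
```

Columns with a high missing fraction are the ones worth passing through a cleaning step before modeling.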

Certain columns, such as DayOfWeek, OriginAirportID, and DestAirportID, contain numeric values but actually represent categories rather than quantities. So, use the Metadata Editor module to mark them as Categorical.
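To see why this matters, consider how a numeric ID behaves when it is treated as a label instead of a quantity. The sketch below one-hot encodes a few made-up OriginAirportID values; the helper one_hot is hypothetical, and this is only one common way to represent categorical data, not necessarily what Azure ML does internally.

```python
def one_hot(values):
    """Encode labels as indicator vectors, one position per distinct label."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the position of this label
        encoded.append(row)
    return categories, encoded


# Hypothetical airport IDs: the numbers identify airports, so their
# magnitude and ordering carry no meaning for a model.
origin_ids = [12478, 14057, 12478, 11193]
categories, encoded = one_hot(origin_ids)

for airport_id, row in zip(origin_ids, encoded):
    print(airport_id, row)
```

With this representation, airport 14057 is no longer "larger" than airport 11193; each ID simply gets its own indicator.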

Feature selection

Before you start developing the model, it is important to select or generate the set of variables that have the most predictive power and to remove any redundant or less important features. In this case, all the data points are from the same year, so the year column is not required. We are interested in predicting delays before the journey starts, so the DepDel15 and DepDelay columns, which are known only after departure, should not be used. Again, both the ArrDelay and...
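As a rough illustration of the exclusions named so far, the sketch below drops the year and departure-delay columns from a couple of made-up rows. The column name Year, the sample values, and the helper drop_columns are assumptions for this example; the columns covered by the truncated remainder of the paragraph are deliberately left out.

```python
# Columns the text explicitly excludes up to this point. "Year" is an
# assumed name for the year column mentioned in the text.
COLUMNS_TO_DROP = {"Year", "DepDelay", "DepDel15"}


def drop_columns(rows, to_drop):
    """Return copies of the rows without the named columns."""
    return [
        {key: value for key, value in row.items() if key not in to_drop}
        for row in rows
    ]


# Hypothetical rows; the values are made up for illustration.
rows = [
    {"Year": 2013, "DayOfWeek": 1, "OriginAirportID": 12478,
     "DestAirportID": 12892, "DepDelay": 5, "DepDel15": 0},
    {"Year": 2013, "DayOfWeek": 2, "OriginAirportID": 14057,
     "DestAirportID": 14869, "DepDelay": 20, "DepDel15": 1},
]

features = drop_columns(rows, COLUMNS_TO_DROP)
print(sorted(features[0]))
```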