Microsoft Azure Machine Learning

By: Sumit Mund, Christina Storm

Introduction to predictive analytics


Predictive analytics is a niche area of analytics that deals with predicting unknown events, which may or may not lie in the future. One example is predicting whether a flight will be delayed before it takes off. Do not assume, however, that predictive analytics deals only with future events. The event of interest can be anything unknown: for example, you may need to predict whether a given credit card transaction is fraudulent after the transaction has already taken place. Similarly, if you are given some properties of soil and need to predict a certain other chemical property of it, you are predicting something that exists in the present.

Predictive analytics leverages tools and techniques from mathematics, statistics, and data mining, and machine learning plays a very important role in it. In a typical predictive analytics project, you usually go through different stages in an iterative manner, as depicted in the following figure:

Problem definition and scoping

In the beginning, you need to understand the business needs and the solutions the business is seeking. This may lead you to a solution that lies in predictive analytics. Then, you need to translate the business problem into an analytics problem. For example, the business might be interested in boosting catalog sales to existing customers. Your problem might then translate to predicting the number of widgets a customer would buy, given demographic information such as their age, gender, income, and location, or to predicting the price of an item, given their purchase history over the past several years. While defining the problem, you also need to define the scope of the project; otherwise, it might turn into a never-ending process.

Data collection

The solution starts with data collection. In some cases, the data may already be available in enterprise storage or in the cloud, and you just have to use it. In other cases, you need to collect the data from disparate sources. Data collection may also require some ETL (Extract, Transform, and Load) work.

Data exploration and preparation

After you have all the data you need, you can proceed to understand it fully through data exploration and visualization. This may also involve some statistical analysis.

Data in the real world is often messy. You should always check the quality of the data and how well it fits your purpose. You have to deal with missing values, improper data, and so on. The data may also not be in the format you need to make predictions, so you may need some preprocessing to get it into the desired shape. This is often called data wrangling. After this, you can either select or extract the exact features that lead you to the prediction.
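
As a rough illustration of dealing with missing values and improper data, the following Python sketch drops records with an impossible value and fills the remaining gaps with the column median. The records, the column names (age, income), the valid age range, and the choice of median imputation are all assumptions made for this example; they are not taken from the book or from Azure Machine Learning.

```python
from statistics import median

# Hypothetical raw records: None marks a missing value, -5 is an impossible age.
raw_rows = [
    {"age": 34, "income": 52000.0},
    {"age": None, "income": 61000.0},
    {"age": 45, "income": None},
    {"age": -5, "income": 48000.0},
    {"age": 29, "income": 39000.0},
]


def clean(rows):
    """Drop rows with an impossible age, then fill gaps with the column median."""
    # A present-but-impossible age is treated as improper data and the row is dropped.
    # The 0..120 range is an assumption for this sketch.
    valid = [r for r in rows if r["age"] is None or 0 <= r["age"] <= 120]
    cleaned = [dict(r) for r in valid]  # copy so the raw data stays untouched

    for column in ("age", "income"):
        observed = [r[column] for r in cleaned if r[column] is not None]
        fill_value = median(observed)
        for r in cleaned:
            if r[column] is None:
                r[column] = fill_value
    return cleaned


cleaned_rows = clean(raw_rows)
for row in cleaned_rows:
    print(row)
```

Whether to drop or impute a bad value depends on the problem; median imputation is only one of several common options.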

Model development

After the data is prepared, you choose an algorithm and build a model to make predictions. This is where machine learning algorithms come in handy. You take a subset of the prepared data to train the model, and then test the model on another subset, or on the rest of the prepared data, to evaluate its performance. While evaluating performance, you can try different algorithms and choose the one that performs best.
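
The train, test, and compare loop described above can be sketched in plain Python. Everything in this example is an assumption chosen for illustration: the synthetic data, the 70/30 split, the two candidate models (a mean baseline and a least-squares line), and mean absolute error as the metric. It is not how Azure Machine Learning implements these steps.

```python
import random


def make_data(n=200, seed=0):
    """Generate hypothetical (x, y) pairs where y is roughly 3x + 10 plus noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 3.0 * x + 10.0 + rng.gauss(0, 1)) for x in xs]


def split(data, train_fraction=0.7, seed=0):
    """Shuffle a copy of the data and split it into training and test sets."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_mean(train):
    """Baseline model: always predict the mean of the training targets."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Simple linear regression fitted by ordinary least squares."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_absolute_error(model, test):
    """Average absolute difference between predictions and held-out targets."""
    return sum(abs(model(x) - y) for x, y in test) / len(test)


train, test = split(make_data())
candidates = {"mean baseline": fit_mean, "linear regression": fit_line}

# Train each candidate on the training set only, then score it on the test set.
scores = {name: mean_absolute_error(fit(train), test) for name, fit in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: MAE = {score:.3f}")
print("best:", best)
```

Scoring on data the model has not seen during training is what makes the comparison between algorithms meaningful.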

Model deployment

If it is a one-off analysis, you may not bother deploying your trained model. Often, however, the predictions made by the model are used somewhere else. For example, for an e-commerce company, a prediction model might recommend products to a prospective customer visiting the website. In another example, after you have built a model to predict the sales volume for the year, different sales departments across different locations might need to use it to make forecasts for their regions. In such scenarios, you have to deploy your trained model as a web service or in some other production form, so that others can consume it through a custom application, Microsoft Excel, or a similar tool.
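
To give a feel for what exposing a model as a web service means, here is a minimal, generic sketch using only the Python standard library. It is not the Azure Machine Learning deployment mechanism; the coefficients, the predict function, and the JSON request shape with a single "x" field are hypothetical stand-ins for a real trained model and its inputs.

```python
import json

# Hypothetical coefficients standing in for a trained model.
SLOPE, INTERCEPT = 3.0, 10.0


def predict(x):
    """Stand-in for a trained model's scoring function."""
    return SLOPE * x + INTERCEPT


def app(environ, start_response):
    """Minimal WSGI app: a request body {"x": <number>} returns {"prediction": <number>}."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        body = json.dumps({"prediction": predict(float(payload["x"]))})
        status = "200 OK"
    except (KeyError, TypeError, ValueError):
        # Malformed JSON, a missing "x" field, or a non-numeric value.
        body = json.dumps({"error": "expected a JSON body with a numeric field x"})
        status = "400 Bad Request"

    data = body.encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(data))),
    ]
    start_response(status, headers)
    return [data]
```

To try it locally, the app could be served with wsgiref.simple_server.make_server("", 8000, app).serve_forever() from the standard library, after which any client that can send an HTTP request with a JSON body can consume the predictions.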

In most practical cases, these phases do not remain in isolation; they are worked on in an iterative manner.

This book gives an overview of the common options available for data exploration and preparation, and focuses on model development and deployment. In fact, model development and deployment are the core offering of Azure Machine Learning, which has only limited options for data exploration and preparation. For those tasks, you can make use of other Azure services, such as HDInsight, Azure SQL Database, and so on, or programming languages outside Azure Machine Learning.