Hands-On Automated Machine Learning

By: Sibanjan Das, Umit Mert Cakmak

Overview of this book

AutoML is designed to automate parts of machine learning. Readily available AutoML tools make data science practitioners’ work easier and have been well received in the advanced analytics community. Hands-On Automated Machine Learning covers the foundation needed to create automated machine learning modules and helps you get up to speed with them in the most practical way possible. In this book, you’ll learn how to automate different tasks in the machine learning pipeline, such as data preprocessing, feature selection, model training, and model optimization. The book also demonstrates how to use available automation libraries, such as auto-sklearn and MLBox, and how to create and extend your own custom AutoML components. By the end of this book, you will have a clearer understanding of the different aspects of automated machine learning, and you’ll be able to incorporate automation tasks using practical datasets. You can apply what you learn to implement machine learning in your projects and get a step closer to winning machine learning competitions.
Table of Contents (10 chapters)

Scope of machine learning

Machine learning (ML) and predictive analytics now help companies focus on important areas, anticipate problems before they happen, reduce costs, and increase revenue. This was a natural evolution from business intelligence (BI) solutions. BI applications help companies make better decisions by monitoring their business processes in an organized manner, usually through dashboards that show various key performance indicators (KPIs) and performance metrics.

BI tools allow you to dig deeper into your organization's historical data, uncover trends, understand seasonality, detect irregular events, and so on. They can also provide real-time analytics, where you can set up warnings and alerts to manage particular events better. All of this is useful, but today businesses need more. BI tools let you work with historical and near real-time data, but they tell you nothing about the future and cannot answer questions such as the following:

  • Which machine in your production line is likely to fail?
  • Which of your customers will probably switch to your competitor?
  • Which company's stock price is going up tomorrow?

Businesses now want answers to these kinds of questions, which pushes them to search for suitable tools and technologies and brings them to ML and predictive analytics.

You need to be careful, though! When you work with BI tools, you can be fairly confident about the results you are going to get, but when you work with ML models, there is no such guarantee and the ground is slippery. There is a huge buzz about AI and ML nowadays, and people are making outrageous claims about the capabilities of upcoming AI products. Computer scientists have long sought to create intelligent machines, and the field has occasionally suffered along the way because of unrealistic expectations. A quick Google search for "AI winter" will tell you more about those periods. Although the advancements are remarkable and the field is moving quickly, you should navigate through the noise, identify the use cases where ML really shines, and check that they can create value for your research or business in measurable terms.

In order to do that, you need to start with small pilot projects where:

  • You have relatively simple decision-making processes
  • You know your assumptions well
  • You know your data well

The key here is to have a well-defined project scope and a clear set of steps that you are going to execute. Collaboration between different teams is really helpful in this process, which is why you should break down silos inside your organization. Also, starting small doesn't mean that your vision should be small too. You should always think about future scalability and gradually gear up to harness big data sources.

There are a variety of ML algorithms that you can experiment with, each designed to solve a specific kind of problem and each with its own pros and cons. There is a growing body of research in this area, and practitioners are coming up with new methods and pushing the limits of the field every day. Hence, it is easy to get lost in all the information available, especially when developing ML applications, since there are many tools and techniques for every stage of the model-building process. To make building ML models easier, you need to decompose the whole process into small pieces. Automated ML (AutoML) pipelines have many moving parts, such as feature preprocessing, feature selection, model selection, and hyperparameter optimization. Each of these parts needs to be handled with special care to deliver successful projects.
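The four moving parts named above can be sketched in a few lines with scikit-learn. This is only an illustrative sketch, not the approach the book itself builds: the Iris dataset, the step names (preprocess, select, model), and the parameter values are assumptions chosen to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Small built-in dataset, used here only for illustration.
X, y = load_iris(return_X_y=True)

# Feature preprocessing -> feature selection -> model.
pipeline = Pipeline([
    ("preprocess", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("model", LogisticRegression(max_iter=1000)),
])

# Model selection and hyperparameter optimization in one grid:
# each dict swaps in a different estimator for the "model" step
# together with hyperparameters that apply to that estimator.
param_grid = [
    {
        "select__k": [2, 4],
        "model": [LogisticRegression(max_iter=1000)],
        "model__C": [0.1, 1.0, 10.0],
    },
    {
        "select__k": [2, 4],
        "model": [DecisionTreeClassifier(random_state=0)],
        "model__max_depth": [2, 4],
    },
]

# Evaluate every candidate with 5-fold cross-validation.
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, round(best_score, 3))
```

Even this toy grid evaluates ten candidate pipelines, each fitted five times for cross-validation, which hints at how quickly the search grows once more steps and options are added.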

You will hear a lot about ML concepts throughout the book, but let's step back and understand why you need to pay special attention to AutoML.

As you add more tools and technologies to your arsenal, having too many options often becomes a problem in itself, and it takes a considerable amount of time to research and understand the right approach for a given problem. It's a similar story with ML problems. Building a high-performing ML model involves several small, carefully crafted steps. Each step leads to the next, and if you don't drop the ball along the way, you will end up with an ML pipeline that functions properly and generalizes well when you deploy it in a production environment.

The number of steps involved in your pipeline could be large and the process could be really lengthy. At every step, there are many methods available, and, once you think about the possible number of different combinations, you will quickly realize that you need a systematic way of experimenting with all these components in your ML pipelines.
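To get a feel for how fast the combinations grow, here is a small sketch using only the Python standard library. The stages and options in search_space are hypothetical placeholders, not a list taken from the book.

```python
from itertools import product
from math import prod

# Hypothetical options at each pipeline stage (illustrative names only).
search_space = {
    "scaler": ["none", "standard", "min-max"],
    "feature_selection": ["none", "variance threshold", "select k best"],
    "model": ["logistic regression", "random forest", "gradient boosting", "SVM"],
    "regularization": [0.01, 0.1, 1.0, 10.0],
}

# The number of pipelines is the product of the option counts per stage.
n_combinations = prod(len(options) for options in search_space.values())

# Enumerate every concrete configuration as a dict of stage -> choice.
all_configs = [
    dict(zip(search_space, combo))
    for combo in product(*search_space.values())
]

print(n_combinations)  # 3 * 3 * 4 * 4 = 144
```

Four stages with three or four options each already yield 144 candidate pipelines. Adding one more stage with five options would raise that to 720, which is why exhaustive manual experimentation stops being practical very quickly.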

This brings us to the topic of AutoML!