Practical Automated Machine Learning Using H2O.ai

By: Salil Ajgaonkar

Overview of this book

With the huge amount of data being generated over the internet and the benefits that Machine Learning (ML) predictions bring to businesses, ML implementation has become low-hanging fruit that everyone is reaching for. The complex mathematics behind it, however, can be discouraging for a lot of users. This is where H2O comes in – it automates various repetitive steps, and this encapsulation helps developers focus on results rather than handling complexities. You’ll begin by understanding how H2O’s AutoML simplifies the implementation of ML by providing a simple, easy-to-use interface to train and use ML models. Next, you’ll see how AutoML automates the entire process of training multiple models, optimizing their hyperparameters, and explaining their performance. As you advance, you’ll find out how to leverage a Plain Old Java Object (POJO) and Model Object, Optimized (MOJO) to deploy your models to production. Throughout this book, you’ll take a hands-on approach to implementation using H2O that’ll enable you to set up your ML systems in no time. By the end of this H2O book, you’ll be able to train and use your ML models using H2O AutoML, from experimentation all the way to production, without needing to understand complex statistics or data science.
Table of Contents (19 chapters)
Part 1 H2O AutoML Basics
Part 2 H2O AutoML Deep Dive
Part 3 H2O AutoML Advanced Implementation and Productization

Minimum system requirements to use H2O AutoML

H2O is very easy to install, but certain minimum standard requirements need to be met for it to run smoothly and efficiently. The following are some of the minimum requirements needed by H2O in terms of hardware capabilities, along with other software support:

  • The minimum hardware required by H2O is as follows:
    • Memory: H2O runs on an in-memory architecture, so it is limited by the physical memory of the system that uses it. Thus, to be able to process huge chunks of data, the more memory the system has, the better.
    • Central Processing Unit (CPU): By default, H2O will use the maximum available CPUs of the system. However, at a minimum, it will need 4 CPUs.
    • Graphics Processing Unit (GPU): In AutoML, GPU support is only available for XGBoost models, and only if the GPUs are NVIDIA GPUs (GPU Cloud, DGX Station, DGX-1, or DGX-2) or if it is a CUDA 8 GPU.
  • The operating systems that support H2O are as follows:
    • Ubuntu 12.04
    • OS X 10.9 or later
    • Windows 7 or later
    • CentOS 6 or later
  • The programming languages supported by H2O are as follows:
    • Java: Java is mandatory for H2O. H2O requires a 64-bit JDK to build H2O and a 64-bit JRE to run its binary:
      • Java versions supported: Java SE 15, 14, 13, 12, 11, 10, 9, and 8
    • Other Languages: The following languages are only required if H2O is being run in those environments:
      • Python 2.7.x, 3.5.x, or 3.6.x
      • Scala 2.10 or later
      • R version 3 or later
  • Additional requirements: The following requirements are only needed if H2O is being run in these environments:
    • Hadoop: Cloudera CDH 5.4 or later, Hortonworks HDP 2.2 or later, MapR 4.0 or later, or IBM Open Platform 4.2
    • Conda: A Python 2.7, 3.5, or 3.6 environment
    • Spark: Version 2.1, 2.2, or 2.3
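Some of the requirements above can be checked programmatically before installing anything. The following is a minimal sketch, assuming Python is the environment in which H2O will be used. The helper names (`has_enough_cpus`, `python_is_listed`) are hypothetical and not part of H2O, and the thresholds simply mirror the figures listed in this section; other H2O releases may support different versions.

```python
import os
import sys

# Values taken from the requirements listed above (assumed, not queried from H2O).
MIN_CPUS = 4
LISTED_PYTHONS = {(2, 7), (3, 5), (3, 6)}


def has_enough_cpus(cpu_count, minimum=MIN_CPUS):
    """Return True if the CPU count meets the stated minimum.

    os.cpu_count() may return None when the count cannot be determined,
    so None is treated as "not enough".
    """
    return cpu_count is not None and cpu_count >= minimum


def python_is_listed(version_info):
    """Return True if (major, minor) is one of the Python versions listed above."""
    return tuple(version_info[:2]) in LISTED_PYTHONS


if __name__ == "__main__":
    print("CPU count meets minimum:", has_enough_cpus(os.cpu_count()))
    print("Python version is listed:", python_is_listed(sys.version_info))
```

Available memory is deliberately left out of this sketch, because the Python standard library has no portable way to read it; the operating system's own tools are the simplest way to check it.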

Once we have a system that meets the minimum requirements, we need to focus on H2O’s functional dependencies on other software. H2O has only one dependency and that is Java. Let’s see why Java is important for H2O and how we can download and install the correct supported Java version.
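Before downloading Java, it can be worth checking whether a supported version is already installed. The sketch below is one possible approach, not an H2O utility: it runs `java -version` and extracts the major version from the quoted version string. The function names are hypothetical, and the supported range (Java SE 8 through 15) is taken from the list in this section.

```python
import re
import shutil
import subprocess

# Java SE 8 through 15, as listed in the requirements above.
SUPPORTED_JAVA = range(8, 16)


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output.

    Handles the legacy scheme ("1.8.0_281" means Java 8) and the
    newer scheme ("11.0.2" or "15", where the first number is the major).
    Returns None if no quoted version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    if first == 1 and second is not None:
        return int(second)
    return first


def installed_java_major():
    """Return the installed Java major version, or None if java is not on PATH."""
    if shutil.which("java") is None:
        return None
    # `java -version` writes its report to stderr rather than stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    if major is None:
        print("Java was not found on PATH")
    else:
        print("Java", major, "is in the supported list:", major in SUPPORTED_JAVA)
```

This only reports the major version. It does not confirm that the installation is 64-bit, which H2O also requires.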