In the previous chapter, we explained in detail what anomaly detection is and how it can be implemented using auto-encoders. We proposed a semi-supervised approach for novelty detection. We introduced H2O and showed a couple of examples (MNIST digit recognition and ECG pulse signals) implemented on top of the framework and running in local mode. Those examples used a small dataset that had already been cleaned and prepared to serve as a proof of concept.
Real-world data and enterprise environments work very differently. In this chapter, we will leverage H2O and general common practices to build a scalable distributed system ready for deployment in production.
As an example, we will use an intrusion detection system whose goal is to detect intrusions and attacks in a network environment.
We will raise a few practical and technical issues that you would probably face when building a data product for intrusion detection.
In particular, you...