In this chapter, we learned how to effectively monitor our infrastructure and manage host and VM failures. We first used the Ganglia monitoring system to gather statistics from all of our nodes and VMs, then extended it with third-party or custom modules, and finally visualized all the gathered data in its PHP web frontend.
We then saw how, for large OpenNebula deployments, it is possible to install and configure an alternative OpenNebula Information Manager that relies on Ganglia monitors instead of the default Ruby probes executed through SSH.
Lastly, we configured a simple e-mail alerting script that warns us whenever a particular Ganglia metric falls outside our predefined ranges.
In the next chapter, we will see how OpenNebula allows us to rely not only on on-premises hosts but also on an external cloud provider, to which we can offload some of our resources.