Ganglia is an open source, scalable, distributed monitoring system for clusters. In the past it was used mainly in high-performance computing systems and grids, but as virtualization infrastructures keep growing in size, it has become a good candidate for our needs, as shown in the following screenshot:
It leverages widely used technologies, such as XML for data representation, XDR for data transport, and RRDtool for data storage and visualization. It has been developed with a focus on achieving very low per-node overhead and high concurrency. It has been used to link clusters across university campuses and around the world, and it can scale to handle clusters with 2,000 nodes.
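To give a feel for what "XML for data representation" means in practice, here is a minimal sketch in Python that parses a small cluster report and collects the metrics of each host. The sample document is hand-written and heavily simplified: its element and attribute names (`CLUSTER`, `HOST`, `METRIC`, `NAME`, `VAL`) are modeled on typical Ganglia output, but the hostnames, values, and the `parse_metrics` helper are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample (an assumption for illustration);
# a real report carries many more attributes and metrics per host.
SAMPLE_XML = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
  <CLUSTER NAME="example-cluster" OWNER="unspecified">
    <HOST NAME="node01.example.org" IP="192.0.2.11">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
      <METRIC NAME="mem_free" VAL="1048576" TYPE="float" UNITS="KB"/>
    </HOST>
    <HOST NAME="node02.example.org" IP="192.0.2.12">
      <METRIC NAME="load_one" VAL="1.37" TYPE="float" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def parse_metrics(xml_text):
    """Return {host name: {metric name: raw value string}}.

    Hypothetical helper, not part of Ganglia itself.
    """
    root = ET.fromstring(xml_text)
    metrics = {}
    for host in root.iter("HOST"):
        # Values are kept as strings; the TYPE attribute says how to convert them.
        metrics[host.get("NAME")] = {
            metric.get("NAME"): metric.get("VAL")
            for metric in host.iter("METRIC")
        }
    return metrics


if __name__ == "__main__":
    for host_name, values in parse_metrics(SAMPLE_XML).items():
        print(host_name, values)
```

Because the report is plain XML, any language with an XML parser can consume it, which is one reason the format is convenient for building custom tools on top of the monitoring data.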
Many large corporations and universities use Ganglia to monitor their infrastructure: the University of California, Berkeley (the birthplace of Ganglia), Harvard, MIT, NASA, CERN, Sun, Cisco, Motorola, HP, Microsoft, Dell, Wikipedia (http://ganglia.wikimedia.org/), Twitter, Flickr, last.fm, and so on.
In this chapter...