MAD stands for Magnetic, Agile, and Deep; and lib denotes a library of scalable, parallel, and advanced in-database functions. The following figure shows the architecture of MADlib. The MADlib version used in the following example is v1.1:

Greenplum Database extensions for MADlib would need to be installed on the segment servers on DCA.
$ pgxn install madlib $ gppkg –i MADlib
The gppkg
utility installs the MADlib extensions on all the Greenplum segment servers in parallel.
MADlib based in-database analytics is benchmarkedagainst PL/R and is found to be superior in terms of scalability and performance, and MADlib is a truly parallelized process as compared to PL/R.
Let us now look at an example of MADlib function implementation for linear regression.
As we have learned in Chapter 3, Advanced Analytics – Paradigms, Tools, and Techniques, linear regression is a statistical technique that helps fit data into a linear equation.
The MADlib prediction function that we would...