Pentaho Data Integration Quick Start Guide

By: María Carina Roldán

Overview of this book

Pentaho Data Integration (PDI) is an intuitive graphical environment that combines drag-and-drop design with powerful Extract-Transform-Load (ETL) capabilities. Given its power and flexibility, first attempts to use the tool can be difficult or confusing. This book shortens the learning curve: it provides the guidance you need to become productive, covering the main features of PDI. It demonstrates the interactive features of the graphical designer and takes you through the main ETL capabilities that the tool offers. By the end of the book, you will be able to use PDI to extract, transform, and load the types of data you encounter on a daily basis.

In this chapter, we transformed PDI datasets in several ways. First, we learned to transform data at row level by combining values, extracting pieces of a value, and creating new fields, to name just a few operations. For each operation, we saw that PDI offers several ways of doing the same thing. We encouraged you to experiment with different steps and adopt the ones that best fit your needs.
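
As a rough illustration of what a row-level transformation does, the sketch below uses plain Python rather than PDI steps. The sample rows, the field names, and the `transform_row` helper are all hypothetical; it only shows the idea of combining values, extracting a piece of a value, and adding new fields to each row.

```python
# Hypothetical sample rows; the field names are illustrative only.
rows = [
    {"first_name": "Ada", "last_name": "Lovelace", "birth_date": "1815-12-10"},
    {"first_name": "Alan", "last_name": "Turing", "birth_date": "1912-06-23"},
]


def transform_row(row):
    """Return a copy of the row with two new fields added."""
    new_row = dict(row)  # leave the incoming row untouched
    # Combine two values into a single new field.
    new_row["full_name"] = f"{row['first_name']} {row['last_name']}"
    # Extract a piece of a value: the first four characters hold the year.
    new_row["birth_year"] = int(row["birth_date"][:4])
    return new_row


transformed = [transform_row(r) for r in rows]
```

Each row is processed independently of the others, which is what makes this kind of operation "row level".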

Next, we learned how to sort data and aggregate it, for example by summing values and calculating averages, among other common aggregate operations.
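
The sort-then-aggregate pattern can be sketched in plain Python, again as an analogy and not as PDI itself. The `sales` rows and the `aggregate_by` helper are assumptions made for this example. Note that `itertools.groupby` only merges adjacent rows with the same key, which is why the rows are sorted first.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input rows.
sales = [
    {"region": "north", "amount": 120.0},
    {"region": "south", "amount": 80.0},
    {"region": "north", "amount": 60.0},
    {"region": "south", "amount": 100.0},
]


def aggregate_by(rows, key, value):
    """Sort rows by `key`, then compute the total and average of `value` per group."""
    by_key = itemgetter(key)
    # groupby only groups consecutive rows, so the input must be sorted by the key.
    ordered = sorted(rows, key=by_key)
    summary = []
    for group_key, group in groupby(ordered, key=by_key):
        values = [row[value] for row in group]
        summary.append({
            key: group_key,
            "total": sum(values),
            "average": sum(values) / len(values),
        })
    return summary


summary = aggregate_by(sales, "region", "amount")
```

The result holds one row per region, each with a `total` and an `average` field.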

After having transformed the dataset, we learned how to filter unwanted data, either discarding it or redirecting it to alternative flows.
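
The difference between discarding unwanted rows and redirecting them can be sketched as follows. This is plain Python with a hypothetical `split_rows` helper and made-up order data, not a PDI step: rows that pass the condition go to one list, and the rest go to a second list that a separate flow could process.

```python
def split_rows(rows, is_valid):
    """Send each row to one of two outputs depending on a condition."""
    kept, rejected = [], []
    for row in rows:
        if is_valid(row):
            kept.append(row)
        else:
            rejected.append(row)
    return kept, rejected


# Hypothetical input rows.
orders = [
    {"id": 1, "quantity": 5},
    {"id": 2, "quantity": 0},
    {"id": 3, "quantity": -2},
]

# Keep orders with a positive quantity; redirect the others.
valid, invalid = split_rows(orders, lambda row: row["quantity"] > 0)
```

Discarding the unwanted data amounts to ignoring `invalid`; redirecting it means passing `invalid` on to another flow, for instance to log or repair those rows.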

At the end of the chapter, we enriched the datasets by looking up external data—both in databases and in secondary streams—and adding it to our main flow.
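
Enriching a main flow with looked-up data can be pictured with the sketch below. It is a plain Python analogy under stated assumptions: the `countries` list stands in for the external source (a database table or a secondary stream), and the `enrich` helper and all field names are hypothetical.

```python
# Hypothetical lookup source, standing in for a table or secondary stream.
countries = [
    {"code": "AR", "name": "Argentina"},
    {"code": "FR", "name": "France"},
]

# Hypothetical main flow.
customers = [
    {"customer": "Ana", "country_code": "AR"},
    {"customer": "Luc", "country_code": "FR"},
    {"customer": "Kim", "country_code": "KR"},
]


def enrich(rows, lookup_rows, row_key, lookup_key, field, default=None):
    """Add `field` from the lookup source to every row of the main flow."""
    # Index the lookup source once so each row needs a single dictionary access.
    index = {r[lookup_key]: r[field] for r in lookup_rows}
    # Rows without a match still pass through and receive the default value.
    return [{**row, field: index.get(row[row_key], default)} for row in rows]


enriched = enrich(customers, countries, "country_code", "code", "name", default="unknown")
```

Every row of the main flow is preserved; rows whose key has no match in the lookup source receive the default value here. Whether unmatched rows should be kept or dropped is a design choice.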

Now that we have seen the main ways of transforming data coming in from different sources, we are ready to load that data into multiple...