Pentaho Data Integration Quick Start Guide

By: María Carina Roldán

Overview of this book

Pentaho Data Integration (PDI) is an intuitive, graphical environment that combines drag-and-drop design with powerful Extract-Transform-Load (ETL) capabilities. Given its power and flexibility, initial attempts to use the tool can be difficult or confusing. This book shortens the learning curve: it provides the guidance you need to become productive, covering the main features of PDI. It demonstrates the interactive features of the graphical designer and takes you through the main ETL capabilities that the tool offers. By the end of the book, you will be able to use PDI for extracting, transforming, and loading the types of data you encounter on a daily basis.

Looking up data

In all the transformations that we have created so far, we had single streams of data. We could, however, create more than one stream, with data coming from different sources. The streams can eventually be merged together—as was the case when we merged data coming from different files in the previous chapters—or they can also be used for looking up data, as we will learn in this section.
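The difference between merging streams and looking up data can be sketched outside of PDI. The following Python snippet is only an illustration of the idea, not PDI code or its API: the function names `merge_streams` and `lookup_stream`, the field names, and the sample rows are all hypothetical, and the index values are placeholders rather than real figures.

```python
from typing import Iterable, Iterator, Optional

Row = dict  # one row of a stream, as field name -> value


def merge_streams(*streams: Iterable[Row]) -> Iterator[Row]:
    """Merging: the rows of every stream are passed along, one stream after another."""
    for stream in streams:
        yield from stream


def lookup_stream(
    main: Iterable[Row],
    secondary: Iterable[Row],
    key: str,
    fields: list,
    default: Optional[object] = None,
) -> Iterator[Row]:
    """Looking up: every main row is kept and enriched with fields
    taken from the secondary row that has the same key value."""
    # The secondary stream is read completely and indexed by the key.
    index = {row[key]: row for row in secondary}
    for row in main:
        match = index.get(row[key])
        enriched = dict(row)  # copy, so the input row is not modified
        for field in fields:
            enriched[field] = match[field] if match is not None else default
        yield enriched


# Hypothetical sample data (placeholder values, not real indexes).
cities = [
    {"city": "Berlin", "country": "Germany"},
    {"city": "Madrid", "country": "Spain"},
    {"city": "Oslo", "country": "Norway"},
]
more_cities = [{"city": "Rome", "country": "Italy"}]
indexes = [
    {"city": "Berlin", "cost_index": 1.0},
    {"city": "Madrid", "cost_index": 2.0},
]

merged = list(merge_streams(cities, more_cities))
enriched = list(lookup_stream(cities, indexes, key="city", fields=["cost_index"]))

for row in enriched:
    print(row)
```

Merging changes the number of rows (here, three plus one), whereas a lookup keeps the main stream's rows and only adds fields to them. In this sketch, a city with no match in the secondary stream receives the default value instead of being dropped.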

Looking for data in a secondary stream

Looking for data in a secondary stream is a common requirement when the data you need comes from a different source than your main data: for example, when your main data comes from a database and you need to look up related data in an XML file. In this section, you will learn how to implement this kind of lookup through a simple exercise: we will take a list of European cities and look up their cost-of-living indexes, which are stored in a different source. To do this, go through the following steps:


For this exercise, we will use a file...