
Pentaho Data Integration Quick Start Guide

By: María Carina Roldán

Overview of this book

Pentaho Data Integration (PDI) is an intuitive, graphical environment with drag-and-drop design and powerful Extract-Transform-Load (ETL) capabilities. Given its power and flexibility, first attempts to use PDI can be difficult or confusing. This book addresses that difficulty by reducing your learning curve with PDI, giving you the guidance you need to become productive with its main features. It demonstrates the interactive features of the graphical designer and takes you through the main ETL capabilities that the tool offers. By the end of the book, you will be able to use PDI to extract, transform, and load the types of data you encounter on a daily basis.
Table of Contents (15 chapters)

Combining different sources into a single dataset


In this chapter, you have been getting data from different kinds of sources. In every case, though, the data came from a single source, for example, a plain file or the result of a single query. But what if we had more than one source and wanted to combine them into a single dataset? In this section, you will learn how to handle this very common situation.
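To make the idea concrete, here is a minimal sketch in plain Python (not PDI) of appending the rows of two sources into one dataset. The `combine` helper, the field names, and the sample values are hypothetical. The layout check is an assumption of this sketch. It reflects the general expectation that rows in one dataset share the same fields in the same order.

```python
from typing import Dict, List, Optional

# A row maps field names to values (hypothetical representation).
Row = Dict[str, object]


def combine(*sources: List[Row]) -> List[Row]:
    """Append the rows of several sources into a single dataset.

    Every row must expose the same fields in the same order;
    otherwise the combined rows would not share one layout.
    """
    combined: List[Row] = []
    layout: Optional[List[str]] = None
    for rows in sources:
        for row in rows:
            fields = list(row.keys())
            if layout is None:
                # The first row seen defines the expected layout.
                layout = fields
            elif fields != layout:
                raise ValueError(f"layout mismatch: {fields} != {layout}")
            combined.append(row)
    return combined


# Hypothetical rows: one from a plain file, two from a query.
file_rows = [{"room_id": 1, "price": 50.0}]
query_rows = [{"room_id": 2, "price": 75.0}, {"room_id": 3, "price": 60.0}]

dataset = combine(file_rows, query_rows)
print(len(dataset))  # 3
```

If a source's fields differ in name or order, the sketch rejects it. Making every source match one layout is where selecting, renaming, and reordering fields comes in.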

Manipulating the metadata

Let's look at the first exercise again, where we read a file containing surveys. On that occasion, we read all of the information in the file. Now, suppose that we are interested in just a few fields: room_id, room_type, neighborhood, overall_satisfaction, accommodates, and price. Also, we want to rename some fields, and we want them in a different order.
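The book performs these operations with PDI's graphical steps. The following minimal sketch in plain Python only illustrates what the three operations do to each row: keeping a subset of fields, renaming some of them, and changing their order. It does not show PDI's own API. The `select_values` helper, the sample row, its made-up values, the extra `host_id` field, and the new field names are all hypothetical.

```python
from typing import Dict, List, Tuple

# A row maps field names to values (hypothetical representation).
Row = Dict[str, object]


def select_values(rows: List[Row], spec: List[Tuple[str, str]]) -> List[Row]:
    """Keep only the fields listed in spec, renamed and in spec's order.

    Each (old_name, new_name) pair keeps field old_name under new_name.
    Fields not listed in spec are dropped.
    """
    # Dicts preserve insertion order, so the output follows spec's order.
    return [{new: row[old] for old, new in spec} for row in rows]


# A made-up survey row; host_id stands in for a field we want to drop.
survey = [
    {"room_id": 1, "host_id": 9, "room_type": "Private room",
     "neighborhood": "Centro", "overall_satisfaction": 4.5,
     "accommodates": 2, "price": 50.0},
]

# Hypothetical selection: new names and new order.
spec = [
    ("room_id", "id"),
    ("neighborhood", "neighborhood"),
    ("room_type", "type"),
    ("price", "price"),
    ("accommodates", "accommodates"),
    ("overall_satisfaction", "score"),
]

result = select_values(survey, spec)
print(list(result[0].keys()))
# ['id', 'neighborhood', 'type', 'price', 'accommodates', 'score']
```

Because the mapping from old to new names is a single ordered list, one specification handles selecting, renaming, and reordering in one pass.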

There is a very easy way to do this, as follows:

  1. Open the transformation created in the first exercise and save it under a different name. You can do so from the Main Menu or the Main Toolbar.
  2. From the Transform...