Pentaho Data Integration Quick Start Guide

By: María Carina Roldán

Overview of this book

Pentaho Data Integration (PDI) is an intuitive, graphical environment that combines drag-and-drop design with powerful Extract-Transform-Load (ETL) capabilities. Given its power and flexibility, initial attempts to use the tool can be difficult or confusing. This book shortens your learning curve with PDI: it provides the guidance you need to become productive, covering the main features of Pentaho Data Integration. It demonstrates the interactive features of the graphical designer and takes you through the main ETL capabilities that the tool offers. By the end of the book, you will be able to use PDI to extract, transform, and load the types of data you encounter on a daily basis.
Table of Contents (15 chapters)

Loading a datamart

Aside from performing CRUD operations, PDI can be used to load datamarts. To demonstrate how, we will populate a very simple datamart.


This is just a quick overview of the subject. We assume that you have at least a basic understanding of data warehouse concepts (for example, dimensions, time dimensions, slowly changing dimensions (SCD), and fact tables).

The source data will be the Sports database. The datamart will have a simple fact table and just three dimensions, as shown in the following diagram:

Injuries datamart

The fact table will keep track of the injuries suffered by sports players. It will have just one measure: the number of injuries.

The dimensions involved in this datamart will be as follows:

  • A time dimension, for the injury date
  • A body parts dimension, with the name of the injured body part
  • A person dimension, with the name of the injured player

By loading this simple model, you will get an overview of the steps that PDI offers for building a datamart.
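To make the shape of the model concrete, the following is a minimal sketch of the star schema described above, built in an in-memory SQLite database with Python rather than with PDI. All table names, column names, and sample rows here are hypothetical and chosen only for illustration; the structure actually used in the book may differ.

```python
import sqlite3

# Hypothetical names throughout: the book does not list the real
# table or column names of the Injuries datamart in this section.
SCHEMA = """
CREATE TABLE dim_time (
    time_id     INTEGER PRIMARY KEY,
    injury_date TEXT NOT NULL
);
CREATE TABLE dim_body_part (
    body_part_id   INTEGER PRIMARY KEY,
    body_part_name TEXT NOT NULL
);
CREATE TABLE dim_person (
    person_id   INTEGER PRIMARY KEY,
    person_name TEXT NOT NULL
);
CREATE TABLE fact_injuries (
    time_id      INTEGER NOT NULL REFERENCES dim_time(time_id),
    body_part_id INTEGER NOT NULL REFERENCES dim_body_part(body_part_id),
    person_id    INTEGER NOT NULL REFERENCES dim_person(person_id),
    quantity     INTEGER NOT NULL
);
"""


def build_datamart():
    """Create the three dimensions and the fact table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def load_sample(conn):
    """Insert a few invented rows so that the model can be queried."""
    conn.executemany(
        "INSERT INTO dim_time VALUES (?, ?)",
        [(1, "2020-01-15"), (2, "2020-02-03")],
    )
    conn.executemany(
        "INSERT INTO dim_body_part VALUES (?, ?)",
        [(1, "knee"), (2, "shoulder")],
    )
    conn.executemany(
        "INSERT INTO dim_person VALUES (?, ?)",
        [(1, "Player A"), (2, "Player B")],
    )
    # Each fact row points at one row of every dimension and carries
    # the single measure: the quantity of injuries.
    conn.executemany(
        "INSERT INTO fact_injuries VALUES (?, ?, ?, ?)",
        [(1, 1, 1, 1), (2, 1, 2, 1), (2, 2, 1, 1)],
    )
    conn.commit()


def injuries_by_body_part(conn):
    """Sum the measure per body part, ordered by body part name."""
    query = """
        SELECT b.body_part_name, SUM(f.quantity)
        FROM fact_injuries AS f
        JOIN dim_body_part AS b ON b.body_part_id = f.body_part_id
        GROUP BY b.body_part_name
        ORDER BY b.body_part_name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_datamart()
    load_sample(conn)
    for name, total in injuries_by_body_part(conn):
        print(name, total)
```

The fact table stores only keys to the dimensions plus the measure, so questions such as "how many injuries per body part" reduce to a join and a sum. In PDI, the equivalent work is done by the steps that load each dimension and then the fact table.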