Pentaho 3.2 Data Integration: Beginner's Guide

Overview of this book

Pentaho Data Integration (a.k.a. Kettle) is a full-featured open source ETL (Extract, Transform, and Load) solution. Although PDI is a feature-rich tool, effectively capturing, manipulating, cleansing, transferring, and loading data can get complicated.

This book is full of practical examples that will help you to take advantage of Pentaho Data Integration's graphical, drag-and-drop design environment. You will quickly get started with Pentaho Data Integration by following the step-by-step guidance in this book. The useful tips in this book will encourage you to exploit powerful features of Pentaho Data Integration and perform ETL operations with ease.

Starting with the installation of the PDI software, this book will teach you all the key PDI concepts. Each chapter introduces new features, allowing you to gradually get involved with the tool. First, you will learn to work with plain files, and to do all kinds of data manipulation. Then, the book gives you a primer on databases and teaches you how to work with databases inside PDI. Not only that, you'll be given an introduction to data warehouse concepts and you will learn to load data in a data warehouse. After that, you will learn to implement simple and complex processes.

Once you've learned all the basics, you will build a simple datamart that will serve to reinforce all the concepts learned through the book.

Time for action – running and previewing the hello_world transformation


Let's do some testing and explore the results:

  1. Open the hello_world transformation.

  2. Edit the Generate Rows step, and change the limit from 10 to 1000 so that it generates 1,000 rows.

  3. Select the Logging tab in the Execution Results window at the bottom of the screen.

  4. Click on Run.

  5. In the Log level drop-down list, select RowLevel detail.

  6. Click on Launch.

  7. You can see how the logging window shows every task in a very detailed way.

  8. Edit the Generate Rows step, and change the limit to 10,000 so that it generates 10,000 rows.

  9. Select the Step Metrics tab.

  10. Run the transformation.

  11. You can see how the numbers change as the rows travel through the steps.

What just happened?

You did some tests with the hello_world transformation and saw the results in the Execution Results window.

Previewing the results in the Execution Results window

The Execution Results window shows you what is happening while you preview or run a transformation.

The Logging tab shows the execution of your transformation, step by step. By default, the logging level is Basic, but you can change it to see more or less detail, from minimal logging (level Minimal) to very detailed logging (level RowLevel).
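One way to picture the effect of the log level is as a verbosity threshold. The following is a minimal sketch in Python, not PDI code: it models only the three levels named above, assumes that each level shows everything the less detailed levels show, and uses invented sample messages. The names `LogLevel`, `visible`, and `SAMPLE` are hypothetical.

```python
from enum import IntEnum


class LogLevel(IntEnum):
    """Only the three levels named in the text, ordered by verbosity."""
    MINIMAL = 1
    BASIC = 2
    ROW_LEVEL = 3


def visible(messages, chosen):
    """Keep messages whose level is no more detailed than `chosen`."""
    return [text for level, text in messages if level <= chosen]


# Hypothetical log lines, invented for illustration only.
SAMPLE = [
    (LogLevel.MINIMAL, "transformation finished"),
    (LogLevel.BASIC, "Generate Rows: finished processing"),
    (LogLevel.ROW_LEVEL, "Dummy: read row 1"),
]

if __name__ == "__main__":
    # Show how many sample lines survive at each level.
    for level in LogLevel:
        print(level.name, len(visible(SAMPLE, level)))
```

In this toy model, raising the level from Basic to RowLevel adds the per-row messages, which is why the Logging tab fills up so quickly when the transformation generates 1,000 rows.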

The Step Metrics tab shows, for each step of the transformation, the executed operations and several status and information columns. You may be interested in the following columns:

| Column  | Description                                                             |
|---------|-------------------------------------------------------------------------|
| Read    | Contains the number of rows coming from previous steps                  |
| Written | Contains the number of rows leaving from this step toward the next      |
| Input   | Number of rows read from a file or table                                |
| Output  | Number of rows written to a file or table                               |
| Errors  | Errors in the execution. If there are errors, the whole row becomes red |
| Active  | Tells the current status of the execution                               |

In the example, you can see that the Generate Rows step writes rows, which are then read by the Dummy step. The Dummy step also writes the same rows, but in this case they go nowhere because no step follows it.
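The relationship between the Read and Written counters can be illustrated with a small simulation. This is a toy model in Python under stated assumptions, not the PDI engine or its API: the two steps are modeled as generators, and the names `StepMetrics`, `generate_rows`, `dummy`, and `run_hello_world` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StepMetrics:
    """Toy counters named after two Step Metrics columns."""
    name: str
    read: int = 0      # rows received from a previous step
    written: int = 0   # rows passed on toward the next step


def generate_rows(limit, metrics):
    """Produce `limit` rows from nothing, so only Written increases."""
    for i in range(limit):
        metrics.written += 1
        yield {"row_number": i}


def dummy(rows, metrics):
    """Pass every incoming row through unchanged."""
    for row in rows:
        metrics.read += 1
        metrics.written += 1
        yield row


def run_hello_world(limit):
    gen = StepMetrics("Generate Rows")
    dum = StepMetrics("Dummy")
    # Nothing follows the Dummy step, so its output is simply discarded.
    for _ in dummy(generate_rows(limit, gen), dum):
        pass
    return gen, dum


if __name__ == "__main__":
    for m in run_hello_world(10_000):
        print(f"{m.name}: Read={m.read} Written={m.written}")
```

With a limit of 10,000, the sketch reports 10,000 rows written by Generate Rows and 10,000 rows both read and written by Dummy, which matches the pattern described above.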

Pop quiz – PDI basics

For each of the following, decide if the sentence is true or false:

  1. There are several graphical tools in PDI, but Spoon is the most used.

  2. You can choose to save Transformations either in files or in a database.

  3. To run a Transformation, an executable file has to be generated from Spoon.

  4. The grid size option in the Look and Feel window allows you to resize the work area.

  5. To create a transformation, you have to provide external data.