Pentaho 3.2 Data Integration: Beginner's Guide

Overview of this book

Pentaho Data Integration (a.k.a. Kettle) is a full-featured open source ETL (Extract, Transform, and Load) solution. Although PDI is a feature-rich tool, effectively capturing, manipulating, cleansing, transferring, and loading data can get complicated.

This book is full of practical examples that will help you to take advantage of Pentaho Data Integration's graphical, drag-and-drop design environment. You will quickly get started with Pentaho Data Integration by following the step-by-step guidance in this book. The useful tips in this book will encourage you to exploit powerful features of Pentaho Data Integration and perform ETL operations with ease.

Starting with the installation of the PDI software, this book will teach you all the key PDI concepts. Each chapter introduces new features, allowing you to gradually get involved with the tool. First, you will learn to work with plain files, and to do all kinds of data manipulation. Then, the book gives you a primer on databases and teaches you how to work with databases inside PDI. Not only that, you'll be given an introduction to data warehouse concepts and you will learn to load data in a data warehouse. After that, you will learn to implement simple and complex processes.

Once you've learned all the basics, you will build a simple datamart that will serve to reinforce all the concepts learned through the book.

Time for action – creating a hop with the mouse-over assistance


You already know several ways to create a hop between two job entries or two steps. Now you will learn a new way:

  1. Create a job and drag two job entries to the canvas. Name the entries A and B.

  2. Position the mouse cursor over the entry named A and wait until a tiny toolbar shows up below the entry icon as shown:

  3. Click on the output connector (the last icon in the toolbar), and drag toward the entry named B. A grayed hop is displayed.

  4. When the mouse cursor is over the B entry, release the mouse button. A hop is created from the A entry to the B entry.

What just happened?

You created a hop between two job entries by using the mouse-over assistance—a feature incorporated in PDI 4.

Using the mouse-over assistance toolbar

When you position the mouse cursor over a step in a transformation or a job entry in a job, a tiny toolbar shows up to assist you. The following diagram depicts its options:

The following table explains each button in this toolbar:

| Button | Description |
|---|---|
| Edit | Equivalent to double-clicking the job entry/step to edit it. |
| Menu | Equivalent to right-clicking the job entry/step to bring up the contextual menu. |
| Input connector | Assistant for creating hops directed toward this job entry/step. It's used as shown in the tutorial, but the direction of the created hop is the opposite. If the job entry/step doesn't accept any input (for example, the START job entry or the Generate Rows step), the input connector is disabled. |
| Output connector | Assistant for creating hops leaving from this job entry/step, as shown in the tutorial. |

In the tutorial, you created a simple hop between two job entries. You can create hops between steps in the same way. In this case, depending on the kind of source step, you might be prompted for the kind of hop to create. For example, when leaving a Filter rows step, you will be asked if the destination step is where you'll send the "true" data, or where you will send the "false" data, or if this is the main output of the step.
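To make the "true" and "false" hops more concrete, here is a minimal Python sketch of how a Filter rows step routes each row to one of two destinations. It is a plain-Python model and not the PDI API: the function `filter_rows`, its parameters, and the sample rows are hypothetical names chosen for illustration, and the "main output" option is not modelled.

```python
def filter_rows(rows, condition):
    """Split rows the way a Filter rows step with 'true' and 'false' hops would.

    Hypothetical model for illustration only; not part of the PDI API.
    """
    true_rows, false_rows = [], []
    for row in rows:
        # Each row travels down exactly one of the two hops.
        (true_rows if condition(row) else false_rows).append(row)
    return true_rows, false_rows


# Sample data (made up for this sketch).
rows = [{"name": "A", "qty": 5}, {"name": "B", "qty": 0}, {"name": "C", "qty": 12}]
in_stock, out_of_stock = filter_rows(rows, lambda r: r["qty"] > 0)

print([r["name"] for r in in_stock])      # rows sent along the "true" hop
print([r["name"] for r in out_of_stock])  # rows sent along the "false" hop
```

In Spoon, the choice you make when prompted decides which of these two streams the destination step receives.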

Experiencing the sniff-testing feature

The sniff-testing feature allows you to see the rows that are coming into or out of a step in real time. While a transformation is running, right-click a step and select Sniff test during execution | Sniff test output rows. A window appears showing you the output data as it is being processed. In the same way, you can select Sniff test during execution | Sniff test input rows to see the incoming rows.

Note

The sniff-testing feature slows down the transformation, so its use is recommended only for debugging purposes.
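The idea behind sniff testing, and the reason it adds overhead, can be sketched in a few lines of Python. This is a conceptual model that assumes rows are dictionaries flowing through generators; it is not how PDI implements the feature, and `sniff`, `to_upper`, and the sample rows are hypothetical names.

```python
def sniff(rows, observer):
    """Yield rows unchanged while handing a copy of each one to an observer.

    Hypothetical model of sniff testing; not the PDI implementation.
    """
    for row in rows:
        observer(dict(row))  # extra per-row work: this is the overhead
        yield row


def to_upper(rows):
    """Stand-in for a step that transforms each row."""
    for row in rows:
        yield {**row, "name": row["name"].upper()}


seen = []  # plays the role of the sniff-test window
source = [{"name": "ann"}, {"name": "bob"}]

# Sniff the output rows of the to_upper step while the rows keep flowing.
result = list(sniff(to_upper(source), seen.append))
print(seen)
```

The observer sees every row the step emits, and the downstream result is unaffected; the only cost is the additional work done for each row.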

Experiencing the job drill-down feature

In Chapters 10 and 11, you learned how to nest jobs and transformations. You even learned how to create subtransformations. In every case, when you ran the main job or transformation, there was a single log tab showing the log for the main and all nested jobs and transformations.

In PDI 4.0, when a job entry is running, you can drill down into it. Drilling down means opening that entry and seeing what's going on inside that job or transformation. In a separate window, you'll see both the step metrics and the log. If there are more nested transformations or jobs, you can continue drilling down. You can go even further into a running subtransformation. In any of these jobs or transformations, you may sniff test as well, as described above.

Drilling down is useful, for example, to understand why your jobs or transformations don't behave as expected or to find out where a performance problem is.

You can see the job drill-down and sniff-testing features in action in two videos made by Matt Casters, Kettle project leader and author of these features, at http://www.ibridge.be/?p=179.
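The nesting that drill-down lets you explore can be pictured as a tree. The following Python sketch walks such a tree level by level. The dictionary layout, the function `drill_down`, and the job and transformation names are all hypothetical; PDI stores jobs and transformations in its own formats.

```python
def drill_down(entry, depth=0):
    """Return one indented line per job/transformation, parents first."""
    lines = ["  " * depth + f"{entry['type']}: {entry['name']}"]
    for child in entry.get("children", []):
        # Going one level deeper corresponds to drilling down once more.
        lines.extend(drill_down(child, depth + 1))
    return lines


# Made-up example of a main job with nested work.
main_job = {
    "type": "job",
    "name": "main",
    "children": [
        {
            "type": "transformation",
            "name": "load_files",
            "children": [{"type": "subtransformation", "name": "clean_rows"}],
        },
        {"type": "job", "name": "send_report"},
    ],
}

print("\n".join(drill_down(main_job)))
```

Each indentation level corresponds to one drill-down window, each with its own step metrics and log.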

Experiencing even more visual changes

Besides the features that we have just seen, there are some other UI improvements worth mentioning:

  • Enhanced notes editor: Now you can apply different fonts and colors to the notes you create in Spoon.

  • Color-coded logs: Now it is easier to read a log, as different colors allow you to quickly identify different kinds of log messages.

  • Revamped Repository explorer: The Repository explorer has been completely redesigned, making this a major UI improvement in Kettle 4.0.