Pentaho 3.2 Data Integration: Beginner's Guide

Overview of this book

Pentaho Data Integration (a.k.a. Kettle) is a full-featured open source ETL (Extract, Transform, and Load) solution. Although PDI is a feature-rich tool, effectively capturing, manipulating, cleansing, transferring, and loading data can get complicated.

This book is full of practical examples that will help you to take advantage of Pentaho Data Integration's graphical, drag-and-drop design environment. You will quickly get started with Pentaho Data Integration by following the step-by-step guidance in this book. The useful tips in this book will encourage you to exploit powerful features of Pentaho Data Integration and perform ETL operations with ease.

Starting with the installation of the PDI software, this book will teach you all the key PDI concepts. Each chapter introduces new features, allowing you to gradually get involved with the tool. First, you will learn to work with plain files, and to do all kinds of data manipulation. Then, the book gives you a primer on databases and teaches you how to work with databases inside PDI. Not only that, you'll be given an introduction to data warehouse concepts and you will learn to load data in a data warehouse. After that, you will learn to implement simple and complex processes.

Once you've learned all the basics, you will build a simple datamart that will serve to reinforce all the concepts learned through the book.

Time for action – giving priority to Bouchard by using Append Streams


Suppose you want Bouchard's rows to appear before the other rows. You can modify the transformation as follows:

  1. From the Flow category of steps, drag an Append Streams step to the canvas. Rearrange the steps and hops so the transformation looks like this:

  2. Edit the Append Streams step. As the Head hop, select the one that carries Bouchard's rows, and as the Tail hop, select the other. By doing this, you tell PDI how to order the streams.

  3. Do a preview on the Add sequence step. You should see this:

What just happened?

You changed the transformation to give priority to Bouchard's issues.

You did this by using the Append Streams step. By setting the head hop to the one coming from Bouchard's file, you got the expected order: first the rows with the tasks assigned to Bouchard, sorted by progress descending, and then the rows with the tasks assigned to the other programmers, also sorted by progress descending.
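If it helps to see the ordering outside of Spoon, here is a minimal Python sketch of the idea. It is not how PDI implements the step. It only mimics the behavior described above: every row of the head stream is emitted first, then every row of the tail stream, and a sequence number is added afterwards. The field names (`programmer`, `task`, `progress`, `seq`), the sample rows, and the helper functions are all hypothetical, made up for this illustration.

```python
from itertools import chain


def by_progress_desc(rows):
    """Sort one stream by progress, highest first."""
    return sorted(rows, key=lambda row: row["progress"], reverse=True)


def append_streams(head, tail):
    """Emit every row of the head stream, then every row of the tail stream."""
    return chain(head, tail)


def add_sequence(rows, field="seq", start=1):
    """Attach a running counter to each row, in arrival order."""
    for number, row in enumerate(rows, start=start):
        yield {**row, field: number}


# Hypothetical sample rows; the real files in the exercise may differ.
bouchard_rows = [
    {"programmer": "Bouchard", "task": "B", "progress": 20},
    {"programmer": "Bouchard", "task": "A", "progress": 80},
]
other_rows = [
    {"programmer": "Lee", "task": "D", "progress": 50},
    {"programmer": "Smith", "task": "C", "progress": 90},
]

# Head = Bouchard's rows, tail = everyone else, each sorted beforehand.
result = list(
    add_sequence(
        append_streams(by_progress_desc(bouchard_rows), by_progress_desc(other_rows))
    )
)

for row in result:
    print(row)
```

Note that the two streams are never merged by progress: Smith's task at 90 still comes after Bouchard's task at 20, because the tail only starts once the head is exhausted.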

Note

Whether you...