Pentaho 3.2 Data Integration: Beginner's Guide

Overview of this book

Pentaho Data Integration (a.k.a. Kettle) is a full-featured open source ETL (Extract, Transform, and Load) solution. Although PDI is a feature-rich tool, effectively capturing, manipulating, cleansing, transferring, and loading data can get complicated.

This book is full of practical examples that will help you to take advantage of Pentaho Data Integration's graphical, drag-and-drop design environment. You will quickly get started with Pentaho Data Integration by following the step-by-step guidance in this book. The useful tips in this book will encourage you to exploit powerful features of Pentaho Data Integration and perform ETL operations with ease.

Starting with the installation of the PDI software, this book will teach you all the key PDI concepts. Each chapter introduces new features, allowing you to gradually get involved with the tool. First, you will learn to work with plain files, and to do all kinds of data manipulation. Then, the book gives you a primer on databases and teaches you how to work with databases inside PDI. Not only that, you'll be given an introduction to data warehouse concepts and you will learn to load data in a data warehouse. After that, you will learn to implement simple and complex processes.

Once you've learned all the basics, you will build a simple datamart that will serve to reinforce all the concepts learned through the book.

Time for action – generating the files with top scores by nesting jobs


Let's modify the job that updates the global examination file, so at the end it generates updated top scores files:

  1. Open the examinations job you created in the first tutorial of this chapter.

  2. After the last transformation job entry, add a Job job entry. You will find it under the General category of entries.

  3. Double-click the Job job entry.

  4. Type ${Internal.Job.Filename.Directory}/top_scores_flow.kjb as Job filename.

  5. Click on OK.

  6. Save the job.

  7. Pick an examination that you have not yet appended to the global file—for example, exam5.txt.

  8. Press F9.

  9. In the Arguments grid, type the full path of the chosen file: c:/pdi_files/input/exam5.txt.

  10. Click on Launch.

  11. In the Job metrics tab of the Execution results window, you will see the following:

  12. Also, the chosen file should have been added to the global file, and updated files with top scores should have been generated.
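
Step 4 relies on the ${Internal.Job.Filename.Directory} variable, which lets the nested job be located relative to the folder of the job that calls it. The following Python sketch is only an illustration of that substitution, not PDI's own implementation. The function name expand_variables, the parent job path c:/pdi_files/examinations.kjb, and the plain-path form of the directory value are assumptions made for the example; PDI itself may report the directory in a different form.

```python
import re
from pathlib import PurePosixPath


def expand_variables(text, variables):
    """Replace each ${name} in text with its value from variables.

    Names that are not defined are left untouched (an assumption of this sketch).
    """
    def substitute(match):
        name = match.group(1)
        return variables.get(name, match.group(0))

    return re.sub(r"\$\{([^}]+)\}", substitute, text)


# Hypothetical location of the parent job; the tutorial does not state it.
parent_job = "c:/pdi_files/examinations.kjb"

# The variable holds the folder that contains the job being executed.
variables = {
    "Internal.Job.Filename.Directory": str(PurePosixPath(parent_job).parent),
}

# The value typed as Job filename in step 4.
job_filename = "${Internal.Job.Filename.Directory}/top_scores_flow.kjb"

nested_job = expand_variables(job_filename, variables)
print(nested_job)
```

Under these assumptions, the nested job resolves to a file named top_scores_flow.kjb in the same folder as the parent job, so both jobs can be moved together without editing the Job filename setting.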

What just happened?

You modified the job that updates the global...