Pentaho 3.2 Data Integration: Beginner's Guide

Overview of this book

Pentaho Data Integration (a.k.a. Kettle) is a full-featured open source ETL (Extract, Transform, and Load) solution. Although PDI is a feature-rich tool, effectively capturing, manipulating, cleansing, transferring, and loading data can get complicated.

This book is full of practical examples that will help you to take advantage of Pentaho Data Integration's graphical, drag-and-drop design environment. You will quickly get started with Pentaho Data Integration by following the step-by-step guidance in this book. The useful tips in this book will encourage you to exploit powerful features of Pentaho Data Integration and perform ETL operations with ease.

Starting with the installation of the PDI software, this book will teach you all the key PDI concepts. Each chapter introduces new features, allowing you to gradually get involved with the tool. First, you will learn to work with plain files, and to do all kinds of data manipulation. Then, the book gives you a primer on databases and teaches you how to work with databases inside PDI. Not only that, you'll be given an introduction to data warehouse concepts and you will learn to load data in a data warehouse. After that, you will learn to implement simple and complex processes.

Once you've learned all the basics, you will build a simple datamart that will serve to reinforce all the concepts learned through the book.

Examining and modifying the contents of a repository with the Repository explorer


The Repository explorer shows you a tree view of the repository to which you are connected. From the main Spoon menu, select Repository | Explore Repository and you get to the explorer window. The following screenshot shows you a sample Repository explorer screen:

In the tree you can see: Database connections, Partition schemas, Slave servers (slaves in the tree), Clusters, Transformations, Jobs, Users, and Profiles.

You can sort the different elements by name, user, changed date, or description by just clicking on the appropriate column header: Name, User, Changed date, or Description. The sort is applied within each folder.
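To see what sorting within each folder means, here is a minimal Python sketch. It is only an illustration of the behavior, not Spoon's code: the Entry class, the sort_within_folders function, and the sample folders and names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Entry:
    """Hypothetical stand-in for one row shown in the explorer tree."""
    folder: str
    name: str
    user: str
    changed_date: date
    description: str


def sort_within_folders(entries, column):
    """Group entries by folder, then sort each group by the chosen column.

    Entries never move between folders; only the order inside each
    folder changes.
    """
    by_folder = defaultdict(list)
    for entry in entries:
        by_folder[entry.folder].append(entry)
    return {
        folder: sorted(items, key=lambda e: getattr(e, column))
        for folder, items in sorted(by_folder.items())
    }


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        Entry("/sales", "load_orders", "admin", date(2009, 3, 1), "Loads orders"),
        Entry("/sales", "clean_customers", "guest", date(2009, 1, 15), "Cleans customers"),
        Entry("/hr", "payroll", "admin", date(2009, 2, 10), "Payroll feed"),
    ]
    for folder, items in sort_within_folders(sample, "changed_date").items():
        print(folder, [e.name for e in items])
```

Clicking a different column header corresponds to passing a different column argument, such as "name" or "user".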

The Repository explorer not only shows you these elements, but also allows you to create, modify, rename, and delete them. The following table summarizes the available actions:

| Action | Procedure | Example |
|---|---|---|
| Create a new element (any but transformations and jobs) | Double-click the name of the element at the top of the list. Alternatively, right-click any element in its category and select the New option. | In order to create a new user, double-click the word Users at the top of the users list, or right-click any user and select New User. |
| Open an element for editing | Right-click it and select the Open option. Alternatively, double-click it. | In order to edit a job, double-click it, or right-click and select Open job. |
| Delete an element | Right-click it and select the Delete option. | In order to delete a user, right-click it and select Delete user. |

Note

When you explore the repository, you don't see jobs and transformations mixed. Consequently, the whole folder tree appears twice: once under Transformations and once under Jobs.

In order to confirm your work, click on Commit changes. If you make a mistake, click on Rollback changes.
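If the idea of committing and rolling back is new to you, the following Python sketch may help. It is a conceptual model only and makes no claim about how Spoon or the repository implements these buttons: the PendingChanges class and the sample user names are hypothetical.

```python
import copy


class PendingChanges:
    """Hypothetical model: edits go to a working copy until committed."""

    def __init__(self, committed):
        # Last confirmed state, kept separate from the copy being edited.
        self._committed = copy.deepcopy(committed)
        self.working = copy.deepcopy(committed)

    @property
    def committed(self):
        """Return a copy of the last confirmed state."""
        return copy.deepcopy(self._committed)

    def commit(self):
        """Confirm the work: the working copy becomes the saved state."""
        self._committed = copy.deepcopy(self.working)

    def rollback(self):
        """Undo a mistake: discard edits made since the last commit."""
        self.working = copy.deepcopy(self._committed)


if __name__ == "__main__":
    session = PendingChanges({"Users": ["admin", "guest"]})
    session.working["Users"].append("maria")  # create a user by mistake
    session.rollback()                        # the new user is discarded
    print(session.working)
```

In this model, nothing you do in the working copy reaches the saved state until you commit, and a rollback always returns you to the last committed state.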