Migrating from a file-based system to a repository-based system and vice versa


No matter which storage system you are using, file-based or repository-based, you may want to move your work to the other system. The following tables, one for each direction, summarize how to do that:

Migrating from a file-based configuration to a repository-based configuration (procedure per PDI element):

Transformations or jobs: Select File | Import from an XML file, browse to locate the .ktr/.kjb file you want to import, and open it. Once the file has been imported, you can save it into the repository as usual.

Database connections, partition schemas, slaves, and clusters: When you import from XML a job or transformation that uses a database connection, the connection is imported as well. The same applies to partition schemas, slave servers, and clusters.
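Both directions revolve around the same XML format: on disk, a transformation is a plain .ktr file and a job a plain .kjb file, and this is what File | Import from an XML file reads. As a rough orientation, here is a heavily trimmed sketch of a .ktr file; the transformation and step names are made up, and real files generated by Spoon contain many more elements:

    <?xml version="1.0" encoding="UTF-8"?>
    <transformation>
      <info>
        <name>load_customers</name>  <!-- hypothetical name -->
        <directory>/</directory>
      </info>
      <!-- one <step> element per step in the work area -->
      <step>
        <name>Read customers file</name>
        <type>TextFileInput</type>
        ...
      </step>
      <!-- the <order> element holds the hops linking the steps -->
      <order>
        <hop>
          <from>Read customers file</from>
          <to>Write to output</to>
          <enabled>Y</enabled>
        </hop>
      </order>
    </transformation>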

Migrating from a repository-based configuration to a file-based configuration (procedure per PDI element):

Single transformation or job: Open the job or transformation, select File | Export to an XML file, browse to the folder where you want to save it, and save it. Once it has been exported, it is available for working with under the file storage method or for importing into another repository.

All transformations saved in a folder: In the Repository explorer, right-click the name of the folder and select Export transformations. You will be asked to select the directory to which the folder, along with all its subfolders and transformations, will be exported. If you right-click the name of the repository or the root folder in the transformations tree instead, you can export all the transformations at once.

All jobs saved in a folder: In the Repository explorer, right-click the name of the folder and select Export Jobs. You will be asked to select the directory to which the folder, along with all its subfolders and jobs, will be exported. If you right-click the name of the repository or the root folder in the jobs tree instead, you can export all the jobs at once.

Database connections, partition schemas, slaves, and clusters: When you export to XML a job or transformation that uses a database connection, the connection is exported as well; it is saved as part of the .ktr/.kjb file, as shown below. The same applies to partition schemas, slave servers, and clusters.
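To picture that last row: if the exported transformation uses a database connection, Spoon embeds a <connection> element in the resulting .ktr file. The following is a trimmed sketch with made-up values; the exact set of tags may vary with the connection type:

    <connection>
      <name>dw_connection</name>  <!-- hypothetical connection name -->
      <server>localhost</server>
      <type>MYSQL</type>
      <access>Native</access>
      <database>dw</database>
      <port>3306</port>
      <username>pdi_user</username>
      <!-- Kettle stores the password obfuscated, not in plain text -->
      <password>Encrypted ...</password>
    </connection>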

Note

You have to be logged into the repository in order to perform any of the operations described above.

If you share a database connection, a partition schema, a slave server, or a cluster, it will be available both when you work with files and from any repository, because shared elements are always saved in the shared.xml file in the Kettle home directory.
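For reference, shared.xml lives in the Kettle home directory (by default, the .kettle folder under your user's home directory) and simply collects all shared elements under a single root. A trimmed sketch of its structure, with one hypothetical shared connection:

    <?xml version="1.0" encoding="UTF-8"?>
    <sharedobjects>
      <connection>
        <name>dw_connection</name>  <!-- hypothetical shared connection -->
        <server>localhost</server>
        <type>MYSQL</type>
        <access>Native</access>
        <database>dw</database>
        <port>3306</port>
      </connection>
      <!-- shared <slaveserver>, <partitionschema>, and
           <clusterschema> elements are appended here as well -->
    </sharedobjects>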