Pentaho Data Integration Beginner's Guide - Second Edition

By: María Carina Roldán

Overview of this book

Capturing, manipulating, cleansing, transferring, and loading data effectively are prime requirements in every IT organization. Achieving these tasks requires people devoted to developing extensive software programs, or investing in ETL or data integration tools that can simplify this work. Pentaho Data Integration is a full-featured open source ETL solution that allows you to meet these requirements. Pentaho Data Integration has an intuitive, graphical, drag-and-drop design environment, and its ETL capabilities are powerful. However, getting started with Pentaho Data Integration can be difficult or confusing. "Pentaho Data Integration Beginner's Guide - Second Edition" provides the guidance needed to overcome that difficulty, covering the key features of Pentaho Data Integration.

"Pentaho Data Integration Beginner's Guide - Second Edition" starts with the installation of the Pentaho Data Integration software and then moves on to cover all the key Pentaho Data Integration concepts. Each chapter introduces new features, allowing you to gradually get involved with the tool. First, you will learn to do all kinds of data manipulation and work with plain files. Then, the book gives you a primer on databases and teaches you how to work with databases inside Pentaho Data Integration. Moreover, you will be introduced to data warehouse concepts and you will learn how to load data into a data warehouse. After that, you will learn to implement simple and complex processes. Finally, you will have the opportunity to apply and reinforce all of these concepts through the implementation of a simple datamart. With "Pentaho Data Integration Beginner's Guide - Second Edition", you will learn everything you need to know in order to meet your data manipulation requirements.
Table of Contents (26 chapters)
Pentaho Data Integration Beginner's Guide
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Best Practices
Index

Migrating from a file-based system to a repository-based system and vice versa


No matter which storage system you are using, file-based or database repository, you may want to move your work to the other system, either just to try it out or to take advantage of its benefits, mentioned at the beginning of this appendix. The following summarizes the procedure for migrating from a file-based configuration to a database repository:

PDI element: Transformations or jobs

Procedure for migrating from file to repository: Select File | Import from an XML file, browse to locate the .ktr/.kjb file to import, and open it. Once the file has been imported, you can save it into the repository as usual.

PDI element: Database connections, partition schemas, slave servers, and clusters

Procedure for migrating from file to repository: When you import a job or transformation that uses a database connection from an XML file, the connection is imported as well. The same applies to partition schemas, slave servers, and clusters.

There is also a command-line tool that allows you to bulk import jobs and transformations into a repository. This is the Import tool, which you can find in the PDI installation directory as import.sh (Linux/macOS) or import.bat (Windows).

Note

For examples and a full description of the use of the Import utility, you can visit the website http://wiki.pentaho.com/display/EAI/Import+User+Documentation.
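As a rough illustration, a bulk import could look like the following sketch. The repository name, credentials, and paths shown here are placeholders, and the option names should be verified against the documentation referenced above for your PDI version:

```shell
# Hypothetical example: bulk import all .ktr/.kjb files found in a local
# directory into the /public folder of a repository named my-repo.
# Repository name, user, password, and paths are placeholders; verify the
# exact option names against the Import tool documentation for your version.
./import.sh -rep=my-repo -user=admin -pass=password \
            -dir=/public -filedir=/home/pdi/etl_files \
            -comment="Initial bulk import"
```

Running the tool without arguments typically prints the list of supported options, which is a quick way to confirm the syntax for your installation.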

The following summarizes the procedure for migrating from a database repository to a file-based configuration:

PDI element: Single transformation or job

Procedure for migrating from repository to file: Open the job or transformation, select File | Export to an XML file, browse the disk to find the folder where you want to save it, and save it. Once it has been exported, it will be available to work with under the file-based storage method, or to import into another repository.

PDI element: All transformations saved in a folder

Procedure for migrating from repository to file: In the Repository explorer, right-click on the name of the folder and select Export transformations. You will be asked to select the directory where the folder, all its subfolders, and their transformations will be exported. If you right-click on the name of the repository or the root folder in the transformation tree, you can export all the transformations.

PDI element: All jobs saved in a folder

Procedure for migrating from repository to file: In the Repository explorer, right-click on the name of the folder and select Export Jobs. You will be asked to select the directory where the folder, all its subfolders, and their jobs will be exported. If you right-click on the name of the repository or the root folder in the job tree, you can export all the jobs.

PDI element: Database connections, partition schemas, slave servers, and clusters

Procedure for migrating from repository to file: When you export a job or transformation that uses a database connection to an XML file, the connection is exported as well (it is saved as part of the .ktr/.kjb file). The same applies to partition schemas, slave servers, and clusters.

Note

You have to be logged into the repository in order to perform any of the operations explained above.

If you share a database connection, a partition schema, a slave server, or a cluster, it will be available from both a file-based and a repository-based configuration, as shared elements are always saved in the shared.xml file in the Kettle home directory.
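As a rough sketch, a shared database connection stored in shared.xml (typically found in the Kettle home directory, for example ~/.kettle/shared.xml) looks similar to the following. The element names mirror those used inside .ktr/.kjb files; all names and values shown here are illustrative, and the exact set of child elements may vary by PDI version and database type:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sharedobjects>
  <!-- A shared database connection; all names and values are hypothetical -->
  <connection>
    <name>sales_dw</name>
    <server>localhost</server>
    <type>MYSQL</type>
    <access>Native</access>
    <database>sales</database>
    <port>3306</port>
    <username>etl_user</username>
    <!-- Kettle stores passwords obfuscated, not in plain text -->
  </connection>
</sharedobjects>
```

Because this file lives in the Kettle home directory rather than inside any .ktr/.kjb file or repository, every job and transformation you open, file-based or repository-based, can see the elements defined in it.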