Migrating from a file-based system to a repository-based system and vice versa


No matter which storage system you are using, file-based or repository-based, you may want to move your work to the other system. The following tables, one for each direction, summarize how to do that:

Migrating from a file-based configuration to a repository-based configuration (procedure per PDI element):

Transformations or jobs: Select File | Import from an XML file, browse to locate the .ktr/.kjb file you want to import, and open it. Once the file has been imported, you can save it into the repository as usual.

Database connections, partition schemas, slaves, and clusters: When you import from XML a job or transformation that uses a database connection, the connection is imported as well. The same applies to partition schemas, slave servers, and clusters.
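Both directions revolve around the same XML format: on disk, a transformation is a plain .ktr file and a job a plain .kjb file, and this is what File | Import from an XML file reads. As a rough orientation, here is a heavily trimmed sketch of a .ktr file; the transformation and step names are made up, and real files generated by Spoon contain many more elements:

    <?xml version="1.0" encoding="UTF-8"?>
    <transformation>
      <info>
        <name>load_customers</name>  <!-- hypothetical name -->
        <directory>/</directory>
      </info>
      <!-- one <step> element per step in the work area -->
      <step>
        <name>Read customers file</name>
        <type>TextFileInput</type>
        ...
      </step>
      <!-- the <order> element holds the hops linking the steps -->
      <order>
        <hop>
          <from>Read customers file</from>
          <to>Write to output</to>
          <enabled>Y</enabled>
        </hop>
      </order>
    </transformation>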

Migrating from a repository-based configuration to a file-based configuration (procedure per PDI element):

Single transformation or job: Open the job or transformation, select File | Export to an XML file, browse to the folder where you want to save it, and save it. Once it has been exported, it is available for working with under the file storage method or for importing into another repository.

All transformations saved in a folder: In the Repository explorer, right-click the name of the folder and select Export transformations. You will be asked to select the directory to which the folder, along with all its subfolders and transformations, will be exported. If you right-click the name of the repository or the root folder in the transformations tree instead, you can export all the transformations at once.

All jobs saved in a folder: In the Repository explorer, right-click the name of the folder and select Export Jobs. You will be asked to select the directory to which the folder, along with all its subfolders and jobs, will be exported. If you right-click the name of the repository or the root folder in the jobs tree instead, you can export all the jobs at once.

Database connections, partition schemas, slaves, and clusters: When you export to XML a job or transformation that uses a database connection, the connection is exported as well; it is saved as part of the .ktr/.kjb file, as shown below. The same applies to partition schemas, slave servers, and clusters.
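To picture that last row: if the exported transformation uses a database connection, Spoon embeds a <connection> element in the resulting .ktr file. The following is a trimmed sketch with made-up values; the exact set of tags may vary with the connection type:

    <connection>
      <name>dw_connection</name>  <!-- hypothetical connection name -->
      <server>localhost</server>
      <type>MYSQL</type>
      <access>Native</access>
      <database>dw</database>
      <port>3306</port>
      <username>pdi_user</username>
      <!-- Kettle stores the password obfuscated, not in plain text -->
      <password>Encrypted ...</password>
    </connection>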

Note

You have to be logged into the repository in order to perform any of the operations described above.

If you share a database connection, a partition schema, a slave server, or a cluster, it will be available both when you work with files and from any repository, because shared elements are always saved in the shared.xml file in the Kettle home directory.
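For reference, shared.xml lives in the Kettle home directory (by default, the .kettle folder under your user's home directory) and simply collects all shared elements under a single root. A trimmed sketch of its structure, with one hypothetical shared connection:

    <?xml version="1.0" encoding="UTF-8"?>
    <sharedobjects>
      <connection>
        <name>dw_connection</name>  <!-- hypothetical shared connection -->
        <server>localhost</server>
        <type>MYSQL</type>
        <access>Native</access>
        <database>dw</database>
        <port>3306</port>
      </connection>
      <!-- shared <slaveserver>, <partitionschema>, and
           <clusterschema> elements are appended here as well -->
    </sharedobjects>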