Pentaho 3.2 Data Integration: Beginner's Guide

Overview of this book

Pentaho Data Integration (a.k.a. Kettle) is a full-featured open source ETL (Extract, Transform, and Load) solution. Although PDI is a feature-rich tool, effectively capturing, manipulating, cleansing, transferring, and loading data can get complicated.

This book is full of practical examples that will help you to take advantage of Pentaho Data Integration's graphical, drag-and-drop design environment. You will quickly get started with Pentaho Data Integration by following the step-by-step guidance in this book. The useful tips in this book will encourage you to exploit powerful features of Pentaho Data Integration and perform ETL operations with ease.

Starting with the installation of the PDI software, this book will teach you all the key PDI concepts. Each chapter introduces new features, allowing you to gradually get involved with the tool. First, you will learn to work with plain files, and to do all kinds of data manipulation. Then, the book gives you a primer on databases and teaches you how to work with databases inside PDI. Not only that, you'll be given an introduction to data warehouse concepts and you will learn to load data in a data warehouse. After that, you will learn to implement simple and complex processes.

Once you've learned all the basics, you will build a simple datamart that will serve to reinforce all the concepts learned through the book.

Table of Contents (27 chapters)
Pentaho 3.2 Data Integration Beginner's Guide
Credits
Foreword
The Kettle Project
About the Author
About the Reviewers
Preface
Index

About the Reviewers

Jens Bleuel is a Senior Consultant and Engineer at Pentaho. He also works as a project leader, trainer, and product specialist in the services and support department. Before he joined Pentaho in mid-2007, he was a software developer and project leader, and his main business was data warehousing and architecture, along with designing and developing user-friendly tools. He studied business economics, attended a grammar school for electronics, and has programmed in a wide range of environments such as Assembler, C, Visual Basic, Delphi, .NET, and these days mainly Java. His customer focus is on the wholesale market and consumer goods industries. Jens is 40 years old and lives with his wife and two boys in Mainz, Germany (near the nice Rhine river). In his spare time, he practices Tai-Chi, Qigong, and photography.

Roland Bouman has been working in the IT industry since 1998, mostly as a database and web application developer. He has also worked for MySQL AB (later Sun Microsystems) as a certification developer and curriculum developer.

Roland mainly focuses on open source web technology, databases, and Business Intelligence. He's an active member of the MySQL and Pentaho communities and can often be found speaking at worldwide conferences and events such as the MySQL user conference, the O'Reilly Open Source conference (OSCON), and at Pentaho community events.

Roland is co-author of the MySQL 5.1 Cluster DBA Certification Study Guide (Vervante, ISBN: 595352502) and Pentaho Solutions: Business Intelligence and Data Warehousing with Pentaho and MySQL (Wiley, ISBN: 978-0-470-48432-6). He also writes on a regular basis for the Dutch Database Magazine (DBM).

Roland is @rolandbouman on Twitter and maintains a blog at http://rpbouman.blogspot.com/.

Matt Casters has been an independent senior BI consultant for almost two decades. In that period he led, designed, and implemented numerous data warehouses and BI solutions for large and small companies. In that capacity, he always had the need for ETL in some form or another. Almost out of pure necessity, he has been busy writing the ETL tool called Kettle (a.k.a. Pentaho Data Integration) for the past eight years. At first, he developed the tool mostly on his own. Since the end of 2005, when Kettle was declared an open source technology, development has taken place with the help of a large community.

Since the Kettle project was acquired by Pentaho in early 2006, he has been Chief of Data Integration at Pentaho as the lead architect, head of development, and spokesperson for the Kettle community.

James Dixon is the Chief Geek and one of the co-founders of Pentaho Corporation, the leading commercial open source Business Intelligence company. He has worked in the business intelligence market since graduating in 1992 from Southampton University with a degree in Computer Science. He has served as Software Engineer, Development Manager, Engineering VP, and CTO at multiple business intelligence software companies. He regularly uses Pentaho Data Integration for internal projects and was involved in the architectural design of PDI V3.0.

He lives in Orlando, Florida, with his wife Tami and son Samuel.

Will Gorman is an Engineering Team Lead at Pentaho. He works on a variety of Pentaho's products, including Reporting, Analysis, Dashboards, Metadata, and the BI Server. Will started his career at GE Research and earned his Master's degree in Computer Science at Rensselaer Polytechnic Institute in Troy, New York. Will is the author of Pentaho Reporting 3.5 for Java Developers (ISBN: 3193), published by Packt Publishing.

Gretchen Moran is a graduate of the University of Wisconsin – Stevens Point with a Bachelor's degree in Computer Information Systems and a minor in Data Communications. Gretchen began her career as a corporate data warehouse developer in the insurance industry and joined Arbor Software/Hyperion Solutions in 1999 as a commercial developer for the Hyperion Analyzer and Web Analytics team. Gretchen has been a key player with Pentaho Corporation since its inception in 2004. As Community Leader and core developer, Gretchen managed the explosive growth of Pentaho's open source community for her first two years with the company. Gretchen has contributed to many of the Pentaho projects, including the Pentaho BI Server, Pentaho Data Integration, Pentaho Metadata Editor, Pentaho Reporting, Pentaho Charting, and others.