Pentaho Data Integration 4 Cookbook

Overview of this book

Pentaho Data Integration (PDI, also called Kettle), one of the leading data integration tools, is broadly used for all kinds of data manipulation, such as migrating data between applications or databases, exporting data from databases to flat files, data cleansing, and much more. Do you need quick solutions to the problems you face while using Kettle? Pentaho Data Integration 4 Cookbook explains Kettle features in detail through clear and practical recipes that you can quickly apply to your solutions. The recipes cover a broad range of topics including processing files, working with databases, understanding XML structures, integrating with Pentaho BI Suite, and more. Pentaho Data Integration 4 Cookbook shows you how to take advantage of all the aspects of Kettle through a set of practical recipes organized so that you can quickly find solutions to your needs. The initial chapters explain the details of working with databases, files, and XML structures. Then you will see different ways of searching data, executing and reusing jobs and transformations, and manipulating streams. Further, you will learn all the available options for integrating Kettle with other Pentaho tools. Pentaho Data Integration 4 Cookbook has plenty of recipes with easy step-by-step instructions to accomplish specific tasks. There are examples and code that are ready for adaptation to individual needs.
Table of Contents (17 chapters)
Pentaho Data Integration 4 Cookbook
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Preface
Index

About the Reviewers

Jan Aertsen has worked in IT and decision support for the past 10 years. Since the beginning of his career he has specialized in data warehouse design and business intelligence projects. He has worked on numerous global data warehouse projects within the fashion industry, retail, banking and insurance, telco and utilities, logistics, automotive, and public sector.

Jan holds the degree of Commercial Engineer in international business affairs from the Catholic University of Leuven (Belgium) and furthered his knowledge in the field of business intelligence through a Master's in Artificial Intelligence.

In 1999 Jan started up the business intelligence activities at IOcore together with some of his colleagues, rapidly making this the most important revenue area of the Belgian affiliate. They quickly gained access to a range of customers such as KPN Belgium, Orange (now Base), Mobistar, and other Belgian telcos.

After this experience Jan joined Cap Gemini Ernst & Young in Italy and rapidly became one of their top BI project managers. After having managed some large BI projects (worth up to €1 million), Jan decided to leave the company and pursue his own ambitions.

In 2002, he founded kJube as an independent platform to develop his ambitions in the world of business intelligence. Since then this has resulted in collaborations with numerous companies such as Volvo, Fendi-LVMH, ING, MSC, Securex, SDWorx, Blinck, and Beate Uhse.

Over the years Jan has worked his way through every possible aspect of business intelligence, from KPI and strategy definition through budgeting, tool selection, and software investment acquisition to project management and all implementation aspects with most of the available tools. He knows the business side as well as the IT side of business intelligence, and is therefore one of the few people able to give sound, all-round, vendor-independent advice on business intelligence.

He continues to share his experiences in the field through his blog (blog.kjube.be) and can be contacted at .

Pedro Alves is the founder of Webdetails. He is a physicist by training, a serious video gamer, a volleyball player, an open source enthusiast, and a dad of two lovely children.

Since his early professional years he has been responsible for business software development, and his career led him to work as a consultant in several Portuguese companies.

In 2008 he decided it was time to take his accumulated experience and share his knowledge about the Pentaho Business Intelligence platform on his own. He founded Webdetails and joined the Mozilla metrics team. Now he leads an international team of BI consultants and keeps nurturing Webdetails as a world-reference Pentaho BI solutions provider and community contributor. He is the Ctools (CDF, CDA, CDE, CBF, CST, CCC) architect and, on a daily basis, keeps developing and improving new components and features to extend and maximize Pentaho's capabilities.

Slawomir Chodnicki specializes in data warehousing and ETL, with a background in web development using various programming languages and frameworks. He has established his blog at http://type-exit.org to help fellow BI developers embrace the possibilities of PDI and other open source BI tools.

Paula Clemente was born in Sintra, Portugal, in 1983. Divided between the idea of spending her life caring about people and animals or spending quality time with computers, she started studying Computer Science at IST Engineering College ("the Portuguese MIT") at a time when Internet social networking was synonymous with IRC. She graduated in 2008 after completing her Master's thesis on Business Processes Management. Since then she has been proudly working as a BI consultant for Webdetails, a Portuguese company specialized in delivering Pentaho BI solutions that earned the Pentaho "Best Community Contributor 2011" award.

Samatar Hassan is an application developer focusing on data integration and business intelligence. He has been involved in the Kettle project since the year it was open sourced. He tries to help the community by contributing in different ways: taking on the French translation effort, participating in the forums, resolving bugs, and adding new features to the software.

He contributed to the "Pentaho Kettle Solutions" book published by Wiley and written by Matt Casters, the founder of Kettle.

Nelson Sousa is a business intelligence consultant at Webdetails. He's part of the Metrics team at Mozilla, where he helps develop and maintain Mozilla's Pentaho server and solution. He specializes in Pentaho dashboards using CDF, CDE, and CDA, and also in PDI, processing vast amounts of information that are integrated daily into the various dashboards and reports that are part of the Metrics team's day-to-day life.