Pentaho 3.2 Data Integration: Beginner's Guide

Pentaho 3.2 Data Integration: Beginner's Guide

Overview of this book

Pentaho Data Integration (a.k.a. Kettle) is a full-featured open source ETL (Extract, Transform, and Load) solution. Although PDI is a feature-rich tool, effectively capturing, manipulating, cleansing, transferring, and loading data can get complicated.This book is full of practical examples that will help you to take advantage of Pentaho Data Integration's graphical, drag-and-drop design environment. You will quickly get started with Pentaho Data Integration by following the step-by-step guidance in this book. The useful tips in this book will encourage you to exploit powerful features of Pentaho Data Integration and perform ETL operations with ease.Starting with the installation of the PDI software, this book will teach you all the key PDI concepts. Each chapter introduces new features, allowing you to gradually get involved with the tool. First, you will learn to work with plain files, and to do all kinds of data manipulation. Then, the book gives you a primer on databases and teaches you how to work with databases inside PDI. Not only that, you'll be given an introduction to data warehouse concepts and you will learn to load data in a data warehouse. After that, you will learn to implement simple and complex processes.Once you've learned all the basics, you will build a simple datamart that will serve to reinforce all the concepts learned through the book.

Pentaho 3.2 Data Integration Beginner's Guide

Credits

Foreword

The Kettle Project

About the Author

About the Reviewers

Preface

Free Chapter

Getting Started with Pentaho Data Integration

Pentaho Data Integration and Pentaho BI Suite

Pentaho Data Integration

Installing PDI

Time for action – installing PDI

Launching the PDI graphical designer: Spoon

Time for action – starting and customizing Spoon

Time for action – creating a hello world transformation

Time for action – running and previewing the hello_world transformation

1	a	iii	As mentioned in Chapter 9, despite being designed for building Type II SCDs, the Dimension L/U step can be used for building Type I SCDs as well. So, you have two options: Reuse the table (modifying the transformation that loads it) and get the surrogate key with a Dimension L/U step, or use another table without all fields specific to Type II dimensions and, for getting the surrogate key, use a DB Lookup step. In any case, you may reuse the `id_region` field, as it is a integer and serves in any situation.
	b	i
	c	iii
2	a	ii	The dimension table has to have one record by month. Therefore a different table is needed. For the key you could use a string with the format `yyyymm`. If you don't want to change the fact table, you may reuse the `dt` field leaving blank the last two characters, but it would be more appropriate to have a string field with just 6 positions. For getting the surrogate key you use a Select values step changing the metadata but this time you put as format the new mask `yyyymm`.
	b	ii
	c	i
3	a	ii	The `product_type` field is a string; it's not the proper field for referencing a surrogate key from a fact table, so you have to define a new field for that purpose. For getting the right key you use a Database lookup step.
	b	iii

Pentaho 3.2 Data Integration: Beginner's Guide

Pentaho 3.2 Data Integration: Beginner's Guide

Overview of this book

Related Content you might be interested in

Current Title:

Pentaho 3.2 Data Integration: Beginner's Guide

Chapter 12

modifying a star model and loading the star with PDI