Talend Open Studio Cookbook

By: Rick Barton

Overview of this book

Data integration is a key component of an organization's technical strategy, yet historically the tools have been very expensive. Talend Open Studio is the world's leading open source data integration product and has played a huge part in making open source data integration a popular choice for businesses worldwide.

This book is a welcome addition to the small but growing library of Talend Open Studio resources. From working with schemas to creating and validating test data, to scheduling your Talend code, you will get acquainted with the various Talend database handling techniques. Each recipe is designed to provide the key learning point in a short, simple and effective manner.

This comprehensive guide provides practical exercises that cover all areas of the Talend development lifecycle including development, testing, debugging and deployment. The book delivers design patterns, hints, tips, and advice in a series of short and focused exercises that can be approached as a reference for more seasoned developers or as a series of useful learning tutorials for the beginner.

The book covers the basics in terms of schema usage and mappings, along with dedicated sections that will allow you to get more from tMap, files, databases and XML. Geared towards the whole lifecycle, the Talend Open Studio Cookbook shows readers great ways to handle everyday tasks, and provides an insight into all areas of a development cycle including coding, testing, and debugging of code to provide start-to-finish coverage of the product.

Before you begin


Before you begin the exercises in the book, it is worth becoming familiar with some of the key concepts and best practices.

Keep code changes small and test often

When developing using Talend, as with any other development tool, it is recommended to code in short bursts and test (run) frequently.

By keeping each change small, it is much easier to find where and what has caused problems during compilation and execution.

Chapter 10, Debugging, Logging, and Testing, is dedicated to debugging and logging; however, working in small, frequently tested increments will reduce the time you need to spend on debugging steps, which can sometimes be lengthy.

Document your code

Talend lets you add a title to each sub-job, and every component has an option for adding documentation. Where you use Java, you should use the Java comment structures to document the code. Use all of these methods as you go along to ensure that your code is well documented.
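As a rough illustration of the Java comment structures mentioned above, the sketch below documents a small helper with a Javadoc block and an inline comment. The class name TextRoutines and the method trimToEmpty are hypothetical names chosen for this example; they are not part of Talend or of the book's exercises.

```java
/**
 * Hypothetical helper class, used only to illustrate commenting style.
 */
class TextRoutines {

    /**
     * Returns the input with leading and trailing whitespace removed.
     *
     * @param value the raw text, which may be null
     * @return the trimmed text, or an empty string when value is null
     */
    static String trimToEmpty(String value) {
        // Guard against null so that callers do not need their own null check.
        return value == null ? "" : value.trim();
    }

    public static void main(String[] args) {
        // Brackets make any remaining whitespace visible in the output.
        System.out.println("[" + trimToEmpty("  Talend  ") + "]");
    }
}
```

The Javadoc block states what the method promises, while the inline comment explains why a particular line exists; both kinds of comment are worth writing as you go.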

Contexts and globalMap

context and globalMap are global areas used to store data that can be used by all components within a Talend job.

context variables are predefined prior to job execution in a context group, whereas globalMap variables are created on the fly at any point within a job.

Context variables

Context variables are used by Talend to store parameter information, and can be used:

  • To pass information into a job from the command line and/or a parent job

  • To manage values of parameters between environments

  • To store values within a job or set of jobs

Chapter 6, Managing Context Variables, is dedicated to the use and management of context variables within Talend.
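The first two uses listed above can be sketched in plain Java. Inside a real job, a context variable is referenced in expressions as context.variableName, and an exported job accepts overrides on the command line in the form --context_param name=value. The code below is not Talend's generated code: it is a simplified stand-in, and the class ContextSketch, the method resolve, and the variable names inputDir and dbHost are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class ContextSketch {

    // Starts from the default values (for example, those of one environment)
    // and applies any "--context_param name=value" overrides found in args.
    static Map<String, String> resolve(Map<String, String> defaults, String[] args) {
        Map<String, String> context = new HashMap<>(defaults);
        for (int i = 0; i + 1 < args.length; i++) {
            if ("--context_param".equals(args[i])) {
                // Split on the first '=' only, so values may contain '='.
                String[] pair = args[i + 1].split("=", 2);
                if (pair.length == 2) {
                    context.put(pair[0], pair[1]);
                }
                i++; // skip the name=value argument just consumed
            }
        }
        return context;
    }

    public static void main(String[] args) {
        // Hypothetical defaults, as they might be defined for a development environment.
        Map<String, String> defaults = new HashMap<>();
        defaults.put("inputDir", "/tmp/in");
        defaults.put("dbHost", "localhost");

        String[] commandLine = {"--context_param", "inputDir=/data/in"};
        Map<String, String> context = resolve(defaults, commandLine);

        System.out.println(context.get("inputDir"));
        System.out.println(context.get("dbHost"));
    }
}
```

Keeping environment-specific values in the context, rather than hard-coding them in components, is what allows the same job to move between environments unchanged.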

globalMap

globalMap is a very important construct within Talend, in that:

  • Almost every component will write information to globalMap once it completes execution (for example, NB_LINE is the number of rows processed in a component).

  • Certain components, such as tFlowToIterate or tFileList, will store data in globalMap variables for use by downstream components.

  • Developers can read and write to globalMap to create global variables in an ad hoc fashion. The use of global variables can often be the best way to ensure code is simple and efficient.
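The points above can be sketched in plain Java. In a real job, globalMap is supplied by Talend, behaves as a map from String keys to Object values, and component statistics are stored under keys that combine the component name with the statistic, such as tFileInputDelimited_1_NB_LINE. The example below simulates that with an ordinary HashMap so that it runs outside Talend; the class GlobalMapSketch, the method rowCount, and the key batchId are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalMapSketch {

    // Stand-in for the globalMap object that Talend provides inside a job.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Reads the row count published by a component, or 0 if none is present.
    static int rowCount(String componentName) {
        Object value = globalMap.get(componentName + "_NB_LINE");
        return value == null ? 0 : (Integer) value;
    }

    public static void main(String[] args) {
        // Simulates a component publishing its row count when it completes.
        globalMap.put("tFileInputDelimited_1_NB_LINE", 42);

        // An ad hoc global variable written by the developer.
        globalMap.put("batchId", "B-001");

        // Values come back as Object, so they are cast to the expected type.
        System.out.println(rowCount("tFileInputDelimited_1"));
        System.out.println((String) globalMap.get("batchId"));
    }
}
```

Because every value is stored as an Object, a read needs a cast to the expected type, and a key that has not yet been written returns null. That is why the sketch checks for null before casting.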

Java

Talend is a Java code generator, so having a little Java knowledge can help when using Talend. There are many Java tutorials for beginners online, and a little time spent learning the basics will help speed up your understanding of Talend.

Other background knowledge

As a data integrator, you will be expected to understand many technologies and how to interface with them, and this book assumes a basic knowledge of many of the most common data sources and targets.

Chapter 7, Working with Databases, relates to using Talend with databases. We have chosen to use MySQL, because it is quick to install, simple to use, and readily available. Basic knowledge of SQL and MySQL will therefore be required to perform the exercises in this chapter.

Other chapters will also assume knowledge of CSV files, MS Excel, XML, and web services.