Getting Started with Talend Open Studio for Data Integration

By: Jonathan Bowen

Overview of this book

Talend Open Studio for Data Integration (TOS) is an open source graphical development environment for creating custom integrations between systems. It comes with over 600 pre-built connectors that make it quick and easy to connect databases, transform files, load data, move, copy, and rename files, and connect individual components to define complex integration processes. "Getting Started with Talend Open Studio for Data Integration" illustrates common uses and scenarios in a simple, practical manner and, building on knowledge as the book progresses, works towards more complex integration solutions. TOS is a code generator and so does a lot of the "heavy lifting" for you. As such, it is a suitable tool for experienced developers and non-developers alike. You'll start by learning how to construct some common integration tasks: transforming files and extracting data from a database, for example. These building blocks form a "toolkit" of techniques that you will learn how to apply in many different situations. By the end of the book, once-complex integrations will appear easy and you will be your organization's integration expert! Best of all, TOS makes integrating systems fun!

Preface

We've all been there. Your boss drops you an e-mail saying:

Good news, we've just bought system X, which is going to make our lives a lot easier. First though, we need to hook it up to system Y for daily product and inventory feeds and system Z to post the financials back for invoicing. Should be easy, right? It's going to be live in two months. Any problems, please let me know. Oh... if you can get some extracts for the data warehouse at the same time, that would be great too.

What to do? Well, you could ask your senior developer to code some integration jobs from scratch, but they might be hard to maintain, particularly if he or she left the company. In addition, you know he or she is working flat out on another important project. Alternatively, you could ask your boss if you can invest in a proprietary integration suite, with a legion of highly paid consultants. That will certainly do the job, but the budget and timeline might not stretch to this.

Or you can take the new junior developer who joined your company a couple of weeks ago, dust off your business analyst and testing skills, and get the job done on time, on budget with Talend Open Studio for Data Integration.

Getting Started with Talend Open Studio for Data Integration is an introductory guide to solving this problem and many others like it.

What this book covers

Chapter 1, Knowing Talend Open Studio, introduces the reader to Talend Open Studio for Data Integration and what it can be used for. It also covers the installation of Talend Open Studio for Data Integration.

Chapter 2, Working with Talend Open Studio, introduces some common concepts the reader will come across when using Talend Open Studio for Data Integration, including creating a workspace to contain integration jobs, a tour of the Talend Open Studio for Data Integration interface, and use of metadata and schemas. We'll also build a simple "hello world" job.

Chapter 3, Transforming Files, gets into the details of building integrations with Talend Open Studio for Data Integration and looks at using it to transform files from one format to another.

Chapter 4, Working with Databases, looks at databases—how to get data out and how to get data in.

Chapter 5, Filtering, Sorting, and Other Processing Techniques, introduces common data operations: filtering, sorting, and aggregating.

Chapter 6, Managing Files, shows how to manage files during integration jobs. We'll look at renaming, moving, copying, and deleting files; how to timestamp a file; connecting to remote servers to FTP files; and zipping and unzipping files.

Chapter 7, Job Orchestration, will look at more complex integrations and how "one-shot" tasks can be combined to form multi-step jobs. We'll create subjobs and link them together using "if/then" logic. Integrations often produce temporary files, so we'll look at ways to clean up afterwards.

Chapter 8, Managing Jobs, covers the process of packaging, deploying, and scheduling jobs in a live environment.

Chapter 9, Global Variables and Contexts, looks at contexts and explores how the same job can be used in different environments. We introduce dynamic variables, allowing our integration jobs to run flexibly, based on the current runtime information, rather than relying on complex, hardcoded routines.

Chapter 10, Worked Examples, brings together all of the knowledge from previous chapters in a series of worked examples. A real-life integration project is explored and developed to illustrate the use of Talend Open Studio for Data Integration "in the wild".

Appendix A, Installing Sample Jobs and Data, details how to obtain and use the sample data files required to follow the job development examples in the book. All of the jobs created throughout the book are also provided for reference.

Appendix B, Resources, highlights some resources and further reading to expand your knowledge of Talend Open Studio for Data Integration.

What you need for this book

The hardware and software requirements for this book are:

  • A computer running Windows, Linux, or Mac OS with Java installed

  • Talend Open Studio for Data Integration

  • A text file/XML editor

  • A MySQL database instance

Who this book is for

This book is for developers, business analysts, project managers, business intelligence specialists, system architects, and consultants who need to undertake integration projects. The book assumes a certain level of technical aptitude, and readers should be comfortable with some of the following concepts and technologies:

  • Relational database management systems with some SQL (structured query language) experience

  • XML

  • Java

  • File Transfer Protocol (FTP)

  • Programming flow and logic

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text are shown as follows: "Create a file delimited metadata for the currencies.csv file."

A block of code is set as follows:

String datestamp = TalendDate.getDate("YYYYMMDD");
globalMap.put("dateStamp", datestamp);
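For readers who want to experiment with the idea outside the Studio, the two-line snippet can be approximated in plain Java. This is a minimal sketch. It assumes `globalMap` behaves like an ordinary `Map<String, Object>`; in a generated job the map is supplied for you. It also uses the standard `java.time` formatter in place of the `TalendDate` routine. The class name `DateStampExample` and the `orders_` file-name prefix are hypothetical and used only for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

public class DateStampExample {
    // Hypothetical stand-in for Talend's globalMap: a shared key/value store
    // that later steps of a job can read from.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Pattern letters are case-sensitive in java.time: "yyyy" is the calendar
    // year and "dd" the day of the month, whereas "YYYY" is the week-based
    // year and "DD" the day of the year.
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMdd");

    static String dateStamp(LocalDate date) {
        return date.format(STAMP);
    }

    public static void main(String[] args) {
        // Compute today's stamp and store it under a shared key.
        String datestamp = dateStamp(LocalDate.now());
        globalMap.put("dateStamp", datestamp);

        // A later step reads the value back; the map stores Objects, so cast.
        String stamp = (String) globalMap.get("dateStamp");
        // "orders_" is a hypothetical file-name prefix used for illustration.
        System.out.println("orders_" + stamp + ".csv");
    }
}
```

With this pattern, 31 December 2012 becomes `20121231`. If you pass an uppercase pattern such as `YYYYMMDD` straight to a standard Java formatter, it combines the week-based year with the day of the year instead. That is why the case of each letter matters once you step outside Talend's own routines.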

Any command-line input or output is written as follows:

sh [file name].sh

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "Go to the Debug Run tab and click on Traces Debug".

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to , and mention the book title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/support, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website, or added to any list of existing errata, under the Errata section of that title.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at if you are having a problem with any aspect of the book, and we will do our best to address it.