
Getting Started with Talend Open Studio for Data Integration

By: Jonathan Bowen

Overview of this book

Talend Open Studio for Data Integration (TOS) is an open source graphical development environment for creating custom integrations between systems. It comes with over 600 pre-built connectors that make it quick and easy to connect databases, transform files, load data, move, copy, and rename files, and connect individual components in order to define complex integration processes. "Getting Started with Talend Open Studio for Data Integration" illustrates common uses and scenarios in a simple, practical manner and, building on knowledge as the book progresses, works towards more complex integration solutions. TOS is a code generator and so does a lot of the "heavy lifting" for you. As such, it is a suitable tool for experienced developers and non-developers alike. You'll start by learning how to construct some common integration tasks, such as transforming files and extracting data from a database. These building blocks form a "toolkit" of techniques that you will learn how to apply in many different situations. By the end of the book, once-complex integrations will appear easy and you will be your organization's integration expert! Best of all, TOS makes integrating systems fun!
Table of Contents (22 chapters)
Getting Started with Talend Open Studio for Data Integration
Credits
Foreword
About the Author
Acknowledgement
About the Reviewers
www.PacktPub.com
Preface
Index

Duplicating and merging dataflows


Our final section in this chapter will look at how we can duplicate and merge dataflows. Duplicating dataflows is particularly useful as it allows us to undertake different processing on the same data without having to read a file twice or query a database twice. Merging dataflows allows us to take data from different sources and rationalize it into a single dataflow.
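The merge case can be pictured as combining rows from two sources with a shared schema into one stream (in Talend this is the role of a merge component such as tUnite). The following is a rough sketch in plain Python, purely for illustration; the source names and field names are invented, and this is not code that Talend generates:

```python
# Illustrative sketch of merging two dataflows that share a schema into a
# single stream for downstream processing. Sources and fields are invented.
csv_rows = [{"customer": "Acme", "revenue": 1200}]
db_rows = [{"customer": "Globex", "revenue": 900}]

# After the merge, downstream components see one unified dataflow.
merged = csv_rows + db_rows
```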

Duplicating data

Open the job DuplicatingData from the Resources directory.

It starts with a simple database query. The dataflow from this query is replicated using a tReplicate component, and the same dataflow is then passed to two processing streams. In this case the processing is very simple: each dataflow is filtered for rows from region1 or region3 respectively. As noted previously, the processing on each dataflow could be completely different, for example, one flow being extracted to a CSV file while the other is transformed and imported into a different database.
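Conceptually, the replicate-then-filter pattern in this job can be sketched in plain Python. This is an illustration only, not the Java code Talend generates, and the row fields, region values, and helper function are invented for the example:

```python
# Illustrative sketch of a tReplicate-style flow: the source is read once,
# and the same rows are handed to two independent processing streams.
source_rows = [
    {"id": 1, "region": "region1", "amount": 100},
    {"id": 2, "region": "region2", "amount": 250},
    {"id": 3, "region": "region3", "amount": 75},
]

def replicate(rows, *streams):
    """Pass the full dataflow to every stream without re-reading the source."""
    return [stream(list(rows)) for stream in streams]

# Two simple filter streams, one per region, mirroring the job in the text.
region1_rows, region3_rows = replicate(
    source_rows,
    lambda rows: [r for r in rows if r["region"] == "region1"],
    lambda rows: [r for r in rows if r["region"] == "region3"],
)
```

As in the Talend job, the two streams here are deliberately trivial filters; either one could instead write to a file, transform the rows, or load a different database without affecting the other.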

Tip

The tReplicate...