Microsoft SQL Server 2012 Integration Services: An Expert Cookbook

Overview of this book

SQL Server Integration Services (SSIS) is a leading tool in the data warehouse industry, used for performing extraction, transformation, and load operations. This book is aligned with the most common methodology associated with SSIS, known as Extract, Transform, and Load (ETL); ETL is responsible for extracting data from several sources, cleansing and customizing it, and loading it into a central repository normally called a Data Warehouse or Data Mart.

Microsoft SQL Server 2012 Integration Services: An Expert Cookbook covers all aspects of SSIS 2012, with lots of real-world scenarios to help readers understand how SSIS is used in every environment. It is written by two SQL Server MVPs who have in-depth knowledge of SSIS, having worked with it for many years.

The book starts by creating simple data transfer packages with wizards and illustrates how to create more complex data transfer packages, troubleshoot packages, make robust SSIS packages, and boost the performance of data consolidation with SSIS. It then covers data flow transformations and advanced transformations for data cleansing, fuzzy matching, and term extraction in detail. Finally, it dives deep into making packages dynamic with the help of expressions and variables, and into performance tuning and considerations.

Getting familiar with Data Flow Task


While the Control Flow tab under the Package Designer is where the main workflow of the package is manipulated, the Data Flow tab introduced in this recipe is the place where data is transformed and moved between a source and a destination. The Data Flow tab under the Package Designer is used to create or edit such transformations and movements for each Data Flow task placed on the Control Flow tab.

Getting ready

It's possible to have several Data Flows in the Control Flow, but the order of execution for those Data Flows should be planned carefully. To get familiar with the Data Flow task, this recipe introduces a simple but very common scenario: incorporating the content of an Excel file into a database, for example a SQL Server database.

  1. Open SQL Server Data Tools (SSDT).

  2. Create a new project and provide a name and location for it.

  3. Open the empty package.dtsx created by default and rename it to P01_DataFlowTask.dtsx.

How to do it...

Imagine a case where SSIS needs to periodically integrate data produced by an external system, and this data needs to be prepared to fit the destination requirements, such as schema and data values, exactly. To accomplish this task, we need to read data from an Excel worksheet, perform data conversions, verify whether the data already exists at the destination, and finally insert it into a SQL table at the destination.
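
Before building the Data Flow, it can help to confirm what the destination expects. The following query is only a minimal sketch, assuming the AdventureWorksLT sample database used later in this recipe is available; it lists the FirstName and LastName columns of the SalesLT.Customer table, whose 50-character limit is the reason the Data Conversion step below resizes the Excel columns to 50 characters.

    -- Minimal sketch (assumes the AdventureWorksLT sample database):
    -- inspect the destination columns the Excel data has to fit into.
    SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = 'SalesLT'
      AND TABLE_NAME = 'Customer'
      AND COLUMN_NAME IN ('FirstName', 'LastName');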

  1. In the Package Designer select the Control Flow tab.

  2. Drag-and-drop a Data Flow task from the SSIS Toolbox to Control Flow.

  3. Open the Data Flow for editing by double-clicking on the task, or right-click on it and select Edit.

  4. Make sure that the Data Flow tab is selected in the Package Designer. At this point the Data Flow is still empty.

  5. Because we need to read some data from an Excel file, simply drag-and-drop the Excel Source component (under the Data Sources group) from the SSIS Toolbox and place it in the Package Designer area. Note that the SSIS Toolbox now lists different content; it shows components instead of the tasks that appear when the Control Flow tab is selected in the Package Designer.

  6. Double-click on the Excel Source component and click on the New button to create a connection to the source Excel file.

  7. Set the Excel file's path to C:\SSIS\Ch01_Getting Start with SSIS\FILES\R04_NewCustomers.xlsx and click on OK.

  8. In the Excel Source Editor, set the name of the Excel sheet to Sheet1$.

  9. Select the Columns tab and choose the columns Firstname and Lastname, which will be used throughout the data flow.

  10. Click on OK to finish editing Excel Source.

  11. Drag-and-drop Data Conversion from SSIS Toolbox to the Package Designer.

  12. Connect the output of Excel Source to Data Conversion. In the Data Conversion Transformation Editor, select the columns FirstName and LastName, and change the Length for each to 50 characters.

  13. Drag-and-drop the Lookup component from SSIS Toolbox into the Package Designer.

  14. Connect the output of Data Conversion to the Lookup component. Keep the default values for all properties and create a connection to the SQL Server destination database (AdventureWorksLT).

  15. In the list for selecting a table or view, select the destination table [SalesLT].[Customer].

  16. Map the converted source columns Copy of FirstName and Copy of LastName to the corresponding destination SQL columns, FirstName and LastName.

  17. Click on OK to finish editing the lookup.

  18. Drag-and-drop the OLE DB Destination component from SSIS Toolbox into the Package Designer in order to load data into SQL.

  19. Link the Lookup No Match Output from the Lookup component to the OLE DB Destination, so that only those records that don't yet exist at the destination are inserted.

  20. Edit the destination component, set the SQL connection and the destination table, and keep the default values for the remaining settings.

  21. Click on OK to finish editing the OLE DB Destination component.

  22. Run the package by pressing F5.
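
After the run completes, a quick check of the destination confirms how the flow behaves. The query below is only a sketch, assuming the AdventureWorksLT destination used above: note the row count before the first run, run the package, and check the count again. Re-running the package should leave the count unchanged, because the Lookup routes names that already exist away from the OLE DB Destination.

    -- Sketch only: compare this count before and after running the package.
    -- A second run should not change it, because only unmatched (new) names
    -- reach the OLE DB Destination.
    SELECT COUNT(*) AS CustomerRows
    FROM SalesLT.Customer;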

How it works...

This Data Flow reads some customer data (first name and last name) from an Excel file, applies some common transformations, and inserts the data into a SQL table named SalesLT.Customer. Some transformations are usually applied to make the source data fit the destination's requirements. In this example, the data is converted, and the incoming rows are checked against the destination to verify whether they already exist there (the purpose is to avoid data duplication).
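
For readers more comfortable with T-SQL, the combination of the Lookup component and its no-match output is roughly equivalent to the following set-based statement. This is an illustrative sketch only: the temporary table #NewCustomers is a stand-in for the converted Excel rows, and only the two columns mapped in this recipe are shown.

    -- Illustrative sketch: #NewCustomers stands in for the converted Excel rows;
    -- in the package, the Data Flow fills this role with R04_NewCustomers.xlsx.
    CREATE TABLE #NewCustomers (FirstName nvarchar(50), LastName nvarchar(50));

    -- Insert only the rows whose FirstName/LastName pair is not already
    -- present at the destination (the Lookup No Match Output pattern).
    INSERT INTO SalesLT.Customer (FirstName, LastName)
    SELECT n.FirstName, n.LastName
    FROM #NewCustomers AS n
    WHERE NOT EXISTS (SELECT 1
                      FROM SalesLT.Customer AS c
                      WHERE c.FirstName = n.FirstName
                        AND c.LastName = n.LastName);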

Detailed descriptions of each source, transformation, and destination used in this example will be explored in later chapters. In this recipe, we just need an overview of the Data Flow.