Microsoft SQL Server 2012 Integration Services: An Expert Cookbook

Overview of this book

SQL Server Integration Services (SSIS) is a leading tool in the data warehouse industry, used for performing extraction, transformation, and load operations. This book is aligned with the most common methodology associated with SSIS, known as Extract, Transform, and Load (ETL). ETL is responsible for extracting data from several sources, cleansing and customizing it, and loading it into a central repository, normally called a Data Warehouse or Data Mart.

Microsoft SQL Server 2012 Integration Services: An Expert Cookbook covers all aspects of SSIS 2012 with many real-world scenarios to help readers understand how SSIS is used in every environment. It is written by two SQL Server MVPs who have in-depth knowledge of SSIS, having worked with it for many years.

This book starts by creating simple data transfer packages with wizards and illustrates how to create more complex data transfer packages, troubleshoot packages, make SSIS packages robust, and boost the performance of data consolidation with SSIS. It then covers data flow transformations and advanced transformations for data cleansing, fuzzy matching, and term extraction in detail. The book then dives deep into making a package dynamic with the help of expressions and variables, and into performance tuning considerations.

DQS Cleansing Transformation—Cleansing Data


Data cleansing is one of the most important parts of any data transfer scenario. Source data is often poorly structured and inconsistent. For example, Microsoft may not be spelled the same way in all data sources: one source has "Micsoft", another has "Micro soft", and others have "Microsoft". Data cleansing means making such data consistent.
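
To make the inconsistency problem concrete, here is a minimal Python sketch, not SSIS or DQS code. It collapses the spelling variants mentioned above into one canonical value. The `CANONICAL` mapping, the `canonicalize` helper, and the sample rows are all hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical hand-built mapping from known variants to one canonical spelling.
# Keys are lower-cased with whitespace collapsed.
CANONICAL = {
    "micsoft": "Microsoft",
    "micro soft": "Microsoft",
    "microsoft": "Microsoft",
}


def canonicalize(value: str) -> str:
    """Return the canonical spelling for a known variant, else the value unchanged."""
    key = " ".join(value.split()).lower()
    return CANONICAL.get(key, value)


# Sample rows as they might arrive from several inconsistent sources (made up).
rows = ["Micsoft", "Micro soft", "Microsoft", "Packt"]

before = Counter(rows)                           # every variant counted separately
after = Counter(canonicalize(r) for r in rows)   # variants merged into one value

print("distinct values before:", len(before))
print("distinct values after: ", len(after))
```

Without cleansing, any grouping or join on this column would treat the three spellings as three different companies.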

SQL Server 2012 comes with a new service named Data Quality Services (DQS). DQS is an optional service that, once installed, listens for requests. You can create knowledge bases in DQS with a tool named DQS Client, and then use the SSIS DQS Cleansing component to check incoming data against those knowledge bases and either standardize the values or report their status.
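
The idea of checking values against a knowledge base and then standardizing them or reporting a status can be sketched in a few lines of Python. This is only a conceptual illustration under stated assumptions: it does not call DQS, the `KNOWLEDGE_BASE` list and `cleanse` function are hypothetical, the status labels are invented for this sketch rather than taken from DQS, and the standard library's `difflib` stands in for whatever matching DQS performs internally.

```python
import difflib
from typing import NamedTuple


class CleansingResult(NamedTuple):
    source: str   # value as received
    output: str   # value after cleansing
    status: str   # illustrative label, not an official DQS status


# Hypothetical knowledge base: the list of values considered valid.
KNOWLEDGE_BASE = ["Microsoft", "Oracle", "Packt"]


def cleanse(value: str, cutoff: float = 0.8) -> CleansingResult:
    """Check one value against the knowledge base and report what was done."""
    if value in KNOWLEDGE_BASE:
        return CleansingResult(value, value, "Correct")
    # Approximate string matching; an assumption standing in for DQS's own logic.
    matches = difflib.get_close_matches(value, KNOWLEDGE_BASE, n=1, cutoff=cutoff)
    if matches:
        return CleansingResult(value, matches[0], "Corrected")
    return CleansingResult(value, value, "Unknown")


for raw in ["Microsoft", "Micsoft", "Micro soft", "Contoso"]:
    print(cleanse(raw))
```

Returning the original value, the cleansed value, and a status side by side lets a downstream step route corrected and unrecognized rows differently, for example by sending unrecognized rows to a review table.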

DQS itself is outside the scope of this book, but we will take a quick look at how to install and use DQS. Lastly, we will run a sample to apply DQS Cleansing on a data...