Getting Started with Talend Open Studio for Data Integration
Database normalization is the process whereby a database schema is designed to reduce data duplication and redundancy. If a database is not designed with normalization principles in mind, it can:
Get overly large, due to duplicated data
Make data maintenance difficult or give rise to data integrity issues if the same data values reside in multiple tables
While we are not directly concerned with database schema design in this chapter, our next two examples look at processing operations born of the same principles as database normalization. Readers who aren't familiar with these concepts may wish to read some introductory material first. The Wikipedia article at http://en.wikipedia.org/wiki/Database_normalization is a good primer on database normalization.
Our first example shows how we can normalize data. Suppose we have a data file that has two fields: product_id and categories. A product can belong to more than one category and the category values are...
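To make the idea concrete outside of the Studio, here is a minimal sketch in plain Python of what normalizing such a file means: each input row with a multi-valued categories field becomes one output row per category. It is an illustration only, not the Talend job itself. The function name `normalize_rows`, the sample product IDs and category names, and the comma separator are all assumptions made for the example.

```python
def normalize_rows(rows, separator=","):
    """Return one (product_id, category) pair per category value.

    `rows` is an iterable of (product_id, categories) pairs, where
    `categories` is a single string holding several values.
    The separator is an assumption; adjust it to match the real file.
    """
    normalized = []
    for product_id, categories in rows:
        for category in categories.split(separator):
            category = category.strip()  # tolerate spaces around values
            if category:  # skip empty entries, e.g. from a trailing separator
                normalized.append((product_id, category))
    return normalized


# Hypothetical sample data: two fields, product_id and categories.
rows = [
    ("1001", "books,music"),
    ("1002", "electronics"),
    ("1003", "toys, games,books"),
]

normalized = normalize_rows(rows)
for product_id, category in normalized:
    print(product_id, category)
```

Each product now appears once per category it belongs to, so every output row carries exactly one category value.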