The source database


Analysis Services cubes are built on top of a database, but the real question is: what kind of database should this be?

We will try to answer this question by analyzing the different kinds of databases we will encounter in our search for the best source for our cube. In the process of doing so we are going to describe the basics of dimensional modeling, as well as some of the competing theories on how data warehouses should be designed.

The OLTP database

Typically, a BI solution is created when business users want to analyze, explore, and report on their data in an easy and convenient way. The data itself may consist of thousands, millions, or even billions of rows, normally kept in a relational database built to serve a specific business purpose. We refer to this database as the Online Transaction Processing (OLTP) database.

The OLTP database can be a legacy mainframe system, a CRM system, an ERP system, a general ledger system or any kind of database that a company has bought or built in order to manage their business.

Sometimes the OLTP may consist of simple flat files generated by processes running on a host. In this case the OLTP is not a real database, but we can still turn it into one by importing the flat files into, for example, a SQL Server database. Therefore, regardless of the specific medium used to store the OLTP data, we will refer to it as a database.
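
If the extracts really are just flat files, one simple way to get them into SQL Server is to load them into a staging table with BULK INSERT. The sketch below is only an illustration: the table, columns, file path, and delimiters are all assumptions about what such an extract might look like, and in a real project this step would normally be handled by the ETL tool anyway.

    -- Hypothetical staging table for a flat-file extract coming from a host system.
    -- Everything is loaded as text first; data types are enforced later during ETL.
    CREATE TABLE dbo.StagingSalesExtract
    (
        OrderNumber  varchar(20) NULL,
        OrderDate    varchar(10) NULL,
        CustomerCode varchar(20) NULL,
        Amount       varchar(20) NULL
    );

    -- Load the file as-is; the path and delimiters are assumptions about the extract format.
    BULK INSERT dbo.StagingSalesExtract
    FROM 'C:\Extracts\sales_extract.csv'
    WITH (
        FIELDTERMINATOR = ',',
        ROWTERMINATOR   = '\n',
        FIRSTROW        = 2   -- skip the header row
    );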

Some of the most important and common characteristics of an OLTP system are:

  • The OLTP system is normally a complex piece of software that handles information and transactions; from our point of view, though, we can think of it simply as a database.

    We do not normally communicate in any way with the application that manages and populates the data in the OLTP. Our job is that of exporting data from the OLTP, cleaning it, integrating it with data from other sources, and loading it into the data warehouse.

  • We cannot make any assumptions about the OLTP database's structure.

    Somebody else has built the OLTP system and is probably currently maintaining it, so its structure may change over time. We do not usually have the option of changing anything in its structure anyway, so we have to take the OLTP system "as is" even if we believe that it could be made better.

  • The OLTP may well contain data that does not conform to the general rules of relational data modeling, like foreign keys and constraints.

    Normally we will find historical data in the OLTP system that is simply incorrect; this is almost always the case. A system that has been running for years very often contains data that is wrong and will never be corrected.

    When building our BI solution we'll have to clean and fix this data, but normally it would be too expensive and disruptive to do so for old data in the OLTP system itself. A minimal example of the kind of check involved is sketched just after this list.

  • In our experience, the OLTP system is very often poorly documented. Our first task is, therefore, that of creating good documentation for the system, validating data and checking it for any inconsistencies.
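
To make this kind of data problem concrete, here is a minimal sketch of the sort of validation query we mean. The tables and columns (dbo.Orders, dbo.Customers, CustomerCode) are purely hypothetical; the point is simply that, in an OLTP database that does not enforce foreign keys, we should expect to find rows that reference keys with no match anywhere:

    -- Hypothetical OLTP tables: dbo.Orders and dbo.Customers.
    -- Find orders whose customer code has no matching row in the customer table.
    SELECT o.OrderNumber, o.CustomerCode
    FROM dbo.Orders AS o
    LEFT JOIN dbo.Customers AS c
        ON c.CustomerCode = o.CustomerCode
    WHERE c.CustomerCode IS NULL;

Any rows returned by a check like this will have to be dealt with in the ETL process, for example by mapping them to a dummy "unknown customer" record, rather than by fixing the OLTP system itself.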

The OLTP database is not built to be easily queried, and is certainly not going to be designed with Analysis Services cubes in mind. Nevertheless, a very common question is: "do we really need to build a dimensionally modeled data mart as the source for an Analysis Services cube?" The answer is a definite "yes"!

As we'll see, the structure of a data mart is very different from the structure of an OLTP database and Analysis Services is built to work on data marts, not on generic OLTP databases. The changes that need to be made when moving data from the OLTP database to the final data mart structure should be carried out by specialized ETL software, like SQL Server Integration Services, and cannot simply be handled by Analysis Services in the Data Source View.

Moreover, the OLTP database needs to stay efficient for OLTP queries, which typically read or write small chunks of data very quickly in order to support everyday operational work. If we run complex queries ranging over the whole OLTP database, as BI-style queries often do, we will create severe performance problems for the OLTP database. There are very rare situations in which data can flow directly from the OLTP system through to Analysis Services, but these are so specific that describing them is outside the scope of this book.
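
The difference between the two workloads is easy to see in a couple of hypothetical queries against an imaginary dbo.Orders table; both the table and its columns are assumptions made purely for illustration. The first query is a typical OLTP lookup that reads a handful of rows through an index; the second is a BI-style aggregation that has to scan the entire table:

    -- Typical OLTP query: fetch a single order by its key.
    -- With an index on OrderNumber this touches only a few rows.
    SELECT OrderNumber, OrderDate, CustomerCode, Amount
    FROM dbo.Orders
    WHERE OrderNumber = 'SO-2008-001234';

    -- Typical BI-style query: aggregate every order by year and customer.
    -- This scans the whole table and competes with everyday transactional work.
    SELECT YEAR(OrderDate) AS OrderYear,
           CustomerCode,
           SUM(Amount) AS TotalAmount
    FROM dbo.Orders
    GROUP BY YEAR(OrderDate), CustomerCode;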

Beware of the temptation to avoid building a data warehouse and data marts. Building an Analysis Services cube is a complex job that starts with getting the design of your data mart right. If we have a dimensional data mart, we have a database that holds dimension and fact tables where we can perform any kind of cleansing or calculation. If, on the other hand, we rely on the OLTP database, we might finish our first cube in less time but our data will be dirty, inconsistent and unreliable, and cube processing will be slow. In addition, we will not be able to create complex relational models to accommodate our users' analytical needs.

The data warehouse

We always have an OLTP system as the original source of our data but, when it comes to a data warehouse, it can be difficult to answer this apparently simple question: "Do we have a data warehouse?" The problem is not the answer, as every analyst will happily reply, "Yes, we do have a data warehouse"; the problem is in the meaning of the words "data warehouse".

There are at least two major approaches to data warehouse design and development and, consequently, to the definition of what a data warehouse is. They are described in the books of two leading authors:

  • Ralph Kimball: if we are building a Kimball data warehouse, we build fact tables and dimension tables structured as data marts. We will end up with a data warehouse composed of the sum of all the data marts.

  • Bill Inmon: if our choice is an Inmon data warehouse, then we design a (somewhat normalized) physical relational database that will hold the data warehouse. Afterwards, we produce departmental data marts, with their star schemas, populated from that relational database.

If this were a book about data warehouse methodology then we could write hundreds of pages about this topic but, luckily for the reader, the detailed differences between the Inmon and Kimball methodologies are outside the scope of this book. Readers can find out more about these methodologies in Building the Data Warehouse by Bill Inmon and The Data Warehouse Toolkit by Ralph Kimball. Both books should be present on any BI developer's bookshelf.

The difference between the two approaches is easiest to see by following the flow of data. In Kimball's bus architecture, data flows from the OLTP systems directly into the data marts.

In contrast, in Inmon's view, data coming from the OLTP systems is first stored in the enterprise data warehouse and, from there, goes to the data marts.

What is important to understand is that the simple phrase "data warehouse" has a different meaning in each of these methodologies.

We will adopt Inmon's meaning for the term "data warehouse". This is because in Inmon's methodology the data warehouse is a real database, while in Kimball's view the data warehouse is composed of integrated data marts. For the purposes of this chapter, though, what is really important is the difference between the data warehouse and the data mart, which should be the source for our cube.

The data mart

Whether you are using the Kimball or the Inmon methodology, the database that sits directly underneath the Analysis Services cube should be a data mart. A data mart is a database that is modeled according to the rules of Kimball's dimensional modeling methodology, and is composed of fact tables and dimension tables.
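
As a rough sketch of what this means in practice (every table and column name here is invented purely for illustration), a very simple sales data mart might contain a date dimension, a customer dimension, and a fact table whose keys point at them:

    -- Hypothetical dimension tables
    CREATE TABLE dbo.DimDate
    (
        DateKey      int         NOT NULL PRIMARY KEY,  -- surrogate key, for example 20081231
        CalendarDate date        NOT NULL,
        CalendarYear int         NOT NULL,
        MonthName    varchar(20) NOT NULL
    );

    CREATE TABLE dbo.DimCustomer
    (
        CustomerKey  int          NOT NULL PRIMARY KEY, -- surrogate key
        CustomerCode varchar(20)  NOT NULL,             -- business key from the OLTP system
        CustomerName varchar(100) NOT NULL
    );

    -- Hypothetical fact table: one row per sale, with measures plus dimension keys
    CREATE TABLE dbo.FactSales
    (
        DateKey       int   NOT NULL REFERENCES dbo.DimDate (DateKey),
        CustomerKey   int   NOT NULL REFERENCES dbo.DimCustomer (CustomerKey),
        SalesAmount   money NOT NULL,
        OrderQuantity int   NOT NULL
    );

Note that the dimension tables use surrogate keys rather than the business keys coming from the OLTP system, which is standard practice in Kimball-style dimensional modeling.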

As a result we'll spend a lot of time discussing data mart structure in the rest of this chapter. However, you will not learn how to build and populate a data mart from reading this chapter; the books by Kimball and Inmon we've already cited do a much better job than we ever could.