MDX with SSAS 2012 Cookbook - Second Edition

Overview of this book

MDX is the BI industry standard for multidimensional calculations and queries. Proficiency with this language is essential for the realization of your Analysis Services' full potential. MDX is an elegant and powerful language, and also has a steep learning curve.

SQL Server 2012 Analysis Services has introduced a new BISM tabular model and a new formula language, Data Analysis Expressions (DAX). However, for the multi-dimensional model, MDX is still the only query and expression language. For many product developers and report developers, MDX is the preferred language for both the tabular model and multi-dimensional model. MDX with SSAS 2012 Cookbook is a must-have book for anyone who wants to be proficient in the MDX language and to enhance their business intelligence solutions.

MDX with SSAS 2012 Cookbook is packed with immediately usable, practical solutions. It starts with elementary techniques that lay the foundation for designing advanced MDX calculations and queries. The discussions after each solution will provide you with a solid foundation and best practices. It covers a broad range of real-world topics and solutions and provides you with learning materials to become proficient in the language.

This book will guide you through the hands-on and practical MDX solutions, best practices, and many intricacies that hide within the MDX calculations and queries. We will start by working with sets, creating time-aware, context-aware calculations, and business analytics solutions, through to the techniques of enhancing the cube design when MDX is not enough. We will then move on to capturing MDX generated by SSAS front-ends and using SSAS stored procedures, and we will explore the whole range of MDX solutions for real-world BI projects.

Optimizing MDX queries using the NonEmpty() function


The NonEmpty() function is a very powerful MDX function. It is primarily used to improve query performance by reducing sets before the result is returned.

Both Customer and Date dimensions are relatively large in the Adventure Works DW 2012 database. Putting the cross product of these two dimensions on the query axis can take a long time. In this recipe, we'll show how the NonEmpty() function can be used on the Customer and Date dimensions to improve the query performance.

Getting ready

Start a new query in SSMS and make sure that you're working on the Adventure Works DW 2012 database. Then write the following query and execute it:

SELECT
    { [Measures].[Internet Sales Amount] } ON 0,
    NON EMPTY
    Filter(
        { [Customer].[Customer].[Customer].MEMBERS } *
        { [Date].[Date].[Date].MEMBERS },
        [Measures].[Internet Sales Amount] > 1000
    ) ON 1
FROM
    [Adventure Works]

The query shows the sales per customer and the dates of their purchases, and keeps only those combinations where the purchase was over 1000 USD.

On a typical server, the query takes more than a minute to return the results.

Now let's see how to improve the execution time by using the NonEmpty() function.

How to do it…

Follow these steps to improve the query performance by adding the NonEmpty() function:

  1. Wrap NonEmpty() around the cross join of customers and dates so that it becomes the first argument of that function.

  2. Use the measure on columns as the second argument of that function.

  3. This is what the MDX query should look like:

    SELECT
        { [Measures].[Internet Sales Amount] } ON 0,
        NON EMPTY
        Filter(
            NonEmpty(
                { [Customer].[Customer].[Customer].MEMBERS } *
                { [Date].[Date].[Date].MEMBERS },
                { [Measures].[Internet Sales Amount] }
            ),
            [Measures].[Internet Sales Amount] > 1000
        ) ON 1
    FROM
        [Adventure Works]

  4. Execute that query and observe the results as well as the time required for execution. The query should return the same results as before, only much faster.

How it works…

Both the Customer and Date dimensions are medium-sized dimensions. The cross product of these two dimensions contains several million combinations. We know that typically, the cube space is sparse; therefore, many of these combinations are indeed empty. The Filter() operation is not optimized to work in block mode, which means the engine has to evaluate the filter condition for every combination in the set on rows, whether that combination is empty or not.

Fortunately, the NonEmpty() function exists. This function can be used to reduce any set, especially multidimensional sets that are the result of a cross join operation. It removes the empty combinations of the two sets before the engine starts to evaluate the sets on rows. A reduced set has fewer cells to be calculated, and therefore the query runs much faster.
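
To get a feel for how sparse the cross join is, the two set sizes can be compared directly. The following query is only a sketch: the calculated measure names [All Combinations] and [Non-Empty Combinations] are made up for this illustration, and the actual numbers depend on your copy of the database.

    WITH
    -- Size of the full cross join, computed as the product of the two
    -- member counts so that the cross join itself is never materialized
    MEMBER [Measures].[All Combinations] AS
        Count( [Customer].[Customer].[Customer].MEMBERS ) *
        Count( [Date].[Date].[Date].MEMBERS )
    -- Number of customer/date combinations that have Internet sales
    MEMBER [Measures].[Non-Empty Combinations] AS
        Count(
            NonEmpty(
                { [Customer].[Customer].[Customer].MEMBERS } *
                { [Date].[Date].[Date].MEMBERS },
                { [Measures].[Internet Sales Amount] }
            )
        )
    SELECT
        { [Measures].[All Combinations],
          [Measures].[Non-Empty Combinations] } ON 0
    FROM
        [Adventure Works]

The gap between the two values is the number of empty combinations that Filter() no longer has to visit once NonEmpty() is applied first.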

There's more…

Regardless of the benefits that were shown in this recipe, NonEmpty() should be used with caution. Here are some good practices regarding the NonEmpty() function:

  • Use it with sets, such as named sets and axes.

  • Use it inside functions that are not optimized to work in block mode, such as the Filter() function.

  • Avoid using it in aggregate functions such as Sum().

  • Avoid using it in other MDX set functions that are optimized to work in block mode. The use of NonEmpty() inside optimized functions will prevent them from evaluating the set in block mode. This is because the set will not be compact once it passes the NonEmpty() function. The function will break it into many small non-empty chunks, and each of these chunks will have to be evaluated separately. This will inevitably increase the duration of the query. In such cases, it is better to leave the original set intact, no matter its size. The engine will know how to run over it in optimized mode.
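
The practices above can be sketched in a single query. The names [Customers With Sales] and [Sales Over All Customers] are hypothetical and exist only for this illustration: the named set uses NonEmpty(), while the set passed to Sum() is deliberately left intact.

    WITH
    -- Named set: NonEmpty() reduces the set before it is placed on the axis
    SET [Customers With Sales] AS
        NonEmpty(
            { [Customer].[Customer].[Customer].MEMBERS },
            { [Measures].[Internet Sales Amount] }
        )
    -- Aggregate: no NonEmpty() here, so the set passed to Sum() stays compact
    MEMBER [Measures].[Sales Over All Customers] AS
        Sum(
            { [Customer].[Customer].[Customer].MEMBERS },
            [Measures].[Internet Sales Amount]
        )
    SELECT
        { [Measures].[Internet Sales Amount],
          [Measures].[Sales Over All Customers] } ON 0,
        [Customers With Sales] ON 1
    FROM
        [Adventure Works]

Wrapping the set inside Sum() with NonEmpty() would return the same total, but, as explained in the last point, it would likely prevent the engine from evaluating the aggregate in block mode.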

NonEmpty() versus NON EMPTY

Both the NonEmpty() function and the NON EMPTY keyword can reduce sets, but they do it in a different way.

The NON EMPTY keyword removes empty rows, columns, or both, depending on the axis on which that keyword is used in the query. To do that, the NON EMPTY operator tries to push the evaluation of cells to an early stage whenever possible. This way the set on the axis is already reduced and the final result is returned faster.

Take a look at the initial query in this recipe, remove the Filter() function, run the query, and notice how quickly the results come, although the multidimensional set again counts millions of tuples. The trick is that the NON EMPTY operator uses the set on the opposite axis, the columns, to reduce the set on rows. Therefore, it can be said that NON EMPTY is highly dependent on members on axes and their values in columns and rows.
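
For reference, this is what the initial query looks like once the Filter() function is removed, leaving only the NON EMPTY keyword to reduce the rows:

    SELECT
        { [Measures].[Internet Sales Amount] } ON 0,
        NON EMPTY
        { [Customer].[Customer].[Customer].MEMBERS } *
        { [Date].[Date].[Date].MEMBERS } ON 1
    FROM
        [Adventure Works]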

Unlike the NON EMPTY operator, which can be used only on axes, the NonEmpty() function can be used anywhere in the query.

The NonEmpty() function removes from its first set every tuple that is empty for all of the tuples, typically measures, in the second set. If the second set is not specified, the function is evaluated in the context of the current coordinate, which includes the current measure.

In other words, the NonEmpty() function is highly dependent on members in the second set, the slicer, or the current coordinate, in general.

Common mistakes and useful tips

If the second set in the NonEmpty() function is not provided, the expression is evaluated in the context of the current measure and the current members of the attribute hierarchies, both as they are at the moment of evaluation. In other words, if you're defining a calculated measure and you forget to include a measure in the second set, the expression is evaluated for that same calculated measure, which leads to null, the default initial value of every measure. If you're simply evaluating the set on the axis, it will be evaluated in the context of the current measure, which is either the default measure in the cube or the one provided in the slicer. Again, this is perhaps not something you expected. In order to prevent these problems, always include a measure in the second set.
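
A minimal sketch of the recommended form follows. The measure is named explicitly in the second argument, so the set of customers on rows does not depend on the default measure of the cube or on a measure in the slicer:

    SELECT
        { [Measures].[Internet Sales Amount] } ON 0,
        NonEmpty(
            { [Customer].[Customer].[Customer].MEMBERS },
            -- explicit second set: emptiness is tested against this measure
            { [Measures].[Internet Sales Amount] }
        ) ON 1
    FROM
        [Adventure Works]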

NonEmpty() reduces sets, just like a few other constructs do, namely the Filter() function and the EXISTING keyword. But what's special about NonEmpty() is that it reduces sets extremely efficiently and quickly. Because of that, there are some rules about where to position NonEmpty() in calculations made by the composition of MDX functions (one function wrapping the other). If we're trying to detect multi-select, that is, multiple members in the slicer, NonEmpty() should go inside, with the EXISTING keyword outside. The reason is that although they both shrink sets efficiently, NonEmpty() works best when the set is intact, whereas EXISTING is not affected by the order of members or the compactness of the set. Therefore, NonEmpty() should be applied first.
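
The nesting order can be sketched as follows. The calculated measure name [Customers With Sales] is hypothetical, and the query shows only the composition order (NonEmpty() inside, EXISTING outside), not a complete multi-select solution:

    WITH
    MEMBER [Measures].[Customers With Sales] AS
        Count(
            -- EXISTING is applied last, on the already reduced set
            EXISTING
            -- NonEmpty() is applied first, on the intact set of customers
            NonEmpty(
                { [Customer].[Customer].[Customer].MEMBERS },
                { [Measures].[Internet Sales Amount] }
            )
        )
    SELECT
        { [Measures].[Customers With Sales] } ON 0
    FROM
        [Adventure Works]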

You may get System.OutOfMemory errors if you use the CrossJoin() operation on many large hierarchies because the cross join generates a Cartesian product of those hierarchies. In that case, consider using NonEmpty() to reduce the space to a smaller subcube. Also, don't forget to group the hierarchies by their dimension inside the cross join.
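
As an illustration of both tips, the following sketch keeps the two hierarchies of the Customer dimension next to each other inside the cross join and wraps the whole cross join in NonEmpty(). The choice of the Country, Gender, and Calendar Year attribute hierarchies is an assumption made for this example; substitute the hierarchies your query needs.

    SELECT
        { [Measures].[Internet Sales Amount] } ON 0,
        NonEmpty(
            -- hierarchies of the Customer dimension are kept together
            { [Customer].[Country].[Country].MEMBERS } *
            { [Customer].[Gender].[Gender].MEMBERS } *
            -- followed by the hierarchy of the Date dimension
            { [Date].[Calendar Year].[Calendar Year].MEMBERS },
            { [Measures].[Internet Sales Amount] }
        ) ON 1
    FROM
        [Adventure Works]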