QlikView: Advanced Data Visualization

QlikView: Advanced Data Visualization

By : Miguel Angel Garcia, Barry Harmsen, Stephen Redmond, Karl Pover

Buy this Book

QlikView: Advanced Data Visualization

By: Miguel Angel Garcia, Barry Harmsen, Stephen Redmond, Karl Pover

Buy this Book

Overview of this book

QlikView is one of the most flexible and powerful business intelligence platforms around, and if you want to transform data into insights, it is one of the best options you have at hand. Use this Learning Path, to explore the many features of QlikView to realize the potential of your data and present it as impactful and engaging visualizations. Each chapter in this Learning Path starts with an understanding of a business requirement and its associated data model and then helps you create insightful analysis and data visualizations around it. You will look at problems that you might encounter while visualizing complex data insights using QlikView, and learn how to troubleshoot these and other not-so-common errors. This Learning Path contains real-world examples from a variety of business domains, such as sales, finance, marketing, and human resources. With all the knowledge that you gain from this Learning Path, you will have all the experience you need to implement your next QlikView project like a pro. This Learning Path includes content from the following Packt products: • QlikView for Developers by Miguel Ángel García, Barry Harmsen • Mastering QlikView by Stephen Redmond • Mastering QlikView Data Visualization by Karl Pover

QlikView: Advanced Data Visualization

Contributors

Preface

Free Chapter

Performance Tuning and Scalability

Reviewing basic performance tuning techniques

Generating test data

Understanding how QlikView stores its data

Strategies to reduce the data size and improve performance

Testing chart performance for different load options

Optimizing chart calculation times

The QlikView calculation engine

Using Direct Discovery

Testing scalability with JMeter

Running an example execution

Summary

QlikView Data Modeling

Reviewing basic data modeling

Dimensional data modeling

Handling slowly changing dimensions

Dealing with multiple fact tables in one model

Drilling across with document chaining

Summary

Best Practices for Loading Data

Reviewing data loading concepts

Understanding why you should use an ETL approach

Using an ETL approach to create QVD data layers

Mastering loading techniques

Using QlikView Expressor for ETL

Summary

Advanced Expressions

Reviewing basic concepts

Understanding Dollar-sign Expansion

Using advanced Set Analysis

Calculating vertically

Summary

Advanced Scripting

Reviewing the basic concepts

Counting records

Loading data quickly

Applying variables and the Dollar-sign Expansion in the script

Using control structures

Examining advanced Table File Wizard options

Looking at data from different directions

Reusing code

Summary

What's New in QlikView 12?

Common QIX Engine

What is new in the Qlik product portfolio?

How do the products in the Qlik product portfolio fit together?

Summary

Styling Up

Design requirements

The Document Properties window

The Sheet Properties dialog

Setting the object properties

Working with listboxes

The Multi Box

The Current Selections Box

Adding a Bookmark Object

Aligning and resizing sheet objects

Creating and applying a default color map

Summary

Building Dashboards

User types

Applying the DAR principle to Airline Operations

Creating the Analysis sheet

Creating the new Dashboard sheet

Creating the Reports sheet

Other charts

Summary

Advanced Data Transformation

Data architecture

Loading data already stored in QlikView

Dealing with slowly changing dimensions

Ordering, peeking, and matching all at once

Incremental loads

Summary

Security

Hidden script

Section access

Document-level security

Sheet-level security

Summary

Data Visualization Strategy

Data exploration, visualization, and discovery

Data teams and roles

Agile development

QlikView Deployment Framework

Summary

Sales Perspective

Sales perspective data model

Customer stratification.

Customer churn

QlikView extensions and the cycle plot

Governance – design template

Summary

Financial Perspective

Financial perspective data model

Summary

Marketing Perspective

Marketing data model

Customer profiling

Summary

Working Capital Perspective

Working capital data model

Rotation and average days

Working capital breakdown

Customer stratification

Summary

Operations Perspective

Operations data model

On-Time and In-Full

Planning in QlikView with KliqPlan

Summary

Human Resources

Human resources data model

Personnel productivity

Personnel productivity breakdown

Personal behavior analysis

Summary

Fact Sheets

Customer fact sheet consolidated data model

Customer Fact sheet Agile design

Customer Fact sheet advanced components

Customizing the QlikView User Experience

Summary

Balanced Scorecard

The Balanced Scorecard method

The Balanced Scorecard consolidated data model

The Balanced Scorecard information dashboard design

Creating the filter pane bubble

Measuring success with XmR charts

Summary

Troubleshooting Analysis

Troubleshooting preparation and resources

Reporting issues

Common QlikView application issues

Summary

Mastering Qlik Sense Data Visualization

Qlik Sense and QlikView developers

Visualization extension examples for cross-selling

Plan to master Qlik Sense data visualization

Summary

Index

Customer Reviews

5 star

4 star

3 star

2 star

1 star

Testing chart performance for different load options

As well as information about memory use in each data table and symbol table, we can recall that the Memory Statistics option will also export information about charts—both memory use and calculation time. This means that we can create a chart, especially one with multiple dimensions and expressions, and see how long the chart takes to calculate for different scenarios.

Let's load the Order Header and Order Line data with the Calendar information loaded inline (as in the first part of the last example) in the following manner:

Order:
LOAD AutoNumber(OrderID, 'Order') As OrderID, 
     Floor(OrderDate) As DateID, 
     Year(OrderDate) As Year,
     Month(OrderDate) As Month,
     Day(OrderDate) As Day,
     Date(MonthStart(OrderDate), 'YYYY-MM') As YearMonth,
     AutoNumber(CustomerID, 'Customer') As CustomerID, 
     AutoNumber(EmployeeID, 'Employee') As EmployeeID, 
     Freight
FROM
[..\Scripts\OrderHeader.qvd]
(qvd);


OrderLine:
LOAD AutoNumber(OrderID, 'Order') As OrderID, 
     LineNo, 
     ProductID, 
     Quantity, 
     SalesPrice, 
     SalesCost, 
     LineValue, 
     LineCost
FROM
[..\Scripts\OrderLine.qvd]
(qvd);

Now we can add a chart to the document with several dimensions and expressions like this:

We have used YearMonth and CustomerID as dimensions. This is deliberate because these two fields will be in separate tables once we move the calendar fields into a separate table.

The expressions that we have used are shown in the following table:

Expression Label	Expression
`Sales $`	`Sum(LineValue)`
`Sales $ Color`	`ColorMix1(Sum(LineValue)/Max(total Aggr(Sum(LineValue), YearMonth, CustomerID)), White(), ARGB(255, 0, 128, 255))`
`Cost $`	`Sum(LineCost)`
`Margin $`	`Sum(LineValue)-Sum(LineCost)`
`Margin %`	`(Sum(LineValue)-Sum(LineCost))/Sum(LineValue)`
`Cum. Sales $`	`RangeSum(Above(Sum(LineValue),0,RowNo()))`
`# Orders`	`Count(DISTINCT OrderID)`
`Product 101`	`Sum(If(ProductID=101,1,0))`
`Product 102-106`	`Sum(If(Match(ProductID,102,103,104,105,106), 1, 0))`

Turning the cache off

The cache in QlikView is enormously important. Calculations and selections are cached as you work with a QlikView document. The next time you open a chart with the same selections, the chart will not be recalculated; you will get the cached answer instead. This really speeds up QlikView performance. Even within a chart, you might have multiple expressions using the same calculation (such as dividing two expressions by each other to obtain a ratio)—the results will make use of caching.

This caching is really useful for a working document, but a pain if we want to gather statistics on one or more charts. With the cache on, we need to close a document and the QlikView desktop, reopen the document in a new QlikView instance, and open the chart. To help us test the chart performance, it can therefore be a good idea to turn off the cache.

Note

Note that you need to be very careful with this dialog as you could break things in your QlikView installation. Turning off the cache is not recommended for normal use of the QlikView desktop as it can seriously interfere with the performance of QlikView. Turning off the cache to gather accurate statistics on chart performance is pretty much the only use case that one might ever come across for turning off the cache. There is a reason why it is a hidden setting!

Examining the chart calculation time for different scenarios

Now that the cache is turned off, we can open our chart and it will always calculate at the maximum time. We can then export the memory information as usual and load it into another copy of QlikView (here, the Class of Sheetobject is selected):

What we could do now is make some selections and save them as bookmarks. By closing the QlikView desktop client and then reopening it, and then opening the document and running through the bookmarks, we can export the memory file and create a calculation for Avg Calc Time. Because there is no cache involved, this should be a valid representation.

Now, we can comment out the inline calendar and create the Calendar table (as we did in a previous exercise):

Order:
LOAD AutoNumber(OrderID, 'Order') As OrderID, 
     Floor(OrderDate) As DateID, 
//     Year(OrderDate) As Year,
//     Month(OrderDate) As Month,
//     Day(OrderDate) As Day,
//     Date(MonthStart(OrderDate), 'YYYY-MM') As YearMonth,
     AutoNumber(CustomerID, 'Customer') As CustomerID, 
     AutoNumber(EmployeeID, 'Employee') As EmployeeID, 
     Freight
FROM
[..\Scripts\OrderHeader.qvd]
(qvd);


OrderLine:
//Left Join (Order)
LOAD AutoNumber(OrderID, 'Order') As OrderID, 
     LineNo, 
     ProductID, 
     Quantity, 
     SalesPrice, 
     SalesCost, 
     LineValue, 
     LineCost
FROM
[..\Scripts\OrderLine.qvd]
(qvd);

//exit Script;

Calendar:
Load Distinct
    DateID,
     Year(DateID) As Year,
     Month(DateID) As Month,
     Day(DateID) As Day,
     Date(MonthStart(DateID), 'YYYY-MM') As YearMonth
Resident
   Order;

For the dataset size that we are using, we should see no difference in calculation time between the two data structures. As previously established, the second option has a smaller in-memory data size, so that would always be the preferred option.

Optimizing performance by creating counter fields

For many years, it has been a well-established fact among QlikView consultants that a Count() function with a Distinct clause is a very expensive calculation. Over the years, I have heard that Count can be up to 1000 times more expensive than Sum. Actually, since about Version 9 of QlikView, this is no longer true, and the Count function is a lot more efficient.

Note

See Henric Cronström's blog entry at http://community.qlik.com/blogs/qlikviewdesignblog/2013/10/22/a-myth-about-countdistinct for more information.

Count is still a more expensive operation, and the recommended solution is to create a counter field in the table that you wish to count, which has a value of 1. You can then sum this counter field to get the count of rows. This field can also be useful in advanced expressions like Set Analysis.

Using the same dataset as in the previous example, if we create a chart using similar dimensions (YearMonth and CustomerID) and the same expression for Order # as done previously:

Count(Distinct OrderID)

This gives us a chart like the following:

After running through the same bookmarks that we created earlier, we get a set of results like the following:

So, now we modify the Order table load as follows:

Order:
LOAD AutoNumber(OrderID, 'Order') As OrderID, 
     1 As OrderCounter,
     Floor(OrderDate) As DateID, 
     AutoNumber(CustomerID, 'Customer') As CustomerID, 
     AutoNumber(EmployeeID, 'Employee') As EmployeeID, 
     Freight
FROM
[..\Scripts\OrderHeader.qvd]
(qvd);

Once we reload, we can modify the expression for Order # to the following:

Sum(OrderCounter)

We close down the document, reopen it, and run through the bookmarks again. This is an example result:

And yes, we do see that there is an improvement in calculation time—it appears to be a factor of about twice as fast.

The amount of additional memory needed for this field is actually minimal. In the way we have loaded it previously, the OrderCounter field will add only a small amount in the symbol table and will only increase the size of the data table by a very small amount—it may, in fact, appear not to increase it at all! The only increase is in the core system tables, and this is minor.

Note

Recalling that data tables are bit-stuffed but stored as bytes, adding a one-bit value like this to the data table may not actually increase the number of bytes needed to store the value. At worst, only one additional byte will be needed.

In fact, we can reduce this minor change even further by making the following change:

     ...
     Floor(1) As OrderCounter,
     ...

This forces the single value to be treated as a sequential integer (a sequence of one) and the value therefore isn't stored in the symbol table.

Optimizing performance by combining fact tables?

If we load all of our tables, the data structure may look something like the following:

In this format, we have two fact tables—Order and OrderLine. For the small dataset that we have, we won't see any issues here. As the dataset gets larger, it is suggested that it is better to have fewer tables and fewer joins between tables. In this case, between Product and Employee, there are three joins. The best practice is to have only one fact table containing all our key fields and associated facts (measures).

In this model, most of the facts are in the OrderLine table, but there are two facts in the Order table—OrderCounter and Freight. We need to think about what we do with them. There are two options:

Move the EmployeeID, DateID, and CustomerID fields from the Order table into the OrderLine table. Create a script based on an agreed business rule (for example, ratio of line Quantity) to apportion the Freight value across all of the line values. The OrderCounter field is more difficult to deal with, but we could take the option of using Count(Distinct OrderID) (knowing that it is less efficient) in the front end and disposing of the OrderCounter field.
This method is more in line with traditional data warehousing methods.
Move the EmployeeID, DateID, and CustomerID fields from the Order table into the OrderLine table. Leave the Order table as is, as an Order dimension table.
This is more of a QlikView way of doing things. It works very well too.

Although we might be great fans of dimensional modeling methods (see Chapter 2, QlikView Data Modeling), we should also be a big fan of pragmatism and using what works.

Let's see what happens if we go for option 2. The following is the addition to the script to move the key fields:

// Move DateID, CustomerID and EmployeeID to OrderLine
Join (OrderLine)
Load
   OrderID,
   DateID,
   CustomerID,
   EmployeeID
Resident
   Order;
   
Drop Fields DateID, CustomerID, EmployeeID From Order;

// Rename the OrderLine table
RENAME Table OrderLine to Fact;

So, how has that worked? The table structure now looks like the following:

Our expectation, as we have widened the biggest data table (OrderLine) and only narrowed a smaller table (Order), is that the total memory for the document will be increased. This is confirmed by taking memory snapshots before and after the change:

But have we improved the overall performance of the document?

To test this, we can create a new version of our original chart, except now using Customer instead of CustomerID and adding Product. This gives us fields (YearMonth, Customer, and Product) from across the dimension tables. If we use this new straight table to test the before and after state, the following is how the results might look:

Interestingly, the average calculation has reduced slightly. This is not unexpected as we have reduced the number of joins needed across data tables.

Optimizing your numbers

QlikView has a great feature in that it can sometimes default to storing numbers as Dual values—the number along with text representing the default presentation of that number. This text is derived either by applying the default formats during load, or by the developer applying formats using functions such as Num(), Date(), Money(), or TimeStamp(). If you do apply the format functions with a format string (as the second parameter to Num, Date, and so on), the number will be stored as a Dual. If you use Num without a format string, the number will usually be stored without the text.

Thinking about it, numbers that represent facts (measures) in our fact tables will rarely need to be displayed with their default formats. They are almost always only ever going to be displayed in an aggregation in a chart and that aggregated value will have its own format. The text part is therefore superfluous and can be removed if it is there.

Let's modify our script in the following manner:

Order:
LOAD AutoNumber(OrderID, 'Order') As OrderID, 
     Floor(1) As OrderCounter,
     Floor(OrderDate) As DateID, 
     AutoNumber(CustomerID, 'Customer') As CustomerID, 
     AutoNumber(EmployeeID, 'Employee') As EmployeeID, 
     Num(Freight) As Freight
FROM
[..\Scripts\OrderHeader.qvd]
(qvd);


OrderLine:
LOAD AutoNumber(OrderID, 'Order') As OrderID, 
     LineNo, 
     ProductID, 
     Num(Quantity) As Quantity, 
     Num(SalesPrice) As SalesPrice, 
     Num(SalesCost) As SalesCost, 
     Num(LineValue) As LineValue, 
     Num(LineCost) As LineCost
FROM
[..\Scripts\OrderLine.qvd]
(qvd);

The change in memory looks like the following:

We can see that there is a significant difference in the Freight field. The smaller SalesPrice field has also been reduced. However, the other numeric fields are not changed.

Some numbers have additional format strings and take up a lot of space, some don't. Looking at the numbers, we can see that the Freight value with the format string is taking up an average of over 18 bytes per value. When Num is applied, only 8 bytes are taken per value. Let's add an additional expression to the chart:

Expression label	Expression
`Avg. Bytes`	`Sum(Bytes)/Sum(Count)`

Now we have a quick indicator to see whether numeric values are storing unneeded text.

QlikView: Advanced Data Visualization

By : Miguel Angel Garcia, Barry Harmsen, Stephen Redmond, Karl Pover

QlikView: Advanced Data Visualization

By: Miguel Angel Garcia, Barry Harmsen, Stephen Redmond, Karl Pover

Overview of this book

Related Content you might be interested in

Current Title:

QlikView: Advanced Data Visualization

Testing chart performance for different load options

Turning the cache off

Note

Examining the chart calculation time for different scenarios

Optimizing performance by creating counter fields

Note

Note

Optimizing performance by combining fact tables?

Optimizing your numbers