Qlik Sense: Advanced Data Visualization for Your Organization

By: Henric Cronström, Ferran Garcia Pagans, Neeraj Kharpate, James Richardson, Philip Hand

Chapter 6. Building Qlik Sense® Data Models

In the previous chapters, we looked at how to define which data to load, but not at the considerations involved in how to load and link the different tables.

So, in this chapter, we will discuss the following topics:

  • Data modeling
  • How a data model reflects business processes

The QIX engine

The QIX engine is the core of the logic in Qlik Sense. All evaluations and calculations are made by this engine.

Every user selection implies a new logical situation: a different set of field values becomes possible, and new summations need to be made, so the charts and the KPIs show different values than before. Everything needs to be recalculated, and the data model defines how this is done.

When the user makes a selection, Qlik Sense first evaluates which values of the other fields are possible and which rows are possible. In this evaluation, the key fields are used to propagate the selection to the other tables. This is the logical inference. The second step is to calculate all formulas based on the possible rows. In this step, all aggregations are made.

The data model defines how the QIX engine's logical inference and calculations will be made.

The Qlik Sense® data model

Data used in Qlik Sense needs to be in a tabular form, very much like a table in Excel. A column in the table is referred to as a field and every row is a record.

When data is loaded into Qlik Sense, it is stored in the QIX internal database. In the simplest case, the data is just one single table. However, more commonly, the data model consists of several tables with links between them. These define how the different tables relate to each other. It is, hence, a relational model.

In the previous chapter, we saw an example where four tables were used: Customers, Orders, Order lines, and Products:

[Figure: A simple data model made from four tables]

This is in fact the core of a very common business application—a sales analysis based on the registered orders.

The structure is not a coincidence; it looks the way it does because it reflects the real business processes. The relations these four entities have in reality dictate the data model:

  • A customer may over time place several orders. Hence, customers and orders should be stored in different tables, and the customer ID should be stored in the Orders table.
  • An order may contain several order lines. Hence, orders and order lines should be stored in different tables, and the order ID should be stored in the Order lines table.
  • Several different order lines can refer to the same product. Hence, products and order lines should be stored in different tables, and the product ID should be stored in the Order lines table.

You could add a number of additional tables, and for each new table you will need to ask yourself how it relates to the existing tables. The new table could be a list of countries to which the customers belong, of product groups, of shippers, of suppliers, of invoices, and so on. The business processes define whether a customer can belong to more than one country, whether a product can belong to more than one product group, and whether a single invoice can refer to more than one order.

Hence, you should look at the real-life entities and their relationships to understand how they should be modeled in the Qlik Sense data model.

Creating a multitable data model

Loading several tables is technically just a matter of using several Load or Select statements in the script. Joins, in the way a database manager knows them from SQL, are usually not needed.

Normally, each Load or Select statement creates one table in the data model. Hence, if you want to load four tables, you should have four different Load or Select statements, each defining the appropriate table.
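
As a minimal sketch of this rule, the four-table model from the earlier example could be loaded with four statements like the following. The Inline data is made up purely for illustration; in practice, the sources would typically be database tables or files:

// Each Load statement creates one table in the data model.
Customers:
Load * Inline [
CustomerID, CustomerName
1, Acme
2, Globex
];

Orders:
Load * Inline [
OrderID, CustomerID, OrderDate
101, 1, 2024-01-15
102, 2, 2024-01-16
];

// The key fields (CustomerID, OrderID, ProductID) link the tables.
OrderLines:
Load * Inline [
OrderID, LineNo, ProductID, Quantity
101, 1, 501, 3
101, 2, 502, 1
102, 1, 501, 2
];

Products:
Load * Inline [
ProductID, ProductName
501, Widget
502, Gadget
];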

There are, however, some exceptions to this rule:

  • If a loaded table contains exactly the same set of fields as a previously loaded table, the new table will not be created as a separate table. Instead, the loaded records will be appended to the existing table, which most likely is what you want. This way you can add more data to an existing table.
  • If the Load or Select statement is preceded by the Concatenate or Join keyword, the loaded table will be merged with an existing data table. The Concatenate keyword is used if you want to add records to the table, just like in the previous bullet, but where the two tables have slightly different sets of fields. The Join keyword is used to merge the two tables into one based on their common fields, the same as a JOIN in SQL. Both keywords are shown in the sketch after this list.
  • If the Load or Select statement is preceded by the Generic keyword, the loaded table will be transformed into several data tables. This is a keyword you need to use if your table is a generic database, that is, if the second to last column contains an attribute name and the last column contains the attribute value.
  • If the Load or Select statement is preceded by the Mapping keyword, the loaded table will not be used as a normal data table. Instead, it becomes a mapping table for use with constructs such as the ApplyMap() function.
  • A previously loaded table can be deleted using the Drop command. This is especially useful if you make many transformations and need temporary tables.
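
As a small, hedged illustration of two of these keywords (all table names are made up), the following sketch appends records with a slightly different field set to an existing table and then drops a temporary table:

Sales:
Load * Inline [
OrderID, Amount
101, 250
];

// Concatenate appends the records even though the field sets differ;
// the missing Discount value in the first table becomes null.
Concatenate (Sales)
Load * Inline [
OrderID, Amount, Discount
102, 300, 10
];

TempCalc:
Load * Inline [
Dummy
1
];

// Remove the temporary table once it has served its purpose.
Drop Table TempCalc;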

Linking tables

When loading multiple tables, the links between the tables are defined by the field names. At the end of the script run, the loaded tables are evaluated: if the same field name is found in more than one table, this field is considered a key that links the tables. This is how the data model is created. The script evaluation logic is identical in Qlik Sense and QlikView.

This means you need to make sure that the fields you want to use as links between the different tables, the key fields, are named the same in all tables you want to link. You can do this using the Profiling dialog (refer to the previous chapter) or you can rename them yourself by editing the script.

For instance, if the key is called CustomerNo in one table and CustomerID in the other, you could rename them like this:

Load CustomerNo as CustomerID, … From Table1 … ;
Load CustomerID, … From Table2 … ;

It is also important to make sure that you don't have any unwanted links. For example, you may have a field called Description in two different tables. This is of course not a key, but just a label that in one table may be a product description and in another a customer description. Here, too, you need to rename the fields:

Load Description as Description1, … From Table1 … ;
Load Description as Description2, … From Table2 … ;

The goal is to create a script that defines a logical, coherent data model that corresponds to the business situation.

Structuring your data

In a database, there are rules about where different entities are stored. For instance, everything about the customers should be stored in the Customers table. A customer identifier is stored in this table, which means that the necessary data can be retrieved by a simple lookup in the Customers table. So, if you need to refer to a customer from another table, you just store this identifier in the other table.

Normalization

The identifier needs to be unique in the Customers table, so that every record can be addressed. Here the field is called a primary key. In other tables, duplicates may exist. For example, several records in the Orders table might refer to the same customer. If this is the case, the key is called a foreign key.

A consequence of this setup is that a database can have many tables, often thousands. One table for customers, one for orders, one for order lines, one for products, one for product groups, one for shippers, one for invoices, and so on. Each table contains one entity type, and each instance of this entity has exactly one row in the table—one record.

In the customer example, it means that each customer is stored in one row only, and all the attributes of this customer are stored in the different columns of this row. This is called normalization.

The main goal with normalization is to avoid redundancy. In a transactional database, it is important that a piece of information is stored in only one place. Not only does it take less space, it also speeds up the update process and minimizes errors. You can imagine the confusion if, for instance, a customer address is stored in several places and the different occurrences contain different addresses. Which version should you trust?

So, the source data is often highly normalized. But does this mean that the Qlik Sense data model should be normalized too?

Yes and No.

The Qlik Sense data model does not need to be as normalized as the source data. Redundancy is not a problem, and if duplicating data improves the user experience, you should definitely consider using duplicate data. The data model should be optimized for user experience and performance, not for minimal size.

But some normalization has great advantages—structuring the data in entities (tables) simplifies the data modeling work. It also makes maintenance simpler, since a data model and a script can be understood by a developer who has never seen it before. Finally, the QIX engine works better with a normalized model. It is easier to make it calculate numbers correctly and avoid double counting, which is a real problem when you start to de-normalize.

So the bottom line is that you should have some normalization, but it does not need to be as strict as in the source data.

The main case in which you need to de-normalize is if you use the same entity in different places in the data model.

For example, you may use a table listing external organizations in the context of supplier, shipper, as well as customer. This means the Organization table is used in different roles. In such a case, you should load the organization table three times: first as a supplier, then as a shipper, and finally as a customer, linking to the three different foreign keys.
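
A minimal sketch of this, assuming the organizations are stored in a QVD file named Organizations.qvd (the file and field names are illustrative), loads the table once per role and renames the key to match the corresponding foreign key:

// The same organization list is loaded three times, once per role.
Suppliers:
Load OrgID as SupplierID, OrgName as SupplierName
From Organizations.qvd (qvd);

Shippers:
Load OrgID as ShipperID, OrgName as ShipperName
From Organizations.qvd (qvd);

Customers:
Load OrgID as CustomerID, OrgName as CustomerName
From Organizations.qvd (qvd);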

Another common situation is that you have several dates in your data model: OrderDate, RequiredDate, ShippingDate, and InvoiceDate. In other words, the date is used in different roles. Also, here you should load the dimension—the calendar—several times, once per date field.
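
Similarly, and again with illustrative file and field names, a shared calendar can be loaded once per date role:

// One calendar copy per date field; each copy links on its own key.
OrderCalendar:
Load [Date] as OrderDate, [Month] as OrderMonth, [Year] as OrderYear
From Calendar.qvd (qvd);

InvoiceCalendar:
Load [Date] as InvoiceDate, [Month] as InvoiceMonth, [Year] as InvoiceYear
From Calendar.qvd (qvd);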

Another reason to de-normalize is for optimization purposes. One of the cases would be if you have several very large tables linked to each other, for example, if you have an order headers table as well as an order lines table, and both are large (millions of records). From a performance perspective, this is not optimal. The QIX engine will need more CPU cycles than if the information of the two tables had been stored in one single table. So, you might want to join both the tables for performance reasons.

A small word of warning though: joining tables is not always safe. The operation may cause one record to be spread over several records, and if the record holds a number, a summation made by the QIX engine will count that number more than once, leading to an erroneous result. In the case of order headers and order lines, you know that an order line can belong to only one order header, so the numbers in the order lines table will not be duplicated. Hence, it is safe to join here.

However, if you have a number in the orders table, it will be duplicated. But luckily, this table rarely contains any numbers.
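
A hedged sketch of such a join (table and field names assumed): the order header fields are joined onto the order lines so that the QIX engine works with a single table:

OrderLines:
Load OrderID, LineNo, ProductID, Quantity, Amount
From OrderLines.qvd (qvd);

// Each order line matches exactly one header, so the header fields are
// repeated per line, but no line-level numbers are duplicated.
Join (OrderLines)
Load OrderID, CustomerID, OrderDate
From OrderHeaders.qvd (qvd);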

Star schema and snowflake schema

The normalization is usually quite different in the source data model and in the analytical model. For the source data, one often talks about transaction tables and master tables. The transaction tables are the ones that store orders, invoices, and other transactions. In these, new records are typically added every hour, minute, or even second.

This is very different from the master tables, where new records are added much more rarely: Products, Customers, and the Calendar are typical master tables.

Master tables are often used for many purposes and are usually linked to several transaction tables, which makes the data model look as if it has circular references. This is, however, not a problem, since every link means a separate lookup in the master table.

For example, in the following relational model, the Organizations table is linked to the transactional data through three keys: Customer, Shipped by, and Supplied by. This means that a specific shipment item can be linked to several organizations—one customer, one shipper, and one supplier:

[Figure: The Organizations table linked to the transactional data through the Customer, Shipped by, and Supplied by keys]

In an analytical model, the tables are used in a different way. The transactions are joined into one single fact table, and the master tables are loaded as dimensions. The reason for this is partly historical. Older hypercube tools could not work unless all metrics were collected into a fact table. In addition, they could use hierarchical dimensions surrounding the fact table. The model will then look like a star; hence the name star schema:

[Figure: A star schema]

This model has only one layer of dimensions—all the tables are directly linked to the fact table. But if the model instead uses dimensions in two or more levels, the model is called a snowflake schema:

[Figure: A snowflake schema]

For Qlik Sense, the difference is minimal. All three data models can be used—provided that master tables used in several roles are also loaded several times. It is also possible to have metrics in any table, not just the fact table.

A star schema is, however, both simple and efficient, so we strongly recommend using this as an initial model for your data. It is then easy to build further and make it more complex.

Pitfalls in the data model

When you create your data model, you should look out for two potential problems: synthetic keys and circular references.

Synthetic keys are automatically created if two tables are linked by more than one common field. They are not bad in themselves, but they can be a warning sign of a bad data model. If you have made a mistake when loading the data, the first sign is often one or more synthetic keys. You then need to go back and investigate why this has happened:

[Figure: A synthetic key, shown the way the data was loaded]

The preceding diagram shows the synthetic key displayed the way you loaded the data. This model is correct and you don't need to change it. Internally, however, the key is stored differently, as the next diagram shows. In the data model viewer, you can toggle between these two views:

[Figure: The internal storage of the synthetic key]

Qlik Sense will, at the end of the script run, warn you about these potential problems, as shown in the following screenshot:

[Screenshot: The synthetic key warning at the end of the script run]

Usually, you do not want synthetic keys—you want one single key in each table link. However, if you know that you have multiple keys linking two tables, and that this is the way you want your data model, then there is no problem in having synthetic keys.
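
As an illustration (all names made up), the following load creates a synthetic key, since the two tables share both OrderID and CustomerID. If the double link is unintended, removing or renaming one of the shared fields resolves it:

// Two shared fields: Qlik Sense builds a synthetic key ($Syn) from
// the combination of OrderID and CustomerID.
Orders:
Load OrderID, CustomerID, OrderDate
From Orders.qvd (qvd);

Deliveries:
Load OrderID, CustomerID, DeliveryDate
From Deliveries.qvd (qvd);

// One possible fix: link on OrderID only. The customer can still be
// reached through the Orders table.
// Deliveries:
// Load OrderID, DeliveryDate From Deliveries.qvd (qvd);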

The second potential problem is circular references. This happens if you load data in such a way that the links between the tables form a loop. The following diagram is a typical example:

[Figure: Table links forming a loop]

From a data modeling perspective, a circular reference is an error, not just a warning, and you will get an error message at the end of the script run:

[Screenshot: The circular reference error message]

If you have a circular reference, you must rethink your data model with the goal of removing the loop.

It can sometimes be difficult to figure out how to remove a circular reference, but a good rule is to look at every link in your data model and ask, "Are these two fields really the same thing, or do the fields have different roles?" In the preceding example, you have a circular reference where the Date field has two different roles: one is the date when the order arrived, and the other is the date when the invoice was sent. These two dates are not necessarily the same. Hence, the fields should not be linked, but instead loaded as two different fields, as shown in the sketch below.
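
A minimal sketch of that fix, with illustrative names: give the field one name per role, which breaks the loop:

// Renaming Date according to its role removes the circular reference.
Orders:
Load OrderID, CustomerID, [Date] as OrderDate
From Orders.qvd (qvd);

Invoices:
Load InvoiceID, OrderID, [Date] as InvoiceDate
From Invoices.qvd (qvd);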

The data model viewer

The script defines the data model, but if you want to view the model graphically, you should use the data model viewer. It is opened from the toolbar menu, as shown in the following screenshot:

[Screenshot: Opening the data model viewer from the toolbar menu]

Clicking on the icon to the right in the menu opens the data model viewer in a new tab. Once it is open, you can see what the data model looks like. If you have more than one table, the tables should be linked by the key fields and look something like the following screenshot:

[Screenshot: Linked tables in the data model viewer]

The Data model viewer is an excellent tool to get an overview of the data model. It is also a very good debugging tool in the application development process.

Check whether the data model looks the way you want it to and make sure that you have no circular references. Circular references will be marked with red links and synthetic keys will be named $Syn:

[Screenshot: Circular references marked with red links and synthetic keys named $Syn]

Using preview mode

The Data model viewer option has a useful additional feature, preview mode. This mode allows you to preview both the data and metadata of a field.

Select a field in a table and click on the Preview button in the lower-left corner of the screen. This opens the Preview panel, in which you can see data about the field and some sample records from the table. In addition, you can define dimensions and measures based on the chosen field, as shown in the following screenshot:

[Screenshot: The Preview panel]

Summary

In this chapter, we looked at the functions and commands you need in order to create a logical and coherent data model that reflects your business processes.

In the next chapter, we'll move away from app creation and start examining how a Qlik Sense server can be deployed in the cloud.