Book Image

QlikView: Advanced Data Visualization

By : Miguel Angel Garcia, Barry Harmsen, Stephen Redmond, Karl Pover
Book Image

QlikView: Advanced Data Visualization

By: Miguel Angel Garcia, Barry Harmsen, Stephen Redmond, Karl Pover

Overview of this book

QlikView is one of the most flexible and powerful business intelligence platforms around, and if you want to transform data into insights, it is one of the best options you have at hand. Use this Learning Path, to explore the many features of QlikView to realize the potential of your data and present it as impactful and engaging visualizations. Each chapter in this Learning Path starts with an understanding of a business requirement and its associated data model and then helps you create insightful analysis and data visualizations around it. You will look at problems that you might encounter while visualizing complex data insights using QlikView, and learn how to troubleshoot these and other not-so-common errors. This Learning Path contains real-world examples from a variety of business domains, such as sales, finance, marketing, and human resources. With all the knowledge that you gain from this Learning Path, you will have all the experience you need to implement your next QlikView project like a pro. This Learning Path includes content from the following Packt products: • QlikView for Developers by Miguel Ángel García, Barry Harmsen • Mastering QlikView by Stephen Redmond • Mastering QlikView Data Visualization by Karl Pover
Table of Contents (25 chapters)
QlikView: Advanced Data Visualization
Contributors
Preface
Index

Generating test data


It is enormously useful to be able to quickly generate test data so that we can create QlikView applications and test different aspects of development and discover how different development methods work. By creating our own set of data, we can abstract problems away from the business issues that we are trying to solve because the data is not connected to those problems. Instead, we can resolve the technical issue underlying the business issue. Once we have resolved that issue, we will have built an understanding that allows us to more quickly resolve the real problems with the business data.

We might contemplate that if we are developers who only have access to a certain dataset, then we will only learn to solve the issues in that dataset. For true mastery, we need to be able to solve issues in many different scenarios, and the only way that we can do that is to generate our own test data to do that with.

Generating dimension values

Dimension tables will generally have lower numbers of records; there are a number of websites online that will generate this type of data for you.

The following screenshot demonstrates setting up a Customer extract in Mockaroo:

This allows us to create 1,000 customer records that we can include in our QlikView data model. The extract is in the CSV format, so it is quite straightforward to load into QlikView.

Generating fact table rows

While we might often abdicate the creation of test dimension tables to a third-party website like this, we should always try and generate the Fact table data ourselves.

A good way to do this is to simply generate rows with a combination of the AutoGenerate() and Rand() functions.

For even more advanced use cases, we can look at using statistical functions such as NORMINV to generate normal distributions. There is a good article on this written by Henric Cronström on Qlik Design Blog at http://community.qlik.com/blogs/qlikviewdesignblog/2013/08/26/monte-carlo-methods.

We should be aware of the AutoGenerate() function that will just simply generate empty rows of data. We can also use the Rand() function to generate a random number between 0 and 1 (it works both in charts and in the script). We can then multiply this value by another number to get various ranges of values.

In the following example, we load a previously generated set of dimension tables—Customer, Product, and Employee. We then generate a number of order header and line rows based on these dimensions, using random dates in a specified range.

First, we will load the Product table and derive a couple of mapping tables:

// Load my auto generated dimension files
Product:
LOAD ProductID, 
     Product, 
     CategoryID, 
     SupplierID, 
     Money#(CostPrice, '$#,##0.00', '.', ',') As CostPrice, 
     Money#(SalesPrice, '$#,##0.00', '.', ',') As SalesPrice
FROM
Products.txt
(txt, utf8, embedded labels, delimiter is '\t', msq);


Product_Cost_Map:
Mapping Load
   ProductID,
   Num(CostPrice)
Resident Product;

Product_Price_Map:
Mapping Load
   ProductID,
   Num(SalesPrice)
Resident Product;

Now load the other dimension tables:

Customer:
LOAD CustomerID, 
     Customer, 
     City, 
     Country, 
     Region, 
     Longitude, 
     Latitude, 
     Geocoordinates
FROM
Customers.txt
(txt, codepage is 1252, embedded labels, delimiter is '\t', msq);

Employee:
LOAD EmployeeID, 
     Employee, 
     Grade, 
     SalesUnit
FROM
Employees.txt
(txt, codepage is 1252, embedded labels, delimiter is '\t', msq)
Where Match(Grade, 0, 1, 2, 3);  // Sales people

We will store the record counts from each table in variables:

// Count the ID records in each table
Let vCustCount=FieldValueCount('CustomerID');
Let vProdCount=FieldValueCount('ProductID');
Let vEmpCount=FieldValueCount('EmployeeID');

We now generate some date ranges to use in the data calculation algorithm:

// Work out the days
Let vStartYear=2009;      // Arbitrary - change if wanted
Let vEndYear=Year(Now()); // Generate up to date data
// Starting the date in April to allow
// offset year testing
Let vStartDate=Floor(MakeDate($(vStartYear),4,1));
Let vEndDate=Floor(MakeDate($(vEndYear),3,31));
Let vNumDays=vEndDate-vStartDate+1;

Note

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Run a number of iterations to generate data. By editing the number of iterations, we can increase or decrease the amount of data generated:

// Create a loop of 10000 iterations
For i=1 to 10000

   // "A" type records are for any date/time

   // Grab a random employee and customer
   Let vRnd = Floor(Rand() * $(vEmpCount));
   Let vEID = Peek('EmployeeID', $(vRnd), 'Employee');
   Let vRnd = Floor(Rand() * $(vCustCount));
   Let vCID = Peek('CustomerID', $(vRnd), 'Customer');   

   // Create a date for any Time of Day  9-5
   Let vOrderDate = $(vStartDate) + Floor(Rand() * $(vNumDays)) + ((9/24) + (Rand()/3));

   // Calculate a random freight amount 
   Let vFreight = Round(Rand() * 100, 0.01);

   // Create the header record
   OrderHeader:
   Load
      'A' & $(i)       As OrderID,
      $(vOrderDate)    As OrderDate,
      $(vCID)       As CustomerID,
      $(vEID)       As EmployeeID,
      $(vFreight)    As Freight
   AutoGenerate(1);

   // Generate Order Lines

   // This factor allows us to generate a different number of
   // lines depending on the day of the week 
   Let vWeekDay = Num(WeekDay($(vOrderDate)));
   Let vDateFactor = Pow(2,$(vWeekDay))*(1-(Year(Now())-Year($(vOrderDate)))*0.05);
   
   // Calculate the random number of lines 
   Let vPCount = Floor(Rand() * $(vDateFactor)) + 1;
   
   For L=1 to $(vPCount)
      // Calculate random values
      Let vQty = Floor(Rand() * (50+$(vDateFactor))) + 1;
      Let vRnd = Floor(Rand() * $(vProdCount));
      Let vPID = Peek('ProductID', $(vRnd), 'Product');
      Let vCost = ApplyMap('Product_Cost_Map', $(vPID), 1);
      Let vPrice = ApplyMap('Product_Price_Map', $(vPID), 1);

      OrderLine:
      Load
         'A' & $(i)       As OrderID,
         $(L)         As LineNo,
         $(vPID)         As ProductID,
         $(vQty)         As Quantity,
         $(vPrice)      As SalesPrice,
         $(vCost)      As SalesCost,
         $(vQty)*$(vPrice) As LineValue,
         $(vQty)*$(vCost) As LineCost
      AutoGenerate(1);
   
   Next

   // "B" type records are for summer peak

   // Summer Peak - Generate additional records for summer
   // months to simulate a peak trading period 
   Let vY = Year($(vOrderDate));
   Let vM = Floor(Rand()*2)+7;
   Let vD = Day($(vOrderDate));
   Let vOrderDate = Floor(MakeDate($(vY),$(vM),$(vD))) + ((9/24) + (Rand()/3));

   if Rand() > 0.8 Then
   
      // Grab a random employee and customer
      Let vRnd = Floor(Rand() * $(vEmpCount));
      Let vEID = Peek('EmployeeID', $(vRnd), 'Employee');
      Let vRnd = Floor(Rand() * $(vCustCount));
      Let vCID = Peek('CustomerID', $(vRnd), 'Customer');   
   
      // Calculate a random freight amount 
      Let vFreight = Round(Rand() * 100, 0.01);
      // Create the header record
      OrderHeader:
      Load
         'B' & $(i)       As OrderID,
         $(vOrderDate)    As OrderDate,
         $(vCID)       As CustomerID,
         $(vEID)       As EmployeeID,
         $(vFreight)    As Freight
      AutoGenerate(1);
   
      // Generate Order Lines
   
      // This factor allows us to generate a different number of
      // lines depending on the day of the week 
      Let vWeekDay = Num(WeekDay($(vOrderDate)));
      Let vDateFactor = Pow(2,$(vWeekDay))*(1-(Year(Now())-Year($(vOrderDate)))*0.05);
      
      // Calculate the random number of lines 
      Let vPCount = Floor(Rand() * $(vDateFactor)) + 1;
      
      For L=1 to $(vPCount)
      
         // Calculate random values
         Let vQty = Floor(Rand() * (50+$(vDateFactor))) + 1;
         Let vRnd = Floor(Rand() * $(vProdCount));
         Let vPID = Peek('ProductID', $(vRnd), 'Product');
         Let vCost = ApplyMap('Product_Cost_Map', $(vPID), 1);
         Let vPrice = ApplyMap('Product_Price_Map', $(vPID), 1);
   
         OrderLine:
         Load
            'B' & $(i)       As OrderID,
            $(L)         As LineNo,
            $(vPID)         As ProductID,
            $(vQty)         As Quantity,
            $(vPrice)      As SalesPrice,
            $(vCost)      As SalesCost,
            $(vQty)*$(vPrice) As LineValue,
            $(vQty)*$(vCost) As LineCost
         AutoGenerate(1);
      Next

   End if
Next

// Store the Generated Data to QVD
Store OrderHeader into OrderHeader.qvd;
Store OrderLine into OrderLine.qvd;