Book Image

Mastering QlikView

By : Stephen Redmond
Book Image

Mastering QlikView

By: Stephen Redmond

Overview of this book

Table of Contents (14 chapters)
Mastering QlikView
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Generating test data


It is enormously useful to be able to quickly generate test data so that we can create QlikView applications and test different aspects of development and discover how different development methods work. By creating our own set of data, we can abstract problems away from the business issues that we are trying to solve because the data is not connected to those problems. Instead, we can resolve the technical issue underlying the business issue. Once we have resolved that issue, we will have built an understanding that allows us to more quickly resolve the real problems with the business data.

We might contemplate that if we are developers who only have access to a certain dataset, then we will only learn to solve the issues in that dataset. For true mastery, we need to be able to solve issues in many different scenarios, and the only way that we can do that is to generate our own test data to do that with.

Generating dimension values

Dimension tables will generally have lower numbers of records; there are a number of websites online that will generate this type of data for you.

For quite a while, I used http://www.generatedata.com to generate random data such as company names, and so on. However, in a recent blog entry by Barry Harmsen (Barry is the co-author of QlikView 11 for Developers, Packt Publishing) at http://www.qlikfix.com, he mentioned http://www.mockaroo.com as a resource for generating such tables.

The following screenshot demonstrates setting up a Customer extract in Mockaroo:

This allows us to create 1,000 customer records that we can include in our QlikView data model. The extract is in the CSV format, so it is quite straightforward to load into QlikView.

Generating fact table rows

While we might often abdicate the creation of test dimension tables to a third-party website like this, we should always try and generate the Fact table data ourselves.

A good way to do this is to simply generate rows with a combination of the AutoGenerate() and Rand() functions.

For even more advanced use cases, we can look at using statistical functions such as NORMINV to generate normal distributions. There is a good article on this written by Henric Cronström on Qlik Design Blog at http://community.qlik.com/blogs/qlikviewdesignblog/2013/08/26/monte-carlo-methods.

We should be aware of the AutoGenerate() function that will just simply generate empty rows of data. We can also use the Rand() function to generate a random number between 0 and 1 (it works both in charts and in the script). We can then multiply this value by another number to get various ranges of values.

In the following example, we load a previously generated set of dimension tables—Customer, Product, and Employee. We then generate a number of order header and line rows based on these dimensions, using random dates in a specified range.

First, we will load the Product table and derive a couple of mapping tables:

// Load my auto generated dimension files
Product:
LOAD ProductID, 
     Product, 
     CategoryID, 
     SupplierID, 
     Money#(CostPrice, '$#,##0.00', '.', ',') As CostPrice, 
     Money#(SalesPrice, '$#,##0.00', '.', ',') As SalesPrice
FROM
Products.txt
(txt, utf8, embedded labels, delimiter is '\t', msq);


Product_Cost_Map:
Mapping Load
   ProductID,
   Num(CostPrice)
Resident Product;

Product_Price_Map:
Mapping Load
   ProductID,
   Num(SalesPrice)
Resident Product;

Now load the other dimension tables:

Customer:
LOAD CustomerID, 
     Customer, 
     City, 
     Country, 
     Region, 
     Longitude, 
     Latitude, 
     Geocoordinates
FROM
Customers.txt
(txt, codepage is 1252, embedded labels, delimiter is '\t', msq);

Employee:
LOAD EmployeeID, 
     Employee, 
     Grade, 
     SalesUnit
FROM
Employees.txt
(txt, codepage is 1252, embedded labels, delimiter is '\t', msq)
Where Match(Grade, 0, 1, 2, 3);  // Sales people

We will store the record counts from each table in variables:

// Count the ID records in each table
Let vCustCount=FieldValueCount('CustomerID');
Let vProdCount=FieldValueCount('ProductID');
Let vEmpCount=FieldValueCount('EmployeeID');

We now generate some date ranges to use in the data calculation algorithm:

// Work out the days
Let vStartYear=2009;      // Arbitrary - change if wanted
Let vEndYear=Year(Now()); // Generate up to date data
// Starting the date in April to allow
// offset year testing
Let vStartDate=Floor(MakeDate($(vStartYear),4,1));
Let vEndDate=Floor(MakeDate($(vEndYear),3,31));
Let vNumDays=vEndDate-vStartDate+1;

Tip

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Run a number of iterations to generate data. By editing the number of iterations, we can increase or decrease the amount of data generated:

// Create a loop of 10000 iterations
For i=1 to 10000

   // "A" type records are for any date/time

   // Grab a random employee and customer
   Let vRnd = Floor(Rand() * $(vEmpCount));
   Let vEID = Peek('EmployeeID', $(vRnd), 'Employee');
   Let vRnd = Floor(Rand() * $(vCustCount));
   Let vCID = Peek('CustomerID', $(vRnd), 'Customer');   

   // Create a date for any Time of Day  9-5
   Let vOrderDate = $(vStartDate) + Floor(Rand() * $(vNumDays)) + ((9/24) + (Rand()/3));

   // Calculate a random freight amount 
   Let vFreight = Round(Rand() * 100, 0.01);

   // Create the header record
   OrderHeader:
   Load
      'A' & $(i)       As OrderID,
      $(vOrderDate)    As OrderDate,
      $(vCID)       As CustomerID,
      $(vEID)       As EmployeeID,
      $(vFreight)    As Freight
   AutoGenerate(1);

   // Generate Order Lines

   // This factor allows us to generate a different number of
   // lines depending on the day of the week 
   Let vWeekDay = Num(WeekDay($(vOrderDate)));
   Let vDateFactor = Pow(2,$(vWeekDay))*(1-(Year(Now())-Year($(vOrderDate)))*0.05);
   
   // Calculate the random number of lines 
   Let vPCount = Floor(Rand() * $(vDateFactor)) + 1;
   
   For L=1 to $(vPCount)
      // Calculate random values
      Let vQty = Floor(Rand() * (50+$(vDateFactor))) + 1;
      Let vRnd = Floor(Rand() * $(vProdCount));
      Let vPID = Peek('ProductID', $(vRnd), 'Product');
      Let vCost = ApplyMap('Product_Cost_Map', $(vPID), 1);
      Let vPrice = ApplyMap('Product_Price_Map', $(vPID), 1);

      OrderLine:
      Load
         'A' & $(i)       As OrderID,
         $(L)         As LineNo,
         $(vPID)         As ProductID,
         $(vQty)         As Quantity,
         $(vPrice)      As SalesPrice,
         $(vCost)      As SalesCost,
         $(vQty)*$(vPrice) As LineValue,
         $(vQty)*$(vCost) As LineCost
      AutoGenerate(1);
   
   Next

   // "B" type records are for summer peak

   // Summer Peak - Generate additional records for summer
   // months to simulate a peak trading period 
   Let vY = Year($(vOrderDate));
   Let vM = Floor(Rand()*2)+7;
   Let vD = Day($(vOrderDate));
   Let vOrderDate = Floor(MakeDate($(vY),$(vM),$(vD))) + ((9/24) + (Rand()/3));

   if Rand() > 0.8 Then
   
      // Grab a random employee and customer
      Let vRnd = Floor(Rand() * $(vEmpCount));
      Let vEID = Peek('EmployeeID', $(vRnd), 'Employee');
      Let vRnd = Floor(Rand() * $(vCustCount));
      Let vCID = Peek('CustomerID', $(vRnd), 'Customer');   
   
      // Calculate a random freight amount 
      Let vFreight = Round(Rand() * 100, 0.01);
      // Create the header record
      OrderHeader:
      Load
         'B' & $(i)       As OrderID,
         $(vOrderDate)    As OrderDate,
         $(vCID)       As CustomerID,
         $(vEID)       As EmployeeID,
         $(vFreight)    As Freight
      AutoGenerate(1);
   
      // Generate Order Lines
   
      // This factor allows us to generate a different number of
      // lines depending on the day of the week 
      Let vWeekDay = Num(WeekDay($(vOrderDate)));
      Let vDateFactor = Pow(2,$(vWeekDay))*(1-(Year(Now())-Year($(vOrderDate)))*0.05);
      
      // Calculate the random number of lines 
      Let vPCount = Floor(Rand() * $(vDateFactor)) + 1;
      
      For L=1 to $(vPCount)
      
         // Calculate random values
         Let vQty = Floor(Rand() * (50+$(vDateFactor))) + 1;
         Let vRnd = Floor(Rand() * $(vProdCount));
         Let vPID = Peek('ProductID', $(vRnd), 'Product');
         Let vCost = ApplyMap('Product_Cost_Map', $(vPID), 1);
         Let vPrice = ApplyMap('Product_Price_Map', $(vPID), 1);
   
         OrderLine:
         Load
            'B' & $(i)       As OrderID,
            $(L)         As LineNo,
            $(vPID)         As ProductID,
            $(vQty)         As Quantity,
            $(vPrice)      As SalesPrice,
            $(vCost)      As SalesCost,
            $(vQty)*$(vPrice) As LineValue,
            $(vQty)*$(vCost) As LineCost
         AutoGenerate(1);
      Next

   End if
Next

// Store the Generated Data to QVD
Store OrderHeader into OrderHeader.qvd;
Store OrderLine into OrderLine.qvd;

Note

Barry Harmsen, co-author of QlikView 11 for Developers, Packt Publishing, recommends a slightly different method for generating seasonal variation. By using the Sin() or Cos() functions to generate a table containing the number of records to generate for each day, we can loop across this table and use these values to auto-generate rows for the fact table.