Book Image

Instant Creating Data Models with PowerPivot How-to

By : Taehyung Lee
Book Image

Instant Creating Data Models with PowerPivot How-to

By: Taehyung Lee

Overview of this book

Microsoft PowerPivot is a free software designed to enhance Microsoft Excel. It allows the user to make extensive use of his/her computer's powers to draw data from various sources, do analysis across millions of rows of data, and present results in a very unique format. Instant Creating Data Models with PowerPivot How-to is a concise and to-the-point guide that helps you get a jump start on using this strong business intelligence tool, while still working in an environment similar to Excel. You will begin with data generation and manipulation, learning a new feature at each step by building onto the old file, and ultimately creating a comprehensive file. Instant Creating Data Models with PowerPivot How-to will guide the user through database installation, importing data from various sources, creating pivot charts and tables, utilizing a unique feature of PowerPivot called slicers,adding custom columns, and setting custom relationships between data to generate the ultimate customized dataset for analysis. By the end of the book and all the sections of Microsoft PowerPivot for Excel, the reader will be fully experienced and ready to utilize this powerful software.
Table of Contents (7 chapters)

Adding new custom columns (Intermediate)


As we saw in previous recipes, sometimes the data by itself is not what we want. Furthermore, we are restricted to what is already available and are unable to arrange the data the way we want it, even if it is rather simple such as arranging it by every five years instead of by every year.

PowerPivot, just like Excel, supports user-friendly manual inputs into data columns, and we are able to input a Data Analysis Expressions (DAX) formula, which is very similar to Excel formulas. By inputting these additional formulas and creating additional columns, we will be able to select from a wider range of data columns to exactly suit our needs.

We will practice various aspects of creating a custom column and the associated benefits by following the steps in this recipe.

Getting ready

To learn the basic formulas of Excel would be beneficial in configuring additional custom data columns.

Then, we will need a new set of data that we can selectively filter, modify, and use. Open up a new instance of PowerPivot and import from SQL. Select all 19 tables that have the Schema value as Sales and filter out the Modified Date and rowguid columns in order to be more efficient in storing and calculating the data as shown in the following screenshot:

Save the file as CustomColumns_v1.xlsx. Refer to the recipe where we imported the data from SQL if we are stuck at any point.

How to do it...

In this section, we will use the imported data and create additional data using custom columns through multiple sheets that build on one another.

The following are the steps to add a new custom column using the CustomColumns_v1.xlsx file, which results in the creation of the DeliveryTime01 sheet:

  1. The table SalesOrderHeader contains the OrderDate, DueDate, ShipDate, OnlineOrderFlag, ShipMethodID, and TerritoryID columns. These are the relevant data if we are interested in analyzing the shipping time based on: whether the order was an online order or not, which shipping method was chosen, and which area it was delivered to. We can also analyze whether the shipping period met the due date deadline or not.

  2. Create a pivot table in the order you think is necessary to analyze the shipping time for each order. For instance, it would be:

    • Select SalesOrderHeader | ShipMethodID, SalesOrderHeader | OnlineOrderFlag, and SalesTerritory | Name and move them under Slicers Vertical

    • Select SalesOrderHeader | OrderDate and move it under Row Labels

    • Select SalesOrderHeader | ShipDate and SalesOrderHeader | DueDate and move them under Values

  3. However, this does not give us what we want. It only gives us the order date and number of orders that have that order date (Count of ShipDate and Count of DueDate), which is filterable by the ShipMethodID and OnlineOrderFlag columns, and the destination region.

  4. What we want is the duration it took to ship, and whether it met the due date. Because we can't manually type data inside the pivot table, there is no way to know both of these. This is where it is crucial to add custom columns manually.

  5. Return to PowerPivot. Under the Design tab, click on the Add button to add a new column to calculate the number of days it took to ship from the moment of order. Type in the formula as follows:

    = [ShipDate] – [OrderDate]
  6. It will create a column called CalculatedColumn1 showing the bunch of 1900/01/06 0:00:00. Let us change the format to numbers so that it may represent the number of days between order and shipping as shown. Also, change the name of the column to ShippingDays.

  7. Then, because we want to see if it shipped before the due date, create another custom column with the following formula:

    = [ShipDate] <= [DueDate]
  8. This column will return a Boolean (True or False) as to whether the shipped date is equal to or less than (before) the due date. It will return True if it was shipped on or before the due date, and will return False if it was shipped past the due date. Change the column name to DeliveredOnTime.

  9. Return to the pivot table and refresh. Remove all three datasets from Row Labels and Values. Add ShippingDays to Row Labels and Slicers Vertical, add DeliveredOnTime to Slicers Vertical, and add SalesOrderID to Values and change it to Count of SalesOrderID. The pivot table should appear as follows:

  10. Now, we have what we want: number of days it took to ship (7 or 8) and number of orders for each (31456 orders that took 7 days to ship, and 9 orders that took 8 days to ship). By filtering the data using our slicers (ShipMethodID, OnlineOrderFlag, ShippingDays, DeliveredOnTime, and RegionName), we are able to make a number of conclusions such as:

    • All orders were shipped on time (DeliveredOnTime is True for all orders)

    • All online orders took 7 days to ship (when OnlineOrderFlag is True)

    • All online orders are shipped using ship method ID 1, and all non-online orders are shipped using ship method ID 5 (when filtering by OrderOnlineFlag)

    • Germany is the only country where some of the orders took 8 days to arrive (when 8 is selected in the ShippingDays filter)

  11. This is good! We are getting used to modifying the data by adding custom columns to give us what we want—and to make conclusions based on those data.

  12. Save the sheet as DeliveryTime01 and save the file.

The following are the steps to add a new custom column using the DeliveryTime01 sheet in the CustomColumns_v1.xlsx file, which results in the creation of the DeliveryTime02 sheet:

  1. Copy the sheet DeliveryTime01 and rename as DeliveryTime02.

  2. Now we want to expand on this to see the per-month, per-season, and per-year result of number of orders. However, a quick look at the PowerPivot table suggests that the closest data we have is the order date column, which gives date format in a number not orderable by year, by season, or by month. Well then, it's time for custom columns.

  3. Return to the PowerPivot table SalesOrderHeader and add three custom columns with the following formulas and rename them as Year, Month, and Season respectively. It should appear as follows.

    =year([OrderDate])
    =month([OrderDate])
    =if(or([Month]<3, [Month]=12), "Winter", if([Month]<6, "Spring", if([Month]<9, "Summer", "Fall")))
  4. The previous three lines of code are shown in columns Year, Month, and Season as shown in the following screenshot:

  5. Return to the pivot table, and let us reformat the slicers. From Slicers Vertical, since all orders were delivered on time, remove DeliveredOnTime. Since all online orders are shipped using the ship method ID 1 and all non-online orders are shipped using the ship method ID 5, remove ShipMethodID. Then, in Slicers Horizontal, add Year, Month, and Season.

  6. Well that was relatively easy, wasn't it?

  7. Our analysis shows that the deliveries took longer than usual (8 days instead of 7) only during the year of 2008 in February, March, April, and May, for Germany only. We also observe that orders were relatively well distributed over the months as well as the season. The number of orders has been steadily increasing since 2005, so that's good.

  8. One thing rather troublesome in our analysis is that the regions of U.S. are treated separately. Let us fix that and have all the U.S. regions as United States. Since the RegionName value comes from the table SalesTerritory, let's re-open PowerPivot and check the table.

  9. Oh! It seems that there is already a column called CountryRegionCode that shows the country, instead of the region! Let us use this. Return to the pivot table and replace the filter Name with SalesTerritory | CountryRegionCode. We have just saved unnecessary work by finding another column that serves our purpose exactly. It is worthwhile to note that the U.S. only accounts for about 40 percent of our sale, meaning that we are a very international company with much room for growth.

  10. Let's save the file for now.

The following are the steps to add a new custom column using the DeliveryTime02 sheet in the CustomColumns_v1.xlsx file, which results in the creation of the DeliveryTime03 sheet:

  1. Copy over the DeliveryTime02 sheet, and rename as DeliveryTime03.

  2. In treating orders, the number of orders is important. However, what is more important is how much we made from those orders! We want to expand our table even further to see our revenue made from these orders per month. Let us rearrange the data:

    • Remove all datasets from Row Labels and Values

    • Move all datasets in Slicers Horizontal to Slicers Vertical (reorder as Year, Month, Season, CountryRegionCode, OnlineOrderFlag, and ShippingDays)

    • Add Year and Month to Row Labels

    • Add SubTotal, TaxAmt, and Freight, all as a Sum

  3. Now we have a per month, per year figure of sales, tax, freight fees, and total charged on orders. An analysis could be made that while the number of orders is relatively stable throughout the year (based on the previous sheets), the amount of an order is generally lower in January and higher during the months of June to August. The absolute $ amount helps us to understand our businesses better.

  4. However, the sheer numbers may be difficult to see at times. Create a doughnut-style pivot chart and click on Switch Row/Column. As we filter through the data using slicers, the pivot chart makes it easier for us to observe trends. For instance, the tax rates for the United Kingdom and France increased around the middle of 2006; the tax rate for Germany increased around the middle of 2007.

  5. We can further modify the data for our analysis by changing the data type from Sum to Count, Min, Max, Average, and DistinctCount as shown in the following screenshot:

  6. Each will give a unique result: total order dollar amount, total number of orders, cheapest order, most expensive order, average of order, and number of different amounts of order, all on a monthly basis. Modify the fields as you will and try to comprehend the results.

  7. Save the file.

We have practiced creating various custom columns for our analysis. These custom columns are very useful as they allow us to create and store all the relevant data in one table. Let us continue such analysis to a greater extent in the following recipe in a new sheet.

The following are the steps to add a new custom column using the DeliveryTime03 sheet in the CustomColumns_v1.xlsx file, which results in the creation of the DeliveryTime04 sheet:

  1. Copy over the sheet DeliveryTime03 and rename as DeliveryTime04.

  2. To further our analysis, we want to look at which product sold well over the years. Looking at the PowerPivot tables, we know that the detail of each order is stored in the SalesOrderDetail table, with ProductID, UnitPrice, LineTotal, and so on. Based on our experience so far, it is logical that ProductID links the SalesOrderDetail table to another Product table that we don't have right now; let us import that table as well so that we know which product ID corresponds to which product.

  3. Follow the same steps for importing data from the database. Looking through the tables, we find that the table Product contains both the ProductID and Name columns, which states what the product is for each ProductID. Since we are only interested in the names of the products, in Preview & Filter, deselect all columns and select only the ProductID and Name columns for import, as shown in the following screenshot:

  4. Return to the pivot table/chart. Click on the pivot chart and then click on Switch Row/Column Data to have the year and months as the rows, and SubTotal, TaxAmt, and Freight as the columns. Then update the PowerPivot Field List dashboard as follows:

    • In the Values field, delete all (SubTotal, TaxAmt, and Freight), then add LineTotal and OrderQty from the SalesOrderDetail table

    • Add Product | Name to Slicers Horizontal

    • Click on the Create relationships button as requested by PowerPivot

    • Change the formatting of cells to comma style

  5. In the pivot table, we now have the number of orders and dollar amount of those orders on a monthly basis, filterable by products, year, month, season, country, online order flag, and shipping days. This data is relatively easy to comprehend based on the pivot table. For instance, we can easily see that the product All-purpose Bike Stand was first ordered in July 2007 and has been selling at a rate of roughly 20/month. If we analyze even further, we may be able to find which products are popular for each region.

  6. Due to the large number of products, the pivot chart is not useful for this purpose. Similar to CountryRegionCode, if we can find some data or create another custom column to sort products by other criteria such as accessories, bike gear, and clothes, then the pivot table may be useful in helping us comprehend the data quicker.

  7. Save and close the file.

How it works...

Throughout the four sheets, we have combined everything that we have learned so far to analyze various types of results using PowerPivot. With the addition of custom columns, there is no limit or boundary on what kind of business intelligence analysis we could perform.

In the first sheet (DeliveryTime01), we have put together everything we have learned in previous recipes, and learned the methods behind creating custom columns. We used the custom column generated data in the pivot table to give us the filters we want in the format we want.

The first formula we used (= [ShipDate] – [OrderDate]) simply subtracts the order date from the ship date, giving us 0 if shipped on the same day, 1 if shipped the next day, and so on.

Similarly, the second formula used (= [ShipDate] <= [DueDate]) returns TRUE if the due date is equal to or greater than the ship date (shipped before the due date), and returns FALSE if shipped after the due date.

We then rearranged the slicers and values to make a set of conclusions, which are evident simply by changing various slicer settings.

In the second sheet (DeliveryTime02), we further practiced creating custom columns with more advanced and frequently used Excel formulas for date analysis. The first two formulas (year and month) are rather self-explanatory in that they simply return the year/month of the date given. The last formula for season is slightly more complicated. Let us have a look:

=if(or([Month]<3, [Month]=12), "Winter", if([Month]<6, "Spring", if([Month]<9, "Summer", "Fall")))

This can be rearranged as follows:

=if(or([Month]<3, [Month]=12), "Winter", 
   if([Month]<6, "Spring", 
      if([Month]<9, "Summer", 
          "Fall")))

The first part checks if the month is less than 3, or is equal to 12, (that is, it checks if the month is 1, 2, or 12) and returns Winter if true, and moves on to the next statement if false. The second line performs a similar check—if the month is less than 6 (that is, if the month is equal to 3 or 4 or 5), it will return Spring and otherwise move on to the next line.

You may wonder why it does not return 1, 2, 3, 4, or 5, when the formula clearly says less than 6 (<6). This is because the second line is only used when the first line is false, and the first line is false if the month is not 1, 2, or 12. Thus, the second line only checks if the month is equal to 3, 4, or 5.

The third line is similar in format. It will return Summer if less than 9 (or equal to 6, 7, or 8), and return Fall otherwise (is equal to 9, 10, or 11).

We then rearranged our PowerPivot Field List dashboard further and made further analysis. We also changed which data column we were using instead of manually creating another custom column to aggregate regions by country. This is the smarter choice as there is no need to reinvent the wheel.

In the third sheet (DeliveryTime03), we expanded our analysis to include dollar amounts instead of just the number of orders. After all of our practice, this was a rather easy step that just required us to analyze which table/column is relevant, and how to combine them in the pivot table.

We created a further pivot chart and switched rows/columns to view our data more easily, as the percentage value is easier to compare than absolute values. We also learned that we can come up with a totally different set of data values by simply changing how the data is summarized.

In the last sheet (DeliveryTime04), we moved away from the data we were using, and used data columns from another table and imported a new table from the database for our purpose as well. We gave the pivot table a new slicer to filter by products.

In total, we have moved on from a simple date analysis to quantity analysis, then to revenue analysis and extended-revenue analysis. All of this was done by selecting data, using slicers, and adding custom columns appropriately. There is no limit to what kind of analysis could be performed by using these methods, in combination with a pivot table and a pivot chart, to display the data in an intelligent manner that may be easily interpreted.

There's more...

At any time, if PowerPivot does not refresh properly into Excel, restart Excel.

Refer to Excel's built-in formula helper (the Insert Function button) to type in what you need, and use the same formula in creating new custom columns.