Book Image

Instant Creating Data Models with PowerPivot How-to

By : Taehyung Lee
Book Image

Instant Creating Data Models with PowerPivot How-to

By: Taehyung Lee

Overview of this book

Microsoft PowerPivot is a free software designed to enhance Microsoft Excel. It allows the user to make extensive use of his/her computer's powers to draw data from various sources, do analysis across millions of rows of data, and present results in a very unique format. Instant Creating Data Models with PowerPivot How-to is a concise and to-the-point guide that helps you get a jump start on using this strong business intelligence tool, while still working in an environment similar to Excel. You will begin with data generation and manipulation, learning a new feature at each step by building onto the old file, and ultimately creating a comprehensive file. Instant Creating Data Models with PowerPivot How-to will guide the user through database installation, importing data from various sources, creating pivot charts and tables, utilizing a unique feature of PowerPivot called slicers,adding custom columns, and setting custom relationships between data to generate the ultimate customized dataset for analysis. By the end of the book and all the sections of Microsoft PowerPivot for Excel, the reader will be fully experienced and ready to utilize this powerful software.
Table of Contents (7 chapters)

Publishing as Excel (Simple)


One may wonder why there is a recipe on publishing as Excel as the file is already in Excel 2010 format. That is because in typical workplaces without SharePoint server, there are limits to e-mail attachments and inboxes, and sharing large files that we have conducted our analysis on becomes much more difficult. In this recipe, we will practice how to identify irrelevant data and how to remove them from our file for efficiency.

Getting ready

Copy over the file CustomColumns_v1.xlsx and renamed the copied file as AdventureWorks Order Quantity and Revenue Analysis_v1.xlsx. The former file should be about 11.2 MB in size.

How to do it...

  1. First, since all our data is stored in PowerPivot and not in other sheets, we can simply remove all sheets except the final sheet Result. This alone should save about 0.1 MB.

  2. Then, going through all the data tables we have as shown under the PowerPivot Field List dashboard, we notice many of the data tables are not used at all. For instance, the table CreditCard contains none of the information we are using nor contains any significant relationships that we should not remove in order to maintain our data's sanity. Then, logically, let's delete the table.

  3. Open up PowerPivot. We notice sheets such as CreditCard contains about 19,000 rows. Let's delete the whole table by right-clicking on the sheet name and selecting Delete. Luckily for us, the data used in our analysis came from few major tables only so we probably will not make a mistake of removing a crucial table by accident.

  4. Delete all data tables (sheets) except for SalesOrderHeader, SalesOrderDetail, SalesReason, SalesTerritory, and Product. If you are unsure at any point whether the table is used, the simplest way to check is to go back to the pivot table/chart and see which fields are being used (checked) under the PowerPivot Field List dashboard. The file should be about 7.0 MB after deletion.

  5. Even though we managed to delete the unused data tables, the file is still very large for such a basic analysis. If we were to perform a more extensive analysis, no doubt the file will be tens of megabytes. Thus, we will optimize our file further by removing unused columns as well, just like how we had filtered out the date log column when we first imported the data.

  6. In the table SalesOrderHeader, remove the following columns: Revision, Status, SalesOrderNumber, PurchaseOrderNumber, AccountNumber, CustomerID, SalesPersonID, BilltoAddressID, ShipToAddressID, CreditCardID, CreditCardApprovalCode, and CurrencyRateID. This saves an additional 2.6 MB, for a file size of 4.4 MB.

  7. In the table SalesOrderDetail, remove the following columns: SalesOrderDetailID and CarrierTrackingNumber. This saves an additional 1.8 MB, for a file size of 2.6 MB.

  8. This is good. We have managed to still maintain all critical data used in our analysis and have shrunk the file size to about 25 percent of the original by simply removing unused tables and columns. 2.5 MB is small enough to be sent over e-mail easily! Save as Excel and close. It will be readable by anyone with access to Excel 2010.

How it works...

Similar to filtering out tables and columns during the data export, we have simply removed unnecessary columns and kept the critical columns as well as a few more columns that may be useful for further analysis.

It is important to note that deleting a column from a table with 100,000 rows is more effective in reducing the file size than deleting 5 columns from a table with 18,000 rows. Such was the case in our file where we deleted 12 columns in SalesOrderHeader, which has 31,465 rows (a total of roughly 360,000 cells deleted), while simply deleting two columns from SalesOrderDetail, which has 121,317 rows (a total of roughly 240,000 rows deleted) reduced the file size in a similar manner.

There's more...

If PowerPivot contains a data table that is not used now but may be used in the future, then we can simply hide the sheet by right-clicking on the sheet and selecting Hide from Client Tools instead of deleting the sheet and re-importing it later on.

Version compatibility issue

With the new Excel 2013, there is now a version compatibility issue between Excel files created with different PowerPivot versions. This further complicates things, especially if the business uses SharePoint servers, as they require the same PowerPivot version installed and configured or otherwise they will encounter errors.

In short, like all programs with different versions, Excel files created with the older version of PowerPivot will be accessible in the newer version, but the Excel file created with newer versions of PowerPivot will not be accessible with an older version of PowerPivot.

For more details, please refer to resources such as: