Instant Creating Data Models with PowerPivot How-to

By: Taehyung Lee

Overview of this book

Microsoft PowerPivot is a free add-in designed to enhance Microsoft Excel. It lets users harness their computer's full power to draw data from various sources, run analyses across millions of rows of data, and present the results in a unique format. Instant Creating Data Models with PowerPivot How-to is a concise, to-the-point guide that gives you a jump start on this powerful business intelligence tool while you keep working in an environment similar to Excel. You will begin with data generation and manipulation, learning a new feature at each step by building on the previous file, and ultimately creating a comprehensive file. Instant Creating Data Models with PowerPivot How-to will guide you through database installation, importing data from various sources, creating pivot charts and tables, using a unique PowerPivot feature called slicers, adding custom columns, and setting custom relationships between data to generate the ultimate customized dataset for analysis. By the end of the book, having worked through all the sections on Microsoft PowerPivot for Excel, you will be fully experienced and ready to use this powerful software.
Table of Contents (7 chapters)

Filtering the data to be imported (Intermediate)

PowerPivot stores all the data that it imports from other sources. Although PowerPivot can handle millions of rows of data (subject to available memory), it is always more efficient and effective to import only the relevant data. We will re-import data from the database and from Excel, filtering the data in the process.

Getting ready

In order to filter data, we need to understand which data is relevant and which is not. The saying "the more the merrier" does not apply here, as we will work only with the relevant, filtered data throughout this book.

How to do it...

Reproduce the steps for importing data from the database as follows:

  1. In Excel 2010, open up the PowerPivot window. Select From Database | From SQL Server. Select the details as shown in the following screenshot. If the server name is not available, then type in PC-NAME\SQLEXPRESS.

  2. However, this time, click the Preview & Filter button as highlighted in the following screenshot for each of the 13 tables.

  3. A preview of each table that we are importing will be displayed, and we haven't yet explored what each column means. However, one easy win is to remove logs and other items that may be necessary for the database, but not for us; for example, logs, last modified dates, last user modified, and so on. For the first table, Address, deselect the last two columns, rowguid and ModifiedDate, as shown in the following screenshot:

  4. After deselecting, click on OK; a new comment, Applied filters, will appear for the table under the Filter Details column, as shown in the screenshot in step 2. Repeat this procedure for every table, deselecting the data that we are sure will not be useful. Save the final datafile as DatabaseData_v2.xlsx. The file should be about 5 MB, around 11 to 12 MB smaller than the original.

  5. Similarly, repeat the step for the GDP datafile, deselecting the columns from 1970 to 1999. Import and then save the final data as GDPData_v2.xlsx. The filtered file should be about 375 KB, less than half the size of the original 848 KB file.
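The column filtering in step 5 is done through PowerPivot's Table Preview dialog, but the same idea, keeping only the needed columns at import time, can be sketched in code. The following is a minimal, hypothetical illustration using Python's csv module (the country names, values, and year range are made up to mirror the GDP example):

```python
import csv
import io

# Hypothetical GDP table: one row per country, one column per year.
header = ["Country"] + [str(y) for y in range(1970, 2013)]
rows = [["A"] + ["1.0"] * 43, ["B"] + ["2.0"] * 43]
src = io.StringIO()
writer = csv.writer(src)
writer.writerow(header)
writer.writerows(rows)
src.seek(0)

# Keep only the columns we need: country plus 2000 onwards,
# analogous to deselecting 1970 to 1999 in the preview dialog.
wanted = ["Country"] + [str(y) for y in range(2000, 2013)]
filtered = [{k: row[k] for k in wanted} for row in csv.DictReader(src)]

print(len(header), "columns before,", len(filtered[0]), "after")
```

Dropping the 30 unwanted year columns before the data ever lands in the model is what keeps both the file size and the memory footprint down.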

How it works...

Earlier, even though we selected only 13 tables out of the whole database, each table still contained unnecessary items such as logs. Depending on how large the database is, the logs alone may consume a great deal of space; by simply filtering out those logs and other irrelevant data, we can work more efficiently and effectively with the correct sets of data. PowerPivot enables this by letting the user filter the data during import: any columns that are unchecked in the preview are never imported, so only the checked, necessary columns reach the data model, as we practiced above.
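Conceptually, unchecking columns amounts to issuing a narrower query against the source, naming only the wanted columns instead of requesting everything. A rough stand-in using SQLite (a hypothetical sketch; PowerPivot actually talks to SQL Server, and the table below is a toy version of Address):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    """CREATE TABLE Address (
        AddressID INTEGER, City TEXT,
        rowguid TEXT, ModifiedDate TEXT)"""
)
con.execute(
    "INSERT INTO Address VALUES (1, 'Seattle', 'abc-123', '2020-01-01')"
)

# The unchecked columns (rowguid, ModifiedDate) are simply never
# requested, so they never take up space in the imported data.
cur = con.execute("SELECT AddressID, City FROM Address")
cols = [d[0] for d in cur.description]
print(cols)
```

The filtering happens at the source, before transfer, which is why it saves both disk space and import time rather than just hiding columns afterwards.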

Comparing the file sizes makes it evident that filtering becomes more necessary as the amount of data for analysis grows. The data that we imported from the database is far short of hundreds of thousands of rows, yet we already reduced the file size by roughly 70 percent (and the corresponding calculation time) by unchecking non-significant columns during the import stage.

There's more...

As we become more familiar and proficient with the data, we can be even more selective, importing only the most relevant tables with only the necessary columns. This, however, requires a complete understanding of the database structure.