PowerPivot stores all the data that it imported from other sources. Despite the fact that PowerPivot is able to handle millions of rows of data (based on memory availability), it is always more efficient and effective to only import the relevant data. We will re-import data from the database and from Excel, and filter the data in the process.
In order to filter data, it is necessary for us to understand which data is relevant and which is not. The term "more the merrier" does not apply in this case, as we will only use these filtered data that are relevant throughout this book.
Reproduce the steps for importing data from the database as follows:
In Excel 2010, open up the PowerPivot window. Select From Database | From SQL Server. Select the details as shown in the following screenshot. If the server name is not available, then type in
PC-NAME\SQLEXPRESS
.However, this time, click the Preview & Filter button as highlighted in the following screenshot for each of the 13 tables.
It will display a preview of every table that we are importing—and we haven't explored what each column means. However, one easy thing is to remove logs and other items that may be necessary for the database, but not for us. This may include logs, last modified date, last user modified, and so on. For the first table Address, deselect the last two columns rowguid and ModifiedDate as shown in the following screenshot:
After deselecting, click on OK and we will see that new comment, Applied filters, will appear for the table under the column Filter Details, as shown in the screenshot in step 2. Repeat this procedure for every table, deselecting the data that we are sure will not be useful. Save the final datafile as
DatabaseData_v2.xlsx
. The file should be about 5 MB, around 11 to 12 MB smaller than the original one.Similarly, repeat the step for the GDP datafile by deselecting the columns from 1970 to 1999. Import and then save the final data as
GDPData_v2.xlsx
. This filtered data should be about 375 KB, less than half of the size of the original file that would be 848 KB.
Before, even though we only selected 13 tables out of the whole database, each table still contained unnecessary items such as logs. Depending on how large the database is, the amount of logs may be very large and space-consuming, and by simply filtering out those logs and/or other irrelevant data, we are able to work more efficiently and effectively with the correct sets of data. PowerPivot allows this by enabling the user to filter the data during the importation process by unchecking the unnecessary columns, so that only the necessary columns that were checked are imported, as we have previously practiced.
By simply comparing the file size, it is evident that filtering data becomes more necessary as the amount of data for analysis becomes larger. The data that we imported from the database is much less than hundreds of thousands of rows, but we already managed to save about 35 percent of the file size (and the corresponding calculation time) by unchecking non-significant columns during the import stage.