
Understanding business intelligence architecture


As this book progresses, I will continue to emphasize the benefits that arise when the bulk of the logic used to produce datasets for visualization is pushed down to the database level rather than the reporting level. There will always be exceptions where some logic must live at the reporting level, and I will highlight those scenarios as they arise. Ultimately, the database is better equipped to handle complex logic and sits closer to the source of the data, which makes data quality checks and validation more straightforward. If we minimize the number of places where the logic feeding a particular BI tool is applied, we retain a single source of truth for our users. For example, if the requirement for a BI dashboard is to show the current and previous years' sales for US regions only, the filter on the region code would ideally be applied in the data warehouse rather than within the BI tool. The following SQL query joins two tables from the AdventureWorks database and highlights the difference between dimensions and measures: the Region column is a dimension column, while SalesYTD and SalesPY are measure columns:

SELECT
    region.Name AS Region,
    ROUND(SUM(sales.SalesYTD), 2) AS SalesYTD,
    ROUND(SUM(sales.SalesLastYear), 2) AS SalesPY
FROM [AdventureWorks2014].[Sales].[SalesTerritory] region
LEFT OUTER JOIN [AdventureWorks2014].[Sales].[SalesPerson] sales
    ON sales.TerritoryID = region.TerritoryID
WHERE region.CountryRegionCode = 'US'
GROUP BY region.Name
ORDER BY region.Name ASC;

In this example, TerritoryID serves as the join key between SalesTerritory and SalesPerson. Since the measures come from the SalesPerson table, that table serves as the fact table, and SalesPerson.TerritoryID serves as the foreign key. Since the Region column is dimensional and comes from the SalesTerritory table, that table serves as the dimension or lookup table, and SalesTerritory.TerritoryID serves as the dimension ID. In a finely tuned data warehouse, both the foreign key and the dimension ID would be indexed to allow efficient query performance. The output of the SQL statement can be seen in the following screenshot:

This performance comes from the index keeping the IDs sorted, so that when a row from one table is joined to another, the database only has to search the matching subset of the second table rather than scan it in its entirety. When a table holds only a few hundred rows, indexing its columns may seem unnecessary, but once it grows to a few hundred million rows, it becomes essential.
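If the fact table's foreign key lacks an index, a minimal sketch of adding one in SQL Server syntax might look like the following (the index name IX_SalesPerson_TerritoryID is illustrative; on the dimension side, SalesTerritory.TerritoryID is already indexed in AdventureWorks2014 as the table's primary key):

-- Index the foreign key on the fact table so the join to
-- SalesTerritory can seek on TerritoryID instead of scanning.
CREATE NONCLUSTERED INDEX IX_SalesPerson_TerritoryID
    ON [Sales].[SalesPerson] (TerritoryID);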

Why is it, then, that the logic is so often applied not at the database level but at the reporting level, in a Tableau dashboard or a Qlik application? Frequently, a dashboard user will need part of the results filtered out and will put in a request with the dashboard developer. Sometimes this request must go through an arduous ticketing process with IT that can take weeks or even months. So rather than rely on IT to make the change, the dashboard developer applies the filter logic at the reporting level instead. If these filters are being applied to correct data quality issues, doing so at the reporting level only masks a more serious problem that needs to be addressed across the entire data warehouse. You will be performing a disservice in the long run, as you will be establishing a precedent for data quality being handled by the report developer rather than the database administrator. This can ultimately take an organization down a slippery slope towards multiple sources of truth.
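One way to push such filter logic back into the database, so that every report shares a single source of truth, is to encapsulate it in a view that all BI tools connect to. As a minimal sketch, wrapping the earlier query in a view (the name vUSRegionSales is illustrative) might look like this; ORDER BY is omitted because a view cannot guarantee row order, so sorting is left to the reporting layer:

-- A single shared view keeps the US-only filter in the database,
-- so every dashboard selecting from it sees the same results.
CREATE VIEW [Sales].[vUSRegionSales]
AS
SELECT
    region.Name AS Region,
    ROUND(SUM(sales.SalesYTD), 2) AS SalesYTD,
    ROUND(SUM(sales.SalesLastYear), 2) AS SalesPY
FROM [Sales].[SalesTerritory] region
LEFT OUTER JOIN [Sales].[SalesPerson] sales
    ON sales.TerritoryID = region.TerritoryID
WHERE region.CountryRegionCode = 'US'
GROUP BY region.Name;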

An ideal BI tool will connect quickly to the data source and then allow you to slice and dice your dimensions and measures in a way that rapidly surfaces useful, actionable information for the business. Ultimately, an individual's or an organization's choice of BI tool will come down to the ease of use of the tool as well as its flexibility in showcasing data through various components such as graphs, charts, widgets, and infographics.