Book Image

Practical Business Intelligence

Book Image

Practical Business Intelligence

Overview of this book

Business Intelligence (BI) is at the crux of revolutionizing enterprise. Everyone wants to minimize losses and maximize profits. Thanks to Big Data and improved methodologies to analyze data, Data Analysts and Data Scientists are increasingly using data to make informed decisions. Just knowing how to analyze data is not enough, you need to start thinking how to use data as a business asset and then perform the right analysis to build an insightful BI solution. Efficient BI strives to achieve the automation of data for ease of reporting and analysis. Through this book, you will develop the ability to think along the right lines and use more than one tool to perform analysis depending on the needs of your business. We start off by preparing you for data analytics. We then move on to teach you a range of techniques to fetch important information from various databases, which can be used to optimize your business. The book aims to provide a full end-to-end solution for an environment setup that can help you make informed business decisions and deliver efficient and automated BI solutions to any company. It is a complete guide for implementing Business intelligence with the help of the most powerful tools like D3.js, R, Tableau, Qlikview and Python that are available on the market.
Table of Contents (16 chapters)
Practical Business Intelligence
Credits
About the Author
About the Reviewer
www.PacktPub.com
Customer Feedback
Preface

Understanding the Kimball method


As we discuss the data warehouse where our data is being housed, we will be remised not to bring up Ralph Kimball, one of the original architects of the data warehouse. Kimball's methodology incorporated dimensional modeling, which has become the standard for modeling a data warehouse for business intelligence purposes. Dimensional modeling incorporates joining tables that have detail data and tables that have lookup data.

A detail table is known as a fact table in dimensional modeling. An example of a fact table would be a table holding thousands of rows of transactional sales from a retail store. The table will house several IDs affiliated with the product, the sales person, the purchase date, and the purchaser, just to name a few. Additionally, the fact table will store numeric data for each individual transaction, such as the sales quantity for sales amount. These numeric values are generally referred to as measures.

While there is usually one fact table, there will also be several lookup or dimensional tables that will have one table for each ID that is used in a fact table. So, for example, there would be one dimensional table for the product name affiliated with a product ID. There would be one dimensional table for the month, week, day, and year of the ID affiliated with the date. These dimensional tables are also referred to as lookup tables, because they basically look up what the name of a dimension ID is affiliated with. Usually, you would find as many dimensional tables as there are IDs in the fact table. The dimensional tables will all be joined to one fact table creating something of a "star" look. Hence, the name for this table arrangement is star schema, as seen in the following screenshot:

It is likely that the fact table will be the largest table in a data warehouse, while the lookup tables will most likely have fewer rows, some just one row. The tables are joined by keys, also known as foreign keys and primary keys. Foreign keys are referenced in fact tables to a unique identifier in a separate lookup table as primary keys. Foreign keys allow the most efficient join between a fact table and a dimensional table as they are usually a numeric data type. The purpose of a foreign key is to locate a single row in a lookup table to join to and establish a relationship. This rule is referred to as the referential integrity constraint and it exists to ensure that a key in a detail or fact table has a unique description to a lookup or dimensional table. As more and more rows are added to a lookup table, that new dimension is just given the next number of the identifier in line, usually starting with something like 1. Query performance between table joins suffers when we introduce non-numeric characters into the join, or worse, symbols (although most databases will not allow that).