Book Image

SQL Query Design Patterns and Best Practices

By : Steve Hughes, Dennis Neer, Dr. Ram Babu Singh, Shabbir H. Mala, Leslie Andrews, Chi Zhang
5 (1)
Book Image

SQL Query Design Patterns and Best Practices

5 (1)
By: Steve Hughes, Dennis Neer, Dr. Ram Babu Singh, Shabbir H. Mala, Leslie Andrews, Chi Zhang

Overview of this book

SQL has been the de facto standard when interacting with databases for decades and shows no signs of going away. Through the years, report developers or data wranglers have had to learn SQL on the fly to meet the business needs, so if you are someone who needs to write queries, SQL Query Design and Pattern Best Practices is for you. This book will guide you through making efficient SQL queries by reducing set sizes for effective results. You’ll learn how to format your results to make them easier to consume at their destination. From there, the book will take you through solving complex business problems using more advanced techniques, such as common table expressions and window functions, and advance to uncovering issues resulting from security in the underlying dataset. Armed with this knowledge, you’ll have a foundation for building queries and be ready to shift focus to using tools, such as query plans and indexes, to optimize those queries. The book will go over the modern data estate, which includes data lakes and JSON data, and wrap up with a brief on how to use Jupyter notebooks in your SQL journey. By the end of this SQL book, you’ll be able to make efficient SQL queries that will improve your report writing and the overall SQL experience.
Table of Contents (21 chapters)
1
Part 1: Refining Your Queries to Get the Results You Need
6
Part 2: Solving Complex Business and Data Problems in Your Queries
11
Part 3: Optimizing Your Queries to Improve Performance
14
Part 4: Working with Your Data on the Modern Data Platform

Exploring the impact of row and column reductions on aggregations

Now you know how to reduce the number of rows and specify the columns that you need in your result set, let’s talk about what the impact will be on any aggregations that you may be interested in.

First of all, based on this result set, you can view the number of backorders by any combination of columns. For example, to see the number of backorders based on year and month, you could use the following query:

SELECT Year([Order Date Key]) as "Order Year",
       Month([Order Date Key]) as "Order Month",
  COUNT([WWI Backorder ID]) as "Number of backorders",
  COUNT(distinct [Customer Key]) as "Impacted Customers",
  COUNT([Order Key]) as "Number of orders"
FROM [WideWorldImportersDW].[Fact].[Order]
WHERE [WWI Backorder ID] IS NOT NULL
GROUP BY Year([Order Date Key]),
         Month([Order Date Key])
ORDER BY Year([Order Date Key]),
         Month([Order Date Key]);

You could also run the following query using the view that you created, and you will get the same results:

SELECT [Order Year],
       [Order Month],
   COUNT([WWI Backorder]) as "number of backorders",
   COUNT([customer]) as "number of impacted customers",
   COUNT([Order]) as "number of orders"
FROM v_backorders
GROUP BY [Order Year],
     [Order Month];

Figure 1.6 shows a subset of the results from the query:

Figure 1.6 – Subset of results from the earlier queries

Figure 1.6 – Subset of results from the earlier queries

On closer investigation, you will notice that the values in the number of backorders and number of orders columns are the same. Why is this? The answer is in the filter; it only returns records that have an associated backorder, so you will not get the actual number of orders that have been placed, and any calculations will not be accurate. So, to get an accurate number of orders, you will need to get all the orders in the table, not just the orders associated with a backorder. You will also see that the order of the result set is different for the query that uses the view and the query that does not use the view. This is due to the query that uses the view being stored in a memory heap from the view and accessed on demand, whereas the query that does not use the view is stored on the disk in the primary key order of the table.

This is the impact of using a filter; to get around this, you can add a subquery to remove the filter. So here is how you can update your query to get that additional information:

SELECT Year([Order Date Key]) as [Order Year],
       Month([Order Date Key]) as [Order Month],
   COUNT(distinct [Customer Key]) as [Impacted Customers],
   COUNT(distinct [Stock Item Key]) as [Backorder Items],
   COUNT([WWI Backorder ID]) as [Number of backorders],
   fo.orders,
       fo.Customers
FROM [WideWorldImportersDW].[Fact].[Order] o,
     (select  Year([Order Date Key]) as [Order Year],
              Month([Order Date Key]) as [Order Month],
              COUNT (distinct [Order Key]) as [orders],
              COUNT (distinct [customer Key]) as [Customers]
      FROM [WideWorldImportersDW].[Fact].[Order]
  GROUP BY Year([Order Date Key]),
               Month([Order Date Key])) as fo
WHERE [WWI Backorder ID] IS NOT NULL
AND year(o.[Order Date Key]) = fo.[Order Year]
AND month(o.[Order Date Key]) = fo.[Order Month]
GROUP BY Year([Order Date Key]),
         Month([Order Date Key]),
    fo.orders,
    fo.Customers
ORDER BY Year([Order Date Key]),
         Month([Order Date Key]);

This will give you the following result set, as seen in Figure 1.7. Notice you now see values in the orders, Impacted Customers, Number of backorders, and Customers columns for each Order Month:

Figure 1.7 – Results of using filters in a query

Figure 1.7 – Results of using filters in a query

This can be done for any number of columns as long as the subquery is grouped by the same fields as the main query. To simplify this query, you could very easily create a view of all the orders and then use the views to get the same results with less query development.

Here is a sample of the query to create the order view that you can use in future queries:

CREATE VIEW [dbo].[v_orders] AS
SELECT Year([Order Date Key]) as [Order Year],
      MONTH([Order Date Key]) as [Order Month],
      COUNT(distinct [Order Key]) as [orders],
      COUNT(distinct [customer Key]) as [Customers]
FROM [WideWorldImportersDW].[Fact].[Order]
GROUP BY Year([Order Date Key]),
         Month([Order Date Key])

Here is a sample of the query that uses the two views (v_Backorders and v_orders) that have been created in this chapter:

  SELECT o.[Order Year],
         o.[Order Month],
         o.Customers,
         o.orders,
     COUNT(b.[WWI Backorder]) as [total backorders],
     COUNT(distinct b.[customer]) as [impacted customers]
  FROM [WideWorldImportersDW].[dbo].[v_Backorders] b,
  [WideWorldImportersDW].[dbo].[v_orders] o
  WHERE b.[Order Year] = o.[Order Year]
  AND b.[Order Month] = o.[Order Month]
  GROUP BY o.[Order Year],
           o.[Order Month],
  o.Customers,
  o.orders
  ORDER BY o.[Order Year],
           o.[Order Month];

The following is the result of the query:

   Figure 1.8 – Sample results of using multiple views in a query

Figure 1.8 – Sample results of using multiple views in a query

So as you can see, it is easy to reduce the amount of data that you bring into your result, but sometimes, you will need to include other data that may have been filtered out. This is where views come in handy and allow you to include data that may have been filtered in previous activities. You will need to use caution when deciding between using a view over a subquery due to the performance implications. In this example, the data size is small enough that the performance implications are negligible.

As you can see from this section, there are multiple ways in which you will be able to create result sets and get the same results.