Learn T-SQL Querying - Second Edition

By: Pedro Lopes, Pam Lahoud

Overview of this book

Data professionals seeking to excel in Transact-SQL (T-SQL) for Microsoft SQL Server and Azure SQL Database often lack comprehensive resources. This updated second edition of Learn T-SQL Querying focuses on indexing queries and crafting elegant T-SQL code, catering to all data professionals seeking mastery in modern SQL Server versions and Azure SQL Database. Starting with query processing fundamentals, this book lays a solid foundation for writing performant T-SQL queries. You’ll explore the mechanics of the Query Optimizer and Query Execution Plans, learning how to analyze execution plans for insights into current performance and scalability. Through dynamic management views (DMVs) and dynamic management functions (DMFs), you’ll build diagnostic queries. This book thoroughly covers indexing for T-SQL performance and provides insights into SQL Server’s built-in tools for expedited resolution of query performance and scalability issues. Further, hands-on examples will guide you through avoiding UDF pitfalls, understanding predicate SARGability, and using features such as Query Store and Query Tuning Assistant. By the end of this book, you’ll have developed the ability to identify query performance bottlenecks, recognize anti-patterns, and skillfully avoid such pitfalls.
Table of Contents (18 chapters)
Part 1: Query Processing Fundamentals
Part 2: Dos and Don’ts of T-SQL
Part 3: Assembling Our Query Troubleshooting Toolbox

Logical statement processing flow

When writing T-SQL, it is important to be familiar with the order in which the SQL Database Engine interprets queries, to later create an execution plan. This helps anticipate possible performance issues arising from poorly written queries, as well as helping you understand cases of unintended results. The following steps outline a summarized view of the method that the Database Engine follows to process a T-SQL statement:

  1. Process all the source and target objects stated in the FROM clause (tables, views, and TVFs), together with the intended logical operation (JOIN and APPLY) to perform on those objects.
  2. Apply whatever pre-filters are defined in the WHERE clause to reduce the number of incoming rows from those objects.
  3. Apply any aggregation defined in the GROUP BY or aggregate functions (for example, a MIN or MAX function).
  4. Apply filters that can only be applied on the aggregations as defined in the HAVING clause.
  5. Compute the logic for windowing functions such as ROW_NUMBER, RANK, NTILE, LAG, and LEAD.
  6. Keep only the required columns for the output as specified in the SELECT clause, and if a UNION operator is present, combine the row sets.
  7. Remove duplicates from the row set if a DISTINCT clause exists.
  8. Order the resulting row set as specified by the ORDER BY clause.
  9. Account for any limits stated in the TOP clause.
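The steps above can be mapped onto a single statement. The following is a minimal sketch, assuming the AdventureWorks sample database that the other queries in this section appear to use; the OrderQty filter, the threshold of 100, and the AvgUnitPrice alias are illustrative choices, not taken from the original text. The comments tie each clause to the step number in the list above.

```sql
-- Assumption: AdventureWorks sample database; filter values are illustrative only.
SELECT TOP (10)                               -- step 9: limit the ordered rows
       p.ProductNumber,
       AVG(sod.UnitPrice) AS AvgUnitPrice     -- step 6: shape the output columns
FROM Production.Product AS p                  -- step 1: source objects and join
INNER JOIN Sales.SalesOrderDetail AS sod
    ON p.ProductID = sod.ProductID
WHERE sod.OrderQty > 1                        -- step 2: pre-filter incoming rows
GROUP BY p.ProductNumber                      -- step 3: aggregate
HAVING AVG(sod.UnitPrice) > 100               -- step 4: filter the groups
ORDER BY AvgUnitPrice DESC;                   -- step 8: the SELECT alias is visible here
```

One practical consequence of this order: because the SELECT list is processed after WHERE and HAVING, the AvgUnitPrice alias cannot be referenced in those clauses, which is why the aggregate expression is repeated in HAVING. ORDER BY is processed later, so it can use the alias.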

It becomes clearer now that properly defining how tables are joined (the logical join type) is important to any scalable T-SQL query, in particular carefully choosing the columns on which the tables are joined. For example, in an inner join, the join conditions are the first level of data filtering that can be enforced, because only the rows that represent the intersection of two tables are eligible for subsequent operations.
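As a rough illustration of an inner join acting as a first filter, the sketch below compares how many products survive the join with how many exist in total. It assumes the AdventureWorks sample database; the column aliases are hypothetical names chosen for this example.

```sql
-- Assumption: AdventureWorks sample database.
-- Products with at least one matching order line (the intersection of both tables).
SELECT COUNT(DISTINCT p.ProductID) AS ProductsWithSales
FROM Production.Product AS p
INNER JOIN Sales.SalesOrderDetail AS sod
    ON p.ProductID = sod.ProductID;

-- All products, whether or not they were ever sold.
SELECT COUNT(*) AS AllProducts
FROM Production.Product AS p;
```

Any difference between the two counts corresponds to products that the inner join eliminates before the WHERE, GROUP BY, and later steps ever see them.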

It also makes sense to filter out rows as early as possible using a WHERE clause, rather than applying post-filtering conditions to groups using a HAVING clause, whenever the condition does not depend on an aggregate. Consider these two example queries:

-- Query 1: the filter is applied to the groups, after aggregation (HAVING)
SELECT p.ProductNumber, AVG(sod.UnitPrice)
FROM Production.Product AS p
INNER JOIN Sales.SalesOrderDetail AS sod ON p.ProductID = sod.ProductID
GROUP BY p.ProductNumber
HAVING p.ProductNumber LIKE 'L%';

-- Query 2: the filter is applied to the rows, before aggregation (WHERE)
SELECT p.ProductNumber, AVG(sod.UnitPrice)
FROM Production.Product AS p
INNER JOIN Sales.SalesOrderDetail AS sod ON p.ProductID = sod.ProductID
WHERE p.ProductNumber LIKE 'L%'
GROUP BY p.ProductNumber;

While these two queries are logically equivalent, the second one is more efficient because the rows that do not have a ProductNumber starting with L will be filtered out of the results before the aggregation is calculated. This is because the SQL Database Engine evaluates a WHERE clause before a HAVING clause and can limit the row count earlier in the execution phase, translating into reduced I/O and memory requirements, and also reduced CPU usage when applying the post-filter to the group.
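One way to check this on a given system is to run both queries with I/O and timing statistics enabled and compare the output. This is a minimal sketch, assuming the AdventureWorks sample database and a client such as SQL Server Management Studio, where the statistics appear in the Messages output. The measured difference may be small or absent if the optimizer is able to apply the filter early in both cases, so treat the numbers as an observation rather than a guarantee.

```sql
-- Assumption: AdventureWorks sample database.
SET STATISTICS IO ON;    -- report logical and physical reads per table
SET STATISTICS TIME ON;  -- report parse, compile, and execution times

-- Filter expressed in HAVING (logically applied after grouping).
SELECT p.ProductNumber, AVG(sod.UnitPrice)
FROM Production.Product AS p
INNER JOIN Sales.SalesOrderDetail AS sod ON p.ProductID = sod.ProductID
GROUP BY p.ProductNumber
HAVING p.ProductNumber LIKE 'L%';

-- Filter expressed in WHERE (logically applied before grouping).
SELECT p.ProductNumber, AVG(sod.UnitPrice)
FROM Production.Product AS p
INNER JOIN Sales.SalesOrderDetail AS sod ON p.ProductID = sod.ProductID
WHERE p.ProductNumber LIKE 'L%'
GROUP BY p.ProductNumber;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the logical reads and CPU time reported for each statement shows whether the placement of the filter changes the work done for this data set.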

The following diagram summarizes the logical statement-processing flow for the building blocks discussed previously in this chapter:

Figure 1.1: Flow chart summarizing the logical statement-processing flow of a query

Now that we understand the order in which the SQL Database Engine processes queries, let’s explore the essentials of query compilation.