Learn T-SQL Querying - Second Edition

By : Pedro Lopes, Pam Lahoud

Learn T-SQL Querying - Second Edition

By: Pedro Lopes, Pam Lahoud

Overview of this book

Data professionals seeking to excel in Transact-SQL (T-SQL) for Microsoft SQL Server and Azure SQL Database often lack comprehensive resources. This updated second edition of Learn T-SQL Querying focuses on indexing queries and crafting elegant T-SQL code, catering to all data professionals seeking mastery in modern SQL Server versions and Azure SQL Database. Starting with query processing fundamentals, this book lays a solid foundation for writing performant T-SQL queries. You’ll explore the mechanics of the Query Optimizer and Query Execution Plans, learning how to analyze execution plans for insights into current performance and scalability. Through dynamic management views (DMVs) and dynamic management functions (DMFs), you’ll build diagnostic queries. This book thoroughly covers indexing for T-SQL performance and provides insights into SQL Server’s built-in tools for expedited resolution of query performance and scalability issues. Further, hands-on examples will guide you through implementing features such as avoiding UDF pitfalls, understanding predicate SARGability, Query Store, and Query Tuning Assistant. By the end of this book, you‘ll have developed the ability to identify query performance bottlenecks, recognize anti-patterns, and skillfully avoid such pitfalls.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Conventions used

Get in touch

Share Your Thoughts

Download a free PDF copy of this book

Part 1: Query Processing Fundamentals

Free Chapter

Chapter 1: Understanding Query Processing

Technical requirements

Logical statement processing flow

Query compilation essentials

Query optimization essentials

Query execution essentials

Plan caching and reuse

How query processing impacts plan reuse

The importance of parameters

Summary

Chapter 2: Mechanics of the Query Optimizer

Technical requirements

Introducing the Cardinality Estimator

Understanding the query optimization workflow

Knobs for query optimization

Summary

Part 2: Dos and Don’ts of T-SQL

Chapter 3: Exploring Query Execution Plans

Technical requirements

What is a query plan?

Accessing a query plan

Navigating a query plan

Query plan operators of interest

Query plan properties of interest

Summary

Chapter 4: Indexing for T-SQL Performance

Technical requirements

Understanding predicate SARGability

Data access using indexes

Indexing strategy using rowstore indexes

Index maintenance

Summary

Chapter 5: Writing Elegant T-SQL Queries

Technical requirements

Best practices for T-SQL querying

The perils of SELECT *

Functions in our predicate

Deconstructing table-valued functions

Complex expressions

Optimizing OR logic

NULL means unknown

Fuzzy string matching

Inequality logic

EXECUTE versus sp_executesql

Composable logic

Summary

Chapter 6: Discovering T-SQL Anti- Patterns in Depth

Technical requirements

Implicit conversions

Avoiding unnecessary sort operations

Avoiding UDF pitfalls

Avoiding unnecessary overhead with stored procedures

Pitfalls of complex views

Pitfalls of correlated sub-queries

Properly storing intermediate results

Summary

Part 3: Assembling Our Query Troubleshooting Toolbox

Chapter 7: Building Diagnostic Queries Using DMVs and DMFs

Technical requirements

Introducing DMVs

Exploring query execution DMVs

Exploring query plan cache DMVs

Troubleshooting common scenarios with DMV queries

Investigating blocking

Cached query plan issues

Mining XML query plans

Summary

Chapter 8: Building XEvent Profiler Traces

Technical requirements

Introducing XEvents

Getting up and running with XEvent Profiler

Remote collection with SQL LogScout

Analyzing traces with RML Utilities

Summary

Chapter 9: Comparative Analysis of Query Plans

Technical requirements

Query plan analyzer

Summary

Chapter 10: Tracking Performance History with Query Store

Technical requirements

Introducing the Query Store

Tracking expensive queries

Fixing regressed queries

Features that rely on the Query Store

Summary

Chapter 11: Troubleshooting Live Queries

Technical requirements

Using Live Query Statistics

Understanding the need for lightweight profiling

Activity Monitor gets new life

Summary

Chapter 12: Managing Optimizer Changes

Technical requirements

Understanding where QTA and CE Feedback are needed

Understanding QTA fundamentals

Exploring the QTA workflow

Summary

Index

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts

Download a free PDF copy of this book

Customer Reviews

5 star

4 star

3 star

2 star

1 star

The importance of parameters

As we discussed in the previous section on caching methods, the primary reason to parameterize queries is to ensure that query execution plans get reused – but why is this important and what other reasons might there be to use parameters?

Security

One reason for using parameterized queries is for security. Using a properly formatted parameterized query can protect against SQL injection attacks. A SQL injection attack is one where a malicious user can execute database code (in this case, T-SQL) on a server by appending it to a data entry field in the application. As an example, assume we have an application that contains a form that asks the user to enter their name into a text box. If the application were to use an ad hoc statement to insert this data into the database, it would generally concatenate a T-SQL string with the user input, as in the following code:

DECLARE @sql nvarchar(MAX);
SET @sql = N'INSERT Users (Name) VALUES (''' + <user input> + ''');';
EXECUTE (@sql);

A malicious user might enter the following value into the text box:

Bob'); DROP TABLE Users; --

If this is the case, the actual code that gets sent to the SQL Database Engine would look like the following:

INSERT Users (Name) VALUES ('Bob'); DROP TABLE Users; --');

This is a valid T-SQL syntax that would successfully execute. It would first insert a row into the Users table with the Name column set to 'Bob', then it would drop the Users table. This would of course break the application, and unless there was some sort of auditing in place, we would never know what happened.

Let’s look at this example again using a parameterized query. The code might look like the following:

EXECUTE sp_executesql @stmt = N'INSERT Users (Name) VALUES (@name)', @params = N'@name nvarchar(100)', @name = <user input>

This time, if the user were to send the same input, rather than executing the query that the user embedded in the string, the Database Engine would insert a row into the Users table with the Name column set to 'Bob'); DROP TABLE Users; --'. This would obviously look a bit strange, but it wouldn’t break the application nor breach security.

Performance

Another reason to leverage parameters is performance. In a busy SQL system, particularly one that has a primarily Online Transaction Processing (OLTP) workload, we may have hundreds or even thousands of queries executing per second.

Assume that each one of these queries takes about 100 ms to compile and consumes about the same amount of CPU. This would mean that each second on the system, the server could be consuming hundreds of seconds of CPU time just compiling queries. That’s a lot of resources to consume just for preparing the queries for execution, and it doesn’t leave a lot of overhead for actually executing them.

Also recall that when plans are not reused, the procedure cache can become very large and consume memory that in turn won’t be available for storing data and executing queries. In short, a system that spends too much time compiling queries may become CPU and/or memory bound and may perform poorly.

Parameter sniffing

Given that query plan reuse is so important, why wouldn’t the SQL Database Engine parameterize every query that comes in by default? One of the reasons for this is to avoid query performance issues that may result from parameter sniffing. Parameter sniffing is something the SQL Database Engine does in order to optimize a parameterized query. The first time a stored procedure or other parameterized query executes, the input parameter values are used to drive the optimization process and produce the execution plan, as discussed in the Query optimization essentials section.

That execution plan will then be cached and reused by subsequent executions of the procedure or query. For most queries, this is a good thing because using a specific value will result in a more accurate cost estimation. In some situations, however, particularly where the data distribution is skewed in some way, the parameters that are sent the first time the query is executed may not represent the typical use case of the query, and the plan that is generated may perform poorly when other parameter values are sent. This is a case where reusing a plan might not be a good thing, because the plan is highly sensitive to user-defined runtime parameters that have widely different data distributions for the same column.

Parameter sniffing, or parameter sensitivity, is a very common cause of plan variability and performance issues in the SQL Database Engine.

Parameter Sensitive Plan Optimization

SQL Server 2022 introduces the Parameter Sensitive Plan Optimization feature (commonly referred to as PSP Optimization), which allows the Database Engine to simultaneously cache multiple plans for a single parameterized query that uses equality predicates.

With PSP Optimization, during the initial compilation of a parameterized query, the Query Optimizer will evaluate up to three parameters that are likely sensitive to non-uniform (skewed) data distributions. The feature uses the statistics histograms to search for where the cardinality difference between the least-occurring value and the most-occurring value for a given column is orders of magnitude off. The result is the creation of what is called a dispatcher plan, which contains the logic (dispatcher expression) that bucketizes the predicates’ values, upon which different plan variants can be compiled independently.

For each cardinality bucket, a query plan variant will only be compiled if needed, based on actual runtime parameters. If the parameter values that would result in a given plan variant are never used at runtime, then that variant of the plan defined in the dispatcher plan will never actually get compiled. This behavior prevents plan-cache bloating by compiling a plan only if and when its predicate value demands it.

The following diagram shows the possible plan variants found for a parameterized query with a WHERE person.ID = @param search predicate:

Figure 1.11: Example of a dispatcher plan defining three query plan variants

We will discuss parameter sensitivity behavior in more detail later in this book, in Chapter 5, Writing Elegant T-SQL Queries, and Chapter 6, Discovering T-SQL Anti-Patterns in Depth.

To cache or not to cache

In general, caching and reusing query plans is a good thing, and writing T-SQL code that encourages plan reuse is recommended.

In some cases, such as with a reporting or OLAP workload, caching queries might make less sense. These types of systems tend to have a heavy ad hoc workload. The queries that run are typically long-running and, while they may consume a large amount of resources in a single execution, they typically run with less frequency than OLTP systems. Since these queries tend to be long-running, saving a few hundred milliseconds by reusing a cached plan doesn’t make as much sense as creating a new plan that is designed specifically for that execution of the query. Spending that time compiling a new plan may even result in saving more time in the long run, since a fresh plan will likely perform better than a plan that was generated based on a different set of parameter values.

In summary, for most workloads in the SQL Database Engine, leveraging stored procedures and/or parameterized queries is recommended to encourage plan reuse. For workloads that have heavy ad hoc queries and/or long-running reporting-style queries, consider enabling the optimize for ad hoc workloads server setting and leveraging the RECOMPILE hint to guarantee a new plan for each execution (provided that the queries are run with a low frequency), or use forced parameterization to improve plan reuse opportunities. Also, be sure to review Chapter 8, Building Diagnostic Queries Using DMVs and DMFs, for techniques to identify single-use plans, monitor for excessive recompilation, and identify plan variability and potential parameter sniffing issues.

Learn T-SQL Querying - Second Edition

By : Pedro Lopes, Pam Lahoud

Learn T-SQL Querying - Second Edition

By: Pedro Lopes, Pam Lahoud

Overview of this book

Related Content you might be interested in

Current Title:

Learn T-SQL Querying - Second Edition

SQL Server Query Tuning and Optimization

SQL Query Design Patterns and Best Practices

SQL Server 2019 Administrator's Guide

The importance of parameters

Security

Performance

Parameter sniffing

Parameter Sensitive Plan Optimization

To cache or not to cache