Learn T-SQL Querying - Second Edition

By : Pedro Lopes, Pam Lahoud

Learn T-SQL Querying - Second Edition

By: Pedro Lopes, Pam Lahoud

Overview of this book

Data professionals seeking to excel in Transact-SQL (T-SQL) for Microsoft SQL Server and Azure SQL Database often lack comprehensive resources. This updated second edition of Learn T-SQL Querying focuses on indexing queries and crafting elegant T-SQL code, catering to all data professionals seeking mastery in modern SQL Server versions and Azure SQL Database. Starting with query processing fundamentals, this book lays a solid foundation for writing performant T-SQL queries. You’ll explore the mechanics of the Query Optimizer and Query Execution Plans, learning how to analyze execution plans for insights into current performance and scalability. Through dynamic management views (DMVs) and dynamic management functions (DMFs), you’ll build diagnostic queries. This book thoroughly covers indexing for T-SQL performance and provides insights into SQL Server’s built-in tools for expedited resolution of query performance and scalability issues. Further, hands-on examples will guide you through implementing features such as avoiding UDF pitfalls, understanding predicate SARGability, Query Store, and Query Tuning Assistant. By the end of this book, you‘ll have developed the ability to identify query performance bottlenecks, recognize anti-patterns, and skillfully avoid such pitfalls.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Conventions used

Get in touch

Share Your Thoughts

Download a free PDF copy of this book

Part 1: Query Processing Fundamentals

Free Chapter

Chapter 1: Understanding Query Processing

Technical requirements

Logical statement processing flow

Query compilation essentials

Query optimization essentials

Query execution essentials

Plan caching and reuse

How query processing impacts plan reuse

The importance of parameters

Summary

Chapter 2: Mechanics of the Query Optimizer

Technical requirements

Introducing the Cardinality Estimator

Understanding the query optimization workflow

Knobs for query optimization

Summary

Part 2: Dos and Don’ts of T-SQL

Chapter 3: Exploring Query Execution Plans

Technical requirements

What is a query plan?

Accessing a query plan

Navigating a query plan

Query plan operators of interest

Query plan properties of interest

Summary

Chapter 4: Indexing for T-SQL Performance

Technical requirements

Understanding predicate SARGability

Data access using indexes

Indexing strategy using rowstore indexes

Index maintenance

Summary

Chapter 5: Writing Elegant T-SQL Queries

Technical requirements

Best practices for T-SQL querying

The perils of SELECT *

Functions in our predicate

Deconstructing table-valued functions

Complex expressions

Optimizing OR logic

NULL means unknown

Fuzzy string matching

Inequality logic

EXECUTE versus sp_executesql

Composable logic

Summary

Chapter 6: Discovering T-SQL Anti- Patterns in Depth

Technical requirements

Implicit conversions

Avoiding unnecessary sort operations

Avoiding UDF pitfalls

Avoiding unnecessary overhead with stored procedures

Pitfalls of complex views

Pitfalls of correlated sub-queries

Properly storing intermediate results

Summary

Part 3: Assembling Our Query Troubleshooting Toolbox

Chapter 7: Building Diagnostic Queries Using DMVs and DMFs

Technical requirements

Introducing DMVs

Exploring query execution DMVs

Exploring query plan cache DMVs

Troubleshooting common scenarios with DMV queries

Investigating blocking

Cached query plan issues

Mining XML query plans

Summary

Chapter 8: Building XEvent Profiler Traces

Technical requirements

Introducing XEvents

Getting up and running with XEvent Profiler

Remote collection with SQL LogScout

Analyzing traces with RML Utilities

Summary

Chapter 9: Comparative Analysis of Query Plans

Technical requirements

Query plan analyzer

Summary

Chapter 10: Tracking Performance History with Query Store

Technical requirements

Introducing the Query Store

Tracking expensive queries

Fixing regressed queries

Features that rely on the Query Store

Summary

Chapter 11: Troubleshooting Live Queries

Technical requirements

Using Live Query Statistics

Understanding the need for lightweight profiling

Activity Monitor gets new life

Summary

Chapter 12: Managing Optimizer Changes

Technical requirements

Understanding where QTA and CE Feedback are needed

Understanding QTA fundamentals

Exploring the QTA workflow

Summary

Index

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts

Download a free PDF copy of this book

Customer Reviews

5 star

4 star

3 star

2 star

1 star

Plan caching and reuse

As we have now established, the process of optimizing a query can consume a large amount of resources and take a significant amount of time, so it makes sense to avoid that effort if possible whenever a query is executed. The SQL Database Engine caches nearly every plan that is created so that it can be reused when the same query is executed again. But not all execution plans are eligible for caching; for example, no DDL statements are cached, such as CREATE TABLE. As for DML statements, most simple forms that only have one possible execution plan are also not cached, such as INSERT INTO … VALUES.

There are several different methods for plan caching. The method that is used is typically based on how the query is called from the client. The different methods of plan caching that will be covered in this section are the following:

Stored procedures
Ad hoc plan caching
Parameterization (simple and forced)
The sp_executesql procedure
Prepared statements

Stored procedures

A stored procedure is a group of one or more T-SQL statements that is stored as an object in a SQL database. Stored procedures are like procedures in other programming languages in that they can accept input parameters and return output parameters, they can contain control flow logic such as conditional statements (IF … ELSE), loops (WHILE), and error handling (TRY … CATCH), and they can return a status value to the caller indicating success or failure. They can even contain calls to other stored procedures. There are many benefits to using stored procedures, but in this section, we will focus mainly on their benefit of reducing the overhead of the compilation process through caching.

The first time a stored procedure is executed, the SQL Database Engine compiles and optimizes the T-SQL within the procedure, and the resulting execution plan is cached for future use. Every subsequent call to the procedure reuses the cached plan, until such a time as the plan is removed from the cache due to reasons such as the following:

Memory pressure
Server restart
Plan invalidation – when the underlying objects are changed in some way or a significant amount of data is changed

Stored procedures are the preferred method for plan caching as they provide the most effective mechanism of caching and reusing query plans in the SQL Database Engine.

Ad hoc plan caching

An ad hoc query is a T-SQL query that is sent to the server as a block of text with no parameter markers or other constructs. They are typically built on the fly as needed, such as a query that is typed into a query window in SQL Server Management Studio (SSMS) and executed, or one that is sent to the server using the EXECUTE command as in the following code example, which can be executed in the AdventureWorks sample database:

EXECUTE (N'SELECT LastName, FirstName, MiddleName
FROM Person.Person
WHERE PersonType = N''EM'';')

Note

The letter N preceding a string in a T-SQL script indicates that the string should be interpreted as Unicode with UTF-16 encoding. In order to avoid implicit data-type conversions, be sure to specify N for all Unicode string literals when writing T-SQL scripts that involve the NCHAR and NVARCHAR data types. We discuss implicit conversions and their impact on performance in Chapter 6, Discovering T-SQL Anti-Patterns in Depth.

The process of parsing and optimizing an ad hoc query is like that of a stored procedure, and will be just as costly, so it is worth the SQL Database Engine storing the resulting plan in the cache in case the exact same query is ever executed again. The problem with ad hoc caching is that it is extremely difficult to ensure that the resulting plan is reused.

For the SQL Database Engine to reuse an ad hoc plan, the incoming query must match the cached query exactly. Every character must be the same, including spaces, line breaks, and capitalization. The reason for this is that the SQL Database Engine uses a hash function across the entire string to match the T-SQL statement. If even one character is off, the hash values will not match, and the SQL Database Engine will again compile, optimize, and cache the incoming ad hoc statement. For this reason, ad hoc caching cannot be relied upon as an effective caching mechanism.

Note

Even if the database is configured to use case-insensitive collation, this does not apply to query parsing. The ad hoc plan matching is still case sensitive because of the algorithm used to generate the hash value for the query string.

If there are many ad hoc queries being sent to an instance of the SQL Database Engine, the plan cache can become bloated with single-use plans. This can cause performance issues on the system as the plan cache will be unnecessarily large, taking up memory that could be better used elsewhere in the system. In this case, turning on the optimize for ad hoc workloads server configuration option is recommended. When this option is turned on, the SQL Database Engine will cache a small plan stub object the first time an ad hoc query is executed. This object takes up much less space than a full plan object and will minimize the size of the ad hoc cache. If the query is ever executed a second time, the full plan will be cached.

Tip

See the chapter Building Diagnostic Queries using DMVs and DMFs later in this book for a query that will help identify single-use plans in the cache.

Parameterization

Parameterization is the practice of replacing a literal value in a T-SQL statement with a parameter marker. Building on the example from the Ad hoc plan caching section, the following code block shows an example of a parameterized query executed in the AdventureWorks sample database:

DECLARE @PersonType AS nchar(2) = N'EM';
SELECT LastName, FirstName, MiddleName
FROM Person.Person
WHERE PersonType = @PersonType;

In this case, the literal value 'EM' is moved from the T-SQL statement itself into a DECLARE statement, and the variable is used in the query instead. This allows the query plan to be reused for different @PersonType values, whereas sending different values directly in the query string would result in a separate cached ad hoc plan.

Simple parameterization

In order to minimize the impact of ad hoc queries, the SQL Database Engine will automatically parameterize some simple queries by default. This is called simple parameterization and is the default setting of the Parameterization database option. With parameterization set to Simple, the SQL Database Engine will automatically replace literal values in an ad hoc query with parameter markers in order to make the resulting query plan reusable. This works for some queries, but there is a very small class of queries that can be parameterized this way.

As an example, the query we introduced previously in the Parameterization section would not be automatically parameterized in simple mode because it is considered unsafe. This is because different PersonType values may yield a different number of rows, and thus require a different execution plan. However, the following query executed in the AdventureWorks sample database would qualify for simple automatic parameterization:

SELECT LastName, FirstName, MiddleName
FROM Person.Person
WHERE BusinessEntityID = 5;

This query would not be cached as-is. The SQL Database Engine would convert the literal value of 5 to a parameter marker, and it would look something like this in the cache:

(@1 tinyint) SELECT LastName, FirstName, MiddleName
FROM Person.Person
WHERE BusinessEntityID = @1;

Forced parameterization

If an application tends to generate many ad hoc queries, and there is no way to modify the application to parameterize the queries, the Parameterization database option can be changed to Forced. When forced parameterization is turned on, the SQL Database Engine will replace all literal values in all ad hoc queries with parameter markers for the majority of use cases. However, note that there are documented exceptions that are either of the following:

Edge cases that most developers will not face, such as statements that contain more than 2,097 literals
Non-starters because statements will not be parameterized irrespective of whether forced parameterization is enabled or not, such as when statements contain the RECOMPILE query hint, statements inside the bodies of stored procedures, triggers, user-defined functions, or prepared statements that have already been parameterized on the client-side application

Take the example of the following query executed in the AdventureWorks sample database:

SELECT LastName, FirstName, MiddleName
FROM Person.Person
WHERE PersonType = N'EM' AND BusinessEntityID IN (5, 7, 13, 17, 19);

This query would be automatically parameterized under forced parameterization as follows:

(@1 nchar(2), @2 int, @3 int, @4 int, @5 int, @6 int) SELECT LastName, FirstName, MiddleName
FROM Person.Person
WHERE PersonType = @1 AND BusinessEntityID IN (@2, @3, @4, @5, @6);

This has the benefit of increasing the reusability of all ad hoc queries, but there are some risks to parameterizing all literal values in all queries, which will be discussed later in the The importance of parameters section.

The sp_executesql procedure

The sp_executesql procedure is the recommended method for sending an ad hoc T-SQL statement to the SQL Database Engine. If stored procedures cannot be leveraged for some reason, such as when T-SQL statements must be constructed dynamically by the application, sp_executesql allows the user to send an ad hoc T-SQL statement as a parameterized query, which uses a similar caching mechanism to stored procedures. This ensures that the plan can be reused whenever the same query is executed again. Building on our example from the Ad hoc plan caching section, we can re-write the query using sp_executesql as in the following example, which can be executed in the AdventureWorks sample database:

EXECUTE sp_executesql @stmt = N'SELECT LastName,
      FirstName, MiddleName
      FROM Person.Person
      WHERE PersonType = @PersonType;',
@params = N'@PersonType nchar(2)',
@PersonType = N'EM';

This ensures that any time the same query is sent with the same parameter markers, the plan will be reused, even if the statement is dynamically generated by the application.

Prepared statements

Another method for sending parameterized T-SQL statements to the SQL Database Engine is by using prepared statements. Leveraging prepared statements involves three different system procedures:

sp_prepare: Defines the statement and parameters that are to be executed, creates an execution plan for the query, and sends a statement handle back to the caller that can be used for subsequent execution.
sp_execute: Executes the statement defined by sp_prepare by sending the statement handle along with any parameters to the SQL Database Engin.
sp_unprepare: Discards the execution plan created by sp_prepare for the query specified by the statement handle

Steps 1 and 2 can optionally be combined into a single sp_prepexec statement to save a round-trip to the server.

This method is not generally recommended for plan reuse as it is a legacy construct and may not take advantage of some of the benefits of parameterized statements that sp_executesql and stored procedures can leverage. It is worth mentioning, however, because it is used by some cross-platform database connectivity libraries such as Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC) as the default mechanism for sending queries to the SQL Database Engine.

Now that we’ve learned the different ways that plans may be cached, let’s explore how plans may be reused during query processing.

Learn T-SQL Querying - Second Edition

By : Pedro Lopes, Pam Lahoud

Learn T-SQL Querying - Second Edition

By: Pedro Lopes, Pam Lahoud

Overview of this book

Related Content you might be interested in

Current Title:

Learn T-SQL Querying - Second Edition

SQL Server Query Tuning and Optimization

SQL Query Design Patterns and Best Practices

SQL Server 2019 Administrator's Guide

Plan caching and reuse

Stored procedures

Ad hoc plan caching

Parameterization

Simple parameterization

Forced parameterization

The sp_executesql procedure

Prepared statements