Book Image

Learn T-SQL Querying - Second Edition

By : Pedro Lopes, Pam Lahoud
Book Image

Learn T-SQL Querying - Second Edition

By: Pedro Lopes, Pam Lahoud

Overview of this book

Data professionals seeking to excel in Transact-SQL (T-SQL) for Microsoft SQL Server and Azure SQL Database often lack comprehensive resources. This updated second edition of Learn T-SQL Querying focuses on indexing queries and crafting elegant T-SQL code, catering to all data professionals seeking mastery in modern SQL Server versions and Azure SQL Database. Starting with query processing fundamentals, this book lays a solid foundation for writing performant T-SQL queries. You’ll explore the mechanics of the Query Optimizer and Query Execution Plans, learning how to analyze execution plans for insights into current performance and scalability. Through dynamic management views (DMVs) and dynamic management functions (DMFs), you’ll build diagnostic queries. This book thoroughly covers indexing for T-SQL performance and provides insights into SQL Server’s built-in tools for expedited resolution of query performance and scalability issues. Further, hands-on examples will guide you through implementing features such as avoiding UDF pitfalls, understanding predicate SARGability, Query Store, and Query Tuning Assistant. By the end of this book, you‘ll have developed the ability to identify query performance bottlenecks, recognize anti-patterns, and skillfully avoid such pitfalls.
Table of Contents (18 chapters)
1
Part 1: Query Processing Fundamentals
4
Part 2: Dos and Don’ts of T-SQL
9
Part 3: Assembling Our Query Troubleshooting Toolbox

Plan caching and reuse

As we have now established, the process of optimizing a query can consume a large amount of resources and take a significant amount of time, so it makes sense to avoid that effort if possible whenever a query is executed. The SQL Database Engine caches nearly every plan that is created so that it can be reused when the same query is executed again. But not all execution plans are eligible for caching; for example, no DDL statements are cached, such as CREATE TABLE. As for DML statements, most simple forms that only have one possible execution plan are also not cached, such as INSERT INTO … VALUES.

There are several different methods for plan caching. The method that is used is typically based on how the query is called from the client. The different methods of plan caching that will be covered in this section are the following:

  • Stored procedures
  • Ad hoc plan caching
  • Parameterization (simple and forced)
  • The sp_executesql procedure
  • Prepared statements

Stored procedures

A stored procedure is a group of one or more T-SQL statements that is stored as an object in a SQL database. Stored procedures are like procedures in other programming languages in that they can accept input parameters and return output parameters, they can contain control flow logic such as conditional statements (IF … ELSE), loops (WHILE), and error handling (TRY … CATCH), and they can return a status value to the caller indicating success or failure. They can even contain calls to other stored procedures. There are many benefits to using stored procedures, but in this section, we will focus mainly on their benefit of reducing the overhead of the compilation process through caching.

The first time a stored procedure is executed, the SQL Database Engine compiles and optimizes the T-SQL within the procedure, and the resulting execution plan is cached for future use. Every subsequent call to the procedure reuses the cached plan, until such a time as the plan is removed from the cache due to reasons such as the following:

  • Memory pressure
  • Server restart
  • Plan invalidation – when the underlying objects are changed in some way or a significant amount of data is changed

Stored procedures are the preferred method for plan caching as they provide the most effective mechanism of caching and reusing query plans in the SQL Database Engine.

Ad hoc plan caching

An ad hoc query is a T-SQL query that is sent to the server as a block of text with no parameter markers or other constructs. They are typically built on the fly as needed, such as a query that is typed into a query window in SQL Server Management Studio (SSMS) and executed, or one that is sent to the server using the EXECUTE command as in the following code example, which can be executed in the AdventureWorks sample database:

EXECUTE (N'SELECT LastName, FirstName, MiddleName
FROM Person.Person
WHERE PersonType = N''EM'';')

Note

The letter N preceding a string in a T-SQL script indicates that the string should be interpreted as Unicode with UTF-16 encoding. In order to avoid implicit data-type conversions, be sure to specify N for all Unicode string literals when writing T-SQL scripts that involve the NCHAR and NVARCHAR data types. We discuss implicit conversions and their impact on performance in Chapter 6, Discovering T-SQL Anti-Patterns in Depth.

The process of parsing and optimizing an ad hoc query is like that of a stored procedure, and will be just as costly, so it is worth the SQL Database Engine storing the resulting plan in the cache in case the exact same query is ever executed again. The problem with ad hoc caching is that it is extremely difficult to ensure that the resulting plan is reused.

For the SQL Database Engine to reuse an ad hoc plan, the incoming query must match the cached query exactly. Every character must be the same, including spaces, line breaks, and capitalization. The reason for this is that the SQL Database Engine uses a hash function across the entire string to match the T-SQL statement. If even one character is off, the hash values will not match, and the SQL Database Engine will again compile, optimize, and cache the incoming ad hoc statement. For this reason, ad hoc caching cannot be relied upon as an effective caching mechanism.

Note

Even if the database is configured to use case-insensitive collation, this does not apply to query parsing. The ad hoc plan matching is still case sensitive because of the algorithm used to generate the hash value for the query string.

If there are many ad hoc queries being sent to an instance of the SQL Database Engine, the plan cache can become bloated with single-use plans. This can cause performance issues on the system as the plan cache will be unnecessarily large, taking up memory that could be better used elsewhere in the system. In this case, turning on the optimize for ad hoc workloads server configuration option is recommended. When this option is turned on, the SQL Database Engine will cache a small plan stub object the first time an ad hoc query is executed. This object takes up much less space than a full plan object and will minimize the size of the ad hoc cache. If the query is ever executed a second time, the full plan will be cached.

Tip

See the chapter Building Diagnostic Queries using DMVs and DMFs later in this book for a query that will help identify single-use plans in the cache.

Parameterization

Parameterization is the practice of replacing a literal value in a T-SQL statement with a parameter marker. Building on the example from the Ad hoc plan caching section, the following code block shows an example of a parameterized query executed in the AdventureWorks sample database:

DECLARE @PersonType AS nchar(2) = N'EM';
SELECT LastName, FirstName, MiddleName
FROM Person.Person
WHERE PersonType = @PersonType;

In this case, the literal value 'EM' is moved from the T-SQL statement itself into a DECLARE statement, and the variable is used in the query instead. This allows the query plan to be reused for different @PersonType values, whereas sending different values directly in the query string would result in a separate cached ad hoc plan.

Simple parameterization

In order to minimize the impact of ad hoc queries, the SQL Database Engine will automatically parameterize some simple queries by default. This is called simple parameterization and is the default setting of the Parameterization database option. With parameterization set to Simple, the SQL Database Engine will automatically replace literal values in an ad hoc query with parameter markers in order to make the resulting query plan reusable. This works for some queries, but there is a very small class of queries that can be parameterized this way.

As an example, the query we introduced previously in the Parameterization section would not be automatically parameterized in simple mode because it is considered unsafe. This is because different PersonType values may yield a different number of rows, and thus require a different execution plan. However, the following query executed in the AdventureWorks sample database would qualify for simple automatic parameterization:

SELECT LastName, FirstName, MiddleName
FROM Person.Person
WHERE BusinessEntityID = 5;

This query would not be cached as-is. The SQL Database Engine would convert the literal value of 5 to a parameter marker, and it would look something like this in the cache:

(@1 tinyint) SELECT LastName, FirstName, MiddleName
FROM Person.Person
WHERE BusinessEntityID = @1;

Forced parameterization

If an application tends to generate many ad hoc queries, and there is no way to modify the application to parameterize the queries, the Parameterization database option can be changed to Forced. When forced parameterization is turned on, the SQL Database Engine will replace all literal values in all ad hoc queries with parameter markers for the majority of use cases. However, note that there are documented exceptions that are either of the following:

  • Edge cases that most developers will not face, such as statements that contain more than 2,097 literals
  • Non-starters because statements will not be parameterized irrespective of whether forced parameterization is enabled or not, such as when statements contain the RECOMPILE query hint, statements inside the bodies of stored procedures, triggers, user-defined functions, or prepared statements that have already been parameterized on the client-side application

Take the example of the following query executed in the AdventureWorks sample database:

SELECT LastName, FirstName, MiddleName
FROM Person.Person
WHERE PersonType = N'EM' AND BusinessEntityID IN (5, 7, 13, 17, 19);

This query would be automatically parameterized under forced parameterization as follows:

(@1 nchar(2), @2 int, @3 int, @4 int, @5 int, @6 int) SELECT LastName, FirstName, MiddleName
FROM Person.Person
WHERE PersonType = @1 AND BusinessEntityID IN (@2, @3, @4, @5, @6);

This has the benefit of increasing the reusability of all ad hoc queries, but there are some risks to parameterizing all literal values in all queries, which will be discussed later in the The importance of parameters section.

The sp_executesql procedure

The sp_executesql procedure is the recommended method for sending an ad hoc T-SQL statement to the SQL Database Engine. If stored procedures cannot be leveraged for some reason, such as when T-SQL statements must be constructed dynamically by the application, sp_executesql allows the user to send an ad hoc T-SQL statement as a parameterized query, which uses a similar caching mechanism to stored procedures. This ensures that the plan can be reused whenever the same query is executed again. Building on our example from the Ad hoc plan caching section, we can re-write the query using sp_executesql as in the following example, which can be executed in the AdventureWorks sample database:

EXECUTE sp_executesql @stmt = N'SELECT LastName,
      FirstName, MiddleName
      FROM Person.Person
      WHERE PersonType = @PersonType;',
@params = N'@PersonType nchar(2)',
@PersonType = N'EM';

This ensures that any time the same query is sent with the same parameter markers, the plan will be reused, even if the statement is dynamically generated by the application.

Prepared statements

Another method for sending parameterized T-SQL statements to the SQL Database Engine is by using prepared statements. Leveraging prepared statements involves three different system procedures:

  1. sp_prepare: Defines the statement and parameters that are to be executed, creates an execution plan for the query, and sends a statement handle back to the caller that can be used for subsequent execution.
  2. sp_execute: Executes the statement defined by sp_prepare by sending the statement handle along with any parameters to the SQL Database Engin.
  3. sp_unprepare: Discards the execution plan created by sp_prepare for the query specified by the statement handle

Steps 1 and 2 can optionally be combined into a single sp_prepexec statement to save a round-trip to the server.

This method is not generally recommended for plan reuse as it is a legacy construct and may not take advantage of some of the benefits of parameterized statements that sp_executesql and stored procedures can leverage. It is worth mentioning, however, because it is used by some cross-platform database connectivity libraries such as Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC) as the default mechanism for sending queries to the SQL Database Engine.

Now that we’ve learned the different ways that plans may be cached, let’s explore how plans may be reused during query processing.