Mastering SQL Server 2017

By: Miloš Radivojević, Dejan Sarka, William Durkin, Christian Cote, Matija Lah

Overview of this book

Microsoft SQL Server 2017 uses the power of R and Python for machine learning and supports container-based deployment on Windows and Linux. By learning how to use the features of SQL Server 2017 effectively, you can build scalable apps and easily perform data integration and transformation. You'll start by brushing up on the features of SQL Server 2017. This Learning Path will then demonstrate how you can use Query Store, columnstore indexes, and In-Memory OLTP in your apps. You'll also learn to integrate Python code in SQL Server and graph database implementations for development and testing. Next, you'll get up to speed with designing and building SQL Server Integration Services (SSIS) data warehouse packages using SQL Server Data Tools. Toward the concluding chapters, you'll discover how to develop SSIS packages designed to maintain a data warehouse using the data flow and other control flow tasks. By the end of this Learning Path, you'll be equipped with the skills you need to design efficient, high-performance database applications with confidence.

This Learning Path includes content from the following Packt books:

SQL Server 2017 Developer's Guide by Miloš Radivojević, Dejan Sarka, et al.

SQL Server 2017 Integration Services Cookbook by Christian Cote, Dejan Sarka, et al.

Title Page

Mastering SQL Server 2017

Contributors

About the Authors

Packt Is Searching for Authors Like You

About Packt

Why Subscribe?

Packt.com

Preface

Who This Book Is For

What This Book Covers

To Get the Most out of This Book

Get in Touch

Free Chapter

Introduction to SQL Server 2017

Security

Summary

SQL Server Tools

Installing and updating SQL Server Tools

New SSMS features and enhancements

SQL Server Data Tools

Tools for developing R and Python code

Summary

JSON Support in SQL Server

Why JSON?

What is JSON?

JSON in SQL Server prior to SQL Server 2016

Retrieving SQL Server data in JSON format

Converting JSON data in a tabular format

JSON storage in SQL Server 2017

Validating JSON data

Extracting values from a JSON text

Modifying JSON data

Performance considerations

Summary

Stretch Database

Stretch DB architecture

Limitations of using Stretch Database

Use cases for Stretch Database

Enabling Stretch Database

Querying stretch databases

SQL Server Stretch Database pricing

Stretch DB management and troubleshooting

Summary

Temporal Tables

What is temporal data?

System-versioned temporal tables in SQL Server 2017

What is missing in SQL Server 2017?

Summary

Columnstore Indexes

Analytical queries in SQL Server

Columnar storage and batch processing

Nonclustered columnstore indexes

Clustered columnstore indexes

Summary

SSIS Setup

Introduction

SQL Server 2016 download

Installing JRE for PolyBase

Installing SQL Server 2016

SQL Server Management Studio installation

SQL Server Data Tools installation

Testing SQL Server connectivity

What Is New in SSIS 2016

Introduction

Creating SSIS Catalog

Custom logging

Azure tasks and transforms

Incremental package deployment

Multiple version support

Error column name

Control Flow templates

Key Components of a Modern ETL Solution

Introduction

Installing the sample solution

Deploying the source database with its data

Deploying the target database

SSIS projects

Framework calls in EP_Staging.dtsx

Dealing with Data Quality

Introduction

Profiling data with SSIS

Creating a DQS knowledge base

Data cleansing with DQS

Creating a MDS model

Matching with DQS

Using SSIS fuzzy components

Unleash the Power of SSIS Script Task and Component

Introduction

Using variables in SSIS Script task

Execute complex filesystem operations with the Script task

Reading data profiling XML results with the Script task

Correcting data with the Script component

Validating data using regular expressions in a Script component

Using the Script component as a source

Using the Script component as a destination

On-Premises and Azure Big Data Integration

Introduction

Azure Blob storage data management

Installing a Hortonworks cluster

Copying data to an on-premises cluster

Using Hive – creating a database

Transforming the data with Hive

Transferring data between Hadoop and Azure

Leveraging a HDInsight big data cluster

Managing data with Pig Latin

Importing Azure Blob storage data

Extending SSIS Custom Tasks and Transformations

Introduction

Designing a custom task

Designing a custom transformation

Managing custom component versions

Scale Out with SSIS 2017

Introduction

SQL Server 2017 download and setup

SQL Server client tools setup

Configuring SSIS for scale out executions

Executing a package using scale out functionality

Other Books You May Enjoy

Leave a review - let other readers know what you think

Security

The last few years have made the importance of security in IT extremely apparent, particularly when we consider the repercussions of the Edward Snowden data leaks or multiple cases of data theft via hacking. While no system is completely impenetrable, we should always be considering how we can improve the security of the systems we build. These considerations are wide ranging and sometimes even dictated via rules, regulations, and laws. Microsoft has responded to the increased focus on security by delivering new features to assist developers and DBAs in their search for more secure systems.

Row-Level Security

The first technology that was introduced in SQL Server 2016 to address the need for increased/improved security is Row-Level Security (RLS). RLS provides the ability to control access to rows in a table based on the user executing a query. With RLS it is possible to implement a filtering mechanism on any table in a database, completely transparently to any external application or direct T-SQL access.

The ability to implement such filtering without having to redesign a data access layer allows system administrators to control access to data at an even more granular level than before. The fact that this control can be achieved without any application logic redesign makes this feature even more attractive for certain use cases. RLS also makes it possible, in conjunction with the necessary auditing features, to lock down a SQL Server database so that even the traditional god-mode sysadmin cannot access the underlying data.
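As a minimal sketch of how such a filter can be set up, the following T-SQL creates a predicate function and binds it to a table with a security policy. The table dbo.Orders, its SalesRep column, the function name, and the SalesManager user are hypothetical names chosen for illustration; they do not come from the book.

```sql
-- Hypothetical table: each order row records the database user who owns it.
CREATE TABLE dbo.Orders
(
    OrderId  INT           NOT NULL PRIMARY KEY,
    SalesRep SYSNAME       NOT NULL,
    Amount   DECIMAL(10,2) NOT NULL
);
GO

-- Inline table-valued function used as the filter predicate.
-- A row is visible when the function returns a row for it.
CREATE FUNCTION dbo.fn_OrdersPredicate (@SalesRep AS SYSNAME)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    SELECT 1 AS AccessGranted
    WHERE @SalesRep = USER_NAME()           -- owners see their own rows
       OR USER_NAME() = N'SalesManager';    -- hypothetical user who sees all rows
GO

-- The security policy attaches the predicate to the table.
CREATE SECURITY POLICY dbo.OrdersFilter
    ADD FILTER PREDICATE dbo.fn_OrdersPredicate(SalesRep) ON dbo.Orders
    WITH (STATE = ON);
GO
```

Once the policy is on, a plain SELECT against dbo.Orders returns only the rows that the predicate allows for the current user, with no change to the calling application.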

Dynamic data masking

The second security feature that we will be covering is Dynamic Data Masking (DDM). DDM allows the system administrator to define column-level data masking rules that prevent users from reading the contents of columns, while still being able to query the rows themselves. This feature was initially aimed at allowing developers to work with a copy of production data without being able to see the underlying values. This can be particularly useful in environments where data protection laws are enforced (for example, credit card processing systems and medical record storage). Data masking occurs only at query runtime and does not affect the stored data of a table. This means that it is possible to mask a multi-terabyte database through a simple DDL statement, rather than resorting to the previous solution of physically masking the underlying data in the table we want to mask. The current implementation of DDM provides a fixed set of masking functions that can be applied to the columns of a table and that mask data when the table is queried. If a user has permission to view the unmasked data, the masking functions are not run, whereas a user without that permission sees the data only as returned by the defined masking functions.
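The following sketch shows what such DDL might look like. The table dbo.Customers, its columns, and the ReportingUser principal are hypothetical; the masking functions default(), email(), and partial() are part of the fixed set that SQL Server provides.

```sql
-- Hypothetical table with masks declared at creation time.
CREATE TABLE dbo.Customers
(
    CustomerId INT IDENTITY(1,1) PRIMARY KEY,
    FullName   NVARCHAR(100) NOT NULL,
    Email      NVARCHAR(100) MASKED WITH (FUNCTION = 'email()') NULL,
    CardNumber VARCHAR(19)   MASKED WITH (FUNCTION = 'partial(0,"XXXX-XXXX-XXXX-",4)') NULL
);
GO

-- A mask can also be added to an existing column; the stored data is not changed.
ALTER TABLE dbo.Customers
    ALTER COLUMN FullName ADD MASKED WITH (FUNCTION = 'default()');
GO

-- Hypothetical user who is allowed to see the real values.
CREATE USER ReportingUser WITHOUT LOGIN;
GRANT SELECT ON dbo.Customers TO ReportingUser;
GRANT UNMASK TO ReportingUser;
GO
```

Users with SELECT permission but without UNMASK see the masked form of each column, while ReportingUser sees the stored values.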

Always Encrypted

The third major security feature to be introduced in SQL Server 2016 is Always Encrypted. Encryption with SQL Server was previously a (mainly) server-based solution. Databases were either protected with encryption at the database level (the entire database was encrypted) or at the column level (single columns had an encryption algorithm defined). While this encryption was/is fully functional and safe, crucial portions of the encryption process (for example, encryption certificates) are stored inside SQL Server. This effectively gave the owner of a SQL Server instance the ability to potentially gain access to this encrypted data—if not directly, there was at least an increased surface area for a potential malicious access attempt. As ever more companies moved into hosted service and cloud solutions (for example, Microsoft Azure), the previous encryption solutions no longer provided the required level of control/security.

Always Encrypted was designed to bridge this security gap by removing the ability of an instance owner to gain access to the encryption components. The entirety of the encryption process was moved outside of SQL Server and resides on the client side. While a similar effect was possible using homebrew solutions, Always Encrypted provides an encryption suite that is fully integrated into both the .NET Framework and SQL Server. Whenever data is defined as requiring encryption, the data is encrypted within the .NET Framework and only sent to SQL Server after encryption has occurred. This means that a malicious user (or even system administrator) will only ever be able to access encrypted information should they attempt to query data stored via Always Encrypted.
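On the server side, the only visible trace of Always Encrypted is the column definition. The sketch below assumes that a column master key and a column encryption key named CEK_Demo have already been provisioned (for example, through SSMS); that key name, the table, and its columns are hypothetical.

```sql
-- Assumes a column encryption key named CEK_Demo already exists in this database.
CREATE TABLE dbo.Patients
(
    PatientId INT IDENTITY(1,1) PRIMARY KEY,
    -- Deterministic encryption allows equality lookups;
    -- character columns then need a BIN2 collation.
    SSN CHAR(11) COLLATE Latin1_General_BIN2
        ENCRYPTED WITH (COLUMN_ENCRYPTION_KEY = CEK_Demo,
                        ENCRYPTION_TYPE = DETERMINISTIC,
                        ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256') NOT NULL,
    -- Randomized encryption is stronger but cannot be searched on.
    BirthDate DATE
        ENCRYPTED WITH (COLUMN_ENCRYPTION_KEY = CEK_Demo,
                        ENCRYPTION_TYPE = RANDOMIZED,
                        ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256') NOT NULL
);
GO
```

A client that connects without column encryption enabled, or without access to the column master key, reads only ciphertext from these columns.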

Microsoft has made some positive progress in this area of the product. While no system is completely safe and no single feature can provide an all-encompassing solution, all three features provide a further option in building up, or improving upon, any system's current security level.

Engine features

The Engine features section is traditionally the most important, or interesting, for most DBAs or system administrators when a new version of SQL Server is released. However, there are also numerous engine feature improvements that are relevant to developers too. So, if you are a developer, don't skip this section, or you may miss some improvements that could save you some trouble later on!

Query Store

The Query Store is possibly the biggest new engine feature to come with the release of SQL Server 2016. DBAs and developers should be more than familiar with the situation of a query that behaves reliably for a long period and then suddenly turns into a slow-running, resource-killing monster. Some readers may identify the cause of the issue as parameter sniffing or stale statistics. Either way, when troubleshooting to find out why one unchanging query suddenly becomes slow, knowing the query execution plan(s) that SQL Server has created and used can be very helpful. A major issue when investigating these types of problems is the transient nature of query plans and their execution statistics. This is where Query Store comes into play; SQL Server collects and permanently stores information on query compilation and execution on a per-database basis. This information is then persisted inside each database that is being monitored by the Query Store functionality, allowing a DBA or developer to investigate performance issues after the fact.

It is even possible to perform longer-term query analysis, providing an insight into how query execution plans change over a longer time frame. This sort of insight was previously only possible via handwritten solutions or third-party monitoring solutions, which may still not allow the same insights as the Query Store does.
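A minimal sketch of enabling Query Store and reading back what it has collected is shown below. The database name SalesDemo is hypothetical; the catalog views are the ones Query Store exposes in each monitored database.

```sql
-- Enable Query Store for a (hypothetical) database.
ALTER DATABASE SalesDemo SET QUERY_STORE = ON (OPERATION_MODE = READ_WRITE);
GO

USE SalesDemo;
GO

-- List the ten slowest plans by average duration, with their query text.
SELECT TOP (10)
    q.query_id,
    p.plan_id,
    qt.query_sql_text,
    rs.count_executions,
    rs.avg_duration,          -- microseconds
    rs.last_execution_time
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q
    ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan AS p
    ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON rs.plan_id = p.plan_id
ORDER BY rs.avg_duration DESC;
```

Because a query can appear with more than one plan_id, this kind of result is often the starting point for spotting a plan change behind a sudden slowdown.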

Live query statistics

When we are developing inside SQL Server, each developer creates a mental model of how data flows inside SQL Server. Microsoft has provided a multitude of ways to display this concept when working with query execution. The most obvious visual aid is the graphical execution plan. There are endless explanations in books, articles, and training seminars that attempt to make reading these graphical representations easier. Depending upon how your mind works, these descriptions can help or hinder your ability to understand the data flow concepts—fully blocking iterators, pipeline iterators, semi-blocking iterators, nested loop joins... the list goes on. When we look at an actual graphical execution plan, we are seeing a representation of how SQL Server processed a query: which data retrieval methods were used, which join types were chosen to join multiple data sets, what sorting was required, and so on. However, this is a representation after the query has completed execution. Live Query Statistics offers us the ability to observe during query execution and identify how, when, and where data moves through the query plan. This live representation is a huge improvement in making the concepts behind query execution clearer and is a great tool to allow developers to better design their query and index strategies to improve query performance.

Further details of Live Query Statistics can be found in Chapter 2, SQL Server Tools.

Stretch Database

Microsoft has worked a lot in the past few years on their Mobile First, Cloud First strategy. We have seen a huge investment in their cloud offering, Azure, with the line between on-premises IT and cloud-based IT being continually blurred. The features being released in the newest products from Microsoft continue this approach and SQL Server is taking steps to bridge the divide between running SQL Server as a fully on-premises solution and storing/processing relational data in the cloud.

One big step in achieving this approach is the new Stretch Database feature with SQL Server 2016. Stretch Database allows a DBA to categorize the data inside a database, defining which data is hot and which is cold. This categorization allows Stretch Database to then move the cold data out of the on-premises database and into Azure Cloud Storage.

The segmentation of data remains transparent to any user/application that queries the data, which now resides in two different locations. The idea behind this technology is to reduce storage requirements for the on-premises system by offloading large amounts of archive data onto cheaper, slower storage in the cloud.

This reduction should then allow the smaller hot data to be placed on smaller capacity, higher performance storage. The magic of Stretch Database is the fact that this separation of data requires no changes at the application or database query level. This is a purely storage-level change, which means the potential ROI of segmenting a database is quite large.

Further details of Stretch Database can be found in Chapter 4, Stretch Database.

Database scoped configuration

Many DBAs who support multiple third-party applications running on SQL Server can experience the difficulty of setting up their SQL Server instances per the application requirements or best practices. Many third-party applications have prerequisites that dictate how the actual instance of SQL Server must be configured. A common occurrence is a requirement of configuring the Max Degree of Parallelism to force only one CPU to be used for query execution. As this is an instance-wide setting, this can affect all other databases/applications in a multi-tenant SQL Server instance (which is generally the case). With Database Scoped Configuration in SQL Server 2016, several previously instance-level settings have been moved to a database-level configuration option. This greatly improves multi-tenant SQL Server instances, as the decision of, for example, how many CPUs can be used for query execution can be made at the database-level, rather than for the entire instance. This will allow DBAs to host databases with differing CPU usage requirements on the same instance, rather than having to either impact the entire instance with a setting or be forced to run multiple instances of SQL Server and possibly incur higher licensing costs.
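As a short illustration, the following statements set the degree of parallelism for the current database only and then list the database scoped settings. They are run in the context of the database that hosts the third-party application; no names outside SQL Server's own syntax are assumed.

```sql
-- Limit query execution in this database to a single CPU,
-- leaving the instance-wide setting untouched.
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 1;
GO

-- Review the database scoped settings currently in effect.
SELECT name, value
FROM sys.database_scoped_configurations;
```

Other databases on the same instance keep using the instance-level Max Degree of Parallelism unless they are given their own database scoped value.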

Temporal Tables

There are many instances where DBAs or developers are required to implement a change tracking solution, allowing future analysis or assessment of data changes for certain business entities. A readily accessible example is the change history of a customer account in a CRM system. The options for implementing such a change tracking system are varied, and each has strengths and weaknesses. One such implementation that has seen wide adoption is the use of triggers to capture data changes and store historical values in an archive table. Regardless of the implementation chosen, it was often cumbersome to develop and maintain these solutions.

One of the challenges was in being able to incorporate table structure changes in the table being tracked. It was equally challenging creating solutions to allow for querying both the base table and the archive table belonging to it. The intelligence of deciding whether to query the live and/or archive data can require some complex query logic.

With the advent of Temporal Tables, this entire process has been simplified for both developers and DBAs. It is now possible to activate this change tracking on a table and push changes into an archive table with a simple change on a table's structure. Querying the base table and including a temporal attribute to the query is also a simple T-SQL syntax addition. As such, it is now possible for a developer to submit temporal analysis queries, and SQL Server takes care of splitting the query between the live and archive data and returning the data in a single result set.
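The sketch below shows the kind of table definition and query this refers to. The table dbo.CustomerAccount, its columns, the history table name, and the date used in the query are hypothetical.

```sql
-- System-versioned temporal table: SQL Server maintains the period columns
-- and moves old row versions into the history table automatically.
CREATE TABLE dbo.CustomerAccount
(
    CustomerId  INT           NOT NULL PRIMARY KEY CLUSTERED,
    CreditLimit DECIMAL(10,2) NOT NULL,
    ValidFrom   DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo     DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CustomerAccountHistory));
GO

-- Temporal query: the state of one account as it was at a given point in time.
-- SQL Server decides whether the row comes from the live or the history table.
SELECT CustomerId, CreditLimit, ValidFrom, ValidTo
FROM dbo.CustomerAccount
    FOR SYSTEM_TIME AS OF '2017-06-01T00:00:00'
WHERE CustomerId = 42;
```

Ordinary INSERT, UPDATE, and DELETE statements against dbo.CustomerAccount need no changes; the history rows are written as a side effect.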

Further details of Temporal Tables can be found in Chapter 5, Temporal Tables.

Columnstore indexes

Traditional data storage inside SQL Server has used the row-storage format, where the data for an entire row is stored together on the data pages inside the database. SQL Server 2012 introduced a new storage format: columnstore. This format pivots the data storage, combining the data from a single column and storing the data together on the data pages. Because the values within a column tend to be similar, this storage format enables massive compression of data, with compression ratios well beyond those of traditional row storage.

Initially, only nonclustered columnstore indexes were possible, and they were read-only. With SQL Server 2014, updateable clustered columnstore indexes were introduced, expanding the usability of the feature greatly. Finally, with SQL Server 2016, updateable nonclustered columnstore indexes and support for columnstore indexes on In-Memory tables have been introduced. The potential performance gains from these improvements are huge.
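The two index types can be sketched as follows. The tables dbo.FactSales and dbo.OrderLines, their columns, and the index names are hypothetical.

```sql
-- Analytical table stored entirely in columnar format.
CREATE TABLE dbo.FactSales
(
    SaleDate  DATE          NOT NULL,
    ProductId INT           NOT NULL,
    Quantity  INT           NOT NULL,
    Amount    DECIMAL(12,2) NOT NULL
);
GO
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales ON dbo.FactSales;
GO

-- Transactional rowstore table with an additional columnstore index
-- on the columns used by analytical queries.
CREATE TABLE dbo.OrderLines
(
    OrderLineId INT IDENTITY(1,1) PRIMARY KEY,
    ProductId   INT           NOT NULL,
    Quantity    INT           NOT NULL,
    Amount      DECIMAL(12,2) NOT NULL
);
GO
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_OrderLines
    ON dbo.OrderLines (ProductId, Quantity, Amount);
GO
```

The clustered variant replaces the row storage of the table, whereas the nonclustered variant keeps the rowstore for transactional access and adds a columnar copy of the listed columns.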

Further details of columnstore indexes can be found in Chapter 6, Columnstore Indexes.

Containers and SQL Server on Linux

For the longest time, SQL Server has run solely on the Windows operating system. This was a major roadblock for adoption in traditionally Unix/Linux-based companies, which used alternative relational database management systems (RDBMSs) instead. Containers have been around in IT for over a decade and have made a major impression in the application development world. The ability to now host SQL Server in a container allows developers to bring the development and deployment methodologies associated with containers into database development. A second major breakthrough (and surprise) around SQL Server 2017 was the announcement of SQL Server being ported to Linux. The IT world was shocked at this revelation and what it meant for the other RDBMSs on the market. There is practically no other system with the same feature set and support network available at the same price point. As such, SQL Server on Linux will open a new market and allow for growth in previously unreachable areas of the IT world.

This concludes the section outlining the engine features. Through Microsoft's heavy move into cloud computing and their Azure offerings, they have had increased need to improve their internal systems for themselves. Microsoft has been famous for its dogfooding approach of using their own software to run their own business and Azure is arguably their largest foray into this area. The main improvements in the database engine have been fueled by the need to improve their own ability to continue offering Azure database solutions at a scale and provide features to allow databases of differing sizes and loads to be hosted together.

Programming

Without programming, SQL Server isn't very useful. The programming landscape of SQL Server has continued to improve to adopt newer technologies over the years. SQL Server 2017 is no exception in this area. There have been some long-awaited general improvements and also some rather revolutionary additions to the product that change the way SQL Server can be used in future projects. This section will outline what programming improvements have been included in SQL Server 2017.

Transact-SQL enhancements

The last major improvements in the T-SQL language allowed for better processing of running totals and other similar window functions. This was already a boon and allowed developers to replace arcane cursors with high-performance T-SQL. These improvements are never enough for the most performance conscious developers among us, and as such there were still voices requesting further incorporation of the ANSI SQL standards into the T-SQL implementation.

Notable additions to the T-SQL syntax include the ability to finally split comma-separated strings using a single function call, STRING_SPLIT(), instead of the previous hacky implementations using loops or the Common Language Runtime (CLR).

The sensible counterpart to splitting strings is a function to aggregate values together, STRING_AGG(), which returns a set of values as a single delimited string (for example, comma-separated). This replaces similarly hacky solutions using the XML data type or one of a multitude of looping solutions.
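Both functions can be tried without any tables, as in the following sketch. The sample values are made up; note that STRING_SPLIT() requires database compatibility level 130 or higher and STRING_AGG() requires SQL Server 2017.

```sql
-- Split a comma-separated string into rows (one row per element).
-- The order of the returned rows is not guaranteed.
SELECT value
FROM STRING_SPLIT(N'red,green,blue', N',');

-- Aggregate rows back into a single comma-separated string.
-- WITHIN GROUP makes the order of the elements deterministic.
SELECT STRING_AGG(c.name, N',') WITHIN GROUP (ORDER BY c.name) AS colors
FROM (VALUES (N'red'), (N'green'), (N'blue')) AS c(name);
-- colors: blue,green,red
```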

Each improvement in the T-SQL language further extends the toolbox that we, as developers, possess to be able to manipulate data inside SQL Server. The ANSI SQL standards provide a solid basis to work from and further additions of these standards are always welcome.

JSON

It is quite common to meet developers outside of the Microsoft stack who look down on products from Redmond. Web developers, in particular, have been critical of the access to modern data exchange structures, or rather the lack of it. JSON has become the de facto data exchange method for the application development world. It is similar in structure to the previous cool-kid XML, but for reasons beyond the scope of this book, JSON has overtaken XML and is the expected payload for application and database communications.

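Microsoft has included native JSON support in SQL Server 2016. There is no dedicated JSON data type; JSON is stored as text (typically in NVARCHAR columns), and a set of built-in functions and clauses is provided to validate, query, modify, and produce it.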
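A brief sketch of the main JSON functions follows. The JSON document is made up for illustration and is held in an NVARCHAR variable; OPENJSON requires database compatibility level 130 or higher.

```sql
DECLARE @json NVARCHAR(MAX) =
    N'{"customer":{"id":42,"name":"Contoso"},"tags":["new","priority"]}';

-- Validate the text and extract a scalar value and a JSON fragment.
SELECT ISJSON(@json)                         AS IsValidJson,
       JSON_VALUE(@json, '$.customer.name')  AS CustomerName,
       JSON_QUERY(@json, '$.tags')           AS TagsFragment;

-- Convert a JSON array into rows.
SELECT [key], value
FROM OPENJSON(@json, '$.tags');

-- Return relational data as JSON text.
SELECT name, database_id
FROM sys.databases
FOR JSON PATH;
```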

Further details of JSON can be found in Chapter 3, JSON Support in SQL Server.

In-Memory OLTP

In-Memory OLTP (codename Hekaton) was introduced in SQL Server 2014. The promise of ultra-high-performance data processing inside SQL Server was a major feature when SQL Server 2014 was released. As expected with version-1 features, there was a wide range of limitations in the initial release, and this prevented many customers from adopting the technology. With SQL Server 2017, a great number of these limitations have been either raised to a higher threshold or completely removed. In-Memory OLTP has now gained the maturity and feature set needed to make it viable for production deployment.
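For orientation, a memory-optimized table is declared as in the sketch below. It assumes that the database already has a MEMORY_OPTIMIZED_DATA filegroup; the table, its columns, and the bucket count are hypothetical values chosen for illustration.

```sql
-- Assumes the database already contains a MEMORY_OPTIMIZED_DATA filegroup.
CREATE TABLE dbo.ShoppingCart
(
    CartId      INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED,
    -- Hash index for point lookups; the bucket count is an illustrative guess.
    UserId      INT       NOT NULL
        INDEX ix_ShoppingCart_UserId NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    CreatedDate DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
```

DURABILITY = SCHEMA_AND_DATA keeps the rows across restarts; SCHEMA_ONLY would keep only the table definition.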

SQL Server Tools

Accessing or managing data inside SQL Server and developing data solutions are two separate disciplines, each with their own specific focus on SQL Server. As such, Microsoft has created two different tools, each tailored towards the processes and facets of these disciplines.

SQL Server Management Studio (SSMS), as the name suggests, is the main management interface between DBAs/developers and SQL Server. The studio was originally released with SQL Server 2005 as a replacement and consolidation of the old Query Analyzer and Enterprise Manager tools. As with any non-revenue-generating software, SSMS only received minimal attention over the years, with limitations and missing tooling for many of the newer features in SQL Server. With SQL Server 2016, the focus of Microsoft has shifted and SSMS has been de-coupled from the release cycle of SQL Server itself. This decoupling allows both SSMS and SQL Server to be developed without having to wait for each other or for release windows. New releases of SSMS are created on top of more recent versions of Visual Studio and have seen almost monthly update releases since SQL Server 2016 was released into the market.

SQL Server Data Tools (SSDT) is also an application based on the Visual Studio framework. SSDT is focused on the application/data development discipline. SSDT is much more closely aligned with Visual Studio in its structure and the features offered. This focus includes the ability to create entire database projects and solution files, easier integration into source control systems, the ability to connect projects into automated build processes, and generally a developer-centric environment that will feel familiar to Visual Studio users. It is possible to design and create solutions in SSDT for SQL Server using the Relational Engine, Analysis Services, Integration Services, Reporting Services, and of course Azure SQL Database.

Further details of SQL Server Tools can be found in Chapter 2, SQL Server Tools.

This concludes the overview of programming enhancements inside SQL Server 2017. The improvements outlined are all solid evolutionary steps in their respective areas. New features are very welcome and allow us to achieve more while requiring less effort on our side. The In-Memory OLTP enhancements are especially positive, as they now expand on the groundwork laid down in the release of SQL Server 2014. Please read the respective chapters to gain deeper insight into how these enhancements can help you.

Business intelligence

Business intelligence is a huge area of IT and has been a cornerstone of the SQL Server product since at least SQL Server 2005. As the market and technologies in the business intelligence space improve, so must SQL Server. The advent of cloud-based data analysis systems as well as the recent buzz around big data are driving forces for all data platform providers, and Microsoft is no exception here. While there are multiple enhancements in the business intelligence portion of SQL Server 2016, we will be concentrating on the feature that has a wider audience than just data analysts: the R language in SQL Server.

Release cycles

Microsoft has made a few major public-facing changes in the past five years. These changes include a departure from longer release cycles in their main products and a transition towards subscription-based services (for example, Office 365 and Azure services). The ideas surrounding continuous delivery and agile software development have also shaped the way that Microsoft has been delivering on its flagship integrated development environment, Visual Studio, with releases occurring approximately every six months. This change in philosophy is now flowing into the development cycle of SQL Server. Due to the similarly constant release cycle of the cloud version of SQL Server (Azure SQL Database), there is a desire to keep both the cloud and on-premises versions of the product as close to each other as possible. As such, it is unsurprising to see that the previous release cycle of every three to five years is being replaced with much shorter intervals. A clear example of this is that SQL Server 2016 was released to the market in June 2016, with a Community Technology Preview (CTP) of SQL Server 2017 being released in November 2016 and the release to manufacturing (RTM) of SQL Server 2017 happening in October 2017. The wave of technology progress stops for no one. This is very clearly true in the case of SQL Server!
