Data Science with SQL Server Quick Start Guide

By: Dejan Sarka

Overview of this book

SQL Server only started to fully support data science with its two most recent editions. If you are a professional from both worlds, SQL Server and data science, and are interested in using SQL Server and Machine Learning (ML) Services for your projects, then this is the ideal book for you. It is an introduction to data science with Microsoft SQL Server and In-Database ML Services. It covers all stages of a data science project, from business and data understanding, through data overview, data preparation, modeling and using algorithms, model evaluation, and deployment. You will learn to use the engines and languages that come with SQL Server, including ML Services with the R and Python languages, and Transact-SQL. You will also learn how to choose which algorithm to use for which task, and how each algorithm works.

Advanced data preparation topics


In the last section of this chapter, I will discuss the following:

  • Using GROUPING SETS in T-SQL
  • Using the rx_data_step() function from the revoscalepy Python package
  • Introducing the dplyr package in R

Efficient grouping and aggregating in T-SQL

In Chapter 1, Writing Queries with T-SQL, I discussed the core T-SQL SELECT statement clauses, and showed how you can group and aggregate data. But SQL Server has more hidden gems. Maybe you need to create many different groupings and aggregates in one go. In T-SQL, the GROUPING SETS clause can help you with this.

You could create aggregates over multiple different grouping variables by using multiple SELECT statements, each with its own GROUP BY clause for a separate grouping, and then use the UNION operator to return all of the separate result sets as a single unioned result set. However, you can achieve the same result in a single query with the GROUPING SETS clause. You can define multiple sets of variables for grouping, and multiple...
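
The difference between the two approaches can be sketched with a small, self-contained example. The table variable @Customers, its columns, and its rows are hypothetical and exist only for illustration; they are not taken from the book's demo database. The first query combines one SELECT per grouping (UNION ALL is used here because the individual result sets do not overlap), and the second asks for the same groupings in a single GROUP BY with GROUPING SETS.

```sql
-- Hypothetical sample data; names and values are illustrative assumptions
DECLARE @Customers TABLE
(
  CustomerId    INT            NOT NULL PRIMARY KEY,
  Gender        CHAR(1)        NOT NULL,
  MaritalStatus CHAR(1)        NOT NULL,
  YearlyIncome  DECIMAL(10, 2) NOT NULL
);

INSERT INTO @Customers (CustomerId, Gender, MaritalStatus, YearlyIncome)
VALUES
  (1, 'F', 'M', 60000),
  (2, 'F', 'S', 40000),
  (3, 'M', 'M', 80000),
  (4, 'M', 'S', 30000),
  (5, 'F', 'M', 70000);

-- Approach 1: one SELECT per grouping, combined into a single result set
SELECT Gender,
       CAST(NULL AS CHAR(1)) AS MaritalStatus,
       COUNT(*) AS Cnt,
       AVG(YearlyIncome) AS AvgIncome
FROM @Customers
GROUP BY Gender
UNION ALL
SELECT NULL, MaritalStatus, COUNT(*), AVG(YearlyIncome)
FROM @Customers
GROUP BY MaritalStatus
UNION ALL
SELECT NULL, NULL, COUNT(*), AVG(YearlyIncome)  -- grand total
FROM @Customers;

-- Approach 2: the same three groupings in a single query
SELECT Gender,
       MaritalStatus,
       COUNT(*) AS Cnt,
       AVG(YearlyIncome) AS AvgIncome
FROM @Customers
GROUP BY GROUPING SETS
(
  (Gender),
  (MaritalStatus),
  ()              -- empty set: aggregate over all rows
);
```

In both result sets, a NULL in a grouping column marks a row that was aggregated without that column. Neither query guarantees row order, so add an ORDER BY clause if you need to compare the outputs side by side.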