Book Image

Data Science with SQL Server Quick Start Guide

By : Dejan Sarka
Book Image

Data Science with SQL Server Quick Start Guide

By: Dejan Sarka

Overview of this book

SQL Server only started to fully support data science with its two most recent editions. If you are a professional from both worlds, SQL Server and data science, and interested in using SQL Server and Machine Learning (ML) Services for your projects, then this is the ideal book for you. This book is the ideal introduction to data science with Microsoft SQL Server and In-Database ML Services. It covers all stages of a data science project, from businessand data understanding,through data overview, data preparation, modeling and using algorithms, model evaluation, and deployment. You will learn to use the engines and languages that come with SQL Server, including ML Services with R and Python languages and Transact-SQL. You will also learn how to choose which algorithm to use for which task, and learn the working of each algorithm.
Table of Contents (15 chapters)
Title Page
Copyright and Credits
Packt Upsell
Contributors
Preface
Index

Using frequency tables to understand discrete variables


For an overview of discrete variables, you use frequency tables and charts. A frequency table can show the following:

  • The values
  • The count of the values, or the absolute frequency
  • The proportion of the value, or the absolute percentage
  • The cumulative frequency
  • The cumulative percent
  • Plus, you can create a bar chart or a histogram of the values' absolute percentage

Minimally, you need to calculate the counts of the distinct values of the variable.

Let me start with calculating the frequencies with T-SQL. Window aggregate functions are very handy here. The following query is a very efficient one. If you look at the first part of the query, you'll notice that there is a common table expression (CTE) query that calculates the absolute numbers, or the counts. The cumulative valuesthe running totals—are calculated with the help of window aggregate functions.

The query calculates the frequencies of the CommuteDistance variable from the dbo.vTargetMail...