Data Science with SQL Server Quick Start Guide

By: Dejan Sarka

Overview of this book

SQL Server only started to fully support data science with its two most recent editions. If you are a professional from both worlds, SQL Server and data science, and are interested in using SQL Server and Machine Learning (ML) Services for your projects, then this is the ideal book for you. It is an introduction to data science with Microsoft SQL Server and In-Database ML Services. It covers all stages of a data science project, from business and data understanding, through data overview, data preparation, modeling and using algorithms, and model evaluation, to deployment. You will learn to use the engines and languages that come with SQL Server, including ML Services with the R and Python languages, and Transact-SQL. You will also learn how to choose which algorithm to use for which task, and how each algorithm works.

Measuring dependencies between discrete variables


You can pivot, or cross-tabulate, two discrete variables. You measure the counts, or absolute frequencies, of each combination of values of the two variables. You can then compare the actual frequencies with the expected frequencies in the table. So, what are the expected values? You start with the null hypothesis again: there is no association between the two variables you are examining. Under the null hypothesis, you would expect the distribution of one variable to be the same in each class of the other variable, and the same as the overall distribution in the dataset. For example, if half of the people in the dataset are married and half are single, you expect the same half-and-half split at each level of education. The tables that show the actual and the expected frequencies are called contingency tables.
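The comparison of actual and expected frequencies can be sketched in plain Python. This is a minimal illustration, not code from the book: the sample rows, the category labels, and the helper name `contingency` are all made up for the example. Under the null hypothesis, the expected count for a cell is the row total multiplied by the column total, divided by the number of cases.

```python
from collections import Counter

# Hypothetical sample of (marital_status, education) pairs, invented for illustration.
rows = (
    [("Married", "Bachelors")] * 30
    + [("Single", "Bachelors")] * 10
    + [("Married", "High School")] * 20
    + [("Single", "High School")] * 40
)


def contingency(pairs):
    """Return (observed, expected) dicts keyed by (first_value, second_value)."""
    pairs = list(pairs)
    n = len(pairs)
    counts = Counter(pairs)
    first_totals = Counter(first for first, _ in pairs)
    second_totals = Counter(second for _, second in pairs)
    # Expected count under independence: row total * column total / n.
    expected = {
        (first, second): first_totals[first] * second_totals[second] / n
        for first in first_totals
        for second in second_totals
    }
    # Fill in zero for combinations that never occur in the data.
    observed = {cell: counts.get(cell, 0) for cell in expected}
    return observed, expected


observed, expected = contingency(rows)
for cell in sorted(expected):
    print(cell, "observed:", observed[cell], "expected:", expected[cell])
```

In this invented sample, half of the people are married and half are single, so the expected counts split each education level evenly (20 and 20 for Bachelors, 30 and 30 for High School), while the observed counts do not.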

Contingency tables reveal dependencies only visually. The numerical measure of the association between two discrete variables is the chi-squared value. You calculate...
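As a minimal sketch, assuming the standard Pearson definition of the chi-squared statistic (the sum over all cells of the squared difference between observed and expected counts, divided by the expected count), the value can be computed from a table of counts as follows. The function name `chi_squared` and the example counts are hypothetical and chosen only for illustration.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a table of counts.

    `table` is a list of rows of observed counts. Every row and column is
    assumed to have a nonzero total, so each expected count is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under the null hypothesis of no association.
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    # Degrees of freedom: (rows - 1) * (columns - 1).
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical 2 x 2 table: rows are Married and Single,
# columns are Bachelors and High School.
stat, dof = chi_squared([[30, 20], [10, 40]])
print(f"chi-squared = {stat:.2f}, degrees of freedom = {dof}")
```

For this invented table the statistic is about 16.67 with 1 degree of freedom, well above the 3.84 critical value for a 0.05 significance level, so the null hypothesis of no association would be rejected.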