Data Science with SQL Server Quick Start Guide

By: Dejan Sarka

Overview of this book

SQL Server only started to fully support data science with its two most recent editions. If you are a professional from both worlds, SQL Server and data science, and are interested in using SQL Server and Machine Learning (ML) Services for your projects, then this book is for you. It is an introduction to data science with Microsoft SQL Server and In-Database ML Services. It covers all stages of a data science project, from business and data understanding, through data overview, data preparation, modeling and using algorithms, and model evaluation, to deployment. You will learn to use the engines and languages that come with SQL Server, including ML Services with the R and Python languages and Transact-SQL. You will also learn how to choose which algorithm to use for which task, and how each algorithm works.

Using the Naive Bayes algorithm


The Naive Bayes algorithm is fast, which makes it useful for the initial analysis of discrete variables. The algorithm calculates frequencies, or probabilities, for each possible state of every input variable in each state of the predictable variable. These probabilities are then used for predictions on new datasets with known input attributes. As mentioned, the algorithm supports discrete (or discretized) attributes only. Each input attribute is used separately from the other input attributes; in other words, the input attributes are treated as independent of one another given the state of the predictable variable. I will show an example in Python. Let's start with the necessary imports:

from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB
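
To make the frequency counting described at the start of this section concrete, here is a minimal pure-Python sketch of Naive Bayes for discrete attributes. It is written from scratch for illustration and is not the GaussianNB estimator imported above. The column names (Commute, Owner, Buyer), the toy rows, and the function names are all hypothetical, and the add-one (Laplace) smoothing is an assumption added here so that an unseen state does not produce a zero probability.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(rows, target):
    """Count target states and, per target state, the states of each input."""
    class_counts = Counter(row[target] for row in rows)
    # state_counts[(feature, target_state)][input_state] = number of rows
    state_counts = defaultdict(Counter)
    # feature_states[feature] = set of distinct states seen for that input
    feature_states = defaultdict(set)
    for row in rows:
        for feature, state in row.items():
            if feature == target:
                continue
            state_counts[(feature, row[target])][state] += 1
            feature_states[feature].add(state)
    return class_counts, dict(state_counts), dict(feature_states)


def predict_proba(model, new_row, alpha=1.0):
    """Return the normalized probability of each target state for new_row."""
    class_counts, state_counts, feature_states = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # prior probability of the target state
        for feature, state in new_row.items():
            counts = state_counts.get((feature, cls), Counter())
            n_states = len(feature_states[feature])
            # Each input contributes its own factor (the independence assumption).
            # alpha is the smoothing constant (assumption: add-one by default).
            score *= (counts[state] + alpha) / (n_cls + alpha * n_states)
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}


# Hypothetical toy data; not the SQL Server data used in this chapter.
rows = [
    {"Commute": "short", "Owner": "yes", "Buyer": 1},
    {"Commute": "short", "Owner": "no", "Buyer": 1},
    {"Commute": "long", "Owner": "yes", "Buyer": 0},
    {"Commute": "long", "Owner": "no", "Buyer": 0},
    {"Commute": "short", "Owner": "yes", "Buyer": 0},
    {"Commute": "short", "Owner": "no", "Buyer": 1},
]

model = fit_naive_bayes(rows, target="Buyer")
probs = predict_proba(model, {"Commute": "short", "Owner": "no"})
print(probs)
```

Because every input attribute contributes a separate factor, training is a single counting pass over the data, which is why the algorithm is fast enough for an initial analysis.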

The next step is to create the training and the test set from the SQL Server data I read earlier. In addition, as with other algorithms from the sklearn library, I need to prepare the feature matrix and the target vector:

Xtrain = TM.loc[TM.TrainTest == 1...