Data Science with SQL Server Quick Start Guide

By: Dejan Sarka

Overview of this book

SQL Server only started to fully support data science with its two most recent versions. If you are a professional from either world, SQL Server or data science, and are interested in using SQL Server and Machine Learning (ML) Services for your projects, then this book is for you. It is an introduction to data science with Microsoft SQL Server and in-database ML Services. It covers all stages of a data science project, from business and data understanding, through data overview, data preparation, modeling and using algorithms, to model evaluation and deployment. You will learn to use the engines and languages that come with SQL Server, including ML Services with the R and Python languages, and Transact-SQL. You will also learn which algorithm to use for which task, and how each algorithm works.

Preface

The book will give you a jump-start in data science with Microsoft SQL Server and in-database Machine Learning Services (ML Services). It covers all stages of a data science project, from business and data understanding through data overview, data preparation, and modeling, to using algorithms, model evaluation, and deployment. The book shows how to use the engines and languages that come with SQL Server, including ML Services with R and Python, and Transact-SQL (T-SQL). You will find useful code examples in all three languages. The book also shows which algorithms to use for which tasks, and briefly explains each algorithm.

Who this book is for

SQL Server only started to fully support data science with its two latest versions, 2016 and 2017, so it is not yet widely used for data science. However, there are professionals from the worlds of SQL Server and data science who are interested in using SQL Server and ML Services for their projects. This book is intended for SQL Server professionals who want to start with data science, and for data scientists who would like to start using SQL Server in their projects.

What this book covers

Chapter 1, Writing Queries with T-SQL, gives a brief overview of T-SQL queries. It introduces all of the important parts of the mighty SELECT statement and focuses on analytical queries.

Chapter 2, Introducing R, introduces the second language in this book, R. R has been supported in SQL Server since version 2016. In order to use it properly, you have to understand the language constructs and data structures.

Chapter 3, Getting Familiar with Python, gives an overview of the second most popular data science language, Python. As a more general language, Python is probably even more popular than R. Lately, Python has been catching up with R in the data science field. 

Chapter 4, Data Overview, deals with understanding data. You can use introductory statistics and basic graphs for this task. You will learn how to perform a data overview in all three languages used in this book.

Chapter 5, Data Preparation, teaches you how to work with the data that you get from your business systems and from data warehouses, which is typically not suited for direct use in a machine learning project. You need to add derived variables, deal with outliers and missing values, and more.

Chapter 6, Intermediate Statistics and Graphs, starts with the real analysis of the data. You can use intermediate-level statistical methods and graphs for the beginning of your advanced analytics journey.

Chapter 7, Unsupervised Machine Learning, explains the algorithms that do not use a target variable. It is like fishing in the mud - you try and see if some meaningful information can be extracted from your data. The most common undirected techniques are clustering, dimensionality reduction, and affinity grouping, also known as basket analysis or association rules.

Chapter 8, Supervised Machine Learning, deals with the algorithms that need a target variable. Some of the most important directed techniques include classification and estimation. Classification means examining a new case and assigning it to a predefined discrete class, for example, assigning keywords to articles and assigning customers to known segments. Next is estimation, where you try to estimate the value of a continuous variable of a new case. You can, for example, estimate the number of children or the family income. This chapter also shows you how you can evaluate your machine learning models and use them for predictions.

To get the most out of this book

In order to run the demo code associated with this book, you will need SQL Server 2017, SQL Server Management Studio, and Visual Studio 2017.

All of the information about the installation of the software needed to run the code is included in the first three chapters of the book.

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

  1. Log in or register at www.packtpub.com.
  2. Select the SUPPORT tab.
  3. Click on Code Downloads & Errata.
  4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR/7-Zip for Windows
  • Zipeg/iZip/UnRarX for Mac
  • 7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Data-Science-with-SQL-Server-Quick-Start-Guide. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/DataSciencewithSQLServerQuickStartGuide_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."

A block of code is set as follows:

# R version and contributors
R.version.string
contributors()

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

1 + 2
2 + 5 * 4
3 ^ 4
sqrt(81)
pi

Any command-line input or output is written as follows:

install.packages("RODBC")
library(RODBC)

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Note

Warnings or important notes appear like this.

Tip

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.