Hands-On Data Science with SQL Server 2017

By: Marek Chmel, Vladimír Mužný

Overview of this book

SQL Server is a relational database management system that lets you cover end-to-end data science processes using its built-in services and features. Hands-On Data Science with SQL Server 2017 starts with an overview of data science with SQL so that you understand the core tasks in data science. You will learn intermediate-to-advanced concepts for performing analytical tasks on data using SQL Server. The book covers best practices, tasks, and challenges that test your abilities at the end of each chapter. You will explore key tasks such as data collection, cleaning, manipulation, aggregation, and filtering. As you make your way through the chapters, you will turn raw data into actionable insights by wrangling and extracting data from databases using T-SQL. You will get to grips with preparing and presenting data in a meaningful way, using Power BI to reveal hidden patterns. In the concluding chapters, you will work with SQL Server Integration Services to transform data into a useful format and delve into advanced, real-world examples covering machine learning concepts such as predictive analytics. By the end of this book, you will be in a position to handle growing amounts of data and perform the everyday activities of a data science professional.

Using the PREDICT keyword

SQL Server 2017 introduces the new PREDICT function. This function makes prediction computations much simpler than those written in R or Python, which we looked at in the preceding section. However, the PREDICT function doesn't work with every model; a model trained with an arbitrary R or Python library is not necessarily supported.
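As a rough illustration of what a PREDICT call can look like, here is a minimal T-SQL sketch. The table names dbo.Models and dbo.NewHouses, the model name HousePrice, and the columns Area and Price_Pred are all hypothetical and do not come from the book. The sketch also assumes the stored model was serialized for native scoring (with rxSerializeModel in R), and that the output column follows RevoScaleR's usual <response>_Pred naming; the column listed in the WITH clause has to match whatever the model actually emits.

```sql
-- Load a previously stored, serialized model into a variable.
-- dbo.Models and N'HousePrice' are hypothetical names used for illustration.
DECLARE @model varbinary(max) =
    (SELECT Model FROM dbo.Models WHERE ModelName = N'HousePrice');

-- Score new rows directly in T-SQL; no R or Python runtime is called here.
-- dbo.NewHouses must contain the input columns the model was trained on.
SELECT d.Area, p.Price_Pred
FROM PREDICT(MODEL = @model, DATA = dbo.NewHouses AS d)
WITH (Price_Pred float) AS p;   -- assumed output column name
```

Because PREDICT appears in the FROM clause like any other table source, its result can be joined, filtered, or inserted into another table in the same statement.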

When SQL Server started providing machine learning services, new libraries called RevoScaleR for R and revoscalepy for Python were introduced. These libraries contain their own implementations of several predictive algorithms and also offer the ability to process data in parallel.
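The following sketch shows one way a RevoScaleR model might be trained and stored from T-SQL. It assumes Machine Learning Services with R is installed and external scripts are enabled; the tables dbo.Houses and dbo.Models and the columns Price and Area are hypothetical names chosen for illustration. rxLinMod and rxSerializeModel are RevoScaleR functions; serializing with realtimeScoringOnly = TRUE produces a compact binary form intended for scoring only.

```sql
-- Hypothetical table that keeps serialized models.
CREATE TABLE dbo.Models
(
    ModelName nvarchar(100)  NOT NULL PRIMARY KEY,
    Model     varbinary(max) NOT NULL
);
GO

DECLARE @model varbinary(max);

-- Train a linear model with RevoScaleR inside the R runtime
-- and hand the serialized result back as an output parameter.
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'
        # InputDataSet is the default name of the data frame built from @input_data_1
        fit <- rxLinMod(Price ~ Area, data = InputDataSet)
        # Serialize the fitted model into a raw vector for storage in SQL Server
        model <- rxSerializeModel(fit, realtimeScoringOnly = TRUE)
    ',
    @input_data_1 = N'SELECT Price, Area FROM dbo.Houses',  -- hypothetical training data
    @params = N'@model varbinary(max) OUTPUT',
    @model = @model OUTPUT;

-- Persist the model so that it can be loaded later for scoring.
INSERT INTO dbo.Models (ModelName, Model)
VALUES (N'HousePrice', @model);
```

Storing the model as varbinary(max) keeps it next to the data it scores, so it can be versioned and secured like any other row.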

Using one of these libraries is a prerequisite for the PREDICT function. The following conditions must also be met:

  • We should use one of the following algorithms:
    • rxLinMod
    • rxLogit
    • rxBTrees
    • rxDTree
    • rxForest
    • rxFastTrees
    • rxFastForest...