Hands-On Data Science with SQL Server 2017

By: Marek Chmel, Vladimír Mužný

Overview of this book

SQL Server is a relational database management system that enables you to cover end-to-end data science processes using various built-in services and features. Hands-On Data Science with SQL Server 2017 starts with an overview of data science with SQL to help you understand its core tasks. You will learn intermediate-to-advanced concepts for performing analytical tasks on data using SQL Server. The book takes a unique approach, covering best practices and tasks, and ending each chapter with challenges that test your abilities. You will explore the ins and outs of key tasks such as data collection, cleaning, manipulation, aggregation, and filtering. As you make your way through the chapters, you will turn raw data into actionable insights by wrangling and extracting data from databases using T-SQL. You will get to grips with preparing and presenting data in a meaningful way, using Power BI to reveal hidden patterns. In the concluding chapters, you will work with SQL Server Integration Services to transform data into a useful format and delve into advanced, real-world examples covering machine learning concepts such as predictive analytics. By the end of this book, you will be able to handle growing amounts of data and carry out the everyday activities of a data science professional.

Preface

Who this book is for

What this book covers

To get the most out of this book

Get in touch

Data Science Overview

Introducing data science

Data science domains

Summary

SQL Server 2017 as a Data Science Platform

Technical requirements

SQL Server evolution

SQL Server Services and their use with data science

Summary

Data Sources for Analytics

Technical requirements

Getting data from databases

Importing flat files

Working with XML data

Working with JSON

External data with PolyBase

Summary

Data Transforming and Cleaning with T-SQL

Technical requirements

The need for data transformation

Database architectures for data transformations

Transforming data

Denormalizing data

Using views and stored procedures

Performance considerations

Summary

Questions

Data Exploration and Statistics with T-SQL

Technical requirements

T-SQL aggregate queries

Ranking, framing, and windowing

Running aggregates

Summary

Questions

Custom Aggregations on SQL Server

Technical requirements

Overview of SQLCLR

Creating CLR aggregations

Limitations and performance considerations

Summary

Questions

Data Visualization

Technical requirements

Data visualization – preparation phase

Power BI Report Server

SQL Server Reporting Services

Summary

Data Transformations with Other Tools

Technical requirements

Categorization, missing values, and normalization

Using Integration Services for data transformation

Using R for data transformation

Using Data Factory for data transformation

Summary

Questions

Predictive Model Training and Evaluation

Technical requirements

Preparing SQL Server

Creating data structures

Deploying, training, and evaluating a predictive model

Summary

Questions

Making Predictions

Technical requirements

Reading models from a database

Submitting values to an external script

Using the PREDICT keyword

Making the predictive model self-training

Summary

Questions

Getting It All Together - A Real-World Example

Technical requirements

Assignment and preparation

Data exploration

Data transformation

Training and using predictive models for estimations

Summary

Questions

Next Steps with Data Science and SQL

Data science next steps

Next steps with SQL Server

Data science in the cloud

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

What this book covers

Chapter 1, Data Science Overview, covers what the term data science means, why data science is needed, how it differs from traditional business intelligence and data warehousing (BI/DWH), and the competencies and knowledge required to be a data scientist.

Chapter 2, SQL Server 2017 as a Data Science Platform, explains the architecture of SQL Server from a data science perspective: In-Memory OLTP for data acquisition; Integration Services as a transformation feature set; Reporting Services for visualizing input as well as output data; and, probably most importantly of all, T-SQL as a language for data exploration and transformation, and Machine Learning Services for building the models themselves.

Chapter 3, Data Sources for Analytics, covers relational databases and NoSQL concepts side by side as valuable data sources, each requiring a different approach. It also provides an overview of technologies such as HDInsight, Apache Hadoop, and Cosmos DB, and of how to query such data sources.

Chapter 4, Data Transforming and Cleaning with T-SQL, demonstrates T-SQL techniques for making data consumable and complete for further use in data science, along with database architectures that are useful for transformation and cleansing tasks.

Chapter 5, Data Exploration and Statistics with T-SQL, takes a deep dive into T-SQL capabilities, including common grouping and aggregations, framing/windowing, running aggregates, and, where needed, features such as custom CLR aggregates (with performance considerations).

Chapter 6, Custom Aggregations on SQL Server, explains how to create your own aggregations in order to enhance core T-SQL functionality.

Chapter 7, Data Visualization, explains the importance of visualizing data to reveal hidden patterns, along with examples of Reporting Services, Power View, and Power BI. As an alternative, an overview of R/Python visualization features is also provided (as these languages play a vital role later in the book).

Chapter 8, Data Transformations with Other Tools, explains how to use Integration Services and, optionally, R or Python to transform data into a useful format: replacing missing values, detecting mistakes in datasets, normalization and its purpose, categorization, and finally data denormalization using views for better analysis.

Chapter 9, Predictive Model Training and Evaluation, covers a wide set of predictive models (clustering, Naive Bayes, recommenders) and their implementation via Machine Learning Studio, R, or Python.

Chapter 10, Making Predictions, explains how to use the models created, evaluated, and scored in the previous chapters. It also shows how to make a model self-training, so that it learns from the predictions it makes.

Chapter 11, Getting It All Together - A Real-World Example, demonstrates how to combine the features covered in the book to acquire, transform, and analyze data in a complete, successful data science case.

Chapter 12, Next Steps with Data Science and SQL, summarizes the main points of all the preceding chapters and draws conclusions. The chapter also offers ideas on how to continue working with data science, which trends are likely to emerge, and which other technologies will play a strong role in data science.