SQL Server 2017 Machine Learning Services with R

By: Julie Koesmarno, Tomaž Kaštrun

Overview of this book

R Services was one of the most anticipated features in SQL Server 2016; it was improved significantly and rebranded as Machine Learning Services in SQL Server 2017. Prior to SQL Server 2016, many developers and data scientists were already using R to connect to SQL Server for additional data analysis that superseded SSAS Data Mining or additional CLR programming functions, but they did so in siloed environments that left a lot to be desired. With R integrated into SQL Server 2017, these developers and data scientists can now benefit from an effective, efficient, and more streamlined analytics environment. This book gives you foundational knowledge and insights to help you understand SQL Server 2017 Machine Learning Services with R. First and foremost, the book provides practical examples of how to implement, use, and understand SQL Server and R integration in corporate environments, along with explanations and underlying motivations. It covers installing Machine Learning Services; maintaining, deploying, and managing code; and monitoring your services. Delving more deeply into predictive modeling and the RevoScaleR package, this book also provides insights into operationalizing code and exploring and visualizing data. To complete the journey, this book covers the new features in SQL Server 2017 and how they work together with R, amplifying their combined power.
Table of Contents (12 chapters)

What this book covers

Chapter 1, Introduction to R and SQL Server, begins our data science journey in SQL Server as it stood before SQL Server 2016 and brings us to today's SQL Server R integration.

Chapter 2, Overview of Microsoft Machine Learning Server and SQL Server, gives a brief overview of Microsoft Machine Learning Server with an emphasis on SQL Server Machine Learning Services, while exploring how it works and the different versions of the R environment. This includes key discussions on the architecture behind it, the different computational environments, how the integration among systems works, and how to achieve parallelism and load distribution.

Chapter 3, Managing Machine Learning Services for SQL Server 2017 and R, covers installation and setup, including how to use PowerShell. It also covers exploring the capabilities of Resource Governor, setting up roles and security for users to work with SQL Server Machine Learning Services with R, working with sessions and logs, installing any missing or additional R packages for data analysis or predictive modeling, and taking the first steps with the sp_execute_external_script external procedure.

Chapter 4, Data Exploration and Data Visualization, explores the R syntax for data browsing, analysis, munging, and wrangling for visualization and predictive analysis. Developing these techniques is essential to the next steps covered in this chapter and throughout this book. This chapter introduces various useful R packages for visualization and predictive modeling. In addition, readers will learn how to integrate R with Power BI, SQL Server Reporting Services (SSRS), and mobile reports.

Chapter 5, RevoScaleR Package, discusses the advantages of using RevoScaleR for scalable and distributed statistical computation over large datasets. Using RevoScaleR improves CPU and RAM utilization as well as overall performance. This chapter introduces readers to RevoScaleR functions for data preparation, descriptive statistics, statistical tests, and sampling, as well as predictive modeling.

Chapter 6, Predictive Modeling, focuses on helping readers who are stepping into the world of predictive modeling for the first time. Using SQL Server and SQL Server Machine Learning Services with R, readers will learn how to create predictions, perform data modeling, explore advanced predictive algorithms available in RevoScaleR and other packages, and easily deploy the models and solutions. Finally, calling and running predictions and exposing the results to different proprietary tools (such as Power BI, Excel, and SSRS) completes the tour of predictive modeling.

Chapter 7, Operationalizing R Code, provides tips and tricks for operationalizing R code and R predictions. Readers will learn why this matters: stable and reliable process flows are essential to combining R code, persistent data, and prediction models in production. In this chapter, readers will have a chance to explore ways to adopt existing R code and create new R code, followed by integrating it in SQL Server through various readily available client tools such as SQL Server Management Studio (SSMS) and Visual Studio. Furthermore, this chapter covers how readers can use SQL Server Agent jobs, stored procedures, CLR with .NET, and PowerShell to productionize R code.

Chapter 8, Deploying, Managing, and Monitoring Database Solutions containing R Code, covers how to manage deployment and change control for database deployments that integrate R code. This chapter provides guidelines on how to do an integrated deployment of the solution and how to implement continuous integration, including automated deployment and version control. Here, readers will learn efficient ways to monitor the solution, the effectiveness of the code, and the predictive models after the solution is deployed.

Chapter 9, Machine Learning Services with R for DBAs, examines monitoring, performance, and troubleshooting for the daily, weekly, and monthly tasks that DBAs perform. Using simple examples that show R Services can also be useful for other roles involved in SQL Server, this chapter shows how R Services integrated in SQL Server empowers DBAs to evolve their rudimentary monitoring activities into more useful, actionable predictions.

Chapter 10, R and SQL Server 2016/2017 Features Extended, covers how new features of SQL Server 2016 and 2017 can be used together with R Services, such as taking advantage of the new JSON support with the R language, using improvements to the In-Memory OLTP technology to deliver almost real-time analytics, and combining new columnstore index features with R, as well as how to get the most out of them. It also considers how to leverage PolyBase and Stretch DB to reach beyond on-premises to hybrid and cloud possibilities. Lastly, Query Store holds many statistics from execution plans, and R is a perfect tool for analyzing them more deeply.