SQL Server 2017 Machine Learning Services with R.

By: Julie Koesmarno, Tomaž Kaštrun

Overview of this book

R Services was one of the most anticipated features in SQL Server 2016; it was significantly improved and rebranded as Machine Learning Services in SQL Server 2017. Prior to SQL Server 2016, many developers and data scientists were already using R to connect to SQL Server for additional data analysis, superseding SSAS Data Mining or additional CLR programming functions, but they did so in siloed environments that left a lot to be desired. With R integrated within SQL Server 2017, these developers and data scientists can now benefit from an integrated, efficient, and more streamlined analytics environment. This book gives you foundational knowledge and insights to help you understand SQL Server 2017 Machine Learning Services with R. First and foremost, the book provides practical examples of how to implement, use, and understand SQL Server and R integration in corporate environments, along with explanations and underlying motivations. It covers installing Machine Learning Services; maintaining, deploying, and managing code; and monitoring your services. Delving more deeply into predictive modeling and the RevoScaleR package, this book also provides insights into operationalizing code and exploring and visualizing data. To complete the journey, this book covers the new features in SQL Server 2017 and how they are compatible with R, amplifying their combined power.

Dataset subsetting

Subsetting the data is also relatively straightforward using the rxDataStep() function:

EXEC sp_execute_external_script
      @language = N'R'
      ,@script = N'
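                  library(RevoScaleR)
                  # InputDataSet holds the rows returned by the @input_data_1 query
                  df_sql <- InputDataSet
                  # Keep all columns (varsToKeep = NULL) and only rows with BusinessEntityID <= 1000
                  df_sql_subset <- rxDataStep(inData = df_sql, varsToKeep = NULL, rowSelection = (BusinessEntityID <= 1000))
                  OutputDataSet <- df_sql_subset'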
      ,@input_data_1 = N'
                  SELECT 
                   BusinessEntityID
                  ,[Name]
                  ,SalesPersonID
                  FROM [Sales].[Store]'
WITH RESULT SETS
      ((
       BusinessEntityID INT
      ,[Name] NVARCHAR(MAX)
      ,SalesPersonID INT
      ));
  

Keep in mind that subsetting operations using R code might bring unnecessary memory and I/O costs, especially...
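
One way to avoid that overhead, when the filter can be expressed in T-SQL, is to move it into the @input_data_1 query so that only the matching rows are sent to the R runtime. The following is a minimal sketch under that assumption, reusing the same table and columns as the example above. The pass-through R script is illustrative only and stands in for whatever R processing would follow.

```sql
-- Sketch: filter in T-SQL so that only the needed rows reach the R runtime.
EXEC sp_execute_external_script
      @language = N'R'
      ,@script = N'
                  # Rows arrive already filtered, so R simply passes them through
                  OutputDataSet <- InputDataSet'
      ,@input_data_1 = N'
                  SELECT
                   BusinessEntityID
                  ,[Name]
                  ,SalesPersonID
                  FROM [Sales].[Store]
                  WHERE BusinessEntityID <= 1000'
WITH RESULT SETS
      ((
       BusinessEntityID INT
      ,[Name] NVARCHAR(MAX)
      ,SalesPersonID INT
      ));
```

This should return the same rows as the rxDataStep() version, but the WHERE clause is evaluated by the database engine before any data is handed to R.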