SQL Server 2022 Machine Learning Services Unlock the Value of Your Data
SQL Server 2017 and above include Machine Learning Services that give you the ability to run Python and R scripts on SQL Server data for predictive analytics and machine learning. AI/ML is being adopted at a frenetic pace across almost every industry. Financial services have embraced algorithmic trading and real-time fraud detection. Retail relies on AI/ML to improve customer experience and merge online and in-store experiences. Agriculture leverages AL/ML to track plantings and harvests and to decide when and where to apply pesticide. Automobile manufacturers collect data from cars to predict when parts will fail or when they require service.
What better way to tap into the power of AI/ML than with the ubiquitous SQL Server?
Machine Learning Services executes Python and R scripts in-database, saving you the trouble of exporting data and sending it over the network for analysis. You can do everything that a data engineer and an ML engineer need to do:
- Prepare and clean data
- Feature engineering
- Model training
- Model evaluation
- Model deployment within a database.
In essence, you run Python and R scripts on SQL Server within a stored procedure (sp_execute_external_script) that invokes the Launchpad service to the respective library for execution. Machine Learning Services relies on an extensibility framework to run Python and R scripts. Base distributions of R and Python are included, and you can install and use open-source frameworks and libraries such as TensorFlow, scikit-learn and the popular PyTorch.
In this tutorial, you will learn how to install SQL Server Machine Learning Services on Windows, configure development tools and write and run a Python or R script. All of the SQL and Python code can be found here.
Installing SQL Server 2022 Machine Learning Services
Complete the pre-installation checklist.
If you have SQL Server 2022 deployed, then feel free to use that.
If not, please install one of the following, making sure to install Polybase as well:
Run the SQL Server 2022 Setup wizard. Depending on your environment you will either install SQL Server and add ML features or simply add ML features to an existing deployment.
On the Installation tab, select New SQL Server stand-alone installation or add features to an existing installation.
On the next page, Feature Selection, select the following to install:
- Database Engine Services
- Machine Learning Services
The screenshot below shows the minimum features to install:
You will also need a way to interact with SQL Server:
Make sure that SQL Server is running and connect to it using either of the tools mentioned above.
Install Runtimes and Packages for R
Download and install the most recent version of R 4.2 for Windows.
Install dependencies. From RStudio or RStudio terminal, run the following commands
# R Terminal
install.packages("iterators")
install.packages("foreach")
install.packages("R6")
install.packages("jsonlite")
Download and install CompatibilityAPI and RevoScaleR packages:
install.packages("https://aka.ms/sqlml/r4.2/windows/CompatibilityAPI_1.1.0.zip", repos=NULL)
install.packages("https://aka.ms/sqlml/r4.2/windows/RevoScaleR_10.0.1.zip", repos=NULL)
Configure the installed R runtime with SQL Server. Use the RegisterRext.exe command line utility configure R. You will find it in the RevoScaleR directory in your R library (in my case this was C:\Users\msarr\AppData\Local\R\win-library\4.3\RevoScaleR\rxLibs\x64). Use administrator privileges to open a command line from this location and run the following command, but first make sure to use the Rhome path, username, password and SQL Server Instance name for your environment:
.\RegisterRext.exe /configure /user:SA /password:0JK23mdfidf //rhome:"%ProgramFiles%\R\R-4.3.2" /instance:"MSSQLSERVER"
...
Stopping service MSSQLLaunchpad...
Copied RLauncher.dll from C:\Users\msarr\AppData\Local\R\win-library\4.3\RevoScaleR\rxLibs\x64\ to C:\Program Files\Microsoft SQL Server\MSSQL16.MSSQLSERVER\MSSQL\Binn.
Starting service MSSQLLaunchpad...
R extensibility installed successfully.
Use SSMS to connect to the SQL Server instance where you installed Machine Learning Services previously. Open a New Query window and run the following to enable external scripting
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE
Note that you'll only have to enable external scripts once, regardless of the number of languages you want to support.
Verify the command executed successfully when you see 1 returned below for config_value and run_value
EXEC sp_configure 'external scripts enabled';
Restart the SQL Server instance to apply changes. In SSMS, right-click on the server instance and select Restart.
Verify the installation with a simple T-SQL command that returns the R version
EXEC sp_execute_external_script @script=N'print(R.version)',@language=N'R';
GO
Unfortunately, in my case this returned an error that took a bit of effort to overcome. I have to admit that I wasn't surprised given my previous experience with R's quirkiness.
Msg 39021, Level 16, State 13, Line 0
Unable to launch runtime for 'R' script for request id: A06D71E3-B436-4834-B8C3-15153E906CAC. Please check the configuration of the 'R' runtime. See 'https://docs.microsoft.com/en-us/sql/machine-learning/install/sql-machine-learning-services-windows-install-sql-2022?view=sql-server-ver16' for setup instructions.
Msg 39019, Level 16, State 2, Line 0
An external script error occurred:
Unable to launch the runtime. ErrorCode 0x80004005: .
A search for these errors brought me to Troubleshoot Launchpad for Python and R scripts - SQL Server Machine Learning Services. Skimming the article informed me that there was likely a configuration error, perhaps a permissions issue either for the SQL Server Launchpad service or the R working directory or both.
If you encounter this error, the first step is to determine whether or not Launchpad is running. Go to the Windows menu and run SQLServerManager16.msc. Make sure that Launchpad is running; if it isn't then restart it. Note the name of the service account that Launchpad is running under.
If Launchpad isn't running, check the system event log. Open Event Viewer, toggle open Windows Logs and click on System. Check the system event log for details about why the service stopped.
I was able to glean the most troubleshooting information by checking the rlauncher.log under C:\Program Files\Microsoft SQL Server\MSSQL16.MSSQLSERVER\MSSQL\Log\ExtensibilityLog. Scroll down to the bottom and look for errors. In my case, I can see that rterm.exe fails to start and I got the dreaded "Session start failed with: F", which has a workaround posted at Unable to communicate with the runtime for 'R' script in SQL Server and Problem getting SQL Server 2016 R Services (In-Database) working.
After I edited the working directory in the rlauncher.config file
, located in the C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Binn directory, I restarted SQL Server and Launchpad and ran external R scripts successfully. I'm still working to forgive myself for forgetting that R requires old 8.3 format file names.
Install Runtimes and Packages for Python
Download Python 3.10 for Windows and install it with the following:
- You may need to run Python Setup as administrator to install for all users
- Check Install launcher for all users (recommended)
- From Python Setup, choose Customize installation
- Select the optional features you want to install and click Next
- On the Advanced Options page, enable Install for all users, keep the other default settings and then click the Install button.
Open a command prompt as administrator. Download and install the most recent version of the revoscalepy package and its dependencies.
cd "C:\Program Files\Python310\"
python -m pip install -t "C:\Program Files\Python310\Lib\site-packages" dill numpy==1.22.0 pandas patsy python-dateutil
python -m pip install -t "C:\Program Files\Python310\Lib\site-packages" https://aka.ms/sqlml/python3.10/windows/revoscalepy-10.0.1-py3-none-any.whl
Use the following icacls commands to grant READ and EXECUTE access to SQL Server Launchpad Service and SID S-1-15-2-1 (All_Application_Packages).
icacls "C:\Program Files\Python310\Lib\site-packages" /grant "NT Service\MSSQLLAUNCHPAD":(OI)(CI)RX /T
icacls "C:\Program Files\Python310\Lib\site-packages" /grant *S-1-15-2-1:(OI)(CI)RX /T
Configure Python Runtime with SQL Server
Use the RegisterRext.exe
command-line utility to configure the installed Python runtime in SQL Server 2022. Change the instance name if necessary.
cd "C:\Program Files\Python310\Lib\site-packages\revoscalepy\rxLibs"
.\RegisterRext.exe /configure /pythonhome:"C:\Program Files\Python310" /instance:"MSSQLSERVER"
Return to SSMS and connect to the SQL Server instance where you just installed SQL Server Machine Learning Services. If you have not yet enable external scripts, click New Query and run the following
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE
To verify, confirm that the following query returns 1 for config_value
and run_value
EXEC sp_configure 'external scripts enabled';
Restart the SQL Server service. This will also restart the SQL Server Launchpad service. You can do this by right-clicking on the SQL Server instance in the SSMS object explorer and selecting Restart, or by using the SQL Server Configuration Manager.
Verify that SQL Server is configured correctly for Python external scripts with a command that requests the Python version
EXEC sp_execute_external_script @script=N'import sys;print(sys.version)',@language=N'Python'
GO
This should return the following
STDOUT message(s) from external script:
3.10.0 (tags/v3.10.0:b494f59, Oct 4 2021, 19:00:18) [MSC v.1929 64 bit (AMD64)]
Completion time: 2024-01-26T14:01:18.4833562-08:00
Python Operations in SQL Server
Let's take a tour through some of the things you can do with Python and SQL Server. You can find more simple examples at Quickstart: Run Python scripts - SQL machine learning.
You can import additional Python libraries using SSMS. Below, we import Numpy and raise 2 and 5 to the third power.
EXEC sp_execute_external_script
@language =N'Python',
@script = N'
import numpy as np
cube2 = np.power(2,3)
cube5 = np.power(5,3)
print(cube2)
print(cube5)
'
GO
The output will be the cube of 2 and the cube of 5.
One of the most common uses of Python in SQL Server is to conduct basic arithmetic. The example below shows addition, subtraction, multiplication and modulus.
EXEC sp_execute_external_script
@language =N'Python',
@script = N'
number1 = 50
number2 = 10
result = number1 + number2
print(result)
result = number1 - number2
print(result)
result = number1 / number2
print(result)
result = number1 * number2
print(result)
result = number1 % number2
print(result)
'
GO
Output:
STDOUT message(s) from external script:
60
40
5.0
500
0
Completion time: 2024-01-30T09:22:28.1117766-08:00
Assignment operations assign the value of a constant or a variable to another variable. SQL Server supports all the Python assignment operators.
EXEC sp_configure 'external scripts enabled', 1
RECONFIGURE WITH OVERRIDE
EXEC sp_execute_external_script
@language =N'Python',
@script = N'
number1 = 50
number2 = 10
number1 += number2
print(number1)
number1 = 50
number2 = 10
number1 -= number2
print(number1)
number1 = 50
number2 = 10
number1 *= number2
print(number1)
number1 = 50
number2 = 10
number1 /= number2
print(number1)
number1 = 50
number2 = 10
number1 %= number2
print(number1)
'
GO
Output:
STDOUT message(s) from external script:
60
40
500
5.0
0
Completion time: 2024-01-30T09:28:26.1454821-08:00
You can call built-in and custom Python functions from SQL Server. This returns the cube of 5 and the cube of 2.
EXEC sp_execute_external_script
@language =N'Python',
@script = N'
def printCube (num):
print(num ** 3)
printCube(5)
printCube(2)
'
GO
SQL Server supports comparison operations in Python.
EXEC sp_execute_external_script
@language =N'Python',
@script = N'
number1 = 50
number2 = 10
print(number1 == number2)
print(number1 != number2)
print(number1 > number2)
print(number1 < number2)
print(number1 >= number2)
print(number1 <= number2)
'
GO
Output:
STDOUT message(s) from external script:
False
True
True
False
True
False
Completion time: 2024-01-30T09:38:10.6732599-08:00
SQL Server supports the logical Python operators AND, OR, NOT.
EXEC sp_execute_external_script
@language =N'Python',
@script = N'
print(True and True)
print(True or False)
print(not(True))
'
GO
Output:
STDOUT message(s) from external script:
True
True
False
Completion time: 2024-01-30T09:40:18.7572690-08:00
Please see Python in SQL Server: The Basics for additional examples.
Python SQL Machine Learning Services Tutorial
With Python external scripts running successfully, let's walk through the process of preparing data in SQL Server and then using Python to run a linear regression model. This tutorial is based on Python tutorial: Predict ski rental with linear regression with SQL machine learning.
We're going to:
- Download and restore the sample database
- Load data from the database into a Python data frame
- Prepare the data in Python
- Train a linear regression model in Python
- Create a stored procedure that generates the ML model
- Store the model in a database table
- Create a stored procedure that makes predictions using the model
- Execute the model on new data
If you've followed the first portion of this blog post, then you have most of the prerequisites installed.
You can find the code for this tutorial in Blog Assets - SQL ML Code.
Getting Started
If not yet installed, please install these additional Python packages:
- Pandas
- Pyodbc
- Scikit-learn
Download and restore TutorialDB.bak. If you need instructions, please see Back up and restore a database - Azure Data Studio | Microsoft Learn.
Verify that the restored database is present
USE TutorialDB;
SELECT * FROM [dbo].[rental_data];
Explore and Prepare Data
In order to use the data in Python, you must first load it from the database into a pandas data frame. Then you can prepare the data for analysis.
Create a new Python notebook in your editor of choice. I'm using VS Code. Copy and paste the script below into your notebook. Before you run it, update the connection string with your server, username and password. To use Windows authentication, specify Trusted_Connection=Yes
; instead of the UID and PWD parameters in the connection string.
import pyodbc
import pandas
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Connection string to your SQL Server instance
conn_str = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server}; SERVER=<server>; DATABASE=TutorialDB;UID=<username>;PWD=<password>')
query_str = 'SELECT Year, Month, Day, Rentalcount, Weekday, Holiday, Snow FROM dbo.rental_data'
df = pandas.read_sql(sql=query_str, con=conn_str)
print("Data frame:", df)
Results will be similar to the following.
Data frame: Year Month Day Rentalcount WeekDay Holiday Snow
0 2014 1 20 445 2 1 0
1 2014 2 13 40 5 0 0
2 2013 3 10 456 1 0 0
3 2014 3 31 38 2 0 0
4 2014 4 24 23 5 0 0
.. ... ... ... ... ... ... ...
448 2013 2 19 57 3 0 1
449 2015 3 18 26 4 0 0
450 2015 3 24 29 3 0 1
451 2014 3 26 50 4 0 1
452 2015 12 6 377 1 0 1
[453 rows x 7 columns]
Like all good data engineers, you're going to drop the columns that you don't intend to analyze. You must drop RentalCount because it will be used as the prediction target.
columns = df.columns.tolist()
columns = [c for c in columns if c not in ["Year", "Rentalcount"]]
print("Training set:", df[columns])
The stripped down dataset resembles
Training set: Month Day Weekday Holiday Snow
0 1 20 2 1 0
1 2 13 5 0 0
2 3 10 1 0 0
3 3 31 2 0 0
4 4 24 5 0 0
.. ... ... ... ... ...
448 2 19 3 0 1
449 3 18 4 0 0
450 3 24 3 0 1
451 3 26 4 0 1
452 12 6 1 0 1
[453 rows x 5 columns]
Train a Linear Regression Model
Training is the process of finding a model that best describes the dependency between the variables in the dataset. You're about to train model lin_model using a linear regression algorithm.
# Store the variable we'll be predicting on.
target = "Rentalcount"
# Generate the training set. Set random_state to be able to replicate results.
train = df.sample(frac=0.8, random_state=1)
# Select anything not in the training set and put it in the testing set.
test = df.loc[~df.index.isin(train.index)]
# Print the shapes of both sets.
print("Training set shape:", train.shape)
print("Testing set shape:", test.shape)
# Initialize the model class.
lin_model = LinearRegression()
# Fit the model to the training data.
lin_model.fit(train[columns], train[target])
After running the above, you'll see results like
Training set shape: (362, 7)
Testing set shape: (91, 7)
Predict
Use a predict function to use the lin_model
to predict rental counts.
# Generate our predictions for the test set.
lin_predictions = lin_model.predict(test[columns])
print("Predictions:", lin_predictions)
# Compute error between our test predictions and the actual values.
lin_mse = mean_squared_error(lin_predictions, test[target])
print("Computed error:", lin_mse)
Results:
Predictions: [124.41293228 123.8095075 117.67253182 209.39332151 135.46159387
199.50603805 472.14918499 90.15781602 216.61319499 120.30710327
89.47591091 127.71290441 207.44065517 125.68466139 201.38119194
204.29377218 127.4494643 113.42721447 127.37388762 94.66754136
90.21979191 173.86647615 130.34747586 111.81550069 118.88131715
124.74028405 211.95038051 202.06309706 123.53053083 167.06313191
206.24643852 122.64812937 179.98791527 125.1558454 168.00847713
120.2305587 196.60802649 117.00616326 173.20010759 89.9563518
92.11048236 120.91052805 175.47818992 129.65196995 120.97443971
175.95863082 127.24800008 135.05866542 206.49627783 91.63004147
115.78280925 208.92841718 213.5137192 212.83278197 96.74415948
95.1324457 199.9089665 206.10791806 126.16510228 120.0281266
209.08150631 132.88996619 178.84110582 128.85971386 124.67637239
115.58134503 96.82167192 514.61789505 125.48319717 207.50359894
121.64080826 201.9381774 113.22575025 202.46505762 90.7002328
92.31194658 201.25627228 516.97252195 91.36660136 599.27093251
199.6445585 123.66905128 117.4710676 173.12259514 129.60359486
209.59478573 206.29481361 210.69322009 205.50255751 210.88011563
207.65572019]
Computed error: 35003.54030828391
Deploy a Linear Regression Model with SQL Machine Learning
This is where the action takes place – you'll deploy the model you just created in Python into a SQL Server database using Machine Learning Services. You will create a stored procedure that trains and generates the ML model and then store that model in a table. Then you will create a stored procedure that applies that model to make predictions and execute it.
Use the Python scripts to create a stored procedure generate_rental_py_model
that trains and generates the model using scikit-learn LinearRegression.
Caution! SQL Server is very sensitive to the formatting in the Python code.
Copying and pasting my code and running it gave me external script errors, SqlSatelliteCall errors – it was a time-wasting mess. Avoid the pain, copy and paste the code below into an editor like VS Code that will format it for you and then save it as a SQL query and submit that. Also, all the code from this tutorial can be found here. Don't just copy and paste into SSMS or ADS.
-- Stored procedure that trains and generates a Python model using the rental_data and a linear regression algorithm
DROP PROCEDURE IF EXISTS generate_rental_py_model;
go
CREATE PROCEDURE generate_rental_py_model (@trained_model varbinary(max) OUTPUT)
AS
BEGIN
EXECUTE sp_execute_external_script
@language = N'Python'
, @script = N'
from sklearn.linear_model import LinearRegression
import pickle
df = rental_train_data
# Get all the columns from the dataframe.
columns = df.columns.tolist()
# Store the variable well be predicting on.
target = "RentalCount"
# Initialize the model class.
lin_model = LinearRegression()
# Fit the model to the training data.
lin_model.fit(df[columns], df[target])
# Before saving the model to the DB table, convert it to a binary object
trained_model = pickle.dumps(lin_model)'
, @input_data_1 = N'select "RentalCount", "Year", "Month", "Day", "WeekDay", "Snow", "Holiday" from dbo.rental_data where Year < 2015'
, @input_data_1_name = N'rental_train_data'
, @params = N'@trained_model varbinary(max) OUTPUT'
, @trained_model = @trained_model OUTPUT;
END;
GO
Create a table in which to save the model.
USE TutorialDB;
DROP TABLE IF EXISTS dbo.rental_py_models;
GO
CREATE TABLE dbo.rental_py_models (
model_name VARCHAR(30) NOT NULL DEFAULT('default model') PRIMARY KEY,
model VARBINARY(MAX) NOT NULL
);
GO
Save the model, linear_model as a binary object in the table.
DECLARE @model VARBINARY(MAX);
EXECUTE generate_rental_py_model @model OUTPUT;
INSERT INTO rental_py_models (model_name, model) VALUES('linear_model', @model);
Create the Predicting Stored Procedure
The stored procedure py_predict_rentalcount
uses the model to make predictions about ski rentals. Again, be very careful formatting SQL that contains Python.
DROP PROCEDURE IF EXISTS py_predict_rentalcount;
GO
CREATE PROCEDURE py_predict_rentalcount (@model varchar(100))
AS
BEGIN
DECLARE @py_model varbinary(max) = (select model from rental_py_models where model_name = @model);
EXECUTE sp_execute_external_script
@language = N'Python',
@script = N'
# Import the scikit-learn function to compute error.
from sklearn.metrics import mean_squared_error
import pickle
import pandas
rental_model = pickle.loads(py_model)
df = rental_score_data
# Get all the columns from the dataframe.
columns = df.columns.tolist()
# Variable you will be predicting on.
target = "RentalCount"
# Generate the predictions for the test set.
lin_predictions = rental_model.predict(df[columns])
print(lin_predictions)
# Compute error between the test predictions and the actual values.
lin_mse = mean_squared_error(lin_predictions, df[target])
#print(lin_mse)
predictions_df = pandas.DataFrame(lin_predictions)
OutputDataSet = pandas.concat([predictions_df, df["RentalCount"], df["Month"], df["Day"], df["WeekDay"], df["Snow"], df["Holiday"], df["Year"]], axis=1)
'
, @input_data_1 = N'Select "RentalCount", "Year" ,"Month", "Day", "WeekDay", "Snow", "Holiday" from rental_data where Year = 2015'
, @input_data_1_name = N'rental_score_data'
, @params = N'@py_model varbinary(max)'
, @py_model = @py_model
with result sets (("RentalCount_Predicted" float, "RentalCount" float, "Month" float,"Day" float,"WeekDay" float,"Snow" float,"Holiday" float, "Year" float));
END;
GO
Create a table to store predictions generated by the stored procedure.
DROP TABLE IF EXISTS [dbo].[py_rental_predictions];
GO
CREATE TABLE [dbo].[py_rental_predictions](
[RentalCount_Predicted] [int] NULL,
[RentalCount_Actual] [int] NULL,
[Month] [int] NULL,
[Day] [int] NULL,
[WeekDay] [int] NULL,
[Snow] [int] NULL,
[Holiday] [int] NULL,
[Year] [int] NULL
) ON [PRIMARY]
GO
Finally, run the stored procedure to predict the number of rentals.
--Insert the results of the predictions for test set into a table
INSERT INTO py_rental_predictions
EXEC py_predict_rentalcount 'linear_model';
-- Select contents of the table
SELECT * FROM py_rental_predictions;
Tada! Results will be similar to those shown below.
The above tutorial showed you a workflow to create, train, deploy and run an ML model as a stored procedure on SQL Server 2022.
Looking for more? Please see Python tutorials - SQL machine learning, plus another great tutorial can be found at Data Interpolation and Transformation using Python in SQL Server 2017.
Python, R and SQL Server for Limitless Analytics and AI
The name SQL Server may conjure up images of a stodgy RDBMS data silo, but in SQL Server 2022 nothing could be further from the truth. The ability to run C, Java, Python and R code as stored procedures extends SQL Server functionality far beyond its built-in features. The ability to query external tables on and make backups to S3-API compatible object storage.
MinIO is the best choice of object storage to pair with SQL Server 2022. This is especially true on-premise where MinIO is your only performant, scalable, fully S3 API compatible and enterprise-grade object storage solution. Data can live anywhere MinIO runs and be available for AI – public cloud, private cloud, on-premise, at the edge.
Download MinIO and get started building machine learning applications today. Questions or comments? Reach out to us on our community slack channel.