DeployLasso
, DeployRandomForest
, or DeployLinearMixedModel
These classes let one deploy custom models on varied datasets via the following workflow:
One can do both classification (ie, predict Y or N) as well as regression (ie, predict a numeric field).
Nope. It’ll help if you can follow these guidelines:
CREATE TABLE [SAM].[dbo].[HCRDeployClassificationBASE] (
[BindingID] [int] ,
[BindingNM] [varchar] (255),
[LastLoadDTS] [datetime2] (7),
[PatientEncounterID] [decimal] (38, 0),
[PredictedProbNBR] [decimal] (38, 2),
[Factor1TXT] [varchar] (255),
[Factor2TXT] [varchar] (255),
[Factor3TXT] [varchar] (255)
)
CREATE TABLE [SAM].[dbo].[HCRDeployRegressionBASE] (
[BindingID] [int],
[BindingNM] [varchar] (255),
[LastLoadDTS] [datetime2] (7),
[PatientEncounterID] [decimal] (38, 0),
[PredictedValueNBR] [decimal] (38, 2),
[Factor1TXT] [varchar] (255),
[Factor2TXT] [varchar] (255),
[Factor3TXT] [varchar] (255)
)
Note these preprocessing steps should first be tested and found useful in the development step.
impute
argument (see below).LinearMixedModelDeployment
(detailed below).selectData
Return: a data frame that represents your data.
library(healthcareai)
library(RODBC)
connection.string = "
driver={SQL Server};
server=localhost;
database=SAM;
trusted_connection=true
"
# This query should pull only rows for training. They must have a label.
query = "
SELECT
[PatientEncounterID]
,[PatientID]
,[SystolicBPNBR]
,[LDLNBR]
,[A1CNBR]
,[GenderFLG]
,[ThirtyDayReadmitFLG]
FROM [SAM].[dbo].[HCRDiabetesClinical]
"
df <- selectData(connection.string, query)
head(df)
Note: if you want a CSV example (ie, an example that you can run as-is), see the built-in docs:
library(healthcareai)
?healthcareai
Set up your parameters via SupervisedModelDevelopmentParams
p <- SupervisedModelDevelopmentParams$new()
p$df = df
p$type = 'classification'
p$impute = TRUE
p$grainCol = 'PatientEncounterID'
p$predictedCol = 'ThirtyDayReadmitFLG'
p$debug = FALSE
p$cores = 1
Create the models via the LassoDevelopment
or RandomForestDevelopment
algorithms.
# Run Lasso
Lasso <- LassoDevelopment$new(p)
Lasso$run()
# Run Random Forest
rf <- RandomForestDevelopment$new(p)
rf$run()
Specify a deploy set
#specify a deploy set
dfDeploy <- df[951:1000,]
Set deployment parameters via SupvervisedModelDeploymentParams
p2 <- SupervisedModelDeploymentParams$new()
p2$type <- "classification"
p2$df <- dfDeploy
p2$grainCol <- "PatientEncounterID"
p2$predictedCol <- "ThirtyDayReadmitFLG"
p2$impute <- TRUE
p2$debug <- FALSE
p2$cores <- 1
Since random forest performed better in step 2, we will choose to deploy the random forest model.
dL <- RandomForestDeployment$new(p2)
dL$deploy()
dfOut <- dL$getOutDf()
dfOut
#### Classification example using SQL Server data ####
# This example requires you to first create a table in SQL Server.
# If you want to write output to a non-default schema, please see the section
# below, titled: "Deploying and Writing to non-default schemas in SQL Server".
# If you prefer to not use SAMD, execute this in SSMS to create output table:
# CREATE TABLE dbo.HCRDeployClassificationBASE(
# BindingID float, BindingNM varchar(255), LastLoadDTS datetime2,
# PatientEncounterID int, <--change to match inputID
# PredictedProbNBR decimal(38, 2),
# Factor1TXT varchar(255), Factor2TXT varchar(255), Factor3TXT varchar(255)
# )
## 1. Loading data and packages.
ptm <- proc.time()
library(healthcareai)
connection.string <- "
driver={SQL Server};
server=localhost;
database=SAM;
trusted_connection=true
"
query <- "
SELECT
[PatientEncounterID] --Only need one ID column for random forest
,[SystolicBPNBR]
,[LDLNBR]
,[A1CNBR]
,[GenderFLG]
,[ThirtyDayReadmitFLG]
FROM [SAM].[dbo].[HCRDiabetesClinical]
"
df <- selectData(connection.string, query)
# Partition develop and deploy data
dfDeploy <- df[951:1000,]
## 2. Train and save the model using DEVELOP
print('Historical, development data:')
str(df)
set.seed(42)
p <- SupervisedModelDevelopmentParams$new()
p$df <- df
p$type <- "classification"
p$impute <- TRUE
p$grainCol <- "PatientEncounterID"
p$predictedCol <- "ThirtyDayReadmitFLG"
p$debug <- FALSE
p$cores <- 1
# Run RandomForest
RandomForest <- RandomForestDevelopment$new(p)
RandomForest$run()
## 3. Load saved model and use DEPLOY to generate predictions.
print('Fake production data:')
str(dfDeploy)
p2 <- SupervisedModelDeploymentParams$new()
p2$type <- "classification"
p2$df <- dfDeploy
p2$grainCol <- "PatientEncounterID"
p2$predictedCol <- "ThirtyDayReadmitFLG"
p2$impute <- TRUE
p2$debug <- FALSE
p2$cores <- 1
dL <- RandomForestDeployment$new(p2)
dL$deploy()
dfOut <- dL$getOutDf()
writeData(MSSQLConnectionString = connection.string,
df = dfOut,
tableName = 'HCRDeployClassificationBASE')
print(proc.time() - ptm)
Start with the arguments. You’ll want to add
#### Classification Example using csv data ####
## 1. Loading data and packages.
ptm <- proc.time()
library(healthcareai)
# setwd('C:/Yourscriptlocation/Useforwardslashes') # Uncomment if using csv
# Can delete this line in your work
csvfile <- system.file("extdata",
"HCRDiabetesClinical.csv",
package = "healthcareai")
# Replace csvfile with 'path/file'
df <- read.csv(file = csvfile,
header = TRUE,
na.strings = c("NULL", "NA", ""))
df$PatientID <- NULL # Only one ID column (ie, PatientEncounterID) is needed remove this column
# Partition develop and deploy data
dfDeploy <- df[951:1000,]
## 2. Train and save the model using DEVELOP
print('Historical, development data:')
str(df)
set.seed(42)
p <- SupervisedModelDevelopmentParams$new()
p$df <- df
p$type <- "classification"
p$impute <- TRUE
p$grainCol <- "PatientEncounterID"
p$predictedCol <- "ThirtyDayReadmitFLG"
p$debug <- FALSE
p$cores <- 1
# Run RandomForest
RandomForest <- RandomForestDevelopment$new(p)
RandomForest$run()
## 3. Load saved model and use DEPLOY to generate predictions.
print('Fake production data:')
str(dfDeploy)
p2 <- SupervisedModelDeploymentParams$new()
p2$type <- "classification"
p2$df <- dfDeploy
p2$grainCol <- "PatientEncounterID"
p2$predictedCol <- "ThirtyDayReadmitFLG"
p2$impute <- TRUE
p2$debug <- FALSE
p2$cores <- 1
dL <- RandomForestDeployment$new(p2)
dL$deploy()
dfOut <- dL$getOutDf()
head(dfOut)
# Write to CSV (or JSON, MySQL, etc) using plain R syntax
# write.csv(dfOut,'path/predictionsfile.csv')
print(proc.time() - ptm)
As of the end of July 2017, there are some known issues with the packages DBI
and odbc
. These issues get in the way of those writing to non-default schemas in SQL-Server. We hope to have the implementation of DBI
and odbc
fixed soon but we are waiting on the developers of those resepctive packages to fix the issue. Links to the issues on github and stack overflow are here, here, and here. For now, we have created a work-around using the package RODBC
.
Note: RODBC
is difficult to install on Mac OS. This work-around is mainly for Windows users. If Mac users can getRODBC
installed on their machines, then the same code will work, but the installation will not be trivial.
You may need to install RODBC using install.packages(“RODBC”). Here is an example of the work-around.
#First, create a table in SQL Server using a non-default schema. The example
#creates this table in the SAM database on localhost. You may also need to
#create a new schema(Cardiovascular) in SSMS for this specific example to work.
#CREATE TABLE [Cardiovascular].[TestTable](
#[a] [float] NULL,
#[b] [float] NULL,
#[c] [varchar](255) NULL)
#install the RODBC pacakge onto your machine. You only need to do this one time.
install.packages("RODBC")
#load the package
library(RODBC)
#create a connection to work with
con <- RODBC::odbcDriverConnect('driver={SQL Server};
server=localhost;
database=SAM;
trusted_connection=true')
#build a df to write to SQL Server. df columns names must match the SSMS table.
df <- data.frame(a = c(10, 20, 30),
b = c(20, 40, 60),
c = c("oneT", "twoT", "threeT"))
#write the df to the table
RODBC::sqlSave(con, df, "Cardiovascular.TestTable", append = TRUE,
rownames = FALSE)
#verify that the table was written to
confirmDf <- sqlQuery(con, 'select * from Cardiovascular.TestTable')
head(confirmDf)
Note: if you need to see the built-in docs (which are always up-to-date):
library(healthcareai)
?healthcareai