How to use CGNM

To illustrate the use of CGNM here we illustrate how CGNM can be used to estimate two sets of the best fit parameters of the pharmacokinetics model when the drug is administered orally (known as flip-flop kinetics).

Prepare the model (\(\boldsymbol f\))

model_function=function(x){

  observation_time=c(0.1,0.2,0.4,0.6,1,2,3,6,12)
  Dose=1000
  F=1

  ka=x[1]
  V1=x[2]
  CL_2=x[3]
  t=observation_time

  Cp=ka*F*Dose/(V1*(ka-CL_2/V1))*(exp(-CL_2/V1*t)-exp(-ka*t))

  log10(Cp)
}

Prepare the data (\(\boldsymbol y^*\))

observation=log10(c(4.91, 8.65, 12.4, 18.7, 24.3, 24.5, 18.4, 4.66, 0.238))

Run Cluster_Gauss_Newton_method

Here we have specified the upper and lower range of the initial guess.

CGNM_result=Cluster_Gauss_Newton_method(nonlinearFunction=model_function,targetVector = observation,
initial_lowerRange =rep(0.01,3),initial_upperRange =  rep(100,3),lowerBound = rep(0,3), saveLog=TRUE, num_minimizersToFind = 500, ParameterNames = c("Ka","V1","CL"))
#> Ka, V1, CL will be transformed internally to impose boundaries. See CGNM_result$runSetting$ReparameterizationDef for exact transformation. Transformed parameters are denoted as x and untransformed parameters are denoted as theta.
#> checking if the nonlinearFunction can be evaluated at the initial_lowerRange
#> Evaluation Successful
#> checking if the nonlinearFunction can be evaluated at the initial_upperRange
#> Evaluation Successful
#> checking if the nonlinearFunction can be evaluated at the (initial_upperRange+initial_lowerRange)/2
#> Evaluation Successful
#> CGNM iteration should finish before: 2024-06-12 07:30:01.711278
#> Generating initial cluster. 494 out of 500 done
#> Generating initial cluster. 500 out of 500 done
#> Iteration:1  Median sum of squares residual=5.93019141072366
#> Rough estimation of remaining computation time: 0.6 min
#> CGNM iteration estimated to finish at: 2024-06-12 07:30:21.045058
#> Iteration:2  Median sum of squares residual=2.33642760729459
#> Iteration:3  Median sum of squares residual=0.991368031073654
#> Iteration:4  Median sum of squares residual=0.92698116203561
#> Iteration:5  Median sum of squares residual=0.851117434916381
#> Iteration:6  Median sum of squares residual=0.386490181890342
#> Iteration:7  Median sum of squares residual=0.0168961333027919
#> Iteration:8  Median sum of squares residual=0.00735709659393878
#> Iteration:9  Median sum of squares residual=0.00734926048607087
#> Iteration:10  Median sum of squares residual=0.00734923412027585
#> Iteration:11  Median sum of squares residual=0.00734923408565502
#> CGNM iteration estimated to finish at: 2024-06-12 07:30:20.079322
#> Iteration:12  Median sum of squares residual=0.00734923408395069
#> Iteration:13  Median sum of squares residual=0.00734923408243488
#> Iteration:14  Median sum of squares residual=0.00734923408241163
#> Iteration:15  Median sum of squares residual=0.00734923408240997
#> Iteration:16  Median sum of squares residual=0.00734923408239416
#> Iteration:17  Median sum of squares residual=0.00734923408238561
#> Iteration:18  Median sum of squares residual=0.00734923408238496
#> Iteration:19  Median sum of squares residual=0.00734923408238248
#> Iteration:20  Median sum of squares residual=0.00734923408238198
#> Iteration:21  Median sum of squares residual=0.00734923408238159
#> CGNM iteration estimated to finish at: 2024-06-12 07:30:20.270039
#> Iteration:22  Median sum of squares residual=0.00734923408238108
#> Iteration:23  Median sum of squares residual=0.00734923408238106
#> Iteration:24  Median sum of squares residual=0.00734923408238101
#> Iteration:25  Median sum of squares residual=0.0073492340823809
#> CGNM computation time:  0.6 min

Obtain the approximate minimizers

kable(head(acceptedApproximateMinimizers(CGNM_result)))

Ka	V1	CL
0.9265056	19.07204	9.877326
0.9265061	19.07205	9.877327
0.5178956	10.66084	9.877326
0.5178956	10.66084	9.877325
0.9265056	19.07204	9.877326
0.5178956	10.66084	9.877326

kable(table_parameterSummary(CGNM_result))

	CGNM: Minimum	25 percentile	Median	75 percentile	Maximum
Ka	0.5178843	0.5178956	0.5178956	0.9265056	0.926524
V1	10.6601001	10.6608371	10.6608409	19.0720415	19.072354
CL	9.8768983	9.8773257	9.8773258	9.8773258	9.877386

Can run residual resampling bootstrap analyses using CGNM as well

CGNM_bootstrap=Cluster_Gauss_Newton_Bootstrap_method(CGNM_result, nonlinearFunction=model_function)
#> checking if the nonlinearFunction can be evaluated at the initial_lowerRange
#> Evaluation Successful
#> checking if the nonlinearFunction can be evaluated at the initial_upperRange
#> Evaluation Successful
#> checking if the nonlinearFunction can be evaluated at the (initial_upperRange+initial_lowerRange)/2
#> Evaluation Successful
#> CGNM iteration should finish before: 2024-06-12 07:30:20.943685
#> Warning in dir.create(saveFolderName): 'CGNM_log_bootstrap' already exists
#> Generating initial cluster. 200 out of 200 done
#> Iteration:1  Median sum of squares residual=0.0127062558505551
#> Rough estimation of remaining computation time: 0.1 min
#> CGNM iteration estimated to finish at: 2024-06-12 07:30:24.754781
#> Iteration:2  Median sum of squares residual=0.0117807139261736
#> Iteration:3  Median sum of squares residual=0.0116867887961055
#> Iteration:4  Median sum of squares residual=0.011685126669813
#> Iteration:5  Median sum of squares residual=0.0116850387674895
#> Iteration:6  Median sum of squares residual=0.0116850385599514
#> Iteration:7  Median sum of squares residual=0.0116850385599514
#> Iteration:8  Median sum of squares residual=0.0116850385599514
#> Iteration:9  Median sum of squares residual=0.0116850385399327
#> Iteration:10  Median sum of squares residual=0.0116850385399327
#> Iteration:11  Median sum of squares residual=0.0116850385395219
#> CGNM iteration estimated to finish at: 2024-06-12 07:30:24.683724
#> Iteration:12  Median sum of squares residual=0.0116850385393764
#> Iteration:13  Median sum of squares residual=0.0116850385393764
#> Iteration:14  Median sum of squares residual=0.0116850385393764
#> Iteration:15  Median sum of squares residual=0.0116850385392747
#> Iteration:16  Median sum of squares residual=0.0116850385392747
#> Iteration:17  Median sum of squares residual=0.0116850385390926
#> Iteration:18  Median sum of squares residual=0.0116850385390926
#> Iteration:19  Median sum of squares residual=0.0116850385390926
#> Iteration:20  Median sum of squares residual=0.0116850385390926
#> Iteration:21  Median sum of squares residual=0.0116850385390926
#> CGNM iteration estimated to finish at: 2024-06-12 07:30:24.558055
#> Iteration:22  Median sum of squares residual=0.0116850385387273
#> Iteration:23  Median sum of squares residual=0.0116850385387273
#> Iteration:24  Median sum of squares residual=0.0116850385387273
#> Iteration:25  Median sum of squares residual=0.0116850385387273
#> CGNM computation time:  0.1 min

kable(table_parameterSummary(CGNM_bootstrap))

	CGNM Bootstrap: Minimum	25 percentile	Median	75 percentile	Maximum	RSE (%)
Ka	0.4885198	0.5159554	0.5667962	0.9399051	1.123803	30.113745
V1	9.4976136	10.5712748	12.2210459	19.1078962	23.680541	29.802505
CL	9.4656483	9.6745091	9.8509610	10.0757591	11.245057	2.948717

Visualize the CGNM modelfit analysis result

To use the plot functions the user needs to manually load ggplot2.

library(ggplot2)

Inspect the distribution of SSR of approximate minimizers found by CGNM

Despite the robustness of the algorithm not all approximate minimizers converge so here we visually inspect to see how many of the approximate minimizers we consider to have the similar SSR to the minimum SSR. Currently the algorithm automatically choose “acceptable” approximate minimizer based on Grubbs’ Test for Outliers. If for whatever the reason this criterion is not satisfactly the users can manually set the indicies of the acceptable approximat minimizers.

plot_Rank_SSR(CGNM_result)

plot_paraDistribution_byHistogram(CGNM_bootstrap, bins = 50)+scale_x_continuous(trans="log10")

visually inspect goodness of fit of top 50 approximate minimizers

plot_goodnessOfFit(CGNM_result, plotType = 1, independentVariableVector = c(0.1,0.2,0.4,0.6,1,2,3,6,12), plotRank = seq(1,50))

plot model prediction with uncertainties based on residual resampling bootstrap analysis

plot_goodnessOfFit(CGNM_bootstrap, plotType = 1, independentVariableVector = c(0.1,0.2,0.4,0.6,1,2,3,6,12))

plot profile likelihood

plot_profileLikelihood(c("CGNM_log","CGNM_log_bootstrap"))+scale_x_continuous(trans="log10")
#> [1] "log saved in /private/var/folders/n8/mrjc8j9j45x93m7t_hmvg9nm0000gn/T/Rtmp0CQdzL/Rbuild10b0ce903238/CGNM/vignettes/CGNM_log is used to draw SSR/likelihood surface"
#> [1] "log saved in /private/var/folders/n8/mrjc8j9j45x93m7t_hmvg9nm0000gn/T/Rtmp0CQdzL/Rbuild10b0ce903238/CGNM/vignettes/CGNM_log_bootstrap is used to draw SSR/likelihood surface"

kable(table_profileLikelihoodConfidenceInterval(c("CGNM_log","CGNM_log_bootstrap"), alpha = 0.25))
#> [1] "WARNING: ALWAYS first inspect the profile likelihood plot (using plot_profileLikelihood()) and then use this table, DO NOT USE this table by itself."
#> [1] "log saved in /private/var/folders/n8/mrjc8j9j45x93m7t_hmvg9nm0000gn/T/Rtmp0CQdzL/Rbuild10b0ce903238/CGNM/vignettes/CGNM_log is used to draw SSR/likelihood surface"
#> [1] "log saved in /private/var/folders/n8/mrjc8j9j45x93m7t_hmvg9nm0000gn/T/Rtmp0CQdzL/Rbuild10b0ce903238/CGNM/vignettes/CGNM_log_bootstrap is used to draw SSR/likelihood surface"

	25 percentile	best-fit	75 percentile	identifiability
Ka	0.4895609	0.9265056	1.068911	identifiable
V1	9.6771919	19.0720416	21.421014	identifiable
CL	9.3107883	9.8773258	10.524178	identifiable

plot profile likelihood surface

plot_2DprofileLikelihood(CGNM_result, showInitialRange=FALSE, alpha = 0.05)+scale_x_continuous(trans="log10")+scale_y_continuous(trans="log10")
#> [1] "log saved in /private/var/folders/n8/mrjc8j9j45x93m7t_hmvg9nm0000gn/T/Rtmp0CQdzL/Rbuild10b0ce903238/CGNM/vignettes/CGNM_log is used to draw SSR/likelihood surface"
#> [1] "log saved in /private/var/folders/n8/mrjc8j9j45x93m7t_hmvg9nm0000gn/T/Rtmp0CQdzL/Rbuild10b0ce903238/CGNM/vignettes/CGNM_log_bootstrap is used to draw SSR/likelihood surface"

Parallel computation

Cluster Gauss Newton method implementation in CGNM package (above version 0.6) can use nonlinear function that takes multiple input vectors stored in matrix (each column as the input vector) and output matrix (each column as the output vector). This implementation was to be used to parallelize the computation. See below for the examples of parallelized implementation in various hardware. Cluster Gauss Newton method is embarrassingly parallelizable so the computation speed is almost proportional to the number of computation cores used especially for the nonlinear functions that takes time to compute (e.g. models with numerical method to solve a large system of ODEs).

model_matrix_function=function(x){
  Y_list=lapply(split(x, rep(seq(1:nrow(x)),ncol(x))), model_function)
  Y=t(matrix(unlist(Y_list),ncol=length(Y_list)))
}

testX=t(matrix(c(rep(0.01,3),rep(10,3),rep(100,3)), nrow = 3))
print("testX")
#> [1] "testX"
print(testX)
#>       [,1]  [,2]  [,3]
#> [1,] 1e-02 1e-02 1e-02
#> [2,] 1e+01 1e+01 1e+01
#> [3,] 1e+02 1e+02 1e+02
print("model_matrix_function(testX)")
#> [1] "model_matrix_function(testX)"
print(model_matrix_function(testX))
#>           [,1]      [,2]     [,3]      [,4]      [,5]      [,6]       [,7]
#> [1,] 1.9782455 2.2578754 2.517166 2.6529261 2.7982741 2.9311513  2.9684634
#> [2,] 1.7756978 1.8804296 1.860008 1.7832148 1.6114094 1.1771685  0.7428740
#> [3,] 0.9609136 0.9175059 0.830647 0.7437881 0.5700703 0.1357758 -0.2985186
#>            [,8]      [,9]
#> [1,]  2.9771626  2.952246
#> [2,] -0.5600094 -3.165776
#> [3,] -1.6014021 -4.207169

print("model_matrix_function(testX)-rbind(model_function(testX[1,]),model_function(testX[2,]),model_function(testX[3,]))")
#> [1] "model_matrix_function(testX)-rbind(model_function(testX[1,]),model_function(testX[2,]),model_function(testX[3,]))"
print(model_matrix_function(testX)-rbind(model_function(testX[1,]),model_function(testX[2,]),model_function(testX[3,])))
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
#> [1,]    0    0    0    0    0    0    0    0    0
#> [2,]    0    0    0    0    0    0    0    0    0
#> [3,]    0    0    0    0    0    0    0    0    0

an example of parallel implementation for Mac using parallel package


# library(parallel)
#
# obsLength=length(observation)
# 
## Given CGNM searches through wide range of parameter combination, it can encounter
## parameter combinations that is not feasible to evaluate. This try catch function
## is implemented within CGNM for regular functions but for the matrix functions
## user needs to implement outside of CGNM
#
# modelFunction_tryCatch=function(x_in){
#  out=tryCatch({model_function(x_in)},
#               error=function(cond) {rep(NA, obsLength)}
#  )
#  return(out)
# }
# 
#  model_matrix_function=function(X){
#   Y_list=mclapply(split(X, rep(seq(1:nrow(X)),ncol(X))), modelFunction_tryCatch,mc.cores = (parallel::detectCores()-1), mc.preschedule = FALSE)
#
#   Y=t(matrix(unlist(Y_list),ncol=length(Y_list)))
# 
#   return(Y)
#  }

an example of parallel implementation for Windows using foreach and doParllel packages

#library(foreach)
#library(doParallel)

#numCore=8
#registerDoParallel(numCore-1)
#cluster=makeCluster(numCore-1, type = "PSOCK")
#registerDoParallel(cl=cluster)

# obsLength=length(observation)

## Given CGNM searches through wide range of parameter combination, it can encounter
## parameter combinations that is not feasible to evaluate. This try catch function
## is implemented within CGNM for regular functions but for the matrix functions
## user needs to implement outside of CGNM

# modelFunction_tryCatch=function(x_in){
#  out=tryCatch({model_function(x_in)},
#               error=function(cond) {rep(NA, obsLength)}
#  )
#  return(out)
# }

#model_matrix_function=function(X){
#  Y_list=foreach(i=1:dim(X)[1], .export = c("model_function", "modelFunction_tryCatch"))%dopar%{ #make sure to include all related functions in .export and all used packages in .packages for more information read documentation of dopar
#      modelFunction_tryCatch((X[i,]))
#    }
  
#  Y=t(matrix(unlist(Y_list),ncol=length(Y_list)))
#}

CGNM_result=Cluster_Gauss_Newton_method(nonlinearFunction=model_matrix_function,
targetVector = observation,
initial_lowerRange =rep(0.01,3),initial_upperRange =  rep(100,3),lowerBound = rep(0,3), saveLog=TRUE, num_minimizersToFind = 500, ParameterNames = c("Ka","V1","CL"))
#> Warning in dir.create(saveFolderName): 'CGNM_log' already exists
#> Ka, V1, CL will be transformed internally to impose boundaries. See CGNM_result$runSetting$ReparameterizationDef for exact transformation. Transformed parameters are denoted as x and untransformed parameters are denoted as theta.
#> nonlinearFunction is given as matrix to matrix function
#> NonlinearFunction evaluation at initial_lowerRange Successful.
#> NonlinearFunction evaluation at (initial_upperRange+initial_lowerRange)/2 Successful.
#> NonlinearFunction evaluation at initial_upperRange Successful.
#> CGNM iteration should finish before: 2024-06-12 07:30:29.063864
#> Warning in dir.create(saveFolderName): 'CGNM_log' already exists
#> Generating initial cluster. 498 out of 500 done
#> Generating initial cluster. 500 out of 500 done
#> Iteration:1  Median sum of squares residual=5.95663065067079
#> Rough estimation of remaining computation time: 0.6 min
#> CGNM iteration estimated to finish at: 2024-06-12 07:31:05.195563
#> Iteration:2  Median sum of squares residual=2.33913682480937
#> Iteration:3  Median sum of squares residual=0.937180137570765
#> Iteration:4  Median sum of squares residual=0.926981116831735
#> Iteration:5  Median sum of squares residual=0.710822510675569
#> Iteration:6  Median sum of squares residual=0.0620837788615178
#> Iteration:7  Median sum of squares residual=0.00755735434972531
#> Iteration:8  Median sum of squares residual=0.00734938304033205
#> Iteration:9  Median sum of squares residual=0.00734923674391747
#> Iteration:10  Median sum of squares residual=0.00734923413437108
#> Iteration:11  Median sum of squares residual=0.00734923408642441
#> CGNM iteration estimated to finish at: 2024-06-12 07:31:03.256961
#> Iteration:12  Median sum of squares residual=0.00734923408249074
#> Iteration:13  Median sum of squares residual=0.00734923408239152
#> Iteration:14  Median sum of squares residual=0.00734923408238754
#> Iteration:15  Median sum of squares residual=0.00734923408238499
#> Iteration:16  Median sum of squares residual=0.00734923408238318
#> Iteration:17  Median sum of squares residual=0.00734923408238204
#> Iteration:18  Median sum of squares residual=0.00734923408238148
#> Iteration:19  Median sum of squares residual=0.00734923408238141
#> Iteration:20  Median sum of squares residual=0.0073492340823813
#> Iteration:21  Median sum of squares residual=0.00734923408238116
#> CGNM iteration estimated to finish at: 2024-06-12 07:31:03.406326
#> Iteration:22  Median sum of squares residual=0.00734923408238102
#> Iteration:23  Median sum of squares residual=0.00734923408238098
#> Iteration:24  Median sum of squares residual=0.00734923408238095
#> Iteration:25  Median sum of squares residual=0.0073492340823809
#> CGNM computation time:  0.6 min


#stopCluster(cluster) #make sure to close the created cluster if needed

unlink("CGNM_log", recursive=TRUE)
unlink("CGNM_log_bootstrap", recursive=TRUE)

CGNM: Cluster Gauss-Newton Method

When and when not to use CGNM

Use CGNM

Not to use CGNM