Fitting individual models

Umut Caglar, Claus O. Wilke

2021-05-07

Sometimes it can be useful to fit only specific models to a dataset rather than fit multiple models and run the decision algorithm. For this purpose, sicegar provides the function multipleFitFunction(). This function fits a chosen model to the input dataset. It calls the fitting algorithm multiple times with different, randomly generated start parameters, to guarantee robust and reliable fitting.

We will demonstrate the use of this function on double-sigmoidal data, generated by adding some noise to the double-sigmoidal curve used for fitting. The curve used for fitting is implemented as doublesigmoidalFitFormula(), and thus can be used to generate model data.

time <- seq(3, 24, 0.5)
noise_parameter <- 0.2
intensity_noise <- runif(n = length(time), min = 0, max = 1) * noise_parameter
intensity <- doublesigmoidalFitFormula(time,
                                       finalAsymptoteIntensityRatio = .3,
                                       maximum = 4,
                                       slope1Param = 1,
                                       midPoint1Param = 7,
                                       slope2Param = 1,
                                       midPointDistanceParam = 8)
intensity <- intensity + intensity_noise
dataInput <- data.frame(time, intensity)
head(dataInput) # the generated input data
##   time intensity
## 1  3.0 0.2342345
## 2  3.5 0.2634529
## 3  4.0 0.2884316
## 4  4.5 0.3297332
## 5  5.0 0.5435398
## 6  5.5 0.7838986

Before we can perform the fit, we need to normalize the data appropriately. All sicegar fit functions work on normalized data, where time and intensity are normalized to the interval from 0 to 1. Sicegar provides a convenient normalization function normalizeData() that normalizes data appropriately while storing the required information to transform fitted parameters back into non-normalized coordinates:

normalizedInput <- normalizeData(dataInput = dataInput, 
                                 dataInputName = "doubleSigmoidalSample")
head(normalizedInput$timeIntensityData) # the normalized time and intensity data
##        time   intensity
## 1 0.1250000 0.000000000
## 2 0.1458333 0.007491139
## 3 0.1666667 0.013895288
## 4 0.1875000 0.024484372
## 5 0.2083333 0.079300988
## 6 0.2291667 0.140925160

The data scaling paratmers and the data input name are stored as well:

normalizedInput$dataScalingParameters # scaling parameters
##      timeRange   intensityMin   intensityMax intensityRange 
##     24.0000000      0.2342345      4.1346318      3.9003973
normalizedInput$dataInputName # data input name
## [1] "doubleSigmoidalSample"

Note that normalizeData() normalizes time with respect to the maximum value the time parameter takes:

timeRange <- time
timeNormalized <- time/timeRange # normalized time values

Whereas intensity is normalized with respect to the intensity interval:

intensityMin <- min(intensity)
intensityMax <- max(intensity)
intensityRange <- intensityMax - intensityMin

intensityNormalized <- (intensity-intensityMin)/intensityRange # normalized intensity values

Fitting the models to the data

To fit a model to the data using the function multipleFitFunction(), we provide it as input the normalized data and the model type to be fitted, which can be "sigmoidal" or "doublesigmoidal". Here we are fitting both models to the same input data:

# Do the sigmoidal fit
sigmoidalModel <- multipleFitFunction(dataInput=normalizedInput,
                                      model="sigmoidal")


# Do the double-sigmoidal fit
doubleSigmoidalModel <- multipleFitFunction(dataInput=normalizedInput,
                                            model="doublesigmoidal")

The two generated model objects contain a large number of computed parameters, described in detail in the following.

t(sigmoidalModel)
##                                      [,1]                   
## maximum_N_Estimate                   "0.5817715"            
## maximum_Std_Error                    "0.0471064"            
## maximum_t_value                      "12.35016"             
## maximum_Pr_t                         "3.144059e-15"         
## slopeParam_N_Estimate                "56.69416"             
## slopeParam_Std_Error                 "62.9224"              
## slopeParam_t_value                   "0.9010171"            
## slopeParam_Pr_t                      "0.3729724"            
## midPoint_N_Estimate                  "0.254053"             
## midPoint_Std_Error                   "0.02232296"           
## midPoint_t_value                     "11.38079"             
## midPoint_Pr_t                        "4.075252e-14"         
## residual_Sum_of_Squares              "2.947177"             
## log_likelihood                       "-3.38678"             
## AIC_value                            "14.77356"             
## BIC_value                            "21.81836"             
## isThisaFit                           "TRUE"                 
## startVector.maximum                  "0.4293649"            
## startVector.slopeParam               "17.12084"             
## startVector.midPoint                 "-0.2478345"           
## dataScalingParameters.timeRange      "24"                   
## dataScalingParameters.intensityMin   "0.2342345"            
## dataScalingParameters.intensityMax   "4.134632"             
## dataScalingParameters.intensityRange "3.900397"             
## model                                "sigmoidal"            
## additionalParameters                 "FALSE"                
## maximum_Estimate                     "2.503374"             
## slopeParam_Estimate                  "2.362257"             
## midPoint_Estimate                    "6.097271"             
## dataInputName                        "doubleSigmoidalSample"
## betterFit                            "4"                    
## correctFit                           "20"                   
## totalFit                             "28"
t(doubleSigmoidalModel)
##                                          [,1]                   
## finalAsymptoteIntensityRatio_N_Estimate  "0.2783729"            
## finalAsymptoteIntensityRatio_Std_Error   "0.005673176"          
## finalAsymptoteIntensityRatio_t_value     "49.06827"             
## finalAsymptoteIntensityRatio_Pr_t        "2.802989e-35"         
## maximum_N_Estimate                       "0.9890303"            
## maximum_Std_Error                        "0.006744761"          
## maximum_t_value                          "146.6368"             
## maximum_Pr_t                             "9.166125e-53"         
## slope1Param_N_Estimate                   "26.72541"             
## slope1Param_Std_Error                    "0.9129164"            
## slope1Param_t_value                      "29.27476"             
## slope1Param_Pr_t                         "3.431542e-27"         
## midPoint1Param_N_Estimate                "0.2942993"            
## midPoint1Param_Std_Error                 "0.001583065"          
## midPoint1Param_t_value                   "185.9048"             
## midPoint1Param_Pr_t                      "1.427289e-56"         
## slope2Param_N_Estimate                   "25.36587"             
## slope2Param_Std_Error                    "1.253865"             
## slope2Param_t_value                      "20.23014"             
## slope2Param_Pr_t                         "1.339911e-21"         
## midPointDistanceParam_N_Estimate         "0.334362"             
## midPointDistanceParam_Std_Error          "0.003221369"          
## midPointDistanceParam_t_value            "103.795"              
## midPointDistanceParam_Pr_t               "3.171515e-47"         
## residual_Sum_of_Squares                  "0.01073605"           
## log_likelihood                           "117.3356"             
## AIC_value                                "-220.6713"            
## BIC_value                                "-208.3429"            
## isThisaFit                               "TRUE"                 
## startVector.finalAsymptoteIntensityRatio "0.7374777"            
## startVector.maximum                      "0.6159543"            
## startVector.slope1Param                  "86.40114"             
## startVector.midPoint1Param               "0.5335278"            
## startVector.slope2Param                  "179.7345"             
## startVector.midPointDistanceParam        "0.212152"             
## dataScalingParameters.timeRange          "24"                   
## dataScalingParameters.intensityMin       "0.2342345"            
## dataScalingParameters.intensityMax       "4.134632"             
## dataScalingParameters.intensityRange     "3.900397"             
## model                                    "doublesigmoidal"      
## additionalParameters                     "FALSE"                
## finalAsymptoteIntensityRatio_Estimate    "0.2783729"            
## maximum_Estimate                         "4.091846"             
## slope1Param_Estimate                     "1.113559"             
## midPoint1Param_Estimate                  "7.063182"             
## slope2Param_Estimate                     "1.056911"             
## midPointDistanceParam_Estimate           "8.024689"             
## dataInputName                            "doubleSigmoidalSample"
## betterFit                                "5"                    
## correctFit                               "20"                   
## totalFit                                 "37"

The calculated quantities can be grouped into several different groups of parameters:

1. Information about the fitting process

2. Estimates of the fitted parameters

These estimates have been converted from the normalized data to the original raw data, and are the main quantities of interest to the user. They depend on the type of the model, sigmoidal vs. double-sigmoidal.

Estimates for the sigmoidal model are:

Estimates for the double-sigmoidal model are:

3. Quantities measuring the overall quality of fit

4. Start point of the gradient descent algorithm

Each time a fit is attempted, the likelihood maximization algorithm starts from a random initiation point and finds the final parameter estimates by gradient descent. The start vector for the best fit is returned in the form of variables whose name starts with startVector., followed by the name of the estimated parameter. For example:

5. Parameters related to the normalization step

6. Error estimates for fitted parameters

For each estimated parameter listed under point 2, the algorithm provides additional statistical parameters, such as the estimate in the normalized scale, the standard error (also in normalized scale), the t value, and the P value. For example, for the maximum estimate, these are: