A detailed example of how to use the suddengains R package

2019-05-02

Please cite this vignette and the R package suddengains as:

citation("suddengains")
#> 
#>   Wiedemann, M., Thew, G. R., Stott, R., & Ehlers, A. (2019,
#>   February 15). suddengains: An R package to identify sudden gains
#>   in longitudinal data. https://doi.org/10.31234/osf.io/2wa84.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     author = {Milan Wiedemann and Graham R Thew and Richard Stott and Anke Ehlers},
#>     title = {{suddengains}: {An} {R} package to identify sudden gains in longitudinal data},
#>     journal = {PsyArXiv Preprints},
#>     year = {2019},
#>     note = {R package version 0.0.2},
#>     doi = {10.31234/osf.io/2wa84},
#>     url = {https://github.com/milanwiedemann/suddengains},
#>   }

Introduction

This vignette shows how the suddengains R package can support the methods of a research study on sudden gains as described by Tang and DeRubeis (1999). The theoretical background of sudden gains and the rationale for using this package are described in our preprint (Wiedemann et al. 2019). The vignette illustrates the main functions of the package using the example data set sgdata.

Data

Below are two interactive tables of depression and rumination scores from the data set (sgdata) that comes with the suddengains package. The data is automatically loaded together with the package when running library(suddengains). Each measured construct contains a baseline measure (s0), twelve weekly measures during therapy (s1 to s12), and two follow-up measures (fu1 and fu2). Note that some values for each measure are missing, here shown as empty cells. For an example of a missing value see bdi_s2 for id = 2 in the table below.

Depression symptoms

Rumination

Preparation of data

Select cases

The package offers two methods to select cases for sudden gains studies:

  1. "pattern": cases providing enough data to apply the Tang and DeRubeis (1999) criteria will be selected
  2. "min_sess": cases with a minimum number of available data points (specified in min_sess_num) will be selected

By default the argument return_id_lgl is set to FALSE, which adds a new variable named sg_select at the end of the data frame specified in the data argument. The new variable sg_select is logical and indicates whether a case is selected (TRUE) or not selected (FALSE) based on the method specified. When return_id_lgl is set to TRUE, the function returns only the id variable specified in id_var_name and the new variable sg_select.
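
As a rough illustration of the "min_sess" method and the two output shapes, the base R sketch below counts the non-missing session scores per case in a small made-up data frame. The toy data and object names are assumptions for illustration only; on real data the package's own selection function should be used, and it may handle edge cases differently.

```r
# Toy data (made up): three cases, four weekly BDI scores, NA = missing
toy <- data.frame(id = 1:3,
                  bdi_s1 = c(20, NA, 30),
                  bdi_s2 = c(18, NA, NA),
                  bdi_s3 = c(10, 25, NA),
                  bdi_s4 = c(9, 24, 28))
session_vars <- c("bdi_s1", "bdi_s2", "bdi_s3", "bdi_s4")
min_sess_num <- 3

# Count available (non-missing) scores per case
n_available <- rowSums(!is.na(toy[, session_vars]))

# Shape 1 (like return_id_lgl = FALSE): full data plus sg_select at the end
toy_select <- cbind(toy, sg_select = unname(n_available >= min_sess_num))

# Shape 2 (like return_id_lgl = TRUE): only the id variable and sg_select
toy_id_lgl <- toy_select[, c("id", "sg_select")]
toy_id_lgl
```

Only the first case has at least three available scores, so it is the only one flagged as selected.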

The following code shows how to select cases based on the "pattern" method and save them as an object called sgdata_select. This function goes through the data and selects all cases with at least one of the following data patterns.

Data pattern   x1   x2   x3   x4   x5   x6
1.             x    X    x    x    .    .
2.             x    X    x    .    x    .
3.             x    .    X    x    x    .
4.             x    .    X    x    .    x

Note: x1 to x6 are consecutive data points of the primary outcome measure. x = Present data; . = Missing data. X marks the pregain session for each pattern.
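
The selection described above might look like the following sketch. It assumes that the package is installed, that the id variable in sgdata is called "id", and that the weekly depression scores bdi_s1 to bdi_s12 serve as the primary outcome measure. The argument names follow those mentioned in this vignette; check help(select_cases) for the installed version.

```r
# Session variables of the primary outcome measure (weekly BDI scores)
bdi_vars <- paste0("bdi_s", 1:12)

# Only run the package call if suddengains is available
if (requireNamespace("suddengains", quietly = TRUE)) {
  sgdata_select <- suddengains::select_cases(data = suddengains::sgdata,
                                             id_var_name = "id",
                                             sg_var_list = bdi_vars,
                                             method = "pattern",
                                             return_id_lgl = FALSE)

  # Keep only the cases flagged as selected
  sgdata_select <- sgdata_select[which(sgdata_select$sg_select), ]
}
```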

Identification of sudden gains

Define cut-off

This function follows suggestions from Stiles et al. (2003) using the Reliable Change Index (RCI, Jacobson and Truax 1991). The first five elements of the output list (mean_change_score, standard_deviation_pre, reliability, standard_error_measurement, and sdiff) return the values that were used to calculate the cut-off.

The last element of the list, sg_crit1_cutoff, can be used as a cut-off value for the first sudden gains criterion.

# Test define_crit1_cutoff function ----
define_crit1_cutoff(data_sessions = sgdata,
                    data_item = NULL,
                    tx_start_var_name = "bdi_s0",
                    tx_end_var_name = "bdi_s12",
                    reliability = 0.931)
#> The reliability of the measure used to identify sudden gains was specified in the arguement 'reliability = 0.931'.
#> This function calculates a cut-off value that represents a clinically meaningful change based on the Reliable Change Index (RCI; Jacobson & Truax, 1991).
#> The RCI formula was modified so that all statistics can be computed from the data of an individual study following suggestions by Stiles et al. (2003).
#> 
#> See these references for further details:
#> Jacobson, N. S., & Truax, P. A. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59 (1), 12-19. doi:10.1037/0022-006X.59.1.12.
#> Stiles et al. (2003). Early sudden gains in psychotherapy under routine clinic conditions: Practice-based evidence. Journal of Consulting and Clinical Psychology, 71 (1), 14-21. doi:10.1037/0022-006X.71.1.14.
#> Wiedemann, M., Thew, G. R., Stott, R., & Ehlers, A. (2019). suddengains: An R package to identify sudden gains in longitudinal data. https://doi.org/10.31234/osf.io/2wa84.
#> $mean_change_score
#> [1] 19.65854
#> 
#> $standard_deviation_pre
#> [1] 8.598851
#> 
#> $reliability
#> [1] 0.931
#> 
#> $standard_error_measurement
#> [1] 2.258733
#> 
#> $sdiff
#> [1] 3.194331
#> 
#> $sg_crit1_cutoff
#> [1] 12.06222
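
The printed values are consistent with the arithmetic sketched below. The formulas are reconstructed here from the output above rather than taken from the package source, so treat them as an assumption about how the elements relate to each other.

```r
# Values copied from the printed output above
mean_change_score <- 19.65854
standard_deviation_pre <- 8.598851
reliability <- 0.931

# Standard error of measurement from the pre-treatment SD and the reliability
standard_error_measurement <- standard_deviation_pre * sqrt(1 - reliability)

# Standard error of the difference between two scores
sdiff <- sqrt(2 * standard_error_measurement^2)

# Cut-off: mean change relative to sdiff, scaled by 1.96
sg_crit1_cutoff <- (mean_change_score / sdiff) * 1.96

round(c(standard_error_measurement = standard_error_measurement,
        sdiff = sdiff,
        sg_crit1_cutoff = sg_crit1_cutoff), 3)
```

The three computed values agree with the corresponding elements of the printed list up to rounding.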

Identify sudden gains and losses

To identify sudden gains/losses you can use the identify_sg and identify_sl functions. The functions return a data frame with new variables indicating for each between-session interval whether a sudden gain/loss was identified. For example, the variable sg_2to3 indicates whether a sudden gain occurred from session two to three, with two being the pregain and three being the postgain session.

Sudden gains

The argument crit123_details = TRUE returns additional information about whether each of the three sudden gains criteria is met. More information about this can be found in the section “Adaptations to the original sudden gains criteria” below.

To analyse sudden gains after the first session, we include the option to specify a baseline measure in sg_var_list (in this example "bdi_s0") and set identify_sg_1to2 = TRUE. This allows the identification of sudden gains immediately after session 1, provided data from the baseline measure and the first session are available.
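
A call with a baseline measure might look like the sketch below. It assumes the package is installed and reuses the cut-off of 7 from the examples later in this vignette; the object name sg_from_s1 is made up for illustration.

```r
# Baseline (bdi_s0) followed by the twelve weekly session scores
bdi_vars_s0 <- paste0("bdi_s", 0:12)

# Only run the package call if suddengains is available
if (requireNamespace("suddengains", quietly = TRUE)) {
  sg_from_s1 <- suddengains::identify_sg(data = suddengains::sgdata,
                                         sg_crit1_cutoff = 7,
                                         id_var_name = "id",
                                         sg_var_list = bdi_vars_s0,
                                         identify_sg_1to2 = TRUE,
                                         crit123_details = FALSE)
}
```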

Sudden losses

To identify sudden losses, you can use the identify_sl function. All arguments are the same as in the identify_sg function, but sg_crit1_cutoff must be set to a negative value.
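
The base R sketch below illustrates why the cut-off is negative for losses: when change is expressed as the pregain score minus the postgain score, an increase in symptoms gives a negative value. The toy scores and the sl_ variable names (mirroring sg_2to3) are assumptions for illustration, and only the first criterion is shown.

```r
# Toy scores (made up) for one case across four sessions
scores <- c(10, 11, 22, 23)

pre  <- scores[-length(scores)]   # sessions 1 to 3
post <- scores[-1]                # sessions 2 to 4
change <- pre - post              # negative values mean symptoms got worse

gain_cutoff <- 7
loss_cutoff <- -7

is_gain <- change >= gain_cutoff  # large decrease in symptoms
is_loss <- change <= loss_cutoff  # large increase in symptoms
names(is_loss) <- paste0("sl_", 1:3, "to", 2:4)
is_loss
```

In this toy example only the interval from session 2 to 3 (an increase of 11 points) passes the negative cut-off.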

Adaptations to the original sudden gains criteria

The package allows you to change or switch off any of the three original sudden gains criteria suggested by Tang and DeRubeis (1999):

  1. The cut-off value for a clinically meaningful change on the measure used to identify sudden gains can be specified using the argument sg_crit1_cutoff. To not apply the first criterion when identifying sudden gains, it can be switched off by using sg_crit1_cutoff = NULL.
  2. The minimum percentage drop from the pre- to post-gain session can be specified using the argument sg_crit2_pct. The default is a minimum of a 25% drop, i.e. sg_crit2_pct = .25. To not apply the second criterion when identifying sudden gains, it can be switched off by using sg_crit2_pct = NULL.
  3. The third criterion can only be turned on (sg_crit3 = TRUE) or off (sg_crit3 = FALSE). At the moment there is no option to change the way the third criterion gets applied.

# This example only uses the first and second sudden gains criteria
# All following examples work the same for the "identify_sl()" function
# The argument "crit123_details = TRUE" returns details about each between session interval for each criterion.
# Details about the third criterion will show NAs for each between session interval because it's not being used (sg_crit3 = FALSE)
identify_sg(data = sgdata,
            sg_crit1_cutoff = 7,
            sg_crit2_pct = .25,
            sg_crit3 = FALSE,
            id_var_name = "id",
            sg_var_list = c("bdi_s1", "bdi_s2", "bdi_s3", "bdi_s4", 
                            "bdi_s5", "bdi_s6", "bdi_s7", "bdi_s8", 
                            "bdi_s9", "bdi_s10", "bdi_s11", "bdi_s12"),
            identify_sg_1to2 = FALSE,
            crit123_details = TRUE)

# This example only uses the first criterion and a modified second criterion (50%) 
identify_sg(data = sgdata,
            sg_crit1_cutoff = 7,
            sg_crit2_pct = .50,
            sg_crit3 = FALSE,
            id_var_name = "id",
            sg_var_list = c("bdi_s1", "bdi_s2", "bdi_s3", "bdi_s4", 
                            "bdi_s5", "bdi_s6", "bdi_s7", "bdi_s8", 
                            "bdi_s9", "bdi_s10", "bdi_s11", "bdi_s12"),
            identify_sg_1to2 = FALSE,
            crit123_details = TRUE)

# This example only uses the first criterion
# Details about the second and third criterion will show NAs for each between session interval
identify_sg(data = sgdata,
            sg_crit1_cutoff = 7,
            sg_crit2_pct = NULL,
            sg_crit3 = FALSE,
            id_var_name = "id",
            sg_var_list = c("bdi_s1", "bdi_s2", "bdi_s3", "bdi_s4", 
                            "bdi_s5", "bdi_s6", "bdi_s7", "bdi_s8", 
                            "bdi_s9", "bdi_s10", "bdi_s11", "bdi_s12"),
            identify_sg_1to2 = FALSE,
            crit123_details = TRUE)
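
To make the first two criteria concrete, the base R sketch below applies them to one made-up case. The helper check_crit12() is hypothetical and not part of the package; it treats a criterion set to NULL as met for every interval, which is a simplification of how the package reports criteria that are switched off.

```r
# Toy BDI scores (made up) for one case across five sessions
scores <- c(30, 28, 15, 14, 13)
pre  <- head(scores, -1)
post <- tail(scores, -1)

# Hypothetical helper: criterion 1 (absolute drop) and criterion 2 (relative drop)
check_crit12 <- function(pre, post, sg_crit1_cutoff = 7, sg_crit2_pct = .25) {
  drop <- pre - post
  crit1 <- if (is.null(sg_crit1_cutoff)) rep(TRUE, length(drop)) else drop >= sg_crit1_cutoff
  crit2 <- if (is.null(sg_crit2_pct)) rep(TRUE, length(drop)) else drop >= sg_crit2_pct * pre
  crit1 & crit2
}

default_25 <- check_crit12(pre, post)                       # cut-off 7, 25% drop
stricter_50 <- check_crit12(pre, post, sg_crit2_pct = .50)  # cut-off 7, 50% drop
crit1_only <- check_crit12(pre, post, sg_crit2_pct = NULL)  # cut-off 7 only
```

With the default 25% threshold the drop of 13 points from session 2 to 3 meets both criteria, whereas a 50% threshold would require a drop of at least 14 points from a pregain score of 28, so no interval qualifies.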

Create datasets for further analysis

All gains: One row per gain

In the suddengains R package we refer to this as “bysg” (by sudden gain).

A “bysg” data set identifying sudden gains is created with the argument identify = "sg" and saved here to an object called “bysg”. The output includes 15 new variables.

A “bysg” data set identifying sudden losses is created with the argument identify = "sl" and saved here to an object called “bysl”.
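
Both data sets might be created as in the sketch below. It assumes the package is installed and uses the cut-off of 7 from the earlier examples (negated for losses); the argument names follow those used elsewhere in this vignette and should be checked against help(create_bysg).

```r
# Session variables of the primary outcome measure (weekly BDI scores)
bdi_vars <- paste0("bdi_s", 1:12)

# Only run the package calls if suddengains is available
if (requireNamespace("suddengains", quietly = TRUE)) {
  # One row per sudden gain
  bysg <- suddengains::create_bysg(data = suddengains::sgdata,
                                   sg_crit1_cutoff = 7,
                                   id_var_name = "id",
                                   tx_start_var_name = "bdi_s1",
                                   tx_end_var_name = "bdi_s12",
                                   sg_var_list = bdi_vars,
                                   sg_measure_name = "bdi",
                                   identify = "sg")

  # One row per sudden loss: same arguments, negative cut-off
  bysl <- suddengains::create_bysg(data = suddengains::sgdata,
                                   sg_crit1_cutoff = -7,
                                   id_var_name = "id",
                                   tx_start_var_name = "bdi_s1",
                                   tx_end_var_name = "bdi_s12",
                                   sg_var_list = bdi_vars,
                                   sg_measure_name = "bdi",
                                   identify = "sl")
}
```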

All cases: One row per case

In the suddengains R package we refer to this as byperson (by person). This data set includes all cases with and all cases without sudden gains. If multiple sudden gains were experienced by a case, the argument multiple_sg_select can be used to specify which gain to select; in the example below the first gain will be selected.

Depending on the research questions it might be of interest to select the largest gain, as shown below. Notice how the selected gain for ID 5 differs depending on how multiple gains are handled. The first gain experienced by ID 5 is from session 3 to 4, whereas the largest gain was experienced from session 8 to 9.
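
The difference between selecting the first and the largest gain can be sketched in base R on a made-up “bysg”-style data frame. The helper pick_gain(), the column names, and the magnitudes are assumptions for illustration; only the sessions for ID 5 follow the example described above.

```r
# Toy data (made up): one row per gain, ID 5 has two gains
toy_bysg <- data.frame(id = c(2, 5, 5),
                       sg_session_n = c(6, 3, 8),   # pregain session
                       sg_magnitude = c(10, 9, 15)) # size of the gain

# Hypothetical helper: keep one gain per case
pick_gain <- function(d, how = c("first", "largest")) {
  how <- match.arg(how)
  # Sort so that the gain to keep comes first within each id
  key <- if (how == "first") d$sg_session_n else -d$sg_magnitude
  d <- d[order(d$id, key), ]
  d[!duplicated(d$id), ]
}

first_gain <- pick_gain(toy_bysg, "first")
largest_gain <- pick_gain(toy_bysg, "largest")
```

For ID 5 the first option keeps the gain with pregain session 3, while the largest option keeps the gain with pregain session 8.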

Extract values around sudden gains

The package can extract scores on secondary outcome or process measures around the period of each gain. This function can be applied to either the bysg or byperson dataset. The variables specified in extract_var_list have to be in the data set specified in data.

# The pipe operator %>% used below is provided by dplyr
library(dplyr)

# For bysg dataset select "id" and "rq" variables first
sgdata_rq <- sgdata %>% 
    dplyr::select(id, rq_s0:rq_s12)

# Join them with the sudden gains data set, here "bysg"
bysg_rq <- bysg %>%
    dplyr::left_join(sgdata_rq, by = "id")

# Extract "rq" scores around sudden gains on "bdi" in the bysg dataset
bysg_rq <- extract_values(data = bysg_rq,
                          id_var_name = "id_sg",
                          extract_var_list = c("rq_s1", "rq_s2", "rq_s3", "rq_s4", 
                                               "rq_s5", "rq_s6", "rq_s7", "rq_s8", 
                                               "rq_s9", "rq_s10", "rq_s11", "rq_s12"),
                          extract_measure_name = "rq",
                          add_to_data = TRUE)
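
What "extracting values around the gain" means can be sketched in base R for a single case. The helper extract_around_gain() is hypothetical and the scores are made up; the output names follow the sg_rq_2n to sg_rq_n3 variables used in the plotting code of this vignette, and the sketch assumes that position i in the score vector corresponds to session i.

```r
# Toy rumination scores (made up) for sessions 1 to 12 of one case
rq_scores <- c(40, 38, 37, 30, 29, 28, 27, 27, 26, 25, 25, 24)

# Hypothetical helper: scores from two sessions before to three sessions
# after the pregain session n; sessions outside the data are returned as NA
extract_around_gain <- function(scores, pregain_session, measure = "rq") {
  offsets <- -2:3
  sessions <- pregain_session + offsets
  values <- rep(NA_real_, length(offsets))
  in_range <- sessions >= 1 & sessions <= length(scores)
  values[in_range] <- scores[sessions[in_range]]
  names(values) <- paste0("sg_", measure, "_", c("2n", "1n", "n", "n1", "n2", "n3"))
  values
}

around_s3 <- extract_around_gain(rq_scores, pregain_session = 3)
around_s1 <- extract_around_gain(rq_scores, pregain_session = 1)
```

For a gain with pregain session 1 there are no earlier sessions in the vector, so the first two extracted values are missing.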

Plots of average change around sudden gains

The plots are created using the ggplot2 R package (Wickham 2016) in five main steps:

  1. Means for all time points and points
  2. 95% confidence intervals for all time points
  3. Dotted line between the first two values
  4. Straight line between all 5 values around the gain
  5. Dotted line between the last two values

# Create plot of average change in depression symptoms around the gain
plot_sg_bdi <- plot_sg(data = bysg,
                       tx_start_var_name = "bdi_s1",
                       tx_end_var_name = "bdi_s12",
                       sg_pre_post_var_list = c("sg_bdi_2n", "sg_bdi_1n", "sg_bdi_n",
                                                "sg_bdi_n1", "sg_bdi_n2", "sg_bdi_n3"),
                       ylab = "BDI", xlab = "Session",
                       colour = "#239b89ff")

# Create plot of average change in rumination around the gain
plot_sg_rq <- plot_sg(data = bysg_rq,
                      tx_start_var_name = "rq_s1",
                      tx_end_var_name = "rq_s12",
                      sg_pre_post_var_list = c("sg_rq_2n", "sg_rq_1n", "sg_rq_n",
                                               "sg_rq_n1", "sg_rq_n2", "sg_rq_n3"),
                      ylab = "RQ", xlab = "Session",
                      colour = "#440154FF")

# It is possible to apply other ggplot2 functions to the plot now,
# e.g. y axis scale, or x axis labels ...

plot_sg_bdi <- plot_sg_bdi + 
               ggplot2::coord_cartesian(ylim = c(0, 50))

plot_sg_rq <- plot_sg_rq + 
              ggplot2::scale_x_discrete(labels = c("First", "n-2", "n-1", "n",
                                                   "n+1", "n+2", "n+3", "Last"))
#> Scale for 'x' is already present. Adding another scale for 'x', which
#> will replace the existing scale.

Each plot will automatically return a warning message about how many missing values were present for each of the five components mentioned above. The warning messages from the BDI plot can be interpreted as follows:

  1. Means for all time points and points: There are 12 missing values overall
  2. 95% confidence intervals for all time points: There are 12 missing values overall
  3. Dotted line between the first two values: There are 8 missing values in total at tx_start_var_name and the first variable specified in sg_pre_post_var_list
  4. Straight line between all 5 values around the gain: There are 11 missing values in total across all variables specified in sg_pre_post_var_list
  5. Dotted line between the last two values: There is 1 missing value at tx_end_var_name and the last variable specified in sg_pre_post_var_list

plot_sg_bdi
#> Warning: Removed 12 rows containing non-finite values (stat_summary).
#> Warning: Removed 12 rows containing non-finite values (stat_summary).
#> Warning: Removed 8 rows containing non-finite values (stat_summary).
#> Warning: Removed 11 rows containing non-finite values (stat_summary).
#> Warning: Removed 1 rows containing non-finite values (stat_summary).
plot_sg_rq 
#> Warning: Removed 16 rows containing non-finite values (stat_summary).
#> Warning: Removed 16 rows containing non-finite values (stat_summary).
#> Warning: Removed 8 rows containing non-finite values (stat_summary).
#> Warning: Removed 13 rows containing non-finite values (stat_summary).
#> Warning: Removed 4 rows containing non-finite values (stat_summary).

Summarise descriptive statistics

Count between-session intervals

The count_intervals function provides a summary of between-session intervals that were and weren’t analysed for sudden gains. For more info see the help file of this function, help(count_intervals). Here we see code to count only the intervals of the data that was selected for the sudden gains study in the above code using sgdata_select.
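
A call on the selected cases might look like the sketch below. It assumes the package is installed and repeats the "pattern" selection so that the example is self-contained; the object names selected and intervals are made up, and the argument names should be checked against help(count_intervals).

```r
# Session variables of the primary outcome measure (weekly BDI scores)
bdi_vars <- paste0("bdi_s", 1:12)

# Only run the package calls if suddengains is available
if (requireNamespace("suddengains", quietly = TRUE)) {
  # Select cases with enough data to apply the sudden gains criteria
  selected <- suddengains::select_cases(data = suddengains::sgdata,
                                        id_var_name = "id",
                                        sg_var_list = bdi_vars,
                                        method = "pattern",
                                        return_id_lgl = FALSE)
  selected <- selected[which(selected$sg_select), ]

  # Count between-session intervals for the selected cases only
  intervals <- suddengains::count_intervals(data = selected,
                                            id_var_name = "id",
                                            sg_var_list = bdi_vars)
  print(intervals)
}
```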

Descriptive statistics of sudden gains

The describe_sg() function provides descriptive statistics about the sudden gains based on the variables from the bysg or byperson datasets. The descriptives (e.g. “sg_pct”, the percentage of cases with sudden gains in the specified data set) are always in relation to the input data and therefore will vary depending on whether the structure of the data set is bysg or byperson.

References

Jacobson, Neil S, and Paula A Truax. 1991. “Clinical Significance: A Statistical Approach to Defining Meaningful Change in Psychotherapy Research.” Journal of Consulting and Clinical Psychology 59 (1): 12–19. https://doi.org/10.1037/0022-006X.59.1.12.

Stiles, William B., Chris Leach, Michael Barkham, Mike Lucock, Steve Iveson, David A. Shapiro, Michaela Iveson, and Gillian E. Hardy. 2003. “Early Sudden Gains in Psychotherapy Under Routine Clinic Conditions: Practice-Based Evidence.” Journal of Consulting and Clinical Psychology 71 (1): 14–21. https://doi.org/10.1037/0022-006X.71.1.14.

Tang, Tony Z, and Robert J DeRubeis. 1999. “Sudden Gains and Critical Sessions in Cognitive-Behavioral Therapy for Depression.” Journal of Consulting and Clinical Psychology 67 (6): 894–904. https://doi.org/10.1037/0022-006X.67.6.894.

Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.

Wiedemann, Milan, Graham R Thew, Richard Stott, and Anke Ehlers. 2019. “suddengains: An R Package to Identify Sudden Gains in Longitudinal Data.” PsyArXiv Preprints. https://doi.org/10.31234/osf.io/2wa84.