Data Analyses with Ceiling/Floor Data

Qimin Liu

2018-01-31

Summary

Ceiling and floor effects are common in data. They occur when a test or scale is relatively easy or difficult, such that a substantial proportion of individuals obtain the maximum or minimum score and the true extent of their abilities cannot be determined.

Ceiling and floor effects consequently cause problems in data analysis. For example, a ceiling effect alone attenuates mean estimates and a floor effect alone inflates them, while both ceiling and floor effects attenuate variance estimates. This poses challenges for data analytic methods based on means and variances.
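The direction of these biases can be checked with a small base R simulation. This is a minimal sketch that does not use the package: the sample size, the mean of 50, the standard deviation of 10, and the 20% cut-offs are all assumptions chosen for illustration.

```r
# Illustration only: simulate normal "true" scores, then censor them by hand.
set.seed(123)
true.scores <- rnorm(10000, mean = 50, sd = 10)

# Ceiling: scores above the 80th percentile are recorded as that percentile
ceiling.scores <- pmin(true.scores, quantile(true.scores, 0.80))
# Floor: scores below the 20th percentile are recorded as that percentile
floor.scores <- pmax(true.scores, quantile(true.scores, 0.20))
# Both effects at once
both.scores <- pmax(ceiling.scores, quantile(true.scores, 0.20))

# Compare the observed mean and variance under each condition
bias.demo <- data.frame(
  data = c("true", "ceiling", "floor", "both"),
  mean = c(mean(true.scores), mean(ceiling.scores),
           mean(floor.scores), mean(both.scores)),
  variance = c(var(true.scores), var(ceiling.scores),
               var(floor.scores), var(both.scores))
)
print(bias.demo)
```

The ceiling row has a lower mean than the true scores, the floor row a higher mean, and every censored row has a smaller variance.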

This package implements methods to deal with the challenges associated with ceiling/floor effects in data, using parametric methods that assume normality for the true scores. The current version supports mean and variance recovery for data with ceiling/floor effects, as well as mean comparison tests such as the t test and ANOVA for such data.

Helper functions

The package contains a helper function, threeganova.sim, that generates three-group ANOVA data with a standard normal control group and positive and negative treatment groups whose effects have the same magnitude. In addition, one can specify the standard deviation of the positive treatment group. For details, enter ?threeganova.sim in the R console.

Another helper function included in the package is induce.cfe, with which the user can manually induce ceiling and floor effects in healthy data. For details, enter ?induce.cfe in the R console.

Moreover, the function F.star.test allows the user to conduct a Brown-Forsythe F star test, a variant of the commonly used F test that is robust against violations of the homogeneity of variance (HOV) assumption.
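For readers unfamiliar with the statistic, the following base R sketch computes the textbook Brown-Forsythe F star and its approximate denominator degrees of freedom. The function name bf.f.star.sketch is hypothetical, and the code is not the package's F.star.test, whose arguments and return value may differ.

```r
# Hypothetical helper for illustration; not the package's F.star.test.
bf.f.star.sketch <- function(y, g) {
  g <- factor(g)
  n <- tapply(y, g, length)          # group sizes
  m <- tapply(y, g, mean)            # group means
  v <- tapply(y, g, var)             # group variances
  N <- sum(n)
  k <- length(n)
  grand <- sum(n * m) / N            # grand mean
  w <- (1 - n / N) * v               # weighted variances in the denominator
  f.star <- sum(n * (m - grand)^2) / sum(w)
  cj <- w / sum(w)
  df2 <- 1 / sum(cj^2 / (n - 1))     # Satterthwaite-type approximation
  p <- pf(f.star, k - 1, df2, lower.tail = FALSE)
  list(statistic = f.star, df1 = k - 1, df2 = df2, p.value = p)
}

# Three equally sized groups with unequal variances (simulated for illustration)
set.seed(1)
y <- c(rnorm(30, 0, 1), rnorm(30, 0.5, 2), rnorm(30, 1, 3))
g <- rep(1:3, each = 30)
res <- bf.f.star.sketch(y, g)
res
```

With equal group sizes the F star statistic coincides with the ordinary F statistic; the robustness comes from the adjusted denominator degrees of freedom and, with unequal group sizes, from the reweighted denominator.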

Functions for data analyses

The current version of the package includes three functions that help the user analyze data with ceiling/floor effects.

rec.mean.var estimates the true mean and variance of data with ceiling/floor effects. As mentioned in the summary, the observed mean and variance of such data are often biased; rec.mean.var aims to recover the mean and variance the data would have had were ceiling/floor effects absent.

lw.t.test conducts a t test that adjusts for ceiling/floor effects in the data. Because lw.t.test builds on Welch’s t test, the adjusted t test is robust against HOV violation.

lw.f.star conducts an F star test for one-way ANOVA that adjusts for ceiling/floor effects in the data. lw.f.star is also robust against HOV violation.

For both lw.f.star and lw.t.test, method a is a liberal approach that yields accurate effect size estimates but has mildly inflated type I error rates, whereas method b is a conservative approach with well-controlled type I error rates and effect size estimates that are good but less accurate than those of method a.
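Since lw.t.test is described as building on Welch’s t test, it may help to see that test written in terms of group means, variances, and sizes, which is the form needed when recovered rather than observed moments are plugged in. The sketch below covers only the standard Welch computation. The function name welch.from.summary is hypothetical, and how lw.t.test forms its inputs under methods a and b is not reproduced here.

```r
# Hypothetical helper: Welch's t test computed from summary statistics.
welch.from.summary <- function(m1, v1, n1, m2, v2, n2, conf.level = 0.95) {
  se2 <- v1 / n1 + v2 / n2                       # squared standard error
  t.stat <- (m1 - m2) / sqrt(se2)
  # Welch-Satterthwaite degrees of freedom
  df <- se2^2 / ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))
  p <- 2 * pt(abs(t.stat), df, lower.tail = FALSE)
  half <- qt(1 - (1 - conf.level) / 2, df) * sqrt(se2)
  list(statistic = t.stat, df = df, p.value = p,
       conf.int = (m1 - m2) + c(-half, half))
}

# Check against stats::t.test on simulated healthy data
set.seed(42)
g1 <- rnorm(200, 2, 4)
g2 <- rnorm(250, 3, 5)
sketch <- welch.from.summary(mean(g1), var(g1), length(g1),
                             mean(g2), var(g2), length(g2))
reference <- t.test(g1, g2)
```

On healthy data the sketch reproduces the statistic, degrees of freedom, p-value, and confidence interval of t.test, because no pooled variance is involved.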

Example 1: an Aging Example

Imagine a scenario where we wish to test the difference in cognitive ability among people of different age groups. In this toy example, we have 1000 participants in each of three age groups: the younger-aged group has a true mean and variance of 30 and 25, respectively, the middle-aged group 20 and 25, and the older-aged group 10 and 100. The higher the score, the higher the cognitive ability. We can check the sample mean and variance of each group on the data composed of true scores, ca.true.

# group sample mean
aggregate(ca.true[,1],mean,by=list(ca.true[,2]))
##   Group.1        x
## 1       1 29.86404
## 2       2 20.13646
## 3       3 10.40217
# group sample variance
aggregate(ca.true[,1],var,by=list(ca.true[,2]))
##   Group.1         x
## 1       1  23.98873
## 2       2  25.69791
## 3       3 102.23938
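How ca.true was generated is not shown here. A dataset with the same layout (scores in the first column, group labels in the second) could be simulated as sketched below, assuming normally distributed true scores and 1000 participants per group. The name sim.true and the seed are illustrative and not part of the package, so the resulting numbers will differ from those above.

```r
# Illustrative simulation; sim.true is a hypothetical stand-in for ca.true.
set.seed(2018)
n.per.group <- 1000  # assumed group size
sim.true <- cbind(
  score = c(rnorm(n.per.group, mean = 30, sd = sqrt(25)),    # younger-aged
            rnorm(n.per.group, mean = 20, sd = sqrt(25)),    # middle-aged
            rnorm(n.per.group, mean = 10, sd = sqrt(100))),  # older-aged
  group = rep(1:3, each = n.per.group)
)

# group sample mean and variance, as in the checks above
sim.means <- aggregate(sim.true[, "score"],
                       by = list(group = sim.true[, "group"]), FUN = mean)
sim.vars <- aggregate(sim.true[, "score"],
                      by = list(group = sim.true[, "group"]), FUN = var)
sim.means
sim.vars
```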

Now consider that a substantial proportion of the younger-aged group may obtain the maximum score on the cognitive ability test and a substantial proportion of the older-aged group may obtain the minimum. Letting both the ceiling and the floor proportions be 15%, we obtain the dataset ca.cf.

# group sample mean
aggregate(ca.cf[,1],mean,by=list(ca.cf[,2]))
##   Group.1        x
## 1       1 28.91833
## 2       2 20.13646
## 3       3 12.36256
# group sample variance
aggregate(ca.cf[,1],var,by=list(ca.cf[,2]))
##   Group.1        x
## 1       1 13.53518
## 2       2 25.69791
## 3       3 57.82140

We can see that both the mean and the variance estimates for the younger-aged and the older-aged groups are biased: the younger-aged group's mean drops from 29.86 to 28.92, the older-aged group's mean rises from 10.40 to 12.36, and both variances shrink (from 23.99 to 13.54 and from 102.24 to 57.82). The function rec.mean.var can help recover the mean and variance. For the younger-aged group, we first select all of its scores, store them in a new variable young, and then use rec.mean.var to recover the mean and variance. We can do the same for the older-aged group.

# younger-aged group
young=ca.cf[ca.cf[,2]==1,1]
rec.mean.var(young) # true mean and variance are 30 and 25
## $ceiling.percentage
## [1] 0.308
## 
## $floor.percentage
## [1] 0.001
## 
## $est.mean
## [1] 29.8394
## 
## $est.var
## [1] 23.90962
# the estimated floor and ceiling percentages and the recovered mean and variance estimates are displayed above

# older-aged group
old=ca.cf[ca.cf[,2]==3,1]
rec.mean.var(old) # true mean and variance are 10 and 100
## $ceiling.percentage
## [1] 0.001
## 
## $floor.percentage
## [1] 0.321
## 
## $est.mean
## [1] 10.46971
## 
## $est.var
## [1] 102.2337
# the estimated floor and ceiling percentages and the recovered mean and variance estimates are displayed above
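To make the idea of a normality-based recovery concrete, the sketch below inverts the standard moment formulas of a truncated normal distribution: scores strictly between the floor and the ceiling are treated as a truncated normal sample, with truncation points implied by the observed floor and ceiling proportions. This is an assumption-laden illustration, not the package's rec.mean.var, and its estimates need not match those of the package. The function name recover.moments.sketch and the simulated data are hypothetical.

```r
# Hypothetical helper for illustration; not the package's rec.mean.var.
recover.moments.sketch <- function(y) {
  # Proportions of scores sitting at the observed minimum and maximum
  p.floor <- mean(y == min(y))
  p.ceiling <- mean(y == max(y))
  # Standardized truncation points implied by those proportions
  a <- qnorm(p.floor)
  b <- qnorm(1 - p.ceiling)
  # Moments of the scores strictly between the floor and the ceiling
  inner <- y[y > min(y) & y < max(y)]
  m.t <- mean(inner)
  v.t <- var(inner)
  # Truncated-normal identities, solved for the untruncated mean and variance
  z <- pnorm(b) - pnorm(a)
  d <- (dnorm(a) - dnorm(b)) / z
  shrink <- 1 + (a * dnorm(a) - b * dnorm(b)) / z - d^2
  est.var <- v.t / shrink
  est.mean <- m.t - sqrt(est.var) * d
  list(floor.percentage = p.floor, ceiling.percentage = p.ceiling,
       est.mean = est.mean, est.var = est.var)
}

# Simulated scores with true mean 30 and variance 25, then a 30% ceiling
set.seed(7)
true.scores <- rnorm(5000, mean = 30, sd = 5)
observed <- pmin(true.scores, quantile(true.scores, 0.70))
c(mean(observed), var(observed))   # biased observed moments
est <- recover.moments.sketch(observed)
est
```

Because the sketch counts the single smallest score as the floor, it reports a small nonzero floor percentage even when no floor effect is present, much like the 0.001 values in the output above.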

Now we wish to conduct an ANOVA on the data with floor and ceiling effects, for which we can use the function lw.f.star. We can also conduct a t test between the younger-aged and the older-aged groups using the function lw.t.test. Both methods a and b are used for illustration.

# ANOVA
lw.f.star(data.frame(ca.cf),score~group,"a")
## $statistic
## [1] 1852.555
## 
## $p.value
## [1] 0
## 
## $est.f.squared
## [1] 1.235037
lw.f.star(data.frame(ca.cf),score~group,"b")
## $statistic
## [1] 1225.603
## 
## $p.value
## [1] 0
## 
## $est.f.squared
## [1] 1.035575
# t-test
lw.t.test(young,old,"a")
## $statistic
## [1] 52.36293
## 
## $p.value
## [1] 2.158547e-255
## 
## $est.d
## [1] 2.311363
## 
## $conf.int
## [1] 18.64354 20.09586
lw.t.test(young,old,"b")
## $statistic
## [1] 54.53693
## 
## $p.value
## [1] 7.002042e-258
## 
## $est.d
## [1] 2.438966
## 
## $conf.int
## [1] 18.67241 20.06698

Under both methods a and b, the ANOVA and the t test return statistically significant results.

Example 2: Simulation and Testing

The following example provides an overview of the helper functions in the package that can aid in simulations and further demonstrates data analytic functions in the package.

# Simulate healthy data for two groups
x.1=rnorm(300,2,4)
x.2=rnorm(300,3,5)
# check mean and variance for simulated healthy data
mean(x.1);var(x.1)
## [1] 2.05444
## [1] 16.60983
mean(x.2);var(x.2)
## [1] 3.02112
## [1] 24.23865
# induce floor effects of 20% in group 1
x.1.cf=induce.cfe(.2,0,x.1)
# induce ceiling effects of 10% in group 2
x.2.cf=induce.cfe(0,.1,x.2)
# recover the mean and variance for ceiling/floor data
rec.mean.var(x.1.cf)
## $ceiling.percentage
## [1] 0.003333333
## 
## $floor.percentage
## [1] 0.2233333
## 
## $est.mean
## [1] 2.026828
## 
## $est.var
## [1] 17.27209
rec.mean.var(x.2.cf)
## $ceiling.percentage
## [1] 0.09666667
## 
## $floor.percentage
## [1] 0.003333333
## 
## $est.mean
## [1] 2.999412
## 
## $est.var
## [1] 23.81148
# conduct a t test on healthy data
t.test(x.1,x.2)
## 
##  Welch Two Sample t-test
## 
## data:  x.1 and x.2
## t = -2.6197, df = 577.85, p-value = 0.009031
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.6914272 -0.2419345
## sample estimates:
## mean of x mean of y 
##   2.05444   3.02112
# conduct an unadjusted t test on ceiling/floor data for comparison
t.test(x.1.cf,x.2.cf)
## 
##  Welch Two Sample t-test
## 
## data:  x.1.cf and x.2.cf
## t = -0.82159, df = 559.89, p-value = 0.4117
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.9084296  0.3725991
## sample estimates:
## mean of x mean of y 
##  2.505639  2.773554
# conduct an adjusted t test on ceiling/floor data
lw.t.test(x.1.cf,x.2.cf,"a")
## $statistic
## [1] -2.414067
## 
## $p.value
## [1] 0.01649626
## 
## $est.d
## [1] -0.2132773
## 
## $conf.int
## [1] -1.766068 -0.179101
lw.t.test(x.1.cf,x.2.cf,"b")
## $statistic
## [1] -2.628172
## 
## $p.value
## [1] 0.009040704
## 
## $est.d
## [1] -0.2145894
## 
## $conf.int
## [1] -1.7009239 -0.2442446
# the unadjusted t test on ceiling/floor data (p = 0.41) misses the difference found in the healthy data (p = 0.009), whereas both adjusted tests detect it
# generate a dataframe for ANOVA demo
testdat=threeganova.sim(10000,.0625,1)
# induce a 20% floor effect in group 2
testdat.cf=testdat
testdat.cf[testdat.cf$group==2,]$y=induce.cfe(.2,0,testdat.cf[testdat.cf$group==2,]$y)
# conduct an adjusted F star test on ceiling/floor data
lw.f.star(testdat.cf,y~group,"a")
## $statistic
## [1] 916.6139
## 
## $p.value
## [1] 0
## 
## $est.f.squared
## [1] 0.06110759
lw.f.star(testdat.cf,y~group,"b")
## $statistic
## [1] 824.1566
## 
## $p.value
## [1] 0
## 
## $est.f.squared
## [1] 0.05893779