Data cleaning function in conquestr

Dan Cloney, Dave Jeffries

2024-07-24

Introduction

This vignette demonstrates how to use conquestr to document data, report on the quality of data, clean data, and construct item bundles or derived variables based on several variables.

Document data

conquestr has a built-in system file that we will use for this example.

The function getCqItanal will return a list of lists, each list relating to one generalised item from an ‘ACER ConQuest’ itanal output. The list for each item contains the following information: (1) the item name according to the item label, (2) a table of item category statistics for the item, and (3) the item-total and item-rest correlations for the item.

Note that you must use matrixout in your ‘ACER ConQuest’ call to itanal to ensure that these objects are available in the system file from your analysis.


# get default sys file
myEx1Sys <- ConQuestSys()
#> no system file provided, loading the example system file instead

# get itanal lists
myEx1Sys_itanal <- getCqItanal(myEx1Sys)

# show unformatted list objects for first item
print(myEx1Sys_itanal[[1]])
#> $name
#> [1] "item:1 (item one)"
#> 
#> $table
#>   Category Score Count    Percent      Pt Bis Pt Bis t stat.  Pt Bis sig.
#> 1        M     0     6  0.6006006 -0.10716121      -3.403245 6.923947e-04
#> 2        a     1   644 64.4644645  0.45520912      16.142876 2.961496e-52
#> 3        b     0    23  2.3023023 -0.08463114      -2.681876 7.442067e-03
#> 4        c     0    47  4.7047047 -0.19873699      -6.402901 2.344019e-10
#> 5        d     0   104 10.4104104 -0.23879800      -7.764760 2.023005e-14
#> 6        e     0   175 17.5175175 -0.21543829      -6.966112 5.910417e-12
#>   Ability mean (D1) Ability SD (D1)
#> 1        -0.9823725       1.0094503
#> 2         0.3334871       0.8216688
#> 3        -0.5375707       0.9964098
#> 4        -0.7459999       0.7851521
#> 5        -0.6836101       0.6972879
#> 6        -0.4694728       0.7085121
#> 
#> $item_rest_total
#> item-total  item-rest   obs_mean   exp_mean   adj_mean  delta_dot 
#>  0.6059588  0.4552091  0.6446446  0.6468243  0.6467947 -0.7040703

Following the item-specific list objects, the last element of the list returned by getCqItanal contains summary statistics for the full set of items. The summary statistics include raw and latent score distribution statistics and Cronbach’s coefficient \(\alpha\).
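The list layout described above (item elements first, summary last) can be handled with base R alone. The following is a minimal sketch, not conquestr code: it uses a small stand-in list whose item 1 values are copied from the printed output above, and a hypothetical placeholder for the summary element, whose unformatted structure is not shown in this vignette. Treating 100 × obs_mean as the facility of a dichotomous item is also an assumption.

```r
# Stand-in for the object returned by getCqItanal(): one item plus a summary.
# Item 1 values are copied from the printed output; the summary element is a
# hypothetical placeholder.
itanal <- list(
  list(
    name = "item:1 (item one)",
    item_rest_total = c(
      "item-total" = 0.6059588, "item-rest" = 0.4552091,
      obs_mean = 0.6446446, exp_mean = 0.6468243,
      adj_mean = 0.6467947, delta_dot = -0.7040703
    )
  ),
  list(placeholder = "summary statistics")
)

# Split the list into the item elements and the trailing summary element
nItems <- length(itanal) - 1
items <- itanal[seq_len(nItems)]
itanalSummary <- itanal[[length(itanal)]]

# Collect one row of item-level statistics per item
itemStats <- do.call(rbind, lapply(items, function(x) {
  data.frame(
    name = x$name,
    item_total = unname(x$item_rest_total["item-total"]),
    item_rest = unname(x$item_rest_total["item-rest"]),
    facility = 100 * unname(x$item_rest_total["obs_mean"]) # assumption: dichotomous item
  )
}))
print(itemStats)
```

With a real system file, the same split should apply to the result of getCqItanal(myEx1Sys), giving one row per item.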

Create formatted itanal tables for a report

So far, we have shown how to access the test and item analysis statistics that are available through the itanal command in ‘ACER ConQuest’ and we have shown these without any formatting. One of the many benefits of integrating ‘ACER ConQuest’ output into a markdown document is to permit automated conditional formatting of item analysis output. In this section we show how this conditional formatting can be set up.

Set up criteria for conditional formatting

Pre-specifying criteria for conditionally formatting item analysis output is a key step in an automated workflow. Any number of metrics from the item analysis can be specified for conditional formatting. Several of these can be passed to conquestr functions as will be illustrated in the following sections.


# set statistical criteria for conditional formatting

easyFlag <- 85 # highlight if facility is GREATER than this value
hardFlag <- 15 # highlight if facility is LESS than this value
irestFlag <- 0.2 # highlight if item-rest r is LESS than this value
underfitFlag <- 1.2 # highlight if weighted MNSQ is GREATER than this value
overfitFlag <- 0.8 # highlight if weighted MNSQ is LESS than this value
ptBisFlag <- 0.0 # highlight if non-key ptBis r is GREATER than this value
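To make the intent of these thresholds concrete, here is a minimal sketch that applies three of them to the statistics printed earlier for item 1 (facility 64.46, item-rest correlation 0.455). The helper flagItem is hypothetical and is not part of conquestr. The two fit thresholds would be applied in the same way to weighted MNSQ values, which are not shown in this vignette.

```r
# Thresholds as defined in the vignette
easyFlag <- 85   # facility GREATER than this is flagged
hardFlag <- 15   # facility LESS than this is flagged
irestFlag <- 0.2 # item-rest r LESS than this is flagged

# Hypothetical helper (not a conquestr function): returns a named logical
# vector with one entry per criterion
flagItem <- function(facility, itemRest) {
  c(
    tooEasy = facility > easyFlag,
    tooHard = facility < hardFlag,
    lowItemRest = itemRest < irestFlag
  )
}

# Item 1 from the example output: 64.46 % correct, item-rest r = 0.455
item1Flags <- flagItem(facility = 64.4644645, itemRest = 0.4552091)
print(item1Flags)
```

None of the three flags is raised for item 1, because its facility lies between 15 and 85 and its item-rest correlation exceeds 0.2.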

An example of an html itanal table for item categories

The function fmtCqItanal returns a formatted version of the itanal object that we read in earlier. At present, this function applies coloured text to any distractor point-biserial correlation that is larger than the threshold passed as ptBisFlag (0 in this example). The following example shows the output for the fourth item in the current item analysis.


# return conditionally formatted item category statistics tables for all items
myEx1Sys_itanal_f <- fmtCqItanal(myEx1Sys_itanal, ptBisFlag = ptBisFlag, textColHighlight = "red")

# print the table for the fourth item
myEx1Sys_itanal_f[[4]]$table
Item category statistics for: item:4 (item four)

Category  Score  Count  Percent  Pt Bis  Pt Bis t stat.  Pt Bis sig.  Ability mean (D1)  Ability SD (D1)
M             0      3     0.30   -0.07           -2.29         0.02              -1.09             1.42
a             0    151    15.12    0.06            1.77         0.08              -0.07             0.76
b             0     73     7.31    -0.2           -6.39         0.00              -0.64             0.76
c             0    224    22.42   -0.32          -10.59         0.00              -0.51             0.79
d             1    548    54.85    0.34           11.33         0.00               0.33             0.85

In the rendered table, the Pt Bis value for category a (0.06) is shown in red; all other Pt Bis values are shown in black.

# print summary
myEx1Sys_itanal_f[[length(myEx1Sys_itanal_f)]] # the last object is always the summary
Item Analysis Summary Statistics

Statistic                       Value
Percent Missing                  0.00
N                              999.00
Mean                             8.44
SD                               2.40
Variance                         5.78
Skew                            -0.60
Kurtosis                        -0.20
Standard error of mean           0.08
Standard error of measurement    1.43
Alpha                            0.64
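The distractor check behind the highlighting in the item 4 table can be reproduced in base R. This is a sketch of the idea only, not the implementation used by fmtCqItanal: it assumes that a distractor is any category with a score of 0 (which here includes the missing category M), and it uses the rounded values from the table above.

```r
ptBisFlag <- 0.0 # threshold used in the vignette

# Item 4 category statistics, copied (rounded) from the formatted table
item4 <- data.frame(
  Category = c("M", "a", "b", "c", "d"),
  Score = c(0, 0, 0, 0, 1),
  PtBis = c(-0.07, 0.06, -0.20, -0.32, 0.34)
)

# Assumption: non-key categories are those with Score == 0
isDistractor <- item4$Score == 0

# Flag distractors whose point-biserial exceeds the threshold
item4$flagged <- isDistractor & item4$PtBis > ptBisFlag
flaggedCategories <- item4$Category[item4$flagged]
print(flaggedCategories)
#> [1] "a"
```

Category a is the only one flagged. The key (category d) has a larger point-biserial, but it is excluded because a positive correlation is expected for the correct response.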

Conclusion

This short vignette has illustrated how to access and display itanal output from an ‘ACER ConQuest’ analysis using conquestr. Future vignettes will demonstrate basic and advanced plotting and the production of publication quality item analysis technical reports.