Introduction

It is often desirable to visualize student success data in an interactive dashboard (e.g., Tableau or Power BI), with the ability to disaggregate by multiple group variables to highlight equity gaps and disproportionate impact (DI). It is certainly feasible to calculate disproportionate impact on the fly in standard dashboard tools, but doing so:

  1. increases development time,
  2. increases the likelihood of calculation errors, as the code has to be “re-written” for each dashboard, and
  3. is more difficult to maintain and support, especially when transitioning projects between analysts.

A suggested workflow is to:

  1. start with a student-level data set;
  2. call a single function to pre-calculate success rates and disproportionate impact across all levels of disaggregation, cohorts, and scenarios;
  3. export the pre-calculated data set;
  4. import the pre-calculated data set to the dashboard tool of choice for visualization, where every point visualized is a row from the imported data set.

Using this workflow, one could scale up DI calculations and rapidly develop dashboards with the ability to disaggregate and highlight equity gaps / disproportionate impact for many disaggregation variables, many outcomes, and many scenarios / student populations.

The DisImpact package offers the di_iterate function that allows one to accomplish step 2 in the suggested workflow.

Load DisImpact and toy data set

First, load the necessary packages.

library(DisImpact)
library(dplyr) # Ease in manipulations with data frames

Second, load a toy data set.

data(student_equity) # provided from DisImpact
dim(student_equity)
## [1] 20000    11
# head(student_equity)
Ethnicity        Gender  Cohort  Transfer  Cohort_Math  Math  Cohort_English  English  Ed_Goal       College_Status      Student_ID
Native American  Female  2017    0         2017         1     2017            0        Deg/Transfer  First-time College  100001
Native American  Female  2017    0         2019         1     NA              NA       Deg/Transfer  First-time College  100002
Native American  Female  2017    0         2018         1     2017            0        Deg/Transfer  First-time College  100003
Native American  Male    2017    1         2017         1     2018            1        Other         First-time College  100004
Native American  Male    2017    0         2019         1     2019            0        Deg/Transfer  Other               100005
Native American  Male    2017    1         2017         1     2018            1        Other         First-time College  100006

To get a description of each variable, type ?student_equity in the R console.

Execute di_iterate on a data set

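Let's illustrate the di_iterate function with some key arguments: data (the student-level data set), success_vars (the outcome variables), group_vars (the disaggregation variables), cohort_vars (the cohort variable paired with each outcome), and scenario_repeat_by_vars (variables whose values define the scenarios over which all calculations are repeated).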

To see the details of these and other arguments, type ?di_iterate in the R console.

df_di_summary <- di_iterate(data=student_equity
                          , success_vars=c('Math', 'English', 'Transfer')
                          , group_vars=c('Ethnicity', 'Gender')
                          , cohort_vars=c('Cohort_Math', 'Cohort_English', 'Cohort')
                          , scenario_repeat_by_vars=c('Ed_Goal', 'College_Status')
                            )
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = "College_Status"
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = "College_Status"
## Joining, by = "Ed_Goal"
## Joining, by = "Ed_Goal"
## df_di_summary <- di_iterate(data=student_equity, success_vars=c('Math', 'English', 'Transfer'), group_vars=c('Ethnicity', 'Gender'), cohort_vars=c('Cohort', 'Cohort', 'Cohort'), scenario_repeat_by_vars=c('Ed_Goal', 'College_Status'))

## df_di_summary <- di_iterate(data=student_equity, success_vars=c('Math', 'English', 'Transfer'), group_vars=c('Ethnicity', 'Gender'), scenario_repeat_by_vars=c('Ed_Goal', 'College_Status'))

## df_di_summary <- di_iterate(data=student_equity, success_vars=c('Math', 'English', 'Transfer'), group_vars=c('Ethnicity', 'Gender'), cohort_vars=c('Cohort_Math', 'Cohort_English', 'Cohort'), scenario_repeat_by_vars=c('Ed_Goal', 'College_Status'), ppg_reference_groups=c('White', 'Male'), di_80_index_reference_groups=c('White', 'Male'))

## df_di_summary <- di_iterate(data=student_equity, success_vars=c('Math', 'English', 'Transfer'), group_vars=c('Ethnicity', 'Gender'), cohort_vars=c('Cohort_Math', 'Cohort_English', 'Cohort'), scenario_repeat_by_vars=c('Ed_Goal', 'College_Status'), ppg_reference_groups=c('all but current'), di_80_index_reference_groups=c('White', 'Male'))
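Before running di_iterate on one's own data, it may help to confirm that the inputs are shaped as the call above expects. The helper below is a hypothetical pre-flight check (check_di_inputs is not part of DisImpact). It assumes that success variables are coded 0/1 (with NA allowed), as in the toy data, and that cohort_vars is supplied with one entry per success variable, as in the call above.

```r
# Hypothetical pre-flight check; not a DisImpact function
check_di_inputs <- function(data, success_vars, group_vars, cohort_vars) {
  # Every referenced column should exist in the data
  all_vars <- c(success_vars, group_vars, cohort_vars)
  missing_vars <- setdiff(all_vars, names(data))
  if (length(missing_vars) > 0) {
    stop("Columns not found: ", paste(missing_vars, collapse = ", "))
  }
  # Assumption: one cohort variable per success variable
  if (length(cohort_vars) != length(success_vars)) {
    stop("Expected one cohort variable per success variable")
  }
  # Assumption: success variables are 0/1 flags, NA allowed
  for (v in success_vars) {
    if (!all(data[[v]] %in% c(0, 1, NA))) {
      stop("Success variable is not coded 0/1: ", v)
    }
  }
  invisible(TRUE)
}

# Tiny made-up data set for illustration only
toy <- data.frame(
  Ethnicity   = c("Asian", "White", "Black"),
  Cohort_Math = c(2017, 2018, NA),
  Math        = c(1, 0, NA)
)

ok <- check_di_inputs(toy, success_vars = "Math", group_vars = "Ethnicity",
                      cohort_vars = "Cohort_Math")
```

A misspelled column name would stop with an informative error before any DI calculation is attempted.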

Explore resulting summary data set

Let's explore the resulting summarized data set:

dim(df_di_summary)
## [1] 898  21
df_di_summary %>% head %>% as.data.frame # first few rows
##        Ed_Goal     College_Status success_variable cohort_variable cohort
## 1 Deg/Transfer First-time College             Math     Cohort_Math   2017
## 2 Deg/Transfer First-time College             Math     Cohort_Math   2017
## 3 Deg/Transfer First-time College             Math     Cohort_Math   2017
## 4 Deg/Transfer First-time College             Math     Cohort_Math   2017
## 5 Deg/Transfer First-time College             Math     Cohort_Math   2017
## 6 Deg/Transfer First-time College             Math     Cohort_Math   2017
##   disaggregation           group   n success       pct ppg_reference
## 1      Ethnicity           Asian 800     698 0.8725000     0.8150224
## 2      Ethnicity           Black 260     199 0.7653846     0.8150224
## 3      Ethnicity        Hispanic 549     390 0.7103825     0.8150224
## 4      Ethnicity Multi-Ethnicity 136     108 0.7941176     0.8150224
## 5      Ethnicity Native American  28      22 0.7857143     0.8150224
## 6      Ethnicity           White 903     764 0.8460687     0.8150224
##   ppg_reference_group        moe    pct_lo    pct_hi di_indicator_ppg
## 1             overall 0.03464823 0.8378518 0.9071482                0
## 2             overall 0.06077702 0.7046076 0.8261616                0
## 3             overall 0.04182538 0.6685571 0.7522079                1
## 4             overall 0.08403431 0.7100833 0.8781520                0
## 5             overall 0.18520259 0.6005117 0.9709169                0
## 6             overall 0.03261236 0.8134563 0.8786810                0
##   di_prop_index di_indicator_prop_index di_80_index_reference_group
## 1     1.0705227                       0                       Asian
## 2     0.9390964                       0                       Asian
## 3     0.8716110                       0                       Asian
## 4     0.9743507                       0                       Asian
## 5     0.9640401                       0                       Asian
## 6     1.0380925                       0                       Asian
##   di_80_index di_indicator_80_index
## 1   1.0000000                     0
## 2   0.8772317                     0
## 3   0.8141920                     0
## 4   0.9101635                     0
## 5   0.9005321                     0
## 6   0.9697062                     0

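The variables di_indicator_ppg, di_indicator_prop_index, and di_indicator_80_index are the DI flags (1 = disproportionate impact detected, 0 = not) from the three methods: percentage point gap, proportionality index, and 80% index, respectively.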
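To make the three flags concrete, the sketch below recomputes them in base R from the n and success counts of the six rows printed above. The formulas are inferred from that output rather than taken from the package source: a margin of error of 1.96 * sqrt(0.25 / n) with a floor of 0.03, the overall rate as the PPG reference, the highest-rate group as the 80% index reference, and a 0.8 cutoff for both indices are all assumptions that happen to reproduce the printed values.

```r
# Counts copied from the first six rows of df_di_summary shown above
d <- data.frame(
  group   = c("Asian", "Black", "Hispanic", "Multi-Ethnicity",
              "Native American", "White"),
  n       = c(800, 260, 549, 136, 28, 903),
  success = c(698, 199, 390, 108, 22, 764)
)
d$pct <- d$success / d$n

# Percentage point gap: flag when the upper bound (pct + moe) does not
# reach the reference rate (here, the overall rate)
overall <- sum(d$success) / sum(d$n)
d$moe <- pmax(1.96 * sqrt(0.25 / d$n), 0.03)  # 0.03 floor is an assumption
d$di_indicator_ppg <- as.integer(d$pct + d$moe <= overall)

# Proportionality index: share of successes divided by share of the cohort
d$di_prop_index <- (d$success / sum(d$success)) / (d$n / sum(d$n))
d$di_indicator_prop_index <- as.integer(d$di_prop_index < 0.8)  # assumed cutoff

# 80% index: group rate relative to the highest group rate
d$di_80_index <- d$pct / max(d$pct)
d$di_indicator_80_index <- as.integer(d$di_80_index < 0.8)  # assumed cutoff

print(d)
```

Under these assumptions only the Hispanic group is flagged by the percentage point gap method, and no group is flagged by either index, which matches the output above.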

Next, note that the scenario '- All' is included for all variables passed to scenario_repeat_by_vars by default:

table(df_di_summary$Ed_Goal)
## 
##        - All Deg/Transfer        Other 
##          300          300          298
table(df_di_summary$College_Status)
## 
##              - All First-time College              Other 
##                300                300                298
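One way to picture the '- All' scenario is as an extra copy of the data in which the scenario variable is relabeled, so that all students are summarized together alongside each individual value. The sketch below uses a made-up four-row data set and base R. It illustrates the idea only and is not necessarily how di_iterate implements it internally.

```r
# Made-up data for illustration only
toy <- data.frame(
  Ed_Goal = c("Deg/Transfer", "Deg/Transfer", "Other", "Other"),
  Math    = c(1, 0, 1, 1)
)

# Relabel a copy of the data as '- All' and stack it on the original
toy_copy <- toy
toy_copy$Ed_Goal <- "- All"
toy_all <- rbind(toy_copy, toy)

# One success rate per scenario value, including '- All'
rates <- aggregate(Math ~ Ed_Goal, data = toy_all, FUN = mean)
print(rates)
```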

Also note that di_iterate returns non-disaggregated results by default (rows where disaggregation is '- None'):

table(df_di_summary$disaggregation)
## 
##    - None Ethnicity    Gender 
##        90       539       269

Let's inspect the rows corresponding to non-disaggregated results.

# No Disaggregation
df_di_summary %>%
  filter(Ed_Goal=='- All', College_Status=='- All', disaggregation=='- None') %>%
  as.data.frame
##    Ed_Goal College_Status success_variable cohort_variable cohort
## 1    - All          - All             Math     Cohort_Math   2017
## 2    - All          - All             Math     Cohort_Math   2018
## 3    - All          - All             Math     Cohort_Math   2019
## 4    - All          - All             Math     Cohort_Math   2020
## 5    - All          - All          English  Cohort_English   2017
## 6    - All          - All          English  Cohort_English   2018
## 7    - All          - All          English  Cohort_English   2019
## 8    - All          - All          English  Cohort_English   2020
## 9    - All          - All         Transfer          Cohort   2017
## 10   - All          - All         Transfer          Cohort   2018
##    disaggregation group     n success       pct ppg_reference
## 1          - None - All  4691    3828 0.8160307     0.8160307
## 2          - None - All  7416    6108 0.8236246     0.8236246
## 3          - None - All  4622    3772 0.8160969     0.8160969
## 4          - None - All  1855    1573 0.8479784     0.8479784
## 5          - None - All  5520    4183 0.7577899     0.7577899
## 6          - None - All  8543    6532 0.7646026     0.7646026
## 7          - None - All  3866    2938 0.7599586     0.7599586
## 8          - None - All   913     678 0.7426068     0.7426068
## 9          - None - All 10000    5140 0.5140000     0.5140000
## 10         - None - All 10000    5388 0.5388000     0.5388000
##    ppg_reference_group       moe    pct_lo    pct_hi di_indicator_ppg
## 1              overall 0.0300000 0.7860307 0.8460307                0
## 2              overall 0.0300000 0.7936246 0.8536246                0
## 3              overall 0.0300000 0.7860969 0.8460969                0
## 4              overall 0.0300000 0.8179784 0.8779784                0
## 5              overall 0.0300000 0.7277899 0.7877899                0
## 6              overall 0.0300000 0.7346026 0.7946026                0
## 7              overall 0.0300000 0.7299586 0.7899586                0
## 8              overall 0.0324333 0.7101735 0.7750401                0
## 9              overall 0.0300000 0.4840000 0.5440000                0
## 10             overall 0.0300000 0.5088000 0.5688000                0
##    di_prop_index di_indicator_prop_index di_80_index_reference_group
## 1              1                       0                       - All
## 2              1                       0                       - All
## 3              1                       0                       - All
## 4              1                       0                       - All
## 5              1                       0                       - All
## 6              1                       0                       - All
## 7              1                       0                       - All
## 8              1                       0                       - All
## 9              1                       0                       - All
## 10             1                       0                       - All
##    di_80_index di_indicator_80_index
## 1            1                     0
## 2            1                     0
## 3            1                     0
## 4            1                     0
## 5            1                     0
## 6            1                     0
## 7            1                     0
## 8            1                     0
## 9            1                     0
## 10           1                     0

Visualization (emulating dashboard features)

In this section, we emulate what a dashboard could visualize.

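Imagine a dashboard with the following dropdown menus and option values (taken from the columns of df_di_summary):

  1. Ed_Goal: '- All', 'Deg/Transfer', 'Other'
  2. College_Status: '- All', 'First-time College', 'Other'
  3. success_variable (outcome): 'Math', 'English', 'Transfer'
  4. disaggregation: '- None', 'Ethnicity', 'Gender'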

Each combination of this set of dropdown menus could be visualized using a subset of rows in df_di_summary.
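The mapping from dropdown selections to rows can be sketched as a small helper. The function name select_dashboard_rows and its arguments are hypothetical (they are not part of DisImpact), and the stand-in data frame holds just three of the non-disaggregated rows shown earlier, so the block runs on its own. In practice one would pass df_di_summary.

```r
# Hypothetical helper: rows behind one combination of dropdown selections
select_dashboard_rows <- function(df, ed_goal = "- All",
                                  college_status = "- All",
                                  outcome = "Math",
                                  disaggregation = "- None") {
  keep <- df$Ed_Goal == ed_goal &
    df$College_Status == college_status &
    df$success_variable == outcome &
    df$disaggregation == disaggregation
  df[keep, , drop = FALSE]
}

# Stand-in excerpt of df_di_summary (values copied from the output above)
df_demo <- data.frame(
  Ed_Goal          = c("- All", "- All", "- All"),
  College_Status   = c("- All", "- All", "- All"),
  success_variable = c("Math", "Math", "English"),
  disaggregation   = c("- None", "- None", "- None"),
  cohort           = c(2017, 2018, 2017),
  n                = c(4691, 7416, 5520),
  pct              = c(0.8160307, 0.8236246, 0.7577899)
)

# Default selections: all students, Math, no disaggregation
math_rows <- select_dashboard_rows(df_demo)
print(math_rows)
```

Each call returns exactly the rows a chart would plot for that selection, which is the sense in which every point visualized is a row of the imported data set.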

For example, let's visualize non-disaggregated results for math (the dropdown selections are described at the top of the visualization):

# No Disaggregation
df_di_summary %>%
  filter(Ed_Goal=='- All', College_Status=='- All', success_variable=='Math', disaggregation=='- None') %>%
  as.data.frame

Dashboard Viz 1: Non-disaggregated results.

In this dashboard, one could choose to disaggregate by ethnicity and highlight disproportionate impact (for simplicity, subsequent visualizations use the percentage point gap method, i.e., the di_indicator_ppg flag):

# Disaggregation: Ethnicity
df_di_summary %>%
  filter(Ed_Goal=='- All', College_Status=='- All', success_variable=='Math', disaggregation=='Ethnicity') %>%
  select(cohort, group, n, pct, di_indicator_ppg, di_indicator_prop_index, di_indicator_80_index) %>%
  as.data.frame
##    cohort           group    n       pct di_indicator_ppg
## 1    2017           Asian 1456 0.8804945                0
## 2    2017           Black  452 0.7190265                1
## 3    2017        Hispanic  901 0.6947836                1
## 4    2017 Multi-Ethnicity  245 0.8204082                0
## 5    2017 Native American   46 0.8260870                0
## 6    2017           White 1591 0.8522942                0
## 7    2018           Asian 2251 0.9009329                0
## 8    2018           Black  736 0.7160326                1
## 9    2018        Hispanic 1404 0.6972934                1
## 10   2018 Multi-Ethnicity  379 0.8179420                0
## 11   2018 Native American   77 0.7792208                0
## 12   2018           White 2569 0.8579214                0
## 13   2019           Asian 1450 0.8862069                0
## 14   2019           Black  435 0.7195402                1
## 15   2019        Hispanic  866 0.6755196                1
## 16   2019 Multi-Ethnicity  227 0.8193833                0
## 17   2019 Native American   41 0.8048780                0
## 18   2019           White 1603 0.8546475                0
## 19   2020           Asian  582 0.9278351                0
## 20   2020           Black  171 0.7543860                1
## 21   2020        Hispanic  345 0.6956522                1
## 22   2020 Multi-Ethnicity   81 0.8395062                0
## 23   2020 Native American   17 0.8235294                0
## 24   2020           White  659 0.8831563                0
##    di_indicator_prop_index di_indicator_80_index
## 1                        0                     0
## 2                        0                     0
## 3                        0                     1
## 4                        0                     0
## 5                        0                     0
## 6                        0                     0
## 7                        0                     0
## 8                        0                     1
## 9                        0                     1
## 10                       0                     0
## 11                       0                     0
## 12                       0                     0
## 13                       0                     0
## 14                       0                     0
## 15                       0                     1
## 16                       0                     0
## 17                       0                     0
## 18                       0                     0
## 19                       0                     0
## 20                       0                     0
## 21                       0                     1
## 22                       0                     0
## 23                       0                     0
## 24                       0                     0

Dashboard Viz 2: Disaggregated by ethnicity.
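A static approximation of this view can be drawn with base R graphics. The sketch below copies the 2017 rows from the output above and colors each bar by the di_indicator_ppg flag. The colors, labels, and chart type are arbitrary choices for illustration, not those of the original visualization.

```r
# 2017 Math rows by ethnicity, copied from the output above
viz <- data.frame(
  group = c("Asian", "Black", "Hispanic", "Multi-Ethnicity",
            "Native American", "White"),
  pct = c(0.8804945, 0.7190265, 0.6947836, 0.8204082, 0.8260870, 0.8522942),
  di_indicator_ppg = c(0, 1, 1, 0, 0, 0)
)

# Highlight groups flagged for disproportionate impact
bar_cols <- ifelse(viz$di_indicator_ppg == 1, "firebrick", "grey70")

barplot(viz$pct, names.arg = viz$group, col = bar_cols,
        ylim = c(0, 1), las = 2, cex.names = 0.7,
        ylab = "Math success rate",
        main = "2017 cohort, disaggregated by ethnicity")
```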

In a dashboard, the user might be interested in focusing on degree/transfer students. We emulate this by filtering on Ed_Goal=='Deg/Transfer':

# Disaggregation: Ethnicity; Deg/Transfer
df_di_summary %>%
  filter(Ed_Goal=='Deg/Transfer', College_Status=='- All', success_variable=='Math', disaggregation=='Ethnicity') %>%
  select(cohort, group, n, pct, di_indicator_ppg, di_indicator_prop_index, di_indicator_80_index) %>%
  as.data.frame
##    cohort           group    n       pct di_indicator_ppg
## 1    2017           Asian 1000 0.8820000                0
## 2    2017           Black  320 0.7500000                1
## 3    2017        Hispanic  665 0.7037594                1
## 4    2017 Multi-Ethnicity  164 0.8109756                0
## 5    2017 Native American   32 0.8125000                0
## 6    2017           White 1108 0.8546931                0
## 7    2018           Asian 1564 0.8951407                0
## 8    2018           Black  515 0.6990291                1
## 9    2018        Hispanic  989 0.6966633                1
## 10   2018 Multi-Ethnicity  262 0.8511450                0
## 11   2018 Native American   62 0.7741935                0
## 12   2018           White 1763 0.8536585                0
## 13   2019           Asian 1019 0.8802748                0
## 14   2019           Black  310 0.6838710                1
## 15   2019        Hispanic  602 0.6843854                1
## 16   2019 Multi-Ethnicity  166 0.8012048                0
## 17   2019 Native American   26 0.7692308                0
## 18   2019           White 1160 0.8465517                0
## 19   2020           Asian  408 0.9117647                0
## 20   2020           Black  122 0.7459016                1
## 21   2020        Hispanic  244 0.7172131                1
## 22   2020 Multi-Ethnicity   57 0.8245614                0
## 23   2020 Native American    9 0.6666667                0
## 24   2020           White  458 0.8711790                0
##    di_indicator_prop_index di_indicator_80_index
## 1                        0                     0
## 2                        0                     0
## 3                        0                     1
## 4                        0                     0
## 5                        0                     0
## 6                        0                     0
## 7                        0                     0
## 8                        0                     1
## 9                        0                     1
## 10                       0                     0
## 11                       0                     0
## 12                       0                     0
## 13                       0                     0
## 14                       0                     1
## 15                       0                     1
## 16                       0                     0
## 17                       0                     0
## 18                       0                     0
## 19                       0                     0
## 20                       0                     0
## 21                       0                     1
## 22                       0                     0
## 23                       1                     1
## 24                       0                     0

Dashboard Viz 3: Focus on degree/transfer students.
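In the output above, row 23 (Native American, 2020, n = 9) is flagged by the proportionality index and the 80% index but not by the percentage point gap method. The sketch below works through why, using the 2020 rows. As before, the margin-of-error formula (1.96 * sqrt(0.25 / n), floored at 0.03), the overall rate as the PPG reference, and the 0.8 cutoffs are assumptions inferred from the printed output rather than taken from the package source.

```r
# 2020 Math rows for Deg/Transfer students, copied from the output above
d <- data.frame(
  group = c("Asian", "Black", "Hispanic", "Multi-Ethnicity",
            "Native American", "White"),
  n   = c(408, 122, 244, 57, 9, 458),
  pct = c(0.9117647, 0.7459016, 0.7172131, 0.8245614, 0.6666667, 0.8711790)
)

overall <- sum(d$n * d$pct) / sum(d$n)    # overall success rate
small <- d[d$group == "Native American", ]

# With n = 9 the margin of error is wide, so the upper bound exceeds the
# overall rate and the PPG method does not flag the group
moe <- max(1.96 * sqrt(0.25 / small$n), 0.03)
ppg_flag <- as.integer(small$pct + moe <= overall)

# The two index methods ignore sample size and fall below the assumed 0.8 cutoff
prop_index <- small$pct / overall
index_80 <- small$pct / max(d$pct)

round(c(moe = moe, overall = overall, prop_index = prop_index,
        index_80 = index_80), 4)
```

The percentage point gap method accounts for sample size through the margin of error, which is worth keeping in mind when choosing which flag to surface for small groups.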

In a dashboard, the user could switch the outcome to English and disaggregate by Gender:

# Disaggregation: Gender; Deg/Transfer; English
df_di_summary %>%
  filter(Ed_Goal=='Deg/Transfer', College_Status=='- All', success_variable=='English', disaggregation=='Gender') %>%
  as.data.frame
##         Ed_Goal College_Status success_variable cohort_variable cohort
## 1  Deg/Transfer          - All          English  Cohort_English   2017
## 2  Deg/Transfer          - All          English  Cohort_English   2017
## 3  Deg/Transfer          - All          English  Cohort_English   2017
## 4  Deg/Transfer          - All          English  Cohort_English   2018
## 5  Deg/Transfer          - All          English  Cohort_English   2018
## 6  Deg/Transfer          - All          English  Cohort_English   2018
## 7  Deg/Transfer          - All          English  Cohort_English   2019
## 8  Deg/Transfer          - All          English  Cohort_English   2019
## 9  Deg/Transfer          - All          English  Cohort_English   2019
## 10 Deg/Transfer          - All          English  Cohort_English   2020
## 11 Deg/Transfer          - All          English  Cohort_English   2020
## 12 Deg/Transfer          - All          English  Cohort_English   2020
##    disaggregation  group    n success       pct ppg_reference
## 1          Gender Female 1916    1424 0.7432150     0.7496751
## 2          Gender   Male 1863    1411 0.7573806     0.7496751
## 3          Gender  Other   68      49 0.7205882     0.7496751
## 4          Gender Female 2833    2151 0.7592658     0.7597185
## 5          Gender   Male 3003    2296 0.7645688     0.7597185
## 6          Gender  Other  132      87 0.6590909     0.7597185
## 7          Gender Female 1385    1032 0.7451264     0.7577753
## 8          Gender   Male 1308    1003 0.7668196     0.7577753
## 9          Gender  Other   40      36 0.9000000     0.7577753
## 10         Gender Female  307     213 0.6938111     0.7192429
## 11         Gender   Male  315     234 0.7428571     0.7192429
## 12         Gender  Other   12       9 0.7500000     0.7192429
##    ppg_reference_group        moe    pct_lo    pct_hi di_indicator_ppg
## 1              overall 0.03000000 0.7132150 0.7732150                0
## 2              overall 0.03000000 0.7273806 0.7873806                0
## 3              overall 0.11884246 0.6017458 0.8394307                0
## 4              overall 0.03000000 0.7292658 0.7892658                0
## 5              overall 0.03000000 0.7345688 0.7945688                0
## 6              overall 0.08529805 0.5737929 0.7443890                1
## 7              overall 0.03000000 0.7151264 0.7751264                0
## 8              overall 0.03000000 0.7368196 0.7968196                0
## 9              overall 0.15495161 0.7450484 1.0549516                0
## 10             overall 0.05593155 0.6378795 0.7497426                0
## 11             overall 0.05521674 0.6876404 0.7980739                0
## 12             overall 0.28290163 0.4670984 1.0329016                0
##    di_prop_index di_indicator_prop_index di_80_index_reference_group
## 1      0.9913829                       0                        Male
## 2      1.0102784                       0                        Male
## 3      0.9612007                       0                        Male
## 4      0.9994041                       0                        Male
## 5      1.0063843                       0                        Male
## 6      0.8675462                       0                        Male
## 7      0.9833077                       0                       Other
## 8      1.0119352                       0                       Other
## 9      1.1876871                       0                       Other
## 10     0.9646408                       0                       Other
## 11     1.0328321                       0                       Other
## 12     1.0427632                       0                       Other
##    di_80_index di_indicator_80_index
## 1    0.9812967                     0
## 2    1.0000000                     0
## 3    0.9514216                     0
## 4    0.9930641                     0
## 5    1.0000000                     0
## 6    0.8620427                     0
## 7    0.8279182                     0
## 8    0.8520217                     0
## 9    1.0000000                     0
## 10   0.9250814                     0
## 11   0.9904762                     0
## 12   1.0000000                     0

Dashboard Viz 4: Switch outcome to English and disaggregate by Gender.
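Steps 3 and 4 of the suggested workflow (export, then import into the dashboard tool) are not shown in this walkthrough. A minimal sketch of the export step follows, assuming a CSV file is an acceptable hand-off format. The file name and the two-row stand-in data frame (copied from the non-disaggregated Math rows above) are illustrative only. In practice one would write out the full df_di_summary.

```r
# Stand-in for df_di_summary so this block runs on its own
df_export <- data.frame(
  Ed_Goal          = c("- All", "- All"),
  College_Status   = c("- All", "- All"),
  success_variable = c("Math", "Math"),
  cohort           = c(2017, 2018),
  disaggregation   = c("- None", "- None"),
  group            = c("- All", "- All"),
  n                = c(4691, 7416),
  pct              = c(0.8160307, 0.8236246),
  stringsAsFactors = FALSE
)

# Hypothetical output location; point this at a folder the dashboard can read
out_file <- file.path(tempdir(), "di_summary.csv")
write.csv(df_export, out_file, row.names = FALSE)

# Read the file back to confirm the export round-trips cleanly
df_check <- read.csv(out_file, stringsAsFactors = FALSE)
str(df_check)
```

Writing with row.names = FALSE avoids an extra unnamed index column that some dashboard tools would otherwise import as a field.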