Warning message with perccalc package

Jorge Cimentada

2017-09-14

While the other vignette shows you how to use perccalc appropriately, there are instances where there are just too few categories to estimate percentiles properly. Imagine estimating the full distribution of percentiles from 1 to 100 with only three ordered categories: it sounds far-fetched.

Let’s load our packages.

library(perccalc)
library(tidyverse)

For example, take the survey data on smoking habits.

smoking_data <-
  MASS::survey %>% # you will need to install the MASS package
  as_tibble() %>%
  select(Sex, Smoke, Pulse) %>%
  rename(
    gender = Sex,
    smoke = Smoke,
    pulse_rate = Pulse
  )

The final result is this dataset:

## # A tibble: 237 x 3
##    gender  smoke pulse_rate
##    <fctr> <fctr>      <int>
##  1   Male  Never         35
##  2 Female  Never         40
##  3 Female  Never         48
##  4   Male  Never         48
##  5 Female  Never         50
##  6 Female  Regul         50
##  7   Male  Regul         54
##  8   Male  Never         55
##  9   Male  Never         56
## 10   Male  Never         59
## # ... with 227 more rows
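Before estimating anything, it helps to check how many categories the variable has and in which order its levels are stored. The snippet below is a minimal sketch that reads the original Smoke column straight from MASS::survey (assuming the MASS package is installed); the object name smoke is only used for illustration.

```r
# Smoke is the original factor column behind the smoke variable
smoke <- MASS::survey$Smoke

nlevels(smoke)                 # number of categories
levels(smoke)                  # order in which the levels are stored
table(smoke, useNA = "ifany")  # observations per category, including missing values
```

The stored level order does not have to match the substantive order (from never smoking to heavy smoking), which is why the factor is re-levelled explicitly before it is passed to perc_diff.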

Note that there are only four categories in the smoke variable. Let’s try to estimate the percentile difference.

smoking_data <-
  smoking_data %>%
  mutate(smoke = factor(smoke,
                        levels = c("Never", "Occas", "Regul", "Heavy"),
                        ordered = TRUE))

perc_diff(smoking_data, smoke, pulse_rate)
## Warning in perc_diff(smoking_data, smoke, pulse_rate): Too few categories in categorical variable to estimate the
##       variance-covariance matrix and standard errors. Proceeding without
##       estimated standard errors but perhaps you should increase the number
##       of categories
## difference 
##   385.1357

perc_diff returns the estimated difference but warns you that, with so few categories, it cannot estimate the variance-covariance matrix, so no standard error is reported. The same happens with perc_dist.
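One way to build intuition for the warning is to count degrees of freedom. The sketch below is not perccalc's actual code. It assumes that the category means are regressed on a cubic polynomial of the category midpoints, which is an assumption about the method rather than something shown in this vignette, and it uses made-up midpoints and means. Under that assumption, four categories are exactly used up by the four coefficients, so nothing is left to estimate a residual variance.

```r
# Made-up category midpoints (as proportions) and made-up category means;
# these values are illustrative and do not come from the survey data
midpoints <- c(0.10, 0.35, 0.65, 0.90)
cat_means <- c(60, 68, 71, 80)

# A cubic polynomial has four coefficients: intercept plus three terms
fit <- lm(cat_means ~ midpoints + I(midpoints^2) + I(midpoints^3))

length(coef(fit))  # 4
df.residual(fit)   # 0: no degrees of freedom left for the variance

# With a fifth (made-up) category, one degree of freedom remains
midpoints5 <- c(0.10, 0.30, 0.50, 0.70, 0.90)
cat_means5 <- c(60, 66, 69, 73, 80)
fit5 <- lm(cat_means5 ~ midpoints5 + I(midpoints5^2) + I(midpoints5^3))

df.residual(fit5)  # 1
```

With zero residual degrees of freedom the curve passes through every point, so a point estimate exists but its uncertainty cannot be quantified. That matches the advice in the warning to increase the number of categories.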

perc_dist(smoking_data, smoke, pulse_rate) %>%
  head()
## Warning in perc_dist(smoking_data, smoke, pulse_rate): Too few categories in categorical variable to estimate the
##       variance-covariance matrix and standard errors. Proceeding without
##       estimated standard errors but perhaps you should increase the number
##       of categories
##   percentile  estimate
## 1          1  24.23446
## 2          2  47.82656
## 3          3  70.78474
## 4          4  93.11743
## 5          5 114.83308
## 6          6 135.94011
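If you call these functions inside a longer script, you may want to record the warning and still keep the estimate. The following is a minimal sketch using base R's withCallingHandlers. To stay self-contained it uses a hypothetical stand-in function, estimate_with_warning, that warns and returns a placeholder value; in practice you would put your own perc_diff or perc_dist call in its place.

```r
# Hypothetical stand-in: warns, then still returns a value.
# The returned number is a placeholder, not a real estimate.
estimate_with_warning <- function() {
  warning("Too few categories in categorical variable")
  c(difference = 1)
}

caught <- character(0)

result <- withCallingHandlers(
  estimate_with_warning(),
  warning = function(w) {
    # Store the message, then silence the warning and resume evaluation
    caught <<- c(caught, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

result  # the value is still returned
caught  # the warning text, available for logging or checks
```

withCallingHandlers is used here rather than tryCatch because tryCatch would abandon the call as soon as the warning is raised, so the estimate would be lost.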