The data.frame provided to the data argument must be arranged in a particular way (i.e. “long” or “tidy” format). Each row should be an alternative from a choice observation. The choice observations do not have to be symmetric (i.e. each choice observation could have a different number of alternatives). The data must include columns for each of the following arguments in the logitr() function:
- choiceName: A dummy variable that identifies which alternative was chosen (1 = chosen, 0 = not chosen).
- obsIDName: A sequence of numbers that identifies each unique choice observation. For example, if the first three choice observations had 2 alternatives each, then the first 6 rows of the obsID variable would be 1, 1, 2, 2, 3, 3.
- parNames: The names of the variables that will be used as model covariates. For WTP space models, do not include the price variable in parNames - this is provided separately with the priceName argument.

The data sets included in this package all follow this format (e.g. the yogurt data set).
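For reference, here is a minimal sketch of the expected “long” layout using made-up values (the column names are illustrative only):

data.frame(
  obsID  = c(1, 1, 1, 2, 2, 2),       # two choice observations, 3 alternatives each
  choice = c(0, 1, 0, 0, 0, 1),       # exactly one chosen alternative per observation
  price  = c(10, 15, 20, 10, 15, 20), # a numeric covariate
  brand  = c("dannon", "hiland", "weight",
             "dannon", "hiland", "weight") # a categorical covariate
)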
Numeric variables are by default estimated with a single “slope” coefficient.

Example: Consider a data frame that contains a price variable with the following three levels: c(10, 15, 20). Adding price to the parNames argument in the main logitr() function would result in a single price coefficient for the “slope” of the change in price.
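To make the single “slope” concrete, here is a small sketch with a made-up coefficient (beta_price is purely illustrative, not an estimate from any model):

beta_price   <- -0.07         # hypothetical slope coefficient, for illustration only
price_levels <- c(10, 15, 20)
beta_price * price_levels     # the price contribution to utility changes linearly across levels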
Categorical variables (i.e. “character” or “factor” type variables) are by default estimated with a coefficient for all but the first “level”, which serves as the “baseline” or "0" level. Categorical variables are automatically “dummy” coded: 0 for FALSE and 1 for TRUE.
Example: Consider a data frame that contains a brand variable with the following four levels: c("dannon", "hiland", "weight", "yoplait"). Adding brand to the parNames argument in the main logitr() function would result in three covariates: brand_hiland, brand_weight, and brand_yoplait, with brand_dannon serving as the “dummied out” baseline level.
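To see what this dummy coding looks like, here is a quick sketch built by hand from the four brand levels above (not output from the package):

brand <- c("dannon", "hiland", "weight", "yoplait")
data.frame(
  brand,
  brand_hiland  = as.numeric(brand == "hiland"),
  brand_weight  = as.numeric(brand == "weight"),
  brand_yoplait = as.numeric(brand == "yoplait")
)
# brand_dannon has no column because it is the "dummied out" baseline level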
To model a continuous variable as a discrete variable with a coefficient for all but the first level, there are two options:

1. Convert the variable to a "character" or "factor" type.
2. Use the dummyCode() function.

The second approach of using the dummyCode() function allows the modeler to specify the baseline level. It can also be used to create dummy-coded variables of a categorical variable.
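As a quick, hedged illustration of that last point (this assumes the yogurt data set includes a brand column with the four levels listed above):

yogurt_dummy <- dummyCode(df = yogurt, vars = "brand")
names(yogurt_dummy) # should now also include brand_dannon, brand_hiland, brand_weight, brand_yoplait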
Details for each approach are provided below.
The simplest way to model a continuous variable as a discrete variable is to convert the column in the data frame to a "character" or "factor" type prior to estimating the model. For example, consider the following model:
model_default <- logitr(
  data = cars_us,
  choiceName = 'choice',
  obsIDName = 'obsnum',
  parNames = c(
    'price', 'hev', 'phev10', 'phev20', 'phev40', 'bev75', 'bev100',
    'bev150', 'american', 'japanese', 'chinese', 'skorean',
    'phevFastcharge', 'bevFastcharge', 'opCost', 'accelTime')
)
#> Running Model...
#> Done!
In this model, since the price variable is a "double" variable type, it is by default modeled as a continuous variable with a single “slope” coefficient:
typeof(cars_us$price)
#> [1] "double"
summary(model_default)
#> =================================================
#> MODEL SUMMARY:
#>
#> Model Space: Preference
#> Model Run: 1 of 1
#> Iterations: 20
#> Elapsed Time: 0h:0m:0.47s
#> Weights Used?: FALSE
#>
#> Model Coefficients:
#> Estimate StdError tStat pVal signif
#> price -0.073882 0.002049 -36.0612 0.0000 ***
#> hev 0.059741 0.073667 0.8110 0.4174
#> phev10 0.086141 0.078725 1.0942 0.2739
#> phev20 0.121737 0.079619 1.5290 0.1263
#> phev40 0.190580 0.079013 2.4120 0.0159 *
#> bev75 -1.185508 0.087262 -13.5856 0.0000 ***
#> bev100 -0.960710 0.086753 -11.0740 0.0000 ***
#> bev150 -0.707314 0.084204 -8.4000 0.0000 ***
#> american 0.173212 0.058839 2.9438 0.0033 **
#> japanese -0.027662 0.058507 -0.4728 0.6364
#> chinese -0.758692 0.062305 -12.1771 0.0000 ***
#> skorean -0.445575 0.060899 -7.3166 0.0000 ***
#> phevFastcharge 0.212802 0.059931 3.5508 0.0004 ***
#> bevFastcharge 0.215705 0.066998 3.2196 0.0013 **
#> opCost -0.120876 0.004429 -27.2948 0.0000 ***
#> accelTime -0.125380 0.011587 -10.8207 0.0000 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Model Fit Values:
#>
#> Log.Likelihood. -4616.9517861
#> Null.Log.Likelihood. -6328.0067827
#> AIC. 9265.9036000
#> BIC. 9372.4427000
#> McFadden.R2. 0.2703940
#> Adj..McFadden.R2 0.2678655
#> Number.of.Observations. 5760.0000000
To model price as a categorical variable, simply change it to a "character" or "factor" type:
cars_us$price <- as.character(cars_us$price)
typeof(cars_us$price)
#> [1] "character"
Now re-estimate the model:
model_character_price <- logitr(
  data = cars_us,
  choiceName = 'choice',
  obsIDName = 'obsnum',
  parNames = c(
    'price', 'hev', 'phev10', 'phev20', 'phev40', 'bev75', 'bev100',
    'bev150', 'american', 'japanese', 'chinese', 'skorean',
    'phevFastcharge', 'bevFastcharge', 'opCost', 'accelTime')
)
#> Running Model...
#> Done!
Now price is modeled as a categorical variable with a coefficient for all but the first level (price_15, which serves as the baseline):
typeof(cars_us$price)
#> [1] "character"
summary(model_character_price)
#> =================================================
#> MODEL SUMMARY:
#>
#> Model Space: Preference
#> Model Run: 1 of 1
#> Iterations: 22
#> Elapsed Time: 0h:0m:0.6s
#> Weights Used?: FALSE
#>
#> Model Coefficients:
#> Estimate StdError tStat pVal signif
#> hev 0.062222 0.073722 0.8440 0.3987
#> phev10 0.088042 0.078788 1.1175 0.2638
#> phev20 0.121140 0.079657 1.5208 0.1284
#> phev40 0.192805 0.079087 2.4379 0.0148 *
#> bev75 -1.189301 0.087337 -13.6175 0.0000 ***
#> bev100 -0.962399 0.086843 -11.0820 0.0000 ***
#> bev150 -0.711546 0.084280 -8.4426 0.0000 ***
#> american 0.174072 0.058863 2.9573 0.0031 **
#> japanese -0.024672 0.058548 -0.4214 0.6735
#> chinese -0.758227 0.062362 -12.1585 0.0000 ***
#> skorean -0.445432 0.060921 -7.3116 0.0000 ***
#> phevFastcharge 0.211813 0.059977 3.5316 0.0004 ***
#> bevFastcharge 0.217633 0.067059 3.2454 0.0012 **
#> opCost -0.121119 0.004435 -27.3095 0.0000 ***
#> accelTime -0.125424 0.011592 -10.8196 0.0000 ***
#> price_18 -0.191639 0.054849 -3.4939 0.0005 ***
#> price_23 -0.657617 0.057072 -11.5227 0.0000 ***
#> price_32 -1.317372 0.060378 -21.8188 0.0000 ***
#> price_50 -2.546410 0.077090 -33.0317 0.0000 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Model Fit Values:
#>
#> Log.Likelihood. -4614.4422294
#> Null.Log.Likelihood. -6328.0067827
#> AIC. 9266.8845000
#> BIC. 9393.3996000
#> McFadden.R2. 0.2707906
#> Adj..McFadden.R2 0.2677880
#> Number.of.Observations. 5760.0000000
For the second option, you can use the dummyCode() function to create new dummy-coded variables for all the levels of a continuous variable and then use those variables in the model:
cars_us_dummy <- dummyCode(df = cars_us, vars = "price")
names(cars_us_dummy)
#> [1] "id" "obsnum" "choice" "hev"
#> [5] "phev10" "phev20" "phev40" "bev75"
#> [9] "bev100" "bev150" "phevFastcharge" "bevFastcharge"
#> [13] "opCost" "accelTime" "american" "japanese"
#> [17] "chinese" "skorean" "weights" "price"
#> [21] "price_15" "price_18" "price_23" "price_32"
#> [25] "price_50"
The new cars_us_dummy data frame now contains variables for each level of the price column. This approach allows the modeler to specify the baseline level. In this example, I’ll use the price_50 level as the baseline:
model_dummy_price <- logitr(
  data = cars_us_dummy,
  choiceName = 'choice',
  obsIDName = 'obsnum',
  parNames = c(
    "price_15", "price_18", "price_23", "price_32",
    'hev', 'phev10', 'phev20', 'phev40', 'bev75', 'bev100',
    'bev150', 'american', 'japanese', 'chinese', 'skorean',
    'phevFastcharge', 'bevFastcharge', 'opCost', 'accelTime')
)
#> Running Model...
#> Done!
Now price is modeled with a coefficient for all but the price_50 level, which serves as the baseline:
summary(model_dummy_price)
#> =================================================
#> MODEL SUMMARY:
#>
#> Model Space: Preference
#> Model Run: 1 of 1
#> Iterations: 23
#> Elapsed Time: 0h:0m:0.7s
#> Weights Used?: FALSE
#>
#> Model Coefficients:
#> Estimate StdError tStat pVal signif
#> price_15 2.546463 0.077091 33.0321 0.0000 ***
#> price_18 2.354811 0.076783 30.6686 0.0000 ***
#> price_23 1.888845 0.075124 25.1431 0.0000 ***
#> price_32 1.229120 0.074531 16.4915 0.0000 ***
#> hev 0.062127 0.073722 0.8427 0.3994
#> phev10 0.087935 0.078787 1.1161 0.2644
#> phev20 0.121048 0.079657 1.5196 0.1287
#> phev40 0.192727 0.079086 2.4369 0.0148 *
#> bev75 -1.189114 0.087334 -13.6157 0.0000 ***
#> bev100 -0.962333 0.086842 -11.0814 0.0000 ***
#> bev150 -0.711515 0.084279 -8.4424 0.0000 ***
#> american 0.173844 0.058862 2.9534 0.0032 **
#> japanese -0.024653 0.058548 -0.4211 0.6737
#> chinese -0.758252 0.062361 -12.1590 0.0000 ***
#> skorean -0.445564 0.060921 -7.3138 0.0000 ***
#> phevFastcharge 0.211886 0.059977 3.5328 0.0004 ***
#> bevFastcharge 0.217423 0.067058 3.2423 0.0012 **
#> opCost -0.121112 0.004435 -27.3084 0.0000 ***
#> accelTime -0.125420 0.011592 -10.8193 0.0000 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Model Fit Values:
#>
#> Log.Likelihood. -4614.4422281
#> Null.Log.Likelihood. -6328.0067827
#> AIC. 9266.8845000
#> BIC. 9393.3996000
#> McFadden.R2. 0.2707906
#> Adj..McFadden.R2 0.2677880
#> Number.of.Observations. 5760.0000000
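One final note: the "character" and dummyCode() approaches fit the same model with different baselines, which is why the log-likelihood, AIC, and BIC values in the last two summaries are identical. The coefficients differ only by a shift relative to the chosen baseline. For example, using the estimates printed in the summaries above (a quick numeric check, not package output):

# price_15 relative to price_50 (model_dummy_price) should equal the negative
# of price_50 relative to price_15 (model_character_price):
2.546463 - (-(-2.546410)) # ~0.00005, i.e. identical up to optimizer tolerance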