Exercise 4: Multinomial probit

Kenneth Train and Yves Croissant

2020-10-02

We have data on the mode choice of 453 commuters. Four modes are available: (1) bus, (2) car alone, (3) carpool, and (4) rail. We have data for each commuter on the cost and time on each mode and the chosen mode.

mlogit estimates the multinomial probit model if the probit argument is TRUE using the GHK procedure. This program estimates the full covariance matrix subject to normalization.

More precisely, utility differences are computed respective to the reference level of the response (by default the bus alternative) and the 3 \(\times\) 3 matrix of covariance is estimated. As the scale of utility is unobserved, the first element of the matrix is further set to 1. The Choleski factor of the covariance is :

\[ L = \left( \begin{array}{ccc} 1 & 0 & 0 \\ \theta_{32} & \theta_{33} & 0 \\ \theta_{42} & \theta_{43} & \theta_{44} \end{array} \right) \]

that is, five covariance parameters are estimated. The covariance matrix of the utility differences is then \(\Omega = L L^{\top}\). By working in a Choleski factor that has this form, the normalization constraints are automatically imposed and the covariance matrix is guaranteed to be positive semi-definite (with the covariance matrix for any error differences being positive-definite).

Estimate a model where mode choice is explained by the time and the cost of an alternative, using 100 draws and set the seed to

Calculate the covariance matrix that is implied by the estimates of the Choleski factor. What, if anything, can you say about the degree of heteroskedasticity and correlation? For comparison, what would the covariance matrix be if there were no heteroskedasticity or correlation (ie, iid errors for each alternative)? Can you tell whether the covariance is higher between car alone and carpool or between bus and rail?

library("mlogit")
data("Mode", package="mlogit")
Mo <- dfidx(Mode, choice = "choice", varying = 2:9)

p1 <- mlogit(choice ~ cost + time, Mo, seed = 20, 
             R = 100, probit = TRUE)

summary(p1)

## 
## Call:
## mlogit(formula = choice ~ cost + time, data = Mo, probit = TRUE, 
##     R = 100, seed = 20)
## 
## Frequencies of alternatives:choice
##     bus     car carpool    rail 
## 0.17881 0.48124 0.07064 0.26932 
## 
## bfgs method
## 20 iterations, 0h:0m:18s 
## g'(-H)^-1g = 7.71E-07 
## gradient close to zero 
## 
## Coefficients :
##                       Estimate Std. Error z-value  Pr(>|z|)    
## (Intercept):car      1.8308660  0.2506434  7.3047 2.780e-13 ***
## (Intercept):carpool -1.2816819  0.5677813 -2.2574 0.0239861 *  
## (Intercept):rail     0.3093510  0.1151701  2.6860 0.0072305 ** 
## cost                -0.4134401  0.0731593 -5.6512 1.593e-08 ***
## time                -0.0466552  0.0068263 -6.8347 8.220e-12 ***
## car.carpool          0.2599724  0.3850288  0.6752 0.4995472    
## car.rail             0.7364869  0.2145744  3.4323 0.0005985 ***
## carpool.carpool      1.3078947  0.3916729  3.3393 0.0008400 ***
## carpool.rail        -0.7981842  0.3463735 -2.3044 0.0212000 *  
## rail.rail            0.4301303  0.4874624  0.8824 0.3775677    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -347.92
## McFadden R^2:  0.36012 
## Likelihood ratio test : chisq = 391.62 (p.value = < 2.22e-16)

The estimated Choleski factor \(L_1\) is :

L1 <- matrix(0, 3, 3)
L1[! upper.tri(L1)] <- c(1, coef(p1)[6:10])

Multiplying L1 by its transpose gives \(\Omega_1\) :

L1 %*% t(L1)

##           [,1]       [,2]       [,3]
## [1,] 1.0000000  0.2599724  0.7364869
## [2,] 0.2599724  1.7781743 -0.8524746
## [3,] 0.7364869 -0.8524746  1.3645231

With iid errors, \(\Omega_1\) would be :

\[ \left( \begin{array}{ccc} 1 & 0.5 & 0.5 \\ 0.5 & 1 & 0.5 \\ 0.5 & 0.5 & 1 \\ \end{array} \right) \]

I find it hard to tell anything from the estimated covariance terms.

I agree: it is hard -- if not impossible -- to meaningfully interpret the covariance parameters when all free parameters are estimated. However, the substitutiuon patterns that the estimates imply can be observed by forecasting with the model; we do this in exercise 4 below. Also, when structure is placed on the covariance matrix, the estimates are usually easier to interpret; this is explored in exercise 6.]

Change the seed to 21 and rerun the model. (Even though the seed is just one digit different, the random draws are completely different.) See how much the estimates change. Does there seem to be a relation between the standard error of a parameter and the amount that its estimate changes with the new seed? (Of course, you are only getting two estimates, and so you are not estimating the true simulation variance very well. But the two estimates will probably give you an indication.)

p2 <- mlogit(choice ~ cost + time, Mo, seed = 21, 
             R = 100, probit = TRUE)

coef(p2)

##     (Intercept):car (Intercept):carpool    (Intercept):rail 
##          1.87149948         -1.28893595          0.31455318 
##                cost                time         car.carpool 
##         -0.43068703         -0.04752315          0.22888163 
##            car.rail     carpool.carpool        carpool.rail 
##          0.69781113          1.33071717         -0.56802431 
##           rail.rail 
##          0.71060138

The estimates seem to change more for parameters with larger standard error, though this is not uniformly the case by any means. One would expect larger samplign variance (which arises from a flatter \(\ln L\) near the max) to translate into greater simulation variance (which raises when the location of the max changes with different draws).

Compute the probit shares (average probabilities) under user-specified parameters and data. How well do predicted shares match the actual share of commuters choosing each mode?

The actual shares are :

actShares <- tapply(Mo$choice, Mo$id2, mean)

The predicted shares are :

predShares <- apply(fitted(p1, outcome = FALSE), 2, mean)
rbind(predShares, actShares)

##                  bus       car    carpool      rail
## predShares 0.1759623 0.4823958 0.06923727 0.2720977
## actShares  0.1788079 0.4812362 0.07064018 0.2693157

sum(predShares)

## [1] 0.9996931

The correspondence is very close but not exact.

Note: Simulated GHK probabilities do not necessarily sum to one over alternatives. This summing-up error at the individual level tends to cancel out when the probabilities are averaged over the sample. The forecasted shares (ie, average probabilities) sum to 0.9991102, which is only slightly different from 1.

We want to examine the impact of a large tax on driving alone. Raise the cost of the car alone mode by 50% and forecast shares at these higher costs. Is the substitution proportional, as a logit model would predict? Which mode has the greatest percent increase in demand? What is the percent change in share for each mode?

Mo2 <- Mo
Mo2[idx(Mo2, 2) == 'car', 'cost'] <- Mo2[idx(Mo2, 2) == 'car', 'cost'] * 2
newShares <- apply(predict(p1, newdata = Mo2), 2, mean)
cbind(original = actShares, new = newShares, 
      change = round((newShares - actShares) / actShares * 100))

##           original       new change
## bus     0.17880795 0.2517897     41
## car     0.48123620 0.1689331    -65
## carpool 0.07064018 0.1432554    103
## rail    0.26931567 0.4356097     62

Substitution is not proportional. Carpool gets the largest percent increase.