problem-11.14

The boxplot of amount broken up by year shows that the amounts are skewed to the right within each year.
> boxplot(amount ~ factor(year), data=npdb)
    
Once a logarithm is taken, the data appear to come from a roughly symmetric distribution.
> boxplot(log(amount) ~ factor(year), data=npdb)
    
A one-way analysis of variance of log(amount) by year, restricted to the years before 2003, can be performed using lm() as follows:
> res = lm(log(amount) ~ factor(year), data = npdb, subset = year < 2003)
> summary(res)

Call:
lm(formula = log(amount) ~ factor(year), data = npdb, subset = year <
    2003)

Residuals:
    Min      1Q  Median      3Q     Max
-6.7957 -1.2743  0.0014  1.4730  6.3266

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       10.7078     0.0276  387.40   <2e-16 ***
factor(year)2001  -0.4872     0.0519   -9.39   <2e-16 ***
factor(year)2002  -1.2851     0.0955  -13.45   <2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 1.87 on 6789 degrees of freedom
Multiple R-Squared: 0.0336,     Adjusted R-squared: 0.0333
F-statistic:  118 on 2 and 6789 DF,  p-value: <2e-16
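Under R's default treatment coding, the intercept estimates the mean of log(amount) for the baseline year (the first level of factor(year) in the subset, which the output above does not name), and each factor(year) coefficient estimates that year's difference from the baseline. Adding them to the intercept gives the fitted group means; this is a worked reading of the printed estimates, not additional output:

```latex
\begin{aligned}
\hat\mu_{\text{baseline}} &= \hat\beta_0 = 10.7078 \\
\hat\mu_{2001} &= \hat\beta_0 + \hat\beta_{2001} = 10.7078 - 0.4872 = 10.2206 \\
\hat\mu_{2002} &= \hat\beta_0 + \hat\beta_{2002} = 10.7078 - 1.2851 = 9.4227
\end{aligned}
```

The mean of log(amount) therefore decreases from year to year over this range.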
    
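The p-value for the F-test is tiny (less than 2e-16), so data like these would be very unlikely if the null hypothesis, that the mean of log(amount) is the same for every year, were true. We therefore reject the hypothesis of equal means.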
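As a consistency check on the summary, the overall F-statistic for a one-way model with k groups and n observations can be written in terms of R². Here k = 3 groups and n - k = 6789 residual degrees of freedom; because the printed R² is rounded, the result only approximately matches the reported value:

```latex
F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}
  = \frac{0.0336/2}{(1-0.0336)/6789}
  = \frac{0.0168}{0.9664/6789}
  \approx 118
```

This agrees with the reported F-statistic of 118 on 2 and 6789 degrees of freedom.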