problem-4.17

The following commands will create the boxplots:
> attach(npdb)
> tmp = split(amount,ID)
> df = data.frame(sum=sapply(tmp,sum),number=sapply(tmp,length))
> boxplot(sum ~ number, data = df) ## or even better
> boxplot(log(sum) ~ number, data = df)
> detach(npdb)
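To see what the split() and sapply() step produces before plotting, here is a minimal sketch on a made-up data frame. The name `toy` and all of its values are hypothetical stand-ins for npdb; only the `ID` and `amount` columns are assumed, and `$` indexing is used in place of attach().

```r
# Hypothetical stand-in for npdb: one row per award, ID identifies the doctor
toy <- data.frame(ID = c("a", "b", "a", "c", "b", "a"),
                  amount = c(100, 250, 50, 400, 150, 25))

# split() groups the award amounts by doctor ID into a named list
by_id <- split(toy$amount, toy$ID)

# One row per doctor: total amount paid and number of awards
per_doc <- data.frame(sum = sapply(by_id, sum),
                      number = sapply(by_id, length))
print(per_doc)
```

Each row of `per_doc` corresponds to one doctor, which is the shape that boxplot(sum ~ number, ...) expects.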
    
Based on the latter graph, the per-doctor totals for those with two or more awards appear higher; the total amounts across the groups, however, aren't even comparable. To see this, we can again use split() and then sapply() as follows:
> attach(df)
> tmp =  sapply(split(sum,number),sum)
> tmp
         1          2          3          4          5
1034406350   81199650    4400500    2593750    1995000
         6          8         11         15         22
   1090000     960000     243550    1492500     855250
        73
    813500
> tmp/sum(tmp)
        1         2         3         4         5         6
0.9153633 0.0718549 0.0038941 0.0022953 0.0017654 0.0009646
        8        11        15        22        73
0.0008495 0.0002155 0.0013207 0.0007568 0.0007199
> detach(df)
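The group totals and their shares can also be computed without attach(), which sidesteps having a column named `sum` on the search path alongside the function sum(). The following is only a sketch using base R's tapply() and prop.table(); the data frame `toy_df` and its values are hypothetical, built in the same shape as df.

```r
# Hypothetical per-doctor summary in the same shape as df (values made up)
toy_df <- data.frame(sum = c(175, 400, 400, 25),
                     number = c(3, 2, 1, 1))

# Total amount within each award-count group
totals <- tapply(toy_df$sum, toy_df$number, sum)

# Share of the grand total; equivalent to totals / sum(totals)
shares <- prop.table(totals)
print(round(shares, 3))
```

With the real data, replacing `toy_df` by `df` should reproduce the proportions shown in the transcript.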
    
(An obvious complaint, that there aren't enough years in the data set to catch all the repeat offenders, is valid. The full data set shows a much less skewed picture.)