Introduction

This is a companion to the Wikiversity article on “Time to extinction of civilization”. That article assumes the 1962 Cuban Missile Crisis and the 1983 Soviet nuclear false alarm incident provide one observation on the time between major nuclear crises, with a second time between such crises being censored at the present.

What can we say about the distribution of the time between such major nuclear crises?

With one observed time and a second censored, we can construct a likelihood function, which we can then use to estimate the mean time between such crises and the uncertainty in that estimate. With further estimates of the probability that such a crisis would lead to a nuclear war and nuclear winter, we can simulate such times and obtain plausible bounds on uncertainty in our estimates.

This methodology could later be expanded to consider a larger list of nuclear crises with a broader range of probabilities for each crisis escalating to a nuclear war and winter. The fact that no such nuclear war has occurred as of this writing puts an upper limit on such probabilities. A rough lower limit can be estimated from comments from people like Robert McNamara and Daniel Ellsberg, both of whom have said that as long as there are large nuclear arsenals on earth, it is only a matter of time before a nuclear crises escalates to such a nuclear Armageddon. McNamara was US Secretary of Defense during the 1962 Cuban Missile Crisis, and Ellsberg as a leading nuclear war planner advising McNamara and the rest of President Kennedy’s team during that crisis. For more on this, see the companion Wikiversity article on “Time to extinction of civilization”.

We start by being explicit about the observed and censored times between major nuclear crises.

Times of major nuclear crises

str(eventDates <- c(as.Date(
  c('1962-10-16', '1983-09-26')), Sys.Date()))
##  Date[1:3], format: "1962-10-16" "1983-09-26" "2022-10-06"
(daysBetween <- difftime(tail(eventDates, -1), 
        head(eventDates, -1), units='days'))
## Time differences in days
## [1]  7650 14255
(yearsBetween <- as.numeric(daysBetween)/365.24)
## [1] 20.94513 39.02913
names(yearsBetween) <- c('observed', 'censored')
str(yearsBetween)
##  Named num [1:2] 20.9 39
##  - attr(*, "names")= chr [1:2] "observed" "censored"

Likelihood of times between major nuclear crises

Appendix 1 of that Wikiversity article provides the following likelihood assuming we observe times between major nuclear crises, \(T_1, ..., T_{k-1},\) plus one censoring time, \(T_k\), and the times between such crises follow an exponential distribution that does not change over time:

\[ L(\lambda | \mathbf{T}) = \exp[−S_k / \lambda ] / \lambda^{k-1} \]

where \(\mathbf{T}\) = the vector consisting of \(T_1, ..., T_k\), and

\[ S_k = \sum_{i=1}^k{T_i}. \]

The exponential distribution is the simplest lifetime distribution. It is widely used for applications like this and seems reasonable in this context.

[For setting math in RMarkdown, we are following Cosma Shalizi (2016) “Using R Markdown for Class Reports”.]

We code this as follows:

Lik <- function(lambda, Times=yearsBetween){
  Lik <- (exp(-sum(Times)/lambda) / 
      (lambda^(length(Times)-1)))
  Lik
}

From this, we compute the log(likelihood) as follows:

\[ l(\lambda | \mathbf{T}) = [(−S_k / \lambda) - (k-1)\log(\lambda)]. \]

We code this as follows:

logLk <- function(lambda, Times=yearsBetween){
  logL <- (-sum(Times)/lambda - 
        (length(Times)-1)*log(lambda))
  logL
}

By differentiating \(l\) with respect to \(\lambda\) or \(u = \log(\lambda)\) or \(\theta = 1/\lambda\), we get a score function that is zero when \(\lambda\) is \(\sum T_i/(k-1)\), where the “-1” comes from assuming that only the last of Times is censored.

The value of parameter(s) that maximize the likelihood is (are) called maximum likelihood estimates (MLEs), and it is standard to distinguish an MLE with a circumflex (^). We use this convention to write the following:

\[ \hat\lambda = \sum T_i / (k-1). \]

This is commonly read “lambda hat”. We code it as follows:

(lambdaHat <- (sum(yearsBetween) / 
                 (length(yearsBetween)-1)))
## [1] 59.97426

From Wilks’ theorem, we know that 2*log(likelihood ratio) is approximately Chi-squared with degrees of freedom equal to the number of parameters estimated, which is 1 in this case.

#(chisq2 <- qchisq(
#             c(.8, .95, .99, .999, 1-1e-6), 1))
(chisq2 <- qchisq(c(.2, .05, .01, .001, 1e-6), 
                  1, lower.tail=FALSE))
## [1]  1.642374  3.841459  6.634897 10.827566 23.928127

In reality, because of the questionable nature of our assumptions, we may wish to place less confidence in these numbers than what is implied by the stated confidence levels. However, we will not change these numbers but leave it to the reader to downgrade them as seems appropriate.

Also, in the following, we will mark the 80, 95, and 99 percent confidence intervals on the plots, leaving the more extreme tails for separate computations. For now, we want to plot 2*log(likelihood) in a neighborhood of the MLE. For this, we will focus on the region that is closer than chisq2/2 of the maximum:

lambda <- lambdaHat+seq(-50, 1000, 50)
(logLR2 <- 2*(logLk(lambdaHat) - logLk(lambda)))
##  [1] 6.4379881 0.0000000 0.3033580 0.7119937 1.0773924 1.3947204 1.6721103
##  [8] 1.9174482 2.1369335 2.3352823 2.5160945 2.6821542 2.8356477 2.9783179
## [15] 3.1115741 3.2365704 3.3542632 3.4654531 3.5708169 3.6709316 3.7662931
## [22] 3.8573305

After several attempts at adjusting the parameters for the seq function while including 0 in the sequence, I gave up trying to get a range with 0 inside and just over qchisq(0.99, 1) = 6.63 on both ends: Obviously, I got it for \(\lambda\) small. However, it seemed infeasible to do this with only a relatively few points for \(\lambda\) large: The MLE here is only the second of 22 evenly-spaced points on the \(\lambda\) scale while the value for the 22nd point is not close to the target 6.63. Let’s try \(u = \log(\lambda)\):

l_lam <- log(lambdaHat)+seq(-2, 6, 1)
(logLR2_u <- 2*(logLk(lambdaHat) - logLk(exp(l_lam))))
## [1]  8.7781122  1.4365637  0.0000000  0.7357589  2.2706706  4.0995741  6.0366313
## [8]  8.0134759 10.0049575

This seems more sensible, though still somewhat skewed, with the MLE as the third of 9 evenly-spaced points on the \(u\) scale. What about \(\theta = 1/\lambda\)?

theta <- (1/lambdaHat + seq(-.016, .06, .004))
(logLR2_th <- 2*(logLk(lambdaHat) - logLk(1/theta)))
##  [1] 4.49809127 1.10434426 0.34747298 0.06880869 0.00000000 0.04973740
##  [7] 0.17578229 0.35509289 0.57370772 0.82252381 1.09527493 1.38746302
## [13] 1.69574980 2.01758940 2.35099587 2.69439012 3.04649587 3.40626684
## [19] 3.77283463 4.14547051

This looks worse: With \(\hat\theta = 1/\hat\lambda = 0.0178\), it seems infeasible to get a sequence of only a few equally-spaced positive numbers that include 0 and still produce numbers just over qchisq(0.99, 1) = 6.63 on either ends, as we created on the \(\lambda\) scale, let alone both as we have on the \(u\) scale.

This suggests we should parameterize our analysis in terms of \(u = \log(\lambda)\). To confirm this, let’s create plots on all three scales, starting with \(u\).

However, to save space on CRAN, we will not plot them by default; to see the plots, a user will need to manually set makePlots <- TRUE:

makePlots <- FALSE 

library(grDevices)
outType <- ''
#outType = 'png'

switch(outType, 
       svg=svg('yrs2Armageddon.svg'), 
#   need png(..., 960, 960), because the default 480 
#   is not sufficiently clear to easily read the labels       
       png=png('yrs2Armageddon.png', 960, 960)
)
op <- par(mar=c(6, 4, 4, 2)+.1)

# Experiment with the range of "seq" here until 
# head and tail of logLR2_u. are just over 6.63:
u. <- log(lambdaHat)+seq(-1.86, 4.36, .02)
lam. <- exp(u.)
logLR2_u. <- 2*(logLk(lambdaHat) - logLk(lam.))
head(logLR2_u., 1)
## [1] 7.127474
tail(logLR2_u., 1)
## [1] 6.745557
if(makePlots){
  plot(lam., logLR2_u., type='l', bty='n', log='x', 
     xlab='', ylab='', las=1, axes=FALSE, lwd=2)
  axis(1, padj=-1)
  axis(2, las=1)

# xlab = \lambda:  
# Greek letters did not render in GIMP 2.10.8 on 2018-12-30, 
# so don't use svg until this is fixed.  
  switch(outType, 
# cex doesn't work properly with svg > GIMP 
# Therefore, I can NOT use svg
    svg={cex2 <- 2; mtext('lambda', 1, 1.6, cex=cex2)}, 
    png={cex2 <- 2; mtext(expression(lambda), 1, 
                        1.6, cex=cex2)}, 
    {cex2 <- 1.3; mtext(expression(lambda), 1, 
                       1.6, cex=cex2)}
  )

  lamTicks <- axTicks(1)
  thTicks <- 1/lamTicks
  axis(1, lamTicks, thTicks, line=3, padj=-1)
  switch(outType, 
    svg=mtext('theta == 1/lambda', 1, 4.9, cex=cex2), 
    mtext(expression(theta == 1/lambda), 1, 4.9, cex=cex2)
  )

  abline(h=chisq2, col='red', lty=c('dotted', 'dashed'), 
       lwd=2)
  (CI.8 <- range(lam.[logLR2_u. <= chisq2[1]]))
  text(lambdaHat, chisq2[1], 
     paste0('80% CI =\n(', 
       paste(round(CI.8), collapse=', '), ')'), 
     cex=cex2)

  (CI.95 <- range(lam.[logLR2_u. <= chisq2[2]]))
  text(lambdaHat, chisq2[2], 
     paste0('95% CI =\n(', 
       paste(round(CI.95), collapse=', '), ')'), 
     cex=cex2)
  abline(v=CI.8, col='red', lty='dotted', lwd=2)
  abline(v=CI.95, col='red', lty='dashed', lwd=2)

  (CI.99 <- range(lam.[logLR2_u. <= chisq2[3]]))
  text(lambdaHat, chisq2[3], 
     paste0('99% CI =\n(', 
       paste(round(CI.99), collapse=', '), ')'), 
     cex=cex2)

  abline(v=CI.8, col='red', lty='dotted', lwd=2)
  abline(v=CI.95, col='red', lty='dashed', lwd=2)
  abline(v=CI.99, col='red', lty='dashed', lwd=2)
  if(outType != '')dev.off()

  par(op)
}

Let’s produce this same plot without the log scale for \(\lambda\):

# copy the code from the last snippet 
# and delete "log='x'", then adjust the placement 
# of CI text
switch(outType, 
       svg=svg('yrs2Armageddon_lin.svg'), 
#   need png(..., 960, 960), because the default 480 
#   is not sufficiently clear to easily read the labels
       png=png('yrs2Armageddon_lin.png', 960, 960)
)
op <- par(mar=c(6, 4, 4, 2)+.1)

u. <- log(lambdaHat)+seq(-1.86, 4.36, .02)
lam. <- exp(u.)
logLR2_u. <- 2*(logLk(lambdaHat) - logLk(lam.))
head(logLR2_u., 1)
## [1] 7.127474
tail(logLR2_u., 1)
## [1] 6.745557
if(makePlots){
  plot(lam., logLR2_u., type='l', bty='n', 
     xlab='', ylab='', las=1, axes=FALSE, lwd=2)
  axis(1, padj=-1)
  axis(2, las=1)

# xlab = \lambda:  
# Greek letters did not render in GIMP 2.10.8 on 2018-12-30, 
# so don't use svg until this is fixed.  
  switch(outType, 
# cex doesn't work properly with svg > GIMP 
# Therefore, I can NOT use svg
    svg={cex2 <- 2; mtext('lambda', 1, 1.6, cex=cex2)}, 
    png={cex2 <- 2; mtext(expression(lambda), 1, 
                          1.6, cex=cex2)}, 
    {cex2 <- 1.3; mtext(expression(lambda), 1, 
                        1.6, cex=cex2)}
  )

  lamTicks <- axTicks(1)
  thTicks <- 1/lamTicks
  axis(1, lamTicks, thTicks, line=3, padj=-1)
  switch(outType, 
    svg=mtext('theta == 1/lambda', 1, 4.9, cex=cex2), 
    mtext(expression(theta == 1/lambda), 1, 4.9, cex=cex2)
  )

  abline(h=chisq2, col='red', lty=c('dotted', 'dashed'), 
       lwd=2)
  (CI.8 <- range(lam.[logLR2_u. <= chisq2[1]]))
#text(lambdaHat, chisq2[1], 
  text(400, chisq2[1], 
      paste0('80% CI =\n(', 
       paste(round(CI.8), collapse=', '), ')'), 
     cex=cex2)

  (CI.95 <- range(lam.[logLR2_u. <= chisq2[2]]))
#text(lambdaHat, chisq2[2], 
  text(800, chisq2[2], 
     paste0('95% CI =\n(', 
       paste(round(CI.95), collapse=', '), ')'), 
     cex=cex2)
  abline(v=CI.8, col='red', lty='dotted', lwd=2)
  abline(v=CI.95, col='red', lty='dashed', lwd=2)

  (CI.99 <- range(lam.[logLR2_u. <= chisq2[3]]))
#text(lambdaHat, chisq2[3], 
  text(3000, chisq2[3],      
     paste0('99% CI =\n(', 
       paste(round(CI.99), collapse=', '), ')'), 
     cex=cex2)

  abline(v=CI.8, col='red', lty='dotted', lwd=2)
  abline(v=CI.95, col='red', lty='dashed', lwd=2)
  abline(v=CI.99, col='red', lty='dashed', lwd=2)

  if(outType != '')dev.off()
}
par(op)

The plot vs. \(\log(\lambda)\) is obviously skewed, but this linear plot is vastly worse.

What about linear in \(\theta = 1/\lambda\)?

switch(outType, 
       svg=svg('yrs2Armageddon_inverse.svg'), 
       png=png('yrs2Armageddon_inverse.png', 960, 960)
)
op <- par(mar=c(6, 4, 4, 2)+.1)

# This will require more changes than just deleting log='x':
u. <- log(lambdaHat)+seq(-1.86, 4.36, .02)
lam. <- exp(u.)
logLR2_u. <- 2*(logLk(lambdaHat) - logLk(lam.))
head(logLR2_u., 1)
## [1] 7.127474
tail(logLR2_u., 1)
## [1] 6.745557
if(makePlots){
  plot(-1/lam., logLR2_u., type='l', bty='n', 
     xlab='', ylab='', las=1, axes=FALSE, lwd=2)

  thTicks <- (-axTicks(1))
  axis(1, -thTicks, abs(1/thTicks), padj=-1)
  axis(2, las=1)

# xlab = \lambda:  
# Greek letters did not render in GIMP 2.10.8 on 2018-12-30, 
# so don't use svg until this is fixed.  
  switch(outType, 
# cex doesn't work properly with svg > GIMP 
# Therefore, I can NOT use svg
    svg={cex2 <- 2; mtext('lambda', 1, 1.6, cex=cex2)}, 
    png={cex2 <- 2; mtext(expression(lambda), 1, 
                          1.6, cex=cex2)}, 
    {cex2 <- 1.3; mtext(expression(lambda), 1, 
                        1.6, cex=cex2)}
  )

  axis(1, -thTicks, thTicks, line=3, padj=-1)
  switch(outType, 
    svg=mtext('theta == 1/lambda', 1, 4.9, cex=cex2), 
    mtext(expression(theta == 1/lambda), 1, 4.9, cex=cex2)
  )

  abline(h=chisq2, col='red', lty=c('dotted', 'dashed'), 
       lwd=2)
  (CI.8 <- range(lam.[logLR2_u. <= chisq2[1]]))
#text(lambdaHat, chisq2[1], 
  text(-.02, chisq2[1], 
      paste0('80% CI =\n(', 
       paste(round(CI.8), collapse=', '), ')'), 
     cex=cex2)

  (CI.95 <- range(lam.[logLR2_u. <= chisq2[2]]))
#text(lambdaHat, chisq2[2], 
  text(-.04, chisq2[2], 
     paste0('95% CI =\n(', 
       paste(round(CI.95), collapse=', '), ')'), 
     cex=cex2)
  abline(v=CI.8, col='red', lty='dotted', lwd=2)
  abline(v=CI.95, col='red', lty='dashed', lwd=2)

  (CI.99 <- range(lam.[logLR2_u. <= chisq2[3]]))
#text(lambdaHat, chisq2[3], 
  text(-.06, chisq2[3],      
     paste0('99% CI =\n(', 
       paste(round(CI.99), collapse=', '), ')'), 
     cex=cex2)

  abline(v=-1/CI.8, col='red', lty='dotted', lwd=2)
  abline(v=-1/CI.95, col='red', lty='dashed', lwd=2)
  abline(v=-1/CI.99, col='red', lty='dashed', lwd=2)

  if(outType != '')dev.off()
}
par(op)

Clearly, we don’t want to mess with the \(\theta = 1/\lambda\) scale, and $ () seems the best for understanding what’s happening here.

Monte Carlo the time between major nuclear crises

If we had a probability distribution for \(\lambda\), \(u = \log(\lambda)\), or \(\theta = 1/\lambda\), we could simulate that. To get such, we recall that one statement of Bayes’ theorem is that the posterior is proportional to the likelihood times the prior.

However, what should we use as a prior? The Wikipedia article on the exponential distribution describes several more or less standard priors for that distribution. There’s not just one, and they all seem more complicated than what we need here. Instead, we will use the improper prior that is uniform in \(u = \log(\lambda)\). To support this, we note that the exponential distribution is closer to the lognormal than to a normal, and the distribution of the reciprocal of an exponential random variable is even farther from normal, as we see in the following simulation:

set.seed(1)
simExp <- rexp(1000)
if(makePlots){
  qqnorm(simExp, datax=TRUE)
  qqnorm(simExp, datax=TRUE, log='x')
  qqnorm(1/simExp, datax=TRUE)
}

Let’s rewrite the above likelihood in terms of \(u\):

\[ L(u | \mathbf{T}) = \exp[−S_k e^{-u} - (k-1)u]. \]

With an improper prior locally uniform in \(u = \log(\lambda)\), we get the following:

\[ P(a < \lambda \leq b | \mathbf{T}) \propto \int_{\log(a)}^{\log(b)}{\exp[-S_k e^{-u}] e^{-(k-1)u} du} \]

Let’s transform this back by replacing \(u\) with \(\lambda = e^u\):

\[ P(a < \lambda \leq b | \mathbf{T}) \propto \int_a^b \exp[-S_k / \lambda] \lambda^{-k} d\lambda \]

This says that the posterior for \(\lambda\) follows an inverse-gamma distribution with shape parameter \((k-1)\) and scale \(S_k\). The moments for this distribution are as follows:

\[ \mathbb{E}(\lambda^r | \mathbf{T}) = S_k^r \Gamma(k-1-r) / \Gamma(k-1). \]

If \((k-1-r)\) is an integer less than 1, this is infinite, which it is for \(k\) = 2 and \(r\) = 1, the case of most interest here. This, in turn, means that the sample moments of real or Monte Carlo data will be highly erratic.

This also elevates the priority for increasing \(k\) by considering a larger list of nuclear crises, as previously mentioned.

Functions to compute the density, cumulative probability distribution (CDF), quantiles, and random numbers for this distribution are available in the CRAN package invgamma. We will use those in the following.

library(invgamma)
set.seed(123)

rlambda2 <- function(n, sumTimes=lambdaHat){
#  -sumTimes/log(runif(n))
# k <- 2; rinvgamma(n k-1, scale=sumTimes)  
  rinvgamma(n, 1, rate=sumTimes)
}
simLam <- rlambda2(1e4)
quantile(simLam)
##           0%          25%          50%          75%         100% 
## 5.732371e+00 4.381998e+01 8.829476e+01 2.090371e+02 2.569904e+05
mean(simLam)
## [1] 467.9889

This distribution is obviously highly skewed as expected, with a mode (MLE) at 56 years and a mean estimated here at 439

Probability that a major nuclear war might lead to the extinction of civilization

The analysis in the Wikiversity article on “time to extinction of civilization” includes estimated of the probability that a major nuclear crisis like the 1962 Cuban Missile Crisis or the 1983 Soviet nuclear false alarm incident would lead to a major nuclear war. The numbers given there ranged from 0.3 to 0.6 with a typical number of 0.45. For present purposes, we shall assume that (0.3, 0.6) represent an 80 percent, equal tail confidence interval of a beta distribution and estimate its two shape parameters, \(\alpha\) and \(\beta\).

Doing this requires an iteration, because no simple formula exists for this. We want to shape1 = \(\alpha\) and shape2 = \(\beta\) to satisfy the following:

0.1 = pbeta(0.3, shape1, shape2)
0.9 = pbeta(0.6, shape1, shape2)

The most reliable way to solve equations like these is to convert this into a minimization problem and use something like optim:

Dev2 <- function(shapes, p=c(.3, .6), q=c(.1, .9)){
  devs <- (q - pbeta(p, shapes[1], shapes[2]))
  sum(devs^2)
}
# test
Dev2(c(1, 1))
## [1] 0.13

The beta distribution with parameters (1, 1) is just the uniform distribution. Manual computation shows that this is the correct answer for this case.

(betaSolve <-optim(c(1,1), Dev2, 
            method="L-BFGS-B", lower=c(0,0)))
## $par
## [1] 7.940077 9.757428
## 
## $value
## [1] 3.313248e-14
## 
## $counts
## function gradient 
##       16       16 
## 
## $convergence
## [1] 0
## 
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

What’s the mean of this distribution?

Recall that the mean of the \(B(\alpha, \beta)\) distribution is \(\alpha / (\alpha+\beta)\), and its variance is as follows:

\[ \mathrm{var}(Q) = \alpha\beta / [(\alpha+\beta)^2 (\alpha+\beta+1)] \]

a.b <- sum(betaSolve$par)
(meanBeta <- betaSolve$par[1]/a.b)
## [1] 0.4486552
(varBeta <- with(betaSolve, par[1]*par[2] / 
            (a.b^2 * (a.b+1))))
## [1] 0.01322977

That’s quite close to the representative value of 0.45 discussed in the companion Wikiversity article on “time to extinction of civilization”.

Monte Carlo time to extinction of civilization

This section will start with a function to generate N random times to Armageddon as follows:

  1. start timing
  2. Generate N random variates Q ~ \(B(\alpha, \beta)\) indicating the probability that each simulated crisis in a sequence would produce a nuclear Armageddon.
  3. From this, generate N random variables K ~ \(NB(Q, 1)\) indicating the number of simulated crises in a series required to produce one nuclear Armageddon with Q[i] = probability of each being the last, for i = 1, ..., N.
  4. For each i, compute Time[i] <- sum(rlambda2(K[i])).
  5. compute elapsed.time
  6. Return (Time, gammapars, gammaGOF, elapsed.time)

First do this with set.seed(1), N=10 and time the result. Then set.seed(j), N=10^j, j = 2, 3, …, timing each one. Save the results until the time gets too long to continue or we get to N = 1e7.

mcArmageddon <- function(N, 
    betapars=betaSolve$par, 
    sumTimes=lambdaHat){
# 1.  Start time
  start <- proc.time()
# 2.  Q ~ B(\alpha, \beta)
  Q <- rbeta(N, betapars[1], betapars[2])
# 3.  K ~ NB(Q, 1)
  K <- (1+rnbinom(N, 1, Q))
# 4.  Time[i] <- sum(rlambda2(K[i]))
  Time <- numeric(N)
  for(i in 1:N){
    Time[i] <- sum(rlambda2(K[i], 
          sumTimes=sumTimes))
  }
  attr(Time, 'Qbar') <- mean(Q)
  attr(Time, 'quantileQ') <- quantile(Q)
  attr(Time, 'Kbar') <- mean(K)
  attr(Time, 'quantileK') <- quantile(K)
# 5.  quantiles
  cat('meanTime = ', mean(Time), '\n')
  print(quantile(Time))
# 6.  elapsed.time 
  et <- (proc.time()-start)
# 7.  Return et as an attribute
  attr(Time, 'elapsed.time') <- et
  cat('et = ', et, '\n')
  Time
}  

set.seed(1)
(mcArm1 <- mcArmageddon(10))
## meanTime =  476.8543 
##         0%        25%        50%        75%       100% 
##   18.64681  216.15279  248.36926  612.35102 1915.65221 
## et =  0.002 0 0.002 0 0
##  [1]  249.32887  207.48353  247.40965  242.16057 1915.65221  367.73845
##  [7]  802.11009   24.12387   18.64681  693.88855
## attr(,"Qbar")
## [1] 0.4652404
## attr(,"quantileQ")
##        0%       25%       50%       75%      100% 
## 0.3381862 0.3763977 0.4832317 0.5115655 0.6668375 
## attr(,"Kbar")
## [1] 2.7
## attr(,"quantileK")
##   0%  25%  50%  75% 100% 
##  1.0  1.0  2.5  3.0  9.0 
## attr(,"elapsed.time")
##    user  system elapsed 
##   0.002   0.000   0.002

This all looks sensible. Let’s try larger sample sizes:

set.seed(2)
mcArm2 <- mcArmageddon(100)
## meanTime =  1311.944 
##          0%         25%         50%         75%        100% 
##    20.92154   125.42700   259.32291   723.06359 47207.13799 
## et =  0.002 0 0.001 0 0
attributes(mcArm2)
## $Qbar
## [1] 0.4447869
## 
## $quantileQ
##        0%       25%       50%       75%      100% 
## 0.1327921 0.3396490 0.4205039 0.5277240 0.7451057 
## 
## $Kbar
## [1] 2.5
## 
## $quantileK
##   0%  25%  50%  75% 100% 
##    1    1    2    3   18 
## 
## $elapsed.time
##    user  system elapsed 
##   0.002   0.000   0.001
set.seed(3)
mcArm3 <- mcArmageddon(1000)
## meanTime =  874.5373 
##           0%          25%          50%          75%         100% 
## 9.354686e+00 8.827015e+01 2.502618e+02 7.086187e+02 1.152045e+05 
## et =  0.007 0.001 0.007 0 0
attributes(mcArm3)
## $Qbar
## [1] 0.4454203
## 
## $quantileQ
##        0%       25%       50%       75%      100% 
## 0.1536354 0.3622609 0.4477545 0.5233040 0.7677098 
## 
## $Kbar
## [1] 2.441
## 
## $quantileK
##   0%  25%  50%  75% 100% 
##    1    1    2    3   16 
## 
## $elapsed.time
##    user  system elapsed 
##   0.007   0.001   0.007

N = 1000 still takes only 0.009 seconds.

set.seed(4)
mcArm4 <- mcArmageddon(1e4)
## meanTime =  1502.584 
##           0%          25%          50%          75%         100% 
## 7.041540e+00 8.256632e+01 2.368599e+02 6.662689e+02 1.870828e+06 
## et =  0.066 0.007 0.073 0 0
attributes(mcArm4)
## $Qbar
## [1] 0.4486821
## 
## $quantileQ
##        0%       25%       50%       75%      100% 
## 0.1097186 0.3672180 0.4469580 0.5286158 0.8215966 
## 
## $Kbar
## [1] 2.4073
## 
## $quantileK
##   0%  25%  50%  75% 100% 
##    1    1    2    3   34 
## 
## $elapsed.time
##    user  system elapsed 
##   0.066   0.007   0.073

The time was still only 0.074 seconds, so let’s try N=1e5:

set.seed(5)
mcArm5 <- mcArmageddon(1e5)
## meanTime =  2518.567 
##           0%          25%          50%          75%         100% 
## 5.168789e+00 8.572897e+01 2.331675e+02 6.704225e+02 7.995641e+07 
## et =  0.608 0.018 0.629 0 0
attributes(mcArm5)
## $Qbar
## [1] 0.447376
## 
## $quantileQ
##         0%        25%        50%        75%       100% 
## 0.06245706 0.36597625 0.44482916 0.52712949 0.90567388 
## 
## $Kbar
## [1] 2.41038
## 
## $quantileK
##   0%  25%  50%  75% 100% 
##    1    1    2    3   38 
## 
## $elapsed.time
##    user  system elapsed 
##   0.608   0.018   0.629

The time was still only 0.535 – well under 10 times N= 1e4.

What about a million?

set.seed(6)
mcArm6 <- mcArmageddon(1e6)
## meanTime =  3701.578 
##           0%          25%          50%          75%         100% 
## 4.763493e+00 8.585189e+01 2.336927e+02 6.662740e+02 1.399609e+09 
## et =  5.824 0.113 5.953 0 0
attributes(mcArm6)
## $Qbar
## [1] 0.44865
## 
## $quantileQ
##         0%        25%        50%        75%       100% 
## 0.05289224 0.36735541 0.44658543 0.52786836 0.91813004 
## 
## $Kbar
## [1] 2.404252
## 
## $quantileK
##   0%  25%  50%  75% 100% 
##    1    1    2    3   52 
## 
## $elapsed.time
##    user  system elapsed 
##   5.824   0.113   5.953

This too just over 5 seconds.

For the Wikiversity article on “Time to extinction of civilization”, we’d like the percentages of these times that are less than 40 and 60 years, representing roughly the remaining lives of half the people currently alive today and the time remaining in the twenty-first century as of this writing, as well as the quantiles of one in a million and one in a thousand chances:

mean(mcArm6)
## [1] 3701.578
mean(mcArm6<40)
## [1] 0.101695
mean(mcArm6<60)
## [1] 0.173887
quantile(mcArm6, c(1e-6, 1e-3))
##  0.0001%     0.1% 
## 4.870240 9.784849

Let’s see if we can generate 1e7 random times in, hopefully, just over 50 seconds:

if(!fda::CRAN()){
# Don't run this with CRAN tests, 
# because it takes too long   
  set.seed(7)
  mcArm7 <- mcArmageddon(1e7)
  print(attributes(mcArm7))
  print(mean(mcArm7))
  print(quantile(mcArm7, c(1e-6, 1e-3)))
}
## meanTime =  2966.374 
##           0%          25%          50%          75%         100% 
## 3.943978e+00 8.576955e+01 2.334601e+02 6.677721e+02 2.145281e+09 
## et =  58.084 2.098 60.59 0 0 
## $Qbar
## [1] 0.4486429
## 
## $quantileQ
##         0%        25%        50%        75%       100% 
## 0.03291029 0.36729410 0.44656173 0.52794313 0.93453413 
## 
## $Kbar
## [1] 2.406332
## 
## $quantileK
##   0%  25%  50%  75% 100% 
##    1    1    2    3   79 
## 
## $elapsed.time
##    user  system elapsed 
##  58.084   2.098  60.590 
## 
## [1] 2966.374
##  0.0001%     0.1% 
## 4.757518 9.814917

Let’s make a normal probability plot of log(mcArm7). With this many points, the standard qqnorm function can take a long time creating the plot. Let’s start by sorting the points as a separate step:

if(fda::CRAN()){
  mcArm. <- mcArm6
} else mcArm. <- mcArm7

mcArm.s <- sort(mcArm.)
quantile(mcArm.s)
##           0%          25%          50%          75%         100% 
## 3.943978e+00 8.576955e+01 2.334601e+02 6.677721e+02 2.145281e+09

Next, let’s call qqnorm without plotting:

str(qq7 <-as.data.frame(qqnorm(mcArm.s,
                          plot.it=FALSE)))
## 'data.frame':    10000000 obs. of  2 variables:
##  $ x: num  -5.33 -5.12 -5.03 -4.96 -4.91 ...
##  $ y: num  3.94 4.18 4.5 4.51 4.58 ...

Let’s cut the data down to the first and last 10 plus 9 of the next 90 from each end plus 1 percent of the rest:

N. <- length(mcArm.s)
#index.5 <- c(1:10, seq(20, 100, 10), 
#             seq(200, 1000, 100), 
#             seq(3000, (N./2)-2000, 1000))
index.5 <- c(1:1000, 
        seq(2000, (N./2)-2000, 1000))
index <- c(index.5, N.+1-rev(index.5))
tail(index, 30) 
##  [1]  9999971  9999972  9999973  9999974  9999975  9999976  9999977  9999978
##  [9]  9999979  9999980  9999981  9999982  9999983  9999984  9999985  9999986
## [17]  9999987  9999988  9999989  9999990  9999991  9999992  9999993  9999994
## [25]  9999995  9999996  9999997  9999998  9999999 10000000
length(index)
## [1] 11994
# yes:  I think I did this right.  
switch(outType, 
       svg=svg('yrs2ArmageddonQQ.svg'), 
#   need png(..., 960, 960), 
#     because the default 480 is not sufficiently
#     clear to easily read the labels       
       png=png('yrs2ArmageddonQQ.png', 960, 960)
)
op <- par(mar=c(5, 5, 4, 5)+.1)

if(makePlots){
  with(qq7, plot(y[index], x[index], type='l', 
               log='x', las=1, bty='n', lwd=2, 
               xlab='', ylab='', 
               cex.lab=2, axes=FALSE) )
#               xlab='years to Armageddon', 
#               ylab='standard normal scores', 
  axis(1, cex.axis=2)
  axis(2, cex.axis=2, las=1)
  probs <- c(.001, .01, .1, .25, .5, .75, 
           .9, .99, .999)
  z <- qnorm(probs)
  if(outType==''){
    cex.txt <- 1.5 
    cex.ax4 <- 1.3
  } else {
    cex.txt <- 3
    cex.ax4 <- 2
  }

  axis(4, z, probs, cex.axis=cex.ax4, 
     las=1, line=-.5)

  p40 <- mean(mcArm.s<40)
  p60 <- mean(mcArm.s<60)
  z40.60 <- qnorm(c(p40, p60))
  max7 <- tail(mcArm.s, 1)

  lines(c(rep(40, 2), max7), 
      c(-5, rep(z40.60[1], 2)), 
      lty='dotted', lwd=2, col='red')
  lines(c(rep(60, 2), max7), 
      c(-5, rep(z40.60[2], 2)), 
      lty='dashed', lwd=2, col='purple')

  text(15, -5, '40', col='red', cex=cex.txt)
  text(200, -2.5, '60', col='purple', cex=cex.txt)

  text(.2*max7, z40.60[1]-.6, 
     paste0(round(100*p40), "%"), cex=cex.txt, 
     col='red')
  text(.2*max7, z40.60[2]+.6, 
     paste0(round(100*p60), "%"), cex=cex.txt, 
     col='purple')
}
par(op)
if(outType != '')dev.off()