This vignette provides an overview of the functions that can be used to estimate the sample size needed to detect a pathogen variant in a population, given a periodic sampling scheme.

By specifying `sampling_freq = "cont"`, the `vartrack_samplesize_detect()`
function can be used to calculate the sample size needed to detect a
particular variant in the population within a specific number of days
since its introduction, OR by the time a variant reaches a specific
frequency. As when specifying cross-sectional sampling with
`sampling_freq = "xsect"` (see *Estimating the sample size needed for
variant monitoring: cross-sectional sampling* for details), applying the
`vartrack_samplesize_detect()` function to calculate a sample size
assuming periodic sampling (**Figure 1**) requires knowledge of the
coefficient of detection ratio between two pathogen variants (or, more
commonly, one variant and the rest of the pathogen population). The
coefficient of detection ratio for two variants can be calculated using
the `vartrack_cod_ratio()` function (see *Estimating bias in observed
variant prevalence*). Since we are only interested in the ratio of the
coefficients of detection, applying this function only requires
providing parameters which are expected to differ between variants. The
ratio for any parameter not provided is assumed to be equal to one.
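As a quick check on how the ratio behaves, the arithmetic can be sketched in base R. This sketch assumes that, when only the `phi` and `gamma` parameters differ between variants, each coefficient of detection is proportional to their product, so every other factor cancels in the ratio. That simplification is an assumption made here for illustration, using the parameter values from the worked example in this vignette; `vartrack_cod_ratio()` remains the function to use in practice.

```r
# Hypothetical stand-in for illustration only; not the phylosamp implementation.
# Assumes each coefficient of detection is proportional to phi * gamma when
# all other parameters are equal between the two variants.
cod_ratio_sketch <- function(phi_v1, phi_v2, gamma_v1, gamma_v2) {
  (phi_v1 * gamma_v1) / (phi_v2 * gamma_v2)
}

c_ratio <- cod_ratio_sketch(phi_v1 = 0.975, phi_v2 = 0.95,
                            gamma_v1 = 0.8, gamma_v2 = 0.6)
round(c_ratio, 3)
```

Under this assumption the ratio rounds to 1.368, the value quoted for the worked example.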

Once we have an estimate of the coefficient of detection ratio, we can calculate the sample size needed for detection from the following parameters:

Param | Variable Name | Description |
---|---|---|
\(p\) | prob | the desired probability of detection |
\(t\) | t | the number of days after introduction a variant should be detected by |
\(P_{V_1}\) | p_v1 | the desired prevalence to detect a variant by |
\(P_{0_{V_1}}\) | p0_v1 | the initial variant prevalence (number of introductions / population size) |
\(r_{V_1}\) | r_v1 | the estimated logistic growth rate of the variant (per day) |
\(\omega\) | omega | the sequencing success rate |
\(\frac{C_{V_1}}{C_{V_2}}\) | c_ratio | the coefficient of detection ratio, calculated as the ratio of the coefficients of variant 1 to variant 2 (can be calculated using `vartrack_cod_ratio()`) |

To calculate the sample size needed for detection assuming periodic
sampling, we must provide *either* the number of days after
introduction a variant should be detected by (\(t\)) OR the desired prevalence to detect a
variant by (\(P_{V_1}\)), but not both.
All other parameters listed above are required.
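The either/or rule above can be made explicit with a small pre-check before calling the function. The helper below is hypothetical (it is not part of phylosamp) and only illustrates the rule that exactly one of `t` or `p_v1` should be supplied.

```r
# Hypothetical helper, not part of phylosamp: reports which detection target
# was supplied and stops if neither or both of `t` and `p_v1` are given.
detection_target <- function(t = NULL, p_v1 = NULL) {
  if (is.null(t) == is.null(p_v1)) {
    stop("Provide either `t` or `p_v1`, but not both.")
  }
  if (is.null(t)) "p_v1" else "t"
}

detection_target(t = 30)       # target is a number of days
detection_target(p_v1 = 0.01)  # target is a prevalence
```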

Therefore, if we would like to know the sample size needed per day to
ensure detection of a variant by the time it reaches 1% prevalence in
the population, we can apply the `vartrack_samplesize_detect()`
function as follows:

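```
library(phylosamp)

# coefficient of detection ratio of variant 1 relative to variant 2
c1_c2 <- vartrack_cod_ratio(phi_v1=0.975, phi_v2=0.95, gamma_v1=0.8, gamma_v2=0.6)

# daily sample size needed to detect the variant by the time it reaches 1% prevalence
vartrack_samplesize_detect(prob=0.95, p_v1=0.01, p0_v1=3/10000, r_v1=0.1,
                           omega=0.8, c_ratio=c1_c2, sampling_freq="cont")
```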

`## Calculating sample size for variant detection assuming periodic sampling`

`## [1] 25.81341`

In other words, 26 samples *per day* are needed to detect a
variant at 1% (or higher) in a population with 95% probability of
detection, given a coefficient of detection ratio (\(\frac{C_{V_1}}{C_{V_2}}\)) of 1.368 (as
calculated from parameters listed above). This accounts for the fact
that not all samples sequenced (or otherwise characterized) will be
successful. We assume an 80% success rate (\(\omega = 0.8\)), so
processing 26 samples is expected to yield the 21 high quality data
points that can be used to detect the presence of a pathogen variant. If
sequencing will occur on a weekly basis, this means we need to process
\(26 \times 7 = 182\) samples per week (ideally
from infections spread evenly over the 7 days) to detect a
variant, with 95% probability, by the time it reaches 1% frequency in the population.
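The bookkeeping in the paragraph above is simple enough to check in base R. The variable names below are illustrative, and the daily figure is copied from the function output shown earlier.

```r
n_raw <- 25.81341   # function output shown above (samples per day)
omega <- 0.8        # sequencing success rate

n_per_day <- ceiling(n_raw)                       # whole samples per day
expected_successes <- ceiling(n_per_day * omega)  # high quality results per day
n_per_week <- n_per_day * 7                       # weekly batch size

c(per_day = n_per_day, successes = expected_successes, per_week = n_per_week)
```

Rounding up at each step is a conservative choice: it keeps the number of usable results at or above what the calculation calls for.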

If instead we are interested in detecting a variant within the first
month of its introduction into the population (assuming all other
parameters are the same as above), we can use the
`vartrack_samplesize_detect()` function as follows:

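```
# coefficient of detection ratio of variant 1 relative to variant 2
c1_c2 <- vartrack_cod_ratio(phi_v1=0.975, phi_v2=0.95, gamma_v1=0.8, gamma_v2=0.6)

# daily sample size needed to detect the variant within 30 days of introduction
vartrack_samplesize_detect(prob=0.95, t=30, p0_v1=3/10000, r_v1=0.1,
                           omega=0.8, c_ratio=c1_c2, sampling_freq="cont")
```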

`## Calculating sample size for variant detection assuming periodic sampling`

`## [1] 47.98564`

In this case, we can see that we will need to process 48 samples per day (which, assuming an 80% sequencing success rate, means generating 39 high quality sequences per day) to ensure detection within the first month of variant introduction into the population.
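To see why the 30-day target is more demanding than the 1% target, it helps to put the two targets on the same scale. The sketch below assumes plain logistic growth of the variant from `p0_v1` at rate `r_v1`. The vignette describes the growth rate as logistic but does not spell out the curve, so the exact form is an assumption, and the helper names are hypothetical.

```r
# Assumed logistic growth curve; helper names are hypothetical, not phylosamp.
p0_v1 <- 3 / 10000
r_v1  <- 0.1

# Prevalence t days after introduction
prevalence_at <- function(t) {
  odds0 <- p0_v1 / (1 - p0_v1)
  odds_t <- odds0 * exp(r_v1 * t)
  odds_t / (1 + odds_t)
}

# Days needed to reach prevalence p
days_to_reach <- function(p) {
  (log(p / (1 - p)) - log(p0_v1 / (1 - p0_v1))) / r_v1
}

prevalence_at(30)    # prevalence on day 30, below the 1% target
days_to_reach(0.01)  # days until 1% prevalence, later than day 30
```

Under these assumptions the variant is at roughly 0.6% prevalence on day 30 and does not reach 1% until about day 35. Detecting it within the first month therefore means catching it while it is rarer, which is consistent with the larger daily sample size.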

For information on functions that can be used to estimate the sample
size given a cross-sectional sampling approach, see *Estimating the
sample size needed for variant monitoring: cross-sectional
sampling*. These functions can also be used in “reverse”, to
calculate the probability of detection given some sampling scheme, as in
the *Estimating the probability of detecting a variant* vignettes (*cross-sectional* and *periodic*).
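The relationship between the forward and "reverse" calculations can be illustrated with a rough base-R reconstruction. This is not the phylosamp source code. It assumes logistic growth of the variant, an observed prevalence biased by the coefficient of detection ratio, and detections accumulating continuously in proportion to the daily sample size, the sequencing success rate, and the observed prevalence. All function names are hypothetical, and the product form used for `c_ratio` is likewise an assumption.

```r
# Reconstruction for illustration only; NOT taken from the phylosamp source.
# Cumulative observed variant prevalence from day 0 to day t, assuming
# logistic growth and detection bias given by c_ratio.
cum_observed_prev <- function(t, p0_v1, r_v1, c_ratio) {
  a <- 1 / p0_v1 - 1   # initial odds against the variant
  (log(a + c_ratio * exp(r_v1 * t)) - log(a + c_ratio)) / r_v1
}

# Forward: daily sample size for a desired probability of detection by day t
samplesize_sketch <- function(prob, t, p0_v1, r_v1, omega, c_ratio) {
  -log(1 - prob) / (omega * cum_observed_prev(t, p0_v1, r_v1, c_ratio))
}

# Reverse: probability of detection by day t for a given daily sample size
prob_detect_sketch <- function(n, t, p0_v1, r_v1, omega, c_ratio) {
  1 - exp(-n * omega * cum_observed_prev(t, p0_v1, r_v1, c_ratio))
}

c_ratio <- (0.975 * 0.8) / (0.95 * 0.6)  # assumed product form of the ratio
n30 <- samplesize_sketch(prob = 0.95, t = 30, p0_v1 = 3/10000, r_v1 = 0.1,
                         omega = 0.8, c_ratio = c_ratio)
n30
prob_detect_sketch(n = n30, t = 30, p0_v1 = 3/10000, r_v1 = 0.1,
                   omega = 0.8, c_ratio = c_ratio)
```

With the parameter values used in this vignette, the sketch gives a daily sample size close to the value reported for the 30-day example, and feeding that sample size back into the reverse calculation returns the 95% detection probability. For real analyses, the package functions should be used instead of this sketch.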