Introduction to the halk package

The halk package implements the hierarchical age-length key (HALK) method. The method builds a series of aggregate age-length keys by pooling data across nested levels, and then assigns an age to each measured length using the aggregate key from the most specific level available.

What the heck is a HALK?

The HALK is a data-borrowing age assignment method primarily used in fisheries ecology. It extends the traditional age-length key (ALK) method by borrowing data across time, space, or any other nested level to create aggregate, nested ALKs, which are then used to assign ages to fish based on empirically measured length. For example, a HALK can be created by borrowing age-length data from a single lake across time, from other waterbodies within a certain area, or from any other nested categorical level. Fish from subsequent surveys that record length but not age can then be assigned ages using the HALK.
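To make the idea of a traditional ALK concrete, here is a minimal base R sketch that does not use the halk package. The data frame laa is made up for illustration, and binning lengths with floor() to one-unit bins is an assumption, not necessarily how halk bins lengths. Each row of the resulting key gives the proportion of fish at each age within a length bin.

```r
# Hypothetical paired age-length sample (illustrative values only)
laa <- data.frame(
  age    = c(0, 0, 1, 1, 1, 2, 2, 3),
  length = c(1.1, 1.4, 3.2, 3.8, 4.1, 4.6, 5.3, 5.9)
)

# Assumption: bin each length down to the whole unit below it
laa$length_bin <- floor(laa$length)

# Cross-tabulate length bin by age, then convert counts to row proportions
counts <- table(length_bin = laa$length_bin, age = laa$age)
alk <- prop.table(counts, margin = 1)

# Each row sums to 1: the age composition of that length bin
round(alk, 2)
```

Length bins with no aged fish (bin 2 in this made-up sample) have no row in the key, which is the kind of gap that borrowing data from a broader level is meant to fill.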

Implementing a HALK

A HALK is created by passing paired age-length data to the make_halk function. There are two main arguments to this function: data, which represents the paired age-length data, and levels, which is a character vector of the column names that represent the different nested levels in the HALK. For example, if the following data were used in HALK creation, you could pass any combination of spp, county, and waterbody as levels:

#> # A tibble: 6 × 5
#>   spp      county   waterbody   age length
#>   <chr>    <chr>    <chr>     <int>  <dbl>
#> 1 bluegill county_A lake_a        0    1  
#> 2 bluegill county_A lake_a        0    1  
#> 3 bluegill county_A lake_a        0    0.9
#> 4 bluegill county_A lake_a        0    1  
#> 5 bluegill county_A lake_a        0    1.1
#> 6 bluegill county_A lake_a        0    1.1

make_halk fits a HALK based on the user-specified levels. For example, passing spp, county, and waterbody as levels creates an ALK for each waterbody, an aggregate ALK for each county, and a species-wide global ALK.

spp_county_wb_alk <- make_halk(
  wb_spp_laa_data, 
  levels = c("spp", "county", "waterbody")
)
head(spp_county_wb_alk)
#> # A tibble: 6 × 4
#>   spp      county   waterbody alk            
#>   <chr>    <chr>    <chr>     <list>         
#> 1 bluegill county_A lake_a    <alk [10 × 10]>
#> 2 bluegill county_A lake_b    <alk [12 × 10]>
#> 3 bluegill county_A lake_c    <alk [11 × 10]>
#> 4 bluegill county_A <NA>      <alk [13 × 10]>
#> 5 bluegill county_B lake_a    <alk [11 × 10]>
#> 6 bluegill county_B lake_b    <alk [12 × 10]>

The returned tibble contains a list-column named alk that stores one ALK for each group at each level provided to the levels argument. Note that the ALK for county_A in row 4 has an NA in the waterbody column, indicating that this is a county-wide ALK. Each object in this list-column is simply an ALK created using all data from the level indicated by the non-NA columns in that row.

# Bluegill ALK for lake_a in county_A, from row #1 above
head(spp_county_wb_alk$alk[[1]])
#> # A tibble: 6 × 10
#>   length  age0   age1   age2   age3  age4  age5  age6  age7  age8
#>    <dbl> <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1      0     1 0      0      0          0     0     0     0     0
#> 2      1     1 0      0      0          0     0     0     0     0
#> 3      3     0 1      0      0          0     0     0     0     0
#> 4      4     0 0.918  0.0816 0          0     0     0     0     0
#> 5      5     0 0.0870 0.870  0.0435     0     0     0     0     0
#> 6      6     0 0      0.756  0.244      0     0     0     0     0
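The way each row's ALK pools "all data from the level" can be sketched in base R without halk. The data frame laa and the helper simple_alk are hypothetical, and the floor() binning is an assumption. The sketch only shows how a county-wide key differs from the waterbody keys it aggregates.

```r
# Hypothetical age-length records from two lakes in one county
laa <- data.frame(
  county    = "county_A",
  waterbody = c("lake_a", "lake_a", "lake_a", "lake_b", "lake_b", "lake_b"),
  age       = c(1, 1, 2, 1, 2, 2),
  length    = c(4.2, 4.7, 4.9, 4.1, 4.5, 4.8)
)

# Hypothetical helper: proportion of each age within each 1-unit length bin
simple_alk <- function(d) {
  prop.table(table(length = floor(d$length), age = d$age), margin = 1)
}

# One ALK per waterbody, built from that waterbody's rows only
wb_alks <- lapply(split(laa, laa$waterbody), simple_alk)

# County-wide ALK, built from every row in the county
county_alk <- simple_alk(laa)

wb_alks$lake_a
county_alk
```

In this made-up sample, lake_a and lake_b have different age compositions in the 4-unit bin, and the county-wide key sits between them because it is computed from the pooled rows.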

Assigning ages from a HALK

The assign_ages function assigns ages to length data using a HALK. Once you have created a HALK, pass it to assign_ages along with the length data you want ages estimated for. Make sure the length data contains every column named in the levels argument of make_halk.

est_ages <- assign_ages(wb_spp_length_data, spp_county_wb_alk)
head(est_ages)
#> # A tibble: 6 × 7
#>   spp      county   waterbody length est.age alk       alk.n
#>   <chr>    <chr>    <chr>      <dbl>   <dbl> <chr>     <int>
#> 1 bluegill county_A lake_a       1         0 waterbody   371
#> 2 bluegill county_A lake_a       1         0 waterbody   371
#> 3 bluegill county_A lake_a       1.1       0 waterbody   371
#> 4 bluegill county_A lake_a       1.1       0 waterbody   371
#> 5 bluegill county_A lake_a       1         0 waterbody   371
#> 6 bluegill county_A lake_a       1.2       0 waterbody   371
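As a rough illustration of how a key turns a length into an age, the base R sketch below draws an age at random using the proportions in the fish's length bin as probabilities. This is one common way to apply an ALK, and it is an assumption here: the text above does not state how assign_ages works internally. The matrix alk and the helper draw_age are hypothetical.

```r
# Hypothetical ALK: rows are length bins, columns are ages, rows sum to 1
alk <- matrix(
  c(0.9, 0.1, 0.0,
    0.1, 0.8, 0.1),
  nrow = 2, byrow = TRUE,
  dimnames = list(length = c("4", "5"), age = c("1", "2", "3"))
)

# Hypothetical helper: draw one age for a fish of length `len`,
# using the proportions in its length bin as sampling probabilities
draw_age <- function(len, alk) {
  probs <- alk[as.character(floor(len)), ]
  as.integer(sample(names(probs), size = 1, prob = probs))
}

set.seed(1)  # make the random draws repeatable
lengths <- c(4.2, 4.8, 5.1, 5.6)
est_age <- vapply(lengths, draw_age, integer(1), alk = alk)

data.frame(length = lengths, est.age = est_age)
```

Ages with a proportion of zero in a bin can never be drawn for that bin, so a fish in the 4-unit bin is only ever assigned age 1 or 2 in this sketch.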

Notice below that est_ages contains a lake (lake_x in county_A) that was not present in the length-at-age data used to create spp_county_wb_alk. Because no waterbody-level ALK exists for lake_x, ages for its lengths were assigned using the ALK from the county level (specifically, the ALK from county_A), as noted in the alk column.

head(est_ages[est_ages$waterbody == "lake_x", ])
#> # A tibble: 6 × 7
#>   spp      county   waterbody length est.age alk    alk.n
#>   <chr>    <chr>    <chr>      <dbl>   <dbl> <chr>  <int>
#> 1 bluegill county_A lake_x       1         0 county  1088
#> 2 bluegill county_A lake_x       1.3       0 county  1088
#> 3 bluegill county_A lake_x       1         0 county  1088
#> 4 bluegill county_A lake_x       1.1       0 county  1088
#> 5 bluegill county_A lake_x       1.2       0 county  1088
#> 6 bluegill county_A lake_x       1.2       0 county  1088
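The fallback from a missing waterbody-level key to the county-level key can be sketched as a simple lookup in base R. The list available and the helper pick_level are hypothetical and are not part of halk. The "spp" label for the species-wide level is also an assumption, since only "waterbody" and "county" appear in the output above.

```r
# Hypothetical record of which groups have an ALK at each level
available <- list(
  waterbody = c("county_A/lake_a", "county_A/lake_b"),
  county    = c("county_A", "county_B")
)

# Hypothetical helper: name the most specific level that has a matching ALK,
# falling back to the species-wide key when nothing more specific exists
pick_level <- function(county, waterbody, available) {
  if (paste(county, waterbody, sep = "/") %in% available$waterbody) {
    "waterbody"
  } else if (county %in% available$county) {
    "county"
  } else {
    "spp"
  }
}

pick_level("county_A", "lake_a", available)
#> [1] "waterbody"
pick_level("county_A", "lake_x", available)
#> [1] "county"
pick_level("county_Z", "lake_q", available)
#> [1] "spp"
```

The second call mirrors the lake_x case: the county is known but the lake is not, so the county-level key is chosen.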