Network tools for finding extended pedigrees and path tracing

Introduction

This vignette showcases two key features that capitalize on the network structure inherent in pedigrees:

1. Finding extended families with any connecting relationships between members. This feature strictly uses a person’s ID, mother’s ID, and father’s ID to find out which people in a dataset are remotely related by any path, effectively finding all separable extended families in a dataset.
2. Using path tracing rules to quantify the amount of relatedness between all pairs of individuals in a dataset. The amount of relatedness can be characterized by additive nuclear DNA, shared mitochondrial DNA, sharing both parents, or being part of the same extended pedigree.

Loading Required Libraries and Data

library(BGmisc)
data(hazard)

Finding Extended Families

Many pedigree datasets only contain information on the person, their mother, and their father, often without nuclear or extended family IDs. Recognizing which sets of people are unrelated simplifies many pedigree-related tasks. The ped2fam() function facilitates those tasks by finding all the extended families. People within the same extended family are related in at least some way, however distant, while people in different extended families are not related at all.

Hazard Pedigree

We will use the hazard pedigree data as an example.


ds <- ped2fam(hazard, famID = "newFamID")
table(ds$FamID, ds$newFamID)
#>    
#>      1  2
#>   1 18  0
#>   2  0 25

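Because the hazard data already had a family ID variable, we can compare our newly created variable to the pre-existing one. They match: all 18 members of family 1 and all 25 members of family 2 are grouped identically by both variables, with no off-diagonal counts.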
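
Conceptually, finding extended families amounts to labelling the connected components of a graph in which each person is linked to their known parents. The following is a minimal base R sketch of that idea on a made-up pedigree. The helper `find_families()` and the toy data are hypothetical illustrations, and this is not necessarily how ped2fam() is implemented.

```r
# Toy pedigree (hypothetical data): NA marks an unknown parent.
ped <- data.frame(
  ID    = c(1, 2, 3, 4, 5, 6, 7),
  momID = c(NA, NA, 2, NA, NA, 5, 3),
  dadID = c(NA, NA, 1, NA, NA, 4, NA)
)

# Hypothetical helper: start with everyone in their own family, then merge
# the families of each person and their known parents.
find_families <- function(ped) {
  label <- seq_len(nrow(ped))
  names(label) <- ped$ID
  for (i in seq_len(nrow(ped))) {
    members <- c(ped$ID[i], ped$momID[i], ped$dadID[i])
    members <- as.character(members[!is.na(members)])
    members <- members[members %in% names(label)]
    # Everyone carrying any of these labels receives the smallest one.
    label[label %in% label[members]] <- min(label[members])
  }
  as.integer(factor(label)) # renumber families as 1, 2, ...
}

ped$famSketch <- find_families(ped)
ped
```

In this toy example, persons 1, 2, 3, and 7 end up in one family and persons 4, 5, and 6 in another. Because whole label groups are merged at each step, a single pass over the rows is enough.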

Computing Relatedness

Once you know which sets of people are related at all to one another, you’ll likely want to know how much. For additive genetic relatedness, you can use the ped2add() function.

add <- ped2add(hazard)

This computes the additive genetic relatedness for everyone in the data. It returns a square, symmetric matrix that has as many rows and columns as there are IDs.

add[1:7, 1:7]
#>     1   2   3   4 7   5   6
#> 1 1.0 0.0 0.5 0.5 0 0.5 0.5
#> 2 0.0 1.0 0.5 0.5 0 0.5 0.5
#> 3 0.5 0.5 1.0 0.5 0 0.5 0.5
#> 4 0.5 0.5 0.5 1.0 0 0.5 0.5
#> 7 0.0 0.0 0.0 0.0 1 0.0 0.0
#> 5 0.5 0.5 0.5 0.5 0 1.0 0.5
#> 6 0.5 0.5 0.5 0.5 0 0.5 1.0

The entry in the ith row and the jth column gives the relatedness between person i and person j. For example, person 1 and person 11 have an additive genetic relatedness of 0.125.

table(add)
#> add
#>      0 0.0625  0.125   0.25    0.5      1 
#>   1144    110    182    188    182     43
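
To make these values less opaque, here is a small base R sketch of how additive relatedness can be built up from parent links. It uses the classic tabular recursion, which agrees with path tracing: each path between two people through a common ancestor contributes one half raised to the number of parent-child links on it. The toy pedigree and variable names are hypothetical, and this is a sketch of the general technique rather than of ped2add() internals.

```r
# Toy pedigree (hypothetical data), ordered so parents precede their children.
ped <- data.frame(
  ID    = c("gma", "gpa", "mom", "aunt", "dad", "kid"),
  momID = c(NA, NA, "gma", "gma", NA, "mom"),
  dadID = c(NA, NA, "gpa", "gpa", NA, "dad"),
  stringsAsFactors = FALSE
)

n <- nrow(ped)
r <- matrix(0, n, n, dimnames = list(ped$ID, ped$ID))

for (i in seq_len(n)) {
  parents <- c(ped$momID[i], ped$dadID[i])
  parents <- parents[!is.na(parents)]
  # Relatedness to each earlier person j is half the sum of the
  # known parents' relatedness to j.
  for (j in seq_len(i - 1)) {
    r[i, j] <- r[j, i] <- sum(r[parents, j]) / 2
  }
  # Diagonal: 1, plus half the parents' relatedness to each other if both are known.
  inbreeding <- if (length(parents) == 2) r[parents[1], parents[2]] / 2 else 0
  r[i, i] <- 1 + inbreeding
}

r["kid", "aunt"]
```

Here the kid and the aunt are connected by two paths of three links each (one through each grandparent), so their relatedness is 2 × (1/2)^3 = 0.25.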

It’s probably fine to do this on the whole dataset when your data have fewer than 10,000 people. When the data get large, however, it’s much more efficient to compute this relatedness separately for each extended family.

add_list <- lapply(
  unique(hazard$FamID),
  function(d) {
    tmp <- hazard[hazard$FamID %in% d, ]
    ped2add(tmp)
  }
)
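
Because people in different extended families have no relatedness, the per-family matrices can be reassembled into one block-diagonal matrix without losing information. Below is a hedged base R sketch of that step. The helper `block_diag()` and the small example matrices are hypothetical stand-ins for the elements of a list like add_list, and the sketch assumes IDs are unique across families.

```r
# Hypothetical per-family relatedness matrices with IDs as dimnames.
fam1 <- matrix(c(1, 0.5, 0.5, 1), 2, 2, dimnames = list(c("1", "2"), c("1", "2")))
fam2 <- matrix(1, 1, 1, dimnames = list("3", "3"))
mats <- list(fam1, fam2)

# Hypothetical helper: place each family's matrix on the diagonal of a
# zero matrix indexed by all IDs.
block_diag <- function(mats) {
  ids <- unlist(lapply(mats, rownames))
  out <- matrix(0, length(ids), length(ids), dimnames = list(ids, ids))
  for (m in mats) {
    out[rownames(m), colnames(m)] <- m
  }
  out
}

full <- block_diag(mats)
full
```

For large datasets a sparse representation is likely preferable; Matrix::bdiag() builds a sparse block-diagonal matrix from a list of matrices.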

Other relatedness measures

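Analogous functions compute other kinds of relatedness: mitochondrial relatedness (ped2mit()), common nuclear environment through sharing both parents (ped2cn()), and common extended family environment (ped2ce()). Each takes the pedigree and returns a square matrix, just as ped2add() does.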

Computing mitochondrial relatedness

Here we calculate the mitochondrial relatedness between all pairs of individuals in the hazard dataset.

mit <- ped2mit(hazard)
mit[1:7, 1:7]
#>   1 2 3 4 7 5 6
#> 1 1 0 1 1 0 1 1
#> 2 0 1 0 0 0 0 0
#> 3 1 0 1 1 0 1 1
#> 4 1 0 1 1 0 1 1
#> 7 0 0 0 0 1 0 0
#> 5 1 0 1 1 0 1 1
#> 6 1 0 1 1 0 1 1
table(mit)
#> mit
#>    0    1 
#> 1590  259

As you can see, some family members share mitochondrial DNA, such as person 1 and person 3, whereas others, such as person 1 and person 2, do not.
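
Mitochondrial DNA is passed down the maternal line, so two people share it when their maternal lines lead back to the same woman. The base R sketch below illustrates that rule on a made-up pedigree. The helper `maternal_founder()` and the data are hypothetical, and this is an illustration of the idea rather than the ped2mit() implementation.

```r
# Toy pedigree (hypothetical data): 1 and 2 are the parents of 3 and 4;
# 3 is the mother of 5.
ped <- data.frame(
  ID    = c(1, 2, 3, 4, 5),
  momID = c(NA, NA, 1, 1, 3),
  dadID = c(NA, NA, 2, 2, NA)
)

# Hypothetical helper: follow the maternal line upward until the mother is unknown.
maternal_founder <- function(id, ped) {
  mom <- ped$momID[match(id, ped$ID)]
  while (!is.na(mom)) {
    id  <- mom
    mom <- ped$momID[match(id, ped$ID)]
  }
  id
}

founder <- vapply(ped$ID, maternal_founder, numeric(1), ped = ped)

# 1 if two people trace back to the same maternal founder, 0 otherwise.
mitSketch <- outer(founder, founder, "==") * 1
dimnames(mitSketch) <- list(ped$ID, ped$ID)
mitSketch
```

In this toy pedigree the mother (person 1) shares mitochondrial DNA with her children and her daughter's child, while the father (person 2) shares it with no one but himself.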

Computing relatedness through common nuclear environment

Here we calculate the relatedness between all pairs of individuals in the hazard dataset through sharing both parents.

commonNuclear <- ped2cn(hazard)
commonNuclear[1:7, 1:7]
#>   1 2 3 4 7 5 6
#> 1 1 0 0 0 0 0 0
#> 2 0 1 0 0 0 0 0
#> 3 0 0 1 1 0 1 1
#> 4 0 0 1 1 0 1 1
#> 7 0 0 0 0 1 0 0
#> 5 0 0 1 1 0 1 1
#> 6 0 0 1 1 0 1 1

table(commonNuclear)
#> commonNuclear
#>    0    1 
#> 1744  105
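
The rule behind this matrix can be sketched in base R: two people share a common nuclear environment when they have the same mother and the same father. The example below uses a made-up pedigree and hypothetical variable names. Treating anyone with an unknown parent as sharing a nuclear family with nobody but themselves is an assumption of this sketch, not a documented property of ped2cn().

```r
# Toy pedigree (hypothetical data): 3 and 4 are full siblings;
# 5 has the same mother but an unknown father.
ped <- data.frame(
  ID    = c(1, 2, 3, 4, 5),
  momID = c(NA, NA, 1, 1, 1),
  dadID = c(NA, NA, 2, 2, NA)
)

# Build one key per person from both parent IDs. People with an unknown
# parent get a key that is unique to them (an assumption of this sketch).
key <- ifelse(
  is.na(ped$momID) | is.na(ped$dadID),
  paste0("unknown_", ped$ID),
  paste(ped$momID, ped$dadID, sep = "_")
)

# 1 if two people have identical parent keys, 0 otherwise.
cnSketch <- outer(key, key, "==") * 1
dimnames(cnSketch) <- list(ped$ID, ped$ID)
cnSketch
```

As in the hazard output, parents score 0 with their own children here, because the measure reflects shared parentage rather than genetic relatedness.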

Computing relatedness through common extended family environment

Here we calculate the relatedness between all pairs of individuals in the hazard dataset through sharing an extended family.

extendedFamilyEnvironment <- ped2ce(hazard)
extendedFamilyEnvironment[1:7, 1:7]
#>   1 2 3 4 7 5 6
#> 1 1 1 1 1 1 1 1
#> 2 1 1 1 1 1 1 1
#> 3 1 1 1 1 1 1 1
#> 4 1 1 1 1 1 1 1
#> 7 1 1 1 1 1 1 1
#> 5 1 1 1 1 1 1 1
#> 6 1 1 1 1 1 1 1
table(extendedFamilyEnvironment)
#> extendedFamilyEnvironment
#>    1 
#> 1849
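
Note that all 1849 (43 × 43) entries in this output equal 1, which includes pairs drawn from the two different families identified earlier. If an indicator that is 1 only within an extended family is wanted, one possible base R sketch is shown below. The vector `famID` is a hypothetical stand-in built from the family sizes in the earlier cross-tabulation (18 and 25), and this is not a BGmisc function.

```r
# Hypothetical family ID vector matching the family sizes 18 and 25.
famID <- rep(c(1, 2), times = c(18, 25))

# 1 if two people have the same family ID, 0 otherwise.
ceSketch <- outer(famID, famID, "==") * 1
table(ceSketch)
```

With these sizes, the within-family entries number 18^2 + 25^2 = 949 and the between-family entries number 2 × 18 × 25 = 900.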