Background

nhanesA was developed to enable fully customizable retrieval of data from the National Health and Nutrition Examination Survey (NHANES). The survey is conducted by the National Center for Health Statistics (NCHS), and data are publicly available at: http://www.cdc.gov/nchs/nhanes.htm. NHANES data are reported in well over one thousand peer-reviewed journal publications every year.

NHANES Data

Since 1999, the NHANES survey has been conducted continuously, and the surveys during that period are referred to as “continuous NHANES” to distinguish them from several prior surveys. Continuous NHANES surveys are grouped in two-year intervals, with the first interval being 1999-2000.

Most NHANES data are in the form of tables in SAS ‘XPT’ format. The survey is grouped into five data categories that are publicly available, as well as an additional category (Limited access data) that requires written justification and prior approval before access. Package nhanesA is intended mostly for use with the publicly available data, but some information pertaining to the limited access data can also be retrieved.

The five publicly available data categories are:

- Demographics (DEMO)
- Dietary (DIET)
- Examination (EXAM)
- Laboratory (LAB)
- Questionnaire (Q)

The abbreviated forms in parentheses may be substituted for the long form in nhanesA commands.
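
As a small illustration of the long and short forms, the sketch below maps either form to its abbreviation. The helper normalize_category is hypothetical and not part of nhanesA, and matching case-insensitively is a convenience of this sketch rather than a documented package behavior.

```r
# Category abbreviations and their long forms, as listed above.
categories <- c(DEMO = "Demographics", DIET = "Dietary", EXAM = "Examination",
                LAB = "Laboratory", Q = "Questionnaire")

# Hypothetical helper: return the abbreviation for either form.
normalize_category <- function(x) {
  key <- toupper(x)
  if (key %in% names(categories)) return(key)
  hit <- match(key, toupper(categories))
  if (is.na(hit)) stop("Unknown NHANES data category: ", x)
  names(categories)[hit]
}

normalize_category("Examination")
normalize_category("EXAM")
```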

List NHANES Tables

To quickly get familiar with NHANES data, it is helpful to display a listing of tables. Use nhanesTables to get information on tables that are available for a given category for a given year.

suppressWarnings(library(nhanesA))
nhanesTables('EXAM', 2005)
##    FileName                                       Description
## 1     AUX_D                                        Audiometry
## 2   AUXAR_D                      Audiometry - Acoustic Reflex
## 3  AUXTYM_D                         Audiometry - Tympanometry
## 4     BPX_D                                    Blood Pressure
## 5     BMX_D                                     Body Measures
## 6   DXXAG_D Dual Energy X-ray Absorptiometry - Android/Gynoid
## 7  DXXFEM_D          Dual Energy X-ray Absorptiometry - Femur
## 8  DXXSPN_D          Dual Energy X-ray Absorptiometry - Spine
## 9  OPXFDT_D     Ophthalmology - Frequency Doubling Technology
## 10 OPXRET_D                   Ophthalmology - Retinal Imaging
## 11    OHX_D                                       Oral Health
## 12 PAXRAW_D                         Physical Activity Monitor
## 13    VIX_D                                            Vision

Note that each two-year survey interval begins with an odd year. For convenience, only a single 4-digit year is entered, so nhanesTables('EXAM', 2005) and nhanesTables('EXAM', 2006) yield identical output.
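
The rule that both years of an interval refer to the same survey can be sketched as follows. The helpers cycle_start and cycle_label are hypothetical names used only for illustration. They show the arithmetic, not how nhanesA resolves a year internally.

```r
# Hypothetical helper: map any year to the odd year that starts its interval.
cycle_start <- function(year) {
  stopifnot(all(year >= 1999))          # continuous NHANES begins in 1999
  ifelse(year %% 2 == 0, year - 1, year)
}

# Hypothetical helper: format the interval as "start-end".
cycle_label <- function(year) {
  start <- cycle_start(year)
  paste0(start, "-", start + 1)
}

cycle_label(2005)
cycle_label(2006)
```

Both calls produce the same label, mirroring the identical output of the two nhanesTables calls.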

List Variables in an NHANES Table

After viewing the output, we decide we are interested in table ‘BMX_D’, which contains body measures data. To better determine if that table is of interest, we can display detailed information on the table contents using nhanesTableVars.

nhanesTableVars('EXAM', 'BMX_D')
##    Variable.Name                Variable.Description
## 1           SEQN         Respondent sequence number.
## 2       BMDSTATS Body Measures Component status Code
## 3          BMXWT                         Weight (kg)
## 4          BMIWT                      Weight Comment
## 5       BMXRECUM               Recumbent Length (cm)
## 6       BMIRECUM            Recumbent Length Comment
## 7        BMXHEAD             Head Circumference (cm)
## 8        BMIHEAD          Head Circumference Comment
## 9          BMXHT                Standing Height (cm)
## 10         BMIHT             Standing Height Comment
## 11        BMXBMI           Body Mass Index (kg/m**2)
## 12        BMXLEG               Upper Leg Length (cm)
## 13        BMILEG            Upper Leg Length Comment
## 14       BMXCALF     Maximal Calf Circumference (cm)
## 15       BMICALF                Maximal Calf Comment
## 16       BMXARML               Upper Arm Length (cm)
## 17       BMIARML            Upper Arm Length Comment
## 18       BMXARMC              Arm Circumference (cm)
## 19       BMIARMC           Arm Circumference Comment
## 20      BMXWAIST            Waist Circumference (cm)
## 21      BMIWAIST         Waist Circumference Comment
## 22      BMXTHICR            Thigh Circumference (cm)
## 23      BMITHICR         Thigh Circumference Comment
## 24        BMXTRI               Triceps Skinfold (mm)
## 25        BMITRI            Triceps Skinfold Comment
## 26        BMXSUB           Subscapular Skinfold (mm)
## 27        BMISUB        Subscapular Skinfold Comment

We see that there are 27 columns in table BMX_D. The first column (SEQN) is the respondent sequence number and is included in every NHANES table. Effectively, SEQN is a subject identifier that is used to join information across tables. We now import BMX_D along with the demographics table DEMO_D.

bmx_d  <- nhanes('BMX_D')
## Processing SAS dataset BMX_D      ..
demo_d <- nhanes('DEMO_D')
## Processing SAS dataset DEMO_D     ..

We then merge the tables and compute average values by gender for several variables:

bmx_demo <- merge(demo_d, bmx_d)
options(digits=4)
aggregate(cbind(BMXHT, BMXWT, BMXLEG, BMXCALF, BMXTHICR)~RIAGENDR, bmx_demo, mean)
##   RIAGENDR BMXHT BMXWT BMXLEG BMXCALF BMXTHICR
## 1        1 170.0 76.91  40.50   37.48    51.46
## 2        2 158.9 68.18  37.19   36.89    51.09
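
The merge and aggregate steps can be tried without downloading anything. The sketch below uses made-up rows (the values are illustrative, not NHANES data) and names the join column explicitly with by = "SEQN", which is a choice made here for clarity.

```r
# Toy stand-ins for a demographics table and a body measures table.
demo <- data.frame(SEQN = c(1, 2, 3, 4), RIAGENDR = c(1, 2, 1, 2))
bmx  <- data.frame(SEQN  = c(1, 2, 3, 5),
                   BMXHT = c(170, 160, 180, 150),
                   BMXWT = c(70, 60, 80, 50))

# Inner join on the respondent sequence number: only SEQN 1, 2 and 3
# appear in both tables, so respondents 4 and 5 are dropped.
merged <- merge(demo, bmx, by = "SEQN")

# Mean height and weight per gender code.
means <- aggregate(cbind(BMXHT, BMXWT) ~ RIAGENDR, data = merged, FUN = mean)
means
```

Two base R defaults are worth keeping in mind: merge keeps only respondents present in both tables, and the formula interface of aggregate omits rows that have a missing value in any of the listed variables.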

Translation of Coded Values

NHANES uses coded values for many fields. In the preceding example, gender is coded as 1 or 2. To determine what the values mean, we can list the code translations for the gender field RIAGENDR in table DEMO_D:

nhanesTranslate('DEMO_D', 'RIAGENDR')
## $RIAGENDR
##   Code.or.Value Value.Description
## 1             1              Male
## 2             2            Female
## 3             .           Missing

If desired, we can use nhanesTranslate to apply the code translation to demo_d directly by passing data=demo_d.

levels(as.factor(demo_d$RIAGENDR))
## [1] "1" "2"
demo_d <- nhanesTranslate('DEMO_D', 'RIAGENDR', data=demo_d)
## Translated columns: RIAGENDR
levels(demo_d$RIAGENDR)
## [1] "Male"   "Female"
bmx_demo <- merge(demo_d, bmx_d)
aggregate(cbind(BMXHT, BMXWT, BMXLEG, BMXCALF, BMXTHICR)~RIAGENDR, bmx_demo, mean)
##   RIAGENDR BMXHT BMXWT BMXLEG BMXCALF BMXTHICR
## 1     Male 170.0 76.91  40.50   37.48    51.46
## 2   Female 158.9 68.18  37.19   36.89    51.09
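
Conceptually, a translation replaces each code with its description and leaves missing values missing. The sketch below reproduces that effect with a base R factor, using the RIAGENDR code table shown above. It is an assumption about the effect only and is not the internal implementation of nhanesTranslate.

```r
# Code table copied from the nhanesTranslate output above.
codes <- data.frame(Code.or.Value     = c("1", "2", "."),
                    Value.Description = c("Male", "Female", "Missing"),
                    stringsAsFactors  = FALSE)

# Toy coded column with one missing value.
riagendr <- c(1, 2, 2, NA, 1)

# Drop the "." row: missing values stay NA rather than becoming a level.
usable <- codes[codes$Code.or.Value != ".", ]

translated <- factor(riagendr,
                     levels = as.numeric(usable$Code.or.Value),
                     labels = usable$Value.Description)
levels(translated)
```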

Downloading a Complete Survey

The primary goal of nhanesA is to enable fully customizable processing of select NHANES tables. However, it is quite easy to download entire surveys using nhanesA functions. Say we want to download every questionnaire in the 2007-2008 survey. We first get a list of the table names by using nhanesTables with namesonly = TRUE. The tables can then be downloaded using nhanes with lapply.

q2007names  <- nhanesTables('Q', 2007, namesonly=TRUE)
q2007tables <- lapply(q2007names, nhanes)
names(q2007tables) <- q2007names
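
When downloading many tables, a single failed download should not discard the rest. A minimal sketch of one way to guard the loop follows. The helper fetch_all and the stub fake_fetch are hypothetical, and the table names are placeholders. Whether nhanes signals an error or returns NULL on failure is not stated here, so the sketch treats both as a failed download.

```r
# Hypothetical helper: apply a fetch function to each table name and
# keep only the tables that were retrieved successfully.
fetch_all <- function(table_names, fetch) {
  results <- lapply(table_names, function(nm) {
    tryCatch(fetch(nm), error = function(e) NULL)
  })
  names(results) <- table_names
  results[!vapply(results, is.null, logical(1))]
}

# Stub used for demonstration only: fails for one placeholder name.
fake_fetch <- function(nm) {
  if (nm == "BAD_E") stop("download failed")
  data.frame(SEQN = 1:2, table = nm)
}

tables <- fetch_all(c("ACQ_E", "BAD_E", "ALQ_E"), fake_fetch)
names(tables)
```

With the package loaded, the same helper could presumably be called as fetch_all(q2007names, nhanes).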

Apply All Possible Code Translations to a Table

An NHANES table may have dozens of columns with coded values. Translating all possible columns is a three-step process:

1. Download the table
2. Save a list of table variables
3. Pass the table and variable list to nhanesTranslate

bpx_d <- nhanes('BPX_D')
## Processing SAS dataset BPX_D      ..
head(bpx_d[,6:11])
##   BPQ150A BPQ150B BPQ150C BPQ150D BPAARM BPACSZ
## 1      NA      NA      NA      NA     NA     NA
## 2       2       2       2       2      1      3
## 3       1       2       2       2      1      4
## 4       2       2       2       2      1      3
## 5       2       2       2       2      1      4
## 6       2       2       2       2      1      4
bpx_d_vars  <- nhanesTableVars('EXAM', 'BPX_D', namesonly=TRUE)
#Alternatively may use bpx_d_vars = names(bpx_d)
bpx_d <- suppressWarnings(nhanesTranslate('BPX_D', bpx_d_vars, data=bpx_d))
## Translated columns: PEASCST1 PEASCCT1 BPQ150A BPQ150B BPQ150C BPQ150D BPAARM BPACSZ BPXPULS BPXPTY BPAEN2 BPAEN3 BPAEN4
head(bpx_d[,6:11])
##   BPQ150A BPQ150B BPQ150C BPQ150D BPAARM        BPACSZ
## 1    <NA>    <NA>    <NA>    <NA>   <NA>          <NA>
## 2      No      No      No      No  Right Adult (12X22)
## 3     Yes      No      No      No  Right Large (15X32)
## 4      No      No      No      No  Right Adult (12X22)
## 5      No      No      No      No  Right Large (15X32)
## 6      No      No      No      No  Right Large (15X32)

nhanesTranslate applies some discretion, so not every coded column is translated. In general, columns that have at least two categories (e.g. Male, Female) will be translated. Some code translations are quite long, so translation strings are limited to a maximum length to improve readability. The default maximum length is 32 characters, and it can be set as high as 128.
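
The two rules just described (skip columns with fewer than two categories, and cap the length of each label) can be sketched with base R. The function translate_columns, its arguments, the toy codebook, and the column XFLAG are all hypothetical. The BPACSZ codes are taken from the output above. This illustrates the rules and is not the logic nhanesTranslate actually uses.

```r
# Hypothetical helper: translate every column that has a code table with
# at least min_categories rows, truncating labels to max_len characters.
translate_columns <- function(data, codebook, min_categories = 2, max_len = 32) {
  for (col in intersect(names(codebook), names(data))) {
    codes <- codebook[[col]]
    if (nrow(codes) < min_categories) next   # leave this column coded
    data[[col]] <- factor(data[[col]],
                          levels = codes$Code.or.Value,
                          labels = substr(codes$Value.Description, 1, max_len))
  }
  data
}

# Toy codebook: BPACSZ has two categories, XFLAG (made up) has only one.
codebook <- list(
  BPACSZ = data.frame(Code.or.Value = c(3, 4),
                      Value.Description = c("Adult (12X22)", "Large (15X32)"),
                      stringsAsFactors = FALSE),
  XFLAG  = data.frame(Code.or.Value = 1,
                      Value.Description = "Only category",
                      stringsAsFactors = FALSE)
)
toy <- data.frame(BPACSZ = c(NA, 3, 4, 3), XFLAG = c(1, 1, NA, 1))

full  <- translate_columns(toy, codebook)
short <- translate_columns(toy, codebook, max_len = 5)
levels(short$BPACSZ)
```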

Import Dual X-Ray Absorptiometry Data

Dual X-Ray Absorptiometry (DXA) data were acquired from 1999-2006. The tables are considerably larger than most NHANES data tables and are available via ftp server only. More information may be found at http://www.cdc.gov/nchs/nhanes/dxx/dxa.htm. By default the DXA data are imported into the R environment. Because the tables are quite large, however, it may be preferable to save the data to a local file and import it into R as needed. Note that nhanesTranslate can be applied to DXA data, but only the 2005-2006 translation tables are used because those are the only DXA codes currently available in HTML format.

#Import into R
dxx_b <- nhanesDXA(2001)
#Save to file
nhanesDXA(2001, destfile="dxx_b.xpt")
#Import supplemental data
dxx_c_s <- nhanesDXA(2003, suppl=TRUE)
#Apply code translations
dxalist <- c('DXAEXSTS', 'DXITOT', 'DXIHE')
dxx_b <- nhanesTranslate(colnames=dxalist, data=dxx_b, dxa=TRUE)
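
The "save once, import as needed" workflow can be wrapped in a small caching helper. The sketch below is hypothetical: load_cached is not part of nhanesA, and a CSV file stands in for the XPT file so the example runs without a download.

```r
# Hypothetical helper: download to `path` only if the file is missing,
# then read it from disk.
load_cached <- function(path, download, read) {
  if (!file.exists(path)) download(path)
  read(path)
}

# Stand-in downloader that writes a tiny table and counts its calls.
downloads <- 0
demo_download <- function(path) {
  downloads <<- downloads + 1
  write.csv(data.frame(SEQN = 1:3, DXITOT = c(1, 1, 2)), path, row.names = FALSE)
}

path   <- tempfile(fileext = ".csv")
first  <- load_cached(path, demo_download, read.csv)
second <- load_cached(path, demo_download, read.csv)  # served from the local file
downloads
```

For the real data, one would presumably pass function(p) nhanesDXA(2001, destfile = p) as the downloader and an XPT reader such as foreign::read.xport as the reader. That pairing is an assumption and has not been checked against the DXA files.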

If you are interested in working with accelerometer data from 2003-2006 then please see packages nhanesaccel http://r-forge.r-project.org/R/?group_id=1733 and accelerometry http://cran.r-project.org/package=accelerometry.

Searching across the comprehensive list of NHANES variables

The NHANES repository is extensive, thus it is helpful to be able to perform a targeted search to identify relevant tables and variables easily. A comprehensive list of NHANES variables is maintained at http://wwwn.cdc.gov/nchs/nhanes/search/variablelist.aspx. The nhanesSearch function allows the investigator to input search terms, match against the comprehensive variable descriptions, and retrieve the list of matching variables.

# nhanesSearch use examples
#
# Search on the word bladder, restrict to the 2001-2008 surveys, 
# print out 50 characters of the variable description
nhanesSearch("bladder", ystart=2001, ystop=2008, nchar=50)
#
# Search on "urin" (will match urine, urinary, etc), from 1999-2010, return table names only
nhanesSearch("urin", ignore.case=TRUE, ystop=2010, namesonly=TRUE)
#
# Search on "tooth" or "teeth", all years
nhanesSearch(c("tooth", "teeth"), ignore.case=TRUE)
#
# Search for variables where the variable description begins with "Tooth"
nhanesSearch("^Tooth")
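
The matching behavior described in the comments above (partial matches, several alternative terms, and the ^ anchor) is ordinary regular expression matching. The sketch below reproduces it on a made-up variable list with base R grepl. The helper search_desc, the variable names V1 to V4, and their descriptions are hypothetical, and how nhanesSearch matches internally is an assumption.

```r
# Made-up variable list standing in for the comprehensive NHANES list.
vars <- data.frame(
  Variable.Name        = c("V1", "V2", "V3", "V4"),
  Variable.Description = c("Urine collected", "Urinary bladder control",
                           "Tooth count", "Number of teeth lost"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: several terms are combined with "|" (logical OR).
search_desc <- function(terms, ignore.case = FALSE) {
  pattern <- paste(terms, collapse = "|")
  vars[grepl(pattern, vars$Variable.Description, ignore.case = ignore.case), ]
}

search_desc("urin", ignore.case = TRUE)$Variable.Name
search_desc(c("tooth", "teeth"), ignore.case = TRUE)$Variable.Name
search_desc("^Tooth")$Variable.Name
```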

Please send any feedback or requests to cjendres1@gmail.com. Hope you enjoy your experience with nhanesA!

Sincerely,
Christopher Endres