---
title: "Documentation and Dictionaries"
output: 
  rmarkdown::html_vignette:
    df_print: tibble
vignette: >
  %\VignetteIndexEntry{Documentation and Dictionaries}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Naming and structure

#### Understanding Demographic Datasets

::: {style="text-align: justify;"}
Demographic data is provided from multiple datasets of the same source. To download the data, a naming framework has been implemented, which includes the source, group, year and final details for individual identification. Details are different for every dataset and are related to the internal information they contain. The general frame can be used as follows:
:::

::: {.alert .alert-info}
SOURCE_GROUP_YEARS_DETAILS
:::

::: {style="text-align: justify;"}
Demographic datasets are available for municipalities and departments, and contain data for Dwellings, Households, Population and Population Projections in five categories.

-   Viviendas (Dwellings).
-   Hogares (Households).
-   Personas Social (Persons Social).
-   Personas Demográfico (Persons Demographic).

All datasets are retrieved from the National Administrative Department of Statistics (DANE). Naming is stated as follows:

-   Source: **DANE**.
-   Group: Names include the categories.
    -   Viviendas: **CNPVV**.
    -   Hogares: **CNPVH**.
    -   Personas Social: **CNPVPS**.
    -   Personas Demográfico: **CNPVPD**.
-   Year:
    -   Census data: **2018**
-   Details: These are related to each individual dataset. For further details please check the function `list_datasets()` below.
:::

::: {style="text-align: justify;"}
For hands on examples please check [A Deep Dive into Colombian Demographics Using ColOpenData](https://epiverse-trace.github.io/ColOpenData/articles/demographic_data.html).
:::

#### Understanding Geospatial Datasets

::: {style="text-align: justify;"}
Geospatial datasets naming is related to the level of aggregation, since they are available from Blocks to Departments. All these datasets come from DANE, and are part of the National Geostatistical Framework (MGN), which for 2018 included a summarized version of the National Population and Dwelling Census (CNPV). Available spatial levels include: department, municipality, urban and rural sector, urban and rural section, urban zone and blocks. Please check [Maps and plots with ColOpenData](https://epiverse-trace.github.io/ColOpenData/articles/geospatial_data.html) for further details.
:::

## Understanding Climate Dataset

::: {style="text-align: justify;"}
This module's data is stored in an unique dataset, and the information required to use the related functions is the area of interest, dates, and tags be consulted. Individual tags are required to download data and include:
:::

```{r IDEAM table, echo = FALSE}
tags <- c(
  "TSSM_CON", "THSM_CON", "TMN_CON", "TMX_CON", "TSTG_CON", "HR_CAL",
  "HRHG_CON", "TV_CAL", "TPR_CAL", "PTPM_CON", "PTPG_CON", "EVTE_CON",
  "FA_CON", "NB_CON", "RCAM_CON", "BSHG_CON", "VVAG_CON", "DVAG_CON",
  "VVMXAG_CON", "DVMXAG_CON"
)
variable <- c(
  "Dry-bulb Temperature", "Wet-bulb Temperature",
  "Minimum Temperature", "Maximum Temperature",
  "Dry-bulb Temperature (Termograph)", "Relative Humidity",
  "Relative Humidity (Hydrograph)", "Vapour Pressure", "Dew Point",
  "Precipitation (Daily)", "Precipitation (Hourly)", "Evaporation",
  "Atmospheric Phenomenon", "Cloudiness", "Wind Trajectory",
  "Sunshine Duration", "Wind Speed", "Wind Direction",
  "Maximum Wind Speed", "Maximum Wind Direction"
)

IDEAM_tags <- data.frame(
  Tags = tags, Variable = variable,
  stringsAsFactors = FALSE
)
knitr::kable(IDEAM_tags)
```

::: {style="text-align: justify;"}
These tags are meant to be used for download using `download_climate()`, `download_climate_geom()` and `download_climate_stations()`. See [How to download climate data using ColOpenData](https://epiverse-trace.github.io/ColOpenData/articles/climate_data.html) for further details.
:::

## Understanding Population Projections

::: {style="text-align: justify;"}
Population projections and back-projections are available for national, department and municipality levels, and divided by sex and ethnicity (the latter is only available for municipalities). The names of the datasets relate to the source, years included, sex and ethnicity.

For examples on how to consult the data please refer to [Population Projection with ColOpenData](https://epiverse-trace.github.io/ColOpenData/articles/population_projections.html)
:::

## List Data

::: {style="text-align: justify;"}
To check available datasets you can use the `list_datasets()` function. The associated information can be filtered with the `module` parameter to indicate a specific module. Default is `"all"`, but can be filtered by `"demographic"`, `"geospatial"`, `"climate"` and `"population_projections"`. This function can also be presented both in English (EN) and Spanish (ES) with the `language` parameter. Default is `"ES"`, but can be `"EN"` as well.  
:::

```{r setup}
library(ColOpenData)
```

```{r list datasets}
datasets <- list_datasets(language = "EN")

head(datasets)
```

To list only demographic datasets we can use:

```{r list demographic datasets}
demographic_datasets <- list_datasets(module = "demographic", language = "EN")

head(demographic_datasets)
```

::: {style="text-align: justify;"}
We highly recommend using `View()` instead of `head()` in the local environment for a cleaner and easier visualization of the information.

Using this function, we can retrieve all names, source, aggregation level and information for individual datasets.
:::

## List Data Using Keywords

::: {style="text-align: justify;"}
Sometimes, going through each dataset to find specific information can be tiring. If you want to look for an specific word or set of words within datasets quickly, you can use the `look_up()` function, which takes by parameter:

1.  The word (or words) you are interested in (input as a character or vector of characters).
2.  The module you wish to search within (default is `"all"`).
3.  The search condition: `"and"` to find datasets containing all specified words, or `"or"` to find datasets containing any of the specified words (default is `"or"`). If you are searching for a single word, you can use either `"and"`or `"or"` for this parameter.
4.  The language the keywords would be, can be `"EN"` or `"ES"` (default is `"EN"`).
:::

```{r list datasets with information by age}
age_datasets <- look_up(keywords = "age")

head(age_datasets)
```

We can specify a module to make a more narrow and precise search.

```{r list datasets with information by area and sex in demographic module}
area_sex_datasets <- look_up(
  keywords = c("area", "sex"),
  module = "demographic",
  logic = "and",
  language = "EN"
)

head(area_sex_datasets)
```

## Geospatial dictionaries

::: {style="text-align: justify;"}
Datasets inside the geospatial module contain a summarized version of the census and a dictionary is needed to understand all aggregated variables. These dictionaries contain the necessary metadata to use the available information. To retrieve them, we can use the function `geospatial_dictionary()`, using the spatial level and language as parameters:
:::

```{r dictionary for MGNCNPV at municipalities}
dict_mpio <- geospatial_dictionary(
  spatial_level = "municipality",
  language = "EN"
)

head(dict_mpio)
```

## Climate tags

::: {style="text-align: justify;"}
Climate data is not stored in multiple datasets but as an unique dataset with numerous tags. These tags can also be consulted through the function `get_climate_tags()`, which takes by parameter the tags language, that can be `"EN"` or `"ES"` (default is `"ES"`).
:::

```{r dicionary for climate data}
dict_climate <- get_climate_tags(language = "EN")

head(dict_climate)
```

## DIVIPOLA

::: {style="text-align: justify;"}
DIVIPOLA codification is a standardized frame for the whole country, and contains departments' and municipalities' codes. Departments have two digits for individual identification, while municipalities have five. The five numbers in municipalities' codes include the department where they are located (first two digits) and the number of the municipality within the department (last three digits). The codes for each municipality and department can be consulted using the `divipola_table()` function.
:::

```{r divipola-table}
divipola <- divipola_table()
head(divipola)
```

::: {style="text-align: justify;"}
To get the DIVIPOLA code of a municipality or department we can use the auxiliary functions `divipola_municipality_code()` and `divipola_department_code()` in **ColOpenData**. To retrieve a department code we only have to include the department's name:
:::

```{r cordoba}
name_to_code_dep(department_name = "Guajira")
```

::: {style="text-align: justify;"}
To retrieve a municipality code we must include the department name and the municipality name. This is to consider repetition among municipalities' names across departments.
:::

```{r divipola tunja}
name_to_code_mun(
  department_name = "Boyacá",
  municipality_name = "Tunja"
)
```

::: {style="text-align: justify;"}
These individual codes can be used to filter information in the datasets.

On the other hand, departments' and municipalities' codes can be translated to retrieve their official names using `divipola_municipality_name()` and `divipola_department_name()`.
:::

```{r tunja name}
code_to_name_mun(municipality_code = "15001")
```