Charting Data Frames

Joshua Kunst

Introduction

We chart data. Data can come in many forms (numeric or character vectors, time series objects, etc.), but the most common container for data is the data frame. So how can we chart this type of object with highcharter?

Highcharter has two main functions: one to create a chart from data and another to add data to an existing highchart object.

  1. hchart: a generic function that takes an object (a vector, time series, data frame, likert object, etc.) and returns a highchart object (a chart).
  2. hc_add_series: a generic function that adds data to an existing highchart object, depending on the type (class) of the data.
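A minimal sketch of both functions, assuming highcharter is installed. It uses AirPassengers (a ts object) and mtcars (a data frame), which ship with R, instead of the ggplot2 datasets used in the rest of this vignette; the dataset choice is ours.

```r
library(highcharter)

# hchart() is generic: the method is picked from the class of the first argument.
# A "ts" object needs no mapping; time becomes the x axis.
hc_ts <- hchart(AirPassengers)

# A data frame also needs a chart type and an hcaes() mapping.
hc_df <- hchart(mtcars, "scatter", hcaes(x = wt, y = mpg))

# hc_add_series() adds data to a highchart object that already exists,
# here an empty one created with highchart().
hc_add <- highchart() %>%
  hc_add_series(mtcars, "scatter", hcaes(x = wt, y = mpg))
```

All three objects are highchart htmlwidgets, so they print as interactive charts in RStudio or an R Markdown document.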

One last function is useful for charting data frames: hcaes, which defines the aesthetic mappings. These three functions are inspired by the ggplot2 package (hcaes plays the role of ggplot2's aes).

The main difference from ggplot2 is that in highcharter the data and the aesthetics must be given explicitly in every function call.
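To illustrate the difference, a sketch (again with mtcars, our choice) that puts two series on one chart. In ggplot2 both layers could inherit the data and part of the mapping from ggplot(); here each hc_add_series() call receives its own data frame and its own hcaes().

```r
library(highcharter)

# Each series carries its own data and its own mapping; nothing is inherited
# from the highchart() call.
hc <- highchart() %>%
  hc_add_series(mtcars, "scatter", hcaes(x = wt, y = mpg), name = "mpg") %>%
  hc_add_series(mtcars, "scatter", hcaes(x = wt, y = qsec), name = "qsec")
```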

Examples

Let's look at some examples.

library(highcharter)
library(dplyr) # for %>%, group_by(), summarise(), filter() and count()

data("mpg", package = "ggplot2")
head(mpg)
## # A tibble: 6 x 11
##   manufacturer model displ  year   cyl trans  drv     cty   hwy fl    class
##   <chr>        <chr> <dbl> <int> <int> <chr>  <chr> <int> <int> <chr> <chr>
## 1 audi         a4      1.8  1999     4 auto(~ f        18    29 p     comp~
## 2 audi         a4      1.8  1999     4 manua~ f        21    29 p     comp~
## 3 audi         a4      2    2008     4 manua~ f        20    31 p     comp~
## 4 audi         a4      2    2008     4 auto(~ f        21    30 p     comp~
## 5 audi         a4      2.8  1999     6 auto(~ f        16    26 p     comp~
## 6 audi         a4      2.8  1999     6 manua~ f        18    26 p     comp~
hchart(mpg, "point", hcaes(x = displ, y = cty))

The previous code is the same as:

highchart() %>% 
  hc_add_series(mpg, "point", hcaes(x = displ, y = cty))

With highcharter you can create other types of charts too, such as a heatmap:

data("diamonds", package = "ggplot2")
dfdiam <- diamonds %>% 
  group_by(cut, clarity) %>%
  summarize(price = median(price))

head(dfdiam)
## # A tibble: 6 x 3
## # Groups:   cut [1]
##   cut   clarity price
##   <ord> <ord>   <dbl>
## 1 Fair  I1      2397 
## 2 Fair  SI2     3681 
## 3 Fair  SI1     3528.
## 4 Fair  VS2     3190 
## 5 Fair  VS1     2830.
## 6 Fair  VVS2    2484
hchart(dfdiam, "heatmap", hcaes(x = cut, y = clarity, value = price)) 
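A heatmap needs three mappings: x and y for the position of each cell and value for its color. A self-contained sketch with a small, made-up grid (the column names row, col and value are hypothetical):

```r
library(highcharter)

# Hypothetical data: every combination of two categorical variables.
# expand.grid() returns factors by default, like cut and clarity above.
cells <- expand.grid(row = c("low", "high"), col = c("A", "B", "C"))
cells$value <- c(3, 8, 1, 5, 9, 2)  # made-up values, one per cell

hc <- hchart(cells, "heatmap", hcaes(x = col, y = row, value = value))
```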

Or a line chart with one series per group:

data(economics_long, package = "ggplot2")

economics_long2 <- filter(economics_long,
                          variable %in% c("pop", "uempmed", "unemploy"))

head(economics_long2)
## # A tibble: 6 x 4
## # Groups:   variable [1]
##   date       variable  value value01
##   <date>     <fct>     <dbl>   <dbl>
## 1 1967-07-01 pop      198712 0      
## 2 1967-08-01 pop      198911 0.00163
## 3 1967-09-01 pop      199113 0.00328
## 4 1967-10-01 pop      199311 0.00490
## 5 1967-11-01 pop      199498 0.00643
## 6 1967-12-01 pop      199657 0.00773
hchart(economics_long2, "line", hcaes(x = date, y = value01, group = variable))
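Mapping group splits the data into one series per distinct value, which is why each economic variable gets its own line. A sketch with made-up long-format data; hc$x$hc_opts$series, used here to count the series, is the internal slot of the htmlwidget where we assume highcharter stores them, not a documented API.

```r
library(highcharter)

# Hypothetical long-format data: three variables observed on the same five dates
dates <- seq(as.Date("2020-01-01"), by = "month", length.out = 5)
long <- data.frame(
  date = rep(dates, times = 3),
  variable = rep(c("a", "b", "c"), each = 5),
  value = c(1:5, 5:1, c(2, 2, 3, 3, 4))
)

hc <- hchart(long, "line", hcaes(x = date, y = value, group = variable))

# One series per distinct value of `variable`
length(hc$x$hc_opts$series)
```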

You can even chart treemaps:

data(mpg, package = "ggplot2")

mpgman <- mpg %>% 
  group_by(manufacturer) %>% 
  summarise(n = n(),
            unique = length(unique(model))) %>% 
  arrange(-n, -unique)

head(mpgman)
## # A tibble: 6 x 3
##   manufacturer     n unique
##   <chr>        <int>  <int>
## 1 dodge           37      4
## 2 toyota          34      6
## 3 volkswagen      27      4
## 4 ford            25      4
## 5 chevrolet       19      4
## 6 audi            18      3
hchart(mpgman, "treemap", hcaes(x = manufacturer, value = n, color = unique))
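As in the mpgman example, x identifies each rectangle, value sets its size and color drives its fill. A sketch with a few made-up rows (the column names brand, n and unique are hypothetical, chosen to mirror mpgman):

```r
library(highcharter)

# Hypothetical summary table: one row per brand
brands <- data.frame(
  brand = c("alpha", "beta", "gamma", "delta"),
  n = c(12, 8, 5, 2),       # rectangle size
  unique = c(3, 4, 1, 2)    # mapped to the fill color
)

hc <- hchart(brands, "treemap", hcaes(x = brand, value = n, color = unique))
```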

Extra parameters

You can pass extra parameters to set options on the data series. With a grouped chart, a vector holding one value per group sets that option series by series:

mpgman2 <- count(mpg, manufacturer, year)

head(mpgman2)
## # A tibble: 6 x 3
##   manufacturer  year     n
##   <chr>        <int> <int>
## 1 audi          1999     9
## 2 audi          2008     9
## 3 chevrolet     1999     7
## 4 chevrolet     2008    12
## 5 dodge         1999    16
## 6 dodge         2008    21
hchart(mpgman2, "bar", hcaes(x = manufacturer, y = n, group = year),
       color = c("#FCA50A", "#FCFFA4"),
       name = c("Year 1999", "Year 2008"))
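The same idea on a tiny made-up table: with group = period there are two series, so color and name are given as vectors of length two. We assume they are matched to the groups in sorted order, as 1999 and 2008 are above; only the data is hypothetical.

```r
library(highcharter)

# Hypothetical counts for three items in two periods
counts <- data.frame(
  item = rep(c("a", "b", "c"), times = 2),
  period = rep(c(2000, 2010), each = 3),
  n = c(4, 7, 2, 5, 6, 3)
)

# One value per group: the first applies to 2000, the second to 2010
hc <- hchart(counts, "bar", hcaes(x = item, y = n, group = period),
             color = c("#FCA50A", "#FCFFA4"),
             name = c("Period 2000", "Period 2010"))
```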

A more advanced example

The broom package is a great fit here because it returns model results as tidy data frames:

library(dplyr)
library(broom)
data(diamonds, package = "ggplot2")

set.seed(123)
data <- diamonds %>% 
  filter(carat > 0.75, carat < 3) %>% 
  sample_n(500)

modlss <- loess(price ~ carat, data = data)
# se_fit = TRUE keeps the .se.fit column (recent broom versions omit it by default)
fit <- arrange(augment(modlss, se_fit = TRUE), carat)

head(fit)
## # A tibble: 6 x 5
##   price carat .fitted .se.fit   .resid
##   <int> <dbl>   <dbl>   <dbl>    <dbl>
## 1  2347  0.76   2348.    439.    -1.41
## 2  3311  0.76   2348.    439.   963.  
## 3  2148  0.76   2348.    439.  -200.  
## 4  2518  0.76   2348.    439.   170.  
## 5  2763  0.77   2475.    410.   288.  
## 6  3697  0.77   2475.    410.  1222.
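The .fitted and .se.fit columns that augment() adds should correspond to what predict() returns for a loess model with se = TRUE. A base-R sketch, on simulated data rather than the diamonds sample, that builds the same kind of fitted curve and the band of plus or minus 3 standard errors drawn in the next chart:

```r
# Base R only (stats::loess and predict). The data are simulated stand-ins
# for carat and price, not the diamonds sample.
set.seed(123)
n <- 200
toy <- data.frame(carat = sort(runif(n, 0.75, 3)))
toy$price <- 4000 * toy$carat + rnorm(n, sd = 800)

mod <- loess(price ~ carat, data = toy)
pred <- predict(mod, se = TRUE)  # list with $fit and $se.fit, one per row

band <- data.frame(
  carat  = toy$carat,
  fitted = pred$fit,
  low    = pred$fit - 3 * pred$se.fit,
  high   = pred$fit + 3 * pred$se.fit
)
head(band)
```

Computing the band by hand like this avoids the broom dependency; broom's advantage is that it returns these columns already aligned with the original data in one tidy table.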

Now we are explicit about each parameter we use:

highchart() %>% 
  hc_add_series(
    data,
    type = "scatter",
    hcaes(x = carat, y = price, size = depth, group = cut),
    maxSize = 5 # max size for bubbles
    ) %>%
  hc_add_series(
    fit,
    type = "spline",
    hcaes(x = carat, y = .fitted),
    name = "Fit",
    id = "fit", # lets the arearange series link to this one and share its legend entry
    lineWidth = 1 
    ) %>% 
  hc_add_series(
    fit,
    type = "arearange",
    hcaes(x = carat, low = .fitted - 3*.se.fit, high = .fitted + 3*.se.fit),
    linkedTo = "fit", # link to the "fit" series so both share one legend entry
    color = hex_to_rgba("gray", 0.2),  # semi-transparent fill
    zIndex = -3 # draw this series behind the others so the points stay visible
    )
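The linking pattern above reduced to a self-contained sketch with made-up data (the columns x, y and se are hypothetical): a spline with id = "fit" and an arearange band that uses linkedTo = "fit", so both share one legend entry, and zIndex = -3, so the band is drawn behind the line.

```r
library(highcharter)

# Made-up curve with a constant standard error
toy <- data.frame(x = 1:10)
toy$y <- sqrt(toy$x)
toy$se <- 0.1

hc <- highchart() %>%
  hc_add_series(toy, type = "spline", hcaes(x = x, y = y),
                name = "Fit", id = "fit") %>%
  hc_add_series(toy, type = "arearange",
                hcaes(x = x, low = y - 3 * se, high = y + 3 * se),
                linkedTo = "fit",                  # one legend entry for line and band
                color = hex_to_rgba("gray", 0.2),  # semi-transparent fill
                zIndex = -3)                       # drawn behind the line
```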