Choosing a Scale Type

After deciding on the geographic level of detail of your choropleth, the next most important decision is which type of scale to use. choroplethr supports three types of scales: continuous, discrete and manual. Continuous and discrete scales are selected with the num_buckets argument; a manual scale is created by bucketing the values yourself before plotting.

Continuous Scales

Here is a choropleth map of US state populations using a continuous scale.

library(choroplethr)
data(choroplethr)
choroplethr(df_pop_state, "state", num_buckets=1, title="2012 State Population Estimates, Continuous Scale") 

[Figure: choropleth map, "2012 State Population Estimates, Continuous Scale"]

One striking feature of this map is how much brighter California is than all its neighbors. Indeed, in the western half of the map only California and Texas stand out at all. The reason for this can be seen by looking at a boxplot of state populations; California (37M) and Texas (25M) are the two largest outliers:

library(ggplot2)
library(scales)
ggplot(df_pop_state, aes(factor(1), value)) + 
  geom_boxplot() + 
  ggtitle("Boxplot of 2012 State Population Estimates") + 
  scale_y_continuous(labels=comma) # comma() from scales formats the axis as 10,000,000

[Figure: "Boxplot of 2012 State Population Estimates"]

This is both the blessing and the curse of choropleths that use a continuous scale: outliers stand out dramatically, but differences among the non-outliers are difficult to detect.
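
To see why the non-outliers blur together, consider where each state lands on a linear color ramp. The following is a minimal sketch in base R. It uses a handful of rounded, illustrative population figures rather than the actual df_pop_state data, and the variable names are hypothetical.

```r
# Rounded, illustrative populations for a few western states plus Texas.
# These are assumptions for this sketch, not values read from df_pop_state.
pop <- c(California = 37e6, Texas = 25e6, Washington = 6.7e6,
         Oregon = 3.8e6, Nevada = 2.7e6, Idaho = 1.6e6,
         Montana = 1.0e6, Wyoming = 0.6e6)

# Position of each state on a linear scale from 0 (minimum) to 1 (maximum).
ramp_position <- (pop - min(pop)) / (max(pop) - min(pop))
print(round(sort(ramp_position, decreasing = TRUE), 2))

# Count the states squeezed into the bottom fifth of the ramp.
n_compressed <- sum(ramp_position < 0.2)
print(n_compressed)
```

With these figures, six of the eight states fall in the lowest fifth of the ramp, so a continuous scale would give them nearly identical colors.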

Discrete Scales

Setting num_buckets to a value between 2 and 9 colors the regions using that many buckets, each containing roughly the same number of regions. The default is num_buckets = 9.

choroplethr(df_pop_state, "state", num_buckets=9, title="2012 State Population Estimates, 9 Buckets") 

[Figure: choropleth map, "2012 State Population Estimates, 9 Buckets"]

Here California and Texas still stand out, but there are now clear differences among the remaining states in the western half of the US. In particular, there is a cluster of low-population states in the Northwest.
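
The idea of equally sized buckets can be sketched in base R with quantile() and cut(). This is one plausible way to form such buckets, shown on toy data; the helper name equal_count_buckets is hypothetical, and choroplethr's own bucketing may differ in detail.

```r
# Hypothetical helper: assign each value to one of n buckets that hold
# (as nearly as possible) the same number of observations.
equal_count_buckets <- function(values, n) {
  # Break points at the 0, 1/n, 2/n, ..., 1 quantiles of the data.
  breaks <- quantile(values, probs = seq(0, 1, length.out = n + 1))
  # include.lowest = TRUE keeps the minimum value inside the first bucket.
  cut(values, breaks = breaks, include.lowest = TRUE)
}

values <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)  # toy data with one large outlier
buckets <- equal_count_buckets(values, 3)
print(table(buckets))
```

Each of the three buckets receives three values, and the outlier (100) shares the top bucket with 7 and 8. That is the trade-off of a discrete scale: the outlier no longer dominates, but its magnitude is no longer visible either. Note that heavily tied data can produce duplicate break points, which cut() rejects.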

A common misconception is that more buckets are always better. Using just two buckets, for example, highlights the regions above and below the median value, which is itself quite informative.
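
As a quick check of the two-bucket case, splitting at the median puts half of the observations on each side. The sketch below uses toy values chosen to have an even count and no ties; it does not call choroplethr.

```r
values <- c(5, 1, 9, 3, 7, 2, 8, 4)  # toy data: even count, no ties

# TRUE for values above the median, FALSE for values below it.
above_median <- values > median(values)
print(table(above_median))
```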

choroplethr(df_pop_state, "state", num_buckets=2, title="2012 State Population Estimates, 2 Buckets") 

[Figure: choropleth map, "2012 State Population Estimates, 2 Buckets"]

Manual Scales

When analyzing spatial data, you often want to choose the break points of a scale yourself. Here is how to highlight states with populations above or below 1 million residents:

library(Hmisc) # for cut2
df_pop_state$value = cut2(df_pop_state$value, cuts=c(0,1000000,Inf))
choroplethr(df_pop_state, "state", title="States with Populations Above or Below 1 Million")

[Figure: choropleth map, "States with Populations Above or Below 1 Million"]
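
The same kind of manual bucketing can be sketched in base R with cut(), which avoids the Hmisc dependency. This is an alternative illustration on toy values, not the code that produced the map above; the bucket labels are hypothetical.

```r
# Toy populations (illustrative values, not taken from df_pop_state).
population <- c(600000, 999999, 1000000, 6700000, 37000000)

# Breaks at 0, 1 million and infinity. right = FALSE makes each interval
# closed on the left, so exactly 1,000,000 falls in the upper bucket.
bucket <- cut(population,
              breaks = c(0, 1e6, Inf),
              labels = c("below 1M", "1M or more"),
              right = FALSE)
print(data.frame(population, bucket))
```

Choosing right = FALSE mirrors the left-inclusive intervals that cut2 is documented to use, so a state with exactly 1 million residents lands in the upper bucket in both approaches.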