1 Introduction

A common problem in R is labelling scatter plots with large numbers of points and/or labels. We provide a utility for easy labelling of scatter plots and quick plotting of volcano plots and MA plots for gene expression analyses. Using an interactive shiny and plotly interface, users can hover over points to see where specific points are located and click on points to easily label them. Labels can be toggled on/off simply by clicking. An input box and batch input window provides an easy way to label points by name. Labels can be dragged around the plot to place them optimally. Notably we provide an easy way to export directly to PDF for publication.

2 Installation

Install from CRAN

install.packages("easylabel")
library(easylabel)

Install from Github

devtools::install_github("myles-lewis/easylabel")
library(easylabel)

If you wish to use the optional useQ argument with easyVolcano() and easyMAplot(), you will need to install additional package qvalue from Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("qvalue")

If you wish to use the optional fullGeneNames argument, you will need to install packages AnnotationDbi and org.Hs.eg.db from Bioconductor:

BiocManager::install("AnnotationDbi")
BiocManager::install("org.Hs.eg.db")

3 Scatter plots

Use easylabel() to open a shiny app and plot and label scatter plots. A table of the main data is supplied in the Table tab for easy viewing of data.

data(mtcars)
easylabel(mtcars, x = 'mpg', y = 'wt',
          colScheme = 'royalblue')

3.1 Export plot to PDF

  • Hover over and click on/off genes which you want to label.
  • When you have selected all your chosen genes, then drag gene names to move label positions.
  • Click the save button to export a PDF in base graphics.

3.2 Export SVG from plotly

  • Switch to SVG when finalised (only do this at last moment as otherwise editing can very slow, particularly with large numbers of points).
  • Press the camera button in the plotly modebar to save image as SVG.

3.3 Export as plotly object

The output_shiny option enables users to switch between invoking the shiny app or directly outputting a plotly object. For example, to extract a plotly figure with draggable annotations:

data(mtcars)

p1 <- easylabel(mtcars, x = 'mpg', y = 'wt', col = 'cyl', 
                startLabels = rownames(mtcars)[mtcars$gear == 5], 
                output_shiny = FALSE) %>% 
  layout(yaxis = list(zeroline = FALSE))

p2 <- easylabel(mtcars, x = 'mpg', y = 'drat', col = 'vs', 
                colScheme = c("dodgerblue", "orange"), 
                startLabels = rownames(mtcars)[mtcars$gear == 5], 
                output_shiny = FALSE) %>% 
  layout(xaxis = list(zeroline = FALSE))

plotly::subplot(p1, p2, nrows = 2, shareY = TRUE, titleX = TRUE, margin = 0.05)

Note that while the labels and lines can still be dragged and moved in the exported plotly object, clicking on points to add them, exporting to PDF and borders around label text, which are all shiny/ PDF output features, are not available from a pure plotly object.

Plotly objects can also be exported as a return value by pressing the ‘Export plotly & exit’ button in the shiny app. In this case the object retains the labelled points and their edited positions.

3.4 Colour

Similar to ggplot2 and plotly, colour can either be set as a single colour, using colScheme = 'blue' or can be set to change with the level of a factor variable within data by setting col. Transparency can be altered by setting alpha. Currently colour gradients are not supported.

easylabel(mtcars, x = 'mpg', y = 'wt',
          col = 'cyl')

3.5 Shape

Marker shapes can either be set as a single shape using shapeScheme = 21, based on the base graphics pch symbol scheme (see points()), or shapes can be set to change with the level of a factor variable within data by setting shape. Shapes and colours can be combined.

# gapminder data set
if(!require(gapminder)) {install.packages("gapminder")}
library(gapminder)
easylabel(gapminder[gapminder$year == 2007, ], x = 'gdpPercap', y = 'lifeExp',
          col = 'continent', shape = 'continent',
          size = 10,
          labs = 'country', 
          zeroline = FALSE)

3.6 Size

The size of all markers can be adjusted by setting size to a single number e.g. size = 6 (default 8). Size can be assigned to a column of continuous data within data to produce a bubble chart. The size of points is automatically scaled to fit within a range of point sizes, determined by scaleRange.

library(gapminder)
easylabel(gapminder[gapminder$year == 2007, ], x = 'gdpPercap', y = 'lifeExp',
          col = 'continent', labs = 'country', 
          size = 'pop',
          alpha = 0.6,
          zeroline = FALSE)

3.7 Choosing label names

By default, easylabel() takes the rownames of the main dataframe as the label names. Any column can be selected for label names by setting labs to the name of a column within data, as for example in the Gapminder dataset plots above.

3.8 Controlling axes

3.8.1 Axis titles

Axis titles can be set using xlab and ylab. These accept R expressions to allow maths symbols. This is primarily designed to work when exporting plots to PDF via base graphics, since plotly does not natively understand expressions.

easylabel(xymatrix, x = 'x', y = 'y', col = 'col',
          colScheme = c('darkgrey', 'green3', 'gold3', 'blue'),
          xlab = expression("log"[2] ~ " fold change post-Rituximab"),
          ylab = expression("log"[2] ~ " fold change post-Tocilizumab"),
          showgrid = TRUE)

3.8.2 Axis limits and outliers

xlim and ylim allow control over the range of each axis. Outlying points are shown as diamonds. Outliers can be toggled off using showOutliers = FALSE. Outlier symbol, colour and line width can be set using outlier_shape, outlier_col and outlier_lwd respectively.

3.8.3 Axis tick marks

Axis tick marks can be controlled by setting xaxp or yaxp to a vector of form c(x1, x2, n) where x1 and x2 are the limits of the tick marks and n is the number of intervals between tick marks.

For full customisation of axis tick coordinates and labels use xticks or yticks, which are specified as a list of two named vectors at = ... and labels = ....

# example axis ticks
if(!require(gapminder)) {install.packages("gapminder")}
library(gapminder)
easylabel(gapminder[gapminder$year == 2007, ], x = 'gdpPercap', y = 'lifeExp',
          col = 'continent', shape = 'continent',
          size = 10,
          labs = 'country', 
          zeroline = FALSE,
          xaxp = c(0, 50000, 10),
          yaxp = c(40, 85, 9),
          showgrid = TRUE)

3.9 Gridlines

Gridlines can be shown by setting showgrid = TRUE. If showgrid is set to "x" or "y" then gridlines are shown only for that axis. Density of gridlines can be controlled by setting xaxp or yaxp (see Axis tick marks).

Black lines through the origin can be controlled by setting zeroline to TRUE or FALSE.

Dashed grey horizontal lines can be added at specific levels by setting hline with a vector of values. Similarly dashed grey vertical lines can be added by setting vline with a vector of values.

3.10 Legends

Setting showLegend to TRUE or FALSE can be used to show or hide the legend.

Legends can be positioned by setting legendxy. This gives a vector of coordinates for the position of the legend in plotly paper reference with c(0, 0) being the bottom left corner and c(1, 1) being the top right corner of the plot window.

Plotly has unusual behaviour in that the x coordinate always aligns the left side of the legend. However, the y coordinate aligns the top, middle or bottom of the legend dependent on whether it is in the top, middle or bottom 1/3 of the plot window. So c(1, 0) positions the legend in the bottom right corner outside the right margin of the plot, while c(1, 0.5) centre aligns the legend around the centre of y axis.

3.11 Further control of plotting

Further control of plotting can be achieved by passing other arguments to plot() via R’s ... argument. For example, a box around the plot can be added using bty = 'o' (see par()).

3.12 panel.last

Base graphics’ panel.last argument is a flexible way to add plotting commands. This provides a way to add trend lines, loess lines or any additional features such as extra legends to the plot after the points are plotted, but before labels are drawn. Note that panel.last only works when saving to PDF, and is not available in plotly.

# example adding a trend line using panel.last
fit <- lm(xymatrix$y ~ xymatrix$x)
easylabel(xymatrix, x = 'x', y = 'y', col = 'col',
          colScheme = c('darkgrey', 'green3', 'gold3', 'blue'),
          xlab = expression("log"[2] ~ " fold change post-Rituximab"),
          ylab = expression("log"[2] ~ " fold change post-Tocilizumab"),
          showgrid = TRUE, fullGeneNames = TRUE,
          panel.last = {
            abline(fit, col='red')
          })

3.13 plotly_filter

This argument is used to reduce the number of scatter points displayed by plotly, to prevent plotly becoming sluggish or unusable, which happens above about 200,000 points. To plot large datasets such as genomic data, we recommend adding a logical column to your dataset which filters out points of interest for inspecting and labelling via plotly. plotly_filter specifies the name of this column. Saving to PDF retains the full dataset and all points will be plotted. This argument is used by the easyManhattan() function (see Manhattan plot).

4 Volcano plots

Use the easyVolcano() function to quickly plot a volcano plot from DESeq2 or EdgeR objects. The useQ argument will switch to using q values for FDR.

# Example DESeq2 object
head(volc1)
##               baseMean log2FoldChange     lfcSE         stat     pvalue
## 5_8S_rRNA   0.01625099     0.02726923 2.9811510  0.009147217 0.99270168
## 5S_rRNA     3.66321881     0.05989097 0.4115732  0.145517159 0.88430257
## 7SK        12.96667458    -0.46156215 1.3668722 -0.337677614 0.73560615
## A1BG      290.75694559     0.23486554 0.3514587  0.668259368 0.50396804
## A1BG-AS1  170.77906789     0.11800652 0.2001594  0.589562618 0.55548392
## A1CF        4.56855647    -1.23495475 0.4855212 -2.543565265 0.01097276
##                padj
## 5_8S_rRNA        NA
## 5S_rRNA   0.9823854
## 7SK       0.9528927
## A1BG      0.8956205
## A1BG-AS1  0.9117136
## A1CF      0.4531638

# Example limma object
head(volc2)
##               logFC  AveExpr         t      P.Value   adj.P.Val           B
## A1BG      0.7697662 6.389039  4.692064 0.0011239293 0.008610736 -1.16895727
## A1BG-AS1  0.5298897 3.610850  4.688657 0.0011293516 0.008638889 -0.85241114
## AAAS     -0.4162720 5.349529 -5.578986 0.0003398373 0.004148831  0.17585449
## AACS     -0.2049703 3.690271 -1.465073 0.1768461859 0.299064278 -5.88763988
## AAGAB     0.2970478 5.538436  5.446618 0.0004033869 0.004553688 -0.02478892
## AAK1      0.1969886 1.861988  1.286456 0.2302992497 0.362402197 -5.74356735
# Typical DESeq2 workflow
volc1 <- results(dds)
easyVolcano(volc1, useQ = TRUE)

If you want to specify your own columns within the dataset set x to the column containing log fold change, y to the column containing raw p values and padj to the column containing adjusted p values. labs can be used to specify the column containing label names, otherwise this defaults to rownames(data).

# Manually specify columns
easyVolcano(volc1, x = 'log2FoldChange', y = 'pvalue', padj = 'padj')

# To use nominal unadjusted p value for significant genes
easyVolcano(volc1, x = 'log2FoldChange', y = 'pvalue')

Note that if you specify y for a column of p values but leave padj blank, the function assumes you are using nominal unadjusted p values for the cutoff for significance.

4.1 MA plots

Use the easyMAplot() function to quickly plot an MA plot from DESeq2 or EdgeR objects.

easyMAplot(volc2, useQ = TRUE)

4.2 Expanding full gene names

The fullGeneNames argument will use Bioconductor package AnnotationDbi and the org.Hs.eg.db human gene database to expand gene symbols in the Table tab. Both will need to be installed from Bioconductor.

BiocManager::install("AnnotationDbi")
BiocManager::install("org.Hs.eg.db")
easyVolcano(volc1, useQ = TRUE, fullGeneNames = TRUE)

Full gene names are also shown when hovering over points in the scatter plot, which can make it easier to label points of interest. For mouse genes, install org.Mm.eg.db and set AnnotationDb = org.Mm.eg.db (note the lack of quotes).

BiocManager::install("org.Mm.eg.db")
library(org.Mm.eg.db)
easyVolcano(volc1,
            fullGeneNames = TRUE,
            AnnotationDb = org.Mm.eg.db)

4.3 Colour schemes

For volcano plots a simple colour scheme with downregulated genes in blue and upregulated genes in red can be rendered by setting fccut = 0.

easyVolcano(volc2,
            useQ = TRUE,
            fccut = 0,
            main = "Volcano title")

4.4 Titles

A title can be added using main = "Title". The font size of the title can be modified using cex.main = 2 (default 1.2) if saving to PDF.

4.5 Adding left/right subtitles

You can add left and right sided titles using Ltitle and Rtitle to explain the direction of effect for up/downregulation. The use of expression in the example below shows how to add left/right arrow symbols to the titles. The symbols only appear in the saved PDF - they are not compatible with plotly.

LRtitle_side = 1 puts these titles on the bottom while LRtitle_side = 3 puts them on the top.

cex.lab controls font size for these titles as well as axis titles. cex.axis controls font size for axis numbering.

easyVolcano(volc1,
            useQ = TRUE, fullGeneNames = TRUE,
            Ltitle = expression(symbol("\254") ~ "Non-responder"),
            Rtitle = expression("Responder" ~ symbol("\256")),
            LRtitle_side = 1,
            cex.lab = 0.9, cex.axis = 0.8,
            fccut = c(1, 2), fdrcutoff = 0.2,
            ylim = c(0, 6), xlim = c(-5, 5),
            colScheme = c('darkgrey', 'blue', 'orange', 'red'))

4.6 Setting P value cut-off

The FDR cutoff is set using fdrcutoff (volcano plots allow a single value, while MA plots allow multiple values). In order to force significant genes to be shown based on unadjusted P values, this can be achieved by a workaround setting both y and padj manually to the unadjusted p value column. Note this does mean the legend incorrectly states ‘FDR <’ etc.

easyVolcano(volc1, y = 'pvalue', padj = 'pvalue', fdrcutoff = 0.01)

4.7 Axis limits and outliers

xlim and ylim allow control over the range of each axis. Outlying points are shown as diamonds (see example above). Outliers can be toggled off using showOutliers = FALSE. Outlier symbol, colour and line width can be set using outlier_shape, outlier_col and outlier_lwd respectively.

4.8 Complex colour schemes

The colour scheme system has been expanded to allow multiple fold change cut-offs. In the example above the colours are symmetrical.

In the next 2 plots, the colours range from blue for downregulated genes, through to red for upregulated genes. Vertical lines can be added using vline.

colScheme <- c('darkgrey', 'blue', 'lightblue', 'orange', 'red')
easyVolcano(volc1, fccut = 1, fdrcutoff = 0.2,
            ylim = c(0, 6), xlim = c(-5, 5),
            colScheme = colScheme, 
            vline = c(-1, 1))

The next example has 6 colours and also shows how to remove the white outlines around points using outline_col = NA and use transparency instead via alpha.

library(RColorBrewer)
colScheme <- c('darkgrey', brewer.pal(9, 'RdYlBu')[c(9:7, 3:1)])
easyVolcano(volc1, fccut = c(1, 2), fdrcutoff = 0.2, 
            ylim = c(0, 6), xlim = c(-5, 5),
            colScheme = colScheme, 
            alpha = 0.75, outline_col = NA)

Similarly 6 colours can be applied to MA plots using 3 levels of cut-off for FDR (note the colour scheme is in a slightly different order).

colScheme <- c('darkgrey', brewer.pal(9, 'RdYlBu')[c(7:9, 3:1)])
easyMAplot(volc2, fdrcutoff = c(0.05, 0.01, 0.001), size = 6, useQ = TRUE,
           alpha = 0.75, outline_col = NA,
           colScheme = colScheme)

5 Controlling labels

Label lines can be altered using the argument labelDir or by selecting the Label direction pulldown menu in the shiny app. A couple of examples are shown below:

easyVolcano(volc1, labelDir = "horiz")
easyMAplot(volc1, labelDir = "vert")

5.1 Label line and text

Label line colour can be set using line_col. Label text colour can be set using text_col. Each of these can be set to match the colour of each point by setting each to "match". Default label line length can be altered by changing lineLength (default 75 in pixels). cex.text can be changed to alter the font size of labels (default 0.72).

5.2 Adding label boxes

Rectangles can be drawn around the label text, using rectangles = TRUE, and will appear when saving to PDF (they are not supported in plotly). Rectangle fill colour can be set using rect_col and rectangle border colour can be set using border_col. Use border_col = NA to turn off rectangle borders. The amount of padding in pixels around label text can be specified by setting padding. The amount of roundedness in pixels of rectangle corners is specified by border_radius.

# Simple outlines
easyVolcano(volc2, useQ = TRUE, fccut = 0,
            rectangles = TRUE)

# Red outlined labels, rounded ends
easyVolcano(volc2, useQ = TRUE, fullGeneNames = TRUE,
            rectangles = TRUE,
            padding = 5,
            border_radius = 10,
            line_col = 'red',
            border_col = 'red',
            text_col = 'red')

# Transparent grey rectangles, rounded ends
easyMAplot(volc2, fdrcut = c(0.05, 0.01, 0.001), size = 6, useQ = TRUE,
           alpha = 0.75, outline_col = NA,
           fullGeneNames = TRUE,
           colScheme = c('darkgrey', brewer.pal(9, 'RdYlBu')[c(7:9, 3:1)]),
           rectangles = TRUE,
           border_col = NA,
           padding = 5,
           rect_col = adjustcolor('grey', alpha.f = 0.6),
           border_radius = 20)

# White text on black background, no rounding
easyVolcano(volc2, useQ = TRUE, fullGeneNames = TRUE,
            fccut = 0,
            rectangles = TRUE,
            padding = 4,
            border_radius = 0,
            rect_col = 'black',
            text_col = 'white',
            border_col = NA)

5.3 Matching label & point colours

While label text and label lines can be adjusted by setting text_col and line_col to individual colours, label text and label lines can also be set to mirror the colour of each point by setting text_col = "match" and line_col = "match" respectively. Label box fill colour or rectangle outline colour can also be set to match each point’s colour by setting rect_col = "match" or border_col = "match" (label boxes are not supported in plotly).

# Label text and label lines match point colours
easylabel(gapminder[gapminder$year == 2007, ], x = 'gdpPercap', y = 'lifeExp',
          col = 'continent', labs = 'country', 
          size = 'pop',
          alpha = 0.6,
          line_col = "match", text_col = "match",
          zeroline = FALSE, showgrid = "y")

# Rectangle fill colour and label line match point colours, rounded rectangles
easylabel(gapminder[gapminder$year == 2007, ], x = 'gdpPercap', y = 'lifeExp',
          col = 'continent', labs = 'country', 
          size = 'pop',
          alpha = 0.6,
          line_col = "match", text_col = "white",
          rectangles = TRUE, border_col = NA,
          rect_col = "match", border_radius = 20, padding = 5,
          zeroline = FALSE, showgrid = "y")

6 Manhattan plot

Manhattan plots can be labelled using the function easyManhattan(). Plotly struggles with more than about 100,000 points, so we use a filtering system to reduce the number of points shown in the plotly interactive plot, while the full plot with all points is produced when saving to PDF. An example is shown below:

# Manhattan plot using SLE GWAS data from Bentham et al 2015
# FTP download full summary statistics from
# https://www.ebi.ac.uk/gwas/studies/GCST003156
library(data.table)
SLE_gwas <- fread('../bentham_2015_26502338_sle_efo0002690_1_gwas.sumstats.tsv')
# Simple Manhattan plot
easyManhattan(SLE_gwas)

# 4 colours for chromosomes
easyManhattan(SLE_gwas, chromCols = RColorBrewer::brewer.pal(4, 'Paired'))

By default Manhattan plots display only the top 100,000 points by p value in the plotly window and plot the top 1 million SNPs when saving to PDF. These defaults can be altered by setting npoints and nplotly. Setting npoints to NA will plot all points when saving to PDF.

# Examples
# 12 colours for chromosomes, no separate colour for significant points
easyManhattan(SLE_gwas, chromCols = RColorBrewer::brewer.pal(12, 'Paired'), 
              sigCol = NA)

# Label peaks automatically, add horizontal gridlines
easyManhattan(SLE_gwas, npeaks = 20, showgrid = "y")

# Vertical version
easyManhattan(SLE_gwas, transpose = TRUE, height = 1000, width = 600)

# Chr1 only
easyManhattan(SLE_gwas[SLE_gwas$chrom == 1, ])

# Add symbols for the significant SNPs
easyManhattan(SLE_gwas, chromCols = RColorBrewer::brewer.pal(4, 'Paired'),
              size = 8,
              shape = 'col',
              shapeScheme = c(rep(20, 4), 18))

6.1 Locus plots

To look in a specific genomic region

# Create a locus plot over one chromosomal region
library(plotly)
p1 = easyManhattan(SLE_gwas[SLE_gwas$chrom == 6 & 
                              SLE_gwas$pos >= 28e6 & 
                              SLE_gwas$pos <= 34e6, ], 
                   output_shiny = FALSE, labs = "rsid", 
                   startLabels=c("rs115466242", "rs2853999"), 
                   npeaks = 3)



# To annotate genes in that region
source("https://raw.githubusercontent.com/KatrionaGoldmann/BioOutputs/master/R/bio_gene_locations.R")
library(ggbio)
library(gginnards)
library(ggrepel)
if (! "EnsDb.Hsapiens.v75" %in% rownames(installed.packages()))
  BiocManager::install("EnsDb.Hsapiens.v75")

p2 = bio_gene_locations(6, c(28e6, 34e6),  
                        subset_genes = c('HLA-F', 'HLA-G', 'HLA-A', 'HLA-E', 
                                         'HLA-C', 'HLA-B', 'HLA-DRA', 
                                         'HLA-DRB5', 'HLA-DRB1', 'HLA-DQA1', 
                                         'HLA-DQB1', 'HLA-DQA2', 'HLA-DQB2', 
                                         'HLA-DOB', 'HLA-DMB', 'HLA-DMA', 
                                         'HLA-DOA', 'HLA-DPA1', 'HLA-DPB1')) 

plotly::subplot(p1, p2$plotly_location %>% layout(yaxis=list(range=c(0.25, 2))), 
                shareY = T, titleX = T, margin=0.05,
                nrows=2, heights=c(0.7, 0.3)) 

7 Known issues

  • Plotly’s scattergl function, which uses webGL, seems to have a limit of around 400,000 scatter points per plot. Above this level, the shiny/plotly app becomes sluggish or freezes and becomes unusable. We recommend limiting plots to around 100,000 points using the plotly_filter argument. See plotly_filter.