The R extension of the markdown language (Yihui Xie, Allaire, and Grolemund 2018; Allaire et al.
2023) enables reproducible statistical reports with high-quality
typesetting in HTML, Microsoft Word, and LaTeX. Moreover, since version 4.2
(R Core Team 2022), R's
manual pages support mathematical expressions (Sarkar and Hornik 2022; Viechtbauer 2022),
which is already a big improvement. However, except for special cases
such as regression models (Anderson, Heiss, and
Sumners 2023) and R’s own plotmath annotation, rules for
mapping built-in language elements to their mathematical
representation are still lacking. So far, R expressions such as
`pbinom(k, N, p)` are printed as they are, and pretty
mathematical formulae such as \(P_{\mathrm{Bi}}(X \le k; N, p)\) require
explicit LaTeX commands like
`P_{\mathrm{Bi}}\left(X \le k; N, p\right)`. Except for very
basic use cases, these commands are tedious to type and their source
code is hard to read.

The present R package defines a set of rules for the automatic translation of R expressions to mathematical output in R Markdown documents (Y. Xie, Dervieux, and Riederer 2020) and Shiny Apps (Chang et al. 2022). The translation is done by an embedded Prolog interpreter that maps nested expressions recursively to MathML and LaTeX/MathJax, respectively. User-defined hooks make it possible to extend the set of rules, for example, to represent specific R elements by custom mathematical signs.

The main feature of the package is that the same R expressions and equations can be used for both mathematical typesetting and calculations. This saves time and potentially reduces mistakes, as will be illustrated below.

Similar to other high-level programming languages, R is homoiconic,
that is, R commands (i.e., R “calls”) are themselves symbolic data
structures that can be created, parsed, and modified. Because the default
response of the R interpreter is to evaluate a call and return its
result, this property is not obvious to the general user. There
are, however, a number of built-in R functions (e.g.,
`quote()`, `call()`, etc.) that allow the user to
create R calls which can be stored in regular variables and then, for
example, evaluated at a later stage or in a specific environment (Wickham 2019). The present package includes a
set of rules that translate such calls to a mathematical representation
in MathML and LaTeX. For a first illustration of the *mathml*
package, we consider the binomial probability.

```
term <- quote(pbinom(k, N, p))
term
```

`## pbinom(k, N, p)`

The term is quoted to avoid its immediate evaluation (which would
raise an error anyway since the variables `k`, `N`, and `p`
have not yet been defined). Experienced
readers will remember that the quoted expression above is a short form
for

`term <- call("pbinom", as.name("k"), as.name("N"), as.name("p"))`

As can be seen from the output, the variable `term`
is not assigned the result of the calculation, but an R call (see, e.g., Wickham 2019, for details on “non-standard
evaluation”), which can eventually be evaluated with
`eval()`,

```
k <- 10
N <- 22
p <- 0.4
eval(term)
```

`## [1] 0.77195`
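The structure of such a call can be inspected and modified with base R alone. The following is a minimal sketch that does not use *mathml*; the names `same`, `swapped`, and `values` are chosen for illustration only.

```r
# Base R only: a quoted call is an ordinary data structure
term <- quote(pbinom(k, N, p))

class(term)     # the object is a call, not a number
as.list(term)   # function name followed by the three arguments

# quote() and call() build the same object
same <- identical(term, call("pbinom", as.name("k"), as.name("N"), as.name("p")))

# The call can be modified like a list, here by swapping the function name
swapped <- term
swapped[[1]] <- as.name("dbinom")

# Evaluate both calls with the values used in the text
values <- list(k = 10, N = 22, p = 0.4)
eval(term, values)      # cumulative probability P(X <= 10)
eval(swapped, values)   # point probability P(X = 10)
```

Passing the values as a list to `eval()` keeps the global workspace untouched, which is convenient when the same term is evaluated under several parameter settings.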

The R package *mathml* can now be used to render the call in
MathML, the dialect for mathematical elements on HTML webpages,
or in MathJax/LaTeX, as shown below.

```
library(mathml)
mathjax(term)
```

`## [1] "${P}_{\\mathrm{Bi}}{\\left({{X}{\\le}{k}}{{;}{{N}{{,}{p}}}}\\right)}$"`

Some of the curly braces are not really needed in the LaTeX output,
but are necessary in edge cases. The package also includes a function
`mathout()` that wraps a call to `mathml()` for
HTML output and `mathjax()` for LaTeX output. Moreover, the
function `math(x)` adds the class `"math"` to its
argument, such that a special knitr printing function is invoked (see the vignette on custom print methods in Yihui Xie
2023).

${P}_{\text{Bi}}\left(X\le k;N,p\right)$

Similarly, `inline()` produces inline output,
``r inline(term)`` yields
${P}_{\text{Bi}}\left(X\le k;N,p\right)$.

*mathml* is an R package for pretty mathematical
representation of R functions and objects in data analysis, scientific
reports and interactive web content. The currently supported features
are listed below, roughly following the order proposed by Murrell and
Ihaka (2000).

*mathml* handles the basic elements of everyday mathematical
expressions, such as numbers, Latin and Greek letters, multi-letter
identifiers, accents, subscripts, and superscripts.

```
term <- quote(1 + -2L + a + abc + "a" + phi + Phi + varphi + roof(b)[i, j]^2L)
math(term)
```

$1.00+-2+a+\mathrm{abc}+\text{a}+\phi +\Phi +\varphi +{\widehat{b}}_{i\text{}j}^{2}$

```
term <- quote(round(3.1415, 3L) + NaN + NA + TRUE + FALSE + Inf + (-Inf))
math(term)
```

$3.142+\mathrm{nan}+\mathrm{na}+T+F+\infty +\left(-\infty \right)$

An expression such as `1 + -2` may be considered
unsatisfactory from an aesthetic perspective. It is correct R syntax,
though, and is reproduced accordingly, without parentheses.
Parentheses around negated numbers or symbols can be added as shown
above for `+ (-Inf)`.

To avoid name clashes with package *stats*,
`roof()` is used instead of `hat()` to put a hat
on a symbol (see next section for further decorations). Note that
base R does not have a function `roof()`; it is provided
by the package for convenience and points to the identity function.

The package offers some support for different fonts as well as accents and boxes etc. Internally, these decorations are implemented as identity functions, so they can be introduced into R expressions without side-effects.

```
term <- quote(bold(b[x, 5L]) + bold(b[italic(x)]) + italic(ab) + italic(42L))
math(term)
```

${\mathbf{b}}_{\mathbf{x}\text{}5}+{\mathbf{b}}_{\mathit{x}}+\mathit{ab}+42$

```
term <- quote(tilde(a) + mean(X) + boxed(c) + cancel(d) + phantom(e) + prime(f))
math(term)
```

$\tilde{a}+\overline{X}+\boxed{c}+\cancel{d}+\phantom{e}+{f}^{\prime}$

Note that the font styles only affect the display of identifiers, whereas numbers, character strings etc. are left untouched.
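Because the decorations are identity functions, a decorated expression evaluates to the same value as the plain one. The sketch below illustrates the idea in base R with hypothetical stand-ins `my_roof()` and `my_bold()`; these are assumptions made for the example, not the package's own definitions.

```r
# Hypothetical stand-ins for decoration functions: they return their argument
my_roof <- function(x) x
my_bold <- function(x) x

b <- c(1.5, 2.5)

plain     <- quote(b[2L]^2L + b[1L])
decorated <- quote(my_roof(b)[2L]^2L + my_bold(b[1L]))

eval(plain)       # 2.5^2 + 1.5
eval(decorated)   # same value, the decorations have no side effects
```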

Arithmetic operators and parentheses are translated as they are, as illustrated below.

```
term <- quote(a - ((b + c)) - d*e + f*(g + h) + i/j + k^(l + m) + (n*o)^{p + q})
math(term)
```

$a-\left[\left(b+c\right)\right]-de+f\cdot \left(g+h\right)+i/j+{k}^{\left(l+m\right)}+{\left(no\right)}^{p+q}$

```
term <- quote(dot(a, b) + frac(1L, nodot(c, d + e)) + dfrac(1L, times(g, h)))
math(term)
```

$a\cdot b+\frac{1}{c\left(d+e\right)}+{\displaystyle \frac{1}{g\times h}}$

For multiplications involving only numbers and symbols, the
multiplication sign is omitted. This heuristic does not always produce
the desired result; therefore, *mathml* defines alternative
R functions `dot()`, `nodot()`, and
`times()`. These functions calculate a product and produce
the respective multiplication signs. Similarly, `frac()` and
`dfrac()` can be used for small and large fractions.

For standard operators with known precedence, *mathml* is
generally able to detect if parentheses are needed; for example,
parentheses are automatically placed around `d + e` in the
`nodot` example. However, we note unnecessary parentheses
around `l + m` above. These parentheses are a consequence of
`quote(a^(b + c))` actually producing a nested R call of the
form `'^'(a, (b + c))` instead of
`'^'(a, b + c)`:

```
term <- quote(a^(b + c))
paste(term)
```

`## [1] "^" "a" "(b + c)"`

For the present purpose, this feature is unfortunate because extra
parentheses around `b + c` are not needed. The preferred
result is obtained by using the functional form
`quote('^'(k, l + m))` of the power, or curly braces as a
workaround (see `p + q` above).
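The difference between the three ways of writing the exponent can be checked directly on the parsed calls. This is a base R sketch; the helper `head_of()` and the variable names are introduced here for illustration only.

```r
e_paren <- quote(a^(b + c))       # explicit parentheses are kept in the call
e_func  <- quote(`^`(a, b + c))   # functional form: exponent is the bare sum
e_curly <- quote(a^{b + c})       # curly braces instead of parentheses

# Name of the outermost function of the exponent (third element of the call)
head_of <- function(e) as.character(e[[3]][[1]])

head_of(e_paren)   # the parenthesis is itself a function call
head_of(e_func)    # the sum
head_of(e_curly)   # the curly brace

# All three variants compute the same number
vals <- list(a = 2, b = 1, c = 2)
sapply(list(e_paren, e_func, e_curly), eval, vals)
```

The renderer therefore sees three different call trees, although R evaluates them identically.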

Whereas in standard infix operators, the parentheses typically follow the rules for precedence, undesirable results may be obtained in custom operators.

```
term <- quote(mean(X) %+-% 1.96 * s / sqrt(N))
math(term)
```

$\left(\overline{X}\pm 1.96\right)\cdot s/\sqrt{N}$

```
term <- quote('%+-%'(mean(X), 1.96 * s / sqrt(N))) # functional form of '%+-%'
term <- quote(mean(X) %+-% {1.96 * s / sqrt(N)}) # the same
math(term)
```

$\overline{X}\pm 1.96s/\sqrt{N}$

The example is a reminder that it is not possible to define the precedence of custom operators in R: all `%...%` operators share one fixed precedence, which is higher than that of multiplication and division, so that `mean(X) %+-% 1.96` is grouped first. Again, the problem can be worked around by the functional form of the operator, or a curly brace to hide the parenthesis but enforce the correct operator precedence.
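How R groups such an expression can be verified on the parsed call, without defining the operator and without loading *mathml*. The sketch uses base R only; `m` stands in for `mean(X)` to keep the output short.

```r
e <- quote(m %+-% 1.96 * s / sqrt(N))

as.character(e[[1]])   # outermost operation is the division
e[[2]]                 # numerator: (m %+-% 1.96) * s
e[[2]][[2]]            # innermost group, formed first: m %+-% 1.96

# With curly braces, the custom operator becomes the outermost operation
grouped <- quote(m %+-% {1.96 * s / sqrt(N)})
as.character(grouped[[1]])
```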

More operators are shown in Table 1, including the suggestions by Murrell and Ihaka (2000) for graphical annotations and arrows in R figures.

Operator | Output | Operator | Output | Operator | Arrow |
---|---|---|---|---|---|
A %*% B | $A\times B$ | A != B | $A\ne B$ | A %<->% B | $A\leftrightarrow B$ |
A %.% B | $A\cdot B$ | A ~ B | $A\sim B$ | A %->% B | $A\to B$ |
A %x% B | $A\otimes B$ | A %~~% B | $A\approx B$ | A %<-% B | $A\leftarrow B$ |
A %/% B | $\lfloor A/B\rfloor $ | A %==% B | $A\equiv B$ | A %up% B | $A\uparrow B$ |
A %% B | $\mathrm{mod}\left(A,B\right)$ | A %=~% B | $A\cong B$ | A %down% B | $A\downarrow B$ |
A & B | $A\wedge B$ | A %prop% B | $A\propto B$ | A %<=>% B | $A\iff B$ |
A \| B | $A\vee B$ | A %in% B | $A\in B$ | A %=>% B | $A\Rightarrow B$ |
xor(A, B) | $A\veebar B$ | intersect(A, B) | $A\cap B$ | A %<=% B | $A\Leftarrow B$ |
!A | $\neg A$ | union(A, B) | $A\cup B$ | A %dblup% B | $A\Uparrow B$ |
A == B | $A=B$ | crossprod(A, B) | ${A}^{\text{T}}\times B$ | A %dbldown% B | $A\Downarrow B$ |
A <- B | $A=B$ | is.null(A) | $A=\varnothing $ | | |

There is support for most functions from package *base*, with
adequate use and omission of parentheses.

```
term <- quote(sin(x) + sin(x)^2L + cos(pi/2L) + tan(2L*pi) * expm1(x))
math(term)
```

$\mathrm{sin}x+{\left(\mathrm{sin}x\right)}^{2}+\mathrm{cos}\left(\pi /2\right)+\mathrm{tan}\left(2\pi \right)\cdot \left(\mathrm{exp}x-1\right)$

```
term <- quote(choose(N, k) + abs(x) + sqrt(x) + floor(x) + exp(frac(x, y)))
math(term)
```

$\left(\genfrac{}{}{0ex}{}{N}{k}\right)+\left|x\right|+\sqrt{x}+\lfloor x\rfloor +\mathrm{exp}\left(\frac{x}{y}\right)$

A few more examples are shown in Table 2, including functions from
*stats*.

Function | Output | Function | Output |
---|---|---|---|
sin(x) | $\mathrm{sin}x$ | dbinom(k, N, pi) | ${P}_{\text{Bi}}\left(X=k;N,\pi \right)$ |
cosh(x) | $\mathrm{cosh}x$ | pbinom(k, N, pi) | ${P}_{\text{Bi}}\left(X\le k;N,\pi \right)$ |
tanpi(alpha) | $\mathrm{tan}\left(\alpha \pi \right)$ | qbinom(p, N, pi) | ${\mathrm{argmin}}_{k}\left[{P}_{\text{Bi}}\left(X\le k;N,\pi \right)>p\right]$ |
asinh(x) | ${\mathrm{sinh}}^{-1}x$ | dpois(k, lambda) | ${P}_{\text{Po}}\left(X=k;\lambda \right)$ |
log(p) | $\mathrm{log}p$ | ppois(k, lambda) | ${P}_{\text{Po}}\left(X\le k;\lambda \right)$ |
log1p(x) | $\mathrm{log}\left(1+x\right)$ | qpois(p, lambda) | ${\mathrm{argmax}}_{k}\left[{P}_{\text{Po}}\left(X\le k;\lambda \right)>p\right]$ |
logb(x, e) | ${\mathrm{log}}_{e}x$ | dexp(x, lambda) | ${f}_{\text{Exp}}\left(x;\lambda \right)$ |
exp(x) | $\mathrm{exp}x$ | pexp(x, lambda) | ${F}_{\text{Exp}}\left(x;\lambda \right)$ |
expm1(x) | $\mathrm{exp}x-1$ | qexp(p, lambda) | ${F}_{\text{Exp}}^{-1}\left(p;\lambda \right)$ |
choose(n, k) | $\left(\genfrac{}{}{0ex}{}{n}{k}\right)$ | dnorm(x, mu, sigma) | $\phi \left(x;\mu ,\sigma \right)$ |
lchoose(n, k) | $\mathrm{log}\left(\genfrac{}{}{0ex}{}{n}{k}\right)$ | pnorm(x, mu, sigma) | $\Phi \left(x;\mu ,\sigma \right)$ |
factorial(n) | $n!$ | qnorm(alpha/2L) | ${\Phi}^{-1}\left(\alpha /2\right)$ |
lfactorial(n) | $\mathrm{log}n!$ | 1L - pchisq(x, 1L) | $1-{F}_{{\chi}^{2}\left(1\phantom{\rule{thinmathspace}{0ex}}\text{df}\right)}\left(x\right)$ |
sqrt(x) | $\sqrt{x}$ | qchisq(1L - alpha, 1L) | ${F}_{{\chi}^{2}\left(1\phantom{\rule{thinmathspace}{0ex}}\text{df}\right)}^{-1}\left(1-\alpha \right)$ |
mean(X) | $\overline{X}$ | pt(t, N - 1L) | $P\left(T\le t;N-1\phantom{\rule{thinmathspace}{0ex}}\text{df}\right)$ |
abs(x) | $\left|x\right|$ | qt(alpha/2L, N - 1L) | ${T}_{\alpha /2}\left(N-1\phantom{\rule{thinmathspace}{0ex}}\text{df}\right)$ |

For self-written functions, the matter is somewhat more complicated.
For a function such as `g <- function(...) ...`, the name
*g* is not transparent to R, because only the function body is
represented. We can still display functions in the form
`head(x) = body` if we embed the object to be shown into a
call `"<-"(head, body)`.

```
sgn <- function(x)
{
  if(x == 0L) return(0L)
  if(x < 0L) return(-1L)
  if(x > 0L) return(1L)
}

math(sgn)
```

$\left\{\begin{array}{l}0\text{,}\phantom{\rule{thinmathspace}{0ex}}\text{if}\phantom{\rule{thinmathspace}{0ex}}x=0\\ -1\text{,}\phantom{\rule{thinmathspace}{0ex}}\text{if}\phantom{\rule{thinmathspace}{0ex}}x<0\\ 1\text{,}\phantom{\rule{thinmathspace}{0ex}}\text{if}\phantom{\rule{thinmathspace}{0ex}}x>0\end{array}\right.$

`math(call("<-", quote(sgn(x)), sgn))`

$\mathrm{sgn}x=\left\{\begin{array}{l}0\text{,}\phantom{\rule{thinmathspace}{0ex}}\text{if}\phantom{\rule{thinmathspace}{0ex}}x=0\\ -1\text{,}\phantom{\rule{thinmathspace}{0ex}}\text{if}\phantom{\rule{thinmathspace}{0ex}}x<0\\ 1\text{,}\phantom{\rule{thinmathspace}{0ex}}\text{if}\phantom{\rule{thinmathspace}{0ex}}x>0\end{array}\right.$

The function body is generally a nested R call of the form
`'{'(L)`, with `L` being a list of commands (the
semicolon, not necessary in R, is translated to a newline). As
illustrated in the example, *mathml* provides limited support for
control structures such as `if`.

Indices in square brackets are rendered as subscripts, powers are
rendered as superscripts. Moreover, *mathml* defines the functions
`sum_over(x, from, to)` and
`prod_over(x, from, to)` that simply return their first
argument. The other two arguments serve as decorations (*to* is
optional), for example, for summation and product signs.

```
term <- quote(S[Y]^2L <- frac(1L, N) * sum(Y[i] - mean(Y))^2L)
math(term)
```

${S}_{Y}^{2}=\frac{1}{N}\cdot {\sum \left({Y}_{i}-\overline{Y}\right)}^{2}$

```
term <- quote(log(prod_over(L[i], i==1L, N)) <- sum_over(log(L[i]), i==1L, N))
math(term)
```

$\mathrm{log}{\prod}_{i=1}^{N}{L}_{i}={\sum}_{i=1}^{N}\mathrm{log}{L}_{i}$

R's `integrate` function takes a number of arguments, the
most important ones being the function to integrate, and the lower and
the upper bound of the integration.

```
term <- quote(integrate(sin, 0L, 2L*pi))
math(term)
```

$\underset{0}{\overset{2\pi}{\int}}\mathrm{sin}x\phantom{\rule{thinmathspace}{0ex}}dx$

`eval(term)`

`## 2.221501e-16 with absolute error < 4.4e-14`

For mathematical typesetting in the form of \(\int f(x)\, dx\), *mathml* needs to
find out the name of the integration variable. For that purpose, the
underlying Prolog bridge provides a predicate `r_eval/3` that
calls R from Prolog. In the example above, this predicate evaluates
`formalArgs(args(sin))`, which returns the names of the
arguments of `sin`, namely, `x`.
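The lookup of the integration variable can be reproduced in plain R. The following sketch only shows the R side of the query; the function `f` is a hypothetical user-defined example and not part of the package.

```r
library(methods)   # formalArgs() is provided by the methods package

# Built-in function: args() returns a stub with the same formal arguments
formalArgs(args(sin))

# Hypothetical user-defined function with two arguments
f <- function(tau, c = 1) tau + c
formalArgs(f)
```

For a function with several arguments, the first name returned would be the natural candidate for the integration variable, since `integrate()` integrates over the first argument.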

Note that in the example above, the quoted term is an abbreviation
for `call("integrate", quote(sin), ...)`, with
`sin` being an R symbol, not a function. While the R function
`integrate()` can handle both symbols and functions,
*mathml* needs the symbol because it is unable to determine the
function name of custom functions.

One of R’s great features is the possibility to refer to function
arguments by their names, not only by their position in the list of
arguments. Prolog, in contrast, does not have such a feature.
Therefore, the Prolog handlers for R calls are rather rigid. For
example, `integrate/3` accepts exactly three arguments in a
particular order and without names, that is,
`integrate(lower=0L, upper=2L*pi, sin)` would not print the
desired result.

To “canonicalize” function calls with named arguments and arguments
in unusual order, *mathml* provides an auxiliary R function
`canonical(f, drop)` that reorders the argument list of calls
to known R functions and, if `drop=TRUE` (which is the
default), also removes the names of the arguments.

```
term <- quote(integrate(lower=0L, upper=2L*pi, sin))
canonical(term)
```

`## integrate(sin, 0L, 2L * pi)`

`math(canonical(term))`

$\underset{0}{\overset{2\pi}{\int}}\mathrm{sin}x\phantom{\rule{thinmathspace}{0ex}}dx$

This function can be used to feed mixtures of partially named and
positional arguments into the renderer. For details, see the R function
`match.call()`.
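The reordering step can be sketched with `match.call()` in base R. This is an illustration of the idea under the assumption that matching against the function definition is all that is needed; it is not necessarily how `canonical()` is implemented in the package. The names `full` and `positional` are chosen for the example.

```r
term <- quote(integrate(lower=0L, upper=2L*pi, sin))

# Match the arguments against the formals of integrate(): every argument
# receives its full name and is moved to its position in the definition
full <- match.call(definition = integrate, call = term)

# Dropping the names yields a purely positional call
positional <- as.call(unname(as.list(full)))

deparse(full)
deparse(positional)
```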

Of course, *mathml* also supports matrices and vectors.

```
v <- 1:3
math(call("t", v))
```

${\left(1\phantom{\rule{thinmathspace}{0ex}}2\phantom{\rule{thinmathspace}{0ex}}3\right)}^{\text{T}}$

```
A <- matrix(data=11:16, nrow=2, ncol=3)
B <- matrix(data=21:26, nrow=2, ncol=3)
term <- call("+", A, B)
math(term)
```

$\left(\begin{array}{lll}11& 13& 15\\ 12& 14& 16\end{array}\right)+\left(\begin{array}{lll}21& 23& 25\\ 22& 24& 26\end{array}\right)$

Note that the seemingly more convenient
`term <- quote(A + B)` yields \(A + B\) in the output instead of the
desired matrix representation. This behavior is expected because
quoting an R call also quotes the components of the call (here,
*A* and *B*).
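The distinction between symbols and values inside a call can be inspected in base R. The sketch below also shows `bquote()` as one possible way to keep infix notation while inserting the values; whether this form is convenient in combination with *mathml* is an assumption that is not tested here.

```r
A <- matrix(11:16, nrow = 2)
B <- matrix(21:26, nrow = 2)

by_name  <- quote(A + B)          # components remain the symbols A and B
by_value <- call("+", A, B)       # components are the matrices themselves
spliced  <- bquote(.(A) + .(B))   # infix notation with the values inserted

is.name(by_name[[2]])      # first operand is a symbol
is.matrix(by_value[[2]])   # first operand is a matrix
is.matrix(spliced[[2]])    # first operand is a matrix

# All three calls evaluate to the same sum
identical(eval(by_name), eval(by_value))
```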

In typical R functions, variable names are longer than just single letters, which may yield unsatisfactory results in the mathematical output.

```
hook(successes, k)
hook(quote(Ntotal), quote(N), quote=FALSE)
hook(prob, pi)
term <- quote(dbinom(successes, Ntotal, prob))
math(term)
```

${P}_{\text{Bi}}\left(X=k;N,\pi \right)$

To improve the situation, *mathml* provides a simple hook that
can be used to replace elements (e.g., verbose variable names) of the
code by concise mathematical symbols, as illustrated in the example. To
simplify notation, the `quote` flag of `hook()`
defaults to `TRUE`, and `hook()` uses non-standard evaluation
to unpack its arguments. If `quote` is `FALSE`, as shown above, the user has
to provide the quoted expressions. Care should be taken to avoid
recursive hooks such as `hook(s, s["A"])` that endlessly
replace the \(s\) from \(s_{\mathrm{A}}\) as in \(s_{\mathrm{A}_{\mathrm{A}_{\mathrm{A}\cdots}}}\).

The hooks can also be used for more complex elements such as R calls, with dotted symbols representing Prolog variables.

```
hook(pbinom(.K, .N, .P), sum_over(dbinom(i, .N, .P), i=0L, .K))
math(term)
```

${P}_{\text{Bi}}\left(X=k;N,\pi \right)$

We consider the \(t\)-statistic for
independent samples with equal variance. To avoid clutter in the
equation, the pooled variance \(s^2_{\mathrm{pool}}\) is abbreviated, and a
comment is given with the expression for \(s^2_{\mathrm{pool}}\). For this purpose,
*mathml* provides a function
`denote(abbr, expr, info)`, with `expr` actually
being evaluated, `abbr` being rendered, plus a comment of the
form “with `expr` denoting `info`”.

```
hook(m_A, mean(X)["A"]) ; hook(s2_A, s["A"]^2L) ;
hook(n_A, n["A"])
hook(m_B, mean(X)["B"]) ; hook(s2_B, s["B"]^2L)
hook(n_B, n["B"]) ; hook(s2_p, s["pool"]^2L)
term <- quote(t <- dfrac(m_A - m_B,
  sqrt(denote(s2_p, frac((n_A - 1L)*s2_A + (n_B - 1L)*s2_B, n_A + n_B - 2L),
"the pooled variance.") * (frac(1L, n_A) + frac(1L, n_B)))))
math(term)
```

$t={\displaystyle \frac{{\overline{X}}_{\text{A}}-{\overline{X}}_{\text{B}}}{\sqrt{{s}_{\text{pool}}^{2}\cdot \left(\frac{1}{{n}_{\text{A}}}+\frac{1}{{n}_{\text{B}}}\right)}}}$, with ${s}_{\text{pool}}^{2}=\frac{\left({n}_{\text{A}}-1\right)\cdot {s}_{\text{A}}^{2}+\left({n}_{\text{B}}-1\right)\cdot {s}_{\text{B}}^{2}}{{n}_{\text{A}}+{n}_{\text{B}}-2}$ denoting the pooled variance.

The term is evaluated below. `print()` is needed because
the return value of an assignment of the form
`t <- dfrac(...)` is not visible in R.

```
m_A <- 1.5; s2_A <- 2.4^2; n_A <- 27; m_B <- 3.9; s2_B <- 2.8^2; n_B <- 20
print(eval(term))
```

`## [1] -3.157427`
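The same number is obtained when the formula is typed out in base R, which serves as a plausibility check of the rendered equation. The sketch uses plain division instead of `frac()` and `dfrac()`; `t_stat` is a name chosen for illustration.

```r
m_A <- 1.5; s2_A <- 2.4^2; n_A <- 27
m_B <- 3.9; s2_B <- 2.8^2; n_B <- 20

# Pooled variance: weighted average of the two sample variances
s2_p <- ((n_A - 1) * s2_A + (n_B - 1) * s2_B) / (n_A + n_B - 2)

# t-statistic for independent samples with equal variance
t_stat <- (m_A - m_B) / sqrt(s2_p * (1 / n_A + 1 / n_B))
t_stat
```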

Consider an educational scenario in which we want to highlight a certain element of a term, for example, that a student has forgotten to subtract the null hypothesis in a \(t\)-ratio:

```
t <- quote(dfrac(omit_right(mean(D) - mu[0L]), s / sqrt(N)))
math(t, flags=list(error="highlight"))
```

$\frac{\overline{D}\phantom{\rule{thinmathspace}{0ex}}\overline{)-\phantom{\rule{thinmathspace}{0ex}}{\mu}_{0}}}{s/\sqrt{N}}$

`math(t, flags=list(error="fix"))`

$\frac{\overline{D}\phantom{\rule{thinmathspace}{0ex}}\overline{)-\phantom{\rule{thinmathspace}{0ex}}{\mu}_{0}}}{s/\sqrt{N}}$

The R function `omit_right(a + b)` uses non-standard
evaluation techniques (e.g., Wickham 2019)
to return only the left part of an operation, and cancels the right part.
This may not always be desired, for example, when illustrating how to
fix the mistake.

For this purpose, the functions `mathml()` and
`mathjax()` have an optional argument `flags`,
which is a list with named elements. In this example, we use this
argument to tell *mathml* how to render such erroneous
expressions using the flag `error`, which is one of `asis`,
`highlight`, `fix`, or `ignore`. For more examples, see Table 3.

Operation | error = asis | highlight | fix | ignore |
---|---|---|---|---|
omit_left(a + b) | $b$ | $\overline{)a\phantom{\rule{thinmathspace}{0ex}}+}\phantom{\rule{thinmathspace}{0ex}}b$ | $\overline{)a\phantom{\rule{thinmathspace}{0ex}}+}\phantom{\rule{thinmathspace}{0ex}}b$ | $a+b$ |
omit_right(a + b) | $a$ | $a\phantom{\rule{thinmathspace}{0ex}}\overline{)+\phantom{\rule{thinmathspace}{0ex}}b}$ | $a\phantom{\rule{thinmathspace}{0ex}}\overline{)+\phantom{\rule{thinmathspace}{0ex}}b}$ | $a+b$ |
list(quote(a), quote(omit(b))) | $a\phantom{\rule{thinmathspace}{0ex}}\text{}$ | $a\phantom{\rule{thinmathspace}{0ex}}\overline{)b}$ | $a\phantom{\rule{thinmathspace}{0ex}}\overline{)b}$ | $a\phantom{\rule{thinmathspace}{0ex}}b$ |
add_left(a + b) | $a+b$ | $\overline{)a\phantom{\rule{thinmathspace}{0ex}}+}\phantom{\rule{thinmathspace}{0ex}}b$ | $\overline{)a\phantom{\rule{thinmathspace}{0ex}}+}\phantom{\rule{thinmathspace}{0ex}}b$ | $b$ |
add_right(a + b) | $a+b$ | $a\phantom{\rule{thinmathspace}{0ex}}\overline{)+\phantom{\rule{thinmathspace}{0ex}}b}$ | $a\phantom{\rule{thinmathspace}{0ex}}\overline{)+\phantom{\rule{thinmathspace}{0ex}}b}$ | $a$ |
list(quote(a), quote(add(b))) | $a\phantom{\rule{thinmathspace}{0ex}}b$ | $a\phantom{\rule{thinmathspace}{0ex}}\overline{)b}$ | $a\phantom{\rule{thinmathspace}{0ex}}\overline{)b}$ | $a\phantom{\rule{thinmathspace}{0ex}}\text{}$ |
instead(a, b) + c | $a+c$ | $\underbrace{a}_{\text{instead}\phantom{\rule{thinmathspace}{0ex}}\text{of}\phantom{\rule{thinmathspace}{0ex}}b}+c$ | $\overline{)b}+c$ | $b+c$ |

Further customization requires the assertion of new Prolog rules
`math/2`, `ml/3`, `jax/3`, as shown in
the Appendix.

This package allows R to render its terms in pretty mathematical equations. It extends the current features of R and existing packages for displaying mathematical formulas in R (Murrell and Ihaka 2000), but most importantly, bridges the gap between computational needs, presentation of results, and their reproducibility. The package supports both MathML and LaTeX/MathJax for use in R Markdown documents, presentations and Shiny App webpages.

Researchers or teachers can already use R Markdown to conduct analyses and show results; *mathml* smooths this process and allows for integrated calculations and output. As shown in the case study of the previous section, *mathml* can help to improve data analyses and statistical reports from an aesthetic perspective, as well as regarding reproducibility of research.

Furthermore, the package may also allow for a better detection of possible mistakes in R programs. Similar to most programming languages (Green 1977), R code is notoriously hard to read, and the poor legibility of the language is one of the main sources of mistakes. For illustration, we consider again Equation 10 in Schwarz (1994).

```
hook(mu_A, mu["A"])
hook(mu_B, mu["B"])
hook(sigma_A, sigma["A"])
hook(sigma_B, sigma["B"])
f1 <- function(tau)
{ dfrac(c, mu_A) + (dfrac(1L, mu_A) - dfrac(1L, mu_A + mu_B) *
    ((mu_A*tau - c) * pnorm(dfrac(c - mu_A*tau, sqrt(sigma_A^2L*tau)))
    - (mu_A*tau + c) * exp(dfrac(2L*mu_A*tau, sigma_A^2L))
      * pnorm(dfrac(-c - mu_A*tau, sqrt(sigma_A^2L*tau)))))
}
math(f1)
```

${\displaystyle \frac{c}{{\mu}_{\text{A}}}}+\left\{{\displaystyle \frac{1}{{\mu}_{\text{A}}}}-{\displaystyle \frac{1}{{\mu}_{\text{A}}+{\mu}_{\text{B}}}}\cdot \left[\left({\mu}_{\text{A}}\tau -c\right)\cdot \Phi \left({\displaystyle \frac{c-{\mu}_{\text{A}}\tau}{\sqrt{{\sigma}_{\text{A}}^{2}\tau}}}\right)-\left({\mu}_{\text{A}}\tau +c\right)\cdot \mathrm{exp}\left({\displaystyle \frac{2{\mu}_{\text{A}}\tau}{{\sigma}_{\text{A}}^{2}}}\right)\cdot \Phi \left({\displaystyle \frac{-c-{\mu}_{\text{A}}\tau}{\sqrt{{\sigma}_{\text{A}}^{2}\tau}}}\right)\right]\right\}$

The first version has a wrong parenthesis, which is barely visible in the code, whereas in the mathematical representation, the wrong curly brace is immediately obvious (the correct version is shown below for comparison).

```
f2 <- function(tau)
{ dfrac(c, mu_A) + (dfrac(1L, mu_A) - dfrac(1L, mu_A + mu_B)) *
    ((mu_A*tau - c) * pnorm(dfrac(c - mu_A*tau, sqrt(sigma_A^2L*tau)))
    - (mu_A*tau + c) * exp(dfrac(2L*mu_A*tau, sigma_A^2L))
      * pnorm(dfrac(-c - mu_A*tau, sqrt(sigma_A^2L*tau))))
}
math(f2)
```

${\displaystyle \frac{c}{{\mu}_{\text{A}}}}+\left({\displaystyle \frac{1}{{\mu}_{\text{A}}}}-{\displaystyle \frac{1}{{\mu}_{\text{A}}+{\mu}_{\text{B}}}}\right)\cdot \left[\left({\mu}_{\text{A}}\tau -c\right)\cdot \Phi \left({\displaystyle \frac{c-{\mu}_{\text{A}}\tau}{\sqrt{{\sigma}_{\text{A}}^{2}\tau}}}\right)-\left({\mu}_{\text{A}}\tau +c\right)\cdot \mathrm{exp}\left({\displaystyle \frac{2{\mu}_{\text{A}}\tau}{{\sigma}_{\text{A}}^{2}}}\right)\cdot \Phi \left({\displaystyle \frac{-c-{\mu}_{\text{A}}\tau}{\sqrt{{\sigma}_{\text{A}}^{2}\tau}}}\right)\right]$

As the reader may know from their own experience, missing parentheses are a frequent cause of wrong results and of errors that are hard to locate in programming code. This particular example shows that mathematical rendering can help to substantially reduce the number of careless errors in programming.

One limitation of the package is the lack of a convenient way to
insert line breaks. This is mostly due to lacking support by MathML and
LaTeX renderers. For example, in its current stage, the LaTeX package
`breqn` (Robertson et al. 2021)
is mostly a proof of concept. Moreover, *mathml* is a one-way road, that is, it is
not possible to translate from LaTeX or HTML back to R (see Capretto 2023, for an example).

The package is available for R version 4.2 and later, and can be
easily installed using the usual
`install.packages("mathml")`. At its present stage, it
supports output in HTML, LaTeX, and Microsoft Word (via pandoc, MacFarlane 2022). The source code
of the package is found at https://github.com/mgondan/mathml.

For convenience, the translation of R expressions to mathematical
output is achieved through a Prolog interpreter from R package
*rolog* (Gondan 2022). Prolog is a
classical logic programming language with many applications in expert
systems, computer linguistics, and symbolic artificial intelligence. The
main strength of Prolog is its concise representation of facts and rules
for knowledge and grammar, as well as its efficient built-in search
engine for closed world domains. R, as is well known, is a statistical
programming language for data analysis and statistical modeling. Whereas
Prolog is weak in statistical computation, but strong in symbolic
manipulation, the converse may be said for the R language. The
*rolog* package bridges this gap by providing an interface to a
SWI-Prolog distribution (Wielemaker et al.
2012) in R. The communication between the two systems is mainly
in the form of queries from R to Prolog, but two predicates allow
Prolog to call back and evaluate terms in R.

The proper term for a Prolog “function” is predicate, and it is
typically written with name and arity (i.e., number of arguments),
separated by a forward slash. Thus, at the Prolog end, a predicate
`math/2` translates the call `pbinom(K, N, Pi)`
into a “function” `fn/2` with the name `P_Bi`, one
argument `X =< K`, and the two parameters `N`
and `Pi`.

```
math(pbinom(K, N, Pi), M)
 => M = fn(subscript('P', "Bi"), (['X' =< K] ; [N, Pi])).
```

`math/2` operates like a “macro” that translates one
mathematical element (here, `pbinom(K, N, Pi)`) to a
different mathematical element, namely
`fn(Name, (Args ; Pars))`. The low-level predicate
`ml/3` is used to convert these basic elements to MathML.

```
ml(Flags, fn(Name, (Args ; Pars)), M)
 => ml(Flags, Name, N),
    ml(Flags, paren(list(op(;), [list(op(','), Args), list(op(','), Pars)])), X),
    M = mrow([N, mo(&(af)), X]).
```

The relevant rule for `ml/3` builds the MathML entity
`mrow([N, mo(&(af)), X])`, with `N`
representing the name of the function and `X` its arguments
and parameters, enclosed in parentheses. A corresponding rule
`jax/3` does the same for MathJax/LaTeX. A list of flags can
be used for context-sensitive translation (see, e.g., the section on
errors above).

Several ways exist for translating new R terms to their mathematical representation. We have already seen above how to use “hooks” to translate long variable names from R to compact mathematical signs, as well as functions such as cumulative probabilities \(P(X \le k)\) to different representations like \(\sum_{i=0}^k P(X = i)\). Obviously, the hooks require that there already exists a rule to translate the target representation into MathML and MathJax.

In this appendix we describe a few more ways to extend the set of
translations according to a user’s needs. As stated in the background
section, the Prolog end provides two classes of rules for translation,
macros `math/2,3,4` mirroring the R hooks mentioned above,
and the low-level predicates `ml/3` and `jax/3`
that create proper MathML and LaTeX terms.

To render the model equation of a linear model such as
`lm(EOT ~ T0 + Therapy, data=d)` in mathematical form, it is
sufficient to map the `Formula` in
`lm(Formula, Data)` to its respective equation (see also Anderson, Heiss, and Sumners 2023).
This can be done in two ways, using either the hooks described above, or a new
`math/2` macro at the Prolog end.

`hook(lm(.Formula, .Data), .Formula)`

The hook is simple, but is a bit limited because only R’s tilde-form of linear models is shown, and it only works for a call with exactly two arguments.

Below is an example of how to build a linear equation of the form \(Y = b_0 + b_1X_1 + \ldots\) using the Prolog
macros from *mathml*.

```
math_hook(LM, M) :-
    compound(LM),
    LM =.. [lm, ~(Y, Sum) | _Tail],
    summands(Sum, Predictors),
    findall(subscript(b, X) * X, member(X, Predictors), Terms),
    summands(Model, Terms),
    M = (Y == subscript(b, 0) + Model + epsilon).
```

The predicate `summands/2` unpacks an expression
`A + B + C` to a list `[C, B, A]` and vice versa
(see the file `lm.pl` for details).
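For readers less familiar with Prolog, the logic of the macro can be mimicked in base R. The following is a sketch of the same idea, not a translation of `lm.pl`; the R function `summands()` and the names `model`, `slopes`, `parts`, and `equation` are assumptions made for this illustration.

```r
# Unpack A + B + C into list(C, B, A), the order described in the text
summands <- function(e) {
  if (is.call(e) && length(e) == 3L && identical(e[[1]], as.name("+")))
    c(list(e[[3]]), summands(e[[2]]))
  else
    list(e)
}

model <- quote(EOT ~ T0 + Therapy)
predictors <- rev(summands(model[[3]]))   # T0, Therapy

# One term b[X] * X per predictor
slopes <- lapply(predictors, function(x) bquote(b[.(x)] * .(x)))

# Chain intercept, slope terms, and error term with "+"
parts <- c(list(quote(b[0])), slopes, list(quote(epsilon)))
rhs <- Reduce(function(l, r) call("+", l, r), parts)
equation <- call("==", model[[2]], rhs)

deparse(equation)
```

The resulting call has the same shape as the term `M` built by the Prolog macro and could, in principle, be passed to a renderer like any other quoted expression.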

```
rolog::consult(system.file(file.path("pl", "lm.pl"), package="mathml"))

term <- quote(lm(EOT ~ T0 + Therapy, data=d, na.action=na.fail))
math(term)
```

$\mathrm{EOT}={b}_{0}+{b}_{\mathrm{T0}}\mathrm{T0}+{b}_{\mathrm{Therapy}}\mathrm{Therapy}+\epsilon $

Base R does not provide a function like `cuberoot(x)` or
`nthroot(x, n)`, and the present package does not support the
respective representation. To obtain a cube root, a programmer would
typically type `x^(1/3)` or better `x^{1/3}` (see
the practice section for why the curly brace is preferred in an exponent),
resulting in \(x^{1/3}\), which may
still not match everyone’s taste. Here we describe the steps needed to
represent the \(n\)-th root as \(\sqrt[n]x\).

We assume that `nthroot(x, n)` is available in the current
namespace (manually defined, or from R package
*pracma*, Borchers 2022), so that the names of the
arguments and their order are accessible to `canonical()` if
needed. As we can see below, *mathml* uses a default
representation `name(arguments)` for such unknown
functions.

```
nthroot <- function(x, n)
  x^{1L/n}
term <- canonical(quote(nthroot(n=3L, 2L)))
math(term)
```

$\mathrm{nthroot}\left(2,3\right)$
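
The role of the argument names and their order can be illustrated in base R with `match.call()`. This is only an analogy: how `canonical()` is implemented is not shown here, and the assumption that it performs a comparable normalization rests solely on the output above, where the named argument `n=3L` ends up in second position.

```r
# Same definition as in the text: n-th root via a fractional exponent
nthroot <- function(x, n)
  x^{1L/n}

# match.call() resolves named and positional arguments against the formals
# of nthroot and returns them by name, in the order of the definition.
call_in <- quote(nthroot(n=3L, 2L))
call_std <- match.call(nthroot, call_in)
print(call_std)

# The reordered call evaluates to the same number as the original call
stopifnot(isTRUE(all.equal(eval(call_in), eval(call_std))))
```

Such a normalization is only possible if the function definition can be found, which is why `nthroot` is assumed to be available in the current namespace.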

A proper MathML term is obtained by `mlx/3` (the x in mlx
indicates that it is an extension and is prioritized over the default
`ml/3` rules). `mlx/3` recursively invokes `ml/3` for translating the
function arguments *X* and *N*, and then constructs the correct MathML
element `<mroot>...</mroot>`.

```
mlx(nthroot(X, N), M, Flags) :-
    ml(X, X1, Flags),
    ml(N, N1, Flags),
    M = mroot([X1, N1]).
```

The explicit unification `M = ...` in the last line serves
to avoid clutter in the head of `mlx/3`. The Prolog file
`nthroot.pl` also includes the respective rule for LaTeX and
can be consulted from the package folder via the underlying package
*rolog*.

```
rolog::consult(system.file(file.path("pl", "nthroot.pl"), package="mathml"))
term <- quote(nthroot(a * (b + c), 3L)^2L)
math(term)
```

${\left[\sqrt[3]{a\cdot \left(b+c\right)}\right]}^{2}$
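
The recursive scheme of the rule (translate the arguments first, then wrap the results in the enclosing element) can be mimicked in a few lines of base R. The function `to_mathml()` below is a hypothetical helper for illustration only; it is not part of *mathml*, covers just symbols, numbers and `nthroot`, and ignores parentheses and flags entirely.

```r
# Hypothetical helper (not part of mathml): translate a tiny subset of R
# expressions to MathML strings by recursion over the call tree.
to_mathml <- function(expr) {
  if (is.name(expr))                        # symbols become identifiers
    return(sprintf("<mi>%s</mi>", as.character(expr)))
  if (is.numeric(expr))                     # numbers become <mn> elements
    return(sprintf("<mn>%s</mn>", format(expr)))
  if (is.call(expr) && identical(expr[[1]], as.name("nthroot")))
    # translate both arguments recursively, then wrap them:
    # first child is the base, second child is the index
    return(sprintf("<mroot>%s%s</mroot>",
                   to_mathml(expr[[2]]), to_mathml(expr[[3]])))
  stop("expression not covered by this sketch")
}

root <- to_mathml(quote(nthroot(x, 3L)))
print(root)
```

Because the arguments are translated by the same function, nested roots such as `nthroot(nthroot(x, 2L), 3L)` are handled without further rules.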

```
term <- quote(a^(1L/3L) + a^{1L/3L} + a^(1.0/3L))
math(term)
```

$\sqrt[3]{a}+{a}^{1/3}+{a}^{\left(1.00/3\right)}$

The file `nthroot.pl` includes three more statements: `precx/3`,
`parenx/3`, and a `math_hook/2` macro. The first sets the operator
precedence of the root above that of the power, thereby putting
parentheses around nthroot in \((\sqrt[3]{\ldots})^2\).
The second tells the system to increase the counter of the parentheses
below the root, such that the outer parenthesis becomes a square
bracket.

The last rule maps powers like `a^(1L/3L)` to `nthroot/2`, as shown
in the first summand. Of course, *mathml* is not a proper computer
algebra system. As is illustrated by the other terms in the sum, such
macros are limited to purely syntactical matching, and terms like
`a^{1L/3L}` with the curly brace or `a^(1.0/3L)` with a floating point
number in the numerator are not detected.
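
What purely syntactical matching means can be made concrete in base R by inspecting the call tree directly. The predicate `is_root_pattern()` below is a hypothetical illustration, not the rule from `nthroot.pl`; it assumes that the pattern requires round parentheses and an integer `1L` in the numerator, which is one plausible reading of the behaviour shown above.

```r
# Hypothetical predicate (not part of mathml): TRUE only for calls of the
# exact shape base^(1L/n), i.e. round parentheses and an integer 1.
is_root_pattern <- function(expr) {
  if (!is.call(expr) || !identical(expr[[1]], as.name("^")))
    return(FALSE)
  expo <- expr[[3]]                        # the exponent
  if (!is.call(expo) || !identical(expo[[1]], as.name("(")))
    return(FALSE)                          # a^{...} fails here
  frac <- expo[[2]]                        # content of the parentheses
  is.call(frac) && identical(frac[[1]], as.name("/")) &&
    identical(frac[[2]], 1L)               # the double 1.0 is not 1L
}

is_root_pattern(quote(a^(1L/3L)))   # round parentheses, integer numerator
is_root_pattern(quote(a^{1L/3L}))   # curly brace instead of parenthesis
is_root_pattern(quote(a^(1.0/3L)))  # floating point numerator
```

All three expressions evaluate to the same number, but only the first has the tree structure that the pattern describes. Detecting the other two would require either additional patterns or an algebraic simplification step.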

Supported by the Erasmus+ program of the European Commission (2019-1-EE01-KA203-051708).

Allaire, JJ, Yihui Xie, Christophe Dervieux, Jonathan McPherson, Javier
Luraschi, Kevin Ushey, Aron Atkins, et al. 2023. *Rmarkdown: Dynamic
Documents for R*. https://github.com/rstudio/rmarkdown.

Anderson, Daniel, Andrew Heiss, and Jay Sumners. 2023. *Equatiomatic:
Transform Models into ’LaTeX’ Equations*.

Borchers, Hans W. 2022. *Pracma: Practical Numerical Math
Functions*. https://CRAN.R-project.org/package=pracma.

Capretto, Tomas. 2023. *Latex2r: Translate Latex Formulas to R
Code*. https://github.com/tomicapretto/latex2r.

Chang, Winston, Joe Cheng, JJ Allaire, Carson Sievert, Barret Schloerke,
Yihui Xie, Jeff Allen, Jonathan McPherson, Alan Dipert, and Barbara
Borges. 2022. *Shiny: Web Application Framework for R*. https://CRAN.R-project.org/package=shiny.

Gondan, Matthias. 2022. *Rolog: Query ’SWI’-’Prolog’ from R*. https://github.com/mgondan/rolog.

Green, T. R. G. 1977. “Conditional Program Statements and Their
Comprehensibility to Professional Programmers.” *Journal of
Occupational Psychology* 50: 93–109.

MacFarlane, John. 2022. *Pandoc: A Universal Document Converter*.

Murrell, Paul, and Ross Ihaka. 2000. “An Approach to Providing
Mathematical Annotation in Plots.” *Journal of Computational
and Graphical Statistics* 9: 582–99.

R Core Team. 2022. *R: A Language and Environment for Statistical
Computing*. Vienna, Austria: R Foundation for Statistical Computing.
https://www.R-project.org/.

Robertson, Will, Joseph Wright, Frank Mittelbach, and Ulrike Fischer.
2021. *Breqn: Automatic Line Breaking of Displayed Equations*. https://www.ctan.org/pkg/breqn.

Sarkar, Deepayan, and Kurt Hornik. 2022. *Enhancements to
HTML Documentation*. https://blog.r-project.org/2022/04/08/enhancements-to-html-documentation/index.html.

Schwarz, Wolf. 1994. “Diffusion, Superposition, and the
Redundant-Targets Effect.” *Journal of Mathematical
Psychology* 38: 504–20.

Viechtbauer, Wolfgang. 2022. *Mathjaxr: Using ’Mathjax’ in Rd
Files*. https://CRAN.R-project.org/package=mathjaxr.

Wickham, H. 2019. *Advanced R*. Cambridge: Chapman
and Hall/CRC.

Wielemaker, Jan, Tom Schrijvers, Markus Triska, and Torbjörn Lager.
2012. “SWI-Prolog.” *Theory and Practice of
Logic Programming* 12 (1-2): 67–96.

Xie, Y., C. Dervieux, and E. Riederer. 2020. *R Markdown
Cookbook*. Cambridge: Chapman and Hall/CRC.

Xie, Yihui. 2023. *Knitr: A General-Purpose Package for Dynamic
Report Generation in R*. https://yihui.org/knitr/.

Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. *R Markdown:
The Definitive Guide*. Boca Raton, Florida: Chapman and Hall/CRC. https://bookdown.org/yihui/rmarkdown.