Matthias Gondan
Universität Innsbruck
Department of Psychology
Innrain 9
A-6020 Innsbruck
Matthias.Gondan-Rochon@uibk.ac.at
Prolog is a classical logic programming language with many
applications in expert systems, computer linguistics and traditional,
that is, symbolic artificial intelligence. The main strength of Prolog
is its concise representation of facts and rules for the representation
of knowledge and grammar, as well as its efficient built-in search
engine for closed world domains. R is a statistical programming language
for data analysis and statistical modeling which is widely used in
academia and industry. Besides the core library, a lot of packages have
been developed for all kinds of statistical problems, including
statistics-based artificial intelligence tools such as neural networks
for machine learning and deep learning. Whereas Prolog is weak in
statistical computation, but strong in symbolic manipulation, the
converse may be said for the R language. SWI-Prolog is a widely used
Prolog system that offers a wide range of extensions for real world
applications, and there already exist two Prolog “packs” to invoke R
(rserve-client
, real
) from SWI-Prolog. Given
the large user community of R, there may also be a need for a connection
in the reverse direction that allows invoking Prolog queries in R
computations. The R package rolog
connects to the
SWI-Prolog system, thus enabling deterministic and non-deterministic
queries to the Prolog interpreter. Usage of rolog
is
illustrated by a few examples.
Statistics; Logic Programming; Artificial Intelligence; R; Prolog
The R (R Core Team 2021) programming language and environment is a widely used open source software for statistical data analysis. The basic R is a functional language with lots of support for storage and manipulation of different data types, and a strong emphasis on operations involving vectors and arrays. Moreover, a huge number of packages (e.g., CRAN, https://cran.r-project.org/) have been contributed that cover problems from areas as diverse as bioinformatics, machine learning, specialized statistical methods, web programming and connections to other programming languages.
An interface to Prolog is lacking so far. Based on earlier work by Kowalski, the logic programming language Prolog was invented in the 1970ies by Colmerauer and Roussel (Kowalski 1988), mostly for the purpose of natural language processing. Since then, logic programming has become an important driving force in research on artificial intelligence, natural language processing, program analysis, knowledge representation and theorem proving (Shoham 1994; Lally and Fodor 2011; Carro 2004; Hsiang and Srivas 1987). SWI-Prolog (Wielemaker et al. 2012) is an open-source implementation of Prolog that mainly targets developers of applications, with many users in academia, research and industry. SWI-Prolog includes a large number of libraries for “the real world”, for example, a web server, encryption, interfaces to C/C++ and other programming languages, as well as a development environment and debugger. In addition, pluggable extensions (so-called packs) are available for specific tasks to enhance its capabilities.
Unlike R, Prolog is a declarative programming language consisting of
facts and rules that define relations, for example, in a problem space
(Newell and Simon 1972). Prolog’s major
strength is its built-in query-driven search engine that efficiently
deals with complex structured data, with the data not necessarily being
numerical. In fact, Prolog only provides a basic collection of
arithmetic calculations via a purely functional interface
(is/2
). More complex calculations such as matrix algebra,
statistical models or machine learning need help from other systems, for
example, from R.
Angelopoulos et al. (2013) summarize work at the intersection of symbolic knowledge representation and statistical inference, especially in the area of model fits (EM algorithms, MCMC, Sato and Kameya 2013; Angelopoulos and Cussens 2008) and stochastic logic programs (Cussens 2000; Kimmig et al. 2011). One of the major strengths of logic programming is handling constraints; and a number of systems for constraint satisfaction tools have been developed (constraint logic programming on booleans, finite domains, reals, and intervals) for that purpose (e.g., Frühwirth 1998; Triska 2018). Some constraint handlers exist in R (see the CRAN task view for optimization problems), but more of them would be available via a bridge between R and Prolog.
Earlier approaches to connect Prolog and R have been published as
SWI-Prolog packs (real, rserve_client,
Angelopoulos et al. 2013; Wielemaker 2021b) and as a YAP module
(YapR, Azevedo 2011). Whereas
real
establishes a direct link to an embedded instance of
R, rserve-client
communicates with a local or remote R
service (Urbanek 2021). The former
approach emphasizes speed, the latter might be preferred from a security
perspective, especially in systems such as SWISH (Wielemaker, Lager, and Riguzzi 2015) that
accept only a set of sandboxed commands for Prolog, but do not impose
restrictions on R. A common feature of the two packages is that they
provide an interface for R calls from Prolog, but not the other way
round, that is, querying Prolog from R is not possible, so far.
The present package fills this gap through Prolog queries in R
scripts, for example, to perform efficient symbolic computations,
searches in complex graphs, parsing natural language and definite clause
grammars. In addition, two Prolog predicates are provided that enable
Prolog to ring back to the R system for bidirectional communication.
Similar to real
, tight communication between the two
systems is established by linking to a shared library that embeds the
current SWI-Prolog runtime. The exchange of data is facilitated by the
C++ interfaces of the two languages (Eddelbuettel
and Balamuta 2018; Wielemaker 2021a). A less tight connection
might be established using the recently developed machine query
interface (Zinda 2021) that allows
socket-based communication between foreign languages and SWI-Prolog
(and, in fact, the MQI
documentation includes an example in
which R is called).
A bidirectional bridge between R and Prolog might overcome the
limitations of both languages, thereby combining the extensive numerical
and statistical power of the R system with Prolog’s skills in the
representation of knowledge and reasoning. In addition to the useful
little tools shown in the examples below, rolog
can
therefore contribute to progress at the intersection of traditional
artificial intelligence and contemporary statistical programming.
The next section presents the interface of rolog
in
detail. Section 3 presents possible extensions of the package at both
ends, in R and Prolog. Section 4 is a list of illustrative examples that
offer useful extensions to the R system. Conclusions and further
perspectives are summarized in Section 5.
rolog
has a rather minimalistic syntax, providing only
the basic ingredients to establish communication with the SWI-Prolog
runtime. Ways to extend the interface are described in Section 3.
After installation with install.packages("rolog")
, the
package is loaded in the standard way.
library(rolog)
#> Found SWI-Prolog at /usr/local/lib/swipl
#> Welcome to SWI-Prolog (threaded, 64 bits, version 9.3.7-48-g5748256ce-DIRTY)
We can see a short message telling the user which SWI-Prolog was
found. The package searches for SWI-Prolog based on the environment
variable SWI_HOME_DIR
, the registry (Windows only), an
executable swipl
in the PATH
, and if
everything fails, R package rswipl
(Gondan 2023). The message can be silenced by
the usual option quietly=TRUE
of the library
command.
Most of the work is done using the three R functions
query
, submit
, and clear
. The R
program in Listing 1 illustrates a query to Prolog’s
member/2
using rolog
’s syntax rules.
# member(1, [1, 2.0, a, "b", X, true])
query(call("member", 1L, list(1L, 2.0, quote(a), "b", expression(X), TRUE)))
#> [1] TRUE
#> attr(,"query")
#> [1] "member(1, [1, 2.0, a, b, X, true])"
# returns an empty list, stating that member(1, [1 | _]) is satisfied
submit()
#> list()
# returns a list with constraints, stating that the query is also satisfied
# if the fifth element of the list, X, is 1
submit()
#> $X
#> [1] 1
# close the query
clear()
member/2
predicate.
query
. The function query(call, options)
is
used to create a Prolog query (without invoking it yet). The first
argument is a regular R call that is created using R’s function
call(name, ...)
. This call represents the Prolog query that
will be submitted in the later course. The creation of such predicates
and Prolog terms is described below and can become quite contrived (see
the examples in Section 4). The second argument, options
,
may be used for ad hoc modifications of the translation between R and
Prolog, see the section below. The function returns TRUE
on
success. Note that query
does not check if a Prolog
predicate corresponding to call
actually exists (see
submit()
below). Only a single query can be opened at a
given time. If a new query Q is created while another query
R is still open, a warning is shown and R is
closed.
submit
. Once a query has been created, it can be
submitted using submit()
. If the query fails, the return
value is FALSE
. If the query succeeds, a list of
constraints is returned, with bindings for the variables that satisfy
the query. Repeated calls to submit are possible, returning the
different solutions of a query (until it eventually fails). The
distinction between the different types of return values for success and
failure (list vs. FALSE
) is facilitated by the R function
isFALSE(x)
.
clear
. Closes the query. The name of the function was
chosen to avoid name clashes with R’s own built-in function
close
. The function returns an invisible TRUE
,
even if there is no open query.
Three more functions consult
, once
, and
findall
are provided for convenience.
consult
. In most applications, a number of Prolog facts
and rules will be loaded into the system. To facilitate this recurrent
task, the Prolog directive consult/1
has been mirrored into
R, consult(filename)
, with filename
being a
string or a vector of strings if multiple files are to be consulted. The
function returns TRUE
on success; in case of problems, it
returns FALSE
and an error message is shown.
once
and findall
. The function
once(call, options)
is a convenience function that acts as
a shortcut for query(call, options)
, submit()
,
and clear()
. Similarly, findall(call, options)
abbreviates the commands query(call, options)
, repetition
of submit()
until failure, and clear()
,
returning a list collecting the return values of the individual
submissions.
Table 1 summarizes the rules for the translation from R objects to Prolog. Most rules work in both directions, but a few exceptions exist.
R | Prolog | Note/Alternatives |
---|---|---|
expression(X) |
Variable X | not necessarily uppercase |
as.symbol(abc) |
Atom abc | as.name , quote |
TRUE , FALSE ,
NULL |
Atoms true, false, null | |
"abc" |
String "abc" | |
3L |
Integer 3 | |
3 |
Float 3.0 | |
call("term", 1L, 2L) |
term(1, 2) | |
list(1L, 2L, 3L) |
List [1, 2, 3] | |
list(a=1, b=2, c=3) |
List [a-1, b-2, c-3] | |
c(1, 2, 3, Inf) |
##(1.0, 2.0, 3.0, 1.0Inf) | vectors of length > 1 |
c(1L, 2L, 3L) or 1:3 |
'%%'(1, 2, 3) | |
c("a", "b", "c") |
$$("a", "b", "c") | |
c(TRUE, FALSE, NA) |
!!(true, false, na) | |
sin |
function(x) :- sin(x) | primitive function |
function(x) sin(x) |
function(x) :- sin(x) | self-written function |
matrix(1:4, nrow=2) |
'%%%'('%%'(1, 3), …) | see also ###, $$$, !!! |
In R, the basic elements such as integers, floating point numbers, character strings, and logicals are vectorized, and scalar entities are treated like vectors with one element. Conversely, Prolog does not natively support vectors or matrices. The problem is solved in the following way:
##/N
, %%/N
,
$$/N
, and !!/N
for floating point numbers,
integers, strings and logicals, respectively.###/R
,
%%%/R
, $$$/R
, and !!!/R
with the
respective row vectors as arguments.In the reverse direction, Prolog terms like ##/N
are
translated back to R vectors of length N, including the terms
##/0
and ##/1
that map to R vectors of length
0 and 1, respectively. Translation of a polymorphic Prolog term such as
##(a, 1.5)
to R will fail, since rolog
expects
the arguments to be numeric.
If a Prolog object cannot be translated to R (e.g., a cyclic term),
an error is raised. If an R object that lacks a suitable representation
in Prolog (e.g., S4 class), a warning is printed and the result is
unified with na
.
To summarize, the rules for translation are not fully symmetrical. A
quick check for symmetry of the representation is obtained by a query to
=/2
or even r_eval/2
(see also below,
subsection Prolog interface):
<- call("=", expression(X), c(1, 2, NA, NaN, Inf))
Q once(Q, options=list(portray=TRUE))
#> $X
#> [1] 1 2 NA NaN Inf
#>
#> attr(,"query")
#> [1] "X= ##(1.0, 2.0, na, 1.5NaN, 1.0Inf)"
<- call("r_eval", c(1, 2, NA, NaN, Inf), expression(X))
Q once(Q)
#> $X
#> [1] 1 2 NA NaN Inf
The optional argument env
to query, once and findall
allows to raise the query (and, as a consequence, r_eval/1,2 in a
specific environment.
A few package-specific options have been defined to allow some fine-tuning of the rules for translation between R and Prolog.
##
)###
)%%
/%%%
)!!
/!!!
)$$
/$$$
). The single dollar cannot be used
because it is the list operator in R.TRUE
(default), R vectors
of length 1 are translated to scalars in Prolog. If FALSE
(rarely used), R vectors are always translated to ##/N
, or
%%/N
, !!/N
, $$/N
, even if they
have only one element.TRUE
(default in
query
), the result of query
, once
and findall
includes an attribute with a text
representation of the query in Prolog.rolog
’s own preproc
function that maps R’s
x <= y
to Prolog’s x =< y
and
!=
to \=
. Preprocessing can be turned off by
assigning the R function dontCheck
to the preproc
option.rolog
’s own postproc
function that reverses
the mapping from preproc
.The command rolog_options()
returns a list with all the
options. The options can be globally modified with
options()
or in the optional argument of
query
, once
, and findall
.
options(rolog.intvec="iv")
<- call("member", expression(X), list(c(1L, 2L), c(3.5, 4.5)))
Q query(Q, options=list(realvec="rv"))
#> [1] TRUE
#> attr(,"query")
#> [1] "member(X, [iv(1, 2), rv(3.5, 4.5)])"
submit()
#> $X
#> [1] 1 2
clear()
rolog
offers some basic support to call R from Prolog,
that is, connecting the two systems in the reverse direction. Two
predicates can be used for this purpose, r_eval(Call)
and
r_eval(Function, Result)
. The former just invokes R with
the command Call
(ignoring the result); the latter
evaluates Function
and unifies the result with
Result
. Note that proper quoting of R functions is needed
at the Prolog end, especially with R functions that start with uppercase
letters and/or contain a dot in their name (see Section 4).
Package rolog
has limited support for exception
handling. If Prolog raises an exception, the error string is forwarded
to R using the stop
function. The examples below illustrate
this by querying an undefined Prolog predicate.
<- call("membr", expression(X), list(1, 2, 3))
Q query(Q)
#> [1] TRUE
#> attr(,"query")
#> [1] "membr(X, [1.0, 2.0, 3.0])"
try(submit())
#> Warning in .submit():
#> error(existence_error(procedure,membr/2),context(system:call/1,_))
#> [1] FALSE
clear()
See Section 4 for another example with an error resulting from a
malformed query to r_eval/2
.
R is a functional language, whereas Prolog is declarative. Obviously, there cannot be a perfect one-to-one correspondence between the syntactic components of two programming languages that follow completely different paradigms. Whereas symbols, functions, numbers and character strings are easily mapped between R and Prolog, there are loose ends at both sides. The package is intentionally kept minimalistic, but can easily be extended by convenience functions at both ends, Prolog and R, to facilitate recurrent tasks and/or avoid cumbersome syntax.
In particular, Prolog variables are translated from and to R
expressions (not to be confused with R symbols), and R vectors
of length greater than 1 are translated to the Prolog terms
#/N
, %/N
, !/N
, and
$$/N
, as mentioned above. These rules are, in principle,
arbitrary and can be intercepted at several stages.
as.rolog
)The process is illustrated in Figure 1.
rolog
uses a default preprocessing function
preproc(query)
to map the R operators <=
and !=
to their Prolog counterparts =</2
and \=/2
, respectively.
However, we have seen above that raising even simple everyday Prolog
queries such as member(X, [1, 2, 3, a, b])
require
complicated R expressions like
call("member", expression(X), list(1, 2, 3, quote(a), quote(b)))
.
The R function as.rolog(query)
is meant to simplify this a
bit by translating symbols starting with a dot to Prolog variables, and
calls like ""[1, 2, 3, a, b]
to lists. In the example
below, as.rolog
is added to the queue of preprocessing
functions.
<- 5
a <- quote(member(.X, ""[1, 2, 3, a, (a), 1 <= 2]))
Q once(Q, options=list(preproc=list(as.rolog, preproc), portray=TRUE))
#> $X
#> [1] 1
#>
#> attr(,"query")
#> [1] "member(X, [1.0, 2.0, 3.0, a, 5.0, 1.0=<2.0])"
Note that the name of the variable will still be X
in
the later course, not “dot-X”. As illustrated by the example above,
as.rolog
treats the argument a
as a symbol; to
evaluate the respective variable (i.e., “unquote”), it can be put in
parentheses.
Preprocessing can be turned off by setting the option
preproc
to the identity function
dontCheck
.
Section 3 includes an example for mathematical rendering of R expressions. In that example, a preprocessing function is used to bring function calls with named arguments to a canonical form which is then handled in Prolog. More sophisticated work with quasi-quotations and unquoting expressions is described in “Advanced R” (Wickham 2019).
In most cases, postprocessing will revert the manipulations during
preprocessing, and the default function postproc(query)
actually translates the Prolog operators =<
and
\=
back to their respective counterparts in R.
Many Prolog programmers are used to operate with atoms, whereas
character strings are the preferred representation of symbolic
information in R. In the example below, a second hook is put in the
queue that converts the result of a query like
member(X, [a, b, c])
to strings.
<- function(x)
stringify
{if(is.symbol(x))
return(as.character(x))
if(is.call(x))
-1] <- lapply(x[-1], FUN=stringify)
x[
if(is.list(x))
<- lapply(x, FUN=stringify)
x
if(is.function(x))
body(x) <- stringify(body(x))
return(x)
}
<- quote(member(.X, ""[a, b, c]))
Q <- findall(Q, options=list(preproc=list(as.rolog, preproc),
R postproc=list(stringify, postproc)))
unlist(R)
#> X X X
#> "a" "b" "c"
In other words, the query is satisfied if X
is either
“a”, or “b”, or “c”.
Recent versions of SWI-Prolog support so-called dictionaries of the
form Tag{Key1:Value1, Key2:Value2, ...}
. The tag is
typically an atom (but can be a variable, as well), the keys are unique
atom or integers; the values can be anything. Suppose we have a Prolog
predicate that does something with dicts, and we would like to query it
from R. The simplest solution is a wrapper in Prolog that translates
key-value pairs
[Key1-Value1, Key2-Value2, ...]
back and forth to
dicts:
Pairs0, Pairs1) :-
do_something_with_pairs(Dict0, my_dict, Pairs0),
dict_pairs(Dict0, Dict1),
do_something_with_dicts(Dict1, my_dict, Pairs1). dict_pairs(
do_something_with_pairs/2
can then be queried from R
using, for example, lists with named elements (see Table 1).
once(call("do_something_with_pairs", list(a=1, b=2), expression(X)))
In the code above, dict_pairs/2
takes the role of both
preproc/2
and postproc/2
in Figure 1. It
illustrates that complicated syntax on the R side can be much simplified
when doing the conversion at the Prolog end. Ways to extend Prolog by
add-ons (“packs”) are shown in the next section.
In this section we present a few usage examples for package
rolog
in increasing complexity. Although the code snippets
are mostly self-explanatory, some familiarity with the Prolog language
is helpful.
Prolog’s typical hello world example is a search through a directed acyclic graph (DAG), for example, a family tree like the one given in Listing 2.
, bob). parent(bob, ann). parent(bob, pat). parent(pat, jim).
parent(pam
X, Z) :-
ancestor(X, Z).
parent(
X, Z) :-
ancestor(X, Y),
parent(Y, Z). ancestor(
Listing 2 is included in the package and is accessed using the
function system.file
. Within Prolog, the normal workflow is
to consult the code with [family]
and then to raise queries
such as ancestor(X, jim)
, which returns, one by one, four
solutions for the variable X. In R, we obtain the following
results:
library(rolog)
consult(system.file(file.path("pl", "family.pl"), package="rolog"))
query(call("ancestor", expression(X), quote(jim)))
#> [1] TRUE
#> attr(,"query")
#> [1] "ancestor(X, jim)"
submit() # solutions for X
#> $X
#> pat
submit() # etc.
#> $X
#> pam
clear() # close the query
As stated above, consult
loads the facts and rules of
Listing 2 into the Prolog database. query
initializes a
query, and the subsequent calls to submit
return the
conditions under which the query succeeds. In this example, the query
succeeds if X
is either pat
, pam
,
or bob
. A query is closed with clear()
, or
automatically if the query fails. If we are interested in just the first
solution, we can use once(Call)
as a shortcut to
query(Call)
, then submit()
, then
clear()
. If we want to collect all solutions of a query
with a finite set of solutions, we can use
findall(Call)
.
As mentioned in Section 2, a simplified syntax is provided by
as.rolog
that accepts quoted expressions with dots
indicating Prolog variables:
<- quote(ancestor(.X, jim))
Q findall(Q, options=list(preproc=as.rolog))
A useful application of DAGs is confounder adjustment in causal
analysis (Greenland, Pearl, and Robins 1999;
Barrett 2021). The Prolog file backdoor.pl
is an
implementation of Greenland et al.’s criteria for the backdoor test for
d-separation in DAGs, with a predicate minimal/3
that searches for minimally sufficient sets of variables for confounder
adjustment on the causal path between exposure and outcome. The nodes
and arrows refer to Figure 12 in Greenland et al.
consult(system.file(file.path("pl", "backdoor.pl"), package="rolog"))
<- function(N) invisible(once(call("assert", call("node", N))))
node node("a"); node("b"); node("c"); node("f"); node("u")
node("e") # exposure
node("d") # outcome
<- function(X, Y) invisible(once(call("assert", call("arrow", X, Y))))
arrow arrow("a", "d"); arrow("a", "f"); arrow("b", "d"); arrow("b", "f")
arrow("c", "d"); arrow("c", "f"); arrow("e", "d"); arrow("f", "e")
arrow("u", "a"); arrow("u", "b"); arrow("u", "c")
<- findall(call("minimal", "e", "d", expression(S)))
R unlist(R)
#> S1 S2 S3 S
#> "a" "b" "c" "f"
The query to minimal/3
returns two minimally sufficient
sets of covariates for confounder adjustment (namely, {a, b, c} and
{f}).
One of the main driving forces of Prolog development was natural language processing (Dahl 1981). Therefore, the next example is an illustration of sentence parsing using so-called definite clause grammars. As Listing 3 shows, rolog can access modules from SWI’s standard library (e.g., “dcg/basics.pl”).
:- use_module(library(dcg/basics)).
NP, VP)) --> np(NP, C), blank, vp(VP, C).
s(s(NP, C) --> pn(NP, C).
np(Det, N), C) --> det(Det, C), blank, n(N, C).
np(np(Det, N, PP), C) --> det(Det, C), blank, n(N, C), blank, pp(PP).
np(np(V, NP), C) --> v(V, C), blank, np(NP, _).
vp(vp(V, NP, PP), C) --> v(V, C), blank, np(NP, _), blank, pp(PP).
vp(vp(P, NP)) --> p(P), blank, np(NP, _).
pp(pp(
, sg) --> `a`.
det(det(a), _) --> `the`.
det(det(the), sg) --> `john`.
pn(pn(john), sg) --> `man`.
n(n(man), pl) --> `men`.
n(n(men), sg) --> `telescope`.
n(n(telescope), sg) --> `sees`.
v(v(sees), pl) --> `see`.
v(v(see)--> `with`.
p(p(with))
% Translate R string to code points and invoke phrase/2
Tree, Sentence) :-
sentence(Sentence, Codes),
string_codes(phrase(s(Tree), Codes).
sentence/2
preprocesses the R
call.
As in the first example, we first consult a little Prolog program
with a minimalistic grammar and lexicon (Listing 3, see also
pl/telescope.pl
), and then raise a query asking for the
syntactic structure of “john sees a man with a telescope”. Closer
inspection of the two results reveals the two possible meanings, “john
sees a man who carries a telescope” versus “john sees a man
through a telescope”. Further Prolog examples of natural
language processing are found in , including the resolution of anaphoric
references and the extraction of semantic meaning.
consult(system.file(file.path("pl", "telescope.pl"), package="rolog"))
<- quote(sentence(.Tree, "john sees a man with a telescope"))
Q unlist(findall(Q, options=list(preproc=as.rolog)))
#> $Tree
#> s(pn(john), vp(v(sees), np(det(a), n(man), pp(p(with), np(det(a),
#> n(telescope))))))
#>
#> $Tree
#> s(pn(john), vp(v(sees), np(det(a), n(man)), pp(p(with), np(det(a),
#> n(telescope)))))
In description of the previous example, we noted in passing that
rolog
can access the built-in libraries of SWI-Prolog
(e.g., by calls to use_module/1
). It is also possible to
extend the installation by add-ons, including add-ons that require
compilation, if the build tools (essentially, RTools under Windows, and
xcode under macOS) are properly configured. This is illustrated below by
the demo add-on environ
(Wielemaker
2012) that collects the current environment variables.
once(call("pack_install", quote(environ), list(quote(interactive(FALSE)))))
once(quote(use_module(library(environ))))
once(call("environ", expression(X)))
The query then unifies X with a list with
Key=Value
terms. The purpose of this example is obviously
not to mimic the built-in function Sys.getenv()
from R, but
to illustrate the installation and usage of Prolog extensions from
within R. In most situations, the user would install the pack from
within Prolog with pack_install(environ).
.
Prolog is homoiconic, that is, code is data. In this example, we make use of Prolog’s ability to match expressions against given patterns and modify these expressions according to a few predefined “buggy rules” (Brown and Burton 1978), inspired by recurrent mistakes in the statistics exams of our students. Consider the \(t\)-statistic for comparing an observed group average to a population mean:
\[ T = \frac{\overline{X} - \mu}{s / \sqrt{N}} \]
Some mistakes may occur in this calculation, for example, omission of the implicit parentheses around the numerator and the denominator when typing the numbers into a calculator, resulting in \(\overline{X} - \frac{\mu}{s} \div \sqrt{N}\), or forgetting the square root around \(N\), or both. Prolog code for the two buggy rules is given in Listing 4.
% Correct steps and mistakes
X, Mu, S, N), frac(X - Mu, S / sqrt(N))).
expert(tratio(X - Mu, S / SQRTN), X - frac(Mu, S) / SQRTN).
buggy(frac(N), N).
buggy(sqrt(
% Apply expert and buggy rules, or enter expressions
X, Y) :-
step(X, Y) ; buggy(X, Y).
expert(X, Y) :-
step(compound(X),
, X, Y),
mapargs(searchX, Y).
dif(
% Search through problem space
X, X).
search(X, Z) :-
search(X, Y),
step(Y, Z). search(
The little e-learning system shown in Listing 4 produces six response alternatives. The fourth and the sixth result are combinations of the same two buggy rules (parenthesis, then square root, and the other way round). Some additional filters would be needed to eliminate trivial and redundant solutions .
consult(system.file(file.path("pl", "buggy.pl"), package="rolog"))
<- quote(search(tratio(x, mu, s, n), .S))
Q unlist(findall(Q, options=list(preproc=as.rolog)))
#> $S
#> tratio(x, mu, s, n)
#>
#> $S
#> frac(x - mu, s/sqrt(n))
#>
#> $S
#> x - frac(mu, s)/sqrt(n)
#>
#> $S
#> x - frac(mu, s)/n
#>
#> $S
#> frac(x - mu, s/n)
#>
#> $S
#> x - frac(mu, s)/n
An important feature of such a term manipulation is that the evaluation of the term can be postponed; for example, there is no need to instantiate the variables x, mu, s, and n with given values before raising a query. This is especially helpful for variables that may represent larger sets of data in later steps.
It should be mentioned that R is homoiconic, too, and the Prolog code above can, in principle, be rewritten in R using non-standard evaluation techniques (Wickham 2019). Prolog’s inbuilt pattern matching algorithm simplifies things a lot, though.
The R extension of the markdown language (Xie,
Dervieux, and Riederer 2020) enables reproducible statistical
reports with nice typesetting in HTML, Microsoft Word, and Latex.
However, so far, R expressions such as pbinom(k, N, p)
are
typeset as-is; prettier mathematical expressions such as \(P_\mathrm{Bi}(X \le k; N, p)\) require
Latex commands like
P_\mathrm{Bi}\left(X \le k; N, p\right)
, which are
cumbersome to type in and hard to read even if the expressions are
simple. Since recently, manual pages include support for mathematical
expressions (Sarkar and Hornik 2022),
which already is a big improvement.
Below Prolog’s grammar rules are used for an automatic
translation of R calls to MathML. The result can then be used for
calculations or it can be rendered on a web page. A limited set of rules
for translation from R to MathML is found in pl/mathml.pl
of package rolog
. A more comprehensive translator is
provided by the R package mathml
(Gondan 2022). The relevant code snippets are
shown in the listings below, along with their output.
library(rolog)
consult(system.file(file.path("pl", "mathml.pl"), package="rolog"))
# R interface to Prolog predicate r2mathml/2
<- function(term)
mathml
{<- once(call("r2mathml", term, expression(X)))
t cat(paste(t$X, collapse=""))
}
The first example is easy. At the Prolog end, there is a handler for
pbinom/3
that translates the term into a pretty MathML
syntax like P_bi(X <= k; N, pi).
<- quote(pbinom(k, N, p))
term
# Pretty print
mathml(term)
${P}_{\text{Bi}}\left(X\le k;N,p\right)$
# Do some calculations with the same term
<- 10
k <- 22
N <- 0.4
p eval(term)
[1] 0.77195
The next example is interesting because Prolog needs to find out the
name of the integration variable for sin
. For that purpose,
rolog provides a predicate r_eval/2
that calls R from
Prolog (i.e., the reverse direction, see also next example). Here, the
predicate is used for the R function formalArgs(args(sin))
,
which returns the name of the function argument of sin
,
that is, x
.
<- quote(integrate(sin, 0L, 2L*pi))
term mathml(term)
$\underset{0}{\overset{2\pi}{\int}}\mathrm{sin}\left(x\right)\phantom{\rule{thinmathspace}{0ex}}dx$
eval(term)
2.221501e-16 with absolute error < 4.4e-14
Note that the Prolog end, the handler for integrate/3
is
rather rigid; it accepts only these three arguments in that particular
order, and without names, that is,
integrate(sin, lower=0L, upper=2L * pi)
would not print the
desired result.
The extra R function canonical()
applies
match.call()
to non-primitive R calls, basically cleaning
up the arguments and bringing them into the correct order. Moreover, an
extra handler maps the extractor function $(Fn, "value")
to
Fn
.
<- function(term)
canonical
{if(is.call(term))
{<- match.fun(term[[1]])
f if(!is.primitive(f))
<- match.call(f, term)
term
# Recurse into arguments
-1] <- lapply(term[-1], canonical)
term[
}
return(term)
}
<- function(u)
g sin(u)
# Mixture of (partially) named and positional arguments in unusual order
<- quote(2L * integrate(low=-Inf, up=Inf, g)$value)
term mathml(canonical(term))
$2\cdot \underset{-\infty}{\overset{\infty}{\int}}g\left(u\right)\phantom{\rule{thinmathspace}{0ex}}du$
# It is a bit of a mystery that R knows the result of this integral.
eval(term)
[1] 0
Note that both sin
nor g
in the above terms
are R symbols, not R functions. In order to render something like
call("integrate", low=-Inf, up=Inf, g)
, or
call("integrate", low=-Inf, up=Inf, sin)
, with
g
and sin
referring to the respective
functions, one would need to determine its name, which is not possible
in general.
print(g)
#> function(u)
#> sin(u)
#> <bytecode: 0x56244331ac98>
The basic workflow of the bridge from R to Prolog is to (A) translate
an R expression into a Prolog term (i.e., a predicate), (B) query the
predicate, and then, (C) translate the result (i.e., the bindings of the
variables) back to R (see also Figure 1). The reverse direction is
straightforward, we start by translating a Prolog term to an
R expression (i.e. Step C), evaluate the R expression, and then
translate the result back to a Prolog term (Step A). Package
rolog
provides two predicates for that purpose,
r_eval(Expr)
and r_eval(Expr, Res)
. The former
is used to invoke an R expression Expr
for its side effects
(e.g., initializing a random number generator); it does not return a
result. The latter is used to evaluate the R expression and return the
result Res
. The code snippet in Listing 6
(r_eval.pl
) illustrates this behavior.
Seed) :-
r_seed('set.seed'(Seed)).
r_eval(
N, L) :-
r_norm(N), L). r_eval(rnorm(
r_eval/1
and
r_eval/2
. The R call set.seed
is quoted
because the dot is an operator in Prolog.
consult(system.file(file.path("pl", "r_eval.pl"), package="rolog"))
invisible(once(call("r_seed", 123L)))
once(call("r_norm", 3L, expression(X)))
#> $X
#> [1] -0.5604756 -0.2301775 1.5587083
The example in Listing 6 is a bit trivial, basically illustrating the
syntax and the workflow. More serious applications of are shown in the
next two sections where r_eval/2
is used to evaluate
monotonically behaving R functions and to obtain the names of function
arguments in R.
As show below, the default environment of rolog
’s
r_eval/2
is .GlobalEnv
, this can be changed in
an optional argument to once()
, findall()
, and
query()
.
# Set variable in R, read in Prolog
<- new.env()
env with(env, a <- 1)
once(call("r_eval", quote(a), expression(X)), env=env)
#> $X
#> [1] 5
# Set R variable in Prolog, read in R
invisible(once(call("r_eval", call("<-", quote(b), 2))))
cat("b =", b)
#> b = 2
If the R call raises an exception, an error is propagated to Prolog
and finally to the rolog
package:
#try(once(quote(r_eval(rnorm(-1))))) # return "-1" random normals
Let \(\langle\ell, u\rangle\) denote
a number between \(\ell\) and \(u\), \(\ell\le
u\). It is easily verified that the result of the difference
\(\langle\ell_1, u_1\rangle - \langle\ell_2,
u_2\rangle\) is somewhere in the interval \(\langle \ell_1 - u_2, u_1 -
\ell_2\rangle\), and a number of rules exist for basic arithmetic
operations and (piecewise) monotonically behaving functions (Hickey, Ju, and Emden 2001). For ratios,
denominators with mixed sign yield two possible intervals, for example,
\(\langle 1, 2\rangle / \langle -3, 3\rangle =
\langle -\infty, 3\rangle \cup \langle 3, \infty\rangle\), as
shown in Figure 4 in Hickey et al.’s article. The number of possible
candidates increases if more complicated functions are involved, as
unions of intervals themselves appear as arguments (e.g., if \(I_1 \cup I_2\) is added to \(I_3 \cup I_4\), the result is \(I_1 + I_3 \cup I_1 + I_4 \cup I_2 + I_3 \cup I_2 +
I_4\)). As a consequence, calculations in interval arithmetic are
non-deterministic in nature, and the number of possible results is not
foreseeable and cannot, in general, be vectorized as is often done in R.
Use cases for interval arithmetic are the limitations of floating-point
representations in computer hardware, but intervals can also be used to
represent the result of measurements with limited precision, or
truncated intermediate results of students doing hand calculations. A
few rules for basic interval arithmetic are found in
pl/interval.pl
; a few examples are shown below. Again,
Prolog rings back to R via r_eval/2
to determine the result
of dbinom(X, Size, Prob, Log)
.
#consult(system.file(file.path("pl", "interval.pl"), package="rolog"))
#Q <- quote(int(`...`(1, 2) / `...`(-3, 3), .Res))
#unlist(findall(Q, options=list(preproc=as.rolog)))
#D <- quote(`...`(5.7, 5.8))
#mu <- 4
#s <- quote(`...`(3.8, 3.9))
#N <- 24L
#tratio <- call("/", call("-", D, mu), call("/", s, call("sqrt", N)))
#once(call("int", tratio, expression(Res)))
# Binomial density
#prob = quote(`...`(0.2, 0.3))
#once(call("int", call("dbinom", 4L, 10L, prob, FALSE), expression(Res)))
The slightly cumbersome syntax for entering an interval \(\langle \ell, u\rangle\) is due to the fact that the ellipsis is a reserved symbol in R and cannot be used as an infix operator. A powerful and comprehensive system for constraint logic programming over intervals is available as a Prolog pack (Workman 2021) and can easily be connected to R using, for example, the present package.
R has become the primary language for statistical programming and
data science, but is currently lacking support for traditional, symbolic
artificial intelligence. There are already two add-ons for SWI-Prolog
that allow to run R calculations from Prolog (Angelopoulos et al. 2013; Wielemaker 2021b),
but a connection in the other direction was missing, so far.
rolog
bridges this gap by providing an interface to a
SWI-Prolog distribution in an R package. The communication between the
two systems is mainly in the form of queries from R to Prolog, but two
predicates allow Prolog to ring back and evaluate terms in R. The design
of the package is minimalistic, providing three main functions
query()
, submit()
, and clear()
,
and a very limited set of convenience tools (consult()
,
once()
, and findall()
) to facilitate recurrent
everyday actions. As both systems are homoiconic in nature, it was easy
to establish a one-to-one correspondence between many of the elements of
the two languages. Most exceptions (e.g., lack of R support for empty
symbols) can be avoided and/or circumvented by wrapper functions at both
ends.
Simple ways to extend the package have been described in Section 2;
such extensions could, for example, include R objects and structures
like those returned by lm()
, or S4 classes. In many use
cases, this may be realized by transforming the R object to a list with
named elements, and rebuild the object on the Prolog end on an as-needed
basis. After a query, the process is reversed. If speed is an issue,
more of these steps can, in principle, be moved into the package and
implemented in Rcpp
.
rolog
, thus, opens up a wide of applications in logic
programming for statisticians and researchers at the intersection of
symbolic and connectionist artificial intelligence, where concise
knowledge representation is combined with statistical power. Moreover,
rolog
provides starting points for useful small-scale
solutions for everyday issues in data science (term transformations,
pretty mathematical output, interval arithmetic, see Section 3).
At its present stage, a major limitation of rolog
is its
relatively slow speed. For example, translation of R lists or vectors to
the respective elements of the Prolog language (also lists,
#/N
) is done element-wise, in both directions. The
translation is optimized by using Rcpp
(Eddelbuettel and Balamuta 2018), but there
remains an upper bound in the efficiency, because Prolog does not
support vectors or matrices. Since Prolog’s primary purpose is not
vector or matrix calculation, this limitation may not show up in
real-world applications. Another issue, maybe a bit annoying, is the
rather cumbersome syntax of the interface, with the need for quoted
calls and R expressions for representing Prolog variables.
rolog
was deliberately chosen to be minimalistic and, so
far, only depends on base R. A more concise representation might be
obtained by tools from the “Tidyverse” ecosystem, as described in
Chapter 19 of Advanced R (Wickham 2019).
Finally, at this stage, rolog
is unable to deal with cyclic
terms (e.g.,
once(call("=", expression(A), call("f", expression(A))))
,
i.e., A = f(A)
raises an error message).
rolog
is available for R Version 4.2 and later, and can
easily be installed using the usual
install.packages("rolog")
. The source code of the package
is found at https://github.com/mgondan/rolog/, including
installation instructions for Unix, Windows and macOS.
Development of the package profited substantially from the Prolog
packs rserve_client
(Wielemaker
2021b) and real
(Angelopoulos
et al. 2013).
The results in this paper were obtained using R 2.2.2022 with the
rolog
0.9.18 package. R itself and all packages used are
available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/.