15. Unevaluated expressions (*)
The open-access textbook Deep R Programming by Marek Gagolewski is, and will remain, freely available for everyone’s enjoyment (also in PDF). It is a non-profit project. This book is still a work in progress. Beta versions of all chapters are already available (proofreading and copyediting pending). In the meantime, any bug/typos reports/fixes are appreciated. Although available online, this is a whole course. It should be read from the beginning to the end. Refer to the Preface for general introductory remarks. Also, check out my other book, Minimalist Data Wrangling with Python [26].
In this and the remaining chapters, we will learn some hocus-pocus that should only be of interest to the advanced-to-be[1] and open-minded R programmers who would like to understand what is going on under our language’s hood. In particular, we will inspect the mechanisms behind why certain functions act differently from what we would expect them to do if a standard evaluation scheme were followed (compare subset and transform mentioned in Section 12.3.9).
Namely, in normal programming languages, when we execute something like:
plot(x, exp(x))
the expression exp(x) is evaluated first and its value[2] (in this case: probably a numeric vector) is only then passed to the plot function as the actual parameter. Thus, if `x` was set to be seq(0, 10, length.out=1001), the above never means anything else than:
plot(c(0.00, 0.01, 0.02, 0.03, ...), c(1.0000, 1.0101, 1.0202, 1.0305, ...))
But R was heavily inspired by the S language, from which it has taken the notion of lazy arguments (Chapter 17). It is equipped with the ability to apply a set of techniques referred to as metaprogramming (computing on the language, reflection). With it, we can define functions that can peek outside their small world and clearly see the code fragments passed as their arguments. Having access to such unevaluated expressions, we can do to them whatever we please: print, modify, evaluate on different data, or ignore altogether.
In theory, this enables implementing many potentially helpful[3] beginner-friendly features and expressing certain requests in a more concise manner. For instance, the y-axis labels in Figure 2.2 could be generated automatically precisely because plot was able to see not only a vector like c(1.0000, 1.0101, 1.0202, 1.0305, ...) but also the expression that generated it, exp(x).
Nonetheless, as a form of untamed freedom of expression[4], metaprogramming has the endless potential to arouse chaos, confusion, and division in the user community. In particular, we can introduce a dialect within our language that people outside our circle will not be able to understand.
Therefore, cursed be us, for we are about to start eating from the tree of the knowledge of good and evil. But remember: with great power comes great fun (and responsibility).
15.1. Expressions at a glance
At the most general level, expressions (statements) in a language like R can be classified into two groups:
simple expressions:
    constants (e.g., 3.14, 2i, 42L, NA_real_, TRUE, "character string", NULL, -1.3e-16, 0x123abc),
    names (symbols, identifiers),
compound expressions – combinations of \(n+1\) expressions (simple or compound) of the form:
\[(f, e_1, e_2, \dots, e_n).\]
As we will soon see, compound expressions represent a call to \(f\) (an operator) on a sequence of arguments \(e_1,e_2,\dots,e_n\) (operands). It is why, equivalently, we denote them with \(f(e_1,e_2,\dots,e_n)\).
On the other hand, names such as `x`, `iris`, `sum`, and `spam` have no meaning without an explicitly stated context. We will explore this topic in Chapter 16. Prior to that, we treat them as meaningless.
Hence, for the time being, we are only interested in the syntax or grammar of our language, not the semantics. We are abstract in the sense that, in the expression “mean(x)+2”[5], neither `mean`, `x`, nor even `+` has the “usual” sense. Therefore, we should treat them as equivalent to, say, f(g(x), 2) or spam(bacon(spanish_inquisition), 2).
15.2. Language objects
There are three types of language objects in R:
name (symbol) – stores object names in the sense of simple expressions: names in Section 15.1;
call – represents unevaluated function calls in the sense of compound expressions above;
expression – quite confusingly, represents a sequence of simple or compound expressions (constants, names, or calls).
One way to create a simple or compound expression is by quoting, where the interpreter is asked to refrain from evaluating a given command:
quote(spam) # name (symbol)
## spam
quote(f(x)) # call
## f(x)
quote(1+2+3*pi) # another call
## 1 + 2 + 3 * pi
None of the above was executed.
Single strings can be converted to names by calling:
as.name("spam")
## spam
Calls can be built programmatically by invoking:
call("sin", pi/2)
## sin(1.5707963267949)
Sometimes we would rather quote the arguments passed:
call("sin", quote(pi/2))
## sin(pi/2)
call("c", 1, exp(1), quote(exp(1)), pi, quote(pi))
## c(1, 2.71828182845905, exp(1), 3.14159265358979, pi)
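To see that quoting and call lead to the same kind of object, we can compare their results directly. The sketch below is only an illustration: it additionally uses eval, a base R function that evaluates a language object and that is not discussed in this section.

```r
# two ways to arrive at the same unevaluated call
e1 <- quote(max(1, 5, 3))
e2 <- call("max", 1, 5, 3)

print(e2)            # the call itself; nothing has been computed yet
identical(e1, e2)    # both are the same language object
eval(e2)             # only now is `max` actually invoked
```

Until eval is called, `e1` and `e2` are merely data that describe a computation.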
Objects of the type expression can be thought of as list-like objects that consist of simple or compound expressions.
(exprs <- expression(1, spam, mean(x)+2))
## expression(1, spam, mean(x) + 2)
All arguments were quoted.
We can select or subset the individual statements using the extraction or index operators:
exprs[-1]
## expression(spam, mean(x) + 2)
exprs[[3]]
## mean(x) + 2
Check the type of the object returned by a call to “c(1, "two", sd, list(3, 4:5), expression(3+3))”.
There is also an option to parse a given text fragment or a whole source file:
parse(text="mean(x)+2")
## expression(mean(x) + 2)
parse(text=" # two code lines (a comment to be ignored by the parser)
x <- runif(5, -1, 1)
print(mean(x)+2)
")
## expression(x <- runif(5, -1, 1), print(mean(x) + 2))
parse(text="2+") # syntax error - unfinished business
## Error in parse(text = "2+"): <text>:2:0: unexpected end of input
## 1: 2+
##    ^
Important
The deparse function can be used to convert language objects to character vectors. For instance:
deparse(quote(mean(x+2)))
## [1] "mean(x + 2)"
This function has the nice side effect of tidying up the code formatting:
exprs <- parse(text=
"`+`(x, 2)->y; if(y>0) print(y**10|>log()) else { y<--y; print(y)}")
Let us print them out:
for (e in exprs)
    cat(deparse(e), sep="\n")
## y <- x + 2
## if (y > 0) print(log(y^10)) else {
##     y <- -y
##     print(y)
## }
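Since deparse converts code to text and parse goes the other way, the two can be chained. The following sketch checks the round trip on one simple call. It is an illustration only: R does not guarantee that every object survives such a round trip unchanged. The keep.source=FALSE setting is our choice here, made so that no source references are stored.

```r
e <- quote(mean(x + 2))
txt <- deparse(e)                              # language object -> text
e2 <- parse(text=txt, keep.source=FALSE)[[1]]  # text -> language object
print(txt)
identical(e, e2)   # for this simple call, we get the original back
```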
Note
Calling class on objects of the three aforementioned types yields name, call, and expression, whereas typeof returns symbol, language, and expression, respectively.
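The following short check, given only as an illustration, confirms the above on one example object of each kind (the name `objs` is ours):

```r
objs <- list(
    quote(spam),               # a name
    quote(f(x)),               # a call
    expression(1, spam, f(x))  # an expression
)
data.frame(
    class  = sapply(objs, class),
    typeof = sapply(objs, typeof)
)
```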
15.3. Calls as combinations of expressions
We have mentioned that calls (compound expressions) are combinations of simple or compound expressions of the form \((f, e_1, \dots, e_n)\).
The first expression on the list, denoted above with \(f\), plays a special role. It is precisely seen in the following examples:
as.call(expression(f, x))
## f(x)
as.call(expression(`+`, 1, x))
## 1 + x
as.call(expression(`while`, i < 10, i <- i + 1))
## while (i < 10) i <- i + 1
as.call(expression(function(x) x**2, log(exp(1))))
## (function(x) x^2)(log(exp(1)))
as.call(expression(1, x, y, z)) # utter nonsense, but syntactically valid
## 1(x, y, z)
Recall from Section 9.4 that operators and language constructs such as if and while are ordinary functions.
Furthermore:
expr <- quote(f(1+2, a=1, b=2))
length(expr)
## [1] 4
names(expr) # NULL if no arguments are named
## [1] "" "" "a" "b"
15.3.1. Browsing parse trees
Square brackets give us access to the individual expressions constituting an object of the type call. For example:
expr <- quote(1+x)
expr[[1]]
## `+`
expr[2:3]
## 1(x)
A compound expression was defined recursively: it may consist of other compound expressions.
For instance, the following expression:
expr <- quote(
    while (i < 10) {
        cat("i =", i, "\n")
        i <- i+1
    }
)
can be rewritten using the \(f(...)\) notation like:
`while`(`<`(i, 10), `{`( cat("i =", i, "\n"), `<-`(i, `+`(i, 1))))
Equivalently, in the Polish notation (the prefix notation; \((f, ...)\); traditionally used in Lisp), it will look like:
(
    `while`,
    (`<`, i, 10),
    (
        `{`,
        (cat, "i =", i, "\n"),
        (
            `<-`,
            i,
            (`+`, i, 1)
        )
    )
)
Thus, for example, we can dig into the sub-expressions using a series of extractions:
expr[[2]][[1]] # or expr[[c(2, 1)]]
## `<`
expr[[3]][[2]][[4]] # or expr[[c(3, 2, 4)]]
## [1] "\n"
We can even compose a recursive function to traverse the whole parse tree:
recapply <- function(expr)
{
    if (is.call(expr)) lapply(expr, recapply)
    else expr
}
str(recapply(expr))
## List of 3
## $ : symbol while
## $ :List of 3
## ..$ : symbol <
## ..$ : symbol i
## ..$ : num 10
## $ :List of 3
## ..$ : symbol {
## ..$ :List of 4
## .. ..$ : symbol cat
## .. ..$ : chr "i ="
## .. ..$ : symbol i
## .. ..$ : chr "\n"
## ..$ :List of 3
## .. ..$ : symbol <-
## .. ..$ : symbol i
## .. ..$ :List of 3
## .. .. ..$ : symbol +
## .. .. ..$ : symbol i
## .. .. ..$ : num 1
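The same traversal scheme can be used to extract information from the tree. Here is a minimal sketch of a function, hypothetically named `collect_symbols`, that gathers all the names occurring in an expression. It assumes that no call contains empty arguments (as in `x[1, ]`), which would need separate handling.

```r
collect_symbols <- function(expr)
{
    if (is.name(expr))
        as.character(expr)  # a leaf that is a name
    else if (is.call(expr))
        unlist(lapply(expr, collect_symbols), use.names=FALSE)  # recurse
    else
        character(0)        # constants carry no names
}

expr <- quote(
    while (i < 10) {
        cat("i =", i, "\n")
        i <- i+1
    }
)
unique(collect_symbols(expr))
```

Function names and operators are reported alongside variable names because, at the level of syntax, they are all just symbols.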
15.3.2. Manipulating calls
The R language is homoiconic: it can treat code as data.
This includes the ability to manipulate it on the fly.
This is because, just like on lists, we can freely use the replacement versions of `[` and `[[` on objects of the type call.
expr[[2]][[1]] <- as.name("<=")
expr[[3]] <- quote(i <- i * 2)
print(expr)
## while (i <= 10) i <- i * 2
We are only limited by our imagination.
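For example, a recursive rewrite of a whole parse tree might look as follows. This is merely a sketch: `rename_symbol` is a hypothetical helper of ours, and it assumes that the calls contain no empty arguments.

```r
rename_symbol <- function(expr, from, to)
{
    if (is.name(expr) && identical(as.character(expr), from))
        as.name(to)  # replace the matching name
    else if (is.call(expr))
        as.call(lapply(expr, rename_symbol, from, to))  # rebuild the call
    else
        expr         # constants and other names stay intact
}

e <- quote(i <- i + step*i)
rename_symbol(e, "i", "k")
```

The original object `e` is left unchanged; a new call is constructed instead.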
15.4. Inspecting function definitions and arguments thereto
15.4.1. Getting the body and formal arguments
Consider the following definition:
test <- function(x, y=1)
    x+y  # whatever
We know from the first part of this book that calling print on a function will reveal its source code.
It turns out that we can get access to the list of parameters in the form of a named list[6]:
formals(test)
## $x
##
##
## $y
## [1] 1
Note that the expressions generating the values of the default arguments are stored as ordinary list elements (for more details, see Section 17.2).
Furthermore, we can get access to the function’s body:
body(test)
## x + y
It is an object of the now well-known class call.
Thus, we can customise it as we please:
body(test)[[1]] <- as.name("*") # change from `+` to `*`
body(test) <- as.call(list(as.name("{"), quote(cat("spam")), body(test)))
test
## function (x, y = 1)
## {
##     cat("spam")
##     x * y
## }
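The list of formal arguments can be altered in a similar manner by means of the replacement version of formals. Below is a small sketch on a separate, hypothetical function `area`, so that the `test` function above is left as it is.

```r
area <- function(width, height=1)
    width*height

formals(area)$height <- 2  # change the default value of `height`
formals(area)$height       # inspect the new default
area(3)                    # now computes 3*2
```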
15.4.2. Getting the expression passed as an actual argument
A call to substitute allows us to reveal the expression used to generate a function’s argument.
test <- function(x) substitute(x)
Some examples:
test(1)
## [1] 1
test(2+spam)
## 2 + spam
test(test(test(!!7)))
## test(test(!!7))
test() # it is not an error
Chapter 17 notes that arguments are evaluated only on demand (lazily): substitute triggers no computations. This opens the possibility to author functions that interpret their input whichever way they like; see Section 9.5.7, Section 12.3.9, and Section 17.5 for examples.
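As a simple illustration, consider a function that labels its output with the code that was used to generate its argument. It is only a sketch; `show_value` is a made-up name.

```r
show_value <- function(x)
{
    label <- deparse(substitute(x))  # the code passed by the caller, as text
    cat(label, "=", format(x), "\n")
    invisible(label)
}

v <- c(1, 4, 9)
show_value(sqrt(v))
show_value(sum(v) / length(v))
```

Note that `x` is evaluated only when format needs its value. The label is obtained without performing any computations.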
library (see Section 7.3.1) allows specifying the name of the package to be loaded both in the form of a character string and a name:
library("gsl") # preferred
library(gsl) # discouraged - via as.character(substitute(package))
A user saves two keystrokes at the cost of not being able to prepare the package name programmatically before the call:
which_package <- "gsl"
library(which_package) # library("which_package")
## Error in library(which_package): there is no package called
## 'which_package'
In order to make the above possible, we need to alter the character.only argument (which defaults to FALSE):
library(which_package, character.only=TRUE)
It is quite common to see a call like deparse(substitute(arg)) or as.character(substitute(arg)) in many built-in functions. Study the source code of plot.default, hist.default, prop.test, wilcox.test.default and the aforementioned library. Explain why they do that. Propose a solution to achieve the same functionality without using reflection techniques.
15.4.3. Checking if an argument is missing
There is an easy way to check whether an argument was provided at all:
test <- function(x) missing(x)
test(1)
## [1] FALSE
test()
## [1] TRUE
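A possible use case is a function whose behaviour depends on which optional arguments were supplied. The sketch below (`clip` is a hypothetical name) truncates a numeric vector only on the sides for which a bound was given:

```r
clip <- function(x, lower, upper)
{
    if (!missing(lower)) x[x < lower] <- lower  # only if `lower` was passed
    if (!missing(upper)) x[x > upper] <- upper  # only if `upper` was passed
    x
}

clip(c(-2, 0, 5), lower=0)
clip(c(-2, 0, 5), upper=1)
clip(c(-2, 0, 5), -1, 1)
```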
Study the source code of sample, seq.default, plot.default, matplot, and t.test.default. Determine the role of a call to missing. Would introducing a default argument NULL and testing its value with is.null constitute a reasonable alternative?
15.4.4. Determining how a function was called
Even though this somewhat touches on the topics discussed in the two following chapters, it is worth knowing that sys.call can look at the call stack and determine how the current function was invoked.
Moreover, match.call takes us a step further: it returns a call with argument names matched to a function’s formal parameters list.
For instance:
test <- function(x, y, ..., a="yes", b="no")
{
    print(sys.call())  # sys.call(0)
    print(match.call())
}
x <- "maybe"
test("spam", "bacon", "eggs", u = "ham"<"jam", b=x)
## test("spam", "bacon", "eggs", u = "ham" < "jam", b = x)
## test(x = "spam", y = "bacon", "eggs", u = "ham" < "jam", b = x)
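The object returned by match.call is an ordinary call. Therefore, we can apply on it all the tools discussed above. For instance, the following sketch (with a hypothetical function `supplied_args`) reports the names of the arguments that were explicitly passed by the caller:

```r
supplied_args <- function(x, y, ..., a="yes", b="no")
{
    cl <- match.call()  # a call with the formal argument names filled in
    names(cl)[-1]       # drop the slot occupied by the function itself
}

supplied_args(1, b=2)
supplied_args(1, 2, 3, u=4)
```

Unnamed arguments consumed by the ellipsis are reported with empty names.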
Another example where we see that we can access the call stack much more deeply:
f <- function(x)
{
    g <- function(y)
    {
        cat("g:\n")
        print(sys.call(0))
        print(sys.call(-1))  # go back one frame
        y
    }
    cat("f:\n")
    print(sys.call(0))
    g(x+1)
}
f(1)
## f:
## f(1)
## g:
## g(x + 1)
## f(1)
## [1] 2
Note
Let us formalise the order of matching function parameters to the passed arguments. As described in Section 4.3 of [66], it proceeds as follows:
1. keyword arguments with names matched exactly, each name matched at most once,
2. remaining keyword arguments, but with the partial matching of names listed before the ellipsis, `...`, each match must be unambiguous,
3. positional matching to the remaining parameters,
4. all remaining arguments (named or not) will be consumed by the ellipsis (if present).
For instance:
test <- function(spam, jasmine, jam, ..., option=NULL)
    print(match.call())
Example calls:
test(1, 2, 3, 4, option="yes")
## test(spam = 1, jasmine = 2, jam = 3, 4, option = "yes")
test(1, 2, jasmine="no", sp=4, ham=7)
## Warning in test(1, 2, jasmine = "no", sp = 4, ham = 7): partial argument
## match of 'sp' to 'spam'
## Warning in match.call(definition, call, expand.dots, envir): partial
## argument match of 'sp' to 'spam'
## test(spam = 4, jasmine = "no", jam = 1, 2, ham = 7)
test(1, 2, ja=7) # ambiguous match
## Warning in test(1, 2, ja = 7): partial argument match of 'ja' to 'jasmine'
## Error in test(1, 2, ja = 7): argument 3 matches multiple formal arguments
test(o=7) # partial matching of `option` failed - `option` is after `...`
## test(o = 7)
Note again that our environment uses options(warnPartialMatchArgs=TRUE).
A function can[7] see how it was defined by its maker. Call sys.function inside its body to reveal that.
Execute “match.call(sys.function(-1), sys.call(-1))” in the g function above.
15.5. Exercises
Answer the following questions:
What is a simple expression? What is a compound expression? Give a few examples.
What is the difference between an object of the type call and that of the type expression?
What do formals and body return when called on a function object?
How to test if an argument to a function was given at all? Provide a use case for such a verification.
Give a few ways to create an unevaluated call.
What is the purpose of deparse(substitute(...))? Give a few examples of functions that use this technique.
What is the difference between sys.call and match.call?
Write a function that takes the dot-dot-dot argument (Section 9.5.6). Using match.call (amongst others), determine the list of all the expressions passed via `...`; some might be named (just like in one of the above examples). The solution is given in Section 17.3.
Write a function check_if_calls(f, fun_list) that takes another function f on input. Then, it verifies if f calls any of the functions (referred to by their names) from a character vector fun_list.