Deep R Programming

Marek Gagolewski

doi:10.5281/zenodo.7490464

15. Unevaluated expressions (*)¶

This open-access textbook is, and will remain, freely available for everyone’s enjoyment (also in PDF; a paper copy can also be ordered). It is a non-profit project. Although available online, it is a whole course, and should be read from the beginning to the end. Refer to the Preface for general introductory remarks. Any bug/typo reports/fixes are appreciated. Make sure to check out Minimalist Data Wrangling with Python [28], too.

In this and the remaining chapters, we will learn some hocus-pocus that should only be of interest to the advanced-to-be[1] and open-minded R programmers who would like to understand what is going on under our language’s bonnet. In particular, we will inspect the mechanisms behind why certain functions act differently from what we would expect them to do if a standard evaluation scheme was followed (compare subset and transform mentioned in Section 12.3.9). Namely, in normal programming languages, when we execute something like:

plot(x, exp(x))

the expression exp(x) is evaluated first. Then, and only then, its value[2] (in this case: probably a numeric vector) is passed to the plot function as an actual parameter. Thus, if x becomes seq(0, 10, length.out=1001), the above never means anything else than:

plot(c(0.00, 0.01, 0.02, 0.03, ...), c(1.0000, 1.0101, 1.0202, 1.0305, ...))

But R was heavily inspired by the S language from which it has taken the notion of lazy arguments (Chapter 17). It is thus equipped with the ability to apply a set of techniques referred to as metaprogramming (computing on the language, reflection). With it, we can define functions that do not take their arguments for granted and clearly see the code fragments passed to them. Having access to such unevaluated expressions, we can do to them whatever we please: print, modify, evaluate on different data, or ignore whatsoever.

In theory, this enables implementing many potentially helpful[3], beginner-friendly features and express certain requests in a more concise manner. For instance, that the y-axis labels in Figure 2.2 could be generated automatically is precisely because plot was able to see not only a vector like c(1.0000, 1.0101, 1.0202, 1.0305, ...) but also the expression that generated it, exp(x).

Nonetheless, as a form of untamed freedom of expression[4], metaprogramming has the endless potential to arouse chaos, confusion, and division in the user community. In particular, we can introduce a dialect within our language that people outside our circle will not be able to understand. Once it becomes a dominant one, other users will feel excluded.

Cursed be us, for we are about to start eating from the tree of the knowledge of good and evil. But remember: with great power comes great fun (and responsibility).

15.1. Expressions at a glance¶

At the most general level, expressions (statements) in a language like R can be classified into two groups:

simple expressions:
- constants (e.g., 3.14, 2i, 42L, NA_real_, Inf, NaN, NA, FALSE, TRUE, "character string", NULL, -1.3e-16, and 0x123abc),
- names (symbols, identifiers; e.g., x, iris, sum, data.frame, spam, `+`, `[<-`, and spanish_inquisition),
compound expressions – combinations of \(n+1\) expressions (simple or compound) of the form:

\[(f, e_1, e_2, \dots, e_n).\]

As we will soon see, compound expressions represent a call to \(f\) (an operator) on a sequence of arguments \(e_1,e_2,\dots,e_n\) (operands). It is why, equivalently, we denote them by \(f(e_1,e_2,\dots,e_n)\).

On the other hand, names have no meaning without an explicitly stated context, which we will explore in Chapter 16. Prior to that, we treat them as meaning-less. Hence, for the time being, we are only interested in the syntax or grammar of our language, not the semantics. We are abstract in the sense that, in the expression mean(rates)+2, which we know from Section 9.3.5 that we can equivalently articulate as `+`(mean(rates), 2), neither mean, x, nor even `+` has the usual sense. Therefore, we should consider it the same as, say, f(g(x), 2) or nobody(expects(spanish_inquisition), 2).

15.2. Language objects¶

There are three types of language objects in R:

name (symbol) represents object names in the sense of simple expressions: names in Section 15.1;
call stores unevaluated function calls in the sense of compound expressions above;
expression, quite confusingly, represents a sequence of simple or compound expressions (constants, names, or calls).

One way to create a simple or compound expression is by quoting, where the R interpreter is asked to refrain from evaluating a given command:

quote(spam)  # name (symbol)
## spam
quote(f(x))  # call
## f(x)
quote(1+2+3*pi)  # another call
## 1 + 2 + 3 * pi

None of the foregoing was executed. In particular, spam has no sense in the current context (whichever that is). It is not the meaning that we are now after.

Single strings can be converted to names by calling:

as.name("spam")
## spam

Calls can be built programmatically by invoking:

call("sin", pi/2)
## sin(1.5707963267949)

Sometimes we would rather have the arguments quoted:

call("sin", quote(pi/2))
## sin(pi/2)
call("c", 1, exp(1), quote(exp(1)), pi, quote(pi))
## c(1, 2.71828182845905, exp(1), 3.14159265358979, pi)

Objects of the type expression can be thought of as list-like sequences that consist of simple or compound expressions.

(exprs <- expression(1, spam, mean(x)+2))
## expression(1, spam, mean(x) + 2)

All arguments were quoted. We can select or subset the individual statements using the extraction or index operators:

exprs[-1]
## expression(spam, mean(x) + 2)
exprs[[3]]
## mean(x) + 2

Exercise 15.1

Check the type of the object returned by a call to c(1, "two", sd, list(3, 4:5), expression(3+3)).

There is also an option to parse a given text fragment or a whole source file:

parse(text="mean(x)+2")
## expression(mean(x) + 2)
parse(text="  # two code lines (comments are ignored by the parser)
    x <- runif(5, -1, 1)
    print(mean(x)+2)
")
## expression(x <- runif(5, -1, 1), print(mean(x) + 2))
parse(text="2+")  # syntax error - unfinished business
## Error in parse(text = "2+"): <text>:2:0: unexpected end of input 1: 2+ ^

Important

The deparse function converts language objects to character vectors, e.g.:

deparse(quote(mean(x+2)))
## [1] "mean(x + 2)"

This function has the nice side effect of tidying up the code formatting:

exprs <- parse(text=
    "`+`(x, 2)->y; if(y>0) print(y**10|>log()) else { y<--y; print(y)}")

Let’s print them out:

for (e in exprs)
    cat(deparse(e), sep="\n")
## y <- x + 2
## if (y > 0) print(log(y^10)) else {
##     y <- -y
##     print(y)
## }

Note

Calling class on objects of the three aforementioned types yields name, call, and expression, whereas typeof returns symbol, language, and expression, respectively.

15.3. Calls as combinations of expressions¶

We have mentioned that calls (compound expressions) are combinations of simple or compound expressions of the form \((f, e_1, \dots, e_n)\). The first expression on the list, denoted above by \(f\), plays a special role. This is precisely seen in the following examples:

as.call(expression(f, x))
## f(x)
as.call(expression(`+`, 1, x))  # `+`(1, x)
## 1 + x
as.call(expression(`while`, i < 10, i <- i + 1))
## while (i < 10) i <- i + 1
as.call(expression(function(x) x**2, log(exp(1))))
## (function(x) x^2)(log(exp(1)))
as.call(expression(1, x, y, z))  # utter nonsense, but syntactically valid
## 1(x, y, z)

Recall from Section 9.3 that operators and language constructs such as if and while are ordinary functions. Furthermore, keyword arguments to a call result in the underlying sequence’s being named:

expr <- quote(f(1+2, a=1, b=2))
length(expr)  # three arguments –> length-4 sequence
## [1] 4
names(expr)  # NULL if no arguments are named
## [1] ""  ""  "a" "b"

15.3.1. Browsing parse trees¶

Square brackets give us access to the individual expressions constituting an object of the type call. For example:

expr <- quote(1+x)
expr[[1]]
## `+`
expr[c(1, 3, 2)]
## x + 1
expr[c(2, 3, 1, 3)]
## 1(x, `+`, x)

A compound expression was defined recursively: it may consist of other compound expressions. For instance, a statement:

expr <- quote(
    while (i < 10) {
        cat("i = ", i, "\n", sep="")
        i <- i+1
    }
)

can be rewritten[5] using the \(f(...)\) notation like:

quote(
    `while`(
        `<`(i, 10),
        `{`(cat("i = " , i, "\n", sep=""), `<-`(i, `+`(i, 1)))
    )
)

We can dig into all the subexpressions using a series of extractions:

expr[[2]][[1]]  # expr[[c(2, 1)]]
## `<`
expr[[3]][[3]][[3]]  # expr[[c(3, 3, 3)]]
## i + 1
expr[[3]][[3]][[3]][[1]]  # expr[[c(3, 3, 3, 1)]]
## `+`

Example 15.2

We can even compose a recursive function to traverse the whole parse tree:

recapply <- function(expr)
{
    if (is.call(expr)) lapply(expr, recapply)
    else expr
}

str(recapply(expr))
## List of 3
##  $ : symbol while
##  $ :List of 3
##   ..$ : symbol <
##   ..$ : symbol i
##   ..$ : num 10
##  $ :List of 3
##   ..$ : symbol {
##   ..$ :List of 5
##   .. ..$    : symbol cat
##   .. ..$    : chr "i = "
##   .. ..$    : symbol i
##   .. ..$    : chr "\n"
##   .. ..$ sep: chr ""
##   ..$ :List of 3
##   .. ..$ : symbol <-
##   .. ..$ : symbol i
##   .. ..$ :List of 3
##   .. .. ..$ : symbol +
##   .. .. ..$ : symbol i
##   .. .. ..$ : num 1

15.3.2. Manipulating calls¶

The R language is homoiconic: it can treat code as data. This includes the ability to manipulate it on the fly. This is because, just like on lists, we can freely use the replacement versions of `[` and `[[` on objects of the type call.

expr[[2]][[1]] <- as.name("<=")  # was: `<`
expr[[3]] <- quote(i <- i * 2)  # was: {...}
print(expr)
## while (i <= 10) i <- i * 2

We are only limited by our imagination. We should spend some time and contemplate how powerful this is, knowing that soon we will become able to evaluate any expression in different contexts.

15.4. Inspecting function definition and usage¶

15.4.1. Getting the body and formal arguments¶

Consider a function:

test <- function(x, y=1, z=f(x, y))
    x+y+z  # whatever

We know from the first part of this book that calling print on a function reveals its source code. But there is more. We can fetch its formal parameters in the form of a named list[6]:

formals(test)
## $x
##
##
## $y
## [1] 1
##
## $z
## f(x, y)

Expressions corresponding to the default arguments are stored as ordinary list elements (for more details, see Section 17.2).

Furthermore, we can access the function’s body:

body(test)
## x + y + z

It is an object of the now well-known class call. Thus, we can customise it as we please:

body(test)[[1]] <- as.name("*")  # change `+` to `*`
body(test) <- as.call(list(
    as.name("{"), quote(cat("spam\n")), body(test)
))
print(test)
## function (x, y = 1, z = f(x, y))
## {
##     cat("spam\n")
##     (x + y) * z
## }

15.4.2. Getting the expression passed as an actual argument¶

A call to substitute reveals the expression passed as a function’s argument.

test <- function(x) substitute(x)

Some examples:

test(1)
## [1] 1
test(2+spam)
## 2 + spam
test(test(test(!!7)))
## test(test(!!7))
test()  # it is not an error

Chapter 17 notes that arguments are evaluated only on demand (lazily): substitute triggers no computations. This opens the possibility to author functions that interpret their input whichever way they like; see Section 9.4.7, Section 12.3.9, and Section 17.5 for examples.

Example 15.3

library (see Section 7.3.1) specifies the name of the package to be loaded both in the form of a character string and a name:

library("gsl")  # preferred
library(gsl)  # discouraged; via as.character(substitute(package))

A user saves two keystrokes at the cost of not being able to prepare the package name programmatically before the call:

which_package <- "gsl"
library(which_package)  # library("which_package")
## Error in library(which_package): there is no package called
##     'which_package'

In order to make the above possible, we need to alter the function’s character.only argument (which defaults to FALSE):

library(which_package, character.only=TRUE)  # OK

Exercise 15.4

In many functions, we can see a call like deparse(substitute(arg)) or as.character(substitute(arg)). Study the source code of plot.default, hist.default, prop.test, wilcox.test.default and the aforementioned library. Explain why they do that. Propose a solution to achieve the same functionality without using reflection techniques.

15.4.3. Checking if an argument is missing¶

missing checks whether an argument was provided:

test <- function(x) missing(x)

test(1)
## [1] FALSE
test()
## [1] TRUE

Exercise 15.5

Study the source code of sample, seq.default, plot.default, matplot, and t.test.default. Determine the role of a call to missing. Would introducing a default argument NULL and testing its value with is.null constitute a reasonable alternative?

15.4.4. Determining how a function was called¶

Even though this somewhat touches on the topics discussed in the two coming chapters, it is worth knowing that sys.call can look at the call stack and determine how the current function was invoked. Moreover, match.call takes us a step further: it returns a call with argument names matched to a function’s list of formal parameters. For instance:

test <- function(x, y, ..., a="yes", b="no")
{
    print(sys.call())  # sys.call(0)
    print(match.call())
}

x <- "maybe"
test("spam", "bacon", "eggs", u=("ham"<"jam"), b=x)
## test("spam", "bacon", "eggs", u = ("ham" < "jam"), b = x)
## test(x = "spam", y = "bacon", "eggs", u = ("ham" < "jam"), b = x)

In both cases, the results are objects of the type call. We know how to manipulate them already.

Another example where we see that we can dig into the call stack much more deeply:

f <- function(x)
{
    g <- function(y)
    {
        cat("g:\n")
        print(sys.call(0))
        print(sys.call(-1))  # go back one frame
        y
    }

    cat("f:\n")
    print(sys.call(0))
    g(x+1)
}

f(1)
## f:
## f(1)
## g:
## g(x+1)
## f(1)
## [1] 2

Note

Matching function parameters to the passed arguments is done in the following order (see Section 4.3 of [70]):

First, keyword arguments with names are matched exactly. Each name is matched at most once.
Then, we take the remaining keyword arguments, but with the partial matching of names listed before the ellipsis, `...`. Each match must be unambiguous.
Third, we apply the positional matching to the remaining parameters.
Last, the ellipsis (if present) consumes all the remaining arguments (named or not).

For instance:

test <- function(spam, jasmine, jam, ..., option=NULL)
    print(match.call())

Example calls:

test(1, 2, 3, 4, option="yes")
## test(spam = 1, jasmine = 2, jam = 3, 4, option = "yes")
test(1, 2, jasmine="no", sp=4, ham=7)
## Warning in test(1, 2, jasmine = "no", sp = 4, ham = 7): partial argument
##     match of 'sp' to 'spam'
## Warning in match.call(definition, call, expand.dots, envir): partial
##     argument match of 'sp' to 'spam'
## test(spam = 4, jasmine = "no", jam = 1, 2, ham = 7)
test(1, 2, ja=7)  # ambiguous match
## Warning in test(1, 2, ja = 7): partial argument match of 'ja' to 'jasmine'
## Error in test(1, 2, ja = 7): argument 3 matches multiple formal arguments
test(o=7)  # partial matching of `option` failed - `option` is after `...`
## test(o = 7)

Note again that our environment uses options(warnPartialMatchArgs=TRUE).

Exercise 15.6

A function can[7] see how it was defined by its maker. Call sys.function inside its body to reveal that.

Exercise 15.7

Execute match.call(sys.function(-1), sys.call(-1)) in the above g function.

15.5. Exercises¶

Exercise 15.8

Answer the following questions.

What is a simple expression? What is a compound expression? Give a few examples.
What is the difference between an object of the type call and that of the type expression?
What do formals and body return when called on a function object?
How to test if an argument to a function was given? Provide a use case for such a verification step.
Give a few ways to create an unevaluated call.
What is the purpose of deparse(substitute(...))? Give a few examples of functions that use this technique.
What is the difference between sys.call and match.call?
Why cannot we rely on partial matching in the call boxplot(x, horiz=TRUE) and have to write the full argument name like boxplot(x, horizontal=TRUE) instead?

Exercise 15.9

Write a function that takes the dot-dot-dot argument. Using match.call (amongst others), determine the list of all the expressions passed via `...`. Allow some of them to be named (just like in one of the preceding examples). The solution will be given in Section 17.3.

Exercise 15.10

Write a function check_if_calls(f, fun_list) that takes another function f as input. Then, it verifies if f calls any of the functions (referred to by their names) from a character vector fun_list.