15. 🚧 Unevaluated Expressions (**)

The open-access textbook Deep R Programming by Marek Gagolewski is, and will remain, freely available for everyone’s enjoyment (also in PDF). It is a non-profit project. This book is still a work in progress. Beta versions of Chapters 1–12 are already complete, but there will be more. In the meantime, any bug/typos reports/fixes are appreciated. Although available online, it is a whole course; it should be read from the beginning to the end. Refer to the Preface for general introductory remarks. Also, check out my other book, Minimalist Data Wrangling with Python [20].

In this and the next chapter, we will learn some hocus-pocus that should only be of interest to advanced-to-be[1] and open-minded R programmers who would really like to understand what is going on under our language’s hood. In particular, we will inspect the mechanisms behind why certain functions do something very different from what we would expect them to do if a standard evaluation scheme were followed (compare subset and transform mentioned in Section 12.3.9).

Namely, in normal programming languages, when we write something like:

plot(x, exp(x))

the expression exp(x) is evaluated first, and its value[2] (in this case: a numeric vector) is only then passed to the plot function as the actual parameter. Thus, if x was set to be seq(0, 10, length.out=1001), the above never means anything other than:

plot(c(0.00, 0.01, 0.02, 0.03, ...), c(1.0000, 1.0101, 1.0202, 1.0305, ...))

But R was heavily inspired by a Lisp language dialect called Scheme[3], from which it has inherited a quite disturbing ability to apply a set of techniques referred to as metaprogramming (computing on the language). Namely, we may define functions that can peek outside their small world and clearly see the code used to generate the arguments passed thereto. Having access to such unevaluated expressions, we can do to them whatever we please: print, modify, subset, re-interpret, evaluate on different data, or ignore altogether.

In theory, this enables implementing many potentially helpful beginner-friendly features, which allow us to express certain requests more concisely. For instance, the y-axis labels in Figure 2.2 could be generated automatically precisely because plot was able to see not only a vector like c(1.0000, 1.0101, 1.0202, 1.0305, ...), but also the expression that generated it, exp(x).

It also can, and did, cause chaos, confusion, and division.

In the current author’s opinion, R (as a whole, in the sense of R as a Language and an Environment) would be better off if metaprogramming techniques were not exposed to an ordinary programmer[4]. A healthy R user (also yours truly 99% of the time) can do perfectly well without them and thus refrain from using them. The fact that we call them “advanced” will not make us “cool” if we start horsing around with nonstandard evaluation. “Perverse” is perhaps a better label.

Cursed be us, as we are about to start eating from the tree of the knowledge of good and evil. But remember: with great power comes great responsibility.

15.1. Expressions at a Glance

At the most general level, expressions in a language like R can be classified into two groups:

  • simple expressions:

    • constants (e.g., 3.14, 2i, NA_real_, TRUE, "character string"),

    • names (symbols, identifiers),

  • compound expressions: combinations of \(n+1\) expressions (simple or compound) of the form \((f, e_1, e_2, \dots, e_n)\).

As we will soon see, compound expressions are used to represent a call to \(f\) (an operator) on a sequence of arguments \(e_1,e_2,\dots,e_n\) (operands). This is why we will also be denoting them simply with \(f(e_1,e_2,\dots,e_n)\).

On the other hand, names such as x, iris, sum, and spam have no meaning without an explicitly provided context, a topic that we explore in sec:to-do. Prior to that, we treat them as meaningless.

Hence, for the time being, we are only interested in the syntax or grammar of our language, not the semantics. We are abstract in the sense that in the expression “mean(rates)+2”[5], neither `mean`, `rates`, nor even `+` has the “usual” sense. We should therefore treat the whole expression as equivalent to, say, f(g(x), 2) or spam(bacon(spanish_inquisition), 2).
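As a small, hedged illustration of this purely syntactic view, the sketch below uses quote (introduced in the next section) to capture the expression without evaluating it. Nothing named `rates` needs to exist, as no name is ever looked up.

```r
# We only inspect the syntax here; nothing is evaluated.
e1 <- quote(mean(rates) + 2)
e2 <- quote(`+`(mean(rates), 2))  # the same expression in the f(e_1, e_2) form

identical(e1, e2)  # both spellings yield the same compound expression
## [1] TRUE
as.character(e1[[1]])  # the operator, f
## [1] "+"
length(e1)  # f plus its two operands
## [1] 3
```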

15.2. Language Objects

There are three types of language objects in R:

  • name (symbol) – stores object names in the sense of “simple expressions: names” in Section 15.1;

  • call – represents unevaluated function calls in the sense of “compound expressions” above;

  • expression – quite confusingly, represents a sequence of simple or compound expressions (constants, names, or calls).

One way to create a simple or compound expression is by quoting, where we ask R to refrain from evaluating a given command:

quote(spam)  # name (symbol)
## spam
quote(f(x))  # call
## f(x)
quote(1+2+3*pi)  # another call
## 1 + 2 + 3 * pi

Note that none of the above was executed.

Single strings can be converted to names by calling:

as.name("spam")
## spam

And calls can be built programmatically by invoking:

call("sin", pi/2)
## sin(1.5707963267949)

Sometimes we might rather wish to quote the arguments passed:

call("sin", quote(pi/2))
## sin(pi/2)
call("c", 1, exp(1), quote(exp(1)), pi, quote(pi))
## c(1, 2.71828182845905, exp(1), 3.14159265358979, pi)
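To see that calls built this way are ordinary objects that can be executed later, consider the following minimal sketch. It relies on eval, which this chapter only alludes to (“evaluate on different data”); its use here is an assumption made purely for illustration.

```r
cl1 <- call("max", 1, 5, 3)  # the function name is given as a string
cl2 <- as.call(list(as.name("max"), 1, 5, 3))  # the same call built from a list

cl1  # still unevaluated
## max(1, 5, 3)
identical(cl1, cl2)
## [1] TRUE
eval(cl1)  # only now is max actually invoked
## [1] 5
```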

Objects of type expression can be thought of as lists of simple or compound expressions.

(exprs <- expression(1, spam, mean(x)+2))
## expression(1, spam, mean(x) + 2)

Note that all the arguments were quoted.

We can access the individual components using the index or extraction operators:

exprs[-1]
## expression(spam, mean(x) + 2)
exprs[[3]]
## mean(x) + 2

Exercise 15.1

Check the type of the object returned by a call to c(1, "two", sd, list(3, 4:5), expression(3+3)).

Note

Calling class on the aforementioned language objects yields name, call, and expression, whereas typeof returns symbol, language, and expression, respectively.
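The statement above can be verified with a short sketch like the one below. It also shows a caveat that is easy to overlook: calls to some language constructs report a more specific class, even though their type is still language.

```r
objs <- list(quote(spam), quote(f(x)), expression(spam, f(x)))
sapply(objs, class)
## [1] "name"       "call"       "expression"
sapply(objs, typeof)
## [1] "symbol"     "language"   "expression"

# Some calls have a class named after the construct they represent:
class(quote(x <- 1))
## [1] "<-"
class(quote(if (a) b))
## [1] "if"
typeof(quote(if (a) b))
## [1] "language"
```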

There is also an option to parse a given text fragment or a whole R script:

parse(text="mean(x)+2")
## expression(mean(x) + 2)
parse(text="  # two code lines (a comment to be ignored by the parser)
    x <- runif(5, -1, 1)
    print(mean(x)+2)
")
## expression(x <- runif(5, -1, 1), print(mean(x) + 2))
parse(text="2+")  # syntax error - unfinished business
## Error in parse(text = "2+"): <text>:2:0: unexpected end of input
## 1: 2+
##    ^
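As a brief sketch of how the result of parse can be used, the snippet below splits a one-line script into its top-level expressions. It also shows that a syntax error is an ordinary R error, so it can be intercepted with tryCatch; the variable `ok` is merely an illustrative name.

```r
exprs <- parse(text="x <- 1; y <- x + 1")
length(exprs)  # two top-level expressions
## [1] 2
exprs[[2]]
## y <- x + 1

# The parser signals a regular error that we can handle ourselves:
ok <- tryCatch({ parse(text="2+"); TRUE }, error=function(e) FALSE)
ok
## [1] FALSE
```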

Important

deparse can be used to convert language objects to character vectors. For instance:

deparse(quote(mean(x+2)))
## [1] "mean(x + 2)"

This function has the nice side effect of tidying up the code formatting:

exprs <- parse(text=
    "`+`(x, 2)->y; if(y>0) print(y**10|>log()) else { y<--y; print(y)}")
for (e in exprs)
    writeLines(deparse(e))  # cat has no `end` argument; writeLines ends each line
## y <- x + 2
## if (y > 0) print(log(y^10)) else {
##     y <- -y
##     print(y)
## }
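Two properties of deparse are worth checking in practice. The sketch below suggests that, for simple calls, parsing the deparsed text gives back the original object, and that a long call may be split into several strings (see the width.cutoff argument). The names `long` and `one_string` are made up for this example.

```r
e <- quote(mean(x+2))
identical(parse(text=deparse(e))[[1]], e)  # a round trip (in this simple case)
## [1] TRUE

# A call with 30 arguments does not fit into a single line of output:
args <- lapply(paste0("argument_", 1:30), as.name)
long <- as.call(c(list(as.name("f")), args))
length(deparse(long)) > 1
## [1] TRUE
one_string <- paste(deparse(long), collapse="\n")  # glue the pieces together
length(one_string)
## [1] 1
```

Hence, whenever a single string is needed (e.g., for a plot label), it is safer not to assume that deparse returns a vector of length one.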

15.3. Calls as Combinations of Expressions

We have mentioned that calls (compound expressions) are combinations of simple or compound expressions of the form \((f, e_1, \dots, e_n)\).

That the first expression on the list, denoted above with \(f\), plays a special role is clearly seen in the following examples:

as.call(expression(f, x))
## f(x)
as.call(expression(`+`, 1, x))
## 1 + x
as.call(expression(`while`, i < 10, i <- i + 1))
## while (i < 10) i <- i + 1
as.call(expression(function(x) x**2, log(exp(1))))
## (function(x) x^2)(log(exp(1)))
as.call(expression(1, x, y, z))  # utter nonsense, but syntactically valid
## 1(x, y, z)

Recall from Section 9.4 that operators and language constructs such as if and while are ordinary functions.

Furthermore:

expr <- quote(f(1+2, a=1, b=2))
length(expr)
## [1] 4
names(expr)  # NULL if no arguments are named
## [1] ""  ""  "a" "b"
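The list-like nature of calls can be probed further. In the following sketch, the call from the above example is converted to a list and then rebuilt from scratch; `parts` and `rebuilt` are illustrative names only.

```r
expr <- quote(f(1+2, a=1, b=2))
parts <- as.list(expr)  # f, 1+2, and the two named arguments
names(parts)
## [1] ""  ""  "a" "b"

# The same call assembled element by element:
rebuilt <- as.call(list(as.name("f"), quote(1+2), a=1, b=2))
identical(rebuilt, expr)
## [1] TRUE
expr$a  # named arguments can be extracted like list elements
## [1] 1
```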

15.3.1. Browsing Parse Trees

We can access the individual expressions constituting an object of type call using square brackets. For example,

expr <- quote(1+x)
expr[[1]]
## `+`
expr[2:3]
## 1(x)

A compound expression was defined recursively: it can consist of other compound expressions.

For instance, the following expression:

expr <- quote(
    while (i < 10) {
        cat("i =", i, "\n")
        i <- i+1
    }
)

can be rewritten using the \(f(...)\) notation like:

`while`(`<`(i, 10), `{`( cat("i =", i, "\n"), `<-`(i, `+`(i, 1))))

Equivalently, in the Polish (prefix; \((f, ...)\); traditionally used in Lisp) notation it will look like:

(
    `while`,
    (`<`, i, 10),
    (
        `{`,
        (cat, "i =", i, "\n"),
        (
            `<-`,
            i,
            (`+`, i, 1)
        )
    )
)
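The prefix notation above can be generated automatically. Below is a minimal sketch of a hypothetical helper, to_prefix (not part of base R), which drops argument names and does not handle calls with empty arguments such as x[1, ].

```r
to_prefix <- function(e)
{
    if (is.call(e))  # compound expression: recurse into (f, e_1, ..., e_n)
        paste0("(", paste(vapply(as.list(e), to_prefix, ""), collapse=" "), ")")
    else if (is.name(e))
        as.character(e)  # a name
    else
        deparse(e)  # a constant
}

to_prefix(quote(f(g(x), 2)))
## [1] "(f (g x) 2)"
to_prefix(quote(while (i < 10) i <- i + 1))
## [1] "(while (< i 10) (<- i (+ i 1)))"
```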

Thus, for example, we can dig into the sub-expressions using a series of extractions:

expr[[2]][[1]]  # or expr[[c(2, 1)]]
## `<`
expr[[3]][[2]][[4]]  # or expr[[c(3, 2, 4)]]
## [1] "\n"

Example 15.2

We can even write a recursive function to traverse the whole parse tree:

recapply <- function(expr)
{
    if (is.call(expr)) lapply(expr, recapply)
    else expr
}

str(recapply(expr))
## List of 3
##  $ : symbol while
##  $ :List of 3
##   ..$ : symbol <
##   ..$ : symbol i
##   ..$ : num 10
##  $ :List of 3
##   ..$ : symbol {
##   ..$ :List of 4
##   .. ..$ : symbol cat
##   .. ..$ : chr "i ="
##   .. ..$ : symbol i
##   .. ..$ : chr "\n"
##   ..$ :List of 3
##   .. ..$ : symbol <-
##   .. ..$ : symbol i
##   .. ..$ :List of 3
##   .. .. ..$ : symbol +
##   .. .. ..$ : symbol i
##   .. .. ..$ : num 1

15.3.2. Manipulating Calls

The R language is homoiconic: it can treat code as data. This includes the ability to arbitrarily manipulate it on the fly.

Just like with lists, we can freely use the replacement versions of `[` and `[[`.

expr[[2]][[1]] <- as.name("<=")
expr[[3]] <- quote(i <- i + 2)
print(expr)
## while (i <= 10) i <- i + 2
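Such modifications can also be applied recursively. As a hedged sketch, the hypothetical function rename_symbol below replaces every occurrence of one name with another, including in the function position. It assumes that the expression contains no empty arguments (as in x[1, ]), which would require special treatment.

```r
rename_symbol <- function(expr, from, to)
{
    if (is.name(expr)) {
        # a name: replace it if it matches
        if (identical(expr, as.name(from))) as.name(to) else expr
    }
    else if (is.call(expr)) {
        # a call: process all its components and reassemble
        as.call(lapply(expr, rename_symbol, from=from, to=to))
    }
    else expr  # a constant: leave as-is
}

rename_symbol(quote(i <- i + 1), "i", "k")
## k <- k + 1
rename_symbol(quote(while (i < 10) i <- i + 2), "i", "counter")
## while (counter < 10) counter <- counter + 2
```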

We are only limited by our imagination.

15.4. Inspecting Function Definitions and Arguments Thereto

15.4.1. Getting Formal Arguments and Body

Consider the following function:

test <- function(x, y=1)
    x+y  # whatever

We know from the first part of this book that calling print on the above will reveal its source code.

It turns out that we can easily get access to the list of parameters in the form of a named list[6]:

formals(test)
## $x
## 
## 
## $y
## [1] 1

Note that the expressions generating the values of the default arguments (compare Section 16.4.1) are stored as ordinary list elements.

Furthermore, we can get access to its body:

body(test)
## x + y

It is an object of the now-well-known class call.

Thus, we can manipulate it arbitrarily:

body(test)[[1]] <- as.name("*")  # change from `+` to `*`
body(test) <- as.call(list(as.name("{"), quote(cat("spam")), body(test)))
test
## function (x, y = 1) 
## {
##     cat("spam")
##     x * y
## }
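The list of formal parameters can be replaced as well. The sketch below uses a hypothetical function add (so as not to overwrite test) and the replacement version of formals together with alist, which leaves its arguments unevaluated.

```r
add <- function(x, y=1) x+y  # a hypothetical function

names(formals(add))
## [1] "x" "y"
formals(add) <- alist(x=, y=2*x)  # the default for y is now an unevaluated call
class(formals(add)$y)
## [1] "call"
add(3)  # y defaults to 2*x, i.e., 6
## [1] 9
add(3, 1)
## [1] 4
```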

15.4.2. Getting the Expression Passed as an Argument

A call to substitute allows us to reveal the expression used to generate a function’s argument:

test <- function(x) substitute(x)

test(1)
## [1] 1
test(2+spam)
## 2 + spam
test(test(test(!!7)))
## test(test(!!7))
test()  # it is not an error

In Section 16.4.2 we note that arguments are evaluated only on demand – substitute does not trigger that. Therefore, we are able to write functions that accept gobbledegook (as long as it is syntactically correct) and programmatically reinterpret it in whichever way we like. Please, do not do that. Have mercy on other R users.
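For illustration only, here is a sketch of the most common (and relatively benign) use of substitute: generating a label from the expression that the caller typed. The functions describe and wrapper are hypothetical. Note the pitfall: when an argument is forwarded through another function, we only see the expression used in the innermost call.

```r
describe <- function(x)
    paste0("mean of ", deparse(substitute(x)), " is ", mean(x))

v <- c(1, 2, 3)
describe(v)
## [1] "mean of v is 2"
describe(v*2)
## [1] "mean of v * 2 is 4"

wrapper <- function(y) describe(y)  # forwards its argument
wrapper(v)  # the label is now `y`, not `v`
## [1] "mean of y is 2"
```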

Exercise 15.3

It is quite common to see a call like deparse(substitute(arg)) in many R functions. Study the source code of plot.default, hist.default, prop.test, and wilcox.test.default. Explain why they do that. Propose a solution to achieve the same functionality without the use of reflection techniques.

15.4.3. Checking if an Argument is Missing

There is an easy way to check whether an argument was provided at all:

test <- function(x) missing(x)

test(1)
## [1] FALSE
test()
## [1] TRUE

Exercise 15.4

Study the source code of sample, seq.default, plot.default, matplot, and t.test.default. Determine the role of a call to missing. Would introducing a default argument NULL and testing its value with is.null be a good alternative?
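One detail about missing that may be helpful here: it reports whether the caller supplied an argument, regardless of whether a default value exists. A minimal sketch with a hypothetical function f:

```r
f <- function(x, n=length(x))  # n has a default value
{
    if (missing(n)) "n not given (default used)"
    else "n given explicitly"
}

f(1:3)
## [1] "n not given (default used)"
f(1:3, 3)  # the same value as the default, but passed explicitly
## [1] "n given explicitly"
```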

15.4.4. Determining How a Function was Called

Even though this somewhat already touches the topic of the environment model of evaluation that we discuss in the next chapter, it is worth knowing that sys.call can take a look at the call stack and determine how the current function was invoked.

Moreover, match.call goes a step further: it returns a call with argument names matched to the list of a function’s formal parameters.

For instance:

test <- function(x, y, ..., a="yes", b="no")
{
    print(sys.call())  # sys.call(0)
    print(match.call())
}

x <- "maybe"
test("spam", "bacon", "eggs", u = "ham"<"jam", b=x)
## test("spam", "bacon", "eggs", u = "ham" < "jam", b = x)
## test(x = "spam", y = "bacon", "eggs", u = "ham" < "jam", b = x)
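The difference between the two becomes more pronounced when the caller relies on partial name matching or passes arguments out of order. In the sketch below (f, alpha, and beta are made-up names), sys.call reports the call as typed, whereas match.call gives its normalised version.

```r
f <- function(alpha, beta)
{
    s <- sys.call()    # the call exactly as written by the caller
    m <- match.call()  # full argument names, in the order of the formals
    list(sys=s, matched=m)
}

res <- f(be=2, 1)  # a partially matched name and a positional argument
res$sys
## f(be = 2, 1)
res$matched
## f(alpha = 1, beta = 2)
```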

Another example, where we see that we can access the call stack much more deeply:

f <- function(x)
{
    g <- function(y)
    {
        cat("g:\n")
        print(sys.call(0))
        print(sys.call(-1))  # go back one frame
        y
    }

    cat("f:\n")
    print(sys.call(0))
    g(x+1)
}

f(1)
## f:
## f(1)
## g:
## g(x + 1)
## f(1)
## [1] 2

Exercise 15.5

A function can[7] see how it has been defined by its maker. Call sys.function inside its body to reveal that.

Exercise 15.6

Call match.call(sys.function(-1), sys.call(-1)) in the g function above.

15.5. Exercises

Exercise 15.7

Answer the following questions:

  • What is a simple expression? What is a compound expression? Give a few examples.

  • What is the difference between a call and an expression object?

  • What do formals and body return when fed with a function object?

  • How to test if an argument to a function was given at all? Provide a use case for such a verification.

  • Give two ways to create an unevaluated expression by quoting.

  • What is the purpose of deparse(substitute(...))? Give a few examples of functions that use this technique.

  • What is the difference between sys.call and match.call?

Exercise 15.8

Write a function that takes a dot-dot-dot argument (Section 9.5.6). Using match.call (amongst others), determine a list of all the expressions passed via `...`. Note that some of them might be named (just like in one of the above examples).

Exercise 15.9

Write a function check_if_calls(f, fun_list) that takes another function f on input and checks whether it calls any of the functions (by name) from a character vector fun_list.