16. Environments and evaluation (*)
The open-access textbook Deep R Programming by Marek Gagolewski is, and will remain, freely available for everyone’s enjoyment (also in PDF). It is a non-profit project. This book is still a work in progress. Beta versions of all chapters are already available (proofreading and copyediting pending). In the meantime, any bug/typos reports/fixes are appreciated. Although available online, this is a whole course. It should be read from the beginning to the end. Refer to the Preface for general introductory remarks. Also, check out my other book, Minimalist Data Wrangling with Python [26].
In the first part of our book, we discussed the most crucial basic object types: numeric, logical, and character vectors, lists (generic vectors), and functions.
In this chapter, we introduce another basic type: environments. Like lists, they can be classified as recursive data structures; compare the diagram in Figure 17.2.
Important
Each object of the type environment consists of:
a frame[1] (Section 16.1), which stores a set of bindings that associate variable names with their corresponding values; it can be thought of as a container of named R objects of any type;
a reference to an enclosing environment[2] (Section 16.2.2), which might be inspected (recursively!) when a requested named variable is not found in the current frame.
Even though we rarely interact with them directly (unless we need a hash table-like data structure with a quick by-name element lookup), they are crucial for the R interpreter itself. Namely, we shall soon see that they form the basis of the environment model of evaluation, which governs how expressions are computed (Section 16.2).
16.1. Frames: Environments as object containers
To create a new, empty environment, we can call the new.env function:
e1 <- new.env()
typeof(e1)
## [1] "environment"
In this section, we treat environments merely as containers for named objects of any kind, i.e., we deal with the frame part thereof.
Let us insert some elements into e1:
e1[["x"]] <- "x in e1"
e1[["y"]] <- 1:3
e1[["z"]] <- NULL # unlike in the case of lists, creates a new element
The `[[` operator provides us with a named list-like look-and-feel also in the case of element extraction:
e1[["x"]]
## [1] "x in e1"
e1[["spam"]] # does not exist
## NULL
(e1[["y"]] <- e1[["y"]]*10) # replace with new content
## [1] 10 20 30
16.1.1. Printing
Printing an environment leads to a quite awkward result:
print(e1) # same with str(e1)
## <environment: 0x55888446baa8>
It is the address where e1 is stored in the computer’s memory. It can serve as the environment’s unique identifier.
As we have said, these objects are of rather internal interest. Thus, such an esoteric message was perhaps a good design choice to ward off novices. However, we can easily get the list of objects stored inside the container by calling names:
names(e1) # but attr(e1, "names") is not set
## [1] "x" "y" "z"
Moreover, length gives the number of bindings in the frame:
length(e1)
## [1] 3
16.1.2. Environments vs named lists
Environment frames, in some sense, can be thought of as named lists, but the set of admissible operations is severely restricted. In particular, we cannot extract more than one element at the same time using the index operator:
e1[c("x", "y")] # but see the `mget` function
## Error in e1[c("x", "y")]: object of type 'environment' is not subsettable
nor can we refer to the elements by position:
e1[[1]] <- "bad key"
## Error in e1[[1]] <- "bad key": wrong args for environment subassignment
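The mget function mentioned in the comment above fetches several bindings at once and returns them as a named list. Here is a minimal sketch; the environment `e` and its contents are made up for illustration:

```r
e <- new.env()                      # a hypothetical environment
e[["x"]] <- "x in e"
e[["y"]] <- 1:3

vals <- mget(c("x", "y"), envir=e)  # a named list with elements `x` and `y`
str(vals)
```

The names are returned in the order in which they were requested, not in the order in which the frame happens to store them.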
Check if lapply and Map can be applied directly on environments. Also, can we iterate over their elements using a for loop?
Still, named lists can be converted to environments and vice versa using as.list and as.environment.
as.list(e1)
## $x
## [1] "x in e1"
##
## $y
## [1] 10 20 30
##
## $z
## NULL
as.environment(list(u=42, whatever="it's not going to be printed anyway"))
## <environment: 0x558884023528>
as.list(as.environment(list(x=1, y=2, x=3))) # no duplicates allowed
## $y
## [1] 2
##
## $x
## [1] 3
16.1.3. Hash maps: Fast element lookup by name
Environment frames are internally implemented using hash tables (hash maps; see, e.g., [14, 41]) with character string keys.
Important
A hash table is a data structure that allows for a very quick[3] lookup and insertion of individual elements by name.
This speed comes at a price, including what we have already observed:
the elements are not ordered in any particular way: they cannot be referred to via a numeric index;
all element names must be unique.
Note
A list may be considered a sequence, but an environment frame is only, in fact, a set (a bag) of key-value pairs. In most numerical computing applications, we would rather store, iterate over, and process all the elements in order, hence the greater prevalence of the former. Lists still allow for an element lookup by name, even though this is slightly slower[4]. Overall, they are much more universal.
A natural use case of manually-created environment frames deals with grouping a series of objects identified by character string keys.
Consider a simple pseudocode for counting the number of occurrences of objects in a given container:
for (key in some_container) {
    if (!is.null(counter[[key]]))
        counter[[key]] <- counter[[key]]+1
    else
        counter[[key]] <- 1
}
Let us assume that some_container is large, say, of size \(n\), e.g., it is generated on the fly by reading some data stream. Then, the run-time of the above algorithm will depend on the data structure used. If the counter is a list, then, theoretically, the worst-case performance will be \(O(n^2)\) (if all keys are unique). On the other hand, for environments, it will be faster by one order of magnitude: down to amortised \(O(n)\).
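For illustration, here is one possible runnable rendition of the counting scheme on a small character vector. The function name count_keys and the example data are hypothetical, and the keys are assumed to be nonempty strings:

```r
# counts how many times each key occurs; `counter` can be an environment
# (the default here) or a list, as `[[` returns NULL for missing names in both
count_keys <- function(keys, counter=new.env())
{
    for (key in keys) {
        if (!is.null(counter[[key]]))
            counter[[key]] <- counter[[key]]+1
        else
            counter[[key]] <- 1
    }
    counter
}

keys <- c("spam", "eggs", "spam", "bacon", "spam")  # made-up example data
counter <- count_keys(keys)
unlist(as.list(counter))[c("bacon", "eggs", "spam")]
## bacon  eggs  spam
##     1     1     3
```

Passing `counter=list()` instead runs the same code on a list, which makes the two data structures easy to compare on larger inputs.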
Implement a test function according to the above pseudocode and benchmark the two data structures using proc.time on some example data.
(*) Determine the number of unique text lines in a huge file (assuming that the set of unique text lines fits into memory, but the file itself does not). Also, determine the five most frequently occurring text lines.
16.1.4. Pass by value, copy on demand: Not for environments
Given any object, say, x, when we issue:
y <- x
its copy[5] is made so that y and x are independent. In other words, any change to the state of x (or y) is not reflected in y (or x).
For instance:
x <- list(a=1)
y <- x
y[["a"]] <- y[["a"]]+1
print(y)
## $a
## [1] 2
print(x) # not affected: `x` and `y` are independent
## $a
## [1] 1
The same happens with arguments that we feed to the functions:
mod <- function(y, key)  # it is like: local_y <- passed_argument
{
    y[[key]] <- y[[key]]+1
    y
}
mod(x, "a") # returns a modified copy of `x`
## $a
## [1] 2
print(x) # not affected
## $a
## [1] 1
We can thus say that R applies the pass-by-value strategy here.
Important
Environments are the only[6] objects that follow the assign- and pass-by-reference strategies.
In other words, if we perform:
x <- as.environment(x)
y <- x
then the names x and y are bound with the same object in the computer’s memory:
print(x)
## <environment: 0x558883254728>
print(y)
## <environment: 0x558883254728>
Therefore:
y[["a"]] <- y[["a"]]+1
print(y[["a"]])
## [1] 2
print(x[["a"]]) # `x` is `y`, `y` is `x`
## [1] 2
The same happens when we pass an environment to a function:
mod(y, "a") # pass-by-reference (`y` is `x`, remember?)
## <environment: 0x558883254728>
x[["a"]] # `x` has changed
## [1] 3
Thus, any changes we make to an environment passed as an argument to a function will be visible outside the call. This minimises time and memory use in certain situations.
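As a small sketch of how this can be put to use, the following hypothetical function updates a value stored in an environment in place, so there is no need to return and reassign a modified copy. The names increment and state are made up for this example:

```r
increment <- function(env, key)
{
    env[[key]] <- env[[key]]+1  # modifies the very object the caller holds
    invisible(NULL)             # nothing needs to be returned
}

state <- new.env()
state[["hits"]] <- 0
for (i in 1:3) increment(state, "hits")
state[["hits"]]
## [1] 3
```

Had state been a list, each call would have modified a local copy only, and the value seen by the caller would still be 0.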
Note
(*) For efficiency reasons, when we write “y <- x”, a copy of `x` (unless it is an environment) is created only if it is absolutely necessary.
Here is some benchmarking of the copy-on-demand mechanism.
n <- 1000000000  # like, a lot
Creation of a new large numeric vector:
t0 <- proc.time(); x <- numeric(n); proc.time() - t0
## user system elapsed
## 0.853 1.993 2.852
Creation of a (delayed) copy:
t0 <- proc.time(); y <- x; proc.time() - t0
## user system elapsed
## 0 0 0
It was instant. Thus, we definitely did not duplicate the n data cells.
Copy-on-demand is implemented using some simple reference counting; compare Section 14.2.4. We can inspect that x and y point to the same address in memory by calling:
.Internal(inspect(x)) # internal function - do not use it
## @7efba1134010 14 REALSXP g0c7 [REF(2)] (len=1000000000, tl=0) 0,0,0,0,...
.Internal(inspect(y))
## @7efba1134010 14 REALSXP g0c7 [REF(2)] (len=1000000000, tl=0) 0,0,0,0,...
The actual copying is only triggered when we try to modify x or y. This is when the objects need to be separated.
t0 <- proc.time(); y[1] <- 1; proc.time() - t0
## user system elapsed
## 1.227 1.910 3.142
Now x and y are different objects.
.Internal(inspect(x))
## @7efba1134010 14 REALSXP g0c7 [MARK,REF(1)] (len=1000000000, tl=0) 0,0,...
.Internal(inspect(y))
## @7ef9c43ce010 14 REALSXP g0c7 [MARK,REF(1)] (len=1000000000, tl=0) 1,0,...
The elapsed time is similar to that needed to create x from scratch.
Further modifications will already be quick:
t0 <- proc.time(); y[2] <- 2; proc.time() - t0
## user system elapsed
## 0.000 0.001 0.000
16.1.5. A note on reference classes (**)
In Section 11.5, we briefly mentioned the S4 system for object-oriented programming.
It turns out that we also have access to its variant, called reference classes[7]. It was first introduced in R version 2.12.0.
Reference classes are implemented using S4 classes, with the data part being of the type environment. This gives a more typical object-oriented experience, where methods can modify the data they act on in place.
They are theoretically interesting concepts on their own and quite appealing to package developers with C++ or Java background. Nevertheless, in the current author’s opinion, such classes are alien citizens of our environment, violating its functional nature. Therefore, we will not be discussing them here.
A curious reader is referred to help("ReferenceClasses") and Chapters 9 and 11 of [11] for more details.
16.2. The environment model of evaluation
In Chapter 15, we said that there are three types of expressions: constants (e.g., 1 and "spam"), names (e.g., `x` and `spam`), and calls (like f(x, 1)).
Important
Names (symbols) have no meaning by themselves. The meaning of a name always depends on the context, which is specified by some environment.
Consider a simple expression merely consisting of a name, `x`:
expr_x <- quote(x)
Let us define two environments that bind the name `x` with two different constants.
e1 <- as.environment(list(x=1))
e2 <- as.environment(list(x="spam"))
An expression is evaluated within a specific environment. We can do that by calling eval:
eval(expr_x, envir=e1) # evaluate `x` within environment e1
## [1] 1
eval(expr_x, envir=e2) # evaluate the same `x` within environment e2
## [1] "spam"
The very same expression has two different meanings, depending on the context. This is quite like in the so-called real life: “I’m good” can mean “I don’t need anything” but also “My virtues are plentiful”. It all depends on who and when is asking, i.e., in which environment we evaluate the said sentence.
We call this the environment model of evaluation, a notion that R authors have borrowed from a Lisp-like language called Scheme[8] (see Section 3.2 of [1] and Section 6 of [66]).
16.2.1. Getting the current environment (here: the global one)
By default, expressions are evaluated in the current environment. We can fetch it by calling:
sys.frame(sys.nframe()) # get the current environment
## <environment: R_GlobalEnv>
We are working on the R console. Hence, the current one is the global environment (user workspace). We can access it from anywhere by calling globalenv or referring to the `.GlobalEnv` object.
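As a side remark, base R also lets us fetch the current environment by calling environment with no arguments. The sketch below (the function name where_am_i is made up) checks, from within a function call, that both approaches give the same object:

```r
where_am_i <- function()
{
    e_a <- environment()            # the current (here: local) environment
    e_b <- sys.frame(sys.nframe())  # the same, fetched as in the text
    identical(e_a, e_b)
}

where_am_i()
## [1] TRUE
```

We keep using sys.frame(sys.nframe()) in the examples; the shorter form is given here only for comparison.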
Calling any operation, for instance[9]:
x <- "spammity spam"
means evaluating it within the current environment:
eval(quote(x <- "spammity spam"), envir=sys.frame(sys.nframe()))
Here, we bound the string "spammity spam" with the name `x` in the current environment’s frame:
sys.frame(sys.nframe())[["x"]] # yes, `x` is in the current environment now
## [1] "spammity spam"
globalenv()[["x"]] # because the global environment is the current one here
## [1] "spammity spam"
Therefore, when we now refer to `x` (from within the current environment):
x # eval(quote(x), envir=sys.frame(sys.nframe()))
## [1] "spammity spam"
precisely the above named object is fetched.
save.image can be used to save the current workspace, i.e., the global environment, by default, to the file named .RData in the current working directory. Test this function in combination with load.
Note
Names starting with a dot are hidden. ls, a function to fetch all names registered within a given environment, does not list them by default.
.test <- "spam"
ls() # list all names in the current environment, i.e., the global one
## [1] "e1" "e2" "expr_x" "mod" "x" "y"
Compare the above with:
ls(all.names=TRUE)
## [1] ".Random.seed" ".test" "e1" "e2"
## [5] "expr_x" "mod" "x" "y"
On a side note, `.Random.seed` stores the current pseudorandom number generator’s seed; compare Section 2.1.5.
16.2.2. Enclosures, enclosures thereof, etc.
To show that there is much more to the environment model of evaluation than what we mentioned above, let us try to evaluate an expression featuring two names:
e2 <- as.environment(list(x="spam")) # once again (a reminder)
expr_comp <- quote(x < "eggs")
eval(expr_comp, envir=e2)
## Error in x < "eggs": could not find function "<"
The meaning of any constant (here, "eggs") is context-independent, `x` is specified by the environment provided, but the name `<` is not mentioned therein. Hence the error.
Nonetheless, we feel that we know the meaning of `<`. It is a relational operator, obviously, isn’t it? To increase the confusion, let us highlight that our experience-grounded intuition is true in the following context:
e3 <- new.env()
e3[["x"]] <- "bacon"
eval(expr_comp, envir=e3)
## [1] TRUE
So where does the name `<` come from? It is neither included in e2 nor e3:
e2[["<"]]
## NULL
e3[["<"]]
## NULL
Is `<` hardcoded somewhere? Or is it also dependent on the context? Why is it visible when evaluating an expression within e3 but not in e2?
Studying[10] help("[[") (see the Environments section therein), we discover that e3[["<"]] is equivalent to a call to get("<", envir=e3, inherits=FALSE).
In help("get"), we read that if the inherits argument is set to TRUE (which is the default in get), then the enclosing frames of the given environment are searched as well.
Continuing the example from the previous subsection:
get("<", envir=e2) # inherits=TRUE
## Error in get("<", envir = e2): object '<' not found
get("<", envir=e3) # inherits=TRUE
## function (e1, e2) .Primitive("<")
And indeed, we see that `<` is reachable from within e3 but not e2. It means that e3 points to another environment where further information should be sought if the current container leaves us empty-handed.
Important
The reference (pointer) to the enclosing environment is integral to each environment (alongside a frame of objects). It can be fetched and set using the parent.env function.
16.2.3. Missing names are sought in enclosing environments
To understand the idea of enclosing environments better, let us create two new environments whose enclosures are explicitly set as follows:
(e4 <- new.env(parent=e3))
## <environment: 0x558883a96948>
(e5 <- new.env(parent=e4))
## <environment: 0x558883975f78>
To verify that everything is in order, let us inspect the following:
print(e3) # this is the address of e3 by the way
## <environment: 0x558884072f90>
parent.env(e4) # e3 is the enclosing environment of e4
## <environment: 0x558884072f90>
parent.env(e5) # e4 is the enclosing environment of e5
## <environment: 0x558883a96948>
Also, let us bind two different objects with the name `y` in e5 and e3.
e5[["y"]] <- "spam"
e3[["y"]] <- function() "a function `y` in e3"
The current state of matters is depicted in Figure 16.1.

Figure 16.1 Example environments and their enclosures (original setting)
Now, let us consider a simple expression featuring the `y` name only and evaluate it in the above environments:
expr_y <- quote(y)
eval(expr_y, envir=e3)
## function() "a function `y` in e3"
eval(expr_y, envir=e5)
## [1] "spam"
No surprises yet.
However, evaluating it in e4, which does not feature `y`, yields:
eval(expr_y, envir=e4)
## function() "a function `y` in e3"
It returned `y` from e4’s enclosure, e3.
Let us horse around with the enclosures of e5 and e4 so that we obtain the setting depicted in Figure 16.2:
parent.env(e5) <- e3
parent.env(e4) <- e5

Figure 16.2 Example environments and their enclosures (after the change made)
Evaluating `y` again in the same e4 yields a very different result:
eval(expr_y, envir=e4)
## [1] "spam"
Important
If a name is referred to in an expression but is missing in the current environment, it will be sought in its enclosure(s).
Note
There are some functions related to searching within and modifying environments which optionally (see their inherits argument) allow for continuing explorations in the enclosures until successful:
inherits=TRUE by default: exists, get,
inherits=FALSE by default: assign, rm (remove).
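The effect of the inherits argument can be sketched on a pair of freshly created environments. The names e_out and e_in are hypothetical and independent of the environments used elsewhere in this chapter:

```r
e_out <- new.env()
e_in  <- new.env(parent=e_out)            # e_out is the enclosure of e_in
e_out[["v"]] <- "v in e_out"

exists("v", envir=e_in)                   # TRUE: enclosures are searched too
exists("v", envir=e_in, inherits=FALSE)   # FALSE: e_in's own frame only
get("v", envir=e_in)                      # "v in e_out"

assign("v", "v in e_in", envir=e_in)      # inherits=FALSE: new binding in e_in
e_out[["v"]]                              # still "v in e_out"

rm("v", envir=e_in)                       # inherits=FALSE: e_in's `v` only
assign("v", "changed", envir=e_in, inherits=TRUE)  # rebinds `v` where found
e_out[["v"]]                              # "changed"
```

In the last assignment, e_in no longer defines `v`, so the binding found in its enclosure is the one that gets modified.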
16.2.4. Looking for functions
Interestingly, if a name is used instead of a function to be called, the object sought is always[11] of the mode function.
Consider a similar expression to the above, but this time including the name `y` playing a different role:
expr_y2 <- quote(y()) # a call to something named `y`
eval(expr_y2, envir=e4)
## [1] "a function `y` in e3"
In other words, what we used here was not:
get("y", envir=e4)
## [1] "spam"
but:
get("y", envir=e4, mode="function")
## function() "a function `y` in e3"
Note
“name()”, “"name"()”, and “`name`()” are synonymous. However, the first expression is acceptable only if the name is syntactically valid.
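A quick sketch of the three call forms follows. The function named `my fun` is hypothetical; its name contains a space, so it is not syntactically valid and the plain “name()” form is unavailable for it:

```r
sum(1, 2)    # the usual form
"sum"(1, 2)  # the name given as a string
`sum`(1, 2)  # the name given in backticks

`my fun` <- function(x) x+1  # a non-syntactic name
`my fun`(1)
"my fun"(1)
```

All three calls to sum return 3, and both calls to `my fun` return 2.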
16.2.5. Inspecting the search path
Going back to our expression involving a relational operator:
expr_comp
## x < "eggs"
Why does the following work as expected?
eval(expr_comp, envir=e3)
## [1] TRUE
Well, we gathered all the bits to understand it now. Namely, `<` is a function that is looked up in the following way:
get("<", envir=e3, inherits=TRUE, mode="function")
## function (e1, e2) .Primitive("<")
It was reachable from e3, which means that e3 also has an enclosing environment.
parent.env(e3)
## <environment: R_GlobalEnv>
This is our global namespace, which was the current environment when e3 was created. Still, we did not define `<` there. It means that the global environment also has an enclosure.
We can explore the whole search path by starting at the global environment and following the enclosures recursively.
ecur <- globalenv() # starting point
repeat {
    cat(paste0(format(ecur), " (", attr(ecur, "name"), ")"))  # pretty-print
    if (exists("<", envir=ecur, inherits=FALSE))
        cat(strrep(" ", 20), "`<` found here!")
    cat("\n")
    ecur <- parent.env(ecur)  # advance to its enclosure
}
## <environment: R_GlobalEnv> ()
## <environment: 0x5588842ab190> (.marekstuff)
## <environment: package:stats> (package:stats)
## <environment: package:graphics> (package:graphics)
## <environment: package:grDevices> (package:grDevices)
## <environment: package:utils> (package:utils)
## <environment: package:datasets> (package:datasets)
## <environment: package:methods> (package:methods)
## <environment: 0x558882343960> (Autoloads)
## <environment: base> () `<` found here!
## <environment: R_EmptyEnv> ()
## Error in parent.env(ecur): the empty environment has no parent
Underneath the global environment, there is a whole list of attached packages:
packages attached by the user (.marekstuff is used internally in the process of evaluating code in this book),
default packages (Section 7.3.1.1),
(**) Autoloads (for the promises-to-load R packages; compare help("autoload"); it is a technicality we may safely ignore here),
the base package, which we can access directly by calling baseenv; it is where most of the fundamental functions from the previous chapters reside,
the empty environment (emptyenv), which is the only one followed by nothing (the loop would turn out endless otherwise).
It comes as no surprise that the `<` operator has been found in the base package.
Note
On a side note, the reason why this operation failed:
e2 <- as.environment(list(x="spam")) # to recall
eval(expr_comp, envir=e2)
## Error in x < "eggs": could not find function "<"
is because as.environment sets the enclosing environment to:
parent.env(e2)
## <environment: R_EmptyEnv>
See also list2env which gives greater control over this (cf. its parent argument).
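For instance, here is a sketch where the enclosure is explicitly set to the base environment, which makes `<` reachable. The name e6 is hypothetical and chosen not to clash with the environments defined earlier:

```r
e6 <- list2env(list(x="spam"), parent=baseenv())
parent.env(e6)
## <environment: base>
eval(quote(x < "eggs"), envir=e6)  # `<` is now found in the enclosure
## [1] FALSE
```

Other choices of parent are possible too, e.g., globalenv(), depending on which names the evaluated expressions should be able to reach.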
16.2.6. Attaching to and detaching from the search path
In Section 7.3.1, we mentioned that we can access the objects exported by a package without attaching it to the search path by using the pkg::object syntax (this loads the package if necessary).
For instance:
tools::toTitleCase("`tools` not attached to the search path")
## [1] "`tools` not Attached to the Search Path"
However:
toTitleCase("nope")
## Error in toTitleCase("nope"): could not find function "toTitleCase"
It does not work since toTitleCase is not reachable from the current environment.
Let us inspect the current search path (yes, there is a built-in function for that):
search()
## [1] ".GlobalEnv" ".marekstuff" "package:stats"
## [4] "package:graphics" "package:grDevices" "package:utils"
## [7] "package:datasets" "package:methods" "Autoloads"
## [10] "package:base"
Writing “pkg::” might be inconvenient in the long run (for some). Thus, we can call library to attach the package to the search path immediately below the global environment.
library("tools")
The search path becomes (see Figure 16.3 for an illustration):
search()
## [1] ".GlobalEnv" "package:tools" ".marekstuff"
## [4] "package:stats" "package:graphics" "package:grDevices"
## [7] "package:utils" "package:datasets" "package:methods"
## [10] "Autoloads" "package:base"

Figure 16.3 The search path after attaching the tools package
Therefore, what follows now works as expected:
toTitleCase("Nobody expects the Spanish Inquisition")
## [1] "Nobody Expects the Spanish Inquisition"
We can use detach[12] to remove an item from the search path.
detach("package:tools")
"package:tools" %in% search() # not there anymore
## [1] FALSE
We use the “package:” prefix for the reasons that we shall describe in Section 16.3.5.
Note
We can also plug arbitrary environments[13] and named lists into the search path. Recalling that data frames are built on the latter (Section 12.1.6), some users rely on this technique to free themselves from the onerous burden of typing the object name each time its column is to be referred to:
attach(iris)
head(Sepal.Length) # iris[["Sepal.Length"]]
## [1] 5.1 4.9 4.7 4.6 5.0 5.4
Here, the iris list was converted to an environment, and the necessary enclosures were set accordingly:
str(parent.env(globalenv()))
## <environment: 0x558884344860>
## - attr(*, "name")= chr "iris"
str(parent.env(parent.env(globalenv())))
## <environment: 0x5588842ab190>
## - attr(*, "name")= chr ".marekstuff"
Overall, attaching data frames is discouraged, especially outside the interactive mode. Let us not be too lazy.
detach(iris) # such a relief
16.2.7. Masking (shadowing) objects from down under
An assignment via `<-` (and assign, by default) creates a binding in the current environment.
Therefore, even if the name to bind exists somewhere on the search path, it will not be modified. Instead, a new name will be created.
expr_comp <- quote("spam" < "eggs")
eval(expr_comp)
## [1] FALSE
Here, we rely on `<` from the base environment. Withal, we can create an object of the same name in the current (global) context:
`<` <- function(e1, e2)
{
    warning("This is not the base `<`, mate.")
    NA
}
Now we have two different functions of the same name. When we evaluate an expression within the current environment or any of its “descendants”, the new name will shadow the base one:
eval(expr_comp)
## Warning in "spam" < "eggs": This is not the base `<`, mate.
## [1] NA
eval(expr_comp, envir=e5) # e5's enclosure's enclosure is the global env
## Warning in "spam" < "eggs": This is not the base `<`, mate.
## [1] NA
But we can still call the original function directly:
base::`<`(1, 2)
## [1] TRUE
It is also reachable from within the current environment’s “ancestors”:
eval(expr_comp, envir=as.environment("package:utils"))
## [1] FALSE
Before proceeding any further, let us clean up after ourselves. Otherwise, we will be asking for trouble.
rm("<") # removes `<` from the global environment
An attached package may introduce some object names that are also available elsewhere. For instance:
library("stringx")
## Attaching package: 'stringx'
## The following objects are masked from 'package:base': casefold, chartr,
## endsWith, gregexec, gregexpr, grep, grepl, gsub, ISOdate, ISOdatetime,
## nchar, nzchar, paste, paste0, regexec, regexpr, sprintf, startsWith,
## strftime, strptime, strrep, strsplit, strtrim, strwrap, sub, substr,
## substr<-, substring, substring<-, Sys.time, tolower, toupper, trimws,
## xtfrm, xtfrm.default
Therefore, in the current context, we have what follows:
toupper("Groß") # stringx::toupper
## [1] "GROSS"
base::toupper("Groß")
## [1] "GROß"
Sometimes[14], we can use assign(..., inherits=TRUE) or its synonym, `<<-`, to modify the existing binding (without creating a new one if not necessary).
Note
Let us attach the iris data frame (named list) to the search path again:
attach(iris)
Sepal.Length[1] <- 0
It does not modify the original iris nor its converted-to-an-environment copy that we can find in the search path. Instead, a new vector named Sepal.Length has been created in the current environment:
exists("Sepal.Length", envir=globalenv(), inherits=FALSE)
## [1] TRUE
We can verify the above statement as follows:
rm("Sepal.Length") # removes the one in the global environment
Sepal.Length[1] # `iris` from the search path
## [1] 5.1
iris[["Sepal.Length"]][1] # the original `iris`
## [1] 5.1
However, by writing:
Sepal.Length[1] <<- 0 # uses assign(..., inherits=TRUE)
we changed the state of the environment on the search path:
exists("Sepal.Length", envir=globalenv(), inherits=FALSE)
## [1] FALSE
Sepal.Length[1] # `iris` from the search path
## [1] 0
Yet, the original iris object is left untouched. There is no mechanism in place that would synchronise the original data frame and the object in the search path.
iris[["Sepal.Length"]][1] # the original `iris`
## [1] 5.1
16.3. Closures
So far, we have only covered the rules for evaluating standalone R expressions. In this section, we look at what happens inside the invoked functions.
16.3.1. Local environment
When we call a function, a new temporary environment is created. It is where all arguments[15] and local variables are emplaced. During the function evaluation, this environment becomes the current one. After the call, the environment ceases to exist, and we return to the previous environment from the call stack.
Consider the following function:
test <- function(x)
{
    print(ls())  # list object names in the current environment
    y <- x^2  # creates a new variable
    print(sys.frame(sys.nframe()))  # get the ID of the current environment
    str(as.list(sys.frame(sys.nframe())))  # display its contents
}
First call:
test(2)
## [1] "x"
## <environment: 0x558884328378>
## List of 2
## $ y: num 4
## $ x: num 2
Second call:
test(3)
## [1] "x"
## <environment: 0x5588846da6c0>
## List of 2
## $ y: num 9
## $ x: num 3
Each time, the current environment is different.
This is why we do not see the `y` variable at the start of the second call. It is a brilliantly simple implementation of the storage for local variables.
16.3.2. Lexical scope and function closures
We were able to access the print function (amongst others) in the above example. This should make us wonder what the enclosing environment of that local environment is.
print_enclosure <- function()
    print(parent.env(sys.frame(sys.nframe())))
print_enclosure()
## <environment: R_GlobalEnv>
It is the global environment. Let us evaluate the same function from within another one:
call_print_enclosure <- function()
    print_enclosure()
call_print_enclosure()
## <environment: R_GlobalEnv>
It is the global environment again. If R used the so-called dynamic scoping, we would see the local environment of the function that invoked the one above. If this were true, we would have access to the caller’s local variables from within the callee.
Important
Objects of the type closure, i.e., user-defined[16] functions, consist of three components:
a list of formal arguments (compare formals and Section 15.4.1);
an expression (compare body and Section 15.4.1 again);
a reference to the associated environment where the function might store data for further use (see environment).
By default, the associated environment is set to the current environment where the function was created.
A local environment created during a function’s call has this associated environment as its enclosure.
Due to this, we say that R has lexical (static) scope.
Thence, in the above example, we have:
environment(print_enclosure) # print the associated environment
## <environment: R_GlobalEnv>
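The three components of a closure can be inspected one by one. Below is a brief sketch on a hypothetical function named add; for contrast, it also queries sum, which is a primitive and not a closure:

```r
add <- function(x, y=1) x+y  # a made-up example function

typeof(add)          # "closure"
names(formals(add))  # "x" "y"
body(add)            # x + y
environment(add)     # the environment where `add` was created

typeof(sum)          # "builtin"
environment(sum)     # NULL: primitives have no associated environment
```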
Consider the following function that prints out `x` defined outside of its scope:
test <- function() print(x)
Now:
x <- "x in global"
test()
## [1] "x in global"
printed out `x` from the user workspace as this is precisely the environment associated with the function.
However, setting the associated environment to a different one that also happens to define `x` will give a different result:
e <- new.env()
e[["x"]] <- "x in e"
environment(test) <- e
test()
## [1] "x in e"
Consider the following example:
test <- function()
{
    cat(sprintf("test: current env: %s\n", format(sys.frame(sys.nframe()))))
    subtest <- function()
    {
        e <- sys.frame(sys.nframe())
        cat(sprintf("subtest: enclosing env: %s\n", format(parent.env(e))))
        cat(sprintf("x = %s\n", x))
    }
    x <- "spam"
    subtest()
    environment(subtest) <- globalenv()
    subtest()
}
x <- "bacon"
test()
## test: current env: <environment: 0x5588840af080>
## subtest: enclosing env: <environment: 0x5588840af080>
## x = spam
## subtest: enclosing env: <environment: R_GlobalEnv>
## x = bacon
Here is what happened.
A call to test creates a local function subtest, whose associated environment is set to the local frame of the current call. It is precisely the current environment where subtest was created.
The above explains why subtest can access the local variable `x` inside its maker.
Then we change the environment associated with subtest to the global one.
In the next call to subtest, unsurprisingly, we gain access to `x` in the user workspace.
Note
In lexical (static) scoping, which variables a function refers to can be deduced by reading the function’s body only and not how it is called in other contexts. This is the theory. Nevertheless, the fact that we can freely modify the associated environment anywhere can complicate the program analysis greatly.
If we find the rules of lexical scoping confusing, we ought to refrain from referring to objects outside of the current scope (“global” or “non-local” variables) except for the functions defined as top-level ones or coming from the external packages. It is what we have been doing so far most of the time.
16.3.3. Application: Function factories
As closures are functions with associated environments, and the role of environments is to store information, we can consider closures = functions + data.
We have already seen that in Section 9.5.3, where we described the approxfun function. To recall:
x <- seq(0, 1, length.out=11)
f1 <- approxfun(x, x^2)
print(f1)
## function (v)
## .approxfun(x, y, v, method, yleft, yright, f, na.rm)
## <environment: 0x5588845a2050>
The variables `x`, `y`, etc., that f1’s source code refers to, are stored inside its dedicated, associated environment:
ls(envir=environment(f1))
## [1] "f" "method" "na.rm" "x" "y" "yleft" "yright"
Routines such as approxfun are called function factories. They return functions whose non-local variables are stored in their associated environments.
Consider the following function factory:
gen_power <- function(p)
    function(x) x^p  # p references a non-local variable
A call to gen_power creates a local environment that defines one variable, `p`, where the argument’s value is stored. Then, we create a function whose associated environment (remember that R uses lexical scoping) is that local one. Therefore, the reference to the non-local `p` in its body will be resolved therein. This new function is returned by gen_power to the caller. Normally, the local environment would be destroyed, but it is still referenced by the returned function after the call, so it will not be garbage-collected.
Example calls:
(square <- gen_power(2))
## function(x) x^p
## <environment: 0x5588828e7548>
(cube <- gen_power(3))
## function(x) x^p
## <environment: 0x558882e5e8e0>
cube(2)
## [1] 8
square(2)
## [1] 4
The underlying environment can, of course, be modified:
assign("p", 7, envir=environment(cube))
cube(2) # so much for the cube
## [1] 128
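The following is a minimal sketch (the names sq and cb are ours, introduced only for illustration) suggesting how to verify that each call to the factory yields a closure with its own associated environment, each holding its own `p`:

```r
gen_power <- function(p)
    function(x) x^p  # p is resolved in the closure's associated environment

sq <- gen_power(2)  # hypothetical names, chosen not to clash with square/cube
cb <- gen_power(3)

# each closure refers to a separate environment created by a separate call
identical(environment(sq), environment(cb))
## [1] FALSE
get("p", envir=environment(sq))
## [1] 2
get("p", envir=environment(cb))
## [1] 3
```

Modifying `p` in one of these environments, as we did above with assign, therefore leaves the other closure intact.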
Negate is another example of a built-in function factory. Study its source code:
print(Negate)
## function (f)
## {
## f <- match.fun(f)
## function(...) !f(...)
## }
## <environment: namespace:base>
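As a short usage sketch (is_not_na is a name we made up for this illustration), Negate can be applied to is.na to obtain its logical complement; the captured `f` lives in the environment created by the call to Negate:

```r
is_not_na <- Negate(is.na)  # returns function(...) !f(...) with f bound to is.na
is_not_na(c(1, NA, 3))
## [1]  TRUE FALSE  TRUE
ls(envir=environment(is_not_na))  # the closure's data
## [1] "f"
```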
In [37], the following example is given:
account <- function(total)
    list(
        balance  = function() total,
        deposit  = function(amount) total <<- total+amount,
        withdraw = function(amount) total <<- total-amount
    )
Robert <- account(1000)
Ross <- account(500)
Robert$deposit(100)
Ross$withdraw(150)
Robert$balance()
## [1] 1100
Ross$balance()
## [1] 350
We can now fully understand why the above code does what it does. It somewhat resembles a class with three methods and one data field. No wonder reference classes (Section 16.1.5) were introduced at some point: they are based on the same concept.
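To make the "one data field" part more tangible, here is a small sketch (acc is a hypothetical name) indicating that the three closures returned by a single call to account share one environment, and that `<<-` modifies the `total` stored therein:

```r
account <- function(total)
    list(
        balance  = function() total,
        deposit  = function(amount) total <<- total+amount,
        withdraw = function(amount) total <<- total-amount
    )

acc <- account(1000)  # acc is a name introduced for this illustration only
# all three functions were created during the same call to account
identical(environment(acc$balance), environment(acc$deposit))
## [1] TRUE
acc$deposit(100)
environment(acc$balance)$total  # the shared "data field"
## [1] 1100
```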
Write a function factory named gen_counter which implements a simple counter that is increased by one on each call thereto.
gen_counter <- function() ...to.do...
c1 <- gen_counter()
c2 <- gen_counter()
c(c1(), c1(), c2(), c1(), c2())
## [1] 1 2 1 3 2
Moreover, compose a function that resets a given counter to zero.
reset_counter <- function(counter_fun) ...to.do...
reset_counter(c1)
c1()
## [1] 1
16.3.4. Accessing the calling environment#
We know that the environment associated with a function is not necessarily the same as the environment from which the function was called, sometimes quite confusingly referred to as the parent frame.
R maintains a whole frame stack. The global environment is assigned the number 0. Each call to a function increases the stack by one frame, whereas returning from a call decreases the counter.
To get the current frame number, we call sys.nframe.
This is why sys.frame(sys.nframe()) returns the current environment.
We can fetch the calling environment by referring to parent.frame() or sys.frame(sys.parent()), amongst others[17].
Thanks to parent.frame, we may evaluate arbitrary expressions in (on behalf of) the calling environment. Typically, we should not be doing that. However, some built-in functions rely on this feature, hence our avid interest in it here. We will discuss a few of its use cases below.
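A minimal sketch of these tools in action; the helper functions set_in_caller, f, g, and h are hypothetical and serve only as an illustration:

```r
set_in_caller <- function(name, value) {
    caller <- parent.frame()           # the environment we were called from
    assign(name, value, envir=caller)  # create a variable on behalf of the caller
    invisible(NULL)
}

f <- function() {
    set_in_caller("z", 42)
    z  # found in f's local frame, although f never assigned it itself
}
f()
## [1] 42

g <- function() sys.nframe()
h <- function() g() - sys.nframe()  # g runs one frame deeper than h
h()
## [1] 1
```

Note that set_in_caller is defined at the top level, so lexical scoping alone would not give it access to the local frame of f; it is parent.frame that makes the difference here.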
16.3.5. Package namespaces (*)#
Any R package, say, pkg, defines two environments:
namespace:pkg – where all objects are defined (functions, vectors, etc.); this is the enclosing environment of all closures in the package;
package:pkg – can be attached to the search path; contains selected[18] objects from namespace:pkg that can be accessed by the user.
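The same distinction can be observed on any installed package. Below is a rough sketch based on stats, which ships with R; we assume here that its plot.lm method is not exported, which is the case in the R versions known to us:

```r
library("stats")  # normally attached by default anyway
ns <- getNamespace("stats")            # namespace:stats
pk <- as.environment("package:stats")  # package:stats

identical(environment(stats::sd), ns)  # closures are enclosed by the namespace
## [1] TRUE
exists("sd", envir=pk, inherits=FALSE)  # exported, hence visible to the user
## [1] TRUE
exists("plot.lm", envir=ns, inherits=FALSE)  # defined in the namespace...
## [1] TRUE
exists("plot.lm", envir=pk, inherits=FALSE)  # ...but not exported
## [1] FALSE
```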
We will use our example package discussed in Section 7.3.1.2. Recall that it is available for download from https://github.com/gagolews/rpackagedemo/.
library("rpackagedemo") # https://github.com/gagolews/rpackagedemo/
## Loading required package: tools
Here is its DESCRIPTION file:
Package: rpackagedemo
Type: Package
Title: Just a Demo R Package
Version: 1.0.2
Date: 1970-01-01
Author: Anonymous Llama
Maintainer: Unnamed Kangaroo <roo@inthebush.au>
Description: Provides a function named bamboo(), just give it a shot.
License: GPL (>= 2)
Imports: stringx
Depends: tools
The Imports and Depends fields specify which packages (apart from base) ours depends on. As we can see above, all items in the latter list are attached to the search path on a call to library.
The NAMESPACE file specifies the names imported from other packages and those that are expected to be visible to the user:
importFrom(stringx, sprintf)
importFrom(tools, toTitleCase)
S3method(print, koala)
S3method(print, kangaroo, .a_hidden_method_to_print_a_roo)
export(bamboo)
Thus, our package exports one object, a function named bamboo (we will discuss the S3 methods in the next section). It is included in the “package:rpackagedemo” environment attached to the search path:
ls(envir=as.environment("package:rpackagedemo")) # ls("package:rpackagedemo")
## [1] "bamboo"
Let us give it a shot:
bamboo("spanish inquisition") # rpackagedemo::bamboo
## G'day, Spanish Inquisition!
We did not expect that at all.
Let us inspect its source code:
print(bamboo)
## function (x = "world")
## cat(prepare_message(toTitleCase(x)))
## <environment: namespace:rpackagedemo>
We see a call to toTitleCase (most likely from tools, and this is indeed the case) and to prepare_message which is not listed in the package’s imports. We definitely cannot access it directly:
prepare_message
## Error in eval(expr, envir, enclos): object 'prepare_message' not found
It turns out that it is the package’s internal function. It is thus included in the “namespace:rpackagedemo” environment.
(e <- environment(rpackagedemo::bamboo)) # or getNamespace("rpackagedemo")
## <environment: namespace:rpackagedemo>
ls(envir=e)
## [1] "bamboo" "prepare_message" "print.koala"
We can get it via the `:::` operator:
print(rpackagedemo:::prepare_message)
## function (x)
## sprintf("G'day, %s!\n", x)
## <environment: namespace:rpackagedemo>
All functions defined in a package have the corresponding package namespace as their associated environment. As a consequence, bamboo can refer to prepare_message directly.
Now, it will be educative to inspect the enclosure of “namespace:rpackagedemo”:
(e <- parent.env(e))
## <environment: 0x558883e3ab38>
## attr(,"name")
## [1] "imports:rpackagedemo"
ls(envir=e)
## [1] "sprintf" "toTitleCase"
It is the environment featuring the bindings to all the imported objects. This is why our package can also refer to stringx::sprintf and tools::toTitleCase.
Its enclosure is the namespace of the base package (not to be confused with “package:base”):
(e <- parent.env(e))
## <environment: namespace:base>
The next enclosure is, interestingly, the global environment:
(e <- parent.env(e))
## <environment: R_GlobalEnv>
Then, of course, the whole search path follows (Section 16.2.5); see Figure 16.4 for an illustration.

Figure 16.4 A search path for an example package; Dashed lines represent environments associated with closures, whereas solid lines denote enclosing environments; References to objects within each package are resolved inside their respective namespaces#
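The whole chain of enclosures can also be traversed programmatically. Here is a sketch that starts at the namespace of stats (chosen only because it is always installed) and follows parent.env until the empty environment is reached; the exact output depends on which packages are currently attached, so we do not list it:

```r
e <- getNamespace("stats")
chain <- character(0)
while (!identical(e, emptyenv())) {
    chain <- c(chain, environmentName(e))  # record the current environment's name
    e <- parent.env(e)                     # move on to its enclosure
}
print(chain)  # namespace, imports, base namespace, global env, search path...
```

The first element is the namespace itself, "R_GlobalEnv" appears somewhere in the middle, and the last element corresponds to the base package at the end of the search path.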
Note
(**)
All environments related to packages are locked, which means that we cannot change any bindings within their frames; compare help("lockEnvironment").
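As a quick illustration (a sketch using the namespace of stats, picked merely because it is always available), we can query the lock status and confirm that an attempt to overwrite an existing binding is refused:

```r
ns <- getNamespace("stats")
environmentIsLocked(ns)
## [1] TRUE
bindingIsLocked("sd", ns)
## [1] TRUE
# trying to replace stats::sd fails; we catch the error instead of stopping
res <- tryCatch(assign("sd", NULL, envir=ns), error=function(e) e)
inherits(res, "error")
## [1] TRUE
```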
In the extremely rare event of needing to patch an existing function within an already loaded package, we can call unlockBinding followed by assign to change its definition.
new_message <- function (x) sprintf("Nobody expects %s!\n", x)
e <- getNamespace("rpackagedemo")
environment(new_message) <- e # set enclosing environment (very important!)
unlockBinding("prepare_message", e)
assign("prepare_message", new_message, e)
rm("new_message")
bamboo("the spanish inquisition")
## Nobody expects The Spanish Inquisition!
R is indeed a quite hackable language (except in the cases where it is not).
(**)
A function or a package might register certain functions (hooks) to be called on various events, e.g., attaching a package to the search path; see help("setHook") and help(".onAttach").
Inspect the source code of plot.new and notice a reference to a hook named "before.plot.new". Try setting such a hook yourself (e.g., one that changes some graphics parameters discussed in Section 13.2) and see what happens on each call to a plotting function.
Define the .onLoad, .onAttach, .onUnload, and .onDetach functions in your own R package and take note of when they are invoked.
(**)
For the purpose of this book, we have registered a custom "before.plot.new" hook that sets our favourite graphics parameters that we described in Section 13.2.4. Moreover, we modified plot.window slightly to obtain the white grid on a grey background, e.g., in Figure 13.13.
Apply similar hacks to the graphics package so that its outputs suit your taste better.
16.3.6. S3 method lookup by UseMethod (*)#
Let us go back to the rpackagedemo example.
Inspecting the NAMESPACE file, we see that the package defines two print methods: for printing S3 objects of the classes koala and kangaroo.
The package is attached to the search path. Therefore, we can access these methods via a call to the corresponding generic:
print(structure("Tiny Teddy", class="koala"))
## This is a cute koala, Tiny Teddy
print(structure("Moike", class="kangaroo"))
## This is a very naughty kangaroo, Moike
However, the package does not make the definitions of these S3 methods available to the user, at least not directly. It is not the first time that we have experienced such obscuration.
In the first case, the method is simply hidden in the package namespace. It is still available under the expected name:
rpackagedemo:::print.koala
## function (x, ...)
## cat(sprintf("This is a cute koala, %s\n", x))
## <environment: namespace:rpackagedemo>
In the second case, it appears under a very different identifier:
rpackagedemo:::.a_hidden_method_to_print_a_roo
## function (x, ...)
## cat(sprintf("This is a very naughty kangaroo, %s\n", x))
## <environment: namespace:rpackagedemo>
Since the base UseMethod is still able to find them, we suspect there probably is a global register of all S3 methods. And this is indeed the case.
We can use getS3method to get access to what is available via UseMethod:
getS3method("print", "kangaroo")
## function (x, ...)
## cat(sprintf("This is a very naughty kangaroo, %s\n", x))
## <environment: namespace:rpackagedemo>
Important
Overall, the search for methods is performed in two places:
in the environment where the generic is called (the current environment); this is why defining print.kangaroo in the current scope will use this method instead of the one from the package:
print.kangaroo <- function(x, ...) cat("Nobody expects", x, "\n")
print(structure("the Spanish Inquisition", class="kangaroo"))
## Nobody expects the Spanish Inquisition
in the internal S3 methods table (registration database).
See help("UseMethod") for more details.
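Both lookup places can be observed in a single session. In the sketch below, the class wombat and the functions .hidden_wombat_printer and f are hypothetical; we register a method under a nonstandard name with registerS3method and then shadow it with a locally defined print.wombat:

```r
# a method stored under a name that does not follow generic.classname
.hidden_wombat_printer <- function(x, ...) cat("registered:", unclass(x), "\n")
registerS3method("print", "wombat", .hidden_wombat_printer, envir=baseenv())

w <- structure("Wally", class="wombat")
print(w)  # no print.wombat in scope: the registration database is used
## registered: Wally

f <- function() {
    # defined in the environment where the generic is called
    print.wombat <- function(x, ...) cat("local:", unclass(x), "\n")
    print(w)
}
f()  # the local definition takes precedence
## local: Wally
```

Keep in mind that the registration persists until the end of the R session.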
Also, recall that in Section 10.2.3, we said that UseMethod is not the only way to perform method dispatching. There are also internal generics (help("InternalMethods")) and group generic functions (help("groupGeneric")).
(*)
Study the source code of getS3method. Note the reference to the base::.__S3MethodsTable__. object which is for R’s internal use (we ought not to tinker with it directly). Moreover, study the registerS3method function with which we can define new S3 methods not necessarily following the generic.classname convention.
16.4. Exercises#
Asking too many questions is not very charismatic, but challenge yourself by trying to find the answer to the following.
What is the role of a frame in an environment?
What is the role of an enclosing environment? How can we read it or set it?
What is the difference between a named list and an environment?
What functions and operators work on named lists but cannot be applied to environments?
What do we mean by saying that environments are not passed by value to R functions?
What do we mean by saying that objects are sometimes copied on demand?
What happens if a name listed in an expression to be evaluated is not found in the current environment?
How and what kind of objects can we attach to the search path?
What happens if we have two identical object names on the search path?
What do we mean by saying that package namespaces are locked when loaded?
What is the current environment when we evaluate an expression “on the console”?
What is the difference between `<-` and `<<-`?
Do packages have their own search paths?
What is a function closure?
What is the difference between the dynamic and the lexical scope?
When evaluating a function, how is the enclosure of the current (local) environment determined? Is it the same as the calling environment? How to get it/them programmatically?
How and why do function factories work?
(*) What is the difference between the “package:pkg” and “namespace:pkg” environments?
How do we fetch the definition of an S3 method that does not seem to be available directly via the standard accessor generic.classname?
(*) base::print.data.frame calls base::format.data.frame (directly). Will the introduction of print.data.frame in the current environment affect how data frames are printed?
(*) On the other hand, base::format.data.frame calls the generic base::format on all the input data frame’s columns. Will the overloading of the particular methods affect how data frames are printed?
Calling:
pkg <- available.packages()
pkg[, "Package"] # a list of the names of available packages
pkg[, "Depends"] # dependencies
gives the list of available packages and their dependencies. Convert the dependency lists to a list of character vectors (preferably using regular expressions; see Section 6.2.4).
Then, generate a list of reverse dependencies: what packages depend on each given package.
Use an object of the type environment (a hash table) to map the package names to numeric IDs (indexes). It will significantly speed up the whole process (compare it to a named list-based implementation).
According to [66] (compare also Section 9.4.6), a call to:
add(x, f(x)) <<- v
translates to:
`*tmp*` <- get(x, envir=parent.env(), inherits=TRUE)
x <<- `add<-`(`*tmp*`, f(x), v) # note: not f(`*tmp*`)
rm(`*tmp*`)
Given:
`add<-` <- function(x, where=TRUE, value)
{
    x[where] <- x[where] + value
    x  # the modified object that will replace the original one
}
y <- 1:5
f <- function() { y <- -(1:5); add(y, y==-3) <<- 1000; y }
explain why we get the following results:
f()
## [1] -1 -2 -3 -4 -5
print(y)
## [1] 1 2 1003 4 5