16. 🚧🚧 Environments and Evaluation (**)
The open-access textbook Deep R Programming by Marek Gagolewski is, and will remain, freely available for everyone’s enjoyment (also in PDF). It is a non-profit project. This book is still a work in progress. Beta versions of Chapters 1–12 are already complete, but there will be more. In the meantime, any bug/typos reports/fixes are appreciated. Although available online, it is a whole course; it should be read from the beginning to the end. Refer to the Preface for general introductory remarks. Also, check out my other book, Minimalist Data Wrangling with Python [20].
In the first part of our book, we have discussed quite a few basic object types: numeric, logical, and character vectors, lists (generic vectors), and functions.
Environments, which we introduce in this chapter, just like lists, are instances of recursive types (compare the diagram in Figure 16.1).
Even though we rarely interact with them directly (unless we need a hash table-like data structure with quick by-name element look-up), they are crucial for R itself: they form the basis of the environment model of evaluation which governs how expressions are computed; see Section 16.2.
Important
Each object of type environment consists of:
a frame[1] (Section 16.1) – stores a set of bindings, which associate variable names with their corresponding values; it can be thought of as a container of named R objects of any type;
a pointer to an enclosing environment[2] (Section 16.3).
🚧 This chapter is under construction. Please come back later.
16.1. Frames: Environments as Object Containers
To create a new, empty environment, we can call the new.env function:
e1 <- new.env()
typeof(e1)
## [1] "environment"
In this section, we treat environments merely as containers for named objects of any kind, i.e., we deal with the frame part thereof.
Let us insert some elements into e1
:
e1[["x"]] <- "x in e1"
e1[["y"]] <- 1:3
The `[[` operator provides some named list-like look-and-feel also in the case of element extraction:
e1[["x"]]
## [1] "x in e1"
e1[["spam"]] # does not exist
## NULL
e1[["y"]] <- e1[["y"]]*10 # replace with new content
e1[["z"]] <- NULL # unlike in the case of lists, creates a new element
16.1.1. Printing
Let us note that the printing of an environment is quite awkward:
print(e1) # same with str(e1)
## <environment: 0x55b724cb5dd8>
Later we will mention that this is the address where e1
is stored
in computer’s memory.
As we have said, these objects are rather of internal interest, so nobody cared[3] about making the interaction therewith particularly smooth.
However, we can easily get the list of objects stored within the container using names[4]:
names(e1) # compare attr(e1, "names") – it is not set
## [1] "x" "y" "z"
Also:
length(e1)
## [1] 3
gives the number of objects in the container.
16.1.2. Environments vs Named Lists
Environment frames, in some sense, can be thought of as named lists, but the set of admissible operations is severely restricted. In particular, we cannot extract more than one element at the same time with the index operator:
e1[c("x", "y")] # but see the mget function
## Error in e1[c("x", "y")]: object of type 'environment' is not subsettable
nor can we refer to the elements by position:
e1[[1]] <- "bad key"
## Error in e1[[1]] <- "bad key": wrong args for environment subassignment
Check if lapply and Map can be applied directly on environments. Also, can we iterate over their elements using a for loop?
Actually, named lists can be converted to environments and vice versa using as.list and as.environment.
as.list(e1)
## $x
## [1] "x in e1"
##
## $y
## [1] 10 20 30
##
## $z
## NULL
as.environment(list(u=42, whatever="it's not gonna be printed anyway"))
## <environment: 0x55b724d09b38>
as.list(as.environment(list(x=1, y=2, x=3))) # no duplicates allowed
## $y
## [1] 2
##
## $x
## [1] 3
16.1.3. Hash Maps: Fast Element Look-up by Name
Environment frames are internally implemented using hash tables (hash maps; see, e.g., [9, 33]) with string keys.
Important
A hash table is a data structure that allows for a very quick[5] lookup and insertion of individual elements by name.
This comes at a price, including what we have already observed above:
the elements are not particularly unordered: they cannot be referred to by a numeric index;
all element names must be unique.
Note
A list may be considered a sequence, but an environment frame is only set (a bag) of key–value pairs. In the majority of numerical computing applications, we would rather store, iterate over, and process all the elements in order, hence the greater popularity of the former. Lists still allow for an element look-up by name, even though this is slightly slower[6]. Hence, they are much more universal.
A natural use case of manually-created environment frames deals with grouping a series of objects identified by character string keys.
Consider a simple pseudocode for counting the number of occurrences of objects in a given container:
for (key in some_container) {
if (!is.null(counter[["key"]]))
counter[["key"]] <- counter[["key"]]+1
else
counter[["key"]] <- 1
}
If some_container
is large, say, of size \(n\)
(let us say it is generated on the fly by reading some data stream),
then the run-time of the above will depend on the type of the data structure
used.
If counter
is a list, then, theoretically, the worst-case performance
will be \(O(n^2)\) (if all keys are unique).
However, for environments, it is an order of magnitude faster:
down to amortised \(O(n)\).
(*) Implement the above pseudocode and benchmark the two data structures using proc.time on some example data:
t0 <- proc.time() # timer start
# ... to do - something time-consuming ...
print(proc.time() - t0) # elapsed time
(*) Determine the number of unique text lines in a very large file (assuming that the set of unique text lines fits into memory, but the file itself does not). Also, determine the five most frequently occurring text lines.
16.1.4. Pass-by-Value, Copy on Demand – Not for Environments
Given any R object, say, x
,
when we issue:
y <- x
its copy[7] is made so that y
and x
are independent
of each other. In other words, any change in the state of x
(or y
)
is not reflected in the state of y
(or x
).
For instance:
x <- list(a=1)
y <- x
y[["a"]] <- y[["a"]]+1
print(y)
## $a
## [1] 2
print(x) # not affected – `x` and `y` are independent
## $a
## [1] 1
The same happens with arguments that we feed to the functions:
mod <- function(y, key) # it's like: local_y <- passed_argument
{
y[[key]] <- y[[key]]+1
y
}
mod(x, "a") # returns a modified copy of `x`
## $a
## [1] 2
print(x)
## $a
## [1] 1
We thus say that R has pass-by-value semantics.
Important
Environments are the only[8] R objects that have an assign- and pass-by-reference semantics.
In other words, if we perform:
x <- as.environment(x)
y <- x
then the names x
and y
are bound with exactly the same objects
in computer’s memory.
y[["a"]] <- y[["a"]]+1
print(y[["a"]])
## [1] 2
print(x[["a"]]) # `x` is `y`, `y` is `x`
## [1] 2
This is exactly seen when we print the address where the environments are located:
print(x)
## <environment: 0x55b7240e8418>
print(y)
## <environment: 0x55b7240e8418>
The same when we pass them to a function:
mod(y, "a") # pass-by-reference (`y` is `x`, remember?)
## <environment: 0x55b7240e8418>
x[["a"]] # `x` has changed (names `y` and `x` are bound to the same object)
## [1] 3
Thus, any changes we make to an environment passed as an argument to a function will be visible “outside” the call. This minimises time and memory use in certain situations. If this sounds complicated, we should rather be avoiding these data structures whatsoever. As we have said, an R user can live without them.
Note
(*)
Actually, for efficiency reasons, when we write “y <- x
” ,
a copy of x
(unless it is an environment)
is only created if absolutely necessary.
Here is some benchmarking of the copy-on-demand mechanism.
n <- 100000000 # like, a lot
Creation of a new large numeric vector:
t0 <- proc.time(); x <- numeric(n); proc.time() - t0
## user system elapsed
## 0.123 0.189 0.312
Creation of a (delayed) copy:
t0 <- proc.time(); y <- x; proc.time() - t0
## user system elapsed
## 0.001 0.000 0.000
This was instant. Thus, we definitely did not clone the n
data cells.
Copy-on-demand is implemented using some quite simple reference counting;
compare Section 14.4. That – temporarily –
x
and y
point to the same address in memory can be inspected by calling:
.Internal(inspect(x))
## @7fe7e58a0010 14 REALSXP g0c7 [REF(4)] (len=100000000, tl=0) 0,0,0,0,0,...
.Internal(inspect(y))
## @7fe7e58a0010 14 REALSXP g0c7 [REF(5)] (len=100000000, tl=0) 0,0,0,0,0,...
The real copying is only triggered when we try to modify x
or y
.
This is when they need to be separated.
t0 <- proc.time(); y[1] <- 1; proc.time() - t0
## user system elapsed
## 0.153 0.207 0.360
Now x
and y
are different objects.
.Internal(inspect(x))
## @7fe7e58a0010 14 REALSXP g0c7 [MARK,REF(5)] (len=100000000, tl=0) 0,0,0,0,0,...
.Internal(inspect(y))
## @7fe7b5daf010 14 REALSXP g0c7 [MARK,REF(1)] (len=100000000, tl=0) 1,0,0,0,0,...
Note that the elapsed time is similar to that needed to create
x
from scratch.
Further modifications can already be fast:
t0 <- proc.time(); y[2] <- 2; proc.time() - t0
## user system elapsed
## 0.213 0.212 0.424
16.1.5. 🚧 A Note on Reference Classes (*)
16.2. 🚧 The Environment Model of Evaluation
16.3. 🚧 Enclosing Environments
16.4. 🚧 Evaluating Functions
16.4.1. 🚧 Evaluation of Default Arguments
16.4.2. 🚧 Not All Arguments Need to Be Evaluated
Calls such as
`if`(test, ifyes, ifno)
or `&&`(mustbe, maybe)
do not have to evaluate all their arguments.
{cat(" first "); FALSE} && {cat(" second "); FALSE}
## first
## [1] FALSE
{cat(" first "); TRUE} && {cat(" Spanish Inquisition "); FALSE}
## first Spanish Inquisition
## [1] FALSE
This is also a kind of nonstandard evaluation.
We can write such functions too.
test <- function(a, b, c) a + c # b is unused
test({cat(" spam "); 1}, {cat(" eggs "); 10}, {cat(" bacon "); 100})
## spam bacon
## [1] 101
Here, the second argument was not used, therefore it did not have to be evaluated. And it was not.
A more advanced example:
test <- function(a, b, c)
{
cat("Arguments passed to the functions (expressions): \n")
cat("a = ", deparse(substitute(a)), "\n")
cat("b = ", deparse(substitute(b)), "\n")
cat("c = ", deparse(substitute(c)), "\n")
cat("Using a: ")
a # we don't even have to be particularly creative
cat("Using a and c: ")
retval <- a + c # a is reused, b is unused
cat("Cheers! ")
retval
}
test({cat(" spam "); 1}, {cat(" eggs "); MeAn(egGs)}, {cat(" bacon "); 100})
## Arguments passed to the functions (expressions):
## a = { cat(" spam ") 1 }
## b = { cat(" eggs ") MeAn(egGs) }
## c = { cat(" bacon ") 100 }
## Using a: spam Using a and c: bacon Cheers!
## [1] 101
Notice that:
either the evaluation of an argument does not happen or it is triggered only once (in which case the result is cached);
evaluation is delayed until the very first request for the underlying value;
regardless whether we request it or not, we still have access to the underlying unevaluated expression passed as an argument.
As a consequence, we can pass arbitrary gibberish as the second argument (and we did above).
In other words, the evaluation of arguments can be postponed arbitrarily, possibly forever.
16.4.3. 🚧 Matching of Argument Names (TODO: MOVE)
16.4.4. 🚧 Ellipsis Revisited
16.4.5. 🚧 S3 Method Lookup by UseMethod
16.4.6. 🚧 Overloading S3 Group Generics
16.4.7. 🚧 Package Namespaces
16.5. 🚧 Formulas, `~` (*)
16.6. 🚧 Exercises
Answer the following questions:
What is the role of a frame in an environment?
What is the role of an enclosing environment?
What is the difference between a named list and an environment?
What functions and operators work on named lists but cannot be applied on environments?
What do we mean by saying that environments are not passed by value to R functions?
What do we mean by saying that sometimes objects are copied on demand?
🚧 TODO…
(*) 🚧 TODO: assign consecutive, unique numeric IDs for a large list of items
pkg <- available.packages()
pkg[, "Package"] # a list of the names of available packages
pkg[, "Depends"] # dependencies
Convert the dependency lists to a list of character vectors (preferably using a regular expression).
Then, generate a list of reverse dependencies:
What packages depend on each given package.
This will be much faster using an environment.
16.7. 🚧 Outro
This is the end.
Recall our first approximation to the classification of R Data Types that we presented in the Preface.
As a summary of what we have covered, Figure 16.1 gives a much broader picture.

Figure 16.1 R data types
Now that we have reached the end of this course, we might be interested in reading the following guides:
Also, the NEWS
files available at
https://cran.r-project.org/doc/manuals/r-release/
will keep us up to date with new features, bug fixes,
and deprecated functionality.
Good luck.