9. Designing Functions

In Chapter 7, we learnt how to write our own functions. This skill is key to enforcing the good development practice of avoiding the repetition of code: running the same command sequence on different data.

This chapter is devoted to the designing of such reusable modules so that they are easier to use, test, and maintain. We also provide some more technical details which were not of the highest importance upon our first exposure to this topic, but which are crucial for a better understanding of how R works.

9.1. Principles of Sustainable Design

Good design is more art than science. As usual in real life, we will need to make many compromises. This is because improving things with regard to one criterion sometimes makes them worse with respect to other aspects[1] (including ones we are not even aware of). Also, not everything that counts can and will be counted. Below are some observations, ideas, and food for thought.

9.1.1. To Write or to Abstain

Functions that we write ourselves can oftentimes be considered merely creative combinations of the building blocks available in base R or a few high-quality add-on packages[2]. Some are simpler than others. Thus, there is the question of whether a new operation should be introduced at all: whether we are faced with a case of multiplying entities without necessity.

On the one hand, the DRY (don’t repeat yourself) principle tells us that most frequently used (say, at least 3 times) code chunks should be generalised in the form of a new function. This is definitely a correct approach with regard to non-trivial operations.

On the other hand, not every generalisation is necessarily welcome. Let us say that we are lazy and tired of writing g(f(x)) for the n-th time. Why don’t we therefore introduce h defined as a combination of g and f? This might seem like a good idea, but let us not take it for granted: being tired might be an indication of our body and mind needing a rest; being lazy can be a call for more self-discipline (not an overly popular word these days, but still, a precious trait).

Example 9.1

paste0 is a specialised version of paste, with the sep argument hardcoded to the empty string.

  • Even if this might be the most common use case, is the introduction of a new function justifiable? Is it so hard to write sep="" each time?

  • Would changing paste’s default argument be better? That of course would harm backward compatibility, but what strategies could we apply to make the transition as smooth as possible?

  • Would it be better to introduce a new version of paste with sep defaulting to "", informing the users that the old version is deprecated and to be removed in, say, two years? Or maybe one year is better? Or five?
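
For the record, the following two calls are equivalent:

paste0("spam", "x")
## [1] "spamx"
paste("spam", "x", sep="")
## [1] "spamx"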

Example 9.2

In R 4.0, deparse1 was introduced: it is merely a combination of deparse (see below) and paste:

print(deparse1)
## function (expr, collapse = " ", width.cutoff = 500L, ...) 
## paste(deparse(expr, width.cutoff, ...), collapse = collapse)
## <bytecode: 0x55637dc34330>
## <environment: namespace:base>

Let us say this covers 90% of use cases: was introducing it a justified idea then? What if that number was 99%?

Overall, more functions contribute to the information overload. We do not want our users to be overwhelmed by too many choices. Luckily, nothing is cemented once and for all. Had we made bad design choices resulting in our API's being bloated, we can always clean up the parts that no longer spark joy.

9.1.2. To Pamper or to Challenge

Think about the kind of audience we would like to serve: is it our team only, students, professionals, certain client groups, etc.? Do they have a mathematical, programming, engineering, or scientific background? Not everything that is appropriate for one cohort will be valuable for another. And not everything that is good for some now will be beneficial for them in the long run. People (their skills, attitudes, etc.) change.

Example 9.3

Assume we are writing a friendly and inclusive package for novices who would like to grasp the basics of data analysis as quickly[3] as possible. Without much effort, it would enable them to solve 80–95% of the most common, easy problems.

Think of introducing the students to a function that returns the five largest observations in a given vector. Let us call it nlargest: so pleasant to use. It makes the students feel empowered quickly.

Still, when faced with the remaining 5–20% of tasks, they will have to learn another, more advanced, generic, and powerful tool anyway (in our case, base R itself). Are they determined and skilled enough to do that? Time will tell. The least we can do is to be explicit about it.

Recall that it took us some time to arrive at order and subsetting via `[`. Assuming that we read this book from the beginning to the end and solve all the exercises, which we should, we are now able to implement the said nlargest (and lots of other functions) ourselves, using a single line of code. This will also pay off in many scenarios that we will be facing in the future, e.g., when we consider matrices and data frames.
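
For instance, here is one possible one-liner (a sketch; the function's name and the default n=5 are merely our choice):

nlargest <- function(x, n=5)
    x[order(x, decreasing=TRUE)[seq_len(min(n, length(x)))]]

nlargest(c(10, 40, 30, 20, 60, 50))
## [1] 60 50 40 30 20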

Yes, everyone will be reinventing their own nlargest this way. But this constitutes a great exercise: by our being too nice, some might have lost an opportunity to learn a new, more universal skill.

Although most of the users would really love to minimise the effort put into all their activities, ultimately, they sometimes need to learn new things. Let us thus not be afraid to teach them stuff.

Furthermore, we do not want to discourage experts (or experts to-be) by presenting them with overly simplified solutions that keep their hands tied when something more ambitious needs to be done.

9.1.3. To Build or to Reuse

In the short term, the fail fast philosophy encourages us to build our applications using prefabricated components. This is fantastic at the early stage of a project's life cycle. If we build something really simple, or whose purpose is merely to illustrate an idea, show off how "awesome" we are, or educate, let us be explicit about it so that other users do not feel obliged to treat our product (exercise) seriously.

In the (not so likely, probabilistically speaking) event of its becoming successful, we should start thinking about the project’s long-term stability and sustainability. After all, relying on third-party functions, packages, or programs makes our software projects less… independent. This may be problematic, because:

  • the dependencies might not be available on every platform or may behave differently across various system configurations,

  • they may be huge (and can depend on other external software too),

  • their APIs may change, which could result in our project's no longer working,

  • their functionality can change, which can lead to some unexpected behaviours.

Hence, it might be a good idea to rewrite some parts from scratch on our own.

Exercise 9.4

Identify some R packages on CRAN with many dependencies. See which functions they import from other packages. How often is it just a few lines of code?

The Unix philosophy emphasises the building and using of minimalist yet nontrivial, single-purpose, high-quality pieces of software that can work as parts of larger, custom pipelines. R serves as a glue language quite well.

In the long run, some of our software projects might converge to such a tool – it might be a good idea to standardise our API (e.g., make it available from the command line; Section 1.2) so that the users of other languages can benefit from our work too.

Important

If our project is merely a modified interface/front-end to a larger program developed by others, we should be humble about it and make sure it is not us who get all the credit for other people’s work.

Also, we should state very clearly how the original tools can be used to achieve the same goals, e.g., when working from the command line.

9.2. Managing Data Flow

A function, most of the time, can and should be treated as a black box: its callers do not have to care what it hides inside. After all, they are supposed to use it: given some inputs, they expect well-defined (read: explained in detail in the function's manual; see Section 9.3.3) outputs.

9.2.1. Checking Input Data Integrity and Argument Handling

A function takes R objects of any kind as arguments, but it does not mean that feeding it with every- or any-thing is healthy for its guts.

When designing functions, it is best to handle the inputs in a manner similar to base R’s behaviour. This will make our contributions easier to handle.

Unfortunately, base R functions frequently do not handle arguments of a similar kind 100% consistently. Such variability might be due to many reasons and, in essence, is not necessarily bad. Usually, there are many possible behaviours, and choosing one over another would make a few users unhappy anyway. Some choices might not be optimal, but they are retained for historical compatibility (e.g., with S). Of course, it might also happen (but the probability is low) that there is a bug or something is not at all well designed.

This is why it is better to keep the vocabulary quite restricted (and we advocate for such minimalism in this book): even if there are exceptions to the general rules, with fewer functions, they are simply easier to remember.

Consider the following case study, illustrating that even the extremely simple scenario where we deal with a single positive integer is not necessarily straightforward.

Exercise 9.5

In mathematical notation, we usually denote the number of objects in a collection with the famous “n”.

It is implicitly assumed that such an n is a single natural number (although whether this includes 0 or not should be specified at some point). The functions runif, sample, seq, rep, strrep, and class::knn take it as an argument. However, nothing prevents their users from trying to challenge them by passing:

  • 2.5, -1, 0, 1-1e-16 (non-positive numbers, non-integers);

  • NA_real_, Inf (not finite);

  • 1:5 (not of length 1; after all, there are no scalars in R);

  • numeric(0) (an empty vector);

  • TRUE, NA, c(TRUE, FALSE, NA), "1", c("1", "2", "3") (non-numeric, but coercible to);

  • list(1), list(1, 2, 3), list(1:3, 4) (non-atomic);

  • "spam" (utter nonsense);

  • as.matrix(1), factor(7), factor(c(3, 4, 2, 3)), etc. (compound types; see Chapter 10).

Read the aforementioned functions’ reference manuals and call them on different inputs, noting how differently they handle such atypical arguments.

Sometimes we will rely on other functions to handle the data integrity checking for us.

Example 9.6

Let us consider the following function that generates n pseudorandom numbers from the unit interval rounded to d decimal digits. We strongly believe or hope (good faith and high competence assumption) that its authors knew what they were doing when they wrote:

round_rand <- function(n, d)
{
    x <- runif(n)  # runif will check if `n` makes sense
    round(x, d)    # round will determine the appropriateness of `d`
}

What constitutes correct n and d and how the function behaves when not provided with positive integers is determined by the two underlying functions, runif and round:

round_rand(4, 1)  # the expected use case
## [1] 0.3 0.8 0.4 0.9
round_rand(4.8, 1.9)  # 4, 2
## [1] 0.94 0.05 0.53 0.89
round_rand(4, NA)
## [1] NA NA NA NA
round_rand(0, 1)
## numeric(0)

If well thought-out and properly documented, many such design choices can be defended. Some programmers will opt for high uniformity/compatibility across numerous tools, but there are cases where some exceptions/diversity do more good than harm.

Yet, we should keep in mind that the functions we write might be part of a more complicated data flow pipeline, where some other function generates a value that we did not expect (because of a bug therein or because we did not study its manual) and this value is used as input to our function. In our case, this would correspond to the said n or d being determined programmatically.

Example 9.7

Continuing the previous example, the following might be somewhat challenging with regard to our being flexible and open-minded:

round_rand(c(100, 42, 63, 30), 1)  # like round_rand(length(c(...)), 1)
## [1] 0.7 0.6 0.1 0.9
round_rand("4", 1)  # like round_rand(as.numeric("4"), 1)
## [1] 0.2 0.0 0.3 1.0

Sure, it is quite convenient, but might lead to problems that are hard to diagnose.

Also note the not very informative error messages in cases like:

round_rand(NA, 1)
## Error in runif(n): invalid arguments
round_rand(4, "1")
## Error in round(x, d): non-numeric argument to mathematical function

Hence, some defensive design mechanisms are not a bad idea, especially if they lead to generating an informative error message.

Important

stopifnot gives a convenient means to assert that our expectations with regard to a function's arguments (or some intermediate values) are fulfilled. A call to stopifnot(cond1, cond2, ...) is more or less equivalent to:

if (!(is.logical(cond1) && !any(is.na(cond1)) && all(cond1)))
    stop("`cond1` are not all TRUE")
if (!(is.logical(cond2) && !any(is.na(cond2)) && all(cond2)))
    stop("`cond2` are not all TRUE")
...

Thus, if all the elements in the given logical vectors are TRUE, nothing happens and we can safely move on.

Example 9.8

We can rewrite the above function as follows:

round_rand2 <- function(n, d)
{
    stopifnot(
        is.numeric(n), length(n) == 1,
        is.finite(n), n > 0, n == floor(n),
        is.numeric(d), length(d) == 1,
        is.finite(d), d > 0, d == floor(d)
    )
    x <- runif(n)  # runif will check if n makes sense
    round(x, d)    # round will determine the appropriateness of d
}

round_rand2(5, 1)
## [1] 0.7 0.7 0.5 0.6 0.3
round_rand2(5.4, 1)
## Error in round_rand2(5.4, 1): n == floor(n) is not TRUE
round_rand2(5, "1")
## Error in round_rand2(5, "1"): is.numeric(d) is not TRUE

This implements the strictest test for “a single positive integer” possible. In the case of any violation of the underlying condition, we get a very informative error message.

Example 9.9

At other times, we might be interested in argument checking like:

if (!is.numeric(n))
    n <- as.numeric(n)
if (length(n) > 1) {
    warning("only the first element will be used")
    n <- n[1]
}
n <- floor(n)
stopifnot(is.finite(n), n > 0)

This way, "4" and c(4.9, 100) will both be accepted as 4[4].

We see that there is always a tension between being generous/flexible and precise/restrictive. Also, for some functions, it will be better to behave differently from the others because of their particular use cases. Too much uniformity is as bad as chaos. Overall, we should rely on common sense, but add some lightweight foolproof mechanisms.

It is our duty to be explicit about all the assumptions we make or exceptions we allow (by writing good documentation; see Section 9.3.3).

We will revisit this topic in Section 10.4.

Note

Example projects related to improving the consistency of base R's handling of arguments in different domains include the vctrs and stringx packages[5]. Can these contributions be justified?

Exercise 9.10

Reflect on how you would handle the following scenarios (and how base R and other packages or languages you know deal with them):

  • a vectorised mathematical function (empty vectors? non-numeric inputs? what if it is equipped with the names attribute? what if it has other ones?);

  • an aggregation function (what about missing values? empty vectors?);

  • a function vectorised with regard to two arguments (elementwise vectorisation? recycling rule? only scalar vs vector or vector vs vector of the same length allowed? what if one argument is a row vector and the other is a column vector?);

  • a function vectorised with regard to all arguments (really all? maybe some exceptions are necessary?);

  • a function vectorised with respect to the first argument but not the second (why such a restriction? when?).

Find a few functions that match each case.
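
As a point of reference for the vectorisation-related scenarios, recall how binary arithmetic operators implement elementwise vectorisation combined with the recycling rule:

c(1, 10, 100, 1000) * c(-1, 1)  # the shorter operand is recycled
## [1]    -1    10  -100  1000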

9.2.2. Putting Outputs into Context

The functions we write do not exist in a vacuum. We should put them into a much wider context: how are they going to be used when combined with other tools?

As a general rule, our functions should generate outputs of predictable kind, so that when we write and read the code chunks that utilise them, we can easily deduce what is going to happen.

Example 9.11

Some base R functions do not adhere to this rule for the sake of (questionable) users' convenience. We will meet a few of them in Chapter 11 and Chapter 12. In particular, sapply and the underlying simplify2array can return a list, an atomic vector, or a matrix.

simplify2array(list(1, 3:4))    # list
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 3 4
simplify2array(list(1, 3))      # vector
## [1] 1 3
simplify2array(list(1:2, 3:4))  # matrix
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

Further, the index operator with drop=TRUE, which is the default, may output an atomic vector. But it may just as well yield a matrix or a data frame.

(A <- matrix(1:6, nrow=3))  # an example matrix
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
A[1, ]    # vector
## [1] 1 4
A[1:2, ]  # matrix
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
A[1, , drop=FALSE]  # matrix with 1 row
##      [,1] [,2]
## [1,]    1    4

We proclaim that a function's default behaviour should be to return an object of the most generic kind possible (where other options exist), and then either to provide a further argument which must be set explicitly if we really wish to simplify the output, or to ask the user to call a simplifier explicitly.

In the latter case, the simplifier should probably fail by issuing an error if it is unable to neaten the object, or at least apply some brute-force solution (e.g., "fill the gaps" somehow itself, possibly with a warning).

Example 9.12

For instance:

as.numeric(A[1:2, ])  # always returns a vector
## [1] 1 2 4 5
stringi::stri_list2matrix(list(1, 3:4))  # fills the gaps with NAs
##      [,1] [,2]
## [1,] "1"  "3" 
## [2,] NA   "4"

Ideally, a function should perform one (and only one) well-defined task. If a function tends to generate objects of different kinds, depending on the arguments provided, maybe it is better to write two functions instead?

Exercise 9.13

Functions such as rep, seq, and sample do not perform a single task. Or do they?

Note

(*) In a purely functional programming language, we can assume the so-called referential transparency: a call to a pure function can always safely be replaced with the value it is supposed to generate. If this is true, then for the same set of argument values, the output is always the same. Furthermore, there are no side effects. In R, it is not really the case:

  • a call can introduce/modify/delete the variables in other environments (see sec:to-do), e.g., the state of the random number generator,

  • metaprogramming and lazy evaluation techniques leave functions free to interpret the argument forms (not only their values) however they like (see sec:to-do),

  • printing, plotting, file reading, database access have obvious consequences with regard to the state of some external resources.

Important

Each function must return some value, but there are several instances (e.g., plotting, printing) where this does not make sense.

In such a case, we should consider returning invisible(NULL), a NULL whose first printing will be suppressed.

Compare the following:

(function() NULL)()  # anonymous function, called instantly
## NULL
(function() invisible(NULL))()  # printing suppressed
x <- (function() invisible(NULL))()
print(x)  # no longer invisible
## NULL

Take a look at the return value of the built-in cat.
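
A quick check confirms that cat returns NULL, invisibly:

y <- cat("two\n")
## two
print(y)  # cat returns NULL, invisibly
## NULL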

9.3. Organising and Maintaining Functions

9.3.1. Function Libraries

Definitions of frequently-used functions or datasets can be emplaced in separate source files (.R extension) for further reference.

Such libraries can be executed by calling:

source("path_to_file.R")

Exercise 9.14

Create a source file (script) named mylib.R, where you define a function called nlargest which returns a few largest elements in a given atomic vector.

From within another script, call source("mylib.R") (note that relative paths refer to the current working directory; compare Section 2.1.6), and then write a few lines of code where you test nlargest on some example inputs.

9.3.2. Writing R Packages

When a function library grows substantially, or when there is a need for equipping it with the relevant manual pages[6] (Section 9.3.3) or compiled code (Chapter 14), turning it into our own R package (Section 7.3.1) might be a good idea (even if only for our own or our small team's use).

A source package is merely a directory containing some special files and subdirectories:

  • DESCRIPTION – a text file that gives the name of the package, its version, authors, dependencies upon other packages, license, etc.; see Section 1.1.1 of [47];

  • NAMESPACE – a text file containing directives stating which objects are to be exported so that they are available to the package users, and which names are to be imported from other packages;

  • R – a directory with R scripts (.R files), which define, e.g., functions, example datasets, etc.;

  • man – a directory with R documentation files (.Rd), describing at least all the exported objects; see Section 9.3.3;

  • src – optional; compiled code, see Chapter 14;

  • tests – optional; tests to run on the package check, see Section 9.3.4.2;

see Section 1.5 of Writing R Extensions [47] for more details and other options: there is no need for us to repeat the information from the official manual as everyone can read it themselves.
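
For illustration, a minimal DESCRIPTION of a hypothetical package named mypkg could look as follows (all field values below are merely placeholders of ours):

Package: mypkg
Version: 0.1.0
Title: A Few Functions Developed Whilst Learning R
Description: Provides a couple of utility functions created
    as a programming exercise.
Author: My Name
Maintainer: My Name <my.name@example.org>
License: GPL (>= 2)

The accompanying NAMESPACE file could then consist of a single directive:

export(nlargest)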

Important

A source package can be built and installed from within an R session by calling:

install.packages("pkg_directory", repos=NULL, type="source")

Then it can be used like any other R package (Section 9.3.3). In particular, it can be loaded and attached via a call to:

library("pkg")

This makes all the objects listed in its NAMESPACE file available to the user.

Exercise 9.15

Create your own package mypkg featuring some of the solutions to the exercises you have solved whilst studying the material in the previous chapters. When in doubt, refer to the official manual [47].

Important

Note that you do not have to publish your package on CRAN[7]. Many users are tempted to submit whatever they have been tinkering around with for a while. Have mercy on the busy CRAN maintainers and do not contribute to the information overload, unless you have come up with something potentially useful for other R users (make it less about you, and more about the community; thank you in advance). R packages can always be hosted on and installed from, for instance, GitLab or GitHub.

Note

(*) The building and installing of packages can also be done from the command line:

R CMD build pkg_directory
R CMD INSTALL --build pkg_directory

Also, some users could potentially benefit from creating their own Makefiles to help automate the processes of building, testing, checking, etc.

9.3.3. Documenting R Packages

Documenting functions and commenting code thoroughly is critical, even if we just write for ourselves. Most programmers will sooner or later note that they find it hard to determine what a piece of code is doing after taking a break from it. In some sense, we always write for an external audience, which includes our future selves.

The help system is one of the stronger assets of the R environment. By now, we should have interacted with many manual pages and got a good idea of what constitutes an informative documentation piece.

From the technical side, R Documentation (.Rd) files should be emplaced in the man subdirectory of a source package. All exported objects (e.g., functions) should be described clearly. Additional topics can be covered too.

During the package's installation, the .Rd files are converted to various output formats, e.g., HTML or plain text, and displayed upon a call to the well-known help function.

Documentation files use a LaTeX-like syntax, which looks quite obscure to an untrained eye. The relevant commands are explained in detail in Section 2 of Writing R Extensions [47].

Note

The process of writing .Rd files by hand might be quite tedious, especially keeping track of the changes to the \usage and \arguments commands. Rarely do we recommend the use of third-party packages, because base R facilities are usually good enough, but roxygen2 might be worth a try, because it really makes the developers’ lives easier. Most importantly, it allows for documentation to be specified alongside the functions’ definitions, which is much more natural.
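
For example, here is a sketch of a roxygen2-style block (the #' comments precede the function being documented; nlargest is our running example, and the corresponding .Rd file can be generated by calling, e.g., roxygen2::roxygenise on the package directory):

#' Get the Largest Elements of a Vector
#'
#' @param x a numeric vector.
#' @param n the number of elements to extract.
#'
#' @return A numeric vector with (at most) the n largest elements of x.
#'
#' @export
nlargest <- function(x, n=5)
    x[order(x, decreasing=TRUE)[seq_len(min(n, length(x)))]]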

Exercise 9.16

Add a few manual pages to your example R package.

9.3.4. Assuring Quality Code

Below we mention some good development practices related to maintaining quality code. This is an important topic, but writing about it is tedious to the same extent that reading about it is boring, because it is the more artistic part of software engineering. After all, these are heuristics that are best learnt by observing and mimicking what others are doing (hence, the exercises below will encourage us to do so).

9.3.4.1. Managing Changes and Working Collaboratively

It is a good idea to employ some source code version control system such as git to keep track of the changes made to the software.

Note

It is worth investing some time and effort to learn how to use git from the command line; see https://git-scm.com/doc.

There are a few hosting providers for git repositories, with GitLab and GitHub being popular choices amongst open-source software developers.

Not only do they support working collaboratively on projects, but they are also equipped with additional tools for reporting bugs, suggesting feature requests, etc.

Exercise 9.17

Find where the source code of some of your favourite R packages or other open-source projects is hosted. Explore the corresponding repositories, feature trackers, wikis, discussion boards, etc. Note that each community is different and is governed by different guidelines: after all, we are from all over the world.

9.3.4.2. Test-driven Development and Continuous Integration

It is often hygienic to apply some principles of test-driven development when writing our own functions.

Exercise 9.18

Assume that, for some reason, we were asked to write a function to compute the root mean square (quadratic mean) of a given numeric vector. Before implementing the actual routine, it is a good idea to reflect upon what we want to achieve, especially how we want our function to behave in certain boundary cases.

stopifnot gives a simple means to assure that a given assertion is fulfilled. If that is the case, it will move forward quietly.

Let us say we have come up with the following set of expectations:

stopifnot(all.equal(rms(1), 1))
stopifnot(all.equal(rms(1:100), 58.16786054171151931769))
stopifnot(all.equal(rms(rep(pi, 10)), pi))
stopifnot(all.equal(rms(numeric(0)), 0))

Write a function rms that fulfils the above assertions.

Exercise 9.19

Implement your own version of the sample function (assuming replace=TRUE), using calls to runif. However, start by writing a few unit tests.

There are also a couple of R packages that support the writing and executing of unit tests, including testthat, tinytest (a lighter-weight version of the former), RUnit, and realtest. However, in the most typical use cases, relying on stopifnot is powerful enough.

Exercise 9.20

(*) Consult the Writing R Extensions manual [47] about where and how to include unit tests in your example package.

Note

(*) R includes a built-in mechanism to check a couple of code quality areas: running R CMD check pkg_directory from the command line (preferably using the most recent version of R) can suggest a number of improvements.

Also, it is possible to use various continuous integration techniques that are automatically triggered when pushing changes to our software repositories; see GitLab CI or GitHub Actions. For instance, it is possible to run a package build, install, and check process upon every git commit. Also, CRAN features some continuous integration services, including checking the package on a range of different platforms.

9.3.4.3. Debugging

For all his life, the current author has been debugging all his programs mostly by manually printing the state of suspected variables (printf and the like) in different areas of the code. No shame in that.

For an interactive debugger, see the browser function. Also, refer to Section 9 of [51] for more details.

Some IDEs (e.g., RStudio) support this feature too; see their corresponding documentation.

9.3.4.4. Profiling

Typically, a program spends a relatively long time executing only a small portion of its code. The Rprof function can be a helpful tool for identifying which chunks might need a rewrite, for instance, using a compiled language (Chapter 14).

Please remember, though, that it is not only implementations of algorithms with high computational complexity that can form a bottleneck, but also data input and output (such as reading files from disk, printing messages on the console, querying Web APIs, etc.).
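
A minimal sketch of a profiling session (slow_fun stands for a hypothetical function under scrutiny):

Rprof(prof_file <- tempfile())        # start collecting stack samples
slow_fun()                            # the code to profile
Rprof(NULL)                           # stop the profiler
summaryRprof(prof_file)[["by.self"]]  # which functions used the most time?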

9.4. Special Functions: Syntactic Sugar

Some functions, such as `*`, are somewhat special. They can be referred to using an alternative syntax which, for some reason, most of us have accepted as the default one. Below we will reveal, amongst other things, that "5 * 9" in fact reduces to an ordinary function call:

`*`(5, 9)  # a call to `*` with 2 arguments, equivalent to 5 * 9
## [1] 45

9.4.1. A Note on Backticks

In Section 2.2, we have mentioned that we can assign (as in `<-`) syntactically valid names to our objects. Most identifiers comprised of letters, digits, dots, and underscores can be used directly in R code.

However, it is possible to label our objects however we like: non-syntactically valid (nonstandard) identifiers just need to be enclosed in backticks (back quotes, grave accents):

`42 a quite peculiar name :O lollolll` <- c(a=1, `b c`=2, `42`=3, `!`=4)
1/(1+exp(-`42 a quite peculiar name :O lollolll`))
##       a     b c      42       ! 
## 0.73106 0.88080 0.95257 0.98201

Of course, such names are less convenient, but still: backticks let us refer to them in any context.

9.4.2. Dollar, `$` (*)

The dollar operator, `$`, can be used as an alternative accessor to a single element in a named list[8].

If label is a syntactically valid name, then x$label does the same job as x[["label"]] (saving five keystrokes: such a burden!).

x <- list(spam="a", eggs="b", `eggs and spam`="c", best.spam.ever="d")
x$eggs
## [1] "b"
x$best.spam.ever  # note that a dot has no special meaning in most contexts
## [1] "d"

Nonstandard names must still be enclosed in backticks:

x$`eggs and spam`  # x[["eggs and spam"]] is okay as usual
## [1] "c"

We are minimalist-by-design here, hence we will tend to avoid this operator, as it does not really increase the expressive power of our function repertoire. Also, it works neither on atomic vectors nor on matrices.

Furthermore, it does not work with names that are generated programmatically:

what <- "spam"
x$what # the same as x[["what"]] – we don't want this
## NULL
x[[what]]  # works fine
## [1] "a"

The support for the partial matching of element names has been added to provide the users working in quick-and-dirty, interactive programming sessions with some relief in case they find the typing of the whole label extremely problematic:

x$s  # x[["s"]] would return NULL; you will get no warning here!
## Warning in x$s: partial match of 's' to 'spam'
## [1] "a"

It is generally a bad programming practice, because the result depends on the names of the other items in x (which might change later) and can decrease code readability. The only reason we obtained a warning message is that this book enforces the options(warnPartialMatchDollar=TRUE) setting, which – sadly – is not the default.

However, note the behaviour on ambiguous partial matches:

x$egg  # ambiguous resolution
## NULL

9.4.3. Curly Braces, `{`

A block of statements grouped with curly braces, `{`, corresponds to a function call. When we write:

{
    print(TRUE)
    cat("two")
    3
}
## [1] TRUE
## two
## [1] 3

The parser translates it to a call to:

`{`(print(TRUE), cat("two"), 3)
## [1] TRUE
## two
## [1] 3

When the above is executed, every argument is evaluated, one by one, and then the value of the last one is returned as the result of that call.

9.4.4. `if`

if is a function too; as mentioned in Section 8.1, it returns the value corresponding to the expression evaluated conditionally. Hence, we may write:

if (runif(1) < 0.5) "head" else "tail"
## [1] "head"

but also:

`if`(runif(1) < 0.5, "head", "tail")
## [1] "head"

Note

A call like `if`(test, what_if_true, what_if_false) can only work properly because of the lazy evaluation of function arguments; see Section 9.5.5.

On a side note, while, for, and repeat can also be called this way, but they return invisible(NULL).

9.4.5. Operators are Functions Too

9.4.5.1. Calling Built-in Operators as Functions

Every arithmetic, logical, and comparison operator is translated to a call to the corresponding function. For instance:

`<`(`+`(2, `*`(`-`(3), 4)), 5)  # 2+(-3)*4 < 5
## [1] TRUE

Also, x[i] is equivalent to `[`(x, i) and x[[i]] maps to `[[`(x, i).

Knowing this will not only enable us to manipulate unevaluated R code (Chapter 15) and access the corresponding manual pages (see, e.g., help("[")), but also to write some expressions in a more concise manner. For instance,

x <- list(1:5, 11:17, 21:23)
unlist(Map(`[`, x, 1))  # 1 is a further argument passed to `[`
## [1]  1 11 21

is equivalent to a call to Map(function(e) e[1], x).

Note

Unsurprisingly, the assignment operator, `<-`, is a function too. It returns the assigned value, invisibly.

Knowing that `<-` binds right to left (compare help("Syntax")), we can see why the expression "a <- b <- 1" results in both a and b being assigned 1: it is equivalent to "`<-`("a", `<-`("b", 1))", and "`<-`("b", 1)" returns 1.

Owing to the pass-by-value semantics (Section 9.5.1), we can also expect that we will always be assigning a copy of the value on the right-hand side (with the exception of environments; Chapter 16).

x <- 1:6
y <- x  # makes a copy (but delayed, on demand, for performance reasons)
y[c(TRUE, FALSE)] <- NA_real_  # modify every 2nd element
print(y)
## [1] NA  2 NA  4 NA  6
print(x)  # state of x has not changed — x and y are different objects
## [1] 1 2 3 4 5 6

This is especially worth pointing out to programmers coming from Python (amongst other languages), where the above assignment would mean that x and y both refer to the same (shared) object in the computer's memory.

However, with no harm done to semantics, the actual copying of x is postponed until absolutely necessary (Section 16.1.4). This is efficient both time- and memory-wise.

9.4.5.2. Creating Own Binary Operators

We can also introduce our own binary operators named like `%myopname%`:

`%:)%` <- function(e1, e2) (e1+e2)/2
5 %:)% 1:10
##  [1] 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5

Recall that `%%` and `%/%` are built-in operators denoting division remainder and integer division. Rarely will we be defining our own operators, but when we encounter a similar one next time, we will no longer be surprised. For instance, in Chapter 11 we will learn about `%*%` which implements matrix multiplication.

Note

In Chapter 10, we will note that most existing operators can be overloaded for objects of different types.

9.4.6. Replacement Functions

Functions generally do not change the state of their arguments. However, there is some syntactic sugar that allows us to replace objects or parts thereof with new content. We call these replacement functions.

For instance, three of the following calls replace the input x with its modified version:

x <- 1:5  # example input
x[3] <- 0  # replace the 3rd element with 0
length(x) <- 7  # "replace" length
names(x) <- LETTERS[seq_along(x)]  # replace the names attribute
print(x)  # x is different than before
##  A  B  C  D  E  F  G 
##  1  2  0  4  5 NA NA

9.4.6.1. Creating Own Replacement Functions

A replacement function is a mapping named like `name<-` with at least two parameters:

  • x (the object to be modified),

  • ... (possible further arguments),

  • value (as the last parameter; the object on the righthand side of the `<-` operator).

Most often, we will be interacting with existing replacement functions, not creating our own ones. However, knowing how to do the latter is key to understanding this language feature.

For example:

`add<-` <- function(x, where=TRUE, value)
{
    x[where] <- x[where] + value
    x  # the modified object that will replace the original one
}

The above aims to add some value to a subset of the input vector x (by default, to each element therein) and return its altered version that will replace the object it has been called upon.

y <- 1:5           # example vector
add(y) <- 10       # calls `add<-`(y, value=10)
print(y)           # y has changed
## [1] 11 12 13 14 15
add(y, 3) <- 1000  # calls `add<-`(y, 3, value=1000)
print(y)           # y has changed again
## [1]   11   12 1013   14   15

We see that calling “add(y, w) <- v” works as if we have called “y <- `add<-`(y, w, value=v)”.

Note

(*) According to [51], a call "add(y, 3) <- 1000" is syntactic sugar precisely for:

`*tmp*` <- y  # temporary substitution
y <- `add<-`(`*tmp*`, 3, value=1000)
rm("*tmp*")  # remove the named object from the current scope

This has at least two implications. First, in the unlikely event that a variable `*tmp*` existed before the call to the replacement function, it will be no more, it will cease to be. It will be an ex-variable. Second, the temporary substitution guarantees that y must exist before the call (a function’s body does not have to refer to all the arguments passed; because of lazy evaluation, see Section 9.5.5, we could get away with it otherwise).

9.4.6.2. Substituting Parts of Vectors

The replacement versions of the subsetting operators are named as follows:

  • `[<-` is used in substitutions like “x[i] <- value”,

  • `[[<-` is called when we perform “x[[i]] <- value”,

  • `$<-` is used whilst calling “x$i <- value”.

Here is a use case:

x <- 1:5
`[<-`(x, c(3, 5), NA_real_)  # returns a new object
## [1]  1  2 NA  4 NA
print(x)  # does not change the original input
## [1] 1 2 3 4 5

On a side note, `length<-` can be used to expand or shorten a given vector by calling "length(x) <- new_length"; see also Section 5.3.3.

x <- 1:5
x[7] <- 7
length(x) <- 10
print(x)
##  [1]  1  2  3  4  5 NA  7 NA NA NA
length(x) <- 3
print(x)
## [1] 1 2 3

Despite the fact that, semantically speaking, calling `[<-` results in the creation of a new vector (a modified version of the original one), we may luckily expect some performance optimisations happening behind our back (reference counting, modification in-place; see sec:to-do).

Exercise 9.21

Write a function `extend<-` which pushes new elements to the back of a given vector, modifying it in place.

`extend<-` <- function(x, value) ...to.do...

Example use:

x <- 1
extend(x) <- 2     # push 2 at the back
extend(x) <- 3:10  # add 3, 4, ..., 10
print(x)
##  [1]  1  2  3  4  5  6  7  8  9 10

9.4.6.3. Replacing Attributes

Many replacement functions deal with the re-setting of objects’ attributes (Section 4.4).

In particular, for each special attribute, there is also its replacement version, e.g., `names<-`, `class<-`, `dim<-`, `levels<-`, etc.

x <- 1:3
names(x) <- c("a", "b", "c")  # change the `names` attribute
print(x)  # x has been altered
## a b c 
## 1 2 3

Individual (arbitrary, including non-special ones) attributes can be set using `attr<-` and all of them can be established by means of a single call to `attributes<-`.

x <- "spam"
attributes(x) <- list(shape="oval", smell="meaty")
attributes(x) <- c(attributes(x), taste="umami")
attr(x, "colour") <- "rose"
print(x)
## [1] "spam"
## attr(,"shape")
## [1] "oval"
## attr(,"smell")
## [1] "meaty"
## attr(,"taste")
## [1] "umami"
## attr(,"colour")
## [1] "rose"

Also note that setting an attribute to NULL results, by convention, in its removal:

attr(x, "taste") <- NULL  # this is tasteless now
print(x)
## [1] "spam"
## attr(,"shape")
## [1] "oval"
## attr(,"smell")
## [1] "meaty"
## attr(,"colour")
## [1] "rose"
attributes(x) <- NULL  # remove all
print(x)
## [1] "spam"

This can be useful in contexts such as:

x <- structure(c(a=1, b=2, c=3), some_attrib="value")
y <- `attributes<-`(x, NULL)

Here, x retains its attributes and y is a version of x with metadata removed.
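
Indeed:

print(y)
## [1] 1 2 3
print(x)  # unchanged
## a b c 
## 1 2 3 
## attr(,"some_attrib")
## [1] "value"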

9.4.6.4. Compositions of Replacement Functions

Updating only selected names like:

x <- c(a=1, b=2, c=3)
names(x)[2] <- "spam"
print(x)
##    a spam    c 
##    1    2    3

is possible due to the fact that “names(x)[i] <- v” is equivalent to:

old_names <- names(x)
new_names <- `[<-`(old_names, i, value=v)
x <- `names<-`(x, value=new_names)

Important

More generally, a composition of replacement calls “g(f(x, a), b) <- y” yields a result equivalent to “x <- `f<-`(x, a, value=`g<-`(f(x, a), b, value=y))”. Note that both f and `f<-` need to be defined, but having g is not necessary.

Exercise 9.22

(*) What is “h(g(f(x, a), b), c) <- y” equivalent to?

Exercise 9.23

Write a (actually very useful!) function `recode<-` which replaces specific elements in a character vector with some other ones, allowing the following interface:

`recode<-` <- function(x, value) ...to.do...
x <- c("spam", "bacon", "eggs", "spam", "eggs")
recode(x) <- c(eggs="best spam", bacon="yummy spam")
print(x)
## [1] "spam"       "yummy spam" "best spam"  "spam"       "best spam"

We see that the named character vector gives a few from="to" pairs, e.g., all eggs are to be replaced by best spam.

Now, determine which calls are equivalent to the following:

x <- c(a=1, b=2, c=3)
recode(names(x)) <- c(c="z", b="y")  # or equivalently = ... ?
print(x)
## a y z 
## 1 2 3
y <- list(c("spam", "bacon", "spam"), c("spam", "eggs", "cauliflower"))
recode(y[[2]]) <- c(cauliflower="broccoli")  # or = ... ?
print(y)
## [[1]]
## [1] "spam"  "bacon" "spam" 
## 
## [[2]]
## [1] "spam"     "eggs"     "broccoli"

Exercise 9.24

(*) Consider the `recode<-` function from the previous exercise.

Here is an example matrix with the dimnames attribute whose names attribute is set (more details in Chapter 11):

(x <- Titanic["Crew", "Male", , ])
##        Survived
## Age      No Yes
##   Child   0   0
##   Adult 670 192
recode(names(dimnames(x))) <- c(Age="age", Survived="survived")
print(x)
##        survived
## age      No Yes
##   Child   0   0
##   Adult 670 192

This changes the x object. For each of the following subtasks, write a single call which alters names(dimnames(x)) without modifying x in place, returning a recoded copy of:

  • names(dimnames(x)),

  • dimnames(x),

  • x.

Exercise 9.25

(*) Consider the `recode<-` function once again but now let an example object be a data frame featuring a column of class factor:

x <- iris[c(1, 2, 51, 101), ]
recode(levels(x[["Species"]])) <- c(
    setosa="SET", versicolor="VER", virginica="VIR"
)
print(x)
##     Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1            5.1         3.5          1.4         0.2     SET
## 2            4.9         3.0          1.4         0.2     SET
## 51           7.0         3.2          4.7         1.4     VER
## 101          6.3         3.3          6.0         2.5     VIR

Without modifying x in place, how do we change levels(x[["Species"]]) and return an altered copy of:

  • levels(x[["Species"]]),

  • x[["Species"]],

  • x?

9.5. Arguments and Local Variables

9.5.1. Pass by “Value”

As a general rule, functions cannot change the state of their arguments[9]. We can think of them as being passed by value, i.e., as if a copy of each was made.

test_change <- function(y)
{
    y[1] <- 7
    y
}

x <- 1:5
test_change(x)
## [1] 7 2 3 4 5
print(x)  # same
## [1] 1 2 3 4 5

If the above was not the case, the state of x would have been changed after the call.

9.5.2. Variable Scope

Function arguments as well as any other variables we create inside a function’s body are relative to each call to that function.

test_change <- function(x)
{
    x <- x+1
    z <- -x
    z
}

x <- 1:5
test_change(x*10)
## [1] -11 -21 -31 -41 -51
print(x)  # x in the function's body was a different x
## [1] 1 2 3 4 5
print(z)  # z was local
## Error in print(z): object 'z' not found

Both x and z are local variables and live only whilst our function is being executed. The former temporarily “overshadows”[10] the object of the same name from the caller’s context.

Important

It is a good development practice to refrain from referring to objects not created within the current function, especially to “global” variables. We can always pass an object as an argument explicitly.

Note

It is a function call as such, not curly braces per se, that forms a local scope.

When we write "x <- { y <- 1; y + 1 }", y is not an auxiliary variable; it is an ordinary named object created alongside x.

On the other hand, in “x <- (function() { z <- 1; z + 1 })()”, z will not be available thereafter.

9.5.3. Closures (*)

Most user-defined functions are in fact representatives of the so-called closures; see sec:to-do and [1]. They not only consist of an R expression to evaluate, but also can carry some auxiliary data.

For instance, given two equal-length numeric vectors x and y, a call to approxfun(x, y) returns a function that linearly interpolates between the consecutive points \((x_1, y_1)\), \((x_2, y_2)\), and so forth, so that a corresponding \(y\) can be determined for any \(x\).

x <- seq(0, 1, length.out=11)
f1 <- approxfun(x, x^2)
f2 <- approxfun(x, x^3)
f1(0.75)  # check that it is quite close to the true 0.75^2
## [1] 0.565
f2(0.75)  # compare with 0.75^3
## [1] 0.4275

Inspecting, however, the source codes of the above functions:

print(f1)
## function (v) 
## .approxfun(x, y, v, method, yleft, yright, f, na.rm)
## <bytecode: 0x55637b8dddf0>
## <environment: 0x55637b8de800>
print(f2)
## function (v) 
## .approxfun(x, y, v, method, yleft, yright, f, na.rm)
## <bytecode: 0x55637b8dddf0>
## <environment: 0x55637b468378>

we might wonder how they can produce different results: they are evidently identical. It turns out, however, that they internally store some additional data that is referred to upon their calls:

environment(f1)[["y"]]
##  [1] 0.00 0.01 0.04 0.09 0.16 0.25 0.36 0.49 0.64 0.81 1.00
environment(f2)[["y"]]
##  [1] 0.000 0.001 0.008 0.027 0.064 0.125 0.216 0.343 0.512 0.729 1.000
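
We can create closures of our own, too. Here is a sketch (make_power is an illustrative name of ours):

make_power <- function(p)
    function(x) x^p  # the returned function retains access to `p`

square <- make_power(2)
cube <- make_power(3)
square(4)
## [1] 16
cube(4)
## [1] 64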

This and much more will be explored in great detail in the third part of this book.

9.5.4. Default Arguments

We have already mentioned above that, when designing functions that perform complex tasks, we will sometimes be faced with a design problem: how to find a sweet spot between being generous/mindful of the diverse needs of our users and making the API neither overwhelming nor oversimplistic.

We know that it is best if a function performs one well-specified task, but also allows its behaviour to be tuned if one wishes to do so. This principle can be facilitated by the use of default arguments.

For instance, log computes logarithms, by default the natural ones.

log(2.718)  # the same as log(2.718, base=exp(1)) — default base
## [1] 0.9999
log(4, base=2)  # different base
## [1] 2

Exercise 9.26

Study the documentation of the following functions and note the default values that they define: round, hist, grep, and download.file.

We can easily define our own functions equipped with such recommended settings:

test_default <- function(x=1) x

test_default()   # use default
## [1] 1
test_default(2)  # use something else
## [1] 2

Most often, default arguments are just constants, e.g., 1. However, they can be any R expressions, including references to other arguments passed to the same function; see more in Section 16.4.1.
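
For example, here is a sketch where the defaults refer to another argument (standardise is an illustrative function of ours; this works thanks to lazy evaluation, Section 9.5.5):

standardise <- function(x, m=mean(x), s=sd(x))
    (x-m)/s

standardise(c(1, 2, 3))  # m=2 and s=1 are computed from x
## [1] -1  0  1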

Note that default arguments will most often appear at the end of the parameter list, but see Section 9.4.6 (on replacement functions) for a well-justified exception.

9.5.5. Lazy Evaluation

In some languages, function arguments are always evaluated prior to a call. In R, though, they are only computed when actually needed. We call this lazy or delayed evaluation. Recall that in Section 8.1.4, we introduced the short-circuit evaluation operators `||` (or) and `&&` (and). They are able to do their job precisely thanks to this mechanism.

Example 9.27

In the following example, we do not use the function’s argument at all:

lazy_test1 <- function(x) 1  # x not used at all

lazy_test1({cat("and now for something completely different!"); 7})
## [1] 1

Otherwise, we would see a message being printed out on the console.

Example 9.28

Next, let us use x amidst other expressions in the body:

lazy_test2 <- function(x)
{
    cat("it's... ")
    y <- x+x  # using x twice
    cat(" a man with two noses")
    y
}

lazy_test2({cat("and now for something completely different!"); 7})
## it's... and now for something completely different! a man with two noses
## [1] 14

Note that an argument is evaluated once, and its value is stored for further reference. If that were not the case, we would see the and now... message printed twice.

9.5.6. Ellipsis, `...`

Let us start with an exercise.

Exercise 9.29

Note the presence of `...` in the parameter list of c, list, structure, cbind, rbind, cat, Map (and the underlying mapply), lapply (a specialised version of Map), optimise, optim, uniroot, integrate, outer, and aggregate. What purpose does it serve, according to these functions' manual pages?

We can create a variadic function by placing a dot-dot-dot (ellipsis; see help("dots")), `...`, somewhere in its parameter list. The ellipsis serves as a placeholder for all objects passed to the function but not matched by any formal (named) parameters.

The easiest way to process arguments passed via `...` programmatically (see also Section 16.4.4) is by redirecting them to list.

test_dots <- function(...)
    list(...)

test_dots(1, a=2)
## [[1]]
## [1] 1
## 
## $a
## [1] 2

Such a list can be processed just like… any other R list. What we can do with these arguments is only limited by our creativity (in particular, recall from Section 7.2.2 the very powerful do.call function). Still, there are two major use cases of the ellipsis[11]:

  • create a new object by combining an arbitrary number of other objects:

    c(1, 2, 3)   # 3 arguments
    ## [1] 1 2 3
    c(1:5, 6:7)  # 2 arguments
    ## [1] 1 2 3 4 5 6 7
    structure("spam")  # 0 additional arguments
    ## [1] "spam"
    structure("spam", color="rose", taste="umami")  # 2 further arguments
    ## [1] "spam"
    ## attr(,"color")
    ## [1] "rose"
    ## attr(,"taste")
    ## [1] "umami"
    cbind(1:2, 3:4)
    ##      [,1] [,2]
    ## [1,]    1    3
    ## [2,]    2    4
    cbind(1:2, 3:4, 5:6, 7:8)
    ##      [,1] [,2] [,3] [,4]
    ## [1,]    1    3    5    7
    ## [2,]    2    4    6    8
    sum(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 42)
    ## [1] 108
    
  • pass further arguments (as-is) to other methods:

    lapply(list(c(1, NA, 3), 4:9), mean, na.rm=TRUE)  # mean(x, na.rm=TRUE)
    ## [[1]]
    ## [1] 2
    ## 
    ## [[2]]
    ## [1] 6.5
    integrate(dbeta, 0, 1,
        shape1=2.5, shape2=0.5)  # dbeta(x, shape1=2.5, shape2=0.5)
    ## 1 with absolute error < 1.2e-05
    
Example 9.30

The documentation of lapply (let us call help("lapply") now) states that this function is defined as lapply(X, FUN, ...). Here, the ellipsis is a placeholder for a number of optional arguments that can be passed to FUN. Hence, if we denote the i-th element of a vector X by X[[i]], calling lapply(X, FUN, ...) will return a list whose i-th element will be equal to FUN(X[[i]], ...).

Exercise 9.31

Using a single call to lapply, generate a list with three numeric vectors of lengths 3, 9, and 7, respectively, drawn from the uniform distribution on the unit interval. Then, upgrade your code to get numbers sampled from the interval \([-1, 1]\).

9.5.7. Metaprogramming (*)

Under the hood, lazy evaluation is a quite complicated mechanism that relies upon the storing of unevaluated R expressions and special promises to instantiate them[12].

It turns out that we have access to such expressions programmatically. In particular, a call to the composition of deparse and substitute can convert them to a character vector:

test_deparse_substitute <- function(x)
    deparse(substitute(x))

test_deparse_substitute(testing+1+2+3)
## [1] "testing + 1 + 2 + 3"
test_deparse_substitute(spam & spam^2 & bacon | grilled(spam))
## [1] "spam & spam^2 & bacon | grilled(spam)"

Exercise 9.32

Check out the y-axis label generated by plot.default((1:100)^2). Inspect its source code and note a call to the two aforementioned functions.

Similarly, call shapiro.test(log(rlnorm(100))) and take note of the data: field.

A function is free to do with such an expression whatever it likes. For instance, it can manipulate it and evaluate it in a different context. Thanks to this language feature, certain operations can be expressed much more compactly by their users. This is certainly (in theory) a very powerful tool, but in practice we know of many instances where it was over- or misused, making R confusing to use.
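
For instance, here is a sketch of a function that evaluates the expression passed in a context of our own choice (in_context is a made-up name):

in_context <- function(expr, ctx)
    eval(substitute(expr), ctx)

in_context(spam / bacon, list(spam=84, bacon=2))  # no such variables here!
## [1] 42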

Example 9.33

(*) The built-in subset and transform use metaprogramming techniques to enable a compact specification of basic data frame transformations (see Section 12.3.9). For instance:

transform(
    subset(
        iris,
        Sepal.Length>=7.7 & Sepal.Width >= 3.0,
        select=c(Species, Sepal.Length:Sepal.Width)
    ),
    Sepal.Length.mm=Sepal.Length/10
)
##       Species Sepal.Length Sepal.Width Sepal.Length.mm
## 118 virginica          7.7         3.8            0.77
## 132 virginica          7.9         3.8            0.79
## 136 virginica          7.7         3.0            0.77

Note that none of the arguments – except iris – makes sense outside of these functions' call contexts. In particular, no variables named Sepal.Length or Sepal.Width exist in our workspace.

The two functions took the liberty of interpreting the arguments passed in whatever way they liked. They have created their own virtual reality within our well-defined world. The reader must refer to their documentation to discover the meaning of the special syntax used therein.

Note

(*) Some functions have rather peculiar default arguments. For instance, in the manual page of prop.test, we read that the alternative parameter defaults to c("two.sided", "less", "greater") but that "two.sided" is actually the default one.

If we call print(prop.test), we will find the code line responsible for this behaviour: “alternative <- match.arg(alternative)”. Consider the following example:

test_match_arg <- function(x=c("a", "b", "c")) match.arg(x)

test_match_arg()  # missing argument — choose 1st
## [1] "a"
test_match_arg("c")  # one of the predefined options
## [1] "c"
test_match_arg("d")  # unexpected setting
## Error in match.arg(x): 'arg' should be one of "a", "b", "c"

In this setting, match.arg only allows an actual argument from amongst the given set of choices, but it selects the first option if the argument is missing.

Unfortunately, we have to learn this behaviour by heart, because looking at the above source code gives us no clue about this being possible whatsoever. If such an expression were evaluated normally, we would either get the default argument or whatever the user passed as x (but then the function would not know the range of possible choices). A call to "match.arg(x, c("a", "b", "c"))" could guarantee the desired functionality and would be much more readable. Instead, metaprogramming techniques allowed match.arg to access the enclosing function's list of default arguments without our referring to them explicitly.

One may ask “why is it so” and the only sensible answer to this will be “because its programmer decided it must be this way”. Let us contemplate this for a while. In cases like this, we are dealing not with some base R language design choice that we might like or dislike, but which we should normally just accept as an inherent feature. Rather, we are struggling intellectually because of some R programmer’s (mis)use (in good faith…) of R’s flexibility itself. They have introduced a slang/dialect on top of our mother tongue, whose meaning is valid only within this function. Blame the middleman, not the environment, please.

We generally advocate for avoiding metaprogramming wherever possible (and will elaborate on this later on, including formulas (`~`), built-in functions like subset or transform, etc.).

9.6. Exercises

Exercise 9.34

Answer the following questions:

  • Will “stopifnot(1)” stop? What about “stopifnot(NA)”, “stopifnot(TRUE, FALSE)”, and “stopifnot(c(TRUE, TRUE, NA))”?

  • What does the `if` function return?

  • Does `attributes<-`(x, NULL) modify x?

  • When can we be interested in calling `[` and `[<-` as functions (and not as operators) directly?

  • How to define our own binary operator? Can it have some default arguments?

  • What are the main use cases of `...`?

  • What is wrong with transform, subset, and match.arg?

  • When does a call like "f(-1, do_something_that_takes_a_million_years())" not necessarily have to be a bad idea?

Exercise 9.35

What is the return value of a call to “f(list(1, 2, 3))”?

f <- function(x)
{
    for (e in x) {
        print(e)
    }
}

Is it NULL, invisible(NULL), x[[length(x)]], or invisible(x[[length(x)]])?

Exercise 9.36

The split function also has its replacement version. Study its documentation to learn how it works.

Exercise 9.37

A call to ls(envir=baseenv()) returns all objects defined in package base (see Chapter 16). List the names corresponding to some replacement functions.

Important

Apply the principle of test-driven development when solving the remaining exercises (including those which you have intentionally skipped).

Exercise 9.38

Implement your own version of the Position and Find functions. Evaluation should stop as soon as the first element fulfilling a given predicate has been found.

Exercise 9.39

Implement your own version of the Reduce function.

Exercise 9.40

Write a function slide(f, x, k, ...) which returns a list y of size length(x)-k+1 such that y[[i]] = f(x[i:(i+k-1)], ...).

unlist(slide(sum, 1:5, 1))
## [1] 1 2 3 4 5
unlist(slide(sum, 1:5, 3))
## [1]  6  9 12
unlist(slide(sum, 1:5, 5))
## [1] 15

Exercise 9.41

Using slide defined above, write another function that counts how many increasing pairs of numbers are featured in a given numeric vector. For instance, in c(0,2,1,1,0,1,6,0) there are three such pairs: (0,2), (0,1), (1,6).

Exercise 9.42

(*) Write your own version of tools::package_dependencies with reverse=TRUE based on information extracted by calling utils::available.packages.