4. Lists and attributes

The open-access textbook Deep R Programming by Marek Gagolewski is, and will remain, freely available for everyone’s enjoyment (also in PDF). It is a non-profit project. This book is still a work in progress. Beta versions of all chapters are already available (proofreading and copyediting pending). In the meantime, any bug/typos reports/fixes are appreciated. Although available online, this is a whole course. It should be read from the beginning to the end. Refer to the Preface for general introductory remarks. Also, check out my other book, Minimalist Data Wrangling with Python [26].

After two brain-teasing chapters, it is time to cool it down a little. In this more technical part, we will introduce lists, which serve as universal containers for R objects of any size and type. Moreover, we will also show that each R object can be equipped with a number of optional attributes, thanks to which we will not only be able to label elements in any vector but also introduce new complex data types such as matrices and data frames later.

4.1. Type hierarchy and conversion

So far, we have been dealing with three types of atomic vectors:

  1. logical (Chapter 3),

  2. numeric (Chapter 2),

  3. character (which we have barely touched upon yet, but rest assured that they will be covered in detail very soon; see Chapter 6).

To determine the type of an object programmatically, we can call the typeof function.

typeof(c(1, 2, 3))
## [1] "double"
typeof(c(TRUE, FALSE, TRUE, NA))
## [1] "logical"
typeof(c("spam", "spam", "bacon", "gluten-free spam"))
## [1] "character"

It turns out that we can easily convert between these types, either on our explicit demand (type casting) or on-the-fly (coercion, when we perform an operation that expects something different from the kind of input it was fed with).

Note

(*) Numeric vectors are reported as being either of the type double (double-precision floating-point numbers) or integer (32-bit; it is a subset of double); see Section 6.4.1. In most practical cases, this is a technical detail that we can risklessly ignore; compare also the mode function.
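To make the double/integer distinction tangible, here is a small sketch. The `L` suffix for integer constants and the is.numeric function are not introduced in the text above; we assume them here for illustration only.

```r
# Assumption: the L suffix and is.numeric are not covered in the text above.
t_double  <- typeof(c(1, 2, 3))  # numeric constants are doubles by default
t_colon   <- typeof(1:3)         # the colon operator yields integers
t_suffix  <- typeof(c(1L, 2L))   # the L suffix requests integer constants
m_integer <- mode(1:3)           # mode does not distinguish the two subtypes
m_double  <- mode(c(1, 2, 3))
both_numeric <- is.numeric(1:3) && is.numeric(c(1, 2, 3))
print(c(t_double, t_colon, t_suffix, m_integer, m_double))
print(both_numeric)
```

Both subtypes are reported as "numeric" by mode, which is why the difference rarely matters in day-to-day computations.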

4.1.1. Explicit type casting

We can use functions such as as.logical, as.numeric, and as.character to coerce (convert) given objects to the corresponding types.

as.numeric(c(TRUE, FALSE, NA, TRUE, NA, FALSE))
## [1]  1  0 NA  1 NA  0
as.logical(c(-2, -1, 0, 1, 2, 3, NA_real_, -Inf, NaN))
## [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE    NA  TRUE    NA

Important

It is easily seen that the rules are:

  • TRUE → 1,

  • FALSE → 0,

  • NA → NA_real_,

and:

  • 0 → FALSE,

  • NA_real_ and NaN → NA,

  • anything else → TRUE.

The distinction between zero and non-zero is commonly applied in other programming languages as well.
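The rules above can be checked mechanically. The following is a minimal sketch; the variable names x and b and the use of identical are our own choices, not part of the original text.

```r
# Casting numeric -> logical amounts to testing for non-zero-ness,
# with missing values (NA_real_ and NaN) mapped to NA.
x <- c(-2, 0, 3, NA_real_, NaN, -Inf)
same_as_nonzero <- identical(as.logical(x), x != 0)
print(same_as_nonzero)

# A logical -> numeric -> logical round trip recovers the input.
b <- c(TRUE, FALSE, NA)
round_trip_ok <- identical(as.logical(as.numeric(b)), b)
print(round_trip_ok)
```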

Moreover, in the case of the conversion involving character strings, we have:

as.character(c(TRUE, FALSE, NA, TRUE, NA, FALSE))
## [1] "TRUE"  "FALSE" NA      "TRUE"  NA      "FALSE"
as.character(c(-2, -1, 0, 1, 2, 3, NA_real_, -Inf, NaN))
## [1] "-2"   "-1"   "0"    "1"    "2"    "3"    NA     "-Inf" "NaN"
as.logical(c("TRUE", "True", "true", "T",
             "FALSE", "False", "false", "F",
             "anything other than these", NA_character_))
##  [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE    NA    NA
as.numeric(c("0", "-1.23e4", "pi", "2+2", "NaN", "-Inf", NA_character_))
## Warning: NAs introduced by coercion
## [1]      0 -12300     NA     NA    NaN   -Inf     NA

4.1.2. Implicit conversion (coercion)

Recall that we referred to the three vector types as atomic ones: they can only be used to store elements of the same type.

If we make an attempt at composing an object of mixed types with c, the common type will be determined in such a way that storing the data is done without information loss:

c(-1, FALSE, TRUE, 2, "three", NA)
## [1] "-1"    "FALSE" "TRUE"  "2"     "three" NA
c("zero", TRUE, NA)
## [1] "zero" "TRUE" NA
c(-1, FALSE, TRUE, 2, NA)
## [1] -1  0  1  2 NA

Hence, we see that logical is the most specialised of the three, whereas character is the most general.
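The hierarchy can be confirmed by asking typeof about the results of c. This is only a quick sketch, and the variable names are ours.

```r
t1 <- typeof(c(TRUE, FALSE))     # logical values only
t2 <- typeof(c(TRUE, 1))         # logical promoted to numeric
t3 <- typeof(c(TRUE, 1, "one"))  # everything promoted to character
print(c(t1, t2, t3))
```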

Note

The logical NA is converted to NA_real_ and NA_character_ in the above examples. R users tend to rely on implicit type conversion when they write c(1, 2, NA, 4) instead of the more explicit c(1, 2, NA_real_, 4). In most cases, this is fine.

However, occasionally, it will be wiser to be more unequivocal. For instance, rep(NA_real_, 1e9) pre-allocates a long numeric vector instead of a logical one.
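As an illustration of the note above, we can inspect the types of the different missing value markers. A much shorter vector stands in for the rep(NA_real_, 1e9) case; the length of 5 is an arbitrary choice of ours.

```r
t_na       <- typeof(NA)              # the plain NA is logical
t_na_real  <- typeof(NA_real_)
t_na_chr   <- typeof(NA_character_)
t_coerced  <- typeof(c(1, 2, NA, 4))  # the logical NA was coerced implicitly
t_rep_real <- typeof(rep(NA_real_, 5))
t_rep_lgl  <- typeof(rep(NA, 5))      # a logical vector, not a numeric one
print(c(t_na, t_na_real, t_na_chr, t_coerced, t_rep_real, t_rep_lgl))
```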

Some functions that expect vectors of specific types can apply coercion by themselves (or act as if they do so):

c(NA, FALSE, TRUE) + 10 # implicit conversion logical -> numeric
## [1] NA 10 11
c(-1, 0, 1) & TRUE  # implicit conversion numeric -> logical
## [1]  TRUE FALSE  TRUE
sum(c(TRUE, TRUE, FALSE, TRUE, FALSE))  # same as sum(as.numeric(...))
## [1] 3
cumsum(c(TRUE, TRUE, FALSE, TRUE, FALSE))
## [1] 1 2 2 3 3
cummin(c(TRUE, TRUE, FALSE, TRUE, FALSE))
## [1] 1 1 0 0 0

Exercise 4.1

In one of the previous exercises, we computed the cross-entropy loss between a logical vector \(\boldsymbol{y}\in\{0, 1\}^n\) and a numeric vector \(\boldsymbol{p}\in(0, 1)^n\). This measure can be equivalently defined as:

\[ \mathcal{L}(\boldsymbol{p}, \boldsymbol{y}) = -\frac{1}{n}\left(\sum_{i=1}^n y_i\log(p_i)+ (1-y_i)\log(1-p_i) \right). \]

Implement the above formula (using vectorised operations, but not relying on ifelse this time) and compute the cross-entropy loss between, say, “y <- sample(c(FALSE, TRUE), n)” and “p <- runif(n)” for some n. Note how seamlessly we are translating between FALSE/TRUEs and 0/1s in the above equation (in particular, where we let \(1-y_i\) mean the logical negation of \(y_i\)).

4.2. Lists

Lists are generalised vectors. They can comprise R objects of any kind, including other lists. This is why we classify them as recursive (and not atomic) objects. They are especially useful wherever there is a need to handle some multitude as a single entity.

4.2.1. Creating lists

The most straightforward way to create a list is by means of the list function:

list(1, 2, 3)
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2
##
## [[3]]
## [1] 3

Notice that the above is not the same as “c(1, 2, 3)”. We got a sequence that wraps three numeric vectors, each of length one. Also, how overly talkative R is when printing out lists!

list(c(1, 2, 3), 4, c(TRUE, FALSE, FALSE, NA, TRUE), "and so forth")
## [[1]]
## [1] 1 2 3
##
## [[2]]
## [1] 4
##
## [[3]]
## [1]  TRUE FALSE FALSE    NA  TRUE
##
## [[4]]
## [1] "and so forth"
list(list(c(TRUE, FALSE, NA, TRUE), letters), runif(5))  # a list of lists
## [[1]]
## [[1]][[1]]
## [1]  TRUE FALSE    NA  TRUE
##
## [[1]][[2]]
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
##
##
## [[2]]
## [1] 0.28758 0.78831 0.40898 0.88302 0.94047

However, the str function can be used to print R objects in a more concise fashion:

str(list(list(c(TRUE, FALSE, NA, TRUE), letters), runif(5)))
## List of 2
##  $ :List of 2
##   ..$ : logi [1:4] TRUE FALSE NA TRUE
##   ..$ : chr [1:26] "a" "b" "c" "d" ...
##  $ : num [1:5] 0.288 0.788 0.409 0.883 0.94
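The difference between an atomic vector and a list wrapping the same data can also be examined programmatically. The sketch below relies on is.atomic and is.list, which are not introduced in the text above; treat them as an assumption on our part.

```r
v <- c(1, 2, 3)     # an atomic vector of length three
l <- list(1, 2, 3)  # a list of three length-one vectors
w <- list(v)        # a list wrapping a single length-three vector

print(c(is.atomic(v), is.list(v)))  # atomic, not a list
print(c(is.atomic(l), is.list(l)))  # a list, not atomic
print(typeof(l))
print(c(length(v), length(l), length(w)))
```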

Note

In Section 4.1, we said that the c function, when fed with arguments of mixed types, tries to determine the common type that retains the sense of data. If coercion to an atomic vector is not possible, the result will be a list.

c(1, "two", identity)  # `identity` is an object of the type "function"
## [[1]]
## [1] 1
##
## [[2]]
## [1] "two"
##
## [[3]]
## function (x)
## x
## <environment: namespace:base>

Thus, the c function can also be used to concatenate lists:

c(list(1), list(2), list(3))  # 3 lists -> 1 list
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2
##
## [[3]]
## [1] 3
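It might be helpful to contrast concatenating lists with c against wrapping them with list. This is a small sketch with names of our own choosing (a, b, flat, nested).

```r
a <- list(1, "two")
b <- list(TRUE)
flat   <- c(a, b)     # one list featuring the elements of both inputs
nested <- list(a, b)  # a list whose two elements are themselves lists
print(c(length(flat), length(nested)))
str(flat)
str(nested)
```

In other words, c joins the items at the top level, whereas list adds one more level of nesting.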

Lists can be repeated using rep:

rep(list(1:11, LETTERS), 2)
## [[1]]
##  [1]  1  2  3  4  5  6  7  8  9 10 11
##
## [[2]]
##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q"
## [18] "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
##
## [[3]]
##  [1]  1  2  3  4  5  6  7  8  9 10 11
##
## [[4]]
##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q"
## [18] "R" "S" "T" "U" "V" "W" "X" "Y" "Z"

4.2.2. Coercing to and from lists

The conversion of an atomic vector to a list of length-1 vectors can be done via a call to as.list:

as.list(c(1, 2, 3))  # vector of length 3 -> list of 3 length-1 vectors
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2
##
## [[3]]
## [1] 3

Unfortunately, calling, say, as.numeric on a list generally raises an error, even if the list consists of numeric vectors only (unless each of them is of length one). We can try to flatten it to an atomic sequence by calling unlist:

unlist(list(list(1, 2), list(3, list(4:8)), 9))
## [1] 1 2 3 4 5 6 7 8 9
unlist(list(list(1, 2), list(3, list(4:8)), "spam"))
## [1] "1"    "2"    "3"    "4"    "5"    "6"    "7"    "8"    "spam"
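The following sketch shows a failing conversion next to the unlist-based workaround. It uses tryCatch to intercept the error, which is our own device and is not discussed in the text above.

```r
l <- list(c(1, 2), 3)  # numeric vectors only, but not all of length one

# Assumption: tryCatch is used here merely to capture the failure.
res <- tryCatch(as.numeric(l), error=function(e) "error")
print(res)

flat <- unlist(l)  # flattening first gives an atomic vector
print(flat)
print(typeof(flat))
```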

Note

(*) Chapter 11 will mention the simplify2array function, which generalises unlist in a way that can sometimes give rise to a matrix.

4.3. NULL

The NULL object (the one and only object of the type “NULL”) can be used as a placeholder for any other R object or designate the absence of such.

list(NULL, NULL, month.name)
## [[1]]
## NULL
##
## [[2]]
## NULL
##
## [[3]]
##  [1] "January"   "February"  "March"     "April"     "May"
##  [6] "June"      "July"      "August"    "September" "October"
## [11] "November"  "December"

NULL is different from a vector of length zero because the latter has a type.

However, NULL sometimes behaves as a 0-length vector. In particular, length(NULL) returns 0. Also, c called with no arguments returns NULL.

Testing for NULL-ness can be done with a call to is.null.
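The properties listed above can be verified directly. In this sketch, numeric(0) (an empty numeric vector) is our own example of a typed vector of length zero.

```r
type_null  <- typeof(NULL)
type_empty <- typeof(numeric(0))  # an empty vector still has a type
len_null   <- length(NULL)
print(c(type_null, type_empty))
print(len_null)
print(is.null(NULL))        # TRUE
print(is.null(numeric(0)))  # FALSE
print(is.null(c()))         # TRUE: c with no arguments gives NULL
```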

Important

NULL is not the same as NA (or its other-typed variants). The latter can be placed in an atomic vector.

c(1, NA, 3, NULL, 5)  # NULL behaves as a 0-length vector here
## [1]  1 NA  3  5

They both have very distinct semantics (no value vs a missing value).

Later we will see that some functions return NULL invisibly when they have nothing interesting to report. This is the case of cat or plot, which are called because of their side effects (printing and plotting).

Furthermore, in some contexts, replacing content with NULL (e.g., when subsetting a list) will actually result in its removal.
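Both remarks can be illustrated briefly. We picked cat as an example of a function that returns NULL invisibly, and we use the `[[` index operator, which is only covered in the next chapter; both are assumptions made for the sake of this sketch.

```r
# A function called for its side effect; its value is an invisible NULL.
r <- cat("a side effect\n")
print(is.null(r))

# Replacing a list element with NULL removes it.
# Assumption: the `[[<-` syntax is explained in the chapter on indexing.
l <- list(a=1, b="two", c=3)
l[["b"]] <- NULL
print(length(l))
print(names(l))
```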

4.4. Object attributes

Lists can be used to bundle many objects into a single sequence of items. Attributes, on the other hand, give means to inject some extra data into an object of any type (except NULL).

Attributes are (unordered) key=value pairs, where key is a single string, and value is any R object except NULL. They can be introduced by calling, amongst others[1], the structure function:

x_simple <- 1:10
x <- structure(
    x_simple,  # the object to be equipped with attributes
    attribute1="value1",
    attribute2=c(6, 100, 324)
)

4.4.1. Developing perceptual indifference to most attributes

Let us see how the above x is reported on the console:

print(x)
##  [1]  1  2  3  4  5  6  7  8  9 10
## attr(,"attribute1")
## [1] "value1"
## attr(,"attribute2")
## [1]   6 100 324

The object of concern, “1:10”, was displayed first. We need to get used to that. Most of the time, we suggest treating the “attr…” parts of the display as if they were printed in tiny font.

Equipping an object with attributes does not change its very nature (see, however, Chapter 10 for some exceptions). For example, the above x, despite featuring some extra data (metadata), is still treated as an ordinary sequence of numbers by most functions:

sum(x)   # the same as sum(1:10), sum() does not care about any attributes
## [1] 55
typeof(x)  # just a numeric vector, but with some perks
## [1] "integer"

Important

Attributes are generally ignored by most functions unless they have specifically been programmed to pay attention to them.
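As a quick check, we can compare an attribute-equipped object with its plain counterpart. The identical function, used below, is an example (chosen by us) of a function that does pay attention to attributes.

```r
x_simple <- 1:10
x <- structure(x_simple, attribute1="value1")

same_sum    <- identical(sum(x), sum(x_simple))        # attributes ignored
same_length <- identical(length(x), length(x_simple))  # attributes ignored
same_type   <- identical(typeof(x), typeof(x_simple))  # attributes ignored
same_object <- identical(x, x_simple)  # FALSE: attributes are compared too
print(c(same_sum, same_length, same_type, same_object))
```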

4.4.2. But there are some use cases

Some R functions add attributes to the return value to sneak extra information that might be useful, just in case.

For instance, na.omit, whose main aim is to remove missing values from an atomic vector, yields:

y <- c(10, 20, NA, 40, 50, NA, 70)
(y_na_free <- na.omit(y))
## [1] 10 20 40 50 70
## attr(,"na.action")
## [1] 3 6
## attr(,"class")
## [1] "omit"

We can enjoy the NA-free version of y in any further computations:

mean(y_na_free)
## [1] 38

However, the na.action attribute (we ignore the class part until Chapter 10) tells us where the missing observations were:

attr(y_na_free, "na.action")  # read the attribute value
## [1] 3 6
## attr(,"class")
## [1] "omit"
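If only the raw positions are needed, the class part can be stripped. A minimal sketch follows; the use of as.integer for this purpose and the variable name where are our own choices.

```r
y <- c(10, 20, NA, 40, 50, NA, 70)
y_na_free <- na.omit(y)

# as.integer drops the attributes (including class) of the na.action value.
where <- as.integer(attr(y_na_free, "na.action"))
print(where)
print(identical(where, which(is.na(y))))  # agrees with a direct search
```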

As another example, gregexpr can be used to search for a given pattern in a character vector (for more details, see Chapter 6):

needle <- "spam|gluten"  # pattern to search for: spam OR gluten
haystack <- c("spam, spam, bacon, and gluten-free spam", "spammer")  # text
(pos <- gregexpr(needle, haystack))
## [[1]]
## [1]  1  7 24 36
## attr(,"match.length")
## [1] 4 4 6 4
## attr(,"index.type")
## [1] "chars"
## attr(,"useBytes")
## [1] TRUE
##
## [[2]]
## [1] 1
## attr(,"match.length")
## [1] 4
## attr(,"index.type")
## [1] "chars"
## attr(,"useBytes")
## [1] TRUE

We sought all occurrences of the pattern within two character strings. As their number may vary from string to string, wrapping the results in a list was a good design choice. Each list element gives the starting positions where matches can be found (there are four and one match(es), respectively).

Each vector of positions also features its own match.length attribute (amongst others), in case we need it.
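For instance, the starting positions and the match.length attribute together suffice to extract the matched substrings. The sketch below handles the first string only and relies on substring and the `[[` index operator, neither of which has been discussed so far; they are assumptions on our part.

```r
needle <- "spam|gluten"
haystack <- c("spam, spam, bacon, and gluten-free spam", "spammer")
pos <- gregexpr(needle, haystack)

start <- pos[[1]]                   # matches within the first string
len <- attr(start, "match.length")  # how long each match is
found <- substring(haystack[1], start, start + len - 1)
print(found)
```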

Exercise 4.2

Create a list with EUR/AUD, EUR/GBP, and EUR/USD exchange rates read from the euraud-*.csv, eurgbp-*.csv, and eurusd-*.csv files in our data repository. Each of its three elements should be a numeric vector storing the currency exchange rates. Furthermore, equip them with currency_from, currency_to, date_from, and date_to attributes, for example:

##  [1]     NA 1.6006 1.6031     NA     NA 1.6119 1.6251 1.6195 1.6193 1.6132
## [11]     NA     NA 1.6117 1.6110 1.6188 1.6115 1.6122     NA
## attr(,"currency_from")
## [1] "EUR"
## attr(,"currency_to")
## [1] "AUD"
## attr(,"date_from")
## [1] "2020-01-01"
## attr(,"date_to")
## [1] "2020-06-30"

Such additional information could be stored in a few separate variables (other vectors), but then it would not be as convenient to use as the above representation.

4.4.3. Special attributes

Attributes have great potential which is somewhat wasted, for R users rarely know:

  • that attributes exist (pessimistic scenario), or

  • how to handle them (realistic scenario).

But we now know.

What is more, some attributes have been predestined to play a fundamental role in R. Namely, the most prevalent amongst the special attributes are:

  • names, row.names, and dimnames are used to label the elements of atomic and generic vectors (see below), and also rows and columns in matrices (Chapter 11) and data frames (Chapter 12),

  • dim allows for turning flat vectors into matrices and other tensors (Chapter 11),

  • levels labels the underlying integer codes in factor objects (Section 10.3.2),

  • class can be used to bring forth new complex data structures based on basic types (Chapter 10).

We call them special because:

  • they cannot be assigned arbitrary values; for instance, we will soon see that names can only be mapped to a character vector of the length equal to that of the sequence it is labelling,

  • they can be accessed via designated functions, e.g., names, class, dim, dimnames, levels, etc.,

  • they are widely recognised by many base and third-party R functions.

However, in spite of the above, special attributes can still be managed as any other (ordinary) ones.

Exercise 4.3

comment is perhaps the most rarely used special attribute. Create an object (whatever) equipped with the comment attribute. Verify that assigning to it anything other than a character vector leads to an error. Read its value by calling the comment function. Display the object equipped with this attribute. Note that the print function ignores its existence altogether: this is how special it is.

Important

(*) The accessor functions such as names or class might return meaningful values, even if the corresponding attribute is not set explicitly; see, e.g., Section 11.1.5 for an example.
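The dim attribute, covered in detail in Chapter 11, makes for a compact illustration of these points. The sketch below is ours: tryCatch is used only to capture the error, and the specific dimensions are arbitrary.

```r
m <- structure(1:6, dim=c(2, 3))  # six elements arranged as a 2x3 matrix

# A special attribute can be read via its accessor or like any other one.
same_dim <- identical(dim(m), attr(m, "dim"))
print(same_dim)

# It cannot be assigned an arbitrary value: 4*2 elements are not available.
res <- tryCatch(structure(1:6, dim=c(4, 2)), error=function(e) "error")
print(res)

# An accessor may report something meaningful even if the attribute is unset.
no_class_attr <- is.null(attr(m, "class"))
print(no_class_attr)
print(class(m))
```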

4.4.4. Labelling vector elements with the names attribute

A special attribute called names can be used to label the elements of atomic vectors and lists.

(x <- structure(c(13, 2, 6), names=c("spam", "sausage", "celery")))
##    spam sausage  celery
##      13       2       6

The labels may improve the expressivity and readability of our code and data.

Exercise 4.4

Verify that the above x is still an ordinary numeric vector by calling typeof and sum on it.

Let us stress that we can ignore the names attribute altogether. If we apply any operation discussed in Chapter 2, we will still garner the same result no matter if such extra information is present or not.

It is just the print function that changed its behaviour slightly (it is a special attribute, after all). Instead of reporting:

## [1] 13  2  6
## attr(,"names")
## [1] "spam"    "sausage" "celery"

we got a nicely formatted table-like display. Non-special attributes are still printed in a standard way.

##    spam sausage  celery
##      13       2       6
## attr(,"additional_attribute")
##  [1]  1  2  3  4  5  6  7  8  9 10

Note

Chapter 5 will also mention that some operations (such as indexing) can gain extra features in the presence of the names attribute.

This attribute can be read by calling:

attr(x, "names")  # just like any other attribute
## [1] "spam"    "sausage" "celery"
names(x)  # because it is so special
## [1] "spam"    "sausage" "celery"

Named vectors can be easily created with the c and list functions as well:

c(a=1, b=2)
## a b
## 1 2
list(a=1, b=2)
## $a
## [1] 1
##
## $b
## [1] 2
c(a=c(x=1, y=2), b=3, c=c(z=4))  # this is smart
## a.x a.y   b c.z
##   1   2   3   4

Let us contemplate how a named list is printed on the console. Again, it is still a list, but with some extras.
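The names of both kinds of objects can be queried in the same way. Here is a short sketch that also covers a partially labelled vector and one with no labels at all; these two cases are our additions.

```r
n_vec  <- names(c(a=1, b=2))
n_list <- names(list(a=1, b=2))
n_part <- names(c(a=1, 2))           # unlabelled elements get empty names
no_names <- is.null(names(c(1, 2)))  # no names attribute whatsoever
print(n_vec)
print(n_list)
print(n_part)
print(no_names)
```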

Exercise 4.5

A whole lot of functions return named vectors. Evaluate the following expressions and read the corresponding pages in the documentation:

  • quantile(runif(100)),

  • hist(runif(100), plot=FALSE),

  • options (take note of the digits, scipen, max.print, and width options),

  • capabilities.

Note

(*) Most of the time, lists are used merely as containers for other R objects. This is a boring yet essential role. However, let us just mention here that each data frame is, in fact, a generic vector (see Chapter 12). Each column corresponds to a named list element:

(df <- head(iris))  # some data frame
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
typeof(df)  # it is just a list (with extras that'll be discussed later)
## [1] "list"
unclass(df)  # how it is represented exactly (without the extras)
## $Sepal.Length
## [1] 5.1 4.9 4.7 4.6 5.0 5.4
##
## $Sepal.Width
## [1] 3.5 3.0 3.2 3.1 3.6 3.9
##
## $Petal.Length
## [1] 1.4 1.4 1.3 1.5 1.4 1.7
##
## $Petal.Width
## [1] 0.2 0.2 0.2 0.2 0.2 0.4
##
## $Species
## [1] setosa setosa setosa setosa setosa setosa
## Levels: setosa versicolor virginica
##
## attr(,"row.names")
## [1] 1 2 3 4 5 6

Therefore, the functions we discuss in this chapter are of use in the processing of such structured data as well.
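For example, the list-oriented functions we already know can be applied on a data frame as-is. A brief sketch, using the same built-in iris data:

```r
df <- head(iris)
print(is.list(df))            # a data frame passes as a list
print(length(df))             # the number of columns, not rows
print(names(df))              # column labels are just the names attribute
print(names(attributes(df)))  # which attributes are set
```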

4.4.5. Altering and removing attributes

We know that a single attribute can be read by calling attr. The complete list of an object’s attributes is returned by a call to attributes.

(x <- structure(c("some", "object"), names=c("X", "Y"),
    attribute1="value1", attribute2="value2", attribute3="value3"))
##        X        Y
##   "some" "object"
## attr(,"attribute1")
## [1] "value1"
## attr(,"attribute2")
## [1] "value2"
## attr(,"attribute3")
## [1] "value3"
attr(x, "attribute1")  # reads a single attribute, returns NULL if unset
## [1] "value1"
attributes(x)  # returns a named list with all attributes of an object
## $names
## [1] "X" "Y"
##
## $attribute1
## [1] "value1"
##
## $attribute2
## [1] "value2"
##
## $attribute3
## [1] "value3"

We can alter an attribute’s value or add further attributes by referring to the structure function once again. Moreover, setting an attribute’s value to NULL gets rid of it completely.

structure(x, attribute1=NULL, attribute4="added", attribute3="modified")
##        X        Y
##   "some" "object"
## attr(,"attribute2")
## [1] "value2"
## attr(,"attribute3")
## [1] "modified"
## attr(,"attribute4")
## [1] "added"

As far as the names attribute is concerned, we may generate an un-named copy of an object by calling:

unname(x)
## [1] "some"   "object"
## attr(,"attribute1")
## [1] "value1"
## attr(,"attribute2")
## [1] "value2"
## attr(,"attribute3")
## [1] "value3"

In Section 9.4.6, we will discuss the so-called replacement functions. They will enable us to modify or remove an object’s attribute by calling “attr(x, "some_attribute") <- new_value”.
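As a preview, the replacement syntax might be used as follows. This is only a sketch; “names(x) <- NULL” is our own addition, based on the assumption that names has a replacement version too (it does in base R).

```r
x <- structure(c("some", "object"), names=c("X", "Y"), attribute1="value1")

attr(x, "attribute1") <- "modified"  # alter an existing attribute
attr(x, "attribute2") <- 1:3         # add a new one
attr(x, "attribute1") <- NULL        # remove an attribute
names(x) <- NULL                     # the same for the names attribute

print(names(attributes(x)))  # only attribute2 is left
```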

Moreover, Section 5.5 highlights that certain operations (such as vector indexing, elementwise arithmetic operations, and coercion) might not preserve all attributes of the objects that were given as their inputs.
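Here is a small demonstration for indexing and coercion. The attribute called note is hypothetical, and the `[` index operator is only explained in the next chapter.

```r
x <- structure(c(a=1, b=2, c=3), note="extra data")
before  <- names(attributes(x))       # names and note
indexed <- names(attributes(x[1:2]))  # indexing keeps names, drops note
coerced <- attributes(as.numeric(x))  # coercion strips everything
print(before)
print(indexed)
print(is.null(coerced))
```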

4.5. Exercises

Exercise 4.6

Answer the following.

  • What is the meaning of “c(TRUE, FALSE) * 1:10”?

  • What does “sum(as.logical(x))” compute when x is a numeric vector?

  • We said that atomic vectors of the type character are the most general ones. Therefore, is “as.numeric(as.character(x))” the same as “as.numeric(x)”, regardless of the type of x?

  • What is the meaning of “as.logical(x+y)” if x and y are logical vectors? What about “as.logical(x*y)”, “as.logical(1-x)”, and “as.logical(x!=y)”?

  • Let x be a named numeric vector, e.g., “x <- quantile(runif(100))”. What is the result of “2*x”, “mean(x)”, and “round(x, 2)”?

  • What is the meaning of x == NULL?

  • Give two ways to create a named character vector.

  • Give two ways (discussed above; there are more) to remove the names attribute from an object.

Exercise 4.7

There are a few peculiarities when joining or coercing lists. Compare the results generated by the following pairs of expressions:

# 1)
as.character(list(list(1, 2), list(3, list(4)), 5))
as.character(unlist(list(list(1, 2), list(3, list(4)), 5)))
# 2)
as.numeric(list(list(1, 2), list(3, list(4)), 5))
as.numeric(unlist(list(list(1, 2), list(3, list(4)), 5)))
# 3)
unlist(list(list(1, 2), sd))
list(1, 2, sd)
# 4)
c(list(c(1, 2), 3), 4, 5)
c(list(c(1, 2), 3), c(4, 5))

Exercise 4.8

Given numeric vectors x, y, z, and w, how to combine x, y, and list(z, w) so as to obtain list(x, y, z, w)? More generally, given a set of atomic vectors and lists of atomic vectors, how to combine them to get a single list that features all atomic vectors as its elements (not a list of atomic vectors and lists, not atomic vectors unwound, etc.)?

Exercise 4.9

What is the meaning of the following when x is a logical vector?

  • cummin(x) and cummin(!x),

  • cummax(x) and cummax(!x),

  • cumsum(x) and cumsum(!x),

  • cumprod(x) and cumprod(!x).

Exercise 4.10

saveRDS allows for serialising R objects and writing their snapshots to disk so that they can be later restored very quickly via a call to readRDS. Verify whether these functions preserve object attributes.

See also dput and dget, which work with objects’ textual representation in the form of executable R code.

Exercise 4.11

(*) Use jsonlite::fromJSON to read a JSON file in the form of a named list.

In the extremely unlikely event of us finding the current chapter boring, let us rejoice: some of the exercises and remarks that we will encounter in the next part – devoted to vector indexing – will definitely be deliciously stimulating!