Deep R Programming

Marek Gagolewski

doi:10.5281/zenodo.7490464

10. S3 classes¶

This open-access textbook is, and will remain, freely available for everyone’s enjoyment (also in PDF; a paper copy can also be ordered). It is a non-profit project. Although available online, it is a whole course, and should be read from the beginning to the end. Refer to the Preface for general introductory remarks. Any bug/typo reports/fixes are appreciated. Make sure to check out Minimalist Data Wrangling with Python [28], too.

Let x be a randomly generated matrix with 1 000 000 rows and 1 000 columns, y be a data frame with results from the latest survey indicating that things are way more complicated than what most people think, and z be another matrix, this time with many zeroes.

The human brain is not capable of dealing with excessive amounts of data that are immoderately specific. This is why we have a natural tendency to group different entities based on their similarities. This way, we form more abstract classes of objects.

Also, many of us are inherently lazy. Oftentimes we take shortcuts to minimise energy (at a price to be paid later).

Printing out a matrix, a data frame, and a time series are all instances of the displaying of things, although they undoubtedly differ in detail. By now, we have probably forgotten which objects are hidden behind the aforementioned x, y, and z. Being able to simply call print(y) without having to recall that, yes, y is a data frame, seems pretty appealing.

This chapter introduces S3 classes [14]. They provide a lightweight object-orientated programming (OOP) approach for automated dispatching calls to generics of the type print(y) to concrete methods such as print.data.frame(y), based on the class of the object they are invoked on.

We shall see that S3 classes in their essence are beautifully simple[1]. Ultimately, generics and methods are ordinary R functions (Chapter 7) and classes are merely additional object attributes (Section 4.4).

Of course, this does not mean that wrapping our heads around them will be effortless. However, unlike other “class systems”[2], S3 is ubiquitous in most R programming projects. Suffice it to say that factors, matrices, and data frames discussed in the coming chapters are straightforward, S3-based extensions of the concepts we are about to introduce.

10.1. Object type vs class¶

Recall that typeof (introduced in Section 4.1) returns the internal type of an object. So far, we were mostly focused on atomic and generic vectors; compare Figure 1 in the Preface.

typeof(NULL)
## [1] "NULL"
typeof(c(TRUE, FALSE, NA))
## [1] "logical"
typeof(c(1, 2, 3, NA_real_))
## [1] "double"
typeof(c("a", "b", NA_character_))
## [1] "character"
typeof(list(list(1, 2, 3), LETTERS))
## [1] "list"
typeof(function(x) x)
## [1] "closure"

The number of admissible types is small[3], but they open the world of endless possibilities[4]. They provide a basis for more complex data structures. This is thanks to the fact that they can be equipped with arbitrary attributes (Section 4.4).

Most compound types constructed using the mechanisms discussed in this chapter only pretend they are something different from what they actually are. Still, they often do their job very well. By looking under their bonnet, we will be able to manipulate their state outside of the prescribed use cases.

Important

Setting the class attribute might make some objects behave differently in certain scenarios.

Example 10.1

Let’s equip two identical objects with different class attributes.

xt <- structure(123, class="POSIXct")  # POSIX calendar time
xd <- structure(123, class="Date")

Both objects are represented using numeric vectors:

c(typeof(xt), typeof(xd))
## [1] "double" "double"

However, when printed, they are decoded differently:

print(xt)
## [1] "1970-01-01 01:02:03 CET"
print(xd)
## [1] "1970-05-04"

In the former case, 123 is understood as the number of seconds since the UNIX epoch, 1970-01-01T00:00:00+0000. The latter is deciphered as the number of days since the said timestamp. Therefore, we expect that there must exist a mechanism that calls a version of print dependent on an object’s virtual class. That it only relies on the class attribute, which might be set, unset, or reset freely, is emphasised below.

attr(xt, "class") <- "Date"  # change class from POSIXct to Date
print(xt)  # same 123, but now interpreted as Date
## [1] "1970-05-04"
as.numeric(xt)  # drops all attributes
## [1] 123
unclass(xd)  # drops the class attribute; `attr<-`(xd, "class", NULL)
## [1] 123

We are having so much fun that one more illustration can only grow our joy.

Example 10.2

Consider an example data frame:

x <- iris[1:3, 1:2]  # a subset of an example data frame
print(x)
##   Sepal.Length Sepal.Width
## 1          5.1         3.5
## 2          4.9         3.0
## 3          4.7         3.2

It is an object of the class (an object whose class attribute is set to):

attr(x, "class")
## [1] "data.frame"

Some may say, and they are absolutely right, that we have not covered data frames yet. After all, they are the topic of Chapter 12, which is still ahead of us. However, from the current perspective, we should know that R data frames are nothing but lists of vectors of the same lengths equipped with the names and row.names attributes.

typeof(x)
## [1] "list"
`attr<-`(x, "class", NULL)  # or unclass(x)
## $Sepal.Length
## [1] 5.1 4.9 4.7
##
## $Sepal.Width
## [1] 3.5 3.0 3.2
##
## attr(,"row.names")
## [1] 1 2 3
print(x)
##   Sepal.Length Sepal.Width
## 1          5.1         3.5
## 2          4.9         3.0
## 3          4.7         3.2

Important

Revealing how x is actually represented enables us to process it using the extensive skill set that we have already[5] developed by studying the material covered in the previous part of our book (including all the exercises). This fact is noteworthy because some built-in and third-party data types are not particularly well-designed.

Let’s underline again that attributes are simple additions to R objects. However, as we said in Section 4.4.3, certain attributes are special, and class is one of them. In particular, we can only set class to be a character vector (possibly of length greater than one; see Section 10.2.5).

x <- 12345
attr(x, "class") <- 1  # character vectors only
## Error in attr(x, "class") <- 1: attempt to set invalid 'class' attribute

Furthermore, the class function can read the value of the class attribute. Its replacement version is also available.

class(x) <- "Date"  # set; the same as attr(x, "class") <- "Date"
class(x)  # get; here, it is the same as attr(x, "class")
## [1] "Date"

Important

The class function always yields a value, even if the corresponding attribute is not set. We call it an implicit class. Compare the following results:

class(NULL)  # no `class` set because NULL cannot have any attributes
## [1] "NULL"
class(c(TRUE, FALSE, NA))  # no attributes so class is implicit (= typeof)
## [1] "logical"
class(c(1, 2, 3, NA_real_))  # typeof returns "double"
## [1] "numeric"
class(c("a", "b", NA_character_))
## [1] "character"
class(list(list(1, 2, 3), LETTERS))
## [1] "list"
class(function(x) x)  # typeof gives "closure"
## [1] "function"

Also, Chapter 11 will explain that any object equipped with the dim attribute also has an implicit class:

(x <- as.matrix(c(1, 2, 3)))
##      [,1]
## [1,]    1
## [2,]    2
## [3,]    3
attributes(x)  # `class` is not amongst the attributes
## $dim
## [1] 3 1
class(x)  # implicit class
## [1] "matrix" "array"
typeof(x)  # it is still a numeric vector (under the bonnet)
## [1] "double"

10.2. Generics and method dispatching¶

10.2.1. Generics, default, and custom methods¶

Let’s inspect the source code of the print function:

print(print)  # sic!
## function (x, ...)
## UseMethod("print")
## <environment: namespace:base>

Any function like the above[6] we will call from now on a generic (an S3 generic, from S version 3 [14]). Its only job is to invoke UseMethod("print"). It dispatches the control flow to another function, referred to as a method, based on the class of the first argument.

Important

All arguments passed to the generic will also be available[7] in the method dispatched to.

For example, let’s define an object of the class categorical (a name that we have just come up with; we could have called it cat, CATEGORICAL, or SpanishInquisition as well). It will be our version of the factor type that we discuss later.

x <- structure(
    c(1, 3, 2, 1, 1, 1, 3),
    levels=c("a", "b", "c"),
    class="categorical"
)

We assume that such an object is a sequence of small positive integers (codes). It is equipped with the levels attribute, which is a character vector of length not less than the maximum of the said integers. In particular, the first level deciphers the meaning of the code 1. Hence, the above vector represents a sequence a, c, b, a, a, a, c.

There is no special method for displaying objects of the categorical class. Hence, when we call print, the default (fallback) method is invoked:

print(x)
## [1] 1 3 2 1 1 1 3
## attr(,"levels")
## [1] "a" "b" "c"
## attr(,"class")
## [1] "categorical"

This is the standard function for displaying numeric vectors. We are well familiar with it. Its name is print.default, and we can always call it directly:

print.default(x)  # the default print method
## [1] 1 3 2 1 1 1 3
## attr(,"levels")
## [1] "a" "b" "c"
## attr(,"class")
## [1] "categorical"

However, we can introduce a designated method for printing categorical objects. Its name must precisely be print.categorical:

print.categorical <- function(x, ...)
{
    x_character <- attr(x, "levels")[unclass(x)]
    print(x_character)  # calls `print.default`
    cat(sprintf("Categories: %s\n",
        paste(attr(x, "levels"), collapse=", ")))
    invisible(x)  # this is what all print methods do; see help("print")
}

Calling print automatically dispatches the control flow to this method:

print(x)
## [1] "a" "c" "b" "a" "a" "a" "c"
## Categories: a, b, c

Of course, the default method can still be called. Referring to print.default(x) directly will output the same result as the one a few chunks above.

Note

print.categorical has been equipped with the dot-dot-dot attribute since the generic print had one too[8].

10.2.2. Creating generics¶

Introducing new S3 generics is as straightforward as defining a function that calls UseMethod. For instance, here is a dispatcher which creates new objects of the categorical class based on other objects:

as.categorical <- function(x, ...)
    UseMethod("as.categorical")  # synonym: UseMethod("as.categorical", x)

We always need to define the default method:

as.categorical.default <- function(x, ...)
{
    if (!is.character(x))
        x <- as.character(x)
    xu <- unique(sort(x))  # drops NAs
    structure(
        match(x, xu),
        class="categorical",
        levels=xu
    )
}

Testing:

as.categorical(c("a", "c", "a", "a", "d", "c"))
## [1] "a" "c" "a" "a" "d" "c"
## Categories: a, c, d
as.categorical(c(3, 6, 4, NA, 9, 9, 6, NA, 3))
## [1] "3" "6" "4" NA  "9" "9" "6" NA  "3"
## Categories: 3, 4, 6, 9

This method is already quite flexible. It handles a wide variety of data types because it relies on the built-in generic as.character (Section 10.2.3).

Example 10.3

We might want to forbid the conversion from lists because it does not necessarily make sense:

as.categorical.list <- function(x, ...)
    stop("conversion of lists to categorical is not supported")

The users can always be instructed in the method’s documentation that they are responsible for converting lists to another type prior to a call to as.categorical.

Example 10.4

The default method deals with logical vectors perfectly fine:

as.categorical(c(TRUE, FALSE, NA, NA, FALSE))  # as.categorical.default
## [1] "TRUE"  "FALSE" NA      NA      "FALSE"
## Categories: FALSE, TRUE

However, we might still want to introduce its specialised version. This is because we know a slightly more efficient algorithm (and we have nothing better to do) based on the fact that FALSE and TRUE converted to numeric yield 0 and 1, respectively:

as.categorical.logical <- function(x, ...)
{
    if (!is.logical(x))
        x <- as.logical(x)  # or maybe stopifnot(is.logical(x))?
    structure(
        x + 1,  # only 1, 2, and NAs will be generated
        class="categorical",
        levels=c("FALSE", "TRUE")
    )
}

It spawns the same result as the default method but is slightly faster.

as.categorical(c(TRUE, FALSE, NA, NA, FALSE))  # as.categorical.logical
## [1] "TRUE"  "FALSE" NA      NA      "FALSE"
## Categories: FALSE, TRUE

We performed some argument consolidation at the beginning because a user is always able to call a method directly on an R object of any kind (which is a good thing; see Section 10.2.4). In other words, there is no guarantee that the argument x must be of type logical.

10.2.3. Built-in generics¶

Many[9] functions and operators we have introduced so far are, in fact, S3 generics: print, head, `[`, `[[`, `[<-`, `[[<-`, length, `+`, `<=`, is.numeric, as.numeric, is.character, as.character, as.list, round, log, sum, rep, c, and na.omit, to name a few.

Example 10.5

Let’s overload the as.character method. The default one does not make much sense for the objects of our custom type:

as.character(x)
## [1] "1" "3" "2" "1" "1" "1" "3"

So:

as.character.categorical <- function(x, ...)
    attr(x, "levels")[unclass(x)]

And now:

as.character(x)
## [1] "a" "c" "b" "a" "a" "a" "c"

Exercise 10.6

Overload the unique and rep methods for objects of the class categorical.

Example 10.7

New types ought to be designed carefully. For instance, if we forget to overload the to-numeric converter, some users might be puzzled[10] when they see:

(x <- as.categorical(c(4, 9, 100, 9, 9, 100, 42, 666, 4)))
## [1] "4"   "9"   "100" "9"   "9"   "100" "42"  "666" "4"
## Categories: 100, 4, 42, 666, 9
as.double(x)  # synonym: as.numeric(x); here, it calls as.double.default(x)
## [1] 2 5 1 5 5 1 3 4 2

Hence, we might want to introduce a new method:

as.double.categorical <- function(x, ...)  # not: as.numeric.categorical
{
    # actually: as.double.default(as.character.categorical(x))
    as.double(as.character(x))
}

It now yields:

as.double(x)  # or as.numeric(x); calls as.double.categorical(x)
## [1]   4   9 100   9   9 100  42 666   4

Note

We can still use unclass to fetch the codes:

unclass(x)
## [1] 2 5 1 5 5 1 3 4 2
## attr(,"levels")
## [1] "100" "4"   "42"  "666" "9"

It is because the foregoing returns a class-free object, which is now guaranteed to be processed by the default methods (print, subsetting, as.character, etc.).

Exercise 10.8

What would happen if we used as.numeric instead of unclass in print.categorical and as.character.categorical?

Exercise 10.9

Update the preceding methods so that we can also create named objects of the class categorical (i.e., equipped with the names attribute).

Exercise 10.10

The levels of x are sorted lexicographically, not numerically. Introduce a single method that would let the above code (when rerun without any alterations) generate a more natural result.

10.2.4. First-argument dispatch and calling S3 methods directly¶

With S3, dispatching is most often done based on the class of only one[11] argument: by default, the first one from the parameter list.

For example, the c function is a generic that dispatches on the first argument’s class. Let’s overload it for categorical objects. In other words, we will create a function to be called by the generic when it is invoked on a series of objects whose first element is of the said class.

c.categorical <- function(...)
    as.categorical(
        unlist(
            lapply(list(...), as.character)
        )
    )

It converts each argument to a character vector, relying on the generic as.character to take care of the details. It works because unlist converts a list of such atomic vectors to a single sequence of strings.

Calling c with the first argument of the class categorical dispatches to the above method:

x <- c(9, 5, 7, 7, 2)
xc <- as.categorical(x)
c(xc, x)  # c.categorical
##  [1] "9" "5" "7" "7" "2" "9" "5" "7" "7" "2"
## Categories: 2, 5, 7, 9

However, if the first argument is, say, unclassed, the default method will be consulted:

c(x, xc)  # default c
##  [1] 9 5 7 7 2 4 2 3 3 1

It ignored the class attribute and saw xc as it is, a bareboned numeric vector:

`attributes<-`(xc, NULL)  # the underlying codes
## [1] 4 2 3 3 1

It is not a bug. It is a well-documented (and now explained) behaviour. After all, compound types (classed objects) are emulated through the basic ones.

Important

In most cases, S3 methods can be called directly to get the desired outcome:

c.categorical(x, xc)  # force a call to the specific method
##  [1] "9" "5" "7" "7" "2" "9" "5" "7" "7" "2"
## Categories: 2, 5, 7, 9

We said in most cases because methods can be:

hardcoded at the C language level (e.g., there is no c.default defined at all[12]),
hidden (defined in a package’s namespace but not exported; Section 16.3.6),
overloaded as a group; see Section 10.2.6 and help("groupGeneric").

Example 10.11

Purely for jollity, let’s find a partition of the iris dataset into three clusters using the \(k\)-means algorithm:

res <- kmeans(iris[-5], centers=3, nstart=10)
print(res)
## K-means clustering with 3 clusters of sizes 50, 62, 38
##
## Cluster means:
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1       5.0060      3.4280       1.4620      0.2460
## 2       5.9016      2.7484       4.3935      1.4339
## 3       6.8500      3.0737       5.7421      2.0711
##
## Clustering vector:
##  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [71] 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
##  [ reached getOption("max.print") -- omitted 51 entries ]
##
## Within cluster sum of squares by cluster:
## [1] 15.151 39.821 23.879
##  (between_SS / total_SS =  88.4 %)
##
## Available components:
##
## [1] "cluster"      "centers"      "totss"        "withinss"
## [5] "tot.withinss" "betweenss"    "size"         "iter"
## [9] "ifault"

It is an object of the class:

class(res)
## [1] "kmeans"

which, in fact, is a:

typeof(res)
## [1] "list"

The underlying list looks like:

unclass(res)
## $cluster
##  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [71] 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
##  [ reached getOption("max.print") -- omitted 51 entries ]
##
## $centers
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1       5.0060      3.4280       1.4620      0.2460
## 2       5.9016      2.7484       4.3935      1.4339
## 3       6.8500      3.0737       5.7421      2.0711
##
## $totss
## [1] 681.37
##
## $withinss
## [1] 15.151 39.821 23.879
##
## $tot.withinss
## [1] 78.851
##
## $betweenss
## [1] 602.52
##
## $size
## [1] 50 62 38
##
## $iter
## [1] 2
##
## $ifault
## [1] 0

We already know that res was displayed in a fancy way only because there is a print method overloaded for objects of the kmeans class.

But is there?

print.kmeans
## Error in eval(expr, envir, enclos): object 'print.kmeans' not found

Even though the method is hidden (internal) in the stats package’s namespace, from Section 16.3.6 we will learn that it can be accessed by calling getS3method("print", "kmeans") or referring to stats:::print.kmeans (note the triple colon).

10.2.5. Multi-class-ness¶

The class attribute can be instantiated as a character vector of any length. For example:

(t1 <- Sys.time())
## [1] "2025-01-15 15:07:03 CET"
(t2 <- strptime("2021-08-15T12:59:59+1000", "%Y-%m-%dT%H:%M:%S%z"))
## [1] "2021-08-15 04:59:59"

Let’s inspect the classes of these two objects:

class(t1)
## [1] "POSIXct" "POSIXt"
class(t2)
## [1] "POSIXlt" "POSIXt"

Section 10.3.1 will discuss date-time classes in more detail. It will highlight that the former is represented as a numeric vector, while the latter is a list. Thus, these two should primarily be seen as instances of two distinct types. However, as both of them have a lot in common, it was a wise design choice to allow them to be seen also as the representatives of the same generic category of POSIX time objects.

Important

When calling a generic function[13] f on an object x of the classes[14] class1, class2, …, classK (in this order), UseMethod(f, x) dispatches to the method determined as follows:

if f.class1 is available[15], call it;
otherwise, if f.class2 is available, call this one;
…;
otherwise, if f.classK is available, invoke it;
otherwise, refer to the fallback f.default.

Example 10.12

There is a method diff for objects of the class POSIXt that carries a statement:

r <- if (inherits(x, "POSIXlt")) as.POSIXct(x) else x

This way, we can process both POSIXct and POSIXlt instances using the same procedure.

We should see no magic in this simple scheme. It is nothing more than a way to determine the method to be called for a particular R object. It can be used as a mechanism to mimic the idea of inheritance in object-orientated programming languages. However, the S3 system does not allow for defining classes in any formal manner.

For example, we cannot say that objects of the class POSIXct inherit from POSIXt. Neither can we say that each object of the class POSIXct is also an instance of POSIXt. The class attribute can still be set arbitrarily on a per-object basis. We can create ones whose class is simply POSIXct (without the POSIXt part) or even c("POSIXt", "POSIXct") (in this order).

Note

In any method, it is possible to call the method corresponding to the next class by calling NextMethod.

For instance, if we are in f.class1, a call to NextMethod(f) will try invoking f.class2. If such a method does not exist, further methods in the search chain will be probed, falling back to the default method if necessary. We will give an illustration later.

10.2.6. Operator overloading¶

Operators are ordinary functions (Section 9.3.5). Even though what follows can partially be implied by what we have already said, as usual in R, some oddities are to be expected.

For example, let’s overload the index operator for objects of the class categorical. Looking at help("["), we see that the default method has two arguments: x (the categorical object being sliced) and i (the indexer). Ours will have the same interface then:

`[.categorical` <- function(x, i)
{
    structure(
        unclass(x)[i],  # `[`(unclass(x), i)
        class="categorical",
        levels=attr(x, "levels")  # the same levels as input
    )
}

The default S3 method, `[.default`, is hardcoded at the C language level and we cannot refer to it directly. This is why we called unclass instead. Alternatively, we can also invoke NextMethod:

`[.categorical` <- function(x, i)
{
    structure(
        NextMethod("["),  # call default method, passing `x` and `i`
        class="categorical",
        levels=attr(x, "levels")  # the same levels as input
    )
}

We can also introduce the replacement version of this operator:

`[<-.categorical` <- function(x, i, value)
{
    levels <- attr(x, "levels")
    value <- match(value, levels)  # integer codes corresponding to levels
    structure(
        NextMethod("[<-"),  # call default method, passing `x`, `i`, `values`
        class="categorical",
        levels=levels  # same levels as input
    )

    # # or, equivalently:
    # structure(
    #     `[<-`(unclass(x), i, value=match(value, attr(x, "levels"))),
    #     class="categorical",
    #     levels=attr(x, "levels")
    # )
}

Testing:

x <- as.categorical(c(3, 6, 4, NA, 9, 9, 6, NA, 3))
x[1:4]
## [1] "3" "6" "4" NA
## Categories: 3, 4, 6, 9
x[1:4] <- c("6", "7")
print(x)
## [1] "6" NA  "6" NA  "9" "9" "6" NA  "3"
## Categories: 3, 4, 6, 9

Notice how we handled the case of nonexistent levels and that the recycling rule has been automagically inherited (amongst other features) from the default index operator.

Exercise 10.13

Do these two operators preserve the names attribute of x? Is indexing with negative integers or logical vectors supported as well? Why is that/is that not the case?

Furthermore, let’s overload the `==` operator. Assume[16] that we would like two categorical objects to be compared based on the actual labels they encode, in an elementwise manner:

`==.categorical` <- function(e1, e2)
    as.character(e1) == as.character(e2)

We are feeling lucky: by not performing any type checking, we rely on the particular as.character methods corresponding to the types of e1 and e2. Also, assuming that as.character always[17] returns a character object, we dispatch to the default method for `==` (which handles atomic vectors).

Some examples:

as.categorical(c(1, 3, 5, 1)) == as.categorical(c(1, 3, 1, 1))
## [1]  TRUE  TRUE FALSE  TRUE
as.categorical(c(1, 3, 5, 1)) == c(1, 3, 1, 1)
## [1]  TRUE  TRUE FALSE  TRUE
c(1, 3, 5, 1) == as.categorical(c(1, 3, 1, 1))
## [1]  TRUE  TRUE FALSE  TRUE

Important

In the case of binary operators, dispatching is done based on the classes of both arguments. In all three preceding calls, we call `==.categorical`, regardless of whether the classed object is the first or the second operand.

If two operands are classed, and different methods are overloaded for both, a warning will be generated, and the default internal method will be called.

`==.A` <- function(e1, e2) "A"
`==.B` <- function(e1, e2) "B"
structure(c(1, 2, 3), class="A") == structure(c(2, NA, 3), class="B")
## Warning: Incompatible methods ("==.A", "==.B") for "=="
## [1] FALSE    NA  TRUE

Note

(*) By creating a single Ops method, we can define the meaning of all binary operators at once.

Ops.categorical <- function(e1, e2)
{
    if (!(.Generic %in% c("<", ">", "<=", ">=", "==", "!=")))
        stop(sprintf("%s not defined for 'categorical' objects", .Generic))
    e1 <- as.character(e1)
    e2 <- as.character(e2)
    NextMethod(.Generic)  # dispatch to the default method (for character)
}

as.categorical(c(1, 3, 5, 1)) > c(1, 2, 4, 2)
## [1] FALSE  TRUE  TRUE FALSE

Here, `.Generic` is a variable representing the name of the operator (generic) being invoked; see Section 16.3.6.

Other group generics are: Summary (including functions such as min, sum, and all), Math (abs, log, round, etc.), matrixOps (`%*%`), and Complex (e.g., Re, Im); see help("groupGeneric") for more details.

Sometimes we must rely on the `.S3method` function to let R recognise a custom method related to such generics.

10.3. Common built-in S3 classes¶

Below we discuss a few noteworthy classes, including those representing date-time information and factors (ordered or not). Note that classes representing tabular data will be dealt with in separate parts, owing to their importance and ubiquity. Namely, matrices and other arrays are covered in Chapter 11, and data frames are discussed in Chapter 12. Inspecting other[18] interesting compound types is left as a simple exercise for the studious reader.

10.3.1. Date, time, etc.¶

The Date class represents… dates (calendar ones, not fruits).

(x <- c(Sys.Date(), as.Date(c("1969-12-31", "1970-01-01", "2023-02-29"))))
## [1] "2025-01-15" "1969-12-31" "1970-01-01" NA
class(x)
## [1] "Date"

Complex types are built on basic ones. Underneath, what we deal with here is:

typeof(x)
## [1] "double"
unclass(x)
## [1] 20103    -1     0    NA

which is the number of days since the UNIX epoch, 1970-01-01T00:00:00+0000 (midnight GMT/UTC).

The POSIXct (calendar time) class represents date-time objects:

(x <- Sys.time())
## [1] "2025-01-15 15:07:03 CET"
class(x)
## [1] "POSIXct" "POSIXt"
typeof(x)
## [1] "double"
unclass(x)
## [1] 1.737e+09

Underneath, it is the number of seconds since the UNIX epoch. By default, whilst printing, the current default timezone is used (see Sys.timezone). However, such objects can be equipped with the tzone attribute.

structure(1, class=c("POSIXct", "POSIXt"))  # using current default timezone
## [1] "1970-01-01 01:00:01 CET"
structure(1, class=c("POSIXct", "POSIXt"), tzone="UTC")
## [1] "1970-01-01 00:00:01 UTC"

In both cases, the time is 1 second after the beginning of the UNIX epoch. On the author’s PC, the former is displayed in the current local timezone, though.

Exercise 10.14

Use ISOdatetime to inspect how midnights are displayed in different timezones.

The POSIXlt (local time) class is represented using a list of atomic vectors[19].

(x <- as.POSIXlt(c(a="1970-01-01 00:00:00", b="2030-12-31 23:59:59")))
##                         a                         b
## "1970-01-01 00:00:00 CET" "2030-12-31 23:59:59 CET"
class(x)
## [1] "POSIXlt" "POSIXt"
typeof(x)
## [1] "list"
str(unclass(x))  # calling str instead of print to make display more compact
## List of 11
##  $ sec   : num [1:2] 0 59
##  $ min   : int [1:2] 0 59
##  $ hour  : int [1:2] 0 23
##  $ mday  : int [1:2] 1 31
##  $ mon   : int [1:2] 0 11
##  $ year  : Named int [1:2] 70 130
##   ..- attr(*, "names")= chr [1:2] "a" "b"
##  $ wday  : int [1:2] 4 2
##  $ yday  : int [1:2] 0 364
##  $ isdst : int [1:2] 0 0
##  $ zone  : chr [1:2] "CET" "CET"
##  $ gmtoff: int [1:2] NA NA
##  - attr(*, "tzone")= chr [1:3] "" "CET" "CEST"
##  - attr(*, "balanced")= logi TRUE

Exercise 10.15

Read about the meaning of each named element, especially mon and year; see help("DateTimeClasses").

The manual states that POSIXlt is supposedly closer to human-readable forms than POSIXct, but it is a matter of taste. Some R functions return the former, and other output the latter type.

Exercise 10.16

The two main functions for date formatting and parsing, strftime and strptime, use special field formatters (similar to sprintf). Read about them in the R manual. What type of inputs do they accept? What outputs do they produce?

There are several methods overloaded for objects of the said classes. In fact, the first call in this section already involved the use of c.Date.

Exercise 10.17

Play around with the overloaded versions of seq, rep, and as.character.

A specific number of days or seconds can be added to or subtracted from a date or time, respectively. However, `-` (see also diff) can also be applied on two date-time objects, which yields an object of the class difftime.

Sys.Date() - (Sys.Date() - 1)
## Time difference of 1 days
Sys.time() - (Sys.time() - 1)
## Time difference of 1 secs

Exercise 10.18

Check out how objects of the class difftime are internally represented.

Applying other arithmetic operations on date-time objects raises an error. Because date-time objects are just numbers, they can be compared to each other using binary operators[20]. Also, methods such as sort and order[21] could be applied on them.

Exercise 10.19

Check out the stringx package, which replaces the base R date-time processing functions with their more portable counterparts.

Exercise 10.20

proc.time can be used to measure the time to execute a given expression:

t0 <- proc.time()  # timer start
# ... to do - something time-consuming ...
sum(runif(1e7))  # whatever, just testing
## [1] 4999488
print(proc.time() - t0)  # elapsed time
##    user  system elapsed
##   0.108   0.010   0.118

The function returns an object of the class proc_time. Inspect how it is represented internally.

10.3.2. Factors¶

The factor class is often used to represent qualitative data, e.g., species, groups, types. In fact, categorical (our example class) was inspired by the built-in factor.

(x <- c("spam", "spam", "bacon", "sausage", "spam", "bacon"))
## [1] "spam"    "spam"    "bacon"   "sausage" "spam"    "bacon"
(f <- factor(x))
## [1] spam    spam    bacon   sausage spam    bacon
## Levels: bacon sausage spam

Note how factors are printed. There are no double quote characters around the labels. The list of levels is given at the end.

Internally, such objects are represented as integer vectors (Section 6.4.1) with elements between \(1\) and \(k\). They are equipped with the special (as in Section 4.4.3) levels attribute, which is a character vector of length \(k\)[22].

class(f)
## [1] "factor"
typeof(f)
## [1] "integer"
unclass(f)
## [1] 3 3 1 2 3 1
## attr(,"levels")
## [1] "bacon"   "sausage" "spam"
attr(f, "levels")  # also: levels(f)
## [1] "bacon"   "sausage" "spam"

Factors are often used instead of character vectors defined over a small number of unique labels[23], where there is a need to manipulate their levels conveniently.

attr(f, "levels") <- c("a", "b", "c")  # also levels(f) <- c(....new...)
print(f)
## [1] c c a b c a
## Levels: a b c

The underlying integer codes remain the same.

Certain operations on vectors of small integers are relatively easy to express, especially those concerning element grouping: splitting, counting, and plotting (e.g., Figure 13.17). It is because the integer codes can naturally be used whilst indexing other vectors. Section 5.4 mentioned a few functions related to this, such as match, split, findInterval, and tabulate. Specifically, the latter can be implemented like “for each i, increase count[factor_codes[i]] by one”.

Exercise 10.21

Study the source code of the factor function. Note the use of as.character, unique, order, and match.

Exercise 10.22

Implement a simplified version of table based on tabulate. It should work for objects of the class factor and return a named numeric vector.

Exercise 10.23

Implement a version of cut based on findInterval.

Important

The as.numeric method has not been overloaded for factors. Therefore, when we call the generic, the default method is used: it returns the underlying integer codes as-is. This can surprise unaware users when they play with factors representing integer numbers:

(g <- factor(c(11, 15, 16, 11, 13, 4, 15)))  # converts numbers to strings
## [1] 11 15 16 11 13 4  15
## Levels: 4 11 13 15 16
as.numeric(g)  # the underlying codes
## [1] 2 4 5 2 3 1 4
as.numeric(as.character(g))  # to get the numbers encoded
## [1] 11 15 16 11 13  4 15

Alas, support for factors is often hardcoded at the C language level. From the end user perspective, it makes this class behave less predictably. In particular, the manual overloading of certain methods for factor objects might have no effect.

Important

If f is a factor, then x[f] does not behave like x[as.character(f)], i.e., it is not indexing by labels using the names attribute. Instead, we get x[as.numeric(f)]; the underlying codes determine the positions.

h <- factor(c("a", "b", "a", "c", "a", "c"))
levels(h)[h]  # the same as c("a", "b", "c")[c(1, 2, 1, 3, 1, 3)]
## [1] "a" "b" "a" "c" "a" "c"
c(b="x", c="y", a="z")[h]  # names are not used whilst indexing
##   b   c   b   a   b   a
## "x" "y" "x" "z" "x" "z"
c(b="x", c="y", a="z")[as.character(h)]  # names are used now
##   a   b   a   c   a   c
## "z" "x" "z" "y" "z" "y"

More often than not, indexing by factors will happen “accidentally”[24], leaving us slightly puzzled. In particular, factors look much like character vectors when they are carried in data frames:

(df <- data.frame(A=c("x", "y", "z"), B=factor(c("x", "y", "z"))))
##   A B
## 1 x x
## 2 y y
## 3 z z
class(df[["A"]])
## [1] "character"
class(df[["B"]])
## [1] "factor"

Important

Be careful when combining factors and not-factors:

x <- factor(c("A", "B", "A"))
c(x, "C")
## [1] "1" "2" "1" "C"
c(x, factor("C"))
## [1] A B A C
## Levels: A B C

Exercise 10.24

When subsetting a factor object, the result will inherit the levels attribute in its entirety:

f[c(1, 2)]  # drop=FALSE
## [1] c c
## Levels: a b c

However:

f[c(1, 2), drop=TRUE]
## [1] c c
## Levels: c

Implement your version of the droplevels function, which removes the unused attributes.

Exercise 10.25

The replacement version of the index operator does not automatically add new levels to the modified object:

x <- factor(c("A", "B", "A"))
`[<-`(x, 4, value="C")  # like in x[4] <- "C"
## Warning in `[<-.factor`(x, 4, value = "C"): invalid factor level, NA
##     generated
## [1] A    B    A    <NA>
## Levels: A B

Implement a version of `[<-.factor` that has such a capability.

10.3.3. Ordered factors¶

When creating factors, we can enforce a particular ordering and the number of levels:

x <- c("spam", "spam", "bacon", "sausage", "spam", "bacon")
factor(x, levels=c("eggs", "bacon", "sausage", "spam"))
## [1] spam    spam    bacon   sausage spam    bacon
## Levels: eggs bacon sausage spam

If we want the arrangement of the levels to define a linear ordering relation over the set of labels, we can call:

(f <- factor(x, levels=c("eggs", "bacon", "sausage", "spam"), ordered=TRUE))
## [1] spam    spam    bacon   sausage spam    bacon
## Levels: eggs < bacon < sausage < spam
class(f)
## [1] "ordered" "factor"

It yields an ordered factor, which enables comparisons like:

f[f >= "bacon"]  # what's not worse than bacon?
## [1] spam    spam    bacon   sausage spam    bacon
## Levels: eggs < bacon < sausage < spam

How is that possible? Well, based on information provided in this chapter, it will come as no surprise that it is because… someone has created a relational operator for objects of the class ordered.

10.3.4. Formulae (*)¶

Formulae are created using the `~` operator. Some R users employ them to specify widely-conceived statistical models in functions such as lm (e.g., linear regression), glm (generalised linear models like logistic regression etc.), aov (analysis of variance), wilcox.test (the two-sample Mann–Whitney–Wilcoxon test), aggregate (computing aggregates within data groups), boxplot (box-and-whisker plots for a variable split by a combination of factors), or plot (scatter plots); see also Chapter 11 of [59]. For instance, formulae can be used to describe symbolic relationships such as:

“y as a linear combination of x1, x2, and x3”,
“y grouped/split by a combination of x1 and x2”,

where y, x1, etc., are, for example, column names in a data frame.

Formulae are interpreted by the corresponding functions, and not the R language itself. Thus, programmers are free to assign them any meaning. As their syntax is quite esoteric, beginners might find them confusing. Hence, we will postpone discussing them until Section 17.6. Luckily, the use of formulae can usually easily be avoided[25].

10.4. (Over)using the forward pipe operator, `|>` (*)¶

The OOP paradigm is utile when we wish to define a new data type, perhaps even a hierarchy of types. Many development teams find it an efficient tool to organise larger pieces of software. However, in the data science and numerical computing domains, more often than not, we are the consumers of object orientation rather than class designers.

Thanks to the S3 method dispatch mechanism, our language is easily extensible. Something that mimics a new data type can easily be introduced. Most importantly, methods can be added[26] or removed during runtime, e.g., when importing external packages.

However, R is still a functional programming language, where functions are not just first-class citizens: they are privileged. In functional programming, the emphasis is on operations (verbs), not data (nouns). It leads to a very readable syntax. For example, assuming that square, x, and y are sensibly defined, the mean squared difference can be written as:

mean(square(x-y))  # read: mean of squares of differences

Example 10.26

Base R is extremely flexible. We can introduce new vocabulary as we please. In Section 12.3.7, we will study an example where we define:

group_by (a function that splits a data frame with respect to a combination of levels in given named columns and returns a list of data frames with class list_dfs),
aggregate.list_dfs (which applies an aggregation function on every column of all data frames in a given list), and
mean.list_dfs (a specialised version of the former that calls mean).

The specifics do not matter now. Let’s just consider the notation we use when the operations are chained:

# select a few rows and columns from the `iris` data frame:
iris_subset <- iris[51:150, c("Sepal.Width", "Petal.Length", "Species")]
# compute the averages of all variables grouped by Species:
mean(group_by(iris_subset, "Species"))
##      Species            x  Mean
## 1 versicolor  Sepal.Width 2.770
## 2 versicolor Petal.Length 4.260
## 3  virginica  Sepal.Width 2.974
## 4  virginica Petal.Length 5.552

The functional syntax is very reader-centric (by the way, self-explanatory variable names and rich comments are priceless). We compute the mean in groups defined by Species in a subset of the iris data frame. All verbs appear on the left side of the expression, with the final (the most important?) operation being listed first.

Nonetheless, when implementing more complex data processing pipelines, programmers think in different categories: “first, we need to do this, then we need to do that, and afterwards…”. When they write their ideas down, they have to press Home and End or arrow keys a few times to move the caret to the right places:

finally(thereafter(then(first(x))))

As people are inherently lazy, they might want to “optimise” their workflow to save a bit of energy. Thus, many popular languages rely on message-passing syntax, where operations are propagated (and written) left-to-right instead of inside-out. For instance, object.method1().method2() might mean “call method1 on the object and then call method2 on the result”. Here, the object is told what to do.

Since R 4.1.0, we have the pipe operator[27], `|>`. It is merely syntactic sugar for translating between the message-passing and function-centric notion. In a nutshell, instead of writing:

h(g(f(x), y))
mean(square(x-y))
mean(group_by(iris_subset, "Species"))

we have the following equivalent forms:

x |> f() |> g(y) |> h()
(x-y) |> square() |> mean()
iris_subset |> group_by("Species") |> mean()

Such syntax is developer-centric. It might be faster to write. It emphasises the order in which the operations are executed. However, we must stress that there is nothing that cannot be achieved through the function-centric form and perhaps a few auxiliary variables. As this book is minimalist by design, we refrain ourselves from using it. Those unconvinced should take note of the following.

First, expressions on the right side of the pipe operator must always be proper calls. Therefore, the use of round brackets is obligatory. Thus, when passing anonymous functions, we must write:

runif(10) |> (function(x) mean((x-mean(x))^2))()  # note the "()" at the end
## [1] 0.078184

Peculiarly, in R 4.1.0, a “shorthand” notation for creating functions was introduced. We can save seven keystrokes and scribble “\(...) expr” instead of “function(...) expr”.

runif(10) |> (\(x) mean((x-mean(x))^2))()  # again: "()" at the end
## [1] 0.078184

Also, the placeholder `_` can be used on the right side of the pipe operator (only once) to indicate that the left side must be matched with a specific named argument of the function to be called. Otherwise, the left side is always passed as the first argument. Therefore, the two following expressions are equivalent:

x |> median() |> `-`(e1=x, e2=_) |> abs() |> median()
median(abs(x-median(x)))

Note

When writing code interactively, we may sometimes benefit from using the rightward `->` operator. Suffice it to say that “name <- value” and “value -> name” are synonymous. This way, we can type some lengthy code, store the result in an intermediate variable, and then continue in the next line (possibly referring to that auxiliary value more than once). For instance:

runif(10) -> .
mean((.-mean(.))^2)
## [1] 0.078184

Recall that `.` is as valid a variable name as any other one. Another example:

iris[, c("Sepal.Width", "Petal.Length", "Species")] -> .
.[ .[, "Species"] %in% c("versicolor", "virginica"), ] -> .
mean(group_by(., "Species"))
##      Species            x  Mean
## 1 versicolor  Sepal.Width 2.770
## 2 versicolor Petal.Length 4.260
## 3  virginica  Sepal.Width 2.974
## 4  virginica Petal.Length 5.552

10.5. S4 classes (*)¶

The S3-style OOP is based on a brilliantly simple idea: calling a generic f(x) dispatches automatically to a method f.class_of_x(x) or f.default(x) in the case where the former does not exist. Naturally, S3 has some inherent limitations:

classes cannot be formally defined; the class attribute may be assigned arbitrarily to any object[28],
argument dispatch is performed only[29] with regard to one data type[30].

In most cases, and with an appropriate level of mindfulness, they are not a problem at all. However, it is a typical condition of programmers who come to our world from more mainstream languages (e.g., C++ or Java; yours truly included) until they appreciate the true beauty of R’s being somewhat different. Before they fully develop such an acquired taste, though, they grow restless as “R is has no real OOP because it lacks polymorphism, encapsulation, formal inheritance, and so on, and something must be done about it!”. The truth is that it had not had to, but with high probability, it would have anyway in one way or another.

And so the fourth version of the S language was introduced in 1998 (see [10]). It brought a new object-orientated system, which we are used to referring to as S4. Its R version is defined by the methods package. Below we discuss it briefly. For more details, see help("Classes_Details") and help("Methods_Details") as well as [11] and [12].

Note

(*) S4 was loosely inspired by the Common Lisp Object System (with its defclass, defmethod, etc.; see, e.g., [21]). In the current author’s opinion, the S4 system is somewhat of an afterthought. Due to appendages like this, R seems like a patchwork language. Suffice it to say that it was not the last attempt to introduce a “real” OOP in the overall functional R: the story will resume in Section 16.1.5.

The main issue with all the supplementary OOP approaches is that each of them is parallel to S3 which never lost its popularity and is still in the very core of our language. We are thus covering them only for the sake of completeness for the readers might come across such objects. In particular, we shall explain the meaning of a notation like x@slot. Moreover, in Section 11.4.7 we mention the Matrix class which is perhaps the most prominent showcase of S4.

Nonetheless, the current author advises taking with a pinch of salt statements such as “for new projects, it is recommended to use the more flexible and robust S4 scheme provided in the methods package” mentioned in help("UseMethod").

10.5.1. Defining S4 classes¶

An S4 class can be formally registered through a call to setClass. For instance:

library("methods")  # in the case where it is not auto-loaded
setClass("qualitative", slots=c(data="integer", levels="character"))

We defined a class named qualitative (similarity to our own categorical and the built-in factor S3 classes is intended). It has two slots: data and levels being integer and character vectors, respectively. This notation is already outlandish. There is no assignment suggesting that we have introduced something novel.

An object of the above class can be instantiated by calling new:

z <- new("qualitative", data=c(1L, 2L, 2L, 1L, 1L), levels=c("a", "b"))
print(z)
## An object of class "qualitative"
## Slot "data":
## [1] 1 2 2 1 1
##
## Slot "levels":
## [1] "a" "b"

That z is of this class can be verified by calling is.

is(z, "qualitative")
## [1] TRUE
class(z)  # also: attr(z, "class")
## [1] "qualitative"
## attr(,"package")
## [1] ".GlobalEnv"

Important

A few R packages import the methods package only to get access the handy is function. It does not mean they are defining new S4 classes.

Note

S4 objects are marked as being of the following basic type:

typeof(z)
## [1] "S4"

See Section 1.12 of [69] for technical details on how they are internally represented. In particular, in our case, all the slots are simply stored as object attributes:

attributes(z)
## $data
## [1] 1 2 2 1 1
##
## $levels
## [1] "a" "b"
##
## $class
## [1] "qualitative"
## attr(,"package")
## [1] ".GlobalEnv"

10.5.2. Accessing slots¶

Reading or writing slot contents can be done via the `@` operator and the slot function or their replacement versions.

z@data  # or slot(z, "data")
## [1] 1 2 2 1 1
z@levels <- c("A", "B")

Note

The `@` operator can only be used on S4 objects, and some sanity checks are automatically performed:

z@unknown <- "spam"
## Error in (function (cl, name, valueClass) : 'unknown' is not a slot in
##     class "qualitative"
z@data <- "spam"
## Error in (function (cl, name, valueClass) : assignment of an object of
##     class "character" is not valid for @'data' in an object of class
##     "qualitative"; is(value, "integer") is not TRUE

10.5.3. Defining methods¶

For the S4 counterparts of the S3 generics (Section 10.2), see help("setGeneric"). Luckily, there is a reasonable degree of interoperability between the S3 and S4 systems.

Let’s introduce a new method for the well-known as.character generic. Instead of defining as.character.qualitative, we need to register a new routine with setMethod.

setMethod(
    "as.character",     # name of the generic
    "qualitative",      # class of 1st arg; or: signature=c(x="qualitative")
    function(x, ...)    # method definition
        x@levels[x@data]
)

Testing:

as.character(z)
## [1] "A" "B" "B" "A" "A"

show is the S4 counterpart of print:

setMethod(
    "show",
    "qualitative",
    function(object)
    {
        x <- as.character(object)
        print(x)  # calls `print.default`
        cat(sprintf("Categories: %s\n",
            paste(object@levels, collapse=", ")))
    }
)

Interestingly, it is involved automatically on a call to print:

print(z)  # calls `show` for `qualitative`
## [1] "A" "B" "B" "A" "A"
## Categories: A, B

Methods that dispatch on the type of multiple arguments are also possible. For example:

setMethod(
    "split",
    c(x="ANY", f="qualitative"),
    function (x, f, drop=FALSE, ...)
        split(x, as.character(f), drop=drop, ...)
)

It permits the first argument to be of any type (like a default method). Moreover, here is its version tailored for matrices (see Chapter 11).

setMethod(
    "split",
    c(x="matrix", f="qualitative"),
    function (x, f, drop=FALSE, ...)
        lapply(
            split(seq_len(NROW(x)), f, drop=drop, ...),  # calls the above
            function(i) x[i, , drop=FALSE])
)

Some tests:

A <- matrix(1:35, nrow=5)  # whatever
split(A, z)    # matrix, qualitative
## $A
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,]    1    6   11   16   21   26   31
## [2,]    4    9   14   19   24   29   34
## [3,]    5   10   15   20   25   30   35
##
## $B
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,]    2    7   12   17   22   27   32
## [2,]    3    8   13   18   23   28   33
split(1:5, z)  # ANY, qualitative
## $A
## [1] 1 4 5
##
## $B
## [1] 2 3

Exercise 10.27

Overload the `[` operator for the qualitative class.

10.5.4. Defining constructors¶

We can also overload the initialize method, which is automatically called by new:

setMethod(
    "initialize",   # note the American spelling
    "qualitative",
    function(.Object, x)
    {  # the method itself
        if (!is.character(x))
            x <- as.character(x)  # see above
        xu <- unique(sort(x))  # drops NAs

        .Object@data <- match(x, xu)
        .Object@levels <- xu

        .Object  # return value - a modified object
    }
)

This constructor yields instances of the class qualitative based on an object coercible to a character vector. For example:

w <- new("qualitative", c("a", "c", "a", "a", "d", "c"))
print(w)
## [1] "a" "c" "a" "a" "d" "c"
## Categories: a, c, d

Exercise 10.28

Set up a validating method for our class; see help("setValidity").

10.5.5. Inheritance¶

New S4 classes can be derived from existing ones. For instance:

setClass("binary", contains="qualitative")

It is a child class that inherits all slots from its parent. We can overload its initialisation method:

setMethod(
    "initialize",
    "binary",
    function(.Object, x)
    {
        if (!is.logical(x))
            x <- as.logical(x)
        x <- as.character(as.integer(x))
        xu <- c("0", "1")
        .Object@data <- match(x, xu)
        .Object@levels <- xu
        .Object
    }
)

Testing:

new("binary", c(TRUE, FALSE, TRUE, FALSE, NA, TRUE))
## [1] "1" "0" "1" "0" NA  "1"
## Categories: 0, 1

We can still use the show method of the parent class.

10.6. Exercises¶

Exercise 10.29

Answer the following questions.

How to display the source code of the default methods for head and tail?
Can there be, at the same time, one object of the class c("A", "B") and another one of the class c("B", "A")?
If f is a factor, what are the relationships between as.character(f), as.numeric(f), as.character(as.numeric(f)), and as.numeric(as.character(f))?
If x is a named vector and f is a factor, is x[f] equivalent to x[as.character(f)] or rather x[as.numeric(f)]?

Exercise 10.30

A user calls:

plot(x, y, col="red", ylim=c(1, max(x)), log="y")

where x and y are numeric vectors. Consult help("plot") for the meaning of the ylim and log arguments. Was that straightforward?

Exercise 10.31

Explain why the two following calls return significantly different results.

c(Sys.Date(), "1970-01-01")
## [1] "2025-01-15" "1970-01-01"
c("1970-01-01", Sys.Date())
## [1] "1970-01-01" "20103"

Propose a workaround.

Exercise 10.32

Write methods head and tail for our example categorical class.

Exercise 10.33

(*) Write an R package that defines S3 class categorical. Add a few methods for this class. Note the need to use the S3method directive in the NAMESPACE file; see [66].

Exercise 10.34

Inspect the result of a call to binom.test(79, 100) and to rle(c(1, 1, 1, 4, 3, 3, 3, 3, 3, 2, 2)). Find the methods responsible for such objects’ pretty-printing.

Exercise 10.35

Read more about the connection class. In particular, see the Value section of help("connections").

Exercise 10.36

Read about the subsetting operators overloaded for the package_version class; see help("numeric_version").

Exercise 10.37

There are xtfrm methods overloaded for classes such as numeric_version, difftime, Date, and factor. Find out how they work and where they might be of service (especially in relation to order and sort; see also Section 12.3.1).

Exercise 10.38

Give an example where split(x, list(y1, y2)) (with default arguments) will fail to generate the correct result.

Exercise 10.39

Write a function that determines the mode, i.e., the most frequently occurring value in a given object of the class factor. If the mode is not unique, return a randomly chosen one (each with the same probability).

Exercise 10.40

Implement your version of the gl function.