10. S3 classes¶
This open-access textbook is, and will remain, freely available for everyone’s enjoyment (also in PDF; a paper copy can also be ordered). It is a non-profit project. Although available online, it is a whole course, and should be read from the beginning to the end. Refer to the Preface for general introductory remarks. Any bug/typo reports/fixes are appreciated. Make sure to check out Minimalist Data Wrangling with Python [27], too.
Let x
be a randomly generated matrix with 1 000 000 rows and 1 000 columns,
y
be a data frame with results from the latest survey indicating that
things are way more complicated than what most people think,
and z
be another matrix, this time with many zeroes.
The human brain is not capable of dealing with excessive amounts of data that are immoderately specific. This is why we have a natural tendency to group different entities based on their similarities. This way, we form more abstract classes of objects.
Also, many of us are inherently lazy. Oftentimes we take shortcuts to minimise energy (at a price to be paid later).
Printing out a matrix, a data frame, and a time series
are all instances of the displaying of things,
although they undoubtedly differ in detail.
By now, we have probably forgotten which objects are hidden behind
the aforementioned x
, y
, and z
.
Being able to simply call print(y)
without having to recall that, yes, y
is a data frame,
seems pretty appealing.
This chapter introduces S3 classes [13].
They provide a lightweight object-orientated programming (OOP) approach
for automated dispatching calls to generics
of the type print(y)
to concrete methods
such as print.data.frame(y)
,
based on the class of the object they are invoked on.
We shall see that S3 classes in their essence are beautifully simple[1]. Ultimately, generics and methods are ordinary R functions (Chapter 7) and classes are merely additional object attributes (Section 4.4).
Of course, this does not mean that wrapping our heads around them will be effortless. However, unlike other “class systems”[2], S3 is ubiquitous in most R programming projects. Suffice it to say that factors, matrices, and data frames discussed in the coming chapters are straightforward, S3-based extensions of the concepts we are about to introduce.
10.1. Object type vs class¶
Recall that typeof (introduced in Section 4.1) returns the internal type of an object. So far, we were mostly focused on atomic and generic vectors; compare Figure 1 in the Preface.
typeof(NULL)
## [1] "NULL"
typeof(c(TRUE, FALSE, NA))
## [1] "logical"
typeof(c(1, 2, 3, NA_real_))
## [1] "double"
typeof(c("a", "b", NA_character_))
## [1] "character"
typeof(list(list(1, 2, 3), LETTERS))
## [1] "list"
typeof(function(x) x)
## [1] "closure"
The number of admissible types is small[3], but they open the world of endless possibilities[4]. They provide a basis for more complex data structures. This is thanks to the fact that they can be equipped with arbitrary attributes (Section 4.4).
Most compound types constructed using the mechanisms discussed in this chapter only pretend they are something different from what they actually are. Still, they often do their job very well. By looking under their bonnet, we will be able to manipulate their state outside of the prescribed use cases.
Important
Setting the class
attribute might make some objects
behave differently in certain scenarios.
Let’s equip two identical objects with different class
attributes.
xt <- structure(123, class="POSIXct") # POSIX calendar time
xd <- structure(123, class="Date")
Both objects are represented using numeric vectors:
c(typeof(xt), typeof(xd))
## [1] "double" "double"
However, when printed, they are decoded differently:
print(xt)
## [1] "1970-01-01 01:02:03 CET"
print(xd)
## [1] "1970-05-04"
In the former case, 123 is understood as the number of seconds
since the UNIX epoch, 1970-01-01T00:00:00+0000.
The latter is deciphered as the number of days since the said timestamp.
Therefore, we expect that there must exist a mechanism
that calls a version of print dependent on an
object’s virtual class.
That it only relies on the class
attribute,
which might be set, unset, or reset freely,
is emphasised below.
attr(xt, "class") <- "Date" # change class from POSIXct to Date
print(xt) # same 123, but now interpreted as Date
## [1] "1970-05-04"
as.numeric(xt) # drops all attributes
## [1] 123
unclass(xd) # drops the class attribute; `attr<-`(xd, "class", NULL)
## [1] 123
We are having so much fun that one more illustration can only grow our joy.
Consider an example data frame:
x <- iris[1:3, 1:2] # a subset of an example data frame
print(x)
## Sepal.Length Sepal.Width
## 1 5.1 3.5
## 2 4.9 3.0
## 3 4.7 3.2
It is an object of the class
(an object whose class
attribute is set to):
attr(x, "class")
## [1] "data.frame"
Some may say, and they are absolutely right, that
we have not covered data frames yet. After all, they are the topic of
Chapter 12, which is still ahead of us.
However, from the current perspective, we should know that
R data frames are nothing but lists of vectors
of the same lengths equipped with the names
and row.names
attributes.
typeof(x)
## [1] "list"
`attr<-`(x, "class", NULL) # or unclass(x)
## $Sepal.Length
## [1] 5.1 4.9 4.7
##
## $Sepal.Width
## [1] 3.5 3.0 3.2
##
## attr(,"row.names")
## [1] 1 2 3
print(x)
## Sepal.Length Sepal.Width
## 1 5.1 3.5
## 2 4.9 3.0
## 3 4.7 3.2
Important
Revealing how x
is actually represented enables us to process it
using the extensive skill set that we have already[5]
developed by studying the material covered in the previous part of our book
(including all the exercises). This fact is noteworthy because some
built-in and third-party data types are not particularly well-designed.
Let’s underline again that attributes are simple additions to R objects.
However, as we said in Section 4.4.3,
certain attributes are special, and class
is one of them.
In particular, we can only set class
to be a character vector
(possibly of length greater than one; see Section 10.2.5).
x <- 12345
attr(x, "class") <- 1 # character vectors only
## Error in attr(x, "class") <- 1: attempt to set invalid 'class' attribute
Furthermore, the class function can read the value of
the class
attribute. Its replacement version is also available.
class(x) <- "Date" # set; the same as attr(x, "class") <- "Date"
class(x) # get; here, it is the same as attr(x, "class")
## [1] "Date"
Important
The class function always yields a value, even if the corresponding attribute is not set. We call it an implicit class. Compare the following results:
class(NULL) # no `class` set because NULL cannot have any attributes
## [1] "NULL"
class(c(TRUE, FALSE, NA)) # no attributes so class is implicit (= typeof)
## [1] "logical"
class(c(1, 2, 3, NA_real_)) # typeof returns "double"
## [1] "numeric"
class(c("a", "b", NA_character_))
## [1] "character"
class(list(list(1, 2, 3), LETTERS))
## [1] "list"
class(function(x) x) # typeof gives "closure"
## [1] "function"
Also, Chapter 11 will explain that any object
equipped with the dim
attribute also has an implicit class:
(x <- as.matrix(c(1, 2, 3)))
## [,1]
## [1,] 1
## [2,] 2
## [3,] 3
attributes(x) # `class` is not amongst the attributes
## $dim
## [1] 3 1
class(x) # implicit class
## [1] "matrix" "array"
typeof(x) # it is still a numeric vector (under the bonnet)
## [1] "double"
10.2. Generics and method dispatching¶
10.2.1. Generics, default, and custom methods¶
Let’s inspect the source code of the print function:
print(print) # sic!
## function (x, ...)
## UseMethod("print")
## <environment: namespace:base>
Any function like the above[6] we will call from now on
a generic (an S3 generic, from S version 3 [13]).
Its only job is to invoke UseMethod("print")
.
It dispatches the control flow to another function,
referred to as a method, based on the class of the first argument.
Important
All arguments passed to the generic will also be available[7] in the method dispatched to.
For example, let’s define an object of the class categorical
(a name that we have just come up with; we could have called it
cat
, CATEGORICAL
, or SpanishInquisition
as well). It will be our
version of the factor
type that we discuss later.
x <- structure(
c(1, 3, 2, 1, 1, 1, 3),
levels=c("a", "b", "c"),
class="categorical"
)
We assume that such an object is a sequence of small positive integers
(codes). It is equipped with the levels
attribute,
which is a character vector of length not less than the maximum
of the said integers.
In particular, the first level deciphers the meaning of
the code 1
. Hence, the above vector represents a sequence
a, c, b, a, a, a, c.
There is no special method for displaying objects of the categorical
class.
Hence, when we call print, the default (fallback) method is
invoked:
print(x)
## [1] 1 3 2 1 1 1 3
## attr(,"levels")
## [1] "a" "b" "c"
## attr(,"class")
## [1] "categorical"
This is the standard function for displaying numeric vectors. We are well familiar with it. Its name is print.default, and we can always call it directly:
print.default(x) # the default print method
## [1] 1 3 2 1 1 1 3
## attr(,"levels")
## [1] "a" "b" "c"
## attr(,"class")
## [1] "categorical"
However, we can introduce a designated method for printing
categorical
objects. Its name must precisely be
print.categorical:
print.categorical <- function(x, ...)
{
x_character <- attr(x, "levels")[unclass(x)]
print(x_character) # calls `print.default`
cat(sprintf("Categories: %s\n",
paste(attr(x, "levels"), collapse=", ")))
invisible(x) # this is what all print methods do; see help("print")
}
Calling print automatically dispatches the control flow to this method:
print(x)
## [1] "a" "c" "b" "a" "a" "a" "c"
## Categories: a, b, c
Of course, the default method can still be called.
Referring to print.default(x)
directly
will output the same result as the one a few chunks above.
Note
print.categorical has been equipped with the dot-dot-dot attribute since the generic print had one too[8].
10.2.2. Creating generics¶
Introducing new S3 generics is as straightforward
as defining a function that calls UseMethod.
For instance, here is a dispatcher which creates new objects of the
categorical
class based on other objects:
as.categorical <- function(x, ...)
UseMethod("as.categorical") # synonym: UseMethod("as.categorical", x)
We always need to define the default method:
as.categorical.default <- function(x, ...)
{
if (!is.character(x))
x <- as.character(x)
xu <- unique(sort(x)) # drops NAs
structure(
match(x, xu),
class="categorical",
levels=xu
)
}
Testing:
as.categorical(c("a", "c", "a", "a", "d", "c"))
## [1] "a" "c" "a" "a" "d" "c"
## Categories: a, c, d
as.categorical(c(3, 6, 4, NA, 9, 9, 6, NA, 3))
## [1] "3" "6" "4" NA "9" "9" "6" NA "3"
## Categories: 3, 4, 6, 9
This method is already quite flexible. It handles a wide variety of data types because it relies on the built-in generic as.character (Section 10.2.3).
We might want to forbid the conversion from lists because it does not necessarily make sense:
as.categorical.list <- function(x, ...)
stop("conversion of lists to categorical is not supported")
The users can always be instructed in the method’s documentation that they are responsible for converting lists to another type prior to a call to as.categorical.
The default method deals with logical vectors perfectly fine:
as.categorical(c(TRUE, FALSE, NA, NA, FALSE)) # as.categorical.default
## [1] "TRUE" "FALSE" NA NA "FALSE"
## Categories: FALSE, TRUE
However, we might still want to introduce its specialised version.
This is because we know a slightly more efficient algorithm
(and we have nothing better to do)
based on the fact that FALSE
and TRUE
converted to
numeric yield 0 and 1, respectively:
as.categorical.logical <- function(x, ...)
{
if (!is.logical(x))
x <- as.logical(x) # or maybe stopifnot(is.logical(x))?
structure(
x + 1, # only 1, 2, and NAs will be generated
class="categorical",
levels=c("FALSE", "TRUE")
)
}
It spawns the same result as the default method but is slightly faster.
as.categorical(c(TRUE, FALSE, NA, NA, FALSE)) # as.categorical.logical
## [1] "TRUE" "FALSE" NA NA "FALSE"
## Categories: FALSE, TRUE
We performed some argument consolidation at the beginning because a user
is always able to call a method directly on an R object of any kind
(which is a good thing; see Section 10.2.4).
In other words, there is no guarantee that the argument x
must be of
type logical.
10.2.3. Built-in generics¶
Many[9] functions and operators we have introduced so far are, in fact, S3 generics: print, head, `[`, `[[`, `[<-`, `[[<-`, length, `+`, `<=`, is.numeric, as.numeric, is.character, as.character, as.list, round, log, sum, rep, c, and na.omit, to name a few.
Let’s overload the as.character method. The default one does not make much sense for the objects of our custom type:
as.character(x)
## [1] "1" "3" "2" "1" "1" "1" "3"
So:
as.character.categorical <- function(x, ...)
attr(x, "levels")[unclass(x)]
And now:
as.character(x)
## [1] "a" "c" "b" "a" "a" "a" "c"
Overload the unique and rep
methods for objects of the class categorical
.
New types ought to be designed carefully. For instance, if we forget to overload the to-numeric converter, some users might be puzzled[10] when they see:
(x <- as.categorical(c(4, 9, 100, 9, 9, 100, 42, 666, 4)))
## [1] "4" "9" "100" "9" "9" "100" "42" "666" "4"
## Categories: 100, 4, 42, 666, 9
as.double(x) # synonym: as.numeric(x); here, it calls as.double.default(x)
## [1] 2 5 1 5 5 1 3 4 2
Hence, we might want to introduce a new method:
as.double.categorical <- function(x, ...) # not: as.numeric.categorical
{
# actually: as.double.default(as.character.categorical(x))
as.double(as.character(x))
}
It now yields:
as.double(x) # or as.numeric(x); calls as.double.categorical(x)
## [1] 4 9 100 9 9 100 42 666 4
Note
We can still use unclass to fetch the codes:
unclass(x)
## [1] 2 5 1 5 5 1 3 4 2
## attr(,"levels")
## [1] "100" "4" "42" "666" "9"
It is because the foregoing returns a class-free object, which is now guaranteed to be processed by the default methods (print, subsetting, as.character, etc.).
What would happen if we used as.numeric instead of unclass in print.categorical and as.character.categorical?
Update the preceding methods so that
we can also create named objects of the class categorical
(i.e., equipped with the names
attribute).
The levels of x
are sorted lexicographically, not numerically.
Introduce a single method that would let the above code
(when rerun without any alterations) generate a more natural result.
10.2.4. First-argument dispatch and calling S3 methods directly¶
With S3, dispatching is most often done based on the class of only one[11] argument: by default, the first one from the parameter list.
For example, the c function is a generic that
dispatches on the first argument’s class.
Let’s overload it for categorical
objects.
In other words, we will create a function to be called by the generic
when it is invoked on a series of objects whose
first element is of the said class.
c.categorical <- function(...)
as.categorical(
unlist(
lapply(list(...), as.character)
)
)
It converts each argument to a character vector, relying on the generic as.character to take care of the details. It works because unlist converts a list of such atomic vectors to a single sequence of strings.
Calling c with the first argument of the class categorical
dispatches to the above method:
x <- c(9, 5, 7, 7, 2)
xc <- as.categorical(x)
c(xc, x) # c.categorical
## [1] "9" "5" "7" "7" "2" "9" "5" "7" "7" "2"
## Categories: 2, 5, 7, 9
However, if the first argument is, say, unclassed, the default method will be consulted:
c(x, xc) # default c
## [1] 9 5 7 7 2 4 2 3 3 1
It ignored the class
attribute and saw xc
as it is,
a bareboned numeric vector:
`attributes<-`(xc, NULL) # the underlying codes
## [1] 4 2 3 3 1
It is not a bug. It is a well-documented (and now explained) behaviour. After all, compound types (classed objects) are emulated through the basic ones.
Important
In most cases, S3 methods can be called directly to get the desired outcome:
c.categorical(x, xc) # force a call to the specific method
## [1] "9" "5" "7" "7" "2" "9" "5" "7" "7" "2"
## Categories: 2, 5, 7, 9
We said in most cases because methods can be:
hardcoded at the C language level (e.g., there is no c.default defined at all[12]),
hidden (defined in a package’s namespace but not exported; Section 16.3.6),
overloaded as a group; see Section 10.2.6 and help
("groupGeneric")
.
Purely for jollity, let’s find a partition of the iris
dataset
into three clusters using the \(k\)-means algorithm:
res <- kmeans(iris[-5], centers=3, nstart=10)
print(res)
## K-means clustering with 3 clusters of sizes 50, 62, 38
##
## Cluster means:
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1 5.0060 3.4280 1.4620 0.2460
## 2 5.9016 2.7484 4.3935 1.4339
## 3 6.8500 3.0737 5.7421 2.0711
##
## Clustering vector:
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [71] 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [ reached getOption("max.print") -- omitted 51 entries ]
##
## Within cluster sum of squares by cluster:
## [1] 15.151 39.821 23.879
## (between_SS / total_SS = 88.4 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss"
## [5] "tot.withinss" "betweenss" "size" "iter"
## [9] "ifault"
It is an object of the class:
class(res)
## [1] "kmeans"
which, in fact, is a:
typeof(res)
## [1] "list"
The underlying list looks like:
unclass(res)
## $cluster
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [71] 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [ reached getOption("max.print") -- omitted 51 entries ]
##
## $centers
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1 5.0060 3.4280 1.4620 0.2460
## 2 5.9016 2.7484 4.3935 1.4339
## 3 6.8500 3.0737 5.7421 2.0711
##
## $totss
## [1] 681.37
##
## $withinss
## [1] 15.151 39.821 23.879
##
## $tot.withinss
## [1] 78.851
##
## $betweenss
## [1] 602.52
##
## $size
## [1] 50 62 38
##
## $iter
## [1] 2
##
## $ifault
## [1] 0
We already know that res
was displayed in a fancy way
only because there is a print method overloaded
for objects of the kmeans
class.
But is there?
print.kmeans
## Error in eval(expr, envir, enclos): object 'print.kmeans' not found
Even though the method is hidden (internal) in the stats package’s
namespace, from Section 16.3.6 we will learn that
it can be accessed by calling
getS3method("print", "kmeans")
or referring to stats:::
print.kmeans
(note the triple colon).
10.2.5. Multi-class-ness¶
The class
attribute can be instantiated as a character vector
of any length. For example:
(t1 <- Sys.time())
## [1] "2024-08-23 16:51:48 CEST"
(t2 <- strptime("2021-08-15T12:59:59+1000", "%Y-%m-%dT%H:%M:%S%z"))
## [1] "2021-08-15 04:59:59"
Let’s inspect the classes of these two objects:
class(t1)
## [1] "POSIXct" "POSIXt"
class(t2)
## [1] "POSIXlt" "POSIXt"
Section 10.3.1 will discuss date-time classes in more detail. It will highlight that the former is represented as a numeric vector, while the latter is a list. Thus, these two should primarily be seen as instances of two distinct types. However, as both of them have a lot in common, it was a wise design choice to allow them to be seen also as the representatives of the same generic category of POSIX time objects.
Important
When calling a generic function[13] f on an object x
of the classes[14]
class1
, class2
, …, classK
(in this order),
UseMethod(f, x)
dispatches to the method
determined as follows:
if f.class1 is available[15], call it;
otherwise, if f.class2 is available, call this one;
…;
otherwise, if f.classK is available, invoke it;
otherwise, refer to the fallback f.default.
There is a method diff for objects of the class
POSIXt
that carries a statement:
r <- if (inherits(x, "POSIXlt")) as.POSIXct(x) else x
This way, we can process both POSIXct
and POSIXlt
instances
using the same procedure.
We should see no magic in this simple scheme. It is nothing more than a way to determine the method to be called for a particular R object. It can be used as a mechanism to mimic the idea of inheritance in object-orientated programming languages. However, the S3 system does not allow for defining classes in any formal manner.
For example, we cannot say that objects of the class POSIXct
inherit
from POSIXt
. Neither can we say that each object of the class POSIXct
is also an instance of POSIXt
.
The class attribute can still be set arbitrarily on
a per-object basis. We can create ones whose class is
simply POSIXct
(without the POSIXt
part)
or even c("POSIXt", "POSIXct")
(in this order).
Note
In any method, it is possible to call the method corresponding to the next class by calling NextMethod.
For instance, if we are in f.class1, a call to
NextMethod(f)
will try invoking
f.class2. If such a method does not exist,
further methods in the search chain will be probed,
falling back to the default method if necessary.
We will give an illustration later.
10.2.6. Operator overloading¶
Operators are ordinary functions (Section 9.3.5). Even though what follows can partially be implied by what we have already said, as usual in R, some oddities are to be expected.
For example, let’s overload the index operator for
objects of the class categorical
.
Looking at help("[")
, we see that the default method
has two arguments: x
(the categorical
object being sliced)
and i
(the indexer).
Ours will have the same interface then:
`[.categorical` <- function(x, i)
{
structure(
unclass(x)[i], # `[`(unclass(x), i)
class="categorical",
levels=attr(x, "levels") # the same levels as input
)
}
The default S3 method, `[.default`, is hardcoded at the C language level and we cannot refer to it directly. This is why we called unclass instead. Alternatively, we can also invoke NextMethod:
`[.categorical` <- function(x, i)
{
structure(
NextMethod("["), # call default method, passing `x` and `i`
class="categorical",
levels=attr(x, "levels") # the same levels as input
)
}
We can also introduce the replacement version of this operator:
`[<-.categorical` <- function(x, i, value)
{
levels <- attr(x, "levels")
value <- match(value, levels) # integer codes corresponding to levels
structure(
NextMethod("[<-"), # call default method, passing `x`, `i`, `values`
class="categorical",
levels=levels # same levels as input
)
# # or, equivalently:
# structure(
# `[<-`(unclass(x), i, value=match(value, attr(x, "levels"))),
# class="categorical",
# levels=attr(x, "levels")
# )
}
Testing:
x <- as.categorical(c(3, 6, 4, NA, 9, 9, 6, NA, 3))
x[1:4]
## [1] "3" "6" "4" NA
## Categories: 3, 4, 6, 9
x[1:4] <- c("6", "7")
print(x)
## [1] "6" NA "6" NA "9" "9" "6" NA "3"
## Categories: 3, 4, 6, 9
Notice how we handled the case of nonexistent levels and that the recycling rule has been automagically inherited (amongst other features) from the default index operator.
Do these two operators preserve the names
attribute of x
?
Is indexing with negative integers or logical vectors supported as well?
Why is that/is that not the case?
Furthermore, let’s overload the `==` operator.
Assume[16] that we would like two categorical
objects
to be compared based on the actual labels they encode, in an elementwise
manner:
`==.categorical` <- function(e1, e2)
as.character(e1) == as.character(e2)
We are feeling lucky: by not performing any type checking,
we rely on the particular as.character methods
corresponding to the types of e1
and e2
.
Also, assuming that as.character always[17]
returns a character
object,
we dispatch to the default method for `==`
(which handles atomic vectors).
Some examples:
as.categorical(c(1, 3, 5, 1)) == as.categorical(c(1, 3, 1, 1))
## [1] TRUE TRUE FALSE TRUE
as.categorical(c(1, 3, 5, 1)) == c(1, 3, 1, 1)
## [1] TRUE TRUE FALSE TRUE
c(1, 3, 5, 1) == as.categorical(c(1, 3, 1, 1))
## [1] TRUE TRUE FALSE TRUE
Important
In the case of binary operators, dispatching is done based on the classes of both arguments. In all three preceding calls, we call `==.categorical`, regardless of whether the classed object is the first or the second operand.
If two operands are classed, and different methods are overloaded for both, a warning will be generated, and the default internal method will be called.
`==.A` <- function(e1, e2) "A"
`==.B` <- function(e1, e2) "B"
structure(c(1, 2, 3), class="A") == structure(c(2, NA, 3), class="B")
## Warning: Incompatible methods ("==.A", "==.B") for "=="
## [1] FALSE NA TRUE
Note
(*) By creating a single Ops method, we can define the meaning of all binary operators at once.
Ops.categorical <- function(e1, e2)
{
if (!(.Generic %in% c("<", ">", "<=", ">=", "==", "!=")))
stop(sprintf("%s not defined for 'categorical' objects", .Generic))
e1 <- as.character(e1)
e2 <- as.character(e2)
NextMethod(.Generic) # dispatch to the default method (for character)
}
as.categorical(c(1, 3, 5, 1)) > c(1, 2, 4, 2)
## [1] FALSE TRUE TRUE FALSE
Here, `.Generic
` is a variable representing the name
of the operator (generic) being invoked; see Section 16.3.6.
Other group generics are: Summary
(including functions such as
min, sum, and all),
Math (abs, log, round, etc.),
matrixOps (`%*%`),
and Complex (e.g., Re, Im);
see help("groupGeneric")
for more details.
Sometimes we must rely on the `.S3method` function to let R recognise a custom method related to such generics.
10.3. Common built-in S3 classes¶
Below we discuss a few noteworthy classes, including those representing date-time information and factors (ordered or not). Note that classes representing tabular data will be dealt with in separate parts, owing to their importance and ubiquity. Namely, matrices and other arrays are covered in Chapter 11, and data frames are discussed in Chapter 12. Inspecting other[18] interesting compound types is left as a simple exercise for the studious reader.
10.3.1. Date, time, etc.¶
The Date
class represents… dates (calendar ones, not fruits).
(x <- c(Sys.Date(), as.Date(c("1969-12-31", "1970-01-01", "2023-02-29"))))
## [1] "2024-08-23" "1969-12-31" "1970-01-01" NA
class(x)
## [1] "Date"
Complex types are built on basic ones. Underneath, what we deal with here is:
typeof(x)
## [1] "double"
unclass(x)
## [1] 19958 -1 0 NA
which is the number of days since the UNIX epoch, 1970-01-01T00:00:00+0000 (midnight GMT/UTC).
The POSIXct
(calendar time) class represents date-time objects:
(x <- Sys.time())
## [1] "2024-08-23 16:51:48 CEST"
class(x)
## [1] "POSIXct" "POSIXt"
typeof(x)
## [1] "double"
unclass(x)
## [1] 1724424708
Underneath, it is the number of seconds since the UNIX epoch.
By default, whilst printing, the current default timezone
is used (see Sys.timezone).
However, such objects can be equipped with the tzone
attribute.
structure(1, class=c("POSIXct", "POSIXt")) # using current default timezone
## [1] "1970-01-01 01:00:01 CET"
structure(1, class=c("POSIXct", "POSIXt"), tzone="UTC")
## [1] "1970-01-01 00:00:01 UTC"
In both cases, the time is 1 second after the beginning of the UNIX epoch. On the author’s PC, the former is displayed in the current local timezone, though.
Use ISOdatetime to inspect how midnights are displayed in different timezones.
The POSIXlt
(local time) class is represented using a list
of atomic vectors[19].
(x <- as.POSIXlt(c(a="1970-01-01 00:00:00", b="2030-12-31 23:59:59")))
## a b
## "1970-01-01 00:00:00 CET" "2030-12-31 23:59:59 CET"
class(x)
## [1] "POSIXlt" "POSIXt"
typeof(x)
## [1] "list"
str(unclass(x)) # calling str instead of print to make display more compact
## List of 11
## $ sec : num [1:2] 0 59
## $ min : int [1:2] 0 59
## $ hour : int [1:2] 0 23
## $ mday : int [1:2] 1 31
## $ mon : int [1:2] 0 11
## $ year : Named int [1:2] 70 130
## ..- attr(*, "names")= chr [1:2] "a" "b"
## $ wday : int [1:2] 4 2
## $ yday : int [1:2] 0 364
## $ isdst : int [1:2] 0 0
## $ zone : chr [1:2] "CET" "CET"
## $ gmtoff: int [1:2] NA NA
## - attr(*, "tzone")= chr [1:3] "" "CET" "CEST"
## - attr(*, "balanced")= logi TRUE
Read about the meaning of each named element,
especially mon
and year
; see help("DateTimeClasses")
.
The manual states that POSIXlt
is supposedly
closer to human-readable forms than POSIXct
,
but it is a matter of taste.
Some R functions return the former, and other output the latter type.
The two main functions for date formatting and parsing, strftime and strptime, use special field formatters (similar to sprintf). Read about them in the R manual. What type of inputs do they accept? What outputs do they produce?
There are several methods overloaded for objects of the said classes. In fact, the first call in this section already involved the use of c.Date.
Play around with the overloaded versions of seq, rep, and as.character.
A specific number of days or seconds can be added to or
subtracted from a date or time, respectively.
However, `-` (see also diff)
can also be applied on two date-time objects,
which yields an object of the class difftime
.
Sys.Date() - (Sys.Date() - 1)
## Time difference of 1 days
Sys.time() - (Sys.time() - 1)
## Time difference of 1 secs
Check out how objects of the class difftime
are internally represented.
Applying other arithmetic operations on date-time objects raises an error. Because date-time objects are just numbers, they can be compared to each other using binary operators[20]. Also, methods such as sort and order[21] could be applied on them.
Check out the stringx package, which replaces the base R date-time processing functions with their more portable counterparts.
proc.time can be used to measure the time to execute a given expression:
t0 <- proc.time() # timer start
# ... to do - something time-consuming ...
sum(runif(1e7)) # whatever, just testing
## [1] 4999488
print(proc.time() - t0) # elapsed time
## user system elapsed
## 0.120 0.006 0.126
The function returns an object of the class proc_time
.
Inspect how it is represented internally.
10.3.2. Factors¶
The factor
class is often used to represent qualitative
data, e.g., species, groups, types. In fact, categorical
(our example
class) was inspired by the built-in factor
.
(x <- c("spam", "spam", "bacon", "sausage", "spam", "bacon"))
## [1] "spam" "spam" "bacon" "sausage" "spam" "bacon"
(f <- factor(x))
## [1] spam spam bacon sausage spam bacon
## Levels: bacon sausage spam
Note how factors are printed. There are no double quote characters around the labels. The list of levels is given at the end.
Internally, such objects are represented as integer vectors
(Section 6.4.1) with elements between \(1\) and \(k\).
They are equipped with the special
(as in Section 4.4.3) levels
attribute, which is a character vector of length \(k\)[22].
class(f)
## [1] "factor"
typeof(f)
## [1] "integer"
unclass(f)
## [1] 3 3 1 2 3 1
## attr(,"levels")
## [1] "bacon" "sausage" "spam"
attr(f, "levels") # also: levels(f)
## [1] "bacon" "sausage" "spam"
Factors are often used instead of character vectors defined over a small number of unique labels[23], where there is a need to manipulate their levels conveniently.
attr(f, "levels") <- c("a", "b", "c") # also levels(f) <- c(....new...)
print(f)
## [1] c c a b c a
## Levels: a b c
The underlying integer codes remain the same.
Certain operations on vectors of small integers are relatively easy to
express, especially those concerning element grouping:
splitting, counting, and plotting (e.g., Figure 13.17).
It is because the integer codes can naturally be used whilst indexing
other vectors. Section 5.4 mentioned a few functions
related to this, such as
match,
split,
findInterval, and
tabulate.
Specifically, the latter can be implemented like
“for each i
, increase count[factor_codes[i]]
by one”.
Study the source code of the factor function. Note the use of as.character, unique, order, and match.
Implement a simplified version of table
based on tabulate. It should work
for objects of the class factor
and return a named numeric vector.
Implement a version of cut based on findInterval.
Important
The as.numeric method has not been overloaded for factors. Therefore, when we call the generic, the default method is used: it returns the underlying integer codes as-is. This can surprise unaware users when they play with factors representing integer numbers:
(g <- factor(c(11, 15, 16, 11, 13, 4, 15))) # converts numbers to strings
## [1] 11 15 16 11 13 4 15
## Levels: 4 11 13 15 16
as.numeric(g) # the underlying codes
## [1] 2 4 5 2 3 1 4
as.numeric(as.character(g)) # to get the numbers encoded
## [1] 11 15 16 11 13 4 15
Alas, support for factors is often hardcoded at the C language level. From the end user perspective, it makes this class behave less predictably. In particular, the manual overloading of certain methods for factor objects might have no effect.
Important
If f
is a factor, then x[f]
does not behave like
x[
as.character(f)]
, i.e.,
it is not indexing by labels using the names
attribute.
Instead, we get x[
as.numeric(f)]
;
the underlying codes determine the positions.
h <- factor(c("a", "b", "a", "c", "a", "c"))
levels(h)[h] # the same as c("a", "b", "c")[c(1, 2, 1, 3, 1, 3)]
## [1] "a" "b" "a" "c" "a" "c"
c(b="x", c="y", a="z")[h] # names are not used whilst indexing
## b c b a b a
## "x" "y" "x" "z" "x" "z"
c(b="x", c="y", a="z")[as.character(h)] # names are used now
## a b a c a c
## "z" "x" "z" "y" "z" "y"
More often than not, indexing by factors will happen “accidentally”[24], leaving us slightly puzzled. In particular, factors look much like character vectors when they are carried in data frames:
(df <- data.frame(A=c("x", "y", "z"), B=factor(c("x", "y", "z"))))
## A B
## 1 x x
## 2 y y
## 3 z z
class(df[["A"]])
## [1] "character"
class(df[["B"]])
## [1] "factor"
Important
Be careful when combining factors and not-factors:
x <- factor(c("A", "B", "A"))
c(x, "C")
## [1] "1" "2" "1" "C"
c(x, factor("C"))
## [1] A B A C
## Levels: A B C
When subsetting a factor object, the result will inherit the
levels
attribute in its entirety:
f[c(1, 2)] # drop=FALSE
## [1] c c
## Levels: a b c
However:
f[c(1, 2), drop=TRUE]
## [1] c c
## Levels: c
Implement your version of the droplevels function, which removes the unused attributes.
The replacement version of the index operator does not automatically add new levels to the modified object:
x <- factor(c("A", "B", "A"))
`[<-`(x, 4, value="C") # like in x[4] <- "C"
## Warning in `[<-.factor`(x, 4, value = "C"): invalid factor level, NA
## generated
## [1] A B A <NA>
## Levels: A B
Implement a version of `[<-.factor` that has such a capability.
10.3.3. Ordered factors¶
When creating factors, we can enforce a particular ordering and the number of levels:
x <- c("spam", "spam", "bacon", "sausage", "spam", "bacon")
factor(x, levels=c("eggs", "bacon", "sausage", "spam"))
## [1] spam spam bacon sausage spam bacon
## Levels: eggs bacon sausage spam
If we want the arrangement of the levels to define a linear ordering relation over the set of labels, we can call:
(f <- factor(x, levels=c("eggs", "bacon", "sausage", "spam"), ordered=TRUE))
## [1] spam spam bacon sausage spam bacon
## Levels: eggs < bacon < sausage < spam
class(f)
## [1] "ordered" "factor"
It yields an ordered factor, which enables comparisons like:
f[f >= "bacon"] # what's not worse than bacon?
## [1] spam spam bacon sausage spam bacon
## Levels: eggs < bacon < sausage < spam
How is that possible? Well, based on information provided in this chapter,
it will come as no surprise that it is because… someone has
created a relational operator for objects of the class ordered
.
10.3.4. Formulae (*)¶
Formulae are created using the `~` operator. Some R users employ them to specify widely-conceived statistical models in functions such as lm (e.g., linear regression), glm (generalised linear models like logistic regression etc.), aov (analysis of variance), wilcox.test (the two-sample Mann–Whitney–Wilcoxon test), aggregate (computing aggregates within data groups), boxplot (box-and-whisker plots for a variable split by a combination of factors), or plot (scatter plots); see also Chapter 11 of [58]. For instance, formulae can be used to describe symbolic relationships such as:
“
y
as a linear combination ofx1
,x2
, andx3
”,“
y
grouped/split by a combination ofx1
andx2
”,
where y
, x1
, etc., are, for example, column names in a data frame.
Formulae are interpreted by the corresponding functions, and not the R language itself. Thus, programmers are free to assign them any meaning. As their syntax is quite esoteric, beginners might find them confusing. Hence, we will postpone discussing them until Section 17.6. Luckily, the use of formulae can usually easily be avoided[25].
10.4. (Over)using the forward pipe operator, `|>` (*)¶
The OOP paradigm is utile when we wish to define a new data type, perhaps even a hierarchy of types. Many development teams find it an efficient tool to organise larger pieces of software. However, in the data science and numerical computing domains, more often than not, we are the consumers of object orientation rather than class designers.
Thanks to the S3 method dispatch mechanism, our language is easily extensible. Something that mimics a new data type can easily be introduced. Most importantly, methods can be added[26] or removed during runtime, e.g., when importing external packages.
However, R is still a functional programming language,
where functions are not just first-class citizens: they are privileged.
In functional programming, the emphasis is on operations (verbs),
not data (nouns). It leads to a very readable syntax. For example,
assuming that square, x
, and y
are sensibly defined,
the mean squared difference can be written as:
mean(square(x-y)) # read: mean of squares of differences
Base R is extremely flexible. We can introduce new vocabulary as we please. In Section 12.3.7, we will study an example where we define:
group_by (a function that splits a data frame with respect to a combination of levels in given named columns and returns a list of data frames with class
list_dfs
),aggregate.list_dfs (which applies an aggregation function on every column of all data frames in a given list), and
mean.list_dfs (a specialised version of the former that calls mean).
The specifics do not matter now. Let’s just consider the notation we use when the operations are chained:
# select a few rows and columns from the `iris` data frame:
iris_subset <- iris[51:150, c("Sepal.Width", "Petal.Length", "Species")]
# compute the averages of all variables grouped by Species:
mean(group_by(iris_subset, "Species"))
## Species x Mean
## 1 versicolor Sepal.Width 2.770
## 2 versicolor Petal.Length 4.260
## 3 virginica Sepal.Width 2.974
## 4 virginica Petal.Length 5.552
The functional syntax is very reader-centric (by the way, self-explanatory
variable names and rich comments are priceless). We compute the mean
in groups defined by Species
in a subset of the iris
data frame.
All verbs appear on the left side of the expression,
with the final (the most important?) operation being listed first.
Nonetheless, when implementing more complex data processing pipelines, programmers think in different categories: “first, we need to do this, then we need to do that, and afterwards…”. When they write their ideas down, they have to press Home and End or arrow keys a few times to move the caret to the right places:
finally(thereafter(then(first(x))))
As people are inherently lazy, they might want to “optimise” their
workflow to save a bit of energy.
Thus, many popular languages rely on message-passing syntax,
where operations are propagated (and written) left-to-right
instead of inside-out. For instance,
object.
method1().
method2()
might mean “call method1 on the object
and
then call method2 on the result”.
Here, the object
is told what to do.
Since R 4.1.0, we have the pipe operator[27], `|>`. It is merely syntactic sugar for translating between the message-passing and function-centric notion. In a nutshell, instead of writing:
h(g(f(x), y))
mean(square(x-y))
mean(group_by(iris_subset, "Species"))
we have the following equivalent forms:
x |> f() |> g(y) |> h()
(x-y) |> square() |> mean()
iris_subset |> group_by("Species") |> mean()
Such syntax is developer-centric. It might be faster to write. It emphasises the order in which the operations are executed. However, we must stress that there is nothing that cannot be achieved through the function-centric form and perhaps a few auxiliary variables. As this book is minimalist by design, we refrain ourselves from using it. Those unconvinced should take note of the following.
First, expressions on the right side of the pipe operator must always be proper calls. Therefore, the use of round brackets is obligatory. Thus, when passing anonymous functions, we must write:
runif(10) |> (function(x) mean((x-mean(x))^2))() # note the "()" at the end
## [1] 0.078184
Peculiarly, in R 4.1.0, a “shorthand” notation
for creating functions was introduced. We can save seven keystrokes
and scribble “\(...) expr
” instead of
“function(...) expr
”.
runif(10) |> (\(x) mean((x-mean(x))^2))() # again: "()" at the end
## [1] 0.078184
Also, the placeholder `_
` can be used on the right side of the pipe
operator (only once) to indicate that the left side must be matched
with a specific named argument of the function to be called.
Otherwise, the left side is always passed as the first
argument. Therefore, the two following expressions are equivalent:
x |> median() |> `-`(e1=x, e2=_) |> abs() |> median()
median(abs(x-median(x)))
Note
When writing code interactively, we may sometimes benefit from
using the rightward `->` operator. Suffice it to say
that “name <- value
” and “value -> name
” are synonymous.
This way, we can type some lengthy code, store the result
in an intermediate variable, and then continue in the next line
(possibly referring to that auxiliary value more than once).
For instance:
runif(10) -> .
mean((.-mean(.))^2)
## [1] 0.078184
Recall that `.
` is as valid a variable name as any other one.
Another example:
iris[, c("Sepal.Width", "Petal.Length", "Species")] -> .
.[ .[, "Species"] %in% c("versicolor", "virginica"), ] -> .
mean(group_by(., "Species"))
## Species x Mean
## 1 versicolor Sepal.Width 2.770
## 2 versicolor Petal.Length 4.260
## 3 virginica Sepal.Width 2.974
## 4 virginica Petal.Length 5.552
10.5. S4 classes (*)¶
The S3-style OOP is based on a brilliantly simple idea: calling a generic
f(x)
dispatches automatically to a method
f.class_of_x(x)
or
f.default(x)
in the case where the former does not exist.
Naturally, S3 has some inherent limitations:
classes cannot be formally defined; the
class
attribute may be assigned arbitrarily to any object[28],argument dispatch is performed only[29] with regard to one data type[30].
In most cases, and with an appropriate level of mindfulness, they are not a problem at all. However, it is a typical condition of programmers who come to our world from more mainstream languages (e.g., C++ or Java; yours truly included) until they appreciate the true beauty of R’s being somewhat different. Before they fully develop such an acquired taste, though, they grow restless as “R is has no real OOP because it lacks polymorphism, encapsulation, formal inheritance, and so on, and something must be done about it!”. The truth is that it had not had to, but with high probability, it would have anyway in one way or another.
And so the fourth version of the S language was introduced in 1998
(see [9]). It brought a new object-orientated system,
which we are used to referring to as S4.
Its R version is defined by the methods package.
Below we discuss it briefly. For more details, see
help("Classes_Details")
and help("Methods_Details")
as well as
[10] and [11].
Note
(*)
S4 was loosely inspired by the Common Lisp Object System
(with its defclass
, defmethod
, etc.; see, e.g., [20]).
In the current author’s opinion, the S4 system is somewhat of
an afterthought. Due to appendages like this, R seems like a patchwork
language. Suffice it to say that it was not the last attempt to introduce
a “real” OOP in the overall functional R:
the story will resume in Section 16.1.5.
The main issue with all the supplementary OOP approaches is that each
of them is parallel to S3 which never lost its popularity
and is still in the very core of our language.
We are thus covering them only for the sake of completeness
for the readers might come across such objects.
In particular, we shall explain the meaning of a notation like x@slot
.
Moreover, in Section 11.4.7 we mention the Matrix
class
which is perhaps the most prominent showcase of S4.
Nonetheless, the current author advises taking with a pinch of salt
statements such as “for new projects, it is recommended to use the more
flexible and robust S4 scheme provided in the methods package”
mentioned in help("UseMethod")
.
10.5.1. Defining S4 classes¶
An S4 class can be formally registered through a call to setClass. For instance:
library("methods") # in the case where it is not auto-loaded
setClass("qualitative", slots=c(data="integer", levels="character"))
We defined a class named qualitative
(similarity to
our own categorical
and the built-in factor
S3 classes is intended).
It has two slots: data
and levels
being integer and character
vectors, respectively. This notation is already outlandish.
There is no assignment suggesting that we have introduced something novel.
An object of the above class can be instantiated by calling new:
z <- new("qualitative", data=c(1L, 2L, 2L, 1L, 1L), levels=c("a", "b"))
print(z)
## An object of class "qualitative"
## Slot "data":
## [1] 1 2 2 1 1
##
## Slot "levels":
## [1] "a" "b"
That z
is of this class can be verified by calling is.
is(z, "qualitative")
## [1] TRUE
class(z) # also: attr(z, "class")
## [1] "qualitative"
## attr(,"package")
## [1] ".GlobalEnv"
Important
A few R packages import the methods package only to get access the handy is function. It does not mean they are defining new S4 classes.
Note
S4 objects are marked as being of the following basic type:
typeof(z)
## [1] "S4"
See Section 1.12 of [68] for technical details on how they are internally represented. In particular, in our case, all the slots are simply stored as object attributes:
attributes(z)
## $data
## [1] 1 2 2 1 1
##
## $levels
## [1] "a" "b"
##
## $class
## [1] "qualitative"
## attr(,"package")
## [1] ".GlobalEnv"
10.5.2. Accessing slots¶
Reading or writing slot contents can be done via the `@` operator and the slot function or their replacement versions.
z@data # or slot(z, "data")
## [1] 1 2 2 1 1
z@levels <- c("A", "B")
Note
The `@` operator can only be used on S4 objects, and some sanity checks are automatically performed:
z@unknown <- "spam"
## Error in (function (cl, name, valueClass) : 'unknown' is not a slot in
## class "qualitative"
z@data <- "spam"
## Error in (function (cl, name, valueClass) : assignment of an object of
## class "character" is not valid for @'data' in an object of class
## "qualitative"; is(value, "integer") is not TRUE
10.5.3. Defining methods¶
For the S4 counterparts of the S3 generics
(Section 10.2), see help("setGeneric")
.
Luckily, there is a reasonable degree of interoperability between
the S3 and S4 systems.
Let’s introduce a new method for the well-known as.character generic. Instead of defining as.character.qualitative, we need to register a new routine with setMethod.
setMethod(
"as.character", # name of the generic
"qualitative", # class of 1st arg; or: signature=c(x="qualitative")
function(x, ...) # method definition
x@levels[x@data]
)
Testing:
as.character(z)
## [1] "A" "B" "B" "A" "A"
show is the S4 counterpart of print:
setMethod(
"show",
"qualitative",
function(object)
{
x <- as.character(object)
print(x) # calls `print.default`
cat(sprintf("Categories: %s\n",
paste(object@levels, collapse=", ")))
}
)
Interestingly, it is involved automatically on a call to print:
print(z) # calls `show` for `qualitative`
## [1] "A" "B" "B" "A" "A"
## Categories: A, B
Methods that dispatch on the type of multiple arguments are also possible. For example:
setMethod(
"split",
c(x="ANY", f="qualitative"),
function (x, f, drop=FALSE, ...)
split(x, as.character(f), drop=drop, ...)
)
It permits the first argument to be of any type (like a default method). Moreover, here is its version tailored for matrices (see Chapter 11).
setMethod(
"split",
c(x="matrix", f="qualitative"),
function (x, f, drop=FALSE, ...)
lapply(
split(seq_len(NROW(x)), f, drop=drop, ...), # calls the above
function(i) x[i, , drop=FALSE])
)
Some tests:
A <- matrix(1:35, nrow=5) # whatever
split(A, z) # matrix, qualitative
## $A
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 1 6 11 16 21 26 31
## [2,] 4 9 14 19 24 29 34
## [3,] 5 10 15 20 25 30 35
##
## $B
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 2 7 12 17 22 27 32
## [2,] 3 8 13 18 23 28 33
split(1:5, z) # ANY, qualitative
## $A
## [1] 1 4 5
##
## $B
## [1] 2 3
Overload the `[` operator for the qualitative
class.
10.5.4. Defining constructors¶
We can also overload the initialize method, which is automatically called by new:
setMethod(
"initialize", # note the American spelling
"qualitative",
function(.Object, x)
{ # the method itself
if (!is.character(x))
x <- as.character(x) # see above
xu <- unique(sort(x)) # drops NAs
.Object@data <- match(x, xu)
.Object@levels <- xu
.Object # return value - a modified object
}
)
This constructor yields instances of the class qualitative
based on an object coercible to a character vector. For example:
w <- new("qualitative", c("a", "c", "a", "a", "d", "c"))
print(w)
## [1] "a" "c" "a" "a" "d" "c"
## Categories: a, c, d
Set up a validating method for our class; see help("setValidity")
.
10.5.5. Inheritance¶
New S4 classes can be derived from existing ones. For instance:
setClass("binary", contains="qualitative")
It is a child class that inherits all slots from its parent. We can overload its initialisation method:
setMethod(
"initialize",
"binary",
function(.Object, x)
{
if (!is.logical(x))
x <- as.logical(x)
x <- as.character(as.integer(x))
xu <- c("0", "1")
.Object@data <- match(x, xu)
.Object@levels <- xu
.Object
}
)
Testing:
new("binary", c(TRUE, FALSE, TRUE, FALSE, NA, TRUE))
## [1] "1" "0" "1" "0" NA "1"
## Categories: 0, 1
We can still use the show method of the parent class.
10.6. Exercises¶
Answer the following questions.
How to display the source code of the default methods for head and tail?
Can there be, at the same time, one object of the class c
("A", "B")
and another one of the class c("B", "A")
?If
f
is a factor, what are the relationships between as.character(f)
, as.numeric(f)
, as.character(
as.numeric(f))
, and as.numeric(
as.character(f))
?If
x
is a named vector andf
is a factor, isx[f]
equivalent tox[
as.character(f)]
or ratherx[
as.numeric(f)]
?
A user calls:
plot(x, y, col="red", ylim=c(1, max(x)), log="y")
where x
and y
are numeric vectors. Consult help("plot")
for the meaning of the ylim
and log
arguments.
Was that straightforward?
Explain why the two following calls return significantly different results.
c(Sys.Date(), "1970-01-01")
## [1] "2024-08-23" "1970-01-01"
c("1970-01-01", Sys.Date())
## [1] "1970-01-01" "19958"
Propose a workaround.
Write methods head and tail
for our example categorical
class.
(*) Write an R package that defines S3 class categorical
.
Add a few methods for this class.
Note the need to use the S3method
directive
in the NAMESPACE
file; see [65].
Inspect the result of a call to
binom.test(79, 100)
and to
rle(
c(1, 1, 1, 4, 3, 3, 3, 3, 3, 2, 2))
.
Find the methods responsible for such objects’ pretty-printing.
Read more about the connection
class. In particular,
see the Value section of help("connections")
.
Read about the subsetting operators overloaded for the package_version
class; see help("numeric_version")
.
There are xtfrm methods
overloaded for classes such as numeric_version
,
difftime
, Date
, and factor
. Find out how they work and
where they might be of service (especially in relation to
order and sort; see also Section 12.3.1).
Give an example where
split(x,
list(y1, y2))
(with default arguments) will fail to generate the correct result.
Write a function that determines the mode, i.e., the most frequently
occurring value in a given object of the class factor
. If the mode
is not unique, return a randomly chosen one (each with the same probability).
Implement your version of the gl function.