This work is licensed under CC BY-SA 4.0

Synopsis

If you are accustomed to object-oriented programming in “modern” languages like Java or Python, chances are that you will be thrown off by the OOP systems lurking in R, which are confusing at best.

This post strives to provide a succinct overview of the S3 object system that is used widely in base R as well as many extension packages.

TL;DR: The S3 system is designed for method dispatch of generic functions. S3 objects are 1) standard copy-on-modify R objects (NOT mutable references), and 2) non-restrictive about internal data structure. Proper S3 OOP design therefore relies on the programmer following best practices on their own.

The R’ posts are my study notes of the Advanced R book. For this post, refer to the following resources for more details:

  • Chapters ‘Object-oriented programming Introduction’, ‘Base types’, and ‘S3’ in Advanced R.
  • All vignettes of vctrs.
  • “Object-Oriented Programming, Functional Programming and R” by John M. Chambers.

Inaccurate information here is in all likelihood my fault.

Functions and objects

A good computation design divides a large problem into smaller pieces that can be validated as subgroups. As such, there are two well-studied models to achieve this goal:

  1. Solving a problem is acquiring a mapping from its inputs to desired outputs. With this notion, we write functions to describe the solution (functional models).
  2. Solving a problem is running a realistic enough simulation of relevant entities. With this notion, we build objects and perform operations on them to describe the solution (sequential models) [1].

R possesses features from both worlds.

As described in my previous post, copy-on-modify and lazy evaluation are among the features that make R behave like a functional language. Such features are what allow us to write functions that are flexible yet easy to validate with minimal overhead.

On the other hand, there is clear motivation for objects in R. Consider the following:

x1 <- seq(1,10)
x2 <- factor(c("a","b"))
summary(x1)
summary(x2)

The summary function, as the name suggests, provides a summary of its input. It is immediately clear that summary is generic: its actual implementation depends on the class of the input. The S3 object system is what allows this function polymorphism.
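One way to see this polymorphism directly is to look up the methods that summary ends up calling. The sketch below is only illustrative; it assumes a stock R installation where getS3method() from the utils package is available, and the variable names summary_factor and summary_default are made up for the example.

```r
x1 <- seq(1, 10)
x2 <- factor(c("a", "b"))

class(x1)  # "integer"
class(x2)  # "factor"

# Look up the methods by generic name and class.
summary_factor  <- utils::getS3method("summary", "factor")
summary_default <- utils::getS3method("summary", "default")

# Calling the method directly gives the same result as calling the generic:
# a factor goes to summary.factor, while an integer vector has no
# class-specific method and falls back to summary.default.
same_factor  <- identical(summary(x2), summary_factor(x2))
same_default <- identical(summary(x1), summary_default(x1))
```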

The S3 fOOP system

S3 is an ad hoc system to implement function polymorphism. In short, functions declared as generics dispatch to different actual implementations (i.e., methods) based on their inputs [2]. There are three components of the S3 functional OOP system:

  1. generics. Only functions declared to be S3 generic will use the S3 dispatching system.
  2. class. While “everything in R is an object” [3], S3 dispatching relies specifically on the class attribute of the inputs.
  3. methods. Calling an S3 generic leads to evaluation of an S3 method chosen by the classes of the inputs.
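The three components can be sketched in a few lines. This is a minimal illustration, not code from the post: the generic describe and the class celsius are hypothetical names chosen for the example.

```r
# Generic: declares that `describe` uses S3 dispatch (hypothetical name).
describe <- function(x, ...) UseMethod("describe")

# Method for the hypothetical class "celsius".
describe.celsius <- function(x, ...) {
  paste0(unclass(x), " degrees Celsius")
}

# Default method: used when no class-specific method is found.
describe.default <- function(x, ...) {
  paste("an object of class", paste(class(x), collapse = "/"))
}

# Class: the class attribute of the input drives dispatch.
temp <- structure(21.5, class = "celsius")

describe(temp)    # "21.5 degrees Celsius"
describe("text")  # "an object of class character"
```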

S3 generics and methods

As noted, S3 is centered on method dispatch, which relies on two functions: UseMethod() and NextMethod(). In the example below, I show how one can print human-friendly forms of name and email information with two classes, my_person and my_person_email.

obj1 <- structure(
  # my_person is a named list with the following fields:
  #   $first, $last, $misc
  list(first = "Ye", last = "Yuan", misc = NULL),
  class = "my_person"
)

obj2 <- structure(
  # my_person_email shall be a subclass of my_person, where the
  #   $email field encodes subclass-specific information
  list(first = "Ye", last = "Yuan", email = "yeyu.at.umich.edu"),
  class = c("my_person_email", "my_person")
)

view <- function(x, ...) {
  # S3 generic `view`
  UseMethod("view") # UseMethod(<genericName>) triggers S3 dispatching
}

view.my_person <- function(x, formal = FALSE, ...) {
  # Method of S3 generic `view`, of class `my_person`
  if (formal) {
    # Example: YUAN, YE
    label <- paste0(toupper(x$last), ", ", toupper(x$first))
  } else {
    # Example: Ye Yuan
    label <- paste(x$first, x$last)
  }

  if (!is.null(x$misc)) {
    # Allows additional information
    label <- paste(label, x$misc)
  }

  cat(paste(label, collapse = "\n"), "\n") # Each person occupies a line
  invisible(x) # Return the object invisibly
}

view.my_person_email <- function(x, ...) {
  # Method of S3 generic `view`, of class `my_person_email`
  x$misc <- paste0("(", x$email, ")")
  NextMethod() # Call the next method in the S3 dispatch ordered list
}

view(obj1) # Calls `view.my_person`
view(obj2, formal = TRUE) # Calls `view.my_person_email` which calls `view.my_person`

Here is a step-by-step explanation of the function call view(obj2, formal = TRUE):

  1. UseMethod("view") invokes the S3 dispatching function. This call evaluates the first argument of the enclosing call to get class() information for dispatching. In this case, class(x) is c("my_person_email", "my_person"). This class information is retained as the variable .Class in the execution environment. The generic function name view is likewise recorded as the variable .Generic [4].
  2. As the name view.my_person_email is bound to a function in the global environment, the S3 dispatching function calls view.my_person_email with parameter list obj2, formal = TRUE.
  3. This call first assigns a value to the $misc field using the email information provided in $email.
  4. NextMethod() calls the other S3 dispatching function, which uses the .Class and .Generic generated by UseMethod() to decide on subsequent method dispatching. In this case, the generic is view and the next class in the list is my_person.
  5. As the name view.my_person is bound to a function in the global environment, the S3 dispatching function calls view.my_person with parameter list obj2, formal = TRUE [5].
  6. The function body uses the $misc field to print additional information about a person beyond first and last names. In this case, the additional information is the email value.

Takeaways:

  1. Generic functions end with a call UseMethod("genericName"), which is a special call that does not return.
  2. Method dispatch is based on the class attribute of the first argument of the enclosing call. The system searches for a genericName.class method in two places: 1) the environment where the generic is called, and 2) the S3 method registry of the environment (typically a namespace) where the generic is defined.
  3. NextMethod() performs further method dispatching based on the .Class and .Generic variables saved in the execution environment by UseMethod(). It ends with a function call to the next method found by the genericName.class search. If no further method is available, it then searches for genericName.default. Finally, genericName itself can also be dispatched to if it is itself a method [6].
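The third takeaway can be made concrete with a small sketch. The generic speak and the classes dog and animal are hypothetical names for illustration. The example assumes the behavior described in the R documentation: the next method is chosen from the class information recorded at dispatch time, while the arguments it receives are the current values in the calling method.

```r
speak <- function(x, ...) UseMethod("speak")

speak.animal <- function(x, ...) paste("animal:", x$name)

speak.default <- function(x, ...) "default"

speak.dog <- function(x, ...) {
  x$name <- toupper(x$name)  # the next method sees this modified value
  class(x) <- "unrelated"    # but the dispatch order was fixed by UseMethod()
  paste("dog ->", NextMethod())
}

rex <- structure(list(name = "rex"), class = c("dog", "animal"))
speak(rex)   # "dog -> animal: REX"

# With no further class left, NextMethod() falls back to the default method.
lone <- structure(list(name = "lone"), class = "dog")
speak(lone)  # "dog -> default"
```

Reassigning the class inside a method is done here only to show that it has no effect on dispatch; it is not something to imitate in real code.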

S3 generics in base R

Apart from user-defined generics, many base R functions are generic and allow user-defined S3 methods. There are two major types:

  1. Some of the base R generics use the standard UseMethod/NextMethod mechanism. This includes the print generic.
  2. Moreover, many of the base R generics are internal (i.e., primitives). Those are called Internal Generic Functions.

Internal generics deserve more explanation and are arguably the more complex part of the S3 system. Being primitive means that they do not use the interpreter-level UseMethod/NextMethod dispatching mechanism. Dispatching for them, while still relying on the class attribute of the arguments, is performed in C using DispatchOrEval or DispatchGroup.

Dispatch of internal generics

Help page ?InternalGenerics lists all internal generic functions in R. Most notably, internal generics include:

  • Extract operators [, [[, $, @ and replacement operators [<-, etc.
  • Length and dimension accessors (and setters) including length, dim, names, etc.
  • Combine operators c, cbind, rbind.
  • Casting to base vectors, as.* family including as.character, as.double, etc.
  • … (this is by no means a complete list)

These (unary) functions mostly consist of a call to .Primitive(), and in many cases their function bodies are NULL. They work as follows:

  1. The internal C function DispatchOrEval checks whether the input argument is “internally classed”, that is, whether the class attribute is set on the argument, which is what is.object() reports [7].
  2. If the argument is classed, internal method dispatching takes place following the same rules as explained in previous sections.
  3. If the argument is not classed, the default C method is executed [8].
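The steps above can be observed from R with the internal generic length. This is a minimal sketch; the class my_bag and its $items field are hypothetical.

```r
# A hypothetical S3 class wrapping a list with a single field.
bag <- structure(list(items = c(3, 1, 4)), class = "my_bag")

# No method defined yet: the internal default counts list elements.
n_before <- length(bag)            # 1

# Define a method for the internal generic `length`.
length.my_bag <- function(x) length(x$items)

n_after <- length(bag)             # 3: the object is classed, so dispatch finds length.my_bag
n_plain <- length(unclass(bag))    # 1: no class attribute, so the internal default runs

is.object(bag)           # TRUE
is.object(unclass(bag))  # FALSE
```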

Dispatch of group (internal) generics

Help page ?groupGeneric lists all group generic functions in R. Functions belonging to group generics are all internal generics [9]. These generics are organized in five groups (refer to the R help page for an exhaustive list of the generics):

  • Group Math: myriad basic math functions including abs, exp, etc.
  • Group Ops: arithmetic operators incl. +, logical operators incl. &, and comparison operators incl. ==.
  • Group Summary: logical (incl. all) and math (incl. sum) summary functions.
  • Group Complex: complex arithmetic functions.
  • Group matrixOps: matrix operators.

Similar to internal generics, these (unary or binary) functions mostly consist of a call to .Primitive(), and in many cases their function bodies are NULL. They work as follows:

  1. The internal C function DispatchGroup checks whether the input argument(s) is/are “internally classed”, that is, whether the class attribute is set on the argument, which is what is.object() reports [7].
  2. If the argument(s) is/are classed, internal method dispatching takes place following mostly the same rules as explained in previous sections. For unary functions, dispatching is performed using the first argument. For binary functions in the Ops and matrixOps groups, dispatching is performed using both arguments (double dispatching) [10].
  3. If the argument is not classed, the default C method is executed [8].

Group generics can be quite complex and may be better explained with an example. For this, we will use the s3_dispatch helper from the sloop package.

library(sloop)
mtcars # demo with the all numeric mtcars data frame. See ?mtcars
# 1. `sqrt` is a Group "Math" primitive
#     element-wise sqrt; retained data.frame attributes
sqrt(mtcars)
s3_dispatch(sqrt(mtcars)) # `Math.data.frame` is dispatched
Math.data.frame # Defined in the base package

# 2. `+` is a Group "Ops" primitive
#    when both arguments are data frames
#    behavior = element-wise add; retain data.frame attrs
mtcars + mtcars
s3_dispatch(mtcars + mtcars) # `Ops.data.frame` is dispatched
Ops.data.frame # Defined in the base package
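Group methods can also be written for user-defined classes. The sketch below uses a hypothetical class meters with a hypothetical constructor new_meters. It relies on the .Generic variable, which holds the name of the function that was actually called, and it covers only binary uses of the Ops group.

```r
new_meters <- function(x) structure(x, class = "meters")

# One method covers every function in the Math group (sqrt, abs, exp, ...).
Math.meters <- function(x, ...) {
  fn <- get(.Generic, mode = "function")  # e.g. the base `sqrt`
  new_meters(fn(unclass(x), ...))
}

# One method covers the binary operators in the Ops group.
Ops.meters <- function(e1, e2) {
  result <- get(.Generic, mode = "function")(unclass(e1), unclass(e2))
  # Comparisons return plain logicals; arithmetic keeps the class.
  if (.Generic %in% c("==", "!=", "<", "<=", ">", ">=")) {
    result
  } else {
    new_meters(result)
  }
}

d <- new_meters(c(4, 9))
root    <- sqrt(d)   # dispatches to Math.meters; values 2 and 3
shifted <- d + 1     # dispatches to Ops.meters; values 5 and 10
is_big  <- d > 5     # plain logical vector: FALSE TRUE
```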

Best practices for OOP with S3

It should be clear by now that the S3 system is all about method dispatching. The fact that an R object is an S3 object of a particular class does NOT provide any guarantee about the data structure of the object.

Therefore, effective application of S3 OOP requires the programmer to follow certain rules on their end.
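Both points can be checked in a few lines. This is an illustrative sketch reusing the my_person class name from earlier; the helper rename is hypothetical.

```r
# The class attribute is just a label; nothing checks the underlying data.
p <- structure(list(first = "Ye", last = "Yuan"), class = "my_person")
q <- structure(42, class = "my_person")  # no $first or $last, still a "my_person"
inherits(q, "my_person")  # TRUE

# S3 objects are copy-on-modify: the function changes only its own copy.
rename <- function(obj, value) {
  obj$first <- value
  obj
}
p2 <- rename(p, "Grace")
p$first   # "Ye"
p2$first  # "Grace"
```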

Provide constructor, helper and validator functions

Generally you need to provide three functions for a class (say myclass):

  1. A low-level constructor, new_myclass(). It should check the object classes of the input arguments and error if unexpected arguments are provided.
  2. A user-friendly helper, myclass(). It should: 1) cast classes of input arguments as needed and end with a call to the constructor and then the validator, if one exists, 2) provide error messages crafted for end users, and 3) provide sensible default values.
  3. A validator, validate_myclass(). It should verify the data structure of the object and error if the object is not valid.

The above follows a typical OOP design and you should tailor it accordingly. For example, internal-use only classes may not have a helper and simple classes may not need a validator.
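As a rough sketch of how the three functions fit together, here is one possible layout for the my_person class used earlier. The specific checks and error messages are assumptions made for illustration, not requirements of S3.

```r
# Low-level constructor: strict about input types, no coercion.
new_my_person <- function(first = character(), last = character()) {
  stopifnot(is.character(first), is.character(last))
  structure(list(first = first, last = last), class = "my_person")
}

# Validator: checks the data structure of an already built object.
validate_my_person <- function(x) {
  if (length(x$first) != length(x$last)) {
    stop("`first` and `last` must have the same length", call. = FALSE)
  }
  if (anyNA(x$first) || anyNA(x$last)) {
    stop("names must not be missing", call. = FALSE)
  }
  x
}

# User-facing helper: coerces inputs, then constructs and validates.
my_person <- function(first = character(), last = character()) {
  first <- as.character(first)
  last <- as.character(last)
  validate_my_person(new_my_person(first, last))
}

ok <- my_person("Ye", "Yuan")
bad <- tryCatch(my_person(c("A", "B"), "C"), error = function(e) conditionMessage(e))
```

Keeping the constructor strict and the helper forgiving means internal code can call new_my_person() cheaply, while end users get coercion and readable errors from my_person().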

Use structure

In a typical S3 design, we divide an object into data and attribute parts. Apart from the requirement that the object has the class attribute, we have complete flexibility as to what to put in each part.

Use the structure function to set both the data and the attributes of your object in one call.
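For instance, a sketch with a hypothetical my_weight class that stores its values as the data part and a unit as an extra attribute:

```r
w <- structure(
  c(1.5, 2.25),        # data part: a plain double vector
  unit = "kg",         # a custom attribute (hypothetical)
  class = "my_weight"  # the class attribute used for dispatch
)

attr(w, "unit")  # "kg"
class(w)         # "my_weight"
unclass(w)       # the data, still carrying the "unit" attribute
```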

Retain class attribute in S3 methods

The S3 system does not enforce the output type of methods, even for extract operators such as [. It is up to the programmer to preserve attributes, including the class, as necessary. Example:

obj.test <- structure(1:10, class = "test")
# The following dispatches to the internal default extraction
obj.test[1] # Output is no longer of class "test"
# Define an extraction method for the "test" class
`[.test` <- function(x, i){
  structure(NextMethod(), class = "test")
}
# Now extraction dispatches to `[.test` (which then dispatches to the default)
obj.test[1] # Output ALWAYS has class "test"
# However, the superclass method will NOT retain the subclass attribute
obj.subtest <- structure(1:10, class = c("subtest", "test"))
obj.subtest[1] # Output is of class "test" only
# It is possible to retain the class attribute,
#   but this involves cluttering boilerplate code
`[.test` <- function(x, i){
  x.cls <- class(x)
  structure(NextMethod(), class = x.cls)
}
obj.subtest[1] # Now output class always = input class

This behavior keeps the S3 system simple: apart from being dispatched by S3, methods are no different from standard R functions. However, it does create extra clutter for class developers.

The vctrs package provides classes and helper functions that give more guarantees on behavior of S3 data classes.

Intricate cases

Admittedly, S3 dispatching can be quite complex and involved. This section explains several intricate cases. It is probably best to know that such cases exist and to avoid relying on these features in your code.

Group Ops double dispatching

Ops group generics can be binary. What if the lhs and rhs arguments both have Ops S3 methods defined? In this case, R uses the double dispatching rule to decide which method to dispatch to.

# When only one of the arguments has a class attribute:
#   dispatch to the Ops.<class> method
2 + mtcars
mtcars + 2 # Both dispatch to Ops.data.frame

# When both arguments have a class attribute:
my2 <- structure(2, class = "my_num")
#   behavior is not quite robust; there are two cases:
#   i) only one argument's class defines an Ops method
#      easy choice
mtcars + my2
my2 + mtcars
#   ii) both arguments' classes define an Ops method
#      C DispatchGroup relies on the R S3 generic chooseOpsMethod
`+.my_num` <- function(e1, e2) "my_num add"
mtcars + my2 # Both give an "Incompatible methods" warning
my2 + mtcars # as `+.my_num` and `Ops.data.frame` are both defined
#      Now define chooseOpsMethod
chooseOpsMethod.my_num <- function(...) TRUE
mtcars + my2
my2 + mtcars # Now both output "my_num add"

Defining the S3 method chooseOpsMethod.my_num <- function(...) TRUE makes every binary Ops generic that has a my_num object as one of its arguments dispatch to the my_num method, provided that such a method is defined. Note that chooseOpsMethod is only available in R 4.3.0 and later.


  1. Functions in R are quite likely procedural and not pure mathematical descriptions of mappings.

  2. You may think of it as an object-based method dispatching system on top of the functional programming paradigm. It does not enforce well-defined data structures for objects with the same “class” definition. Therefore, an S3 class by itself does not enforce any constraints on the data structure of the objects. Also, S3 objects still follow copy-on-modify behavior and are not mutable.

  3. R internals use different C struct types, which ultimately makes everything in R an object. These are called the “base types” of R. However, this is not an OOP system, as behavior is hardcoded using switch statements in the C source code of R and is therefore not extensible by us end users.

  4. UseMethod() also saves other information in the execution environment, namely .Method, .GenericCallEnv and .GenericDefEnv. Refer to ?UseMethod for more details.

  5. NextMethod() is not a standard function call. The S3 dispatching information created by UseMethod() (incl. .Generic, .Class, etc.) is actually retained. Refer to ?UseMethod for more details.

  6. Only if genericName itself is 1) a primitive, or 2) a wrapper for a .Internal function of the same name. This means that an R user can NOT add to or modify this list.

  7. Actually, this is done by looking at the OBJECT bit of the SEXP object, for performance reasons. Refer to R Internals for more information.

  8. Do not confuse the internal default with the S3 dispatching default method. For example, you can define an S3 default method for c as c.default <- function(...) in R. However, the internal default for c is written in C as do_c_dflt in the R source code.

  9. One difference between internal generics and group generics is which function call is used for method dispatching in C. Internal generics use DispatchOrEval while group generics use DispatchGroup.

  10. Refer to the R help page ?groupGeneric for details on double dispatching for the Ops group.