This work is licensed under CC BY-SA 4.0
Synopsis
If you are accustomed to “modern” object-oriented programming tools like Java or Python, chances are that you will be thrown off by the at best confusing OOP systems lurking in R.
This post strives to provide a succinct overview of the S3 object system that is used widely in base R as well as many extension packages.
TL;DR: The S3 system is designed for method dispatching of generic functions. S3 objects are 1) standard copy-on-modify R objects (NOT mutable references), and 2) non-restrictive about internal data structure. Therefore, proper S3 OOP design asks for the programmer to follow best practices on their own.
The R posts are my study notes of the Advanced R book. For this post, refer to the following resources for more details:
- Chapters ‘Object-oriented programming Introduction’, ‘Base types’, and ‘S3’ in Advanced R.
- All vignettes of vctrs.
- “Object-Oriented Programming, Functional Programming and R” by John M. Chambers.
Inaccurate information here is in all likelihood my fault.
Functions and objects
A good computation design divides a large problem into smaller pieces that can be validated as subgroups. As such, there are two well-studied models to achieve this goal:
- Solving a problem is acquiring a mapping from its inputs to desired outputs. With this notion, we write functions to describe the solution (functional models).
- Solving a problem is running a realistic enough simulation of relevant entities. With this notion, we build objects and perform operations on them to describe the solution (sequential models)1.
R possesses features from both worlds.
As described in my previous post, copy-on-modify and lazy evaluation are among the features that make R behave like a functional language. Such features are what allow us to write functions that are flexible yet easy to validate with minimal overhead.
On the other hand, there is clear motivation for objects in R. Consider the following:
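A minimal sketch of the kind of call meant here, assuming a numeric vector and a factor as inputs (the variable names and values are mine):

```r
# Two inputs of different classes (hypothetical example data)
num <- c(1.5, 2, 10)
fct <- factor(c("a", "b", "a"))

# The same call yields differently shaped results
summary(num)  # quantiles and mean of a numeric vector
summary(fct)  # counts per factor level
```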
The summary function, as the name suggests, provides a summary of the input. It is immediately clear that summary is generic: its actual implementation depends on the input. The S3 object system is what allows this function polymorphism.
The S3 fOOP system
S3 is an ad hoc system to implement function polymorphism. In short, functions declared as generics dispatch to different actual implementations (i.e., methods) based on their inputs2. There are three components of the S3 functional OOP system:
- generics. Only functions declared to be S3 generic will use the S3 dispatching system.
- class. While “everything in R is an object”3, S3 dispatching uses the class attribute of inputs.
- methods. A call to an S3 generic leads to evaluation of S3 methods based on input classes.
S3 generics and methods
As said, S3 is centered around method dispatching, which relies on two functions, UseMethod() and NextMethod(). In the example below, I show how one can print human-friendly forms of name and email information with two classes, my_person and my_person_email.
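A minimal reconstruction consistent with the step-by-step explanation in this section. The field names first and last, the output format, and the sample values are my assumptions:

```r
# Generic: dispatches on the class of its first argument
view <- function(x, ...) {
  UseMethod("view")
}

# Method for the base class: prints the name, plus $misc if present
view.my_person <- function(x, formal = FALSE, ...) {
  name <- if (formal) {
    paste0(x$last, ", ", x$first)
  } else {
    paste(x$first, x$last)
  }
  extra <- if (is.null(x$misc)) "" else paste0(" (", x$misc, ")")
  cat(name, extra, "\n", sep = "")
  invisible(x)
}

# Method for the subclass: copies the email into $misc, then delegates
view.my_person_email <- function(x, ...) {
  x$misc <- x$email
  NextMethod()
}

obj1 <- structure(
  list(first = "Jane", last = "Doe"),
  class = "my_person"
)
obj2 <- structure(
  list(first = "Jane", last = "Doe", email = "jane@example.com"),
  class = c("my_person_email", "my_person")
)

view(obj1)
view(obj2, formal = TRUE)
```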
Here is a step-by-step explanation of the function call view(obj2):
- UseMethod("view") calls the S3 dispatching function. This function call evaluates the first argument of the enclosing call to get class() information for dispatching. In this case, class(x) = c("my_person_email", "my_person"). This class information is retained as a variable .Class in the execution environment. The generic function name view is recorded as a variable .Generic in the same environment4.
- As the name view.my_person_email is bound to a function in the global environment, the S3 dispatching function calls view.my_person_email with parameter list obj2, formal = TRUE.
- This call first assigns values to the $misc field using the email information provided in $email.
- NextMethod() calls the other S3 dispatching function, which uses the .Class and .Generic generated by UseMethod() to decide on subsequent method dispatching. In this case, the generic is view and the next class in the list is my_person.
- As the name view.my_person is bound to a function in the global environment, the S3 dispatching function calls view.my_person with the same parameter list (obj2, formal = TRUE)5.
- The function body uses the $misc field to allow printing of additional information about a person beyond first and last names. In this case, the additional info is the email value.
Takeaways:
- Generic functions end with a call UseMethod("genericName"), which is a special call that does not return.
- Method dispatch is based on the class attribute of the first argument of the enclosing call. The system searches for a genericName.class method in two places: 1) the environment where the generic is called, and 2) the namespace where the generic is defined.
- NextMethod() performs further method dispatching based on the .Class and .Generic variables saved in the execution environment by UseMethod(). It ends up with a function call to the next method found by the genericName.class search. If no further method is available, it then searches for genericName.default.
- As a last resort, genericName itself can also be dispatched to if it can itself serve as a method6.
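The fall-through to the default method can be sketched as follows. The generic describe and the class my_tag are hypothetical names used only for illustration:

```r
# Hypothetical generic with a default method and one class-specific method
describe <- function(x, ...) UseMethod("describe")

describe.default <- function(x, ...) {
  "a plain object"
}

describe.my_tag <- function(x, ...) {
  # No class is left after "my_tag", so NextMethod()
  # falls through to describe.default
  paste("a tagged object, which is also", NextMethod())
}

tagged <- structure(list(), class = "my_tag")

describe(tagged)
describe(1:3)  # no describe method for integers: the default is used
```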
S3 generics in base R
Apart from user-defined generics, many base R functions are generic and allow user-defined S3 methods. There are two major types:
- Some of the base R generics use the standard UseMethod/NextMethod mechanism. This includes the print generic.
- Moreover, many of the base R generics are internal (i.e., primitives). Those are called Internal Generic Functions.
Internal generics deserve more explanation and are arguably the more complex part of the S3 system. Being primitive means that they do not use the R interpreter's UseMethod/NextMethod dispatching mechanism. Dispatching for them, while still relying on the class attribute of the arguments, is performed in C using DispatchOrEval or DispatchGroup.
Dispatch of internal generics
Help page ?InternalGenerics lists all internal generic functions in R. Most notably, internal generics include:
- Extract operators [, [[, $, @ and replacement operators [<-, etc.
- Length and dimension accessors (and setters) including length, dim, names, etc.
- Combine operators c, cbind, rbind.
- Casting to base vectors, as.* family including as.character, as.double, etc.
- … (this is by no means a complete list)
These (unary) functions mostly consist of a call to .Primitive(), and in many cases their function bodies are NULL. They work as follows:
- Internal C function DispatchOrEval checks whether the input argument is “internally classed”, which is equivalent to testing 1) whether the class attribute is set for the argument, or 2) whether is.object returns true7.
- If the argument is classed, internal method dispatching takes place following the same rules as explained in previous sections.
- If the argument is not classed, the default C method is executed8.
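These steps can be observed from the R side with length. This is a minimal sketch; the class my_bag and its $items field are hypothetical:

```r
# length is a primitive: it has no R-level body and no UseMethod() call
is.primitive(length)
is.null(body(length))

# Hypothetical class: a list holding a character vector in $items
bag <- structure(list(items = c("a", "b", "c")), class = "my_bag")

# With no method defined, the internal default counts list elements
n_before <- length(bag)

# A method is picked up by internal dispatch because the object is classed
length.my_bag <- function(x) length(x$items)
n_after <- length(bag)

# Removing the class attribute bypasses the method again
n_unclassed <- length(unclass(bag))

c(before = n_before, after = n_after, unclassed = n_unclassed)
```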
Dispatch of group (internal) generics
Help page ?groupGeneric lists all group generic functions in R. Functions belonging to group generics are all internal generics9. These generics are organized in five groups (refer to the R help page for an exhaustive list of the generics):
- Group Math: myriad basic math functions including abs, exp, etc.
- Group Ops: arithmetic operators incl. +, logical operators incl. &, and comparison operators incl. ==.
- Group Summary: logical (incl. all) and math (incl. sum) summary functions.
- Group Complex: complex arithmetic functions.
- Group matrixOps: matrix operators.
Similar to internal generics, these (unary or binary) functions mostly consist of a call to .Primitive(), and in many cases their function bodies are NULL. They work as follows:
- Internal C function DispatchGroup checks whether the input argument(s) is/are “internally classed”, which is equivalent to testing 1) whether the class attribute is set for the argument, or 2) whether is.object returns true7.
- If the argument(s) is/are classed, internal method dispatching takes place following mostly similar rules as explained in previous sections. For unary functions, dispatching is performed using the first argument. For binary functions in the Ops and matrixOps groups, dispatching is performed using both arguments (double dispatching)10.
- If the argument is not classed, the default C method is executed8.
Group generics can be quite complex and may be better explained with an example. For this, we will use the s3_dispatch helper of the sloop package.
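A minimal sketch along those lines. The class my_num and its methods are hypothetical, and the s3_dispatch() call is guarded so that everything else runs without sloop installed:

```r
# Hypothetical class wrapping a numeric vector
x <- structure(c(-1, 4), class = "my_num")

# One method covers every function in the Math group;
# .Generic holds the name of the function actually called
Math.my_num <- function(x, ...) {
  message("Math.my_num called for ", .Generic)
  structure(NextMethod(), class = class(x))
}

abs(x)
sqrt(abs(x))

# A function-specific method takes precedence over the group method
abs.my_num <- function(x) "abs.my_num"
abs(x)

# If sloop is available, s3_dispatch() lists the candidate methods
if (requireNamespace("sloop", quietly = TRUE)) {
  print(sloop::s3_dispatch(abs(x)))
}
```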
Best practices for OOP with S3
It should be clear by now that the S3 system is all about method dispatching. That an R object is an S3 object of a particular class does NOT provide any guarantee on the data structure of the object.
Therefore, effective application of S3 OOP requires the programmer to follow certain rules on their end.
Provide constructor, helper and validator functions
Generally you need to provide three functions for a class (say myclass):
- low-level constructor new_myclass(). It should check object classes of the input arguments and error if unexpected arguments are provided.
- user-friendly helper myclass(). It should: 1) cast classes of input arguments as needed and end with a call to the constructor and then the validator if it exists, 2) provide error messages crafted for end-users, and 3) provide sensible default values.
- validator validate_myclass(). It should verify the data structure of the object and error if the object is not valid.
The above follows a typical OOP design and you should tailor it accordingly. For example, internal-use only classes may not have a helper and simple classes may not need a validator.
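The three functions might look like the sketch below. The unit attribute and the rule that values must be non-negative are assumptions chosen only to give the validator something to check:

```r
# Low-level constructor: checks input types and nothing else
new_myclass <- function(x = double(), unit = "cm") {
  stopifnot(is.double(x), is.character(unit), length(unit) == 1L)
  structure(x, unit = unit, class = "myclass")
}

# Validator: checks the values, not just the types
validate_myclass <- function(obj) {
  if (any(unclass(obj) < 0, na.rm = TRUE)) {
    stop("all values must be non-negative", call. = FALSE)
  }
  obj
}

# Helper: coerces input, supplies defaults, then constructs and validates
myclass <- function(x = numeric(), unit = "cm") {
  validate_myclass(new_myclass(as.double(x), unit))
}

lengths_cm <- myclass(c(1L, 20L))  # integers are coerced to double
class(lengths_cm)
```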
Use structure
In a typical S3 design, we divide an object into data and attribute parts. Except that the object must have the class attribute, we have complete flexibility as to what to put in each part.
Use the structure function to build your object data and attributes in one call.
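As a small illustration (the class my_temp and the unit attribute are hypothetical), the single structure() call is equivalent to setting the attributes one by one:

```r
# Data part: a double vector; attribute part: a unit label plus the class
temp <- structure(c(20.5, 22.1), unit = "celsius", class = "my_temp")

# The same object built step by step
temp2 <- c(20.5, 22.1)
attr(temp2, "unit") <- "celsius"
class(temp2) <- "my_temp"

identical(temp, temp2)
```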
Retain class attribute in S3 methods
The S3 system does not enforce the output type of methods, even for extract operators such as [. It is up to the programmer to preserve attributes, including the class, as necessary. Example:
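A minimal sketch, assuming a hypothetical class my_len with a unit attribute. The internal default for [ drops the class, and a method has to put it back:

```r
x <- structure(c(1, 2, 3), unit = "cm", class = "my_len")

# With no `[` method, the internal default drops the class and unit
class_before <- class(x[1:2])

# A method that restores the attributes after subsetting
`[.my_len` <- function(x, i, ...) {
  structure(NextMethod(), unit = attr(x, "unit"), class = class(x))
}
class_after <- class(x[1:2])

c(before = class_before, after = class_after)
```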
This behavior keeps the S3 system simple. Apart from being dispatched by S3, S3 methods are not much different from standard R functions. However, it does create extra clutter for class developers.
The vctrs package provides classes and helper functions that give more guarantees on the behavior of S3 data classes.
Intricate cases
Admittedly, S3 dispatching can be quite complex and involved. This section explains several intricate cases. It is probably best to know that such cases exist, and not to rely on these features in your code.
Group Ops double dispatching
Ops group generics can be binary. What if the lhs and rhs arguments both have an Ops S3 method defined? In this case, R uses the double dispatching rule to decide which method to dispatch to.
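A minimal sketch with two hypothetical classes, my_num and my_other. It assumes R 4.3.0 or later, where chooseOpsMethod() is available; on older versions R warns about incompatible methods and falls back to the internal operator:

```r
# Two hypothetical classes, each with its own Ops group method
Ops.my_num <- function(e1, e2) paste("Ops.my_num handles", .Generic)
Ops.my_other <- function(e1, e2) paste("Ops.my_other handles", .Generic)

a <- structure(1, class = "my_num")
b <- structure(2, class = "my_other")

# Assumption: R >= 4.3.0. Returning TRUE means: prefer the my_num
# method whenever the two candidate methods differ
chooseOpsMethod.my_num <- function(...) TRUE

a + b
b + a
```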
The S3 method chooseOpsMethod.my_num <- function(...) TRUE makes all binary Ops generics with a my_num object as one argument dispatch to the my_num methods, provided such methods are defined.
1. Functions in R are quite likely to be procedural rather than pure mathematical descriptions of mappings.
2. You may think of it as an object-based method dispatching system on top of the functional programming paradigm. It does not enforce well-defined data structures for objects with the same “class” definition. Therefore, an S3 class by itself does not enforce any constraints on the data structure of the objects. Also, S3 objects still follow copy-on-modify behavior and are not mutable.
3. R Internals use different C struct types, which ultimately makes everything in R an object. These are called the “base types” of R. However, it is not an OOP system, as behavior is hardcoded using switch statements in the C source code of R and is therefore not extensible by us end users.
4. UseMethod() also saves other information in the execution environment, namely .Method, .GenericCallEnv and .GenericDefEnv. Refer to ?UseMethod for more details.
5. NextMethod() is not a standard function call. The S3 dispatching information created by UseMethod() is actually retained (incl. .Generic, .Class, etc.). Refer to ?UseMethod for more details.
6. Only if genericName itself is 1) a primitive, or 2) a wrapper for a .Internal function of the same name. This means that an R user can NOT add to or modify this list.
7. Actually this is done by looking at the OBJECT bit of the SEXP object for performance concerns. Refer to R internals for more information.
8. Do not confuse the internal default with the S3 dispatching default method. For example, you can define an S3 default method for c as c.default <- function(...) in R. However, the internal default for c is written in C as do_c_dflt in the R source code.
9. One difference between internal generics and group generics is which function call is used for method dispatching in C. Internal generics use DispatchOrEval while group generics use DispatchGroup.
10. Refer to R help page ?groupGeneric for details on double dispatching for the Ops group.