This work is licensed under CC BY-SA 4.0
Synopsis
If you are accustomed to “modern” object-oriented programming tools like Java or Python, chances are that you will be thrown off by the at best confusing OOP systems lurking in R.
This post strives to provide a succinct overview of the S3 object system that is used widely in base R as well as many extension packages.
TL;DR: The S3 system is designed for method dispatching of generic functions. S3 objects are 1) standard copy-on-modify R objects (NOT mutable references), and 2) non-restrictive about internal data structure. Therefore, proper S3 OOP design asks for the programmer to follow best practices on their own.
The R’ posts are my study notes of the Advanced R book. For this post, refer to the following resources for more details:
- Chapters ‘Object-oriented programming Introduction’, ‘Base types’, and ‘S3’ in Advanced R.
- All vignettes of vctrs.
- “Object-Oriented Programming, Functional Programming and R” by John M. Chambers.
Inaccurate information here is in all likelihood my fault.
Functions and objects
A good computation design divides a large problem into smaller pieces that can be validated as subgroups. As such, there are two well-studied models to achieve this goal:
- Solving a problem is acquiring a mapping from its inputs to desired outputs. With this notion, we write functions to describe the solution (functional models).
- Solving a problem is running a realistic enough simulation of relevant entities. With this notion, we build objects and perform operations on them to describe the solution (sequential models) [1].
R possesses features from both worlds.
As described in my previous post, copy-on-modify and lazy evaluation are among the features that make R behave like a functional language. Such features are what allow us to write functions that are flexible yet easy to validate with minimal overhead.
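A minimal sketch of copy-on-modify in action (the rename helper and the sample values are hypothetical names of mine): modifying an argument inside a function leaves the caller's object untouched.

```r
# Hypothetical helper: changes a field and returns the modified copy
rename <- function(obj, new_name) {
  obj$name <- new_name
  obj
}

p <- list(name = "Ada")
q <- rename(p, "Grace")

p$name  # the original is unchanged
q$name  # only the returned copy carries the new value
```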
On the other hand, there is clear motivation for objects in R. Consider the summary function.
As the name suggests, summary provides a summary of its input. It is immediately clear that summary is generic: its actual implementation depends on the input. The S3 object system is what makes this function polymorphism possible.
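As a small illustration of this polymorphism (the inputs are my own choice), the same call produces different kinds of summaries for a numeric vector and a factor:

```r
# Illustrative inputs, chosen only for this sketch
num <- c(1.5, 2, 10)
fct <- factor(c("a", "b", "a"))

summary(num)  # numeric input: quantiles and mean
summary(fct)  # factor input: counts per level

# The class of the input decides which implementation runs
class(num)
class(fct)
```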
The S3 fOOP system
S3 is an ad hoc system to implement function polymorphism. In short, functions declared as generics dispatch to different actual implementations (i.e., methods) based on their inputs [2]. There are three components of the S3 functional OOP system:
- generics. Only functions declared to be S3 generics use the S3 dispatching system.
- class. While “everything in R is an object” [3], S3 dispatching uses the class attribute of inputs.
- methods. A call to an S3 generic leads to evaluation of S3 methods chosen by the classes of the inputs.
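The three components can be sketched in a few lines. The generic speak and the class dog are hypothetical names used only for illustration:

```r
# 1) generic: a function whose body calls UseMethod()
speak <- function(x, ...) UseMethod("speak")

# 2) methods: ordinary functions named generic.class
speak.dog <- function(x, ...) "Woof"
speak.default <- function(x, ...) "..."

# 3) class: just an attribute on an ordinary R object
pet <- structure(list(name = "Rex"), class = "dog")

speak(pet)  # dispatches to speak.dog
speak(42)   # no matching method, falls back to speak.default
```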
S3 generics and methods
As said, S3 is centered around method dispatching, which relies on two functions, UseMethod() and NextMethod(). The example in this section prints human-friendly forms of name and email information using two classes, my_person and my_person_email.
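A minimal sketch consistent with the walkthrough that follows. The names view, my_person, my_person_email, obj2, formal, $misc and $email come from the text; the fields $first and $last, the sample values, obj1, and the formal = TRUE defaults are my assumptions.

```r
# Generic: hands control to the S3 dispatcher
view <- function(x, ...) UseMethod("view")

# Base-class method: prints the name plus anything stored in $misc
view.my_person <- function(x, formal = TRUE, ...) {
  name <- if (formal) {
    paste0(x$last, ", ", x$first)
  } else {
    paste(x$first, x$last)
  }
  cat("Name: ", name, "\n", sep = "")
  if (!is.null(x$misc)) cat(x$misc, "\n", sep = "")
  invisible(x)
}

# Subclass method: fills $misc from $email, then delegates to the next class
view.my_person_email <- function(x, formal = TRUE, ...) {
  x$misc <- paste("Email:", x$email)
  NextMethod()
}

obj1 <- structure(
  list(first = "Ada", last = "Lovelace"),
  class = "my_person"
)
obj2 <- structure(
  list(first = "Ada", last = "Lovelace", email = "ada@example.com"),
  class = c("my_person_email", "my_person")
)

view(obj1)
view(obj2)
```

Note that the modified x (with $misc filled in) is what the next method sees, while dispatch itself still follows the class vector recorded when view was called.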
Here is a step-by-step explanation of the function call view(obj2):
- UseMethod("view") calls the S3 dispatching function. This function call evaluates the first argument of the enclosing call to get class() information for dispatching. In this case, class(x) = c("my_person_email", "my_person"). This class information is retained as a variable .Class in the execution environment. It also records the generic function name view as a variable .Generic [4].
  - As the name view.my_person_email is bound to a function in the global environment, the S3 dispatching function calls view.my_person_email with parameter list obj2, formal = TRUE.
  - This call first assigns values to the $misc field using email information provided in $email.
- NextMethod() calls the other S3 dispatching function, which uses the .Class and .Generic generated by UseMethod() to decide on subsequent method dispatching. In this case, the generic is view and the next class in the list is my_person.
  - As the name view.my_person is bound to a function in the global environment, the S3 dispatching function calls view.my_person with parameter list obj2, formal = TRUE [5].
  - The function body uses the $misc field to allow printing of additional information about persons beyond first and last names. In this case, the additional info is the email value.
Takeaways:
- Generic functions end with a call UseMethod("genericName"), which is a special call that does not return.
- Method dispatch is based on the class attribute of the first argument of the enclosing call. The system searches for a genericName.class method in two places: 1) the environment where the generic is called, and 2) the S3 method registry of the environment (typically a namespace) where the generic is defined.
- NextMethod() performs further method dispatching based on the .Class and .Generic variables saved in the execution environment by UseMethod(). It ends up calling the next method found by the genericName.class search. If no further method is available, it then searches for genericName.default. Failing that, genericName itself can be called, but only in special cases [6].
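These takeaways can be checked with a small sketch. The generic describe and the classes child and parent are hypothetical names of mine:

```r
describe <- function(x, ...) {
  UseMethod("describe")
  stop("never reached")  # UseMethod() does not return to the generic
}

describe.default <- function(x, ...) "default"

describe.child <- function(x, ...) {
  # .Generic is available in the method's frame once dispatch has happened
  c(paste0(.Generic, ".child"), NextMethod())
}

describe.parent <- function(x, ...) {
  # no class is left after "parent", so NextMethod() reaches describe.default
  c("describe.parent", NextMethod())
}

obj <- structure(list(), class = c("child", "parent"))
describe(obj)
```

The result lists the methods in the order they ran: the child method, the parent method, then the default.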
S3 generics in base R
Apart from user-defined generics, many base R functions are generic and allow user-defined S3 methods. There are two major types:
- Some of the base R generics use the standard UseMethod/NextMethod mechanism. This includes the print generic.
- Moreover, many of the base R generics are internal (mostly primitives). Those are called Internal Generic Functions.
Internal generics deserve more explanation and are arguably the more complex part of the S3 system. Being internal means that they do not use the R-level UseMethod/NextMethod dispatching mechanism. Their dispatching still relies on the class attribute of the arguments, but it is performed in C using DispatchOrEval or DispatchGroup.
Dispatch of internal generics
Help page ?InternalGenerics lists all internal generic functions in R. Most notably, internal generics include:
- Extract operators [, [[, $, @ and replacement operators [<-, etc.
- Length and dimension accessors (and setters) including length, dim, names, etc.
- Combine operators c, cbind, rbind.
- Casting to base vectors, the as.* family including as.character, as.double, etc.
- … (this is by no means a complete list)
These (unary) functions mostly consist of a call to .Primitive(), and in many cases their function bodies are NULL. They work as follows:
- The internal C function DispatchOrEval checks whether the input argument is “internally classed”, which is equivalent to testing 1) whether the class attribute is set for the argument, or 2) whether is.object returns true [7].
- If the argument is classed, internal method dispatching takes place following the same rules as explained in previous sections.
- If the argument is not classed, the default C method is executed [8].
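The three steps above can be observed from R with the internal generic length. The class my_table and its definition of "length" as a row count are assumptions made for this sketch:

```r
# length() is a primitive: no R-level body, dispatch happens in C
is.primitive(length)
is.null(body(length))

# Hypothetical class whose length is defined as the number of rows
length.my_table <- function(x) nrow(x[["data"]])

tbl <- structure(
  list(data = data.frame(a = 1:3, b = 4:6)),
  class = "my_table"
)

is.object(tbl)           # class attribute is set, so dispatch is attempted
length(tbl)              # handled by length.my_table

is.object(unclass(tbl))  # no class attribute any more
length(unclass(tbl))     # internal default: number of list elements
```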
Dispatch of group (internal) generics
Help page ?groupGeneric lists all group generic functions in R. Functions belonging to group generics are all internal generics [9]. These generics are organized in five groups (refer to the R help page for an exhaustive list of the generics):
- Group Math: myriad basic math functions including abs, exp, etc.
- Group Ops: arithmetic operators incl. +, comparison operators incl. ==, and logical operators incl. &.
- Group Summary: logical (incl. all) and math (incl. sum) summary functions.
- Group Complex: complex arithmetic functions.
- Group matrixOps: matrix operators.
Similar to internal generics, these (unary or binary) functions mostly consist of a call to .Primitive(), and in many cases their function bodies are NULL. They work as follows:
- The internal C function DispatchGroup checks whether the input argument(s) is/are “internally classed”, which is equivalent to testing 1) whether the class attribute is set for the argument, or 2) whether is.object returns true [7].
- If the argument(s) is/are classed, internal method dispatching takes place following mostly similar rules as explained in previous sections. For unary functions, dispatching is performed using the first argument. For binary functions in the Ops and matrixOps groups, dispatching is performed using both arguments (double dispatching) [10].
- If the argument is not classed, the default C method is executed [8].
Group generics can be quite complex and may be better explained with an example. For this, we will use the s3_dispatch helper of the sloop package.
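A minimal sketch of group dispatch, assuming a hypothetical class my_num. The sloop call is guarded so that the rest of the sketch runs without the package installed:

```r
# Hypothetical class for illustration
new_my_num <- function(x) structure(x, class = "my_num")

# One method covers the whole Math group; .Generic names the member called
Math.my_num <- function(x, ...) {
  message("Math.my_num handling ", .Generic)
  new_my_num(get(.Generic)(unclass(x), ...))
}

x <- new_my_num(c(-1, 4))
abs(x)        # no abs.my_num yet, so the group method runs
sqrt(abs(x))  # two group-method calls, one per function

# A method for a specific member takes precedence over the group method
abs.my_num <- function(x) "abs.my_num called"
abs(x)

# s3_dispatch() lists the candidate methods considered for a call
if (requireNamespace("sloop", quietly = TRUE)) {
  print(sloop::s3_dispatch(abs(x)))
}
```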
Best practices for OOP with S3
It should be clear by now that the S3 system is all about method dispatching. The fact that an R object is an S3 object of a particular class does NOT provide any guarantee on the data structure of the object.
Therefore, effective application of S3 OOP requires the programmer to follow certain rules on their end.
Provide constructor, helper and validator functions
Generally you need to provide three functions for a class (say myclass):
- low-level constructor new_myclass(). It should check the types of the input arguments and error if unexpected arguments are provided.
- user-friendly helper myclass(). It should: 1) coerce input arguments as needed and end with a call to the constructor, followed by the validator if one exists, 2) provide error messages crafted for end-users, and 3) provide sensible default values.
- validator validate_myclass(). It should verify the data structure of the object and error if the object is not valid.
The above follows a typical OOP design and you should tailor it accordingly. For example, internal-use only classes may not have a helper and simple classes may not need a validator.
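One possible shape for the three functions, as a sketch only. The unit attribute and the non-negativity rule are invented here to give the validator something to check:

```r
# Low-level constructor: strict about types, no coercion
new_myclass <- function(x = double(), unit = "m") {
  stopifnot(is.double(x), is.character(unit), length(unit) == 1L)
  structure(x, unit = unit, class = "myclass")
}

# Validator: checks invariants of the data (assumed rule: no negative values)
validate_myclass <- function(obj) {
  values <- unclass(obj)
  if (any(!is.na(values) & values < 0)) {
    stop("all values must be non-negative", call. = FALSE)
  }
  obj
}

# Helper: coerces input, supplies defaults, then constructs and validates
myclass <- function(x = numeric(), unit = "m") {
  x <- as.double(x)
  validate_myclass(new_myclass(x, unit))
}

myclass(1:3)      # integers are coerced by the helper
try(myclass(-1))  # rejected by the validator
```

Keeping the constructor strict and cheap lets internal code call it directly, while end users go through the more forgiving helper.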
Use structure
In a typical S3 design, we divide an object into data and attribute parts. Except that the object must have the class attribute, we have complete flexibility as to what to put in each part.
Use the structure function to build your object data and attribute in one call.
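For instance (the class name my_temperature and the unit attribute are made up for this sketch), structure() sets the data and all attributes at once, which is equivalent to setting them one by one:

```r
# data part: a double vector; attribute part: class plus a custom "unit"
temp <- structure(c(20.5, 22.1), unit = "celsius", class = "my_temperature")

# Equivalent step-by-step construction
temp2 <- c(20.5, 22.1)
attr(temp2, "unit") <- "celsius"
class(temp2) <- "my_temperature"

identical(temp, temp2)
attributes(temp)
```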
Retain class attribute in S3 methods
The S3 system does not enforce the output type of methods, even for extract operators such as [. It is up to the programmer to preserve attributes, including the class, as necessary.
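A sketch of the problem and one way around it, assuming a hypothetical class my_num with a unit attribute:

```r
new_my_num <- function(x, unit = "m") {
  structure(x, unit = unit, class = "my_num")
}

x <- new_my_num(c(1, 2, 3))

# Without a method, the internal `[` drops both the class and the unit
attributes(x[1:2])

# A method that lets the internal code subset, then restores the attributes
`[.my_num` <- function(x, i, ...) {
  new_my_num(NextMethod(), unit = attr(x, "unit"))
}

class(x[1:2])
attr(x[1:2], "unit")
```

Calling NextMethod() rather than x[i] inside the method avoids dispatching back to the same method.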
This behavior keeps the S3 system simple: apart from being selected by S3 dispatch, S3 methods are no different from standard R functions. However, it does cause extra clutter for class developers.
The vctrs package provides classes and helper functions that give more guarantees on behavior of S3 data classes.
Intricate cases
Admittedly, S3 dispatching can be quite complex and involved. This section explains several intricate cases. It is probably best to know that such cases exist and to avoid relying on these features in your code.
Group Ops double dispatching
Ops group generics can be binary. What if the lhs and rhs arguments both have an Ops S3 method defined? In this case, R uses the double dispatching rule to decide which method to dispatch to.
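A sketch of the situation, under stated assumptions: my_num is the class named in the text, other_num is a second class I made up, and both Ops methods only report who handled the call. The tie-breaker relies on chooseOpsMethod, which needs R 4.3.0 or later; on older versions the mixed calls warn about incompatible methods and fall back to the internal operator.

```r
new_my_num <- function(x) structure(x, class = "my_num")
new_other_num <- function(x) structure(x, class = "other_num")

# Each class has its own Ops group method
Ops.my_num <- function(e1, e2) paste("Ops.my_num handled", .Generic)
Ops.other_num <- function(e1, e2) paste("Ops.other_num handled", .Generic)

a <- new_my_num(1)
b <- new_other_num(2)

# Same method on both sides, or a method on one side only: no conflict
a + a
a + 1

# Tie-breaker for mixed calls (R >= 4.3.0): always pick the my_num method
chooseOpsMethod.my_num <- function(x, y, mx, my, cl, reverse) TRUE

a + b
b + a
```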
The S3 method chooseOpsMethod.my_num <- function(...) TRUE makes all binary Ops generics with a my_num object as one argument dispatch to the my_num methods, provided such methods are defined. Note that chooseOpsMethod is only available from R 4.3.0 onwards.
[1] Functions in R are quite likely procedural and not pure mathematical descriptions of mappings.
[2] You may think of it as an object-based method dispatching system on top of the functional programming paradigm. An S3 class by itself does not enforce any constraints on the data structure of objects sharing that class. Also, S3 objects still follow copy-on-modify behavior and are not mutable.
[3] R internals use different C struct types, which ultimately makes everything in R an object. These are called the “base types” of R. However, this is not an OOP system, as behavior is hardcoded using switch statements in the C source code of R and is therefore not extensible by us end users.
[4] UseMethod() also saves other information in the execution environment, namely .Method, .GenericCallEnv and .GenericDefEnv. Refer to ?UseMethod for more details.
[5] NextMethod() is not a standard function call. The S3 dispatching information created by UseMethod() (incl. .Generic, .Class, etc.) is actually retained. Refer to ?UseMethod for more details.
[6] Only if genericName itself is 1) a primitive, or 2) a wrapper for a .Internal function of the same name. This means that an R user can NOT add to or modify this list.
[7] Actually this is done by looking at the OBJECT bit of the SEXP object for performance reasons. Refer to R internals for more information.
[8] Do not confuse the internal default with the S3 dispatching default method. For example, you can define an S3 default method for c as c.default <- function(...) in R. However, the internal default for c is written in C as do_c_dflt in the R source code.
[9] One difference between internal generics and group generics is which function call is used for method dispatching in C. Internal generics use DispatchOrEval while group generics use DispatchGroup.
[10] Refer to the R help page ?groupGeneric for details on double dispatching for the Ops group.