Synopsis
R is a functional programming language that “allows dynamic creation and manipulation of functions and language objects”. Functions are first-class objects in R and can be used wherever an object is needed.
This post summarizes how R functions work in the following aspects:
- Components of a function
- Function forms
- Lexical scoping and environments
- Call stack and frames
TL;DR: See a cheatsheet of relevant tools provided in the rlang package.
The ‘R’ posts are my study notes on the Advanced R book. For this post, refer to the following resources for more details: Chapters ‘Names and values’, ‘Functions’ and ‘Environments’ in Advanced R; Sections 1.1-1.5 of R Internals; and Sections ‘Basic types’, ‘Simple evaluation’, ‘Scope of variables’ and ‘Computing on the language’ in the R Language Definition. Inaccurate information here is in all likelihood my fault.
This work is licensed under CC BY-SA 4.0
Function components
Function objects are, conceptually, templates for running code blocks (statements) with defined inputs and outputs. Therefore, a function has the following three components [1]:
- formals: the formal input argument list
- body: the function code chunk
- environment: the environment where the function is defined (see below)
Lazy evaluation of arguments: formals() gives the formal argument list of a function. When a function is called, we pass in expressions (i.e., actual arguments) to give concrete values to the arguments. In R, actual arguments are evaluated “on-demand”, leaving unused arguments unevaluated. This is realized using the promise mechanism. Refer to the Lazy Evaluation section.
Environment of a function: environment() gives the environment of a function. Executing a function call in R is conceptually evaluating the function body line by line, where the interpreter finds values for all symbols in the process. In R, the function environment defines what variables are available and how to find their values (i.e., scoping) during function invocation. Refer to the Scoping section.
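The three components can be inspected directly in base R. The sketch below uses a hypothetical function, scale_by, made up purely for illustration, and also shows the primitive-function exception.

```r
# `scale_by` is a hypothetical closure used only for illustration
scale_by <- function(x, factor = 2) {
  x * factor
}

fmls <- formals(scale_by)      # pairlist of formals: x (no default), factor = 2
bdy  <- body(scale_by)         # the unevaluated body, a `{` call
fenv <- environment(scale_by)  # the environment where scale_by was defined

names(fmls)
fmls$factor
class(bdy)

# Primitive functions such as sum() have none of the three components
prim <- list(formals(sum), body(sum), environment(sum))
```

Calling formals(), body() or environment() on a closure never runs the function; they only read the stored components.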
Function forms
Everything (yes, including function defs, control flows, …) that happens in R is a result of a function call, even if not all calls look like f(...). In R, function calls come in four flavors (forms):
- prefix: the function name precedes the argument list, as in f(...).
- infix: the function name comes in between its arguments, as in x + y.
- replacement: modify arguments in place [2]. See below.
- special: do not have a consistent structure, but notably include parentheses (and other unary operators like negation, -), subsetting and control flows.
All four forms can be written in the prefix form. Refer to Advanced R for a detailed list. The replacement form is discussed in a supplementary section of this post.
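As a small illustration of that claim, the sketch below rewrites one call of each non-prefix form as a prefix call by quoting the function name with backticks. The vector v and the result names are made up for the example.

```r
v <- c(a = 1, b = 2, c = 3)   # illustrative data

# infix: v[["a"]] + 10
sum_res <- `+`(v[["a"]], 10)

# special (parentheses): (1 + 2)
paren_res <- `(`(1 + 2)

# special (subsetting): v[2]
sub_res <- `[`(v, 2)

# special (control flow): if (v[["a"]] > 0) "positive" else "non-positive"
if_res <- `if`(v[["a"]] > 0, "positive", "non-positive")

# replacement: names(v) <- c("A", "B", "C") behaves like
v <- `names<-`(v, c("A", "B", "C"))
```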
Scoping rules
“Scope or scoping rules are simply the set of rules used by the evaluator to find a value for a symbol” (R Language Definition). R follows lexical (static) scoping, which means that where a symbol is looked up is determined by where the code was written, not by where it is called from [3]. To support lexical scoping, R uses the environment structure.
If you are not familiar with R environments, refer to the environment section. To see their application in lexical scoping, consider the following example:
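A minimal sketch of such an example, reconstructed from the step-by-step walkthrough that follows. The function body mirrors the statements named there; the value bound to offset is an assumption.

```r
# Assumed global binding: the walkthrough only says `offset` is unbound inside f
offset <- 10

f <- function(x, y) {
  z <- c(x, y)                            # local variable z
  z <- (z - min(z)) / (max(z) - min(z))   # rescale z to [0, 1]
  z <- z + offset                         # `offset` is found outside f
  return(z)
}

res <- f(1, c(2, 3))
res
```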
When we construct the function object and bind it to symbol f by f <- function(x,y){...}, nothing in the function body is executed.
When we later call the function by f(1, c(2,3)), the following steps are performed:
- An execution environment (EE) is created whose enclosure (i.e., parent) is the function environment (FE). In this case, FE is the global environment because the function is defined there. Symbol-value bindings created during the function call are stored in the EE.
- Argument symbols x and y are bound to promises generated with the actual arguments 1 and c(2,3).
- The evaluator starts evaluating the function body. When formal arguments are evaluated, promises bound to them are evaluated. All other symbols in the function body are either local or unbound variables.
- Local variable z is bound to c(x,y), which involves 1) evaluation of x and y by forcing their promises and 2) resolving unbound symbol c.
- Value (z - min(z)) / (max(z) - min(z)) is computed and bound to z. This needs resolving unbound symbols min, max, -, /, and ( [4].
- Value z + offset is computed and bound to z. This needs resolving unbound symbols + and offset.
- Statement return(z) is evaluated, which leads to function return. This needs resolving unbound symbol return.
To find values for the unbound symbols during function execution, parent environments of the EE are traversed in order until a match is found. Because FE is the enclosure of EE, and FE is where the function object is defined, the process usually yields intuitive results.
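The lookup chain can be made visible with a function factory. In this sketch (all names are made up for illustration), the inner function is defined inside make_reader() but called from caller(), and it still finds the x of the environment where it was defined.

```r
make_reader <- function() {
  x <- "x in make_reader"
  function() {
    ee <- environment()     # execution environment (EE) of this call
    fe <- parent.env(ee)    # its enclosure: the function environment (FE)
    list(
      value    = x,   # not in EE, so the search continues in FE
      ee_has_x = exists("x", envir = ee, inherits = FALSE),
      fe_has_x = exists("x", envir = fe, inherits = FALSE)
    )
  }
}

reader <- make_reader()

caller <- function() {
  x <- "x in caller"   # ignored by reader(): caller's EE is not on its search chain
  reader()
}

res <- caller()
res$value
```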
Call stacks
“Every time a function is invoked a new evaluation frame is created. At any point in time during the computation the currently active environments are accessible through the call stack” (R Language Definition).
The evaluation/execution frame is created during function execution. It is referenced in two structures:
- the execution environment explained in the Scoping rules section for lexical scoping.
- an internal structure named context which records runtime information. The stack of contexts (the call stack) is a record of how the function calls are invoked.
For the call stack, the global environment is always number 0. Each subsequent function evaluation increases the stack index by 1.
The call stack is probably best understood by its accessor functions. They are in the base package and have names that start with sys. (for a complete list, see ?sys.call). Below, I provide a partial list that hopefully points out key concepts of the stack.
sys.nframe(): position of the current context in the call stack.
sys.parent(): position of the calling context in the call stack. sys.parents() yields the parent position of every active context. sys.calls() and sys.frames() yield the call objects and execution environments of all active contexts.
sys.call(which=0): get the call object for the context at position which in the stack.
sys.function(which=0): get the function object for the context at position which.
sys.frame(which=0): get the execution environment for the context at position which.
parent.frame(n=1): get the execution environment for a parent context. A shorthand for sys.frame(sys.parent(n)).
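A small sketch of these accessors in action. The function names outer_fn and inner_fn are made up. Each accessor is called at the top of the function body and stored in a local variable, because the results depend on when the call is evaluated.

```r
inner_fn <- function() {
  here   <- sys.nframe()   # position of this call on the stack
  caller <- sys.parent()   # position of the context that called us

  my_call     <- sys.call()         # which = 0: the current call
  caller_call <- sys.call(caller)   # the call at the caller's position

  # sys.frame() at our own position is our execution environment
  ee_matches <- identical(sys.frame(here), environment())
  # parent.frame() is the caller's execution environment
  caller_matches <- identical(parent.frame(), sys.frame(caller))

  list(here = here, caller = caller,
       my_call = my_call, caller_call = caller_call,
       ee_matches = ee_matches, caller_matches = caller_matches)
}

outer_fn <- function() inner_fn()

res <- outer_fn()
res$here - res$caller   # inner_fn sits one position above its caller
```

The absolute positions depend on how the code is run (for example, interactively or through source()), so only relative positions are compared here.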
Finally, a simple example of the call stack:
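A sketch of such an example, reconstructed from the explanation that follows. The names f, h, fx, fetchX_1 and fetchX_2 and the two eval() calls come from that explanation; the exact layout is an assumption.

```r
f <- function() {
  x <- "x in f"
  fetchX_1 <- function() x
  fetchX_2 <- function() {
    c(eval(x, envir = parent.frame()),
      eval(expression(x), envir = parent.frame()))
  }
  list(fetchX_1, fetchX_2)
}

fx <- f()

h <- function() {
  x <- "x in h"
  list(fx[[1]](), fx[[2]]())
}

res <- h()
res[[1]]   # "x in f"
res[[2]]   # "x in f" "x in h"
```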
In this example:
- fx[[1]]() calls fetchX_1. The evaluation frame has no symbol x. Therefore, its parent, the EE of f(), is searched and the value is "x in f". This follows static scoping.
- fx[[2]]() calls fetchX_2. eval(expr, envir) allows evaluation of statement expr in the environment envir. Two eval(...) statements are provided as actual arguments to the function c. Therefore, both statements are evaluated in the calling environment, i.e., the EE of fetchX_2.
  - In eval(x, envir = parent.frame()), symbol x can be found in the FE of fetchX_2. Therefore, the envir argument has no effect, as x is already resolved as "x in f" before eval() runs. This still follows static scoping.
  - In eval(expression(x), envir = parent.frame()), the expression x has to be resolved in envir = parent.frame(), which is the EE of h(). Therefore, the evaluation yields "x in h". This follows dynamic scoping!
rlang tools
The rlang package includes a comprehensive set of functions to look into functions, environments and the call stack. Below I provide a summary of a subset and include related base R functions.
Functions
fn_fmls() and the fn_fmls_ family: Extract or set formals. Base R: formals().
fn_body() and fn_body<-(): Extract or set body. Base R: body().
fn_env() and fn_env<-(): Extract or set closure (i.e., ‘function environment’). Base R: environment().
Environments
General operations
env(), new_environment(), env_print(), env_browse(): Create, pretty print, or browse an environment. Base R: new.env().
env_clone(), env_coalesce(): Clone or coalesce an environment, as opposed to the default reference semantics.
get_env(), set_env(): Get or set the environment of an object (e.g., function and frame). Base R: environment().
Enclosure operations
env_parent(), env_parents(), env_tail(), env_inherits(), env_depth()
Binding operations
env_names(), env_length(): (Names of) symbols found in an environment.
env_has(), env_get(), env_get_list(): Get or check existing bindings in an environment.
env_bind(), env_poke(), env_unbind(): Bind, rebind, or unbind symbols in an environment.
Call stack
Execution environment operations
current_env(), caller_env(): Get current or caller EE.
Context operations
current_fn(), caller_fn(): Function (i.e., without actual arguments).
current_call(), caller_call(): Call (i.e., with arguments).
Supplementary
Lazy evaluation with promises
When an R function is called, actual arguments are NOT evaluated immediately. Instead, the actual arguments are replaced with promises [5].
Promises are internal R objects with the following ‘slots’:
- expression: the exact expression provided by the caller.
- environment: where to evaluate the promise. For supplied arguments, this is the calling environment; for default arguments in formals, the execution environment.
- value: stores the result after the expression slot is evaluated.
Whenever the value of an argument is required during execution, the corresponding promise is evaluated to acquire its value. Once a promise is evaluated for the first time during execution, its value is cached for reuse.
Therefore, promises allow lazy evaluation of the actual arguments: they are evaluated only when needed, and only once. This feature 1) helps with performance, and 2) allows access to the exact expression, which is useful for generating plot labels, etc.
Consider the following examples:
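A sketch of four such examples, reconstructed from the discussion that follows. The function names (f1, f2, f3, f4, f4_2, f4_3), the message text and the expressions x*y+c and a*b+c come from that discussion; the argument values and the remaining details are assumptions.

```r
# Example 1: default argument, so the promise is evaluated in the EE
f1 <- function(x, y, z = x * y + c) {
  c <- 1
  return(z)
}
r1 <- f1(3, 5)

# Example 2: the promise is forced once; later changes to c do not matter
f2 <- function(x, y, z = x * y + c) {
  c <- 1
  z2 <- z + 5   # forces promise z
  c <- 100      # assumed later modification of c
  return(z)
}
r2 <- f2(3, 5)

# Example 3: supplied argument, so the promise is evaluated in the calling
# environment, where no binding for x is assumed to exist
f3 <- function(x, y, z) {
  message("f3 is called")
  return(z)
}
r3 <- tryCatch(f3(3, 5, z = x * y + c), error = function(e) e)
conditionMessage(r3)

# Example 4: reading the promise expression with substitute()
f4   <- function(x) substitute(x)
f4_2 <- function(x) { x.exp <- substitute(x); deparse(x.exp) }
f4_3 <- function(x) { x <- x + 1; substitute(x) }
a <- 1; b <- 1; c <- 1
r4   <- f4(a*b+c)
r4_2 <- f4_2(a*b+c)
r4_3 <- f4_3(a*b+c)
```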
Example 1: return(z) triggers evaluation of promise z = x*y+c. In this case, the environment of the promise is the execution environment (EE) because z takes the default argument x*y+c. Binding c <- 1 exists in the EE at the time of promise evaluation. Therefore, the value of the promise is 3*5+1.
Example 2: z2 <- z + 5 triggers evaluation of promise z = x*y+c, which yields 16. Even if the value of c is modified afterwards, z remains unchanged, as promise z is NOT evaluated again at return(z).
Example 3: return(z) triggers evaluation of promise z = x*y+c. In this case, the environment of the promise is the calling environment (CE) because z takes the supplied argument x*y+c. A binding for x does not exist in the CE at the time of promise evaluation. Therefore, an error is raised. Note that f3 is actually executed up to the statement return(z). One piece of evidence is that the message f3 is called IS generated before the error.
Example 4: During evaluation of f4(a*b+c) and f4_2(a*b+c), the expression of promise x is accessed by substitute(x), which returns a call object. call objects are unevaluated parsed expressions. deparse(x.exp), as its name suggests, deparses the call object into the equivalent string. The string is not literally identical to the argument a*b+c: the deparsed string is a * b + c, with spaces between symbols. In comparison, during evaluation of f4_3(a*b+c), promise x is forced when evaluating x + 1, the value of which (1*1+1 + 1 = 3) is bound to x. Therefore, x is no longer a promise but a local variable. See footnote [6] for a bit more on how substitute() works.
Environment for scoping rules
R uses environments to look up values of symbols during statement evaluation. An environment consists of two things (R Language Definition) [7]:
- frame: a set of symbol-value pairs.
- enclosure: a pointer to an enclosing environment.
Practically, an environment has the following properties:
- Names/symbols must be unique.
- Names are not ordered.
- It has one and only one parent environment.
- It is modified in place, in contrast to the typical copy-on-modify for other R objects.
Therefore, all environments in an R session form a tree structure where the enclosures are parents. The following environments are always present:
- Empty environment emptyenv(). Root node. The only environment without a parent.
- Base environment baseenv(). Immediate child of root. Also known as package:base.
- Global environment globalenv(). The “user workspace” environment where all interactive (and Rscript) statements are evaluated.
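A short base R sketch of two of the points above: the path from the global environment to the root, and in-place modification. Variable names are made up for illustration.

```r
# Walk from the global environment up to the root of the tree
e <- globalenv()
path <- character()
repeat {
  path <- c(path, environmentName(e))
  if (identical(e, emptyenv())) break
  e <- parent.env(e)
}
path            # begins with "R_GlobalEnv"; ends with "base", "R_EmptyEnv"
length(path)    # one more than length(search()): the empty environment

# Environments are modified in place: e1 and e2 are the same object
e1 <- new.env()
e2 <- e1
assign("value", 42, envir = e2)
get("value", envir = e1)
```

The entries between the first and the last two depend on which packages are attached in the session.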
There are three typical application cases of lexical scoping with environments:
- Look up a symbol from the user workspace (i.e., the global environment). This involves the enclosure of globalenv(). Its path to root is aptly named the search path, accessible by search().
- Look up a symbol from statements within an R package. This involves the package-specific enclosure, the namespace.
- Look up a symbol during function invocation. This involves the function-specific enclosure, the execution environment, and is explained in the main section Scoping rules.
There is one last intriguing twist - R supports dynamic scoping by allowing a function call to access the execution environment of its caller through the call stack mechanism, which is explained in the main section Call stacks. In this sense, R seems to be statically scoped until it isn’t (?!)
The replacement form
The replacement form must:
- have arguments named x and value
- return the full modified object
- have the special name xxx<-
- place any additional arguments between x and value
Examples:
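A minimal sketch of two replacement functions that follow these rules. The names second<- and modify<- are illustrative, in the style of the Advanced R examples, and are not necessarily the original listing.

```r
# Replacement function with only x and value
`second<-` <- function(x, value) {
  x[2] <- value
  x              # return the full modified object
}

v <- 1:5
second(v) <- 10L   # same as: v <- `second<-`(v, value = 10L)

# An additional argument goes between x and value
`modify<-` <- function(x, position, value) {
  x[position] <- value
  x
}

modify(v, 1) <- 99L   # same as: v <- `modify<-`(v, 1, value = 99L)
v
```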
[1] With one exception: primitive functions call C code directly and have NULL values of formals(), body() and environment(). They 1) only exist in the base package, 2) are shown as .Primitive("name"), 3) have type either builtin or special.
[2] In most cases R does not really modify in place and follows copy-on-modify.
[3] Equivalently, it is possible to work out where any symbol will be resolved by looking at just the source code, without knowing how the function is called. Put another way, “variable bindings in effect at the time the expression was created are used to provide values for any unbound symbols in the expression”.
[4] Yes, ( is an R base function. When the interpreter parses a string like symbol(...), it knows that it is looking for a function bound to symbol and that the ... are the actual arguments. In comparison, when the interpreter parses a string (...) without any preceding symbol, it is actually looking for the unary function (. “Parentheses are recorded as equivalent to a unary operator, with name ‘(’, even in cases where the parentheses could be inferred from operator precedence (e.g., a * (b + c)).” (from R Language Definition).
[5] Internally promises are PROMSXP which contain pointers to i) the environment where the promise is evaluated, ii) the expression to evaluate, and iii) the evaluated value. Once a promise is evaluated, its environment is set to NULL and the value will be reused. For details about this, check out this paper and the R Internals documentation.
[6] “The exact rules for substitutions are as follows: Each symbol in the parse tree for the first is matched against the second argument, which can be a tagged list or an environment frame. If it is a simple local object, its value is inserted, except if matching against the global environment (where the symbol is untouched). If it is a promise (usually a function argument), the promise expression is substituted. If the symbol is not matched, it is left untouched.” (adapted from R Language Definition)
[7] Advanced R uses the term frame as execution context, different from the R Language Definition. Note that base R has functions with confusing names: parent.frame() returns the execution context, which is actually an environment object and not simply a ‘frame’.