Synopsis

R is a functional programming language that “allows dynamic creation and manipulation of functions and language objects”. Functions are first-class objects in R and can be used wherever an object is needed.

This post summarizes how R functions work in the following aspects:

  1. Components of a function
  2. Function forms
  3. Lexical scoping and environments
  4. Call stack and frames

TL;DR: See a cheatsheet of relevant tools provided in the rlang package.

The ‘R’ posts are my study notes on the Advanced R book. For more details on the topics in this post, refer to the following resources: the chapters ‘Names and values’, ‘Functions’ and ‘Environments’ in Advanced R; Sections 1.1-1.5 of R Internals; and the sections ‘Basic types’, ‘Simple evaluation’, ‘Scope of variables’ and ‘Computing on the language’ in the R Language Definition. Any inaccurate information here is in all likelihood my fault.

This work is licensed under CC BY-SA 4.0

Function components

Function objects are, conceptually, templates for running code blocks (statements) with defined inputs and outputs. Therefore, a function has the following three components [1]:

formals: the formal input argument list
body: the function code chunk
environment: the environment where the function is defined (see below)

Lazy evaluation of arguments: formals() gives the formal argument list of a function. When a function is called, we pass in expressions (i.e., actual arguments) to give concrete values to the arguments. In R, actual arguments are evaluated “on-demand”, leaving unused arguments unevaluated. This is realized using the promise mechanism. Refer to the Lazy Evaluation section.

Environment of a function: environment() gives the environment of a function. Executing a function call in R is conceptually evaluating the function body line by line, where the interpreter finds values for all symbols in the process. In R, the function environment defines what variables are available and how to find their values (i.e., scoping) during function invocation. Refer to the Scoping section.
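As a minimal sketch of the three components, the snippet below inspects a hypothetical function scale_by (not part of the original notes) with formals(), body() and environment(), and contrasts it with the primitive sum:

```r
# `scale_by` is a hypothetical function used only for illustration
scale_by <- function(x, factor = 2) {
  x * factor
}

names(formals(scale_by))   # "x" "factor"
formals(scale_by)$factor   # 2, the default value of `factor`
body(scale_by)             # the code block of the function
environment(scale_by)      # the environment where scale_by was defined

# Primitive functions such as sum have none of the three components
is.null(formals(sum))      # TRUE
is.null(body(sum))         # TRUE
is.null(environment(sum))  # TRUE
```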

Function forms

Everything (yes, including function definitions, control flow, …) that happens in R is the result of a function call, even if not all calls look like f(...). In R, function calls come in four flavors (forms):

prefix: the function name precedes the argument list, as in f(...).
infix: the function name comes in between the arguments, as in x + y.
replacement: modifies arguments in place [2]. See below.
special: do not have a consistent structure, but notably include parentheses (and other unary operators like the negation -), subsetting and control flow.

All four forms can be written in the prefix form. Refer to Advanced R for a detailed list. The replacement form is discussed in a supplementary section of this post.
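A short sketch of the claim that every form has a prefix equivalent. The vector x and its values are made up for illustration; the backquoted names are the base R functions behind each operator:

```r
x <- c(10, 20, 30)   # illustrative data

# infix: x + 1
`+`(x, 1)                        # 11 21 31

# special (subsetting): x[2]
`[`(x, 2)                        # 20

# special (parentheses): (1 + 2) * 3
`*`(`(`(1 + 2), 3)               # 9

# special (control flow): if (x[1] > 5) "big" else "small"
`if`(x[1] > 5, "big", "small")   # "big"

# replacement: x[2] <- 0
x <- `[<-`(x, 2, 0)
x                                # 10 0 30
```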

Scoping rules

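“Scope or scoping rules are simply the set of rules used by the evaluator to find a value for a symbol” (R Language Definition). R follows lexical (static) scoping, which means that the environments searched for a symbol are determined by where the function was defined, not by where it is called [3]. The values themselves are looked up at run time, when the function executes. To support lexical scoping, R uses the environment structure.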

If you are not familiar with R environments, refer to the section Environment for scoping rules. To see their application in lexical scoping, consider the following example:

# Note that `offset` is defined neither in the body nor formals
f <- function(x,y){
	z <- c(x,y) # concatenate
	z <- (z - min(z)) / (max(z) - min(z)) # normalize to [0,1]
	z <- z + offset # add an offset
	return(z)
}
offset <- 7
print(f(1, c(2,3))) # yields c(7.0,7.5,8.0)
offset <- 5
print(f(1, c(2,3))) # yields c(5.0,5.5,6.0)

When we construct the function object and bind it to symbol f by f <- function(x,y){...}, nothing in the function body is executed.

When we later call the function by f(1, c(2,3)), the following steps are performed:

  1. An execution environment (EE) is created whose enclosure (i.e., parent) is the function environment (FE). In this case, FE is the global environment because the function is defined there. Symbol-value bindings created during the function call are stored in the EE.
  2. Argument symbols x and y are bound to promises generated with the actual arguments 1 and c(2,3).
  3. The evaluator starts evaluating the function body. When formal arguments are evaluated, promises bound to them are evaluated. All other symbols in the function body are either local or unbound variables.
  4. Local variable z is bound to c(x,y), which involves 1) evaluation of x and y by forcing their promises and 2) resolving unbound symbol c.
  5. Value (z - min(z)) / (max(z) - min(z)) is computed and bound to z. This needs resolving unbound symbols min, max, -, /, and ( [4].
  6. Value z + offset is computed and bound to z. This needs resolving unbound symbols + and offset.
  7. Statement return(z) is evaluated which leads to function return. This needs resolving unbound symbol return.

To find values for the unbound symbols during function execution, parent environments of the EE are traversed in order until a match is found. Because FE is the enclosure of EE, and FE is where the function object is defined, the process usually yields intuitive results.
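The steps above can be checked with a minimal sketch. The helpers show_envs and make_adder are hypothetical names introduced here for illustration; the sketch only assumes the base functions environment() and parent.env():

```r
# Hypothetical helper: returns its own execution environment (EE)
show_envs <- function() {
  ee <- environment()   # inside a function, environment() is the EE of this call
  list(ee = ee, enclosure = parent.env(ee))
}

first_call  <- show_envs()
second_call <- show_envs()
identical(first_call$ee, second_call$ee)                 # FALSE: a fresh EE per call
identical(first_call$enclosure, environment(show_envs))  # TRUE: the EE's enclosure is the FE

# The FE, not the caller, decides where unbound symbols are found
make_adder <- function(offset) {
  function(x) x + offset   # `offset` is unbound in the inner function
}
add7 <- make_adder(7)
offset <- 100              # a different binding in the calling environment
add7(1)                    # 8: `offset` is found in the EE of make_adder(7)
```

The inner function's FE is the EE of make_adder(7), so the offset defined where add7 is called is never consulted.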

Call stacks

“Every time a function is invoked a new evaluation frame is created. At any point in time during the computation the currently active environments are accessible through the call stack” (R Language Definition).

The evaluation/execution frame is created during function execution. It is referenced in two structures:

  1. the execution environment explained in the Scoping rules section for lexical scoping.
  2. an internal structure named context which records runtime information. The stack of contexts (the call stack) is a record of how the function calls are invoked.

For the call stack, the global environment is always number 0. Each subsequent function evaluation increases the stack index by 1.

The call stack is probably best understood through its accessor functions. They are in the base package and have names that start with sys.. For a complete list, see ?sys.call. Below, I provide a partial list that hopefully points out key concepts of the stack.

sys.nframe(): position of the current context in the call stack.

sys.parent(): position of the calling context in the call stack. sys.parents() yields the parent positions of all active contexts. sys.calls() and sys.frames() yield the call objects and execution environments of all active contexts.

sys.call(which=0): get the call object for the context at position which in the stack.

sys.function(which=0): get the function object for the context at position which.

sys.frame(which=0): get the execution environment for the context at position which.

parent.frame(n=1): get the execution environment for a parent context. Shorthand for sys.frame(sys.parent(n)).
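A small sketch of these accessors in action. The functions inner_fn and outer_fn are hypothetical. Absolute stack positions depend on how the code is run (interactively, via source(), and so on), so the sketch only relies on relative positions:

```r
# Hypothetical functions used only to inspect the call stack
inner_fn <- function() {
  n <- sys.nframe()                # position of this context in the stack
  list(
    depth       = n,
    parent      = sys.parent(),    # position of the calling context
    call        = sys.call(),      # the call object: inner_fn()
    fn          = sys.function(),  # the function object being executed
    frame_is_ee = identical(sys.frame(n), environment()),
    caller_ee   = parent.frame()   # EE of the caller
  )
}

outer_fn <- function() {
  res <- inner_fn()
  res$outer_ee <- environment()    # EE of outer_fn, kept for comparison
  res
}

res <- outer_fn()
res$depth - res$parent                  # 1: inner_fn sits right above its caller
res$call                                # inner_fn()
res$frame_is_ee                         # TRUE: sys.frame(n) is the current EE
identical(res$caller_ee, res$outer_ee)  # TRUE: parent.frame() is the EE of outer_fn
```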

Finally, a simple example of the call stack:

f <- function(){
	# EE of f is where fetchX_1 and fetchX_2 are defined.
	#		Therefore, EE of f is FE of fetchX_1 and fetchX_2.
  x <- "x in f"
  fetchX_1 <- function(){
    x
  }
  fetchX_2 <- function(){
    c(eval(x, envir = parent.frame()), 
      eval(expression(x), envir = parent.frame()))
  }
  return(list(fetchX_1, fetchX_2))
}

h <- function(){
  x <- "x in h"
  fx <- f()
  print(fx[[1]]())
  print(fx[[2]]())
}

h()
#[1] "x in f"
#[1] "x in f" "x in h"

In this example:

  1. fx[[1]]() calls fetchX_1. The evaluation frame has no symbol x. Therefore, its parent, the EE of f(), is searched and the value is "x in f". This follows static scoping.
  2. fx[[2]]() calls fetchX_2. eval(expr, envir) evaluates the expression expr in the environment envir. The two eval(...) calls are passed as actual arguments to the function c, so both are evaluated in the calling environment, the EE of fetchX_2.
  3. In eval(x, envir = parent.frame()), the argument x is itself evaluated in the EE of fetchX_2 before eval() uses it. Symbol x is not bound there but is found in the FE of fetchX_2, so eval() receives the string "x in f". Evaluating a string constant returns the string unchanged, so the envir argument has no effect. This still follows static scoping.
  4. In eval(expression(x), envir = parent.frame()), expression(x) keeps the symbol x unevaluated, so it has to be resolved in envir = parent.frame(), which is the EE of h(). Therefore, the evaluation yields "x in h". This follows dynamic scoping!

rlang tools

The rlang package includes a comprehensive set of functions to look into functions, environments and the call stack. Below I provide a summary of a subset and include the related base R functions.

Functions

fn_fmls() and the fn_fmls_ family: Extract or set formals. formals().

fn_body() and fn_body<-(): Extract or set body. body().

fn_env() and fn_env<-(): Extract or set closure (i.e., ‘function environment’). environment().

Environments

General operations

env(), new_environment(), env_print(), env_browse(): Create, pretty print, or browse an environment. new.env().

env_clone(), env_coalesce(): Clone or coalesce an environment, as opposed to the default reference semantics.

get_env(), set_env(): Get or set the environment of an object (e.g., function and frame). environment().

Enclosure operations

env_parent(), env_parents(), env_tail(), env_inherits(), env_depth()

Binding operations

env_names(), env_length(): (Names of) symbols found in an environment.

env_has(), env_get(), env_get_list(): Get or check existing bindings in an environment.

env_bind(), env_poke(), env_unbind(): Bind, rebind, or unbind symbols in an environment.

Call stack

Execution environment operations

current_env(), caller_env(): Get current or caller EE.

Context operations

current_fn(), caller_fn(): Function (i.e., without actual arguments).

current_call(), caller_call(): Call (i.e., with arguments).

Supplementary

Lazy evaluation with promises

When an R function is called, actual arguments are NOT evaluated immediately. Instead, the actual arguments are replaced with promises [5].

Promises are internal R objects with the following ‘slots’:

expression: the exact expression provided by the caller.
environment: where to evaluate the promise. For supplied arguments, this is the calling environment; for default arguments in formals, it is the execution environment.
value: stores the result after the expression slot is evaluated.

Whenever the value of an argument is required during execution, the corresponding promise is evaluated to acquire its value. Once a promise is evaluated for the first time during execution, its value is cached for reuse.

Therefore, promises allow lazy evaluation of the actual arguments: they are evaluated only when needed, and evaluated only once. This feature 1) helps with performance, and 2) allows access to the exact expression, which is useful for generating plot labels, etc.
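The “only when needed, and only once” behavior can be made visible with a side effect. This is a minimal sketch; tick, use_twice and never_use are hypothetical names, and the counter evaluated exists only to record how often the argument expression runs:

```r
# Hypothetical counter that records how often an argument expression runs
evaluated <- 0
tick <- function() {
  evaluated <<- evaluated + 1
  42
}

use_twice <- function(x) x + x            # needs `x` twice
never_use <- function(x) "x not needed"   # never touches `x`

use_twice(tick())        # 84: the promise is forced once and its value reused
evaluated                # 1
never_use(tick())        # the promise is never forced, so tick() does not run
evaluated                # still 1
never_use(stop("boom"))  # no error: the argument is never evaluated
```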

Consider the following examples:

# Example 1
#		See how c is defined WITHIN the function body
f <- function(x, y, z = x*y+c){
	c <- 1
	return(z)
}
print(f(3,5)) # yields 3*5+1 = 16

#	Example 2
#		Promise `z` is evaluated only once
f2 <- function(x, y, z = x*y+c){
  c <- 1
  z2 <- z + 5
  c <- 2
  return(z)
}
print(f2(3,5)) # still yields 3*5+1 = 16

# Example 3
#		Supplied arguments are evaluated in the calling environment
f3 <- function(x, y, z = x*y+c){
	message("f3 is called")
  c <- 1
  return(z)
}
print(f3(3,5,x*y+c)) # Error in x * y + c : object 'x' not found

# Example 4
#		substitute() allows access of the argument expression
f4 <- function(x){
	x.exp <- substitute(x) # typeof() = "language", class() = "call"
	x.exp <- deparse(x.exp) # typeof() = "character"
	return(x.exp)
}
f4_2 <- function(x){
	x # force promise
	x.exp <- substitute(x) # typeof() = "language", class() = "call"
	x.exp <- deparse(x.exp) # typeof() = "character"
	return(x.exp)
}
f4_3 <- function(x){
	x <- x+1 # modify `x`
	x.exp <- substitute(x) # typeof() = "double", class() = "numeric"
	x.exp <- deparse(x.exp) # typeof() = "character"
	return(x.exp)
}
a <- b <- c <- 1
f4(a*b+c) # yields "a * b + c"
f4_2(a*b+c) # yields "a * b + c"
f4_3(a*b+c) # yields "3"

Example 1: return(z) triggers evaluation of promise z = x*y+c. In this case, environment of the promise is the execution environment (EE) because z takes the default argument x*y+c. Binding c <- 1 exists in the EE at the time of promise evaluation. Therefore, value of the promise is 3*5+1.

Example 2: z2 <- z + 5 triggers evaluation of promise z = x*y+c, which yields 16. Even though the value of c is modified afterwards, z remains unchanged because promise z is NOT evaluated again at return(z).

Example 3: return(z) triggers evaluation of promise z = x*y+c. In this case, the environment of the promise is the calling environment (CE) because z takes the supplied argument x*y+c. A binding for x does not exist in the CE at the time of promise evaluation. Therefore, an error is raised. Note that f3 does execute up to the statement return(z): the message f3 is called IS generated before the error.

Example 4: During evaluation of f4(a*b+c) and f4_2(a*b+c), the expression of promise x is accessed by substitute(x), which returns a call object. Call objects are unevaluated parsed expressions. deparse(x.exp), as its name suggests, deparses the call object into the equivalent string. The string is not literally identical to the argument a*b+c: the deparsed string is a * b + c, with spaces between symbols. In comparison, during evaluation of f4_3(a*b+c), promise x is forced when evaluating x+1, the value of which (1*1+1 + 1 = 3) is bound to x. Therefore, x is no longer a promise but a local variable. See the footnote for a bit more on how substitute() works [6].

Environment for scoping rules

R uses environments to look up values of symbols during statement evaluation. An environment consists of two things (R Language Definition) [7]:

frame: a set of symbol-value pairs.
enclosure: a pointer to an enclosing environment.

Practically, an environment has the following properties:

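  1. Names/symbols within an environment must be unique.
  2. Names are not ordered.
  3. Every environment has exactly one parent (its enclosure), except the empty environment, which has none.
  4. Environments are modified in place, in contrast to the typical copy-on-modify behavior of other R objects.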

Therefore, all environments in an R session form a tree structure where the enclosures are parents. The following environments are always present:

  1. Empty environment emptyenv(). Root node. The only environment without a parent.
  2. Base environment baseenv(). Immediate child of root. Also known as package:base.
  3. Global environment globalenv(). The “user workspace” environment where all interactive (and Rscript) statements are evaluated.
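A brief sketch of these properties using base R only. The environments e1, e2 and child, and the binding a, are made-up names for illustration:

```r
e1 <- new.env()
e2 <- e1                     # no copy: both names refer to the same environment
assign("a", 1, envir = e2)   # modify through e2 ...
get("a", envir = e1)         # 1: ... and the change is visible through e1

child <- new.env(parent = e1)
exists("a", envir = child)                     # TRUE: found via the enclosure
exists("a", envir = child, inherits = FALSE)   # FALSE: not in child's own frame
identical(parent.env(child), e1)               # TRUE

identical(parent.env(baseenv()), emptyenv())   # TRUE: base is a child of the root
```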

There are three typical application cases of lexical scoping with environments:

  1. Look up a symbol from the user workspace (i.e., the global environment). This involves the enclosures of globalenv(). Its path to the root is aptly named the search path and is accessible via search().
  2. Look up a symbol from statements within an R package. This involves the package-specific enclosure, the namespace.
  3. Look up a symbol during function invocation. This involves the function-specific execution environment and its enclosure, and is explained in the main section Scoping rules.

There is one last intriguing twist: R supports dynamic scoping by allowing a function call to access the execution environment of its caller through the call stack mechanism, as explained in the main section Call stacks. In this sense, R seems to be statically scoped until it isn’t (?!)

The replacement form

A replacement function must:

  1. have arguments named x and value
  2. return the full modified object
  3. have a special name of the form xxx<-
  4. place any additional arguments between x and value

Examples:

# Simple, no additional arguments
`first<-` <- function(x, value){
	x[1] <- value
	return(x)
}
t <- c(1,2,3)
first(t)<-5 # equivalent to: t <- `first<-`(t, value = 5)
# full prefix form of the line above: `<-`(t, `first<-`(t, value = 5))
t # t is now c(5,2,3)

# Possible to write nonsense
`first<-` <- function(x, value){
	x[1] <- value
	return(x[1])
}
t <- c(1,2,3)
first(t)<-5
t # t is now c(5)

# With additional arguments
`first<-` <- function(x, as.type, value){
	x[1] <- value
	x <- as.type(x)
	return(x)
}
t <- c(1,2,3)
first(t, as.character)<-5
t # t is now c("5", "2", "3")

  1. With one exception: primitive functions call C code directly and have NULL values for formals(), body() and environment(). They 1) only exist in the base package, 2) are shown as .Primitive("name"), and 3) have type builtin or special.

  2. In most cases R does not really modify in place and follows copy-on-modify.

  3. Equivalently, which binding a symbol refers to can be determined from the nesting of function definitions in the source code; the value of that binding is still looked up at run time. Another way to put it: “variable bindings in effect at the time the expression was created are used to provide values for any unbound symbols in the expression”.

  4. Yes, ( is an R base function. When the interpreter parses a string like symbol(...), it knows that it is looking for a function bound to symbol and that the ... are the actual arguments. In comparison, when the interpreter parses a string (...) without any preceding symbol, it is actually looking for the unary function (. “Parentheses are recorded as equivalent to a unary operator, with name ‘(’, even in cases where the parentheses could be inferred from operator precedence (e.g., a * (b + c)).” (from R Language Definition).

  5. Internally, promises are PROMSXP objects that contain pointers to i) the environment where the promise is evaluated, ii) the expression to evaluate, and iii) the evaluated value. Once a promise is evaluated, its environment is set to NULL and the value will be reused. For details about this, check out this paper and the R Internals documentation.

  6. “The exact rules for substitutions are as follows: Each symbol in the parse tree for the first is matched against the second argument, which can be a tagged list or an environment frame. If it is a simple local object, its value is inserted, except if matching against the global environment (where the symbol is untouched). If it is a promise (usually a function argument), the promise expression is substituted. If the symbol is not matched, it is left untouched.” (adapted from R Language Definition)

  7. Advanced R uses the term frame to mean an execution context, which differs from the R Language Definition. Note that base R has functions with confusing names: parent.frame() returns the execution context, which is actually an environment object and not simply a ‘frame’.