4.8 Iteration

In programming, iteration refers to stepping sequentially through a set or collection of objects, such as a vector of numbers or the columns of a matrix. In non-functional languages like Python and C, iteration is implemented with dedicated control structures commonly called loops. If you have worked with these languages, you may be familiar with for and while loops, which are two such structures. R, however, was designed to express iteration differently, and provides two main forms: vectorized operations, and functional programming with apply().

Note that R does support for and while loops. However, these loops often perform poorly compared with the alternatives, particularly when a result is grown one element at a time, and should generally be avoided in favor of the functional style of iteration described below.

How To Avoid For Loops in R

If you really, really want to learn how to use for loops in R, read this, but don’t say I didn’t warn you when your code slows to a crawl for unknown reasons:

R for Data Science - for loops

4.8.1 Vectorized operations

The simplest form of iteration in R comes in vectorized computation. This sounds fancy, but it just means R intrinsically knows how to perform many operations on vectors and matrices as well as individual values. We have already seen examples of this above when performing arithmetic operations on vectors:

x <- c(1,2,3,4,5)
x + 3 # add 3 to every element of vector x
[1] 4 5 6 7 8
x * x # elementwise multiplication, 1*1 2*2 etc
[1] 1 4 9 16 25
x_mat <- matrix(c(1,2,3,4,5,6),nrow=2,ncol=3)
x_mat + 3 # add 3 to every element of matrix x_mat
     [,1] [,2] [,3]
[1,]    4    6    8
[2,]    5    7    9
# the * operator always means element-wise
x_mat * x_mat
     [,1] [,2] [,3]
[1,]    1    9   25
[2,]    4   16   36

In addition to simple arithmetic operations, R also has syntax for vector-vector, matrix-vector, and matrix-matrix operations, like matrix multiplication and dot products:

# the %*% operator stands for matrix multiplication
x_mat %*% c(1,2,3) # [ 2x3 ] * [ 3 ]
     [,1]
[1,]   22
[2,]   28
x_mat %*% t(x_mat) # recall t() is the transpose function, making [ 2x3 ] * [ 3x2 ]
     [,1] [,2]
[1,]   35   44
[2,]   44   56
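
As a small illustration of the dot product mentioned above, the following sketch computes the same quantity three ways: with an explicit for loop, with vectorized arithmetic, and with the %*% operator. The vectors x and y and the names dot_loop, dot_vec, and dot_mat are made up for this example.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(10, 20, 30, 40, 50)

# explicit loop: accumulate the products one element at a time
dot_loop <- 0
for (i in seq_along(x)) {
  dot_loop <- dot_loop + x[i] * y[i]
}

# vectorized: elementwise product, then sum
dot_vec <- sum(x * y)

# %*% on two vectors of equal length computes their inner product
# and returns a 1x1 matrix; drop() reduces it to a plain number
dot_mat <- drop(x %*% y)

dot_loop # 550
dot_vec  # 550
dot_mat  # 550
```

All three give the same answer, but the vectorized forms are shorter and leave the element-by-element work to R.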

These forms of implicit iteration are very powerful, and R has been highly optimized to perform these operations very quickly. If you can cast your iteration as a vector or matrix operation, it is a good idea to do so. For more complex or custom iteration, we must first talk briefly about functional programming.

4.8.2 Functional programming

R is a functional programming language at its core, which means it is designed around the use of functions. In the previous section, we saw that functions are defined and assigned to names just like variables. This means that functions can be passed to other functions just like variables!

Let’s consider a general formulation of vector transformation:

\[ \bar{\mathbf{x}} = \frac{\mathbf{x} - t_r(\mathbf{x})}{s(\mathbf{x})} \]

Here, \(\mathbf{x}\) is a vector of real numbers, and \(\bar{\mathbf{x}}\) is defined as a vector of the same length where each value has had some average or central value \(t_r(\mathbf{x})\) subtracted from it, and has then been divided by a scaling factor \(s(\mathbf{x})\) to control the range of the resulting values. Both \(t_r(\mathbf{x})\) and \(s(\mathbf{x})\) are scalars (i.e. individual numbers) that depend on the values of \(\mathbf{x}\). If \(t_r\) is the arithmetic mean and \(s\) is the standard deviation, we have defined the standardization transformation mentioned in earlier examples:

x <- rnorm(100, mean=20, sd=10)
x_zscore <- (x - mean(x))/sd(x)

However, there are many different ways to define the central value of a set of numbers, for example the arithmetic mean or the median.

Each of these central value methods accepts a vector of numbers, but their behaviors are different, and are appropriate in different situations. Likewise, there are many possible scaling strategies we might consider:

  • standard deviation
  • rescaling factor (e.g. set data range to be between -1 and 1)
  • scaling to unit length (dividing by the Euclidean norm of the vector) or to unit sum (all values sum to 1)
  • and others

We may wish to explore different combinations of these methods without writing entirely new code for each one.

In R and other functional languages, we can easily accomplish this by passing functions as arguments to other functions. Consider the following R function:

# note R already has a built in function named "transform"
my_transform <- function(x, t_r, s) {
  return((x - t_r(x))/s(x))
}

This should look similar to the equation presented earlier, except that in code t_r and s are passed as function arguments. If we wished to transform using a Z-score normalization, we could call my_transform as follows:

x <- rnorm(100,mean=20,sd=10)
x_zscore <- my_transform(x, mean, sd)
mean(x_zscore) # approximately 0; floating-point rounding may leave a tiny nonzero value
[1] 0
sd(x_zscore)
[1] 1

In the my_transform function call, the second and third arguments are the names of the mean and sd functions, respectively. In the definition of my_transform we use the syntax t_r(x) and s(x) to indicate that these arguments should be treated as functions. Using this strategy, we could just as easily define a transformation using median and sum for t_r and s if we wished to:

x <- rnorm(100,mean=20,sd=10)
x_transform <- my_transform(x, median, sum)
median(x_transform)
[1] 0
sum(x_transform) # this quantity does not have an a priori known value (or meaning for that matter, it's just an example)
[1] 0.013

We can also write our own functions and pass them to my_transform to get the behavior we want. The following scales the values of x to have a range of \([0,1]\):

data_range <- function(x) {
  return(max(x) - min(x))
}
# my_transform computes: (x - min(x))/(max(x) - min(x))
x_rescaled <- my_transform(x, min, data_range)
min(x_rescaled)
[1] 0
max(x_rescaled)
[1] 1

The data_range function simply subtracts the minimum value of x from the maximum value and returns the result.
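
Functions can also be passed without naming them first. As a minimal sketch, assuming the my_transform definition from above (repeated here so the example runs on its own), the calls below use anonymous functions to rescale a vector onto \([-1,1]\) and to scale it to unit Euclidean length. The example vector and the names x_sym and x_unit are illustrative only.

```r
my_transform <- function(x, t_r, s) {
  return((x - t_r(x))/s(x))
}

x <- c(2, 4, 6, 10)

# center on the midpoint of the range and divide by half the range,
# which maps the smallest value to -1 and the largest to 1
x_sym <- my_transform(
  x,
  function(v) (max(v) + min(v))/2,
  function(v) (max(v) - min(v))/2
)
x_sym # -1.0 -0.5 0.0 1.0

# subtract nothing and divide by the Euclidean norm,
# so the resulting vector has length 1
x_unit <- my_transform(
  x,
  function(v) 0,
  function(v) sqrt(sum(v^2))
)
sqrt(sum(x_unit^2)) # 1, up to floating-point rounding
```

Anonymous functions are convenient for one-off choices of t_r and s; a named function such as data_range is easier to read and reuse when the same choice appears more than once.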

This feature of passing functions as arguments to other functions is a fundamental property of functional programming languages. Now we are ready to finally talk about how iteration is performed in R.

4.8.3 apply() and friends

When working with lists and matrices in R, you will often want to perform a computation on every row or every column separately. A common data science example, mentioned above, is feature standardization. Earlier we wrote a Z-score transformation that accepts a vector, subtracts the mean from each element, and divides the result by the standard deviation of the data, so that the result has a mean of 0 and a standard deviation of 1. However, that function operates on a single vector of numbers. Large datasets have many features, each of which may be its own vector, and we want to apply the same Z-score transformation to each one separately. In other words, we have one function that we wish to execute on every row or every column of a matrix, returning the combined result. This form of iteration can be implemented in a functional style using the apply function.

This is the signature of the apply function, from the R help(apply) page:

apply(X, MARGIN, FUN, ..., simplify = TRUE)

Here, X is a matrix (i.e. a rectangle of numbers) that we wish to perform a computation on for either each row or each column. MARGIN indicates whether the matrix should be traversed by rows (MARGIN=1) or columns (MARGIN=2). FUN is the name of a function that accepts a vector and returns either a vector or a scalar value that we wish to execute on either the rows or columns. apply() then executes FUN on each row or column of X and returns the result. For example:

zscore <- function(x) {
  return((x-mean(x))/sd(x))
}
# construct a matrix of 50 rows by 100 columns with samples drawn from a normal distribution
x_mat <- matrix(
  rnorm(100*50, mean=20, sd=5),
  nrow=50,
  ncol=100
)
# z-transform the columns of x_mat (MARGIN=2), so that each column has mean,sd of 0,1
x_mat_zscore  <- apply(x_mat, 2, zscore)
# we can check that all the columns of x_mat_zscore have mean close to zero with apply too
x_mat_zscore_means <- apply(x_mat_zscore, 2, mean)
# note: due to machine precision errors, these results will not be exactly zero, but are very close
# note: the all() function returns TRUE if every element of its argument is TRUE
# abs() is needed so that large negative means do not pass the check
all(abs(x_mat_zscore_means) < 1e-15)
[1] TRUE
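
To make the role of MARGIN and the ... argument concrete, here is a small sketch on a matrix whose results can be checked by hand. The matrix m, the helper center, and the result names are hypothetical and chosen only for this illustration. It also shows one behavior worth knowing: when FUN returns a vector, apply() stores each result as a column, so a row-wise transformation comes back transposed.

```r
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

row_sums <- apply(m, 1, sum) # one value per row: 9 12
col_sums <- apply(m, 2, sum) # one value per column: 3 7 11

# arguments after FUN are passed through to FUN via ...
# here na.rm = TRUE is handed to mean()
m_na <- m
m_na[1, 1] <- NA
col_means <- apply(m_na, 2, mean, na.rm = TRUE) # 2.0 3.5 5.5

# FUN returns a vector here, so each transformed row becomes a COLUMN
center <- function(v) {
  return(v - mean(v))
}
centered_rows <- apply(m, 1, center)
dim(centered_rows) # 3 2

# transpose to get back the 2 x 3 layout of m
centered_rows <- t(centered_rows)
dim(centered_rows) # 2 3
```

With MARGIN=2 no transpose is needed, because each transformed column is already stored as a column of the result.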

The same approach can be used when X is a list or data frame rather than a matrix using the lapply() function (hint: the l in lapply stands for “list”). Here is the function signature for lapply:

lapply(X, FUN, ...)

Recall that lists and data frames can be thought of as vectors where each element can be its own vector. Therefore, there is only one axis along which to iterate over the elements, and there is no MARGIN argument as in apply. This function returns a new list of the same length as the original list, with elements returned by FUN:

x <- list(
  feature1=rnorm(100,mean=20,sd=10),
  feature2=rnorm(100,mean=50,sd=5)
)
x_zscore <- lapply(x, zscore)
# check that the means are close to zero
x_zscore_means <- lapply(x_zscore, mean)
# lapply returns a list, so flatten it to a numeric vector before comparing
all(abs(unlist(x_zscore_means)) < 1e-15)
[1] TRUE
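
Two other members of the apply family, sapply() and vapply(), are not covered in the text above but are part of base R and fit the same pattern. The sketch below, which assumes the zscore function defined earlier and uses an arbitrary seed, shows how they return a plain vector instead of a list, and how lapply() can be used on a data frame. The names x_means, x_sds, df, and df_zscore are illustrative.

```r
zscore <- function(x) {
  return((x-mean(x))/sd(x))
}

set.seed(1) # arbitrary seed so the example is reproducible
x <- list(
  feature1=rnorm(100,mean=20,sd=10),
  feature2=rnorm(100,mean=50,sd=5)
)
x_zscore <- lapply(x, zscore)

# sapply() works like lapply() but simplifies the result when it can,
# here to a named numeric vector with one mean per feature
x_means <- sapply(x_zscore, mean)

# vapply() does the same but requires declaring what each call returns,
# here a single number, and raises an error if FUN returns anything else
x_sds <- vapply(x_zscore, sd, FUN.VALUE = numeric(1))

all(abs(x_means) < 1e-12)   # means are zero up to rounding
all(abs(x_sds - 1) < 1e-12) # standard deviations are one up to rounding

# a data frame is a list of columns, so lapply() iterates over its columns;
# as.data.frame() turns the resulting list back into a data frame
df <- data.frame(x)
df_zscore <- as.data.frame(lapply(df, zscore))
dim(df_zscore) # 100 2
```

Because sapply() chooses its return type based on the results, vapply() is often the safer choice inside functions, where a predictable return type matters.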

This functional programming pattern might be counterintuitive at first, but it is well worth your while to learn.