One of the key tidyverse programming patterns is chaining manipulations of
tibbles together using the %>% operator. We very often want to perform
serial operations on a data frame, for example read in a file, rename one of the
columns, subset the rows based on some criteria, and compute summary statistics
on the result. We might implement such operations using a variable and
assignment:
# data_file.csv has two columns: bad_cOlumn_name and numeric_column
data <- readr::read_csv("data_file.csv")
data <- dplyr::rename(data, "better_column_name"=bad_cOlumn_name)
data <- dplyr::filter(data, better_column_name %in% c("condA","condB"))
data_grouped <- dplyr::group_by(data, better_column_name)
summarized <- dplyr::summarize(data_grouped, mean(numeric_column))
The repeated use of data and the intermediate data_grouped variable may
be unnecessary if you’re only interested in the summarized result. The code is
also not very straightforward to read. Using the %>% operator, we can write
the same sequence of operations in a much more concise manner:
# the %>% operator comes from magrittr (also attached by dplyr and tidyverse)
library(magrittr)
data <- readr::read_csv("data_file.csv") %>%
  dplyr::rename("better_column_name"=bad_cOlumn_name) %>%
  dplyr::filter(better_column_name %in% c("condA","condB")) %>%
  dplyr::group_by(better_column_name) %>%
  dplyr::summarize(mean(numeric_column))
Note that the function calls in the piped example do not have the data
variable passed in explicitly. This is because the %>% operator automatically
passes the result of the expression on its left as the first argument to the
function call on its right. This convention allows us to focus on writing only
the important parts of the code that perform the logic of our analysis, and
avoid unnecessary and potentially distracting additional characters that make
the code less readable.
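
The first-argument behavior described above can be checked on a small scale. The following is a minimal sketch, assuming the magrittr package is installed; the values vector and the result variable names are hypothetical and chosen only for illustration. It compares ordinary nested calls against their piped equivalents:

```r
library(magrittr)  # provides the %>% operator

# Hypothetical example data, not taken from data_file.csv
values <- c(1, 4, 9)

# Nested calls are read from the inside out
nested <- sum(sqrt(values))

# Piped calls are read left to right; each result becomes the
# first argument of the next function
piped <- values %>% sqrt() %>% sum()

# Any additional arguments are written after the piped-in value
rounded_nested <- round(mean(values), digits = 1)
rounded_piped <- values %>% mean() %>% round(digits = 1)

# Both styles should produce the same results
stopifnot(identical(nested, piped))
stopifnot(identical(rounded_nested, rounded_piped))
```

In other words, x %>% f(y) behaves like f(x, y), which is why the dplyr calls in the piped example omit the data argument.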