5.1 The Tidyverse

The tidyverse is “an opinionated collection of R packages designed for data science.” The packages are all designed to work together with a unified approach that helps code look consistent and neat. In the opinion of this author, the tidyverse practically changes the R language from a principally statistical programming language into an efficient and expressive data science language. While it is still important to understand the language fundamentals presented in our chapter on the R programming language, the tidyverse uses a distinct set of coding conventions that lets it achieve greater expressiveness, conciseness, and correctness relative to the base R language.
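
As a rough illustration of this difference in style, the sketch below performs the same three steps (compute a log2 fold change, keep large changes, sort) first in base R and then with tidyverse verbs. The table, gene names, column names, and the cutoff of 1 are all invented for this example, and the code assumes the dplyr and tibble packages are installed.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: expression values for four made-up genes
expr <- tibble(
  gene    = c("geneA", "geneB", "geneC", "geneD"),
  control = c(10, 200, 35, 8),
  treated = c(40, 210, 5, 9)
)

# Base R: add a log2 fold change column, keep large changes, sort descending
base_result <- expr
base_result$log2fc <- log2(base_result$treated / base_result$control)
base_result <- base_result[abs(base_result$log2fc) > 1, ]
base_result <- base_result[order(base_result$log2fc, decreasing = TRUE), ]

# tidyverse: the same steps written as a pipeline of verbs
tidy_result <- expr %>%
  mutate(log2fc = log2(treated / control)) %>%  # add the new column
  filter(abs(log2fc) > 1) %>%                   # keep rows with large changes
  arrange(desc(log2fc))                         # sort from largest to smallest
```

Both versions keep the same rows in the same order. The tidyverse version names each step with a verb and does not repeat the data frame name on every line, which is one way to read the claim about expressiveness and conciseness.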

As a data science language, R+tidyverse (referred to simply as tidyverse in this book) is strongly focused on loading, manipulating, visualizing, summarizing, and analyzing data sets from many domains. This generality is a major strength of tidyverse and its community, but it also means that many educational materials are written for the general use case rather than for those practicing biological data analysis. The underlying data manipulation operations are often the same, yet biological analysis practitioners must still translate concepts from these general examples to the data analysis tasks they commonly perform. Some analytical patterns are more common in biological data analysis than others, so this book focuses on that subset of operations to help learners apply the concepts to their own problems as directly as possible.