5.4 The tibble

Data in the tidyverse is organized primarily in a special data frame object called a tibble. The tibble() function is defined in the tibble package of the tidyverse:

library(tibble) # or library(tidyverse), which also attaches tibble
tbl <- tibble(
    x = rnorm(100, mean=20, sd=10),
    y = rnorm(100, mean=50, sd=5)
)
tbl
# A tibble: 100 x 2
       x     y
   <dbl> <dbl>
 1 16.5   54.6
 2 14.4   54.3
 3  7.87  53.7
 4  8.06  50.8
 5 37.2   57.1
 6 16.5   51.9
 7 15.8   50.1
 8 40.3   44.3
 9 12.0   49.8
10 23.8   50.1
# ... with 90 more rows

A tibble stores rectangular data, i.e. a grid of data elements where every column has the same number of rows. You can access individual columns in the same way as with base R data frames:

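tbl$x # access column x as a vector (output truncated)
[1] 29.572549 12.015877 15.235536 23.071761 32.254703 48.048651 21.905756
[8] 15.511768 34.872685 21.352433 12.515230 23.608096  6.778630 12.342237
tbl[1,"x"] # access the first element of x; the result is a 1 x 1 tibble
# A tibble: 1 x 1
      x
  <dbl>
1  29.6
tbl$x[1] # access the first element of x as a plain numeric value
[1] 29.57255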
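
The access patterns above differ in what they return, which is easy to miss when values are random. The following is a minimal sketch using a small hypothetical tibble named demo (not part of the original example) with fixed values, so the results are reproducible. It also contrasts tibble subsetting with a base R data frame, where single-column subsetting with [ drops to a vector by default.

```r
library(tibble)

# Hypothetical tibble with fixed values (an assumption for illustration)
demo <- tibble(x = c(1.5, 2.5, 3.5), y = c(10, 20, 30))

dim(demo)                  # rectangular: number of rows, number of columns

col_vec  <- demo$x         # $ returns the column as a numeric vector
col_vec2 <- demo[["x"]]    # [[ ]] returns the same vector
col_tbl  <- demo[, "x"]    # [ ] on a tibble always returns a tibble
cell_tbl <- demo[1, "x"]   # a 1 x 1 tibble, not a bare number
cell_val <- demo$x[1]      # a bare numeric value

# A base R data frame drops a single column to a vector by default
df_col <- as.data.frame(demo)[, "x"]
```

The practical consequence is that code written for tibbles can rely on [ returning a tibble regardless of how many columns are selected, whereas $ or [[ should be used when a vector is needed.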

Tibbles (and regular data frames) typically have names for their columns. In the above example, the column names are x and y, which can be retrieved with the colnames function:

colnames(tbl)
[1] "x" "y"

Column names may be changed using this same function:

colnames(tbl) <- c("a","b")
tbl
# A tibble: 100 x 2
       a     b
   <dbl> <dbl>
 1 16.5   54.6
 2 14.4   54.3
 3  7.87  53.7
 4  8.06  50.8
 5 37.2   57.1
 6 16.5   51.9
 7 15.8   50.1
 8 40.3   44.3
 9 12.0   49.8
10 23.8   50.1
# ... with 90 more rows

As we will see again later, we can also use dplyr::rename to rename columns:

dplyr::rename(tbl, # new_name = old_name; assumes tbl still has columns x and y
  a = x,
  b = y
)
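
To make the difference between the two renaming approaches concrete, here is a small sketch on a hypothetical tibble named demo (an assumption for illustration, not the tbl object above). Assigning to colnames() modifies the object in place, while dplyr::rename() returns a new tibble and leaves its input unchanged.

```r
library(tibble)
library(dplyr)

# Hypothetical tibble with fixed values (an assumption for illustration)
demo <- tibble(x = c(1, 2), y = c(3, 4))

# Base R style: assign new names to a copy of the tibble
by_colnames <- demo
colnames(by_colnames) <- c("a", "b")

# dplyr style: the syntax is new_name = old_name
by_rename <- rename(demo, a = x, b = y)

colnames(by_colnames)  # names after assignment
colnames(by_rename)    # names after rename()
colnames(demo)         # the original tibble keeps x and y
```

Because rename() refers to the old names explicitly, it fails with an error if a column does not exist, which can catch mistakes that a positional colnames() assignment would silently accept.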

Tibbles and data frames have row names in addition to column names:

rownames(tbl)
[1] "1" "2" "3"...

However, the tibble support for row names is only included for compatibility with base R data frames and should generally be avoided. The reason is that row names are basically a character column that has different semantics than every other column, and the authors of tidyverse believe row names are better stored as a normal column.
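
As an illustration of storing row names in a normal column, the sketch below converts a base R data frame with meaningful row names into a tibble. The data frame df and its values are hypothetical; has_rownames() and rownames_to_column() are helpers from the tibble package.

```r
library(tibble)

# Hypothetical base R data frame whose row names carry information
df <- data.frame(
  score = c(0.5, 1.2, 3.4),
  row.names = c("apoe", "hoxd1", "snca")
)

has_rownames(df)  # TRUE: the row names are not just 1, 2, 3

# Move the row names into an ordinary character column called gene
genes <- as_tibble(rownames_to_column(df, var = "gene"))

has_rownames(genes)  # FALSE: gene is now a regular column
colnames(genes)
```

Once the identifiers live in a regular column, they can be filtered, sorted, and joined on like any other variable.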

See also: tibble documentation, working with row names.

The tibble package provides a convenient way to construct simple tibbles manually with the tribble() function, which stands for “transposed tibble”:

gene_stats <- tribble(
    ~gene, ~test1_stat, ~test1_p, ~test2_stat, ~test2_p,
   "apoe", 12.509293,   0.1032,   34.239521,   1.3e-5,
   "hoxd1",  4.399211,   0.6323,   16.332318,   0.0421,
   "snca", 45.748431,   4.2e-9,    0.757188,   0.9146
)
gene_stats
## # A tibble: 3 x 5
##   gene  test1_stat      test1_p test2_stat  test2_p
##   <chr>      <dbl>        <dbl>      <dbl>    <dbl>
## 1 apoe       12.5  0.103            34.2   0.000013
## 2 hoxd1       4.40 0.632            16.3   0.0421  
## 3 snca       45.7  0.0000000042      0.757 0.915

This made-up dataset includes statistics and p-values from two different statistical tests (again, made up) for three human genes. We will use this example below in the Arranging Data section.
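
To show what "transposed" means in practice, the following sketch builds the same small table twice: row by row with tribble() and column by column with tibble(). The object names by_row and by_col are hypothetical, and only two columns of the made-up values above are reused to keep the example short.

```r
library(tibble)

# Row-wise layout: each line of code is one row of the table
by_row <- tribble(
  ~gene,   ~test1_p,
  "apoe",  0.1032,
  "hoxd1", 0.6323,
  "snca",  4.2e-9
)

# Column-wise layout: each argument is one column of the table
by_col <- tibble(
  gene    = c("apoe", "hoxd1", "snca"),
  test1_p = c(0.1032, 0.6323, 4.2e-9)
)

# Compare the two constructions
all.equal(by_row, by_col)
```

The row-wise form tends to be easier to read and edit for small hand-entered tables, while the column-wise form is more natural when the columns already exist as vectors.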