Data in the tidyverse is organized primarily in a special data frame object called a tibble. The tibble() function is defined in the tibble package of the tidyverse:
library(tibble) # or
library(tidyverse)
tbl <- tibble(
  x = rnorm(100, mean = 20, sd = 10),
  y = rnorm(100, mean = 50, sd = 5)
)
tbl
# A tibble: 100 x 2
x y
<dbl> <dbl>
1 16.5 54.6
2 14.4 54.3
3 7.87 53.7
4 8.06 50.8
5 37.2 57.1
6 16.5 51.9
7 15.8 50.1
8 40.3 44.3
9 12.0 49.8
10 23.8 50.1
# ... with 90 more rows
A tibble stores rectangular data, i.e. a grid of data elements in which every column has the same number of rows. You can access individual columns in the same way as with base R data frames:
tbl$x
[1] 29.572549 12.015877 15.235536 23.071761 32.254703 48.048651 21.905756
[8] 15.511768 34.872685 21.352433 12.515230 23.608096 6.778630 12.342237
...
tbl[1, "x"] # first row of column x, returned as a 1 x 1 tibble
# A tibble: 1 x 1
x
<dbl>
1 29.6
tbl$x[1]
[1] 29.57255
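The difference between the two forms of access above can be made explicit with a small sketch. The tibble `demo` and its values are hypothetical and chosen only to be deterministic; they are not the random data shown earlier. The sketch assumes the tibble package is installed.

```r
library(tibble)

# Small deterministic tibble (hypothetical values for illustration)
demo <- tibble(x = c(10, 20, 30), y = c(1.5, 2.5, 3.5))

# `$` and `[[` extract the column itself as a plain vector
by_dollar  <- demo$x
by_bracket <- demo[["x"]]

# `[` keeps the tibble structure, even when a single cell is selected
one_cell <- demo[1, "x"]

is.numeric(by_dollar)             # a bare numeric vector
identical(by_dollar, by_bracket)  # both forms return the same vector
is_tibble(one_cell)               # still a tibble
dim(one_cell)                     # 1 row, 1 column

# For comparison: a base data frame drops to a bare value here by default
base_cell <- data.frame(x = c(10, 20, 30))[1, "x"]
is.data.frame(base_cell)
```

When a single value is needed rather than a 1 x 1 tibble, `demo$x[1]` or `demo[["x"]][1]` is the usual choice.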
Tibbles (and regular data frames) typically have names for their columns. In the above example, the column names are x and y, accessed using the colnames function:
colnames(tbl)
[1] "x" "y"
Column names may be changed using this same function:
colnames(tbl) <- c("a","b")
tbl
# A tibble: 100 x 2
a b
<dbl> <dbl>
1 16.5 54.6
2 14.4 54.3
3 7.87 53.7
4 8.06 50.8
5 37.2 57.1
6 16.5 51.9
7 15.8 50.1
8 40.3 44.3
9 12.0 49.8
10 23.8 50.1
# ... with 90 more rows
As we will see again later, dplyr::rename can also rename columns, using the form new_name = old_name (shown here on the original tibble, whose columns are still named x and y):
dplyr::rename(tbl,
a = x,
b = y
)
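One practical difference from assigning to colnames() is that dplyr::rename returns a new tibble and leaves its input untouched. A minimal sketch, assuming the tibble and dplyr packages are installed; the tibble `small` and the name `renamed` are hypothetical and used only for illustration:

```r
library(tibble)
library(dplyr)

# Hypothetical three-row tibble with the same column names as the example
small <- tibble(x = c(1, 2, 3), y = c(4, 5, 6))

# rename() takes new_name = old_name pairs and returns a modified copy
renamed <- rename(small, a = x, b = y)

colnames(renamed)  # the copy carries the new names
colnames(small)    # the input keeps its original names
```

To keep the renamed version, assign the result back to a variable, as done with `renamed` here.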
Tibbles and data frames have row names as well as column names:
rownames(tbl)
[1] "1" "2" "3"...
However, tibble support for row names is included only for compatibility with base R data frames, and row names should generally be avoided. The reason is that row names are essentially a character column with different semantics than every other column, and the authors of the tidyverse believe row names are better stored as a normal column.
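As an illustration of storing row names as a normal column, the tibble package offers rownames_to_column() and the rownames argument of as_tibble(). The data frame `df`, its values, and the column name `sample` below are hypothetical; this is a sketch, assuming the tibble package is installed.

```r
library(tibble)

# A base data frame with meaningful row names (hypothetical values)
df <- data.frame(
  score = c(0.9, 0.4, 0.7),
  row.names = c("sample_a", "sample_b", "sample_c")
)

# Option 1: move the row names into a column, then convert to a tibble
tbl1 <- as_tibble(rownames_to_column(df, var = "sample"))

# Option 2: let as_tibble() do both steps at once
tbl2 <- as_tibble(df, rownames = "sample")

colnames(tbl1)      # the former row names are now the first column
has_rownames(tbl1)  # no row names remain on the tibble
```

Once the identifiers live in an ordinary column, they can be filtered, sorted, and joined on like any other variable.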
The tibble package provides a convenient way to construct simple tibbles manually with the tribble() function, which stands for “transposed tibble”:
gene_stats <- tribble(
  ~gene, ~test1_stat, ~test1_p, ~test2_stat, ~test2_p,
  "apoe", 12.509293, 0.1032, 34.239521, 1.3e-5,
  "hoxd1", 4.399211, 0.6323, 16.332318, 0.0421,
  "snca", 45.748431, 4.2e-9, 0.757188, 0.9146,
)
gene_stats
## # A tibble: 3 x 5
## gene test1_stat test1_p test2_stat test2_p
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 apoe 12.5 0.103 34.2 0.000013
## 2 hoxd1 4.40 0.632 16.3 0.0421
## 3 snca 45.7 0.0000000042 0.757 0.915
This made-up dataset includes statistics and p-values from two different statistical tests (again, made up) for three human genes. We will use this example below in the Arranging Data section.
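To see what "transposed" means in practice, the sketch below builds the same small table twice: row by row with tribble() and column by column with tibble(). It reuses two rows and two columns of the made-up values above; the names `by_row` and `by_col` are hypothetical, and the tibble package is assumed to be installed.

```r
library(tibble)

# Row-wise layout: each line of code is one row of the table
by_row <- tribble(
  ~gene,   ~test1_stat,
  "apoe",  12.509293,
  "hoxd1", 4.399211
)

# Column-wise layout: each argument is one column of the table
by_col <- tibble(
  gene       = c("apoe", "hoxd1"),
  test1_stat = c(12.509293, 4.399211)
)

# Compare the two constructions
all.equal(by_row, by_col)
```

The row-wise form tends to be easier to read and edit for small, hand-written tables, while tibble() is the more natural choice when the columns already exist as vectors.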