5.3 Importing Data

The first step in most data analyses is reading data from a file into R. The readr package, one of the core packages loaded with the tidyverse, provides several closely related functions for importing data from delimited text files:

Function      Delimiter      Decimal separator
read_csv      ,              .
read_csv2     ;              ,
read_tsv      <tab>          .
read_delim    set by user    .
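
To make the difference between these functions concrete, here is a minimal sketch that reads the same semicolon-delimited file in two ways. The file contents and the names path, expr_csv2, and expr_delim are made up for illustration. The show_col_types argument assumes readr 2.0 or later.

```r
library(readr)

# Create a small semicolon-delimited file with comma decimal marks.
# (Hypothetical example data, written to a temporary file.)
path <- tempfile(fileext = ".txt")
writeLines(c("gene;expr", "A;1,5", "B;2,25"), path)

# read_csv2() assumes ';' as the delimiter and ',' as the decimal separator.
expr_csv2 <- read_csv2(path, show_col_types = FALSE)

# read_delim() takes the delimiter from the user. Its decimal separator
# defaults to '.', so here it is changed through the locale argument.
expr_delim <- read_delim(
  path,
  delim = ";",
  locale = locale(decimal_mark = ","),
  show_col_types = FALSE
)
```

Both calls should produce the same tibble, with expr parsed as a numeric column. In that sense read_csv, read_csv2, and read_tsv behave like read_delim with the delimiter chosen in advance.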

Some CSV files can be very large and may be compressed to save space. There are many different file compression algorithms, but the most common in data science and biology are gzip and bzip2. All the readr file reading functions can read compressed files directly, recognizing the format from the file extension (for example .gz or .bz2), so you do not need to decompress them first.
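
As a small sketch of reading a compressed file, the example below creates a gzip-compressed CSV with base R's gzfile() connection and passes its path straight to read_csv. The file contents and the names gz_path and samples are hypothetical. The show_col_types argument assumes readr 2.0 or later.

```r
library(readr)

# Write a tiny gzip-compressed CSV using a base R gzfile() connection.
# (Hypothetical example data, written to a temporary file.)
gz_path <- tempfile(fileext = ".csv.gz")
con <- gzfile(gz_path, open = "w")
writeLines(c("sample,count", "s1,10", "s2,20"), con)
close(con)

# Pass the compressed file directly to read_csv(); no manual
# decompression step is needed.
samples <- read_csv(gz_path, show_col_types = FALSE)
```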

Each of these functions returns a special data frame called a tibble, which is explained in the next section.

Note that readr also has functions for writing delimited files, such as write_csv, write_tsv, and write_delim. These functions mirror the corresponding read_X functions, but instead of creating a tibble from a file, they create a file from a tibble. You will frequently need to export the results of your analysis, both to share with collaborators and as part of larger workflows that use tools other than R.
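
A brief sketch of exporting follows. It writes a small tibble to a CSV file and a TSV file, then reads the CSV back in. The tibble results and the temporary output paths are invented for illustration. The show_col_types argument assumes readr 2.0 or later.

```r
library(readr)
library(tibble)

# A small tibble standing in for the results of an analysis
# (hypothetical example data).
results <- tibble(sample = c("s1", "s2"), score = c(0.5, 1.25))

# Write the tibble as comma-delimited and as tab-delimited text.
out_csv <- tempfile(fileext = ".csv")
out_tsv <- tempfile(fileext = ".tsv")
write_csv(results, out_csv)
write_tsv(results, out_tsv)

# Read the CSV back in to check that the values survived the export.
round_trip <- read_csv(out_csv, show_col_types = FALSE)
```

In a real analysis you would replace the temporary paths with the location your collaborators or downstream tools expect.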