Biology as Data Science

The sequencing of the human genome ushered in the “post-genome” biological revolution

Biology is now a data science

Domains of Biological Data Analysis

Modern biological data analysis requires skills and knowledge from many domains:

  • molecular biology, genetics, biochemistry
  • statistics, mathematics
  • computer science
  • programming & software engineering
  • data visualization
  • high performance & cloud computing

No one person can be an expert in all these areas!

Experts create tools and techniques that we can use.

The R Programming Language

  • R is a statistical programming language
  • https://www.r-project.org/
  • Designed to conduct statistical analyses and visualize data
  • NOT a general-purpose programming language!
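
As a minimal sketch of that "statistics plus visualization" focus, the base R snippet below summarizes, tests and plots the built-in `PlantGrowth` dataset (plant dried weights under a control and two treatment conditions). The choice of dataset and of a one-way ANOVA is an illustrative assumption, not material from the course.

```r
# Built-in dataset: 30 plants, columns `weight` and `group` (ctrl, trt1, trt2)
data(PlantGrowth)

# Mean dried weight per treatment group
group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
print(group_means)

# One-way ANOVA: does mean weight differ between the groups?
fit <- aov(weight ~ group, data = PlantGrowth)
anova_table <- summary(fit)[[1]]
p_value <- anova_table[["Pr(>F)"]][1]   # p-value for the `group` term
print(anova_table)

# Visualize the distribution of weights in each group
boxplot(weight ~ group, data = PlantGrowth,
        xlab = "Treatment group", ylab = "Dried weight",
        main = "PlantGrowth: weight by group")
```

Everything here ships with base R, so the snippet runs without installing any packages.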

tidyverse

Book & Course Objectives

Learn R and its related packages to analyze biological data

Communicate results of R analyses with effective visualizations and notebooks

Learn how to use the RStudio development environment

Write correct and reproducible code using formal testing strategies

Share analyses with others using RShiny applications

Course Topics

Who This Book Is For

You are a practicing biologist who wants to learn how to use R to analyze biological data

We assume a basic working knowledge of:

  • genetics
  • genomics
  • molecular biology
  • biochemistry
  • statistics

We endeavor to explain the required background whenever possible, to relax these assumptions

Sources and References

R Materials

Data visualization

Course Structure

  • Weekly lectures

  • 7 assignments, roughly one per week

  • Final project: RShiny application combining the techniques you learned in the assignments

  • Grading:

    • Assignments 5% each (35% total)
    • Final project 60%
    • Class attendance/participation 5%
  • Zoom link is available
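
The grading weights add up to 7 × 5% + 60% + 5% = 100%. The R sketch below checks that total and shows how a final score could be combined from those weights. The `final_grade()` helper and the example scores are hypothetical, for illustration only.

```r
# Weights from the grading scheme (percent of the final grade)
n_assignments     <- 7
assignment_weight <- 5    # per assignment
project_weight    <- 60
participation_wt  <- 5

# Sanity check: the weights should total 100%
total_weight <- n_assignments * assignment_weight + project_weight + participation_wt
stopifnot(total_weight == 100)

# Hypothetical helper: each score is a fraction in [0, 1]; result is a percentage
final_grade <- function(assignment_scores, project_score, participation_score) {
  stopifnot(length(assignment_scores) == n_assignments)
  sum(assignment_scores * assignment_weight) +
    project_score * project_weight +
    participation_score * participation_wt
}

# Illustrative (made-up) scores
final_grade(rep(0.9, 7), project_score = 0.85, participation_score = 1)
```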

Things that are more important than this course

  • Your physical, emotional and mental health

  • Your family and friends

  • Policy on absences / missed classes

    • You never need to disclose anything private to me; just let me know if you will be absent for an extended period of time or need an extension, and I will work with you to accommodate your situation.

Assignments

Assignment Structure

  • Each assignment has a similar format and workflow, with the following files:

    ├── reference_report.html
    ├── main.R
    ├── README.md
    ├── report.Rmd
    └── test_main.R
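
The split between main.R and test_main.R suggests a common pattern: functions are defined in one file and checked by tests in another. The sketch below is an assumption about how that might look, using a hypothetical `gc_content()` function and base R `stopifnot()` checks. The actual assignment tests may instead use a testing framework such as testthat (`test_that()`, `expect_equal()`).

```r
# ---- main.R (hypothetical example) ----
# Return the fraction of G and C bases in a DNA sequence string.
gc_content <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]   # split into single characters
  mean(bases %in% c("G", "C"))               # proportion that are G or C
}

# ---- test_main.R (hypothetical example) ----
# In a separate file, the functions would be loaded first, e.g. source("main.R").
stopifnot(gc_content("GGCC") == 1)
stopifnot(gc_content("ATAT") == 0)
stopifnot(gc_content("ATGC") == 0.5)
stopifnot(gc_content("atgc") == 0.5)   # lowercase input is handled
```

Keeping tests in their own file lets them be rerun unchanged every time the functions in main.R are edited.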

Assignment Repository Structure

Assignment Workflow