R for Biological Sciences

Semester: Fall 2024

Location: EPC 209

Time: M/W 10:10 - 11:55

Site last updated Tue Aug 27 2024

Contents:

This course introduces the R programming language through the lens of practitioners in the biological sciences, particularly biology and bioinformatics. Key concepts and patterns of the language are covered, including:

  • RStudio
  • Data wrangling with tidyverse
  • Data visualization with ggplot
  • Essential biological data shapes and formats
  • Core bioconductor packages
  • Basic data exploration, including elementary statistical modeling and summarization
  • Elementary Data Science concepts
  • “Toolifying” R scripts
  • Communicating R code and results with RMarkdown
  • Buidling R packages and unit testing strategies
  • Building interactive tools with RShiny

About 1/3 of the materials are inspired by the online textbook R for Data Science, while the rest has been developed by practicing bioinformaticians based on their experiences.

Weekly programming assignments will help students apply these techniques to realistic problems involving analysis and visualization of biological data. Students will be introduced to a unit testing paradigm that will help them write correct code and deposit all their code into github for evaluation. Students will implement an end-to-end project that begins with one of a set of provided datasets, implements a set of data summarization and exploration operations on it, and allows interaction with an RShiny app.

Relevant links (zoom and textbook) are available on Blackboard as a reference. The blackboard site will only be used for announcements and to host links.

Course Schedule

Key:

  • Follow Week N links for detailed list of topic sections.
  • Assignments are assigned and due on the first day of class each week unless mentioned otherwise
Week Dates Topics Assignment
Week 1 | slides 9/4 & 9/9 - Preliminaries
- Data in Biology
- R Programming Basics
Assignment 0
Week 2 | slides 9/11 & 9/16 - EngineeRing: Unit Testing
- Assignment Structure
- Data Wrangling & Tidyverse Basics
Base R
Week 3 | slides 9/18 & 9/23 - R in Biology
- Bio: Bioconductor Basics
Tidyverse Basics
Week 4 | slides 9/25 & 9/30 - Data Viz: Grammar of Graphics
- Bio: Gene Expression
- Bio: Microarrays
- Data Sci: Data Modeling
- Bio: Differential Expression
Bioinformatics Basics
Week 5 | slides 10/2 & 10/7 - Data Sci: PCA & Clustering
- Data Viz: Heatmaps & Dendrograms
- R Programming: Structures and Iteration
Data Science Basics
Week 6 | slides 10/9 & 10/15 - Bio: High Throughput Sequencing
- Bio: RNAseq
- Bio: Count Data
- Bio: RNAseq Differential Expression
- Bio: Gene Set Enrichment Analysis
Counts Analysis
Week 7 | slides 10/16 & 10/21 - Data Sci: Distributions
- Data Sci: Statistical Tests
- R Programming: Styles and Conventions
Differential Expression Part 1
Week 8 | slides 10/23 & 10/28 - Rshiny Differential Expression Part 2
Week 9 | slides 10/30 & 11/4 - EngineeRing RShiny Basics
Week 10 | slides 11/6 & 11/11 - Data Vis: Plot week
- Data Viz: Responsible Plotting
RShiny Basics
Week 11 11/13 & 11/18 - Project Work Final Project
Week 12 11/20 & 11/25 - Professional Development Workshops
Week 13 12/2 & 12/4 - Professional Development Workshops
Week 14 | 12/9 - Course Feedback

Instructors

Primary instructors:

Joey Orofino (jorofino AT bu DOT edu)

Adam Labadorf (labadorf AT bu DOT edu)

TAs:

Xudong Han

Eric Palanques Tost

Office Hours

Regular: Tuesdays 11-12pm, LSE101, Zoom link on Blackboard

By appointment: Email me (Joey)

TAs: TBA

Course Values and Policies

Everyone is welcome. Every background, race, color, creed, religion, ethnic origin, age, sex, sexual orientation, gender identity, nationality is welcome and celebrated in this course. Everyone deserves respect, patience, and kindness. Disrespectful language, discrimination, or harassment of any kind are not tolerated, and may result in removal from class or the University. This is not merely BU policy. The instructors deem these principles to be inviolable human rights. Students should feel safe reporting any and all instances of discrimination or harassment to the instructor, to any of the Bioinformatics Program leadership, or the BU Equal Opportunity Office.

Everyone brings value. Each of us brings unique experiences, skills, and creativity to this course. Our diversity is our greatest asset.

Collaboration is highly encouraged. All students are encouraged to work together and seek out any and all available resources when completing projects in all aspects of the course, including sharing coding ideas and strategies with each other as well as those found on the internet. Any and all available resources may be brought to bear. However, consistent with BU policy, the bulk of your code and your final reports should be written in your own words and represent your own work and understanding of the material. Copying/pasting large sections of code is not acceptable and will be investigated as cheating (we check).

A safe space for dissent. For complex topics such as those covered in this class, there is seldom one correct answer, approach, or solution. Disagreement fosters innovation. All in the course, including students and TAs, are encouraged to express constructive criticism and alternative ideas on any aspect of the content.

We are always learning. Our knowledge and understanding is always incomplete. Even experts are fallible. The bioinformatics field evolves rapidly, and Rome was not built in a day. Be kind to yourself and to others. You are always smarter and more knowledgeable today than you were yesterday.

Grading

Grading will be based on the 7 roughly weekly assignments and the final project. Each assignment is 5% of your total grade (35% total), and the final project is 60% of your grade. The remaining 5% is for class attendance / participation.

Absences, missed classes, and extensions

You should always prioritize your physical, emotional and mental health. BU offers a number of resources through Student Health Services and I encourage you to explore them if you feel you need to talk to someone. I am also here to listen without judgement if needed and can help with accessing the resources available.

If you need to miss extended class time or require extra time on an assignment because of personal matters, please just inform me and I will work with you whenever you are back to catch you up on the material and find an arrangement that accommodates your needs. You never need to disclose to me any private matters if you are not comfortable doing so.

Acknowledgements & Contributions

These materials would not have been possible without the contributions of Dakota Hawkins, Vanessa Li, Taylor Falk, and Mae Rose Gott, and Joey Orofino.

Former valiant and (probably still) attractive TAs:

2023

  • Regan Conrad (BU BF PhD Candidate)
  • Aubrey (Brie) Odom-Mabey (BU BF PhD Candidate)

2022

  • Taylor Falk (BU BF MS Alumnus ’21) is a Bioinformatics Developer working with the VA PTSD Brain Bank, developing infrastructure to support data generated out of our brains. He is championing the assignment strategy and will be available to help with issues.
  • Mae Rose Gott (BU BF MS Alumna ’21) is a Research Staff member working on a number of different projects across many different areas. She will be helping organize the course materials as we go forward.
  • Vanessa Li (BU BF PhD Candidate) is a PhD candidate in Dr. Stefano Monti’s lab in Computational Biomedicine. She will primarily be helping with grading of your assignments.
  • Joey Orofino (BU BF MS Alumnus ’15) is a Research Scientist working on identifying small RNA based biomarkers of Parkinson’s Disease