R for Biological Sciences
Semester: Fall 2024
Location: EPC 209
Time: M/W 10:10 - 11:55
Site last updated Tue Aug 27 2024
Contents:
This course introduces the R programming language through the lens of practitioners in the biological sciences, particularly biology and bioinformatics. Key concepts and patterns of the language are covered, including:
- RStudio
- Data wrangling with tidyverse
- Data visualization with ggplot2
- Essential biological data shapes and formats
- Core Bioconductor packages
- Basic data exploration, including elementary statistical modeling and summarization
- Elementary Data Science concepts
- “Toolifying” R scripts
- Communicating R code and results with RMarkdown
- Building R packages and unit testing strategies
- Building interactive tools with RShiny
About one third of the materials are inspired by the online textbook R for Data Science, while the rest have been developed by practicing bioinformaticians based on their experiences.
Weekly programming assignments will help students apply these techniques to realistic problems involving analysis and visualization of biological data. Students will be introduced to a unit testing paradigm that will help them write correct code, and they will deposit all their code into GitHub for evaluation. Students will also implement an end-to-end project that starts from one of a set of provided datasets, performs a set of data summarization and exploration operations on it, and makes the results explorable through an interactive RShiny app.
Relevant links (Zoom and textbook) are available on Blackboard as a reference. The Blackboard site will only be used for announcements and to host links.
Course Schedule
Key:
- Follow the Week N links for a detailed list of topic sections.
- Assignments are released and due on Thursdays each week unless noted otherwise.
Week | Dates | Topics | Assignment |
---|---|---|---|
Week 1 (slides) | 9/4 & 9/9 | Preliminaries; Data in Biology; R Programming Basics | Assignment 0 |
Week 2 (slides) | 9/11 & 9/16 | EngineeRing: Unit Testing; Assignment Structure; Data Wrangling & Tidyverse Basics | Base R |
Week 3 (slides) | 9/18 & 9/23 | R in Biology; Bio: Bioconductor Basics | Tidyverse Basics |
Week 4 (slides) | 9/25 & 9/30 | Data Viz: Grammar of Graphics; Bio: Gene Expression; Bio: Microarrays; Data Sci: Data Modeling; Bio: Differential Expression | Bioinformatics Basics |
Week 5 (slides) | 10/2 & 10/7 | Data Sci: PCA & Clustering; Data Viz: Heatmaps & Dendrograms; R Programming: Structures and Iteration | Data Science Basics |
Week 6 (slides) | 10/9 & 10/15 | Bio: High Throughput Sequencing; Bio: RNAseq; Bio: Count Data; Bio: RNAseq Differential Expression; Bio: Gene Set Enrichment Analysis | Counts Analysis |
Week 7 (slides) | 10/16 & 10/21 | Data Sci: Distributions; Data Sci: Statistical Tests; R Programming: Styles and Conventions | Differential Expression Part 1 |
Week 8 (slides) | 10/23 & 10/28 | Data Viz: Plot Week; Data Viz: Responsible Plotting | Differential Expression Part 2 |
Week 9 (slides) | 10/30 & 11/4 | RShiny | RShiny Basics |
Week 10 (slides) | 11/6 & 11/11 | EngineeRing | RShiny Basics |
Week 11 | 11/13 & 11/18 | Project Work | Final Project |
Week 12 | 11/20 & 11/25 | Professional Development Workshops | |
Week 13 | 12/2 & 12/4 | Professional Development Workshops | |
Week 14 | 12/9 | Course Feedback | |
Instructors
Primary instructors:
Joey Orofino (jorofino AT bu DOT edu)
Adam Labadorf (labadorf AT bu DOT edu)
TAs:
Xudong Han
Eric Palanques Tost
Office Hours
Regular: Tuesdays 11am-12pm, LSE 101; Zoom link on Blackboard
By appointment: Email me (Joey)
TAs: TBA
Course Values and Policies
Everyone is welcome. Every background, race, color, creed, religion, ethnic origin, age, sex, sexual orientation, gender identity, and nationality is welcome and celebrated in this course. Everyone deserves respect, patience, and kindness. Disrespectful language, discrimination, or harassment of any kind is not tolerated and may result in removal from class or the University. This is not merely BU policy. The instructors deem these principles to be inviolable human rights. Students should feel safe reporting any and all instances of discrimination or harassment to the instructors, to any of the Bioinformatics Program leadership, or to the BU Equal Opportunity Office.
Everyone brings value. Each of us brings unique experiences, skills, and creativity to this course. Our diversity is our greatest asset.
Collaboration is highly encouraged. Students are encouraged to work together and to draw on any and all available resources when completing work in every aspect of the course, including sharing coding ideas and strategies with each other and using those found on the internet. However, consistent with BU policy, the bulk of your code and your final reports should be written in your own words and represent your own work and understanding of the material. Copying and pasting large sections of code is not acceptable and will be investigated as cheating (we check).
A safe space for dissent. For complex topics such as those covered in this class, there is seldom one correct answer, approach, or solution. Disagreement fosters innovation. Everyone in the course, including students and TAs, is encouraged to express constructive criticism and alternative ideas on any aspect of the content.
We are always learning. Our knowledge and understanding are always incomplete. Even experts are fallible. The bioinformatics field evolves rapidly, and Rome was not built in a day. Be kind to yourself and to others. You are always smarter and more knowledgeable today than you were yesterday.
Grading
Grading will be based on the 7 roughly weekly assignments, the final project, and class attendance and participation. Each assignment is 5% of your total grade (35% total), the final project is 60%, and the remaining 5% is for attendance and participation.
Absences, missed classes, and extensions
You should always prioritize your physical, emotional and mental health. BU offers a number of resources through Student Health Services and I encourage you to explore them if you feel you need to talk to someone. I am also here to listen without judgement if needed and can help with accessing the resources available.
If you need to miss extended class time or require extra time on an assignment because of personal matters, please just inform me and I will work with you whenever you are back to catch you up on the material and find an arrangement that accommodates your needs. You never need to disclose to me any private matters if you are not comfortable doing so.
Acknowledgements & Contributions
These materials would not have been possible without the contributions of Dakota Hawkins, Vanessa Li, Taylor Falk, Mae Rose Gott, and Joey Orofino.
Former valiant and (probably still) attractive TAs:
2023
- Regan Conrad (BU BF PhD Candidate)
- Aubrey (Brie) Odom-Mabey (BU BF PhD Candidate)
2022
- Taylor Falk (BU BF MS Alumnus ’21) is a Bioinformatics Developer working with the VA PTSD Brain Bank, developing infrastructure to support data generated out of our brains. He is championing the assignment strategy and will be available to help with issues.
- Mae Rose Gott (BU BF MS Alumna ’21) is a Research Staff member working on a number of different projects across many different areas. She will be helping organize the course materials as we go forward.
- Vanessa Li (BU BF PhD Candidate) is a PhD candidate in Dr. Stefano Monti’s lab in Computational Biomedicine. She will primarily be helping with grading of your assignments.
- Joey Orofino (BU BF MS Alumnus ’15) is a Research Scientist working on identifying small RNA-based biomarkers of Parkinson’s Disease.