Project 1: Transcriptional Profile of Mammalian Cardiac Regeneration with mRNA-Seq

High-throughput sequencing technologies have dramatically expanded our understanding of gene expression through their ability to profile all potential mRNA molecules in an unbiased manner without being constrained to a small subset of pre-chosen targets like in a microarray. This agnostic approach to molecular profiling has allowed for more complex and interesting analyses that interrogate novel splice patterns and gene isoforms or the newly appreciated multifaceted functions of long non-coding RNAs. However, the overwhelming use case of RNA sequencing has been to provide information about mRNA abundance and gene expression. The ability to assay the gene expression profile of an organism and how it changes in response to various stimuli or under different disease conditions has revolutionized modern molecular biology and genomics. In this project, you will perform all steps necessary to produce a basic analysis of RNA sequencing data from mouse heart ventricles isolated across the differentiation process.

Upon completion of this project, you will have gained experience with the following concepts and skills: - Working familiarity with standard tools used to aid in reproducible bioinformatics research including, but not limited to, git, miniconda, snakemake, and jupyter / Rmarkdown notebooks - Hands-on experience with a current state of the art RNAseq pipeline encompassing quality control, alignment, read quantification, and differential expression analysis - Proficiency with commonly used bioinformatics tools including fastQC, multiQC, STAR, FeatureCounts (verse), DESeq2, Samtools, and HPC usage

IMPORTANT NOTES

This first project will ask you to understand a great many concepts in a relatively short amount of time. In order to make troubleshooting your snakemake workflows more expedient, you will be working with very small subsets of the original data. This will allow you to quickly confirm that both your snakemake workflow and the tools you are using to process and analyze the data are working as expected. The files are small enough that you will be able to perform all of the operations on the head node of the SCC and have them finish within a few seconds. Please use these files to ensure that your snakemake workflow is operating as specified.

After your workflow runs successfully on the test files, we will learn about the usage of qsub and how to have snakemake automatically and properly submit jobs to the cluster. Then, you will use this knowledge to run your complete workflow on the full samples.

It is critical that you do your best to complete the required weekly tasks in order to stay on schedule for finishing the analysis. It will be very difficult to complete everything requested if you fall behind. Please seek out myself or the TAs and we will do our best to assist you however you need. Please use the accompanying weekly checklist to keep track of all of the necessary tasks. We strongly encourage you to continue on with other tasks if you finish early.

At the end of each week’s instructions, you may see a section entitled CHALLENGE, these are entirely OPTIONAL. These are steps that you are not required to do, but if you are already familiar with the material of the original task, you may choose to complete these tasks as an extra challenge. The nature of these challenges is such that they may require you to have performed the challenge from a previous week, so ensure you understand what is required.

In general, each week you should:

Complete the snakefile to run the requested tasks
Write the methods and results section for the week’s tasks in the provided Rmarkdown notebook
Answer any conceptual questions also found in the provided Rmarkdown

Each week will become progressively more independent. For the first week’s snakefile, we have given you the structure and partially completed one of the rules. For the second week, we have given you the rules you will need to specify. For the third week, you will need to construct the snakefile on your own from just the requirements listed in the assignment.

The github classroom for this project is here: https://classroom.github.com/a/WXj2OrjA

0.6 Table of Contents

Project Writeup Instructions

Week 1: RNAseq