Week 1: ChIPseq
Project 2 Directions
Now that we have experience with Nextflow from two prior projects, the directions for this project will be much less detailed. I will describe what you should do and you will be expected to implement it yourself. Please follow all the conventions we’ve established so far in the course.
These conventions include:
If you are asked to create modules to run a specific tool, assume that you should
also run said tool on the data by adding it to your workflow declaration in your
main.nf
.
Containers for Project 2
FastQC: ghcr.io/bf528/fastqc:latest
multiQC: ghcr.io/bf528/multiqc:latest
bowtie2: ghcr.io/bf528/bowtie2:latest
deeptools: ghcr.io/bf528/deeptools:latest
trimmomatic: ghcr.io/bf528/trimmomatic:latest
samtools: ghcr.io/bf528/samtools:latest
macs3: ghcr.io/bf528/macs3:latest
bedtools: ghcr.io/bf528/bedtools:latest
homer: ghcr.io/bf528/homer:latest
Quality Control, Genome indexing and alignment
For many NGS experiments, the initial steps are largely universal. We perform quality control on the sequencing reads, build an index for the reference genome, and align the reads. However, the source of the data will inform what quality metrics are relevant and the particular choice of tools to accomplish these steps. For RNAseq, it is important to use a splice-aware aligner when aligning against a reference genome since our sequences originated from mRNA. For ChIPseq experiments, our reads originated from DNA sequences and we can use a non-splice aware algorithm to map our reads to the reference genome.
Between project 1 and the early labs we did, you should have working modules that perform quality control using FastQC and trimmomatic, build a genome index using bowtie2, and align reads to a genome. We are going to take advantage of the modularity of nextflow by simply copying these previous modules and incorporating them into this workflow.
Copy the subsampled files from /projectnb/bf528/materials/project-2-chipseq/subsampled_files/ to a new directory called
subsampled_files/
in yoursamples/
directory. Encode the names of the file and the paths in a samplesheet at the top level of your directoryCopy the human reference genome and associated GTF to your
refs/
directory. Add these paths to yournextflow.config
as params.Update your workflow
main.nf
to perform quality control using FastQC and trimmomatic, build a bowtie2 index for the human reference genome and align the reads to the reference genome.
Sorting and indexing the alignments
Many subsequent analyses on our BAM files will require them to be both sorted and indexed. Just like for large sequences in FASTA files, sorting and indexing the alignments will allow us to perform much more efficient computational operations on them.
- Create a module that will both sort and index your BAM files using Samtools. Do both steps in one single module.
Calculate alignment statistics using samtools flagstat
The samtools flagstat utility will report various statistics regarding the alignment flags found in the BAM.
- Create a module that will run samtools flagstat on all of your BAM files
Running multiqc to evaluate sequencing QC
Just like in project 1, we are going to use multiqc to collect the various quality control metrics from our pipeline. Ensure that multiqc collects the outputs from FastQC,
Make a channel in your workflow
main.nf
that collects all of the relevant QC outputs needed for multiqc (fastqc zip files, trimmomatic log, and samtools flagstat output)Copy your previous multiqc module and incorporate it into your workflow to generate a MultiQC report for the listed outputs
Generating bigWig representations of our BAM files
Now that we have sorted and indexed our alignments, we are going to generate bigWig files or coverage tracks containing the number of reads per genomic interval or bin for each sample. We will use these coverage tracks for calculating correlation between our samples and visualizing the read coverage in specific regions of interest.
Use the
bamCoverage
deeptools utility to generate a bigWig file for each of the sample BAM files.You may use all default parameters. If you wish, you may change the
-bs
and-p
flags as needed.