Project 3: ChIPseq Individual Project

The dataset consists of 2 paired experiments (IP and Control) derived from Mus musculus for a total of 4 samples. Your workflow must perform a basic peak calling analysis and create a single set of filtered, reproducible peaks. Both experiments were IPs for the same biological factor of interest.

Your report must address the following topics/questions in writing:

  1. Briefly remark on the quality of the sequencing reads and the alignment statistics, make sure to specifically mention the following:
    • Are there any concerning aspects of the quality control of your sequencing reads?
    • Are there any concerning aspects of the quality control related to alignment?
    • Based on all of your quality control, will you exclude any samples from further analysis?
  2. After QC, please generate a “fingerprint” plot (see deeptools utility) and a heatmap plot of correlation values between samples (see project 2)
    • Briefly remark on the plots and what tjey indicates to you in terms of the experiment
  3. After performing peak calling analysis, generating a set of reproducible peaks and filtering peaks from blacklisted regions, please answer the following:
    • How many peaks are present in each of the replicates?
    • How many peaks are present in your set of reproducible peaks? What strategy did you use to determine “reproducible” peaks?
    • How many peaks remain after filtering out peaks overlapping blacklisted regions?
  4. After performing motif analysis and gene enrichment on the peak annotations, please answer the following:
    • Briefly discuss the main results of both of these analyses and what they might imply about the function of the factor we are interested in.

Deliverables

  1. Produce a heatmap of correlation values between samples (see project 2)

  2. Generate a “fingerprint” plot using the deeptools plotFingerprint utility

  3. Create a figure / table containing the number of peaks called in each replicate, and the number of reproducible peaks

  4. A single BED file containing the reproducible peaks you determined from the experiment.

  5. Perform motif finding on your reproducible peaks

    • Create a single table / figure with the most interesting results
  6. Perform a gene enrichment analysis on the annotated peaks using a well-validated gene enrichment tool

    • Create a single table / figure with the most interesting results
  7. Produce a figure that displays the proportions of where the factor of interest is binding (Promoter, Intergenic, Intron, Exon, TTS, etc.)