Lab 08 — Snakemake
Key concepts and tools
- Snakefile, rule, input, output, shell
- rule all: declaring the target outputs
- Wildcards ({sample}): generalizing across samples
- expand(): generating lists of expected outputs
- snakemake --workflow-profile: cluster submission
- The per-rule conda: directive
- DAG: the directed acyclic graph of rules
- samtools sort, samtools index
Snakemake is an alternative workflow manager that reasons backward from desired output files rather than forward from inputs. This lab starts with a single-rule example to establish the syntax, then moves to a multi-sample pipeline that uses wildcards to generalize across samples without repeating code. You will fill in the samtools sort and samtools index rules and specify their file dependencies, observe how Snakemake constructs the execution DAG, and run the pipeline on the cluster using a profile. The goal is not to replace Nextflow but to understand how a file-centric workflow manager thinks differently about the same problem.
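To make backward reasoning concrete, here is a minimal Python sketch. Python is used because Snakefiles are written in a Python-based language. This is not Snakemake's scheduler. The RULES table, the filenames and the plan function are hypothetical, and wildcards are left out. A job is scheduled only when its output is missing, whereas real Snakemake also reruns jobs whose outputs are older than their inputs.

```python
# Hypothetical rule table: output file -> (rule name, list of input files).
# Fixed filenames only; wildcards are ignored in this sketch.
RULES = {
    "results/sampleA.sorted.bam": ("samtools_sort", ["samples/sampleA.bam"]),
    "results/sampleA.sorted.bam.bai": ("samtools_index", ["results/sampleA.sorted.bam"]),
}


def plan(target, existing, jobs=None):
    """Walk backward from `target`; return (rule, output) jobs in dependency order."""
    if jobs is None:
        jobs = []
    if target in existing:
        return jobs  # already on disk: nothing to do for this file
    if target not in RULES:
        raise FileNotFoundError(f"no rule produces {target!r} and it does not exist")
    rule, inputs = RULES[target]
    for path in inputs:
        plan(path, existing, jobs)  # resolve upstream files before this job
    job = (rule, target)
    if job not in jobs:
        jobs.append(job)
    return jobs


existing = {"samples/sampleA.bam"}
for rule, output in plan("results/sampleA.sorted.bam.bai", existing):
    print(rule, "->", output)
```

Asking only for the index file is enough. The planner discovers that the sorted BAM must be produced first, because the index rule's input is the sort rule's output.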
Setup
Create the Snakemake conda environment from the provided YAML:
conda env create -f envs/snakemake_env.yml
conda activate snakemake_env
Part 1 — single/
Look at the Snakefile. It closely resembles the example from lecture: a rule all that declares the final target, and individual rules that specify how to produce each file.
snakemake -s Snakefile
Observe what is created and where. Snakemake checks for the existence of declared output files after each rule executes.
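The post-job check can be pictured with another small, hypothetical sketch. The names run_job and MissingOutputError are illustrative and are not part of Snakemake's API. The sketch runs a shell command and then fails if any declared output file is absent. That is the situation in which Snakemake itself stops with a missing-output error.

```python
import os
import shlex
import subprocess
import tempfile


class MissingOutputError(Exception):
    """Hypothetical error: a job finished but a declared output file is absent."""


def run_job(cmd, outputs):
    """Run a shell command, then verify every declared output exists (hypothetical helper)."""
    # check=True raises CalledProcessError if the command exits non-zero.
    subprocess.run(cmd, shell=True, check=True)
    missing = [path for path in outputs if not os.path.exists(path)]
    if missing:
        raise MissingOutputError(f"job finished but outputs are missing: {missing}")


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "done.txt")
    run_job(f"echo ok > {shlex.quote(good)}", [good])  # output written: no error
    try:
        # 'true' exits successfully but writes nothing, so the declared output is missing.
        run_job("true", [os.path.join(tmp, "never_written.txt")])
    except MissingOutputError as err:
        print(err)
```

A command can exit successfully and still fail to produce its declared output. Checking the files, not just the exit status, is what lets a file-centric manager catch that case.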
Part 2 — multi/
Look at the files in samples/. Note what portion of each filename is shared and what portion is unique.
Snakemake generalizes across samples using wildcards. A wildcard is a placeholder in curly braces, such as {sample}, whose value Snakemake infers by matching each requested output filename against the rule's output pattern:
rule samtools_sort:
    input:
        bam = 'samples/{sample}.bam'
    output:
        sorted_bam = 'results/{sample}.sorted.bam'
    shell:
        'samtools sort {input.bam} > {output.sorted_bam}'
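The sketch below shows the two steps behind a wildcard rule for one requested target. First, it matches the target against the output pattern to infer {sample}. Second, it fills in the input path and the shell: template. It assumes each wildcard appears only once per pattern and uses Snakemake's default wildcard constraint, which matches one or more characters (.+). The helper names are hypothetical. SimpleNamespace merely stands in for Snakemake's input and output objects; Python's str.format happens to resolve dotted names such as {input.bam} in the same way.

```python
import re
from types import SimpleNamespace

# Patterns copied from the samtools_sort rule above.
OUTPUT_PATTERN = "results/{sample}.sorted.bam"
INPUT_PATTERN = "samples/{sample}.bam"
SHELL = "samtools sort {input.bam} > {output.sorted_bam}"


def pattern_to_regex(pattern):
    """Hypothetical helper: turn 'results/{sample}.sorted.bam' into a compiled regex."""
    # re.split with a capturing group alternates: literal, wildcard name, literal, ...
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)      # literal text must match exactly
        else:
            regex += f"(?P<{part}>.+)"    # wildcard: one or more characters
    return re.compile(regex)


def infer_wildcards(pattern, target):
    """Return wildcard values if `target` matches `pattern`, else None."""
    match = pattern_to_regex(pattern).fullmatch(target)
    return match.groupdict() if match else None


def render_job(target):
    """Infer wildcards from a requested target, then fill the shell template."""
    wildcards = infer_wildcards(OUTPUT_PATTERN, target)
    if wildcards is None:
        raise ValueError(f"{target!r} does not match {OUTPUT_PATTERN!r}")
    # SimpleNamespace stands in for Snakemake's input/output objects.
    inp = SimpleNamespace(bam=INPUT_PATTERN.format(**wildcards))
    out = SimpleNamespace(sorted_bam=target)
    return SHELL.format(input=inp, output=out)


print(infer_wildcards(OUTPUT_PATTERN, "results/sampleA.sorted.bam"))  # {'sample': 'sampleA'}
print(render_job("results/sampleA.sorted.bam"))
# samtools sort samples/sampleA.bam > results/sampleA.sorted.bam
```

The wildcard value comes from the requested filename, not from scanning samples/. A target that does not fit the output pattern, such as a .bai file, is simply not produced by this rule.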
Tasks:
- Declare the two final target files in rule all: results/sampleA.sorted.bam.bai and results/sampleB.sorted.bam.bai
- Fill in the rules for samtools_sort and samtools_index:
  - samtools_sort: the input is the starting BAM file; the output is a new file that receives the redirected sorted output
  - samtools_index: the input is the output of samtools_sort; the output is the .bai index (the same name + .bai)
- Run with the cluster profile:
snakemake -s Snakefile --workflow-profile profile/
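Once both .bai files are requested in rule all, each rule × sample combination becomes a separate job. The sketch below passes a hand-written job graph to Python's standard-library graphlib.TopologicalSorter (Python 3.9+) to show which jobs are independent. The job labels are hypothetical; real Snakemake builds this graph itself from the rules.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each job maps to the set of jobs it depends on.
dag = {
    "samtools_sort[sampleA]": set(),
    "samtools_sort[sampleB]": set(),
    "samtools_index[sampleA]": {"samtools_sort[sampleA]"},
    "samtools_index[sampleB]": {"samtools_sort[sampleB]"},
    "all": {"samtools_index[sampleA]", "samtools_index[sampleB]"},
}


def parallel_batches(graph):
    """Group jobs into waves; every job in a wave has all its dependencies finished."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


for wave, jobs in enumerate(parallel_batches(dag), start=1):
    print(wave, jobs)
```

Waves are a simplification. A real scheduler can start samtools_index[sampleA] as soon as samtools_sort[sampleA] finishes, without waiting for sampleB.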
Part 3 — advanced/
Extends the multi-sample pipeline to handle paired-end reads. Explore how expand() generates lists of expected outputs and how the profile specifies cluster resource requests per rule.
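By default, expand() fills a pattern with every combination of the supplied wildcard values, which is a Cartesian product. Below is a hypothetical plain-Python stand-in built on itertools.product. The paired-end filename pattern and the read names R1 and R2 are assumptions for illustration, not taken from the advanced/ directory.

```python
from itertools import product


def expand_like(pattern, **wildcards):
    """Hypothetical stand-in for expand(): fill `pattern` with every value combination."""
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, values))) for values in combos]


targets = expand_like(
    "results/{sample}_{read}.sorted.bam",
    sample=["sampleA", "sampleB"],
    read=["R1", "R2"],
)
print(targets)
# ['results/sampleA_R1.sorted.bam', 'results/sampleA_R2.sorted.bam',
#  'results/sampleB_R1.sorted.bam', 'results/sampleB_R2.sorted.bam']
```

In a Snakefile, the corresponding call would be expand("results/{sample}_{read}.sorted.bam", sample=[...], read=[...]). Its result is typically placed in rule all's input so that every combination becomes a requested target.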
Key differences from Nextflow
| | Snakemake | Nextflow |
|---|---|---|
| Reasoning direction | Backward from outputs | Forward from inputs |
| Parallelism unit | Rule × wildcard combination | Process × channel element |
| Data representation | File paths | Channels |
| Language | Python | Groovy / DSL2 |
| Config | config.yaml | nextflow.config |