Lab 4 - Nextflow Practice
Today we will practice channel manipulation and some useful Nextflow operators. You will work through a series of tasks that represent common patterns in bioinformatics workflows.
Objectives
- Gain proficiency with channel manipulation and Nextflow operators
- Develop familiarity with common patterns in bioinformatics workflows
Nextflow Resources
- The guides in the guides/ directory, written to help you with the exercises
- LLMs, Stack Overflow, Biostars, Google, etc.
- Official Nextflow documentation
- Your classmates
Setup For Exercises
Each exercise is numbered, and your repo should contain a directory called exercise_X, where X is the exercise number. Each directory contains every file you need to run that exercise with its exercise_X.nf script. Your goal is to read the instructions for each exercise and manipulate the channels to perform the tasks they describe.
Nextflow Patterns
Creating a channel from a CSV file - Exercise 1
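A minimal sketch of this pattern, assuming a samplesheet with two hypothetical columns, sample and fastq. To keep the snippet self-contained, the CSV text is passed in with Channel.of; with a real file you would start from Channel.fromPath instead. splitCsv(header: true) treats the first line as column names and emits one map per data row.

```nextflow
// Inline stand-in for a samplesheet; column names and values are hypothetical.
// With a real file you would write: Channel.fromPath('samples.csv').splitCsv(header: true)
def csvText = 'sample,fastq\nsampleA,sampleA.fastq.gz\nsampleB,sampleB.fastq.gz'

// header: true uses the first line as keys, so each data row is emitted as a map
// whose fields can be read as row.sample and row.fastq
def ch_rows = Channel.of(csvText).splitCsv(header: true)

ch_rows.view()
```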
Using map to transform values into a tuple - Exercise 2
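A minimal sketch, assuming hypothetical read filenames of the form <sample>_R1.fastq.gz. The map operator runs a closure on every item and emits whatever the closure returns, so returning tuple(sampleId, fname) turns each filename into a [sample ID, file] pair, the shape most per-sample processes expect as input.

```nextflow
// Hypothetical read filenames named <sample>_R1.fastq.gz
def ch_reads = Channel.of('sampleA_R1.fastq.gz', 'sampleB_R1.fastq.gz')

// map applies the closure to every item; here it derives the sample ID from the
// filename and emits a [sample_id, filename] tuple
def ch_tuples = ch_reads.map { fname ->
    def sampleId = fname.tokenize('_')[0]   // 'sampleA_R1.fastq.gz' -> 'sampleA'
    tuple(sampleId, fname)
}

ch_tuples.view()   // prints [sampleA, sampleA_R1.fastq.gz] and [sampleB, sampleB_R1.fastq.gz]
```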
Creating a channel using the Channel.fromFilePairs factory - Exercise 3
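A minimal sketch, assuming paired-end files named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz. The first statements create empty placeholder files in a hypothetical demo_reads/ directory under the launch directory, purely so the glob has something to match. Channel.fromFilePairs then emits one item per sample of the form [sample ID, [read 1, read 2]], where the ID here is the part of the filename matched by the * wildcard.

```nextflow
// Create empty placeholder reads (hypothetical names) so the glob below matches something
file('demo_reads').mkdirs()
['sampleA', 'sampleB'].each { s ->
    ['1', '2'].each { r -> file("demo_reads/${s}_R${r}.fastq.gz").text = '' }
}

// {1,2} marks the part of the name that differs between mates; the * portion
// ('sampleA', 'sampleB') becomes the grouping key
def ch_pairs = Channel.fromFilePairs('demo_reads/*_R{1,2}.fastq.gz')

// each item looks like [sampleA, [<abs path>/sampleA_R1.fastq.gz, <abs path>/sampleA_R2.fastq.gz]]
ch_pairs.view()
```

The order in which samples are emitted is not guaranteed, so downstream steps should rely on the sample ID rather than on position.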
Making a channel that is the cross product of two other channels - Exercise 4
Hypothetical Situation: In bioinformatics, we are often unsure which value to use for a given parameter. We may have a list of candidate values and want to run the same process once with each value. Workflow management tools make it easy to test any number of parameter-value combinations.
Specific Situation: We are trying to choose an optimal k-mer size for de novo genome assembly. We want to try a range of values and see which performs best.
Given the channels created for you in exercise_4.nf, use Nextflow operators to create a new channel that contains every possible combination of the values in the two channels.
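A minimal sketch of a cross product using the combine operator. The channel names, sample names, and k-mer sizes are hypothetical stand-ins, not the channels defined in exercise_4.nf. combine pairs every item of the first channel with every item of the second, so 2 samples and 3 k-mer sizes yield 6 items.

```nextflow
// Hypothetical stand-ins for two input channels
def ch_samples = Channel.of('sampleA', 'sampleB')
def ch_kmers   = Channel.of(21, 31, 41)

// combine emits the Cartesian product: every sample paired with every k-mer size,
// i.e. 2 x 3 = 6 items such as [sampleA, 21]; emission order is not guaranteed
def ch_grid = ch_samples.combine(ch_kmers)

ch_grid.view()
```

Nextflow also has a cross operator, but it only pairs items that share a matching key, so combine is the operator that produces every pairing here.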
Creating a single list of output files - Exercise 5
Hypothetical Situation: We are running a process that generates one output file for each input file. We want a channel that emits a single list of all of those output files. This is useful when one process needs to operate on every output file produced by another process at once.
Specific Situation: We have FastQC quality control reports for a number of samples. We want a channel that contains all of them so that we can run MultiQC on them. MultiQC is a tool that aggregates and summarizes the results of multiple quality control tools across samples into a single report. In this case, we want to gather all of the FastQC files into a single list emitted by one channel.
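A minimal sketch, with hypothetical FastQC output names standing in for the per-sample outputs of a FastQC process. The collect operator waits until its source channel is complete and then emits a single list containing every item, so a downstream MultiQC process receives all of the reports in one task.

```nextflow
// Hypothetical per-sample FastQC outputs; in a pipeline this would be the
// output channel of the FastQC process
def ch_fastqc = Channel.of('sampleA_fastqc.zip', 'sampleB_fastqc.zip', 'sampleC_fastqc.zip')

// collect() emits exactly one item: a list of every report, in arrival order
def ch_all_reports = ch_fastqc.collect()

ch_all_reports.view()   // prints [sampleA_fastqc.zip, sampleB_fastqc.zip, sampleC_fastqc.zip]
```

By default collect also flattens nested list items into the result (its flat option defaults to true), which matters when the upstream items are [sample ID, file] tuples rather than bare files.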
Joining channels based on the sample name - Exercise 6
Hypothetical Situation: We have run two separate processes on each of our samples. We need to join their output channels so that each sample is paired with the output files from both processes.
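A minimal sketch of this join pattern, assuming each upstream channel emits [sample ID, output file] tuples; the tool names and values below are hypothetical. join matches items from the two channels on their first element, regardless of arrival order, and emits [sample ID, output from the first channel, output from the second]. By default, items whose key appears in only one channel are discarded (the remainder: true option keeps them).

```nextflow
// Hypothetical per-sample outputs from two different tools, keyed by sample ID;
// the second channel emits its samples in a different order on purpose
def ch_tool1 = Channel.of(['sampleA', 'sampleA.tool1.bam'], ['sampleB', 'sampleB.tool1.bam'])
def ch_tool2 = Channel.of(['sampleB', 'sampleB.tool2.bam'], ['sampleA', 'sampleA.tool2.bam'])

// join matches on the first element of each tuple and emits
// [sample_id, tool1_output, tool2_output]
def ch_joined = ch_tool1.join(ch_tool2)

ch_joined.view()   // e.g. [sampleA, sampleA.tool1.bam, sampleA.tool2.bam]
```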
Specific Situation: We have run STAR and HISAT2 on each of our RNAseq samples and