Run the pipeline¶
Here, we only describe the general guidelines tun run any pipeline. For specific options of the analysis pipeline, refer to the README.md dedicated to the pipeline you want to run.
Profiles¶
Set where the tools are available¶
conda¶
When using this profile, nextflow creates a Conda environment from the recipe environment.yml
.
The conda environment is created in the ${HOME}/conda-cache-nextflow
directory by default unless you set the directory with the option --condaCacheDir
from the command line when you launch nextflow.
example
nextflow -c conf/test.config run main.nf -profile conda --condaCacheDir "${HOME}/myCondaCacheDir"
Note
The conda environment is created the first time the pipeline is started.
Warning
Only tools that are compatible with each other can be added in the conda recipe environment.yml
.
docker¶
This profile allows the usage of the Docker containers. This profile will work in any case (provided that you have root credentials to run docker).
example
nextflow -c conf/test.config run main.nf -profile docker
multiconda¶
When using this profile, nextflow creates several Conda environments: for every tools that are specified in the params.tool
section from the conf/geniac.config
file, one Conda environment is created in the ${HOME}/conda-cache-nextflow
directory by default unless you set the directory with the option --condaCacheDir
from the command line when you launch nextflow. This profile make it possible to use Conda even if some tools are not compatible with each other.
example
nextflow -c conf/test.config run main.nf -profile multiconda --condaCacheDir "${HOME}/myCondaCacheDir"
Note
The conda environments are created the first time the pipeline is started.
multipath¶
Once the pipeline is installed, the following directory tree is created in the install directory:
multipath/alpine/bin
multipath/fastqc/bin
multipath/helloWorld/bin
multipath/rmarkdown/bin
multipath/trickySoftware/bin
This directory tree follows the pattern multipath/labelOfTheTool/bin
meaning that every tool has a specific directory having the name of its label. When using the multipath
profile, multipath/labelOfTheTool/bin
directory is automatically included in the PATH of only the process that has the corresponding label.
Therefore, this profile makes it possible to tackle any configuration such as using the same tool but with different versions.
If the tool required is already installed on your system, you can just add a symlink. For example:
ls -s /usr/bin/fastqc multipath/fastqc/bin
Alternatively, you can also do the following:
rm -r multipath/fastqc/bin
ln -s /usr/bin multipath/fastqc/bin
If the tool is not present on your system, just install it in the dedicated folder.
path¶
Once the pipeline is installed, the following directory tree is created in the install directory:
path/bin
When using the path
profile, path/bin
directory is automatically included in the PATH of every process.
If the tool required is already installed on your system, you can just add a symlink. For example:
ls -s /usr/bin/fastqc path/bin
Alternatively, assuming than some tools are already present in /usr/bin
, you can do the following:
rm -r path/bin
ln -s /usr/bin path/bin
If the tool is not present on your system, just install it in the dedicated folder.
singularity¶
This profile allows the usage of the Singularity containers. This profile will work in any case.
example
nextflow -c conf/test.config run main.nf -profile singularity
standard¶
This is the default profile used when -profile
is not specified when you launch nextflow. This profile requires that all the tools are available in your path.
example
nextflow -c conf/test.config run main.nf
Warning
If two different processes require the same tool but with different versions, this profile will not work. Thus, you will have to use multiconda, singularity, docker or multipath profiles.
Set where the computation will take place¶
local¶
This is the default.
cluster¶
If you want to launch the pipeline on a computing cluster, just launch:
nextflow -c conf/test.config run main.nf -profile multiconda,cluster
Important
The executor used is the one that has been set during Installation with the ap_nf_executor
configure option (or default is nothing was specified). If you want to change the executor, just edit the conf/cluster.config
file in the install directory and set the executor
to whatever nextflow supports.
Tip
If you want your job to be submitted on a specific queue
, use the option --queue
with the name of the queue in the command line as follows:
nextflow -c conf/test.config run main.nf -profile multiconda,cluster --queue q_medium
Options¶
General options¶
--condaCacheDir¶
Whenever you use the conda or multiconda profiles, the Conda environments are created in the ${HOME}/conda-cache-nextflow
folder by default. This folder can be changed using the --condaCacheDir
option. For example:
nextflow -c conf/test.config run main.nf -profile multiconda --condaCacheDir "${HOME}/myCondaCacheDir"
--genomeAnnotationPath¶
The genome annotations are expected to be found in the folder annotations
by default, and organized as specified in the conf/genomes.config
file. The --genomeAnnotationPath
option allows the path of the annotations
folder to be changed at runtine. For example:
nextflow -c conf/test.config run main.nf -profile multiconda --genomeAnnotationPath "${HOME}/myGenomeAnnotationPath"
--globalPath¶
When you use path or multipath profiles, the path
and multipath
folders are located in the installation directory by default (see Structure of the installation directory tree). The --globalPath
option allows the path of the path
and multipath
folders to be changed at runtine. For example:
nextflow -c conf/test.config run main.nf -profile multipath --globalPath "${HOME}/myGlobalPath"
--maxMemory¶
Use to set a top-limit for the default memory requirement for each process. Should be a string in the format integer-unit. eg. –maxMemory ‘8.GB’
--maxTime¶
Use to set a top-limit for the default time requirement for each process. Should be a string in the format integer-unit. eg. –maxTime ‘2.h’
--maxCpus¶
Use to set a top-limit for the default CPU requirement for each process. Should be a string in the format integer-unit. eg. –maxCpus 1
--outDir¶
The output directory where the results will be saved. For example:
nextflow -c conf/test.config run main.nf -profile multipath --outDir "${HOME}/myResults"
--queue¶
If you want your job to be submitted on a specific queue
when you use the cluster, use the option --queue
with the name of the queue in the command line. For example:
--singularityImagePath¶
When you use the singularity profile, the Singularity containers are located in the installation directory in the folder containers/singularity
by default (see Structure of the installation directory tree). The --singularityImagePath
option allows the path of the containers/singularity
folder to be changed at runtine. For example:
nextflow -c conf/test.config run main.nf -profile singularity --singularityImagePath "${HOME}/mySingularityImagePath"
Analysis options¶
Two generic options are available in the geniac template. Refer to the README of the pipeline for details about ad-hoc options to analyze the data.
--samplePlan¶
Use this to specify a sample plan file instead of a regular expression to find fastq files. For example: --samplePlan 'path/to/data/samplePlan.csv
.
The sample plan is a csv file with the following information (and no header) :
Sample ID | Sample Name | /path/to/R1/fastq/file | /path/to/R2/fastq/file (for paired-end only)
Note that when a sample plan is used to analyse the data, the path to the files are automatically added to the bindings needed by the singularity profile (see singularity and What is the difference between singularity and apptainer?).
--design¶
Specify a design file for advanced analysis. For example: --design 'path/to/data/design.csv'
.
The design is a custom csv file that list all experimental samples, their IDs, the associated control as well as any other useful metadata. It can contain any information you need during the analysis. The design is expected to be created with the following header :
SAMPLE_ID | VARIABLE1 | VARIABLE2
Importantly, defining a custom design file implies that you modify the variable designHeader
in the bin/apCheckDesign.py
script accordingly. For example: set designHeader=['SAMPLE_ID', 'VARIABLE1', 'VARIABLE2']
. Modify also the designCh channel in the main.nf to use the custom information.
The --samplePlan
and the --design
will be checked by the pipeline and have to be rigorously defined in order to make the pipeline work.
If the design file is not specified, the pipeline will run over the first steps but the downstream analysis will be ignored.
Results¶
To better organize your results, we recommend that use use the variable ${params.outDir}
in every process with the publishDir
directive. For example:
publishDir "${params.outDir}/fastqc", mode: 'copy'
Note that the --outDir
option defines where you want to store the results (see --outDir). In the directory, the results
folder gathers all the results. If no option is provided, the results
will be created where the main.nf
file is located.
Analysis¶
The results
folder contains the results of each tools. For example:
results/
├── alpine
│ ├── alpine_1.txt
│ ├── alpine_2.txt
│ ├── alpine_3.txt
│ ├── alpine_4.txt
│ └── alpine_5.txt
├── execBinScript
│ ├── execBinScriptResults_1.txt
│ └── execBinScriptResults_2.txt
├── fastqc
│ ├── SRR1106775-25K_1_fastqc.html
│ ├── SRR1106775-25K_1_fastqc.zip
│ ├── SRR1106775-25K_2_fastqc.html
│ ├── SRR1106775-25K_2_fastqc.zip
│ ├── SRR1106776-25K_1_fastqc.html
│ ├── SRR1106776-25K_1_fastqc.zip
│ ├── SRR1106776-25K_2_fastqc.html
│ ├── SRR1106776-25K_2_fastqc.zip
│ └── v_fastqc.txt
├── helloWorld
│ └── helloWorld.txt
├── MultiQC
│ ├── report_data
│ │ ├── multiqc_data.json
│ │ ├── multiqc_fastqc.txt
│ │ ├── multiqc_general_stats.txt
│ │ ├── multiqc.log
│ │ └── multiqc_sources.txt
│ ├── report.html
│ └── samplePlan.csv
├── standardUnixCommand
│ └── bonjourMonde.txt
├── trickySoftware
│ └── trickySoftwareResults.txt
Moreover, the following information will be systematically generated whatever the process you added in the main.nf
file:
results/
├── softwareVersions
│ └── softwareVersions_mqc.yaml
├── summary
│ ├── pipelineReport.html
│ ├── pipelineReport.txt
│ ├── resultsDescription.html
└── workflowOnComplete.txt
Trace¶
The nextflow tracing information will also be available:
results/
├── summary
│ └── trace
│ ├── DAG.pdf
│ ├── report.html
│ ├── timeline.html
│ └── trace.txt
Examples¶
Default¶
If all the tools are available in your path, just launch:
nextflow -c conf/test.config run main.nf -profile multiconda
Combine path/multipath profile with conda/multiconda¶
We see from the Process types and profiles table that, if you use the multiconda profile and one tool falls in the Custom install category, the workflow will fail unless the tool is already installed and available in your ${PATH}
. You also have the possibility to add the tool ins the path/
of the install directory (see multipath for details). To illustrate this, let’s try the following:
nextflow -c conf/test.config run main.nf -profile multiconda
Of course, it works.
Then, make the helloWorld
tool unavailable:
cd ..
mv pipeline/bin/geniac/helloWorld multipath/helloWorld/bin/helloWorld
cd -
nextflow -c conf/test.config run main.nf -profile multiconda
Of course, it fails: .command.sh: line 2: helloWorld: command not found
.
Thus try:
nextflow -c conf/test.config run main.nf -profile multiconda,multipath
It works!
Note
This example with the helloWorld
tool is not the most relevant as this tool is available whatever the profile you use (see Process types and profiles) but it is just here to show that it is possible to combine profiles to make sure that all the tools will be available.