FAQ

How can I use geniac on an existing repository?

The structure of the repository is based on nf-core, and additional files and folders are expected.

All the resources for geniac are available here:

Follow the guidelines below if you want to use geniac on an existing repository.

Create the geniac folder

The guidelines and additional utilities we developed in geniac should be located in a folder named geniac in your repository. The utilities in the geniac folder can either be copied or linked to your pipeline repository as a git submodule.

Note

If geniac is used as a submodule in your repository, execute the command git submodule update --init --recursive once you have created the geniac submodule, otherwise the geniac folder will remain empty.

If you want to create the submodule, you can edit the variables in the file createSubmodule.bash and follow the procedure.
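If you prefer to add the submodule manually, a minimal sketch is given below (it assumes you want to track geniac from its public repository; adjust the URL or the branch to your needs):

# add geniac as a submodule at the root of your pipeline repository
git submodule add https://github.com/bioinfo-pf-curie/geniac.git geniac

# pull the content of the submodule
git submodule update --init --recursive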

Create additional files and folders

The following files are mandatory:

  • CMakeLists.txt: create a folder named modules/fromSource and copy this file inside if you need to Install from source code. Check that the file is named CMakeLists.txt.

  • geniac.config: copy the file in the folder conf. This file contains a scope named geniac that defines all the nextflow variables needed to build, deploy and run the pipeline (see the skeleton below).
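As an illustration, a minimal and hedged skeleton of conf/geniac.config is given below (the tool name and its conda recipe are purely illustrative; refer to the geniac documentation for the complete list of expected entries in the geniac scope):

params {
   geniac {
      tools {
         // illustrative entry: one label per tool with its conda recipe
         fastqc = "bioconda::fastqc=0.11.9"
      }
      containers {
         // container-related options (see the questions below)
      }
   }
}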

Moreover, depending on which case applies when you Add a process, you can create the following folders whenever you need them:

├── env
├── modules
└── recipes
    ├── conda
    ├── dependencies
    ├── docker
    └── singularity

What does the repository look like?

The source code of your repository should look like this:

├── assets                       # assets needed for runtime
├── bin                          # scripts or binaries for the pipeline
├── conf                         # configuration files for the pipeline
│   ├── geniac.config            # contains the scopes mandatory for geniac
├── docs                         # documentation of the pipeline
├── env                          # process specific environment variables
├── geniac                       # geniac utilities
│   ├── cmake                    # source files for the configuration step
│   ├── docs                     # guidelines for installation
│   ├── install                  # scripts for the build step
├── main.nf
├── modules
│   └── fromSource               # tools installed from source code
│       ├── CMakeLists.txt
│       └── helloWorld
├── nextflow.config
├── recipes                      # installation recipes for the tools
│   ├── conda
│   ├── dependencies
│   ├── docker
│   └── singularity
└── test                         # data to test the pipeline
    └── data

How can I install custom commands in the docker/singularity recipes automatically generated by geniac?

For some tools, it might be necessary to add custom commands in the docker/singularity recipes automatically generated by geniac. In the conf/geniac.config file, you can use:

  • params.geniac.containers.cmd.post: to define commands which will be executed at the end of the default commands generated by geniac.

  • params.geniac.containers.cmd.envCustom: to define environment variables which will be set inside the docker and singularity images.

For more details on how to proceed, see Add custom commands and environment variables in the docker/singularity recipes automatically generated by geniac.
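As an illustration, here is a hedged sketch of such entries in conf/geniac.config (the tool name, the commands and the variables are purely illustrative, and the exact structure should be checked against the geniac documentation):

params {
   geniac {
      containers {
         cmd {
            post {
               // illustrative: extra commands appended to the recipe generated for the tool 'mySoft'
               mySoft = ['mkdir -p /opt/mySoft/db']
            }
            envCustom {
               // illustrative: environment variables set inside the docker and singularity images
               mySoft = ['MY_SOFT_DB=/opt/mySoft/db']
            }
         }
      }
   }
}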

How can I use a custom docker registry to build the containers?

Geniac automatically generates the recipes for Docker and Singularity. To build the containers, it bootstraps on two docker containers from the official 4geniac docker hub registry, which provides several Linux distributions used for the containers. Instead of using the official docker hub registry, you may want to use a custom registry. In this case, make sure that the exact same container tags available on 4geniac are available on this custom registry, and use the following option at the configuration step:

cmake ${SRC_DIR}/geniac -DCMAKE_INSTALL_PREFIX=${INSTALL_DIR} -Dap_docker_registry=my-registry-url/

For example, my-registry-url could be the address of a docker registry available in your gitlab instance.
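For instance, with a hypothetical GitLab container registry:

cmake ${SRC_DIR}/geniac -DCMAKE_INSTALL_PREFIX=${INSTALL_DIR} -Dap_docker_registry=registry.gitlab.example.com/my-group/4geniac/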

How can I write the config files for the different nextflow profiles?

The utilities we propose allow the automatic generation of all the config files for the nextflow Profiles. However, if you really want to write them yourself, follow the examples described in Config files for the different nextflow profiles.

How should I define the path to the genome annotations?

When the pipeline is installed with geniac, the Structure of the installation directory tree contains a directory named annotations. This directory can be a symlink to the directory with your existing annotations (can be set during Configure with the option ap_annotation_path). Check that:

  1. The file geniac.config defines the genomeAnnotationPath in the scope params as follows:

params {

  genomeAnnotationPath = params.genomeAnnotationPath ?: "${projectDir}/../annotations"

}
  2. All the paths to your annotations are defined using the variable params.genomeAnnotationPath as shown in the file genomes.config

  3. You use the variables defined in the genomes.config in the main.nf, for example params.genomes['mm10'].fasta
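As an illustration, a hedged sketch of such a genomes.config is given below (the layout of the files under the annotations directory is an assumption and must match your own tree):

params {
  genomes {
    'mm10' {
      fasta = "${params.genomeAnnotationPath}/Mouse/mm10/genome/mm10.fa"
    }
  }
}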

How can I pass specific options to run docker or singularity containers?

If needed, you can set the singularityRunOptions and dockerRunOptions values to whatever is needed for your configuration in the geniac.config file. This will set the runOptions parameter (see Nextflow configuration) of the singularity and docker scopes respectively to the selected value when the singularity and docker profiles are called.
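For instance, a hedged sketch in the geniac.config file (it assumes that these options live in the containers scope as in the geniac demo; the bind paths are purely illustrative):

params {
  geniac {
    containers {
      singularityRunOptions = "--bind /mnt/data:/mnt/data"
      dockerRunOptions = "-v /mnt/data:/mnt/data"
    }
  }
}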

How can a conda environment be activated in docker or singularity containers?

In order to activate a conda environment with conda activate myConda_env when a docker or singularity container is executed, some configuration is required inside the recipes. This is described in How can a conda environment be activated in docker or singularity containers?.

How can I see the recipes for the containers?

As geniac automatically generates the recipes of the containers, they are not available in the git repository. However, they can be easily retrieved in several ways. An example to Generate the recipes for the containers is provided.
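For instance, after the cmake and make steps described elsewhere in this FAQ, the generated recipes can be listed from the build directory (this assumes that the singularity recipes use the .def extension and that the docker recipes contain Dockerfile in their name; the exact location may vary between geniac versions):

cd ${BUILD_DIR}
find . -name "*.def" -o -name "*Dockerfile*"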

Is geniac compatible with nextflow DSL2?

Since version 20.07.1, Nextflow provides the DSL2 syntax that allows the definition of module libraries and simplifies the writing of complex data analysis pipelines. geniac is fully compatible with DSL2 and we provide geniac demo DSL2 as an example. The guidelines to Add a process remain exactly the same.

The main differences between geniac demo and geniac demo DSL2 are:

  • each process is located in one dedicated file in the folder nf-modules/local/process

  • each subworkflow that combines different processes is located in the folder nf-modules/local/subworkflow

  • the main.nf includes the modules from these two folders and uses the workflow directive (a minimal sketch is shown after the tree below)

├── assets                       # assets needed for runtime
├── bin                          # scripts or binaries for the pipeline
├── conf                         # configuration files for the pipeline
│   ├── geniac.config            # contains the geniac scope mandatory for nextflow
├── docs                         # documentation of the pipeline
├── env                          # process specific environment variables
├── geniac                       # geniac utilities
│   ├── cmake                    # source files for the configuration step
│   ├── docs                     # guidelines for installation
│   ├── install                  # scripts for the build step
├── main.nf
├── modules                      # tools installed from source code
│   ├── CMakeLists.txt
│   ├── helloWorld
├── nextflow.config
├── nf-modules                   # nextflow files for DSL2
│   └── local
│       ├── process
│       │   ├── alpine.nf
│       │   ├── checkDesign.nf
│       │   ├── execBinScript.nf
│       │   ├── fastqc.nf
│       │   ├── getSoftwareVersions.nf
│       │   ├── helloWorld.nf
│       │   ├── multiqc.nf
│       │   ├── outputDocumentation.nf
│       │   ├── standardUnixCommand.nf
│       │   ├── trickySoftware.nf
│       │   └── workflowSummaryMqc.nf
│       └── subworkflow
│           ├── myWorkflow0.nf
│           └── myWorkflow1.nf
├── recipes                      # installation recipes for the tools
│   ├── conda
│   ├── dependencies
│   ├── docker
│   └── singularity
└── test                         # data to test the pipeline
    └── data
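A minimal and hedged sketch of what the main.nf could look like with this layout is given below (the module and subworkflow file names are taken from the tree above, while the process names, the workflow content and the parameters are assumptions):

// include a process and a subworkflow from the DSL2 module files
include { helloWorld } from './nf-modules/local/process/helloWorld'
include { myWorkflow0 } from './nf-modules/local/subworkflow/myWorkflow0'

workflow {
  helloWorld()
  myWorkflow0()
}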

The geniac demo DSL2 can be run as follows:

export WORK_DIR="${HOME}/tmp/myPipelineDSL2"
export SRC_DIR="${WORK_DIR}/src"
export INSTALL_DIR="${WORK_DIR}/install"
export BUILD_DIR="${WORK_DIR}/build"
export GIT_URL="https://github.com/bioinfo-pf-curie/geniac-demo-dsl2.git"

mkdir -p ${INSTALL_DIR} ${BUILD_DIR}

# clone the repository
# the option --recursive is needed if you use geniac as a submodule
# the option --remote-submodules will pull the last geniac version
# using the release branch from https://github.com/bioinfo-pf-curie/geniac
git clone --remote-submodules --recursive ${GIT_URL} ${SRC_DIR}

cd ${BUILD_DIR}
cmake ${SRC_DIR}/geniac -DCMAKE_INSTALL_PREFIX=${INSTALL_DIR}
make
make install

cd ${INSTALL_DIR}/pipeline

nextflow -c conf/test.config run main.nf -profile multiconda

How can a process have a label which is defined by a variable?

With nextflow, it is possible to define a label using a variable instead of a fixed string. In this case, the label value must be given in parentheses. geniac also supports such labels. However, the label must be defined according to the following format: label (params.someValue ?: 'toolPrefix'). In any case, the content of params.someValue must start with the toolPrefix value. The geniac linter checks that there is a tool whose name starts with such a prefix; if this is not the case, it throws an error.

A typical use case is the possibility to launch a pipeline with a version of a tool given as an option on the nextflow command line. Let’s consider that you have declared three versions of the mySoft tool in the geniac.config file as follows:

params {
   geniac {
      tools {
         mySoft = "conda-forge::mySoft=v0=r351h96ca727_1003"
         mySoft_v1 = "conda-forge::mySoft=v1=r351h96ca727_1003"
         mySoft_v2 = "conda-forge::mySoft=v2=r351h96ca727_1003"
      }
   }
}

Then, in the nextflow process, define the label as follows:

process mySoft {
  label (params.mySoftVersion ?: 'mySoft')
  label 'minMem'
  label 'minCpu'


  script:
  """
  mySoft --version
  """
}

When you launch nextflow, pass the option --mySoftVersion to select which version of mySoft you want to use.

nextflow run main.nf --mySoftVersion mySoft_v2 -profile test,singularity

You may also write your nextflow code to use the default version (i.e. v0 with the mySoft label) if no version is specified.

What are the @git_*@ variables?

You will find in both the main.nf and nextflow.config some variables surrounded by @, such as @git_repo_name@. During the cmake step, the information is extracted from the git repository and these variables are replaced by their values. They are used, for example, in the nextflow manifest. If needed, you can remove these variables and set the values to whatever you want.
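For example, such a placeholder may appear in the manifest scope of the nextflow.config as follows (only @git_repo_name@ is taken from this FAQ, the other fields are illustrative):

manifest {
  name = '@git_repo_name@'
  description = 'Demo pipeline'
  mainScript = 'main.nf'
}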

Why does the conda profile fail to build its environment or take too much time?

The conda profile relies on the environment.yml file that is automatically generated by geniac. However, building a Conda environment can sometimes be very tricky as the order of the channels and the dependencies matters. geniac cannot guess the appropriate order. Moreover, Conda may need to solve conflicts between incompatible packages. Thus, in some cases, you will have no choice but to correct the environment.yml file manually, add it to the git repository (in the folder where the main.nf file is located) and install the pipeline with the following option:

cmake ${SRC_DIR}/geniac -DCMAKE_INSTALL_PREFIX=${INSTALL_DIR} -Dap_keep_envyml_from_source=ON

Note that it may be impossible to obtain a working environment.yml file due to incompatibilities between tools. In that case, use the multiconda profile instead of the conda profile.

Why are the tools available from source installed in pipeline/bin/fromSource and not in pipeline/bin?

The tools installed from source code are installed in pipeline/bin/fromSource to ensure that, when using the singularity or docker profiles, the tools installed inside the containers are used. Indeed, nextflow adds the folder pipeline/bin to the environment variable PATH when the container is launched. To illustrate the impact of this setting, assume that the pipeline has been installed with the singularity images in ${INSTALL_DIR}/pipeline. Then, create the file ${INSTALL_DIR}/pipeline/bin/helloWorld which contains the following bash script:

#! /bin/bash

echo "Buenos dias!"

Make this bash script executable with chmod +x ${INSTALL_DIR}/pipeline/bin/helloWorld and execute the pipeline as follows:

nextflow run main.nf -profile test,singularity

Then, the file ${INSTALL_DIR}/pipeline/results/helloWorld/helloWorld.txt will contain Buenos dias! instead of Hello World!. This means that singularity uses the bash script in ${INSTALL_DIR}/pipeline/bin/helloWorld instead of the helloWorld executable installed inside the image. This raises a reproducibility issue, which we avoid by installing the tools in the folder pipeline/bin/fromSource, which is not in the PATH.

What privileges do I need to build the singularity images?

There are several ways:

  • Using Geniac CLI (see Install the pipeline with the singularity images):

    • if you have sudo privileges, use the singularity mode,

    • if you are allowed to use the fakeroot option, use the singularityfakeroot mode.

  • Using standard cmake options:

    • if you have sudo privileges, pass the option -Dap_install_singularity_images=ON to cmake, and then run sudo make (see Install and run with singularity),

    • if you are allowed to use the fakeroot option, pass both options -Dap_install_singularity_images=ON and -Dap_singularity_build_options=--fakeroot to cmake, and then run make (see the example below).
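For instance, the fakeroot variant from the list above can be run as follows (using the SRC_DIR, BUILD_DIR and INSTALL_DIR variables defined earlier in this FAQ):

cd ${BUILD_DIR}
cmake ${SRC_DIR}/geniac -DCMAKE_INSTALL_PREFIX=${INSTALL_DIR} -Dap_install_singularity_images=ON -Dap_singularity_build_options=--fakeroot
make
make install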

What is the difference between singularity and apptainer?

In May 2021, Sylabs, the commercial entity behind Singularity, forked the project. The original Singularity repository has been moved to https://github.com/apptainer/singularity which will persist as an archive and be set to read-only after the first release of Apptainer (https://github.com/apptainer/apptainer). Apptainer will provide singularity as a command line link and will maintain as much of the CLI and environment functionality as possible. From the user’s perspective, very little, if anything, will change.