Syntax of config.yml

The file config.yml contains the information needed to render and run an application on BioLib. This configuration defines the entry point of your application and which input arguments the user can set. When you edit an application using the graphical interface on BioLib, the config.yml file is updated automatically.

The config.yml must be a valid YAML file located at .biolib/config.yml. The following sections describe the fields you can set in the configuration file. Note: all fields marked with an asterisk (*) are required.

`biolib_version` *

Specifies the syntax version of the configuration file. Use version 2 for all new applications. This option exists to keep existing applications backwards compatible.

biolib_version: 2

`modules` *

Applications on BioLib consist of one or more modules. A module defines where the containerized code is located and how it should be run. You must define at least one module called "main", which is the entry module of your application.

The example below shows how to define a module that runs a Docker image from Docker Hub.

modules:
    main:
        image: 'dockerhub://ncbi/blast:latest'
        command: efetch -db protein -format fasta -id P01349 > queries/P01349.fsa
        working_directory: /home/biolib/
        input_files:
            - COPY / /home/biolib/
        output_files:
            - COPY /home/biolib/queries/ /
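
As a rough sketch of how the two required top-level fields fit together, a minimal complete config.yml could look like the following. The image name my-tool, the script run.py, and the output folder are hypothetical placeholders, not names defined by BioLib.

```yaml
# .biolib/config.yml (minimal sketch; image, script, and folder names are hypothetical)
biolib_version: 2
modules:
    main:                                       # the required entry module
        image: 'local-docker://my-tool:latest'  # hypothetical local Docker image
        command: python3 run.py                 # hypothetical script inside the image
        working_directory: /home/biolib/
        input_files:
            - COPY / /home/biolib/              # user input is copied into the working directory
        output_files:
            - COPY /home/biolib/output/ /       # the output folder is sent back to the user
```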

`image` *

A module must define an image to run. To use a local Docker image for a module, set image to local-docker://$DOCKER_IMAGE:$TAG.

The example below uses a local Docker image called protein-predictor with the tag latest.

image: 'local-docker://protein-predictor:latest'

To use an image from Docker Hub in your module, use the syntax dockerhub://$REPO/$TAG:$VERSION. In the example below, $REPO is ncbi, $TAG is blast, and $VERSION is latest.

image: 'dockerhub://ncbi/blast:latest'

`command`

A command can be provided to specify what to run inside the image. When creating an application based on a Docker image, this field corresponds to calling docker run with the specified command.

The example below uses the command field to run an installed binary called efetch:

command: efetch -db protein -format fasta -id P01349 > queries/P01349.fsa

Another example could be running a Python script:

command: python3 script_to_run.py

`working_directory`

Specifies the directory in which the module's command is run. The path must be absolute, for example:

working_directory: /home/biolib/
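
In the ncbi/blast example above, command writes to the relative path queries/P01349.fsa while output_files reads from /home/biolib/queries/, which suggests that relative paths in command resolve against working_directory. The sketch below applies the same pattern with a different directory. The script predict.py, its --out flag, and the /app/ directory are hypothetical.

```yaml
modules:
    main:
        image: 'local-docker://protein-predictor:latest'
        # Hypothetical script and flag; results/ is relative to working_directory,
        # so the file is expected at /app/results/prediction.csv
        command: python3 predict.py --out results/prediction.csv
        working_directory: /app/
        input_files:
            - COPY / /app/
        output_files:
            - COPY /app/results/ /
```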

`input_files` *

This field defines where to copy the input files that are sent from the user of the application. The field is defined as a list of COPY statements from the input file path to the input file destination.

Each COPY statement has the following syntax: - COPY [SOURCE_PATH] [DESTINATION_PATH]. All paths must be absolute.

For example, if the code in the module expects its input in /data/input/, configure the field as follows:

input_files:
    - COPY / /data/input/

`output_files` *

This field defines where to copy the output files, if any, after the module has run. An output file could be, for example, a CSV file or a PNG image to show the user.

The syntax is a list of COPY statements of the form: - COPY [SOURCE_PATH] [DESTINATION_PATH]. All paths must be absolute.

Two common use cases are sending either a single file or a folder back to the user. In the ncbi/blast example above, the command creates a file called /home/biolib/queries/P01349.fsa.

To send everything in the queries folder back to the user:

output_files:
    - COPY /home/biolib/queries/ /

To send only the P01349.fsa file:

output_files:
    - COPY /home/biolib/queries/P01349.fsa /

If no output files are generated by the module, an empty list can be defined in the following way:

output_files: [ ]
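
Because output_files is a list, it presumably accepts more than one COPY statement. The sketch below combines a folder and a single file under that assumption. The file summary.csv is hypothetical, and how BioLib resolves name clashes between entries is not covered here.

```yaml
output_files:
    - COPY /home/biolib/queries/ /            # every file in the queries folder
    - COPY /home/biolib/summary.csv /         # hypothetical extra file written by the command
```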

`arguments`

Specifies how input options and settings will be rendered to the user of the application, and how inputs will be parsed. The field should follow this structure:

arguments:
    -   key: --data # required
        description: 'Input Dropdown' # required
        key_value_separator: ' ' # optional, default is ' '
        default_value: '' # optional, default is ''
        type: dropdown # required
        options:
            'This will be shown as option one': 'value1'
            'This will be shown as option two': 'value2'
        required: true # optional, default is true

The type field accepts the following values:

`text` provides a text input field.
`textarea` provides a multi-line text input area.
`file` provides a file select where users can upload an input file. You can optionally specify allowed_file_extensions to restrict file extensions.
`multifile` provides a file select where users can upload multiple input files. You can optionally specify allowed_file_extensions to restrict file extensions.
`drag-and-drop-file` provides a drag-and-drop area where users can upload a single file. You can optionally specify allowed_file_extensions to restrict file extensions.
`drag-and-drop-files` provides a drag-and-drop area where users can upload multiple files. You can optionally specify allowed_file_extensions to restrict file extensions.
`text-file` provides both a text input field and a file select, allowing the user to supply either. You can optionally specify allowed_file_extensions to restrict file extensions.
`number` provides a number input field.
`sequence` provides a spreadsheet-style input that checks for valid sequence characters and passes a FASTA file to your application. By default it checks for valid protein sequence characters. You can optionally specify a sequence_type ('protein', 'dna', or 'rna') and/or additional_characters to customize validation. If sequence_type is empty, additional_characters defines the complete allowed character set.
`table` provides a table input where users can enter structured data in rows and columns.
`radio` provides a radio select where users can select one of several predefined options.
`dropdown` provides a dropdown menu where users can select one of several predefined options.
`multiselect` provides a dropdown menu where users can select one or more predefined options.
`multi-chain-compound` provides an input for multi-chain molecular compound structures.
`toggle` provides a toggle switch where users can choose between two options. Note that the options must be named 'on': 'value1' and 'off': 'value2'.
`group` allows grouping multiple related arguments together for better organization.
`hidden` allows the application creator to provide a default input argument without showing it to the end user.
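
To illustrate a few of these types side by side, here is a hedged sketch of an arguments list. All keys, descriptions, and values are hypothetical. Placing sequence_type directly on the argument (next to type) is an assumption based on how the description above is worded.

```yaml
arguments:
    -   key: --name                      # hypothetical flag
        description: 'Sample name'
        type: text
        default_value: 'sample1'
    -   key: --threshold                 # hypothetical flag
        description: 'Score threshold'
        type: number
        required: false
    -   key: --verbose                   # hypothetical flag
        description: 'Verbose logging'
        type: toggle
        options:
            'on': 'true'                 # option names must be 'on' and 'off'; keep them quoted,
            'off': 'false'               # since some YAML parsers read bare on/off as booleans
    -   key: --sequences                 # hypothetical flag
        description: 'Input sequences'
        type: sequence
        sequence_type: 'dna'             # assumed placement; one of 'protein', 'dna', 'rna'
    -   key: --mode                      # hypothetical flag
        description: 'Internal mode'
        type: hidden
        default_value: 'fast'            # passed to the application without being shown to the user
```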

sub_arguments allows you to specify arguments that are rendered only if the user chooses a particular option in the parent argument. For example, an application might let the user run one of two commands, each of which needs different input arguments:

arguments:
    -   key: --function
        description: 'Choose a function'
        key_value_separator: ''
        default_value: ''
        type: dropdown
        options:
            'Command A': a
            'Command B': b
        sub_arguments:
            a:
                -   key: --argument_a
                    description: "Argument A takes a file input"
                    type: file
            b:
                -   key: --argument_b
                    description: 'Argument B takes a text input'
                    type: text

File Type Restrictions

For file input types (file, text-file, multifile, drag-and-drop-file, drag-and-drop-files), you can restrict which file extensions users can upload by specifying allowed_file_extensions:

arguments:
    -   key: --input-file
        description: 'Upload an image file'
        type: file
        allowed_file_extensions:
            - 'png'
            - 'jpg'
            - 'jpeg'
    -   key: --data-file
        description: 'Upload a data file'
        type: text-file
        allowed_file_extensions:
            - 'csv'
            - 'txt'
            - 'json'

`remote_hosts`

For your application to reach external servers, each hostname must be listed in config.yml as a remote host, as in the example below:

remote_hosts:
    - blast.ncbi.nlm.nih.gov

Note: the end-user must allow each of these hostnames in their account settings in order to run your application.
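
As a sketch with more than one host: an application that runs the efetch command from the module example would probably also need the NCBI E-utilities host. The second hostname below is an assumption about which server efetch contacts, so verify it for your tool before relying on it.

```yaml
remote_hosts:
    - blast.ncbi.nlm.nih.gov
    - eutils.ncbi.nlm.nih.gov    # assumed host for NCBI E-utilities (used by efetch)
```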

`main_output_file`

Specifies the path of the output file to render when the application finishes. If the file name ends with .md, it is rendered as Markdown; otherwise, it is rendered as text.

main_output_file: output.md
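
The file named in main_output_file has to be produced by the module and copied back through output_files. The sketch below shows one way the pieces might line up. The script predict.py is hypothetical and is assumed to write /home/biolib/output.md. That main_output_file is resolved relative to the destination root of the COPY statements is also an assumption.

```yaml
modules:
    main:
        image: 'local-docker://protein-predictor:latest'
        command: python3 predict.py             # hypothetical; assumed to write output.md
        working_directory: /home/biolib/
        input_files:
            - COPY / /home/biolib/
        output_files:
            - COPY /home/biolib/output.md /     # make the report available as output.md
main_output_file: output.md                     # rendered as Markdown because of the .md suffix
```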

`description_file`

Specifies the path to the README file, which will be rendered as the description on the application page. The default path is README.md.

description_file: README.md

`license_file`

Specifies the path to the license file, which will be visible to users at the bottom of the application page. The default path is LICENSE.

license_file: LICENSE

`citation`

Provide a citation using the following structure:

citation:
    entry_type: book
    author: Dr. John Doe
    title: Using config.yml
    publisher: BioLib Community Press
    year: '2020'

Note that the choice of entry_type determines which fields are required. For a complete list of the fields each entry type requires, see the BibTeX standard.
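
As an illustration of how entry_type changes the field set, here is a hedged sketch for a journal article. In standard BibTeX an article entry requires author, title, journal, and year. Whether BioLib enforces exactly this set is an assumption, and all values below are made up.

```yaml
citation:
    entry_type: article
    author: Dr. Jane Doe                              # hypothetical author
    title: Running tools with config.yml              # hypothetical title
    journal: Journal of Hypothetical Bioinformatics   # hypothetical journal
    year: '2021'
```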

* Required fields
