10 WDL Configuration Guide
This WILDS WDL configuration guide was inspired by the BioWDL and WARP guidelines and is intended to cater to the pedagogical “proof-of-concept” nature of the WILDS.
10.1 WILDS WDL Philosophy
- The mindset behind WILDS WDLs is for each repository to be a self-contained demonstration of a particular bioinformatic functionality. An ideal use-case would proceed as follows:
- A researcher reviews the repository to decide whether it is relevant to their needs, starting with the README for the overarching purpose of the workflow, then turning to the input json and the WDL script itself for specific questions about toolsets, settings, and input/output data types.
- If the workflow seems useful, the researcher clones the repository locally, makes minimal updates to the input json, and executes the code with minimal effort using their favorite WDL executor.
- If the researcher would like to add their own flavor to the workflow, they can fork the repository, customize it as necessary to fit their exact research needs, and even resubmit the changes back to the original repository for consideration and review.
- To that end, WILDS WDL repositories are relatively minimal and will usually consist of:
- a detailed README describing the intended functionality and input/output file types
- a single WDL script containing the workflow as well as the tasks that make up the workflow
- an input json template providing examples of expected inputs
- a test case to ensure the workflow is running as expected
- We believe this minimal setup makes each workflow easier to read and easier to use.
10.2 Structural Guidelines
- Structs should be at the top of the WDL script, followed by the workflow itself, followed by all of its corresponding tasks.
- While any order is technically allowed, we recommend this arrangement to promote consistency and improve readability.
- Tasks should be broken down into the smallest operations possible.
- If a task uses more than two command line tools, it should probably be broken up into individual tasks.
- Docker containers should be assigned to every task to ensure uniform execution, regardless of local context.
- Outside of very basic images from very trusted sources, Docker images should be pulled directly from WILDS’ Docker Library whenever possible.
- If you think a particular tool should be added to that library, submit an issue or email us at wilds@fredhutch.org.
- In general, runtime attributes should be defined whenever possible in order to enable execution on as many backends as possible.
- Some runtime attributes will be ignored/required based on the backend/WDL engine being used to run your workflow. Refer to the WDL 101 guide for more details.
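As an illustration of the structural guidelines above, here is a minimal sketch of a single-file WDL script laid out in the recommended order: struct, then workflow, then tasks. The struct, workflow, and task names are hypothetical, and the `ubuntu:22.04` image and the resource values are placeholders chosen for the example; a real WILDS workflow would normally pull its image from the WILDS Docker Library.

```wdl
version 1.0

# Structs come first.
struct SampleInfo {
  String name
  File fastq
}

# The workflow follows the structs.
workflow count_reads {
  input {
    Array[SampleInfo] samples
  }

  scatter (sample in samples) {
    call count_lines {
      input:
        sample_name = sample.name,
        fastq = sample.fastq
    }
  }

  output {
    Array[File] line_counts = count_lines.line_count
  }
}

# Tasks come last; this one wraps a single command line tool (wc).
task count_lines {
  input {
    String sample_name
    File fastq
  }

  command <<<
    wc -l < "~{fastq}" > "~{sample_name}.line_count.txt"
  >>>

  output {
    File line_count = "~{sample_name}.line_count.txt"
  }

  # A container and explicit resources on every task (placeholder values).
  runtime {
    docker: "ubuntu:22.04"
    cpu: 1
    memory: "2 GB"
  }
}
```

Which runtime attributes are honored depends on the backend, so the `cpu` and `memory` values here should be read as a starting point rather than a requirement.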
10.3 Stylistic Guidelines
- Indentation: brace contents, inputs, and line continuations should all be indented by two spaces (not four).
- White Space: different input groups and code blocks should be separated by a single blank line.
- Line Breaks: line breaks should only occur in the following places:
- After a comma
- Before the else of an if statement
- Between inputs
- Opening and closing braces
- Line Character Limit: lines should be a maximum of 100 characters.
- Expression Spacing: spaces should surround operators to increase clarity and readability.
- Naming Conventions:
- Descriptive Commenting:
- Comments should be placed above each task in the workflow describing its function.
- Input descriptors should be provided in the parameter_meta component.
- Command Syntax:
- Command sections within a WDL task should use Heredoc syntax for added clarity in terms of input variables.
- Quotation marks around string/file variables are recommended within the command section to avoid confusion involving spaces.
- While it is usually not an issue within the context of Cromwell, file localization is also recommended in order to maximize the utility of the workflow across different WDL executors.
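Several of the stylistic guidelines above can be seen together in one small task. The following is a sketch only: the task name, its inputs, and the `ubuntu:22.04` image are hypothetical choices made for illustration. It shows two-space indentation, a blank line between blocks, a comment above the task, input descriptions in `parameter_meta`, spaces around operators, Heredoc command syntax, and quoted string/file variables.

```wdl
version 1.0

# Compresses a text file with gzip and writes the result next to the task's working files.
task compress_file {
  input {
    File input_file
    Int compression_level = 6
    Int base_memory_gb = 2
  }

  parameter_meta {
    input_file: "Plain-text file to compress"
    compression_level: "gzip compression level, from 1 (fastest) to 9 (smallest)"
    base_memory_gb: "Memory in GB requested before a 1 GB margin is added"
  }

  # Spaces surround operators in expressions.
  String output_name = basename(input_file) + ".gz"
  Int memory_gb = base_memory_gb + 1

  # Heredoc syntax with quoted file and string variables.
  command <<<
    gzip -c -~{compression_level} "~{input_file}" > "~{output_name}"
  >>>

  output {
    File compressed_file = "~{output_name}"
  }

  runtime {
    docker: "ubuntu:22.04"
    memory: "~{memory_gb} GB"
  }
}
```

Quoting `"~{input_file}"` keeps the command intact when a path contains spaces, which is the situation the quotation-mark guideline is meant to guard against.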
10.4 Repository Guidelines
- As with all repositories, each workflow should include a detailed README containing:
- Purpose and functionality of the workflow
- Basic diagram illustrating the flow of data
- Contact information in case issues pop up
- WILDS Badge at the top describing the development status of the workflow
- Make sure to include an example input json in the repository for users to modify and easily execute the workflow.
- For a skeleton template, try the inputs action of WOMtool.
- A GitHub Action executing WOMtool validate is highly recommended as a check before merging new features into the main branch.
- If you’re feeling adventurous, try automating an entire test run using a very small validation dataset.