Build systems like make are frequently used to create complicated workflows, e.g. in bioinformatics. Snakemake aims to reduce the complexity of creating workflows by providing a clean and modern domain-specific language (DSL) in Python style, together with a fast and comfortable execution environment.
Installation
- On Ubuntu 12.04, you can install the Debian package python3-snakemake available in our Launchpad repository.
- On other systems, you need a working installation of Python >= 3.2. Depending on your system, you can then install snakemake by issuing either easy_install snakemake or easy_install3 snakemake on the command line. If you don't have administrator privileges, have a look at the --user argument of easy_install.
- Finally, snakemake can be installed manually by downloading the source code archive from PyPI.
Usage
Snakemake offers a simple DSL to describe workflows that create files in several subsequent steps:
    samples = ["01", "02"]

    # Optionally define a directory where the work should be done.
    workdir: "path/to/workdir"

    # Similar to make, define dummy rules that act as build targets.
    rule all:
        input: "diffexpr.tsv", ...

    rule summarize:
        input: ["{sample}.mapped.bam".format(sample = s) for s in samples]
        output: "diffexpr.tsv"
        run:
            # ... provide some Python code to produce the output from the input files,
            # e.g. access input files by index
            input[1]
            # access wildcard values
            wildcards.sample
            # Easily run shell commands using your default shell, with direct access
            # to all local and global variables via the format minilanguage.
            threads = 6
            shell("somecommand --threads {threads} {input[0]} {output[0]}")

    rule map_reads:
        # Assign names to input and output files.
        input: reads = "{sample}.fastq", hg19 = "hg19.fasta"
        # Mark output files to be write-protected after creation.
        output: mapped = protected("{sample}.mapped.sai")
        # Optionally define a message that is displayed instead of the generic
        # rule description when the rule is executed.
        message: "Mapping reads to {input.hg19}"
        # Specify the number of threads used by the rule. The snakemake scheduler
        # ensures that the rule is run with the specified number of threads if
        # enough cores are made available via the -j command line option.
        threads: 8
        shell:
            # Directly provide shell commands (in a multi-line or single-line string)
            # if Python syntax is not needed. Again, global and local variables can
            # be accessed via the format minilanguage.
            """
            bwa aln -t {threads} {input.hg19} {input.reads} > {output.mapped}
            some --other --command
            """
Given a "Snakefile" with this syntax, the workflow can be executed (e.g. using up to 6 parallel processes) by issuing:
    snakemake -j6 -s Snakefile
For more details please see the Tutorial.
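The variable access inside shell commands relies on Python's format minilanguage. The following is a minimal sketch of that substitution in plain Python, not snakemake's own implementation: the `Inputs` and `Outputs` containers and the `render_command` helper are hypothetical names chosen for illustration, and the file names are taken from the example rules above.

```python
from collections import namedtuple

# Hypothetical stand-ins for the input/output objects available inside a rule.
Inputs = namedtuple("Inputs", ["reads", "hg19"])
Outputs = namedtuple("Outputs", ["mapped"])


def render_command(template, inputs, outputs, threads):
    """Fill a shell command template using Python's format minilanguage."""
    return template.format(input=inputs, output=outputs, threads=threads)


inputs = Inputs(reads="01.fastq", hg19="hg19.fasta")
outputs = Outputs(mapped="01.mapped.sai")

# Files addressed by name, as in the map_reads rule.
by_name = render_command(
    "bwa aln -t {threads} {input.hg19} {input.reads} > {output.mapped}",
    inputs, outputs, threads=8)

# Files addressed by index, as in the summarize rule.
by_index = render_command(
    "somecommand --threads {threads} {input[0]} {output[0]}",
    inputs, outputs, threads=6)

print(by_name)
print(by_index)
```

Both `{input.hg19}` (attribute access) and `{input[0]}` (index access) are standard features of `str.format`, which is why named and positional files can be mixed freely in a command string.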
Features:
- Define workflows in a textual way by writing rules that describe how to create output files from input files in a simple Python-based syntax. In contrast to GNU make (which is primarily a build system), snakemake allows a rule to create multiple output files.
- Snakemake automatically calculates which rules need to be executed to create the desired output.
- Both shell-based rules and full Python syntax inside a rule are supported. Shell commands have direct access to all local and global Python variables.
- Like GNU make, snakemake can schedule parallel rule executions where possible. Further, inter-rule parallelization can be combined with intra-rule parallelization (e.g. threads), and snakemake ensures that the number of used cores does not exceed the given value.
- Files can be marked as temporary (i.e. they can be deleted once not needed any more) or protected (i.e. they will be write protected after creation).
- Input and output files can contain multiple named wildcards.
- Input and output files can be named so that addressing them inside the rule becomes handy.
- Map-reduce-like functionality is accomplished by using the easy-to-read Python list comprehension syntax.
- As an experimental feature, snakemake can run on a cluster by specifying the submit command (e.g. qsub for Sun Grid Engine).
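To make the wildcard and list comprehension features more concrete, here is a rough sketch in plain Python of how a file pattern with named wildcards could be matched against a requested file, and how a list comprehension expands a pattern over all samples. This is an illustration under assumptions, not snakemake's actual matching code: `pattern_to_regex` and `match_wildcards` are hypothetical helpers, the `{sample}.{kind}.bam` pattern is made up for the example, and each wildcard name is assumed to occur only once per pattern.

```python
import re


def pattern_to_regex(pattern):
    """Compile a pattern such as '{sample}.mapped.sai' into a regex with named groups."""
    # re.split with one capture group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)          # literal text
        else:
            regex += "(?P<%s>.+)" % part      # wildcard becomes a named group
    return re.compile(regex + "$")


def match_wildcards(pattern, filename):
    """Return the wildcard values as a dict, or None if the file does not match."""
    m = pattern_to_regex(pattern).match(filename)
    return m.groupdict() if m else None


samples = ["01", "02"]

# "Reduce" side: one rule collects the files of all samples via a list comprehension.
targets = ["{sample}.mapped.bam".format(sample=s) for s in samples]

# "Map" side: a requested output file determines the wildcard values,
# which in turn determine the concrete input file.
wildcards = match_wildcards("{sample}.mapped.sai", "01.mapped.sai")
reads = "{sample}.fastq".format(**wildcards)

# Patterns may contain several named wildcards.
multi = match_wildcards("{sample}.{kind}.bam", "01.mapped.bam")

print(targets, wildcards, reads, multi)
```

The design choice worth noting is that the requested output file drives everything: once its wildcard values are known, the same values can be substituted into the input patterns to find the files the rule depends on.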
Requirements:
- Python