MRG: add some text on wildcards / wildcard constraints, and other recipes. (#3)

* expansion

* wrote some text on some things

* update text

* add never-fail-me

* upgrade/update wildcard exps

* add missing files

* switch up fenced code blocks in admonish

* renaming example

* cleanup

* much wildcard, wow

* update rsync etc for changes from linkcheck?

* subset with wildcards

* params stuff

* fix

* label recipes

* start on expand

* advanced section

* fix wildcard rule

* notes

* more wildcards

* even more wildcards

* even more wc

* upd

* cleanup on aisle 2

* even more wildcards

* examples

* wildcards

* notes on params
ctb authored Mar 5, 2023
1 parent 4fb158d commit 41423d2
Showing 138 changed files with 1,960 additions and 3 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,2 +1,3 @@
 book
 *~
+.snakemake
2 changes: 1 addition & 1 deletion Makefile
@@ -1,7 +1,7 @@
 all: build
 
 build: .PHONY
-	mdbook build
+	mdbook build -d book
 
 serve: .PHONY
 	mdbook serve --open
2 changes: 2 additions & 0 deletions code/examples/cluster.example/.gitignore
@@ -0,0 +1,2 @@
aggregate.txt
many_files
34 changes: 34 additions & 0 deletions code/examples/cluster.example/Snakefile
@@ -0,0 +1,34 @@
rule all:
    input: "aggregate.txt"

# make a potentially unknown number of files
checkpoint make_many_files:
    output: directory("many_files")
    shell: """
        mkdir -p many_files
        echo 1 > many_files/1.out
        echo 2 > many_files/2.out
    """

#
# create a Python function that loads in the filenames only AFTER
# the 'make_many_files' checkpoint rule is run.
#

def load_many_files(wc):
    # wait for results of 'make_many_files'
    checkpoint_output = checkpoints.make_many_files.get(**wc).output[0]

    # this will only be run *after* 'make_many_files' is done.
    many_files_names = glob_wildcards('many_files/{name}.out').name

    return expand('many_files/{name}.out', name=many_files_names)

# use 'load_many_files' as an input - this rule will only be run AFTER
# 'make_many_files' is run.
rule work_with_many_files:
    input:
        load_many_files
    output: "aggregate.txt"
    shell:
        "cat {input} > {output}"
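The checkpoint example is about timing: the file list for `work_with_many_files` only makes sense once `many_files/` has been populated. The plain-Python sketch below (no Snakemake involved) mimics `load_many_files` with the standard `glob` module to show that the same lookup finds nothing if it runs too early and finds both files afterwards. The helper `list_many_files` and the temporary-directory setup are illustrative assumptions, not part of the example.

```python
import glob
import os
import tempfile


def list_many_files(outdir):
    """Rough stand-in for glob_wildcards('many_files/{name}.out') + expand() (hypothetical helper)."""
    paths = glob.glob(os.path.join(outdir, "*.out"))
    # recover the {name} part of each '{name}.out' file
    names = sorted(os.path.basename(p)[: -len(".out")] for p in paths)
    return [os.path.join(outdir, f"{name}.out") for name in names]


with tempfile.TemporaryDirectory() as tmp:
    outdir = os.path.join(tmp, "many_files")

    # Evaluated too early: the directory does not exist yet, so nothing matches.
    before = list_many_files(outdir)

    # Roughly what the 'make_many_files' checkpoint does.
    os.makedirs(outdir)
    for i in (1, 2):
        with open(os.path.join(outdir, f"{i}.out"), "w") as fh:
            fh.write(f"{i}\n")

    # Evaluated after the "checkpoint" has run: both files are found.
    after = [os.path.basename(p) for p in list_many_files(outdir)]

print(before)  # []
print(after)   # ['1.out', '2.out']
```

Deferring the lookup until the checkpoint has finished is exactly what calling `checkpoints.make_many_files.get(...)` inside the input function arranges.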
1 change: 1 addition & 0 deletions code/examples/directory.within/.gitignore
@@ -0,0 +1 @@
subdir
12 changes: 12 additions & 0 deletions code/examples/directory.within/Snakefile
@@ -0,0 +1,12 @@
rule use_subdir_file:
    input: "subdir/"
    shell: """
        cat subdir/a_file.txt
    """

rule make_file_in_subdir:
    output: directory("subdir/")
    shell: """
        mkdir -p subdir
        echo hello world > subdir/a_file.txt
    """
1 change: 1 addition & 0 deletions code/examples/params.basic/.gitignore
@@ -0,0 +1 @@
output*.txt
7 changes: 7 additions & 0 deletions code/examples/params.basic/snakefile.params
@@ -0,0 +1,7 @@
rule use_params:
    params:
        val = 5
    output: "output.txt"
    shell: """
        echo {params.val} > {output}
    """
11 changes: 11 additions & 0 deletions code/examples/params.basic/snakefile.params_wildcards
@@ -0,0 +1,11 @@
rule all:
    input:
        "output.4.txt"

rule use_params:
    params:
        val = wildcards.val
    output: "output.{val}.txt"
    shell: """
        echo {params.val} > {output}
    """
11 changes: 11 additions & 0 deletions code/examples/params.basic/snakefile.params_wildcards.2
@@ -0,0 +1,11 @@
rule all:
    input:
        "output.5.txt"

rule use_params:
    params:
        val = lambda w: w.val
    output: "output.{val}.txt"
    shell: """
        echo {params.val} > {output}
    """
11 changes: 11 additions & 0 deletions code/examples/params.basic/snakefile.params_wildcards.3
@@ -0,0 +1,11 @@
rule all:
    input:
        "output.6.txt"

rule use_params:
    params:
        val = "{val}"
    output: "output.{val}.txt"
    shell: """
        echo {params.val} > {output}
    """
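The three `snakefile.params_wildcards*` variants differ only in how `val` is written. As we read them, the first cannot work because no variable named `wildcards` exists while the Snakefile is being parsed. A function (the lambda) or a `"{val}"` string, on the other hand, is resolved later, once per job, when the wildcard values are known. The sketch below is a deliberately simplified model of that deferred resolution. `resolve_params` is a hypothetical helper, not Snakemake's internal code.

```python
from types import SimpleNamespace


def resolve_params(params, wildcards):
    """Simplified model of per-job params resolution (hypothetical helper)."""
    resolved = {}
    for name, value in params.items():
        if callable(value):
            # e.g. val = lambda w: w.val  -> called with the job's wildcards
            resolved[name] = value(wildcards)
        elif isinstance(value, str):
            # e.g. val = "{val}"  -> filled in from the wildcard values
            resolved[name] = value.format(**vars(wildcards))
        else:
            # e.g. val = 5  -> used unchanged
            resolved[name] = value
    return resolved


# At parse time there is no name 'wildcards', which is why
# `val = wildcards.val` fails with a NameError.
try:
    eval("wildcards.val", {})
    parse_time_error = False
except NameError:
    parse_time_error = True

r_const = resolve_params({"val": 5}, SimpleNamespace())
r_lambda = resolve_params({"val": lambda w: w.val}, SimpleNamespace(val="5"))
r_string = resolve_params({"val": "{val}"}, SimpleNamespace(val="6"))
print(parse_time_error, r_const, r_lambda, r_string)
# True {'val': 5} {'val': '5'} {'val': '6'}
```

Note that wildcard values are strings, so `output.5.txt` ends up with the string `"5"`, not the integer `5`.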
1 change: 1 addition & 0 deletions code/examples/params.subset/.gitignore
@@ -0,0 +1 @@
big.subset25.fastq
23 changes: 23 additions & 0 deletions code/examples/params.subset/Snakefile
@@ -0,0 +1,23 @@
def calc_num_lines(wildcards):
    # convert wildcards.num_records to an integer:
    num_records = int(wildcards.num_records)

    # calculate number of lines (records * 4)
    num_lines = num_records * 4

    return num_lines

rule all:
    input:
        "big.subset25.fastq"

rule subset:
    input:
        "big.fastq"
    output:
        "big.subset{num_records}.fastq"
    params:
        num_lines = calc_num_lines
    shell: """
        head -{params.num_lines} {input} > {output}
    """
200 changes: 200 additions & 0 deletions code/examples/params.subset/big.fastq

1 change: 1 addition & 0 deletions code/examples/params.subset_lambda/.gitignore
@@ -0,0 +1 @@
big.subset25.fastq
14 changes: 14 additions & 0 deletions code/examples/params.subset_lambda/Snakefile
@@ -0,0 +1,14 @@
rule all:
    input:
        "big.subset25.fastq"

rule subset:
    input:
        "big.fastq"
    output:
        "big.subset{num_records}.fastq"
    params:
        num_lines = lambda wildcards: int(wildcards.num_records) * 4
    shell: """
        head -{params.num_lines} {input} > {output}
    """
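Both `params.subset` variants turn the `num_records` wildcard into a line count for `head`, since each FASTQ record spans four lines. Wildcard values arrive as strings, hence the `int()` call. Below, the named function and the lambda are checked against each other outside Snakemake. A `SimpleNamespace` stands in for the `wildcards` object, an assumption made only for testing.

```python
from types import SimpleNamespace


def calc_num_lines(wildcards):
    # wildcard values are strings, so convert before doing arithmetic
    num_records = int(wildcards.num_records)
    # each FASTQ record spans 4 lines (header, sequence, '+', qualities)
    return num_records * 4


# the params.subset_lambda version, kept inline as in the Snakefile
params_lambda = {"num_lines": lambda wildcards: int(wildcards.num_records) * 4}

# requesting "big.subset25.fastq" sets num_records to the string "25"
wc = SimpleNamespace(num_records="25")
from_function = calc_num_lines(wc)
from_lambda = params_lambda["num_lines"](wc)
print(from_function, from_lambda)  # 100 100
```

So the shell command becomes `head -100 big.fastq`, which keeps the first 25 records.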
200 changes: 200 additions & 0 deletions code/examples/params.subset_lambda/big.fastq

24 changes: 24 additions & 0 deletions code/examples/wildcards.basic_constrain/Snakefile
@@ -0,0 +1,24 @@
# ANCHOR: constraints
# match all .txt files - no constraints
all_files = glob_wildcards("{filename}.txt").filename

# match all .txt files in this directory only - avoid /
this_dir_files = glob_wildcards("{filename,[^/]+}.txt").filename

# match all files with only a single period in their name - avoid .
prefix_only = glob_wildcards("{filename,[^.]+}.txt").filename

# match all files in this directory with only a single period in their name
# avoid / and .
prefix_and_dir_only = glob_wildcards("{filename,[^./]+}.txt").filename
# ANCHOR_END: constraints

print(all_files)
print(this_dir_files)
print(prefix_only)
print(prefix_and_dir_only)

assert all_files == ['file1.subset', 'file1', 'subdir/file2', 'subdir/file2.subset', 'subdir/nested/file3']
assert this_dir_files == ['file1.subset', 'file1']
assert prefix_only == ['file1', 'subdir/file2', 'subdir/nested/file3']
assert prefix_and_dir_only == ['file1']
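Wildcard constraints are regular expressions. Roughly speaking, an unconstrained `{filename}` matches `.+`, and `{filename,[^/]+}` substitutes the given expression instead. The sketch below reproduces the four assertions with Python's `re` module against a hard-coded list of the five `.txt` paths implied by those assertions. `to_regex` and `match_all` are hypothetical, simplified helpers that handle only this one-wildcard pattern. They are not Snakemake's actual parser.

```python
import re

# the .txt files implied by the asserts in the Snakefile
PATHS = [
    "file1.subset.txt",
    "file1.txt",
    "subdir/file2.txt",
    "subdir/file2.subset.txt",
    "subdir/nested/file3.txt",
]


def to_regex(constraint=".+"):
    # '{filename}.txt' or '{filename,<constraint>}.txt' -> one named group
    return re.compile(f"(?P<filename>{constraint})" + re.escape(".txt"))


def match_all(constraint=".+"):
    pattern = to_regex(constraint)
    return [m.group("filename") for p in PATHS if (m := pattern.fullmatch(p))]


all_files = match_all()                 # no constraint
this_dir_files = match_all("[^/]+")     # no '/' allowed
prefix_only = match_all("[^.]+")        # no '.' allowed
prefix_and_dir_only = match_all("[^./]+")

print(all_files)
print(this_dir_files)
print(prefix_only)
print(prefix_and_dir_only)
```

Here the order of the results follows the hard-coded list. `glob_wildcards` walks the real directory tree, so the order the Snakefile's list assertions see may depend on the filesystem.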
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
2 changes: 2 additions & 0 deletions code/examples/wildcards.greedy/.gitignore
@@ -0,0 +1,2 @@
x.y.z.gz
longer_filename.gz
Empty file.
8 changes: 8 additions & 0 deletions code/examples/wildcards.greedy/snakefile.1
@@ -0,0 +1,8 @@
rule all:
    input:
        "x.y.z.gz"

rule something:
    input: "{prefix}.{suffix}.txt"
    output: "{prefix}.{suffix}.gz"
    shell: "gzip -c {input} > {output}"
8 changes: 8 additions & 0 deletions code/examples/wildcards.greedy/snakefile.2
@@ -0,0 +1,8 @@
rule all:
    input:
        "longer_filename.gz"

rule something:
    input: "{prefix}{suffix}.txt"
    output: "{prefix}{suffix}.gz"
    shell: "gzip -c {input} > {output}"
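Both `wildcards.greedy` Snakefiles hinge on regex greediness: with the default `.+` pattern, the first wildcard grabs as much of the filename as it can. The sketch below approximates how the two output patterns would split the requested targets. The regexes are hand-written approximations of `{prefix}.{suffix}.gz` and `{prefix}{suffix}.gz`, not the exact expressions Snakemake generates.

```python
import re

# "{prefix}.{suffix}.gz" with both wildcards defaulting to .+
dotted = re.compile(r"(?P<prefix>.+)\.(?P<suffix>.+)\.gz")
# "{prefix}{suffix}.gz": nothing separates the two wildcards
adjacent = re.compile(r"(?P<prefix>.+)(?P<suffix>.+)\.gz")

m1 = dotted.fullmatch("x.y.z.gz")
m2 = adjacent.fullmatch("longer_filename.gz")

print(m1.groupdict())  # {'prefix': 'x.y', 'suffix': 'z'}
print(m2.groupdict())  # {'prefix': 'longer_filenam', 'suffix': 'e'}
```

So `snakefile.1` would look for `x.y.z.txt` with `prefix=x.y` and `suffix=z`. In `snakefile.2` the split point is essentially arbitrary, which is harmless here only because both wildcards are glued back together in the input (`longer_filename.txt`).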
Empty file.
1 change: 1 addition & 0 deletions code/examples/wildcards.many/.gitignore
@@ -0,0 +1 @@
compressed/
14 changes: 14 additions & 0 deletions code/examples/wildcards.many/Snakefile
@@ -0,0 +1,14 @@
rule all:
    input:
        "compressed/F3D141_S207_L001_R1_001.fastq.gz",
        "compressed/F3D141_S207_L001_R2_001.fastq.gz",
        "compressed/F3D142_S208_L001_R1_001.fastq.gz",
        "compressed/F3D142_S208_L001_R2_001.fastq.gz"

rule gzip_file:
    input:
        "original/{filename}"
    output:
        "compressed/{filename}.gz"
    shell:
        "gzip -c {input} > {output}"
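The `gzip_file` rule shows the usual backward direction of wildcard matching. Each target requested by `rule all` is matched against the output pattern `compressed/{filename}.gz`, and the captured value is substituted into the input pattern `original/{filename}`. Below is a simplified model of that step. The `infer_input` helper is hypothetical, and its hand-written regex stands in for Snakemake's own matching.

```python
import re

OUTPUT_PATTERN = re.compile(r"compressed/(?P<filename>.+)\.gz")
INPUT_TEMPLATE = "original/{filename}"


def infer_input(target):
    """Map a requested output file to the input it would need (simplified)."""
    m = OUTPUT_PATTERN.fullmatch(target)
    if m is None:
        raise ValueError(f"output pattern does not match {target!r}")
    return INPUT_TEMPLATE.format(**m.groupdict())


targets = [
    "compressed/F3D141_S207_L001_R1_001.fastq.gz",
    "compressed/F3D141_S207_L001_R2_001.fastq.gz",
]
for t in targets:
    print(t, "<-", infer_input(t))
# compressed/F3D141_S207_L001_R1_001.fastq.gz <- original/F3D141_S207_L001_R1_001.fastq
# compressed/F3D141_S207_L001_R2_001.fastq.gz <- original/F3D141_S207_L001_R2_001.fastq
```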
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
4 changes: 4 additions & 0 deletions code/examples/wildcards.namespace/snakefile.broken
@@ -0,0 +1,4 @@
rule analyze_this:
    input: "{a}.first.txt"
    output: "{a}.second.txt"
    shell: "analyze {input} -o {output} --title {a}"
4 changes: 4 additions & 0 deletions code/examples/wildcards.namespace/snakefile.works
@@ -0,0 +1,4 @@
rule analyze_this:
    input: "{a}.first.txt"
    output: "{a}.second.txt"
    shell: "analyze {input} -o {output} --title {wildcards.a}"
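The difference between `snakefile.broken` and `snakefile.works` comes down to naming. Inside a `shell:` string, wildcard values are reached through the `wildcards` object, so a bare `{a}` is not defined there. Python's `str.format` behaves analogously, which makes it a convenient stand-in. The keyword names passed below mirror the rule and are an assumption, not Snakemake's actual formatting code.

```python
from types import SimpleNamespace

wildcards = SimpleNamespace(a="sample1")
namespace = {
    "input": "sample1.first.txt",
    "output": "sample1.second.txt",
    "wildcards": wildcards,
}

broken = "analyze {input} -o {output} --title {a}"
works = "analyze {input} -o {output} --title {wildcards.a}"

try:
    broken.format(**namespace)
    broken_error = None
except KeyError as err:
    # there is no top-level name 'a', only wildcards.a
    broken_error = err

cmd = works.format(**namespace)
print(repr(broken_error))  # KeyError('a')
print(cmd)  # analyze sample1.first.txt -o sample1.second.txt --title sample1
```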
Empty file.
3 changes: 3 additions & 0 deletions code/examples/wildcards.output/snakefile.output
@@ -0,0 +1,3 @@
rule a:
    output: "{prefix}.a.out"
    shell: "touch {output}"
17 changes: 17 additions & 0 deletions code/examples/wildcards.renaming/Snakefile
@@ -0,0 +1,17 @@
# first, find matches to filenames of this form:
files = glob_wildcards("original/{sample}_L001_{r}_001.fastq")

# next, specify the form of the name you want:
rule all:
    input:
        expand("renamed/{sample}_{r}.fastq", zip,
               sample=files.sample, r=files.r)

# finally, give snakemake a recipe for going from inputs to outputs.
rule rename:
    input:
        "original/{sample}_L001_{r}_001.fastq",
    output:
        "renamed/{sample}_{r}.fastq"
    shell:
        "cp {input} {output}"
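The renaming recipe combines two steps. `glob_wildcards` collects one `(sample, r)` pair per matching file, and `expand(..., zip, ...)` rebuilds target names from those pairs position by position rather than forming every combination. The sketch models both steps with the standard library. The four input names are borrowed from the `wildcards.many` example and are an assumption about what `original/` contains. `glob_pairs` and `expand_zip` are hypothetical helpers.

```python
import re

# assumed contents of original/ (names borrowed from wildcards.many)
ORIGINALS = [
    "original/F3D141_S207_L001_R1_001.fastq",
    "original/F3D141_S207_L001_R2_001.fastq",
    "original/F3D142_S208_L001_R1_001.fastq",
    "original/F3D142_S208_L001_R2_001.fastq",
]

PATTERN = re.compile(r"original/(?P<sample>.+)_L001_(?P<r>.+)_001\.fastq")


def glob_pairs(paths):
    """Collect wildcard values per matching path, like glob_wildcards (simplified)."""
    samples, rs = [], []
    for path in paths:
        m = PATTERN.fullmatch(path)
        if m:
            samples.append(m["sample"])
            rs.append(m["r"])
    return samples, rs


def expand_zip(template, samples, rs):
    # pair values position by position, as expand(..., zip, ...) does
    return [template.format(sample=s, r=r) for s, r in zip(samples, rs)]


samples, rs = glob_pairs(ORIGINALS)
zipped = expand_zip("renamed/{sample}_{r}.fastq", samples, rs)
for name in zipped:
    print(name)
```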
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
1 change: 1 addition & 0 deletions code/examples/wildcards.renaming_simple/.gitignore
@@ -0,0 +1 @@
renamed/
16 changes: 16 additions & 0 deletions code/examples/wildcards.renaming_simple/Snakefile
@@ -0,0 +1,16 @@
# first, find matches to filenames of this form:
files = glob_wildcards("original/{sample}_001.fastq")

# next, specify the form of the name you want:
rule all:
    input:
        expand("renamed/{sample}.fastq", sample=files.sample)

# finally, give snakemake a recipe for going from inputs to outputs.
rule rename:
    input:
        "original/{sample}_001.fastq",
    output:
        "renamed/{sample}.fastq"
    shell:
        "cp {input} {output}"
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
1 change: 1 addition & 0 deletions code/examples/wildcards.subset/.gitignore
@@ -0,0 +1 @@
big.subset100.fastq
12 changes: 12 additions & 0 deletions code/examples/wildcards.subset/Snakefile
@@ -0,0 +1,12 @@
rule all:
    input:
        "big.subset100.fastq"

rule subset:
    input:
        "big.fastq"
    output:
        "big.subset{num_lines}.fastq"
    shell: """
        head -{wildcards.num_lines} {input} > {output}
    """
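Unlike `params.subset`, this version passes the wildcard straight to `head`, so `big.subset100.fastq` means 100 *lines* (25 FASTQ records), not 100 records. The sketch below mimics that with `itertools.islice`. The regex is a hand-written approximation of the output pattern, and the 30-record in-memory FASTQ is made-up test data, not the repository's `big.fastq`.

```python
import re
from io import StringIO
from itertools import islice

target = "big.subset100.fastq"
match = re.fullmatch(r"big\.subset(?P<num_lines>.+)\.fastq", target)
num_lines = int(match["num_lines"])  # wildcard value "100" -> 100

# made-up 30-record FASTQ, 4 lines per record
fake_fastq = StringIO("".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(30)))

subset = list(islice(fake_fastq, num_lines))  # like `head -100`
print(num_lines, len(subset), len(subset) // 4)  # 100 100 25
```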