forked from Public-Health-Bioinformatics/versioned_data
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathversioned_data.xml
124 lines (92 loc) · 4.44 KB
/
versioned_data.xml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
<tool id="versioned_data" name="Versioned data retrieval" version="0.1.03">
<description>Retrieve versioned sequence files and/or their blast, bowtie, etc. database indexes</description>
<macros>
<token name="@BINARY@">versioned_data.py</token>
<import>bccdc_macros.xml</import>
</macros>
<expand macro="requirements" />
<command interpreter="python">
#assert $__user__, Exception( 'You must be logged in to use this tool.' )
versioned_data.py
#if $globalRetrievalDate.strip() > ''
-d "$globalRetrievalDate"
#end if
-r
"
#for $v in $versions:
${v.database},
#for $r in $v.retrieval:
${r.retrievalId}
#end for
,
#for $w in $v.workflows:
${w.workflow}
#end for
|
#end for
"
-o "$log"
-O "$__app__.security.encode_id($log.id)"
--api_info_path "$api_info_path" ##Actually a file path to configfile that holds api key
</command>
<!-- #:$log.hid:$log.id dataset_id -->
<expand macro="stdio" />
<inputs>
<!-- Implement as datepicker? http://www.learnfaceit.org/for-developers/adding-parameter-types-to-tool -->
<param name="globalRetrievalDate" type="text" label="Global retrieval date [YYYY-MM-DD]" help="The recall system will use this date to try to select the appropriate versions below. Leave empty to select current versions." size="25" />
<param name="api_info" display="radio" type="drill_down" label="For user with Galaxy API Key" dynamic_options="vdb_init_tool_user(__trans__)" />
<repeat name="versions" title="Data Source" min="1" max="15">
<param name="database" type="select" label="Data" dynamic_options="vdb_get_databases()" multiple="false" />
<repeat name="retrieval" title="Retrieval" min="0" max="1">
<param name="retrievalId" label="Version date/id" type="select" dynamic_options="vdb_get_versions(database, globalRetrievalDate)"/>
</repeat>
<repeat name="workflows" title="Workflow" min="0" max="5" >
<param name="workflow" type="select" label="Name" dynamic_options="vdb_get_workflows(database)" />
</repeat>
</repeat>
</inputs>
<configfiles>
<configfile name="api_info_path">${__user__.api_keys[0].key}
$api_info
</configfile>
</configfiles>
<outputs>
<data name="log" format="txt" label="Versioned Data Retrieval" />
</outputs>
<code file="versioned_data_form.py" />
<tests>
<test>
<param name="db_type" value="nucl"/>
<!-- ... -->
</test>
</tests>
<help>
.. class:: infomark
**What it does**
This tool retrieves links to current or past versions of fasta or other types of
data from a cache kept in the Galaxy data library called "Versioned Data". It then places
them into one's current history so that subsequent tools can work with that data.
For example, after using this tool to select a version of the NCBI nt database, a blast search can be carried out on it by selecting "BLAST database from your history" from the "Subject database/sequences" field of the NCBI BLAST+ search tool.
You can select one or more files or databases by version date or id. This list
is supplied from the Shared Data > Data Libraries > Versioned Data folder that has
been set up by an administrator.
The Workflows section allows you to select one or more pre-defined workflows
to execute on the versioned data. The results are placed in your history for use
by other tools or workflows.
A caching system exists to cache the versioned data or workflow data that the tool generates.
If you request versioned data or derivative data that isn't cached, it may take time to regenerate.
The top-level "Global retrieval date [YYYY-MM-DD]" field that the form starts with will be applied to
all selected databases. This can be overriden by a retrieval date or version that
you supply for a particular database. Leave it and any "Retrievals" inputs empty if you just need the latest version of selected databases.
-------
.. class:: warningmark
**Note**
Again, some past database versions can take time to regenerate if there is no cached version available, for example NCBI nt is a 50+ gigabyte file that needs to be read through to get a fasta version, and a makeblastdb workflow on top of that can take hours on the first call. Access to cached versions is immediate.
Setup of versioned data sources and workflow options can only be done by a Galaxy administrator.
-------
**References**
If you use this Galaxy tool in work leading to a scientific publication please
cite the following paper:
*Reference coming soon...*
</help>
</tool>