-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🗂️Allow version restrictions in dependencies_results
#52
Comments
These are good ideas. Im definitely down to implement something better to define dependencies |
Hi @juanesarango , to elaborate on our use case, we have two major analysis categories that roughly contain steps for preprocessing, one or more analyses, and then some downstream apps relying on the analyses results. All analyses are therefore linked to parent analyses, which trace back to either the proprocessing app or to the raw data itself. We use the following method to fetch the appropriate parent analyses. The repo this comes from is in the msk enterprise github; happy to add you as a collaborator to see it all in context. def get_latest_analysis_results(
app_name,
app_min_version,
assembly,
targets,
results_keys=['fq1', 'fq2'],
verbose=False,
sample_level=False,
):
"""Retrieve analysis results with the highest valid application version.
Args:
app_name (str): application name, e.g. "Shotgun Preprocessing"
app_min_version (str): minimum version of the application, e.g. '2.0.0'
Only analyses whose application version matches this argument's major version will be considered.
assembly (str): the name of the assembly
targets: a list of target experiment objects
results_keys (list): the expected results keys. The default is a fastq pair.
verbose: enable verbose logging
sample_level: find sample_level parent apps. Note this should be set to
FALSE when aggregating experiments to create sample-level
analyses
Returns:
analysis_keys: list with the primary keys of the retrieved analysis
results: dictionary of results
"""
if app_name is None:
if results_keys == ['fq1', 'fq2']:
return get_raw_fastqs_as_analysis(targets)
else:
raise ValueError(
"app_name can only be set to None to retrieve a fastq pair (ie results_keys = ['fq1', 'fq2']"
)
parent_analyses = get_analyses(
targets__pk__in=[x.pk for x in targets],
assembly__name=assembly,
application__name=app_name,
status='SUCCEEDED',
verbose=verbose,
)
assert (
len(parent_analyses) >= 1
), f'No parent {app_name} analyses found for {targets}'
filtered_parent_analyses = filter_candidate_analyses(
parent_analyses, app_min_version, targets=targets, sample_level=sample_level
)
analysis_keys = [x.pk for x in filtered_parent_analyses]
results = {}
for r in results_keys:
for analysis in filtered_parent_analyses:
try:
if r not in results.keys():
results[r] = [analysis.results[r]]
else:
results[r].append(analysis.results[r])
except KeyError:
raise ValueError(
f'Requested result {r} not available from analyses {analysis}'
)
return analysis_keys, results Our
where the Trimming app relies on the "Shotgun Preprocess" app with a minimum valid version of 3.0.0. Let me know if any of this is useful for you; we'd be happy to help get all cleaned up into a PR if so! |
Hi @nick, great stuff. What I do is something very similar.
|
Sure! Internally, we decided that major versions indicate changes to data, minor versions are bumped for results format changes, and patch versions are bumped for any other inconsequential change. Practically, this means that the major version gets bumped any time we have a consequential change to the preprocessing step, as that needs to cascade down to all child apps. So, in short, we check that the candidate parent analysis
def valid_parent_version(parent_version, min_version):
"""Check that the major version matches exactly and that the parent app meets minimum version requirements.
The first check ensure that apps are not run on incompatible upstream data;
eg if preprocessing is at version 2 but a downstream app can work on versions 1 and on,
we want prevent confusion from having both version 1 and version2 downstream results.
"""
major_version_is_valid = (
version.parse(parent_version).major == version.parse(min_version).major
)
meets_or_exceeds = version.parse(parent_version) >= version.parse(min_version)
return major_version_is_valid and meets_or_exceeds
def filter_candidate_analyses(
parent_analyses, app_min_version, targets, sample_level=False
):
# filter for analyses run meeting the min version requirement
# ensure that the major version of the downsampling app matches the
# major version of the preprocessing app and that the preprocessing
# app meets the minimum version of the preprocessing app
parent_analyses[:] = [
x
for x in parent_analyses
if valid_parent_version(x['application']['version'], app_min_version)
]
assert len(parent_analyses) >= 1, f'No analyses met version criteria'
if sample_level:
parent_analyses[:] = [
x
for x in parent_analyses
if get_sorted_targets_string(x.targets)
== get_sorted_targets_string(targets)
]
assert (
len(parent_analyses) >= 1
), f'No sample-level analyses found matching all targets'
# run on the most recent if multiple available, per target
most_recent_parent_analyses = []
for target in targets:
candidates = []
for pa in parent_analyses:
if target.identifier in [x.identifier for x in pa.targets]:
candidates.append(pa)
assert (
len(candidates) > 0
), f'no analyses from {parent_analyses} matched target identifiers: {target.identifier}. Should this sample be excluded?'
parent_versions = [
version.parse(x['application']['version']) for x in candidates
]
mrpa = [
x
for x in candidates
if version.parse(x['application']['version']) == max(parent_versions)
]
assert (
len(mrpa) == 1
), f'multiple valid parent versions for {target.identifier}: {mrpa}'
# don't add duplicate analyses; this is relevant for sample-level analyses
# where each target can have multiple parent sets
if mrpa[0].pk not in [x.pk for x in most_recent_parent_analyses]:
most_recent_parent_analyses.append(mrpa[0])
return most_recent_parent_analyses |
Maybe Isabl could accept a string to use with
https://packaging.pypa.io/en/latest/specifiers.html#packaging.specifiers.SpecifierSet |
The new
dependencies_result
features in PRs #22 #51 looks very nice.We often need to get the latest results from an app, given some version restrictions.
For example, we want an app to pull in the latest results from a dependency such that the dependency app's major version is
3
.Or we might also want to be able to specify a version range like
>=3.1,<4
(it wouldn't necessarily have to be specified in this format though).The text was updated successfully, but these errors were encountered: