Script to create hdf trigger merge files from PyCBC Live output trigger files #4697
Conversation
There should be discussion of certain points I'm not sure of (e.g. do we need template_hash sorting in this?), but this is broadly right.
I think the arguments regarding the file discovery should be reworked/simplified though. I think something like #4354 should work, so I'll put in the work for that to be pulled in here.
We are using this for inputs to fit_by_template and bin_trigger_rates_dq, which both take stat-threshold as input and discard anything below it - does it then make sense to implement the same here? That could save:
During testing, this code is quite slow, so I thought about the later uses; they both apply cuts straight away. As a result, I think adding cuts would be good here and we can drastically reduce
This should mean that we can reuse a lot of the file-reading code from
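A minimal sketch of what such a stat-threshold cut could look like at read time, assuming the triggers are held as a dict of aligned numpy arrays (the dataset names here are hypothetical, not taken from the script):

```python
import numpy as np

def apply_stat_threshold(triggers, stat, threshold):
    """Keep only triggers whose ranking statistic is at or above threshold.

    triggers: dict of dataset name -> numpy array, all the same length
    stat: numpy array of the ranking statistic, aligned with the datasets
    """
    keep = stat >= threshold
    return {name: values[keep] for name, values in triggers.items()}

# Hypothetical example data: four triggers, two pass the cut
triggers = {
    "end_time": np.array([1.0, 2.0, 3.0, 4.0]),
    "template_id": np.array([0, 1, 1, 2]),
}
stat = np.array([5.0, 7.5, 6.0, 8.0])
cut = apply_stat_threshold(triggers, stat, 6.5)
```

Applying the cut per input file, before concatenation, is what would drastically reduce memory use and output size.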
I've just found a bug, but I'm not sure how best to fix it: in the case that there are no triggers from a given template, the boundaries are not being set properly, so not enough region references are being made.
I think I can find the bug and propose a fix fairly soon.
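For illustration, one way to get correct boundaries even for templates with no triggers is numpy's searchsorted, which yields an empty region (start == end) for such templates; the array contents here are hypothetical:

```python
import numpy as np

# Trigger template ids, already sorted; note template 1 has no triggers
template_ids = np.array([0, 0, 2, 2, 2, 3])
n_templates = 4

# searchsorted gives one (start, end) pair per template in the bank,
# so an empty template simply gets start == end
starts = np.searchsorted(template_ids, np.arange(n_templates), side="left")
ends = np.searchsorted(template_ids, np.arange(n_templates), side="right")
```

Looping over the unique ids present in the triggers, by contrast, would skip empty templates and produce too few region references.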
DRAFT: changes to trigger collation script
@titodalcanton Gareth has added all the trigger-finding stuff (and resolved a lot of the comments you had before).
time -v output:
So approximately 20 minutes and 1 GB of memory usage, and it looks like most of the time is spent in I/O. Link, as the image seems to have disappeared for zooming in: https://ldas-jobs.ligo.caltech.edu/~gareth.cabourndavies/pycbclive/collate_trigs/testing/collate_profile.png
bin/live/pycbc_live_collate_triggers
if args.output_trigger_file_list:
    logging.info(
        'Writing list of trigger files to %s',
        args.output_trigger_file_list
    )
    with open(args.output_trigger_file_list, 'w') as f:
        for item in trigger_files:
            f.write("%s\n" % item)
@ArthurTolley is this still wanted or is it for testing only?
It was good for testing because the file system does spend a lot of time looking for the files themselves, depending on how many directories are being searched, but I don't think we need it anymore.
bin/live/pycbc_live_collate_triggers
        n_triggers_cut[ifo],
    )

with h5py.File(args.bank_file, 'r') as bank_file:
Should this be moved to the start, then you can avoid closing/reopening the output file?
Looks like it should
Changes implemented, but I have now contributed enough directly that I wouldn't feel comfortable as a reviewer.
Poke @titodalcanton for approval/final review on this.
Looks good.
…er files (gwastro#4697)
* combined trigger file list and collate triggers, untested
* working state
* removing print statement
* Cleaning up argparse and unused code
* Trigger file checking and ifo permutation
* Rework trigger file finding in pycbc_live_single_trigger_fits
* enumerate + continue indentation
* implement minor changes
Co-authored-by: GarethCabournDavies <[email protected]>
This script is used to convert a large number of PyCBC Live trigger output files into an hdf trigger merge file.
Standard information about the request
This is a: new feature
This change affects: the live search
This change changes: scientific output
Motivation
Using template fits in the PyCBC Live search (#4527) requires the template fits to be made separately from using them. One of the requirements for making these files is an hdf trigger merge. This code produces that trigger merge file.
Contents
The PyCBC Live search outputs a trigger file every stride (8 s for Live, 1 s for Early Warning) containing any triggers found within the previous stride. This code accepts several options for locating these trigger files and then collates them into a single hdf file:
- from a list of trigger files,
- from a directory containing subdirectories of trigger files (typically one subdirectory per day),
- from a start and end date,
- from a start date and a number of days,
- from a gps start and end time.
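As an illustration of the per-day discovery mode, here is a minimal sketch; the subdirectory naming (YYYY_MM_DD) and file pattern are hypothetical and may not match the real script's layout:

```python
import glob
import os
import tempfile
from datetime import date, timedelta

def find_trigger_files(top_dir, start, n_days, pattern="*.hdf"):
    """Collect trigger files from one subdirectory per day.

    Hypothetically assumes subdirectories named YYYY_MM_DD under top_dir;
    the real script's layout and naming may differ.
    """
    files = []
    for i in range(n_days):
        day = start + timedelta(days=i)
        subdir = os.path.join(top_dir, day.strftime("%Y_%m_%d"))
        files.extend(sorted(glob.glob(os.path.join(subdir, pattern))))
    return files

# Demonstrate on a throwaway directory tree with one file per day
top = tempfile.mkdtemp()
for name in ("2023_01_01", "2023_01_02"):
    os.makedirs(os.path.join(top, name))
    open(os.path.join(top, name, "triggers.hdf"), "w").close()

found = find_trigger_files(top, date(2023, 1, 1), 2)
```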
The template fit creation code expects the triggers to be in a certain format (the offline trigger file format), so the Live triggers need to be converted to match it, creating new datasets where needed and ensuring other datasets are correct. For example, PyCBC Live stores chisq as a reduced chisq whereas the offline search does not, so we convert these PyCBC Live triggers to the offline format.
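For illustration, the reduced-to-unreduced chisq conversion could look like the sketch below; the degrees-of-freedom convention (2 * chisq_dof - 2) is an assumption borrowed from the offline power chisq and should be checked against the actual codes:

```python
import numpy as np

def unreduce_chisq(reduced_chisq, chisq_dof):
    """Convert a reduced chi-squared back to the un-reduced value.

    Assumes (hypothetically) that the number of degrees of freedom is
    2 * chisq_dof - 2, as in the offline power chisq convention.
    """
    return reduced_chisq * (2 * chisq_dof - 2)

# Example: reduced chisq values with 10 chisq bins -> 18 degrees of freedom
chisq = unreduce_chisq(np.array([1.0, 2.0]), 10)
```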
The triggers are also sorted by template_id, and region references are created using the template_id boundaries to allow rapid access in later codes.
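A minimal sketch of the sort-and-reference step, assuming numpy and h5py; the dataset names (H1/snr, H1/template_boundaries) are hypothetical and the real merge file layout may differ:

```python
import os
import tempfile

import h5py
import numpy as np

# Triggers already sorted by template_id; boundaries found with
# searchsorted so templates with no triggers get an empty region
template_ids = np.array([0, 0, 1, 2, 2])
snr = np.array([5.0, 6.0, 7.0, 8.0, 9.0])
n_templates = 3

path = os.path.join(tempfile.mkdtemp(), "merge_sketch.hdf")
with h5py.File(path, "w") as f:
    dset = f.create_dataset("H1/snr", data=snr)
    starts = np.searchsorted(template_ids, np.arange(n_templates), side="left")
    ends = np.searchsorted(template_ids, np.arange(n_templates), side="right")
    # One region reference per template, pointing at its slice of the dataset
    refs = [dset.regionref[s:e] for s, e in zip(starts, ends)]
    f.create_dataset(
        "H1/template_boundaries",
        data=refs,
        dtype=h5py.special_dtype(ref=h5py.RegionReference),
    )

# Reading back one template's triggers via its region reference
with h5py.File(path, "r") as f:
    ref = f["H1/template_boundaries"][2]
    template_2_snr = f["H1/snr"][ref]
```

Dereferencing a stored region reference selects only that template's slice, so later codes can read one template's triggers without loading the whole dataset.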
Links to any issues or associated PRs
Testing performed
I have taken 2 days' worth of PyCBC Live triggers with H1, L1 & V1 triggers and run the following scripts to test:
From a file containing a list of trigger files:
From a directory containing subdirectories containing trigger files:
From a start and end date:
From a start date and a number of days:
From a start and end gps time: