Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hadd] add option to filter objects/directories to merge #17612

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

silverweed
Copy link
Contributor

This Pull request:

depends on #17609.
It's a new take on #4615, which adds two flags to hadd:

  • -L <FILE>: specify a file that contains object or directory names, one per line, to be used as a filter. The filter type is specified by the second flag:
  • -Ltype <OnlyListed|SkipListed>: determines whether the list of object names contained in FILE should be used as a whitelist or a blacklist to know which objects to merge. This internally uses TFileMerger::AddObjectNames, so it has the same restrictions, notably you cannot specify a subobject name such as dir/object. If this is a desired feature, it should be added to TFileMerger directly, in a separate PR.

Checklist:

  • tested changes locally
  • updated the docs (if necessary)

Copy link

github-actions bot commented Feb 4, 2025

Test Results

    18 files      18 suites   5d 7h 41m 59s ⏱️
 2 720 tests  2 719 ✅ 0 💤 1 ❌
47 262 runs  47 261 ✅ 0 💤 1 ❌

For more details on these failures, see this check.

Results for commit ebe491f.

♻️ This comment has been updated with latest results.

Copy link
Contributor

@jblomer jblomer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but should have a corresponding test in roottest.

@@ -606,6 +674,10 @@ int main(int argc, char **argv)
if (args.fCacheSize)
Info() << "Using " << cacheSize << "\n";

////////////////////////////// end flags processing /////////////////////////////////

gSystem->Load("libTreePlayer");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this library?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, it was there already (I just moved it past the flags processing in this PR). Perhaps @pcanal knows it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dependency was originally a explicit link added to "allow support for TTree with a build index" and then was moved to a run-time dependency because the usage is hidden being a layer of indirect and was dropped in some case by the remove unused dependency feature of the linker (or something like that).

The original commit ended with "until hadd can use the autoloader" which is likely the case now.

So it probably can be technically removed.

A simple test would to create a TTree, Create and index, save it to the file, merge the result a couple of times and check the results.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we want to do this in a separate PR

@@ -34,6 +34,16 @@ def get_argparse():
"compression setting after fk (for example 206 when using -fk206)"), action = 'store_true')
parser.add_argument("-ff", help="The compression level use is the one specified in the first input", action = 'store_true')
parser.add_argument("-k", help="Skip corrupt or non-existent files, do not exit", action = 'store_true')
parser.add_argument("-L", help=textwrap.fill(
"Read the list of objects from FILE and either only merge or skip those objects depending on "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a powerful new features but does not match the title of the PR.

The title talks about filtering which files are being merged together while the feature documented here and implemented below is about filtering which objects and TDirectories within the files are actually merged (but does not affect the list of files opened).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See tutorials/io/mergeSelective.C

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pcanal the PR title was indeed wrong; I changed "files" to "objects".

main/src/hadd-argparse.py Outdated Show resolved Hide resolved
main/src/hadd-argparse.py Outdated Show resolved Hide resolved
main/src/hadd.cxx Outdated Show resolved Hide resolved
Adding -L and -Ltype flags, which allow passing a file name as a
"filter file", containing one object path per line. -Ltype determines
the filter mode (SkipListed/OnlyListed).
@silverweed silverweed changed the title [hadd] add option to filter files to merge [hadd] add option to filter objects/directories to merge Feb 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants