Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[io] Extract tkey walk logic from TFile::Map() #17575

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

silverweed
Copy link
Contributor

This Pull request:

refactors TFile::Map into 2 methods: Map and WalkTKeys. The latter contains the logic of traversing the TKeys in the file and returns an array with information about keys, gaps and errors. Map now simply calls that method and prints out the relevant information, in the same format as before.

The main advantage of splitting WalkTKeys is that it can be used by other places (like unit tests or client code) that are interested in the internal TKey structure.

Checklist:

  • tested changes locally
  • updated the docs (if necessary)

Copy link
Contributor

@jblomer jblomer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea! In this approach, all keys are loaded in memory before printing the information. Do we need a piecewise / cursor based API?

io/io/inc/TFile.h Outdated Show resolved Hide resolved
@silverweed
Copy link
Contributor Author

silverweed commented Jan 30, 2025

In this approach, all keys are loaded in memory before printing the information. Do we need a piecewise / cursor based API?

This is possible, although a file containing 1 million keys would only occupy 68 MiB of memory for the TKeyMapNodes (for reference, the 3.8 GB ttjet_13tev benchmark dataset has 278470 keys - about 38 MiB of memory). This is not counting the classname/keyname/key title strings; with them the figure is likely doubled or so.
Generating the array seems to be also quite fast on my machine (<1s in debug mode)

@pcanal
Copy link
Member

pcanal commented Jan 30, 2025

The current observed maximum number of baskets in TTree is 50 millions baskets ... and only because it reaches the 1Gb limit for the TTree object. It will/can grow larger once we lift the 1Gb limit and can already reach larger size with RNTuple (probably not quite as easily due to page size being larger than basket sizes).

Nonetheless that is 3.1 GiB of memory for the TKeyMapNodes .... so indeed I would recommend some sort of iterators mechanism (other-wise the code simply 'crash/out-of-memory' for large files.

Copy link

Test Results

    16 files      16 suites   4d 8h 50m 19s ⏱️
 2 648 tests  2 646 ✅ 0 💤 2 ❌
41 759 runs  41 757 ✅ 0 💤 2 ❌

For more details on these failures, see this check.

Results for commit 45adcda.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants