[io] Extract tkey walk logic from TFile::Map() #17575

silverweed · 2025-01-30T11:37:02Z

This Pull request:

Refactors TFile::Map into two methods: Map and WalkTKeys. The latter contains the logic for traversing the TKeys in the file and returns an array with information about keys, gaps, and errors. Map now simply calls that method and prints the relevant information in the same format as before.

The main advantage of splitting out WalkTKeys is that it can be reused by other code (such as unit tests or client code) that needs to inspect the internal TKey structure.
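A minimal sketch of the walk/print split described above, not the PR's actual code. KeyMapNode, WalkKeys, and FormatMap are hypothetical names, the input is a simplified in-memory model of the file rather than real TKey headers, and the output format is illustrative rather than the one TFile::Map prints.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-entry record; the PR's TKeyMapNode
// may carry different fields.
struct KeyMapNode {
   enum class Kind { Key, Gap, Error };
   Kind fKind;
   std::uint64_t fAddr; // byte offset of the entry
   std::uint32_t fLen;  // size of the entry in bytes
};

// Walk half. Input is a simplified model of the file: one signed length per
// record, where a negative value marks a free gap and zero marks a corrupt
// record (an assumption of this sketch).
std::vector<KeyMapNode> WalkKeys(const std::vector<std::int32_t> &lengths, std::uint64_t begin)
{
   std::vector<KeyMapNode> nodes;
   std::uint64_t addr = begin;
   for (const std::int64_t nbytes : lengths) {
      if (nbytes == 0) {
         // A zero-length record gives no way to find the next one: report and stop.
         nodes.push_back({KeyMapNode::Kind::Error, addr, 0});
         break;
      }
      const auto len = static_cast<std::uint32_t>(nbytes < 0 ? -nbytes : nbytes);
      nodes.push_back({nbytes < 0 ? KeyMapNode::Kind::Gap : KeyMapNode::Kind::Key, addr, len});
      addr += len;
   }
   return nodes;
}

// Print half: only formats what the walk produced, so tests can inspect the
// nodes directly without parsing text.
std::string FormatMap(const std::vector<KeyMapNode> &nodes)
{
   std::ostringstream out;
   for (const KeyMapNode &n : nodes) {
      const char *label = n.fKind == KeyMapNode::Kind::Key   ? "key"
                          : n.fKind == KeyMapNode::Kind::Gap ? "gap"
                                                             : "error";
      out << "Address = " << n.fAddr << "\tNbytes = " << n.fLen << "\t" << label << "\n";
   }
   return out.str();
}
```

With this shape, a unit test can assert on the returned nodes (kind, offset, length) instead of capturing and parsing printed output.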

Checklist:

tested changes locally
updated the docs (if necessary)

jblomer

I like the idea! In this approach, all keys are loaded into memory before the information is printed. Do we need a piecewise / cursor-based API?

io/io/inc/TFile.h

silverweed · 2025-01-30T12:38:54Z

In this approach, all keys are loaded in memory before printing the information. Do we need a piecewise / cursor-based API?

This is possible, although a file containing 1 million keys would only occupy 68 MiB of memory for the TKeyMapNodes (for reference, the 3.8 GB ttjet_13tev benchmark dataset has 278470 keys, about 38 MiB of memory). This does not count the class name, key name, and key title strings; with them the figure likely doubles or so.
Generating the array also seems quite fast on my machine (<1 s in debug mode).
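
A rough sketch of the arithmetic behind such an estimate. The 68-byte node size is an assumption inferred from the 1-million-key figure above, not a measured sizeof of the real TKeyMapNode, and the helper names are hypothetical.

```cpp
#include <cstdint>

// Assumed per-node size in bytes (inferred from the estimate above, not measured).
constexpr std::uint64_t kAssumedNodeBytes = 68;

// Bytes needed to hold one node per key, excluding any name or title strings.
constexpr std::uint64_t NodeArrayBytes(std::uint64_t nKeys)
{
   return nKeys * kAssumedNodeBytes;
}

// Converts a byte count to MiB (2^20 bytes).
constexpr double ToMiB(std::uint64_t bytes)
{
   return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// Converts a byte count to GiB (2^30 bytes).
constexpr double ToGiB(std::uint64_t bytes)
{
   return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

Under that assumption, NodeArrayBytes(1000000) is 68,000,000 bytes, and the cost grows linearly with the key count, so the total depends entirely on how many keys a file can hold.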

pcanal · 2025-01-30T15:11:59Z

The largest number of baskets observed so far in a TTree is 50 million, and that ceiling exists only because the TTree object reaches the 1 GB limit. The count can grow once we lift the 1 GB limit, and larger sizes can already be reached with RNTuple (though probably not as easily, since pages are larger than baskets).

Nonetheless, that is 3.1 GiB of memory for the TKeyMapNodes, so I would indeed recommend some sort of iterator mechanism (otherwise the code would simply crash with an out-of-memory error on large files).
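
One possible shape for such an iterator mechanism, sketched under assumptions: KeyCursor, KeyMapNode, and the ReadLengthFn callback are hypothetical and do not reflect the interface the PR ended up with. The cursor keeps only the current offset, so memory use stays constant regardless of how many keys the file holds.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical stand-in for the per-entry record.
struct KeyMapNode {
   enum class Kind { Key, Gap, Error };
   Kind fKind;
   std::uint64_t fAddr; // byte offset of the entry
   std::uint32_t fLen;  // size of the entry in bytes
};

// Hypothetical callback standing in for a file read: stores the signed record
// length found at `addr` in `nbytes` and returns true, or returns false when
// `addr` is at or past the end of the data. Negative means gap, zero means
// corrupt (assumptions of this sketch).
using ReadLengthFn = std::function<bool(std::uint64_t addr, std::int32_t &nbytes)>;

class KeyCursor {
public:
   KeyCursor(ReadLengthFn readLength, std::uint64_t begin)
      : fReadLength(std::move(readLength)), fAddr(begin)
   {
   }

   // Fills `out` with the next node and returns true, or returns false once
   // the walk is over. Only one node exists at a time.
   bool Next(KeyMapNode &out)
   {
      std::int32_t raw = 0;
      if (fDone || !fReadLength(fAddr, raw)) {
         fDone = true;
         return false;
      }
      if (raw == 0) {
         // Report the corrupt record once, then end the walk.
         fDone = true;
         out = KeyMapNode{KeyMapNode::Kind::Error, fAddr, 0};
         return true;
      }
      const std::int64_t nbytes = raw;
      const auto len = static_cast<std::uint32_t>(nbytes < 0 ? -nbytes : nbytes);
      out = KeyMapNode{nbytes < 0 ? KeyMapNode::Kind::Gap : KeyMapNode::Kind::Key, fAddr, len};
      fAddr += len;
      return true;
   }

private:
   ReadLengthFn fReadLength;
   std::uint64_t fAddr;
   bool fDone = false;
};
```

A caller would loop with `KeyMapNode n; while (cursor.Next(n)) { ... }`, printing or checking each node as it arrives. Code that really wants the whole array can still collect the nodes into a vector itself.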

github-actions · 2025-01-30T18:16:33Z

Test Results

16 files 16 suites 4d 8h 50m 19s ⏱️
2 648 tests 2 646 ✅ 0 💤 2 ❌
41 759 runs 41 757 ✅ 0 💤 2 ❌

For more details on these failures, see this check.

Results for commit 45adcda.

silverweed added the in:I/O label Jan 30, 2025

silverweed requested review from jblomer, hahnjo, dpiparo, vepadulano and enirolf January 30, 2025 11:37

silverweed self-assigned this Jan 30, 2025

silverweed requested a review from pcanal as a code owner January 30, 2025 11:37

jblomer reviewed Jan 30, 2025

io/io/inc/TFile.h (outdated)

silverweed mentioned this pull request Jan 30, 2025

[ntuple] RMiniFile: properly write the free slot's nbytes #17568

Open

2 tasks

silverweed force-pushed the tfilemap_refactor branch from 6f4457a to 0491d72 Compare January 30, 2025 12:42

[io] Extract tkey walk logic from TFile::Map()

45adcda

silverweed force-pushed the tfilemap_refactor branch from 0491d72 to 45adcda Compare January 30, 2025 14:31

[io] Change TFile::WalkTKeys() to return an iterable

cde8d52

silverweed force-pushed the tfilemap_refactor branch from ef10a54 to cde8d52 Compare January 31, 2025 09:01
