Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ntuple] Fix handling of I/O rules' inputs #17534

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

jblomer
Copy link
Contributor

@jblomer jblomer commented Jan 27, 2025

Fix execution of I/O customization rules that have on-disk data member sources (inputs). Previously, TVirtualObject::fObject pointed to the target object, which is plain wrong (but happens to work in some cases). Now, rule inputs are examined, fields are created according to the provided input's type, and the data is read in a staging area that follows the memory layout of the on-disk class given by TClass. This staging area is now what TVirtualObject::fObject points to.

Note that the target still has to be a transient member. This restriction is going to be lifted in a second PR.

@jblomer jblomer self-assigned this Jan 27, 2025
@jblomer jblomer marked this pull request as draft January 27, 2025 10:53
}

// For the time being, we only support rules targeting transient members
auto referencesNonTransientMembers = [classp = fClass](const ROOT::TSchemaRule *rule) {
Copy link
Member

@pcanal pcanal Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto referencesNonTransientMembers = [classp = fClass](const ROOT::TSchemaRule *rule) {
auto targetsNonTransientMembers = [classp = fClass](const ROOT::TSchemaRule *rule) {

'references' could also means 'listed as one of the sources'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(side question: can transient members ever be a source? They are not onfile after all...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, transient member must not be a source; @pcanal ?

Copy link

github-actions bot commented Jan 27, 2025

Test Results

    18 files      18 suites   4d 1h 16m 8s ⏱️
 2 688 tests  2 687 ✅ 0 💤 1 ❌
46 688 runs  46 686 ✅ 0 💤 2 ❌

For more details on these failures, see this check.

Results for commit aac01e8.

♻️ This comment has been updated with latest results.

TClass::GetClass(classFieldDesc.GetTypeName().c_str())->GetStreamerInfo(classFieldDesc.GetTypeVersion());
clName += std::string("@@") + std::to_string(classFieldDesc.GetTypeVersion());
}
auto cl = TClass::GetClass(clName.c_str());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding the @@ class relies on the related StreamerInfo to have been build, is that a guarantee here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that the line before, calling TClass::GetClass(<class name>)->GetStreamerInfo(<version used in @@>) is enough. Is another call necessary?

const RFieldDescriptor &classFieldDesc)
{
auto clName = classFieldDesc.GetTypeName();
if (classFieldDesc.GetTypeVersion() != GetTypeVersion()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this is not the right test. There can be staging even for the current version.

Inside TStreamerInfo the equivalent test are either (during building) whether there is a rule that applies for the pair (current version and onfile version) or whether first element of the TStreamerInfo being used start with an @@alloc element (i.e. an element with type 'TStreamerInfo::kCacheNew') [The element also give direct access to the staging/alloc TClass).

With the file sameversion.h, we get:

root [0] .L sameversion.h+
root [1] auto cl = TClass::GetClass("SameClassRule");
root [2] cl->GetStreamerInfo()->ls()

StreamerInfo for class: SameClassRule, version=3, checksum=0x1bf8119e
  SameClassRule@@3 @@alloc         offset=  0 type=1001                     
  int            fValue          offset=  8 type= 3  (cached,repeat)                     
  int            fValue          offset=  0 type= 3                     
  void           SameClassRule_rule0 offset=  0 type=1000  (wholeObject)                     
  SameClassRule@@3 @@dealloc       offset=  0 type=1002                     
   i= 0, @@alloc         type=1001, offset=  0, len=1, method=0
   i= 1, fValue          type=  3, offset=  8, len=1, method=0 [cached,repeat]
   i= 2, fValue          type=  3, offset=  0, len=1, method=0
   i= 3, SameClassRule_rule0 type=1000, offset=  0, len=1, method=0 [wholeObject]
   i= 4, @@dealloc       type=1002, offset=  0, len=1, method=0
// File sameversion.h
#pragma once

struct SameClassRule
{
   int fValue;
};


#ifdef __ROOTCLING__
#pragma link C++ options = version(3) class SameClassRule+;
#pragma read sourceClass="SameClassRule" targetClass="SameClassRule" versions="[1-]" \
   source="int fValue" target="" code="{ printf(\"inside rule\n\"); }"
#endif

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand the code, the staging area is created regardless of this check, which is only there to determine which clName to ask from TClass::GetClass.


stagingItem.fOffset = cl->GetDataMemberOffset(source->GetName());
stagingAreaSize = std::max(stagingAreaSize, stagingItem.fOffset + stagingItem.fField->GetValueSize());
R__ASSERT(stagingAreaSize > 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stagingAreaSize is already calculated and we should have at the end stagingAreaSize == cl->Size();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may skip rules, so we'll check for stagingAreaSize <= fStagingClass->Size()

Comment on lines -838 to +982
if (GetTypeVersion() != 1) {
if (GetOnDiskTypeVersion() != 1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(note for future reviewers: the in-memory class version is asserted in the constructor)

tree/ntuple/v7/inc/ROOT/RField.hxx Outdated Show resolved Hide resolved
tree/ntuple/v7/src/RFieldMeta.cxx Show resolved Hide resolved
}

// For the time being, we only support rules targeting transient members
auto referencesNonTransientMembers = [classp = fClass](const ROOT::TSchemaRule *rule) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(side question: can transient members ever be a source? They are not onfile after all...)

tree/ntuple/v7/src/RFieldMeta.cxx Outdated Show resolved Hide resolved
tree/ntuple/v7/src/RFieldMeta.cxx Outdated Show resolved Hide resolved
}

for (const auto rule : rules) {
AddReadCallbacksFromIORule(rule);
}

// Iterate over all sub fields in memory and mark those as missing that are not in the descriptor.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is important, but before the refactoring GetOnDiskId() == kInvalidDescriptorId would early-exit while now we land here. I think there's no observable difference because SetArtificial() applies recursively to all sub-fields, so they should have already been marked by the parent...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the point that if GetOnDiskId() == kInvalidDescriptorId then knownSubFields is empty, so the loop does nothing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, knownSubFields is empty so knownSubFields.count(field->GetFieldName()) will always be 0, marking all sub-fields as artificial. As I said, this may be fine...

const RFieldDescriptor &classFieldDesc)
{
auto clName = classFieldDesc.GetTypeName();
if (classFieldDesc.GetTypeVersion() != GetTypeVersion()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand the code, the staging area is created regardless of this check, which is only there to determine which clName to ask from TClass::GetClass.

tree/ntuple/v7/src/RFieldMeta.cxx Show resolved Hide resolved
@jblomer jblomer marked this pull request as ready for review February 1, 2025 23:38
@jblomer jblomer requested review from pcanal and hahnjo February 1, 2025 23:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants