Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HHH-18326 Inefficient identity hash code computation tracking persistent instances #9650

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

mbladel
Copy link
Member

@mbladel mbladel commented Jan 21, 2025

https://hibernate.atlassian.net/browse/HHH-18326

This PR tries to address the performance impact of relying on identity hashes to track persistent instances. See the initial benchmark results for the current state here: hibernate/hibernate-orm-benchmark#15.

This strategy is using a unique int (but could be long too) instance id that is assigned once an entity/collection is made persistent, and through that maps instances to a new collection (InstanceIdentityMap) that simply stores information in an array using the unique id as index. This allows us to avoid relying on identity hash code for:

  • immutable bytecode-enhanced entities
  • persistent collections

I will add more details on the proposed change along with benchmarks results of using the new strategy in the linked hibernate-orm-benchmarks PR, but to synthetize them here:

  • querying 100 immutable bytecode-enhanced entities, we get 8.4% more ops/s than before and we align with the result of mutable entities (that do not need to rely on a map at all); increasing the entity count to 1_000 the result is similar, with 9.5% more ops/s. Looking at the flamegraphs StatefulPersistenceContext.addEntry matches are basically aligned between mutable true and false tests.
  • querying 100 entities containing collections, we get 18.3% more ops/s with a mapping of only 2 collections (increasing to more than 50% for 7+ collections) if we're not initializing the associations (matches for StatefulPersistenceContext.addCollection go down from 17.4% to 1.8%; if we do initialize the collections lazily (i.e. 1 additional query for each collection) we still get a very relevant performance increase with a 5-10% improvement in ops/s for 2 to 7 mapped collections, dropping to just 1% with 16 mapped collections.

Note this solution requires a change in the bytecode-enhanced entity classes, adding a new field to store the required unique ID. Other than that, only the ManagedEntity and PersistentCollection SPI interfaces have been changed. Marking this as draft as it will probably need to wait until after 7.0.0 is released.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license
and can be relicensed under the terms of the LGPL v2.1 license in the future at the maintainers' discretion.
For more information on licensing, please check here.


@mbladel mbladel force-pushed the instance-id branch 4 times, most recently from 010c6dc to 0db7d26 Compare January 21, 2025 20:56
@yrodiere
Copy link
Member

Marking this as draft as it will probably need to wait until after 7.0.0 is released.

Wouldn't 7.0 be a better time than 7.1 to make potentially dangerous changes? cc @sebersole

@mbladel
Copy link
Member Author

mbladel commented Jan 24, 2025

Updated the PR accounting for the initial concerns. For anyone interested, here you can find the updated benchmark results.

@mbladel mbladel marked this pull request as ready for review January 24, 2025 09:13
}
return this.collectionEntries;
collection.$$_hibernate_setInstanceId( nextCollectionInstanceId() );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
collection.$$_hibernate_setInstanceId( nextCollectionInstanceId() );
assert collection.$$_hibernate_getInstanceId() == -1;
collection.$$_hibernate_setInstanceId( nextCollectionInstanceId() );

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The collection might have been associated to a different persistence context, we don't reset instance ids.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something, but didn't we discuss that we could unset the instance id in unsetSession() when changing the order of operations that call this method right now?
I'm not saying this check is super critical to have, just that it might be a nice way to capture potential bugs in our code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, for collections I do believe we can reset it without having side-effects, and the pros are we can add this assertion.

For cons, handling -1 adds an if check every time we access the map (instanceId >= 0) and since we already do the == check, we don't strictly need it. Also, for entities that would be not ideal, as we'd need to instrument enhanced entities' constructors and when resetting we'd run into an additional itable stub call, which impacts performance.

Not sure what I prefer, but keeping both behaviors aligned seems more consistent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resetting happens in org.hibernate.engine.internal.EntityEntryContext#removeEntityEntry and there are currently 2 itable callsites there, I suggest we introduce a new method that handles "reset" which manages previous/next entries and the instance id, which will even improve perf further :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single method for previous/next is a great improvement idea! Why not do that as a separate improvement?

I would not stress too much on resetting ids though, as we've already discussed there might still be cases which we need to handle (with the == check), and it will require an additional check on the id when accessing the collection, so I don't see that much additional value by doing that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single method for previous/next is a great improvement idea! Why not do that as a separate improvement?

Sounds good to me.

I would not stress too much on resetting ids though, as we've already discussed there might still be cases which we need to handle (with the == check), and it will require an additional check on the id when accessing the collection, so I don't see that much additional value by doing that.

Like I wrote before, it's not super critical to have this, it would IMO just be nice because it can be a nice safety belt for us. Since we'd only check the id in assertions, regular code would not be impacted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if ( immutableManagedEntityXref == null ) {
immutableManagedEntityXref = new IdentityHashMap<>();
if ( !isReferenceCachingEnabled( entityEntry.getPersister() ) ) {
managed.$$_hibernate_setInstanceId( nextManagedEntityInstanceId() );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
managed.$$_hibernate_setInstanceId( nextManagedEntityInstanceId() );
assert managed.$$_hibernate_getInstanceId() == -1;
managed.$$_hibernate_setInstanceId( nextManagedEntityInstanceId() );

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entity might have been associated to a different persistence context, we don't reset instance ids.

*/
public interface InstanceIdentity {
/**
* Retrieve the unique identifier of this instance
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Retrieve the unique identifier of this instance
* Retrieve the unique identifier of this instance, or {@code -1} if the instance is not associated to a persistence context.

@@ -225,6 +225,14 @@ private DynamicType.Builder<?> doEnhance(Supplier<DynamicType.Builder<?>> builde
EnhancerConstants.USE_TRACKER_SETTER_NAME
);

builder = addFieldWithGetterAndSetter(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nice to initialize the field to -1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd need to intercept and enhance all entity constructors to do that. Also, since we don't reset instance ids and we have the == check to make sure we have the actual same object, IMO setting this to -1 doesn't bring much value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIU, you just have to call .value( -1 ) after defineField() e.g.

				.defineField( fieldName, type, constants.fieldModifierPRIVATE_TRANSIENT )
						.value( -1 )
						.annotateField( constants.TRANSIENT_ANNOTATION )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems to be only for static fields, see raphw/byte-buddy#553 (comment).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resetting the value could happen in org.hibernate.engine.internal.EntityEntryContext#removeEntityEntry which also manages the entity entry reference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would result in an itable stub invocation, which would degrade performance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, it seems all constructors would have to initialize this field :/
raphw/byte-buddy#227

if ( page != null ) {
final int offset = toPageOffset( keyIndex );
final Object k = page.get( offset );
if ( k == key ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe throw an exception when the keys don't match?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't: in case of "pre-set" instance-ids (see our previous discussion), it's plausible that we find an existing instance in the array at the same index but that isn't the same object. IMO we should just return null.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is dependent on whether we want/can ensure instance ids are reset.

final int pageOffset = toPageOffset( keyIndex );
Object k = page.set( pageOffset, null );
// Check that the provided instance really matches with the key contained in the store
if ( k == key ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe throw an exception when the keys don't match?

final Map.Entry<K, V> entry = page.set( pageOffset, null );
// Check that the provided instance really matches with the key contained in the map
if ( entry != null ) {
if ( entry.getKey() == key ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe throw an exception when the keys don't match?

*/
public @Nullable V get(int instanceId, Object key) {
final Entry<K, V> entry = get( instanceId );
if ( entry != null && entry.getKey() == key ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe throw an exception when the keys don't match?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants