-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RCORE-2152 Don't emit transaction log instructions for mutations on newly-created objects #7734
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Pull Request Test Coverage Report for Build thomas.goyne_391Details
💛 - Coveralls |
tgoyne
force-pushed
the
tg/create-object-repl
branch
5 times, most recently
from
June 3, 2024 17:39
5dca5c6
to
a3880c3
Compare
We appear to currently not have any benchmarks which exercise this very well; everything which measures insertion performance only has one column and this becomes more beneficial the more columns there are. The single column case is ~5% faster. |
tgoyne
force-pushed
the
tg/create-object-repl
branch
from
June 3, 2024 18:20
a3880c3
to
feb080c
Compare
tgoyne
changed the title
Don't emit transaction log instructions for mutations on newly-created objects
RCORE-2152 Don't emit transaction log instructions for mutations on newly-created objects
Jun 3, 2024
danieltabacaru
approved these changes
Jun 4, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
tgoyne
force-pushed
the
tg/create-object-repl
branch
2 times, most recently
from
June 4, 2024 19:00
ef23dc3
to
8e51722
Compare
Nothing observes this table for notifications, so we don't need CT history. This is a pretty minor optimization unless we get a very large number of very small changests from the server.
…d objects The non-sync transaction logs are only used to drive notifications and notifications don't care about mutations on objects in the same commit as the objects were created it, so we don't need to emit the instructions at all. This significantly cuts the size of the transaction log for commits which are primarily inserting objects. This does a very basic check for "newly-created" which tracks the most recently created object for each table and skips mutation instructions for that object. This handles recursively creating an object and all of its embedded objects without the overhead of tracking every single object created within a transaction, and insertion workflows will typically not return to an object after creating another object in the same table. This requires adding an additional small amount of tracking for embedded objects, as Replication previously didn't know when new embedded objects were created.
These were used for converting to the file format where ObjKeys were derived from primary keys.
tgoyne
force-pushed
the
tg/create-object-repl
branch
from
June 4, 2024 21:52
8e51722
to
5d5e26d
Compare
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The non-sync transaction logs are only used to drive notifications and notifications don't care about mutations on objects in the same commit as the objects were created it, so we don't need to emit the instructions at all. This significantly cuts the size of the transaction log for commits which are primarily inserting objects.
This does a very basic check for "newly-created" which tracks the most recently created object for each table and skips mutation instructions for that object. This handles recursively creating an object and all of its embedded objects without the overhead of tracking every single object created within a transaction, and insertion workflows will typically not return to an object after creating another object in the same table.
This requires adding an additional small amount of tracking for embedded objects, as Replication previously didn't know when new embedded objects were created.
The sample Realm which prompted this had a ton of CT history entries for updating the bootstrap store that were mildly annoying to skip over while looking for the actual bootstrap application instructions, so I made the bootstrap store not emit CT history. I don't think this is a meaningful optimization outside of very narrow circumstances (e.g. if the server decides to send us a whole bunch of single-object changesets).
realm-trawler turned out to be broken for sync Realms, so I fixed that.
The existing object-store notification tests do a pretty good job of validating that instructions are emitted when they need to be, so the new tests focus on validating that they aren't when they shouldn't, which isn't testable at the object-store level (as the point of this is to stop emitting instructions which the object-store transaction log handler was just discarding).