DataShard: uncommitted changes are leaking #13387

Closed
snaury opened this issue Jan 15, 2025 · 1 comment · Fixed by #13444
Labels
area/datashard Issues related to datashard tablets (relational table partitions)

Comments


snaury commented Jan 15, 2025

Ever since adding the OpenTxCount counter I've noticed that it is abnormally high while running Jepsen. Initially I dismissed it, assuming cleanup just doesn't happen fast enough, but now I can see that some databases have a very high number of open tablet transactions as well, even when they have no activity. I just tried stopping Jepsen mid-run and examined the tables, and can confirm that tablets have hundreds of persistent transactions open without corresponding locks or volatile transactions. Compacting and rebooting tablets doesn't help: some persistent changes are marked neither committed nor removed.

We are supposed to commit or remove uncommitted transactions when removing the corresponding lock or volatile tx, and we clean them up before split/merge. The fact that they are leaking is extremely troubling and might suggest that something is very wrong at the tablet layer, maybe with TxStatus parts?
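
The invariant described above can be written down as a small check. This is only an illustrative sketch, not DataShard code: the function name and the idea of passing plain lists of ids are assumptions made for the example. It shows what "leaked" means here, namely an open transaction id that no lock and no volatile tx is responsible for.

```python
def find_leaked_tx_ids(open_tx_ids, lock_ids, volatile_tx_ids):
    """Return open tx ids that no lock or volatile tx is responsible for.

    Hypothetical helper for illustration only; the real tablet state is
    not exposed as plain lists of ids.
    """
    # Every uncommitted tx should be owned by a lock or a volatile tx,
    # because removing the owner is what commits or removes the changes.
    owners = set(lock_ids) | set(volatile_tx_ids)
    return sorted(set(open_tx_ids) - owners)


# Example: tx 102 has neither a lock nor a volatile tx, so it is leaked.
leaked = find_leaked_tx_ids(
    open_tx_ids=[101, 102, 103],
    lock_ids=[101],
    volatile_tx_ids=[103],
)
print(leaked)
```

In a healthy tablet this list would be expected to be empty once activity stops.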

snaury added the area/datashard label ("Issues related to datashard tablets (relational table partitions)") Jan 15, 2025
snaury self-assigned this Jan 15, 2025
snaury changed the title from "DataShard: open transactions are leaking" to "DataShard: persistent transactions are leaking" Jan 15, 2025
snaury changed the title from "DataShard: persistent transactions are leaking" to "DataShard: uncommitted changes are leaking" Jan 15, 2025

snaury commented Jan 15, 2025

After some experiments, this is what seems to happen: a read lock is set and then broken, and we later try to write using the same lock. We first write uncommitted changes with that LockTxId, but then in ApplyLocks we discover the lock is broken, so we don't extend it with write ranges and it's not persisted. The LocalDB transaction still commits, but now those changes are not attached to anything.
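
The sequence above can be replayed with a toy model. This is a minimal sketch under stated assumptions, not the actual DataShard implementation: `ToyShard` and all of its methods are hypothetical names, and the assumption that lock removal only cleans up changes attached to a persisted lock is a simplification of the behavior described in this comment.

```python
class ToyShard:
    """Toy model of the suspected leak; every name here is hypothetical."""

    def __init__(self):
        self.locks = {}        # lock_tx_id -> {"broken": bool, "persistent": bool}
        self.uncommitted = {}  # lock_tx_id -> list of (key, value) changes

    def read_with_lock(self, lock_tx_id):
        # A read sets an in-memory lock that is not persisted yet.
        self.locks[lock_tx_id] = {"broken": False, "persistent": False}

    def break_lock(self, lock_tx_id):
        # A conflicting write from elsewhere breaks the lock.
        self.locks[lock_tx_id]["broken"] = True

    def write_with_lock(self, lock_tx_id, key, value):
        # Step 1: uncommitted changes are written under the lock's tx id.
        self.uncommitted.setdefault(lock_tx_id, []).append((key, value))
        # Step 2: the lock is extended and persisted only if it is not broken.
        lock = self.locks[lock_tx_id]
        if not lock["broken"]:
            lock["persistent"] = True
        # Step 3: the local transaction commits either way, so the changes
        # from step 1 stay even when step 2 skipped the lock.

    def erase_lock(self, lock_tx_id):
        lock = self.locks.pop(lock_tx_id)
        # Assumption: only a persisted lock knows it owns uncommitted changes.
        if lock["persistent"]:
            self.uncommitted.pop(lock_tx_id, None)

    def leaked(self):
        # Uncommitted changes whose lock no longer exists.
        return sorted(tx for tx in self.uncommitted if tx not in self.locks)


shard = ToyShard()

# Healthy path: read, write, erase. Nothing is left behind.
shard.read_with_lock(8)
shard.write_with_lock(8, "a", 1)
shard.erase_lock(8)

# Suspected leak: read, lock broken, write with the same lock, erase.
shard.read_with_lock(7)
shard.break_lock(7)
shard.write_with_lock(7, "b", 2)
shard.erase_lock(7)

print(shard.leaked())
```

In this model the fix would be either to skip step 1 when the lock is already broken, or to roll back the uncommitted changes when step 2 finds the lock broken; which of these matches the actual fix is not stated in this thread.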
