Ever since adding the OpenTxCount counter, I've noticed that it's abnormally high while running Jepsen. Initially I dismissed it as cleanup maybe not happening fast enough, but now I can see that some databases also have a very high number of open tablet transactions, even when those databases have no activity. I just tried stopping Jepsen mid-run and examining the tables, and can confirm that tablets have hundreds of persistent transactions open without corresponding locks or volatile transactions. Compacting and rebooting tablets doesn't help: there are some persistent changes that are marked neither committed nor removed.
We are supposed to commit or remove uncommitted transactions when removing the corresponding lock or volatile tx, and we clean them up before split/merge. The fact that they are leaking is extremely troublesome and might suggest something is very wrong at the tablet layer, maybe with TxStatus parts?
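The invariant described above can be stated as a simple set check: every transaction id with open persistent changes should be referenced by a lock or a volatile tx, and any id that is not is a leak. The sketch below is a toy Python model under that assumption; `TabletState` and `find_orphaned_txs` are hypothetical names for illustration and are not the actual datashard structures.

```python
from dataclasses import dataclass, field


@dataclass
class TabletState:
    # Hypothetical simplified model, not the real datashard data structures.
    open_tx_ids: set[int] = field(default_factory=set)      # txs with persistent uncommitted changes
    lock_tx_ids: set[int] = field(default_factory=set)      # txs referenced by persisted locks
    volatile_tx_ids: set[int] = field(default_factory=set)  # txs referenced by volatile transactions


def find_orphaned_txs(state: TabletState) -> set[int]:
    """Return open tx ids that no lock or volatile tx refers to.

    If uncommitted changes were always committed or removed together with
    their lock or volatile tx, this set would always be empty.
    """
    return state.open_tx_ids - state.lock_tx_ids - state.volatile_tx_ids
```

For example, a tablet with open txs `{1, 2, 3}`, a lock for tx 1 and a volatile tx 2 would report tx 3 as orphaned.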
After some experiments, it looks like the leak happens when we set a read lock, that lock gets broken, and we later try to write using the same lock. We first write uncommitted changes with that LockTxId, but then in ApplyLocks we discover the lock is broken, so we don't extend it with write ranges and it's not persisted. The LocalDB transaction still commits, and now those changes are not attached to anything.
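A toy Python model of the sequence above may help make the ordering problem concrete. All names here (`Lock`, `LocalDb`, `write_under_lock`, `orphaned_tx_ids`) are hypothetical and only mirror the steps described in this comment; they are not the actual datashard or LocalDB code. The `check_broken_first` flag shows one assumed mitigation (refusing to write under an already broken lock), purely for illustration and not necessarily the right fix.

```python
from dataclasses import dataclass, field


@dataclass
class Lock:
    # Hypothetical model of a lock, not the real datashard lock type.
    tx_id: int
    broken: bool = False
    write_ranges: list[str] = field(default_factory=list)


@dataclass
class LocalDb:
    # tx_id -> keys written as uncommitted changes under that tx_id
    uncommitted: dict[int, list[str]] = field(default_factory=dict)
    # tx_id -> lock persisted alongside those changes
    persisted_locks: dict[int, Lock] = field(default_factory=dict)


def write_under_lock(db: LocalDb, lock: Lock, key: str, check_broken_first: bool) -> bool:
    """Simulate one write that uses lock.tx_id for its uncommitted changes."""
    if check_broken_first and lock.broken:
        # Assumed mitigation: never write uncommitted changes under a broken lock.
        return False
    # Step 1: uncommitted changes are written with the lock's tx id.
    db.uncommitted.setdefault(lock.tx_id, []).append(key)
    # Step 2 ("ApplyLocks"): a broken lock is not extended with write ranges
    # and is not persisted, yet the changes from step 1 stay and get committed.
    if lock.broken:
        return False
    lock.write_ranges.append(key)
    db.persisted_locks[lock.tx_id] = lock
    return True


def orphaned_tx_ids(db: LocalDb) -> set[int]:
    """Tx ids that have uncommitted changes but no persisted lock to resolve them."""
    return set(db.uncommitted) - set(db.persisted_locks)


# A read lock that was broken before the same tx tries to write.
buggy_db = LocalDb()
write_under_lock(buggy_db, Lock(tx_id=42, broken=True), "key1", check_broken_first=False)
print(orphaned_tx_ids(buggy_db))  # {42}

fixed_db = LocalDb()
write_under_lock(fixed_db, Lock(tx_id=42, broken=True), "key1", check_broken_first=True)
print(orphaned_tx_ids(fixed_db))  # set()
```

An alternative, equally hypothetical, would be to remove the step 1 changes when ApplyLocks finds the lock broken; either way the point is that the uncommitted changes must not outlive their lock.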