Fix race in send_upload_message #7088

Merged (7 commits) on Nov 1, 2023

danieltabacaru
Collaborator

@danieltabacaru danieltabacaru commented Oct 30, 2023

What, How & Why?

A race in send_upload_message() can prevent subscriptions from being sent to the server (and, as a side effect, a notification on such a subscription may never be fulfilled). This can happen when send_upload_message() is called as a result of a commit that precedes the one containing the subscription. send_upload_message() then advances the upload cursor past the version associated with the subscription, so the subscription is never uploaded to the server.
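
To make the ordering concrete, here is a minimal toy model of the race. It is a sketch under stated assumptions, not the realm-core implementation: `ToySession`, `upload`, `maybe_send_query` and `subscription_sent` are hypothetical names, and the rule that a query is sent only while the cursor sits at the subscription's version is an assumption of the model.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

using Version = std::uint64_t;

// Toy session state. All names are hypothetical, not taken from realm-core.
struct ToySession {
    Version upload_cursor = 0;          // history has been scanned for upload up to here
    std::optional<Version> pending_sub; // version of the commit holding an unsent subscription
    bool query_sent = false;

    // Advances the cursor towards `latest`, stopping at the pending
    // subscription that the caller knows about (if any).
    void upload(Version latest, std::optional<Version> known_pending) {
        assert(latest >= upload_cursor);
        Version target = latest;
        if (known_pending)
            target = std::min(target, *known_pending);
        upload_cursor = std::max(upload_cursor, target);
    }

    // Model assumption: the subscription goes out only while the cursor sits
    // exactly at its version. Once the cursor has passed it, it is skipped.
    void maybe_send_query() {
        if (pending_sub && upload_cursor == *pending_sub) {
            query_sent = true;
            pending_sub.reset();
        }
    }
};

// Replays the scenario: an upload is triggered by an earlier commit, then
// the subscription is committed at version 6 and the history grows to 7.
bool subscription_sent(bool reread_pending_before_upload) {
    ToySession s;
    s.upload_cursor = 4;
    std::optional<Version> view_at_trigger = s.pending_sub; // empty: no subscription yet
    s.pending_sub = 6;
    const Version latest = 7;

    s.upload(latest, reread_pending_before_upload ? s.pending_sub : view_at_trigger);
    s.maybe_send_query();
    return s.query_sent;
}
```

With the stale view the cursor jumps to 7 and the subscription at version 6 is never sent. Re-reading the pending subscription right before the upload caps the cursor at 6, so the query goes out.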

A second bug was uncovered while investigating this issue.

SessionWrapper reads the state of the SubscriptionStore when it is created, but this state can change after a client reset (the state of existing subscriptions may change, new subscriptions may be created, and old subscriptions may be removed). This can cause a crash with realm::KeyNotFound when trying to update a subscription that no longer exists.
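
The stale-state problem can be illustrated with a small toy model. This is only a sketch: `ToyStore`, `ToyWrapper` and `update_survives_reset` are hypothetical names, `std::out_of_range` stands in for realm::KeyNotFound, and refreshing the cached version after a reset is shown as one possible remedy, not necessarily what this PR does.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Toy stand-in for a subscription store; not the realm-core SubscriptionStore.
struct ToyStore {
    std::map<std::int64_t, std::string> states; // subscription set version -> state

    std::int64_t latest_version() const {
        assert(!states.empty());
        return states.rbegin()->first;
    }

    // std::map::at throws std::out_of_range for a missing key; in this model
    // that plays the role of realm::KeyNotFound.
    void set_state(std::int64_t version, const std::string& state) {
        states.at(version) = state;
    }

    // Stand-in for a client reset: old entries vanish, a new one appears.
    void client_reset(std::int64_t new_version) {
        states.clear();
        states[new_version] = "Pending";
    }
};

// Toy wrapper that caches the latest version once, at construction time.
struct ToyWrapper {
    ToyStore& store;
    std::int64_t cached_version;

    explicit ToyWrapper(ToyStore& s) : store(s), cached_version(s.latest_version()) {}

    void refresh() { cached_version = store.latest_version(); }
    void mark_complete() { store.set_state(cached_version, "Complete"); }
};

// Returns true if the update after a reset succeeds, false if the key is gone.
bool update_survives_reset(bool refresh_after_reset) {
    ToyStore store;
    store.states = {{1, "Complete"}, {2, "Pending"}};
    ToyWrapper wrapper(store); // caches version 2
    store.client_reset(3);     // version 2 no longer exists
    if (refresh_after_reset)
        wrapper.refresh();
    try {
        wrapper.mark_complete();
        return true;
    }
    catch (const std::out_of_range&) {
        return false;
    }
}
```

Without the refresh, the wrapper still points at version 2 and the update throws. With the refresh, it targets version 3 and succeeds.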

Fixes #7076, #7090.

☑️ ToDos

  • 📝 Changelog update
  • [ ] 🚦 Tests (or not relevant)
  • [ ] C-API, if public C++ API changed.

@coveralls-official

coveralls-official bot commented Oct 30, 2023

Pull Request Test Coverage Report for Build daniel.tabacaru_600

  • 20 of 20 (100.0%) changed or added relevant lines in 3 files are covered.
  • 96 unchanged lines in 13 files lost coverage.
  • Overall coverage decreased (-0.009%) to 91.573%

Files with Coverage Reduction                     New Missed Lines  %
src/realm/sync/instruction_applier.cpp            2                 70.42%
src/realm/sync/network/http.hpp                   2                 80.87%
src/realm/sync/noinst/server/server_history.cpp   2                 67.69%
src/realm/sync/network/network.cpp                3                 89.87%
src/realm/util/serializer.cpp                     3                 90.03%
src/realm/query_expression.hpp                    4                 93.43%
src/realm/sync/noinst/changeset_index.cpp         4                 80.97%
src/realm/sync/noinst/client_reset_operation.cpp  4                 87.23%
src/realm/unicode.cpp                             4                 90.15%
src/realm/util/assert.hpp                         4                 87.5%
Totals Coverage Status
Change from base Build 1792: -0.009%
Covered Lines: 230694
Relevant Lines: 251923

💛 - Coveralls

@danieltabacaru danieltabacaru marked this pull request as ready for review October 30, 2023 14:38
Contributor

@ironage ironage left a comment

I tested this against BAAS version 27f42f55a7944ed7d8ba9fad1854a4b22714cb8d where I had been able to repro the hang in the test "app: flx-sync basic tests" and it looks like it is fixed. Great job tracking down the issue here.
I'd suggest adding a changelog note as it may be affecting customers or SDK tests.

@danieltabacaru
Collaborator Author

danieltabacaru commented Oct 30, 2023

I'd suggest adding a changelog note as it may be affecting customers or SDK tests.

I'll definitely add changelog entries for both bugs.

@michael-wb
Contributor

michael-wb commented Oct 31, 2023

Could the failure above be related to these changes?

4: Assertion failure: /System/Volumes/Data/data/mci/cf69e96e7aa9ce3fd7f27dd3fa585338/realm-core/test/object-store/sync/flx_migration.cpp:796
4: 	 from expresion: 'active_subs.size() == 2'
4: 	 with expansion: '1 == 2'

https://spruce.mongodb.com/task/realm_core_stable_macos_1100_x64_asan_baas_integration_tests_patch_56a048d9f450445f2d52c7405f3019a533bdaa3f_65402af732f4172339970ae6_23_10_30_22_15_20/tests?execution=0&sortBy=STATUS&sortDir=ASC

@danieltabacaru
Collaborator Author

danieltabacaru commented Oct 31, 2023

Could the failure above be related to these changes?

Everything looks fine. I looked at the logs and we do send the query, but it seems the test doesn't wait for it to complete:

[2023/10/30 22:49:07.675] 4: Connection[3]: Session[7]: Sending: QUERY(query_version=3, query_size=88, query="{"Object":"(realm_id = \"migration-test\")","Object2":"(realm_id = \"migration-test\")"}", snapshot_version=18)

My theory is that there is a race and wait_for_download fires before the query is bootstrapped.

If my theory is correct, one way to fix it is:

timed_sleeping_wait_for(
    [&]() -> bool {
        return realm->get_latest_subscription_set().version() == 2;
    },
    std::chrono::seconds(60));
realm->get_latest_subscription_set()
    .get_state_change_notification(sync::SubscriptionSet::State::Complete)
    .get();

There will be no wait in most cases.
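
For readers unfamiliar with the polling step, the idea can be sketched with the standard library alone. `poll_until` below is a hypothetical stand-in written for illustration, not the test suite's `timed_sleeping_wait_for`. In particular, returning `false` on timeout is an assumption of this sketch, and the real helper may signal a timeout differently.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: re-checks `condition` until it holds or `timeout`
// elapses, sleeping `interval` between checks. Returns true if the condition
// was observed to hold, false on timeout.
inline bool poll_until(const std::function<bool()>& condition,
                       std::chrono::milliseconds timeout,
                       std::chrono::milliseconds interval = std::chrono::milliseconds(1))
{
    assert(interval.count() >= 0);
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!condition()) {
        if (std::chrono::steady_clock::now() >= deadline)
            return false;
        std::this_thread::sleep_for(interval);
    }
    return true;
}
```

The condition is checked before any sleep, so when it already holds the call returns immediately, which matches the remark that there is no wait in most cases.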

@danieltabacaru danieltabacaru merged commit 1527dc4 into master Nov 1, 2023
@danieltabacaru danieltabacaru deleted the dt/fix_flx_subscriptions branch November 1, 2023 06:42
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024
Successfully merging this pull request may close these issues.

FLX subscriptions are not sent to the server
3 participants