Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async operations should terminate with an error if a fatal session error occurs #5315

Closed
nirinchev opened this issue Mar 8, 2022 · 7 comments · Fixed by #7073
Closed

Async operations should terminate with an error if a fatal session error occurs #5315

nirinchev opened this issue Mar 8, 2022 · 7 comments · Fixed by #7073
Assignees

Comments

@nirinchev
Copy link
Member

All async wait operations - Realm::get_synchronized_realm,wait_for_download_completion, get_state_change_notification will wait forever in the event of a session error. We should instead cancel the wait and propagate the error back to the waiter so that they can handle it. To repro, make an additive schema change when dev mode is off and call wait_for_upload_completion. The logs should say something like:

ending session with error: integrating changesets failed: additive schema change: adding schema for Realm table "Board", additive changes from clients are restricted when developer mode is disabled (ProtocolErrorCode=201)

But the wait callback will not be invoked.

@sync-by-unito
Copy link

sync-by-unito bot commented Feb 7, 2023

➤ Jonathan Reams commented:

So I'm looking at the implementation of SyncSession::handle_error(). At the end of the function, before we call the error handler, if the result of the big switch statement is that next_state is None, then we check if SyncConfig::cancel_waits_on_nonfatal_error and cancel all the pending waits if it is or do nothing if it isn't. If the next_state is Inactive, then we tear down the SyncSession which should also cancel all the waits, otherwise if the error is fatal and the next_state is Error we unconditionally cancel all waits. We don't cancel any pending FLX subscription state change promises, but we easily could.

I've written this integration test and it appears the test case does what we want already

TEST_CASE("app: sync error cancels pending waits", "[sync][app]") {
    std::string base_url = get_base_url();
    const std::string valid_pk_name = "_id";
    REQUIRE(!base_url.empty());

    std::vector<ObjectSchema> schema{
        {"TopLevel",
         {
             {valid_pk_name, PropertyType::ObjectId, Property::IsPrimary{true}},
         }},
    };

    auto server_app_config = minimal_app_config(base_url, "set_new_embedded_object", {schema});
    server_app_config.dev_mode_enabled = false;
    TestAppSession app_session(create_app(server_app_config));
    auto partition = random_string(100);

    create_user_and_log_in(app_session.app());
    SyncTestFile test_file(app_session.app()->current_user(), partition, schema);
    auto [error_occured_promise, error_occurred] = util::make_promise_future<void>();
    test_file.sync_config->error_handler = [promise = util::CopyablePromiseHolder(std::move(error_occured_promise))](
                                               std::shared_ptr<SyncSession>, SyncError) mutable {
        promise.get_promise().emplace_value();
    };

    auto realm = Realm::get_shared_realm(test_file);
    wait_for_download(*realm);

    realm->sync_session()->pause();
    auto [download_complete_promise, download_complete] = util::make_promise_future<void>();
    realm->sync_session()->wait_for_upload_completion(
        [promise = std::move(download_complete_promise)](std::error_code) mutable {
            promise.emplace_value();
        });

    schema[0] = {"TopLevel",
                 {
                     {valid_pk_name, PropertyType::ObjectId, Property::IsPrimary{true}},
                     {"other_col", PropertyType::Int | PropertyType::Nullable},
                 }};

    realm->update_schema(schema, 2);

    realm->sync_session()->resume();
    download_complete.get();
    error_occurred.get();
}

Is all that's remaining here to make sure that waits for flexible sync subscription state changes also get canceled? Do we want to revisit the SyncConfig parameter that makes it so non-fatal errors don't cancel waits by default?

@nirinchev
Copy link
Member Author

Hm, looks like this has changed since I filed the ticket and indeed now async operations are canceled on fatal errors, which is great! I think for consistency's sake, we should probably also do it for flx subscriptions as a fatal error would mean your subscription change will never go through and there's no point in waiting.

@sync-by-unito
Copy link

sync-by-unito bot commented Jul 5, 2023

➤ michael-wb commented:

The canceling of pending subscription waiters when an error occurs was updated as part of the PBS->FLX migration feature. I will be adding some tests as part of this task to verify they are working as expected.

@sync-by-unito
Copy link

sync-by-unito bot commented Oct 23, 2023

➤ danieltabacaru commented:

I ran into a similar issue and the pending subscription notifications are not cancelled actually. That can be easily fixed. On another note, most (if not all) fatal errors transition the sync session state to inactive and not error. I think ApplicationBug and ProtocolViolation should be treated as fatal (that's an oversight from the sync session error refactoring).

Completion callbacks are canceled also when the session becomes inactive (pretty much whenever an error is received as explained above or if the user pauses the session), so should the subscription callbacks also be canceled in this case? cc [~[email protected]]

@sync-by-unito
Copy link

sync-by-unito bot commented Oct 26, 2023

➤ nirinchev commented:

I guess? I mean from a user's perspective, if something happens that would prevent the completion callback from ever firing and we know that it'll never fire, then we should notify the waiters that this is the case.

@sync-by-unito
Copy link

sync-by-unito bot commented Oct 26, 2023

➤ danieltabacaru commented:

I would cancel the subscription "callbacks" because the user may use the promise/future api (unlike waiting for the callback to be called) and be blocked until the session is resumed. The completion callbacks don't actually block the user, it's just annoying that they would not be called if they weren't cancelled. You can argue that when the session is paused, the completion callbacks could be de-registered and registered again when the session is resumed (currently the user has to register them again). They would have to register the notification for the subscription again too (but at least there is some uniformity).

@sync-by-unito
Copy link

sync-by-unito bot commented Oct 26, 2023

➤ danieltabacaru commented:

we can't really know if the callbacks would fire (I guess that's why we play it safe and cancel them when the session becomes inactive)

@sync-by-unito sync-by-unito bot assigned danieltabacaru and unassigned michael-wb Oct 30, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants