Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Ensure hard-delete mode doesn't delete data it shouldn't when ACTIVATE_VERSION messages are processed #391

Merged
merged 3 commits into from
Jul 30, 2024

Conversation

sebastianswms
Copy link
Contributor

@sebastianswms sebastianswms commented Jul 19, 2024

What hard delete was previously doing wrong

As discussed in #340, hard delete deleted data that it shouldn't. This was caused by a <= where there should have been a <.

So why were the tests passing?

All ACTIVATE_VERSION messages were run before any records were synced. This is because RECORD messages were being batched until the end of the sync, whereas ACTIVATE_VERSION messages were processed as they arrived.

The intent of test_activate_version_hard_delete was to:

  1. Sync seven records
  2. Check that seven records were synced
  3. Add two records "manually"
  4. Check that there are nine records in total
  5. Sync the same seven records, with ACTIVATE_VERSION removing the two manual records.
  6. Check that seven records remain

Here's what was really happening:

  1. Sync seven records
  2. Seven records were synced? ✅
  3. Add two records "manually"
  4. There are nine records in total? ✅
  5. ACTIVATE_VERSION deletes all nine records and then syncs the same seven records again.
  6. Seven records remain? ⚠️ Yes, but not for the correct reason

My fix

I don't know if I fully understand the ACTIVATE_VERSION specification, so please correct me, but here's my change.

Drain the sink immediately before an ACTIVATE_VERSION message is processed for a sink. This means:

  1. The target still batches records when possible.
  2. ACTIVATE_VERSION messages are still processed as they arrive.
  3. Importantly, ACTIVATE_VERSION messages correctly apply to all preceding records.

Closes #340

@sebastianswms sebastianswms marked this pull request as ready for review July 19, 2024 20:10
@edgarrmondragon
Copy link
Member

I plan to review this on the week of the 29th

@edgarrmondragon
Copy link
Member

The intent of test_activate_version_hard_delete was to:

1. Sync seven records

2. Check that seven records were synced

3. Add two records "manually"

4. Check that there are nine records in total

5. Sync the same seven records, with ACTIVATE_VERSION removing the two manual records.

6. Check that seven records remain

Here's what was really happening:

1. Sync seven records

2. Seven records were synced? ✅

3. Add two records "manually"

4. There are nine records in total? ✅

5. ACTIVATE_VERSION deletes all nine records and then syncs the same seven records again.

6. Seven records remain? ⚠️ Yes, but not for the correct reason

This is a great analysis of what's happening under the hood 🙏

Comment on lines +327 to +334
if self._pending_batch:
self.logger.info(
"An activate version message for '%s' was received. Draining...",
self.stream_name,
)
draining_status = self.start_drain()
self.process_batch(draining_status)
self.mark_drained()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm taking a note to make sure we do this upstream: meltano/sdk#2576

@edgarrmondragon edgarrmondragon changed the title fix: hard delete doesn't delete data it shouldn't fix: Ensure hard-delete mode doesn't delete data it shouldn't Jul 30, 2024
@edgarrmondragon edgarrmondragon changed the title fix: Ensure hard-delete mode doesn't delete data it shouldn't fix: Ensure hard-delete mode doesn't delete data it shouldn't when ACTIVATE_VERSION messages are processed Jul 30, 2024
@edgarrmondragon edgarrmondragon merged commit c182c93 into MeltanoLabs:main Jul 30, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Turning on hard_delete seems to delete more data than intended
2 participants