-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: refactor delete dag #27661
base: master
Are you sure you want to change the base?
chore: refactor delete dag #27661
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Summary
This PR removes local storage dependencies from the person deletion DAG by implementing direct data piping between PostgreSQL and ClickHouse, making it more suitable for Kubernetes deployments.
- Added chunked processing in
/dags/deletes.py
with 10,000 record batches for efficient PostgreSQL to ClickHouse transfers - Implemented ClickHouse dictionary creation in
/dags/deletes.py
for optimized person deletion lookups - Added comprehensive mocking of ClickHouse client in
/dags/tests/test_deletes.py
to verify direct database interactions - Introduced versioned table/dictionary naming with
get_versioned_names()
for better cleanup and isolation - Added one-hour timeout setting for large deletion operations in
delete_person_events
asset
2 file(s) reviewed, 2 comment(s)
Edit PR Review Bot Settings | Greptile
dags/tests/test_deletes.py
Outdated
mock_clickhouse_client.execute.assert_called_once_with( | ||
f""" | ||
INSERT INTO {expected_names["table"]} (team_id, person_id, created_at) | ||
VALUES | ||
""", | ||
expected_data, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: SQL query is missing VALUES clause formatting - could cause syntax error in ClickHouse. Should include parentheses around values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this had me second guessing myself
https://clickhouse-driver.readthedocs.io/en/latest/quickstart.html#inserting-data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
* master: (103 commits) feat(postgres-estimated-rows): pg Estimated Rows on Data Warehouse Sync (#27634) fix: revert darkmode class toggle, updated content on fills (#27783) chore: upgrade posthog-js (#27790) chore(editor-3001): add back join actions (#27740) feat: Add person distinct ID overrides squash job (as dagster job) (#27710) fix(created-by-sources): Adding `created_by` to sources (#27751) Revert "feat(data-warehouse): V2 pipeline release " (#27791) fix: typo for feature flags (#27786) fix(defer-unmounting): Defer unmounting of react elements (#27742) feat(data-warehouse): V2 pipeline release (#27732) fix(data-warehouse): Ensure dates are actual datetime formats (#27777) fix: enable hot reload for the products dir (#27746) fix: assignee selector when null (#27737) chore: clarify rrweb imports (#27776) chore(deps): Update posthog-js to 1.207.3 (#27779) feat(retention): filters on start/return event (#27770) fix(experiments): only show supported math functions (#27589) feat(web-analytics): Set unique conversions graph when adding conversions goal (#27774) chore: color design system part 1: banner and accents (#27756) chore(experiments): Add tests for funnel attribution options (#27752) ...
Problem
Changes
no local storage
👉 Stay up-to-date with PostHog coding conventions for a smoother review.
Does this work well for both Cloud and self-hosted?
How did you test this code?