Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: refactor delete dag #27661

Open
wants to merge 14 commits into
base: master
Choose a base branch
from
Open

chore: refactor delete dag #27661

wants to merge 14 commits into from

Conversation

fuziontech
Copy link
Member

@fuziontech fuziontech commented Jan 18, 2025

Problem

image

Changes

no local storage

👉 Stay up-to-date with PostHog coding conventions for a smoother review.

Does this work well for both Cloud and self-hosted?

How did you test this code?

@fuziontech fuziontech changed the title chore: rework dag to not use storage in between steps chore: rework delete dag to not use storage in between steps Jan 18, 2025
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

This PR removes local storage dependencies from the person deletion DAG by implementing direct data piping between PostgreSQL and ClickHouse, making it more suitable for Kubernetes deployments.

  • Added chunked processing in /dags/deletes.py with 10,000 record batches for efficient PostgreSQL to ClickHouse transfers
  • Implemented ClickHouse dictionary creation in /dags/deletes.py for optimized person deletion lookups
  • Added comprehensive mocking of ClickHouse client in /dags/tests/test_deletes.py to verify direct database interactions
  • Introduced versioned table/dictionary naming with get_versioned_names() for better cleanup and isolation
  • Added one-hour timeout setting for large deletion operations in delete_person_events asset

2 file(s) reviewed, 2 comment(s)
Edit PR Review Bot Settings | Greptile

Comment on lines 70 to 75
mock_clickhouse_client.execute.assert_called_once_with(
f"""
INSERT INTO {expected_names["table"]} (team_id, person_id, created_at)
VALUES
""",
expected_data,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: SQL query is missing VALUES clause formatting - could cause syntax error in ClickHouse. Should include parentheses around values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dags/deletes.py Outdated Show resolved Hide resolved
Copy link
Contributor

@Daesgar Daesgar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

* master: (103 commits)
  feat(postgres-estimated-rows): pg Estimated Rows on Data Warehouse Sync (#27634)
  fix: revert darkmode class toggle, updated content on fills (#27783)
  chore: upgrade posthog-js (#27790)
  chore(editor-3001): add back join actions (#27740)
  feat: Add person distinct ID overrides squash job (as dagster job) (#27710)
  fix(created-by-sources): Adding `created_by` to sources (#27751)
  Revert "feat(data-warehouse): V2 pipeline release " (#27791)
  fix: typo for feature flags (#27786)
  fix(defer-unmounting): Defer unmounting of react elements (#27742)
  feat(data-warehouse): V2 pipeline release (#27732)
  fix(data-warehouse): Ensure dates are actual datetime formats (#27777)
  fix: enable hot reload for the products dir (#27746)
  fix: assignee selector when null (#27737)
  chore: clarify rrweb imports (#27776)
  chore(deps): Update posthog-js to 1.207.3 (#27779)
  feat(retention): filters on start/return event (#27770)
  fix(experiments): only show supported math functions (#27589)
  feat(web-analytics): Set unique conversions graph when adding conversions goal (#27774)
  chore: color design system part 1: banner and accents (#27756)
  chore(experiments): Add tests for funnel attribution options (#27752)
  ...
@fuziontech fuziontech changed the title chore: rework delete dag to not use storage in between steps chore: rework delete dag to not stage in parquet Jan 23, 2025
@fuziontech fuziontech changed the title chore: rework delete dag to not stage in parquet chore: refactor delete dag Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants