Support benchmarks #35

Merged (27 commits), Jan 7, 2025

Collaborator

@leo-mazzone leo-mazzone commented Dec 30, 2024

Context

Changes proposed in this pull request

  • Generalised the generation of dummy probabilities to support dedupers. Base edges are now generated deterministically, and extra edges are chosen from among all possible edges so that no edge is duplicated
  • Moved factories.py out of test, into src/common (because it's now used by the Postgres server benchmarks)
  • Created sub-module for benchmarking Postgres
  • Added script that outputs schema initialisation SQL
  • Added script that generates all 6 Postgres tables
  • Moved test_transform.py under common tests
  • Fixed several bugs
  • Added various tests
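
As a rough illustration of the first bullet, the edge generation could look something like the sketch below. This is not the code in factories.py: the function name `generate_dummy_edges`, its parameters, the chain and positional-pairing rules for base edges, and the 0.5 to 1.0 probability range are all assumptions made for the example.

```python
import itertools
import random


def generate_dummy_edges(left, right, n_extra, seed=0):
    """Return unique (left, right, probability) edges.

    Hypothetical helper. Pass the same list as left and right for a deduper;
    pass two disjoint lists for a linker. IDs are assumed unique per list.
    """
    if left == right:
        # Deduper: every unordered pair is a candidate, so no self-references.
        candidates = list(itertools.combinations(left, 2))
        # Deterministic base edges (assumed rule): chain consecutive nodes.
        base = list(zip(left, left[1:]))
    else:
        # Linker: candidates are all left-right pairs.
        candidates = list(itertools.product(left, right))
        # Deterministic base edges (assumed rule): pair nodes by position.
        base = list(zip(left, right))

    base_set = set(base)
    remaining = [edge for edge in candidates if edge not in base_set]
    if n_extra > len(remaining):
        raise ValueError("n_extra exceeds the number of unused possible edges")

    rng = random.Random(seed)
    # Sampling without replacement from the unused edges avoids duplicates.
    extra = rng.sample(remaining, n_extra)
    return [(l, r, rng.randint(50, 100) / 100) for l, r in base + extra]


dedupe_edges = generate_dummy_edges([1, 2, 3, 4], [1, 2, 3, 4], n_extra=2)
link_edges = generate_dummy_edges(["a", "b", "c"], ["x", "y", "z"], n_extra=3)
```

Drawing the extra edges from the pool of edges not already used as base edges is what guarantees uniqueness; sampling pairs independently would need a retry loop instead.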

Guidance to review

  • The logic for generating probabilities has changed so that its output can be used to produce tables that conform to the ORM constraints and that work with the cluster-generation logic. We still need checks on the input to cluster generation: no duplicate edges, no self-references and, for linking, possibly that the graph is bipartite. Open question: at what point should these checks run?
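
One possible shape for the input checks mentioned above is sketched here. It is only an illustration: `validate_edges` and its parameters are hypothetical names, and treating `(a, b)` and `(b, a)` as the same edge is an assumption rather than something this PR decides.

```python
def validate_edges(edges, left_ids=None, right_ids=None):
    """Raise ValueError if edges are unsuitable as cluster-generation input.

    Hypothetical helper. edges is an iterable of (left, right) pairs.
    For linking, pass left_ids and right_ids to also check bipartiteness.
    """
    linking = left_ids is not None and right_ids is not None
    if linking:
        left_ids, right_ids = set(left_ids), set(right_ids)
        if left_ids & right_ids:
            raise ValueError("left and right ID sets overlap")

    seen = set()
    for left, right in edges:
        if left == right:
            raise ValueError(f"self-reference: {left!r}")
        # Assumption: edges are undirected, so (a, b) duplicates (b, a).
        key = frozenset((left, right))
        if key in seen:
            raise ValueError(f"duplicate edge: {(left, right)!r}")
        seen.add(key)
        # Bipartite check: each edge must go from the left set to the right set.
        if linking and (left not in left_ids or right not in right_ids):
            raise ValueError(f"edge {(left, right)!r} does not cross sides")


# Passes silently for a valid linking input.
validate_edges([("a", "x"), ("b", "y")], left_ids={"a", "b"}, right_ids={"x", "y"})
```

Where to call something like this (at generation time, or at the start of cluster generation) is the open question raised in the guidance.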

Checklist:

  • My code follows the style guidelines of this project
  • New and existing unit tests pass locally with my changes

@leo-mazzone leo-mazzone marked this pull request as ready for review January 3, 2025 14:11
Collaborator

@wpfl-dbt wpfl-dbt left a comment

Looking good.

Is it worth using mb.query() or mb.match() to test that the dummy data behaves as we expect?

Collaborator

@wpfl-dbt wpfl-dbt left a comment

Approved pending unit tests passing.

Collaborator

@wpfl-dbt wpfl-dbt left a comment

Happy with new changes

@leo-mazzone leo-mazzone merged commit bd52235 into feature/new-ingest-process Jan 7, 2025
3 checks passed
@leo-mazzone leo-mazzone deleted the support_benchmarks branch January 7, 2025 08:34