Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CommunityAggregatedFields #886

Merged
merged 3 commits into from
Oct 20, 2023
Merged

Add CommunityAggregatedFields #886

merged 3 commits into from
Oct 20, 2023

Conversation

anttimaki
Copy link
Collaborator

No description provided.

@anttimaki
Copy link
Collaborator Author

anttimaki commented Oct 3, 2023

Current dependencies on/for this PR:

This comment was auto-generated by Graphite.

@anttimaki
Copy link
Collaborator Author

TODO: ensure the fields are updated in an efficient manner, add tests, doc strings, celery job etc.

@anttimaki anttimaki requested a review from MythicManiac October 3, 2023 13:07
@codecov
Copy link

codecov bot commented Oct 3, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (4a901b7) 92.21% compared to head (b086bca) 92.23%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #886      +/-   ##
==========================================
+ Coverage   92.21%   92.23%   +0.02%     
==========================================
  Files         270      271       +1     
  Lines        7741     7778      +37     
  Branches      737      739       +2     
==========================================
+ Hits         7138     7174      +36     
- Misses        501      502       +1     
  Partials      102      102              
Files Coverage Δ
django/conftest.py 97.12% <100.00%> (+0.01%) ⬆️
...understore/api/cyberstorm/serializers/community.py 100.00% <100.00%> (ø)
...jango/thunderstore/community/context_processors.py 73.68% <ø> (-1.32%) ⬇️
django/thunderstore/community/tasks.py 100.00% <100.00%> (ø)
...store/frontend/api/experimental/views/frontpage.py 100.00% <100.00%> (ø)
django/thunderstore/community/models/community.py 93.23% <97.36%> (+0.37%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@anttimaki anttimaki force-pushed the aggregated-fields branch 3 times, most recently from f78d19c to 52e46d3 Compare October 6, 2023 08:19
@anttimaki anttimaki marked this pull request as ready for review October 6, 2023 08:30
@anttimaki
Copy link
Collaborator Author

@MythicManiac ready for review. Since last you checked I made the batch_size optional, added babby's first celery task in a true monkey-see-monkey-do manner, and updated the linting warning commit.

@anttimaki anttimaki changed the title WIP: RFC for how to handle computationally heavy fields updated by background processes Add CommunityAggregatedFields Oct 6, 2023
Copy link
Member

@MythicManiac MythicManiac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments don't do a good job at outlining the important parts, so summarized:

Important

  • If this PR were to be merged as-is and deployed, functionality currently in production would break. Address this e.g. by setting up a task schedule with a data migration and/or creating a migration that populates initial data with the current prod equivalent.
  • Current implementation of the aggregation isn't very memory-usage friendly & will eventually hit resource limits in prod (if not immediately)
  • Not as critical, but splitting the aggregation into two phases (one for creation, one for updating) should simplify the code quite a bit

django/thunderstore/community/tasks.py Outdated Show resolved Hide resolved
django/thunderstore/community/tasks.py Outdated Show resolved Hide resolved
django/thunderstore/community/models/community.py Outdated Show resolved Hide resolved
django/thunderstore/community/models/community.py Outdated Show resolved Hide resolved
Copy link
Member

@MythicManiac MythicManiac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly seems good. I'd not consider the issues I commented on blockers individually since they could be addressed e.g. in a separate PR, but combined together I feel this could use at least one more round of changes.

Computationally heavy fields, package_count and download_count will be
updated periodically by a background task from now on. They are stored
in a separate model to make it clearer they're not regular fields.

To save memory and improve parallelization, updates are done in steps:
first the related object is created for all Communities that doesn't
have one yet, and then the fields are updated for each Community
separately.

Existing queryset helpers are used to avoid duplicating the logic for
which packages and versions should be included in the calculations.
This means more database operations, but it was deemed a fair tradeoff
since the updates are done as background tasks.

The values should be normally accessed via .aggregated getter to avoid
errors if the Community doesn't have the values computed yet.
.aggregated_fields relation is also available e.g. when working with
QuerySets.

Refs TS-1850
The task is scheduled to run once per hour, on a minute mark that
doesn't clash with other currently scheduled tasks.

Refs TS-1850
Copy link
Member

@MythicManiac MythicManiac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me 👍

@anttimaki anttimaki merged commit 182d423 into master Oct 20, 2023
28 checks passed
@anttimaki anttimaki deleted the aggregated-fields branch October 20, 2023 13:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants