Admin DB Cleanup for TmWorkspaces #2

Merged
merged 11 commits into from
Mar 4, 2025


porterbot
Contributor

These are model changes for a new admin function in the GUI for cleaning up large TmWorkspaces. It first searches the database for the largest workspaces, then lets the user select which workspaces to remove. It also fixes an issue with workspace deletion so that it targets the actual neuron collection where the workspace's neurons are stored.

This is intended for HortaCloud admins who need to clean up their Mongo database by removing large fragment workspaces that may be taking up a lot of disk space.

@porterbot porterbot requested review from krokicki, cgoina and olbris March 4, 2025 15:16
Member

@krokicki krokicki left a comment


Parallel aggregation is a nice touch!

@@ -13,6 +13,7 @@
import java.util.stream.Collectors;

import com.google.common.collect.ImmutableList;
import com.mongodb.WriteConcern;
Member

Unintentionally added?

Contributor Author

Oh, I just forgot to remove it. I added it while trying to speed up the bulk delete, but in the end it didn't make the delete any faster. I can remove this unused import.

// Step 1: Get all accessible workspaces using existing method
List<TmWorkspace> workspaces = getAllTmWorkspaces(subjectKey);

List<TmWorkspaceInfo> workspaceInfoList = new ArrayList<>();
Member

Why is this new model necessary? Why not just return the TmWorkspace objects themselves? That would make this method more generally usable, and it wouldn't require expanding the model.

Contributor Author

I'm calculating the BSON size of the workspace by totaling the document sizes of its associated neurons. I could add this size to TmWorkspace, but since it's calculated dynamically, it doesn't make sense to persist it.
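
To make the trade-off discussed above concrete, here is a minimal sketch of a transient size holder kept separate from the persisted workspace model. It is not the code from this PR: the class `WorkspaceSizeSketch`, the holder `WorkspaceSizeInfo`, the method `largestWorkspaces`, and the input map of per-neuron document sizes are all hypothetical names chosen for illustration, and the sizes are assumed to have been read from the database already.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: names and fields are illustrative, not the PR's actual classes.
final class WorkspaceSizeSketch {

    // Transient view of a workspace: the size is computed on demand and never persisted.
    static final class WorkspaceSizeInfo {
        final String workspaceId;
        final long totalBytes;

        WorkspaceSizeInfo(String workspaceId, long totalBytes) {
            this.workspaceId = workspaceId;
            this.totalBytes = totalBytes;
        }
    }

    // Sums the per-neuron document sizes of each workspace in parallel, then
    // returns the `limit` largest workspaces, biggest first.
    static List<WorkspaceSizeInfo> largestWorkspaces(
            Map<String, List<Long>> neuronSizesByWorkspace, int limit) {
        return neuronSizesByWorkspace.entrySet().parallelStream()
                .map(e -> new WorkspaceSizeInfo(
                        e.getKey(),
                        e.getValue().stream().mapToLong(Long::longValue).sum()))
                .sorted(Comparator.comparingLong(
                        (WorkspaceSizeInfo w) -> w.totalBytes).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumed input: workspace id -> sizes in bytes of its neuron documents.
        Map<String, List<Long>> sizes = new LinkedHashMap<>();
        sizes.put("ws-a", Arrays.asList(1_000L, 2_500L));
        sizes.put("ws-b", Arrays.asList(40_000L, 10_000L, 5_000L));
        sizes.put("ws-c", Arrays.asList(300L));

        for (WorkspaceSizeInfo info : largestWorkspaces(sizes, 2)) {
            System.out.println(info.workspaceId + " " + info.totalBytes);
        }
    }
}
```

Keeping the computed size in its own holder means the stored workspace document does not gain a field that would go stale as neurons are added or removed. The cost, as the review comment points out, is one more type for callers to handle.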

@porterbot porterbot merged commit f7c2c9b into master Mar 4, 2025
2 checks passed