Admin DB Cleanup for TmWorkspaces #2

porterbot · 2025-03-04T15:16:22Z

These are model changes for a new admin function in the GUI for cleaning up large TmWorkspaces. It first searches the database for the largest workspaces, then lets the user select workspaces for removal. It also fixes an issue with workspace deletion so that it targets the actual neuron collection where the workspace's neurons were stored.

This is intended for HortaCloud admins who need to clean up their Mongo database by removing large fragment workspaces that might be taking up a lot of space on disk.

krokicki

Parallel aggregation is a nice touch!

krokicki · 2025-03-04T15:23:46Z

jacs-model-access/src/main/java/org/janelia/model/access/domain/dao/mongo/MongoDaoHelper.java

@@ -13,6 +13,7 @@
 import java.util.stream.Collectors;

 import com.google.common.collect.ImmutableList;
+import com.mongodb.WriteConcern;


Unintentionally added?

Oh, I just forgot to remove it. I added it while trying to speed up the bulk delete, but in the end it didn't make it any faster. I'll remove this unused import.

krokicki · 2025-03-04T15:32:53Z

...odel-access/src/main/java/org/janelia/model/access/domain/dao/mongo/TmWorkspaceMongoDao.java

+        // Step 1: Get all accessible workspaces using existing method
+        List<TmWorkspace> workspaces = getAllTmWorkspaces(subjectKey);
+
+        List<TmWorkspaceInfo> workspaceInfoList = new ArrayList<>();


Why is this new model necessary? Why not just return the TmWorkspace objects themselves? That would make this method more generally usable, and it wouldn't require expanding the model.

I'm calculating the BSON size of the workspace by totaling the document sizes of all its associated neurons. I could add this size to TmWorkspace, but since it's calculated dynamically, it doesn't make sense to persist it.
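For illustration, here is a minimal sketch of that design, not the code in this PR. It assumes the per-neuron document sizes have already been fetched from the database and are passed in as a plain map; `WorkspaceSizeSketch`, `WorkspaceSizeInfo`, and `largestWorkspaces` are hypothetical names. The computed size lives only on a transient holder object, so nothing is added to the persisted workspace model.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: class and method names are illustrative, not the PR's API.
final class WorkspaceSizeSketch {

    // Transient holder pairing a workspace id with its computed size.
    // It is built on demand and never written back to the database.
    static final class WorkspaceSizeInfo {
        final String workspaceId;
        final long totalBytes;

        WorkspaceSizeInfo(String workspaceId, long totalBytes) {
            this.workspaceId = workspaceId;
            this.totalBytes = totalBytes;
        }
    }

    // neuronDocSizes maps a workspace id to the byte size of each of its
    // neuron documents (assumed input; a real DAO would query these).
    // Returns at most `limit` workspaces, largest total size first.
    static List<WorkspaceSizeInfo> largestWorkspaces(
            Map<String, List<Long>> neuronDocSizes, int limit) {
        return neuronDocSizes.entrySet()
                .parallelStream() // total each workspace independently
                .map(e -> new WorkspaceSizeInfo(
                        e.getKey(),
                        e.getValue().stream().mapToLong(Long::longValue).sum()))
                .sorted(Comparator
                        .comparingLong((WorkspaceSizeInfo i) -> i.totalBytes)
                        .reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

A caller would pass the sizes it collected and a limit, for example `WorkspaceSizeSketch.largestWorkspaces(sizes, 10)`, and show the returned list to the admin for selection.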

porterbot added 10 commits February 26, 2025 12:16

added method to search for largest TmWorkspaces by size and return

e0fc2f8

added logging to analyze workspace search

1602157

added filter to improve performance of logging

0b2d6f3

fixed query to return results

5929749

fixed query for classcast exception

befa0ac

added parallel query processing

ae65b14

moved result to an object to simplify transport

8a2794c

added default constructor

5e26ed8

fixed delete neurons to remove from workspace neuron collection

f4b1c64

optimized delete query and added GridFS boundingbox removal

9923746

porterbot requested review from krokicki, cgoina and olbris March 4, 2025 15:16

krokicki approved these changes Mar 4, 2025


removed unused import

126ed70

porterbot merged commit f7c2c9b into master Mar 4, 2025
2 checks passed
