Commit
[builder] refactor builder to utilize Dask (#964)
* first cut at fixed budget anndata handling
* memory
* refactor consolidate
* checkpoint refactoring for memory budget
* always have at least one worker
* smaller strides
* improve memory diagnostics
* autoupdate precommit modules
* fix bug in no-consolidate
* update test to match new manifest field requirements
* remove unused code
* further memory budget refinement and tuning
* add missing __len__ to AnnDataProxy
* further memory usage reduction
* preserve column ordering in dataframe loading
* comments and cleanup
* add extra verbose logging level
* back out parallel consolidation for now
* added a todo reminder
* a few more memory tuning tweaks
* simplify open_anndata interface
* pr review
* clean up logger
* lint
* snapshot initial dask explorations
* pr feedback
* additional dask refactoring
* fix empty slice bug
* additional refactoring to use dask
* refine async consolidator
* checkpoint progress
* additional X layer processing refinement
* fix pytest
* fix mocks in test
* update package deps for builder
* comment
* improve dataset shuffle
* tuning
* update to latest tiledb
* update to latest tiledb
* cleanup
* additional scale updates
* fix numpy cast error
* shorten step count for async consolidator
* additional cleanup
* update to latest cellxgene_census
* update tiledbsoma dep
* lint
* tune thread count cap
* update to latest tiledbsoma
* lint
* remove debugging code
Bruce Martin authored Feb 23, 2024 (1 parent a4cdcf4, commit 38fff2d)
Showing 21 changed files with 1,044 additions and 940 deletions.
2 changes: 0 additions & 2 deletions
tools/cellxgene_census_builder/src/cellxgene_census_builder/build_soma/__init__.py
Original file line number | Diff line number | Diff line change |
---|---|---|
| | | `@@ -1,7 +1,5 @@` |
| | | `from .build_soma import build` |
| | | `from .validate_soma import validate` |
| | | |
| | | `__all__ = [` |
| | | `"build",` |
| | | `"validate",` |
| | | `]` |