[Epic] Preprocessing plugins #24

bcebere · 2023-02-01T10:56:52Z

Description

Add preprocessing plugins for scaling or dimensionality reduction.

Why?

Epics require a lot of work and often require a change in the scope of development. Justify your epic - why can't it just be a simple issue?

Breakdown

Provide a bulleted or numbered list of how you might break this epic down into smaller issues.

drop constant features - TODO
handle multicollinearity - TODO
drop low variance features - TODO
encode data
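The four steps above could be sketched roughly as follows. This is only an illustration using the Python standard library, not the project's plugin interface: the column-oriented `Table` type, every function name, and the threshold values are hypothetical choices made for the example. One-hot encoding stands in for "encode data", and a greedy pairwise-correlation filter stands in for "handle multicollinearity"; the issue does not say which techniques are intended.

```python
from math import sqrt
from statistics import pvariance
from typing import Dict, List

# Hypothetical column-oriented table: column name -> numeric values.
Table = Dict[str, List[float]]


def drop_constant(table: Table) -> Table:
    """Drop columns in which every value is identical."""
    return {name: col for name, col in table.items() if len(set(col)) > 1}


def drop_low_variance(table: Table, threshold: float) -> Table:
    """Keep only columns whose population variance exceeds `threshold`."""
    return {name: col for name, col in table.items() if pvariance(col) > threshold}


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denom == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def drop_collinear(table: Table, max_abs_corr: float) -> Table:
    """Greedy filter: keep a column only if it is not strongly correlated
    with any column that has already been kept."""
    kept: Table = {}
    for name, col in table.items():
        if all(abs(pearson(col, other)) <= max_abs_corr for other in kept.values()):
            kept[name] = col
    return kept


def one_hot(name: str, values: List[str]) -> Table:
    """Encode a categorical column as one 0/1 column per category."""
    return {
        f"{name}={category}": [1.0 if v == category else 0.0 for v in values]
        for category in sorted(set(values))
    }


def preprocess(table: Table) -> Table:
    """Chain the numeric steps; the thresholds are arbitrary example values."""
    table = drop_constant(table)
    table = drop_low_variance(table, threshold=1e-3)
    return drop_collinear(table, max_abs_corr=0.95)


raw: Table = {
    "const": [1.0, 1.0, 1.0, 1.0],      # constant feature
    "a": [1.0, 2.0, 3.0, 4.0],
    "a_twice": [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with "a"
    "noise": [0.0, 0.01, 0.0, 0.01],    # near-zero variance
    "b": [4.0, 1.0, 3.0, 2.0],
}
cleaned = preprocess(raw)               # keeps only "a" and "b"
color = one_hot("color", ["red", "blue", "red"])
```

Because the collinearity filter is greedy, it depends on column order: the first column of a correlated group is the one that survives. A real plugin would likely also need to separate fitting from transforming so that the columns chosen on training data are reused at inference time.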


bcebere added the enhancement New feature or request label Feb 1, 2023

DrShushen transferred this issue from another repository Mar 3, 2023

DrShushen added a commit that referenced this issue Mar 3, 2023

Update LICENSE (#24)

b916bb2

DrShushen added the Epic label May 25, 2023

DrShushen added enhancement New feature or request and removed enhancement New feature or request labels Sep 13, 2023
