Developer continuous v3 #95

Closed
wants to merge 74 commits into from

Conversation

enryH
Member

@enryH enryH commented Jun 18, 2024

No description provided.

ri-heme and others added 30 commits January 5, 2023 16:26
- new preprocessing
- work on cont. associations
- checking different config settings.

Highlights:
- preprocessing.py: feature_min_max function
- perturbations.py:
perturb...extended:
change target_dataset feature by feature for min/max
- identify_associations.py: branched flow depending on target_value
- *preprocessing.py:* feature_min_max to feature_stats (added std)
- *perturbations.py:* added std (1 for now) feature by feature

- *identify_associations.py:*
added predefined list of cont target values
It is used in:
- encode data (show before and after preprocessing)
- identify associations (plot each feature after perturbation)
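The min/max/std perturbation described in the commits above could look roughly like the sketch below. It is not the code from this PR: only `feature_stats` is a name taken from the commit messages, while its signature, `perturb_feature`, the `TARGET_VALUES` entries, and the use of the population standard deviation are assumptions made for illustration.

```python
import statistics
from typing import List, Sequence, Tuple

# Hypothetical stand-in for the predefined list of continuous target values.
TARGET_VALUES = ("minimum", "maximum", "plus_std", "minus_std")


def feature_stats(column: Sequence[float]) -> Tuple[float, float, float]:
    """Return (min, max, std) of one feature column (population std assumed)."""
    return min(column), max(column), statistics.pstdev(column)


def perturb_feature(
    rows: Sequence[Sequence[float]], feature_idx: int, target_value: str
) -> List[List[float]]:
    """Return a copy of `rows` in which one feature column is perturbed."""
    column = [row[feature_idx] for row in rows]
    lo, hi, std = feature_stats(column)
    perturbed = []
    for row in rows:
        new_row = list(row)  # copy, so the input dataset is left untouched
        if target_value == "minimum":
            new_row[feature_idx] = lo
        elif target_value == "maximum":
            new_row[feature_idx] = hi
        elif target_value == "plus_std":
            new_row[feature_idx] = row[feature_idx] + std
        elif target_value == "minus_std":
            new_row[feature_idx] = row[feature_idx] - std
        else:
            raise ValueError(f"unknown target value: {target_value!r}")
        perturbed.append(new_row)
    return perturbed
```

Calling `perturb_feature(rows, 0, "maximum")` sets the first feature of every sample to that feature's maximum, so perturbing "feature by feature" amounts to looping over `feature_idx`.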
Reorganizing the file identify_associations:
- ttest and bayes functions put outside
- dataloader preparation defined in functions
- save results function added
- single identify_associations function
- Most comments addressed
Reformat constant CONTINUOUS_TARGET_VALUES
Reused code put in main function:
Identify associations

Working branch for both modes (Continuous assoc finds self correlations)
- Tested bayes and ttest on new data
- Added config files for continuous test
TODO:
Create folder with files to create synthetic datasets
- Add GPU compatibility
- Fix typing
- Make test dataloader batch size configurable
- Add extra columns to output table
- Exploring bayes behaviour on continuous pert
- Plot feature associations and vae architecture as graphs
- Use VSCode debugger (json edited)
Added ks method to calculate distances
Functioning Kolmogorov Smirnov method
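
The commits do not show how the KS method is implemented. As a point of reference, the textbook two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical CDFs) can be sketched in plain Python as follows. The function name `ks_distance` and the idea of comparing a feature's values before and after perturbation are assumptions, not details from this PR.

```python
from bisect import bisect_right
from typing import Sequence


def ks_distance(baseline: Sequence[float], perturbed: Sequence[float]) -> float:
    """Two-sample KS statistic: maximum gap between the empirical CDFs."""
    if not baseline or not perturbed:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(baseline), sorted(perturbed)
    distance = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so it is enough to evaluate the gap at every data point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance
```

A value of 0 means the two samples have identical empirical distributions, and a value of 1 means they do not overlap at all.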

- QC of features is given as separate csv from KS scores
- schema: feature names of feature to visualize added
- Basic visualization functions added to dataset_distribution.py
Henry Webel and others added 28 commits May 16, 2024 09:39
- 🐛 fix incomplete publishing package file
- 🎨 format src files
- in case of no categorical variables, the error surfaced: input_config has to be used!
double check type hints
- also FloatArray is not very informative about the layout of the array (2D, 3D, 4D?)
- play also with sample data configuration
- f-string formatting error.
- check if fewer latent dimensions reduce the t-test runtime
@enryH enryH closed this Jun 20, 2024
@enryH enryH deleted the developer-continuous-v3 branch June 20, 2024 09:18

3 participants