Dev/518/formula #309

Merged
merged 26 commits into from
Dec 21, 2021

jassak
Contributor

@jassak jassak commented Dec 20, 2021

  • New version of Descriptive Statistics:
    • Adds null/total counts for every variable
    • Replaces variable code with label
    • Removes unnecessary lower/upper confidence intervals
  • New formula functionality for Descriptive Statistics and Logistic Regression
    • New formula_description property in algorithm specifications.
    • Descriptive Stats and Logistic Regression now accept a new formula_description argument. This argument expects a JSON description of a formula in Wilkinson notation. A string formula is built from this description and passed to the patsy library, which constructs the algorithm's design matrices.
    • Currently only a subset of the full Wilkinson notation is implemented. Interactions are limited to 3 terms, and singleton terms accept only a limited set of transformations: nop, log, exp, center, standardize, mul and div, all for numerical variables.
  • Refactoring of mipframework.
  • Modification of algorithm tests for Descriptive Statistics and Logistic Regression to reflect the above changes.
  • New tests for the above two algorithms with the addition of the formula argument.
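
A minimal sketch of the description-to-formula step described above, for illustration only. The JSON layout (`dependent`, `single`, `interactions`, `op`, `arg`), the variable names and the `build_formula` helper are all assumptions, since the actual schema is not shown in this PR. The sketch also reads "up to 3 terms" as at most three variables per interaction.

```python
import json

# Hypothetical JSON layout; the real formula_description schema is not shown here.
description = json.loads("""
{
  "dependent": "y",
  "single": [
    {"var": "age", "op": "standardize"},
    {"var": "volume", "op": "div", "arg": 1000},
    {"var": "score", "op": "log"}
  ],
  "interactions": [["age", "gender"]]
}
""")

# Map each transformation named in the PR to a patsy-style term.
# center/standardize/I are patsy builtins; np.log/np.exp need numpy imported
# as np in the scope where patsy later evaluates the formula.
TRANSFORMS = {
    "nop": lambda var, arg: var,
    "log": lambda var, arg: f"np.log({var})",
    "exp": lambda var, arg: f"np.exp({var})",
    "center": lambda var, arg: f"center({var})",
    "standardize": lambda var, arg: f"standardize({var})",
    "mul": lambda var, arg: f"I({var} * {arg})",
    "div": lambda var, arg: f"I({var} / {arg})",
}


def build_formula(desc):
    """Build a Wilkinson-notation string from the (assumed) description."""
    terms = []
    for term in desc.get("single", []):
        op = term.get("op", "nop")
        if op not in TRANSFORMS:
            raise ValueError(f"unsupported transformation: {op}")
        if op in ("mul", "div") and "arg" not in term:
            raise ValueError(f"{op} requires a numeric 'arg'")
        terms.append(TRANSFORMS[op](term["var"], term.get("arg")))
    for variables in desc.get("interactions", []):
        if not 2 <= len(variables) <= 3:
            raise ValueError("interactions must involve 2 or 3 variables")
        terms.append(":".join(variables))
    rhs = " + ".join(terms)
    lhs = desc.get("dependent")
    # One-sided formula when there is no dependent variable.
    return f"{lhs} ~ {rhs}" if lhs else f"~ {rhs}"


formula = build_formula(description)
print(formula)
```

The resulting string could then be handed to `patsy.dmatrices(formula, data)` to obtain the design matrices, with `data` being a DataFrame that holds the referenced columns.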

jassak and others added 25 commits July 9, 2021 07:45
This version:
- Adds null/total counts for every variable
- Replaces variable code with label
- Removes unnecessary lower/upper confidence intervals
Algorithms now accept a new formula argument. This arg expects a
description of a formula in Wilkinson notation in JSON format. From the
description an actual string formula is constructed and passed to the
patsy library from which the algorithm's design matrices are
constructed. The rest of the execution remains unchanged.

Currently only a subset of the full Wilkinson notation is implemented.
Interactions are up to 3 terms and singleton terms have a limited number of
transformations allowed. These are nop, log, exp, center, standardize,
mul and div for numerical variables.
Currently logistic_regression tests with formula are failing. There is a
subtle interaction between the formula coding of categorical
variables and the add_missing_levels method. The reason is that
add_missing_levels is based on the known enumerations of categorical
vars found in the CDEs but the codings supported by the formula
might add different enums to categorical vars.
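
One way such a mismatch can arise, sketched in plain Python. This is a hypothetical illustration and not the mipframework implementation: the helper names, the `gender` variable and its levels are assumptions. It compares columns derived from the known enumerations with the column names that a treatment (dummy) coding would give, where the reference level gets no column and the others are named in the `var[T.level]` style used by patsy.

```python
def columns_from_enumerations(var, enumerations):
    """One indicator column per enumeration known from the metadata (assumed scheme)."""
    return [f"{var}[{level}]" for level in enumerations]


def columns_from_treatment_coding(var, levels):
    """Treatment coding: the first sorted level is the reference and gets no column."""
    _reference, *rest = sorted(levels)
    return [f"{var}[T.{level}]" for level in rest]


cde_enumerations = ["F", "M"]  # hypothetical categorical variable levels

expected = columns_from_enumerations("gender", cde_enumerations)
produced = columns_from_treatment_coding("gender", cde_enumerations)

# Columns the enumeration-based step would look for but the formula never created.
missing = sorted(set(expected) - set(produced))
print(expected, produced, missing)
```

Under these assumptions, none of the enumeration-based column names match the formula-based ones, so a step that fills in "missing" levels by name would add columns the formula coding never intended.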
Working version of descriptive_stats with an optional formula parameter.
The formula applies only to model results as it makes little sense in
single variable results. Moreover, all floats in the result are now
rounded to two decimals for presentation, and the tests have been
corrected accordingly.
@jassak jassak requested a review from ThanKarab December 20, 2021 12:52
@ThanKarab ThanKarab merged commit c0d92f7 into master Dec 21, 2021
@ThanKarab ThanKarab deleted the dev/518/formula branch December 21, 2021 09:12
@ThanKarab ThanKarab restored the dev/518/formula branch December 21, 2021 09:12
@ThanKarab ThanKarab deleted the dev/518/formula branch September 21, 2022 10:14