Knockoffs(1/4): add comments and docstring of the functions #128

lionelkusch · 2025-01-15T10:40:37Z

I aggregated most of the knockoff files into one file.
I separated the main method from the computation of the p-values and e-values, along with the associated tests.

bthirion · 2025-01-15T16:57:33Z

doc_conf/api.rst

+   model_x_knockoff_filter
+   model_x_knockoff_pvalue
+   model_x_knockoff_bootstrap_quantile
+   model_x_knockoff_bootstrap_e_value


Are all these functions meant to be public?

Yes, they should be public.
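Since these functions are public, a rough picture of what the two bootstrap aggregation steps compute may help reviewers. The sketch below is a minimal NumPy illustration under our own assumptions. It assumes model_x_knockoff_bootstrap_quantile follows fixed-gamma quantile aggregation of p-values across bootstrap draws (Meinshausen, Meier and Bühlmann, 2009). It also assumes model_x_knockoff_bootstrap_e_value follows Ren and Barber's derandomized knockoff e-values, which are averaged across draws and then thresholded with e-BH. The helper names quantile_aggregation, knockoff_evalues and ebh_select are hypothetical and are not claimed to match hidimstat's signatures.

```python
import numpy as np


def quantile_aggregation(pvals, gamma=0.5):
    """Aggregate p-values over bootstrap draws (rows) with a fixed quantile.

    Fixed-gamma rule: p_j = min(1, q_gamma({p_j^(b) / gamma}_b)).
    """
    pvals = np.asarray(pvals, dtype=float)
    return np.minimum(1.0, np.quantile(pvals / gamma, gamma, axis=0))


def knockoff_evalues(test_score, threshold):
    """E-values of one draw: p * 1{W_j >= T} / (1 + #{k : W_k <= -T})."""
    test_score = np.asarray(test_score, dtype=float)
    n_features = test_score.size
    denominator = 1 + np.sum(test_score <= -threshold)
    return n_features * (test_score >= threshold) / denominator


def ebh_select(evals, fdr=0.1):
    """e-BH: keep the k largest e-values, k = max{k : e_(k) >= p / (fdr * k)}."""
    evals = np.asarray(evals, dtype=float)
    n_features = evals.size
    order = np.argsort(-evals)  # indices sorted by decreasing e-value
    ranks = np.arange(1, n_features + 1)
    passing = evals[order] >= n_features / (fdr * ranks)
    if not passing.any():
        return np.array([], dtype=int)
    k = np.flatnonzero(passing).max() + 1
    return np.sort(order[:k])


# Usage: average the per-draw e-values, then apply e-BH once.
# In the derandomized procedure each T would be that draw's knockoff+ threshold.
draws = [([3.0, 2.0, -1.0, 0.5], 2.0), ([2.5, 3.0, 0.2, -0.1], 2.5)]
mean_evals = np.mean([knockoff_evalues(w, t) for w, t in draws], axis=0)
print(ebh_select(mean_evals, fdr=0.5))  # prints [0 1]
```

The e-value route averages evidence before a single selection step, whereas the quantile route aggregates p-values and leaves the final correction to a separate step. This is one plausible reason to keep both as separate public functions.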

src/hidimstat/gaussian_knockoff.py

Small modification. Co-authored-by: bthirion

lionelkusch · 2025-01-17T17:11:22Z

I merged the functions model_x_knockoff and model_x_knockoff_aggregation into a single function.
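The following is a minimal sketch of the generation step with repeated draws, for readers less familiar with the merged function. It samples equicorrelated model-X Gaussian knockoffs (Candès et al., 2018) given a mean mu and a covariance Sigma. Like the `return test_scores[0]` branch in this PR's diff, it returns the single result when only one draw is requested and a stack otherwise. The name gaussian_knockoffs, the n_bootstraps/seed/shrink parameters and the equicorrelated choice of s are assumptions made for illustration. This is not hidimstat's actual gaussian_knockoff_generation.

```python
import numpy as np


def gaussian_knockoffs(X, mu, Sigma, n_bootstraps=1, seed=0, shrink=1e-6):
    """Draw equicorrelated model-X Gaussian knockoffs of X ~ N(mu, Sigma).

    Returns an (n_samples, n_features) array for a single draw, or an
    (n_bootstraps, n_samples, n_features) array otherwise.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    # Equicorrelated s on the correlation scale: s = min(1, 2 * lambda_min),
    # shrunk slightly so that the conditional covariance V stays positive definite.
    sd = np.sqrt(np.diag(Sigma))
    corr = Sigma / np.outer(sd, sd)
    s_corr = min(1.0, 2.0 * np.linalg.eigvalsh(corr).min()) * (1.0 - shrink)
    diag_s = np.diag(s_corr * sd**2)
    sigma_inv_s = np.linalg.solve(Sigma, diag_s)  # Sigma^{-1} diag(s)
    # Conditional law of X_tilde given X: N(mu_tilde, V).
    mu_tilde = X - (X - mu) @ sigma_inv_s
    V = 2.0 * diag_s - diag_s @ sigma_inv_s
    L = np.linalg.cholesky(V)
    draws = [
        mu_tilde + rng.standard_normal((n_samples, n_features)) @ L.T
        for _ in range(n_bootstraps)
    ]
    # Single draw: return the array itself rather than a stack of one.
    return draws[0] if n_bootstraps == 1 else np.stack(draws)
```

By construction, the knockoffs share the covariance Sigma of X, and the cross-covariance between X and X_tilde equals Sigma except on the diagonal, where s is subtracted.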

bthirion · 2025-01-20T23:25:03Z

LMK when you want another review

src/hidimstat/gaussian_knockoff.py

AngelReyero · 2025-02-05T13:10:45Z

src/hidimstat/knockoffs.py

+        return test_scores[0]
+    else:
+        return test_scores
+


I am not sure this function should be this "compact". It both creates the knockoff variables and computes the knockoff statistic. It might be worth keeping these steps separate for more flexibility: we may want knockoff variables that are not built from the covariance matrix estimate, or a knockoff statistic that does not rely on a model such as the lasso. Along the same lines, an output like this one is not directly interpretable, and model_x_knockoff_filter is always applied after it. Why combine the knockoff variables and the statistic in the same function, but not the filter?

Currently, there is only one way to compute the knockoff statistic, so I didn't see a reason to split the function further.
Additionally, my other refactorings follow more or less the same "separation" of the function:

a first part that gets the data or the test statistic;

a second part that evaluates the p-values from the previous result.

Moreover, I thought that in this case the test_score was enough to identify the important variables, mainly because there are multiple ways to use this quantity to compute the p-values.

To satisfy the knockoff statistic conditions of Candès et al. (2018), the statistic does not have to be of that type. It is usually based on a lasso or a logistic regression, but other choices are possible. Similarly, there are other ways of constructing the knockoff variables, not necessarily from the covariance matrix. Therefore, I think they should be kept separate, even though these are the standard approaches.
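For reference, the two conditions from Candès et al. (2018) that this comment alludes to can be written as follows, where swap(S) exchanges the columns X_j and \tilde{X}_j for every j in S. The first condition concerns the knockoff generator. The second, the flip-sign property, concerns the statistic. Neither condition prescribes a covariance-based construction or a lasso, and that is the basis of the argument for making both steps pluggable.

```latex
(X, \tilde{X})_{\mathrm{swap}(S)} \overset{d}{=} (X, \tilde{X})
  \quad \text{for all } S \subseteq \{1, \dots, p\},
\qquad \tilde{X} \perp\!\!\!\perp y \mid X,

W_j\big([X, \tilde{X}]_{\mathrm{swap}(S)}, y\big) =
\begin{cases}
  \phantom{-}W_j\big([X, \tilde{X}], y\big) & j \notin S, \\
  -W_j\big([X, \tilde{X}], y\big) & j \in S.
\end{cases}
```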

It would be quite easy to turn the functions _estimate_distribution, _stat_coefficient_diff and gaussian_knockoff_generation into arguments.
Since we don't yet have an example that requires this separation, I prefer to keep this implementation.
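To make the trade-off concrete, here is a minimal sketch of what passing the statistic as an argument could look like. It assumes the knockoff matrix X_tilde has already been generated, and it offers two interchangeable statistics. The first is a coefficient difference from a ridge fit on [X, X_tilde]. This is a closed-form stand-in for the lasso statistic, chosen only to keep the sketch NumPy-only. The second is a model-free marginal-correlation difference. The names knockoff_test_scores, ridge_coef_diff and marginal_corr_diff are hypothetical and do not correspond to hidimstat's _stat_coefficient_diff.

```python
import numpy as np


def ridge_coef_diff(X, X_tilde, y, alpha=1.0):
    """W_j = |b_j| - |b_{j+p}| from a ridge fit on the augmented design [X, X_tilde]."""
    Z = np.hstack([X, X_tilde])
    coef = np.linalg.solve(Z.T @ Z + alpha * np.eye(Z.shape[1]), Z.T @ y)
    n_features = X.shape[1]
    return np.abs(coef[:n_features]) - np.abs(coef[n_features:])


def marginal_corr_diff(X, X_tilde, y):
    """Model-free statistic: W_j = |X_j^T y| - |X_tilde_j^T y|."""
    return np.abs(X.T @ y) - np.abs(X_tilde.T @ y)


def knockoff_test_scores(X, X_tilde, y, statistic=ridge_coef_diff):
    """Compute knockoff test scores with a user-supplied statistic."""
    return statistic(X, X_tilde, y)


rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))
# Placeholder only: a real run would pass knockoffs generated from X.
X_tilde = rng.standard_normal((500, 5))
y = 5.0 * X[:, 0] + rng.standard_normal(500)
scores_ridge = knockoff_test_scores(X, X_tilde, y)
scores_marginal = knockoff_test_scores(X, X_tilde, y, statistic=marginal_corr_diff)
```

Both statistics satisfy the flip-sign property, because swapping X and X_tilde negates every score. The ridge penalty is the same for all columns, so permuting columns of the augmented design only permutes the coefficients.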

What do you think @bthirion ?

I'd rather keep the full outputs. I don't see a good reason to cut the API. It is easy for users to ignore the outputs they are not interested in.

What do you mean by keeping the full output?

selected, test_score, thresh, X_tilde
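Two of these outputs, selected and thresh, come from the filter step applied to test_score. The following minimal sketch assumes the knockoff+ threshold of Barber and Candès (2015) applied to the test scores. The names knockoff_threshold and knockoff_filter are hypothetical, and the caller would bundle the result with test_score and X_tilde to form the full tuple.

```python
import numpy as np


def knockoff_threshold(test_score, fdr=0.1, offset=1):
    """Knockoff (offset=0) or knockoff+ (offset=1) threshold.

    T is the smallest nonzero |W_j| such that
    (offset + #{j : W_j <= -T}) / max(1, #{j : W_j >= T}) <= fdr,
    and T = inf when no candidate qualifies.
    """
    test_score = np.asarray(test_score, dtype=float)
    for t in np.sort(np.abs(test_score[test_score != 0])):
        n_neg = np.sum(test_score <= -t)
        n_pos = np.sum(test_score >= t)
        if (offset + n_neg) / max(1, n_pos) <= fdr:
            return t
    return np.inf


def knockoff_filter(test_score, fdr=0.1, offset=1):
    """Return the selected feature indices together with the threshold."""
    thresh = knockoff_threshold(test_score, fdr=fdr, offset=offset)
    selected = np.flatnonzero(np.asarray(test_score) >= thresh)
    return selected, thresh


scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, -0.5]
selected, thresh = knockoff_filter(scores, fdr=0.2)
print(selected, thresh)  # prints [0 1 2 3 4 5 6 7 8 9] 0.5
```

Returning the threshold alongside the selection lets users inspect how close the other scores were to being selected. This is cheap to expose, in line with keeping the full outputs.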

I pushed the modification for a better separation of the API.

AngelReyero · 2025-02-05T13:15:52Z

src/hidimstat/knockoffs.py

+        return selected
+    else:
+        return selected, pvals
+


It is also a 'filter': it does not only compute the p-values. Shouldn't the function name make this more explicit?

What do you mean by 'filters'?

The name was chosen to be consistent with the other functions, and because there is currently no other way to "filter".

In the same way that you called the previous function 'model_x_knockoff_filter' because it selects the important covariates, the aggregation functions and this one are also filters, since they also return the selected covariates.

The difference is that 'model_x_knockoff_filter' is based on the 'filter' proposed in the original paper, whereas the other methods are based on 'empirical_p_value'. For me, this is what makes the difference, and it is the reason for the name.
I should perhaps rename 'model_x_knockoff_filter' to avoid confusion.
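To illustrate the distinction drawn here, the following sketch computes empirical p-values from the test scores and then selects features with Benjamini-Hochberg. It therefore both returns p-values and acts as a filter. It assumes the intermediate p-value form pi_j = (1 + #{k : W_k <= -W_j}) / p for W_j > 0 (and 1 otherwise), as used in aggregation-of-knockoffs approaches. The names empirical_pvalues and bh_select are hypothetical and not hidimstat's API. BH is our choice of correction for this sketch.

```python
import numpy as np


def empirical_pvalues(test_score, offset=1):
    """pi_j = (offset + #{k : W_k <= -W_j}) / p if W_j > 0, else 1."""
    test_score = np.asarray(test_score, dtype=float)
    n_features = test_score.size
    pvals = np.ones(n_features)
    for j in np.flatnonzero(test_score > 0):
        pvals[j] = (offset + np.sum(test_score <= -test_score[j])) / n_features
    return pvals


def bh_select(pvals, fdr=0.1):
    """Benjamini-Hochberg: reject the k smallest p-values, k = max{k : p_(k) <= fdr * k / p}."""
    pvals = np.asarray(pvals, dtype=float)
    n_features = pvals.size
    order = np.argsort(pvals)  # indices sorted by increasing p-value
    below = pvals[order] <= fdr * np.arange(1, n_features + 1) / n_features
    if not below.any():
        return np.array([], dtype=int)
    k = np.flatnonzero(below).max() + 1
    return np.sort(order[:k])


# Usage: the same call yields both the p-values and the selected features.
scores = np.array([5.0, 4.0, 3.0, -1.0, 0.5, -6.0])
pvals = empirical_pvalues(scores)
selected = bh_select(pvals, fdr=0.1)
```

These p-values are coarse (the smallest possible value is 1/p), so this route only becomes useful with many features or after aggregation over several knockoff draws.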

I removed the function model_x_knockoff_filter.
Does @AngelReyero think that I should still rename the other functions?

Why did you remove the function?

I didn't really remove it.
I merged this function with model_x_knockoff, following the comments.
Did I miss something?

In that case I think it is consistent.

bthirion

Almost ready to merge now.

bthirion · 2025-02-11T22:58:24Z

src/hidimstat/gaussian_knockoff.py

-    Sigma : 2D ndarray (n_features, n_features)
-        Covariance matrix
+    mu_tilde : 2D ndarray (n_samples, n_features)
+        The matrix of means used to generate the knockoff variables, returned by gaussian_knockoff_generation.


I have the feeling that you write long lines (> 80 characters), which makes reading more difficult.
Can be addressed in a later PR though.

Black does not take the length of comments into account. Formatting will improve once Ruff is used.

test/test_knockoff.py

bthirion

LGTM, thx.

lionelkusch added 28 commits January 10, 2025 10:13

e202b84 Remove unecesarry argument
82b829c Change the a bit the behavior
d4f80c2 Update the variables
47dbe24 Put all the knockoff test together
81e64c2 Fix a bug
d6ded88 remove the function for estimating covariance
801c5f9 Remove unnecessary file
4c61dd7 Remove a function
b890cba comparison with original code
b6fe948 improve docstring and function
43a3e9a Merge file for knockoff together
a116b53 Add function for repeat the gaussian knockoff
1699675 Put all the test for knockoff in one file
db6098f Include the new function in the init
e38f376 Fix bug
8861480 Fix bugs
a0d53b0 Fix test for new signature of the function
4bb9eb4 Improve the docstring
d21f46a Improve docstring knockoff
f752f9e /bin/bash: line 1: :wq: command not found
071ea6d Remove the begining of the file
68dd7e6 Add equations
727ee7d Change reference for paper
efbe49a add a reference
bfe111a a a new tests
211599e rename function adn remove warning for test
beeda0a Add parameters of knockoff
3736661 Merge branch 'main' into PR_knockoffs

lionelkusch requested review from jpaillard and removed request for jpaillard January 15, 2025 10:40

lionelkusch added 3 commits January 15, 2025 18:41

20d0567 Update example knockoff
c71cc22 Formating
bbd3252 Update function

bthirion reviewed Jan 15, 2025


lionelkusch and others added 9 commits January 16, 2025 10:06

9af787f Apply suggestions from code review
        Small modification. Co-authored-by: bthirion
548e283 Fix name variables
0ad893e Fix name variables
252a012 Fix test and name variables
b34cf1f Add tests and fix bugs
2ddfd9d Add a tests and formating file
df45c82 Undo delete of the tests
e3eeb72 Improve coverage and test
4df5d90 Format

lionelkusch mentioned this pull request Jan 17, 2025: Configuration of the linter #88 (Open)

lionelkusch removed the request for review from jpaillard January 17, 2025 13:25

8aee168 Group the function agregation and not aggregation together

lionelkusch added 2 commits January 17, 2025 18:11

bfe6346 formating
4e3f4d2 Replace lambda by alpha

jpaillard reviewed Feb 5, 2025


src/hidimstat/gaussian_knockoff.py (resolved)

AngelReyero reviewed Feb 5, 2025


AngelReyero approved these changes Feb 5, 2025


5675969 Change cut in the API

jpaillard mentioned this pull request Feb 11, 2025: Covariance matrix regularization in knockoffs #149 (Open)

bthirion mentioned this pull request Feb 11, 2025: This might be expensive. I think that having the increment fixed and pre-defined may not be optimal. #150 (Closed)

bthirion reviewed Feb 11, 2025


70a9a63 Formating file

bthirion approved these changes Feb 12, 2025



lionelkusch commented Jan 15, 2025

bthirion Jan 15, 2025

lionelkusch Jan 16, 2025

lionelkusch commented Jan 17, 2025

bthirion commented Jan 20, 2025

AngelReyero Feb 5, 2025

lionelkusch Feb 5, 2025 (edited)

AngelReyero Feb 5, 2025

lionelkusch Feb 5, 2025

bthirion Feb 5, 2025

lionelkusch Feb 10, 2025

bthirion Feb 10, 2025

lionelkusch Feb 10, 2025

AngelReyero Feb 5, 2025

lionelkusch Feb 5, 2025

AngelReyero Feb 5, 2025

lionelkusch Feb 5, 2025

lionelkusch Feb 10, 2025

AngelReyero Feb 11, 2025

lionelkusch Feb 11, 2025

AngelReyero Feb 11, 2025

bthirion left a comment

bthirion Feb 11, 2025

lionelkusch Feb 12, 2025

bthirion left a comment
