Add Brenner reports; Simplify Brenner setting #74

gaviram · 2024-12-05T14:37:34Z

Changes made:

Added a notebook that generates pdf reports useful for Brenner threshold determination
Modified the set_brenner notebook to be easier to use

Sagykri

Proceeding with the preprocessing pipeline of the NIH data is the higher priority, but here are my comments:

Hi, since this is a notebook I can't really comment per line, so I summarize my comments here.
Comments for both files:

Please delete code that you don't need (I know that it was there before), such as the function definitions at the top of the notebook (including the weird "d" param that we create temporarily at some point, which can be risky), the original code block, and what comes after it.
Since, in the end, a non-computational person should be able to run this alone, I found it easier to update the mappings automatically every time you run the Examine Brenner block.
I think "create_brenner_reports" can be nicely integrated into the set_brenner notebook as another block. Moreover, there is some overlapping code between them, such as finding the percentiles and filtering by them. I'd try to extract this logic into a function and use it in both blocks of code.
Please move the configuration params (the df filepath, mappings_filepath, percentile_ranges, output_folder, etc.) to a new code block at the beginning of the notebook and title it "Params"/"Configuration"/something like this.
Make sure the notebook is setting NOVA_HOME to the ".....Collaboration/NOVA" (instead of NOVA_GAL) when pushing it to the main branch.
You have sys.path.insert(...) twice
Whenever you push a notebook, please make sure it contains no images, since they may inflate the size of the file
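One possible shape for the shared percentile logic and the "Params" block, as a minimal sketch only. The names `PERCENTILE_RANGE`, `percentile`, `brenner_thresholds` and `filter_by_thresholds` are hypothetical and are not taken from the notebooks; the percentile uses plain linear interpolation so the example has no dependencies.

```python
from typing import List, Sequence, Tuple

# --- Params (hypothetical values, for illustration only) ---
PERCENTILE_RANGE: Tuple[float, float] = (10.0, 90.0)


def percentile(values: Sequence[float], perc: float) -> float:
    """Return the perc-th percentile (0-100) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= perc <= 100:
        raise ValueError("perc must be between 0 and 100")
    ordered = sorted(values)
    position = (len(ordered) - 1) * perc / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def brenner_thresholds(
    scores: Sequence[float], perc_range: Tuple[float, float]
) -> Tuple[float, float]:
    """Return the (lower, upper) score thresholds for a percentile range."""
    low_perc, high_perc = perc_range
    assert low_perc < high_perc, "low percentile must be below high percentile"
    return percentile(scores, low_perc), percentile(scores, high_perc)


def filter_by_thresholds(
    scores: Sequence[float], lower: float, upper: float
) -> List[float]:
    """Keep only the scores inside [lower, upper]."""
    return [score for score in scores if lower <= score <= upper]
```

Both the "Examine Brenner" block and the report block could then call `brenner_thresholds` with the same configured range instead of repeating the computation.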

Comments specifically for create_brenner_reports.ipynb:

Set the output_folder to be something inside the preprocessing_tools folder
Since we have data sources other than NIH, we should probably save the pdfs to a subfolder (ex. ....../brenner_reports/NIH). That said, maybe you should save all markers in the same pdf file (multiple pages) and call it, for example, ..../brenner_reports/NIH.pdf.
Please break the content of the for-loops in the last block into functions (I believe there is shared code here with the brenner notebook, and if you merge them, these extracted functions may help improve the clarity of the code).
The dims for fit_image_shape (1024,1024) should be set in the new configuration code block
[create_histogram] Please document the new function create_histogram (i.e. add param types in the function signature, add a return type (using ->), and use """ to document the params and the returned value)
[create_histogram] Add an 'assert' to make sure that 'low_perc' is lower than 'high_perc'
Change param overlay_Cellline to overlay_cellline (with a small c)
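To illustrate the documentation and assertion requests for create_histogram, here is a rough sketch. The real function's parameters and body are not visible in this review, so everything except the names `create_histogram`, `low_perc` and `high_perc` is an assumption; plotting is omitted and the sketch only returns bin counts.

```python
from typing import List, Sequence, Tuple


def create_histogram(
    scores: Sequence[float],
    low_perc: float,
    high_perc: float,
    n_bins: int = 10,
) -> Tuple[List[int], List[float]]:
    """Bin Brenner scores into equal-width bins (plotting omitted).

    Args:
        scores: Brenner scores to bin; must not be empty.
        low_perc: Lower percentile (0-100). Only validated in this sketch;
            the real function presumably uses it for the plot.
        high_perc: Upper percentile (0-100); must exceed low_perc.
        n_bins: Number of equal-width bins (hypothetical parameter).

    Returns:
        A (counts, bin_edges) tuple with n_bins counts and n_bins + 1 edges.
    """
    assert 0 <= low_perc < high_perc <= 100, (
        f"low_perc ({low_perc}) must be lower than high_perc ({high_perc})"
    )
    assert len(scores) > 0, "scores must not be empty"
    assert n_bins > 0, "n_bins must be positive"

    lo, hi = min(scores), max(scores)
    # Fall back to a width of 1.0 when all scores are identical.
    width = (hi - lo) / n_bins or 1.0
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for score in scores:
        # The maximum score belongs to the last bin.
        index = min(int((score - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts, edges
```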

Sagykri · 2025-01-27T12:26:07Z

manuscript/plot_config.py

To avoid dups, I think you can just extend the original color map with the new markers.
Meaning:
self.COLOR_MAPPINGS_NIH = {**self.THE_ORIGINAL_COLOR_MAPPINGS} # to copy
self.COLOR_MAPPINGS_NIH.update({NEW KEYS AND VALUES})
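A self-contained sketch of the copy-then-update suggestion. Only `COLOR_MAPPINGS_NIH` comes from the comment above; the class names, `COLOR_MAPPINGS_MARKERS`, the marker names and the colors are hypothetical placeholders.

```python
class BasePlotConfig:
    """Hypothetical stand-in for the original plot configuration."""

    def __init__(self) -> None:
        self.COLOR_MAPPINGS_MARKERS = {
            "DAPI": {"alias": "DAPI", "color": "#0000FF"},
            "MARKER_A": {"alias": "Marker A", "color": "#FF0000"},
        }


class NIHPlotConfig(BasePlotConfig):
    """Hypothetical NIH configuration that extends the original map."""

    def __init__(self) -> None:
        super().__init__()
        # Shallow copy: the top-level dict is new, so adding keys does not
        # touch the original map. The nested dicts are still shared, so
        # they should be replaced rather than mutated in place.
        self.COLOR_MAPPINGS_NIH = {**self.COLOR_MAPPINGS_MARKERS}
        self.COLOR_MAPPINGS_NIH.update(
            {"MARKER_B": {"alias": "Marker B", "color": "#00FF00"}}
        )


config = NIHPlotConfig()
```

With this layout the shared markers are defined once, and only the NIH-specific entries appear in the NIH configuration.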

Sagykri · 2025-01-27T12:29:07Z

src/figures/distances_plotting.py

@@ -299,7 +299,7 @@ def __plot_boxplot(distances:pd.DataFrame, baseline:str, condition:str,
    color_key=config_plot.MAPPINGS_COLOR_KEY
    condition_name_color_dict = config_plot.COLOR_MAPPINGS_CELL_LINE_CONDITION
    condition_to_color = {key: value[color_key] for key, value in condition_name_color_dict.items()}
-    if not yaxis_cut_ranges: # case where we don't split the y axis
+    if all(value is None for value in yaxis_cut_ranges.values()): # case where we don't split the y axis


The main branch now has a fix for this bug
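For reference, a defensive version of this check could look like the sketch below. The actual fix on the main branch is not shown here and may differ; `should_split_yaxis` and the dictionary keys are hypothetical. The point is that `.values()` fails when `None` is passed, while `not yaxis_cut_ranges` alone treats a dict of `None` values as "split".

```python
from typing import Dict, Optional, Tuple

YAxisCutRanges = Optional[Dict[str, Optional[Tuple[float, float]]]]


def should_split_yaxis(yaxis_cut_ranges: YAxisCutRanges) -> bool:
    """Return True only if at least one cut range is actually set."""
    if not yaxis_cut_ranges:
        # Covers both None and an empty dict.
        return False
    return any(value is not None for value in yaxis_cut_ranges.values())
```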

Sagykri · 2025-01-27T12:31:54Z

src/figures/distances_plotting.py

@@ -601,6 +600,16 @@ def __bin_pvalues(pvalues):
    adjusted_pvalues = np.where((0.0001 <= adjusted_pvalues) & (adjusted_pvalues < 0.01), 0.0001, adjusted_pvalues)
    return np.where(adjusted_pvalues < 0.0001, 10**math.floor(np.log10(adjusted_pvalues.min())), adjusted_pvalues)

+def __fixed_bin_pvalues(pvalues):


I think we should merge the two bin_pvalues functions somehow to avoid duplication. Maybe just take the ranges as an input to the function or, even better, put them in the plot configuration file.
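A sketch of what a single, range-driven binning function might look like. This is not the existing `__bin_pvalues` logic: the function name, the bin format and the values in `HYPOTHETICAL_PVALUE_BINS` are assumptions, and plain lists are used instead of NumPy arrays to keep the example dependency-free.

```python
from typing import List, Sequence, Tuple

# Each bin is (low, high, replacement): p-values with low <= p < high
# are replaced by `replacement`. In practice this list could live in the
# plot configuration file.
PValueBin = Tuple[float, float, float]

HYPOTHETICAL_PVALUE_BINS: List[PValueBin] = [
    (0.01, 0.05, 0.01),
    (0.0001, 0.01, 0.0001),
]


def bin_pvalues(
    pvalues: Sequence[float], bins: Sequence[PValueBin]
) -> List[float]:
    """Replace each p-value by the value of the first bin containing it."""
    binned = []
    for pvalue in pvalues:
        for low, high, replacement in bins:
            if low <= pvalue < high:
                binned.append(replacement)
                break
        else:
            # No bin matched: keep the p-value unchanged.
            binned.append(pvalue)
    return binned
```

Each binning scheme then becomes a different list of ranges passed to the same function.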

Sagykri · 2025-01-27T12:40:02Z

src/preprocessing/preprocessors/preprocessor_nih.py

+        """
+
+        # Taking the name as id (i.e. 's' and then a number)
+        return get_filename(path)


IMPORTANT

How does it work? How does it pick the right image id if the function returns the entire filename?
The function is supposed to return the site id for each marker in order to know which DAPI image should be coupled with it.
Having the image id as the full name should result in finding no corresponding DAPIs
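To make the concern concrete, here is a sketch of extracting only the site token ('s' followed by a number) from a path, so that a marker image and its DAPI image resolve to the same id. The filename layouts in the usage lines are hypothetical; the NIH naming scheme is not shown in this review, and `get_site_id` is an illustrative name rather than the preprocessor's method.

```python
import re
from pathlib import Path
from typing import Optional

# Matches a standalone token such as "s12": an 's' followed by digits,
# not glued to other letters or digits on either side.
_SITE_ID_PATTERN = re.compile(r"(?<![A-Za-z0-9])(s\d+)(?![A-Za-z0-9])")


def get_site_id(path: str) -> Optional[str]:
    """Return the site token found in the filename, or None if absent."""
    stem = Path(path).stem  # filename without directory and extension
    match = _SITE_ID_PATTERN.search(stem)
    return match.group(1) if match else None


# Hypothetical paths: both should map to the same site id.
marker_id = get_site_id("/data/panelA/MARKER_A/MARKER_A_s12_ch2.tif")
dapi_id = get_site_id("/data/panelA/DAPI/s12.tif")
```

If the NIH filenames consist of the site token alone, returning the whole filename would be equivalent; otherwise the ids of a marker and its DAPI image would not match.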

Sagykri · 2025-01-27T12:44:02Z

src/preprocessing/preprocessing_config_NIH.py

Custom preprocessing config should be located under the 'manuscript' folder.
Everything under the 'src' folder should behave as the core/base/interface versions.

Sagykri · 2025-01-27T12:44:54Z

tools/preprocessing_tools/qc_reports/qc_config.py

I believe it's correct, just make sure it aligns with the experiment structure (the real markers, panels, etc organization)

Sagykri · 2025-01-27T12:48:18Z

tools/preprocessing_tools/qc_reports/qc_utils.py

@@ -60,7 +60,7 @@ def sample_and_calc_variance(INPUT_DIR, batch, sample_size_per_markers=200, num_

    return variance

-def validate_files_proc(path, batch_df, bad_files, marker_info, cell_lines_for_disp):
+def validate_files_proc(path, batch_df, bad_files, marker_info, cell_lines_for_disp, validate_antibody= True):


When would you want to set the validate_antibody to False?
This validation should always be happening, no?

Sagykri · 2025-01-27T12:49:24Z

tools/preprocessing_tools/qc_reports/qc_utils.py

@@ -342,7 +344,7 @@ def plot_filtering_heatmap(filtered, extra_index, xlabel='', figsize=(5,5), seco
            second_p = second_data.pivot_table(index=['rep', extra_index],
                                    columns='cell_line_cond',
                                    values='index')
-            second_p = second_p.sort_values(by=extra_index)
+            second_p = second_p.sort_values(by=[extra_index,'rep'])


Can you please explain the reason? Maybe it's necessary; I just want to understand why.

Sagykri · 2025-01-27T12:53:18Z

tools/preprocessing_tools/set_brenner.ipynb

Since it's a notebook it's a bit hard to see the changes here :/ Can you please describe in words the key changes that you did here?

Sagykri · 2025-01-27T12:55:28Z

tools/preprocessing_tools/show_marker_images.ipynb

Since there is already a notebook for plotting images, can you please explain how this notebook is different? (Sorry, github doesn't present notebooks very nicely so hard to see the code for them :/)

Add Brenner reports; Simplify Brenner setting

b4976e3

Sagykri requested changes Dec 8, 2024

gaviram added 9 commits December 19, 2024 16:31

fixed bug in validation of yaxis_cut_ranges to match its type (dict)

8e62a29

add validate_antibody flag to qc_report

be51752

NIH configurations

d07e126

configs NIH

3083668

Added spacing after function definition

04e58f0

Additional binning method for comparison to paper

b865577

NIH QC report

3f99bbc

add notebook for easy display of markers with DAPI

af6410d

Refactor set brenner code, add reports generation

1af3b83

Sagykri requested changes Jan 27, 2025
