Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Brenner reports; Simplify Brenner setting #74

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

gaviram
Copy link
Collaborator

@gaviram gaviram commented Dec 5, 2024

Changes made:

  1. Added a notebook that generates pdf reports useful for Brenner threshold determination
  2. Modified the set_brenner notebook to be easier to use

Copy link
Owner

@Sagykri Sagykri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proceeding the preprocessing pipeline of the NIH data is in greater priority,
But, here are my comments:

Hi, since this is a notebook I can't really comment per line. Hence, I summarize here the comments:
Comments for both files:

  • Please delete code that you don't need (I know that it was there before), such as definitions of functions at the top (including the weird "d" param we create temporarily at some point and can be risky) of the notebook, the original code block and what's after it.
  • Since, in the end, a non-computational person should be able to run this alone, I found it easier to update the mappings automatically every time you run the Examine Brenner block.
  • I think the "create_brenner_reports" can be nicely integrated into the set_brenner notebook as another block. Moreover, there are some overlapping code between them such as finding the percentiles and filtering by them etc. I'd try to extract this logic to function and use it in both blocks of code.
  • Please move the configuration params, meaning the df filepath, the mappings_filepath, percentile_ranges ,output_folder , etc.., to a new code block at the begining of the notebook and titled it as "Params"/"Configuration"/something like this.
  • Make sure the notebook is setting NOVA_HOME to the ".....Collaboration/NOVA" (instead of NOVA_GAL) when pushing it to the main branch.
  • You have sys.insert(...) twice
  • Whenever you push a notebook, please make sure it's empty from images since it may inflate the size of the file

Comments specifically for create_brenner_reports.ipynb:

  • Set the output_folder to be something inside the preprocessing_tools folder
  • Since we have other data sources than NIH, we should probably save the pdfs to a subfolder (ex. ....../brenner_reports/NIH). While saying that, maybe you should save all markers in the same pdf file (multiple pages) and call it, for example ..../brenner_reports/NIH.pdf)
  • Please break the content of the for-loops in the last block to functions (I believe there is a shared code here with the brenner notebook, and if you'll merge them, these extract function may help improving the clarity of the code)
  • The dims for fit_image_shape (1024,1024) should be set in the new configuration code block
  • [create_histogram ] Please document the function new function create_histogram (i.e. add param types in the function signature, add return type (using ->), and use """ to document the params and the returned value)
  • [create_histogram] add an 'assert' to make sure the 'low_perc' and lower than 'high_perc'
  • Change param overlay_Cellline to overlay_cellline (with a small c)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid dups, I think you can just extend the original color map with the new markers.
Meaning, self.COLOR_MAPPINGS_NIH = {**self.THE_ORIGINAL_COLOR_MAPPINS} # to copy
self.COLOR_MAPPINGS_NIH.update({NEW KEYS AND VALUES})

@@ -299,7 +299,7 @@ def __plot_boxplot(distances:pd.DataFrame, baseline:str, condition:str,
color_key=config_plot.MAPPINGS_COLOR_KEY
condition_name_color_dict = config_plot.COLOR_MAPPINGS_CELL_LINE_CONDITION
condition_to_color = {key: value[color_key] for key, value in condition_name_color_dict.items()}
if not yaxis_cut_ranges: # case where we don't split the y axis
if all(value is None for value in yaxis_cut_ranges.values()): # case where we don't split the y axis
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main branch now has a fix for this bug

@@ -601,6 +600,16 @@ def __bin_pvalues(pvalues):
adjusted_pvalues = np.where((0.0001 <= adjusted_pvalues) & (adjusted_pvalues < 0.01), 0.0001, adjusted_pvalues)
return np.where(adjusted_pvalues < 0.0001, 10**math.floor(np.log10(adjusted_pvalues.min())), adjusted_pvalues)

def __fixed_bin_pvalues(pvalues):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should merge the two functions for the bin_pvalues somehow to avoid duplications. Maybe just get the ranges as an input to the function, or even better put this in the plot configuration file

"""

# Taking the name as id (i.e. 's' and then a number)
return get_filename(path)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMPORTANT

How does it work? How does it pick the right image id if the function returns the entire filename?
The function is supposed to return the site id for each marker in order to know which DAPI image should be coupled with it.
Having the image id as the full name should result in finding no corresponding DAPIs

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Custom preprocessing config should be located under the 'manuscript' folder.
Everything under the 'src' folder should behave as the core/base/interface versions.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's correct, just make sure it aligns with the experiment structure (the real markers, panels, etc organization)

@@ -60,7 +60,7 @@ def sample_and_calc_variance(INPUT_DIR, batch, sample_size_per_markers=200, num_

return variance

def validate_files_proc(path, batch_df, bad_files, marker_info, cell_lines_for_disp):
def validate_files_proc(path, batch_df, bad_files, marker_info, cell_lines_for_disp, validate_antibody= True):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would you want to set the validate_antibody to False?
This validation should always be happening, no?

@@ -342,7 +344,7 @@ def plot_filtering_heatmap(filtered, extra_index, xlabel='', figsize=(5,5), seco
second_p = second_data.pivot_table(index=['rep', extra_index],
columns='cell_line_cond',
values='index')
second_p = second_p.sort_values(by=extra_index)
second_p = second_p.sort_values(by=[extra_index,'rep'])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain the reason? Maybe it's necessary; I just want to understand why.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's a notebook it's a bit hard to see the changes here :/ Can you please describe in words the key changes that you did here?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there is already a notebook for plotting images, can you please explain how this notebook is different? (Sorry, github doesn't present notebooks very nicely so hard to see the code for them :/)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants