-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Brenner reports; Simplify Brenner setting #74
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proceeding the preprocessing pipeline of the NIH data is in greater priority,
But, here are my comments:
Hi, since this is a notebook I can't really comment per line. Hence, I summarize here the comments:
Comments for both files:
- Please delete code that you don't need (I know that it was there before), such as definitions of functions at the top (including the weird "d" param we create temporarily at some point and can be risky) of the notebook, the original code block and what's after it.
- Since, in the end, a non-computational person should be able to run this alone, I found it easier to update the mappings automatically every time you run the Examine Brenner block.
- I think the "create_brenner_reports" can be nicely integrated into the set_brenner notebook as another block. Moreover, there are some overlapping code between them such as finding the percentiles and filtering by them etc. I'd try to extract this logic to function and use it in both blocks of code.
- Please move the configuration params, meaning the df filepath, the mappings_filepath, percentile_ranges ,output_folder , etc.., to a new code block at the begining of the notebook and titled it as "Params"/"Configuration"/something like this.
- Make sure the notebook is setting NOVA_HOME to the ".....Collaboration/NOVA" (instead of NOVA_GAL) when pushing it to the main branch.
- You have sys.insert(...) twice
- Whenever you push a notebook, please make sure it's empty from images since it may inflate the size of the file
Comments specifically for create_brenner_reports.ipynb:
- Set the output_folder to be something inside the preprocessing_tools folder
- Since we have other data sources than NIH, we should probably save the pdfs to a subfolder (ex. ....../brenner_reports/NIH). While saying that, maybe you should save all markers in the same pdf file (multiple pages) and call it, for example ..../brenner_reports/NIH.pdf)
- Please break the content of the for-loops in the last block to functions (I believe there is a shared code here with the brenner notebook, and if you'll merge them, these extract function may help improving the clarity of the code)
- The dims for fit_image_shape (1024,1024) should be set in the new configuration code block
- [create_histogram ] Please document the function new function create_histogram (i.e. add param types in the function signature, add return type (using ->), and use """ to document the params and the returned value)
- [create_histogram] add an 'assert' to make sure the 'low_perc' and lower than 'high_perc'
- Change param overlay_Cellline to overlay_cellline (with a small c)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid dups, I think you can just extend the original color map with the new markers.
Meaning, self.COLOR_MAPPINGS_NIH = {**self.THE_ORIGINAL_COLOR_MAPPINS} # to copy
self.COLOR_MAPPINGS_NIH.update({NEW KEYS AND VALUES})
@@ -299,7 +299,7 @@ def __plot_boxplot(distances:pd.DataFrame, baseline:str, condition:str, | |||
color_key=config_plot.MAPPINGS_COLOR_KEY | |||
condition_name_color_dict = config_plot.COLOR_MAPPINGS_CELL_LINE_CONDITION | |||
condition_to_color = {key: value[color_key] for key, value in condition_name_color_dict.items()} | |||
if not yaxis_cut_ranges: # case where we don't split the y axis | |||
if all(value is None for value in yaxis_cut_ranges.values()): # case where we don't split the y axis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main branch now has a fix for this bug
@@ -601,6 +600,16 @@ def __bin_pvalues(pvalues): | |||
adjusted_pvalues = np.where((0.0001 <= adjusted_pvalues) & (adjusted_pvalues < 0.01), 0.0001, adjusted_pvalues) | |||
return np.where(adjusted_pvalues < 0.0001, 10**math.floor(np.log10(adjusted_pvalues.min())), adjusted_pvalues) | |||
|
|||
def __fixed_bin_pvalues(pvalues): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should merge the two functions for the bin_pvalues somehow to avoid duplications. Maybe just get the ranges as an input to the function, or even better put this in the plot configuration file
""" | ||
|
||
# Taking the name as id (i.e. 's' and then a number) | ||
return get_filename(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMPORTANT
How does it work? How does it pick the right image id if the function returns the entire filename?
The function is supposed to return the site id for each marker in order to know which DAPI image should be coupled with it.
Having the image id as the full name should result in finding no corresponding DAPIs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Custom preprocessing config should be located under the 'manuscript' folder.
Everything under the 'src' folder should behave as the core/base/interface versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it's correct, just make sure it aligns with the experiment structure (the real markers, panels, etc organization)
@@ -60,7 +60,7 @@ def sample_and_calc_variance(INPUT_DIR, batch, sample_size_per_markers=200, num_ | |||
|
|||
return variance | |||
|
|||
def validate_files_proc(path, batch_df, bad_files, marker_info, cell_lines_for_disp): | |||
def validate_files_proc(path, batch_df, bad_files, marker_info, cell_lines_for_disp, validate_antibody= True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would you want to set the validate_antibody to False?
This validation should always be happening, no?
@@ -342,7 +344,7 @@ def plot_filtering_heatmap(filtered, extra_index, xlabel='', figsize=(5,5), seco | |||
second_p = second_data.pivot_table(index=['rep', extra_index], | |||
columns='cell_line_cond', | |||
values='index') | |||
second_p = second_p.sort_values(by=extra_index) | |||
second_p = second_p.sort_values(by=[extra_index,'rep']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please explain the reason? Maybe it's necessary; I just want to understand why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's a notebook it's a bit hard to see the changes here :/ Can you please describe in words the key changes that you did here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there is already a notebook for plotting images, can you please explain how this notebook is different? (Sorry, github doesn't present notebooks very nicely so hard to see the code for them :/)
Changes made: