Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tool for analysing images using Bioimage AI models #1391

Merged
merged 35 commits into from
Aug 2, 2024

Conversation

anuprulez
Copy link
Contributor

@anuprulez anuprulez commented Mar 1, 2024

images

Test files:
https://bioimage.io/#/?id=10.5281%2Fzenodo.5764892
https://zenodo.org/api/records/6647674

Remaining tasks:

  • Support for creating the original predicted image matrix
  • Add tool tests (imaging models are large (> 50 MB and sometimes as large as 500 MB) and it is unclear how to add sample data for tests)
  • Take input asTiff files if possible
  • Test with models of different input/output dimensions (3,4,5,...)
  • Add support for TensorFlow

</command>
<inputs>
<param name="input_imaging_model" type="data" format="zip" label="BioImage model" help="Please load a BioImage model from file uploader"/>
<param name="input_image_file" type="data" format="npz" label="Image to be analysed" help="Please provide an image"/>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NPZ is numpy isn't it? How do people create such image? Should we create an NPZ file from a tiff, png as part of this tool?

Copy link
Contributor Author

@anuprulez anuprulez Mar 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, NPZ is a numpy zipped file, I think. To show images as PNG or Tiff, we have to reduce the dimensionality of the original multi-dimensional image. Sometimes, these are 3 or 4 dimensional, containing several channels. To display as PNG, I am taking only the first channels where we lose some information from images. But my idea is to save these originally predicted matrices with all dimensions. I also need to think about taking PNG and Tiff as inputs and producing the original matrix along side displayable formats such as PNG, Tiff.

Copy link
Contributor Author

@anuprulez anuprulez Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beatrizserrano do you know how these NPY input/output files are generated to be used with bioimage models (e.g. https://bioimage.io/#/?id=10.5281%2Fzenodo.5764892)? Do we use some Galaxy tool to generate it? I tried to use TIFF files as input to a few models, but they do not contain a pixel matrix of images in the correct dimensions for the model to take as input. However, NPY files work fine with the models but could be tricky for the end users of this tool to generate such test/input NPY files. Or do you know someone who worked on such models?

Thank you!

Copy link

@FynnBe FynnBe Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chiming in on this on @beatrizserrano 's request:
I'm unaware of the context of this PR, but can report from the bioimage.io side: We demand .npy test files to be provided alongside every model contributed to bioimage.io. Among the utility functions we provide in bioimageio.core there is
https://github.com/bioimage-io/core-bioimage-io-python/blob/f798344213c179a6c836938aff9e4f6c46f8d23c/bioimageio/core/utils/image_helper.py#L136
-- a util function using imageio to attempt to load any given image and then attempting to guess it's axes, see
https://github.com/bioimage-io/core-bioimage-io-python/blob/f798344213c179a6c836938aff9e4f6c46f8d23c/bioimageio/core/utils/image_helper.py#L42

Software using the model zoo (or even specifically bioimageio.core) should try to delegate this guesswork to the end user instead.

note: the links above are from a development branch. These updated functions will be available in the coming bioimageio.core release, currently this functionality is implemented here

anuprulez and others added 3 commits March 15, 2024 15:26
Fix suggestion in name

Co-authored-by: Leonid Kostrykin <[email protected]>
Fix suggestion to include edam annotation

Co-authored-by: Leonid Kostrykin <[email protected]>
@bgruening
Copy link
Owner

@anuprulez is this still WIP or can we merge it. @beatrizserrano needs it :)

@kostrykin
Copy link
Contributor

kostrykin commented Jun 30, 2024

Guys, my impression of the comments above is that there has been some confusion. So in order to clear things up, I want to try to summarize the main concerns:

Concern 1: Support of TIFF/PNG

@bgruening was addressing the input file format, which currently is only NPZ. For those people who are the target audience of this tool, this isn't a well-established standard file format such as TIFF or PNG. However, @anuprulez explained the conversion to TIFF or PNG of the output files, pointing out that it is not straightforward.

Lets consider the questions of the input file formats and the output file formats separately.

1.1) Inputs: The NPY/NPZ formats are more general than TIFF and PNG, since they can store arbitrary numpy arrays, with arbitrary data types (even mixed) and number of dimensions. Thus, conversion from TIFF/PNG to NPY/NPZ should be straightforward. Given the concerns above, I think the tool wrapper really should also accept TIFF and PNG input files and do this conversion automatically. This should be as simple as something like this:

im = imread('tiff or png image file path')
np.save('input.npy', im)  # to produce an NPY
np.savez('input.npz', im)  # to produce an NPZ

1.2) Outputs: It was pointed out that the intention of outputting the originally predicted arrays as NPZ was that no information is lost. However, I'm wondering whether converting the data to TIFF is even capable of losing information? My gut feeling is that using TIFF as the output file format should be safe, at least as long as we restrict the wrapper to NPY instead of NPZ (see below).

Concern 2: NPY or NPZ?

Please keep in mind that NPY is "the standard binary file format in NumPy for persisting a single arbitrary NumPy array on disk", and NPZ is "the standard format for persisting multiple NumPy arrays on disk", which can be compressed, but not necessarily (docs). The key difference here is that NPY is for single arrays, NPZ is for multiple arrays.

At this point I'm somewhat confused. I hope that @FynnBe can maybe add something regarding the following two concerns:

2.1) What happens if the NPZ contains more than one array? Do the bioimage.io models process them independently and yield another NPZ, which contains predictions for each array in the NPZ? Or do the models bluntly fail to process an NPZ with more than one array?

2.2.) As far as I understand the comment made by @FynnBe (link), the bioimage.io models are only guaranteed to work with NPY inputs. Doesn't that mean, that the wrapper should actually take/produce NPY instead of NPZ? If this is right, then we should forget about NPZ and restrict our discussions here to PNG, TIFF, and NPY.


cc @beatrizserrano

@anuprulez
Copy link
Contributor Author

I was on holidays. I will have a look at it this week. Can you let me know when you require this feature, I can prioritise it @beatrizserrano thanks!

@kostrykin thank you for the clarification.

@beatrizserrano
Copy link
Contributor

Thank you @anuprulez! 🤗

We want to develop tutorials at the Biohackathon in November, so it would be perfect to be able to test the tool well before that. Thanks again!

@anuprulez
Copy link
Contributor Author

@kostrykin thank you for the explanation of NPY and NPZ.

I would like to elaborate the issue of using TIFF or PNG files as input. When these files are read (using OpenCV), they are represented as 2 dimensional image.

Example from Neuron Segmentation in EM model

TIFF input file shape: torch.Size([360, 360])

However, the NPY file has shape torch.Size([1, 1, 32, 256, 256])

When I use the TIFF file as the test input for prediction, I receive the following error:

RuntimeError: Expected 5-dimensional input for 5-dimensional weight [64, 1, 3, 3, 3], but got 2-dimensional input of size [360, 360] instead

The model expects an input with 5 dimensions but the TIFF file is only 2 dimensional. For other models, the input shape varies - sometimes it is 3,4 or 5. When I use the NPY file as input, it works fine and the model predicts the segmentations.

As suggested by @FynnBe, I tried to use imageo and bioimageo packages and the suggested converter methods, but TIFF file dimensions don't match that of the respective model.

The issue is to find a way to transform TIFF or PNG images to be represented in the valid dimensions so that the respective AI model can consume the input files and produce desired output.

Any suggestions here would be really helpful!

Thank you!

@beatrizserrano

@kostrykin
Copy link
Contributor

kostrykin commented Jul 22, 2024

Thanks for the follow-up @anuprulez

I see the challenge that comes with conversion of PNG/TIFF to NPY. However, by accepting only NPY and dropping support for PNG/TIFF, we would simply shift that challenge to the user of the tool, which IMO we shouldn't do.

The core issue is to determine the expected number of dimensions of the input, the conversion then simply is some reshaping, right? I see a couple of options here, in descending order of preference:

Option 1. My suspicion is that the only way to get the dimensions of the TIFF right automatically is to rely on what the model declares as the required input dimension — if it does declare anything. You do model = torch.load(model_path) to load the model, is there anything like model.input_ndim defined? I'm not familiar with the PyTorch API, but unfortunately, torch.load does not specify anything useful for what it returns.

Option 2. You quoted the error "RuntimeError: Expected 5-dimensional input for 5-dimensional weight [64, 1, 3, 3, 3], but got 2-dimensional input of size [360, 360] instead", where is that raised from? By looking into the code for where this check is being performed, we might be able to reverse-engineer how the expected dimension of the models is stored.

Option 3. In the worst-case (i.e. if Option 2 leads to no solution for some reason), I'd fallback to the documentation, like: Point the user to the documentation of the models, where, hopefully, the required number of dimensions is stated. Add an input field to the tool UI, where the user has to write down the number that was looked up from the model documentation.

@anuprulez
Copy link
Contributor Author

@kostrykin thank you for the interesting pointers. I will explore how to get the input shape of the models dynamically.

Copy link
Contributor

@kostrykin kostrykin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much thanks for adding the PNG support and everything!

Just very few comments left, sorry for being a bit picky maybe.

tools/bioimaging/bioimage_inference.xml Outdated Show resolved Hide resolved
<token name="@VERSION_SUFFIX@">0</token>
</macros>
<creator>
<organization name="BioImage.IO" url="https://bioimage.io/#/" email="" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you affiliated with BioImage.IO? @anuprulez Otherwise I wouldn't list them as a creator of this tool wrapper.

Comment on lines 39 to 40
<param name="input_image_file" type="data" format="tiff,png" label="Input image" help="Please provide an input image for the analysis."/>
<param name="input_image_input_size" type="text" label="Shape of the input image" help="Provide shape of input image. See chosen model's RDF file to find correct input shape. For example: for the BioImage.IO model MitochondriaEMSegmentationBoundaryModel, the input shape is 256 x 256 x 32 x 1. Enter the shape as '256,256,32,1'."/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix indentation, replace x by multiplication symbol, add some missing articles:

Suggested change
<param name="input_image_file" type="data" format="tiff,png" label="Input image" help="Please provide an input image for the analysis."/>
<param name="input_image_input_size" type="text" label="Shape of the input image" help="Provide shape of input image. See chosen model's RDF file to find correct input shape. For example: for the BioImage.IO model MitochondriaEMSegmentationBoundaryModel, the input shape is 256 x 256 x 32 x 1. Enter the shape as '256,256,32,1'."/>
<param name="input_image_file" type="data" format="tiff,png" label="Input image" help="Please provide an input image for the analysis."/>
<param name="input_image_input_size" type="text" label="Shape of the input image" help="Provide the shape of the input image. See the chosen model's RDF file to find the correct input shape. For example, for the BioImage.IO model MitochondriaEMSegmentationBoundaryModel, the input shape is 256 256 32 1. Enter the shape as '256,256,32,1'."/>

I also believe that avoiding the genitive s is a good practice in technical writing, since it helps keeping things clearly structured, and the apostrophe can easily be misperceived as a quotation mark, so I suggested it in my previous review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I replaced the x symbol by *. For this sentence Enter the shape as '256,256,32,1', I have left out any quotation or symbols to avoid confusion. It should be entered just like any other text.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anuprulez I just looked into the RDF file because I thought it would be good to use their notation and saw that they use the x notation :) much sorry! I didn't know that you sticked to the RDF notation… so actually using the x notation which you used originally, then is the best option, I think.

tools/bioimaging/bioimage_inference.xml Outdated Show resolved Hide resolved
tools/bioimaging/bioimage_inference.xml Show resolved Hide resolved
tools/bioimaging/bioimage_inference.xml Outdated Show resolved Hide resolved
anuprulez and others added 2 commits August 2, 2024 09:24
Fix review comments

Co-authored-by: Leonid Kostrykin <[email protected]>
Fix review comments

Co-authored-by: Leonid Kostrykin <[email protected]>
Add missing articles and replace shape by size

Co-authored-by: Leonid Kostrykin <[email protected]>
Copy link
Contributor

@kostrykin kostrykin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @anuprulez!

<creator>
<organization name="BioImage.IO" url="https://bioimage.io/#/" email="" />
<person name="Beatriz Serrano-Solano" email="" />
<person name="Leonid Kostrykin" email="[email protected]" />
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anuprulez you forgot yourself here.

@beatrizserrano your email?

I would probably use <person givenName="" familyName="" email="" />

tools/bioimaging/bioimage_inference.xml Outdated Show resolved Hide resolved
tools/bioimaging/bioimage_inference.xml Outdated Show resolved Hide resolved
</xrefs>
<requirements>
<requirement type="package" version="3.9.12">python</requirement>
<requirement type="package" version="2.3.1">pytorch</requirement>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<requirement type="package" version="2.3.1">pytorch</requirement>
<requirement type="package" version="@TOOL_VERSION@">pytorch</requirement>

was that the intention behind the version number?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I actually forgot to add this in my review.

Copy link
Contributor Author

@anuprulez anuprulez Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think tool's version should not replace Pytorch's version. Something else changes in the tool (making the tool's version to change) while Pytorch remains that same - could be problematic in future. Since, this is the first tool, I would keep the tool's version as 1.0.0

Copy link
Contributor

@kostrykin kostrykin Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wrapper is rather thin and is mostly "just" a wrapper for torch.load and the invocation of the model (the rest is pre- and post-processing). This suggests that the wrapper should have the same version as the wrapped tool, PyTorch in this case. If the wrapper changes in the future, you would usually increment @VERSION_SUFFIX@. This is also what I had suggested in #1391 (comment) and it is more in line with IUC recommendations https://galaxy-iuc-standards.readthedocs.io/en/latest/best_practices/tool_xml.html#tool-versions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I misinterpreted the previous comment. It's restored based on your comment. Thanks!!

tools/bioimaging/.shed.yml Show resolved Hide resolved
Copy link
Contributor

@kostrykin kostrykin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor consistency issue

tools/bioimaging/bioimage_inference.xml Outdated Show resolved Hide resolved
tools/bioimaging/bioimage_inference.xml Outdated Show resolved Hide resolved
Co-authored-by: Leonid Kostrykin <[email protected]>
@bgruening bgruening merged commit 57f4673 into bgruening:master Aug 2, 2024
11 checks passed
@bgruening
Copy link
Owner

Thanks everyone!

@anuprulez
Copy link
Contributor Author

Thanks @bgruening for merging.

Thank you very much @kostrykin for spending so much time reviewing it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants