Merge pull request #6 from hdr-bgnn/release
Make code workflow-compatible, including automated container build
DrJPepper authored Dec 8, 2022
2 parents fa37ac2 + 9f39d93 commit 699e400
Showing 5 changed files with 211 additions and 40 deletions.
43 changes: 43 additions & 0 deletions .github/workflows/deploy-image.yml
@@ -0,0 +1,43 @@
name: Create and publish a Docker image

on:
  release:
    types: [published]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build-and-push-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      - name: Log in to the Container registry
        uses: docker/login-action@f054a8b539a109f9f41c372932f1ae047eff08c9
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@98669ae865ea3cffbcbaa878cf57c20bbf1c6c38
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}

      - name: Build and push Docker image
        uses: docker/build-push-action@ad44023a93711e3deb337508980b4b5e9bcdc5dc
        with:
          context: .
          build-args: |
            DATAVERSE_API_TOKEN=${{ secrets.DATAVERSE_API_TOKEN }}
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
39 changes: 39 additions & 0 deletions Dockerfile
@@ -0,0 +1,39 @@
FROM ghcr.io/imageomics/dataverse-access:1 as model_fetcher
ARG DATAVERSE_API_TOKEN
ENV DATAVERSE_URL=https://datacommons.tdai.osu.edu/
ENV MODEL_DV_DOI=doi:10.5072/FK2/MMX6FY

# Download model_final.pth
RUN mkdir -p /model \
&& dva download $MODEL_DV_DOI /model

FROM python:3.8.10-slim-buster
LABEL "org.opencontainers.image.authors"="John Bradley <[email protected]>"
LABEL "org.opencontainers.image.description"="Tool to extract metadata information from fish images"

# Install build requirements
RUN apt-get update \
&& apt-get install -y python3-dev git gcc g++ libgl1-mesa-glx libglib2.0-0 \
&& rm -rf /var/lib/apt/lists/*

# Upgrade pip and install pipenv
RUN pip install --upgrade pip
RUN pip install pipenv

WORKDIR /pipeline

# ADD scripts in /pipeline to the PATH
ENV PATH="/pipeline:${PATH}"

COPY Pipfile /pipeline/.

# Install requirements
RUN pipenv install --skip-lock --system && pipenv --clear

COPY config /pipeline/config
COPY --from=model_fetcher /model/model_final.pth \
/pipeline/output/enhanced/model_final.pth

COPY gen_metadata.py /pipeline

CMD echo "python gen_metadata.py"
2 changes: 1 addition & 1 deletion Pipfile
@@ -31,4 +31,4 @@ pycallgraph = "*"
[dev-packages]

[requires]
-python_version = "3.10"
+python_version = "3.8.10"
53 changes: 46 additions & 7 deletions README.md
@@ -2,11 +2,6 @@

## Goal

-To develop a tool to check the validity of metadata associated with an image, and generate things that are missing. Currently setting up off the shelf machine learning to detect presence of fish and count how many there are.
-## Status
-
-# In this branch go straight to the folder gen_metadata_mini
-=======
To develop a tool that checks the validity of the metadata associated with an image and generates any metadata that is missing. It also computes various geometric and statistical properties of the mask generated over the biological specimen in the image.

## Functionality
@@ -72,11 +67,51 @@ The metadata generated is extremely specific to our use case. In addition, we pe

The generated metadata describes various statistical and geometric properties of a biological specimen image or collection in JSON format. When a single file is passed, the data is printed to the console (stdout). When a directory is passed, the data is stored in a JSON file.

### Model
The trained model is available as "Drexel_metadata_generator" at https://datacommons.tdai.osu.edu/dataverse/fish-traits/.
The model can be downloaded from that website or via the [dva](https://github.com/Imageomics/dataverse-access) command line utility.
To download from the command line, install dva and then run the following command:
```
dva download --url https://datacommons.tdai.osu.edu/ doi:10.5072/FK2/MMX6FY .
```
The above command will download the file and verify the checksum.
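
If the model file is downloaded from the website instead, the checksum can be verified by hand. The following is a minimal sketch, assuming the repository publishes an MD5 checksum for the file (a common Dataverse default, but not confirmed here). The helper names `file_md5` and `verify_download` are hypothetical and are not part of dva.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large model file is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True when the file's MD5 matches the expected hex digest."""
    # Assumption: expected_md5 is the checksum copied from the dataset page.
    return file_md5(path) == expected_md5.lower()
```

Calling `verify_download("model_final.pth", "<checksum from the dataset page>")` would return `True` only when the local file matches the published value.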

### Running
To generate the metadata, run the following command:
```bash
pipenv run python3 gen_metadata.py [file_or_dir_name]
```

Usage:
```
gen_metadata.py [-h] [--device {cpu,cuda}] [--outfname OUTFNAME] [--maskfname MASKFNAME] [--visfname VISFNAME]
file_or_directory [limit]
```

The optional `limit` positional argument caps the number of files processed.
It applies only when a directory is passed.
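
For readers wiring the script into other tooling, the usage line above can be mirrored with `argparse`. This is an illustrative sketch, not the actual parser in `gen_metadata.py`: the `cuda` default follows the README, while treating `limit` as an integer is an assumption.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented usage line (sketch only)."""
    parser = argparse.ArgumentParser(prog="gen_metadata.py")
    # The README states that cuda is the default device.
    parser.add_argument("--device", choices=["cpu", "cuda"], default="cuda")
    # The three options below are documented as single-file only.
    parser.add_argument("--outfname")
    parser.add_argument("--maskfname")
    parser.add_argument("--visfname")
    parser.add_argument("file_or_directory")
    # Assumption: limit is an integer count; it is optional and positional.
    parser.add_argument("limit", nargs="?", type=int)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--device", "cpu", "images", "10"])
    print(args.device, args.file_or_directory, args.limit)
```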

#### Device Configuration
By default, `gen_metadata.py` requires a GPU (cuda).
To use a CPU instead, pass the `--device cpu` argument to `gen_metadata.py`.

#### Single File Usage
The following three arguments are only supported when processing a single image file:
- `--outfname <filename>` - When passed, the script saves the output metadata JSON to `<filename>` instead of printing it to the console (the default behavior when processing one file).
- `--maskfname <filename>` - Enables logic to save an output mask to `<filename>` for the single input file.
- `--visfname <filename>` - Changes the script to save the output visualization to `<filename>` instead of the hard-coded location.

These arguments are meant to simplify adding `gen_metadata.py` to a workflow that processes files individually.
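
As an illustration of such a workflow step, the sketch below assembles the command line for one image. The function name `single_file_command` and the output naming scheme (`<stem>.json`, `<stem>_mask.png`, `<stem>_vis.png`) are hypothetical choices for this example; only the flags themselves come from the usage text above.

```python
from pathlib import Path


def single_file_command(image_path, out_dir, device="cpu"):
    """Build the argument list for processing one image (sketch only)."""
    image = Path(image_path)
    out = Path(out_dir)
    stem = image.stem
    return [
        "python3", "gen_metadata.py",
        "--device", device,
        # Hypothetical naming scheme: one set of outputs per input image.
        "--outfname", str(out / f"{stem}.json"),
        "--maskfname", str(out / f"{stem}_mask.png"),
        "--visfname", str(out / f"{stem}_vis.png"),
        str(image),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` so that a failed run stops the workflow.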


### Running with Singularity
A Docker container image is automatically built for each **drexel_metadata** release. The image has the requirements installed and includes the model file.
To run the container with Singularity for a specific release, follow this pattern:
```
singularity run docker://ghcr.io/hdr-bgnn/drexel_metadata:<release> gen_metadata.py ...
```


## Properties Generated

| **Property** | **Association** | **Type** | **Explanation** |
@@ -120,10 +155,14 @@ pipenv run python3 gen_metadata.py [file_or_dir_name]
| solidity | Per Fish | Float | The ratio of pixels in the fish to pixels of the convex hull image. |
| std | Per Fish | Float | The standard deviation of the mask pixel coordinate distribution. |

## Associated Publication

J. Pepper, J. Greenberg, Y. Bakiş, X. Wang, H. Bart and D. Breen, "Automatic Metadata Generation for Fish Specimen Image Collections," 2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2021, pp. 31-40, doi: [10.1109/JCDL52503.2021.00015](https://doi.org/10.1109/JCDL52503.2021.00015).

Kevin Karnani, Joel Pepper, Yasin Bakis et al. Computational Metadata Generation Methods for Biological Specimen Image Collections, 27 April 2022, PREPRINT (Version 1) available at Research Square <https://doi.org/10.21203/rs.3.rs-1506561/v1>

## Authors

Joel Pepper

Kevin Karnani
