# Fine-tune with an SVM
Creates a model that combines pybioclip image embeddings with an SVM, using images from the [Somnath01/Birds_Species](https://huggingface.co/datasets/Somnath01/Birds_Species) dataset. This dataset contains 1000 train images, 403 test images, and 50 validation images; this notebook uses only the train and test images. The dataset was chosen for convenience, and its suitability has not been analyzed.

When running this notebook in Colab, change the _runtime type_ to a GPU type to speed up processing. Additionally, when running the next step in Colab you may see an error about the installed version of `fsspec`. This error does not appear to cause any problems for this notebook.
## Load dataset | ||
The first time this step runs, downloading the images takes around 7 minutes.
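Before the SVM can be trained, each image in a split has to be converted into a pybioclip embedding. The sketch below shows one way to do that in batches. `embed_split`, `embed_batch`, and `dummy_embed` are hypothetical names: `embed_batch` stands in for whichever pybioclip call turns a batch of images into an `(n, d)` array of embeddings, and that call is not reproduced here.

```python
import numpy as np


def embed_split(images, labels, embed_batch, batch_size=32):
    """Embed `images` in batches; return a (n_images, dim) feature array and a label array.

    `embed_batch` is a hypothetical callable standing in for a pybioclip embedding call:
    it takes a list of images and returns an array-like of shape (len(batch), dim).
    Assumes `images` is non-empty.
    """
    feature_batches = []
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        feature_batches.append(np.asarray(embed_batch(batch), dtype=np.float32))
    # Stack the per-batch arrays into one matrix with one row per image.
    features = np.concatenate(feature_batches, axis=0)
    return features, np.asarray(labels)


# Usage with a dummy embedding function (hypothetical, for illustration only).
def dummy_embed(batch):
    return np.ones((len(batch), 4))


images = [f"image_{i}.jpg" for i in range(70)]
labels = [i % 3 for i in range(70)]
features, label_array = embed_split(images, labels, dummy_embed)
print(features.shape, label_array.shape)  # prints (70, 4) (70,)
```

Batching keeps memory use bounded on large splits; the batch size of 32 is an arbitrary choice here.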
## Set up an SVM model
The `init_svc()` function is copied from [biobench newt](https://github.com/samuelstevens/biobench/blob/637432bfda2b567d966d49bf8c4b37b339d4dc2a/biobench/newt/__init__.py#L247-L262), created by [@samuelstevens](https://github.com/samuelstevens).
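One common way to build such a factory, shown here as an assumption rather than a copy of `init_svc()`, is a scikit-learn pipeline that standardizes the embeddings and then searches SVC hyperparameters at random. The name `init_svc_sketch`, the kernel list, the search ranges, and `n_iter` are illustrative choices, not values taken from biobench.

```python
import scipy.stats
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def init_svc_sketch(n_iter=20, random_state=42):
    """Hypothetical stand-in for init_svc(): a scaled SVC tuned by random search."""
    # Standardize each embedding dimension before the SVM sees it.
    pipeline = make_pipeline(StandardScaler(), SVC())
    # make_pipeline names the SVC step "svc", so its parameters take the "svc__" prefix.
    param_distributions = {
        "svc__C": scipy.stats.loguniform(1e-3, 1e1),
        "svc__kernel": ["rbf", "linear"],
        "svc__gamma": scipy.stats.loguniform(1e-4, 1e-1),
    }
    return RandomizedSearchCV(
        pipeline,
        param_distributions,
        n_iter=n_iter,
        random_state=random_state,
    )


model = init_svc_sketch()
print(type(model).__name__)  # prints RandomizedSearchCV
```

Because `RandomizedSearchCV` refits the best configuration on the full training data by default (`refit=True`), the returned object can be used directly with `fit()` and `predict()`.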
## Train the SVM model
Trains the SVM on the train images. This step takes about 10 minutes on a CPU and about 1 minute on a GPU.
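Training amounts to calling `fit` on the embedding matrix and the species labels. The sketch below uses synthetic, well-separated vectors from `make_blobs` in place of real pybioclip embeddings, and a plain scaled `SVC` instead of a hyperparameter search to keep it fast. The 512-dimensional size and the three classes are assumptions for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for pybioclip image embeddings: 200 "images" of 3 species,
# each a 512-dimensional vector (the dimension is an assumption).
train_features, train_labels = make_blobs(
    n_samples=200, n_features=512, centers=3, random_state=0
)

# Scale the features, then fit an RBF-kernel SVM on them.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(train_features, train_labels)

# Accuracy on the training data itself, as a quick sanity check only.
train_accuracy = svm.score(train_features, train_labels)
print(f"train accuracy: {train_accuracy:.3f}")
```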
## Create predictions
Predicts species for the test images. This step takes about 5 minutes on a CPU and about 1 minute on a GPU.
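Prediction runs the fitted model on the test embeddings, and accuracy is the fraction of predictions that match the true labels. The self-contained sketch below again uses synthetic data in place of pybioclip embeddings, and a hypothetical `species_names` list to map integer predictions back to readable labels.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for embeddings of 300 images across 3 species.
features, labels = make_blobs(n_samples=300, n_features=512, centers=3, random_state=0)
train_x, test_x, train_y, test_y = train_test_split(
    features, labels, test_size=0.3, random_state=0, stratify=labels
)

svm = make_pipeline(StandardScaler(), SVC()).fit(train_x, train_y)

# Predict a class index for every held-out embedding and score the result.
predictions = svm.predict(test_x)
accuracy = accuracy_score(test_y, predictions)

# Hypothetical label names; a real dataset would supply its own species list.
species_names = ["species_a", "species_b", "species_c"]
predicted_species = [species_names[i] for i in predictions]
print(f"test accuracy: {accuracy:.3f}")
```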
## Compare against the untrained pybioclip model
This step takes about 6 minutes on a CPU and about 1 minute on a GPU.
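A CLIP-style model can classify without any training by comparing each image embedding with one text embedding per candidate label and picking the most similar (by cosine similarity). The sketch below assumes that rule as the untrained baseline and compares it with an SVM on the same held-out images. `zero_shot_predict` is a hypothetical helper and all vectors are synthetic; pybioclip's own classifiers may add details, such as prompt templates or score scaling, that are not modeled here.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC


def zero_shot_predict(image_embeddings, text_embeddings):
    """Return, for each image, the index of the most cosine-similar text embedding."""
    images = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    texts = text_embeddings / np.linalg.norm(text_embeddings, axis=1, keepdims=True)
    similarities = images @ texts.T  # shape (n_images, n_classes)
    return similarities.argmax(axis=1)


# Synthetic stand-ins: one "text" embedding per species, with images scattered around it.
rng = np.random.default_rng(0)
n_classes, dim, n_images = 3, 64, 150
text_embeddings = rng.normal(size=(n_classes, dim))
labels = rng.integers(0, n_classes, size=n_images)
image_embeddings = text_embeddings[labels] + 0.5 * rng.normal(size=(n_images, dim))

# The first 100 images train the SVM; the last 50 are held out for both methods.
train_x, test_x = image_embeddings[:100], image_embeddings[100:]
train_y, test_y = labels[:100], labels[100:]

svm_accuracy = accuracy_score(test_y, SVC().fit(train_x, train_y).predict(test_x))
zero_shot_accuracy = accuracy_score(test_y, zero_shot_predict(test_x, text_embeddings))
print(f"SVM: {svm_accuracy:.3f}  zero-shot: {zero_shot_accuracy:.3f}")
```

Scoring both methods on the same held-out images keeps the comparison fair; the zero-shot path never sees the training labels.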