Develop #34

Merged
merged 59 commits into from
Jan 20, 2025
ca35d5b
BUG: Fix the __str__ method from all the ML models
acsenrafilho Jan 10, 2025
1e731a4
ENH: Add new test data for image classification dataset being already…
acsenrafilho Jan 10, 2025
2f7b5cb
ENH: Changed module name from model_download to kaggle_helpers
acsenrafilho Jan 10, 2025
fca10ff
ENH: Rename utils.py and improve dataset loading logic to check for o…
acsenrafilho Jan 10, 2025
8f261c8
STY: Refactor import statements to streamline utility function access
acsenrafilho Jan 10, 2025
ad3eb8f
ENH: Add tests for image classification and segmentation models' stri…
acsenrafilho Jan 10, 2025
37be488
STY: Update import path for utility functions in test_trainers_utils.py
acsenrafilho Jan 10, 2025
e5ab965
ENH: Add tests for dataset loading and parameter changes in ImageClas…
acsenrafilho Jan 10, 2025
c873842
STY: Update import path for load_cucaracha_dataset in ImageSegmentati…
acsenrafilho Jan 10, 2025
588b401
ENH: Update ImageClassificationTrainer to include batch size, epochs,…
acsenrafilho Jan 10, 2025
bc31328
ENH: Update post_test command to delete .keras files after coverage r…
acsenrafilho Jan 10, 2025
271ee91
ENH: Add Image Classification Builder script with parameter validatio…
acsenrafilho Jan 10, 2025
26d1292
ENH: Add image classification task with input validation and model pr…
acsenrafilho Jan 10, 2025
a039329
STY: Update import paths for alignment, noise removal, and threshold …
acsenrafilho Jan 10, 2025
3a35ff7
ENH: Add sample personal document paths for testing
acsenrafilho Jan 10, 2025
b63d944
BUG: Update import paths to use tasks module for alignment, noise rem…
acsenrafilho Jan 10, 2025
04e467a
ENH: Implement collect_cucaracha_model function for model retrieval a…
acsenrafilho Jan 10, 2025
d9ed76b
ENH: Add default model lake path and extend CUCARACHA_PRESETS with ne…
acsenrafilho Jan 10, 2025
8855f6e
ENH: Update image classification builder to provide class labels file…
acsenrafilho Jan 10, 2025
5e858ad
STY: Remove outdated tests for downloading Cucaracha model and dataset
acsenrafilho Jan 10, 2025
7fb1243
STY: Add TODO comment to clarify visibility of prepare_image_classifi…
acsenrafilho Jan 10, 2025
920036d
ENH: Add identification functions for personal documents and signed d…
acsenrafilho Jan 10, 2025
0129375
DOC: Add detailed docstrings for model architect and identification f…
acsenrafilho Jan 10, 2025
c0af7d1
STY: Clean up docstring example that is not passing doctest in call_c…
acsenrafilho Jan 10, 2025
2075c1e
DOC: Fix typos and improve formatting in docstrings for identificatio…
acsenrafilho Jan 10, 2025
eed1ee3
ENH: Add data augmentation support and improve model training configu…
acsenrafilho Jan 14, 2025
35d297a
WIP: Refactor ImageSegmentationTrainer to improve model initializatio…
acsenrafilho Jan 14, 2025
d7adf65
ENH: Add keras and scipy dependencies to pyproject.toml for improved …
acsenrafilho Jan 14, 2025
294b476
ENH: Update keras to version 3.8.0 and add scipy dependency for enhan…
acsenrafilho Jan 14, 2025
528be53
ENH: Add numpy and tensorflow imports to image classification builder…
acsenrafilho Jan 14, 2025
4a00931
ENH: Add data augmentation tests and validate architecture patterns i…
acsenrafilho Jan 16, 2025
07c734d
ENH: Add script for evaluating pretrained image classification models…
acsenrafilho Jan 16, 2025
ee33725
ENH: Add --arch_list option to list available image classification ar…
acsenrafilho Jan 16, 2025
af7ba0f
ENH: Add functionality to remove existing 'logs' folder in output dir…
acsenrafilho Jan 16, 2025
f069df4
ENH: Add custom TensorBoard callback for logging training images duri…
acsenrafilho Jan 16, 2025
b108987
ENH: Update argument parsing to require dataset path, number of class…
acsenrafilho Jan 16, 2025
114a88f
REF: Remove redundant architecture modality check in architecture val…
acsenrafilho Jan 16, 2025
7d33184
ENH: Improve readability and add pragma comments in image dataset loa…
acsenrafilho Jan 16, 2025
cf46b69
ENH: Add function to plot confusion matrix
acsenrafilho Jan 16, 2025
df8f783
ENH: Add scikit-learn and seaborn dependencies to pyproject.toml
acsenrafilho Jan 16, 2025
3b401d0
ENH: Update poetry.lock
acsenrafilho Jan 16, 2025
c190f79
ENH: Sort subfolders in image dataset loading function for matching t…
acsenrafilho Jan 17, 2025
acdefe4
ENH: Add pragma comments to abstract methods in ModelArchitect class
acsenrafilho Jan 17, 2025
db743d3
STY: Adjust method position inside ImageClassificationTrainer class f…
acsenrafilho Jan 17, 2025
242e9da
ENH: Add support for loading class labels from a JSON file in model e…
acsenrafilho Jan 17, 2025
7110209
ENH: Add test for loading dataset without data augmentation in ImageC…
acsenrafilho Jan 17, 2025
57f8f4a
BUG: Adding init file to tasks module
acsenrafilho Jan 17, 2025
3e8de3f
ENH: Update variation path for Brazilian personal document model and …
acsenrafilho Jan 17, 2025
db67397
STY: Fix formatting in ModelCheckpoint callback initialization for co…
acsenrafilho Jan 17, 2025
1dc10e5
ENH: Add test for downloading all Cucaracha models to ensure valid mo…
acsenrafilho Jan 17, 2025
1cb583b
ENH: Add image_auto_fit function to resize images for Keras model input
acsenrafilho Jan 17, 2025
e03a3c2
ENH: Integrate image_auto_fit function for automatic input resizing i…
acsenrafilho Jan 17, 2025
cdc0296
WIP: Remove commented-out test cases in image segmentation and identi…
acsenrafilho Jan 17, 2025
c95f377
STY: Remove unnecessary blank line in __init__.py for improved readab…
acsenrafilho Jan 20, 2025
ce666fb
ENH: Add allowed image extensions for validation in configuration
acsenrafilho Jan 20, 2025
a990cc1
ENH: Add image dataset augmentation script for generating augmented d…
acsenrafilho Jan 20, 2025
bcad26a
ENH: Add option to visualize training samples before proceeding with …
acsenrafilho Jan 20, 2025
54ec923
ENH: Improve data augmentation techniques and add method for collecti…
acsenrafilho Jan 20, 2025
2c59293
BUG: Remove unused image files links from the classification dataset
acsenrafilho Jan 20, 2025
6 changes: 3 additions & 3 deletions cucaracha/__init__.py
@@ -7,9 +7,9 @@
 from rich import print
 from rich.progress import track
 
-from cucaracha.aligment import inplane_deskew
-from cucaracha.noise_removal import sparse_dots
-from cucaracha.threshold import otsu
+from cucaracha.tasks.aligment import inplane_deskew
+from cucaracha.tasks.noise_removal import sparse_dots
+from cucaracha.tasks.threshold import otsu
 
 
 class Document:
1 change: 1 addition & 0 deletions cucaracha/configuration.py
@@ -0,0 +1 @@
+ALLOWED_IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg')
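A minimal sketch of how the new constant might be used for validation. The helper name `is_allowed_image` is hypothetical and not part of this PR. It relies only on `str.endswith` accepting a tuple of suffixes.

```python
# Mirrors the constant added in cucaracha/configuration.py.
ALLOWED_IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg')


def is_allowed_image(path: str) -> bool:
    """Hypothetical helper: True if the file extension is allowed.

    The comparison is case-insensitive, which is an assumption about the
    desired behavior rather than something the PR states.
    """
    return path.lower().endswith(ALLOWED_IMAGE_EXTENSIONS)
```

Keeping the extensions in a tuple means the constant can be passed to `str.endswith` directly, with no loop.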
21 changes: 13 additions & 8 deletions cucaracha/ml_models/__init__.py
@@ -1,22 +1,27 @@
+import os
+
 VALID_MODALITIES = [
     'image_classification',
     'image_keypoint_detection',
     'image_object_detection',
     'image_segmentation',
 ]
+DEFAULT_MODEL_LAKE = os.path.join(
+    os.path.expanduser('~'), '.cache', 'kagglehub', 'models'
+)
 
 # Pre-trained cucaracha models
 CUCARACHA_PRESETS = {
     'image_classification': {
         'doc_is_signed': {
             'variation': 'cucaracha-project/cucaracha-imgclass-document-is-signed/tensorFlow2/cucaracha-imgclass-document_is_signed-v0.1.0',
             'dataset': 'cucaracha-project/cucaracha-mod-imgclass-constains-signature',
-        }
-    },
-    'image_segmentation': {
-        'TBD': {
-            'variation': 'cucaracha-project/TBD',
-            'dataset': 'cucaracha-project/TBD',
-        }
-    },
+            'labels': {0: 'unsigned', 1: 'signed'},
+        },
+        'cnh_cpf_rg': {
+            'variation': 'cucaracha-project/cucaracha-imgclass-brazilian-personal-document/keras/cucaracha-imgclass-brazilian-personal-document',
+            'dataset': 'cucaracha-project/cucaracha-mod-imgclass-brazilian-personal-doc',
+            'labels': {0: 'rg', 1: 'others', 2: 'cpf', 3: 'cnh'},
+        },
+    }
 }
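Each preset now carries a `labels` entry that maps a class index to a name. Below is a rough sketch of how such a mapping might be used to decode a classifier's output. It assumes the model returns one score per class, ordered by class index. The PR does not show the prediction code, and `decode_prediction` is a hypothetical helper.

```python
# Label mapping copied from the 'cnh_cpf_rg' preset in this diff.
LABELS = {0: 'rg', 1: 'others', 2: 'cpf', 3: 'cnh'}


def decode_prediction(scores, labels):
    """Hypothetical helper: return the label of the highest-scoring class."""
    # Index of the largest score (a plain-Python argmax).
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best_index]


print(decode_prediction([0.05, 0.10, 0.15, 0.70], LABELS))  # prints "cnh"
```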
3 changes: 2 additions & 1 deletion cucaracha/ml_models/image_classification/alex_net.py
@@ -77,5 +77,6 @@ def get_model(self):
         return keras.Model(inputs, outputs)
 
     def __str__(self):
-        super().__str__()
+        output = super().__str__()
         self.get_model().summary(show_trainable=True)
+        return output
3 changes: 2 additions & 1 deletion cucaracha/ml_models/image_classification/dense_net_121.py
@@ -40,5 +40,6 @@ def get_model(self):
         )
 
     def __str__(self):
-        super().__str__()
+        output = super().__str__()
         self.get_model().summary(show_trainable=True)
+        return output
3 changes: 2 additions & 1 deletion cucaracha/ml_models/image_classification/model_soup.py
@@ -47,5 +47,6 @@ def get_model(self):
         return keras.Model(model.input, prediction)
 
     def __str__(self):
-        super().__str__()
+        output = super().__str__()
         self.get_model().summary(show_trainable=True)
+        return output
3 changes: 2 additions & 1 deletion cucaracha/ml_models/image_classification/res_net_50.py
@@ -40,5 +40,6 @@ def get_model(self):
         )
 
     def __str__(self):
-        super().__str__()
+        output = super().__str__()
         self.get_model().summary(show_trainable=True)
+        return output
3 changes: 2 additions & 1 deletion cucaracha/ml_models/image_classification/small_xception.py
@@ -70,5 +70,6 @@ def get_model(self):
         return keras.Model(inputs, outputs)
 
     def __str__(self):
-        super().__str__()
+        output = super().__str__()
         self.get_model().summary(show_trainable=True)
+        return output
3 changes: 2 additions & 1 deletion cucaracha/ml_models/image_segmentation/unet_xception.py
@@ -91,5 +91,6 @@ def get_model(self):
         return keras.Model(inputs, outputs)
 
     def __str__(self):
-        super().__str__()
+        output = super().__str__()
         self.get_model().summary(show_trainable=True)
+        return output
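The `__str__` changes in the model files share one pattern: the old method called `super().__str__()` but discarded the result, so it implicitly returned `None`. Python requires `__str__` to return a string and raises `TypeError` otherwise. The sketch below reproduces the problem with stand-in classes (`Base`, `Broken`, and `Fixed` are illustrative names, not classes from this repository) and omits the Keras `summary()` call.

```python
class Base:
    def __str__(self):
        return 'Model Architecture modality: image_classification'


class Broken(Base):
    def __str__(self):
        # The parent's string is computed and then thrown away,
        # so this method falls through and returns None.
        super().__str__()


class Fixed(Base):
    def __str__(self):
        output = super().__str__()
        # Side effects (such as printing a model summary) would go here.
        return output


try:
    str(Broken())
except TypeError as err:
    print(f'Broken: {err}')

print(f'Fixed: {Fixed()}')
```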
@@ -1,6 +1,8 @@
+import os
+
 import kagglehub
 
-from cucaracha.ml_models import CUCARACHA_PRESETS
+from cucaracha.ml_models import CUCARACHA_PRESETS, DEFAULT_MODEL_LAKE
 
 
 def download_cucaracha_model(model_url: str):
@@ -74,3 +76,52 @@ def download_cucaracha_dataset(dataset_url: str):
         raise ValueError(f'Error downloading the dataset: {e}')
 
     return path
+
+
+def collect_cucaracha_model(cucaracha_preset: str):
+    """
+    Collects the Cucaracha model from the given preset.
+
+    This function checks if the provided model preset is present in the CUCARACHA_PRESETS dictionary.
+    If the preset is valid, it attempts to download the model using the kagglehub library.
+
+    The downloaded files are located in the home/.cache folder.
+
+    Note:
+        All operations here use the kagglehub library. The
+        CUCARACHA_PRESETS dictionary is expected to have a nested structure where
+        the model variations are stored under a 'variation' key. If the URL is
+        valid, it attempts to download the model using the kagglehub library.
+
+    Args:
+        cucaracha_preset (str): The name of the model preset to be downloaded.
+    Returns:
+        dict: A dictionary with the keys 'model_path' (the path where the model is downloaded), 'modality', and 'labels'.
+    Raises:
+        ValueError: If the model preset is not present in CUCARACHA_PRESETS or if there is an error during download.
+    """
+    found = False
+    modality = None
+    for mod in CUCARACHA_PRESETS.values():
+        if cucaracha_preset in mod:
+            found = True
+            modality = next(
+                key
+                for key, value in CUCARACHA_PRESETS.items()
+                if cucaracha_preset in value
+            )
+            break
+
+    if not found:
+        raise ValueError(
+            f'Model preset {cucaracha_preset} is not present in CUCARACHA_PRESETS'
+        )
+
+    model_url = CUCARACHA_PRESETS[modality][cucaracha_preset]['variation']
+
+    output = {
+        'model_path': download_cucaracha_model(model_url),
+        'modality': modality,
+        'labels': CUCARACHA_PRESETS[modality][cucaracha_preset]['labels'],
+    }
+    return output
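The lookup in `collect_cucaracha_model` walks the presets twice: once to set `found`, and again inside `next(...)` to recover the modality key. A single pass over `items()` should give the same result. The sketch below illustrates that alternative and is not the code in this PR. `find_preset` and the shortened `PRESETS` values are hypothetical, and the kagglehub download step is left out.

```python
# Stand-in for CUCARACHA_PRESETS with shortened, made-up values.
PRESETS = {
    'image_classification': {
        'doc_is_signed': {
            'variation': 'example-owner/example-model',
            'labels': {0: 'unsigned', 1: 'signed'},
        },
    },
}


def find_preset(name, presets):
    """Return (modality, entry) for the named preset, or raise ValueError."""
    for modality, entries in presets.items():
        if name in entries:
            return modality, entries[name]
    raise ValueError(f'Model preset {name} is not present in the presets')


modality, entry = find_preset('doc_is_signed', PRESETS)
print(modality, entry['labels'])
```

Returning the entry along with the modality also avoids indexing the nested dictionary again for `'variation'` and `'labels'`.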
16 changes: 13 additions & 3 deletions cucaracha/ml_models/model_architect.py
@@ -4,17 +4,27 @@
 
 
 class ModelArchitect(ABC):
+    """
+    Abstract base class for defining model architectures.
+    Attributes:
+        modality (str): The modality of the model architecture. Expected values are defined in VALID_MODALITIES.
+    Methods:
+        get_model():
+            Abstract method to be implemented by subclasses to return the model architecture.
+        __str__():
+            Returns a string representation of the model architecture, including its modality.
+    """
+
     def __init__(self, **kwargs):
         self.modality = kwargs.get('modality', None)
-        # valid_modalities = ['image_classification', 'image_keypoint_detection', 'image_object_detection']
         if self.modality is None or self.modality not in VALID_MODALITIES:
             raise ValueError(
                 f'Invalid modality. Expected one of {VALID_MODALITIES}, got {self.modality}'
             )
 
     @abstractmethod
-    def get_model(self):
+    def get_model(self):  # pragma: no cover
         pass
 
-    def __str__(self):
+    def __str__(self):  # pragma: no cover
         return f'Model Architecture modality: {self.modality}'
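To show how the base class is meant to be used, here is a self-contained sketch that restates `ModelArchitect` as it appears in this diff and adds a toy subclass. `DummyClassifier` is hypothetical, and its `get_model` returns `None` where a real subclass would build a Keras model.

```python
from abc import ABC, abstractmethod

VALID_MODALITIES = [
    'image_classification',
    'image_keypoint_detection',
    'image_object_detection',
    'image_segmentation',
]


class ModelArchitect(ABC):
    def __init__(self, **kwargs):
        self.modality = kwargs.get('modality', None)
        if self.modality is None or self.modality not in VALID_MODALITIES:
            raise ValueError(
                f'Invalid modality. Expected one of {VALID_MODALITIES}, got {self.modality}'
            )

    @abstractmethod
    def get_model(self):
        pass

    def __str__(self):
        return f'Model Architecture modality: {self.modality}'


class DummyClassifier(ModelArchitect):
    """Hypothetical subclass used only for illustration."""

    def __init__(self):
        super().__init__(modality='image_classification')

    def get_model(self):
        # A real subclass would build and return a keras.Model here.
        return None


print(DummyClassifier())  # prints "Model Architecture modality: image_classification"
```

Because `get_model` is abstract, `ModelArchitect` cannot be instantiated directly. Only subclasses that implement `get_model` can be created, and the modality check runs in their constructors.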