
Added predictors and their serializers #13

Merged: danyoungday merged 13 commits into main from predictors on Jul 30, 2024
Conversation

danyoungday (Contributor):

Fixed #12 (Transfer predictors from MVP).
Transferred over the predictors from MVP. Updated them all to hold on to a CAO mapping object since we can't hard-code that anymore.
Added some unit tests for the HuggingFace persistor.

@danyoungday added the enhancement label on Jul 30, 2024
@danyoungday self-assigned this on Jul 30, 2024
danyoungday (Author) commented:

This is an immutable struct that we can use to store context, actions, and outcomes inside our models. This allows us to check inputs/outputs/whatever when we compare models, etc. in project-specific contexts.
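A minimal sketch of such a struct, assuming a frozen dataclass (the real implementation may differ; frozen only prevents reassigning the fields, which is enough immutability here):

from dataclasses import dataclass


@dataclass(frozen=True)
class CAOMapping:
    """Immutable record of the context, action, and outcome column names."""
    context: list[str]
    actions: list[str]
    outcomes: list[str]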

config = {
    "context": model.cao.context,
    "actions": model.cao.actions,
    "outcomes": model.cao.outcomes,
}
danyoungday (Author) commented:

We store the context, actions, and outcomes in our serialization now

with open(path / "config.json", "r", encoding="utf-8") as file:
    config = json.load(file)
# Grab CAO out of config
cao = CAOMapping(config.pop("context"), config.pop("actions"), config.pop("outcomes"))
danyoungday (Author) commented:

We also reconstruct our cao in loading

# Extract CAO from config
with open(load_path / "config.json", "r", encoding="utf-8") as file:
    config = json.load(file)
cao = CAOMapping(config.pop("context"), config.pop("actions"), config.pop("outcomes"))
danyoungday (Author) commented:

We reconstruct cao when loading

# Add CAO to the config
config = dict(model.config.items())
cao_dict = {"context": model.cao.context, "actions": model.cao.actions, "outcomes": model.cao.outcomes}
config.update(cao_dict)
danyoungday (Author) commented:

Dump our cao into the config now
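Putting both halves together, the CAO roundtrip through config.json looks roughly like this (model and path are stand-ins for the objects the serializer actually receives):

import json
from pathlib import Path

path = Path("model_dir")

# Save: fold the CAO fields into the serialized config
config = dict(model.config.items())
config.update({"context": model.cao.context,
               "actions": model.cao.actions,
               "outcomes": model.cao.outcomes})
with open(path / "config.json", "w", encoding="utf-8") as file:
    json.dump(config, file)

# Load: pop the CAO fields back out and rebuild the mapping
with open(path / "config.json", "r", encoding="utf-8") as file:
    config = json.load(file)
cao = CAOMapping(config.pop("context"), config.pop("actions"), config.pop("outcomes"))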

Data is automatically standardized and the scaler is saved with the model.
TODO: We want to be able to have custom scaling in the future.
"""
def __init__(self, cao: CAOMapping, model_config: dict):
danyoungday (Author) commented:

New cao arg to pass in to every predictor
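The "automatically standardized" behavior from the docstring might look something like this inside fit() (assuming an sklearn StandardScaler; the actual scaler class isn't shown in this hunk):

from sklearn.preprocessing import StandardScaler

def fit(self, X_train, y_train):
    # Keep the fitted scaler on the model so predict() applies the same transform
    self.scaler = StandardScaler()
    X_scaled = self.scaler.fit_transform(X_train)
    ...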

import torch


class TorchNeuralNet(torch.nn.Module):
danyoungday (Author) commented:

Renamed to TorchNeuralNet from ELUCNeuralNet because it's task-agnostic


class Predictor(ABC):
    """
    Abstract class for predictors to inherit from.
    Predictors must be able to be fit and predict on a DataFrame.
    It is up to the Predictor to keep track of the proper labels for the output DataFrame.
    """
-    def __init__(self, context: list[str], actions: list[str], outcomes: list[str]):
+    def __init__(self, cao: CAOMapping):
danyoungday (Author) commented:

Instead of taking 3 manual lists, we now just pass the singular object
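For callers, the change looks roughly like this (the predictor class and column names are made up for illustration):

# Before: three parallel lists that had to be kept in sync by hand
# predictor = SomePredictor(["temp", "rain"], ["fertilizer"], ["crop_yield"])

# After: a single CAOMapping travels with the predictor
cao = CAOMapping(context=["temp", "rain"], actions=["fertilizer"], outcomes=["crop_yield"])
predictor = SomePredictor(cao)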

Simple abstract class for sklearn predictors.
Keeps track of features fit on and label to predict.
"""
def __init__(self, cao: CAOMapping, model, model_config: dict):
danyoungday (Author) commented:

Takes cao now instead of 3 distinct lists

danyoungday (Author) commented:

Unit tests for persistence. We don't do any actual saving, though.
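One way to test persistence without touching disk, sketched with a hypothetical test case (the real tests may be structured differently; uses the CAOMapping sketch above): build the config dict in memory and exercise the same pop-and-rebuild logic the loader uses.

import unittest


class TestCAOSerialization(unittest.TestCase):
    def test_load_rebuilds_cao(self):
        # In-memory stand-in for a loaded config.json
        config = {"context": ["c1"], "actions": ["a1"], "outcomes": ["o1"]}
        cao = CAOMapping(config.pop("context"), config.pop("actions"), config.pop("outcomes"))
        self.assertEqual(cao.context, ["c1"])
        self.assertEqual(config, {})  # CAO keys are consumed from the config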

danyoungday (Author) commented:

Same tests as before

config["linear_skip"],
config["dropout"])
# Set map_location to CPU to avoid issues with GPU availability
nnp.model.load_state_dict(torch.load(path / "model.pt", map_location="cpu"))
danyoungday (Author) commented:

We set the map location to CPU to avoid errors if we're loading a state dict that was saved on a different device. This is technically not necessary, because we move to CPU on save, but it helps with backward compatibility.
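Concretely, this guards against the standard torch failure when a CUDA-saved checkpoint is loaded on a CPU-only machine:

import torch

# Without map_location, loading a CUDA-saved state dict on a CPU-only box raises
# RuntimeError: Attempting to deserialize object on a CUDA device but
# torch.cuda.is_available() is False ...
# Remapping every tensor to CPU at load time is always safe:
state_dict = torch.load("model.pt", map_location="cpu")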

with open(path / "config.json", "w", encoding="utf-8") as file:
    json.dump(config, file)
# Put model on CPU before saving
model.model.to("cpu")
danyoungday (Author) commented:

Move to CPU so that we don't error when switching hardware, e.g. going from an M1 (MPS) machine to an NVIDIA (CUDA) one.
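Combined with the torch.save call (not shown in the hunk, but implied by the model.pt the loader reads), the save side presumably looks like:

# Normalize the device before serializing so the checkpoint loads anywhere
model.model.to("cpu")
torch.save(model.model.state_dict(), path / "model.pt")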

@danyoungday merged commit 94e79fb into main on Jul 30, 2024
1 check passed
@danyoungday deleted the predictors branch on July 30, 2024 at 22:34