
Added predictors and their serializers #13

Merged: danyoungday merged 13 commits into main from predictors on Jul 30, 2024
Conversation

danyoungday (Contributor):

Fixed #12 (Transfer predictors from MVP).
Transferred over the predictors from MVP. Updated them all to hold on to a CAO mapping object since we can't hard-code that anymore.
Added some unit tests for the HuggingFace persistor.

@danyoungday added the enhancement label on Jul 30, 2024
@danyoungday self-assigned this on Jul 30, 2024
danyoungday (Author) commented:

This is an immutable struct that we can use to store context, actions, and outcomes inside our models. This allows us to check inputs/outputs/whatever when we compare models, etc. in project-specific contexts.
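A minimal sketch of such a struct, assuming a frozen dataclass (the real implementation may differ; frozen only prevents reassigning the fields, which is enough immutability here):

from dataclasses import dataclass


@dataclass(frozen=True)
class CAOMapping:
    """Immutable record of the context, action, and outcome column names."""
    context: list[str]
    actions: list[str]
    outcomes: list[str]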

config = {
    "context": model.cao.context,
    "actions": model.cao.actions,
    "outcomes": model.cao.outcomes,
}
danyoungday (Author) commented:

We store the context, actions, and outcomes in our serialization now

with open(path / "config.json", "r", encoding="utf-8") as file:
    config = json.load(file)
# Grab CAO out of config
cao = CAOMapping(config.pop("context"), config.pop("actions"), config.pop("outcomes"))
danyoungday (Author) commented:

We also reconstruct our cao in loading

# Extract CAO from config
with open(load_path / "config.json", "r", encoding="utf-8") as file:
    config = json.load(file)
cao = CAOMapping(config.pop("context"), config.pop("actions"), config.pop("outcomes"))
danyoungday (Author) commented:

We reconstruct cao when loading

# Add CAO to the config
config = dict(model.config.items())
cao_dict = {"context": model.cao.context, "actions": model.cao.actions, "outcomes": model.cao.outcomes}
config.update(cao_dict)
danyoungday (Author) commented:

Dump our cao into the config now
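Putting both halves together, the CAO roundtrip through config.json looks roughly like this (model and path are stand-ins for the objects the serializer actually receives):

import json
from pathlib import Path

path = Path("model_dir")

# Save: fold the CAO fields into the serialized config
config = dict(model.config.items())
config.update({"context": model.cao.context,
               "actions": model.cao.actions,
               "outcomes": model.cao.outcomes})
with open(path / "config.json", "w", encoding="utf-8") as file:
    json.dump(config, file)

# Load: pop the CAO fields back out and rebuild the mapping
with open(path / "config.json", "r", encoding="utf-8") as file:
    config = json.load(file)
cao = CAOMapping(config.pop("context"), config.pop("actions"), config.pop("outcomes"))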

Data is automatically standardized and the scaler is saved with the model.
TODO: We want to be able to have custom scaling in the future.
"""
def __init__(self, cao: CAOMapping, model_config: dict):
danyoungday (Author) commented:

New cao arg to pass in to every predictor
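The "automatically standardized" behavior from the docstring might look something like this inside fit() (assuming an sklearn StandardScaler; the actual scaler class isn't shown in this hunk):

from sklearn.preprocessing import StandardScaler

def fit(self, X_train, y_train):
    # Keep the fitted scaler on the model so predict() applies the same transform
    self.scaler = StandardScaler()
    X_scaled = self.scaler.fit_transform(X_train)
    ...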

import torch


class TorchNeuralNet(torch.nn.Module):
danyoungday (Author) commented:

Renamed to TorchNeuralNet from ELUCNeuralNet because it's task-agnostic


class Predictor(ABC):
    """
    Abstract class for predictors to inherit from.
    Predictors must be able to be fit and predict on a DataFrame.
    It is up to the Predictor to keep track of the proper labels for the output DataFrame.
    """
-    def __init__(self, context: list[str], actions: list[str], outcomes: list[str]):
+    def __init__(self, cao: CAOMapping):
danyoungday (Author) commented:

Instead of taking 3 manual lists, we now just pass the singular object
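For callers, the change looks roughly like this (the predictor class and column names are made up for illustration):

# Before: three parallel lists that had to be kept in sync by hand
# predictor = SomePredictor(["temp", "rain"], ["fertilizer"], ["crop_yield"])

# After: a single CAOMapping travels with the predictor
cao = CAOMapping(context=["temp", "rain"], actions=["fertilizer"], outcomes=["crop_yield"])
predictor = SomePredictor(cao)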

Simple abstract class for sklearn predictors.
Keeps track of features fit on and label to predict.
"""
def __init__(self, cao: CAOMapping, model, model_config: dict):
danyoungday (Author) commented:

Takes cao now instead of 3 distinct lists

danyoungday (Author) commented:

Unit tests for persistence. We don't do any actual saving, though.
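One way to test persistence without touching disk, sketched with a hypothetical test case (the real tests may be structured differently; uses the CAOMapping sketch above): build the config dict in memory and exercise the same pop-and-rebuild logic the loader uses.

import unittest


class TestCAOSerialization(unittest.TestCase):
    def test_load_rebuilds_cao(self):
        # In-memory stand-in for a loaded config.json
        config = {"context": ["c1"], "actions": ["a1"], "outcomes": ["o1"]}
        cao = CAOMapping(config.pop("context"), config.pop("actions"), config.pop("outcomes"))
        self.assertEqual(cao.context, ["c1"])
        self.assertEqual(config, {})  # CAO keys are consumed from the config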

danyoungday (Author) commented:

Same tests as before

config["linear_skip"],
config["dropout"])
# Set map_location to CPU to avoid issues with GPU availability
nnp.model.load_state_dict(torch.load(path / "model.pt", map_location="cpu"))
danyoungday (Author) commented:

We set the map location to CPU to avoid errors if we're loading a state dict that was saved on a different device. This is technically not necessary, because we move to CPU on save, but it helps with backward compatibility.
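Concretely, this guards against the standard torch failure when a CUDA-saved checkpoint is loaded on a CPU-only machine:

import torch

# Without map_location, loading a CUDA-saved state dict on a CPU-only box raises
# RuntimeError: Attempting to deserialize object on a CUDA device but
# torch.cuda.is_available() is False ...
# Remapping every tensor to CPU at load time is always safe:
state_dict = torch.load("model.pt", map_location="cpu")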

with open(path / "config.json", "w", encoding="utf-8") as file:
    json.dump(config, file)
# Put model on CPU before saving
model.model.to("cpu")
danyoungday (Author) commented:

Move to CPU so that we don't error when switching hardware, e.g. going from an M1 (MPS) machine to an NVIDIA (CUDA) one.
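Combined with the torch.save call (not shown in the hunk, but implied by the model.pt the loader reads), the save side presumably looks like:

# Normalize the device before serializing so the checkpoint loads anywhere
model.model.to("cpu")
torch.save(model.model.state_dict(), path / "model.pt")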

@danyoungday merged commit 94e79fb into main on Jul 30, 2024
1 check passed
@danyoungday deleted the predictors branch on July 30, 2024 at 22:34