Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Metadata work with single table synthesizers #2140

Merged
merged 10 commits into from
Aug 1, 2024

Conversation

lajohn4747
Copy link
Contributor

@lajohn4747 lajohn4747 commented Jul 18, 2024

resolves #2128
CU-86b19amxa

Made metadata work with single table synthesizers by converting it to a SingleTableMetadata before inserting into the synthesizers.

Added a variable called .real_metadata that is used to hold the Metadata version. This will help with phasing out SIngleTableMetadata in the future. get_metadata is updated to always return the real_metadata which should be of type class Metadata for interface consistency. Altered and fixed some tests so they ensure Metadata is the type that is produced.

This approach allows us to keep backward compatibility with SingleTableMetadata but still force the move over to Metadata. We can converge methods later without the need of creating branch of metadata handling in all the our synthesizers.

@lajohn4747 lajohn4747 requested a review from amontanez24 July 18, 2024 18:08
self._validate_inputs(enforce_min_max_values, enforce_rounding)
self.metadata = metadata
self.metadata = single_metadata
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this works but the only issue I see is that the get_metadata method should return the new metadata class, and with this it will return the single table one.

@npatki I think the get_metadata method is the only public way to access the metadata. Is that right? The synthesizers have a .metadata attribute, but that doesn't seem to be in the docs so Idk if that also should return the new metadata class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could update get_metadata() to return a Metadata object instead

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Another question I have for @npatki is if that method should return a SingleTableMetadata in the case where that's what the user provided, or just always return the new object? I'm thinking of the case where the person saved the synthesizer on an older version, loads it up and does

synthesizer.get_metadata()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amontanez24 @lajohn4747 I imagined that get_metadata() would always return the new, streamlined Metadata object no matter when the synthesizer was first created.

From this point onwards, the streamlined Metadata object is the only external-facing metadata object in play. All references to SingleTableMetadata and MultiTableMetadata would either be deprecated or deleted in the docs.

@lajohn4747 lajohn4747 changed the base branch from main to feature/metadata July 18, 2024 21:57
@lajohn4747 lajohn4747 changed the title Issue 2128 single table use metadata Make Metadata work with single table synthesizers Jul 18, 2024
@sdv-team
Copy link
Contributor

@lajohn4747 lajohn4747 marked this pull request as ready for review July 18, 2024 22:05
@lajohn4747 lajohn4747 requested a review from a team as a code owner July 18, 2024 22:05
@@ -50,6 +51,25 @@ def load_from_dict(cls, metadata_dict, single_table_name=None):
instance._set_metadata_dict(metadata_dict, single_table_name)
return instance

@classmethod
def load_from_single_table_metadata(cls, single_metadata_table, table_name=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this method be private? (_).

sdv/metadata/metadata.py Outdated Show resolved Hide resolved
sdv/single_table/base.py Outdated Show resolved Hide resolved
@lajohn4747 lajohn4747 marked this pull request as draft July 22, 2024 18:46
@lajohn4747
Copy link
Contributor Author

lajohn4747 commented Jul 22, 2024

I will take a look into converting everything over now instead of trying to make Metadata produce a SingleTableMetadata.

Doing it here: #2144

This current PR just pushes the eventual conversion down the line, but with the above PR, we could easily just delete SingleTableMetadata from synthesizers.

@@ -99,6 +107,14 @@ def __init__(
):
self._validate_inputs(enforce_min_max_values, enforce_rounding)
self.metadata = metadata
if isinstance(metadata, Metadata):
self.metadata = metadata._convert_to_single_table()
self.real_metadata = metadata
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll remove this attribute and just convert it to the new metadata when calling get_metadata

@lajohn4747
Copy link
Contributor Author

@amontanez24 why not take over #2144 instead?

@amontanez24
Copy link
Contributor

@amontanez24 why not take over #2144 instead?

I think rewriting the code to support the new Metadata class in internal classes is overkill. To maintain backwards compatibility, we have to make sure to convert the class over everywhere that uses it like you were doing in that PR. I think it ends up being harder to read and riskier when the change is really only required for user facing purposes. If we were to do another major release, then we could just drop these old classes and write everything with the new class which would be cleaner, but until then I don't think it's worth it.

I'm also not sure we'll remove the concept of "SingleTableMetadata". The internal classes like constraints and the DataProcessor only work with one table at a time, so that construct still seems to make sense for them

@amontanez24 amontanez24 marked this pull request as ready for review July 31, 2024 16:47
@@ -269,7 +267,7 @@ def get_parameters(self):

def get_metadata(self):
"""Return the ``Metadata`` for this synthesizer."""
return self.real_metadata
return Metadata.load_from_dict(self.metadata.to_dict())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, we always get a new instance of the metadata (it's okay, I just want to confirm that this is the expected return).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the docs say we return a copy so it should be fine

@amontanez24 amontanez24 merged commit 9adc130 into feature/metadata Aug 1, 2024
39 checks passed
@amontanez24 amontanez24 deleted the issue_2128_single_table_use_metadata branch August 1, 2024 17:22
R-Palazzo pushed a commit that referenced this pull request Sep 19, 2024
R-Palazzo pushed a commit that referenced this pull request Sep 24, 2024
R-Palazzo pushed a commit that referenced this pull request Sep 26, 2024
R-Palazzo pushed a commit that referenced this pull request Sep 26, 2024
amontanez24 added a commit that referenced this pull request Sep 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants