Fix empty dataframe handling in conditional sampling #2335

Open: wants to merge 1 commit into main

@omelyanchikd commented Jan 7, 2025

Resolves #2334

Error 1:

  • If the single-table synthesizer's model produces no valid samples while sampling the remaining columns, the _filter_conditions method raises an unhandled error as soon as it tries to access a column of the resulting empty dataframe, and sampling stops:

From sdv/single_table/base.py, method _sample_rows:

            sampled = self._data_processor.filter_valid(sampled)

            if conditions is not None:
                sampled = self._filter_conditions(sampled, conditions, float_rtol)

Line that raises the error, from sdv/single_table/base.py, method _filter_conditions:

        for column, value in conditions.items():
            column_values = sampled[column]
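To make the failure concrete, here is a minimal pandas sketch built around a hypothetical stand-in, filter_conditions, whose loop mirrors the excerpt above. It is not SDV's actual _filter_conditions, and the float-tolerance branch is an assumption about how float_rtol is applied. It also assumes the empty result carries no columns, which is one way sampled[column] can fail, and shows one possible fix: return early when there is nothing to filter.

```python
import numpy as np
import pandas as pd


def filter_conditions(sampled, conditions, float_rtol=0.01):
    """Hypothetical stand-in for _filter_conditions (not SDV's code).

    Keeps only the rows of `sampled` that match every value in `conditions`.
    """
    # Guard: nothing to filter. Returning early avoids indexing a column
    # that an empty, column-less DataFrame does not have (KeyError).
    if sampled.empty:
        return sampled

    for column, value in conditions.items():
        column_values = sampled[column]
        if column_values.dtype.kind == 'f':
            # Assumed behavior for floats: accept values within a
            # relative tolerance of the requested value.
            tolerance = abs(value) * float_rtol
            sampled = sampled[np.abs(column_values - value) <= tolerance]
        else:
            sampled = sampled[column_values == value]

    return sampled


# A column-less empty frame: without the guard, sampled['age'] raises KeyError.
empty = pd.DataFrame()
print(filter_conditions(empty, {'age': 30.0}).empty)  # True

rows = pd.DataFrame({'age': [30.0, 30.1, 45.0], 'city': ['A', 'B', 'A']})
print(filter_conditions(rows, {'age': 30.0, 'city': 'A'}))
```

The early return leaves the caller's flow unchanged. The caller still has to treat an empty result as "no valid rows this round" and retry or report it, which this sketch does not cover.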

Error 2:

  • If the single-table synthesizer's model has more than one constraint and one of them filters out every row of the generated sample while sampling the remaining columns, the loop in filter_valid raises an unhandled error when the next constraint tries to access a column of the now-empty dataframe, and sampling stops:

From sdv/data_processing/data_processor.py, method filter_valid:

        for constraint in self._constraints:
            data = constraint.filter_valid(data)
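A similar guard fits the constraint loop. In this minimal sketch, filter_valid, PositiveConstraint and DropAll are hypothetical stand-ins rather than SDV classes. DropAll returns a column-less empty frame to imitate a constraint that rejects every row, so without the check the next constraint's column access would raise KeyError.

```python
import pandas as pd


class PositiveConstraint:
    """Hypothetical constraint (not an SDV class): keep rows where column > 0.

    It indexes the column directly, so it raises KeyError when handed a
    column-less empty DataFrame.
    """

    def __init__(self, column):
        self.column = column

    def filter_valid(self, data):
        return data[data[self.column] > 0]


class DropAll:
    """Hypothetical constraint that rejects every row.

    It returns a DataFrame with no columns to imitate the empty sample
    described in Error 2.
    """

    def filter_valid(self, data):
        return pd.DataFrame()


def filter_valid(data, constraints):
    """Apply each constraint in turn, stopping once no rows remain."""
    for constraint in constraints:
        if data.empty:
            # No rows left: later constraints cannot remove anything more,
            # and they may fail if they index columns, so stop here.
            break
        data = constraint.filter_valid(data)
    return data


raw = pd.DataFrame({'x': [1, -2, 3]})
print(filter_valid(raw, [PositiveConstraint('x')]))
print(filter_valid(raw, [DropAll(), PositiveConstraint('x')]).empty)  # True
```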

@omelyanchikd force-pushed the gh-2334-fix-empty-dataframe-errors-in-conditional-sampling branch from d93d186 to 5754179 on January 7, 2025 18:15
@sdv-team (Contributor) commented Jan 7, 2025

Hi @omelyanchikd , it’s great to see your interest in the SDV ecosystem. The SDV software (and its related libraries) is owned and maintained by DataCebo. It is available under the Business Source License for you to browse.

We are a small team supporting a large set of users with enterprise-specific intricacies and reliability needs. This has required us to be deliberate about setting the roadmap for SDV libraries. As a result, we are unable to prioritize reviewing and accepting external pull requests.

Is your PR meant to address a bug or a feature request? If you haven't already, we would greatly appreciate it if you could file an issue instead with the overall description of your problem. We can determine whether it’s aligned with our framework. Once discussed, our team typically resolves smaller issues within a few release cycles. We appreciate your understanding.

@omelyanchikd (Author)

Yes, it addresses a bug, which I have already reported in #2334.

Successfully merging this pull request may close this issue: Incorrect handling of empty dataframe during conditional sampling