[Flytekit][Test] Structured dataset pickleable test #3121

Merged
merged 2 commits into from
Feb 12, 2025

@mao3267 (Contributor) commented Feb 9, 2025

Tracking issue

flyteorg/flyte#6144

Why are the changes needed?

We want to determine whether modifying the _downloader functions of FlyteFile and FlyteDirectory, as proposed in flyteorg/flyte#6144, affects whether structured datasets can still be pickled.
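Why could a _downloader change matter at all? pickle stores functions by reference (module plus qualified name), so an object that holds a nested closure cannot be pickled, while one that holds a functools.partial over a module-level function can. The minimal sketch below illustrates this, assuming the downloader is stored as an instance attribute; FileRef and download are hypothetical names, not flytekit APIs, and this is not necessarily what flyteorg/flyte#6144 changes.

```python
import functools
import pickle


def download(remote_path: str, local_path: str) -> str:
    # Hypothetical module-level downloader: pickle can store it by reference.
    return local_path


class FileRef:
    # Hypothetical stand-in for a type that keeps a _downloader callable.
    def __init__(self, remote_path: str, local_path: str, use_closure: bool):
        if use_closure:
            # A nested function has "<locals>" in its qualified name,
            # so pickle cannot look it up again and refuses to store it.
            def _downloader() -> str:
                return download(remote_path, local_path)

            self._downloader = _downloader
        else:
            # A partial over a module-level function pickles fine.
            self._downloader = functools.partial(download, remote_path, local_path)
        self.path = local_path


def is_pickleable(obj) -> bool:
    # The exact exception type varies across Python versions.
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print(is_pickleable(FileRef("s3://bucket/a.txt", "/tmp/a.txt", use_closure=True)))   # False
print(is_pickleable(FileRef("s3://bucket/a.txt", "/tmp/a.txt", use_closure=False)))  # True
```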

What changes were proposed in this pull request?

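  1. Add a unit test that checks whether a structured dataset is pickleable. The test creates a structured dataset literal, converts it to a Python StructuredDataset with TypeEngine.to_python_value, then pickles and unpickles it and verifies that the result equals the original.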
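import pickle

import pandas as pd

from flytekit.core.context_manager import FlyteContextManager
from flytekit.core.type_engine import TypeEngine
from flytekit.models import literals
from flytekit.models.literals import Literal, StructuredDatasetMetadata
from flytekit.models.types import LiteralType, SimpleType, StructuredDatasetType
from flytekit.types.structured.structured_dataset import StructuredDataset


def test_structured_dataset_pickleable():
    # Build a structured dataset literal, as an upstream task would produce it.
    upstream_output = Literal(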
        scalar=literals.Scalar(
            structured_dataset=StructuredDataset(
                dataframe=pd.DataFrame({"a": [1, 2], "b": [3, 4]}),
                uri="bq://test_uri",
                metadata=StructuredDatasetMetadata(
                    structured_dataset_type=StructuredDatasetType(
                        columns=[
                            StructuredDatasetType.DatasetColumn(
                                name="a",
                                literal_type=LiteralType(simple=SimpleType.INTEGER)
                            ),
                            StructuredDatasetType.DatasetColumn(
                                name="b",
                                literal_type=LiteralType(simple=SimpleType.INTEGER)
                            )
                        ],
                        format="parquet"
                    )
                )
            )
        )
    )

    # Convert the literal into the Python-side StructuredDataset a downstream task receives.
    downstream_input = TypeEngine.to_python_value(
        FlyteContextManager.current_context(),
        upstream_output,
        StructuredDataset
    )

    # Round-trip through pickle; the result must compare equal to the original.
    pickled_input = pickle.dumps(downstream_input)
    unpickled_input = pickle.loads(pickled_input)

    assert downstream_input == unpickled_input
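The final assert relies on StructuredDataset defining value equality. As far as we know, flytekit's StructuredDataset is a dataclass, so == compares its fields (such as uri) rather than object identity. A pickle round-trip always yields a new object, so equality only holds when the class implements __eq__. The sketch below shows the difference using two hypothetical classes, DatasetRef and PlainRef:

```python
import pickle
from dataclasses import dataclass


@dataclass
class DatasetRef:
    # Hypothetical stand-in: @dataclass generates a field-by-field __eq__.
    uri: str
    file_format: str = "parquet"


class PlainRef:
    # No __eq__, so == falls back to identity comparison.
    def __init__(self, uri: str):
        self.uri = uri


def roundtrip(obj):
    # Serialize and deserialize, producing a brand-new object.
    return pickle.loads(pickle.dumps(obj))


ds = DatasetRef("bq://test_uri")
plain = PlainRef("bq://test_uri")
print(ds == roundtrip(ds))                # True: fields match
print(plain == roundtrip(plain))          # False: different object identity
print(plain.uri == roundtrip(plain).uri)  # True: the state itself survived
```

For a class without __eq__, a round-trip test would have to compare attributes explicitly instead.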

How was this patch tested?

  1. Ran make unit_test to execute the unit test suite.

Setup process

git clone https://github.com/flyteorg/flytekit.git
gh pr checkout 3121
make setup && pip install -e .

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

#3030

Docs link

None

Summary by Bito

Added a test case for structured dataset serialization/deserialization using the pickle module. The test verifies that a structured dataset created from a pandas DataFrame can be pickled and unpickled without changes. This checks that structured datasets remain pickleable after the FlyteFile and FlyteDirectory _downloader modifications.

Unit tests added: True

Estimated effort to review (1-5, lower is better): 1

@flyte-bot (Contributor)

Code Review Agent Run Status

  • Limitations and other issues: ❌ Failure - The AI Code Review Agent skipped reviewing this change because it is configured to exclude certain pull requests based on the source/target branch or the pull request status. You can change the settings here, or contact the agent instance creator.

codecov bot commented Feb 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 75.55%. Comparing base (1eb6743) to head (94b8d03).
Report is 9 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3121      +/-   ##
==========================================
- Coverage   79.60%   75.55%   -4.05%     
==========================================
  Files         203      295      +92     
  Lines       21625    25663    +4038     
  Branches     2788     2787       -1     
==========================================
+ Hits        17214    19389    +2175     
- Misses       3629     5476    +1847     
- Partials      782      798      +16     
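The percentages in the table follow from the line counts. Assuming Codecov's ratio of hits / (hits + misses + partials), where partially covered lines count against coverage, this snippet reproduces the base and head figures; coverage_pct is an illustrative helper, not a Codecov API:

```python
def coverage_pct(hits: int, misses: int, partials: int) -> float:
    # Every tracked line is a hit, a miss, or a partial.
    lines = hits + misses + partials
    return round(100 * hits / lines, 2)


base = coverage_pct(17214, 3629, 782)  # master (1eb6743)
head = coverage_pct(19389, 5476, 798)  # PR head (94b8d03)
print(base, head, round(head - base, 2))  # 79.6 75.55 -4.05
```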

@Future-Outlier (Member) left a comment

it looks great!

@Future-Outlier Future-Outlier changed the title [WIP][Flytekit][Test] Structured dataset pickleable test [Flytekit][Test] Structured dataset pickleable test Feb 12, 2025
@Future-Outlier Future-Outlier marked this pull request as ready for review February 12, 2025 07:13
@Future-Outlier Future-Outlier merged commit ac906ca into flyteorg:master Feb 12, 2025
109 of 112 checks passed
@flyte-bot (Contributor) commented Feb 12, 2025

Code Review Agent Run #5fdc66

Actionable Suggestions - 1
  • tests/flytekit/unit/types/structured_dataset/test_structured_dataset.py - 1
Review Details
  • Files reviewed - 1 · Commit Range: 65b3a42..94b8d03
    • tests/flytekit/unit/types/structured_dataset/test_structured_dataset.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

AI Code Review powered by Bito

@flyte-bot (Contributor)

Changelist by Bito

This pull request implements the following key changes.

Key change: Testing - Add Structured Dataset Pickle Test
Files impacted: test_structured_dataset.py - Added test to verify structured dataset objects can be pickled and unpickled correctly

Comment on lines +717 to +753
flyte-bot (Contributor)

Consider adding error test cases

Consider adding test cases for error scenarios in test_structured_dataset_pickleable(). The current test only verifies successful pickling/unpickling but doesn't test behavior with invalid/corrupted data or when pickling fails.

Code suggestion
Check the AI-generated fix before applying
 -def test_structured_dataset_pickleable():
 +def test_structured_dataset_pickleable_success():
      import pickle
      upstream_output = Literal(
          scalar=literals.Scalar(
              structured_dataset=StructuredDataset(
                  dataframe=pd.DataFrame({"a": [1, 2], "b": [3, 4]}),
                  uri="bq://test_uri",
                  metadata=StructuredDatasetMetadata(
                      structured_dataset_type=StructuredDatasetType(
                          columns=[
                              StructuredDatasetType.DatasetColumn(
                                  name="a",
                                  literal_type=LiteralType(simple=SimpleType.INTEGER)
                              ),
                              StructuredDatasetType.DatasetColumn(
                                  name="b",
                                  literal_type=LiteralType(simple=SimpleType.INTEGER)
                              )
                          ],
                          format="parquet"
                      )
                  )
              )
          )
      )
      downstream_input = TypeEngine.to_python_value(
          FlyteContextManager.current_context(),
          upstream_output,
          StructuredDataset
      )
      pickled_input = pickle.dumps(downstream_input)
      unpickled_input = pickle.loads(pickled_input)
      assert downstream_input == unpickled_input
 +
 +def test_structured_dataset_pickleable_error():
 +    import pickle
 +    with pytest.raises(pickle.PicklingError):
 +        pickle.dumps(object())
 +    
 +    with pytest.raises(pickle.UnpicklingError):
 +        pickle.loads(b'invalid')

Code Review Run #5fdc66


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged
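One technical caveat on the suggested error test: in Python 3, pickle.dumps(object()) succeeds (a bare object instance pickles fine), so its first pytest.raises(pickle.PicklingError) block would fail. A minimal sketch of inputs that do fail, assuming the exact exception type may differ by object and Python version (pickling_fails and unpickling_fails are hypothetical helpers):

```python
import pickle
import threading


def pickling_fails(obj) -> bool:
    # Depending on the object and Python version, pickle raises
    # PicklingError, TypeError, or AttributeError; accept all three.
    try:
        pickle.dumps(obj)
        return False
    except (pickle.PicklingError, TypeError, AttributeError):
        return True


def unpickling_fails(data: bytes) -> bool:
    # Empty input raises EOFError; malformed input typically UnpicklingError.
    try:
        pickle.loads(data)
        return False
    except (pickle.UnpicklingError, EOFError):
        return True


print(pickling_fails(object()))          # False: a bare object pickles fine
print(pickling_fails(threading.Lock()))  # True: locks raise TypeError
print(unpickling_fails(b""))             # True: EOFError
```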

Labels
None yet
Projects
None yet

3 participants