-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Flytekit][Test] Structured dataset pickleable test #3121
[Flytekit][Test] Structured dataset pickleable test #3121
Conversation
Signed-off-by: mao3267 <[email protected]>
Code Review Agent Run Status
|
…ucture-dataset-pickleable
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #3121 +/- ##
==========================================
- Coverage 79.60% 75.55% -4.05%
==========================================
Files 203 295 +92
Lines 21625 25663 +4038
Branches 2788 2787 -1
==========================================
+ Hits 17214 19389 +2175
- Misses 3629 5476 +1847
- Partials 782 798 +16 ☔ View full report in Codecov by Sentry. |
Code Review Agent Run Status
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks great!
Code Review Agent Run #5fdc66Actionable Suggestions - 1
Review Details
|
Changelist by BitoThis pull request implements the following key changes.
|
def test_structured_dataset_pickleable(): | ||
import pickle | ||
|
||
upstream_output = Literal( | ||
scalar=literals.Scalar( | ||
structured_dataset=StructuredDataset( | ||
dataframe=pd.DataFrame({"a": [1, 2], "b": [3, 4]}), | ||
uri="bq://test_uri", | ||
metadata=StructuredDatasetMetadata( | ||
structured_dataset_type=StructuredDatasetType( | ||
columns=[ | ||
StructuredDatasetType.DatasetColumn( | ||
name="a", | ||
literal_type=LiteralType(simple=SimpleType.INTEGER) | ||
), | ||
StructuredDatasetType.DatasetColumn( | ||
name="b", | ||
literal_type=LiteralType(simple=SimpleType.INTEGER) | ||
) | ||
], | ||
format="parquet" | ||
) | ||
) | ||
) | ||
) | ||
) | ||
|
||
downstream_input = TypeEngine.to_python_value( | ||
FlyteContextManager.current_context(), | ||
upstream_output, | ||
StructuredDataset | ||
) | ||
|
||
pickled_input = pickle.dumps(downstream_input) | ||
unpickled_input = pickle.loads(pickled_input) | ||
|
||
assert downstream_input == unpickled_input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding test cases for error scenarios in test_structured_dataset_pickleable()
. The current test only verifies successful pickling/unpickling but doesn't test behavior with invalid/corrupted data or when pickling fails.
Code suggestion
Check the AI-generated fix before applying
-def test_structured_dataset_pickleable():
+def test_structured_dataset_pickleable_success():
import pickle
upstream_output = Literal(
scalar=literals.Scalar(
structured_dataset=StructuredDataset(
dataframe=pd.DataFrame({"a": [1, 2], "b": [3, 4]}),
uri="bq://test_uri",
metadata=StructuredDatasetMetadata(
structured_dataset_type=StructuredDatasetType(
columns=[
StructuredDatasetType.DatasetColumn(
name="a",
literal_type=LiteralType(simple=SimpleType.INTEGER)
),
StructuredDatasetType.DatasetColumn(
name="b",
literal_type=LiteralType(simple=SimpleType.INTEGER)
)
],
format="parquet"
)
)
)
)
)
downstream_input = TypeEngine.to_python_value(
FlyteContextManager.current_context(),
upstream_output,
StructuredDataset
)
pickled_input = pickle.dumps(downstream_input)
unpickled_input = pickle.loads(pickled_input)
assert downstream_input == unpickled_input
+
+def test_structured_dataset_pickleable_error():
+ import pickle
+ with pytest.raises(pickle.PicklingError):
+ pickle.dumps(object())
+
+ with pytest.raises(pickle.UnpicklingError):
+ pickle.loads(b'invalid')
Code Review Run #5fdc66
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
Tracking issue
flyteorg/flyte#6144
Why are the changes needed?
We want to determine whether modifying the
_downloader
functions in FlyteFile and FlyteDirectory from flyteorg/flyte#6144 affects the ability of structured datasets to be pickled.What changes were proposed in this pull request?
How was this patch tested?
make unit_test
to start the unit tests.Setup process
Screenshots
Check all the applicable boxes
Related PRs
#3030
Docs link
None
Summary by Bito
Added test cases for structured dataset serialization/deserialization using pickle module. The changes include verification of successful pickling/unpickling of pandas DataFrame within structured datasets, along with new error handling test cases. This ensures compatibility with FlyteFile and FlyteDirectory _downloader function modifications.Unit tests added: True
Estimated effort to review (1-5, lower is better): 1