SNOW-1887901: fix partitioning logic #2944

Open · wants to merge 3 commits into base: dev/data-source
Conversation

@sfc-gh-yuwang (Collaborator) commented Jan 27, 2025

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1887901

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
  3. Please describe how your code solves the related issue.

    Fixes the partitioning logic so that it now generates the same partitions as Spark does.
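For context, here is a minimal sketch of Spark-style partition predicate generation, assuming integer bounds only; the function name and signature are illustrative and are not the PR's actual implementation.

```python
# Minimal sketch of Spark-style partitioning: compute a stride from the bounds
# and emit one WHERE predicate per partition. Illustrative only, integer bounds.
def generate_partitions(column: str, lower: int, upper: int, num_partitions: int) -> list:
    stride = (upper - lower) // num_partitions  # width of each partition
    predicates = []
    current = lower
    for i in range(num_partitions):
        l_bound = f"{column} >= {current}" if i != 0 else ""
        current += stride
        u_bound = f"{column} < {current}" if i != num_partitions - 1 else ""
        if u_bound == "":
            where = l_bound                           # last partition: open-ended above
        elif l_bound == "":
            where = f"{u_bound} OR {column} IS NULL"  # first partition also takes NULLs
        else:
            where = f"{l_bound} AND {u_bound}"
        predicates.append(where)
    return predicates

# generate_partitions("id", 0, 100, 4) ->
#   ['id < 25 OR id IS NULL', 'id >= 25 AND id < 50',
#    'id >= 50 AND id < 75', 'id >= 75']
```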

@sfc-gh-yuwang sfc-gh-yuwang marked this pull request as ready for review January 27, 2025 23:04
@sfc-gh-yuwang sfc-gh-yuwang requested review from a team as code owners January 27, 2025 23:04
@sfc-gh-yuwang sfc-gh-yuwang requested review from sfc-gh-jdu and sfc-gh-jrose and removed request for a team January 27, 2025 23:04
    if column_type != int
    else int(processed_lower_bound + i * stride)
)
l_bound = (
    f"{column} >= {self._to_external_value(current_value, column_type)}"
@sfc-gh-aling (Contributor) commented Jan 29, 2025


In Spark:

partitionColumn must be a numeric, date, or timestamp column

Do we handle all three types here? Also, could you check the PySpark behavior when a column of an unsupported type is passed?
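One hedged way to probe the PySpark behavior asked about here; the JDBC URL, table, and column names below are placeholders, and the exact error Spark raises is deliberately not asserted:

```python
# Hypothetical probe: pass a string column as partitionColumn to Spark's JDBC reader.
# Spark documents that partitionColumn must be numeric, date, or timestamp, so this
# is expected to fail during analysis; the exact exception type/message may vary.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

try:
    df = spark.read.jdbc(
        url="jdbc:postgresql://localhost:5432/db",  # placeholder connection
        table="my_table",                           # placeholder table
        column="varchar_col",                       # string column: unsupported type
        lowerBound="a",
        upperBound="z",
        numPartitions=4,
        properties={"user": "user", "password": "pw"},
    )
except Exception as e:
    print(type(e).__name__, e)
```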

@sfc-gh-yuwang (Collaborator, Author) commented:
It turns out I had already written the code to support all of these types, so I just added a test here.

@@ -8,6 +8,8 @@
from unittest.mock import MagicMock
import pytest

from snowflake.snowpark.types import IntegerType
Contributor commented:
Let's add tests for a couple more types, like float, decimal, datetime, and date.
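A hedged sketch of what such a parametrized test could look like; generate_partitions is an illustrative stand-in for the partitioner under review (unlike the integer-only sketch earlier, it is assumed here to accept numeric, decimal, date, and datetime bounds):

```python
# Illustrative parametrized boundary test; the partitioner's real name, signature,
# and module are assumptions, not the repository's actual API.
import datetime
import decimal

import pytest


@pytest.mark.parametrize(
    "lower, upper",
    [
        (0, 100),                                                           # int
        (0.0, 100.0),                                                       # float
        (decimal.Decimal("0"), decimal.Decimal("100")),                     # decimal
        (datetime.date(2020, 1, 1), datetime.date(2020, 12, 31)),           # date
        (datetime.datetime(2020, 1, 1), datetime.datetime(2020, 12, 31)),   # datetime
    ],
)
def test_partition_boundaries(lower, upper):
    predicates = generate_partitions("COL", lower, upper, num_partitions=4)
    assert len(predicates) == 4
    assert "COL >=" not in predicates[0]   # first partition is open-ended below
    assert "COL <" not in predicates[-1]   # last partition is open-ended above
```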

Contributor commented:
Also add a test for an unsupported type.
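And a hedged sketch of an unsupported-type test; the error type is an assumption about the real partitioner, mirroring Spark's rule that the partition column must be numeric, date, or timestamp:

```python
# Illustrative unsupported-type test; ValueError is assumed here, and the real
# partitioner may raise a different error for a string partition column.
import pytest


def test_partition_column_unsupported_type():
    with pytest.raises(ValueError):
        generate_partitions("STR_COL", "a", "z", num_partitions=4)
```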
