Use USE_VECTORIZED_SCANNER=True for loading parquet files to Snowflake #2251

Open
VioletM opened this issue Jan 30, 2025 · 0 comments
Labels
support This issue is monitored by Solution Engineer

VioletM commented Jan 30, 2025

Feature description

Snowflake can speed up the loading of Parquet data through the USE_VECTORIZED_SCANNER file format option of COPY INTO.

https://www.snowflake.com/en/engineering-blog/loading-terabytes-into-snowflake-speeds-feeds-techniques/
https://docs.snowflake.com/en/sql-reference/sql/copy-into-table#label-use-vectorized-scanner
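
For illustration, here is a rough sketch of what a COPY INTO statement with the option could look like when built from Python. The helper name `build_copy_sql`, the table name, and the stage path are made up for this example and are not taken from the dlt code base; the exact set of options dlt passes may differ.

```python
def build_copy_sql(table: str, stage_path: str, use_vectorized_scanner: bool = True) -> str:
    """Return a COPY INTO statement that loads Parquet files from a stage.

    Hypothetical helper for illustration only, not dlt's actual implementation.
    """
    format_options = ["TYPE = 'PARQUET'"]
    if use_vectorized_scanner:
        # Parquet-specific file format option described in the Snowflake docs.
        format_options.append("USE_VECTORIZED_SCANNER = TRUE")
    file_format = " ".join(format_options)
    return (
        f"COPY INTO {table}\n"
        f"FROM {stage_path}\n"
        f"FILE_FORMAT = ({file_format})\n"
        "MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE\n"
        # ABORT_STATEMENT is one of the ON_ERROR values the scanner supports.
        "ON_ERROR = ABORT_STATEMENT"
    )


# Table and stage names are placeholders.
print(build_copy_sql("my_table", "@my_stage/data/"))
```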

The feature request is to enable this by default. It is already marked as a TODO in the code:

    # TODO: USE_VECTORIZED_SCANNER inserts null strings into VARIANT JSON

Important note:

You can only enable the vectorized scanner if the following conditions are met:

The ON_ERROR option must be set to ABORT_STATEMENT or SKIP_FILE.

The other values (CONTINUE, SKIP_FILE_num, and 'SKIP_FILE_num%') are not supported.
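
The ON_ERROR restriction above could be checked before the option is added to a COPY statement. The function below is only a sketch; its name is invented here and nothing like it is confirmed to exist in dlt.

```python
# ON_ERROR values that the note above lists as compatible with the scanner.
SUPPORTED_ON_ERROR = {"ABORT_STATEMENT", "SKIP_FILE"}


def on_error_allows_vectorized_scanner(on_error: str) -> bool:
    """Return True if the given ON_ERROR value permits USE_VECTORIZED_SCANNER.

    Hypothetical helper: CONTINUE, SKIP_FILE_num and 'SKIP_FILE_num%' are rejected.
    """
    # Drop surrounding whitespace and quotes, compare case-insensitively.
    normalized = on_error.strip().strip("'").upper()
    return normalized in SUPPORTED_ON_ERROR


for value in ["ABORT_STATEMENT", "skip_file", "CONTINUE", "SKIP_FILE_10", "'SKIP_FILE_10%'"]:
    print(value, on_error_allows_vectorized_scanner(value))
```

Only the first two values pass the check; the remaining three would require leaving the vectorized scanner off.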

Are you a dlt user?

Yes, I run dlt in production.

Implementation details

    • add an option to SnowflakeConfiguration that turns on USE_VECTORIZED_SCANNER, disabled by default
    • investigate whether the test_parquet_loading test passes with the option enabled
    • write a specialized test for Snowflake based on the test above
    • document the new option
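
One possible shape for the first step, as a sketch only: the class `SnowflakeLoadOptions`, the field name `use_vectorized_scanner`, and the helper `parquet_format_options` are assumptions made for this example and do not reflect the actual SnowflakeConfiguration in dlt.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SnowflakeLoadOptions:
    """Hypothetical stand-in for the relevant part of the Snowflake configuration."""

    # Disabled by default, as proposed in the implementation details.
    use_vectorized_scanner: bool = False


def parquet_format_options(options: SnowflakeLoadOptions) -> List[str]:
    """Return Parquet file format options, adding the scanner flag only when enabled."""
    format_options = ["TYPE = 'PARQUET'"]
    if options.use_vectorized_scanner:
        format_options.append("USE_VECTORIZED_SCANNER = TRUE")
    return format_options


# Default configuration leaves the generated options unchanged.
print(parquet_format_options(SnowflakeLoadOptions()))
# Opting in appends the scanner flag.
print(parquet_format_options(SnowflakeLoadOptions(use_vectorized_scanner=True)))
```

Keeping the flag off by default means existing pipelines are unaffected until the VARIANT JSON behaviour noted in the TODO has been investigated.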
@VioletM VioletM added the support This issue is monitored by Solution Engineer label Jan 30, 2025
@rudolfix rudolfix moved this from Todo to Planned in dlt core library Feb 3, 2025