Upsert on bigquery poc #14

Open
namiyousef opened this issue Jul 31, 2024 · 1 comment

Comments

@namiyousef
Owner

  • Experiment with adding upsert support on BigQuery
    • Option 1: MERGE. This requires writing the whole dataset to a temp table (created with the correct schema), then merging it into the target table you want to write to
    • Option 2: delete, then write. The delete must remove only the rows that are about to be rewritten, which will not scale very well

Do a speed and cost analysis of both options.

@namiyousef
Owner Author

Rough logic using a MERGE query:

  • create a temp table using the BQ API, then load your data into it
  • run a MERGE query that merges the temp table into the target table
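
A minimal sketch of the MERGE step above, assuming the temp table has already been created and loaded. The helper `build_merge_sql`, the table names, and the column names are all hypothetical; the function only builds the SQL text, which could then be submitted through the BigQuery client's `query` method.

```python
def build_merge_sql(target, staging, key_cols, all_cols):
    """Build a BigQuery MERGE statement that upserts `staging` into `target`.

    Hypothetical helper: rows matching on `key_cols` are updated,
    all other staging rows are inserted.
    """
    if not key_cols:
        raise ValueError("at least one key column is required")

    non_keys = [c for c in all_cols if c not in key_cols]
    on_clause = " AND ".join(f"T.{c} = S.{c}" for c in key_cols)
    insert_cols = ", ".join(all_cols)
    insert_vals = ", ".join(f"S.{c}" for c in all_cols)

    lines = [
        f"MERGE `{target}` AS T",
        f"USING `{staging}` AS S",
        f"ON {on_clause}",
    ]
    # Only emit the UPDATE branch when there is something besides the keys to update.
    if non_keys:
        update_set = ", ".join(f"{c} = S.{c}" for c in non_keys)
        lines.append(f"WHEN MATCHED THEN UPDATE SET {update_set}")
    lines.append(
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )
    return "\n".join(lines)


# Example with made-up table and column names.
sql = build_merge_sql(
    target="my_project.my_dataset.target_table",
    staging="my_project.my_dataset.tmp_upsert",
    key_cols=["id"],
    all_cols=["id", "value", "updated_at"],
)
print(sql)
```

Building the statement from the column list keeps the temp table schema and the MERGE in sync, which matters because the temp table has to match the target's schema anyway.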

CONS:

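  • cost will hurt. A MERGE keyed only on the join columns gives BigQuery nothing to filter target_table on, so the whole table gets scanned. This is not tenable for large-scale data. Partition pruning on the target should only apply if the merge condition also carries a constant filter on the partition column.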

The other alternative is delete-then-write, but that hurts because of the filter conditions required for the deletion. You'd need to be smarter about it, e.g. order by the partition column, find contiguous segments, and work from those. This requires more thought.
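
One possible reading of the "contiguous segments" idea, as a rough sketch: collapse the partition values of the incoming batch into inclusive date ranges, then build one DELETE whose filter covers those ranges. It assumes a DATE partition column; `contiguous_date_ranges`, `build_delete_sql`, and the table and column names are hypothetical. Note that this deletes every row in those date ranges, so it is only correct when the incoming batch fully replaces those partitions; otherwise a key filter would have to be added.

```python
from datetime import date, timedelta


def contiguous_date_ranges(days):
    """Collapse dates into sorted, inclusive (start, end) runs of consecutive days."""
    ranges = []
    for day in sorted(set(days)):
        if ranges and day - ranges[-1][1] == timedelta(days=1):
            # Extends the current run by one day.
            ranges[-1] = (ranges[-1][0], day)
        else:
            # Gap found (or first value): start a new run.
            ranges.append((day, day))
    return ranges


def build_delete_sql(target, partition_col, ranges):
    """Build a DELETE whose WHERE clause covers each date range on the partition column."""
    if not ranges:
        raise ValueError("nothing to delete: no date ranges given")
    clauses = [
        f"{partition_col} BETWEEN DATE '{start.isoformat()}' AND DATE '{end.isoformat()}'"
        for start, end in ranges
    ]
    return f"DELETE FROM `{target}` WHERE " + " OR ".join(clauses)


# Example with made-up partition values from an incoming batch.
batch_days = [date(2024, 7, 3), date(2024, 7, 1), date(2024, 7, 2), date(2024, 7, 10)]
ranges = contiguous_date_ranges(batch_days)
delete_sql = build_delete_sql("my_project.my_dataset.target_table", "event_date", ranges)
print(delete_sql)
```

Filtering on constant ranges of the partition column keeps the filter short and gives BigQuery a chance to prune partitions, instead of listing every row key to delete.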
