feat: As a last resort, try inserting TEXT into a UUID column #387
Hi @grigi, thanks for the PR! Could you work around this by overriding the schema to something like:

```yaml
plugins:
  extractors:
  - name: tap-example
    schema:
      my_stream:
        my_uuid_column:
          type: ["string", "null"]
          format: uuid
```

The UUID format is already handled in target-postgres/target_postgres/connector.py (line 290 at commit fd881a6).

NB: this assumes you're using Meltano; otherwise you could pass a custom catalog file.
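The override helps because the target chooses the column type from the JSON Schema it receives. The following is a simplified sketch of that idea, not target-postgres's actual code: `sql_type_for` is a hypothetical name, and every mapping except `format: uuid` → `UUID` is an illustrative assumption.

```python
from typing import Any, Mapping


def sql_type_for(prop: Mapping[str, Any]) -> str:
    """Pick a Postgres column type for one JSON Schema property (illustrative only)."""
    # "type" may be a single string or a list such as ["string", "null"].
    types = prop.get("type", [])
    if isinstance(types, str):
        types = [types]
    non_null = [t for t in types if t != "null"]

    if non_null == ["string"]:
        # The "format" hint is what turns a plain string into a UUID column.
        if prop.get("format") == "uuid":
            return "UUID"
        return "TEXT"
    if non_null == ["integer"]:
        return "BIGINT"
    if non_null == ["boolean"]:
        return "BOOLEAN"
    # Objects, arrays and unions fall back to JSONB in this sketch.
    return "JSONB"
```

Without the override the tap reports only `type: ["string", "null"]`, so a target following this logic would expect `TEXT` and clash with an existing `UUID` column.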
Thanks for the rapid response. I've tried that (yes, using Meltano) but I can't get it to work. I also tried manually updating the discovered catalog and passing it in, but that gets ignored too. There's a high chance that I'm missing something 🤷
I think the change here could be problematic for the general use case, since a non-UUID text value can really break pipelines if the target column is UUID. That said, we could add a new user setting, checked right before target-postgres/target_postgres/connector.py lines 431 to 438 (at commit fd881a6), that controls whether we should attempt to adapt column types to the schema at all. That way, we'd skip this problem entirely if the user considers mismatches acceptable. Something like … Wdyt? cc @visch

Separately, the fact that the tap doesn't support schema overrides is bad, so I dug a bit and submitted a PR: transferwise/pipelinewise-tap-mysql#186. I did test it a bit, but let's hope I didn't do something terrible and the Wise folks accept it 😅. In the meantime, it can be tested by updating the plugin definition:

```yaml
plugins:
  extractors:
  - name: tap-mysql
    variant: transferwise
    pip_url: "'pipelinewise-tap-mysql @ git+https://github.com/edgarrmondragon/pipelinewise-tap-mysql.git@allow-schema-overrides'"
    config:
      user: root
      host: 'localhost'
      database: mysql
      filter_dbs: mysql
    select:
    - "mysql-tests.*"
    schema:
      mysql-tests:
        my_uuid:
          type: [string, "null"]
          format: uuid
          inclusion: available
```
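The `schema` extra only works if it ends up in the catalog the tap actually reads, and the tap then honors it; the linked tap PR is about that second part. Below is a minimal sketch of the override step itself, assuming a Singer-style catalog dict and matching streams by `tap_stream_id`. It is a hypothetical helper, not Meltano's implementation.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists (e.g. "type": [...]) are replaced, not merged.
            merged[key] = copy.deepcopy(value)
    return merged


def apply_schema_overrides(
    catalog: Dict[str, Any], overrides: Dict[str, Dict[str, Any]]
) -> Dict[str, Any]:
    """Apply per-stream property overrides (keyed by tap_stream_id) to a catalog copy."""
    result = copy.deepcopy(catalog)
    for stream in result.get("streams", []):
        stream_overrides = overrides.get(stream.get("tap_stream_id"))
        if not stream_overrides:
            continue
        properties = stream.setdefault("schema", {}).setdefault("properties", {})
        for prop_name, prop_schema in stream_overrides.items():
            properties[prop_name] = deep_merge(properties.get(prop_name, {}), prop_schema)
    return result
```

Existing keys such as `maxLength` survive the merge, and `format: uuid` is added alongside them. A tap that rebuilds its schema from the database instead of reading the catalog would still discard the change.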
Ooh, thanks for that. That PR makes it all work for me :-)
That seems like a reasonable setting idea! It gives folks a way to let Postgres just figure it out, which will work in a lot of cases. Other targets have a whole flow for handling mixed data types (https://www.stitchdata.com/docs/replication/loading/understanding-table-structural-changes#columns-mixed-data-types) that we haven't gone down, but we could. I kind of like the current method because it's explicit: instead of creating columns you don't expect, it just fails, and then you have to come up with a solution (we could do a much better job of explaining what's going on when it fails for this reason). But it all depends on people's use cases.
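Here is a rough sketch of the proposed setting. The default stays explicit and fails on an unknown type change, while opting out keeps the existing column and leaves any conversion to Postgres. The setting name `adapt_column_types`, the `TargetConfig` class, and the "safe widenings" list are hypothetical and are not existing target-postgres options.

```python
from dataclasses import dataclass


class IncompatibleColumnType(Exception):
    """Raised when an existing column cannot be adapted to the incoming type."""


@dataclass
class TargetConfig:
    # Hypothetical user setting; not an existing target-postgres option.
    adapt_column_types: bool = True


def resolve_column_type(current: str, incoming: str, config: TargetConfig) -> str:
    """Return the column type to use for the next batch of records."""
    if current == incoming:
        return current
    if not config.adapt_column_types:
        # User opted out: keep the existing column type as-is and leave any
        # conversion (e.g. text -> UUID) to Postgres at insert time.
        return current
    # Default: only widen when known to be safe, otherwise fail loudly with a clear message.
    safe_widenings = {("VARCHAR", "TEXT"), ("INTEGER", "BIGINT")}
    if (current, incoming) in safe_widenings:
        return incoming
    raise IncompatibleColumnType(
        f"column is {current} but the schema now says {incoming}; "
        "override the stream schema or disable column type adaptation"
    )
```

The error message is where the "much better job explaining what's going on" could live. Users who opt out trade that early, explicit failure for a later one if a value turns out not to be a valid UUID.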
I see your PR got closed without comment. That's sad. Considering that, I feel this PR can be closed. Thanks for your assistance!
I have some data of type UUID that the source treats as text. Instead of failing with an exception, just treat the two types as compatible.

I couldn't think of a different way to do this: even though the Postgres integration supports UUID, not all integrations do. And technically UUID isn't TEXT (it's just represented as text in SQL, so the DB handles the conversion transparently), so this shouldn't be part of `merge_sql_types`.
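Here is a minimal sketch of the "last resort" idea, kept outside the normal merge rules as the description argues. `merge_column_types` is a hypothetical stand-in for `merge_sql_types`, and types are plain strings rather than SQLAlchemy type objects. This is not the PR's actual diff.

```python
from typing import Optional


def merge_column_types(current: str, incoming: str) -> Optional[str]:
    """Stand-in for the normal merge rules; returns None when no rule applies."""
    if current == incoming:
        return current
    widenings = {
        frozenset({"VARCHAR", "TEXT"}): "TEXT",
        frozenset({"INTEGER", "BIGINT"}): "BIGINT",
    }
    return widenings.get(frozenset({current, incoming}))


def resolve_with_last_resort(current: str, incoming: str) -> str:
    """Try the regular rules first, then fall back to UUID/text compatibility."""
    merged = merge_column_types(current, incoming)
    if merged is not None:
        return merged
    # Last resort: keep an existing UUID column when the source sends text,
    # relying on Postgres to parse each text value as a UUID on insert.
    if current == "UUID" and incoming in {"TEXT", "VARCHAR"}:
        return current
    raise TypeError(f"cannot merge column type {current} with {incoming}")
```

The reviewer's concern shows up at the fallback step. The column stays `UUID`, so a text value that is not a valid UUID is only rejected when Postgres tries to parse it, and the load fails at that point.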