fix(store-sync): skip invalid utf-8 characters in strings before inserting into postgres #3562
🦋 Changeset detected
Latest commit: 0a524cd
The changes in this PR will be included in the next version bump. This PR includes changesets to release 30 packages.
wouldn't this corrupt the data if a
some interesting discussion here around this problem: ponder-sh/ponder#1456
Postgres doesn't support Seems like the discussion in ponder-sh/ponder#1456 also tends towards this behaviour as default. I don't think we need to add a flag to disable this behaviour until someone requests it.
3bd890d to 48c70f5
@@ -0,0 +1,10 @@
import { SchemaToPrimitives, ValueSchema } from "@latticexyz/protocol-parser/internal";

export function cleanStrings<TSchema extends ValueSchema>(
I like Ponder's naming
- export function cleanStrings<TSchema extends ValueSchema>(
+ export function removeNullCharacters<TSchema extends ValueSchema>(
encodedLengths: record.encodedLengths ?? "0x",
dynamicData: record.dynamicData ?? "0x",
}),
);
thoughts on doing this call just before we pass into postgres, rather than here, in case we later need to use value for something?
e.g.
await tx
  .insert(sqlTable)
  .values({
    __keyBytes: keyBytes,
    __lastUpdatedBlockNumber: blockNumber,
    ...key,
    ...removeNullCharacters(value),
  })
ohh I was worried that these fields were being used to re-encode the data before mutating via event, but I don't think that's a concern here because we still store the raw bytes and use that as the source of truth
Currently an invalid UTF-8 character in a string type column makes the decoded postgres indexer error with PostgresError: invalid byte sequence for encoding "UTF8": 0x00. This PR adds a cleanup step before upserting values into postgres to remove 0x00 characters.
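
For readers who want to see the shape of the cleanup step, here is a minimal sketch of what a helper like removeNullCharacters could look like. It is not the code from this PR: the real helper is typed against ValueSchema and SchemaToPrimitives from @latticexyz/protocol-parser/internal, while this version assumes a plain record of decoded field values (the DecodedValue type and the sample record are made up for illustration) so that it runs on its own.

```typescript
// Hypothetical, simplified stand-in for the helper discussed in this PR.
// Assumption: a decoded value is a flat record of field name -> primitive.
type DecodedValue = Record<string, unknown>;

// Returns a shallow copy of `value` in which every string field has its
// NUL (U+0000) characters removed. Non-string fields are copied unchanged.
function removeNullCharacters<T extends DecodedValue>(value: T): T {
  const cleaned: Record<string, unknown> = {};
  for (const [field, fieldValue] of Object.entries(value)) {
    cleaned[field] =
      typeof fieldValue === "string" ? fieldValue.replace(/\u0000/g, "") : fieldValue;
  }
  return cleaned as T;
}

// Made-up sample record: `name` contains a NUL character, while `data` is a
// hex string whose text happens to spell "0x00" and must stay untouched.
const value = { name: "ali\u0000ce", score: 42, data: "0x00" };
const cleanedValue = removeNullCharacters(value);
```

Because the helper returns a copy, the original decoded value stays available for other uses, which fits the review suggestion of calling it only at the point where the values are passed to postgres.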