Unicode outside BMP #87

Open
ericprud opened this issue Mar 19, 2023 · 0 comments
@ericprud
Collaborator

IIRC, PyShEx failed tests where the schema (or data?) had code points above U+FFFF, i.e. outside the Basic Multilingual Plane. I stumbled across a repo I created for dealing with this in Java and JavaScript. Both use UTF-16 internally, so their grammars cannot be written in terms of code points U+10000 and up and must instead be written in terms of surrogate pairs. I don't remember the state of that repo, but it could be handy to clone it and experiment with the Python there rather than in the larger ShEx .g4 grammar.
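For anyone picking this up, here is a minimal sketch of the surrogate-pair rewriting described above, in Python since that is where PyShEx lives. It is not the code from the repo mentioned (which isn't linked here); the helper names `to_surrogate_pair`, `range_to_surrogate_ranges` and `to_char_classes` are hypothetical. It splits a supplementary code point range into "high surrogate class followed by low surrogate class" pieces, using the standard UTF-16 formula (subtract 0x10000, put the top 10 bits on 0xD800 and the bottom 10 bits on 0xDC00). The example range `[#x10000-#xEFFFF]` is the one Turtle-derived grammars use in `PN_CHARS_BASE`. Note that Python 3's own `str` is indexed by code point (`len("\U0001F600") == 1`), so this rewriting only matters when targeting a UTF-16 runtime.

```python
from __future__ import annotations


def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 high/low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not outside the BMP")
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def range_to_surrogate_ranges(start: int, end: int) -> list[tuple[int, int, int, int]]:
    """Rewrite the code point range start..end as (hi_start, hi_end, lo_start, lo_end)
    tuples, each meaning: one high surrogate in hi_start..hi_end followed by one
    low surrogate in lo_start..lo_end."""
    hs, ls = to_surrogate_pair(start)
    he, le = to_surrogate_pair(end)
    if hs == he:
        # The whole range shares a single high surrogate.
        return [(hs, hs, ls, le)]
    ranges = []
    first_full, last_full = hs, he
    if ls != 0xDC00:
        # First high surrogate only covers the tail of the low-surrogate block.
        ranges.append((hs, hs, ls, 0xDFFF))
        first_full = hs + 1
    tail = None
    if le != 0xDFFF:
        # Last high surrogate only covers the head of the low-surrogate block.
        tail = (he, he, 0xDC00, le)
        last_full = he - 1
    if first_full <= last_full:
        # High surrogates in between pair with every low surrogate.
        ranges.append((first_full, last_full, 0xDC00, 0xDFFF))
    if tail is not None:
        ranges.append(tail)
    return ranges


def to_char_classes(ranges: list[tuple[int, int, int, int]]) -> list[str]:
    """Render each tuple as two bracketed \\uXXXX character classes."""
    def cls(a: int, b: int) -> str:
        return f"[\\u{a:04X}]" if a == b else f"[\\u{a:04X}-\\u{b:04X}]"
    return [cls(h1, h2) + cls(l1, l2) for h1, h2, l1, l2 in ranges]


if __name__ == "__main__":
    for c in to_char_classes(range_to_surrogate_ranges(0x10000, 0xEFFFF)):
        print(c)  # prints [\uD800-\uDB7F][\uDC00-\uDFFF]
```

Ranges that don't start or end on a 1024-code-point boundary come out as up to three pieces (partial leading block, full middle blocks, partial trailing block), which is why a grammar rule usually needs an alternation of pairs rather than a single pair.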
