Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build regex from JSON Schema in Rust #15

Merged
merged 14 commits into from
Aug 29, 2024
Merged

Conversation

ErikKaum
Copy link
Collaborator

@ErikKaum ErikKaum commented Aug 23, 2024

Resolves #12

In relation to the conversation in #12 I opened this PR to share the code that I wrote a while ago. Essentially this was intended as a 1-to-1 Rust port of the build_regex_from_schema function.

Running pytest tests/fsm/test_json_schema.py I get: 18 failed, 45 passed, this excludes the tests for to_regex since that function isn't fully implemented on the rust side.

  • write build_regex_from_schema and to_regex to rust
  • get all tests to pass
  • address all todos left in code
  • remove all .unwrap()s
  • clean up code & make more readable
  • ensure performance hasn't been degraded

NB!
The original python code has a

    schema = json.loads(schema)
    Validator.check_schema(schema)

which I didn't find a good rust equivalent for. We can use the jsonschema crate like so:

    let compiled_schema = JSONSchema::compile(&json_value)
        .map_err(|e| anyhow!("Failed to compile JSON schema: {}", e))?;

but it's slightly different and using for example "float" data types fail.
Additionally there's already the let json_value: Value = serde_json::from_str(json)?; which performs a form of validation on the incoming json.

@brandonwillard
Copy link
Member

This looks great, thanks!

@brandonwillard brandonwillard linked an issue Aug 23, 2024 that may be closed by this pull request
@brandonwillard brandonwillard force-pushed the main branch 4 times, most recently from ab04e5b to 15d9e19 Compare August 23, 2024 22:55
@brandonwillard
Copy link
Member

FYI: I just rebased this to account for the changes in main.

@ErikKaum
Copy link
Collaborator Author

Status update: all the test for the build_regex_from_schema function pass. The next step is exposing the to_regex function and making sure those tests pass as well. The test for to_regex have been commented out for now.

@ErikKaum
Copy link
Collaborator Author

All tests pass 🙌 I'll clean up the code, remove todos and unwraps and then we can proceed for review.

@ErikKaum
Copy link
Collaborator Author

Benchmark results

Screenshot 2024-08-27 at 14 00 53

@ErikKaum ErikKaum marked this pull request as ready for review August 27, 2024 12:06
@ErikKaum
Copy link
Collaborator Author

Additional questions for reviewers. The original python implementation exposed the to_regex:

def to_regex(
    resolver: Resolver, instance: dict, whitespace_pattern: Optional[str] = None
):

But as far as I can tell it's only used in tests and only consumed by the build_regex_from_schema function.
Should we leave it unexposed to python and not have bindings to it?

Similarly I left function get_schema_from_signature to the python side since it's so python specific and probably wouldn't make sense to have it exposed to other languages. Maybe we even should move it to outlines completely?

@brandonwillard
Copy link
Member

But as far as I can tell it's only used in tests and only consumed by the build_regex_from_schema function.
Should we leave it unexposed to python and not have bindings to it?

It can be a rather useful means of debugging (e.g. in an issue), so it's probably worth exposing at the Python level, but we can skip that for now if you want.

@ErikKaum
Copy link
Collaborator Author

It can be a rather useful means of debugging (e.g. in an issue), so it's probably worth exposing at the Python level, but we can skip that for now if you want.

Okay that's good to know. For me we can keep it, no worries 👍

@brandonwillard brandonwillard force-pushed the main branch 2 times, most recently from 347e191 to c448997 Compare August 29, 2024 15:36
@brandonwillard
Copy link
Member

Had to rebase again for upstream fixes/changes.

Also, I'll provide a review shortly!

Copy link
Member

@brandonwillard brandonwillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@brandonwillard brandonwillard merged commit 6391615 into main Aug 29, 2024
5 checks passed
@brandonwillard brandonwillard deleted the json_schema_in_rust branch August 29, 2024 16:16
@ErikKaum
Copy link
Collaborator Author

Thank you 🙌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rust-only JSON schema construction
2 participants