Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: Handle unterminated escape character when parsing partial JSON #29065

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

balvisio
Copy link
Contributor

@balvisio balvisio commented Jan 7, 2025

Description
Currently, when parsing a partial JSON, if a string ends with the escape character, the whole key/value is removed. For example:

>>> from langchain_core.utils.json import parse_partial_json
>>> my_str = '{"foo": "bar", "baz": "qux\\'
>>> 
>>> parse_partial_json(my_str)
{'foo': 'bar'}

My expectation (and with this fix) would be for parse_partial_json() to return:

>>> from langchain_core.utils.json import parse_partial_json
>>> 
>>> my_str = '{"foo": "bar", "baz": "qux\\'
>>> parse_partial_json(my_str)
{'foo': 'bar', 'baz': 'qux'}

Notes:

  1. It could be argued that current behavior is still desired.
  2. I have experienced this issue when the streaming output from an LLM and the chunk happens to end with \\
  3. I haven't included tests. Will do if change is accepted.
  4. This is specially troublesome when this function is used by
    def _transform(self, input: Iterator[Union[str, BaseMessage]]) -> Iterator[Any]:

since what happens is that, for example, if the received sequence of chunks are: {"foo": "b , ar\\ :

Then, the result of calling self.parse_result() is:

{"foo": "b"}

and the second time:

{}

@dosubot dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Jan 7, 2025
Copy link

vercel bot commented Jan 7, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview Jan 8, 2025 2:27am

@dosubot dosubot bot added Ɑ: core Related to langchain-core 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Jan 7, 2025
Copy link
Collaborator

@eyurtsev eyurtsev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @balvisio thanks for looking into this. Could you add a unit test for this scenario?

@balvisio balvisio force-pushed the fix/ba/parse-partial-json branch from a78ea9d to 261d1c0 Compare January 7, 2025 18:01
@balvisio
Copy link
Contributor Author

balvisio commented Jan 7, 2025

@eyurtsev Done

Copy link
Member

@efriis efriis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense

@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Jan 8, 2025
@efriis efriis changed the title [draft] core: Handle unterminated escape character when parsing partial JSON core: Handle unterminated escape character when parsing partial JSON Jan 8, 2025
@efriis efriis enabled auto-merge (squash) January 8, 2025 02:27
@efriis efriis self-assigned this Jan 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature Ɑ: core Related to langchain-core lgtm PR looks good. Use to confirm that a PR is ready for merging. size:XS This PR changes 0-9 lines, ignoring generated files.
Projects
Status: In review
Development

Successfully merging this pull request may close these issues.

3 participants