Parsing arbitrarily long streams? #1280
davidmcnabnz started this conversation in General
Replies: 1 comment
I'm aware the more common usage pattern for a parser framework is to send it fixed-sized input, then see what comes back (transformed tree(s) and/or error(s)).
But I've also got an interest in parsing a text stream of indefinite length. The pattern here would be to feed in chunks of text, and get callbacks with tree objects, when certain targets have been satisfied.
An example usage scenario is implementing a Unix-style stdin/stdout pipe program, and tailing a server logfile into it.
Ideally, the stream parser would be able to instruct the Lark parser to reset its state and start back at the beginning, as if it were receiving a whole new input; otherwise memory use and CPU load would grow without bound.
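One way to get this pattern without any parser-side reset is to do the buffering outside the parser: accumulate chunks yourself, frame complete records (here, newline-delimited log lines), and run a fresh parse per record, so no parser state survives between records. This is only a sketch under that assumption; `parse_record` is a hypothetical stand-in for a real call such as `lark.Lark(grammar).parse(line)`.

```python
from typing import Callable, Iterable

def parse_record(line: str) -> dict:
    # Hypothetical stand-in for a real parser call, e.g.
    # lark.Lark(grammar).parse(line). Splits "LEVEL message" log lines.
    level, _, message = line.partition(" ")
    return {"level": level, "message": message}

def stream_parse(chunks: Iterable[str], callback: Callable[[dict], None]) -> None:
    """Feed arbitrary-sized text chunks; invoke callback once per complete record.

    The buffer only ever holds the current partial line, so memory stays
    bounded however long the stream runs, and each record is parsed from
    scratch -- there is no accumulated parser state to reset.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while "\n" in buffer:
            line, _, buffer = buffer.partition("\n")
            if line:  # skip blank lines
                callback(parse_record(line))
    if buffer:  # trailing record with no final newline
        callback(parse_record(buffer))
```

For the stdin/stdout pipe scenario, the chunks could come from `iter(lambda: sys.stdin.read(4096), "")`, so chunk boundaries never have to align with record boundaries.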
This would be an unbelievably useful way to process an often very spammy and cluttered logfile in-place, and spit out simpler, cleaner and more useful outputs. In addition to just human-facing stdout, the pipe could send transformed trees to an inter-process queue (or even as JSON objects to a Unix file, or websocket, or a pubsub broker), for actions to happen elsewhere.
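The fan-out idea above can be sketched as emitting one JSON object per line ("JSON Lines"), which a downstream process, queue consumer, or websocket bridge can pick up; the record shape here is illustrative, not anything Lark produces itself.

```python
import json
import sys

def emit_json_line(record: dict, out=sys.stdout) -> None:
    # One JSON object per line: trivial for the next process in the
    # pipe to consume, and safe to tail indefinitely.
    out.write(json.dumps(record, separators=(",", ":")) + "\n")
    # Flush eagerly -- a long-running pipe shouldn't hold records
    # hostage in a stdio buffer.
    out.flush()
```

The same function works unchanged whether `out` is stdout, a Unix file, or a socket wrapper, which is the point of keeping the transport a plain writable object.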
Anyone got thoughts on this? Is the capability already in Lark, or do I need to ramp up on internals and do it myself?
Cheers
David