Parsing arbitrarily long streams? #1280
davidmcnabnz started this conversation in General
Replies: 1 comment
I'm aware the more common usage pattern for a parser framework is to send it fixed-sized input, then see what comes back (transformed tree(s) and/or error(s)).
But I've also got an interest in parsing a text stream of indefinite length. The pattern here would be to feed in chunks of text, and get callbacks with tree objects, when certain targets have been satisfied.
An example usage scenario is implementing a Unix-style stdin/stdout pipe program, and tailing a server logfile into it.
Ideally, the stream parser would be able to instruct the Lark parser to reset its state and start back at the beginning, as if it were receiving a whole new input; otherwise memory use and CPU load would grow without bound.
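One way to get this pattern without any parser-side reset is to do the buffering outside the parser: accumulate chunks yourself, frame complete records (here, newline-delimited log lines), and run a fresh parse per record, so no parser state survives between records. This is only a sketch under that assumption; `parse_record` is a hypothetical stand-in for a real call such as `lark.Lark(grammar).parse(line)`.

```python
from typing import Callable, Iterable

def parse_record(line: str) -> dict:
    # Hypothetical stand-in for a real parser call, e.g.
    # lark.Lark(grammar).parse(line). Splits "LEVEL message" log lines.
    level, _, message = line.partition(" ")
    return {"level": level, "message": message}

def stream_parse(chunks: Iterable[str], callback: Callable[[dict], None]) -> None:
    """Feed arbitrary-sized text chunks; invoke callback once per complete record.

    The buffer only ever holds the current partial line, so memory stays
    bounded however long the stream runs, and each record is parsed from
    scratch -- there is no accumulated parser state to reset.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while "\n" in buffer:
            line, _, buffer = buffer.partition("\n")
            if line:  # skip blank lines
                callback(parse_record(line))
    if buffer:  # trailing record with no final newline
        callback(parse_record(buffer))
```

For the stdin/stdout pipe scenario, the chunks could come from `iter(lambda: sys.stdin.read(4096), "")`, so chunk boundaries never have to align with record boundaries.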
This would be an unbelievably useful way to process an often very spammy and cluttered logfile in-place, and spit out simpler, cleaner and more useful outputs. In addition to just human-facing stdout, the pipe could send transformed trees to an inter-process queue (or even as JSON objects to a Unix file, or websocket, or a pubsub broker), for actions to happen elsewhere.
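The fan-out idea above can be sketched as emitting one JSON object per line ("JSON Lines"), which a downstream process, queue consumer, or websocket bridge can pick up; the record shape here is illustrative, not anything Lark produces itself.

```python
import json
import sys

def emit_json_line(record: dict, out=sys.stdout) -> None:
    # One JSON object per line: trivial for the next process in the
    # pipe to consume, and safe to tail indefinitely.
    out.write(json.dumps(record, separators=(",", ":")) + "\n")
    # Flush eagerly -- a long-running pipe shouldn't hold records
    # hostage in a stdio buffer.
    out.flush()
```

The same function works unchanged whether `out` is stdout, a Unix file, or a socket wrapper, which is the point of keeping the transport a plain writable object.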
Anyone got thoughts on this? Is the capability already in Lark, or do I need to ramp up on internals and do it myself?
Cheers
David