Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor RuneReader #441

Closed
ianlewis opened this issue Aug 28, 2023 · 4 comments · Fixed by #842
Closed

Refactor RuneReader #441

ianlewis opened this issue Aug 28, 2023 · 4 comments · Fixed by #842
Labels
performance An issue with performance refactor A code refactor or cleanup

Comments

@ianlewis
Copy link
Owner

RuneReader was meant to be used as an easy way to read charset converted text from a file into Go's utf-8 strings.

However, this doesn't seem necessary and todos should be able to deal purely in bytes (so long as language identifiers for comments or strings are not defined in non-ASCII characters). This is especially true if output text of todos is not converted into the local terminal's character set.

@ianlewis ianlewis added enhancement New feature or request performance An issue with performance refactor A code refactor or cleanup and removed enhancement New feature or request labels Aug 28, 2023
@ianlewis
Copy link
Owner Author

ianlewis commented Aug 31, 2023

I wonder if it's wise to work with bytes rather than runes. I think using bytes mostly depends on the assumption that any character encoding we encounter is going to be compatible with ASCII. Some character encodings like UTF-16 are incompatible.

A few other folks have implemented buffered rune readers

Other folks seem to roll their own using bufio.

go-toml used to use go-buffruneio but it seems like they just work on bytes now. It would be interesting to see if go-toml even works on UTF-16 files. Err, no it seems that TOML must be UTF-8 so that's why they can do that.

It might be better to just scrap my implementation and use go-buffruneio as it seems to do everything mine does but it's pretty old and doesn't seem to have gotten any updates. Maybe another project to contribute a bit of maintenance to.

@ianlewis
Copy link
Owner Author

ianlewis commented Aug 31, 2023

Another idea is to refactor the RuneReader and release it as a separate package that's a bit more modern than the existing packages.

I'm not sure I like the "unbounded growth unless you call Forget" part of go-buffruneio and the fact that they have a special EOF rune. Probably I would allow the user to specify the size of the buffer rather than allowing it to grow unbounded since you probably don't ever need to call UnreadRune on more than a few runes at a time. I guess the thinking its that once you've found a full token in the input you would call Forget but it seems strange to do it this way.

I don't think I can use bufrr at all since I would need to peek at more than one rune.

Or maybe expand it into a full lexing package that allows for some customization?

@ianlewis
Copy link
Owner Author

One person created a lexer package based on Rob Pike's GDG Sydney talk (Related #459). This makes me think to try it with generics.
https://github.com/bbuck/go-lexer

But it seems someone has done this already. Seems fairly reasonable. I wonder how fast it would be relative to an implementation I could write. Generics and all the other code to generalize it might slow it down a bit.
https://github.com/zalgonoise/lex

@ianlewis
Copy link
Owner Author

Hmm, it seems that the lex package's Lex only operates on slices and LexBuffer uses the gio library but that doesn't support buffered io. When would you ever really be reading anything other than bytes?

It looks like an interesting idea, but I would have implemented it differently for sure; using just the simple io.Reader interface (which can work with strings, slices, as well as other Readers etc.).

@ianlewis ianlewis changed the title Remove RuneReader Refactor RuneReader Aug 31, 2023
@ianlewis ianlewis added performance An issue with performance and removed performance An issue with performance labels Sep 1, 2023
@ianlewis ianlewis mentioned this issue Oct 3, 2023
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance An issue with performance refactor A code refactor or cleanup
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant