Improper parsing of symbols #29

Hirevo · 2023-01-25T21:51:16Z

The current state of how symbols are parsed in both interpreters in som-rs is somewhat non-standard, compared to other SOMs.

This issue exists to track the cases where som-rs behaves differently from other SOMs, so that they can all be fixed.

Here are the problematic cases that I am currently aware of:

Spaces between # and identifier (ex: # foo, accepted by most SOMs, rejected by som-rs)
Spaces between # and operator (ex: # +, accepted by most SOMs, rejected by som-rs)
Spaces between # and string literal (ex: # 'foo', accepted by most SOMs, rejected by som-rs)
Non-leading successive colons in selector (ex: #foo::, rejected by most SOMs, accepted by som-rs)
Leading digits after colons (ex: #foo:2:, rejected by most SOMs, accepted by som-rs)
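
To make the last two cases concrete, here is a minimal sketch of a check that would reject both #foo:: and #foo:2:. The helper names (is_identifier, is_valid_selector) and the exact identifier rule are assumptions for illustration, not code from som-rs or from any other SOM:

```rust
/// Assumed identifier rule: an ASCII letter followed by letters, digits,
/// or underscores. Real SOM implementations may differ in the details.
fn is_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() => {
            chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
        }
        _ => false,
    }
}

/// Hypothetical helper: validates the text that follows `#` when it is
/// a unary selector (`foo`) or a keyword selector (`foo:bar:`).
fn is_valid_selector(text: &str) -> bool {
    if !text.contains(':') {
        return is_identifier(text);
    }
    // A keyword selector must end with ':' and every part between
    // colons must itself be an identifier, so empty parts (`foo::`)
    // and parts starting with a digit (`foo:2:`) are rejected.
    match text.strip_suffix(':') {
        Some(body) => body.split(':').all(is_identifier),
        None => false,
    }
}
```

Under these assumptions, foo, foo: and foo:bar: pass the check, while foo:: and foo:2: do not.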

Somewhat related to this issue is the situation with array literals, which suffer from a similar problem due to also using the # in the syntax:

Spaces between # and ( (ex: # (1 2 3), accepted by most SOMs, rejected by som-rs)

Most of these issues arise because the lexer currently tokenizes the whole symbol at once (as: Token::Symbol(String)) instead of simply emitting its fragments (something like: [Token::Pound, Token::Selector(String)]).
Delegating the construction of the symbol to the parser would likely be the way forward to address these problems.
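
As a rough sketch of what that could look like, the parser below assembles a symbol from a Pound token followed by its fragments. The Token enum here is a simplified stand-in and parse_symbol is a hypothetical function; neither reflects the actual som-rs types:

```rust
/// Simplified, hypothetical token type (not the real som-rs `Token`).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Pound,
    Identifier(String),
    Keyword(String), // a single keyword part, e.g. "at:"
    Operator(String),
    LitString(String),
}

/// Assumed rule: symbol := Pound (Identifier | Keyword+ | Operator | LitString)
/// Returns the symbol's name and the remaining tokens, or None on mismatch.
fn parse_symbol(tokens: &[Token]) -> Option<(String, &[Token])> {
    let (first, rest) = tokens.split_first()?;
    if *first != Token::Pound {
        return None;
    }
    let (head, mut rest) = rest.split_first()?;
    match head {
        Token::Identifier(s) | Token::Operator(s) | Token::LitString(s) => {
            Some((s.clone(), rest))
        }
        Token::Keyword(k) => {
            // Concatenate consecutive keyword parts: "at:" + "put:" -> "at:put:"
            let mut name = k.clone();
            while let Some((Token::Keyword(k), tail)) = rest.split_first() {
                name.push_str(k);
                rest = tail;
            }
            Some((name, rest))
        }
        // `##` is not a symbol in this sketch.
        Token::Pound => None,
    }
}
```

Because a lexer like this would drop whitespace between tokens, # foo and #foo would produce the same token sequence and both be accepted. Rejecting the spaced form would need extra information, such as token positions, which this sketch does not model.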

smarr · 2023-01-26T09:45:39Z

Hmmm. Interesting. I am not sure how I feel about these things.

I think we need more tests :)
Especially the situation around spaces is a little odd and an artifact of having a separate lexer in most SOM implementations. The lexer simply discards the space.
But the Smalltalk grammar (ANSI Smalltalk) doesn't explicitly mention spaces; instead, it says that a quoted string is to be immediately preceded by a pound sign.

Squeak has the same behavior as SOM, allowing spaces, but it really looks odd to me, and Pharo seems to have fixed it, disallowing spaces between # and the rest of the symbol.

# foo just doesn't look right. The # could be misread as an operator here, for instance in something like 54 # bar, which should be a parse error, because 54 #bar is not a valid expression.

smarr · 2023-01-26T23:51:24Z

Hmm. I think the biggest problem with this at the moment is that we don't have a cross-SOM mechanism to test for parser errors.

Hirevo · 2024-09-18T06:59:26Z

Sorry about notifications, I was looking at this issue on my phone and forgot to lock the screen when putting it back in my pocket, so some unintentional inputs got registered and it inadvertently posted some comments.

Hirevo added the C-bug (Category: Bugs), M-lexer (Module: Lexer), M-parser (Module: Parser), and P-medium (Priority: Medium) labels Jan 25, 2023

Hirevo self-assigned this Jan 25, 2023

Hirevo mentioned this issue Jan 25, 2023

Accept proper identifiers as symbols #28

Merged

smarr mentioned this issue Jan 26, 2023

Parsing of Symbols SOM-st/SOM#111

Open

Hirevo mentioned this issue Jan 30, 2023

Improved parsing of keyword symbols #32

Merged
