Symbol parsing in both som-rs interpreters is currently somewhat non-standard compared to other SOMs.
This issue tracks the cases where som-rs behaves differently from other SOMs, so that they can all be fixed.
Here are the problematic cases that I am currently aware of:
Spaces between # and identifier (ex: # foo, accepted by most SOMs, rejected by som-rs)
Spaces between # and operator (ex: # +, accepted by most SOMs, rejected by som-rs)
Spaces between # and string literal (ex: # 'foo', accepted by most SOMs, rejected by som-rs)
Non-leading successive colons in selector (ex: #foo::, rejected by most SOMs, accepted by som-rs)
Leading digits after colons (ex: #foo:2:, rejected by most SOMs, accepted by som-rs)
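To make the two colon-related cases concrete, here is a small validator for keyword selectors. It assumes the usual SOM rule that each keyword part is a letter followed by letters, digits or underscores, and ends in a colon; `is_valid_keyword_selector` is a hypothetical helper, not part of som-rs.

```rust
/// Hypothetical helper (not part of som-rs): checks whether `text` is a
/// well-formed keyword selector such as "foo:" or "at:put:".
fn is_valid_keyword_selector(text: &str) -> bool {
    if !text.ends_with(':') {
        return false;
    }
    // Dropping the final ':' and splitting on ':' yields one part per keyword.
    text[..text.len() - 1].split(':').all(|part| {
        let mut chars = part.chars();
        match chars.next() {
            // Assumed identifier rule: a letter, then letters, digits or '_'.
            Some(first) if first.is_ascii_alphabetic() => {
                chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
            }
            // Empty part ("foo::") or leading digit ("foo:2:").
            _ => false,
        }
    })
}
```

Under this rule `foo:` and `at:put:` are valid, while `foo::` and `foo:2:` are rejected, matching what most SOMs do.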
A related problem affects array literals, which also use # in their syntax:
Spaces between # and ( (ex: # (1 2 3), accepted by most SOMs, rejected by som-rs)
Most of these issues stem from the lexer currently tokenizing the whole symbol at once (as Token::Symbol(String)) instead of simply emitting its fragments (something like [Token::Pound, Token::Selector(String)]).
Delegating the construction of the symbol to the parser is likely the way forward to address these problems.
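A minimal sketch of that split follows. The `Token` variants, `lex`, `parse_symbol` and `symbol_from_source` are hypothetical names for illustration, not som-rs's actual API. The lexer emits `#` as its own token and drops whitespace like most SOM lexers; a keyword sequence such as `at:put:` is lexed as a single token, and the parser accepts exactly one selector or string token after the `#`.

```rust
// Hypothetical token set for illustration; not som-rs's actual `Token` enum.
#[derive(Debug, PartialEq)]
enum Token {
    Pound,              // '#'
    Identifier(String), // e.g. "foo"
    Keyword(String),    // keyword or keyword sequence, e.g. "foo:" or "at:put:"
    Operator(String),   // run of operator characters, e.g. "+" or "<="
    Str(String),        // contents of a single-quoted string (no escape handling)
    Colon,              // a ':' that is not part of a keyword
    Integer(i64),
}

const OPERATOR_CHARS: &str = "~&|*/\\+=><,@%-";

// Returns the index just past an identifier that starts at `i`.
fn ident_end(chars: &[char], mut i: usize) -> usize {
    while i < chars.len() && (chars[i].is_ascii_alphanumeric() || chars[i] == '_') {
        i += 1;
    }
    i
}

fn lex(src: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1; // discarded, as in most SOM lexers
        } else if c == '#' {
            tokens.push(Token::Pound);
            i += 1;
        } else if c.is_ascii_alphabetic() {
            let start = i;
            i = ident_end(&chars, i);
            if chars.get(i) == Some(&':') {
                i += 1;
                // Extend into a keyword sequence while the next fragment is
                // an identifier immediately followed by ':'.
                while chars.get(i).is_some_and(|ch| ch.is_ascii_alphabetic()) {
                    let end = ident_end(&chars, i);
                    if chars.get(end) != Some(&':') {
                        break;
                    }
                    i = end + 1;
                }
                tokens.push(Token::Keyword(chars[start..i].iter().collect()));
            } else {
                tokens.push(Token::Identifier(chars[start..i].iter().collect()));
            }
        } else if c.is_ascii_digit() {
            let end = (i..chars.len()).find(|&j| !chars[j].is_ascii_digit()).unwrap_or(chars.len());
            let digits: String = chars[i..end].iter().collect();
            tokens.push(Token::Integer(digits.parse::<i64>().map_err(|e| e.to_string())?));
            i = end;
        } else if c == '\'' {
            let start = i + 1;
            let close = chars[start..].iter().position(|&ch| ch == '\'').ok_or("unterminated string")?;
            tokens.push(Token::Str(chars[start..start + close].iter().collect()));
            i = start + close + 1;
        } else if c == ':' {
            tokens.push(Token::Colon);
            i += 1;
        } else if OPERATOR_CHARS.contains(c) {
            let end = (i..chars.len()).find(|&j| !OPERATOR_CHARS.contains(chars[j])).unwrap_or(chars.len());
            tokens.push(Token::Operator(chars[i..end].iter().collect()));
            i = end;
        } else {
            return Err(format!("unexpected character {c:?}"));
        }
    }
    Ok(tokens)
}

// Builds the symbol from '#' followed by exactly one selector or string token.
fn parse_symbol(tokens: &[Token]) -> Result<String, String> {
    let mut iter = tokens.iter();
    if iter.next() != Some(&Token::Pound) {
        return Err("expected '#'".to_string());
    }
    let sym = match iter.next() {
        Some(Token::Identifier(s) | Token::Keyword(s) | Token::Operator(s) | Token::Str(s)) => s.clone(),
        other => return Err(format!("expected selector or string after '#', found {other:?}")),
    };
    // In a real parser the following tokens would belong to the enclosing
    // expression; in this standalone sketch any leftover token is an error.
    if let Some(extra) = iter.next() {
        return Err(format!("unexpected {extra:?} after #{sym}"));
    }
    Ok(sym)
}

fn symbol_from_source(src: &str) -> Result<String, String> {
    parse_symbol(&lex(src)?)
}
```

With this split, `#foo::` and `#foo:2:` fail because the lexer leaves a stray `Colon` (and `Integer`) token after `foo:`, while `# foo`, `# +` and `# 'foo'` are accepted as in most SOMs. Rejecting the spaced forms instead would require the lexer to pass position information on to the parser.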
Hmmm. Interesting. I am not sure how I feel about these things.
I think we need more tests :)
The situation around spaces in particular is a little odd; it is an artifact of most SOM implementations having a separate lexer, which simply discards the space.
But the Smalltalk grammar (ANSI Smalltalk) doesn't explicitly mention spaces; instead, it says that a quoted string is to be immediately preceded by a pound sign.
Squeak behaves like SOM and allows spaces, but that really looks odd to me. Pharo seems to have fixed it and disallows spaces between # and the rest of the symbol.
# foo just doesn't look right. The # could be misread as an operator here, for instance in 54 # bar, which should be a parse error because 54 #bar is not a valid expression.
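One way a design with a separate lexer could still reject # foo, as Pharo does, is to keep the source offsets of each token and require the # to end exactly where the next token begins. The sketch below assumes a deliberately reduced, hypothetical token set (`Kind`, `Spanned`, `lex_spanned` and `check_pound_adjacency` are illustrative names, not som-rs code).

```rust
// Hypothetical token kinds, reduced to what this check needs.
#[derive(Debug, PartialEq)]
enum Kind {
    Pound,
    Word,  // run of ASCII letters and digits
    Other, // any other single non-whitespace character
}

// A token plus its byte range [start, end) in the source text.
struct Spanned {
    kind: Kind,
    start: usize,
    end: usize,
}

fn lex_spanned(src: &str) -> Vec<Spanned> {
    let mut out = Vec::new();
    let mut chars = src.char_indices().peekable();
    while let Some((start, c)) = chars.next() {
        if c.is_whitespace() {
            continue; // skipped, but the offsets still record the gap
        }
        let kind = if c == '#' {
            Kind::Pound
        } else if c.is_ascii_alphanumeric() {
            // Consume the rest of the word.
            while chars.peek().is_some_and(|&(_, n)| n.is_ascii_alphanumeric()) {
                chars.next();
            }
            Kind::Word
        } else {
            Kind::Other
        };
        // The token ends where the next character (if any) begins.
        let end = chars.peek().map_or(src.len(), |&(i, _)| i);
        out.push(Spanned { kind, start, end });
    }
    out
}

// Rejects a '#' that is not immediately followed by the next token.
fn check_pound_adjacency(tokens: &[Spanned]) -> Result<(), String> {
    for pair in tokens.windows(2) {
        if pair[0].kind == Kind::Pound && pair[0].end != pair[1].start {
            return Err(format!("whitespace after '#' at offset {}", pair[0].start));
        }
    }
    Ok(())
}
```

This check only enforces adjacency: #foo passes while # foo and 54 # bar are rejected. 54 #bar also passes it and is left for the parser to reject as an invalid expression.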
Sorry about the notifications. I was looking at this issue on my phone and forgot to lock the screen before putting it back in my pocket, so some unintentional inputs got registered and it inadvertently posted some comments.