Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Config to Lexer #442

Merged
merged 9 commits into from
Dec 31, 2024
Merged

Introduce Config to Lexer #442

merged 9 commits into from
Dec 31, 2024

Conversation

magicant
Copy link
Owner

@magicant magicant commented Dec 30, 2024

Summary by CodeRabbit

Based on the comprehensive summary, here are the release notes:

Release Notes

  • New Features

    • Added a new configuration approach for lexer initialization
    • Introduced Lexer::with_code() method for simplified lexer creation
    • Added Config struct for more flexible lexer configuration
  • Improvements

    • Streamlined lexer initialization process
    • Enhanced documentation consistency
    • Removed unnecessary NonZeroU64 type usage
    • Updated input handling across multiple modules
  • Changes

    • Replaced Lexer::from_memory() with Lexer::with_code()
    • Simplified lexer configuration methods
    • Removed SourceInput structure
  • Documentation

    • Updated and standardized comment styles
    • Improved clarity of documentation across modules

The Lexer::new method now only takes an input object.
The start_line_number and source arguments have been removed in favor of
construction with a Config struct. The two arguments are not essential
for the lexer's operation, so they should not be required in the
convenience constructor.
Almost all calls to Lexer::from_memory were passed Source::Unknown. This
commit introduces a new function, Lexer::with_code, which only takes a
string argument and uses Source::Unknown internally. This makes the
code more readable and reduces boilerplate.
The `SourceInput` struct was not very useful and was only used to pass
the input object and the source description to the caller. This commit
removes the `SourceInput` struct and instead returns the `Lexer` object
directly from the `prepare_input` function.
@magicant magicant self-assigned this Dec 30, 2024
@magicant magicant added documentation Improvements or additions to documentation breaking change This enhancement is backward-incompatible labels Dec 30, 2024
Copy link

coderabbitai bot commented Dec 30, 2024

Walkthrough

This pull request introduces a comprehensive refactoring of the lexer initialization process across the Yash shell syntax library. The primary change involves replacing the Lexer::from_memory method with a new Lexer::with_code method and introducing a more flexible configuration approach for lexer creation. The modifications span multiple files, standardizing how lexers are instantiated and removing the need for explicit Source and line number parameters. This refactoring aims to simplify lexer initialization while maintaining the existing parsing functionality.

Changes

File Change Summary
yash-syntax/src/parser/lex/core.rs Added new Config struct for lexer configuration, modified Lexer::new and introduced Lexer::with_code method
Multiple parser test files Replaced Lexer::from_memory(input, Source::Unknown) with Lexer::with_code(input)
yash-cli/src/startup/input.rs Updated prepare_input to return Lexer instead of SourceInput
Documentation files Removed trailing periods from comments, updated documentation style

Sequence Diagram

sequenceDiagram
    participant Client
    participant LexerConfig
    participant Lexer
    participant InputObject

    Client->>LexerConfig: Create configuration
    LexerConfig-->>Client: Configuration object
    Client->>LexerConfig: Set input
    LexerConfig->>Lexer: Create lexer
    Lexer->>InputObject: Use input
    Lexer-->>Client: Configured lexer
Loading

Possibly related PRs

Poem

🐰 A Lexer's New Dance

With config so bright and clean,
No more sources left unseen,
Lexers leap with newfound grace,
A rabbit's parsing embrace!

🔍✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (8)
yash-syntax/src/parser/lex/op.rs (1)

17-17: Refined module-level doc.
Removing the trailing period is a minor style nit; otherwise, no issues.

yash-syntax/src/parser/lex/word.rs (1)

17-17: Remove trailing period in doc comment.
No functional impact; this is purely a stylistic and documentation change.

yash-syntax/src/parser/from_str.rs (1)

52-52: Removed trailing period in doc comment.
Minor style adjustment; no impact on functionality.

yash-syntax/src/parser/simple_command.rs (1)

58-58: Doc comment style update.
Removed period for consistency. No functional changes.

yash-syntax/src/parser/lex/token.rs (1)

131-135: Consider enhancing tilde expansion test coverage

While the test verifies basic tilde expansion, consider adding test cases for:

  • Location information verification
  • Edge cases (e.g., ~user/path, ~+, ~-)
  • Invalid tilde expressions
yash-syntax/CHANGELOG.md (3)

20-22: Consider enhancing the Config struct documentation

The changelog entries for the Config structs could be more descriptive. Consider:

  • Explaining the motivation behind splitting the Config into two modules
  • Providing examples of how these configs improve flexibility
  • Documenting the relationship between the two Config structs

24-25: Consider adding migration guide for lexer initialization changes

Since this is a significant change in how lexers are initialized, consider adding:

  • Migration examples from from_memory to with_code
  • Benefits of the new initialization method
  • Impact on existing code

37-40: Consider clarifying the breaking changes

The removal of start_line_number and source arguments from Lexer::new is a breaking change. Consider:

  • Adding a "Breaking Changes" section
  • Providing migration steps for existing code
  • Explaining how to achieve the same functionality with the new Config struct
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c591d60 and 44e21f9.

📒 Files selected for processing (49)
  • yash-builtin/src/eval.rs (1 hunks)
  • yash-builtin/src/source/semantics.rs (1 hunks)
  • yash-cli/src/lib.rs (1 hunks)
  • yash-cli/src/startup/init_file.rs (1 hunks)
  • yash-cli/src/startup/input.rs (7 hunks)
  • yash-prompt/src/expand_posix.rs (1 hunks)
  • yash-prompt/src/lib.rs (1 hunks)
  • yash-semantics/src/runner.rs (15 hunks)
  • yash-semantics/src/runner_legacy.rs (10 hunks)
  • yash-syntax/CHANGELOG.md (2 hunks)
  • yash-syntax/src/alias.rs (4 hunks)
  • yash-syntax/src/input.rs (4 hunks)
  • yash-syntax/src/lib.rs (1 hunks)
  • yash-syntax/src/parser.rs (2 hunks)
  • yash-syntax/src/parser/and_or.rs (4 hunks)
  • yash-syntax/src/parser/case.rs (24 hunks)
  • yash-syntax/src/parser/command.rs (4 hunks)
  • yash-syntax/src/parser/compound_command.rs (11 hunks)
  • yash-syntax/src/parser/core.rs (23 hunks)
  • yash-syntax/src/parser/error.rs (4 hunks)
  • yash-syntax/src/parser/for_loop.rs (18 hunks)
  • yash-syntax/src/parser/from_str.rs (17 hunks)
  • yash-syntax/src/parser/function.rs (8 hunks)
  • yash-syntax/src/parser/grouping.rs (9 hunks)
  • yash-syntax/src/parser/if.rs (13 hunks)
  • yash-syntax/src/parser/lex.rs (1 hunks)
  • yash-syntax/src/parser/lex/arith.rs (8 hunks)
  • yash-syntax/src/parser/lex/backquote.rs (9 hunks)
  • yash-syntax/src/parser/lex/braced_param.rs (29 hunks)
  • yash-syntax/src/parser/lex/command_subst.rs (4 hunks)
  • yash-syntax/src/parser/lex/core.rs (20 hunks)
  • yash-syntax/src/parser/lex/dollar.rs (10 hunks)
  • yash-syntax/src/parser/lex/escape.rs (22 hunks)
  • yash-syntax/src/parser/lex/heredoc.rs (8 hunks)
  • yash-syntax/src/parser/lex/keyword.rs (2 hunks)
  • yash-syntax/src/parser/lex/misc.rs (7 hunks)
  • yash-syntax/src/parser/lex/modifier.rs (23 hunks)
  • yash-syntax/src/parser/lex/op.rs (15 hunks)
  • yash-syntax/src/parser/lex/raw_param.rs (6 hunks)
  • yash-syntax/src/parser/lex/text.rs (22 hunks)
  • yash-syntax/src/parser/lex/token.rs (6 hunks)
  • yash-syntax/src/parser/lex/word.rs (21 hunks)
  • yash-syntax/src/parser/list.rs (18 hunks)
  • yash-syntax/src/parser/pipeline.rs (10 hunks)
  • yash-syntax/src/parser/redir.rs (18 hunks)
  • yash-syntax/src/parser/simple_command.rs (28 hunks)
  • yash-syntax/src/parser/while_loop.rs (10 hunks)
  • yash-syntax/src/source.rs (3 hunks)
  • yash-syntax/src/syntax.rs (2 hunks)
✅ Files skipped from review due to trivial changes (17)
  • yash-syntax/src/source.rs
  • yash-syntax/src/parser/redir.rs
  • yash-syntax/src/lib.rs
  • yash-syntax/src/parser/case.rs
  • yash-syntax/src/syntax.rs
  • yash-syntax/src/parser/lex.rs
  • yash-syntax/src/parser/while_loop.rs
  • yash-syntax/src/parser/if.rs
  • yash-syntax/src/parser/function.rs
  • yash-syntax/src/alias.rs
  • yash-syntax/src/parser/lex/keyword.rs
  • yash-syntax/src/parser/pipeline.rs
  • yash-syntax/src/parser/error.rs
  • yash-syntax/src/parser/lex/text.rs
  • yash-syntax/src/parser/compound_command.rs
  • yash-syntax/src/parser/list.rs
  • yash-syntax/src/parser/for_loop.rs
🔇 Additional comments (301)
yash-cli/src/startup/input.rs (9)

17-25: Documentation improvements look good.

These added explanations provide clarity on how input is prepared for the shell syntax parser, including details about decorators and error types. Nothing to change here.


48-48: Import for lexer usage is consistent with the refactor.

The import of Lexer from yash_syntax::parser::lex is needed to construct and configure lexer instances. This aligns with the new approach of returning Lexer objects.


51-51: Descriptive error documentation.

Maintains alignment with the new prepare_input function. Good clarification for potential file-open errors.


61-64: Accurate docstring for prepare_input.

The updated comments correctly describe the new decorators and how they relate to Lexer. No issues found.


87-94: Helper function to unify lexer creation.

The lexer_with_input_and_source function improves maintainability by centralizing Lexer configuration. Nicely done.


109-109: Confirmed consistent usage of the helper function.

Return statement shows a clean integration with the newly added lexer_with_input_and_source.


132-132: Ensuring file-based inputs align with the new helper.

The updated return statement indicates file-based input also flows through lexer_with_input_and_source. Matches the new pattern.


145-145: String-based input is now consistent with other sources.

Completes the unification for all three input types (stdin, file, string) under this single pattern.


86-86: Return type change from SourceInput to Lexer.

This is a major interface change. Ensure all callers have been updated to handle the new Lexer return type.

✅ Verification successful

Based on the search results, I can see that Lexer is a widely used type throughout the codebase, and there are no references to SourceInput type. The change from SourceInput to Lexer appears to be a new addition rather than a replacement of an existing type.

Let me verify the callers of this function to ensure we catch all impacts.


Return type change is safe and properly handled

The change from SourceInput to Lexer is not a breaking interface change as initially suspected. The verification shows:

  1. The prepare_input function is only used in one place in yash-cli/src/lib.rs
  2. The caller is already handling the new Lexer return type correctly
  3. There are no traces of the old SourceInput type in the codebase, suggesting this is a new addition rather than a replacement
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# This script verifies all references to the old `SourceInput` type in the codebase.
rg "SourceInput"

Length of output: 16


Script:

#!/bin/bash
# Let's find all references to the function that returns Lexer
ast-grep --pattern 'fn prepare_input($$$) -> Result<Lexer<$_>, $_>'

# Also search for any usage of Lexer type
rg "Lexer"

Length of output: 64879


Script:

#!/bin/bash
# Find all references to prepare_input function
ast-grep --pattern 'prepare_input($$$)'

# Also search for any direct text matches
rg "prepare_input"

Length of output: 535

yash-prompt/src/lib.rs (1)

60-62: Configuration-based lexer initialization is clearly demonstrated.

Shifting from direct instantiation to a Lexer::config() object clarifies the process and fosters extensibility. Good improvement.

yash-builtin/src/eval.rs (1)

82-86: Adoption of Lexer::config() enhances flexibility.

The changes remove the fixed line number usage and allow specifying a custom Source, making the code more maintainable.

yash-syntax/src/parser.rs (2)

17-17: Enhanced overview of the syntax parser and lexer roles.

This documentation clarifies the relationship between tokenization and parsing. No issues found.


40-40: Simple lexer creation in code example.

Demonstrates the new usage succinctly. Confirm that the Lexer::new(input) call corresponds to the current interface of Lexer.

yash-syntax/src/parser/command.rs (4)

Line range hint 57-66: Test refactored to use Lexer::with_code.

Eliminating Lexer::from_memory in favor of Lexer::with_code simplifies creation in tests. Implementation looks correct.


Line range hint 72-81: Compound command test updated with Lexer::with_code.

Consistent usage across tests. Maintains readability.


Line range hint 87-96: Function command test updated.

Using Lexer::with_code("fun () ( echo )") remains clear and mirrors the refactor’s pattern. Good consistency.


Line range hint 102-108: EOF test using an empty string is straightforward.

Ensures correct behavior at end of file. No concerns found.

yash-syntax/src/parser/lex/command_subst.rs (4)

17-17: Consistent documentation style.

Removing the trailing period in the doc comment is consistent with the broader documentation style in this PR. No issues here.


68-68: Verified correctness of new lexer initialization.

Switching from Lexer::from_memory() to Lexer::with_code() aligns with the refactor's objective, and these test assertions appear correct.


91-91: Good usage of Lexer::with_code.

Ensures test coverage for the case when the command substitution is not recognized. The logic remains coherent.


108-108: Unclosed command substitution test remains valid.

The new Lexer::with_code call here keeps the test meaningful and ensures correct error handling coverage.

yash-cli/src/lib.rs (1)

79-80: Great error handling strategy.

The match prepare_input(ref_env, &work.source) block cleanly handles input preparation failures. Printing errors and mapping specific I/O error codes to ExitStatus::NOT_FOUND or ExitStatus::NOEXEC is a robust approach.

yash-syntax/src/input.rs (4)

17-17: Doc comment style consistency.

Removing the trailing period in the module-level doc comment matches the style changes in other files.


59-59: Clear, concise error type alias.

Using pub type Error = std::io::Error; remains straightforward and aligns well with typical Rust patterns.


62-62: Explicit result type for clarity.

pub type Result = std::result::Result<String, Error>; consistently enforces the same error type across the input trait.


131-135: Useful From<&str> implementation for Memory.

This new impl<'a> From<&'a str> for Memory<'a> simplifies creating a Memory reader from string literals or slices, improving ergonomics.

yash-syntax/src/parser/lex/misc.rs (2)

17-17: Doc comment update acknowledged.

The updated documentation for the lexer extension is consistent with the removed trailing period style.


74-74: Uniform Lexer::with_code adoption in tests.

All these test scenarios correctly adopt the new lexer constructor, maintaining the original test logic. This unification of lexer initialization simplifies the codebase.

Also applies to: 96-96, 103-103, 112-112, 124-124, 131-131, 153-153, 177-177

yash-prompt/src/expand_posix.rs (1)

44-44: Use Lexer::with_code for prompt expansion
This switch from Lexer::from_memory to Lexer::with_code is consistent with the new approach to lexer initialization and simplifies usage.

yash-syntax/src/parser/lex/raw_param.rs (6)

17-17: Minor doc-comment adjustment
No issues found; the revised comment remains clear.


101-101: Switch to Lexer::with_code in test
Replacing Lexer::from_memory ensures consistency with the revised lexer API.


119-119: Unit test updated to new lexer instantiation
Maintains uniformity with the refactored lexer usage.


137-137: Consistent refactor to Lexer::with_code
No functional changes; aligns test with the new method.


155-155: Adopting Lexer::with_code with multiline input
Retains the same logic while removing the deprecated call.


173-173: Refactored lexer instantiation in test
Looks good. Testing behavior remains unchanged.

yash-syntax/src/parser/lex/arith.rs (8)

17-17: Clarified doc-comment for arithmetic expansions
No content issues; the comment remains accurate.


98-98: Refactor to Lexer::with_code in arithmetic test
Matches the broader refactoring direction.


117-117: Test updated to new lexer method
Ensures consistency in unit tests.


129-129: Multiline test now uses Lexer::with_code
No further changes required; functionality is unaffected.


148-148: Escapes test leverages Lexer::with_code
Preserves original test coverage.


178-178: Unclosed expression test
Switch to Lexer::with_code is straightforward and aligns with the new approach.


199-199: Refactor within unclosed second expression test
All good; consistent with prior changes.


220-220: Partial arithmetic expansion test
Again, the new invocation is correct.

yash-syntax/src/parser/grouping.rs (10)

Line range hint 17-17: Doc-comment reworded
Remains descriptive; no issues found.


109-109: Short grouping test updated
Switching to Lexer::with_code retains existing semantics.


121-121: Long grouping test updated
New instantiation method is consistent with the refactor.


133-133: Unclosed grouping test
Use of Lexer::with_code preserves test logic.


153-153: Empty grouping POSIX test
Still checks for errors correctly with updated lexer call.


167-167: Aliasing test
The refactored lexer initialization is correct.


200-200: Short subshell test
Reflects the standardized lexer creation approach.


216-216: Long subshell test
No functional impact beyond aligning with the new API.


232-232: Unclosed subshell test
Continues to test error handling while using Lexer::with_code.


252-252: Empty subshell POSIX test
The refactoring is consistent; no operational changes introduced.

yash-syntax/src/parser/lex/backquote.rs (9)

17-17: No issues
Removing the trailing period in the doc comment is a minor style change and looks fine.


95-95: Use new with_code
Refactoring the test setup to use Lexer::with_code("X") is consistent with the new approach to lexer instantiation.


106-106: Use new with_code
This aligns with the standardized pattern of initializing the lexer using with_code.


122-122: Use new with_code
Consistent across the test suite and improves uniformity in lexer creation.


146-146: Use new with_code
Switching to with_code maintains consistency with the new lexer initialization convention.


175-175: Use new with_code
Stays aligned with the refactoring goal of adopting a standardized instantiation method for Lexer.


205-205: Use new with_code
Good to see the line continuation test migrated to the new initialization approach.


224-224: Use new with_code
The unclosed empty backquote test now matches the updated lexer pattern.


245-245: Use new with_code
The unclosed nonempty backquote test also aligns with the unified Lexer instantiation.

yash-builtin/src/source/semantics.rs (1)

67-73: Refactor to config-based Lexer
Using Lexer::config() to set source and then initializing the lexer with config.input(...) is consistent with the goal of standardizing lexer creation across the codebase.

yash-syntax/src/parser/lex/heredoc.rs (9)

137-137: Use new with_code
Adopting Lexer::with_code for the test is consistent with the broader refactor.


141-141: Use new with_code
Maintaining consistency with the refactored initialization approach.


160-160: Use new with_code
The here-doc test now correctly employs with_code.


180-180: Use new with_code
Continues the standardization of lexer instantiation in the here-doc tests.


200-200: Use new with_code
Good consistency in migrating longer content tests to the updated initialization.


223-223: Use new with_code
This refactoring ensures escapable behavior is tested under the standardized instantiation.


256-256: Use new with_code
Quoted delimiter here-doc test also now aligns with the new approach.


294-294: Use new with_code
Removing tabs in here-doc lines is now tested with the revised lexer method.


314-314: Use new with_code
The unclosed here-doc scenario is properly handled under the new config approach.

yash-cli/src/startup/init_file.rs (1)

173-178: Refactor to config-based Lexer
Switching to Lexer::config(), setting the source property, and using config.input(...) is consistent with the broader normalization of lexer initialization.

yash-semantics/src/runner_legacy.rs (10)

59-59: Refactor to the new lexer initialization is consistent.
No issues found with using Lexer::with_code.


111-111: Refactor to the new lexer constructor.
Switch to Lexer::new(input) aligns with updated approach.


191-191: Empty input lexer check.
Using Lexer::with_code("") is valid for testing empty input scenarios.


207-207: Consistent usage of Lexer::with_code.
Good job ensuring tests match the new method.


222-222: Verifying multi-line input.
No issues with the updated Lexer::with_code("echo 1\necho 2\necho 3;").


243-243: Simplified creation of lexer for aliases test.
Approach is consistent with the codebase refactoring.


269-269: Consistent usage of Lexer::new with I/O input.
Deprecation notice is properly handled here.


287-287: Syntax error test with Lexer::with_code(";;").
No functional issues identified.


301-301: Combining script lines for error handling.
The updated initialization is fine for multi-line error testing.


330-330: Signal handling and traps verification.
Using Lexer::with_code("echo $?") is correct.

yash-syntax/src/parser/lex/op.rs (21)

31-31: Documentation refinement.
Switched to a sentence without a trailing period, maintaining consistency in doc comments.


160-160: Edge struct doc.
Documentation updates look good.


163-163: Field documentation.
Adding brief doc on character value is clear; no issues found.


166-166: Continuation doc.
Mentioning "matches" here is helpful context.


168-168: Elaboration on sub-trie.
No issues; doc is concise and clear.


187-187: Public trie constant doc.
“Trie containing all the operators” is sufficiently descriptive.


231-231: Logical separation for '&'.
No functional changes; doc is consistent with new style.


238-238: Documentation clarity for operators starting with ';'.
No issues found.


257-257: Nested operator trie doc.
The notation “operators that start with ;;” is consistent.


264-264: Doc update for < operators.
Implementation remains unchanged; doc is fine.


288-288: Doc update for << operators.
No further suggestions.


302-302: Doc update for > operators.
Using _GREATER naming approach is consistent with style.


326-326: Doc update for >> operators.
All good here.


333-333: Doc update for | operators.
No issues with this minor documentation tweak.


340-340: Doc for empty trie.
Comment clarifies that it contains no edges.


469-469: Refactored test for <<-.
Lexer::with_code("<<-") is correct for verifying longest match.


487-487: Refactored test for <<>
Ensures correct delimiter detection.


507-507: Verifying << at EOF.
No issues found.


524-524: Test with multi-line input containing line continuations.
Accurately tests the combined input scenario.


541-541: Test for whitespace after backslash.
Correct usage of Lexer::with_code("\\\n ") for EOF detection.


560-560: Custom Input test with Lexer::new(...).
This verifies minimal lines read from a custom input.

yash-semantics/src/runner.rs (15)

69-69: Adapting example to Lexer::with_code.
No issues; consistent with the new API.


91-91: Using Lexer::new(input) in sample code.
Refactoring is straightforward; no concerns.


222-222: Empty lexer test.
No problems with Lexer::with_code("") usage.


238-238: Testing environment variable substitution.
The new invocation aligns with the updated Lexer approach.


253-253: Multi-line command execution test.
Lexer::with_code("echo 1\necho 2\necho 3;") is valid.


274-274: Alias parsing consistency.
Shifted to with_code("echo") is correct.


291-291: Verbose option test with Lexer::new.
Correct approach for interactive usage.


308-308: Interactive read-eval loop test.
Using Lexer::with_code("${X?}\necho $?\n") is valid.


327-327: Interactive loop break test.
No concerns with Lexer::with_code("return 123\necho $?\n").


345-345: Non-interactive interrupt test.
All good with with_code("${X?}\necho $?\n").


358-358: Syntax error test with single statement.
Implementation is fine.


371-371: Syntax error test with multiple lines
Lexer::with_code(";;\necho !") is acceptable.


387-387: Syntax error continuation in interactive loop.
Ensures that the parser recovers and continues.


406-406: BrokenInput simulation with Lexer::new(...).
Verifies input error handling.


437-437: Traps between parse and execute.
Lexer::with_code("echo $?") usage is consistent.

yash-syntax/src/parser/lex/modifier.rs (23)

17-17: Refined module-level doc.
Single-line doc comment is consistent with style.


160-160: Empty input test.
Lexer::with_code("") is correct for verifying EOF behavior.


172-172: Modifier parsing with }.
Logic remains correct for Lexer::with_code("}").


186-186: Switch suffix modifier on +}.
No issues with this test scenario.


206-206: Switch suffix with additional whitespace.
Lexer::with_code(r"+a z}") is consistent.


234-234: Switch with colon prefix.
Ensures UnsetOrEmpty condition is tested.


254-254: Default modifier test with -}.
Implementation is valid.


274-274: Default modifier with colon prefix.
Lexer::with_code(r":-cool}") is good coverage of expansions.


302-302: Assign modifier with colon prefix.
Ensures environment assignment scenario is covered.


322-322: Assign modifier without colon.
Covers SwitchType::Assign for normal usage.


349-349: Error suffix modifier test.
Lexer::with_code("?}") logic is correct.


369-369: Error suffix with colon prefix.
Tilde expansions remain unaffected.


395-395: Tilde expansions in switch word (WordContext::Word).
Properly recognized as WordUnit::Tilde.


409-409: Tilde expansions in switch word (WordContext::Text).
Confirming no expansions in text context.


426-426: Shortest prefix trim.
Tested with single-quoted pattern.


446-446: Trim shortest prefix in text context.
Identical logic with no distinct issues.


466-466: Longest prefix trim.
Checks for repeated # usage.


489-489: Shortest suffix trim with backslash.
Ensures correct character handling.


512-512: Longest suffix trim test.
Implementation matches expected pattern usage.


535-535: Tilde expansion in trim word.
Confirms prefix-only expansions recognized in WordContext.


549-549: Invalid orphan colon at EOF.
Generates correct InvalidModifier error.


563-563: Invalid orphan colon followed by letter.
Handled gracefully with correct error.


577-577: Invalid orphan colon followed by symbol.
Proper error is raised for ":#}".

yash-syntax/src/parser/lex/word.rs (20)

105-105: Doc comment refinement.
Removing the period from the doc comment is consistent with the style changes noted across the codebase.


193-193: Update to Lexer::with_code.
This migration away from from_memory aligns with the new lexer initialization strategy and looks good.


215-215: Test lexer instantiation update.
Using Lexer::with_code is consistent with the broader refactoring effort.


273-273: Refactor to Lexer::with_code.
Correct usage in a test context; no issues found.


312-312: Adopt Lexer::with_code.
Matches the refactor pattern; test remains concise.


333-333: Lexer initialization updated.
Straightforward substitution of the old method; no functional concerns.


351-351: Refactored lexer setup in test.
Good continuation of the migration to with_code.


369-369: Refactor test to use with_code.
Alignment with the new API call looks correct.


395-395: Use Lexer::with_code in text context test.
No issues; ensures consistency with the updated API.


417-417: Updated lexer creation.
Continues the pattern of replacing from_memory; verified no side effects.


438-438: Consistent shift to Lexer::with_code.
Maintains uniform instantiation across tests.


468-468: Adapting to the new lexer pattern.
No functional concerns spotted.


487-487: Test instantiation updated.
Aligns with the rest of the code changes.


505-505: Small refactor for clarity.
Switching to with_code is consistent and straightforward.


526-526: Double-quote test setup.
The use of with_code is appropriately applied here as part of the refactoring.


563-563: Double-quote unclosed test update.
Correct usage of the new constructor method.


589-589: Refactor variable capturing test.
Maintains consistency for lexer initialization.


617-617: Empty input test updated.
The new method usage is valid here.


636-636: Switch to with_code in a bracket expansion scenario.
No issues found.


659-659: String literal test.
Good continuity in updating the lexer method.

yash-syntax/src/parser/lex/escape.rs (28)

281-281: Use Lexer::with_code for escape test.
No functional differences from the old approach.


289-289: Updated test for named escapes.
Ensures consistency with the new lexer initialization.


323-323: Incomplete escape test refactored.
No issues; approach is correct.


337-337: Escape unit control tests updated.
Migration is consistent across the test suite.


353-353: Refactored test for incomplete control escape.
Aligns with the established pattern.


367-367: First incomplete control backslash test.
No concerns; consistent with the new approach.


378-378: Second incomplete control backslash test.
Same approach applied; looks good.


392-392: Unknown control escape test.
Correct usage of with_code.


406-406: Octal escape test updated.
Method substitution appears correct.


418-418: Test to handle partial octal digits.
No issues found in this refactor.


423-423: Octal test with input end.
Appropriate update of the method call.


431-431: Non-byte octal escape test.
Again, Lexer::with_code is consistent with the new standard.


445-445: Hexadecimal escape test.
Refactor is straightforward and correct.


455-455: Sound approach for 2-digit hex escape test.
Maintains uniform usage of the new constructor.


463-463: Incomplete hex escape test.
No functional changes besides constructor usage.


477-477: Short Unicode escape test.
with_code usage is correct.


485-485: Long Unicode escape test.
Remains consistent with the default approach.


496-496: Incomplete short Unicode escape test.
Correct substitution for the new method.


507-507: Incomplete long Unicode escape test.
Synchronous with the codebase refactor.


522-522: Invalid Unicode scalar value test updated.
No issues; logic unchanged apart from with_code.


536-536: Unknown escape test.
Retains test logic with the new constructor method.


549-549: Escaped string literals test.
Standard with_code usage—looks good.


560-560: Escaped string with mixed escapes.
Consistent with the refactor.


581-581: Line continuation disabled—test updated.
No concerns.


589-589: Verify invalid escape scenario.
Method usage is correct.


604-604: Single quoted empty test.
Again, consistent usage of with_code.


615-615: Nonempty single quoted test.
No issues found; refactor is correct.


637-637: Unclosed single quoted test.
Method name updated with no functional impact.

yash-syntax/src/parser/from_str.rs (16)

94-94: Use Lexer::with_code in TextUnit parser.
Maintains consistency with the new initialization approach.


107-107: Refactoring to Lexer::with_code for Text parsing.
No issues found.


120-120: EscapeUnit FromStr test using updated method.
Implementation remains stable.


129-129: EscapedString parser now uses with_code.
Straightforward substitution.


142-142: WordUnit parsing updated to with_code.
Consistent usage across all parsing logic.


160-160: Initialize Word parser with Lexer::with_code.
Refactor is cohesive.


217-217: Operator parsing method realignment.
Shifting to with_code for consistent usage.


249-249: Redir FromStr logic updated.
No anomalies in switching constructors.


278-278: SimpleCommand FromStr method.
Good alignment with the project-wide refactor.


304-304: CaseItem from_str parser alignment.
No negative side effects detected.


331-331: CompoundCommand from_str usage.
Proper update to Lexer::with_code.


354-354: Refactoring for FullCompoundCommand.
Remains consistent with all other changes.


377-377: Command FromStr is now using with_code.
No concerns; correct propagation of changes.


400-400: Pipeline from_str updated.
Follows the standardized instantiation pattern.


432-432: AndOrList parser fix.
Switch to with_code is consistent.


450-450: List parser from_str method.
This final refactoring is inline with broader changes.

yash-syntax/src/parser/simple_command.rs (29)

263-263: Array values test refactor.
Using Lexer::with_code now matches the rest of the code.


271-271: Empty array test updated.
Approach is consistent with the global refactoring.


283-283: Multiple array values test.
Maintains uniform usage of the with_code method.


295-295: Newlines and comments test.
No issues in the constructor switch.


312-312: Unclosed array value test.
Refactored to use with_code; logic remains the same.


330-330: Invalid word inside array test.
Dependency on Lexer::with_code is consistent.


348-348: Simple command end-of-file test.
Method usage aligns with the new standard.


357-357: Keyword-based test.
No concerns in adopting with_code.


366-366: Single assignment test updated.
Successfully migrated to the new instantiation approach.


384-384: Multiple assignments parsing test.
No functional differences from switching the constructor.


414-414: Single word command test.
Use of with_code is fully consistent.


428-428: Multiple words test.
Implementation remains intact post-refactor.


446-446: Single redirection test.
No issues noted with updated construction method.


466-466: Multiple redirections test.
The usage matches the rest of this PR’s refactoring.


496-496: Assignment plus word test.
Looks good; no hidden pitfalls.


512-512: Word plus redirection test.
Migration to the new constructor is successful.


531-531: Redirection plus assignment scenario.
Final code is uniform with the rest of the PR changes.


550-550: Assignment + redirection + word test.
All elements are tested with the new approach.


571-571: Array assignment test.
Lexer::with_code usage is consistent with other array tests.


590-590: Empty assignment followed by blank parenthesis.
Switch from the old function is valid.


611-611: Non-empty assignment followed by parenthesis.
No issues with rewriting the constructor call.


634-634: Single expansion mode test.
export recognized as a declaration utility, factoring in new method usage.


652-652: Multiple expansion mode test.
Supports a non-assignment word; consistent with changes.


670-670: Non-declaration utility scenario.
The updated lexer method is correctly applied.


688-688: command command export test.
The approach remains correct, with the new constructor in place.


695-695: command command foo export test.
Verified correct usage of the updated logger method.


706-706: Empty glossary test.
Lexer::config().input(...) usage remains aligned with the rest.


733-733: Custom glossary usage.
Refactor to with_code is consistent here as well.


759-759: Assignment scenario with custom glossary.
No issues; the new method usage is correct.

yash-syntax/src/parser/core.rs (29)

17-20: Documentation updates look consistent
The removal of trailing punctuation in the doc comments helps maintain a uniform style.


35-37: Sufficient clarity in Result<T> documentation
No functional change here; the doc comment is concise and sufficiently descriptive.


Line range hint 39-57: Well-explained flow for alias substitution
The updated documentation for Rec<T> variant is clear and accurately explains how alias substitution triggers a state requiring the parser to restart.


58-60: Variant renaming clarifies each outcome
“AliasSubstituted” and “Parsed” names are descriptive and consistent with the new doc comments.


Line range hint 97-126: Configuration structure for parser is well-documented
Using Config<'a> to encapsulate parser settings is straightforward, with doc comments adequately describing each field.


Line range hint 455-471: Use of Lexer::with_code("X") in tests
Changing to with_code is consistent with the refactor. Test code retains clarity.


Line range hint 476-492: Functionality remains unchanged while tests read cleaner
The transition to with_code is a straightforward rename, preserving the same test coverage.


Line range hint 494-516: Escaped strings still function correctly
Relying on with_code(r"\X") suits the new approach. No issues observed.


Line range hint 518-536: Semicolon operator test remains accurate
Ensuring the operator is recognized correctly demonstrates that the rename to with_code didn’t introduce regressions.


Line range hint 537-546: No functional changes
All tests validate existing parser logic appropriately.


Line range hint 547-584: Recursive alias substitution test
Retains logic for ensuring multiple expansions are processed. Using Lexer::with_code("X") is consistent with the codebase changes.


Line range hint 585-618: Blank-ending alias substitution
Behavior remains intact, verifying alias expansions across blank separations. Looks good.


Line range hint 619-649: No concerns
The test logic is straightforward, verifying no blank-ending scenario.


Line range hint 650-670: Confirming global alias handling
Test remains understandable and consistent in verifying a global alias.


Line range hint 671-687: Keyword-based termination checks
Good coverage of skipping alias expansion for recognized keywords.


Line range hint 688-708: Checking alias vs. recognized keywords
Ensures if remains a reserved word despite a matching alias.


Line range hint 709-725: Test ensures fallback to alias
If the keyword isn’t matched, alias expansion applies. No issues.


Line range hint 726-752: Alias substituting into another reserved word
Demonstrates chain expansions to recognized keywords. Nicely tested.


753-760: Blank check logic
Using empty space string for verifying blank recognition is a valid approach.


761-768: Parenthesis as a non-blank
No concerns; test ensures parser identifies the character properly.


769-776: EOF blank scenario
Test confirms no blank if the input is empty.


777-784: Continuations recognized as blanks
Line continuations remain properly recognized. The approach is robust.


Line range hint 785-793: Continuations when parentheses appear
Verifies that line continuation logic is unaffected by certain tokens.


794-801: Pending token check
Ensures that calling has_blank before clearing the pending token results in a panic. Good negative test.


Line range hint 802-814: No pending heredoc
Empty scenario test with here_doc_contents. Straightforward.


Line range hint 815-839: Single heredoc reading
Verifies that END is recognized properly. Tests remain correct.


Line range hint 840-876: Multiple heredoc reading
Shows that each delimiter gets unique content. No issues with the rename.


Line range hint 877-903: Reading heredocs across two calls
Additional coverage ensures partial reads. Test usage of with_code is fine.


904-906: Check for pending token before reading heredoc
Good negative scenario ensuring no partial read.

yash-syntax/src/parser/lex/braced_param.rs (1)

17-19: Minor doc comment normalization
No functionality changed; the concise doc comment is good.

yash-syntax/src/parser/lex/core.rs (26)

Line range hint 17-48: Documentation improvements
Removing trailing periods harmonizes style. No logic changes.


Line range hint 49-66: PeekChar enum clarity
The doc clearly explains possible states. No issues spotted.


Line range hint 67-87: TokenId documentation
Enum description reads well, accurately describing POSIX lexical tokens.


Line range hint 88-110: EndOfInput usage
Complements the existing logic. No functional regression.


Line range hint 111-131: Token struct doc
Clear doc on fields word, id, and index.


114-119: Short doc lines are consistent
Concise explanation without trailing punctuation.


120-121: Token ID’s positioning
No issues.


Line range hint 122-131: Index usage
Well-defined for tracking the token’s start position.


Line range hint 132-153: InputState and SourceCharEx
Internal usage clarity improved with straightforward doc lines.


Line range hint 154-432: LexerCore struct
The refactoring and doc expansions are well-structured. No major concerns.


433-442: Config struct introduction
A builder approach for lexer configuration is an excellent design.


443-452: start_line_number doc
Explains how line numbers are annotated. Clear improvement in documentation.


453-465: source doc
Setting a Source is optional but recommended, providing more context.


466-467: Configuration completeness
Fields are straightforward, no issues.


468-488: Config::new() and Config::input()
Offers simple defaults while allowing customization. Implementation is consistent with the PR’s goals.


490-495: Default trait
Mirrors Config::new(). Implementation is standard.


Line range hint 496-524: Lexer doc updates
Highlights new config usage and how to incorporate line numbers or sources.


505-523: Reference to parser usage
The doc references usage in Parser. These notes remain helpful.


Line range hint 525-534: Lexer struct
Wrapper approach over LexerCore is explained. No issues.


535-543: Lexer::config()
Simple forward to Config::new(), consistent naming.


545-561: Lexer::new()
Matches the new config approach for default usage. No concerns.


562-571: Lexer::with_code
Provides a simpler constructor for ephemeral scenarios. Good shift from from_memory.


Line range hint 573-592: Deprecation note for from_memory
In-line suggestion to use with_code or full config approach clarifies the recommended usage path.


585-589: Internal usage
The bridging function to configure the lexer is valid.


Line range hint 911-927: PlainLexer
Well-documented for disabling line continuations. No further changes needed.


928-930: WordLexer
Minimal doc clarifications. Implementation aligns nicely with the refactoring.

yash-syntax/src/parser/and_or.rs (4)

Line range hint 83-90: Switch to with_code("")
Test for empty commands remains valid.


Line range hint 92-102: Single command test
No significant changes besides the lexer constructor rename; logic is intact.


Line range hint 103-117: Multiple command and-or test
Shows no break in functionality after refactoring the lexer creation.


Line range hint 118-128: Missing command scenario
Still verifies correct error condition. The rename to with_code did not alter behavior.

yash-syntax/src/parser/lex/token.rs (5)

Line range hint 99-108: LGTM: Test coverage for empty input is comprehensive

The test thoroughly verifies the lexer's behavior with empty input, including location, line numbers, and token identification.


Line range hint 112-127: LGTM: Test coverage for word parsing is thorough

The test effectively verifies word parsing, including unit decomposition, location tracking, and input state management.


Line range hint 139-155: LGTM: IO number parsing test is comprehensive

The test thoroughly verifies IO number parsing with the '<' delimiter, including proper token identification and location tracking.


Line range hint 157-175: LGTM: IO number parsing with '>' delimiter is well tested

The test effectively verifies IO number parsing with the '>' delimiter, including proper location tracking after parsing.


Line range hint 177-196: LGTM: Blank handling and subsequent token parsing is well tested

The test comprehensively verifies blank space handling and subsequent token parsing, including proper location tracking and end of input detection.

yash-syntax/src/parser/lex/dollar.rs (1)

74-82: LGTM: Dollar expansion tests successfully migrated to new lexer initialization

All test cases have been properly updated to use Lexer::with_code while maintaining their original test coverage and assertions. The tests continue to verify:

  • No-dollar scenarios
  • Dollar followed by non-special characters
  • Raw special parameters
  • Command substitutions
  • Arithmetic expansions
  • Line continuations

Also applies to: 82-91, 91-98, 102-111, 111-118, 122-138, 141-156, 157-173, 176-191, 195-205

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
yash-syntax/src/parser/lex/core.rs (2)

437-498: Well-structured Config implementation!

The Config struct implementation follows Rust best practices with:

  • Builder pattern for flexible initialization
  • Clear documentation for each field
  • #[non_exhaustive] attribute for future extensibility
  • Proper Default trait implementation

Consider adding an example in the documentation showing how to use the Config with custom settings:

/// ```rust
/// use yash_syntax::parser::lex::Lexer;
/// use yash_syntax::source::Source;
/// 
/// let mut config = Lexer::config();
/// config.start_line_number = 42.try_into().unwrap();
/// config.source = Some(Source::Stdin.into());
/// let lexer = config.input(Box::new(/* input object */));
/// ```

573-575: Consider adding error handling for empty input

The with_code implementation could benefit from validation of the input string.

Consider adding a check for empty input or adding a note in the documentation about how empty input is handled:

 pub fn with_code(code: &'a str) -> Lexer<'a> {
+    // Note: Empty input is valid and will result in immediate end of input
     Self::new(Box::new(Memory::new(code)))
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 44e21f9 and d90248c.

📒 Files selected for processing (1)
  • yash-syntax/src/parser/lex/core.rs (20 hunks)
🔇 Additional comments (3)
yash-syntax/src/parser/lex/core.rs (3)

17-17: Documentation looks good!

The documentation header clearly describes the module's purpose.


577-591: Good deprecation approach for from_memory

The soft deprecation of from_memory with clear migration instructions to either with_code or config().input() is well done. The implementation maintains backward compatibility while guiding users to the new API.


Line range hint 1519-1784: Comprehensive test coverage!

The test suite is well-structured with:

  • Coverage for the new Config and Lexer implementations
  • Edge case handling (empty input, line continuations)
  • Error condition testing
  • Clear test names and assertions

@magicant magicant changed the title Refactor lexer Introduce Config to Lexer Dec 31, 2024
@magicant magicant merged commit 8a58f1f into master Dec 31, 2024
8 checks passed
@magicant magicant deleted the refactor-lexer branch December 31, 2024 02:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change This enhancement is backward-incompatible documentation Improvements or additions to documentation
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

1 participant