Support IELR(1) Parser Generation #398

junk0612 · 2024-04-23T05:41:29Z

Support the generation of the IELR(1) parser described in this paper.
https://www.sciencedirect.com/science/article/pii/S0167642309001191

This PR will be ready to merge once the following are done:

- Add tests
  - Currently, I have only created a grammar file and verified its operation in my local environment.
- Refactor
  - The data structure is awful and needs improvement, because the priority was to achieve the expected results first.

jasone · 2024-07-26T05:18:16Z

A couple of months ago I poked around the internet looking for independent IELR(1) implementations and, besides lrama, found only an inactive Rust-based attempt. I recently completed my own implementation and wrote a technical report in the hope that it might help others. IELR(1) turns out to be a real bear to figure out, even though it takes only a moderate amount of code. I hope you find the time and inspiration to complete this effort. Good luck!

junk0612 · 2024-11-14T05:41:30Z

As of 7da9676, Lrama can parse CRuby's parse.y at ruby/ruby@028958f.

These are the output files produced by each run:
ruby path/to/id2token path/to/parse.y | bundle exec lrama -v - lalr
lalr.output

ruby path/to/id2token path/to/parse.y | bundle exec lrama -D lr.type=ielr -v - ielr
ielr.output

yui-knk · 2025-01-13T03:31:07Z

lib/lrama/states.rb

@@ -92,6 +92,20 @@ def compute
      report_duration(:compute_default_reduction) { compute_default_reduction }
    end

+    def compute_ielr


[note]: See "3.5.4. Algorithm" in the paper.

yui-knk · 2025-01-13T03:32:50Z

lib/lrama/states.rb

@@ -524,5 +541,52 @@ def compute_default_reduction
        end.first
      end
    end
+
+    def split_states
+      transition_queue = []


[nits]: It seems this variable is not used.

yui-knk · 2025-01-13T03:33:41Z

lib/lrama/states.rb

@@ -524,5 +541,52 @@ def compute_default_reduction
        end.first
      end
    end
+
+    def split_states


[note]: "Definition 3.45 (split_states)"

yui-knk · 2025-01-13T03:37:38Z

lib/lrama/state.rb

@@ -142,5 +160,274 @@ def rr_conflicts
        conflict.type == :reduce_reduce
      end
    end
+
+    def propagate_lookaheads(next_state)


[note]: Definition 3.40 (propagate_lookaheads)

yui-knk · 2025-01-13T03:41:49Z

lib/lrama/state.rb

+    def propagate_lookaheads(next_state)
+      next_state.kernels.map {|item|
+        lookahead_sets =
+          if item.position == 1


[note]: This corresponds to case 2 of Definition 3.40, because positions are 0-origin in this code base.

yui-knk · 2025-01-13T03:44:35Z

lib/lrama/state.rb

+    def propagate_lookaheads(next_state)
+      next_state.kernels.map {|item|
+        lookahead_sets =
+          if item.position == 1


[nits]: I'd like the order of the conditions to match the original paper, for readability of the algorithm. For example:

if item.position > 1 ... else ... end

yui-knk · 2025-01-13T03:50:11Z

lib/lrama/state.rb

+          if item.position == 1
+            goto_follow_set(item.lhs)
+          else
+            kernel = kernels.find {|k| k.predecessor_item_of?(item) }


[note]: C[k] is the kernel (1) whose LHS and RHS are the same as the item's and (2) whose position is item.position - 1. #predecessor_item_of? performs this check.

yui-knk · 2025-01-13T03:55:20Z

lib/lrama/state.rb

+            item_lookahead_set[kernel]
+          end
+
+        [item, lookahead_sets & next_state.lookahead_set_filters[item]]


[note]: This matches case 1 of the definition, because case 1 requires tokens that are in both the lookahead_set_filters of the s' kernel and the item_lookahead_sets of the s kernel.
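The notes above on propagate_lookaheads (0-origin positions, the C[k] lookup via #predecessor_item_of?, and the final intersection with lookahead_set_filters) can be pulled together in a small, self-contained sketch. It follows the condition order suggested in the nit. `LrItem`, `LrState` and the lambda standing in for `goto_follow_set` are hypothetical stand-ins, not lrama's real classes, and the sets are toy data.

```ruby
require "set"

# Hypothetical stand-in for lrama's Item. Positions are 0-origin, as in lrama.
LrItem = Struct.new(:lhs, :rhs, :position) do
  # C[k] test: same rule, dot exactly one symbol earlier than `other`.
  def predecessor_item_of?(other)
    lhs == other.lhs && rhs == other.rhs && position == other.position - 1
  end
end

# Hypothetical stand-in for lrama's State. goto_follow_set is a lambda
# standing in for the real goto follow set lookup on state s.
LrState = Struct.new(:kernels, :item_lookahead_set, :goto_follow_set,
                     :lookahead_set_filters)

# Sketch of Definition 3.40, with the branch order of the paper.
def propagate_lookaheads(state, next_state)
  next_state.kernels.to_h do |item|
    lookahead_set =
      if item.position > 1
        # Case 1: inherit the lookaheads of the predecessor kernel C[k] in s.
        kernel = state.kernels.find { |k| k.predecessor_item_of?(item) }
        state.item_lookahead_set[kernel]
      else
        # Case 2 (position 1 here): the item came from a non-kernel item of s,
        # so use the follow set of the goto on the item's LHS.
        state.goto_follow_set.call(item.lhs)
      end
    # Keep only the tokens allowed by the successor's lookahead_set_filters.
    [item, lookahead_set & next_state.lookahead_set_filters[item]]
  end
end

# Toy example: s has kernel A -> x . y; its successor on y has
# kernels A -> x y . and B -> y .
s_item = LrItem.new(:A, %i[x y], 1)
n1 = LrItem.new(:A, %i[x y], 2)
n2 = LrItem.new(:B, %i[y], 1)
state = LrState.new([s_item], { s_item => Set[:a, :b] },
                    ->(sym) { sym == :B ? Set[:c, :d] : Set[] }, {})
next_state = LrState.new([n1, n2], {}, nil, { n1 => Set[:a], n2 => Set[:c, :d] })
result = propagate_lookaheads(state, next_state)
```

In this toy run, `n1` inherits `{a, b}` from its predecessor kernel and is filtered down to `{a}`, while `n2` takes the goto follow set `{c, d}`.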

yui-knk · 2025-01-13T04:05:00Z

lib/lrama/states.rb

@@ -92,6 +92,20 @@ def compute
      report_duration(:compute_default_reduction) { compute_default_reduction }
    end

+    def compute_ielr
+      report_duration(:split_states) { split_states }
+      report_duration(:compute_direct_read_sets) { compute_direct_read_sets }


[note]: LA sets are computed again once the states have been split. See Phase 4 and Phase 5 in "3.1. Overview".

yui-knk · 2025-01-13T04:10:41Z

lib/lrama/state.rb

+      !@item_lookahead_set.nil?
+    end
+
+    def compatible_lookahead?(filtered_lookahead)


[note]: This method is the same as "Definition 3.43 (is_compatible)".

[nits]: If so, I'd prefer is_compatible? as the method name for clarity.

yui-knk · 2025-01-14T02:38:22Z

lib/lrama/state.rb

@@ -23,6 +23,12 @@ def initialize(id, accessing_symbol, kernels)
      @conflicts = []
      @resolved_conflicts = []
      @default_reduction_rule = nil
+      @predecessors = []
+      @lalr_isocore = self


[note]:

@lalr_isocore is an implementation of lalr1_isocores (Definition 3.36)

@ielr_isocores is an implementation of isocore_nexts (Definition 3.37)

Their initialization logic is written in Definition 3.45 (split_states).
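One way to picture this bookkeeping is shown below. It is a hypothetical sketch, not lrama's State class. It assumes that a split copy points back at the original LALR(1) state through `lalr_isocore`, and that every state sharing a core holds the same Array in place of the paper's isocore_nexts chain. `IsocoreState` and `split` are illustrative names.

```ruby
# Hypothetical model of the isocore bookkeeping; not lrama's State class.
class IsocoreState
  attr_reader :id, :lalr_isocore, :ielr_isocores

  def initialize(id)
    @id = id
    @lalr_isocore = self    # lalr1_isocores: an LALR(1) state maps to itself
    @ielr_isocores = [self] # every state sharing this core, in creation order
  end

  # Split off a new state with the same core. The copy points back at the
  # original LALR(1) state, and all isocores share one Array object.
  def split(new_id)
    copy = IsocoreState.new(new_id)
    copy.lalr_isocore = lalr_isocore
    lalr_isocore.ielr_isocores << copy
    copy.ielr_isocores = lalr_isocore.ielr_isocores
    copy
  end

  protected

  attr_writer :lalr_isocore, :ielr_isocores
end

origin = IsocoreState.new(0)
first_split = origin.split(5)
second_split = first_split.split(6)
```

Because the Array is shared, splitting a split state still records the new state against the original LALR(1) state.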

yui-knk · 2025-01-14T02:52:32Z

lib/lrama/state.rb

+        }
+    end
+
+    def lookahead_set_filters


[note]: Definition 3.38 (lookahead_set_filters)

yui-knk · 2025-01-14T03:36:24Z

lib/lrama/state.rb

+            [action, nil]
+          elsif action.is_a?(Reduce)
+            if action.rule.empty_rule?
+              [action, lhs_contributions(action.rule.lhs, inadequacy_list.key(actions))]


[nits]: If you need the key, I'd prefer inadequacy_list.map or h = {}; inadequacy_list.each ... so that the code accesses the key directly.
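Besides clarity, there is a behavioral reason to prefer taking the key from the block parameters. Ruby's `Hash#key(value)` scans the entries in order and returns the first key whose value is `==` to the argument. If two tokens could ever map to equal action lists, `key` would attribute both to the first token. The hash below is hypothetical toy data with strings standing in for actions. It only illustrates the Ruby semantics, not lrama's actual data.

```ruby
# Hypothetical inadequacy list: token => conflicting actions.
inadequacy_list = {
  "a" => ["shift 3", "reduce 1"],
  "b" => ["shift 3", "reduce 1"], # same actions as "a"
}

# Pattern under review: recover the token from the value with Hash#key.
# Hash#key returns the first matching key, so "b" resolves to "a" here.
via_key = inadequacy_list.transform_values do |actions|
  token = inadequacy_list.key(actions)
  actions.map { |action| [action, token] }
end

# Suggested pattern 1: h = {}; inadequacy_list.each ...
via_each = {}
inadequacy_list.each do |token, actions|
  via_each[token] = actions.map { |action| [action, token] }
end

# Suggested pattern 2: inadequacy_list.map ... .to_h
via_map = inadequacy_list.map { |token, actions|
  [token, actions.map { |action| [action, token] }]
}.to_h
```

Both suggested forms also avoid the extra linear search that `key` performs for every entry.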

yui-knk · 2025-01-14T03:37:17Z

lib/lrama/state.rb

+
+    def annotate_predecessor(predecessor)
+      annotation_list.transform_values {|actions|
+        token = annotation_list.key(actions)


[nits]: Same as #annotate_manifestation: it's clearer to use inadequacy_list.map or h = {}; inadequacy_list.each ... to access the key.

yui-knk · 2025-01-14T04:33:58Z

lib/lrama/state.rb

+      }
+    end
+
+    def inadequacy_list


[note]: Definition 3.27 (inadequacy_lists). The difference from the original paper is that the paper uses the tuple (state, token, actions), whereas this implementation uses the tuple (token, actions).
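The per-state (token, actions) shape can be sketched as follows. Each state keeps a Hash from token to the actions that apply to it, and the state component of the paper's triple becomes implicit. `ShiftAction`, `ReduceAction` and both helpers are hypothetical names. The sketch ignores precedence and associativity and simply treats any token with more than one action as an inadequacy.

```ruby
# Hypothetical action records; lrama's real Shift/Reduce classes differ.
ShiftAction = Struct.new(:token, :next_state)
ReduceAction = Struct.new(:rule, :lookaheads)

# Per-state inadequacy list in the (token, actions) shape: only tokens on
# which more than one action applies are kept.
def inadequacy_list(shifts, reduces)
  actions = Hash.new { |hash, token| hash[token] = [] }
  shifts.each { |shift| actions[shift.token] << shift }
  reduces.each do |reduce|
    reduce.lookaheads.each { |token| actions[token] << reduce }
  end
  actions.select { |_token, acts| acts.size > 1 }
end

# The paper's (state, token, actions) triples are recovered by pairing each
# entry with the state that owns the list.
def paper_tuples(state_id, list)
  list.map { |token, acts| [state_id, token, acts] }
end

shifts = [ShiftAction.new("+", 7)]
reduces = [ReduceAction.new("exp -> exp '+' exp", ["+", "$end"])]
list = inadequacy_list(shifts, reduces)
```

Here only `"+"` is inadequate, because it has both a shift and a reduce. `"$end"` has a single reduce and is dropped.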

yui-knk · 2025-01-14T04:41:56Z

lib/lrama/state.rb

+      inadequacy_list.transform_values {|actions|
+        actions.map {|action|
+          if action.is_a?(Shift)
+            [action, nil]


[note]: nil means "undefined" in the original paper.

yui-knk · 2025-01-14T04:42:58Z

lib/lrama/state.rb

+          if action.is_a?(Shift)
+            [action, nil]
+          elsif action.is_a?(Reduce)
+            if action.rule.empty_rule?


[nits]: I'd prefer the sub-conditions to follow the same order as in the original paper.

yui-knk · 2025-01-14T04:45:01Z

lib/lrama/state.rb

+            if action.rule.empty_rule?
+              [action, lhs_contributions(action.rule.lhs, inadequacy_list.key(actions))]
+            else
+              contributions = kernels.map {|kernel| [kernel, kernel.rule == action.rule && kernel.end_of_rule?] }.to_h


[note]: "C[j] = (l -> r, |r| + 1)" in Definition 3.30 2-(a) means the dot is at the end of the rule, which is what end_of_rule? checks.
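The correspondence between the paper's |r| + 1 and end_of_rule? can be written out under the 0-origin convention noted earlier. The paper's position p corresponds to position p - 1 here, so "after the last symbol" becomes `position == rhs.size`. `RuleItem` and `contributions` are hypothetical names. The helper mirrors the reviewed `kernel.rule == action.rule && kernel.end_of_rule?` test by comparing LHS and RHS.

```ruby
# Hypothetical item: rule lhs -> rhs with a 0-origin dot position.
RuleItem = Struct.new(:lhs, :rhs, :position) do
  # 0-origin: the dot is past the last RHS symbol when position == rhs.size.
  def end_of_rule?
    position == rhs.size
  end

  # The paper's 1-origin position; |r| + 1 means "after the last symbol".
  def paper_position
    position + 1
  end
end

# A kernel contributes to a non-empty reduce iff it is that rule's completed item.
def contributions(kernels, rule_lhs, rule_rhs)
  kernels.to_h do |k|
    [k, k.lhs == rule_lhs && k.rhs == rule_rhs && k.end_of_rule?]
  end
end

done = RuleItem.new(:l, %i[x y], 2)    # l -> x y .
partial = RuleItem.new(:l, %i[x y], 1) # l -> x . y
result = contributions([done, partial], :l, %i[x y])
```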

yui-knk · 2025-01-14T04:49:52Z

lib/lrama/state.rb

+      }
+    end
+
+    def lhs_contributions(sym, token)


[note]: sym is the symbol (non-terminal) of the rule's LHS.

yui-knk · 2025-01-14T05:46:54Z

lib/lrama/state.rb

+    end
+
+    def lhs_contributions(sym, token)
+      shift, next_state = nterm_transitions.find {|sh, _| sh.next_sym == sym }


[nits]: A nonterminal transition is called a goto; "shift" is used for terminal transitions.
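With the suggested naming, the lookup in lhs_contributions reads as a search over gotos. The sketch below is hypothetical. It assumes, as in the reviewed line, that `nterm_transitions` yields `[transition, next_state]` pairs, and models a transition with a one-field `Goto` struct.

```ruby
# Hypothetical nonterminal transition record; next_sym labels the edge.
Goto = Struct.new(:next_sym)

# Reviewed lookup with goto naming instead of shift.
def find_goto(nterm_transitions, sym)
  goto, next_state = nterm_transitions.find { |g, _| g.next_sym == sym }
  [goto, next_state]
end

nterm_transitions = [[Goto.new(:expr), 4], [Goto.new(:term), 7]]
goto, next_state = find_goto(nterm_transitions, :term)
```

When no goto matches, `find` returns nil and both variables end up nil, so callers still need to handle that case.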

yui-knk · 2025-01-14T05:55:08Z

spec/fixtures/integration/ielr.y

+%token c
+%define lr.type ielr
+
+%%


This grammar example comes from Fig. 5 of the original paper. Could you note that in a comment?

yui-knk · 2025-01-14T13:50:51Z

@junk0612 I created 2 test cases that reproduce the behaviors from #398 (comment).
See yui-knk@ca288c8

yui-knk · 2025-01-15T02:27:44Z

lib/lrama/states.rb

+
+    def split_states
+      transition_queue = []
+      @states.each do |state|


I'm wondering whether we need to limit the iteration to the LALR states only, for example with a slice like @states[...].each ...; in other words, do we need to iterate over states created by #compute_state?

It might be my misunderstanding. We may need to iterate over all states, including newly created ones.

To reproduce the mysterious behaviors seen with Ruby's parse.y, this commit adds 2 test cases related to them:

1. "states/ielr_prec.y" causes a meaningless state split. All conflicts are already resolved by LALR alone; however, `#compute_ielr` splits state 8 into state 11 and state 10 into state 12, and changes state 10 so that it transitions to state 11 on "relop".
2. "states/ielr_nonassoc.y" causes a duplicated "error" in state 6.

This commit is picked from yui-knk@ca288c8.

junk0612 force-pushed the generate_ielr_parser branch from 2d975c5 to a49fe82 June 25, 2024 17:19

junk0612 force-pushed the generate_ielr_parser branch from a49fe82 to 604737d September 4, 2024 06:28

junk0612 force-pushed the generate_ielr_parser branch 7 times, most recently from b3078e1 to 0a300f7 September 25, 2024 18:33

junk0612 marked this pull request as ready for review September 25, 2024 18:33

junk0612 requested review from ydah and yui-knk September 25, 2024 18:33

junk0612 force-pushed the generate_ielr_parser branch from 0a300f7 to 0ba3241 October 11, 2024 09:42

junk0612 force-pushed the generate_ielr_parser branch from 9bf8584 to 7da9676 November 14, 2024 05:10

yui-knk reviewed Jan 13, 2025

yui-knk mentioned this pull request Jan 13, 2025: v0.7.0 #506 (Closed)

yui-knk reviewed Jan 14, 2025

yui-knk reviewed Jan 15, 2025

junk0612 added 7 commits January 16, 2025 22:24

- Set %define when parsing grammar files (e12ab4f)
- Parse --define options (1cd2913)
- Support IELR(1) parser generation (968725e)
- Add a integration test case (cf8cd80)
- Optimize calculating predecessors (1dd195b)
- Check existence of contributions (d3f0f1f)

junk0612 force-pushed the generate_ielr_parser branch from 25ef188 to fead1a4 January 16, 2025 13:43

- Add a type definition (462c998)

yui-knk approved these changes Jan 18, 2025

junk0612 merged commit b1081fb into ruby:master Jan 18, 2025
22 checks passed

junk0612 deleted the generate_ielr_parser branch January 18, 2025 05:04
