GLR Algorithm
This chapter describes the GLR parsing algorithm implemented in OpenLexer.
Data Structures
Graph-Structured Stack (GSS)
The GSS is a directed acyclic graph where:
- Nodes represent parser states
- Edges represent stack entries (symbols and semantic values)
#![allow(unused)] fn main() { struct GssNode { state: usize, links: Vec<GssEdge>, } struct GssEdge { predecessor: NodeId, symbol: Symbol, value: SemanticValue, } }
GSS Operations
Push: Add a new node with an edge to the current node.
Reduce: Follow edges backward by the rule length, then push the result.
Merge: When two paths reach the same state, merge them instead of duplicating.
Shared Packed Parse Forest (SPPF)
The SPPF compactly represents all parse trees:
#![allow(unused)] fn main() { enum SppfNode { Symbol { symbol: String, span: (usize, usize), }, Packed { rule: usize, children: Vec<NodeId>, }, Ambiguous { alternatives: Vec<NodeId>, }, } }
Algorithm Steps
1. Initialization
Create the initial GSS node with state 0.
GSS: [state 0]
Active: { node_0 }
2. Per-Token Processing
For each input token:
pending_shifts = {}
pending_reductions = {}
for each active node:
action = get_action(node.state, token)
if action contains SHIFT:
add to pending_shifts
if action contains REDUCE:
add to pending_reductions
3. Reduction Phase
Process all pending reductions:
while pending_reductions not empty:
(node, rule) = pop from pending_reductions
for each path of length rule.rhs_len from node:
base_node = end of path
goto_state = goto[base_node.state, rule.lhs]
if node with goto_state exists in GSS:
merge edge into existing node
else:
create new node with goto_state
check for new reductions from the new/merged node
4. Shift Phase
Process all pending shifts:
new_active = {}
for each (node, shift_state) in pending_shifts:
if node with shift_state exists:
merge
else:
create new node
add to new_active
active = new_active
5. Termination
After processing all tokens:
- If any active node can accept, parsing succeeds
- If no active nodes remain, parsing fails
Conflict Actions
The glr_conflict_actions table stores all actions at conflict points:
#![allow(unused)] fn main() { struct ParsingTable { // Standard LALR action table (winner of conflicts) action: HashMap<(usize, String), Action>, // All actions at conflicts (for GLR) glr_conflict_actions: HashMap<usize, HashMap<String, Vec<Action>>>, } }
During GLR parsing, both actions are explored when a conflict exists.
Path Enumeration
Finding all paths of length N in the GSS:
#![allow(unused)] fn main() { fn enumerate_paths(node: NodeId, length: usize) -> Vec<Path> { if length == 0 { return vec![Path::new(node)]; } let mut paths = vec![]; for edge in gss[node].links { for subpath in enumerate_paths(edge.predecessor, length - 1) { paths.push(subpath.prepend(edge)); } } paths } }
Complexity
- Time: O(n^3) worst case, O(n) for unambiguous input
- Space: O(n^2) for GSS, O(n^3) for full SPPF
The practical performance depends on the degree of ambiguity in the grammar and input.