A Lexer, Parser, and Automaton-based Language Recognizer
This project was developed for the Formal Languages and Automata (LFA) course and implements a Lexer, Parser, and Automaton-based Recognizer. The goal is to convert regular expressions into NFAs, transform those NFAs into DFAs, and use the DFAs for language acceptance, alongside lexical analysis and parsing functionality.
- Implement lexical analysis using regex-based tokenization.
- Parse expressions and syntactic structures.
- Convert regular expressions to NFAs using Thompson’s construction.
- Implement subset construction to transform NFAs into DFAs.
- Minimize DFAs using the Hopcroft minimization algorithm.
- Accept or reject input strings based on the final DFA state.
- Utilize frozensets for efficient state tracking and dictionary key management.
- Lexer.py – Tokenizes input using regex-based rules and NFAs.
- Parser.py – Implements a parser that processes tokenized input.
- Regex.py – Defines regex operations and transformations into NFAs.
- NFA.py – Implements Non-Deterministic Finite Automata (NFA) with ε-transitions.
- DFA.py – Implements Deterministic Finite Automata (DFA) with minimization.
- Uses Regular Expressions (Regex) to define token specifications.
- Converts regex patterns into NFAs using Thompson’s Construction.
- Uses subset construction to transform NFAs into DFAs for token recognition.
```python
from Lexer import Lexer

# Define token specification: (TOKEN_NAME, REGEX_PATTERN)
spec = [
    ("NUMBER", r"[0-9]+"),
    ("PLUS", r"\+"),
    ("MULT", r"\*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
]

lexer = Lexer(spec)
tokens = lexer.lex("3 + 5 * (2 + 8)")
print(tokens)
# Output: [('NUMBER', '3'), ('PLUS', '+'), ('NUMBER', '5'), ('MULT', '*'),
#          ('LPAREN', '('), ('NUMBER', '2'), ('PLUS', '+'), ('NUMBER', '8'),
#          ('RPAREN', ')')]
```
- Parses expressions using operator precedence and parentheses handling.
- Identifies lambda expressions, arithmetic operations, and variable assignments.
- Constructs an Abstract Syntax Tree (AST) to represent expressions.
```python
from Parser import Parser

parser = Parser()
result = parser.parse("3 + 5 * (2 + 8)")
print(result)
# Output: the parsed AST representation of the expression
```
- Uses Thompson’s construction to convert regex into NFAs.
- Supports Kleene star (`*`), concatenation, alternation (`|`), and grouping with parentheses.
- Uses subset construction to eliminate non-determinism.
- Implements ε-closure to follow ε-transitions (moves that consume no input).
- Uses frozensets for states to ensure hashable dictionary keys.
- Uses the Hopcroft Algorithm for state minimization.
- Reduces redundant states while maintaining language recognition.
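Thompson's construction mentioned above builds an NFA compositionally: each regex operator produces a small fragment with one start and one accept state, glued together with ε-transitions. A minimal self-contained sketch (dict-based fragments and an on-the-fly simulator; names like `literal`, `union`, `accepts` are illustrative, not Regex.py's actual API):

```python
from itertools import count

_ids = count()

def new_state():
    return f"s{next(_ids)}"

def merge(*ds):
    """Union several transition dicts keyed by (state, symbol)."""
    out = {}
    for d in ds:
        for k, v in d.items():
            out.setdefault(k, set()).update(v)
    return out

def literal(c):
    s, f = new_state(), new_state()
    return {"start": s, "accept": f, "d": {(s, c): {f}}}

def concat(a, b):
    # ε-edge from a's accept into b's start
    d = merge(a["d"], b["d"], {(a["accept"], "ε"): {b["start"]}})
    return {"start": a["start"], "accept": b["accept"], "d": d}

def union(a, b):
    # New start branches by ε into both alternatives; both accepts ε into new accept
    s, f = new_state(), new_state()
    d = merge(a["d"], b["d"], {
        (s, "ε"): {a["start"], b["start"]},
        (a["accept"], "ε"): {f},
        (b["accept"], "ε"): {f},
    })
    return {"start": s, "accept": f, "d": d}

def star(a):
    # ε-edges allow skipping the fragment entirely or looping back into it
    s, f = new_state(), new_state()
    d = merge(a["d"], {
        (s, "ε"): {a["start"], f},
        (a["accept"], "ε"): {a["start"], f},
    })
    return {"start": s, "accept": f, "d": d}

def accepts(nfa, string):
    """Simulate the NFA directly, taking ε-closures between symbols."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for nxt in nfa["d"].get((q, "ε"), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen
    current = closure({nfa["start"]})
    for c in string:
        moved = set()
        for q in current:
            moved.update(nfa["d"].get((q, c), ()))
        current = closure(moved)
    return nfa["accept"] in current

ab_star = concat(literal("a"), star(literal("b")))  # regex: ab*
print(accepts(ab_star, "abbb"))  # True
print(accepts(ab_star, "ba"))    # False
```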
```python
from DFA import DFA

# This DFA accepts exactly the binary strings that end in '0'
dfa = DFA(S={'0', '1'}, K={'q0', 'q1'}, q0='q0', d={
    ('q0', '0'): 'q1',
    ('q0', '1'): 'q0',
    ('q1', '0'): 'q1',
    ('q1', '1'): 'q0'
}, F={'q1'})

print(dfa.accept("1010"))  # True  (accepted: ends in '0')
print(dfa.accept("1111"))  # False (rejected: ends in '1')
```
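The minimization step listed above is implemented in DFA.py with Hopcroft's algorithm; as an illustration of the underlying idea, here is a sketch of Moore-style partition refinement (a simpler relative of Hopcroft's algorithm: split final from non-final states, then refine blocks until stable). All names are hypothetical, not the project's actual code:

```python
def minimize(S, K, d, F):
    """Partition-refinement DFA minimization (Moore's variant).

    Returns the stable partition; each block is one state of the minimal DFA.
    Assumes the transition function d is total over K x S.
    """
    # Initial split: accepting vs. non-accepting states
    partition = [frozenset(F), frozenset(K - F)]
    partition = [p for p in partition if p]
    changed = True
    while changed:
        changed = False
        new_partition = []
        for block in partition:
            # Group states by which block each input symbol sends them to
            buckets = {}
            for q in block:
                key = tuple(
                    next(i for i, b in enumerate(partition) if d[(q, a)] in b)
                    for a in sorted(S)
                )
                buckets.setdefault(key, set()).add(q)
            new_partition.extend(frozenset(b) for b in buckets.values())
            if len(buckets) > 1:
                changed = True
        partition = new_partition
    return partition

# Redundant DFA: q1 and q2 are equivalent (both accepting, same behavior)
S = {"0", "1"}
K = {"q0", "q1", "q2"}
F = {"q1", "q2"}
d = {("q0", "0"): "q1", ("q0", "1"): "q0",
     ("q1", "0"): "q2", ("q1", "1"): "q0",
     ("q2", "0"): "q1", ("q2", "1"): "q0"}

print(minimize(S, K, d, F))  # q1 and q2 collapse into one block
```

Hopcroft's algorithm reaches the same partition but picks splitter blocks cleverly, giving O(n log n) instead of the O(n²) worst case of this sketch.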
In this project, frozenset is used to represent states in the DFA when converting from an NFA. Since NFAs allow multiple possible states at once (non-determinism), during subset construction, we use frozensets to create unique sets of states that can be used as dictionary keys.
- Hashability: Python dictionary keys must be hashable. Normal sets are mutable and therefore unhashable, whereas frozensets can serve as keys.
- Uniqueness: Each unique set of states in an NFA transition corresponds to a single DFA state. Frozensets ensure that equivalent sets are treated as the same state.
- Efficient Lookups: Using frozensets allows quick membership checks and efficient dictionary storage.
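The hashability and uniqueness points can be checked directly in plain Python, independent of the project code:

```python
states_a = frozenset({"q0", "q1"})
states_b = frozenset({"q1", "q0"})  # same members, different construction order

transitions = {states_a: "D1"}       # a frozenset works as a dict key
print(transitions[states_b])         # equal frozensets hash identically -> "D1"

try:
    {set({"q0"}): "D2"}              # a mutable set is rejected as a key
except TypeError as e:
    print("unhashable:", e)
```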
```python
from NFA import NFA
from DFA import DFA

# Example subset construction for NFA -> DFA conversion
def subset_construction(nfa: NFA) -> DFA:
    d_q0 = frozenset(nfa.epsilon_closure(nfa.q0))  # initial DFA state as a frozenset
    d_K = {d_q0}   # set of DFA states
    d_d = {}       # transition dictionary
    d_F = set()    # final DFA states
    unprocessed_states = [d_q0]
    while unprocessed_states:
        curr_state = unprocessed_states.pop()
        # A DFA state is final if it contains at least one NFA final state
        if curr_state & nfa.F:
            d_F.add(curr_state)
        for character in nfa.S:
            next_states = set()
            for state in curr_state:
                next_states.update(nfa.d.get((state, character), set()))
            # Close the move set under epsilon-transitions
            closure = set()
            for next_state in next_states:
                closure.update(nfa.epsilon_closure(next_state))
            frozen_states = frozenset(closure)  # use frozenset as the DFA state
            d_d[(curr_state, character)] = frozen_states
            if frozen_states not in d_K:
                d_K.add(frozen_states)
                unprocessed_states.append(frozen_states)
    return DFA(nfa.S, d_K, d_q0, d_d, d_F)
```
This ensures that the DFA states remain immutable, hashable, and efficiently manageable in Python dictionaries.
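The `epsilon_closure` helper relied on above can be implemented as a simple worklist search over ε-edges. A standalone sketch, assuming a transition dict keyed by `(state, symbol)` with `"ε"` marking empty moves (hypothetical layout, not NFA.py's exact API):

```python
def epsilon_closure(d, state, epsilon="ε"):
    """All states reachable from `state` via epsilon-transitions alone."""
    closure = {state}
    worklist = [state]
    while worklist:
        s = worklist.pop()
        for nxt in d.get((s, epsilon), set()):
            if nxt not in closure:
                closure.add(nxt)
                worklist.append(nxt)
    return frozenset(closure)

# ε-chain q0 -ε-> q1 -ε-> q2; the non-ε edge on 'a' must be ignored
d = {
    ("q0", "ε"): {"q1"},
    ("q1", "ε"): {"q2"},
    ("q1", "a"): {"q3"},
}
print(epsilon_closure(d, "q0"))  # frozenset({'q0', 'q1', 'q2'})
```

Returning a frozenset keeps the result immediately usable as a dictionary key in the subset construction.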
- Python 3.12+