LexParse Automata

A Lexer, Parser, and Automaton-based Language Recognizer

Project Overview

This project was developed for the Formal Languages and Automata (LFA) course and implements a Lexer, a Parser, and an Automaton-based Recognizer. The goal is to convert regular expressions into NFAs, transform those NFAs into DFAs, and use the resulting DFAs to accept or reject input strings, alongside lexical analysis and parsing functionality.

Objectives

  • Implement lexical analysis using regex-based tokenization.
  • Parse expressions and syntactic structures.
  • Convert regular expressions to NFAs using Thompson’s construction.
  • Implement subset construction to transform NFAs into DFAs.
  • Minimize DFAs using the Hopcroft minimization algorithm.
  • Accept or reject input strings based on the final DFA state.
  • Utilize frozensets for efficient state tracking and dictionary key management.

Project Structure

  • Lexer.py – Tokenizes input using regex-based rules and NFAs.
  • Parser.py – Implements a parser that processes tokenized input.
  • Regex.py – Defines regex operations and transformations into NFAs.
  • NFA.py – Implements Non-Deterministic Finite Automata (NFA) with ε-transitions.
  • DFA.py – Implements Deterministic Finite Automata (DFA) with minimization.

Implementation Details

1. Lexical Analysis (Lexer)

  • Uses Regular Expressions (Regex) to define token specifications.
  • Converts regex patterns into NFAs using Thompson’s Construction.
  • Uses subset construction to transform NFAs into DFAs for token recognition.

from Lexer import Lexer

# Define token specification: (TOKEN_NAME, REGEX_PATTERN)
spec = [
    ("NUMBER", r"[0-9]+"),
    ("PLUS", r"\+"),
    ("MULT", r"\*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
]

lexer = Lexer(spec)
# Note: the specification above has no whitespace rule,
# so the input must not contain spaces.
tokens = lexer.lex("3+5*(2+8)")
print(tokens)
# Output: [('NUMBER', '3'), ('PLUS', '+'), ('NUMBER', '5'), ('MULT', '*'), ('LPAREN', '('), ('NUMBER', '2'), ('PLUS', '+'), ('NUMBER', '8'), ('RPAREN', ')')]

2. Syntax Parsing (Parser)

  • Parses expressions using operator precedence and parentheses handling.
  • Identifies lambda expressions, arithmetic operations, and variable assignments.
  • Constructs an Abstract Syntax Tree (AST) to represent expressions.

from Parser import Parser

parser = Parser()
result = parser.parse("3 + 5 * (2 + 8)")
print(result)
# Output: the parsed AST representation

3. Finite Automata Operations

(a) Regular Expression to NFA Transformation

  • Uses Thompson’s construction to convert regex into NFAs.
  • Supports Kleene Star (*), Concatenation, Alternation (|), and Grouping (parentheses).
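
To make the construction concrete, below is a minimal, self-contained sketch of Thompson's construction for the symbol, concatenation, alternation, and star cases. The Fragment class and helper names are illustrative only; they are not the API of the project's Regex.py or NFA.py.

from itertools import count

_fresh = count()  # generator of fresh state ids

class Fragment:
    """An NFA fragment with one start and one accept state.
    delta maps (state, symbol) -> set of states; '' stands for epsilon."""
    def __init__(self, start, accept, delta):
        self.start, self.accept, self.delta = start, accept, delta

def symbol(c):
    s, a = next(_fresh), next(_fresh)
    return Fragment(s, a, {(s, c): {a}})

def concat(f1, f2):
    # Epsilon-link f1's accept state to f2's start state.
    delta = {**f1.delta, **f2.delta}
    delta.setdefault((f1.accept, ''), set()).add(f2.start)
    return Fragment(f1.start, f2.accept, delta)

def union(f1, f2):
    # A new start state branches into both alternatives; both accepts merge.
    s, a = next(_fresh), next(_fresh)
    delta = {**f1.delta, **f2.delta}
    delta[(s, '')] = {f1.start, f2.start}
    delta.setdefault((f1.accept, ''), set()).add(a)
    delta.setdefault((f2.accept, ''), set()).add(a)
    return Fragment(s, a, delta)

def star(f):
    # A new start state can skip the loop entirely or enter it; the loop repeats.
    s, a = next(_fresh), next(_fresh)
    delta = dict(f.delta)
    delta[(s, '')] = {f.start, a}
    delta.setdefault((f.accept, ''), set()).update({f.start, a})
    return Fragment(s, a, delta)

# (a|b)*a -- fragments are single-use, so build the tree bottom-up once
frag = concat(star(union(symbol('a'), symbol('b'))), symbol('a'))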

(b) NFA to DFA Conversion

  • Uses subset construction to eliminate non-determinism.
  • Implements epsilon-closure to handle empty transitions.
  • Uses frozensets for states to ensure hashable dictionary keys.
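
Epsilon-closure itself is a plain graph search. A minimal sketch follows; it assumes epsilon moves are stored in the transition dictionary under the empty string '', which is an assumption for illustration rather than a confirmed detail of NFA.py.

def epsilon_closure(nfa, state):
    """Return every state reachable from `state` using only epsilon moves.
    Assumes nfa.d maps (state, symbol) -> set of states, with '' as epsilon."""
    closure = {state}
    stack = [state]
    while stack:
        s = stack.pop()
        for nxt in nfa.d.get((s, ''), set()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure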

(c) DFA Minimization

  • Uses the Hopcroft Algorithm for state minimization.
  • Reduces redundant states while maintaining language recognition.

from DFA import DFA

dfa = DFA(S={'0', '1'}, K={'q0', 'q1'}, q0='q0', d={
    ('q0', '0'): 'q1',
    ('q0', '1'): 'q0',
    ('q1', '0'): 'q1',
    ('q1', '1'): 'q0'
}, F={'q1'})

print(dfa.accept("1010"))  # True (Accepted)
print(dfa.accept("1111"))  # False (Rejected)

4. Frozensets in DFA/NFA Implementation

In this project, frozenset is used to represent DFA states produced when converting from an NFA. Because an NFA can occupy several states at once (non-determinism), each DFA state produced by subset construction is a set of NFA states; frozensets make those sets usable as dictionary keys.

Why use frozenset?

  • Hashability: Python dictionaries require immutable keys. Since normal sets are mutable, they cannot be used as dictionary keys, whereas frozensets can.
  • Uniqueness: Each unique set of states in an NFA transition corresponds to a single DFA state. Frozensets ensure that equivalent sets are treated as the same state.
  • Efficient Lookups: Using frozensets allows quick membership checks and efficient dictionary storage.
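
The hashability point is easy to verify directly:

transitions = {}
transitions[(frozenset({'q0', 'q1'}), '0')] = frozenset({'q2'})  # works

try:
    transitions[({'q0', 'q1'}, '0')] = {'q2'}  # plain set inside the key
except TypeError as err:
    print(err)  # unhashable type: 'set'

# Equal contents hash equally, so equivalent NFA state sets
# land on the same DFA state key:
assert frozenset({'q0', 'q1'}) == frozenset({'q1', 'q0'})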

Example Usage in Subset Construction

from NFA import NFA
from DFA import DFA

# Example subset construction for DFA conversion
def subset_construction(nfa: NFA) -> DFA:
    d_q0 = frozenset(nfa.epsilon_closure(nfa.q0))  # Initial DFA state as a frozenset
    d_K = {d_q0}  # Set of DFA states
    d_d = {}  # Transition dictionary
    d_F = set()  # Final states
    unprocessed_states = [d_q0]

    while unprocessed_states:
        curr_state = unprocessed_states.pop()

        # A DFA state is final if it contains any NFA final state
        if curr_state & nfa.F:
            d_F.add(curr_state)

        for character in nfa.S:
            next_states = set()
            for state in curr_state:
                if (state, character) in nfa.d:
                    next_states.update(nfa.d[(state, character)])

            # Extend with the epsilon-closure of every reached state
            for next_state in list(next_states):
                next_states.update(nfa.epsilon_closure(next_state))

            frozen_states = frozenset(next_states)  # Use frozenset as DFA state
            d_d[(curr_state, character)] = frozen_states

            if frozen_states not in d_K:
                d_K.add(frozen_states)
                unprocessed_states.append(frozen_states)

    return DFA(nfa.S, d_K, d_q0, d_d, d_F)

This ensures that the DFA states remain immutable, hashable, and efficiently manageable in Python dictionaries.


Prerequisites

  • Python 3.12+
