A Lexer, Parser, and Automaton-based Language Recognizer
This project was developed for the Formal Languages and Automata (LFA) course and implements a Lexer, Parser, and Automaton-based Recognizer. The goal is to convert regular expressions into NFAs, transform those NFAs into DFAs, and use the DFAs for language acceptance, alongside lexical analysis and parsing functionality.
- Implement lexical analysis using regex-based tokenization.
- Parse expressions and syntactic structures.
- Convert regular expressions to NFAs using Thompson’s construction.
- Implement subset construction to transform NFAs into DFAs.
- Minimize DFAs using the Hopcroft minimization algorithm.
- Accept or reject input strings based on the final DFA state.
- Utilize frozensets for efficient state tracking and dictionary key management.
- Lexer.py – Tokenizes input using regex-based rules and NFAs.
- Parser.py – Implements a parser that processes tokenized input.
- Regex.py – Defines regex operations and transformations into NFAs.
- NFA.py – Implements Non-Deterministic Finite Automata (NFA) with ε-transitions.
- DFA.py – Implements Deterministic Finite Automata (DFA) with minimization.
- Uses Regular Expressions (Regex) to define token specifications.
- Converts regex patterns into NFAs using Thompson’s Construction.
- Uses subset construction to transform NFAs into DFAs for token recognition.
```python
from Lexer import Lexer

# Define token specification: (TOKEN_NAME, REGEX_PATTERN)
spec = [
    ("NUMBER", r"[0-9]+"),
    ("PLUS", r"\+"),
    ("MULT", r"\*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
]

lexer = Lexer(spec)
tokens = lexer.lex("3 + 5 * (2 + 8)")
print(tokens)
# Output: [('NUMBER', '3'), ('PLUS', '+'), ('NUMBER', '5'), ('MULT', '*'),
#          ('LPAREN', '('), ('NUMBER', '2'), ('PLUS', '+'), ('NUMBER', '8'),
#          ('RPAREN', ')')]
```
- Parses expressions using operator precedence and parentheses handling.
- Identifies lambda expressions, arithmetic operations, and variable assignments.
- Constructs an Abstract Syntax Tree (AST) to represent expressions.
```python
from Parser import Parser

parser = Parser()
result = parser.parse("3 + 5 * (2 + 8)")
print(result)
# Output: the parsed AST representation of the expression
```
- Uses Thompson’s construction to convert regex into NFAs.
- Supports Kleene star (`*`), concatenation, alternation (`|`), and grouping with parentheses.
- Uses subset construction to eliminate non-determinism.
- Implements ε-closure to follow ε-transitions (moves that consume no input).
- Uses frozensets for states to ensure hashable dictionary keys.
- Uses the Hopcroft Algorithm for state minimization.
- Reduces redundant states while maintaining language recognition.
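Thompson's construction mentioned above builds an NFA compositionally: each regex operator produces a small fragment with one start and one accept state, glued together with ε-transitions. A minimal self-contained sketch (dict-based fragments and an on-the-fly simulator; names like `literal`, `union`, `accepts` are illustrative, not Regex.py's actual API):

```python
from itertools import count

_ids = count()

def new_state():
    return f"s{next(_ids)}"

def merge(*ds):
    """Union several transition dicts keyed by (state, symbol)."""
    out = {}
    for d in ds:
        for k, v in d.items():
            out.setdefault(k, set()).update(v)
    return out

def literal(c):
    s, f = new_state(), new_state()
    return {"start": s, "accept": f, "d": {(s, c): {f}}}

def concat(a, b):
    # ε-edge from a's accept into b's start
    d = merge(a["d"], b["d"], {(a["accept"], "ε"): {b["start"]}})
    return {"start": a["start"], "accept": b["accept"], "d": d}

def union(a, b):
    # New start branches by ε into both alternatives; both accepts ε into new accept
    s, f = new_state(), new_state()
    d = merge(a["d"], b["d"], {
        (s, "ε"): {a["start"], b["start"]},
        (a["accept"], "ε"): {f},
        (b["accept"], "ε"): {f},
    })
    return {"start": s, "accept": f, "d": d}

def star(a):
    # ε-edges allow skipping the fragment entirely or looping back into it
    s, f = new_state(), new_state()
    d = merge(a["d"], {
        (s, "ε"): {a["start"], f},
        (a["accept"], "ε"): {a["start"], f},
    })
    return {"start": s, "accept": f, "d": d}

def accepts(nfa, string):
    """Simulate the NFA directly, taking ε-closures between symbols."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for nxt in nfa["d"].get((q, "ε"), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen
    current = closure({nfa["start"]})
    for c in string:
        moved = set()
        for q in current:
            moved.update(nfa["d"].get((q, c), ()))
        current = closure(moved)
    return nfa["accept"] in current

ab_star = concat(literal("a"), star(literal("b")))  # regex: ab*
print(accepts(ab_star, "abbb"))  # True
print(accepts(ab_star, "ba"))    # False
```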
```python
from DFA import DFA

# This DFA accepts exactly the binary strings that end in '0'
dfa = DFA(S={'0', '1'}, K={'q0', 'q1'}, q0='q0', d={
    ('q0', '0'): 'q1',
    ('q0', '1'): 'q0',
    ('q1', '0'): 'q1',
    ('q1', '1'): 'q0'
}, F={'q1'})

print(dfa.accept("1010"))  # True  (accepted: ends in '0')
print(dfa.accept("1111"))  # False (rejected: ends in '1')
```
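The minimization step listed above is implemented in DFA.py with Hopcroft's algorithm; as an illustration of the underlying idea, here is a sketch of Moore-style partition refinement (a simpler relative of Hopcroft's algorithm: split final from non-final states, then refine blocks until stable). All names are hypothetical, not the project's actual code:

```python
def minimize(S, K, d, F):
    """Partition-refinement DFA minimization (Moore's variant).

    Returns the stable partition; each block is one state of the minimal DFA.
    Assumes the transition function d is total over K x S.
    """
    # Initial split: accepting vs. non-accepting states
    partition = [frozenset(F), frozenset(K - F)]
    partition = [p for p in partition if p]
    changed = True
    while changed:
        changed = False
        new_partition = []
        for block in partition:
            # Group states by which block each input symbol sends them to
            buckets = {}
            for q in block:
                key = tuple(
                    next(i for i, b in enumerate(partition) if d[(q, a)] in b)
                    for a in sorted(S)
                )
                buckets.setdefault(key, set()).add(q)
            new_partition.extend(frozenset(b) for b in buckets.values())
            if len(buckets) > 1:
                changed = True
        partition = new_partition
    return partition

# Redundant DFA: q1 and q2 are equivalent (both accepting, same behavior)
S = {"0", "1"}
K = {"q0", "q1", "q2"}
F = {"q1", "q2"}
d = {("q0", "0"): "q1", ("q0", "1"): "q0",
     ("q1", "0"): "q2", ("q1", "1"): "q0",
     ("q2", "0"): "q1", ("q2", "1"): "q0"}

print(minimize(S, K, d, F))  # q1 and q2 collapse into one block
```

Hopcroft's algorithm reaches the same partition but picks splitter blocks cleverly, giving O(n log n) instead of the O(n²) worst case of this sketch.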
In this project, frozenset is used to represent states in the DFA when converting from an NFA. Since NFAs allow multiple possible states at once (non-determinism), during subset construction, we use frozensets to create unique sets of states that can be used as dictionary keys.
- Hashability: Python dictionary keys must be hashable. Normal sets are mutable and therefore unhashable, whereas frozensets can serve as keys.
- Uniqueness: Each unique set of states in an NFA transition corresponds to a single DFA state. Frozensets ensure that equivalent sets are treated as the same state.
- Efficient Lookups: Using frozensets allows quick membership checks and efficient dictionary storage.
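The hashability and uniqueness points can be checked directly in plain Python, independent of the project code:

```python
states_a = frozenset({"q0", "q1"})
states_b = frozenset({"q1", "q0"})  # same members, different construction order

transitions = {states_a: "D1"}       # a frozenset works as a dict key
print(transitions[states_b])         # equal frozensets hash identically -> "D1"

try:
    {set({"q0"}): "D2"}              # a mutable set is rejected as a key
except TypeError as e:
    print("unhashable:", e)
```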
```python
from NFA import NFA
from DFA import DFA

# Example subset construction for NFA -> DFA conversion
def subset_construction(nfa: NFA) -> DFA:
    d_q0 = frozenset(nfa.epsilon_closure(nfa.q0))  # initial DFA state as a frozenset
    d_K = {d_q0}   # set of DFA states
    d_d = {}       # transition dictionary
    d_F = set()    # final DFA states
    unprocessed_states = [d_q0]
    while unprocessed_states:
        curr_state = unprocessed_states.pop()
        # A DFA state is final if it contains at least one NFA final state
        if curr_state & nfa.F:
            d_F.add(curr_state)
        for character in nfa.S:
            next_states = set()
            for state in curr_state:
                next_states.update(nfa.d.get((state, character), set()))
            # Close the move set under epsilon-transitions
            closure = set()
            for next_state in next_states:
                closure.update(nfa.epsilon_closure(next_state))
            frozen_states = frozenset(closure)  # use frozenset as the DFA state
            d_d[(curr_state, character)] = frozen_states
            if frozen_states not in d_K:
                d_K.add(frozen_states)
                unprocessed_states.append(frozen_states)
    return DFA(nfa.S, d_K, d_q0, d_d, d_F)
```
This ensures that the DFA states remain immutable, hashable, and efficiently manageable in Python dictionaries.
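The `epsilon_closure` helper relied on above can be implemented as a simple worklist search over ε-edges. A standalone sketch, assuming a transition dict keyed by `(state, symbol)` with `"ε"` marking empty moves (hypothetical layout, not NFA.py's exact API):

```python
def epsilon_closure(d, state, epsilon="ε"):
    """All states reachable from `state` via epsilon-transitions alone."""
    closure = {state}
    worklist = [state]
    while worklist:
        s = worklist.pop()
        for nxt in d.get((s, epsilon), set()):
            if nxt not in closure:
                closure.add(nxt)
                worklist.append(nxt)
    return frozenset(closure)

# ε-chain q0 -ε-> q1 -ε-> q2; the non-ε edge on 'a' must be ignored
d = {
    ("q0", "ε"): {"q1"},
    ("q1", "ε"): {"q2"},
    ("q1", "a"): {"q3"},
}
print(epsilon_closure(d, "q0"))  # frozenset({'q0', 'q1', 'q2'})
```

Returning a frozenset keeps the result immediately usable as a dictionary key in the subset construction.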
- Python 3.12+