This project is a simple regex engine implemented in C. The primary purpose of this project is educational, focusing on learning and implementing key components of a regex compiler and matcher, including:
- Lexer
- Parser
- Abstract Syntax Tree (AST)
- Converting ASTs to NFAs (Non-deterministic Finite Automata)
- Regex compiler
- Regex matcher
- C programming language
- GNU Make for build automation
- Address Sanitizer for memory checks
No external libraries are used in this project.
To build and run this project, you need:
- A C compiler
- GNU Make
Optional:
- Valgrind (for running tests in a memory-checking environment)
-
Clone the repository:
git clone https://github.com/mkpro118/Regex-Engine.git
-
Navigate to the project directory:
cd Regex-Engine
-
Build the project:
make
or
make build
To use the regex engine in your C program:
-
Include the necessary headers:
#include "regex.h"
-
Create a regex object:
Regex* regex = regex_create("your_pattern_here");
-
Match a string against the regex:
bool match = regex_match(regex, "string_to_match");
-
Free the regex object when done:
regex_free(regex); free(regex);
Example:
#include <stdio.h>
#include "regex.h"
int main() {
Regex* regex = regex_create("a(b|c)*d");
if (regex == NULL) {
printf("Failed to create regex\n");
return 1;
}
printf("Match: %s\n", regex_match(regex, "abbbcd") ? "true" : "false");
printf("Match: %s\n", regex_match(regex, "ad") ? "true" : "false");
printf("Match: %s\n", regex_match(regex, "abba") ? "true" : "false");
regex_free(regex);
return 0;
}
-
Basic Regex Operations: Supports basic regex operations including:
- Character matching
- Alternation (
|
) - Kleene star (
*
) - Kleene Plus (
+
) - Optional (
?
) - Grouping with parentheses
-
NFA-based Matching: Uses Non-deterministic Finite Automata (NFA) for pattern matching, allowing for efficient and flexible regex processing.
-
AST Representation: Builds an Abstract Syntax Tree (AST) representation of the regex pattern, which is then converted to an NFA.
-
Epsilon Closure: Implements epsilon closure for NFA transitions, enabling proper handling of epsilon (empty) transitions in the regex.
-
Memory Management: Careful memory management with proper initialization and cleanup functions for all major components (Lexer, Parser, AST, NFA).
-
Portability: Includes portability considerations for different operating systems (Windows, Unix-like systems).
Contributions to this project are welcome. You can contribute by:
- Submitting issues for bugs or feature requests
- Creating pull requests for proposed changes or improvements
Please ensure that your contributions align with the project's goals and coding standards.
This project is licensed under the MIT License. See the LICENSE file for details.
A comprehensive test suite for the regex engine can be found in the tests directory. Tests can be run using either ASan or Valgrind to monitor program memory.
Running
make test
will run the tests twice, first with ASan, then with valgrind.
Optionally, you can run the tests only using one of those tool using the following commands
make test_asan # Only run under ASan
make test_valgrind # Only run under Valgrind
This project was developed using a Ubuntu 22.04 LTS environment within a Docker container.
The image used was mkpro118/mk-sandbox:latest,
the Dockerfile for that can be found at the mkpro118/mk-sandbox repository.
While it should work on other Unix-like systems, it has been primarily tested in this environment.
Note: The development was done within a Docker container for convenience, but Docker is not required to build or run this project.
For any questions or further information about this project, please open an issue in the project repository.