-
Notifications
You must be signed in to change notification settings - Fork 2
simple regex implementation for c, based on articles by Russ Cox: http://swtch.com/~rsc/regexp/regexp2.html
lukesandberg/Regex
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
This project was designed to learn more about finite state machines and in general good c design. 12/12/2010 there are plenty of features yet to implement, bracket expressions, fast DFA searching and a variety of special cases for when we don't need submatch boundaries. 11/18/2010 A lot of refactoring has gone into the compiler and a little bit has gone into the parser. I will be focusing on simplifying the parser error handling and then the next major piece will be bracket expressions and character class handling. I also have an idea for changing the regex struct to eliminate a call to malloc. Also i will be providing a few helper procedures and macros to encapsulate the pattern of malloc -> test -> initialize pattern. this should also allow me to better test error handling in low memory situations. (i.e. clean up state and return an appropriate error without leaking memory) 11/14/2010 I am very confident about counted repetition expressions at this point. There are a number of passing tests that pretty well test the feature. Next the parser will be refactored to decrease memory usage and recursive depth. Currently we essentially copy the whole string as tokens to read it backwards. This has overly complicated the parsing process so i think i will be switching to a forward parse technique storing a stack of already parsed sub expressions. This should eliminate almost all recursion as well as allow for easier additions to the grammar. 11/11/2010 after a long break...there has been many changes * a header file for exported symbols "src/re.h" if you want to use the project you should include that file and link with the librar * building the re code as a library libre.a * simplified makefiles with seperate makefiles for the tests and the lib * implemented counted repetition support, corresponding tests need to be added * switched from using instruction tags to test if we have already executed an instruction to explicitly tracking the instructions executed at each step with a sparse_set * added the corresponding sparse set implementation * refactored the lexer somewhat... more work to be done * and about 5 million more refactorings 9/12/2010 captures are fully implemented and working, they (as with everything) need more tests. Copious interactive testing (via ./testre c...) so i feel pretty good about it. At this point the major pieces to add would be counted repetitions (e.g. e{n,m}). im not sure what the best way to do this would be. we could associate additional information with a thread so it could track the number of outstanding repetitions. but we might need to store a large number of repetition counters concurrently, so how do we bound the registers to hold it? you only need the number of iteration indices as the deepest nesting of counted rep expressions. so it would likely be a small number per thread, plus we could calculate it during compilation (just like we count capture regs). This would add extra copying and allocation overhead to our execution engine. (extra calls to memcpy). but maybe by just having a generic register array (either char or ins ptrs) we could minimize additional overhead at the expense of a little type safety. 9/1/2010 lots of tests added, playing around with fuzz testings (sending random data to the engine to try to crash it). 8/16/2010 rewrote the parser 3-4 times :) it seems like * and basic character classes are working, some tests are failing and many more tests need to be written to confirm the operation of the parser. 7/25/2010 first major bit of functionality, direct character matching and '.' wildcard support. also an extremely simple testing framework was added along with a 'make test' target. -Luke Sandberg
About
simple regex implementation for c, based on articles by Russ Cox: http://swtch.com/~rsc/regexp/regexp2.html
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published