-
Notifications
You must be signed in to change notification settings - Fork 2
/
README
74 lines (62 loc) · 3.67 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
This project was designed to learn more about finite state machines
and in general good c design.
12/12/2010
there are plenty of features yet to implement, bracket expressions, fast DFA searching and
a variety of special cases for when we don't need submatch boundaries.
11/18/2010
A lot of refactoring has gone into the compiler and a little bit has gone into the
parser. I will be focusing on simplifying the parser error handling and then the
next major piece will be bracket expressions and character class handling. I also
have an idea for changing the regex struct to eliminate a call to malloc. Also i
will be providing a few helper procedures and macros to encapsulate the pattern of
malloc -> test -> initialize pattern. this should also allow me to better test
error handling in low memory situations. (i.e. clean up state and return an appropriate
error without leaking memory)
11/14/2010
I am very confident about counted repetition expressions at this point. There are
a number of passing tests that pretty well test the feature. Next the parser will be
refactored to decrease memory usage and recursive depth. Currently we essentially copy
the whole string as tokens to read it backwards. This has overly complicated the parsing
process so i think i will be switching to a forward parse technique storing a stack of
already parsed sub expressions. This should eliminate almost all recursion as well as
allow for easier additions to the grammar.
11/11/2010
after a long break...there has been many changes
* a header file for exported symbols "src/re.h" if you want to use the project you should
include that file and link with the librar
* building the re code as a library libre.a
* simplified makefiles with seperate makefiles for the tests and the lib
* implemented counted repetition support, corresponding tests need to be added
* switched from using instruction tags to test if we have already executed an instruction
to explicitly tracking the instructions executed at each step with a sparse_set
* added the corresponding sparse set implementation
* refactored the lexer somewhat... more work to be done
* and about 5 million more refactorings
9/12/2010
captures are fully implemented and working, they (as with everything) need more
tests. Copious interactive testing (via ./testre c...) so i feel pretty good
about it. At this point the major pieces to add would be counted repetitions
(e.g. e{n,m}). im not sure what the best way to do this would be. we could
associate additional information with a thread so it could track the
number of outstanding repetitions. but we might need to store a large number
of repetition counters concurrently, so how do we bound the registers to hold
it?
you only need the number of iteration indices as the deepest nesting of counted
rep expressions. so it would likely be a small number per thread, plus we could
calculate it during compilation (just like we count capture regs). This would
add extra copying and allocation overhead to our execution engine. (extra calls to
memcpy). but maybe by just having a generic register array (either char or ins ptrs)
we could minimize additional overhead at the expense of a little type safety.
9/1/2010
lots of tests added, playing around with fuzz testings (sending random data to
the engine to try to crash it).
8/16/2010
rewrote the parser 3-4 times :)
it seems like * and basic character classes are working, some tests are failing
and many more tests need to be written to confirm the operation of the parser.
7/25/2010
first major bit of functionality, direct character matching and '.'
wildcard support.
also an extremely simple testing framework was added along with a
'make test' target.
-Luke Sandberg