This repo contains several versions of an algorithm that maps lines to their offsets.
In other words, it takes a file as input and returns a vector `Offsets` where `Offsets[i]` is the offset at which the i-th line begins.
This algorithm is used (at least) in LLVM.
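To make the definition concrete, here is a minimal byte-at-a-time sketch. It is not the repo's `ref.cpp`: the function name `lineOffsets` is hypothetical, and the choice to treat `\n`, `\r` and `\r\n` each as a single line terminator is an assumption about the intended semantics.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper for illustration; not taken from ref.cpp.
// Assumption: "\n", "\r" and "\r\n" each end exactly one line.
std::vector<std::size_t> lineOffsets(std::string_view Buf) {
  std::vector<std::size_t> Offsets;
  Offsets.push_back(0); // line 0 always begins at offset 0
  for (std::size_t I = 0, E = Buf.size(); I < E; ++I) {
    char C = Buf[I];
    if (C == '\n') {
      Offsets.push_back(I + 1); // next line starts right after '\n'
    } else if (C == '\r') {
      // Swallow the '\n' of a "\r\n" pair so it counts only once.
      if (I + 1 < E && Buf[I + 1] == '\n')
        ++I;
      Offsets.push_back(I + 1);
    }
  }
  return Offsets;
}
```

With this sketch, `lineOffsets("ab\ncd\r\nef")` yields three entries, one per line, and a buffer ending in a newline gets a final entry equal to its size.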
This repo provides both validation and performance measurement for various versions. The existing implementations are:
- `ref.cpp`: the reference implementation.
- `seq.cpp`: the reference implementation, with some short-circuiting and branch hints.
- `seq_memchr.cpp`: same as `seq.cpp`, with a short path if the file contains no `\r`.
- `bithack_scan.cpp`: loads a word at a time and uses bit twiddling for multi-byte word handling.
- `bithack.cpp`: same as `bithack_scan.cpp`, with a cheaper fuzzy check.
- `sse_align.cpp`: legacy SSE implementation which enforces alignment.
- `sse.cpp`: SSE implementation without any alignment enforcement.
- `sse_memchr.cpp`: same as `sse.cpp`, with a short path if the file contains no `\r`.
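As a rough illustration of the word-at-a-time idea, the sketch below loads 8 bytes, uses a classic bit-twiddling test to decide whether the word contains any `\n` or `\r`, and falls back to byte-wise handling only when it does. This is not the code from `bithack_scan.cpp`; the names `hasZeroByte`, `hasByte` and `lineOffsetsSwar` are hypothetical, and the line-terminator rules (`\n`, `\r`, `\r\n`) are an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>
#include <vector>

// True iff at least one of the 8 bytes of V is zero.
static bool hasZeroByte(std::uint64_t V) {
  return ((V - 0x0101010101010101ULL) & ~V & 0x8080808080808080ULL) != 0;
}

// True iff at least one byte of V equals C: XOR turns matches into zeros.
static bool hasByte(std::uint64_t V, unsigned char C) {
  return hasZeroByte(V ^ (0x0101010101010101ULL * C));
}

// Hypothetical sketch, not the repo's implementation.
std::vector<std::size_t> lineOffsetsSwar(std::string_view Buf) {
  std::vector<std::size_t> Offsets;
  Offsets.push_back(0);
  std::size_t I = 0, E = Buf.size();
  while (I < E) {
    if (E - I >= 8) {
      std::uint64_t W;
      std::memcpy(&W, Buf.data() + I, 8); // unaligned-safe load
      if (!hasByte(W, '\n') && !hasByte(W, '\r')) {
        I += 8; // no line terminator in these 8 bytes, skip them all
        continue;
      }
    }
    // Slow path: handle a single byte, then retry the fast path.
    char C = Buf[I];
    if (C == '\n') {
      Offsets.push_back(I + 1);
    } else if (C == '\r') {
      if (I + 1 < E && Buf[I + 1] == '\n')
        ++I; // "\r\n" counts as one terminator (assumption)
      Offsets.push_back(I + 1);
    }
    ++I;
  }
  return Offsets;
}
```

The word test is only used as a filter here, so the result does not depend on byte order. A real implementation would likely locate the matching byte inside the word instead of rescanning byte by byte.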
From the `src` directory, run
`
$ make perf
`
to get some performance measurements and
`
$ make check
`
to run the test suite (well, there's only one test, but you get the idea).