
Repo for Text Processing code that was the basis for a series of articles on my blog.


TristanRhodes/TextProcessing


Text Processing


Running Solution

git clone https://github.com/TristanRhodes/TextProcessing.git

cd TextProcessing

dotnet test

Multi Language Text Processing

This is a small research project accompanying a series of blog posts in which I build a parser from scratch. It covers regex, two-phase parsing (lexing/tokenising and expression parsing), combinatorial parsers, scannerless parsing, and Sprache. It is written in C# on .NET 5.0, and the end result is a simple two-phase, multi-language expression parser.

Multi Language Parsing

The repository is now a playground for lexing, parsing, and text processing in general.

Blog Posts

Using regex to match different string formats of DayOfWeek and ClockTime.
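As a rough illustration of the regex approach, the sketch below matches short or long English day names and 24-hour `HH:mm` clock times. The class name `DayTimePatterns`, both patterns, and the use of `TimeSpan` as the time type are assumptions made for this example and are not taken from the repository.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not the repository's code.
public static class DayTimePatterns
{
    // Assumed format: "Mon" or "Monday" style names, in any casing.
    private static readonly Regex DayOfWeekPattern = new Regex(
        "^(mon(day)?|tue(sday)?|wed(nesday)?|thu(rsday)?|fri(day)?|sat(urday)?|sun(day)?)$",
        RegexOptions.IgnoreCase);

    // Assumed format: 24-hour clock, "00:00" to "23:59".
    private static readonly Regex ClockTimePattern = new Regex(
        "^([01][0-9]|2[0-3]):([0-5][0-9])$");

    public static bool TryParseDay(string text, out DayOfWeek day)
    {
        day = default;
        if (!DayOfWeekPattern.IsMatch(text)) return false;

        // The first three letters are enough to identify the day.
        var prefix = text.Substring(0, 3);
        foreach (DayOfWeek candidate in Enum.GetValues(typeof(DayOfWeek)))
        {
            if (candidate.ToString().StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            {
                day = candidate;
                return true;
            }
        }
        return false;
    }

    public static bool TryParseClockTime(string text, out TimeSpan time)
    {
        time = default;
        var match = ClockTimePattern.Match(text);
        if (!match.Success) return false;

        // Group 1 is the hour, group 2 is the minute.
        time = new TimeSpan(int.Parse(match.Groups[1].Value), int.Parse(match.Groups[2].Value), 0);
        return true;
    }
}
```

Anchoring both patterns with `^` and `$` means each one must match a whole candidate string, so "Monday" is accepted but "Mondays" is not.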

Breaking a longer DayTime string into recognized Day and LocalTime parts.
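One way to picture the tokenising step is to split the input on spaces and classify each part. This is a minimal sketch under that assumption; the `TokenKind`, `Token`, and `Tokeniser` names are hypothetical and the repository's tokeniser may work differently.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical token categories for this sketch.
public enum TokenKind { Day, ClockTime, Word }

public sealed record Token(TokenKind Kind, string Text);

public static class Tokeniser
{
    private static readonly Regex Day = new Regex(
        "^(mon(day)?|tue(sday)?|wed(nesday)?|thu(rsday)?|fri(day)?|sat(urday)?|sun(day)?)$",
        RegexOptions.IgnoreCase);

    private static readonly Regex Clock = new Regex("^([01][0-9]|2[0-3]):[0-5][0-9]$");

    public static List<Token> Tokenise(string input)
    {
        var tokens = new List<Token>();

        // Split on spaces and classify each part; anything unrecognised is a plain word.
        foreach (var part in input.Split(' ', StringSplitOptions.RemoveEmptyEntries))
        {
            var kind = Day.IsMatch(part) ? TokenKind.Day
                     : Clock.IsMatch(part) ? TokenKind.ClockTime
                     : TokenKind.Word;
            tokens.Add(new Token(kind, part));
        }
        return tokens;
    }
}
```

Calling `Tokeniser.Tokenise("Pickup Mon 08:00 dropoff wed 17:00")` yields six tokens in the order Word, Day, ClockTime, Word, Day, ClockTime, which a second parsing phase can then consume.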

Use a basic suite of object-oriented IParser implementations to parse a DayTime string and a number of other simple expressions.

// Separate two part element context => DayTime range
"Pickup Mon 08:00 dropoff wed 17:00"

// Range elements with different separators => Open Days range and Hours Range
"Open Mon to Fri 08:00 - 18:00"

// Repeating tokens => List of tour times
"Tours 10:00 12:00 14:00 17:00 20:00"

// Repeating complex elements => List of event day times
"Events Tuesday 18:00 Wednesday 15:00 Friday 12:00"

Replace all IParser interfaces with Delegates and go all in on functional combinators.
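To give a feel for the delegate-based style, here is a small sketch of a parser delegate with three combinators over an array of string tokens. The `ParseResult<T>`, `Parser<T>`, `Token`, `Then`, and `Many` names are assumptions for illustration and are not the repository's actual API.

```csharp
using System;
using System.Collections.Generic;

// Outcome of running a parser: success flag, parsed value, and the next token index.
public readonly struct ParseResult<T>
{
    public bool Success { get; }
    public T Value { get; }
    public int Next { get; }

    private ParseResult(bool success, T value, int next)
    {
        Success = success;
        Value = value;
        Next = next;
    }

    public static ParseResult<T> Ok(T value, int next) => new ParseResult<T>(true, value, next);
    public static ParseResult<T> Fail(int position) => new ParseResult<T>(false, default, position);
}

// A parser is just a function from (tokens, position) to a result.
public delegate ParseResult<T> Parser<T>(string[] tokens, int position);

public static class Combinators
{
    // Matches a single token that satisfies the predicate.
    public static Parser<string> Token(Func<string, bool> predicate) =>
        (tokens, position) =>
            position < tokens.Length && predicate(tokens[position])
                ? ParseResult<string>.Ok(tokens[position], position + 1)
                : ParseResult<string>.Fail(position);

    // Runs first, then second; succeeds only if both succeed.
    public static Parser<(A, B)> Then<A, B>(this Parser<A> first, Parser<B> second) =>
        (tokens, position) =>
        {
            var a = first(tokens, position);
            if (!a.Success) return ParseResult<(A, B)>.Fail(position);

            var b = second(tokens, a.Next);
            return b.Success
                ? ParseResult<(A, B)>.Ok((a.Value, b.Value), b.Next)
                : ParseResult<(A, B)>.Fail(position);
        };

    // Applies the parser zero or more times and collects the values.
    public static Parser<List<T>> Many<T>(this Parser<T> parser) =>
        (tokens, position) =>
        {
            var items = new List<T>();
            var current = position;
            while (true)
            {
                var result = parser(tokens, current);
                // Stop on failure, or if the parser succeeded without consuming input.
                if (!result.Success || result.Next == current) break;
                items.Add(result.Value);
                current = result.Next;
            }
            return ParseResult<List<T>>.Ok(items, current);
        };
}
```

With these pieces, a parser for the "Tours" example above could be composed as `Combinators.Token(t => t == "Tours").Then(Combinators.Token(t => t.Contains(':')).Many())`. Because every combinator returns another `Parser<T>`, larger parsers are built by composition rather than by adding classes.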

Instead of using an array of pre-parsed tokens, we use the string (char[]) directly and implement our parser in Sprache.

Parser Implementations

A simple tokeniser and parser system implemented using interfaces and an object-oriented style.

A simple tokeniser and parser system implemented using Delegates and monads in a functional style.

An implementation of the demo parsers written in Sprache, the scannerless C# functional parsing library.
