Skip to content

Latest commit

 

History

History
137 lines (98 loc) · 4.76 KB

tutorial-part-4.md

File metadata and controls

137 lines (98 loc) · 4.76 KB

Tutorial Part 4: Building utility parsers

Back to part 3

Goal: To be able to build helpful utility parsers that enhance expressivity

arcsecond has a lot of parser combinators to help in building utility parsers, but you can also build your own.

A very common scenario is to build a parser which matches values that are separated by some marker. For example, we might want to build a parser combinator for matching comma separated values:

const { sepBy, char, letters } = require('arcsecond');

const commaSeparated = sepBy(char(','));

const finalParser = commaSeparated(letters);

finalParser.run('big,list,of,words');
// -> Right([ 'big', 'list', 'of', 'words' ])

Here we assigned commaSeparated as a variable, because this is a common task we might want to use to capture many different kinds of data, not only letters. While this is not necessary, it's less expressive if this is just done inline:

const { sepBy, char, letters } = require('arcsecond');

const finalParser = sepBy(char(','))(letters);

finalParser.run('big,list,of,words');
// -> {
//      isError: false,
//      result: [ "big", "list", "of", "words" ],
//      index: 17,
//      data: null
//    }

sepBy is what is known as a curried function, which essentially means it takes one argument at a time, over multiple function calls. It is implemented this way in arcsecond because the major use case is building utility parsers, where you mostly want to supply the second argument later, but it can be confusing if you're not used to it.

We could make this even more flexible (depending on the situation of course!) by allowing optional whitespace:

const {
  sepBy,
  char,
  letters,
  optionalWhitespace,
  sequenceOf,
} = require('arcsecond');

const commaSeparated = sepBy(
  sequenceOf([optionalWhitespace, char(','), optionalWhitespace]),
);

const finalParser = commaSeparated(letters);

finalParser.run('big,       list, of,                        words');
// -> {
//      isError: false,
//      result: [ "big", "list", "of", "words" ],
//      index: 49,
//      data: null
//    }

Another common use case is extracting data that comes between two known endpoints, such as brackets. We can write a parser called betweenBrackets quite easily:

const { between, char } = require('arcsecond');

const betweenBrackets = between(char('('))(char(')'));

const finalParser = betweenBrackets(letters);

finalParser.run('(hello)');
// -> {
//      isError: false,
//      result: "hello",
//      index: 7,
//      data: null
//    }

Writing custom combinators

sepBy is useful in a very general sense, but if you needed a combinator like sepBy, except you wanted to allow a trailing comma at the end, you would have to write it from scratch. Thankfully, these kinds of parsers are easy enough to write yourself in different ways.

const {
  between,
  char,
  sequenceOf,
  possibly,
  coroutine,
  either,
} = require('arcsecond');

const customSepByWithSequenceOf = separatorParser => valueParser =>
  sequenceOf([
    sepBy(separatorParser)(valueParser),
    possibly(separatorParser),
  ]).map(results => results[0]);

const customSepByWithCoroutine = separatorParser => valueParser =>
  coroutine(run => {
    const results = [];

    while (true) {
      const value = run(either(valueParser));

      // We can't parse more values, break from the loop
      if (value.isError) break;

      // Push the captured value into the results array
      results.push(value);

      const sep = run(either(separatorParser));

      // There are no more separators, break from the loop
      if (sep.isError) break;
    }

    // sepBy allows for zero results to be matched, so we will here too.
    // If you need *at least one* result, you should use sepBy1
    return results;
  });

For a full list of the built in parser combinators, you can checkout the API docs. Building a parser of any substantial complexity will definitely require many utility parsers, so it will be worth the time to give them names and work out the range of flexibility each should offer.

Summary

arcsecond has many built in combinators that allow you to easily create your own utility parsers with. All the parser combinators in arcsecond are curried functions, and thus only take a single argument at a time. Sometimes the general parser combinators provided in the library will not be perfectly suited to a specific need, but custom parsers can be created without too much difficulty.

Next: Recursive Structures

In the next section we'll explore how to parse recursive data.