Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

minimize nom macros #3

Open
benkay86 opened this issue Jan 10, 2020 · 4 comments
Open

minimize nom macros #3

benkay86 opened this issue Jan 10, 2020 · 4 comments

Comments

@benkay86
Copy link

With nom 5.0 the author, Geoffroy, has soft-deprecated macros in nom. Although macros used to be needed to write concise parsers with nom, as Rust has evolved it is now possible to write equally concise parsers without macros. The advantages of minimizing macro use include more comprehensible error messages and less of a learning curve for newcomers since they do not have to learn and understand nom's niche macro conventions.

I'm interested in contributing to matfile and would prefer to write code using nom methods instead of nom macros. I realize that as the library author you may have strong preferences about this. I've rewritten parse_header() as an example of what parsing would look like with less reliance on nom macros. (I also removed panics and removed the assumption that the host machine is little endian.) Of course, the rewritten version still passes cargo test.

Please let me know what you think. I would be happy to work on the rest of the nom parsing code, but I don't want to waste time if you hate the macro-less syntax.

@dthul
Copy link
Owner

dthul commented Jan 10, 2020

I think it would be great to move to idiomatic nom 5 code! I never liked the macro approach much, since it makes debugging hard and control flow unwieldy to express. So I'm definitely up for replacing the macros with functions.

I took a brief look at your example snippet. I'm not familiar with the new nom yet, but it looks like you can get rid of the try_parse! invocations and instead just call the parse function (so take(8usize)(i)? instead of try_parse!(i, take(8usize))). The endianness check seems to have gotten more complicated than the original version (and is now by itself a single massive nested function call spanning almost 30 lines). I believe that this can be improved.
Overall I like what I see! Thanks for bringing it up! If you want to work on this, please go ahead and I am certain we will be able to merge your changes.

@benkay86
Copy link
Author

Unfortunately try_parse! is still a necessary evil. Ideally you could write:

let (i, o) = some_nom_parser(i)?;

However, for better or worse, nom's error type in IResult does not implement std::error::Error therefore you can't use the ? operator on an IResult. (There is an unstable feature std::ops::Try that gets around this, but it requires compiling with nightly.)

Instead we have the try_result! macro, which expands to something like:

let (i, o) = match some_nom_parser(i) {
    Ok(i, o) => (i, o),
    Err(e) => return Err(e)
};

While I'm obviously in favor of minimizing macro use, try_result! is a simple pattern expansion that won't cause too much trouble with debugging and definitely improves readability of code. The other alternative would be to wrap the entire parser in a nom::sequence::tuple, but that gets hard to read for long sequences.

I updated the branch to lift the logic for endianness conversion out of parse_header(). Unfortunately it doesn't make the endianness check much shorter, but ultimately this is limited by MathWorks's decision to put the u16 version before the endianness indicator in the header instead of after it. If it still bothers you I can try and break out the logic into multiple subparsers.

@dthul
Copy link
Owner

dthul commented Jan 10, 2020

Ideally you could write:

let (i, o) = some_nom_parser(i)?;

That does indeed work, no? As a quick test I compiled your branch and then replaced try_parse!(i, take(8usize)) in src/parse.rs:171 with take(8usize)(i)?. It still compiles.

I think your endianness check got a little more complicated since it introduced a bug(?). le_u16 reads a little endian encoded u16 and already takes care of byte-swapping, so the only thing we need to do is call swap_bytes if the initial assumption of it being a little endian encoded value turned out to be wrong. That only depends on whether the file is little or big endian, it does not depend on whether the host itself is little or big endian.

Based on your changes I pushed another version: https://github.com/dthul/matfile/compare/nom_no_macros#diff-258c0555e6f9c4bc4fba3a551d81638aL117

@benkay86
Copy link
Author

You are right, I was confused about the limitations on ?. I cannot ? nom parsers in a function that returns a dyn Error, but it is perfectly legal to ? nom parsers in a function that returns a nom::IResult. The try_parse! macro is not needed.

Thank you for pointing out my bug with regard to endian byte swapping. I see your way of parsing/verifying the version and endianness uses about the same number of lines, but it is easier to follow because it is broken up into multiple statements rather than one big nested parser.

I will work on converting the remaining methods in parse.rs to macro-less nom and send you a pull request when finished. I will try to stick with the style of splitting nested parsers into multiple statements for readability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants