Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect of the specification, error handling and refactoring. #2

Open
nbrr opened this issue Apr 29, 2019 · 1 comment
Open

Respect of the specification, error handling and refactoring. #2

nbrr opened this issue Apr 29, 2019 · 1 comment

Comments

@nbrr
Copy link
Contributor

nbrr commented Apr 29, 2019

After implementing support for Sparse arrays, I have begun to work on support for Structure arrays. I believe it is mostly done but my test case contains char arrays so I need to implement this first. Since I have worked a bit on the code, I have a few comments.

My initial goal is to be able to read matrices from https://sparse.tamu.edu/, so that is where I picked test files. As I was testing I noticed that they don't seem to comply well with the mat file specification.
For example, array name's tag should indicate i8 but I have found it to be tagged as i32 in jgl009.mat, if I am not mistaken (I found it while decomposing the parsing process in order to debug, it's the commented tag3 here. This raises issue while parsing because of the check made on the type here. Note that when this check is disregarded, the name parses just fine as u8. We should rethink this type check even though it matches the specification. It should maybe be noted that Octave parses this file fine.

Second point is error handling. I have not done much error handling with Rust so far, but I feel that nom::ErrorKind::Custom(some int) isn't very clear. In the test case I just mentioned just before I have trouble finding where some assert went wrong. I don't really know whether the error handling that has been implemented so far is good or bad generally speaking, or how it could be improved.

Third comment is that I felt the code could be more organized. Maybe a module with elementary parsers such as numeric types, header, tags; a module with parsers for compound structures; having the content of lib.rs moved to a module; better naming for the two NumericData enums that exist in the library. If you are open to this I will give it more thought.

@dthul
Copy link
Owner

dthul commented Apr 29, 2019

There is a very comprehensive C library that implements (I believe) the full MAT file format and is well tested called matio. I took a look at their code before to figure out some cases that were not sufficiently well described in the official specification document. We can check how they handle character arrays. I'm all for relaxing the parsing checks if it makes files that are actually out there work as long as it doesn't break files that adhere to the specification.

I agree that the error handling situation is pretty bad. If something does not parse successfully, it is very hard to figure out what exactly went wrong. This is a problem of nom's design, since it only allows you to return an error parameterized by an int afaik. There was an announcement from the nom developer about an upcoming new version that is supposed to have much improved error management, so I hope that will come out soon.

Feel free to re-organize the code. Since the internals are not exposed in the public interface anyway, it doesn't hurt if we move those types and functions around. And if you have proposals to change the public interface, go ahead as well, the library is still 0.x after all :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants