Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: String literals in MLIR are not as permissive as those in C #468

Closed
pgoodman opened this issue Dec 15, 2023 · 7 comments · Fixed by #467
Closed

[Bug]: String literals in MLIR are not as permissive as those in C #468

pgoodman opened this issue Dec 15, 2023 · 7 comments · Fixed by #467
Labels
bug Something isn't working

Comments

@pgoodman
Copy link
Contributor

VAST version

master

LLVM version

17

Operating system

All

Description

String literals are more permissive in C. Especially those found in things like __attribute__((section("..."))), which are "unevaluated" string literals. C string literals support more escape formats than those supported by MLIR's string literals. Unevaluated C string literals can contain arbitrary / made up escape strings/formats.

VAST should introduce additional escape characters into the string literals where necessary.

Steps to Reproduce

const char * x = "\r"; should probably do it. Unfortunately MLIR's verifier doesn't catch this, so you need to round-trip and parse it.

@pgoodman pgoodman added the bug Something isn't working label Dec 15, 2023
@xlauko
Copy link
Member

xlauko commented Dec 15, 2023

This should be fixed in LLVM 18. @Jezurko can you check this?

@Jezurko
Copy link
Collaborator

Jezurko commented Dec 15, 2023

This should only happen in operations/types that use a StringRefParameter(e.g.: the type created by typeof). And it happens because this hasn't been merged into an llvm release yet.
But right now both of the cases you mentioned should be handled by our own core::StringLiteralAttr* and should work properly (and at least for me it did on current master branch).

I think it's preferable to wait until the fix gets into a release, as we would be duplicating logic while bringing more complexity to our TableGen code.

*soon to be replaced by mlir::StringAttr that also handles it and we don't need our own String attribute anymore.

@lkorenc
Copy link
Contributor

lkorenc commented Dec 15, 2023

"Should probably do it" is not exactly a step to reproduce, please next time provide proper steps.

@Jezurko
Copy link
Collaborator

Jezurko commented Dec 15, 2023

Just in case I didn't understand you properly - try it with #467 which is removing the core::StringLiteralAttr

@pgoodman
Copy link
Contributor Author

pgoodman commented Dec 16, 2023

Just in case I didn't understand you properly - try it with #467 which is removing the core::StringLiteralAttr

I will test this branch once it is merge to master. The issue I observed is in this MLIR function:

/// Lex a string literal.
///
///   string-literal ::= '"' [^"\n\f\v\r]* '"'
///
/// TODO: define escaping rules.
Token Lexer::lexString(const char *tokStart) {

Found here. I figured the issue was actually in the pretty printers, i.e. that when pretty printing string literals, an extra add slashes step would be needed, so that MLIR's string literal lexer would not complain.

@pgoodman
Copy link
Contributor Author

pgoodman commented Dec 16, 2023

const char * x = "\r"; should probably do it. Unfortunately MLIR's verifier doesn't catch this, so you need to round-trip and parse it.

@lkorenc, @Jezurko I've now confirmed that my suggested example is indeed enough to reproduce this issue. I have not yet tested against #467.

Again, the key is that the relevant error message (shown below) only manifests when parsing the human-readable MLIR, and is not caught by the MLIR verifier.

Example error:

loc("-":66:33): error: unknown escape in string literal

@Jezurko
Copy link
Collaborator

Jezurko commented Dec 17, 2023

@pgoodman sorry, my fault. I now reproduced the issue on master and can also confirm that it is fixed with the PR (because the LLVM/MLIR printer uses the hex codes instead of the "special" characters - and therefore it can also parse it)

@Jezurko Jezurko linked a pull request Dec 17, 2023 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants