Understand metacharacters and escape character #40

Open
StephenArk30 opened this issue Jul 9, 2020 · 4 comments

Comments

@StephenArk30

StephenArk30 commented Jul 9, 2020

In most languages, the escape character is '\'. Sequences that start with an escape character have a special meaning, e.g. '\n' and '\t'.

Metacharacters are written with the escape character as well, but they usually match a whole group of characters, e.g. '\s' matches any whitespace character such as '\t' or '\n'.

So, when we are matching single-line comments in C, we need to use:

\/\/[^\n]*

instead of

\/\/[^\\n]*

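which doesn't exclude a line break at all: in most regex engines the class [^\\n] excludes a literal backslash and the letter 'n', i.e. the characters of the "\n" string.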

@kokke
Owner

kokke commented Feb 15, 2021

Hi @StephenArk30, and thanks for your comment.

I'm not really sure what you mean. Could you please have a look at https://github.com/kokke/tiny-regex-c/blob/master/tests/test1.c
... and write a few test cases so I'm absolutely sure what we're talking about.

I'm trying to mimic the Python implementation, but the code is in C, which just handles escapes like that. Or maybe I'm misunderstanding you?

@marler8997

marler8997 commented Mar 8, 2021

@kokke, what he's asking for is this:

  { OK, "\\n", "\n",  (char*) 1 },
  { OK, "\\t", "\t",  (char*) 1 },

@StephenArk30
Author

StephenArk30 commented Apr 19, 2021

Sorry for the late reply. This is just a note for beginners who think they're "smart", like me. I'm not asking for anything.

The case was that I was trying to match single-line comments in C, like "// this is a comment", and I used

\/\/[^\\n]*

because I thought I had to keep the '\'. It didn't work, so I opened this issue trying to match "\n". After some research, though, I realized it should be

\/\/[^\n]*

instead. The "metacharacters" mentioned in the issue are sequences like '\s', which match many different characters; '\n' stands for exactly one character, so it should not count as a metacharacter.

This might be a common beginner mistake, so I'm leaving it here. Feel free to close it if needed.

@marler8997

I think this would be a common mistake, for two reasons. First, most regex implementations support standard escape sequences such as \n and \t. Second, putting regex escapes into C string literals is confusing, because the literal itself has to escape the regex escape character, \. Note that tiny-regex-c could support these escape sequences as well with a few lines of changes, so it's up to @kokke whether they should be supported. The Python implementation supports them, so that's one thing to consider.


3 participants