Escaping bad tags instead of deleting them #145

Open
P-E-Meunier opened this issue Aug 4, 2021 · 3 comments
Comments

@P-E-Meunier

I don't know whether it would make sense from a security point of view, but in some cases users expect "pseudo-tags" like <email> to appear as escaped HTML, for example when giving examples of something.

Would it be possible to escape the < and > for bad tags rather than removing them completely?
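
To make the request concrete, here is a minimal sketch of the two behaviours, written with Python's standard `html.parser` rather than this library's API. The names `ALLOWED_TAGS`, `ToySanitizer` and `sanitize` are hypothetical, attributes are dropped for brevity, and this is an illustration only, not a security-reviewed sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-list, chosen only for this illustration.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p"}


class ToySanitizer(HTMLParser):
    """Keeps allowed tags (without attributes); strips or escapes the rest."""

    def __init__(self, escape_bad_tags):
        super().__init__(convert_charrefs=True)
        self.escape_bad_tags = escape_bad_tags
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")
        elif self.escape_bad_tags:
            # Emit the disallowed tag as visible text instead of dropping it.
            self.out.append(escape(f"<{tag}>"))

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")
        elif self.escape_bad_tags:
            self.out.append(escape(f"</{tag}>"))

    def handle_data(self, data):
        # Text content is always escaped.
        self.out.append(escape(data))


def sanitize(text, escape_bad_tags):
    parser = ToySanitizer(escape_bad_tags)
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


source = "Write to <email> for <b>help</b>"
print(sanitize(source, escape_bad_tags=False))  # Write to  for <b>help</b>
print(sanitize(source, escape_bad_tags=True))   # Write to &lt;email&gt; for <b>help</b>
```

In the first output the pseudo-tag silently disappears; in the second it survives as text that a browser would render as `<email>`.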

@notriddle
Member

The problem is that, as far as I know, html5ever doesn't preserve the case of the tag names. You can't distinguish <email> from <EMAIL> after parsing, so you can't reproduce exactly what the user typed.
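
The same kind of information loss can be seen in Python's standard `html.parser`, which also reports tag names in lower case. This is only an analogy and not html5ever itself; `TagCollector` and `start_tags` are hypothetical names used for the sketch.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the name of every start tag, as reported by the parser."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # `tag` arrives already converted to lower case.
        self.tags.append(tag)


def start_tags(text):
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags


print(start_tags("<email>"))  # ['email']
print(start_tags("<EMAIL>"))  # ['email']
```

Once only the parsed name is available, both inputs look identical, so an escaped version can be rebuilt only in its lower-cased form.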

@palant

palant commented Feb 23, 2023

In my case it’s blog comments. HTML is legit in Markdown, so the Markdown parser will leave such pseudo-tags unchanged. It would be great to convert them to text during sanitization.

I understand that things will change after HTML parsing. The “tag” will be lower-cased, whitespace will change. But that should still be preferable to outright dropping parts of the text.

@phil-scholarcy

This would be a great enhancement. The only reason I'm still using the bleach package is that it has a strip parameter which can be set to False to escape tags rather than delete them. Deletion can be dangerous when mathematics is present, where a < symbol can be mistaken for a tag start, or when working with LLM pseudo-XML tags.
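
The mathematics case can be illustrated with a short sketch, again using Python's standard `html.parser` as a stand-in and not bleach or this library. `TextOnly` and `to_text` are hypothetical names, every tag is treated as disallowed, and end tags are ignored to keep the example small.

```python
from html import escape
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Treats every tag as disallowed; either drops it or keeps it as text."""

    def __init__(self, keep_as_text):
        super().__init__(convert_charrefs=True)
        self.keep_as_text = keep_as_text
        self.out = []

    def handle_starttag(self, tag, attrs):
        if self.keep_as_text:
            # get_starttag_text() returns the start tag as written in the input.
            self.out.append(escape(self.get_starttag_text()))

    def handle_data(self, data):
        self.out.append(escape(data))


def to_text(source, keep_as_text):
    parser = TextOnly(keep_as_text)
    parser.feed(source)
    parser.close()
    return "".join(parser.out)


formula = "x <y and y> z"
print(to_text(formula, keep_as_text=False))  # x  z
print(to_text(formula, keep_as_text=True))   # x &lt;y and y&gt; z
```

The parser reads `<y and y>` as a start tag, so deleting it changes the meaning of the formula, while escaping keeps the inequality readable.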
