I don't know whether it would make sense from a security point of view, but in some cases users expect "pseudo-tags" like <email> to appear as escaped HTML, for example when giving examples of something.
Would it be possible to escape the < and > for bad tags rather than removing them completely?
The problem is that, as far as I know, html5ever doesn't preserve the case of the tag names. You can't distinguish <email> from <EMAIL> after parsing, so you can't reproduce exactly what the user typed.
In my case it’s blog comments. HTML is legit in Markdown, so the Markdown parser will leave such pseudo-tags unchanged. It would be great to convert them to text during sanitization.
I understand that things will change after HTML parsing. The “tag” will be lower-cased, whitespace will change. But that should still be preferable compared to outright dropping parts of the text.
This would be a great enhancement. The only reason I'm still using the bleach package is that it has a strip parameter which can be set to False to escape tags rather than delete them. Deletion can be dangerous when mathematics is present, where a < symbol can be mistaken for the start of a tag, or when working with LLM pseudo-XML tags.
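To make the request concrete, here is a minimal sketch of "escape instead of strip" using only Python's standard-library `html.parser`. It is not the API of this library or of bleach; `ALLOWED_TAGS`, `EscapingSanitizer`, and `sanitize` are hypothetical names chosen for illustration, and attributes on allowed tags are simply dropped to keep the example short.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-list, for illustration only.
ALLOWED_TAGS = {"b", "i", "em", "strong", "code"}


class EscapingSanitizer(HTMLParser):
    """Keeps allowed tags and escapes everything else instead of dropping it."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            # Attributes are discarded in this sketch.
            self.out.append(f"<{tag}>")
        else:
            # get_starttag_text() returns the tag as it appeared in the input,
            # so the escaped text keeps the author's original spelling.
            self.out.append(escape(self.get_starttag_text()))

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")
        else:
            # The parser reports end-tag names in lower case.
            self.out.append(escape(f"</{tag}>"))

    def handle_data(self, data):
        # Plain text is always escaped.
        self.out.append(escape(data))


def sanitize(text):
    parser = EscapingSanitizer()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


print(sanitize("Write to <email> for <b>help</b>"))
```

With this approach a pseudo-tag such as `<email>` survives as visible text, while allowed markup passes through. A tree-building parser such as html5ever may not expose the raw tag text the way this tokenizer-level sketch does, so the escaped output there could be lower-cased or otherwise normalized, as discussed above.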