Escaping bad tags instead of deleting them #145

P-E-Meunier · 2021-08-04T10:08:38Z

I don't know whether it would make sense from a security point of view, but in some cases users expect "pseudo-tags" like <email> to appear as escaped HTML, for example when giving examples of something.

Would it be possible to escape the < and > for bad tags rather than removing them completely?

notriddle · 2021-08-04T14:43:33Z

The problem is that, as far as I know, html5ever doesn't preserve the case of the tag names. You can't distinguish <email> from <EMAIL> after parsing, so you can't reproduce exactly what the user typed.

palant · 2023-02-23T14:49:19Z

In my case it’s blog comments. HTML is legit in Markdown, so the Markdown parser will leave such pseudo-tags unchanged. It would be great to convert them to text during sanitization.

I understand that things will change after HTML parsing: the “tag” will be lower-cased and whitespace will change. But that should still be preferable to outright dropping parts of the text.
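
To make the trade-off concrete, here is a minimal sketch of "escape instead of drop" using only Python's standard `html.parser`. It is not ammonia's API or implementation. The `ALLOWED_TAGS` set, the `EscapingSanitizer` class and the `sanitize_escaping` helper are hypothetical names made up for this illustration, and attributes on allowed tags are simply discarded to keep it short. Like html5ever, this parser hands over lower-cased tag names, so the escaped output shows `<email>` even if the user typed `<EMAIL>`.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist, for illustration only.
ALLOWED_TAGS = {"b", "i", "em", "strong"}


class EscapingSanitizer(HTMLParser):
    """Keep allowed tags; turn every other tag into escaped text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # `tag` is already lower-cased by the parser, so the original
        # spelling cannot be reproduced here.
        markup = f"<{tag}>"  # attributes are dropped in this sketch
        self.out.append(markup if tag in ALLOWED_TAGS else escape(markup))

    def handle_endtag(self, tag):
        markup = f"</{tag}>"
        self.out.append(markup if tag in ALLOWED_TAGS else escape(markup))

    def handle_data(self, data):
        # Text content is always escaped.
        self.out.append(escape(data))


def sanitize_escaping(text):
    parser = EscapingSanitizer()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


print(sanitize_escaping("Send it to <EMAIL> please"))
print(sanitize_escaping("<b>bold</b> text"))
```

The pseudo-tag survives as visible text, lower-cased, while the allowed `<b>` passes through as markup.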

phil-scholarcy · 2024-08-13T18:34:55Z

This would be a great enhancement. The only reason I'm still using the bleach package is that it has a strip parameter which can be set to False to escape tags rather than delete them. Deletion can be dangerous when mathematics is present, where a < symbol can be mistaken for the start of a tag, or when working with LLM pseudo-XML tags.
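
The math hazard can be reproduced with a small stand-in. The sketch below does not call bleach or ammonia; `TagStripper` and `strip_all_tags` are hypothetical helpers built on the standard-library `html.parser`, chosen only to show what "delete anything that parses as a tag" does to an inequality, next to plain escaping with `html.escape`.

```python
from html import escape
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Stand-in for a strip-mode sanitizer: drop every tag, keep the text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_data(self, data):
        self.out.append(escape(data))


def strip_all_tags(text):
    parser = TagStripper()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


formula = "if x<y and z>w then stop"

# "<y and z>" parses as a start tag named "y", so stripping removes it.
stripped = strip_all_tags(formula)

# Escaping keeps every character the user typed.
escaped = escape(formula)

print(stripped)
print(escaped)
```

With stripping, the whole comparison between `x` and `w` silently disappears. With escaping, the reader still sees the original inequality.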

messense mentioned this issue Aug 14, 2023

Feature goals compared to Bleach (/ full ammonia API)? messense/nh3#10

Open
