Entity bug #178

ArthaTi · 2017-12-29T13:41:33Z

That's my third issue in a row, sorry if that's annoying. I understand it's not easy to make such a library; I would help if I could...

Here's a simple script that reproduces the issue:

from pyquery import PyQuery as pq

p = pq("<span>&lt;foo&gt;&lt;bar&gt;</span>")
print(p("span").html(), p("span").text())
p = pq("<span><b>&lt;foo&gt;</b>&lt;bar&gt;</span>")
print(p("span").html(), p("span").text())

Output:

<foo><bar> <foo><bar>
<b>&lt;foo&gt;</b>&lt;bar&gt; <foo> <bar>

while it should be:

&lt;foo&gt;&lt;bar&gt; <foo><bar>
<b>&lt;foo&gt;</b>&lt;bar&gt; <foo><bar>

Basically, if there are entities at the beginning and end of the selected element, they get decoded, even in the html() output, where they shouldn't be... What could the reason be? I tried to track it down, but I really don't understand the code... Thanks.
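A minimal sketch of the likely mechanism, using the standard library's xml.etree.ElementTree as a stand-in for lxml (an assumption for illustration: both expose the same .text/.tail element model, but this is not pyquery's code). The parser stores decoded text in .text, while serializing an element escapes its text and tail again:

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml, for illustration only

# Case 1: the span has only text, no child elements.
first = ET.fromstring("<span>&lt;foo&gt;&lt;bar&gt;</span>")
# The parser decodes entities when it fills .text, so this is not valid inner HTML.
raw_text = first.text

# Case 2: the span has a child element.
second = ET.fromstring("<span><b>&lt;foo&gt;</b>&lt;bar&gt;</span>")
# Serializing the child re-escapes both its text ("<foo>") and its tail ("<bar>").
child_markup = ET.tostring(second[0], encoding="unicode")

# Plain text: all text and tail strings joined in document order, no separators added.
plain = "".join(second.itertext())

print(raw_text)      # <foo><bar>
print(child_markup)  # <b>&lt;foo&gt;</b>&lt;bar&gt;
print(plain)         # <foo><bar>
```

So whenever an html()-style method returns .text directly instead of serializing, the entities come back decoded.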


ArthaTi · 2017-12-29T13:43:38Z

Perhaps it's here:

if not children:
    return tag.text

tag.text holds the decoded (unescaped) text, while this function shouldn't return decoded text. Maybe just escape it before returning it?

I would fix it, but I don't see any function in the project for encoding/decoding entities... how should I do it? The only one I know of is in the built-in html module.
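One way to do what is suggested here, sketched with Python 3's built-in html module and xml.etree.ElementTree as a stand-in for lxml. The helper name inner_html is hypothetical; this is not the change made in #179 or #221. The leading text is escaped by hand with html.escape(..., quote=False) (quotes need no escaping in element text), and each child is serialized together with its tail:

```python
import html
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml, for illustration only


def inner_html(el):
    """Hypothetical helper: an element's inner markup with entities kept escaped."""
    # .text is already decoded, so escape &, < and > ourselves.
    parts = [html.escape(el.text or "", quote=False)]
    # ET.tostring writes each child plus its tail, escaping both.
    parts.extend(ET.tostring(child, encoding="unicode") for child in el)
    return "".join(parts)


print(inner_html(ET.fromstring("<span>&lt;foo&gt;&lt;bar&gt;</span>")))
# &lt;foo&gt;&lt;bar&gt;
print(inner_html(ET.fromstring("<span><b>&lt;foo&gt;</b>&lt;bar&gt;</span>")))
# <b>&lt;foo&gt;</b>&lt;bar&gt;
```

html.unescape is the inverse, so html.unescape(inner_html(...)) on the first example gives back <foo><bar>.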

gawel · 2017-12-29T14:33:04Z

I guess there's no such thing because lxml does that.
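If the serializer is meant to do the escaping, one option is to serialize a throwaway wrapper that holds the element's text and children, then slice off the wrapper tags. A minimal sketch, again with xml.etree.ElementTree standing in for lxml; the helper name and the "div" wrapper are hypothetical choices, not pyquery's API:

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml, for illustration only


def inner_html_via_serializer(el):
    """Hypothetical helper: let the serializer escape everything, then drop the wrapper."""
    shell = ET.Element("div")  # throwaway wrapper with no attributes and no tail
    shell.text = el.text
    # ElementTree elements have no parent pointer, so this does not detach them from el.
    shell.extend(list(el))
    # short_empty_elements=False keeps "<div></div>" instead of "<div />" when empty.
    markup = ET.tostring(shell, encoding="unicode", short_empty_elements=False)
    return markup[len("<div>"):-len("</div>")]


print(inner_html_via_serializer(ET.fromstring("<span>&lt;foo&gt;&lt;bar&gt;</span>")))
# &lt;foo&gt;&lt;bar&gt;
```

With lxml itself the children would have to be deep-copied first, because appending an lxml element to a new parent moves it out of its original tree.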

ArthaTi mentioned this issue Dec 29, 2017

Fix #178 #179

Closed

jcushman mentioned this issue Aug 2, 2021

Escape entities in html() output #221

Merged
