Support HTML DEL and INS elements. #27

Open
Zegnat opened this issue Apr 7, 2013 · 0 comments

I noticed this while using the Pismo-powered ‘entry text extraction’ on Feedbin.

>> Pismo['http://hsivonen.iki.fi/accept-charset/'].lede
=> "Accept-Charset Is No More. Now that Firefox 10 has been released, the Accept-Charset HTTP header. During the Firefox 4 development cycle, I noticed that IE and Safari were not sending the Accept-Charset HTTP header in their HTTP requests. This meant that the Web had to work even without browser sending that header." 

The first sentence of the body text, as returned by Pismo:

Now that Firefox 10 has been released, the Accept-Charset HTTP header.

Comes from the following HTML:

<p>Now that Firefox 10 has been released, <del>none of the major browsers send</del> <ins>only Chrome sends</ins> the <code>Accept-Charset</code> HTTP header.</p>

If anything, I would have expected Pismo to drop the DEL elements but keep the content of the INS elements, like so:

Now that Firefox 10 has been released, only Chrome sends the Accept-Charset HTTP header.

Even html_body does not return these tags. This means possibly important parts of a document can go missing. See this return value, edited to show only the first paragraph:

>> Pismo['http://hsivonen.iki.fi/accept-charset/'].html_body
 => "Accept-Charset Is No More<p>Now that Firefox 10 has been released,   the <code>Accept-Charset</code> HTTP header.</p>\n\n"

Please support the DEL and INS elements by:

  1. Dropping each DEL element and its content from lede and body, but keeping the content of INS in both.
  2. Keeping the DEL and INS elements and their content in html_body.
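
To illustrate the first point, here is a minimal sketch in plain Ruby. It is not Pismo's code: the helper name `text_with_edits_applied` is made up for this example, and it uses regular expressions only to stay dependency-free. A real fix would presumably go through whatever HTML parser Pismo already uses, and this version does not handle nested DEL elements.

```ruby
# Hypothetical helper (not part of Pismo): turn an HTML fragment into plain
# text, dropping <del> elements together with their content and keeping the
# content of <ins> elements.
def text_with_edits_applied(html)
  html
    .gsub(%r{<del\b[^>]*>.*?</del>}mi, "") # remove deletions, content included
    .gsub(%r{</?[a-z][^>]*>}i, "")         # strip remaining tags, keep their text
    .gsub(/\s+/, " ")                      # collapse the whitespace left behind
    .strip
end

html = '<p>Now that Firefox 10 has been released, ' \
       '<del>none of the major browsers send</del> ' \
       '<ins>only Chrome sends</ins> the <code>Accept-Charset</code> ' \
       'HTTP header.</p>'

puts text_with_edits_applied(html)
```

For the paragraph quoted above, this prints the sentence I expected from lede. For html_body nothing needs to be stripped at all: DEL and INS would simply be kept alongside the other allowed elements.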