don't show base64 data to user #92

Open
kmike opened this issue Aug 15, 2016 · 12 comments

Comments

@kmike

kmike commented Aug 15, 2016

Hey,

I started to use base64-encoded HAR content recently - it is not possible to guarantee that content can be passed in JSON otherwise, even for content with HTML or JSON MIME types. HTML can use an encoding other than UTF-8, and even data sent with an application/json content-type can be binary if the server wants.

But this switch to 'base64 by default' makes things harder for harviewer: e.g. for HTML, both the 'Response' and 'HTML' tabs display base64-encoded data. 'Highlighted' gets a decoded version, but for large HTML pages it is very slow. There is a similar issue for JSON files: the 'Response' tab displays confusing base64-encoded data. The 'Response' tab for images also shows the base64 version of the binary data.

I think it is better to either remove tabs with base64-encoded data, or to try decoding it more aggressively. I'm not sure what the use case is for showing base64 to the user; the user may think it is a bug (which I think has already happened for Splash). Also, there is no visual distinction between base64-encoded and non-base64-encoded data, so e.g. a genuinely base64 response will look the same as an HTML response which the HAR-generating software encoded to base64 in order to store it without data loss.
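To illustrate why base64 is needed for lossless storage, here is a minimal Python sketch (Python because Splash, the generator of this HAR, is written in Python). It assumes a made-up HTML body served as windows-1251: those bytes are not valid UTF-8, so they cannot be put into a JSON string untouched, while base64 plus the HAR 1.2 `encoding` field keeps them exactly. The field names follow the HAR 1.2 `content` object; the sample body is hypothetical.

```python
import base64
import json

# Hypothetical response body: HTML served as windows-1251, not UTF-8.
body = '<title>Привет</title>'.encode('windows-1251')

# These bytes are not valid UTF-8, so they can't go into a JSON string as-is.
try:
    body.decode('utf-8')
except UnicodeDecodeError:
    print('body is not valid UTF-8')

# HAR 1.2 lets the generator base64-encode the body and record that fact.
content = {
    'size': len(body),
    'mimeType': 'text/html; charset=windows-1251',
    'text': base64.b64encode(body).decode('ascii'),
    'encoding': 'base64',
}
print(json.dumps(content))

# A viewer that shows content['text'] as-is displays base64, not HTML;
# decoding it recovers the original bytes exactly.
restored = base64.b64decode(content['text'])
assert restored == body
```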

@gitgrimbo
Collaborator

Hi, do you have an example HAR?

@kmike
Author

kmike commented Aug 15, 2016

Yep! habr.ru.har.zip

@gitgrimbo
Collaborator

Thanks. How was this HAR generated? I don't recognise the following browser ...

    "browser": {
        "comment": "PyQt 5.5.1, Qt 5.5.1",
        "name": "QWebKit",
        "version": "538.1"
    },

or User Agent header:

Mozilla/5.0 (Macintosh; Intel Mac OS X) AppleWebKit/538.1 (KHTML, like Gecko) server.py Safari/538.1

@kmike
Author

kmike commented Aug 16, 2016

@gitgrimbo it was generated using Splash. Splash uses HAR as a data export format; it also embeds harviewer in a script debugging page.

@kmike
Author

kmike commented Aug 16, 2016

[Screenshot: ui]

@gitgrimbo
Collaborator

gitgrimbo commented Aug 16, 2016

Thanks. I see what you mean. Pasting image here for reference.

First row shows a base64-encoded HTML response in the Response tab. Second row shows a similar HTML response, but decoded in the Highlighted tab.

[Screenshot: harviewer-92-response-and-highlighted-tabs]


To discuss a couple of your points:

  • The Response tab currently shows raw content. I'm not sure this default behaviour should be changed in case the user wants to see this raw content, but perhaps a Decoded Response tab could be added, or a Decode button placed in the Response tab.
  • Yes, the Syntax Highlighting can be slow using the current implementation (Syntax Highlighter 3.0.83). When I upgraded to 3.0.83 I considered using a different implementation, but I played safe. If I get time I'll take a look at some alternatives. On the plus side, HAR Viewer didn't originally have a separate Highlighted tab, and so the user had to pay the cost of slow highlighting all the time. Now at least you have the option not to click on the Highlighted tab to avoid the slowdown.

@kmike
Author

kmike commented Aug 16, 2016

Thanks for looking at it!

Regarding the Response tab: currently it doesn't show the raw content of the webpage or the raw response content; it shows the data stored in the HAR JSON as-is. This is not the same as the raw response content, because the HAR 'encoding' field is not handled (see http://www.softwareishard.com/blog/har-12-spec/#content). This is useful for debugging HAR files, but not for debugging received responses. The base64 encoding is a technical detail of how the data is stored in HAR, not something specific to a website. That's why I think the Response tab should show the response content; currently it doesn't.
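A rough sketch of the proposed rule: before displaying anything, recover the response body from the HAR `content` object, decoding `text` whenever `encoding` is present. `har_content_bytes` is a hypothetical helper, not part of harviewer (which is JavaScript); only `text`, `encoding` and the `"base64"` value come from the HAR 1.2 spec. The sketch assumes, per that spec, that a `text` without `encoding` has already been decoded, so it is simply re-encoded as UTF-8 here.

```python
import base64
from typing import Optional


def har_content_bytes(content: dict) -> Optional[bytes]:
    """Hypothetical helper: return the response body stored in a HAR `content` object."""
    text = content.get('text')
    if text is None:
        return None  # the generator did not capture the body
    encoding = content.get('encoding')
    if encoding is None:
        # No `encoding`: `text` is already decoded text inside the HAR file.
        return text.encode('utf-8')
    if encoding == 'base64':
        # The only encoding the HAR 1.2 spec mentions.
        return base64.b64decode(text)
    # Unknown encodings are reported rather than guessed at.
    raise ValueError(f'unsupported HAR content encoding: {encoding!r}')
```

Raising on an unknown encoding lets the viewer fall back to showing the stored text as-is instead of silently displaying garbage.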

@gitgrimbo
Collaborator

Yeah I think you're right. So is it true that whenever the encoding field of content is present in a HAR, it should always be decoded (as the encoding was only added by the HAR-creator, and had nothing to do with the original response)?

I'm trying to think of any exception to that rule.

@kmike
Author

kmike commented Aug 16, 2016

Yeah, I think it is good to always decode text if encoding is present. The exception could be an unknown encoding (only base64 is mentioned in the standard). Another tricky case is binary (or any non-UTF-8) data; it is not clear how to show it in decoded form.
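One way to handle the binary/non-UTF-8 case, sketched in Python under a few assumptions: take the charset from the `charset=` parameter of `mimeType` (defaulting to UTF-8), treat only a hypothetical list of text-like media types as text, and return `None` when decoding fails so the UI can show something like "binary content" instead. It ignores charsets declared inside the HTML itself (e.g. a `<meta charset>` tag). `charset_from_mime`, `display_text` and `TEXT_TYPES` are illustrative names only.

```python
from typing import Optional

# Hypothetical list of media type prefixes treated as text.
TEXT_TYPES = ('text/', 'application/json', 'application/javascript', 'application/xml')


def charset_from_mime(mime_type: str, default: str = 'utf-8') -> str:
    """Extract the charset parameter, e.g. 'text/html; charset=windows-1251' -> 'windows-1251'."""
    for param in mime_type.split(';')[1:]:
        name, _, value = param.strip().partition('=')
        if name.strip().lower() == 'charset' and value:
            return value.strip().strip('"\'')
    return default


def display_text(body: bytes, mime_type: str) -> Optional[str]:
    """Return decoded text, or None if the body should be treated as binary."""
    media = mime_type.split(';')[0].strip().lower()
    if not media.startswith(TEXT_TYPES):
        return None  # images, fonts, etc.
    try:
        return body.decode(charset_from_mime(mime_type))
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name, or "text" that is really binary
        # (e.g. application/json carrying arbitrary bytes).
        return None
```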

@gitgrimbo
Collaborator

Hi @kmike, I've uploaded this branch for you to try, http://gitgrimbo.github.io/harviewer/issue-92/.

It simply tries to decode every HAR entry for the Response tab.

It does the right thing for the first two HTML entries in your example HAR. But the third seems to have charset issues; the title displays as follows:

[Screenshot: harviewer-92-char-encoding-title]

And now images and other binary files are also shown in their decoded raw state. I'm not sure if this is a good or bad thing to be honest.

If you could take a look I'd appreciate it, and maybe think of any reasons why every entry should not be decoded in this way as I'm not sure I've thought about all the possibilities here.

@gitgrimbo
Collaborator

Using the tips from here, https://developer.mozilla.org/en/docs/Web/API/WindowBase64/Base64_encoding_and_decoding, I think the issue is a UTF-8/UTF-16 thing.

Following the tips the text now renders correctly.

[Screenshot: harviewer-92-char-encoding-title-2]
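For context on the UTF-8/UTF-16 issue: JavaScript's `atob()` returns a "binary string" with one UTF-16 code unit per decoded byte, so each multi-byte UTF-8 character comes out as several Latin-1-looking characters. Python's `latin-1` codec maps bytes to code points the same way, so this sketch reproduces the garbled title and the fix (decode the bytes as UTF-8). In the browser the equivalent fix is to turn the `atob()` output into a `Uint8Array` and pass it to `TextDecoder`. The sample title is made up.

```python
import base64

# A UTF-8 HTML title, base64-encoded the way a HAR generator would store it.
encoded = base64.b64encode('<title>Хабр</title>'.encode('utf-8')).decode('ascii')
raw = base64.b64decode(encoded)

# Same mapping as atob(): one character per byte (code points 0-255).
one_char_per_byte = raw.decode('latin-1')

# Decoding the bytes as UTF-8 gives the intended text.
correct = raw.decode('utf-8')

# The garbled string can still be repaired: turn the characters back into
# bytes, then decode as UTF-8 (what the TextDecoder approach does in JS).
repaired = one_char_per_byte.encode('latin-1').decode('utf-8')

print(one_char_per_byte)  # mojibake
print(correct)
```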

@hydrargyrum

Images should probably be encoded in base64 too. Is there a way to view the decoded base64 with the appropriate type? i.e. if it's a base64-ed image, let the user view the image, if it's base64-ed HTML, let the user view the resulting (decoded) page.
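One possible shape for a type-aware view, as a Python sketch: images reuse the stored base64 directly in a `data:` URI (no decoding needed), HTML is decoded and shown in a sandboxed `<iframe srcdoc>`, and anything else falls back to the stored text, escaped. `preview_fragment` is hypothetical, and it assumes UTF-8 for base64-encoded HTML, which (as the charset problem earlier in the thread shows) does not always hold.

```python
import base64
import html


def preview_fragment(content: dict) -> str:
    """Hypothetical: build an HTML fragment previewing a HAR `content` object."""
    mime = content.get('mimeType', '')
    media = mime.split(';')[0].strip().lower()
    text = content.get('text', '')
    is_b64 = content.get('encoding') == 'base64'

    if media.startswith('image/'):
        # Base64 text can go straight into a data: URI; plain text (e.g. SVG) is encoded first.
        data = text if is_b64 else base64.b64encode(text.encode('utf-8')).decode('ascii')
        return f'<img src="data:{html.escape(media)};base64,{data}">'

    if media == 'text/html':
        # Assumes UTF-8; undecodable bytes become U+FFFD instead of raising.
        page = base64.b64decode(text).decode('utf-8', errors='replace') if is_b64 else text
        # An empty sandbox attribute blocks scripts, so the recorded page can't run code.
        return f'<iframe sandbox srcdoc="{html.escape(page, quote=True)}"></iframe>'

    # Fallback: show the stored text as-is, escaped.
    return f'<pre>{html.escape(text)}</pre>'
```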

3 participants