Bookmark Base64 encoding #182

Open
jjon opened this issue Feb 21, 2017 · 2 comments

jjon commented Feb 21, 2017

Here's a corner case bug, but I'm not sure where, exactly, it lies.

When integrating an Exhibit into a WordPress site, I discovered that the URL generated by the "bookmark" function merely opened the page with the default set of data items, ignoring state. Tom Woodward, on the simile-widgets list, very helpfully pointed out that the Base64 payload of the generated URL was corrupted.

Exhibit.History.getState() retrieves an object with a title property. When the dataset has been filtered, the title property is a string composed of the page title followed by a subtitle generated by Exhibit.History.pushState():

title += " {" + subtitle + "}"; (line 235 in history.js)

In WordPress, that title string is a concatenation of the page template's "slug" and the site name. By some off-stage PHP chicanery, these are joined with a separator that is an en dash (\u2013). It is this character (as well as the em dash, \u2014) that causes Bookmark.generateBookmarkHash(state) to produce a corrupted Base64 string. When a browser tries to interpret the URL so generated, it simply ignores the corrupted payload and loads the default page and dataset.

Working at the browser console, I observe the following:

state = Exhibit.History.getState()
Object {normalized: true, title: "Collection Exhibit – Ocean Acidification Curriculum Collection {Text search foo}", url: "http://www.oacurriculumcollection.org/collection-exhibit/", hash: ".//collection-exhibit/?&_suid=148763896047903145240874606472", data: Object…}

Note the en dash in the title. Then if we generate the Base64 string for the URL and decode it we get gibberish:

Base64.decode(Exhibit.Bookmark.generateBookmarkHash(state))
"{"normalized":true,"title":"Collection Exhibit$È�ØÙX[��XÚY�Y�XØ]�[Û��Ý\��XÝ[�[H�ÛÛ��XÝ�[Û����\����������ËÝÝÝË�ØXÝ\��XÝ[�[XÛÛ��XÝ�[Û��Ü�ËØÛÛ��XÝ�[Û�Y^��X�]�È����\Ú�����ËØÛÛ��XÝ�[Û�Y^��X�]�ÏÉ�ÜÝZY�LM�
Í�Í�LÍMMM��NL�Ì�NLÌL�M��È����]�H��È�ÛÛ\�Û�[��È��ßK��Ý�]�H��N_K��Y����M�
Í�Í�LÍMMM��NL�Ì�NLÌL�M��È���Û�X[�\����������ËÝÝÝË�ØXÝ\��XÝ[�[XÛÛ��XÝ�[Û��Ü�ËØÛÛ��XÝ�[Û�Y^��X�]�È����\Ú�Y�\����������ËÝÝÝË�ØXÝ\��XÝ[�[XÛÛ��XÝ�[Û��Ü�ËØÛÛ��XÝ�[Û�Y^��X�]�ËØÛÛ��XÝ�[Û�Y^��X�]�ÏÉ�ÜÝZY�LM�
Í�Í�LÍMMM��NL�Ì�NLÌL�M��È��"

If we then alter the title property of the state object thus:

state.title = state.title.replace("\u2013", "--")
"Collection Exhibit -- Ocean Acidification Curriculum Collection {Text search foo}"

Then do encode/decode as before:

Base64.decode(Exhibit.Bookmark.generateBookmarkHash(state))
"{"normalized":true,"title":"Collection Exhibit -- Ocean Acidification Curriculum Collection {Text search foo}","url":"http://www.oacurriculumcollection.org/collection-exhibit/","hash":".//collection-exhibit/?&_suid=148763731640905208773945768443","data":{"components":{"facet-text--default-0":{"type":"facet","state":{"text":"foo"}}},"state":61,"lengthy":true},"id":"148763731640905208773945768443","cleanUrl":"http://www.oacurriculumcollection.org/collection-exhibit/","hashedUrl":"http://www.oacurriculumcollection.org/collection-exhibit//collection-exhibit/?&_suid=148763731640905208773945768443"}"

We get the uncorrupted JSON string we need for the bookmark URL. My work-around for this is crude but effective: I simply execute document.title = document.title.replace(/\u2013/, "--"); in an onLoad function, and all is well. But I found it strange that ONLY \u2013 and \u2014 corrupt the JSON string returned by Exhibit.Bookmark.generateBookmarkHash(state). So far as I can tell, literally ANY other character will work, whether ASCII or not. Is the Base64 function at fault?
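One caveat about the one-line work-around: a regular expression without the g flag replaces only the first match, and it does not touch the em dash. A minimal sketch of a slightly more thorough version follows. The helper name asciiDashes is hypothetical and is not part of Exhibit or WordPress.

```javascript
// Hypothetical helper (not part of Exhibit): replace every en dash (U+2013)
// and em dash (U+2014) in a title with an ASCII double hyphen.
function asciiDashes(title) {
    // The character class covers both dashes; the g flag replaces all matches.
    return title.replace(/[\u2013\u2014]/g, "--");
}

// In the page's onLoad function this would be used roughly as:
//   document.title = asciiDashes(document.title);
const sample = "Collection Exhibit \u2013 Ocean Acidification \u2014 Curriculum";
console.log(asciiDashes(sample));
// prints "Collection Exhibit -- Ocean Acidification -- Curriculum"
```

This only hides the two characters seen here; any other character outside the 8-bit range in a title would presumably need the same treatment.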

Anyway, not exactly crucial, inasmuch as there's an easy fix, but puzzling nonetheless.

jjon commented Feb 24, 2017

So, yes. After a little further experimentation it becomes clear that the Base64 methods are giving incorrect results for the em dash and en dash. Using those methods (from http://api.simile-widgets.org/exhibit/STABLE/lib/base64.js) at the Chrome console, I get the following results:

Base64.encode('—') // em dash
"A=="
Base64.encode('–') // en dash
"w=="
Base64.encode('-') // hyphen
"LQ=="

Whereas, using the Python base64 module, I get this:

>>> base64.b64encode('—') # em dash
'4oCU'
>>> base64.b64encode('–') # en dash
'4oCT'
>>> base64.b64encode('-') # hyphen
'LQ=='

Unfortunately, I don't know nearly enough about the bitwise manipulation of strings to offer a solution.
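For anyone following along, here is a rough sketch of the bit manipulation involved. It is not the code from base64.js; encodeBytes, naiveEncode and utf8Encode are hypothetical names, and TextEncoder is assumed to be available (modern browsers and Node.js). The naive variant feeds each UTF-16 code unit to the encoder as if it were one byte, which happens to reproduce the console results above; the UTF-8 variant converts the string to real bytes first and matches the Python output.

```javascript
const ALPHABET =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode an array of numbers as Base64, assuming each one is a byte (0-255).
function encodeBytes(bytes) {
    let out = "";
    for (let i = 0; i < bytes.length; i += 3) {
        const has1 = i + 1 < bytes.length;
        const has2 = i + 2 < bytes.length;
        const b0 = bytes[i];
        const b1 = has1 ? bytes[i + 1] : 0;
        const b2 = has2 ? bytes[i + 2] : 0;
        // Three 8-bit values become four 6-bit alphabet indexes.
        out += ALPHABET.charAt(b0 >> 2);
        out += ALPHABET.charAt(((b0 & 3) << 4) | (b1 >> 4));
        out += has1 ? ALPHABET.charAt(((b1 & 15) << 2) | (b2 >> 6)) : "=";
        out += has2 ? ALPHABET.charAt(b2 & 63) : "=";
    }
    return out;
}

// Naive: one "byte" per UTF-16 code unit. For U+2014 the value is 8212, so
// the first index (8212 >> 2) falls outside the alphabet and is dropped.
function naiveEncode(str) {
    const units = [];
    for (let i = 0; i < str.length; i++) {
        units.push(str.charCodeAt(i));
    }
    return encodeBytes(units);
}

// UTF-8 aware: convert the string to real bytes before encoding.
function utf8Encode(str) {
    return encodeBytes(Array.from(new TextEncoder().encode(str)));
}

console.log(naiveEncode("\u2014")); // "A=="   (em dash, corrupted)
console.log(naiveEncode("\u2013")); // "w=="   (en dash, corrupted)
console.log(naiveEncode("-"));      // "LQ=="  (hyphen)
console.log(utf8Encode("\u2014"));  // "4oCU"  (em dash as bytes E2 80 94)
console.log(utf8Encode("\u2013"));  // "4oCT"  (en dash as bytes E2 80 93)
```

That the naive variant yields the same strings as the console session suggests, but does not prove, that the library's encoder makes the same one-character-one-byte assumption.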

j

jjon commented Apr 10, 2017

Hmm. I guess nobody wanted to embarrass me by pointing out that Base64 is for encoding 8-bit bytes, and the en dash and em dash have code points well above 255! So there's nothing at all wrong with the Base64 methods. My problem is thus not a bug in Exhibit; however, it does seem that Exhibit.History.init should be armored against this sort of thing. If Exhibit.Bookmark.generateBookmarkHash is going to return Base64, then the title property of the state object should be sanitized somewhere along the line.
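One possible shape for that armoring, offered only as a sketch: since the decoded payload shown earlier is JSON, every character outside the ASCII range could be written as a JSON \uXXXX escape before the string reaches the Base64 encoder. The result is pure ASCII, and JSON.parse restores the original characters on the way back. The function name toAsciiJson is hypothetical, and where such a step would hook into Exhibit is an assumption, not something taken from its source.

```javascript
// Hypothetical helper: serialize a value to JSON containing only ASCII, by
// turning each non-ASCII UTF-16 code unit into a \uXXXX escape sequence.
function toAsciiJson(value) {
    return JSON.stringify(value).replace(/[\u0080-\uffff]/g, function (ch) {
        // Pad the hex code to four digits, e.g. U+2013 becomes \u2013.
        return "\\u" + ("0000" + ch.charCodeAt(0).toString(16)).slice(-4);
    });
}

const state = { title: "Collection Exhibit \u2013 Ocean" };
const ascii = toAsciiJson(state);

console.log(ascii);
// prints {"title":"Collection Exhibit \u2013 Ocean"} with a literal backslash

// Parsing the escaped JSON gives back the original title, en dash included.
console.log(JSON.parse(ascii).title === state.title); // true
```

Unlike replacing dashes in document.title, this leaves the visible title alone and covers any character, at the cost of a slightly longer payload.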
