Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Commands JSON content mangled #323

Open
gagern opened this issue Feb 19, 2023 · 3 comments
Open

Commands JSON content mangled #323

gagern opened this issue Feb 19, 2023 · 3 comments

Comments

@gagern
Copy link

gagern commented Feb 19, 2023

I've been doing large-scale drawing commands with several thousands of move-to and line-to commands. I'm regularly observing problems where the notebook UI doesn't reflect these draws, and the developer tools complain about invalid JSON.

Steps to reproduce and debug:

  1. Launch new notebook.
  2. Open developer tools and start capturing network traffic.
  3. Re-load browser page just to make sure we get the web socket connection from the start.
  4. Enable pause on exception in developer tools
  5. from ipycanvas import Canvas, hold_canvas
    import ipycanvas.utils
    import ipycanvas.canvas
    import random
  6. canvas = Canvas(width=100, height=100)
  7. rnd = random.Random(12345)
    with hold_canvas():
        for _ in range(2**16):
            canvas.begin_path()
            canvas.move_to(rnd.uniform(0, 100), rnd.uniform(0, 100))
            canvas.line_to(rnd.uniform(0, 100), rnd.uniform(0, 100))
            canvas.stroke()
        commands = ipycanvas.canvas._CANVAS_MANAGER._commands_cache
        metadata, buffer = ipycanvas.utils.commands_to_buffer(commands)
  8. Check developer tools error console for indications of a JSON.parse error.
  9. with open("/tmp/buf.json", "w") as f:
        f.write(bytes(buffer).decode("utf-8").replace("[16, [], 0], [14", "[16, [], 0],\n[14"))
  10. Find the huge (several megabytes) web socket frame in developer tools, save that (which for me meant copying the content as Base64 and then decoding the copied content), strip everything before the JSON content, introduce line breaks there as well for easier processing.
  11. Run a diff between the buf.json above and the intercepted web socket traffic.

Expected behavior:
No errors reported, JSON syntax fine, draw commands get executed.

Actual error:
Exception raised in the code below:

https://github.com/martinRenou/ipycanvas/blob/d9ee0a16cf0011978b5a5eedb165dbca564eb8c4/src/widget.ts#L155-L157

SyntaxError: Unexpected non-whitespace character after JSON at position 1096865

On different snippets of code I've seen other exceptions as well, e.g. indicating that the parser expected ] or , instead of a digit.

Further investigation:

The content I constructed from within Python and the content I captured on the web socket both have equal length. But the content doesn't match. At first I had assumed a reordering of content, but on closer inspection it looks like some content gets sent twice, while other doesn't get sent at all.

For my one test run the first line of difference was this:

--- buf.json
+++ ws.json
 ⋮
 [14, [], 0], [19, [35.176595290003085, 16.01226606715308], 0], [20, [25.897973500979454, 98.51700805701621], 0], [16, [], 0],
+[14, [], 0], [19, [21.970374712094465, 30.82521070114561], 0], [20, [31.038972969752663,0, [78.90780014104607, 96.29247878018113], 0], [16, [], 0],
+[14, [], 0], [19, [97.26632347104481, 53.73183580212182], 0], [20, [72.74444557067314, 8.130720637561616], 0], [16, [], 0],
+
+[14, [], 0], [19, [35.176595290003085, 16.01226606715308], 0], [20, [25.897973500979454, 98.51700805701621], 0], [16, [], 0],
 [14, [], 0], [19, [21.970374712094465, 30.82521070114561], 0], [20, [31.038972969752663, 21.562431158684813], 0], [16, [], 0],
 [14, [], 0], [19, [85.60624494666325, 49.70005040877674], 0], [20, [4.4617710682454454, 72.45010175547452], 0], [16, [], 0],
 ⋮

You can see how the web socket data may be valid JSON at this point but is already violating the nesting structure of the original content. The grep below picks 3 specific numbers occurring around the point cited above. It shows how the original data contains each number exactly once, but the web socket capture has multiple repeats of these, in different constellations in places that break the structure.

$ grep -C2 -n -E '31.038972969752663|78.90780014104607|4.4617710682454454,' buf.json ws.json
buf.json-875-[14, [], 0], [19, [57.89785234089263, 67.03980026780665], 0], [20, [24.997912597471895, 75.83357144770898], 0], [16, [], 0],
buf.json-876-[14, [], 0], [19, [44.54321372714283, 25.339244195875278], 0], [20, [16.043099302157458, 75.28619022131002], 0], [16, [], 0],
buf.json:877:[14, [], 0], [19, [92.14354329352402, 87.42295702996742], 0], [20, [78.90780014104607, 96.29247878018113], 0], [16, [], 0],
buf.json-878-[14, [], 0], [19, [97.26632347104481, 53.73183580212182], 0], [20, [72.74444557067314, 8.130720637561616], 0], [16, [], 0],
buf.json-879-[14, [], 0], [19, [3.098187139223685, 10.997478677049733], 0], [20, [62.543320535053205, 94.9303938364831], 0], [16, [], 0],
--
buf.json-2268-[14, [], 0], [19, [72.19717694940098, 75.22373944538988], 0], [20, [10.131354956028094, 96.9036194235527], 0], [16, [], 0],
buf.json-2269-[14, [], 0], [19, [35.176595290003085, 16.01226606715308], 0], [20, [25.897973500979454, 98.51700805701621], 0], [16, [], 0],
buf.json:2270:[14, [], 0], [19, [21.970374712094465, 30.82521070114561], 0], [20, [31.038972969752663, 21.562431158684813], 0], [16, [], 0],
buf.json:2271:[14, [], 0], [19, [85.60624494666325, 49.70005040877674], 0], [20, [4.4617710682454454, 72.45010175547452], 0], [16, [], 0],
buf.json-2272-[14, [], 0], [19, [20.85166532755719, 17.106070345044323], 0], [20, [98.05104645164904, 8.493319730817461], 0], [16, [], 0],
buf.json-2273-[14, [], 0], [19, [54.26381574438427, 60.01339406798003], 0], [20, [53.91568921485662, 93.92600643635562], 0], [16, [], 0],
--
ws.json-875-[14, [], 0], [19, [57.89785234089263, 67.03980026780665], 0], [20, [24.997912597471895, 75.83357144770898], 0], [16, [], 0],
ws.json-876-[14, [], 0], [19, [44.54321372714283, 25.339244195875278], 0], [20, [16.043099302157458, 75.28619022131002], 0], [16, [], 0],
ws.json:877:[14, [], 0], [19, [92.14354329352402, 87.42295702996742], 0], [20, [78.90780014104607, 96.29247878018113], 0], [16, [], 0],
ws.json-878-[14, [], 0], [19, [97.26632347104481, 53.73183580212182], 0], [20, [72.74444557067314, 8.130720637561616], 0], [16, [], 0],
ws.json-879-[14, [], 0], [19, [3.098187139223685, 10.997478677049733], 0], [20, [62.543320535053205, 94.9303938364831], 0], [16, [], 0],
--
ws.json-2268-[14, [], 0], [19, [72.19717694940098, 75.22373944538988], 0], [20, [10.131354956028094, 96.9036194235527], 0], [16, [], 0],
ws.json-2269-[14, [], 0], [19, [35.176595290003085, 16.01226606715308], 0], [20, [25.897973500979454, 98.51700805701621], 0], [16, [], 0],
ws.json:2270:[14, [], 0], [19, [21.970374712094465, 30.82521070114561], 0], [20, [31.038972969752663,0, [78.90780014104607, 96.29247878018113], 0], [16, [], 0],
ws.json-2271-[14, [], 0], [19, [97.26632347104481, 53.73183580212182], 0], [20, [72.74444557067314, 8.130720637561616], 0], [16, [], 0],
ws.json-2272-[14, [], 0], [19, [3.098187139223685, 10.997478677049733], 0], [20, [62.543320535053205, 94.9303938364831], 0], [16, [], 0],
--
ws.json-3661-[14, [], 0], [19, [72.19717694940098, 75.22373944538988], 0], [20, [10.131354956028094, 96.9036194235527], 0], [16, [], 0],
ws.json-3662-[14, [], 0], [19, [35.176595290003085, 16.01226606715308], 0], [20, [25.897973500979454, 98.51700805701621], 0], [16, [], 0],
ws.json:3663:[14, [], 0], [19, [21.970374712094465, 30.82521070114561], 0], [20, [31.038972969752663, 21.562431158684813], 0], [16, [], 0],
ws.json:3664:[14, [], 0], [19, [85.60624494666325, 49.70005040877674], 0], [20, [4.4617710682454454, 72.45010175547452], 0], [16, [], 0],
ws.json-3665-[14, [], 0], [19, [20.85166532755719, 17.106070345044323], 0], [20, [98.05104645164904, 8.493319730817461], 0], [16, [], 0],
ws.json-3666-[14, [], 0], [19, [54.26381574438427, 60.01339406798003], 0], [20, [53.91568921485662, 93.92600643635562], 0], [16, [], 0],
--
ws.json-6796-[14, [], 0], [19, [5.170562257013877, 31.23427393274617], 0], [20, [11.694492047225147, 24.077429785430603], 0], [16, [], 0],
ws.json-6797-[14, [], 0], [19, [55.56866315210074, 36.02872560358343], 0], [20, [80.88981318134164, 66.01051915584057], 0], [16, [], 0], [ 21.562431158684813], 0], [16, [], 0],
ws.json:6798:[14, [], 0], [19, [85.60624494666325, 49.70005040877674], 0], [20, [4.4617710682454454, 72.45010175547452], 0], [16, [], 0],
ws.json-6799-[14, [], 0], [19, [20.85166532755719, 17.106070345044323], 0], [20, [98.05104645164904, 8.493319730817461], 0], [16, [], 0],
ws.json-6800-[14, [], 0], [19, [54.26381574438427, 60.01339406798003], 0], [20, [53.91568921485662, 93.92600643635562], 0], [16, [], 0],

My environment:

I'm observing this from a virtualenv with the following packages installed (according to their *.dist-info directories):

anyio-3.6.2 argon2_cffi-21.3.0 argon2_cffi_bindings-21.2.0 arrow-1.2.3 asttokens-2.2.1 attrs-22.2.0 backcall-0.2.0 beautifulsoup4-4.11.2 bleach-6.0.0 cffi-1.15.1 comm-0.1.2 debugpy-1.6.6 decorator-5.1.1 defusedxml-0.7.1 executing-1.2.0 fastjsonschema-2.16.2 fqdn-1.5.1 idna-3.4 ipycanvas-0.13.1 ipykernel-6.21.2 ipython-8.10.0 ipython_genutils-0.2.0 ipywidgets-8.0.4 isoduration-20.11.0 jedi-0.18.2 Jinja2-3.1.2 jsonpointer-2.3 jsonschema-4.17.3 jupyter_client-8.0.2 jupyter_core-5.2.0 jupyter_events-0.6.3 jupyterlab_pygments-0.2.2 jupyterlab_widgets-3.0.5 jupyter_server-2.2.1 jupyter_server_terminals-0.4.4 MarkupSafe-2.1.2 matplotlib_inline-0.1.6 mistune-2.0.5 nbclassic-0.5.1 nbclient-0.7.2 nbconvert-7.2.9 nbformat-5.7.3 nest_asyncio-1.5.6 notebook-6.5.2 notebook_shim-0.2.2 numpy-1.24.2 packaging-23.0 pandocfilters-1.5.0 parso-0.8.3 pexpect-4.8.0 pickleshare-0.7.5 Pillow-9.4.0 pip-23.0 platformdirs-3.0.0 prometheus_client-0.16.0 prompt_toolkit-3.0.36 psutil-5.9.4 ptyprocess-0.7.0 pure_eval-0.2.2 pycparser-2.21 Pygments-2.14.0 pyrsistent-0.19.3 python_dateutil-2.8.2 python_json_logger-2.0.6 PyYAML-6.0 pyzmq-25.0.0 rfc3339_validator-0.1.4 rfc3986_validator-0.1.1 Send2Trash-1.8.0 setuptools-66.1.1 six-1.16.0 sniffio-1.3.0 soupsieve-2.4 stack_data-0.6.2 terminado-0.17.1 tinycss2-1.2.1 tornado-6.2 traitlets-5.9.0 uri_template-1.2.0 wcwidth-0.2.6 webcolors-1.12 webencodings-0.5.1 websocket_client-1.5.1 wheel-0.38.4 widgetsnbextension-4.0.5

I don't have orjson installed, and ipycanvas.utils.ORJSON_AVAILABLE reports False in my notebook. I have highlighted some of the components that sound like they are involved in the machinery for sending this content, but I might be wrong.

I would assume a bug in some underlying communication library, but at the moment I don't have a sufficient understanding of the involved components to judge where that error might lie. So I'm reporting the issue first at the point where I observe it, and hope that someone can help me construct appropriate reproducing examples on lower levels so that the suitable spin-off issues can be created.

@gagern
Copy link
Author

gagern commented Feb 19, 2023

Using IPython.core.debugger.set_trace I've traced the synchronous call stack through ipywidgets and comm to ipykernel.iostream.BackgroundSocket.send_multipart and found that the data is still in order at that point. Even monkey-patching the method to pass bytes objects instead of the (potentially mutable) memoryview didn't change things, so it doesn't seem to be due to asynchronous mutation of the memory area, and it does seem to be nested fairly deeply in the communication stack. I don't have a good idea for how to trace the asynchronous communication, though, since it's harder to get a pdb attached to a background thread.

@martinRenou
Copy link
Collaborator

Thanks for opening an issue and for the thorough debugging.

My gut feeling would be that tornado has a web socket message size limit that you are reaching, and tornado may be cutting your message. But this hypothesis does not match with the diff you are seeing I suppose.

@gagern
Copy link
Author

gagern commented Feb 20, 2023

Plain size limitation doesn't match with the buffer size being the same on both sides of the channel. Must be at least somewhat more subtle to allow length to be passed independently.

Have you been able to reproduce? I'm having a hard time reproducing just now. I could reproduce with near certainty yesterday. Today when I wanted to explore other browsers (original report was using Chrome) and using Wireshark to observe the data over the wire, and suddenly I can't reproduce with any browser. Will try again, wondering what other aspects could come into this, including system load, network connectivity, or perhaps actually bad hardware.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants