Decoder suppresses stack overflow #9

danfishgold · 2018-12-05T15:46:19Z

A decoder I wrote returned Nothing for large byte sequences, but it was due to a stack overflow in a Bytes.Decode.map function I used.

Here's a simple example:

import Bytes.Decode as D
import Bytes.Encode as E

-- This causes a stack overflow for any input
stackOverflow x =
    case [x] of
        hd :: _ ->
            hd :: stackOverflow x
        [] ->
            []

-- This is a single byte (00000001)
byte = E.unsignedInt8 1

-- This decoder should raise an exception
overflowDecoder = D.map stackOverflow D.unsignedInt8

> stackOverflow 1
RangeError: Maximum call stack size exceeded

> D.decode overflowDecoder byte
Nothing

The text was updated successfully, but these errors were encountered:

folkertdev · 2018-12-06T17:45:58Z

The issue seems to be a bit more general. Likely there is a catch that just catches all exceptions and lets the decoder return Nothing. For instance, a Debug.todo will also not throw an exception, but return Nothing

-- This is a single byte (00000001)
byte = E.unsignedInt8 1


todoDecoder = 
    D.map (\_ -> Debug.todo "crash!") D.unsignedInt8

-- result == Nothing, instead of throwing the exception
result = 
    D.decode todoDecoder (E.encode ( byte))

ellie

evancz · 2019-02-11T20:36:24Z

The root thing is that any time you run a decoder, it may be trying to access bytes that do not exist.

Example

You want to read an Int32 with four bytes, but the Bytes value only has one byte.

The implementation is eventually going to call bytes.getInt32(0) in JavaScript, which generates an exception.

Problem

So if we want to avoid this exception, we need to introduce a bounds check in the JS code before calling any function like bytes.getInt32. But that function already does a bounds check! So we'd be paying twice, slowing things down overall.

So I'm not really sure what the right move is here. The performance degredation needed to resolve this issue seems like a bad trade though.

Conclusion

I think it would be worthwhile for someone to do some performance experimentation here.

I did not focus on that too much in the initial implementation, and I suspect it's possible to do better.

In the course of those experiments, the experimenter may find a way to make things faster that also resolves this issue. Whatever the case, please share any results on this topic on https://discourse.elm-lang.org/

folkertdev · 2019-02-12T21:19:06Z

I made a micro-benchmark to check two bounds-checking strategies: if-statement to check the bound up-front, and a try/catch that catches the out of bounds RangeError. See the discourse post for more info.

To summarize

checking the offset up-front is expensive (since in practice in most cases the read will be allowed)
using a try/catch block around the buffer read has (close to) no cost when the read succeeds.

So the try/catch approach seems the way to go. In practice that would mean that

var _Bytes_read_i8  = F2(function(      bytes, offset) { return __Utils_Tuple2(offset + 1, bytes.getInt8(offset)); });

Becomes

var _Bytes_read_i8  = F2(function(      bytes, offset) { 
    try {
        return __Utils_Tuple2(offset + 1, bytes.getInt8(offset)); 
    } catch (e) { 
        return __Utils_Tuple2(-1, 0)); 
    }
});

Here I assume that Maybe is not used because of the allocation. Instead an offset of -1 is used to indicate an error and the kernel wrapper must check for this case. We also assume that the only exception that can be thrown in return __Utils_Tuple2(offset + 1, bytes.getInt8(offset)); is a bounds error. I think that is safe, but if not we can check whether the access was out of bounds in the catch block.

Currently the Decode.fail function throws an exception to make the decoding fail. I'm not familiar enough with kernel code to suggest a fix, but would assume there are ways to make it work.

Does this make sense? are there other paths to explore?

evancz · 2019-02-13T02:53:56Z

Nice work @folkertdev! Rather than adding a try to each specific read function, I am curious if the bounds error can be detected in the outermost try somehow. Is the out-of-bounds exception a particular value that'd be consistent across browsers?

folkertdev · 2019-02-13T11:58:06Z

Based on the spec, I don't think there is any guarantee that there is a consistent value that separates out-of-bounds access RangeErrors from other RangeErrors.

All getters are based on GetViewValue which stipulates

If getIndex + elementSize > viewSize, throw a RangeError exception.

So it is just a normal RangeError. In practice I found that none of the exception attributes are unique and consistent. For instance chrome will error with the message

Offset is outside the bounds of the DataView

and FF gives

offset is outside the bounds of the DataView

Note that chrome capitalizes the initial O and firefox doesn't. Parsing the stack trace also doesn't work (and seems like a bad idea anyway).

So matching on the exception message looks quite brittle. Maybe all major JS engines give a similar message (can't test edge/IE at the moment) but there is no guarantee that they do (and the error in theory could change at any moment).

Code to test this in other browsers:

var buffer = new ArrayBuffer(8);

var view = new DataView(buffer);
view.setInt32(0, 42); 
view.setInt32(4, 84); 

view.getInt32(5);

Change the image file decoding to support larger images. When using the previous version on a BMP image generated from `2019-06-26.eve-online-screenshot.jpeg`, it failed with this error: > Stopped with result: Failed to decode image: Failed to decode pixel array This problem correlated with the size of the image; when the image has too many pixels, the bytes decoding part (in the previous version) ran into a stack overflow. `Bytes.Decode` in turn has the problem that it hides the exception by decoding to `Nothing` instead. This problem in `Bytes.Decode` has also been reported at elm/bytes#9

lue-bird mentioned this issue Dec 26, 2022

decoding implementation produces runtime exception → Nothing gren-lang/core#47

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decoder suppresses stack overflow #9

Decoder suppresses stack overflow #9

danfishgold commented Dec 5, 2018

folkertdev commented Dec 6, 2018

evancz commented Feb 11, 2019 •

edited

Loading

folkertdev commented Feb 12, 2019

evancz commented Feb 13, 2019

folkertdev commented Feb 13, 2019

Decoder suppresses stack overflow #9

Decoder suppresses stack overflow #9

Comments

danfishgold commented Dec 5, 2018

folkertdev commented Dec 6, 2018

evancz commented Feb 11, 2019 • edited Loading

Example

Problem

Conclusion

folkertdev commented Feb 12, 2019

evancz commented Feb 13, 2019

folkertdev commented Feb 13, 2019

evancz commented Feb 11, 2019 •

edited

Loading