Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decoder suppresses stack overflow #9

Open
danfishgold opened this issue Dec 5, 2018 · 5 comments
Open

Decoder suppresses stack overflow #9

danfishgold opened this issue Dec 5, 2018 · 5 comments

Comments

@danfishgold
Copy link

A decoder I wrote returned Nothing for large byte sequences, but it was due to a stack overflow in a Bytes.Decode.map function I used.

Here's a simple example:

import Bytes.Decode as D
import Bytes.Encode as E

-- This causes a stack overflow for any input
stackOverflow x =
    case [x] of
        hd :: _ ->
            hd :: stackOverflow x
        [] ->
            []

-- This is a single byte (00000001)
byte = E.unsignedInt8 1

-- This decoder should raise an exception
overflowDecoder = D.map stackOverflow D.unsignedInt8
> stackOverflow 1
RangeError: Maximum call stack size exceeded

> D.decode overflowDecoder byte
Nothing
@folkertdev
Copy link

The issue seems to be a bit more general. Likely there is a catch that just catches all exceptions and lets the decoder return Nothing. For instance, a Debug.todo will also not throw an exception, but return Nothing

-- This is a single byte (00000001)
byte = E.unsignedInt8 1


todoDecoder = 
    D.map (\_ -> Debug.todo "crash!") D.unsignedInt8

-- result == Nothing, instead of throwing the exception
result = 
    D.decode todoDecoder (E.encode ( byte))

ellie

@evancz
Copy link
Member

evancz commented Feb 11, 2019

The root thing is that any time you run a decoder, it may be trying to access bytes that do not exist.

Example

You want to read an Int32 with four bytes, but the Bytes value only has one byte.

The implementation is eventually going to call bytes.getInt32(0) in JavaScript, which generates an exception.

Problem

So if we want to avoid this exception, we need to introduce a bounds check in the JS code before calling any function like bytes.getInt32. But that function already does a bounds check! So we'd be paying twice, slowing things down overall.

So I'm not really sure what the right move is here. The performance degredation needed to resolve this issue seems like a bad trade though.

Conclusion

I think it would be worthwhile for someone to do some performance experimentation here.

I did not focus on that too much in the initial implementation, and I suspect it's possible to do better.

In the course of those experiments, the experimenter may find a way to make things faster that also resolves this issue. Whatever the case, please share any results on this topic on https://discourse.elm-lang.org/

@folkertdev
Copy link

I made a micro-benchmark to check two bounds-checking strategies: if-statement to check the bound up-front, and a try/catch that catches the out of bounds RangeError. See the discourse post for more info.

To summarize

  • checking the offset up-front is expensive (since in practice in most cases the read will be allowed)
  • using a try/catch block around the buffer read has (close to) no cost when the read succeeds.

So the try/catch approach seems the way to go. In practice that would mean that

var _Bytes_read_i8  = F2(function(      bytes, offset) { return __Utils_Tuple2(offset + 1, bytes.getInt8(offset)); });

Becomes

var _Bytes_read_i8  = F2(function(      bytes, offset) { 
    try {
        return __Utils_Tuple2(offset + 1, bytes.getInt8(offset)); 
    } catch (e) { 
        return __Utils_Tuple2(-1, 0)); 
    }
});

Here I assume that Maybe is not used because of the allocation. Instead an offset of -1 is used to indicate an error and the kernel wrapper must check for this case. We also assume that the only exception that can be thrown in return __Utils_Tuple2(offset + 1, bytes.getInt8(offset)); is a bounds error. I think that is safe, but if not we can check whether the access was out of bounds in the catch block.

Currently the Decode.fail function throws an exception to make the decoding fail. I'm not familiar enough with kernel code to suggest a fix, but would assume there are ways to make it work.

Does this make sense? are there other paths to explore?

@evancz
Copy link
Member

evancz commented Feb 13, 2019

Nice work @folkertdev! Rather than adding a try to each specific read function, I am curious if the bounds error can be detected in the outermost try somehow. Is the out-of-bounds exception a particular value that'd be consistent across browsers?

@folkertdev
Copy link

Based on the spec, I don't think there is any guarantee that there is a consistent value that separates out-of-bounds access RangeErrors from other RangeErrors.

All getters are based on GetViewValue which stipulates

  1. If getIndex + elementSize > viewSize, throw a RangeError exception.

So it is just a normal RangeError. In practice I found that none of the exception attributes are unique and consistent. For instance chrome will error with the message

Offset is outside the bounds of the DataView

and FF gives

offset is outside the bounds of the DataView

Note that chrome capitalizes the initial O and firefox doesn't. Parsing the stack trace also doesn't work (and seems like a bad idea anyway).

So matching on the exception message looks quite brittle. Maybe all major JS engines give a similar message (can't test edge/IE at the moment) but there is no guarantee that they do (and the error in theory could change at any moment).

Code to test this in other browsers:

var buffer = new ArrayBuffer(8);

var view = new DataView(buffer);
view.setInt32(0, 42); 
view.setInt32(4, 84); 

view.getInt32(5);

Viir added a commit to Viir/bots that referenced this issue Aug 8, 2019
Change the image file decoding to support larger images.
When using the previous version on a BMP image generated from `2019-06-26.eve-online-screenshot.jpeg`, it failed with this error:
> Stopped with result: Failed to decode image: Failed to decode pixel array

This problem correlated with the size of the image; when the image has too many pixels, the bytes decoding part (in the previous version) ran into a stack overflow. `Bytes.Decode` in turn has the problem that it hides the exception by decoding to `Nothing` instead. This problem in `Bytes.Decode` has also been reported at elm/bytes#9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants