Decoders with custom offset #22

Open
SirJayEarn opened this issue Nov 9, 2020 · 0 comments
Hi!

First of all: thanks a lot for bringing Elm into the world! This language has kept me motivated to dig deeper and deeper into functional programming, whereas with other languages I felt overwhelmed and discouraged pretty quickly.

Short

Recently I've been writing a parser for MIDI files in Elm, using elm/bytes. Overall it worked pretty nicely, but I was missing one thing: a way to write a custom decoder that can be given an offset.

Issue

Parts of a MIDI file contain a list-like structure. Items are (potentially) compressed in such a way that one byte has to be read before I know how to process that byte and the ones following it. Depending on its most significant bit, the byte may or may not contain meta-information. If it doesn't, the current list item has the same meta-information as the previous item, and the meta-information byte has simply been left out. That means the byte I just read must not be treated as a standalone byte; how to interpret it depends on the previous meta-information.

This leads to a situation where some kind of lookahead would be useful. Something like: "OK, the byte has this structure, so keep it and read it as two nibbles, which define how to read the following bytes." Or: "Oh, the byte has this other structure, so forget about it, reuse the most recent meta-information and continue as normal, BUT start where the byte we just read started, because it is not the meta-info byte but already part of the data." In that second case we are basically one byte off.
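The decision described above can be sketched as a small pure function. This is only an illustration, assuming the meta-information byte is the one with its most significant bit set and that its two nibbles are an event type and a channel; the names `Lead`, `NewStatus`, `RunningData` and `classify` are made up for this sketch and do not appear in my parser or in elm/bytes.

```elm
module RunningStatus exposing (Lead(..), classify)

import Bitwise


{-| What the first byte of a list item turned out to be (hypothetical type).
-}
type Lead
    = NewStatus { eventNibble : Int, channel : Int }
    | RunningData Int


{-| Classify a byte in the range 0..255 by its most significant bit.
-}
classify : Int -> Lead
classify byte =
    if Bitwise.and byte 0x80 /= 0 then
        -- MSB set: the byte carries meta-information, split it into two nibbles
        NewStatus
            { eventNibble = Bitwise.shiftRightBy 4 byte
            , channel = Bitwise.and byte 0x0F
            }

    else
        -- MSB clear: the byte is already data and the previous meta-information applies
        RunningData byte
```

The `RunningData` case is exactly the one where the decoder has already consumed a byte that belongs to the next step.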

Solution (possibly)

I think in this case it would be nice to just set the offset back, so we effectively forget about the byte we just decoded. Because there is no way to reset the offset when using andThen, I had to carry the already-read byte around and pass it to the following decoders, which then had to prepend it conditionally. This made the code harder to read, understand and reuse.

In the source code of elm/bytes, andThen and mapN use an offset internally. I guess exposing the data constructor Decoder (Bytes -> Int -> (Int, a)), instead of just the type constructor Decoder a, would be all that is needed to build custom map / andThen decoders for this kind of lookahead decoding.

Illustration

Maybe my explanation was a little confusing, so hopefully this code makes it easier to understand:

Bytes.Decode.unsignedInt8
    |> Bytes.Decode.andThen
        (\currentPotentialStatusByte ->
            let
                isCompressed =
                    currentPotentialStatusByte < 128

                currentStatusByte =
                    if isCompressed then
                        previousStatusByte

                    else
                        currentPotentialStatusByte

                ( mEventName, channel ) =
                    statusByteToNibbles currentStatusByte

                readFirstByte =
                    if isCompressed then
                        Bytes.Decode.succeed currentPotentialStatusByte

                    else
                        Bytes.Decode.unsignedInt8
...
            readFirstByte
                |> Bytes.Decode.andThen preReadVariableLengthValueDecoder
                |> Bytes.Decode.andThen Bytes.Decode.string
                |> Bytes.Decode.map ((++) "System Exclusive Begin Event" >> NotYetSupportedEvent)

And then I need to carry readFirstByte around and map all the following decoders. I think this would be nicer:

Bytes.Decode.Decoder
    (\bites offset ->
        let
            (Bytes.Decode.Decoder uint8Decode) =
                Bytes.Decode.unsignedInt8

            -- the internal function returns ( new offset, value ), in that order
            ( newOffset, currentPotentialStatusByte ) =
                uint8Decode bites offset

            isCompressed =
                currentPotentialStatusByte < 128

            currentStatusByte =
                if isCompressed then
                    previousStatusByte

                else
                    currentPotentialStatusByte

            ( mEventName, channel ) =
                statusByteToNibbles currentStatusByte

            nextOffset =
                if isCompressed then
                    offset

                else
                    newOffset
...
        withOffset nextOffset readVariableLengthValueDecoder
            |> Bytes.Decode.andThen Bytes.Decode.string
            |> Bytes.Decode.map ((++) "System Exclusive Begin Event" >> NotYetSupportedEvent)

Where withOffset would be something like

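withOffset : Int -> Bytes.Decode.Decoder a -> Bytes.Decode.Decoder a
withOffset offset (Bytes.Decode.Decoder decode) =
    -- ignore the offset passed in and run the wrapped decoder from the given one
    Bytes.Decode.Decoder <| \bites _ -> decode bites offset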

So here I could just use readVariableLengthValueDecoder, which is also used in other places, and I would not need a separate preReadVariableLengthValueDecoder, which does the same thing except that, depending on the given argument, it either reads the first byte or doesn't.
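Since the Decoder constructor is not exposed, the idea cannot be tried against elm/bytes directly. As a rough, self-contained sketch, here is a stand-in decoder type with the same offset-passing shape. Everything in it is an assumption made for illustration: it works on an `Array Int` instead of `Bytes`, signals failure with `Maybe`, and the names `peekU8` and `statusByte` are hypothetical, not part of elm/bytes.

```elm
module OffsetSketch exposing (Decoder, andThen, peekU8, run, statusByte, succeed, u8)

import Array exposing (Array)


{-| Stand-in for the opaque decoder: input and offset in, ( new offset, value ) out.
-}
type Decoder a
    = Decoder (Array Int -> Int -> Maybe ( Int, a ))


{-| Run a decoder from offset 0 and keep only the decoded value.
-}
run : Decoder a -> Array Int -> Maybe a
run (Decoder decode) input =
    decode input 0 |> Maybe.map Tuple.second


succeed : a -> Decoder a
succeed value =
    Decoder (\_ offset -> Just ( offset, value ))


{-| Read one byte and advance the offset by one.
-}
u8 : Decoder Int
u8 =
    Decoder (\input offset -> Array.get offset input |> Maybe.map (\byte -> ( offset + 1, byte )))


{-| Read one byte but leave the offset where it was (the lookahead).
-}
peekU8 : Decoder Int
peekU8 =
    Decoder (\input offset -> Array.get offset input |> Maybe.map (\byte -> ( offset, byte )))


andThen : (a -> Decoder b) -> Decoder a -> Decoder b
andThen next (Decoder decode) =
    Decoder
        (\input offset ->
            decode input offset
                |> Maybe.andThen
                    (\( newOffset, value ) ->
                        let
                            (Decoder decodeNext) =
                                next value
                        in
                        decodeNext input newOffset
                    )
        )


{-| Yield the status byte to use, consuming a byte only if it really is one.
-}
statusByte : Int -> Decoder Int
statusByte previousStatus =
    peekU8
        |> andThen
            (\byte ->
                if byte < 128 then
                    -- compressed: keep the previous status and leave the byte unread
                    succeed previousStatus

                else
                    -- not compressed: consume the status byte
                    u8
            )
```

With this model, `run (statusByte 0x90 |> andThen (\status -> u8 |> andThen (\first -> succeed ( status, first )))) (Array.fromList [ 0x3C, 0x40 ])` gives `Just ( 144, 60 )`: the data byte `0x3C` is still available to the following decoder, so no "pre-read" variant of that decoder is needed.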

Edit: over the past few days I watched a bunch of Elm talks (mostly by Richard Feldman). I now understand much better why the Decoder type is opaque. Still, I think having a way to adjust the offset would be very nice, maybe by adding a utility function like withOffset.
