Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added log processing #75

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

ilyakooo0
Copy link

closes #74

@denibertovic
Copy link
Owner

Hey @ilyakooo0 Thanks for the PR. 🍰 I've only taken a cursory look as I'm short on time but I'll try and review it properly tomorrow. It seems the CI is erroring for unrelated reasons.


processLog :: Monad m => ConduitT BSS.ByteString BSS.ByteString m ()
processLog = do
-- metadata (is the next string stdout or stderr)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly that the first 4 bytes in the STREAM just denote that the data marked as collected from stdout/stderr?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I explained in the issue, I couldn't find any documentation regarding the format. All I could find was this repo: https://github.com/mafintosh/docker-raw-stream

In that repo, the second 4 bytes represent the length of the subsequent string, and the first bit denotes whether the string is from stdout or stderr.

I assume that the first 4 bytes are reserved for metadata.

I have had the opportunity to test this branch and it works as expected.

processLog = do
-- metadata (is the next string stdout or stderr)
_ <- CB.take 4
len' <- CB.take 4
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we only take 4 bytes at a time?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean, why don't we read the first 8 bytes in one go?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure reading it in one go and partitioning into two parts would be terribly more efficient.

@ilyakooo0
Copy link
Author

ilyakooo0 commented Feb 27, 2020

Actually, some of the builds errored because I used the new ConduitT, which is not available in the older versions.

UPD: I now fixed it and builds are failing only for unrelated reasons

@mbg
Copy link
Contributor

mbg commented Apr 2, 2020

Hi everyone, I ran into the same problem yesterday, but found that the format is documented in https://docs.docker.com/engine/api/v1.40/#operation/ContainerAttach

Here's some (simplified) code to show how I deal with it at the moment (which seems to work fine):

import Data.ByteString (ByteString)
import qualified Data.ByteString as BS

testSink :: ConduitT ByteString Void IO ()
testSink = do
    -- wait for data to become available
    mbs <- await

    -- the data might be `Nothing` if the conduit has been closed,
    -- so lets decide what to do...
    case mbs of
        Nothing -> pure ()
        -- otherwise, if there is data, send it to the output channel with
        -- appropriate metadata added to it
        Just bs  -> do
            -- the docker daemon attaches 8 byte headers to each line of
            -- output, see:
            -- https://docs.docker.com/engine/api/v1.40/#operation/ContainerAttach
            -- the first byte identifies the stream (1 - stdout, 2 - stderr)
            let streamType = BS.unpack (BS.take 1 bs)
            let message = BS.drop 8 bs

            -- do something with `message` according to `streamType`

            -- resume waiting for input
            testSink


-- put this somewhere else in your program to use it
getContainerLogsStream logOpts containerID $ CB.lines .| testSink

It is worth noting that it may not be sensible to deal with this directly in the implementation of getContainerLogsStream since the 8 byte headers are only present if tty: false for the container. If a user sets tty: true in the CreateOpts then the 8 byte header will not be present.

@denibertovic
Copy link
Owner

Hey @mbg thanks for the clarification and the example. It was always the intention of the library to leave the stream processing outside of the library ie. let the user define the sink that gets passed to the getContainerLogsSream function. So I'm not sure if it's wise to include this entire machinery in the library itself.

That said, there is too much abstration leaking outside of the library and the user has to know docker api internals to construct a proper sink....(ie. the message structure and what the first bytes mean and so on). So I think we need to address this somehow.

@denibertovic
Copy link
Owner

I'd love to get this one merged but I haven't had time to look into it in more detail.
I still think we shouldn't handle the stream necessarily in the library but there's also too much abstraction leaking if we let the users implement it themselves.

Would an example stream that just prints stuff to stdout and a pointer to it in the docs be the best course of action? @ilyakooo0 @mbg what do you think?

@ilyakooo0
Copy link
Author

IMO the main problem is that the raw stream produces invalid characters (as some of the data is not text at all). So, the stream is mostly useless unless you do the exact processing that is done in this PR even if you want to get the binary output of the executable.

I can't imagine a reason why someone would use this library and want to get the raw unprocessed stream.

@mbg
Copy link
Contributor

mbg commented Apr 5, 2022

I agree with @ilyakooo0. I think that there should be an implementation of this in the library, at the very least as an opt-in (i.e. even if it was just something along the lines of the testSink from my comment that got exported). The two questions for me are:

  • If the processing gets incorporated into getContainerLogsStream, how do we disable it if "tty": true (as I noted in my comment). This is not an issue if the processing is provided by a separate function.
  • Users would probably want more than just the ByteString representing the message, but also the source of the message (stdout/stderr). So we might need to introduce a new data type to represent a line of output for that.

Neither of these are particularly big issues, they are just design decisions that need to be made.

@mbg
Copy link
Contributor

mbg commented Sep 22, 2022

@denibertovic have you had any more thoughts on this / have you made a decision?

@denibertovic
Copy link
Owner

@mbg I agree with your previous comment. This should be implemented as a separate function and not directly in getContainerLogsStream so that it can handle the tty: true case. We export that function and provide docs/example on how to use it with getContainerLogStream.
I haven't thought about your second point but introducing a new type for that seems fine to me.

I don't have time to work on this but I'd be happy to merge if the above changes were made.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Non-streaming log api
3 participants