Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stdio in mo.Thread goes to Linux console after sleep() #3532

Open
apowers313 opened this issue Jan 22, 2025 · 4 comments
Open

stdio in mo.Thread goes to Linux console after sleep() #3532

apowers313 opened this issue Jan 22, 2025 · 4 comments
Assignees
Labels
bug Something isn't working

Comments

@apowers313
Copy link

Describe the bug

When running the following code, the first two print statements are shown in the marimo cell, and all the print statements that follow the first sleep end up bring printed in the stdout of the Linux shell where the marimo command was run.

If the sleep statement is commented out, then all print statements end up in the marimo cell.

Issue #2642 seems to imply that all I/O should go to the notebook if using mo.Thread

Environment

{
  "marimo": "0.10.15",
  "OS": "Linux",
  "OS Version": "5.15.0-125-generic",
  "Processor": "x86_64",
  "Python Version": "3.13.1",
  "Binaries": {
    "Browser": "--",
    "Node": "v18.20.4"
  },
  "Dependencies": {
    "click": "8.1.8",
    "docutils": "0.21.2",
    "itsdangerous": "2.2.0",
    "jedi": "0.19.2",
    "markdown": "3.7",
    "narwhals": "1.23.0",
    "packaging": "24.2",
    "psutil": "6.1.1",
    "pygments": "2.19.1",
    "pymdown-extensions": "10.14",
    "pyyaml": "6.0.2",
    "ruff": "0.9.2",
    "starlette": "0.45.2",
    "tomlkit": "0.13.2",
    "typing-extensions": "4.12.2",
    "uvicorn": "0.34.0",
    "websockets": "14.2"
  },
  "Optional Dependencies": {
    "pandas": "2.2.3"
  }
}

Code to reproduce

import marimo as mo
import time

def long_running():
    print("Thread starting:")
    i = 0
    while i < 20:
        print(f"Loop {i}...")
        i += 1
        time.sleep(1)
    print("Thread exiting.")

mo.Thread(target=long_running).start()
@apowers313 apowers313 added the bug Something isn't working label Jan 22, 2025
@akshayka akshayka self-assigned this Jan 23, 2025
@apowers313
Copy link
Author

From discord:

  • Right now, mo.Thread is initialized with the parent thread's context, but I believe it needs its own context. What's happening now is that after the cell that creates the mo.Thread finishes executing, the parent thread's Stdout is correctly disconnected from the cell, but this disconnects the mo.Thread's Stdout too.
  • The new thread's context will also need to have its Stdout and Stderr objects redirected, similar to this:
    redirect_streams(
    cell_id,
    stream=self.stream,
    stdout=self.stdout,
    stderr=self.stderr,
    stdin=self.stdin,
    ),
    . You can use the cell_id of the parent thread's context. But you likely don't want to use redirect_streams as a context manager, since they should be permanently redirected.

@apowers313
Copy link
Author

This seems to do the trick. The only changes are to runtime_context_installed and the file imports.

If this doesn't look too crazy to you, I'll put together a test and a pull request:

class Thread(threading.Thread):
    """A Thread subclass that is aware of marimo internals.

    `mo.Thread` has the same API as threading.Thread,
    but `mo.Thread`s are able to communicate with the marimo
    frontend, whereas `threading.Thread` can't.
    """

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)

        self._marimo_ctx: RuntimeContext | None = None

        if runtime_context_installed():
            # self._marimo_ctx = get_context()
            parent_ctx = get_context()
            parent_stream = parent_ctx.stream
            new_stream = ThreadSafeStream(
                pipe=parent_stream.pipe,
                input_queue=parent_stream.input_queue,
                cell_id=parent_stream.cell_id,
            )
            new_stdout = ThreadSafeStdout(new_stream)
            new_stderr = ThreadSafeStderr(new_stream)
            sys.stdout = new_stdout
            sys.stderr = new_stderr

            self._marimo_ctx = KernelRuntimeContext(**parent_ctx.__dict__)
            self._marimo_ctx.stdout = new_stdout
            self._marimo_ctx.stderr = new_stderr
            new_stdout = new_stdout._watcher.start()
            new_stderr = new_stderr._watcher.start()

    def run(self) -> None:
        if self._marimo_ctx is not None:
            initialize_context(self._marimo_ctx)
        super().run()

@akshayka
Copy link
Contributor

akshayka commented Jan 26, 2025

I think the approach is good -- few small things can bring up in a PR review. Thanks so much!

EDIT: Took a closer look, I think it may be more complicated, see follow-up comment.

@akshayka
Copy link
Contributor

It's been a while since I've looked at this code, so I took a closer look and I think it may be more complicated.

This approach might not work when there are multiple mo.Thread's writing to stdout, or if mo.Thread is writing to stdout at the same time as another cell — which is presumably the default use case.

Something like the following could work (but without diving deeper I am not 100% sure): share the same ThreadSafeStdout/Stderr objects across all threads, but maintain a map from mo.Thread thread IDs to their cell IDs; on write, the Stdout object would look up the current thread's ID to determine to which cell to forward its message. The logic for redirecting file descriptors may also need to be updated, such that stdout and stderr are always duplicated but not closed, and only conditionally redirected (if a cell is executing, or if written to by a mo.Thread object).

Even still there may be some limitations, such as mo.Thread objects won't be able to monkey-patch the Python sys.stdout object (though they will still be able to call print and write to file descriptors directly).

Let me know if you're still comfortable with attempting this change. No worries if not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants