Interactivity Overhaul (User Interface & Model Instrumentation & Network Comms) (#1054)

# Interactivity Overhaul

> What you want, when you want.
>
> -- <cite>some guidance developer (circa 2024)</cite>

## Overview

This PR is the first of many focusing on interactivity. It introduces an updated user interface for notebooks, new instrumentation for models, and a corresponding network layer to handle bidirectional communication between the IPython kernel and the JavaScript client. To support this, models have reworked rendering and added tracing logic to better support replays where required.

This PR also functions as a foundational step towards near-future work, including rendering across various environments (e.g. terminal support as a TUI and append-only outputs), upgraded benchmarking, and model inspection.

### TL;DR

We added a lot of code to support better model metrics and visualization. We are getting ready for multimedia streaming, and want users to be able to deeply inspect all the models without overheating the computer.

### Acknowledgements

Big shoutouts to:
- Loc (co-developed this PR): model instrumentation & metrics.
- Jingya: consult & sketches on enhanced UI design.
- Harsha: overall feedback & collab on prototypes.

### Running this PR

- `cd packages/python/stitch && pip install -e .`
- Go run a notebook.

## User Interface

Design principle: **All visibility. No magic.**

Overall we're trying to show as much as we can on model outputs. When debugging outputs, there can be real ugliness that is often hidden away, including tokenization concerns and critical points that may dictate the rest of the output. This need for inspection increases as users begin to define their own structured decoding grammars, where unexpected overconstraints can occur during development.

The old user interface, which displayed HTML as a side effect in notebooks when models compute, has been replaced with a custom Jupyter widget (see Network Communications for more detail), which hosts an interactive sandboxed iframe.
We still support a legacy mode, if users desire the previous UI.

**Before**

<img width="651" alt="image" src="https://github.com/user-attachments/assets/89b91c60-e428-43bb-ab41-d7ab34c65483">

**After**

<img width="353" alt="image" src="https://github.com/user-attachments/assets/49964f2f-aef2-4faf-b631-5b1898677dd3">

We're getting more information into the output at the expense of text density. There is simply more going on, and to keep it legible we've increased text size and spacing, compensating for the two visual elements (highlighting and underlines) that are used to convey token info for scanning. A general metrics bar is also displayed for discoverability of token reduction and other efficiency metrics relevant when prompt engineering for reduced costs.

When users want further detail on tokens, we support a tooltip that contains the top 5 alternate token candidates alongside exact values for the visual elements. Highlighting has been applied to candidates, accentuating tokens that include spaces.

We use a monospace typeface so that data format outputs can be inspected more quickly (e.g. verticality can matter for balancing braces and indentation).

As users learn a system, a UI with easier discoverability can come at the cost of productivity. We've made all visual components optional to keep our power users in the flow, and in the future we intend to allow users to define defaults to fully support this.

For legacy mode (modeled after the previous UI), users can execute `guidance.legacy_mode(True)` at the start of their notebook.

*Old school cool*.

### The Code

- Added
  - `guidance.visual` module. Handles renderer creation (stitch or HTML display) and all required messaging. This also handles Jupyter cell change detection for deciding when widgets need to be instantiated or reset.
  - `guidance.trace` module. Tracks model inputs & outputs of an engine. Important for replaying for clients.
  - `graphpaper-inline` NPM package has been added.
    This handles all client-side rendering and messaging. Written with Svelte/TypeScript/Tailwind/D3.
- Changed
  - Rendering logic has been stripped from the `Model` class and delegated to a `Renderer` member where possible.
  - Relevant state logic has been augmented for inputs & outputs, and stored within the engine for tracing across models.
  - Role processing across guidance has been thinned. The `Model` class now generates role opener and closer text directly from its respective chat template.

## Instrumentation

Instrumentation is key for model inspection, debugging and cost-sensitive prompt engineering. This includes backing the new UI. Metrics are now collected for both general compute resources (CPU/GPU/RAM) and model tokens (including token counts/reduction, latency, type, backtracking).

### The Code

* Added (metric collection feature)
  * Add `Monitor` class in `_model.py` to collect common metrics (CPU, RAM, GPU utilization, etc.)
  * `Monitor` runs in a separate process to avoid competing for resources with the model/engine process
  * `Model` now keeps stats of current input/output/backtrack tokens
  * At the end of a notebook cell's execution, we collect the probability of each token in the final model state, along with associated per-token stats such as
    * Latency
    * Whether the token was generated, force-forwarded or from user input
* Changed:
  * Replaced `get_next_token` with `get_next_token_with_top_k` to keep track of the issued token along with its associated top_k tokens (both constrained and unconstrained). Data is stored in the `EngineOutput` class
  * `Model` now has a `VisBytesChunk` object to keep track of which part of the chunk is from user input, generated by the engine or force-forwarded by the parser. `VisBytesChunk` also stores the list of `EngineOutput` objects generated by the engine during chunk generation. This makes it easier to check whether tokens from the final state were generated, force-forwarded or from user input.
  * Add `get_per_token_topk_probs` function in the `Engine` class to calculate the probability of each token in the token list. This function is used at the end of cell execution to calculate the probabilities of the model state in unconstrained mode.
  * Add `get_per_token_stats` function in the `Model` class to report stats for each token in the model state in unconstrained mode. Stats include issued token, probability, latency, top-k, and masked-top-k if available. Data from `get_per_token_stats` is reported to the UI for the new visualization.

## Network Communications

We have two emerging requirements that will impact future guidance development. One, the emergence of streaming multimedia around language models (audio/video). Two, user interactivity within the UI, requesting more data or computation that may not be feasible to r`pre-(?:fetch|calculate)` to a static client.

For user interactivity from the UI to Python, it's also important that we cover as many notebook environments as possible. Each cloud notebook provider has its own quirks, which complicates client development. Some providers love resizing cell outputs indefinitely, others refuse to display HTML unless it's secured away in an isolated iframe.

All in all, we need a solution that is isolated, somewhat available across providers and can allow streams of messages between server (Jupyter Python kernel) and client (cell output with a touch of JS).

### Stitch

> It's 3:15AM, bi-directional comms was a mistake.
>
> -- <cite>some guidance developer, minutes prior to passing out (circa 2024)</cite>

`stitch` is an auxiliary package we've created that handles bi-directional communication between a web client and a Jupyter Python kernel. It does this by creating a thin custom Jupyter widget that handles messages between the kernel and a sandboxed iframe hosting the web client.
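
One job of this bi-directional channel is recovering when the client starts listening late and misses early messages. Below is a minimal sketch of that idea, assuming sequential message ids that start at zero and a kernel-side log that can be replayed. `KernelSide`, `ClientSide`, `send`, `replay` and `on_message` are hypothetical names used only for illustration; they are not the actual `stitch` or `guidance.visual` API.

```python
from dataclasses import dataclass, field


@dataclass
class KernelSide:
    """Hypothetical stand-in for the kernel-side sender; keeps a log for replays."""

    log: list = field(default_factory=list)

    def send(self, payload: str) -> dict:
        # Message ids are assigned sequentially, starting at 0 (an assumption).
        message = {"id": len(self.log), "payload": payload}
        self.log.append(message)
        return message

    def replay(self) -> list:
        # Return a copy of every message sent so far, in order.
        return list(self.log)


@dataclass
class ClientSide:
    """Hypothetical stand-in for the web client inside the iframe."""

    kernel: KernelSide
    received: list = field(default_factory=list)

    def on_message(self, message: dict) -> None:
        # If the first message we ever see does not have id 0, earlier
        # messages were missed: ask the kernel side for a full replay.
        if not self.received and message["id"] != 0:
            self.received = self.kernel.replay()
            return
        self.received.append(message)


kernel = KernelSide()
client = ClientSide(kernel)

kernel.send("chunk-0")         # sent before the client was listening
late = kernel.send("chunk-1")  # first message the client actually sees
client.on_message(late)
client.on_message(kernel.send("chunk-2"))

payloads = [m["payload"] for m in client.received]
print(payloads)
```

The sketch ignores ordering and de-duplication of messages that arrive while a replay is in flight; a real client would need to handle both.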
The message path looks something like this:

`python code` -> `kernel-side jupyter widget` -> `kernel comms (ZMQ)` -> `client-side jupyter widget` -> `window event message` -> `sandboxed iframe` -> `web client (graphpaper-inline)`

This package drives messages between the `guidance.visual` module and the `graphpaper-inline` client. All messages are streamed to allow near-real-time rendering within a notebook. Bi-directional comms is used to repair the display if initial messages have been missed (the client will request a full replay when it notices that the first message it receives has a non-zero identifier).

### The Code

- Added
  - `stitch` Python package. Can be found at `packages/python/stitch`.

## Future work

We wanted to shoot for the stars, and ended up in the ocean. The following will occur after this PR.

Near future tasks:
- User defaults for UI
- Terminal support (non-interactive & shell)
- Restyle
- Richer visualizations
- Memory re-architecture (broader than this PR)
- Interactive support for multimedia
- Guidance quality-of-life (visual diff testing)

---------

Signed-off-by: Loc Huynh <[email protected]>
Signed-off-by: JC1DA <[email protected]>
Co-authored-by: Loc Huynh <[email protected]>
Co-authored-by: Loc Huynh <[email protected]>
Co-authored-by: Hudson Cooper <[email protected]>