---
uip: 0100
title: Sticky Scry
description: Extend Remote Scry to Support Subscriptions
author: ~rovnys-ricfer
status: Draft
type: Standards Track
category: Kernel
created: 2023-06-13
---

## Abstract

This proposal enables cross-ship subscriptions using the remote scry protocol (Fine) by making scry requests for paths that are known to be part of publications "sticky" on the publisher, so they persist long enough that when the path does get grown, the response can be sent down to requesters immediately.

Sending responses immediately only works if Vere somehow finds out from Arvo that a request can now be fulfilled. This proposal advocates for Arvo to emit an effect whenever a new piece of concrete data is added to the scry namespace: a Clay commit with new files, a Gall %grow move from an agent to bind a new path, or Eyre binding a new static response to a mutable URL. In a "shrubbery" future, every new shrub binding will also trigger such an effect.

## Motivation

The basic Fine implementation does not support low-latency subscriptions. If the subscriber ship has received the nth datum in a subscription, it can use Fine to ask for the n+1th datum by requesting data at the path that's identical to the first except for an incremented revision number. The subscriber will retry that request on a timer indefinitely, so after the publisher has grown the new path binding, the subscriber ship will receive the new datum, but only after half the retry interval on average.

Given the current Fine backoff interval of two minutes, a request for the next datum will take about a minute on average to resolve after that datum is bound. This is too much latency for many Urbit applications, including most social applications such as chat. Ideally, Urbit wouldn't need any session-based protocol to ensure low latency on responses to scry requests whose paths haven't been bound yet when the publisher receives the request; instead, the basic remote scry protocol would handle this.

Furthermore, each network request for the next datum will naively cause a new call to Arvo's +peek arm, with accompanying IPC overhead between runtime processes and Nock execution overhead. If there are, say, ten thousand subscribers, then even with a two-minute retry interval, the publisher ship will need to run the +peek arm over 80 times per second, which is a bit silly. That is the best case, too: just after the nth datum is successfully retrieved, subscribers are likely to re-send requests at a much shorter retry interval, easily causing thousands of extraneous Arvo activations per second without additional machinery.

A sticky scry system should make it feasible for a runtime to include relatively simple machinery for holding onto multiple requests for the same future datum without inducing duplicate Arvo +peek requests on common code paths. Some request duplication would be acceptable, but under typical conditions the duplicate request frequency for a given datum should stay below an upper bound on the order of once per second, and ideally closer to once per minute. Anything faster would scale poorly for busy publisher ships with many subscribers and published paths.

## Specification

Whenever a new subscribable scry path is grown, Arvo emits a [%grew path] effect to the runtime. The runtime spams the effect to all its I/O drivers in case any of them have listeners for this path.

I propose emitting all %grew effects on the duct [//spam]~. This duct has the following useful properties:

1. The duct does not require any Arvo state, such as the unix-duct variable some vanes store to remember the duct on which they heard the last %born event when the runtime last restarted. For "landlocked" vanes such as Gall (i.e. vanes that don't perform I/O with the runtime), this obviates the need to add a new %born event for that vane or to propagate a %born from some other vane, both of which would be lame, since neither is fundamentally needed.
2. The duct is easy to compare quickly for equality, so the runtime can quickly and simply determine that the effect should be delivered to all I/O drivers, rather than processed by a single driver and then discarded. (Alternatively, the effect's value could indicate this, but I think it's cleaner if it's in the duct.) A sketch of this fan-out follows below.
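
For concreteness, here is a minimal sketch in C of that fan-out. Everything here is illustrative rather than Vere's actual API: the `driver` struct, the `on_grew` hook, and the textual rendering of the duct are all assumptions. The point is only that the spam duct can be recognized with a constant comparison and the effect handed to every driver.

```c
/*
 * Illustrative sketch only: none of these names are Vere's real API.
 * A %grew effect arriving on the constant duct [//spam]~ is fanned out
 * to every I/O driver rather than routed to a single owner.
 */
#include <stdbool.h>
#include <string.h>

typedef struct {
  const char *name;
  void (*on_grew)(const char *path);   /* hypothetical per-driver hook */
} driver;

/* textual rendering of the duct, for illustration */
static bool
is_spam_duct(const char *duct)
{
  return 0 == strcmp(duct, "[//spam]~");
}

static void
route_effect(const char *duct, const char *grown_path,
             driver *drivers, size_t n_drivers)
{
  if ( is_spam_duct(duct) ) {
    for ( size_t i = 0; i < n_drivers; i++ ) {
      drivers[i].on_grew(grown_path);  /* every driver checks its own listeners */
    }
    return;
  }
  /* otherwise, route to the single driver that owns this duct (omitted) */
}
```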

### Subscribable Paths

The set of subscribable paths is currently:

- /g/a paths bound explicitly by a Gall agent
- /c/z paths asking for the root folders of Clay desks, at numeric revisions
- /e/x/cache paths asking Eyre for static responses to HTTP requests

Other scry paths do not easily support detecting changes, so they cannot be subscribed to. This set could change with each Kelvin update, but it is a formal part of the interface at a given Kelvin.
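
As a sketch of how the runtime might gate sticky behavior on this set, the check below treats the subscribable set as a fixed table of path prefixes. The function name and the textual prefixes are assumptions for illustration; the real set is whatever the current Kelvin defines.

```c
/* Illustrative only: a fixed prefix table standing in for "the set of
 * subscribable paths at this Kelvin".  Vere would consult something
 * like this before creating a "blocked" cache entry for a path. */
#include <stdbool.h>
#include <string.h>

static const char *const subscribable_prefixes[] = {
  "/g/a/",        /* paths bound explicitly by a Gall agent  */
  "/c/z/",        /* Clay desk roots at numeric revisions    */
  "/e/x/cache/",  /* Eyre static responses to HTTP requests  */
};

static bool
is_subscribable(const char *path)
{
  size_t n = sizeof subscribable_prefixes / sizeof subscribable_prefixes[0];
  for ( size_t i = 0; i < n; i++ ) {
    if ( 0 == strncmp(path, subscribable_prefixes[i],
                      strlen(subscribable_prefixes[i])) ) {
      return true;
    }
  }
  return false;
}
```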

TODO: how do we make sure multiparty encrypted remote scry can use sticky scry? Do we need to include the message sequence number in cleartext?

### Runtime Effect Handling

The runtime will distribute the %grew effect to all its I/O drivers. Each driver might have outstanding requests for this path or one or more related paths, so it might perform scry requests into Arvo over IPC to retrieve the values grown at the requested paths.

Once such a request is resolved and the driver has the scry result, the driver will immediately send the response to the requester, closing the request if possible. The Fine driver will remove any outstanding requests for the newly grown path from its state, but it will keep the scry results in its cache the way it normally does.
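
Here is a rough sketch of what that could look like in the Fine driver. The helpers (`arvo_peek`, `send_response`, `cache_put`) and the linked list of outstanding requests are hypothetical stand-ins for the driver's real machinery; the shape is one +peek per %grew, a response to every waiting requester, and removal of the satisfied requests while the result stays cached.

```c
/* Illustrative sketch of a Fine-driver %grew handler; the helper
 * functions and data structures are hypothetical. */
#include <string.h>

typedef struct fine_request {
  const char          *path;       /* requested scry path        */
  int                  requester;  /* where to send the response */
  struct fine_request *next;
} fine_request;

/* assumed to exist elsewhere in the driver */
extern const char *arvo_peek(const char *path);                 /* IPC scry */
extern void        send_response(int requester, const char *result);
extern void        cache_put(const char *path, const char *result);

/* returns the new head of the outstanding-request list */
static fine_request *
on_grew(fine_request *pending, const char *grown_path)
{
  const char    *result = NULL;
  fine_request **link   = &pending;

  while ( *link != NULL ) {
    fine_request *req = *link;
    if ( 0 == strcmp(req->path, grown_path) ) {
      if ( result == NULL ) {
        result = arvo_peek(grown_path);   /* a single +peek per %grew   */
      }
      send_response(req->requester, result);
      *link = req->next;                  /* drop the satisfied request */
    }
    else {
      link = &req->next;
    }
  }
  if ( result != NULL ) {
    cache_put(grown_path, result);        /* keep the result cached as usual */
  }
  return pending;
}
```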

### Initialization

Without knowing the latest case for a path, Vere can't drop repeated requests for a future revision that would block, which could be costly, since each failing request would incur an IPC roundtrip and an Arvo activation. One answer could be that on process restart, Clay would emit the latest revision for all desks, Eyre would emit the latest revision for all static responses, and Gall would emit the latest revision for all paths bound by all agents. Vere would then store all these paths and revisions in memory until invalidated by later %grew notifications from Arvo.

However, this would impose a rather high burden on Vere to know the full scry tree, which scales poorly, potentially causing slow restarts and IPC issues, and could be bug-prone. It would be better to introduce transient Vere state for a path on demand.

Vere's scry cache could include "blocked" values for scry paths that have blocked when requested. Vere will only create a "blocked" cache entry for a path if it is one for which Vere knows Arvo will emit %grew notifications. If so, then the first time that path is requested there is no cache entry for it, so Vere performs an Arvo +peek request. If the request blocks, Vere marks the path as pending in the cache, and duplicate network requests for it no longer trigger additional +peek requests.

A further optimization would have Vere track the latest resolved case for any cached path, so that when the blocked request for datum n+1 resolves, Vere can remember that and automatically mark datum n+2 as pending, so that Vere will not need to +peek into Arvo for that datum at all until it has actually been bound.

It should be safe and effective for a "pending" entry to be evicted from the cache, in which case Vere will need to +peek into Arvo again to reinitialize this entry.
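
Putting the pieces together, here is a minimal sketch of the on-demand cache entry described above, reusing the hypothetical is_subscribable check from earlier. The cache interface and names are assumptions; the behavior it illustrates is one +peek on the first request for a subscribable path, a pending entry if that peek blocks, and absorption of duplicate requests until a %grew arrives.

```c
/* Illustrative sketch of on-demand "blocked"/"pending" cache entries;
 * the cache interface and helper functions are hypothetical. */
#include <stdbool.h>
#include <stddef.h>

typedef enum { ENTRY_PENDING, ENTRY_RESOLVED } entry_state;

typedef struct {
  entry_state  state;
  const char  *result;   /* valid once state == ENTRY_RESOLVED */
} cache_entry;

/* assumed to exist elsewhere */
extern cache_entry *cache_lookup(const char *path);   /* NULL on miss      */
extern cache_entry *cache_create(const char *path);
extern const char  *arvo_peek(const char *path);      /* NULL if it blocks */
extern bool         is_subscribable(const char *path);

/* returns the scry result, or NULL if the requester must wait for %grew */
static const char *
handle_scry_request(const char *path)
{
  cache_entry *ent = cache_lookup(path);

  if ( ent != NULL ) {
    /* pending entries absorb duplicate requests without touching Arvo */
    return ( ent->state == ENTRY_RESOLVED ) ? ent->result : NULL;
  }
  if ( !is_subscribable(path) ) {
    return arvo_peek(path);          /* no sticky behavior for this path */
  }
  /* first request for a subscribable path: one +peek, then remember it */
  const char *result = arvo_peek(path);
  ent = cache_create(path);
  if ( result == NULL ) {
    ent->state = ENTRY_PENDING;      /* blocked: wait for a %grew effect */
    return NULL;
  }
  ent->state  = ENTRY_RESOLVED;
  ent->result = result;
  return result;
}
```

In this framing, the n+2 optimization above amounts to creating the entry for the next revision directly in the pending state when the current one resolves, and evicting a pending entry just means the next request repeats the single +peek.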

## Rationale

An earlier proposal for scry-based subscriptions defined a new network protocol for managing ship-to-ship sessions. Within a session, the publisher would be responsible for remembering a request for a future datum and honoring it as soon as possible.

This proposal is lighter-weight and does not introduce a session abstraction, which is often considered distasteful in Urbit circles because it introduces a generally unnecessary statefulness into the networking stack -- Ames and Fine are packet-oriented connectionless protocols, and it would be a shame to lose that.

Some more background on the problem can be found here: https://gist.github.com/belisarius222/4e91c13ff42e0c9371a4d194ad2947e5

## Backward Compatibility

TODO: either explicit protocol negotiation to check whether a publisher supports sticky scry, or a fallback to polling, possibly with a shorter re-send interval.

This system should burn a Kelvin, and it might be a good idea to increment the Fine protocol version too.

## Test Cases

TODO

## Reference Implementation

TODO

## Security Considerations

For unencrypted scry, this proposal has no security implications. Once encrypted scry is part of the system, extending this proposal so that requests for future encrypted paths are still resolved when the corresponding plaintext path is grown remains to be worked out.

One option could be to move the revision number of an encrypted path out of the encrypted section of the path into cleartext. This would be a metadata leak, but it might be worth it if it allows for good subscription semantics. It could be especially important for relays to participate in subscriptions later, since relays can't decrypt the paths.

## Copyright Waiver

Copyright and related rights waived via CC0.