Proposal: Custom RTTs, Type-Associated Data, and WasmGC JS Interop #1552

Open · tlively opened this issue Jan 15, 2025 · 34 comments

@tlively commented Jan 15, 2025

WebAssembly structs and other heap values are currently associated at runtime with “runtime type” objects, or RTTs, which contain the information necessary to implement structural casts and type checks. In the MVP design of WebAssembly GC, these RTTs are left implicit and do not appear in the syntax of the language. Nevertheless, they are already part of WebAssembly’s semantics, and require an implicit extra reference field in every heap object.

Similarly, many source languages have some form of data associated with source types. Examples include vtables and itables for method dispatch as well as static fields. These features must currently be implemented with an extra reference field in every object representing a source object, just like the RTT reference field managed by the engine. Using just one field to access both the structural type information the engine needs to implement casts and the userspace type-associated information would reduce the in-memory size of every WebAssembly object that needs both. For the Google Sheets calcworker, we estimate this to be a roughly 10% reduction in total memory use.
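
As an illustrative sketch of the status quo (the types and names here are invented, not part of the proposal), a toolchain today lowers a source class with methods roughly like this, spending an explicit reference field per object on top of the engine's hidden RTT field:

;; Status quo: every $obj instance pays for the implicit engine RTT field
;; plus an explicit $vtable reference field.
(type $vtable (struct (field $to_string (ref func))))
(type $obj (struct (field $vtable (ref $vtable)) (field $val i32)))

With this proposal, the vtable fields could live on the RTT instead, removing the explicit field from every instance.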

To enable those memory savings, this proposal extends struct definitions to allow arbitrary data to be attached to RTTs, exposes RTT references as first-class values, and introduces new instructions for allocating custom RTTs, allocating structs with custom RTTs, and retrieving data from objects’ RTTs. To avoid code size regressions in the type section, this proposal also includes a new mechanism for deduplicating common sequences of fields in type definitions.

Putting type-associated data on the RTT also lays the foundation for allowing embedders, especially JS, richer access to WebAssembly structs. In JS engines, WebAssembly RTTs are analogous to JS type descriptors that store information about the prototype and field layout of JS objects. With explicit RTTs holding arbitrary data, types can have externref RTT fields annotated as holding a JS prototype that can be used to call methods, including getters and setters, on objects of that type when they flow out to JS. Other embedders don't have JS prototypes specifically, but they can use the same annotation for whatever other host data they wish to use to mediate access to WebAssembly structs.

The rest of this writeup gives a detailed design for custom RTTs to serve as a basis for discussion.

RTT Definitions

Struct definitions are augmented with `(rtt fieldlist)`. The fields must all be immutable. Struct definitions without RTT definitions are redefined to be shorthands for struct definitions with empty RTTs, i.e. RTTs with no fields.

Note: Alternatively, RTTs could be defined as separate types, but they would have to be mutually and exclusively recursive with their associated struct types anyway, so it's simpler to combine the definitions.

Note: Alternatively, structs without RTT definitions and structs with empty RTT definitions could be defined to have different structures if there would be a benefit to such a distinction.

Note: Alternatively, array and function type definitions could also be extended to include custom RTTs. This is not included in the current design because there are no known use cases, but they can always be added later.

Note: Alternatively, RTT fields could also be allowed to be mutable if there is a use case for mutability.

;; RTT fields are separated from normal fields
(type $foo (struct (rtt (field $type_id i32)) (field $a i32) (field $b i64))) 

;; $no-rtt and $empty-rtt are the same type
(type $no-rtt (struct (field i32)))
(type $empty-rtt (struct (rtt) (field i32)))

;; RTT fields must be immutable
(type $invalid (struct (rtt (field (mut i32))))) ;; invalid!

In addition to the existing rules about how the normal fields of struct subtypes may differ from those of their supertypes, we need new rules governing how the RTT fields of a struct subtype may differ from those of its supertype. These rules must ensure that access to RTT fields, as described later, is sound. The rules turn out to be the same as for normal struct fields; both width and depth subtyping are allowed, and they can be applied independently to the RTT fields and normal fields because they have different index spaces.

;; Depth subtyping on RTT fields
(type $super (sub (struct (rtt (field anyref)))))
(type $sub (sub $super (struct (rtt (field eqref)))))

;; Width subtyping on RTT fields
(type $super (sub (struct (rtt (field anyref)))))
(type $sub (sub $super (struct (rtt (field anyref anyref)))))

;; Width and depth subtyping on both RTT and struct fields
(type $super (sub (struct (rtt (field anyref)) (field anyref))))
(type $sub (sub $super (struct (rtt (field eqref eqref)) (field eqref eqref))))

RTT Types

RTTs form a new heap type hierarchy with a top type rtt and a bottom type nortt. rttref is a shorthand for (ref null rtt) and nullrttref is a shorthand for (ref null nortt). In between these types are the defined RTT types for particular struct heap types: (rtt typeidx). Defined RTT types are never in subtype relationships with one another, even if their associated struct types are subtypes of one another.

Note: Alternatively, we could exclude the rtt and nortt types and have each RTT heap type be its own singleton type hierarchy. rtt and nortt are included for now for consistency with other type hierarchies.

(type $super (sub (struct (rtt))))
(type $sub (sub $super (struct (rtt))))

;; Invalid because (rtt $sub) is not a subtype of (rtt $super), even though $sub is a subtype of $super.
(func $invalid (param (ref null (rtt $sub))) (result (ref null (rtt $super)))
  (local.get 0)
)

Here's a program that would execute unsoundly if RTT subtyping mirrored the associated struct subtyping:

(type $super (sub (struct (rtt))))

(type $sub (sub $super (struct (rtt) (field i32))))

(func $unsound (result i32)
  rtt.new $sub
  ;; Allocate a $super with an RTT for $sub. Allowed if (rtt $sub) <: (rtt $super).
  struct.new_with_rtt $super
  ;; Cast the $super to a $sub. This will succeed because the object has an RTT for $sub.
  ref.cast $sub
  ;; Read out-of-bounds because we have a reference to the object
  ;; with a more specific type than it was allocated with.
  struct.get $sub 0
)

RTT Instructions

Custom RTTs may be allocated by the rtt.new and rtt.new_default instructions. These are similar to struct.new and struct.new_default, but they take values for the fields of a type’s RTT rather than for its own fields.

rtt.new typeidx
C |- rtt.new x : [t*] -> [(ref (rtt x))]
 - C.types[x] = (rtt fields) subtype
 - t* = extract(fields)

rtt.new_default typeidx
C |- rtt.new_default x : [] -> [(ref (rtt x))]
 - C.types[x] = (rtt fields) subtype
 - defaultable(fields)

Structs can be allocated with explicit RTTs using struct.new_with_rtt and struct.new_default_with_rtt. These instructions trap if the provided RTT reference is null.

struct.new_with_rtt typeidx
C |- struct.new_with_rtt x : [t* (ref null (rtt x))] -> [(ref x)]
 - …

struct.new_default_with_rtt typeidx
C |- struct.new_default_with_rtt x : [(ref null (rtt x))] -> [(ref x)]
 - …
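
As a sketch of how these instructions compose (the type and names here are illustrative):

(type $point (struct (rtt (field $type_id i32)) (field $x i32)))

;; Allocate a custom RTT for $point carrying type ID 7.
(global $point.rtt (ref (rtt $point)) (rtt.new $point (i32.const 7)))

;; Allocate a $point using that RTT.
(func $make (param $x i32) (result (ref $point))
  (struct.new_with_rtt $point (local.get $x) (global.get $point.rtt))
)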

The semantics of the existing struct.new and struct.new_default instructions can be redefined in terms of struct.new_with_rtt and struct.new_default_with_rtt along with an rtt.canon administrative instruction that looks up the canonical RTT for a type. rtt.canon only makes sense for empty RTTs, so struct.new and struct.new_default can only be used to allocate types that have empty RTTs.

Note: Alternatively, rtt.canon could be a user-visible instruction rather than an administrative instruction if there is a use for it.

Note: Alternatively, rtt.canon (and therefore struct.new and struct.new_default) could be allowed for non-empty RTTs as long as all their fields are defaultable.
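
Under this redefinition, the existing instructions can be read as shorthands along these lines (an informal sketch):

;; For a type $t whose RTT is empty:
;;   (struct.new $t v*)       ≡ (struct.new_with_rtt $t v* (rtt.canon $t))
;;   (struct.new_default $t)  ≡ (struct.new_default_with_rtt $t (rtt.canon $t))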

A new rtt.get instruction provides access to the data stored on a struct’s RTT, along with rtt.get_s and rtt.get_u for packed RTT fields. struct.get is not repurposed for retrieving RTT data to allow the RTT fields to have their own index space separate from that of the struct’s own fields. This is a requirement to support width subtyping on the struct and RTT fields independently.

rtt.get_sx? typeidx fieldidx
C |- rtt.get x y : [(ref null x)] -> [t]
 - …
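
For example, reading type-associated data off an object's RTT might look like this (a sketch reusing the illustrative $point type from above):

(func $get_type_id (param $p (ref $point)) (result i32)
  ;; Reads RTT field $type_id from $p's RTT, not a field of $p itself.
  (rtt.get $point $type_id (local.get $p))
)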

It would not be sound to allow retrieving a reference to an arbitrary struct’s RTT without some way to express or check for an exact type, so instead we offer an instruction rtt.eq that can compare the identity of a struct’s RTT with that of a given RTT. This can be used as a fast path in the userspace implementation of casts. For maximum generality, it can take any struct type and any RTT type.

rtt.eq
C |- rtt.eq : [(ref null struct) (ref null rtt)] -> [i32]
 - …
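
A sketch of the fast path mentioned above, assuming a global $t.rtt holding the custom RTT for some type $t (the slow path is elided):

(func $cast_to_t (param $obj (ref null struct)) (result (ref $t))
  (if (rtt.eq (local.get $obj) (global.get $t.rtt))
    ;; Identical RTTs: the object was allocated as a $t, so this
    ;; Wasm-level cast is guaranteed to succeed.
    (then (return (ref.cast (ref $t) (local.get $obj))))
  )
  ;; Otherwise fall back to a slower source-level check (not shown).
  (unreachable)
)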

Note: Alternatively, rtt.eq could take a struct type index x and take [(ref null x) (ref null (rtt x))] operands, but this would be more restrictive than necessary. In particular, it wouldn't be able to check for matches with RTTs for subtypes of struct x.

Note: It would be sound to provide a struct.get_rtt instruction as long as it returned rttref rather than a specific RTT reference. This would let us reuse ref.eq rather than introducing rtt.eq, and other use cases might come up as well.

Casts and Custom RTTs

WebAssembly uses structural types because the only intended purpose of its type system is to guard against unsafe accesses that are out of bounds or use the wrong value type. As such, WebAssembly casts and type checks are intended to recover information about the structure of a heap object and nothing more. For lack of an efficient alternative, several toolchains currently use WebAssembly casts to implement source-level casts, but this only works in a closed-world scenario where the toolchain can ensure all source types lower to unique WebAssembly types.

The structure of the RTT associated with a struct type is part of the struct type’s structure, so RTT declarations are included in the structural type canonicalization procedure. On the other hand, the particular RTT value associated with an object does not affect the object’s structure, so the outcome of a WebAssembly cast never depends on RTT identity; two objects of the same WebAssembly type always pass the same set of WebAssembly casts no matter what their RTT values are.
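
For example (a sketch), two objects of the same type carrying different RTT values behave identically under ref.test:

(type $t (struct (rtt (field i32)) (field i32)))
(global $rtt1 (ref (rtt $t)) (rtt.new $t (i32.const 1)))
(global $rtt2 (ref (rtt $t)) (rtt.new $t (i32.const 2)))

;; Both results are 1: the RTT value never affects Wasm cast outcomes.
(func $test (result i32 i32)
  (ref.test (ref $t) (struct.new_default_with_rtt $t (global.get $rtt1)))
  (ref.test (ref $t) (struct.new_default_with_rtt $t (global.get $rtt2)))
)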

Engines currently depend on type canonicalization producing unique RTTs for each type to implement casts efficiently. Each RTT stores a vector of the canonical RTTs for its supertypes and the relative subtype depths between types can be used to index into that vector to perform casts in constant time. With custom RTTs, the RTT values for each type are no longer unique, but canonical values can still be produced and canonical supertype vectors can still be stored on all RTTs to preserve constant time casts. Cast fast paths that perform identity checks on RTTs will not work for types with customizable RTTs, though.

Whether RTT reference types themselves can be cast is another question that will require discussion. On the one hand, the consistency of being able to ref.cast to recover lost type information for any reference type is useful for optimizers; on the other hand, we are already considering making exceptions for e.g. continuation references to simplify implementations.

JS Prototypes

The JS embedding specification currently says that Wasm structs that flow into JS have immutable, null prototypes. To allow richer interop with Wasm structs from JS, we can annotate RTT allocations to tell the engine to duplicate a particular immutable field into the prototype slot in the RTT. This will allow JS to call methods on objects that use the allocated RTT. The annotation must be on the RTT allocation rather than the type definition because there would be no composable way to resolve conflicting annotations on different definitions of the same type.

;; counter.wasm
(type $counter (struct (rtt (field $proto externref)) (field $val i32)))

;; Import the prototype.
(import "prototypes" "counter" (global $counter.proto externref))

;; Allocate the custom RTT for $counter, annotating the prototype field.
(global $counter.rtt (ref (rtt $counter))
  (@prototype $counter.proto)
  (rtt.new $counter (global.get $counter.proto))
)

;; Allocate and export a $counter.
(global $foo (export "counter") (ref $counter) (struct.new_with_rtt $counter (i32.const 0) (global.get $counter.rtt)))

;; Export functions that access counters.
(func $get (export "counter.get") (param (ref $counter)) (result i32)
  (struct.get $counter $val (local.get 0))
)
(func $inc (export "counter.inc") (param (ref $counter))
  (struct.set $counter $val (local.get 0)
    (i32.add
      (struct.get $counter $val (local.get 0))
      (i32.const 1)
    )
  )
)

// counter.js

// Create a prototype for counters.
var counterProto = {};

// Instantiate the module, passing the prototype as an import.
var {module, instance} = await WebAssembly.instantiateStreaming(fetch('counter.wasm'), {
  prototypes: { counter: counterProto }
});

// Populate the prototype with methods that call exported functions.
counterProto.get = function() { return instance.exports['counter.get'](this); }
counterProto.inc = function() { instance.exports['counter.inc'](this); }

// Use the counter.
var counter = instance.exports.counter;
console.log(counter.get()); // 0
counter.inc();
console.log(counter.get()); // 1

Note: Alternatively, the JS embedding could implicitly treat externref initial RTT fields as prototypes, avoiding the need for a new annotation and custom section at the expense of giving users less control over the prototypes.

Note: RTT allocations could also be annotated to configure reflections of struct fields as JS own properties if there is demand for such a feature.

Declarative Population of Prototypes

One problem with the example given above is the large amount of initialization code that has to be run after instantiation to populate the prototypes. It is possible that there is a faster way to manually populate the prototypes than assigning one property at a time, but we still anticipate that this will be a measurable problem for startup performance. To allow the engine to optimize this part of startup, we propose adding a "magic import" mechanism similar to what we have for strings, paired with a custom section that will allow the engine to synthesize and populate the imported prototypes. Users would still be able to perform additional manual configuration of the prototypes after instantiation.

At its most basic, the custom section needs to annotate externref imports with a sequence of method names and associated exports. Note that it is important that the custom section refer to export names rather than e.g. function indices so that it can be specified in the JS embedding document and implemented without breaking the core Wasm abstraction that only exports are visible to the embedder.

...
;; Declaratively populate the prototype from the previous example.
(@create-prototype (method "inc" "counter.inc") (method "get" "counter.get"))
(import "prototypes" "counter" (global $counter.proto externref))
...

// counter.js

// Instantiate the module. No imports are provided because the
// prototype will be created and populated during instantiation.
var {module, instance} = await WebAssembly.instantiateStreaming(fetch('counter.wasm'), {});

// Use the counter.
var counter = instance.exports.counter;
console.log(counter.get()); // 0
counter.inc();
console.log(counter.get()); // 1

Note: Alternatively, we could also require passing an importedPrototypes="prototypes" compile option like we do with imported string constants. Because importing and populating prototypes already requires a custom section, we can also get away without the extra compilation option.

The design for the custom section could also be extended to support other features, although the precise details can be determined later (see the sketch after the list):

  • Installing property getters and setters in addition to normal methods.
  • Installing static methods that do not receive this as a parameter.
  • Setting up prototype chains by declaring that some other imported prototype should be installed as the prototype for the current imported prototype.
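
As a sketch only, these extensions might appear as additional clauses in the annotation introduced above (all of these forms are hypothetical, not part of the proposal):

(@create-prototype
  (method "inc" "counter.inc")
  (accessor "val" (get "counter.get") (set "counter.set")) ;; hypothetical getter/setter form
  (static "make" "counter.make")                           ;; hypothetical static method form
  (parent "prototypes" "base"))                            ;; hypothetical prototype chain form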

Any features not supported by the declarative custom section can still be implemented in user space by exporting the prototypes and mutating them imperatively after instantiation. For example, if the prototypes need to contain more complex methods than can be mechanically generated from the custom section, those methods can simply be added after instantiation.

Deduplicating Field Lists

Note: This code size optimization can be separated from the rest of the custom RTT design and is not necessary for initial prototyping. If we think it's sufficient that compressed module sizes do not increase, we may not need this at all.

Just like normal struct fields, RTT fields need to be repeated when declaring subtypes. In cases where vtable types could currently be shared between multiple types in a source type hierarchy, moving the vtable fields into RTTs will cause those fields to be repeated more in the type section. To prevent code size regressions, we need a new mechanism to deduplicate lists of fields in type definitions. There is a large design space here, but this is one possible design:

A new kind of type section entry called fieldlist is introduced. Struct fields can become references to fieldlists, which are inlined into the struct definitions. In the text format, field names given in the field list are inlined into the type definition as well.

;; $struct1 and $struct2 are the same type and have the same field names
(type $struct1 (struct (field $a i32) (field $b i64) (field $c f32) (field $d f64)))
(fieldlist $list (field $b i64) (field $c f32))
(type $struct2 (struct (field $a i32) (fieldlist $list) (field $d f64)))

To prevent infinitely large types, fieldlists can use only previously defined fieldlists in their definitions. They can refer to any type defined in the current or previous recursion groups. Types can only use previously defined fieldlists.

(rec
  (fieldlist $list1 (field (ref $foo) (ref $bar)))
  (fieldlist $list2 (fieldlist $list1) (field (ref $baz)))
  (type $foo (struct (fieldlist $list1)))
  (type $bar (struct (fieldlist $list2)))
  (type $baz (struct (fieldlist $list2) (field i32)))
)
@sjrd commented Jan 15, 2025

This looks amazing! You can count on the Scala.js toolchain for early experiments and adoption. We've been needing the custom prototypes to complete the correctness of our semantics with respect to the JS backend. Avoiding an additional field for vtables is a nice bonus.

@rossberg commented Jan 16, 2025

Great to see this! From a first read it seems to be along the lines of the ideas we discussed in the past. All makes sense, as far as I can tell. Paging back in my thoughts from back then, though, I would like to suggest a structural simplification that would make the design both more minimal and more expressive at the same time.

As currently described in the OP, RTTs duplicate a lot of the machinery for structs (syntax, instructions, type constructors, subtyping rules). And I highly suspect that the endgame would be that they end up duplicating almost all of it, e.g., mutability, casts, etc. And in terms of implementation, there really isn't much difference anyway.

The natural conclusion then is to simply make RTTs into structs. Instead of declaring a list of RTT fields on a struct, we allow declaring one type index that denotes the type of the descriptor:

(type $t (struct (descriptor $rtt) (field ...)))

Inversely, to make this sound, the RTT itself must also declare which type it is identifying:

(type $rtt (struct (describes $t) (field ...)))

Only types with a describes clause can be used as RTTs.
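
Spelled out as a recursion group, the mutually recursive pair might look like this (a sketch using the syntax above; the field choices are illustrative):

(rec
  (type $t   (struct (descriptor $rtt) (field i64)))
  (type $rtt (struct (describes $t) (field $proto externref)))
)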

As you note, there necessarily is a mutual recursion here. Yet this design has several advantages:

  • RTTs can immediately reuse all the machinery for structs.
  • References to RTTs become just ordinary references, no parallel shadow type hierarchy.
  • It naturally extends to other composite types.
  • A type can have both a descriptor and a describes clause, allowing multiple levels of metadata. (Some runtimes make use of such a structure.)
  • In principle, it could allow objects other than structs as RTTs, e.g. arrays. I don't currently have an immediate use case for that, but it'd be reassuring if that could "just work".

There would be only a single new instruction necessary in this setup, which extracts the RTT from a struct:

struct.get_desc : (ref null $t) → (ref $rtt)
- where $t ≈ struct (descriptor $rtt) (field t*)

In particular, all allocation instructions remain as they are: struct.new and friends simply require an additional operand when the type has a descriptor clause. RTTs themselves are simply created with struct.new as well, and likewise reuse all other instructions.
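
A sketch of allocation under this scheme, reusing the illustrative $t/$rtt pair from above (operand order illustrative):

;; The RTT is an ordinary struct allocation.
(global $t.desc (ref $rtt) (struct.new $rtt (ref.null extern)))

;; Allocating a $t supplies the descriptor as an extra operand.
(func $make (result (ref $t))
  (struct.new $t (global.get $t.desc) (i64.const 0))
)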

Subtyping would extend to descriptor/describes clauses as follows:

  • A subtype of a type with a descriptor clause must also have a descriptor (that is a "describes-oblivious" subtype)
  • A subtype of a type with a describes clause must also have a describes clause (that is a subtype equivalent (d'oh!))
  • A subtype may add a describes clause.
  • Possibly: a subtype may add a descriptor clause. In that case, the supertype has a canonical RTT, and the custom RTT is considered to describe a subtype of that. I'm not entirely sure what the (implementation) implications of this would be, so that may require some care.

Here, a "describes-oblivious" subtype of a composite type is one that is a structural subtype up to their describes clause.

[Edit: The "describes-oblivious" notion admittedly is ugly. A cleaner alternative might be to keep a version of the rtt type from the OP, and in exchange allow simple covariant subtyping on both the descriptor and the description clauses. The rtt $t type would be an invariant subtype of ref $t that prevents unsound subtyping on its use as an RTT operand to the new instructions.

In fact, since the only purpose of this extra type constructor is preventing subtyping, a cleaner and more forward-looking solution would be to introduce that as a generic feature. That is, have an optional final attribute on ref types that makes them invariant: (ref final t) would be a subtype of (ref t), but not have any further subtypes (other than bottom). The rtt operand to the new instructions then is required to be final:

struct.new : (ref null final $rtt)? t* → (ref final $s)
- where $s ≈ struct (descriptor $rtt)? (field t*)

All allocation instructions would return final references. This essentially is the same as the exact types I have toyed around with before in some other context. Semantically, this is only marginally more complicated, if not simpler, than having a parallel hierarchy of rtt types.]

Wrt deduplicating field lists: why would we need a new form of declaration for field lists? Can't we simply have a form of include clause referencing another struct type?

(rec
  (type $foo (struct (field (ref $foo) (ref $bar))))
  (type $bar (struct (include $foo) (field (ref $baz))))
  (type $baz (struct (include $bar) (field i32)))
)

The same constraints as you mention apply, i.e., an include clause can only refer to lexically preceding struct types.

@tlively commented Jan 16, 2025

Thanks, @rossberg. Reusing structs largely makes sense to me, but I'll have to think more about the consequences for the subtyping rules and soundness. I think you're right that adding exact reference types could be a nice solution. (Is there a reason to call them final instead of exact? FWIW I prefer the latter.)

Adding an include clause instead of a new fieldlist declaration also sounds like a nice simplification.

@titzer commented Jan 16, 2025

Hi,

Initial thoughts: I really like this, and I agree with @rossberg's suggestion of reducing duplication with structs; IMO first-class RTT instances are just like regular struct instances, and field accesses are basically the same machine operation. That also improves reuse if we have parametric polymorphism over GC types in the future.

For this part,

Engines currently depend on type canonicalization producing unique RTTs for each type to implement casts efficiently. Each RTT stores a vector of the canonical RTTs for its supertypes and the relative subtype depths between types can be used to index into that vector to perform casts in constant time. With custom RTTs, the RTT values for each type are no longer unique, but canonical values can still be produced and canonical supertype vectors can still be stored on all RTTs to preserve constant time casts. Cast fast paths that perform identity checks on RTTs will not work for types with customizable RTTs, though.

For this we could add a new kind of identity-RTT cast that takes an RTT instance and compares for equality. Since equality of RTT instances implies a static type match, it would obviate the need for a second, redundant ref.cast.
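
In the notation of the OP, such an instruction might be sketched as follows (the name and exact typing here are illustrative only):

ref.cast_rtt typeidx
C |- ref.cast_rtt x : [(ref null struct) (ref null (rtt x))] -> [(ref x)]
 - traps (or branches) unless the struct's RTT is identical to the given RTT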

@rossberg I didn't quite understand the second point here:

Subtyping would extend to descriptor/describes clauses as follows:

A subtype of a type with a describes clause must also have a describes clause (that is a subtype equivalent (d'oh!))

I think we could allow this at the cost of a dynamic check at allocations. In particular, if we define struct A and struct A.desc and then define struct B <: A and struct B.desc <: A.desc, we just need to preserve the invariant that all allocated instances of A are allocated with an A.desc, but not any subtype of A.desc--so there would be a dynamic check on the RTT operand that it is exactly A.desc. The introduction of exact types, IIRC, is trying to make that a static check, but could have far-ranging implications for the type system. Even so, I think if we went with a dynamic check, we could add exact types later.

Edit:

Further, I think the above works if we actually require a subtype of a type with a description to declare a new description. This ensures that the hierarchy of description types (RTTs) is parallel to the hierarchy of struct types.

@tlively commented Jan 16, 2025

@jakobkummerow, from the engine perspective, what do you think about the "RTTs are just structs" idea? They would still be statically differentiated from normal structs by the existence of the describes clause, so they could have a different layout in memory. My intuition is that we wouldn't be able to support an RTT struct (i.e. struct with describes clause) being a subtype of a non-RTT struct because of these layout differences. Also, would we be able to support structs with both describes and descriptor clauses, i.e. RTTs-of-RTTs in arbitrarily long chains?

@jakobkummerow

what do you think about the "RTTs are just structs" idea?

I can see that in terms of spec simplicity it's compelling. In implementations, they very likely won't be the same. The key point is that each GC'ed object already has a "shape descriptor" as its first field, and that's going to remain the case no matter what. For extending that basic object layout, we'll have some design choices to make. Off the top of my head I can think of three options; I don't know yet which one we might end up picking, or if there might be even more possible designs:

  1. We could introduce another internal field at the beginning of each struct, to hold its custom RTT. That would require some special-casing when these are used for JS interop (to hold prototypes); perhaps there are blockers there but it'll probably work. It would also increase memory consumption per object, so you'd no longer "save the vtable slot" (compared to Wasm's status quo), that slot would just be moved to the engine-managed part of the object; so in other words this would be pretty much equivalent to what you can do today in userspace (aside from JS interop magic). So I don't think I'd prefer this design, but it would come closest to the idea of "RTTs are just structs".
  2. We could try to append the custom RTT struct fields to the shape descriptor. That has some challenges because it makes shape descriptors variable-sized. Could probably be done at the cost of some significant complexity, and would likely be the most efficient approach in terms of both memory and performance. So I think it's unlikely this would be our first choice, but perhaps it'd be the long-term solution.
  3. By default, I'd assume that the Wasm-specified RTT structs will be hanging off of the engine-internal shape descriptor; so they'll be two indirections away from the "described" struct. There'd have to be a 1:1 coupling between shape descriptors and their customizable sub-objects, so there are at least two variants of this option:
    • (a) The "RTT reference" we pass around (e.g. to struct.new) could be the shape descriptor (despite its module-specified fields being an indirection away). In that case, the sub-object could be just a regular struct, but the thing we pass around as RTT would be very different, and we would internally want to keep regular struct references and RTT struct references just as separated from each other as anyref and funcref, which probably makes this a total non-starter.
    • (b) The "RTT reference" could be the sub-object, in which case it would have to have an internal field pointing back at the shape descriptor whose sub-object it is, so again the in-memory layout wouldn't be the same as for a regular struct.

RTTs can immediately reuse all the machinery for structs.

In terms of spec (and binary encoding), maybe so; in terms of implementations, there would likely be very little to reuse, see above.

we wouldn't be able to support an RTT struct (i.e. struct with describes clause) being a subtype of a non-RTT struct because of these layout differences.

Agreed. AFAICS that would effectively force us into option (1) above, which would be a rather unfortunate restriction.

would we be able to support structs with both describes and descriptor clauses, i.e. RTTs-of-RTTs in arbitrarily long chains?

At this time, I don't see a reason why we couldn't. The only way to be sure is to try it, though.

(nit: I find "describes" a misleading name, I'd say it's more of an "enriches" or "decorates" or "attaches-to". That's bikeshedding though. Ultimately only the binary encoding matters.)

@lukewagner

Great to see this idea coming back! Using structs as other structs' rtts is an intriguing new wrinkle.

For the "JS Prototypes" part of the design, could an alternative to custom sections be to allow configuring the rtt via the JS API of the rtt value (perhaps via post-creation-initialization so that wasm could still be the one to create the rtt using the type index space)? This would avoid giving runtime semantics to custom sections which (ignoring Error.stack) we've thus far intentionally avoided while also providing more programmatic control over initialization.

@tlively commented Jan 16, 2025

@jakobkummerow, thanks, implementation strategy 3b sounds fine to start, and hopefully we could move to implementation strategy 2 in the long run. In the original idea where RTTs are entirely separate from structs, I would have expected implementation strategy 3a to start and still hopefully implementation strategy 2 eventually.

@lukewagner, yes, it should be feasible to configure prototypes via an API instead of a custom section. That being said, our concern around programmatic configuration in JS is that it would have a significant cost at startup time when there are very many prototypes to configure, just like we saw with initial experiments with imported string constants. The proposed custom section design would let the engine implement fast paths to avoid this cost, and we would also design it to be fully specified in the JS embedding spec without changing or violating core Wasm's abstractions in any way.

@rossberg

@tlively:

Is there a reason to call them final instead of exact?

Mainly because it's lifting the same concept to a different level, and because the term perhaps is more suggestive to many people — my impression was that folks found exact types a bit weird, presumably because they couldn't relate them to anything they knew.

(I've never been a big fan of the keyword final in the first place, but there is something to be said about using established terminology when it makes sufficient sense.)

@titzer:

I think we could allow this at the cost of a dynamic check at allocations. [...]
The introduction of the exact types [...] could have far-ranging implications for the type system.

In principle, you are right, we could add them later. Personally, I would dislike the possibility of a runtime check on such a basic operation if we can avoid it. Conceptually, final/exact ref types are rather straightforward and only a refinement to subtyping that shouldn't affect much of anything else.

@jakobkummerow:

In implementations, they very likely won't be the same.

In most of the implementation approaches you describe, accessing the fields of a custom RTT only seems to differ from a regular struct by offset, as if they have an additional field — which they do, and that would be immediately visible from their declaration as a struct. Is there a deeper difference than that? (Note that the actual getting-to-the-RTT-struct is encapsulated in the struct.get_desc instruction, and hence decoupled from field access.)

The only additional complexity appears to occur with approach (2), where the offset could differ per type. Would that offset still be statically known?

FWIW, for a clean-slate Wasm engine I could envision yet another strategy, where the shape hangs off the custom RTT struct when both are present. So the shape is an extra indirection away, not the custom fields. That could potentially make sense for a pure Wasm engine, if we assume that casts are less frequent than access to meta fields in such cases.

I find "describes" a misleading name, I'd say it's more of an "enriches" or "decorates" or "attaches-to".

Well, I'm not married to the keywords, but it is the exact dual role to the descriptor clause, so the keyword pair should of course reflect that connection.

@jakobkummerow

accessing the fields of a custom RTT only seems to differ from a regular struct by offset

Yeah. Which means field offset computation isn't the same for RTT structs and regular structs. And subtyping will have to distinguish them. And, depending on spec design and implementation choices, extern.convert_any behavior might be different. And I'm not sure yet how many other code paths will need adaptations for them. It's all doable, but I'm quite sure it'll be a lot more work to implement than saying "it's all just structs so it'll Just Work™".

And FWIW, regarding "field accesses only differ by their offset", that still holds for approach (2), the offset delta will just be bigger. The main complexity of approach (2) I can see so far is the fact that shape descriptors will become trickier to handle due to being variable-size. That's an example of a more general principle in software design: having a single object serving two purposes (being a shape descriptor and being a Wasm struct) is generally more complex than having two objects focusing on one purpose each.

Overall, from an implementation perspective, I expect there to be no significant complexity difference between "let's just use structs" and "let's define customizable RTTs as a new kind of object". In cases where code reuse is possible (with appropriate parameterization), it'll be possible either way. And either way there'll be a need to distinguish RTT structs and regular structs.

If anything, I'm slightly worried that "let's just use structs" might lead to cases that are difficult to implement and useless in practice, and only exist because they make sense for non-RTT structs. I don't have a specific example in mind; but the earlier observation that "let's just use structs" rules out implementation approach (3a) is illustrative of the general concern: while that case is no big deal because there's a viable alternative (3b), it demonstrates that "let's just use structs" doesn't exactly make engines' lives simpler; it can just as easily create complications and constraints.

Anyway, all I'm saying is: use structs as RTTs in the spec if you like, just don't argue that this choice is motivated by engine simplicity.

for a clean-slate Wasm engine I could envision yet another strategy, where the shape hangs off the custom RTT struct when both are present

I don't know; if I implemented a new engine from scratch I'd probably still want all GC'ed objects to have their shape descriptor in the same place... but that's a very hypothetical scenario for me, so I won't spend more time thinking about it.

the keyword pair should of course reflect that connection.

Of course. I commented only on one half of the pair for the sake of brevity.

@lukewagner

@tlively That's a reasonable concern. Some time later, it'd be useful to see a realistic example of how things get big to see what we're trying to optimize and if perhaps it ends up looking different than JS string built-ins. Given that JS parsers have been hyper-optimized for bulk parsing (arrays of) object literals (that could serve as prototypes), hypothetically JS might have a perf advantage if rtts could be initialized in-bulk, but it'll depend on the details.

@jakobkummerow

@lukewagner We are assuming that configurable JS-side prototypes for Wasm structs will mostly contain methods backed by exported Wasm functions, so the hypothetical setup code would be many repetitions of:

struct1_prototype.do_something = function(arg) { 
  return wasm_instance.exports.struct1_do_something(this, arg);
};

or perhaps

Object.defineProperty(struct1_prototype, "foo", {
  get: function() { return wasm_instance.exports.struct1_get_foo(this); },
  set: function(value) { wasm_instance.exports.struct1_set_foo(this, value); },
});

so I don't think plain JS object literal parsing speed will be very helpful for that. Rather, since the things being installed on the prototypes are coming from the Wasm instance, bulk initialization (or, perhaps even better, lazy initialization) is enabled by having some declarative description of the desired prototypes somewhere in the Wasm module. A new (non-custom) section type in core Wasm would do the trick; using a "custom" section on the core Wasm level is a form of a more layered design that would allow specifying JS-related things in the JS API spec. Maybe there's a third option?

The scale we need to target is a good question. I don't think any hard data on that exists just yet, given the early state of this proposal and partner teams' limited time so far to seriously evaluate it. They can count how many types and/or functions they have in their app, but that doesn't necessarily indicate how many of those will need rich JS exposure (just a small "API" set? Or almost all, for generality?). I do expect that larger deployments may well annotate thousands, perhaps tens of thousands of types, with easily a dozen methods each. So having to parse and compile or interpret, say, 100K repetitions of the patterns sketched above would add up to quite some startup overhead.

@eqrion commented Jan 20, 2025

Thanks for writing this up, this looks reasonable to me. Couple quick thoughts:

  1. Extending structs instead of introducing a new concept seems like a worthwhile direction to consider. I'll need to think about whether it's implementable in SpiderMonkey, though; we're in a similar position to V8 in that we have a lot of implementation constraints on our shape objects that make them very different from structs. We already use an extra header word on all our wasm-gc objects to work around constraints in our shape objects, although we'd like to fix that.
  2. We need to be careful, if we expose an rtt.canon instruction that works with RTTs with fields, to avoid introducing new user-accessible global storage or exposing GC timing. The canonical set of types is process-global in SpiderMonkey, so if you can write to fields on these types you have a new communication vector with other threads. And you could possibly observe GC timing if you set a field to a value, perform a GC that collects the RTT, then acquire the canonical RTT again and read the value.
  3. Just as with JS-string-builtins I think the starting point should be to rely on JS initialization code for prototypes until we have data that indicates it's a problem. That being said, I could imagine it could be an issue and something declarative could be useful. If that's the case, then I'd prefer we avoid giving a custom section semantic meaning. I've sort of wondered before if there's enough use cases where we should just define a 'web' layer (similar to the component-model) on top of core wasm that lets us define web-specific sections. Still not sure how I feel about that.
  4. For RTTs that are on shared structs, I'm assuming that all fields must be shared too. That looks easy to enforce in either design. If we do have mutable fields on shared RTTs, we may then end up duplicating atomic load/store instructions, which is another point in favor of re-using structs.

@rossberg commented Jan 21, 2025

@eqrion, to clarify, even as an extension of structs, custom RTTs would remain statically separated by type (per the presence of a describes clause). So even if their implementation was entirely different, that should not pose a new set of problems for an implementation. It is first and foremost a conceptual simplification (though I'd typically expect a lot of implementation overlap as well).

Good point about atomic access!

Can you elaborate on point 2? I don't quite understand what case you are referring to. AFAICS, canonical RTTs cannot have user-defined fields, or depend on RTTs with user-defined fields, by construction.

@lukewagner

@jakobkummerow Thanks! That helps paint a clearer picture. Given the formulaic nature of the initialization logic, it still seems like there could be some sort of bulk-prototype-initialization JS API tailored to this purpose. A declarative alternative that is sort of symmetric to JS String Builtins would be to have JS API instantiation do extra prototype initialization of exports with a certain name prefix. I suppose there's a wide design space here, so my main suggestion in the short term is just that we not take the custom section approach for granted. But also agreed with @eqrion that before any of that, it's probably best to start with JS glue code that we can measure.

@titzer commented Jan 21, 2025

@eqrion @jakobkummerow Another implementation consideration is a space/time tradeoff for RTT instances in how to store the Cohen display (canonical supertype array). Previously, with only canonical RTTs, storing the supertype array inline in the RTT object could save an indirection, making a cast just two loads and a compare (load RTT, load supertype-at-depth D, compare). With customizable RTTs, it's possible that programs instantiate them many times, so to save space, it makes sense to move it out of line, making casts three loads and a compare. I'm not sure if Web engines have been storing that array inline, but it would necessitate some refactoring.

It basically boils down to: struct instances are essentially a record with a tag and their fields, and RTT instances are a record with their own tag, a description for structs that refer to them, and then their fields. The "description for structs that refer to them" is heavy if it's an inline supertype array. They are inherently heavier-weight than regular structs.

@rossberg

In principle, you are right, we could add them later. Personally, I would dislike the possibility of a runtime check on such a basic operation if we can avoid it. Conceptually, final/exact ref types are rather straightforward and only a refinement to subtyping that shouldn't affect much of anything else.

Sure, but that probably means we need an instruction to cast between a non-final type and a final type, because likely programs will want to use subtyping with RTTs in most places but would be required to supply final types for allocations. So it's moving a runtime check and also introducing type system complexity. It's maybe too early to tell, but my preference would be to keep things simpler, especially since the cost of an allocation likely dwarfs a dynamic check for the exactness of an RTT's descriptor.

@sjrd commented Jan 21, 2025

With customizable RTTs, it's possible that programs instantiate them many times, so to save space, it makes sense to move it out of line, making casts three loads and a compare. I'm not sure if Web engines have been storing that array inline, but it would necessitate some refactoring.

It's possible of course, but is it likely? I would expect most (if not all) toolchains to generate exactly one instance of each custom RTT. If we take Java as an example, one does not need several instances of the RTT for java.util.ArrayList. There is always going to be exactly one instance. Do you know of any language where that wouldn't apply?

AFAICT, toolchains would prefer to keep the performant Cohen encoding, rather than hypothetically duplicating a few fields.

@rossberg

@titzer:

Sure, but that probably means we need an instruction to cast between a non-final type and a final type, because likely programs will want to use subtyping with RTTs in most places but would be required to supply final types for allocations.

The non-final version of a final type is its direct supertype. Hence, if you need both variants, passing the final one is always sufficient, as it subsumes the other.
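
A sketch of that subsumption, using the final syntax floated above (illustrative):

(func $consume (param (ref $t)))

;; (ref final $t) <: (ref $t), so a final reference can be passed
;; wherever a plain reference is expected; no cast is needed.
(func $produce (param $x (ref final $t))
  (call $consume (local.get $x))
)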

A final type, by its very nature, has no other final sub- nor super-types, so there are no interesting casts between two final types. It would already be paradoxical to have a cast involving a final source, because its whole point is to express that this already is the most precise type possible for the referenced value.

The only cast involving final that makes any sense would be from a non-final down to a final target type. That would already be covered (conceptually, and if we chose to allow it) by the existing cast operators, as they cast between two reftypes. And the implementation would be straightforward: instead of going through the supertype vector, all it needs to perform is a direct pointer comparison against the value's type representation, so it's much simpler than a regular cast — it's the same check that the dynamic version would need to implement in every allocation instruction. That said, I cannot even think of a scenario where I'd need or want such a cast.

@titzer commented Jan 21, 2025

The non-final version of a final type is its direct supertype. Hence, if you need both variants, passing the final one is always sufficient, as it subsumes the other.

Sure, subsumption works here, no need for an explicit cast.

A final type, by its very nature, has no other final sub- nor super-types, so there are no interesting casts between two final types. It would already be paradoxical to have a cast involving a final source, because its whole point is to express that this already is most precise type possible for the referenced value.

Sure, but in any case we do allow other impossible casts now, so maybe that's not important. Depending on how we choose to encode ref final, we could make it only possible to encode casts from non-final to final types, see below.

The only cast involving final that makes any sense would be from a non-final down to a final target type. That would already be covered (conceptually, and if we chose to allow it) by the existing cast operators, as they cast between two reftypes. And the implementation would be straightforward: instead of going through the supertype vector, all it needs to perform is a direct pointer comparison against the value's type representation, so it's much simpler than a regular cast — it's the same check that the dynamic version would need to implement in every allocation instructions. That said, I cannot even think of a scenario where I'd need or want such a cast.

Agreed that it's the same check as would be needed for the allocation, but it's not just a pointer comparison because RTTs will now have instances, unless you mean loading a pointer from the RTT instance to compare. But even then it's different from other casts that compare a tag at a known depth in the supertype vector; it'd be either that same check-at-depth plus a supertype-vector length check, or another, dedicated descriptor field.

In the binary encoding of casts, which is already different than how reftypes are encoded as value types, we'd probably need to steal one bit from the flags, which would represent whether the target type was final.

That said, I cannot even think of a scenario where I'd need or want such a cast.

Well the point of letting RTTs have a subtyping relation is exactly so they can be passed around like other objects, so it's reasonable to expect that programs would use the non-final type liberally, because the final type of course doesn't allow subtypes. Languages that have a meta-object protocol will make use of that feature--e.g. likely Scala. So they'll need a cast to go from the happy-go-lucky world of passing around meta-objects to a final type to allocate.

(As an aside, I'm designing Virgil's meta-object protocol in the background for exactly the same reason, but one layer down--to implement Wasm in Wizard using the Virgil object model and combining the Wasm metadata with the Virgil metadata). And so far it makes sense to me that the meta-object subtyping hierarchy mirrors the object hierarchy, so they'd naturally be non-final in generated Wasm-GC code.

@sjrd commented Jan 22, 2025

Well the point of letting RTTs have a subtyping relation is exactly so they can be passed around like other objects, so it's reasonable to expect that programs would use the non-final type liberally, because the final type of course doesn't allow subtypes. Languages that have a meta-object protocol will make use of that feature--e.g. likely Scala. So they'll need a cast to go from the happy-go-lucky world of passing around meta-objects to a final type to allocate.

Scala does have a meta-object protocol, but the RTTs themselves would never be first-class. java.lang.Class is only a reification of the RTTs. They are not the RTTs themselves. When you instantiate an instance of a class, you must have static information about what class you're instantiating, so you also have a fixed RTT.

The only place where I imagine it could come up would be array types. Arrays of dimension $n$ and base reference type $B$ have a type hierarchy parallel to that of $B$. In practice we implement that with a single class (struct type) for all reference arrays, but a different run-time vtable instance. Even then, though, the vtable instances of all the reference array types in the world share the same vtable type. So we would still have an exact/final RTT type in that situation.

(This might change if we got read-only field types, because then we would probably have different struct types for different reference array types, as I outlined in WebAssembly/gc#558 (comment))

@sjrd commented Jan 22, 2025

Unrelated, but I would caution against the introduction of final types anyway. The problem with final types is that once they're in, you can never invent new subtypes of existing types. A simple example would be intersection types. In Wasm v6.0, it could be the case that $T$ is the most precise type we can give to a particular value $v$. Hence, we can type $v$ as final T. But then, Wasm v7.0 wants to introduce intersection types of the form A & B. Ah but! That allows $v$ to be better typed as T & U for some U. The guarantee of final T that $T$ was the most precise type possible for $v$ is broken.

@rossberg commented Jan 22, 2025

@titzer:

unless you mean loading a pointer from the RTT instance to compare.

Yes, that's what I meant. (Yeah, when we discussed this idea some years back, we concluded that it is confusing to call these custom things RTTs, and something like "descriptor" would be more appropriate. They merely carry a proper RTT.)

I don't follow why anything more than a pointer comparison on the proper RTT would be needed, but given that it has to happen one place or the other, that's probably getting OT.

Well the point of letting RTTs have a subtyping relation is exactly so they can be passed around like other objects

Sure, subtype polymorphism is useful for consumers of objects and their descriptors alike. But any code site that produces and initialises an object naturally has to fix both its exact Wasm type and its descriptor's statically. So I'm on the same page as @sjrd and not sure where that kind of downcast would come in.

For languages with primitives for truly generically creating an object via a meta-object protocol, I'd expect that they will — for many reasons — require a generic object representation, where the Wasm-level type is invariant across source types. But then, all their descriptors have the same Wasm-level type as well, and no Wasm-level casting between them is needed either. But I may well be missing some less obvious scenarios. That said, I don't mind allowing such casts.

@sjrd:

The problem with final types is that once they're in, you can never invent new subtypes of existing types.

Ah, interesting point, but I don't believe it is true as stated. New forms of subtyping are not incompatible with final types per se. What matters concretely are the introduction forms for final types, and whether they could change to produce values of subtypes in the future. In our case, the only introduction forms are allocation expressions, and they explicitly spell out the very type of the value they produce. By nature, those won't be able to produce more precise types in the future. Or if they did, that would be very different instructions. So while this can indeed be a problem in a more implicitly typed language with unannotated intro forms, I don't think it can become one in our setting. Or am I missing something? (Edit: This is assuming that we stick to the current principle of always requiring type annotations where otherwise lubs or glbs would be required. I have no reason to believe that will change, as it would certainly produce other problems.)

@tlively commented Jan 22, 2025

So IIUC, it’s safe to introduce final/exact types iff we never further refine the types produced by allocation instructions.

On the one hand, we’re discussing refining the allocation types now by introducing final/exact types, but on the other hand, it’s unclear why we would need to refine them again in the future given the type annotations @rossberg mentioned.

Personally, I would love to have final/exact types to help propagate more precise type information for optimization purposes in Binaryen, but that use case is entirely orthogonal to custom RTTs.

@jakobkummerow

I recently learned that the stack-switching proposal is encountering a vaguely similar sub-problem (re-using existing type definitions for a new purpose), and currently appears to be leaning towards a different solution. I think it would probably be better if both proposals eventually settled for similarly-looking solutions; I don't care much which solution that is.

As very quick summary: stack-switching introduces "continuations", which are a bit like funcrefs (they have a signature, they can be "called"/resumed with the right parameters, and will eventually return results of a certain type), but need to be kept separate from funcrefs as they aren't interchangeable: they're produced and consumed by a set of instructions that doesn't overlap with the set of instructions that deal with funcrefs.

This reuse of function signature definitions for a strictly separate purpose reminds me of this discussion's idea to reuse struct type definitions for a new purpose, while keeping this new use case statically separate from non-RTT structs. I'm aware of three different approaches to this:

(1) "Set a flag". That's what's been suggested above: (type struct (describes $other) ...) is conceptually (type struct is_rtt=true ...), whereas all existing struct types are implicitly reinterpreted as (type struct is_rtt=false ...).
If the stack-switching proposal wanted to adopt this approach, it would introduce something conceptually akin to (type func is_continuation=true ...), while reinterpreting all existing function type definitions as (type func is_continuation=false ...).

(2) "Wrapper". That's what the stack-switching proposal currently sketches out: a new kind of type definition (type continuation $functypeindex), i.e. a new type that simply wraps another type definition. If this discussion here wanted to adopt that approach, that'd mean a new kind of (wrapper) type definition (type rtt $structtypeindex), meaning "I'm an RTT type, and my fields are described by $structtypeindex".

(3) "Separate kind of reference". That's what early versions of the GC proposal assumed, and what still somewhat echoes in the thread-starting post above: instead of having a single kind of (ref $index) where the thing referenced by $index can fall into one of various distinct buckets, the distinction could instead be made by the kind of reference, so there could e.g. be (ref $type0) and (rtt $type0) and (continuation $type0) all for the same $type0 (in the latter case, only valid if $type0 is a function type). (Note that this doesn't address the desire to statically fix the RTT shape for the type it describes, which would presumably still have to be encoded in the type section somehow; it only addresses the part that references to RTT structs and references to non-RTT structs aren't interchangeable.)

I believe these three approaches are by and large interchangeable in the sense that they're each successfully solving the same problem of reusing existing type definitions for a new, statically-distinguishable purpose. Of course they come with different pros and cons in their details; in particular they keep different parts of the system simpler at the cost of making other parts more complex. I don't think I have much of a preference at this time (in part because, within certain limits, implementations can pick their own favorite internal model and map the decoded wire bytes onto that); but I do think it would be weird if the custom-rtts proposal that's in the making here went for approach (1) because there's no precedent for wrapping types, whereas the stack-switching proposal went for approach (2) because there's no precedent for setting flags on types, and then once finalized they'll both be the odd one out that doesn't do things the way other proposals do them.
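For concreteness, a hypothetical side-by-side sketch of the three encodings for an RTT whose field layout is a single vtable slot (all of the RTT-related syntax is invented for illustration):

(type $shape (struct (field (ref func))))  ;; the RTT's field layout

;; (1) flag: the struct definition itself is marked as describing $obj:
;;       (type $rtt (describes $obj) (struct (field (ref func))))
;; (2) wrapper: a new kind of type definition wraps the existing one:
;;       (type $rtt (rtt $shape))
;; (3) reference kind: $shape stays unchanged; the distinction moves to
;;     the reference type, e.g. (rtt $shape) versus (ref $shape).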

@titzer

titzer commented Jan 22, 2025

For languages with primitives for truly generically creating an object via a meta-object protocol, I'd expect that they will — for many reasons — require a generic object representation, where the Wasm-level type is invariant across source types. But then, all their descriptors have the same Wasm-level type as well, and no Wasm-level casting between them is needed either. But I may well be missing some less obvious scenarios. That said, I don't mind allowing such casts.

Reflecting on @sjrd's comments, I think it's probably true that the actual metaobject will need to be separate from descriptors anyway: in the current design a descriptor can't be its own descriptor, whereas metaobject chains eventually end up in a cul-de-sac, like the class object java.lang.Class in Java or the final Metaclass object in Smalltalk.

But I think you still end up with the final/subtyping problem, because those metaobjects will have a field referencing the descriptor. If we want the metaobject to have a single descriptor field that is read-only and covariant, that field can only be covariantly refined in subtypes if the descriptor type is non-final.
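A minimal sketch of that constraint in current GC syntax (type names invented): the covariant refinement in $metaSub below is only possible because $desc is declared non-final:

(type $desc    (sub (struct)))        ;; non-final descriptor type
(type $descSub (sub $desc (struct)))
;; Immutable fields are covariant, so a sub-metaobject may narrow its
;; descriptor field, but only because $desc admits subtypes.
(type $meta    (sub (struct (field (ref $desc)))))
(type $metaSub (sub $meta (struct (field (ref $descSub)))))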

@titzer

titzer commented Jan 22, 2025

Personally, I would love to have final/exact types to help propagate more precise type information for optimization purposes in Binaryen, but that use case is entirely orthogonal to custom RTTs.

Thanks, that's a really good use case!

@eqrion

eqrion commented Jan 22, 2025

@rossberg

@eqrion, to clarify, even as an extension of structs, custom RTTs would remain statically separated by type (per the presence of a describes clause). So even if their implementation was entirely different, that should not pose a new set of problems for an implementation. It is first and foremost a conceptual simplification (though I'd typically expect a lot of implementation overlap as well).

Ah, I suppose. I was mostly thinking about the discussed extension where a custom RTT could have a nested custom RTT. Although that's something we could syntactically limit.

Can you elaborate on point 2? I don't quite understand what case you are referring to. AFAICS, canonical RTTs cannot have user-defined fields, or depend on RTTs with user-defined fields, by construction.

This was responding to some of the proposed extensions from the original post.

    Note: Alternatively, RTT fields could also be allowed to be mutable if there is a use case for mutability.
    Note: Alternatively, rtt.canon could be a user-visible instruction rather than an administrative instruction if there is a use for it.
    Note: Alternatively, rtt.canon (and therefore struct.new and struct.new_default) could be allowed for non-empty RTTs as long as all their fields are defaultable.

So not something that's part of the core proposal here, just trying to point out a potential issue with a maximal design.

@titzer

@eqrion @jakobkummerow Another implementation consideration is a space/time tradeoff for RTT instances in how to store the Cohen display (canonical supertype array). Previously, with only canonical RTTs, storing the supertype array inline in the RTT object could save an indirection, making a cast just two loads and a compare (load RTT, load supertype-at-depth D, compare). With customizable RTTs, it's possible that programs instantiate them many times, so to save space, it makes sense to move it out of line, making casts three loads and a compare. I'm not sure if Web engines have been storing that array inline, but it would necessitate some refactoring.

Yeah, it's a good question. We also have an optimization where we do an early exact comparison of the RTT on the cast value with the canonical RTT, for the common case where it's an exact match; in that case we can skip loading the supertype vector. My understanding is that this would no longer be valid for RTTs that can be instantiated multiple times. I haven't measured how much that optimization is buying us in a while, though, so maybe it's not a big deal.

@tlively
Member Author

tlively commented Jan 22, 2025

Wrt deduplicating field lists: why would we need a new form of declaration for field lists? Can't we simply have a form of include clause referencing another struct type?

I thought of a reason we may want something besides structs to provide the included list of fields. Consider these types (where (ref $b)^n is a shorthand for repeating the field (ref $b) n times):

(rec
  (type $a (struct (field (ref $b)^n)))
  (type $b (struct (field (ref $b)^n (ref $a))))
)

To optimize the representation of the repeated field sequence (ref $b)^n using structs, we would have to do this:

(rec
  (type $included (struct (field (ref $b)^n)))
  (type $a (struct (include $included)))
  (type $b (struct (include $included) (field (ref $a))))
)

But the introduction of $included into the rec group changes the identity of types $a and $b, so this optimization is not safe to do in general. There is no way to avoid putting $included in the same rec group because it is mutually recursive with $a and $b.

In contrast, if we used a new fieldlist declaration, we could specify that inclusions are resolved and fieldlists removed before the rec group structure is considered for the purposes of determining type identity, so the optimization would become safe and this rec group would define the same types as the original rec group:

(rec
  (fieldlist $included (field (ref $b)^n))
  (type $a (struct (include $included)))
  (type $b (struct (include $included) (field (ref $a))))
)

@rossberg
Member

@jakobkummerow: The difference is that continuations have a completely disjoint set of instructions from functions. In contrast, RTTs/descriptors require pretty much the exact same set of instructions that ordinary structs do, so a unification is natural.

(There are other technical reasons for the wrapping design of continuation types, which I explain here; these likewise do not apply to descriptors.)

@tlively: That's an interesting point. In what sense is it not "safe" to do this transformation? I see that it would have to be a whole-program transformation, but I was under the impression that Binaryen is doing that anyway?

Do you know how often this particular case occurs? Introducing a whole new declaration sort and namespace to the language is not a small feat. Having them nestable into rec would make it even more complicated. I would be reluctant to do that just for space-optimising a comparably rare edge case.

FWIW, if we already had generic type definitions, then the factorisation could be expressed — the following ought to work:

(type $included (param $b) (struct (field (ref $b)^n)))
(rec
  (type $a (struct (include $included $b)))
  (type $b (struct (include $included $b) (field (ref $a))))
)

@tlively
Member Author

tlively commented Jan 23, 2025

No, I don’t know how common this situation is, so it will require some investigation. I suspect it’s not common, though, so planning to do the simpler thing for now makes sense to me. Even with closed-world optimization, Binaryen still has to be careful not to change the identities of “public” types that appear in the module interface, otherwise the optimized module may not be able to link with its environment. We’re also moving toward supporting open-world optimization better due to demand from Dart and Kotlin.

@Jamesernator

Even for cases where parsing lots of JS prototypes is a concern, presumably languages could just import a minimal reflection library for JS and do the calls inside Wasm to create methods.

e.g. Something like:

;; define $counterPrototype like in the OP
(...)

;; function toMethod(wasmFunc) { return function(...args) { return wasmFunc(this, ...args) } }
(import "js-reflect" "toMethod" (func $toMethod (param funcref) (result externref)))
;; function definePropertyValue(target, name, value) { return Reflect.defineProperty(target, name, { value, writable: true, configurable: true }) }
(import "js-reflect" "definePropertyValue" (func $definePropertyValue (param $target externref) (param $name externref) (param $value externref)))

;; Connect prototype object to rtt

;; Some wasm func
(func $counter.get (param (ref $counterType)) (result i32)) 

;; Install methods as needed on any prototypes needed
(func $start
    (call $definePropertyValue
        (global.get $counterPrototype)
        (... js stringref for "get" ...)
        (call $toMethod (ref.func $counter.get))
    )
)

I have seen elsewhere that in V8, calls are currently slower from Wasm to JS than from JS to Wasm, so I do wonder whether there could (or should) be a builtin module for reflection containing the necessary functions to interact with the JS object model from Wasm (i.e. the Reflect.* methods, a way to create a blank object, and something similar to toMethod above so that Wasm can receive this).

@tlively
Member Author

tlively commented Feb 1, 2025

Thinking more about exact/final reference types, I realized they would be most useful for optimizations if finality/exactness were orthogonal to nullability:

             (ref null ht)
            /             \
    (ref ht)               (ref null final/exact ht)
            \             /
           (ref final/exact ht)
(ref exact/final ht)

In keeping with all of our other instructions accepting nullable operands, the rtt/descriptor operand to struct.new $t could have type (ref null final/exact $descriptor/rtt($t)).

Additionally, to keep each heap type hierarchy a lattice, we could special case none (and nofunc, etc.) such that (ref null exact/final none) <: (ref null exact/final ht) and (ref final/exact none) <: (ref final/exact ht) for all ht in the corresponding hierarchy.
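Under that lattice, a small hypothetical example (the exact keyword is not yet real syntax): a fresh allocation could flow directly into the nullable exact type, since (ref exact $t) sits below (ref null exact $t):

(type $t (struct))
(func $make (result (ref null exact $t))
  ;; struct.new produces (ref exact $t), which the lattice above places
  ;; below both (ref $t) and (ref null exact $t).
  (struct.new $t))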

@rossberg
Member

rossberg commented Feb 2, 2025

Yes, absolutely, that's an important clarification. The final attribute applies to the ht dimension (classifying the pointee), not the null dimension (classifying the pointer). And I think I mentioned above that it means no subtype other than bottom.

@tlively
Member Author

tlively commented Feb 3, 2025

I wasn't sure before whether you meant "bottom" as in none/nofunc/noextern/noexn/nocont or "bottom" as in validation-only-real-bottom. Glad we're on the same page 👍
