Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support returning Iterator<Item=T> via an iterable iterator #4142

Open
RunDevelopment opened this issue Oct 4, 2024 · 3 comments
Open

Support returning Iterator<Item=T> via an iterable iterator #4142

RunDevelopment opened this issue Oct 4, 2024 · 3 comments

Comments

@RunDevelopment
Copy link
Contributor

RunDevelopment commented Oct 4, 2024

Motivation

Right now, returning Rust iterators is a pain. While collecting everything into a vec (so the JS side can get an Array<T>) can work for some scenarios, this approach is simply not possible when iterators contain infinite elements and undesirable when computing individual elements is costly. #1478 has more context. I made this a separate issue, because I want to talk about a specific solution.

Determined users might attempt to make their own iterator wrapper, but this comes with some pitfalls. While very similar, the iterator protocols for Rust and JS have subtle differences. One important one is what happens after iteration is done. Rust says that calling next may return pretty much anything (=unspecified but not UB; it may even panic), while JS requires that next returns { done: true }.

In my specific case, I had an iterator with between 1 and 248 elements, where each element can take a few milliseconds to compute, with no way to know how many elements there will be before having computed the first element. The idea was that the JS side can decide how many elements to take based on a time budget and user input while showing a little animation in the browser. I made my own wrapper and fell right into that pitfall.

Proposed Solution

Add a struct IterableIterator<T> type to js_sys and implement From<Iterator<Item=T>> for it. The idea is to allow users to write code like this:

#[wasm_bindgen]
fn squares() -> js_sys::IterableIterator<u64> {
    (0..).map(|i| i * i).into()
}

or even better:

#[wasm_bindgen]
fn squares() -> impl Iterator<Item=u64> {
    (0..).map(|i| i * i)
}

The generated JS code would then look something like this:

/**
 * @returns {IterableIterator<bigint>}
 */
export function squares() {
    // this would be a pointer to a an object, but I'll treat it as an object here
    // todo: wrap, finalizer, and so on
    let iterable_iterator_u64 = wasm.squares();

    return {
        [Symbol.iterator]() { 
            // this makes it an iterable iterator
            // think of it as making it `impl<T> IntoIterator<Item=T> for Iterator<Item=T>`
            return this;
        },
        next() {
            const next = iterable_iterator_u64?.next();
            if (next === undefined) {
                // free the Rust object as soon as iteration is done
                // this makes sure that we never call next on it again
                iterable_iterator_u64?.free();
                iterable_iterator_u64 = undefined;
                return { value: undefined, done: true };
            }
            return { value: next };
        }
    }
}

The basic idea is to return an object that implements the Rust iterator protocol (next(&mut self) -> Option<Item>) can convert it to the JS iterator protocol (next(): IteratorResult).

Alternatively, much of the JS next method could also be done in Rust depending on what would be easier/more efficient.

I would also like to suggest wrapping the returned JS object with Iterator.from if available. Since this API is brand new (currently stage 3), this would only be done when the JS environment supports it.

Alternatives

  1. Don't support it. This is the current situation and it forces users to either explore alternative solutions or implement iterator support themselves. Alternative solutions might not always be viable and implementing it yourself is not trivial and a potential source of bugs.
  2. Using a non-generic js_sys::IterableIterator. This has the huge downside of losing type information for the generated TS types. I would like iterators to be strongly typed on both sides.
  3. Returning a pure iterator (no Symbol.iterator) or pure iterable (no next) on the JS side. Both are viable since most built-in JS APIs support both, but the same is not true for libraries. Most libraries take iterables, most often explicitly typed as Iterable<T>, because standard container types like Array and Set are iterables but not iterators. So returning a pure iterator is not as useful in the wider JS ecosystem. A pure iterable would be viable, but an IterableIterator<T> communicates that it can only be iterated once. This is the current expectation, because IterableIterator<T> is most notably returned by generator functions.

Additional Context

A similar solution could also be used to support the upcoming Rust 2024 gen blocks/fns. This will create a nice symmetry between Rust generators and JS generators.


I would like to discuss this solution and then make a PR for it. However, I'll likely need some guidance on it

@daxpedda
Copy link
Collaborator

daxpedda commented Oct 7, 2024

I think as a more general solution, it would be best if we start by support defining Symbols on wasm-bindgen exported structs. This would not only help users, but also allow us to easily implement features like this. It is currently possible to do this via Reflect::set(&object, &Symbol::iterator(), fun), though this isn't ideal in every situation.

AFAIU IterableIterator is only a concept in TS, not in JS. So I don't believe your proposal has a place in js-sys. But wasm-bindgen should still facilitate this for external implementations by, again, a more general solution to allow TS interfaces to be implemented.

So I'm mostly in favor of your 3. alternative. I believe it would be reasonable to add a conversion from std::iter::Iterator to js_sys::Iterator and maybe add a new type js_sys::Iterable and add a conversion from std::iter::IntoIterator. However, this would bring up your concern in the 2. alternative. js_sys::Iterator should have been strongly typed in the first place and should become so in the next breaking change.


Let me know what you think!
Otherwise I'm happy to review a PR implementing the solution I outlined above (your 3rd alternative).

@RunDevelopment
Copy link
Contributor Author

I think as a more general solution, it would be best if we start by support defining Symbols on wasm-bindgen exported structs

That's a good point. Is there already an issue for it?

AFAIU IterableIterator is only a concept in TS, not in JS.

Ah true. JS and TS is mostly one thing in my head, so sometimes I mix things up.

So I'm mostly in favor of your 3. alternative. I believe it would be reasonable to add a conversion from std::iter::Iterator to js_sys::Iterator and maybe add a new type js_sys::Iterable and add a conversion from std::iter::IntoIterator. However, this would bring up your concern in the 2. alternative. js_sys::Iterator should have been strongly typed in the first place and should become so in the next breaking change.

I don't think this is a bad idea, but this might become a pitfall. Rust programmers might naturally assume that they should use js_sys::Iterator to return iterators even though pure iterators are less useful in JS. So this either needs some really good documentation to nudge them into the direction js_sys::Iterable or a different solution. (Yes, my proposed js_sys::IterableIterator suffers the same problem.)

I wonder whether we could make js_sys::Iterator an iterable iterator? This would remove the need for js_sys::Iterable or js_sys::IterableIterator. The soon-to-be standardized JS Iterator class is exactly that too. This would change the nature of js_sys::Iterator though, so I'm not sure.

On that note: binding the JS Iterator class will be fun. TS solved the name clash by calling the class BuiltinIterator and keeping the name Iterator for the protocol (explained in their release notes).

@daxpedda
Copy link
Collaborator

daxpedda commented Oct 8, 2024

I think as a more general solution, it would be best if we start by support defining Symbols on wasm-bindgen exported structs

That's a good point. Is there already an issue for it?

I'm not aware of one.

On that note: binding the JS Iterator class will be fun. TS solved the name clash by calling the class BuiltinIterator and keeping the name Iterator for the protocol (explained in their release notes).

I guess this would cover all three variants then.

But yes, this will be a problem to figure out without a breaking change.
I would propose following the TS naming until the next breaking change, unless somebody comes up with a better naming scheme.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants