Create a worker pool that supports job streaming #1

Closed
4 of 6 tasks
plastikfan opened this issue Aug 6, 2023 · 0 comments
Assignees
Labels
feature New feature or request

plastikfan commented Aug 6, 2023

So many of the examples, documentation and other Go packages do not support the model where jobs can be issued to a worker pool in bursts, at any time, up to and including some known end point. Most of the examples discovered assume the client knows the full job stream up front; ie we see examples where a client creates a slice containing all the jobs, dispatches them to the pool, and immediately closes the channel. These are really noddy examples that don't reflect the complexity of the real world. So we need to roll our own.

The worker pool must have the following features/properties:

  • must contain a means of reporting errors, via an error channel
  • must have an output channel off which the client can receive results
  • must be observable compatible; this is because this needs to be able to be wrapped inside an observable to support the reactive model
  • must be able to support reactive operators (only a small set of operators, ie only the ones required by extendio)
  • must be able to support reactive options
  • appropriate channels must be based upon generics rather than reflection/interface{}. But we also need a specific type, let's call this the envelope, which defines fixed properties for internal requirements and a client-defined generic parameter, let's call this the payload.

Channels in play:
🔆 jobs (input)
🔆 results (output)
🔆 errors (output)
🔆 cancel (signal)
🔆 done (signals no more new work)

▶️ ProducerGR(observable):
- writes to job channel

▶️ PoolGR(workers):
- reads from job channel

▶️ ConsumerGR(observer):
- reads from results channel
- reads from errors channel

Both the Producer and the Consumer should be started up immediately as
separate GRs, distinct from the main GR.

* ProducerGR(observable) --> owns the job channel and should be free to close it
when no more work is available.

* PoolGR(workers) --> the pool owns the output channels

So, the next question is: how does the pool know when to close the output channels?
In theory, this should be when the jobs queue is empty and the current pool of
workers is empty. This realisation now makes us discover what the worker is. The
worker is effectively a handle to the goroutine, which is stored in a scoped collection.
This collection should probably be a map whose key is a uniquely generated ID
(see "github.com/google/uuid"). When the map is empty, we know there are no more
workers active to send to the outputs, therefore we can close them.

TODO:
