Basic concurrency support for transaction prover service #908
Looks great! A couple of additional comments:
And the last point is that ideally we'd not implement it ourselves but would be able to use an already existing component/framework.
Hello! I've been researching this issue a bit, in particular Cloudflare's Pingora crates. I think they fit as a solution for all our problems here.
This can be tackled with Pingora's LoadBalancer, setting the config to 1 thread per service. I can put together a PoC of this with a simple "hello world" server shortly (see the sketch below). Related to this is the creation and destruction of workers. In my first approach I was thinking of manually running instances of the server and adding their endpoints to the load balancer's upstream configuration; this also works for destroying workers (remove the server from the list of upstreams, reload, turn off the prover server). This can benefit from the Graceful Upgrade that Pingora supports.
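For reference, a minimal sketch of such a proxy, closely following Pingora's quick-start load-balancer example (the worker addresses and listen port below are placeholders, not our actual deployment):

```rust
use async_trait::async_trait;
use pingora::prelude::*;
use std::sync::Arc;

// Round-robin proxy sitting in front of the prover workers.
pub struct ProverProxy(Arc<LoadBalancer<RoundRobin>>);

#[async_trait]
impl ProxyHttp for ProverProxy {
    type CTX = ();
    fn new_ctx(&self) -> Self::CTX {}

    async fn upstream_peer(
        &self,
        _session: &mut Session,
        _ctx: &mut (),
    ) -> Result<Box<HttpPeer>> {
        // Pick the next worker in round-robin order.
        let upstream = self.0.select(b"", 256).unwrap();
        Ok(Box::new(HttpPeer::new(upstream, false, String::new())))
    }
}

fn main() {
    let mut server = Server::new(None).unwrap();
    server.bootstrap();

    // Placeholder addresses; these would be the prover worker endpoints.
    let upstreams =
        LoadBalancer::try_from_iter(["127.0.0.1:50051", "127.0.0.2:50051"]).unwrap();

    let mut proxy =
        http_proxy_service(&server.configuration, ProverProxy(Arc::new(upstreams)));
    proxy.add_tcp("0.0.0.0:6188");
    server.add_service(proxy);
    server.run_forever();
}
```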
For this we can use the rate limiting that the crate provides out of the box, just setting the maximum number of requests per user per second as described here (a rough sketch follows below).
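For illustration, how that could look with the `Rate` estimator from the pingora-limits crate; the client-identification scheme and the limit value here are assumptions:

```rust
use once_cell::sync::Lazy;
use pingora_limits::rate::Rate;
use std::time::Duration;

// Sliding-window rate estimator; counts events per 1-second window.
static RATE_LIMITER: Lazy<Rate> = Lazy::new(|| Rate::new(Duration::from_secs(1)));

// Hypothetical per-client budget of proving requests per second.
const MAX_REQ_PER_SEC: isize = 2;

/// Returns true once `client_id` exceeds its per-second budget.
/// In the proxy, this check would live in the request filter, answering 429 when it trips.
fn over_limit(client_id: &str) -> bool {
    let observed = RATE_LIMITER.observe(&client_id, 1);
    observed > MAX_REQ_PER_SEC
}
```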
We can also use Pingora's timeouts out of the box for this; they are easily configurable (see the example below).
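For example, something along these lines when building the upstream peer (the durations are placeholders; proving jobs would likely need a generous read timeout):

```rust
use pingora::lb::Backend;
use pingora::prelude::*;
use std::time::Duration;

// Wrap the selected worker in an HttpPeer with explicit timeouts.
fn prover_peer(upstream: Backend) -> Box<HttpPeer> {
    let mut peer = Box::new(HttpPeer::new(upstream, false, String::new()));
    // Fail fast when a worker is unreachable instead of hanging the client.
    peer.options.connection_timeout = Some(Duration::from_secs(5));
    // Upper bound on how long we wait for the proof response itself.
    peer.options.read_timeout = Some(Duration::from_secs(120));
    peer
}
```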
It also has a built-in Prometheus server that can be used for that purpose (see the sketch after this comment). If you think that's OK, I can proceed the following way:
All of these can be done in separate issues, and if you agree I can start immediately.
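As a reference point, wiring up Pingora's built-in Prometheus service should only take a few lines (the scrape address is a placeholder):

```rust
use pingora::server::Server;
use pingora::services::listening::Service;

fn main() {
    let mut server = Server::new(None).unwrap();
    server.bootstrap();

    // Serves metrics registered through the `prometheus` crate at this address.
    let mut metrics = Service::prometheus_http_service();
    metrics.add_tcp("127.0.0.1:6150");
    server.add_service(metrics);

    server.run_forever();
}
```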
This sounds great! Let's start with the PoC to see if we hit any roadblocks. In terms of the setup, I was thinking the load balancer could run on a relatively small machine (e.g., 4 cores or something like that), and the provers would run on big machines (e.g., 64 cores). Is that how you are thinking about it too?
Sorry for the late reply. Given that the prover takes advantage of concurrency, that sounds great and should work more than OK. If I can get access to one of those machines I might run a couple of tests. As for the load balancer, 4 cores should be fine too; it is not supposed to handle a heavy workload.
PoC update: I implemented a minimal version of the load balancer with Pingora and a "Hello world!" server; everything worked smoothly out of the box. I've been thinking that we could make a primitive first version of the queue by deploying workers and adding them to the load balancer, then letting the round robin act as a kind of queue. In the ideal case, where every proof takes the same time to run, this behaves like a queue of size N, where N is the number of workers. Then I thought of adding queues built into each worker, raising the proving capacity to N*M, where N is the number of workers and M the size of each queue (a rough sketch of this idea follows after the next paragraph). I'm not sure yet how to implement these queues. This approach lets us split the issue into two tasks that can be performed in parallel:
I think we could start with the proxy server, which lets us deploy several workers if we need them, and then (or simultaneously) start working on the queue. Also, I'm still thinking about an automated way to deploy/kill workers and gracefully restart the proxy. Any ideas on this or on the queue implementation would be great. I am starting with the Pingora server, but let me know if we'd prefer to start with the queue and I will switch to that.
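To make the N*M idea concrete, here is a minimal, Pingora-independent sketch of per-worker bounded queues using tokio semaphores; the worker addresses and sizes are placeholders:

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// One bounded slot pool per worker: M permits act as a queue of size M per backend.
struct WorkerSlots {
    addr: &'static str,    // placeholder worker address
    slots: Arc<Semaphore>, // capacity M
}

#[tokio::main]
async fn main() {
    // N = 2 workers, M = 4 slots each => N * M = 8 jobs admitted at once overall.
    let workers = vec![
        WorkerSlots { addr: "10.0.0.1:8080", slots: Arc::new(Semaphore::new(4)) },
        WorkerSlots { addr: "10.0.0.2:8080", slots: Arc::new(Semaphore::new(4)) },
    ];

    // Round-robin assignment: job i goes to worker i % N and waits for a free slot.
    let mut handles = Vec::new();
    for job_id in 0..10usize {
        let worker = &workers[job_id % workers.len()];
        let permit = worker.slots.clone().acquire_owned().await.unwrap();
        let addr = worker.addr;
        handles.push(tokio::spawn(async move {
            // Placeholder for forwarding the proving request to `addr`.
            println!("job {job_id} -> {addr}");
            drop(permit); // the slot frees up once the proof completes
        }));
    }
    for h in handles {
        h.await.unwrap();
    }
}
```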
Question: could the load balancer not maintain the queue itself? Basically, can we avoid having queues in the proving service and rely on the load balancer to manage the jobs? Or does it not work this way? The ideal approach in my mind would be:
If this works, we won't need to modify the current prover service at all, but let me know if that's not a viable approach.
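For illustration, one possible shape of a balancer-owned queue, sketched with a tokio mpsc channel and one pull loop per worker (a standalone toy, not wired into Pingora; the addresses are placeholders):

```rust
use std::sync::Arc;
use tokio::sync::{mpsc, Mutex};

// A single queue owned by the balancer: workers pull the next job only when free,
// so the prover service itself needs no queue at all.
#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel::<u64>(100); // pending proving jobs
    let rx = Arc::new(Mutex::new(rx));

    let workers = ["10.0.0.1:8080", "10.0.0.2:8080"]; // placeholder addresses
    let mut handles = Vec::new();
    for addr in workers {
        let rx = Arc::clone(&rx);
        handles.push(tokio::spawn(async move {
            // Worker loop: take one job at a time, prove it, repeat.
            // The lock only serializes pulling from the queue, not the proving.
            loop {
                let job = { rx.lock().await.recv().await };
                match job {
                    Some(id) => println!("proving job {id} on {addr}"), // forward here
                    None => break, // queue closed and drained
                }
            }
        }));
    }

    for id in 0..10 {
        tx.send(id).await.unwrap();
    }
    drop(tx); // close the queue so the workers exit after draining it

    for h in handles {
        h.await.unwrap();
    }
}
```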
Following your last question, I've been doing a bit more research on Pingora's load-balancing strategies. It looks like we can achieve that functionality by defining our own strategy. Adding the queue doesn't seem that hard; I'm not sure yet how to tell when to remove an element from the queue, and I'm looking into that now. In general this sounds like a good and doable approach, and it shouldn't raise the requirements for the server running the proxy.
Nice! So, right now, as tasks arrive they will be assigned to the queues of individual workers in round-robin fashion, right? And so, if there is one worker, it'll basically just have a single queue of tasks which it'll work on one by one, right? If so, this may be good enough; though, of course, if we get a single queue working, it'll be even better.
Right now there is no queue, just round robin over the workers. I'm currently adding queues. The design I'm working on is one queue per backend, with requests evenly distributed using round robin.
What does "backend" mean here? Is it the same as "worker"? |
Yes, the same as worker.
Makes sense. Yes, this should be fine as an initial implementation, and we can improve it later to make the distribution more balanced and fault-tolerant.
The current version of the transaction proving service processes only one transaction at a time. To improve performance and scalability, we want to introduce concurrency while ensuring the system remains resilient against potential DDoS attacks and resource exhaustion.
Basic desired features: