[Request] Add HA support for Redis (Sentinel, Clustering...) #1017

Open
i2dcarrasco opened this issue Dec 18, 2024 · 0 comments
Labels
enhancement New feature or request


i2dcarrasco commented Dec 18, 2024

Hello,

This request is related to the post I opened in the WordPress forums:

https://wordpress.org/support/topic/request-add-redis-cluster-support/

We use this plugin on our clients' sites because the Redis feature is mandatory when you need a distributed system that provides High Availability. Storing the cache on disk is not ideal, because every front end must build its own cache data, so it takes much longer to have the cache ready, and in this kind of environment the front ends are rotated by autoscaling.

Normally we use Google Cloud Platform, which provides Redis HA with automatic management. If the primary fails, the secondary becomes the primary and the endpoint doesn't change. The problem comes when the managed version of Redis is not enough for a client, as is the case now: the CPU usage that the plugin generates in Redis is high, most likely because of some of the other plugins installed in WordPress. We cannot remove any of them, so the solution is to get more power.

This is where the problems start, because the way to get more power is to pay GCP a lot more for a bigger instance full of memory that you will not use, and the improvement is small and not enough. A cheaper solution would be to use your own custom Redis instance with more cores than memory, but the plugin doesn't provide HA in any way.

The plugin does allow several Redis instances to be configured, but it seems to work in round-robin mode without error handling, so when a Redis instance fails, pages fail at random. For example, we have two Redis instances configured in our production environment. One of them failed for a while because of an OOM problem, and while it was down about 50% of the requests failed. This is not acceptable for a production environment because it is not real HA. Also, in our tests we did not notice any improvement in capacity: CPU usage was about the same with one and with two instances, so the capacity limit remains the same.
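To illustrate the difference between plain round-robin and round-robin with basic error handling, here is a minimal Python sketch. It is not the plugin's code: `RoundRobinPool`, `healthy`, `down` and `naive_failures` are hypothetical names, and the backends are simulated functions rather than real Redis connections.

```python
class RoundRobinPool:
    """Round-robin over backends, falling back to the next one on failure."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._next = 0  # round-robin cursor

    def call(self, request):
        last_error = None
        # Try every backend at most once, starting at the cursor.
        for attempt in range(len(self.backends)):
            index = (self._next + attempt) % len(self.backends)
            try:
                result = self.backends[index](request)
            except ConnectionError as exc:
                last_error = exc
                continue
            self._next = (index + 1) % len(self.backends)
            return result
        raise ConnectionError("all backends failed") from last_error


def healthy(request):
    # Simulated working instance.
    return f"ok:{request}"


def down(request):
    # Simulated instance that is unavailable (for example after an OOM).
    raise ConnectionError("instance unavailable")


def naive_failures(backends, n):
    """Count failed requests for plain round-robin without error handling."""
    failures = 0
    for i in range(n):
        try:
            backends[i % len(backends)](i)
        except ConnectionError:
            failures += 1
    return failures


pool = RoundRobinPool([healthy, down])
results = [pool.call(i) for i in range(4)]
print(results)
print(naive_failures([healthy, down], 10), "of 10 requests fail without failover")
```

With one of two instances down, the naive loop loses half of the requests, which matches the roughly 50% failure rate we saw, while the pool with fallback serves all of them from the remaining instance.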

Redis provides two ways to improve capacity and to have HA:

  • Redis with two or more instances in Primary/Secondary mode, with more cores per instance. In our tests we noticed an improvement of about 10-25% per core, so it is not the best option.
  • Redis Cluster with several instances. The improvement here is much bigger.

In our tests, the standalone Redis instance was able to handle about 45k RPS:
[Screenshot from 2024-10-18 11-30-41]

Adding a thread improves the capacity to 56k:
[Screenshot from 2024-10-18 12-27-51]

And with three Redis instances in cluster mode the improvement is much bigger, reaching 214k RPS:
[Screenshot from 2024-10-18 10-05-44]

So the best option for improving performance is clustering. This mode even allows adding secondary instances, which can be used for reads to improve performance further.
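For reference, the relative gains implied by the three measurements above can be worked out with a few lines of Python. The figures are the rounded values quoted in this post; the helper names are only illustrative.

```python
baseline_rps = 45_000   # standalone instance
threaded_rps = 56_000   # standalone instance with one extra thread
cluster_rps = 214_000   # three instances in cluster mode


def gain_percent(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100


def speedup(new, old):
    """How many times faster `new` is than `old`."""
    return new / old


print(f"extra thread: +{gain_percent(threaded_rps, baseline_rps):.1f}%")
print(f"3-node cluster: {speedup(cluster_rps, baseline_rps):.2f}x the baseline")
```

The extra thread gives about +24.4%, at the top of the 10-25% per core range mentioned earlier, while the three-instance cluster reaches about 4.76 times the standalone throughput.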

I suppose the plugin uses Predis to connect to the Redis instances. This library already supports automatic use of Sentinel for the Primary/Secondary replication method, sending read-only requests to the slaves. This would be one way to improve capacity:

https://github.com/predis/predis?tab=readme-ov-file#replication
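The idea of read/write splitting can be sketched in a few lines of Python. This is only a conceptual illustration, not how Predis implements it: `ReplicationRouter`, the `READ_ONLY` command set and the node names are assumptions made for the example.

```python
import itertools

# Small, illustrative subset of commands that do not modify data.
READ_ONLY = {"GET", "MGET", "EXISTS", "TTL"}


class ReplicationRouter:
    """Send read-only commands to replicas and everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Rotate over the replicas; fall back to the primary if there are none.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def node_for(self, command):
        if command.upper() in READ_ONLY and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = ReplicationRouter("primary", ["replica-1", "replica-2"])
for command in ("SET", "GET", "GET", "DEL"):
    print(command, "->", router.node_for(command))
```

Since a page cache is mostly read traffic, routing reads this way would take most of the load off the primary.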

It also provides a way to use Redis clustering, with the same automatic management of the instances and their sharding, so it would be great to have that too:

https://github.com/predis/predis?tab=readme-ov-file#cluster
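As background on the sharding a cluster-aware client has to handle: Redis Cluster assigns every key to one of 16384 hash slots, computed as the CRC16 (XMODEM variant) of the key modulo 16384, and if the key contains a non-empty `{...}` hash tag only the tag is hashed. A minimal Python sketch of that rule follows. The function names and the example cache keys are hypothetical, and this is not code from Predis or the plugin.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (XMODEM)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Hash slot (0..16383) that Redis Cluster assigns to `key`."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


# Hypothetical cache keys: the shared hash tag keeps them in the same slot.
print(key_slot("{site1}:page:/home"), key_slot("{site1}:fragment:menu"))
print(key_slot("foo"))
```

Keys that must be used together in multi-key operations need to share a hash tag, which is something the plugin's key naming would have to take into account in cluster mode.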

It would also be great to check why the plugin uses so many connections. We suspect that the persistent connection is not working as expected, or that the plugin opens a connection for every feature (database, page, fragments...) and for every child, so the number of connections is activated features * number of children * number of instances.
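To give an idea of how quickly this multiplies, here is the suspected formula as a small Python function. The input numbers are made up for illustration and are not measurements from our environment.

```python
def estimated_connections(active_features: int, children: int, redis_instances: int) -> int:
    """Connections if every child opens one connection per feature per instance."""
    return active_features * children * redis_instances


# Hypothetical example: 3 features (database, page, fragments),
# 50 child processes on a front end, 2 configured Redis instances.
print(estimated_connections(active_features=3, children=50, redis_instances=2))
```

With those example values a single front end would already hold 300 connections, and the total grows again with every front end that autoscaling adds.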

Best regards and thanks!

cssjoe added the enhancement (New feature or request) label on Jan 9, 2025