[BUG] Unstable clusterer module (repeated messages Node [X] is UP) #3542
Comments
Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.
Same "issue" after upgrading from 3.4.8 to 3.4.10: no routing issues, just a bothersome log message on the secondary box during low-traffic conditions.
Up
Hello OpenSIPS Team, I’ve encountered the same issue after updating from version Are there any updates on this issue? Thank you! |
I have the same issue, with some different details: my clusterer module configuration has no DB connection and includes some fine-tuning:
The OpenSIPS I'm using was built from git sources (an "updated" revision of 3.4.10) on Debian 11:
I found that if timer_workers is not configured (the default is 1), I get many ping failures during a 30-minute test:
If I add some additional timer workers (
After these tests I checked the tcpdump output, and in both cases I see the same conditions for the ping problems.
Looks like there are two problems:
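For readers who want to repeat the timer worker experiment described in the comment above, a minimal sketch follows. It assumes that `timer_workers` is set as a global (core) parameter in the OpenSIPS script, and the value shown is a placeholder, not the value used in that test.

```cfg
# Global (core) parameter, placed outside any route block.
# Assumption: 4 is an arbitrary placeholder, not the commenter's setting.
timer_workers = 4
```

Comparing the ping failure count over the same test duration with and without this line should show whether timer starvation contributes to the flapping.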
Thank you for the reports -- perhaps commit 7d74f3d introduced a regression during periods of low traffic, where node state can now flip-flop. Currently treating this issue with raised priority, so the fix makes it into the next stable minor release with some added testing from your side as well.
@liviuchircu are there any updates on this issue? Is there a chance that the problem will be solved by the release of 3.4.11?
Hey @Shkiperon! I tried to reproduce the issue here using your settings, but failed to do so (both in high traffic conditions: 500 CPS, or while idling). Any tips on reproducing this are appreciated -- perhaps I'm missing a module, setting or network setup which induces all these issues.
Hello @liviuchircu !
I observe the issue approximately 6–8 hours after the load has been applied.
Hi @liviuchircu ! I have extra tuning in my config file and some modules configured in cluster mode:
Here is the part of my config with the modules that work with clusterer or with TCP connections:
@liviuchircu maybe something is different in my build options (#3542 (comment))?
I found a combination of modparams that suppresses the "node down <-> node up" flapping: in my configuration (#3542 (comment)) I changed the ping timeout:
After that, the flapping between the two virtual machines stopped. The ping latency between them is low. Here are the statistics for 100 ICMP packets:
On OpenSIPS 3.3 I didn't have this problem.
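The ping tuning mentioned in the comment above is done through clusterer module parameters. The fragment below is only a sketch: the parameter names `ping_interval` and `ping_timeout` are, to my knowledge, the ones the clusterer module documents, while the values are illustrative placeholders and not the settings the commenter actually used.

```cfg
loadmodule "clusterer.so"

# Assumption: placeholder values, not taken from this thread.
modparam("clusterer", "ping_interval", 4)    # seconds between pings sent to a neighbour node
modparam("clusterer", "ping_timeout", 2000)  # milliseconds to wait for a ping reply
```

A longer ping timeout gives a slow reply more time to arrive before the link is marked down, which may hide the flapping without addressing whatever delays the reply.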
OpenSIPS version you are running
Describe the bug
I observe an unstable clusterer module.
I have two nodes (usrloc replication):
Sometimes it works, then I see these repeated messages on "node-2-reserve":
I understand that this was already fixed in 4c38ceb, but I have the latest build "b16e49c98" and still observe the problem.
To Reproduce
It is an intermittent problem; at the moment I do not know how to reproduce it.
But the command
clusterer_reload
fixes the problem.
Expected behavior
Stable clusterer, no logs like:
Relevant System Logs
OS/environment information
Additional context
Could you please check it?