
IPsrcaddr : resource stop problem #2019

Open
Rico29 opened this issue Jan 27, 2025 · 1 comment
Rico29 commented Jan 27, 2025

Hello,
I have an issue with the IPsrcaddr resource on the latest Debian 12 with up-to-date packages.

I've pulled findif.sh and IPsrcaddr from the ClusterLabs/resource-agents GitHub repo.

The problem occurs when moving a resource (or resource group) to another node.

Reproduction:

Node 1:

root@freepbx-lab-ha1:~# crm status
[...]
Node List:
  * Online: [ freepbx-lab-ha1 freepbx-lab-ha2 ]

Full List of Resources:
  * email_alert (ocf:heartbeat:MailTo):  Started freepbx-lab-ha2
  * Resource Group: grp_services:
    * shared_ip (ocf:heartbeat:IPaddr2):         Started freepbx-lab-ha1
    * src_ip    (ocf:heartbeat:IPsrcaddr):       Started freepbx-lab-ha1
    * srv_freepbx       (systemd:freepbx):       Started freepbx-lab-ha1

root@freepbx-lab-ha1:~# ip a
[...]
3: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether bc:24:11:6b:df:9e brd ff:ff:ff:ff:ff:ff
    inet 192.168.222.211/24 brd 192.168.222.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet 192.168.222.210/32 brd 192.168.222.255 scope global bond0
       valid_lft forever preferred_lft forever

root@freepbx-lab-ha1:~# ip r
default via 192.168.222.1 dev bond0 proto keepalived src 192.168.222.210 onlink 
192.168.222.0/24 dev bond0 proto keepalived scope link src 192.168.222.210 
192.168.222.212 dev bond0 scope link src 192.168.222.211 

Node 2:

root@freepbx-lab-ha2:~# crm status
[...]
Node List:
  * Online: [ freepbx-lab-ha1 freepbx-lab-ha2 ]

Full List of Resources:
  * email_alert (ocf:heartbeat:MailTo):  Started freepbx-lab-ha2
  * Resource Group: grp_services:
    * shared_ip (ocf:heartbeat:IPaddr2):         Started freepbx-lab-ha1
    * src_ip    (ocf:heartbeat:IPsrcaddr):       Started freepbx-lab-ha1
    * srv_freepbx       (systemd:freepbx):       Started freepbx-lab-ha1

root@freepbx-lab-ha2:~# ip a
[...]
3: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether bc:24:11:bf:7d:11 brd ff:ff:ff:ff:ff:ff
    inet 192.168.222.212/24 brd 192.168.222.255 scope global bond0
       valid_lft forever preferred_lft forever

root@freepbx-lab-ha2:~# ip r
default via 192.168.222.1 dev bond0 proto keepalived src 192.168.222.212 onlink 
192.168.222.0/24 dev bond0 proto kernel scope link src 192.168.222.212 
192.168.222.211 dev bond0 scope link src 192.168.222.212 

At startup everything works correctly: node 1 owns the IPaddr2 address, and the default route uses this address as its src address.

When moving the resource group with crm resource move grp_services freepbx-lab-ha2, I get this status and error:

Node List:
  * Online: [ freepbx-lab-ha1 freepbx-lab-ha2 ]

Full List of Resources:
  * email_alert (ocf:heartbeat:MailTo):  Started freepbx-lab-ha2
  * Resource Group: grp_services:
    * shared_ip (ocf:heartbeat:IPaddr2):         Started freepbx-lab-ha1
    * src_ip    (ocf:heartbeat:IPsrcaddr):       FAILED freepbx-lab-ha1 (blocked)
    * srv_freepbx       (systemd:freepbx):       Stopped

Failed Resource Actions:
  * src_ip stop on freepbx-lab-ha1 returned 'error' (command 'ip route replace  192.168.222.0/24 dev bond0 proto kernel scope link src 192.168.222.211') at Mon Jan 27 12:59:40 2025 after 48ms

Running the "ip route ..." command manually on the given node returns no error:

# ip route replace  192.168.222.0/24 dev bond0 proto kernel scope link src 192.168.222.211 && echo $?
0
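A side note on the logged command (my reading, not confirmed from the agent source): the failed stop action above shows a double space after "replace", which is the footprint of an unquoted variable that expanded to an empty string when the agent assembled the command. A minimal sketch of that effect, with PROTO as a purely hypothetical placeholder:

```shell
# Sketch only: PROTO stands in for whatever agent variable may have
# resolved empty; an empty expansion leaves a double space in the
# assembled command string, as seen in the failed stop action above.
PROTO=""
CMD="ip route replace $PROTO 192.168.222.0/24 dev bond0"
echo "$CMD"
```

An extra space does not by itself make ip fail (the shell collapses empty fields during word splitting, consistent with the manual run above returning 0), but it can help identify which agent parameter was empty once tracing is enabled.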

How can I fix this?
Regards

oalbrigt (Contributor) commented:
If you run pcs resource update src_ip trace_ra=1 (or the crm equivalent) you will get trace files for every run of each action in /var/lib/heartbeat.

Then try to move it again; you should be able to identify exactly which command fails.
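The advice above could look like this on the crm side; the syntax and trace-file naming are assumptions (crmsh versions differ, and the layout under /var/lib/heartbeat depends on the agent/glue version), so verify against your installed tooling:

```shell
# Assumed crm equivalent of the pcs command above; recent crmsh also
# offers "crm resource trace src_ip" as a shortcut for setting trace_ra
crm resource param src_ip set trace_ra 1

# reproduce the failure, then inspect the newest trace file
ls -t /var/lib/heartbeat/trace_ra/*/* | head -1
```

Each trace file is essentially a set -x capture of one action run, so the exact ip route invocation and its exit status should appear near the point of failure.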
