Multiple HW addresses per netdev & NIC bonding #184

Merged
merged 2 commits into from
May 14, 2019

beinvisible
Contributor

  • Multiple hw addresses can be set by providing multiple --hwaddr arguments
    or by giving a comma-separated list of hw addresses
  • Bonding devices can be set in the same way with --bonddevs
  • The bonding mode can be set with --bondmode

Example:
  wwsh node set node1 --netdev bond0 \
    --hwaddr aa:bb:cc:dd:ee:01,aa:bb:cc:dd:ee:02 \
    --bonddevs eth0,eth1 \
    --bondmode 802.3ad

Changed interfaces:

  • Warewulf::Node::update_netdev_member()

    Changed order of 'validator' and 'new_values' parameters
    new_values can now be a list

    The function is not used anywhere else.
    This is now consistent with the order that is also used in Object::prop().

@@ -70,6 +70,158 @@ for module in `/sbin/detect`; do
done
wwsuccess

bondup() {
Contributor

In provision/initramfs/init there's also a section that starts:

if [ -n "$WWPOSTNETDOWN" ]; then

We will need to make sure the bond device is handled here, and that the slave devices are also taken down when --postnetdown is used.

This is how I've generally provisioned a bond device: provision the node off eth0 (or whatever), take the network device down when complete, and have the image init set up the bond. Use Warewulf files to provision the proper ifcfg-bond0, ifcfg-eth0, etc. files.

Contributor Author

@beinvisible beinvisible Apr 19, 2019

In this section the device is read from

NETDEV=`cat /tmp/wwdev`

which is also used by my code, so --postnetdown should already take it down again. There might be some more nuanced issues with the enslaved devices, though, which might also need to be unconfigured (by running "ifenslave -d" over all --bonddevs). I'll test that case.

But even in your scenario this patch might already be useful, as assigning multiple hw addresses will help to identify the node when the order of network interfaces changes or the primary interface is actually broken/unplugged. In this case you would not configure --bonddevs, and init will behave as before (i.e. not call bondup() at all), just with the added benefit that it doesn't matter whether eth0 or eth1 is the primary interface.
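
The "ifenslave -d" idea mentioned above could be sketched roughly as follows. This is a minimal sketch and not the code from this PR: release_bonddevs is a hypothetical name, and it assumes an ifenslave that accepts "-d <bond> <device>".

```shell
# Hypothetical helper (not part of this PR): detach every device named in a
# comma-separated list, the same format --bonddevs accepts, from a bond.
# Usage: release_bonddevs bond0 eth0,eth1
release_bonddevs() {
    bond=$1
    # Interface names contain no whitespace, so commas can be turned into
    # spaces and the result split by the shell.
    for dev in $(printf '%s' "$2" | tr ',' ' '); do
        ifenslave -d "$bond" "$dev" || echo "could not release $dev" >&2
    done
}
```

Calling release_bonddevs bond0 eth0,eth1 would run ifenslave -d once per listed device and keep going if one of them fails.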

Contributor Author

Added code to correctly delete the bonding device in case of failure or --postnetdown.
busybox's "ip link del" also detaches the slaves automatically.
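
A teardown along these lines might look like the sketch below. It is an illustration only, not the code added in this PR: bonddown is a hypothetical name, and the sketch relies on the behavior described above, namely that deleting the bond also detaches its slaves.

```shell
# Hypothetical helper (not the PR's code): take a bond down and remove it.
# Usage: bonddown bond0
bonddown() {
    bond=$1
    # Bring the link down first, then delete the device. Per the comment
    # above, "ip link del" detaches the enslaved devices as well.
    ip link set "$bond" down
    ip link del "$bond"
}
```

Because the slaves are released by the delete itself, no separate "ifenslave -d" loop is needed in this variant.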

@jmstover
Contributor

@bensallen
I don't see anything in this that would prevent a merge. What are your thoughts on it here?

@bensallen
Member

Looks good to me. We'll want to see some sites test this a bit before it makes it into a release.

@bensallen
Member

@beinvisible Thanks for the great contribution. Bonding support has been missing for a long time.

@bensallen bensallen self-requested a review April 23, 2019 18:53
@bensallen bensallen merged commit af22a39 into warewulf:development May 14, 2019
@bensallen bensallen added this to the 3.9 milestone May 14, 2019
@aflyhorse

I failed to set up bonding in my cluster with the OHPC-shipped Warewulf. I got two separate ports with the same IP address.

[root@sirius-c02 ~]# ip a
2: bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether e4:3d:1a:e6:e1:e0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.10/24 brd 192.168.10.255 scope global noprefixroute bond0
3: eno2np1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether e4:3d:1a:e6:e1:e1 brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.10/24 brd 192.168.10.255 scope global dynamic noprefixroute eno2np1

Configs are:

[root@sirius ~]# wwsh node print sirius-c02
#### sirius-c02 ###############################################################
     sirius-c02: NAME             = sirius-c02
     sirius-c02: ARCH             = x86_64
     sirius-c02: ENABLED          = TRUE
     sirius-c02: bond0.HWADDR     = e4:3d:1a:e6:e1:e0,e4:3d:1a:e6:e1:e1
     sirius-c02: bond0.IPADDR     = 192.168.10.10
     sirius-c02: bond0.GATEWAY    = 192.168.10.24
     sirius-c02: bond0.BONDDEVS   = eno2np0,eno2np1
     sirius-c02: bond0.BONDMODE   = 802.3ad

[root@sirius ~]# wwsh provision print sirius-c02
#### sirius-c02 ###############################################################
     sirius-c02: MASTER           = UNDEF
     sirius-c02: BOOTSTRAP        = 4.18.0-348.12.2.el8_5.x86_64
     sirius-c02: VNFS             = rocky8
     sirius-c02: VALIDATE         = FALSE
     sirius-c02: FILES            = dynamic_hosts,group,network,passwd,shadow
     sirius-c02: PRESHELL         = FALSE
     sirius-c02: POSTSHELL        = FALSE
     sirius-c02: POSTNETDOWN      = TRUE
     sirius-c02: POSTREBOOT       = FALSE
     sirius-c02: CONSOLE          = ttyS1,115200
     sirius-c02: PXELOADER        = UNDEF
     sirius-c02: IPXEURL          = UNDEF
     sirius-c02: SELINUX          = DISABLED
     sirius-c02: KARGS            = "net.ifnames=1,biosdevname=1"
     sirius-c02: FS               = "select /dev/sda,mklabel gpt,mkpart ESP fat32 1MiB 513MiB,mkpart primary linux-swap 513MiB 20%,mkpart primary ext4 20% 100%,name 1 ESP,name 2 swap,name 3 root,set 1 boot on,mkfs 1 vfat -n ESP,mkfs 2 swap,mkfs 3 ext4 -L root,fstab 3 / ext4 defaults 0 0,fstab 1 /boot/efi vfat defaults 0 0,fstab 2 swap swap defaults 0 0"
     sirius-c02: BOOTLOADER       = sda
     sirius-c02: BOOTLOCAL        = FALSE

There is only one entry in network-scripts:

[root@sirius-c02 ~]# cat /etc/sysconfig/network-scripts/*
# This was created by the Warewulf bootstrap
DEVICE=bond0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.10.10
NETMASK=255.255.255.0
HWADDR=e4:3d:1a:e6:e1:e0

and this 'bond0' is just a name:

[root@sirius-c02 ~]# cat /sys/class/net/bond0/bonding/slaves
cat: /sys/class/net/bond0/bonding/slaves: No such file or directory

After investigating, I found that OHPC shipped a snapshot of commit 98fcdc3, which I believe included this patch.

@bensallen
Member

bensallen commented Jan 21, 2022

Hi @aflyhorse, could you please move this to a new issue? Also, it's difficult for the upstream project to debug OHPC releases; personally, at least, I don't track what version they ship or what extra patches they apply.

@aflyhorse

aflyhorse commented Jan 22, 2022

Would you mind telling me how to log msg_gray and msg_white messages? I'd like to try some debugging myself.

@jmstover
Contributor

The msg_gray and msg_white functions are just shell output functions; they only change how text is displayed. You use them like:

. /etc/functions
msg_gray "Some text to be Gray.\n With newlines!\n"

You can include these functions by running:

. /etc/functions

In the source tree, this file is warewulf3/provision/initramfs/functions.

@bensallen
Member

bensallen commented Jan 22, 2022

If you run wwsh provision set --postshell=1, your node will drop into a shell on the console before calling switch_root. See /var/log/warewulf/ after that. Also check /tmp/ifcfg*, as that’s where the network config files are initially created. There’s also --preshell=1, which gives you a shell before any network config happens. If that doesn’t provide enough detail, add wwdebug=2 to kargs (i.e. the kernel cmdline). This will set -o xtrace in /init and display verbose output on the console. I suggest a serial-based console like IPMI SOL when using this, as the output will otherwise scroll off screen pretty quickly. You can find the code for the network bond setup under https://github.com/warewulf/warewulf3/blob/development/provision/initramfs/init.

@aflyhorse

I found a possible cause.

I configured "predictable netif" names, but bondup() runs before postnetdown, so the interfaces had not been renamed yet and bonding failed. After I restored the "ethX" behavior, bonding works as intended.

Output with predictable netif:

Now Booting Warewulf...

Bringing up local loopback network:                                          OK
Configuring bonding interfaces: bond0 802.3ad (eno2np0 eno2np1)
Loading bonding module
ip: can't find device 'eno2np0'
ip: can't find device 'eno2np1'
Bringing up interface bond0ifenslave: skipping eno2np0: can't get settings: No such device
ifenslave: skipping eno2np1: can't get settings: No such device
                                                                          ERROR
Checking for network device: bond0 (bond0)                              SKIPPED
cat: can't open '/sys/class/net/bonding_masters/address': Not a directory
sh: e4:3d:1a:e6:e1:e0: unknown operand
Checking for network device: eth0 (bond0)                                    OK
Configuring eth0 (bond0) statically: (192.168.10.10/255.255.255.0)           OK
Configuring gateway: (192.168.10.24)                                         OK
Creating network initialization files: (bond0)                               OK

Welcome to Linux!
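
A pre-flight check along these lines would have reported the missing devices before ifenslave ran. It is only a sketch under assumptions: missing_bonddevs is a hypothetical helper, not part of Warewulf's init, and the optional second argument exists only so the function can be pointed at a directory other than /sys/class/net.

```shell
# Hypothetical helper (not part of Warewulf): print each device from a
# comma-separated list that has no entry under $2 (default /sys/class/net).
# Returns non-zero if at least one device is missing.
# Variables are not declared local, to stay within plain POSIX sh.
missing_bonddevs() {
    root=${2:-/sys/class/net}
    rc=0
    for dev in $(printf '%s' "$1" | tr ',' ' '); do
        if [ ! -e "$root/$dev" ]; then
            echo "$dev"
            rc=1
        fi
    done
    return $rc
}
```

For example, missing_bonddevs eno2np0,eno2np1 lists whichever of the two names the kernel does not currently know about, which is exactly the situation shown in the log above.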

@bensallen
Member

bensallen commented Jan 24, 2022

@aflyhorse Ah yes, this makes sense. Warewulf normally supports a fallback to matching interface names based on MAC address, but currently not in the case of bonds. I've created #304 to track this.
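
For illustration, a MAC-based lookup could look roughly like the sketch below. This is not Warewulf's implementation and not the fix tracked in #304: find_dev_by_hwaddr is a hypothetical name, and the sketch assumes each interface exposes its address in /sys/class/net/<dev>/address, as Linux does. The optional second argument only makes the lookup directory replaceable.

```shell
# Hypothetical helper (not Warewulf code): print the name of the interface
# under $2 (default /sys/class/net) whose MAC address equals $1.
# The comparison is case-insensitive. Returns non-zero if nothing matches.
find_dev_by_hwaddr() {
    want=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    root=${2:-/sys/class/net}
    for addr_file in "$root"/*/address; do
        # Skips the unexpanded pattern when nothing matches. Plain files such
        # as bonding_masters never match, because the glob needs a directory.
        [ -f "$addr_file" ] || continue
        have=$(tr 'A-F' 'a-f' < "$addr_file")
        if [ "$have" = "$want" ]; then
            basename "$(dirname "$addr_file")"
            return 0
        fi
    done
    return 1
}
```

A bond setup could call this once per configured hw address to find the current kernel names of the devices to enslave, whatever naming scheme is active at that point.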

@aflyhorse

@bensallen I'm glad to be of help. Also, thank you for your detailed explanation and your selfless contribution to Warewulf.
