kdump initial config not translated to the USE_KDUMP kernel flag #21625

Open
arista-nwolfe opened this issue Feb 4, 2025 · 1 comment
arista-nwolfe commented Feb 4, 2025

The config kdump [enable|disable] command sets the enabled field of KDUMP|config in CONFIG_DB.

> sonic-db-cli CONFIG_DB hgetall "KDUMP|config"
{'enabled': 'true', 'memory': '0M-2G:256M,2G-4G:320M,4G-8G:384M,8G-:448M', 'num_dumps': '3'}

Changes to this config are handled by KdumpCfg in hostcfgd:
https://github.com/sonic-net/sonic-host-services/blob/202405/scripts/hostcfgd#L1943

Changing the enabled flag in KDUMP|config causes the sonic-kdump-config [--enable|--disable] script to be called:
https://github.com/sonic-net/sonic-host-services/blob/202405/scripts/hostcfgd#L1152-L1155

That script in turn sets USE_KDUMP=[0|1] in /etc/default/kdump-tools, depending on whether kdump is enabled:
https://github.com/sonic-net/sonic-utilities/blob/202405/scripts/sonic-kdump-config#L320

Changing this config requires a system reboot.
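
To make the last step of that chain concrete, here is a minimal Python sketch of rewriting the USE_KDUMP assignment in the text of a defaults file. It is not the code from sonic-kdump-config; the function name set_use_kdump is hypothetical and the sketch works on a string rather than on /etc/default/kdump-tools itself.

```python
import re


def set_use_kdump(defaults_text: str, enabled: bool) -> str:
    """Return defaults_text with the USE_KDUMP assignment set to 1 or 0.

    Hypothetical helper for illustration; not part of sonic-kdump-config.
    """
    flag = "1" if enabled else "0"
    # Match only an uncommented assignment at the start of a line, so the
    # "# USE_KDUMP - ..." comment line is left untouched.
    new_text, count = re.subn(r"(?m)^USE_KDUMP=.*$", "USE_KDUMP=" + flag, defaults_text)
    if count == 0:
        # No assignment present yet: append one on its own line.
        sep = "" if new_text.endswith("\n") or not new_text else "\n"
        new_text += sep + "USE_KDUMP=" + flag + "\n"
    return new_text
```

Nothing in this sketch touches the running kernel, which matches the point above: the new value only matters after the next reboot.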

This behavior works as expected in a running system.
However, on boot-up the entry in KDUMP|config won't be propagated to the USE_KDUMP kernel flag.

Problematic scenario:

  1. config_db.json has 'enabled': 'false'
  2. /etc/default/kdump-tools has USE_KDUMP=1
  3. Reboot

Steps to Reproduce:

  1. config_db.json has 'enabled': 'false'
root@nfc406-7:~# config kdump disable
KDUMP configuration changes may require a reboot to take effect.
Save SONiC configuration using 'config save' before issuing the reboot command.

root@nfc406-7:~# sonic-db-cli CONFIG_DB hgetall "KDUMP|config"
{'enabled': 'false', 'memory': '0M-2G:256M,2G-4G:320M,4G-8G:384M,8G-:448M', 'num_dumps': '3'}

root@nfc406-7:~# config save -y
Running command: /usr/local/bin/sonic-cfggen -d --print-data > /etc/sonic/config_db.json

root@nfc406-7:~# grep "USE_KDUMP" /etc/default/kdump-tools
# USE_KDUMP - controls kdump will be configured
USE_KDUMP=0

root@nfc406-7:~# show kdump config
Kdump administrative mode: Disabled
Kdump operational mode: Not Ready
Kdump memory reservation: 0M-2G:256M,2G-4G:320M,4G-8G:384M,8G-:448M
Maximum number of Kdump files: 3
  2. Set USE_KDUMP=1 (set but don't save)
root@nfc406-7:~# config kdump enable
KDUMP configuration changes may require a reboot to take effect.
Save SONiC configuration using 'config save' before issuing the reboot command.

root@nfc406-7:~# grep "USE_KDUMP" /etc/default/kdump-tools
# USE_KDUMP - controls kdump will be configured
USE_KDUMP=1

root@nfc406-7:~# show kdump config
Kdump administrative mode: Enabled
Kdump operational mode: Ready after reboot
Kdump memory reservation: 0M-2G:256M,2G-4G:320M,4G-8G:384M,8G-:448M
Maximum number of Kdump files: 3
  3. Reboot
root@nfc406-7:~# reboot

Result: KDUMP|config says it's disabled but the USE_KDUMP=1 kernel flag says it's enabled

root@nfc406-7:~# sonic-db-cli CONFIG_DB hgetall "KDUMP|config"
{'enabled': 'false', 'memory': '0M-2G:256M,2G-4G:320M,4G-8G:384M,8G-:448M', 'num_dumps': '3'}

root@nfc406-7:~# grep "USE_KDUMP" /etc/default/kdump-tools
# USE_KDUMP - controls kdump will be configured
USE_KDUMP=1

root@nfc406-7:~# show kdump config
Kdump administrative mode: Disabled
Kdump operational mode: Ready
Kdump memory reservation: 0M-2G:256M,2G-4G:320M,4G-8G:384M,8G-:448M
Maximum number of Kdump files: 3

Fallout: The startup config (config_db.json) requested that the KDUMP feature be disabled, but it is still enabled, and rebooting the switch won't help. You need to re-run config kdump disable on a running switch to re-sync CONFIG_DB and the kernel flag.

Note: The same thing can occur with the opposite config (CONFIG_DB: enabled=True, USE_KDUMP=0)
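
For anyone who wants to check a switch for this mismatch in either direction, a rough Python sketch follows. It assumes config_db.json stores the entry as {"KDUMP": {"config": {"enabled": ...}}} and that a missing entry means disabled; both are assumptions, and the function names are hypothetical rather than existing SONiC helpers.

```python
import json
import re


def config_db_kdump_enabled(config_db_text: str) -> bool:
    """Read the 'enabled' field of KDUMP|config from config_db.json text."""
    data = json.loads(config_db_text)
    # Assumption: a missing table, key, or field is treated as disabled.
    enabled = data.get("KDUMP", {}).get("config", {}).get("enabled", "false")
    return str(enabled).lower() == "true"


def use_kdump_flag(defaults_text: str) -> bool:
    """Read the uncommented USE_KDUMP assignment from kdump-tools text."""
    match = re.search(r"(?m)^USE_KDUMP=(\d+)", defaults_text)
    return bool(match) and match.group(1) == "1"


def kdump_in_sync(config_db_text: str, defaults_text: str) -> bool:
    """True when the startup config and the kdump-tools flag agree."""
    return config_db_kdump_enabled(config_db_text) == use_kdump_flag(defaults_text)
```

On a switch, the two arguments would be the contents of /etc/sonic/config_db.json and /etc/default/kdump-tools; a False result corresponds to the state shown in the Result section above.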

Analysis:
Looking at the load functions of the classes in hostcfgd, it looks like load is typically used to call the handler function for the initial CONFIG_DB processing.
I'm not sure why, in the case of KdumpCfg, load only sets up KDUMP|config in CONFIG_DB if it's not already present:

    def load(self, kdump_table):
        """
        Set the KDUMP table in CFG DB to kdump_defaults if not set by the user
        """
        syslog.syslog(syslog.LOG_INFO, "KdumpCfg init ...")
        kdump_conf = kdump_table.get("config", {})
        for row in self.kdump_defaults:
            value = self.kdump_defaults.get(row)
            if not kdump_conf.get(row):
                self.config_db.mod_entry("KDUMP", "config", {row : value})

https://github.com/sonic-net/sonic-host-services/blob/202405/scripts/hostcfgd#L1130-L1139

@arista-nwolfe

Actually it looks like in the original PR which added KdumpCfg the load function did call the kdump_update function:
sonic-net/sonic-host-services@1c458da#diff-f46602bac3380a0d28ddfcbe698d95d87c7bc57e504fac08dd6906bdb31ec388R243-R254

But a large commit came later and removed it:
sonic-net/sonic-host-services@58a6e49#diff-f46602bac3380a0d28ddfcbe698d95d87c7bc57e504fac08dd6906bdb31ec388L735-R785
