When the MIG config is matched against the GPUs to obtain the nvdev.MigProfileInfo, GPUs that are not part of the config (here GPU 0) are not filtered out.
As shown in Figure 1, with my config the lookup matches GPU instance profile 15 on GPU 0 instead of GPU instance profile 14 on GPU 1, because both profiles have the same name: 1g.6gb.
As a result, the call at pkg/mig/config/config.go:153 (giProfileInfo, ret := device.GetGpuInstanceProfileInfo(mp.GIProfileID)) is made with the wrong mp.GIProfileID, which leads to this failure: "Error getting GPU instance profile info for '1g.6gb': ERROR_NOT_SUPPORTED"
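To make the suspected collision concrete, here is a minimal sketch in Go. It does not use mig-parted or go-nvlib code: the `profile` type and both lookup helpers are hypothetical and only model the behavior described above, using the profile IDs 15 (GPU 0) and 14 (GPU 1) from this report.

```go
package main

import "fmt"

// profile is a hypothetical stand-in for the per-GPU profile data;
// it is NOT the real nvdev.MigProfileInfo type.
type profile struct {
	GPU         int    // index of the GPU that reported this profile
	Name        string // profile name, e.g. "1g.6gb"
	GIProfileID int    // GPU instance profile ID on that GPU
}

// lookupByName returns the first profile with the given name,
// regardless of which GPU it belongs to (the behavior suspected here).
func lookupByName(profiles []profile, name string) (profile, bool) {
	for _, p := range profiles {
		if p.Name == name {
			return p, true
		}
	}
	return profile{}, false
}

// lookupByNameOnGPU only considers profiles reported by the given GPU.
func lookupByNameOnGPU(profiles []profile, gpu int, name string) (profile, bool) {
	for _, p := range profiles {
		if p.GPU == gpu && p.Name == name {
			return p, true
		}
	}
	return profile{}, false
}

func main() {
	// Values taken from this report: same name, different IDs per GPU.
	profiles := []profile{
		{GPU: 0, Name: "1g.6gb", GIProfileID: 15},
		{GPU: 1, Name: "1g.6gb", GIProfileID: 14},
	}

	if p, ok := lookupByName(profiles, "1g.6gb"); ok {
		fmt.Printf("unfiltered: GPU %d, profile ID %d\n", p.GPU, p.GIProfileID)
	}
	if p, ok := lookupByNameOnGPU(profiles, 1, "1g.6gb"); ok {
		fmt.Printf("filtered:   GPU %d, profile ID %d\n", p.GPU, p.GIProfileID)
	}
}
```

With the unfiltered lookup, a config that targets only GPU 1 picks up the ID that belongs to GPU 0. Restricting the lookup to the configured device returns the ID that GPU 1 reported.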
My questions:
Is this a bug, or is my usage wrong?
If it is a bug, can you outline a general fix?
If my usage is wrong, can you tell me the correct way to use it?
Thank you very much!
Looking forward to your reply.
I have two A30 GPUs.
If I configure them as below, it succeeds:
If I configure them as below, it also succeeds:
But if I configure them as below, it fails:
Logs:
Applying the MIG mode change from the selected config to the node (and double checking it took effect)
If the -r option was passed, the node will be automatically rebooted if this is not successful
time="2024-12-31T06:44:06Z" level=debug msg="Parsing config file..."
time="2024-12-31T06:44:06Z" level=debug msg="Selecting specific MIG config..."
time="2024-12-31T06:44:06Z" level=debug msg="Running apply-start hook"
time="2024-12-31T06:44:06Z" level=debug msg="Checking current MIG mode..."
time="2024-12-31T06:44:08Z" level=debug msg="Walking MigConfig for (devices=[1])"
time="2024-12-31T06:44:08Z" level=debug msg=" GPU 1: 0x20B710DE"
time="2024-12-31T06:44:08Z" level=debug msg=" Asserting MIG mode: Enabled"
time="2024-12-31T06:44:08Z" level=debug msg=" MIG capable: true\n"
time="2024-12-31T06:44:08Z" level=debug msg=" Current MIG mode: Disabled"
time="2024-12-31T06:44:10Z" level=debug msg="Running pre-apply-mode hook"
time="2024-12-31T06:44:10Z" level=debug msg="Applying MIG mode change..."
time="2024-12-31T06:44:13Z" level=debug msg="Walking MigConfig for (devices=[1])"
time="2024-12-31T06:44:13Z" level=debug msg=" GPU 1: 0x20B710DE"
time="2024-12-31T06:44:13Z" level=debug msg=" MIG capable: true\n"
time="2024-12-31T06:44:13Z" level=debug msg=" Current MIG mode: Disabled"
time="2024-12-31T06:44:13Z" level=debug msg=" Updating MIG mode: Enabled"
time="2024-12-31T06:44:17Z" level=debug msg=" Mode change pending: false"
time="2024-12-31T06:44:19Z" level=debug msg="Running apply-exit hook"
MIG configuration applied successfully
time="2024-12-31T06:44:19Z" level=debug msg="Parsing config file..."
time="2024-12-31T06:44:19Z" level=debug msg="Selecting specific MIG config..."
time="2024-12-31T06:44:19Z" level=debug msg="Asserting MIG mode configuration..."
time="2024-12-31T06:44:22Z" level=debug msg="Walking MigConfig for (devices=[1])"
time="2024-12-31T06:44:22Z" level=debug msg=" GPU 1: 0x20B710DE"
time="2024-12-31T06:44:22Z" level=debug msg=" Asserting MIG mode: Enabled"
time="2024-12-31T06:44:22Z" level=debug msg=" MIG capable: true\n"
time="2024-12-31T06:44:22Z" level=debug msg=" Current MIG mode: Enabled"
Selected MIG mode settings from configuration currently applied
Applying the selected MIG config to the node
time="2024-12-31T06:44:23Z" level=debug msg="Parsing config file..."
time="2024-12-31T06:44:23Z" level=debug msg="Selecting specific MIG config..."
time="2024-12-31T06:44:23Z" level=debug msg="Running apply-start hook"
time="2024-12-31T06:44:23Z" level=debug msg="Checking current MIG mode..."
time="2024-12-31T06:44:26Z" level=debug msg="Walking MigConfig for (devices=[1])"
time="2024-12-31T06:44:26Z" level=debug msg=" GPU 1: 0x20B710DE"
time="2024-12-31T06:44:26Z" level=debug msg=" Asserting MIG mode: Enabled"
time="2024-12-31T06:44:26Z" level=debug msg=" MIG capable: true\n"
time="2024-12-31T06:44:26Z" level=debug msg=" Current MIG mode: Enabled"
time="2024-12-31T06:44:28Z" level=debug msg="Checking current MIG device configuration..."
time="2024-12-31T06:44:30Z" level=debug msg="Walking MigConfig for (devices=[1])"
time="2024-12-31T06:44:30Z" level=debug msg=" GPU 1: 0x20B710DE"
time="2024-12-31T06:44:30Z" level=debug msg=" Asserting MIG config: map[1g.6gb:4]"
time="2024-12-31T06:44:32Z" level=debug msg="Running pre-apply-config hook"
time="2024-12-31T06:44:32Z" level=debug msg="Applying MIG device configuration..."
time="2024-12-31T06:44:35Z" level=debug msg="Walking MigConfig for (devices=[1])"
time="2024-12-31T06:44:35Z" level=debug msg=" GPU 1: 0x20B710DE"
time="2024-12-31T06:44:35Z" level=debug msg=" MIG capable: true\n"
time="2024-12-31T06:44:35Z" level=debug msg=" Updating MIG config: map[1g.6gb:4]"
time="2024-12-31T06:44:35Z" level=error msg="Error getting GPU instance profile info for '1g.6gb': ERROR_NOT_SUPPORTED"
time="2024-12-31T06:44:37Z" level=debug msg="Running apply-exit hook"
time="2024-12-31T06:44:37Z" level=fatal msg="Error applying MIG configuration with hooks: error setting MIGConfig: error attempting multiple config orderings: all orderings failed"
Restarting any GPU clients previously shutdown on the host by restarting their systemd services
Starting kubelet.service