[Mellanox] Fix the issue with ASIC detection on the SN4280 platform (#20397) #20621

Open, wants to merge 1 commit into base: 202405


oleksandrivantsiv
Collaborator

Cherry-pick of #20397

…onic-net#20397)

- Why I did it
Fix the issue with ASIC detection on the SN4280 platform.

The root cause is a race condition in the PCI subsystem. When Dark Mode is enabled, the following actions happen in parallel at system start:

The dpuctl service starts and powers down the DPUs, which removes the DPU PCI devices.
At the same time, the syncd service starts. It launches the mlnx-fw-upgrade.sh script, which queries the available ASIC devices from the PCI subsystem using the lspci command.
For a short period after a DPU PCI device is removed, the Linux PCI subsystem remains in an inconsistent state and the lspci command might return an error. This can cause mlnx-fw-upgrade.sh to fail, which interrupts the syncd container start.

- How I did it
Add a retry mechanism for the lspci command. Cache lspci output to reduce the number of command executions.
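
The retry-plus-cache idea can be illustrated with a minimal sketch. This is not the code from the PR (the actual change is in the mlnx-fw-upgrade.sh shell script); the class name `LspciCache`, the `run_lspci` helper, and the retry count and delay are all hypothetical and chosen only for illustration.

```python
import subprocess
import time
from typing import Callable, Optional


def run_lspci() -> str:
    """Run lspci once and return its output; raises if the command fails."""
    result = subprocess.run(
        ["lspci", "-n"], capture_output=True, text=True, check=True
    )
    return result.stdout


class LspciCache:
    """Hypothetical helper: retry a failing lspci call and cache the first success."""

    def __init__(self, runner: Callable[[], str] = run_lspci,
                 retries: int = 5, delay: float = 1.0) -> None:
        self._runner = runner      # injectable so the logic can be tested without lspci
        self._retries = retries    # assumed value, not taken from the PR
        self._delay = delay        # assumed value, not taken from the PR
        self._output: Optional[str] = None

    def get(self) -> str:
        # Serve the cached output so lspci is not executed again.
        if self._output is not None:
            return self._output

        last_error: Optional[Exception] = None
        for attempt in range(1, self._retries + 1):
            try:
                self._output = self._runner()
                return self._output
            except (subprocess.CalledProcessError, OSError) as exc:
                # The PCI subsystem may be briefly inconsistent; wait and retry.
                last_error = exc
                if attempt < self._retries:
                    time.sleep(self._delay)

        raise RuntimeError(
            f"lspci failed after {self._retries} attempts"
        ) from last_error
```

With this shape, every caller that needs the device list calls `LspciCache.get()` on a shared instance, so a transient failure during DPU removal is retried and later lookups reuse the cached output instead of running the command again.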

- How to verify it
Run regression.