Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "Enable event log for qualification & profiling tools testing … #12103

Merged
merged 2 commits into from
Feb 11, 2025

Conversation

gerashegalov
Copy link
Collaborator

@gerashegalov gerashegalov commented Feb 11, 2025

This reverts #4217 commit b18492e.

#4217 was turning on event logging which was already on by default. #4217 overrode xdist-aware setting of event log locations, and causes the hang

Fixes #12096

Downstream steps in pipelines need to collect event logs under all eventlog_gw[0-9] paths

$ find ./integration_tests/target/run_dir-20250211100159-C4VO -name \*zstd
./integration_tests/target/run_dir-20250211100159-C4VO/eventlog_gw3/eventlog_v2_local-1739268134946/events_1_local-1739268134946.zstd
./integration_tests/target/run_dir-20250211100159-C4VO/eventlog_gw0/eventlog_v2_local-1739268134579/events_1_local-1739268134579.zstd
./integration_tests/target/run_dir-20250211100159-C4VO/eventlog_gw2/eventlog_v2_local-1739268134806/events_1_local-1739268134806.zstd
./integration_tests/target/run_dir-20250211100159-C4VO/eventlog_gw1/eventlog_v2_local-1739268134420/events_1_local-1739268134420.zstd

@gerashegalov gerashegalov added the bug Something isn't working label Feb 11, 2025
@gerashegalov gerashegalov requested a review from pxLi February 11, 2025 10:12
@gerashegalov gerashegalov self-assigned this Feb 11, 2025
@gerashegalov gerashegalov requested a review from a team as a code owner February 11, 2025 10:12
Signed-off-by: Gera Shegalov <[email protected]>
@gerashegalov gerashegalov added the test Only impacts tests label Feb 11, 2025
@gerashegalov
Copy link
Collaborator Author

build

@tgravescs
Copy link
Collaborator

Please add a description as to why this is being reverted. I get that it caused some hang, but why did it cause it.

Then is there a followup issue to turn it back on because i assume it was turned on for a reason.

@gerashegalov
Copy link
Collaborator Author

@tgravescs

The issue linked as being fixed #12096 provides the description of how the hang is triggered by the PR being reverted

#4217 was turning on event logging which was already on by default. #4217 overrode xdist-aware setting of event log locations, and causes the hang

I'll repeat this in the PR description.

Copy link
Collaborator

@mythrocks mythrocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great catch, @gerashegalov. LGTM, FWIW.

@tgravescs
Copy link
Collaborator

thanks, I missed the description of solution/root cause in #12096 since it was under the steps to reproduce. I generally try to separate that out into a next comment but we don't have a convention for that. I just missed it.

@zhanga5 to see why original pr went in if eventlogging already enabled? Maybe configuration missing

@gerashegalov gerashegalov merged commit 2cc2992 into NVIDIA:branch-25.02 Feb 11, 2025
51 checks passed
@gerashegalov gerashegalov deleted the fix-eventlog branch February 11, 2025 18:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working test Only impacts tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] CI_PART1 for DBR 14.3 hangs in the nightly pre-release pipeline
4 participants