Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dotnet test hangs with .NET9 RC1/RC2 in CI build on Ubuntu #43432

Closed
gimlichael opened this issue Sep 13, 2024 · 45 comments
Closed

dotnet test hangs with .NET9 RC1/RC2 in CI build on Ubuntu #43432

gimlichael opened this issue Sep 13, 2024 · 45 comments
Labels
Area-DotNet Test untriaged Request triage from a team member

Comments

@gimlichael
Copy link

gimlichael commented Sep 13, 2024

Describe the bug

In some cases, dotnet test hangs doing or after a build with .NET9 RC1.

Prior to .NET9 RC1, this worked without a problem - for previous versions as well.

Full "log" can be found here (as I was unsure which team to put the issue to): dotnet/core#9496 (comment)

To Reproduce

Make a CI build in GitHub Actions using Ubuntu and this repo: https://github.com/gimlichael/Cuemon
The main branch will fail on these two projects:

  • Cuemon.AspNetCore.Mvc.FunctionalTests
  • Cuemon.AspNetCore.Mvc.Tests

If you use this branch instead, https://github.com/gimlichael/Cuemon/tree/v9.0.0/net9rc1-gha-troubleshoot, all .NET9 RC1 updates has been reverted to .NET9 preview 7, and the two test projects mentioned above, runs fine.

Exceptions (if any)

No exceptions, but the test will hang - and for me, timeout after 15 minutes (cause I don't like the built-in timeout of 6 hours or so).

Further technical details

After thorough investigation over the last two days, my conclusion is that something has changed for dotnet test with .NET9 RC1.

Oddly enough, it works fine on Windows 2022 CI build as well.

Locally, using Windows 11, Docker and WSL (Ubuntu) it works as expected as well.

Workaround for .NET9 RC1 and RC2

Segregate dotnet build and dotnet test .

NOK:

dotnet test --configuration Debug --verbosity normal --logger trx --results-directory $RUNNER_TEMP/TestResults --collect:"XPlat Code Coverage;Format=opencover" -p:CoverletOutputFormat=opencover -p:UseSourceLink=true -p:SkipSignAssembly=true

OK:

dotnet build --configuration Debug --verbosity normal -p:SkipSignAssembly=true

dotnet test --no-build --configuration Debug --verbosity normal --logger trx --results-directory $RUNNER_TEMP/TestResults --collect:"XPlat Code Coverage;Format=opencover" --blame-hang-timeout 10m --blame-hang-dump-type mini -p:CoverletOutputFormat=opencover -p:UseSourceLink=true
@nohwnd
Copy link
Member

nohwnd commented Sep 16, 2024

@gimlichael would it be possible to create a PR that shows the problem and produces a binary log in your repository please?

@gimlichael
Copy link
Author

@nohwnd I will try to put aside some time to fulfill your request; hopefully before the end of this week.

@gimlichael
Copy link
Author

gimlichael commented Sep 23, 2024

@nohwnd i have created this branch for you: https://github.com/gimlichael/Cuemon/tree/v9.0.0/dotnet-sdk-issue-43432

I did notice (looking in the source associated with dotnet test and the timeframe the bug occurred) that you guys are using a lot of .GetAwaiter().GetResult() .. I think this might be an issue in a CI environment with low-spec CPU.

Anyway, feel free to clone it and run it .. still Ubuntu only that fails. Windows run fine.

I did enable the -bl build switch, but I don't see any binary files added to GitHub artifacts.
If I go into the cog-wheel, I do see an option to download the log archive, but after 20 MB or so, an error occurs from GitHub.

https://github.com/gimlichael/Cuemon/actions/runs/11000415914/job/30542914518

@gimlichael
Copy link
Author

@nohwnd did you have a chance to look into the issue?

@nohwnd
Copy link
Member

nohwnd commented Sep 30, 2024

I just tried to do it, but checking out the branch and running dotnet build fails with many errors like this:

S:\c\Cuemon\test\Cuemon.AspNetCore.Mvc.FunctionalTests\Filters\Diagnostics\FaultDescriptorFilterTest.cs(1342,21): error CS0012: The type 'Disposable' is defined in an assembly that is not referenced. You must add a reference to assembly 'Cuemon.Core, Version=9.0.0.0, Culture=neutral, PublicKeyToken=9f6823cab47d945f'.

Build failed with 5525 error(s) and 4 warning(s) in 13,8s

I don't see contributing.md or other description of build steps, and your CI seems to do the same. What do I need to do to build please?

@gimlichael
Copy link
Author

gimlichael commented Sep 30, 2024

@nohwnd thank you for getting back.

I just verified your discovery and I realized, that the branch I used had to many experimental changes for the forthcoming v9.0.0 release, why I replaced the branch in question with one a little earlier in the development process.

I have verified that:

  • git clone https://github.com/gimlichael/Cuemon.git
  • git checkout v9.0.0/dotnet-sdk-issue-43432 (removed the -b switch due to muscle-memory in my fingers)
  • dotnet build

Now works as expected.

I also verified, that the bug is still present.

Sorry for the inconvenience; and thank you for hinting I should have a look at my community health pages while I am at it 🙂

@nohwnd
Copy link
Member

nohwnd commented Oct 1, 2024

I've updated to the latest, and I don't know what I am doing wrong but I still see the same errors. I also tried to delete the repo and clone again, but that did not help either.

Are all the changes you wanted to push pushed?

S:\c\Cuemon [v9.0.0/dotnet-sdk-issue-43432]> git log --oneline
bd3caa34 (HEAD -> v9.0.0/dotnet-sdk-issue-43432, origin/main, origin/HEAD, main) V9.0.0/housekeeping for extensions globalization (#90)
d5f284b9 (tag: v9.0.0-preview.9) V9.0.0/move replace-line-endings to xunit (#89)
e1941c3b (tag: v9.0.0-preview.8) V9.0.0/finalize newtonsoft json migration (#88)
aa3bfa62 V9.0.0/finalize swashbuckle aspnetcore migration (#87)
5e308b4f V9.0.0/finalize asp versioning migration (#86)
f2f6db4a Bump Backport.System.Threading.Lock to 2.0.3 (#85)
b0cac8b9 (tag: v9.0.0-preview.7) V9.0.0/finalize yamldotnet migration (#84)
73f14cd6 (tag: v9.0.0-preview.6) V9.0.0/finalize xunit migration (#83)
87ca1047 V9.0.0/net9rc1 housekeeping (#82)
cf9609de (tag: v9.0.0-preview.5) V9.0.0/xunit testoutputhelperaccessor (#78)
19af986a V9.0.0/lock refactoring (#77)
51c275e4 (tag: v9.0.0-preview.4) V9.0.0/hotfix, housekeeping and dependency bump (#76)
450b22ef (tag: v9.0.0-preview.3) V9.0.0/System.Threading.Lock (#73)
a539fa16 V9.0.0/housekeeping and CI fix (#75)
11c206ac (tag: v9.0.0-preview.2) V9.0.0/problem details (#74)
e251523d (tag: v9.0.0-preview.1) V9.0.0/major (#72)
ad2cd5c1 :pencil2: updated changelog

@nohwnd
Copy link
Member

nohwnd commented Oct 1, 2024

I see you target both net9 and net8, I am on net9-rc2, but I also tried downgrading to net9-preview7 and it has the same problem. The errors also don't seem to be upgraded from warnings.

@gimlichael
Copy link
Author

gimlichael commented Oct 1, 2024

I have now double- and tripled checked the branch.

Expectations to meet:

  • netstandard2.0
  • netstandard2.1
  • net6
  • net8
  • net9rc1 (you should not use preview7, as this works fine - the bug was introduced with rc1)

If these TFMs are in place, build will succeed.

I was able to locate a typo in my post from before - DO REMOVE -b (as this will checkout a new branch from the main) ..

Here are the adjusted steps:

  • git clone https://github.com/gimlichael/Cuemon.git
  • git checkout v9.0.0/dotnet-sdk-issue-43432
  • dotnet build

Sorry for the typo .. 😪

@gimlichael
Copy link
Author

@nohwnd do let me know, despites the set-backs, if you were able to proceed with bughunting 🙂

@nohwnd
Copy link
Member

nohwnd commented Oct 4, 2024

I was able to make it run, but I don't see any hangs yet.

@gimlichael
Copy link
Author

Keep in mind, that I cannot provoke the error in the confines of my local development.
It is only in GitHub Actions using an Ubuntu runner.

Consider running this flow; it is tweaked to only include the two failing projects.

@McMlok
Copy link

McMlok commented Oct 9, 2024

Hello,

I have the same problem but on different CI. We are using JenkinsX on on-premise K8s (1.29) cluster. We have our own tailored build images based on Ubuntu 22.04. When I update .NET SDK to 9.0 RC1 pipeline stuck on test step. As in case of @gimlichael I run dotnet test with --no-build parameter.

I tried SDK versions
9.0.100-preview.7.24407.12 - OK
9.0.100-rc.1.24452.12 - NOK
9.0.100-rc.2.24474.11 - NOK

First, I suspected breaking change with terminal logger but adding MSBUILDTERMINALLOGGER=off to image does not help.

P.S. I cannot send logs with our internal project but if I will be able to create minimal repro I will publish it.

@gimlichael
Copy link
Author

@nohwnd - any update on this issue? My workaround works, but I still think you guys should fix the issue introduced sometime between preview7 and rc1. For you convenience, maybe this can help you on track: https://github.com/dotnet/sdk/commits/main/src/Cli/dotnet/commands/dotnet-test?since=2024-08-13&until=2024-09-10

@McMlok
Copy link

McMlok commented Oct 9, 2024

I did a few experiments and it looks like that the problem is related with logging.

First I tried blame-hang if the problem is not in tests but without luck

'''
dotnet test --configuration Release --verbosity n --no-build --no-restore --blame-hang --blame-hang-dump-type full --blame-hang-timeout 1m
'''

then I tried

dotnet test --configuration Release --verbosity q --no-build --no-restore

And build on CI agent successfully passed.

@gimlichael
Copy link
Author

gimlichael commented Oct 9, 2024

I did a few experiments and it looks like that the problem is related with logging.

First I tried blame-hang if the problem is not in tests but without luck

''' dotnet test --configuration Release --verbosity n --no-build --no-restore --blame-hang --blame-hang-dump-type full --blame-hang-timeout 1m '''

then I tried

dotnet test --configuration Release --verbosity q --no-build --no-restore

And build on CI agent successfully passed.

Interesting - especially because prior to this bug, I always used a verbosity level of quiet.
Only after the bug surfaced, I changed verbosity normal, detailed and diagnostic.

GitHub Actions DOES NOT like diagnostics loglevel .. haha .. you cannot even download logs afterwards due to the size.

That said - and as you have discovered - it might still be related to logging.
Let's hope the SDK team find a solution 🤞🏼

@gimlichael
Copy link
Author

@nohwnd - any updates? It fails with both RC1 and RC2.
Last working build was Preview 7.

@nohwnd
Copy link
Member

nohwnd commented Oct 23, 2024

I am sorry but there are no updates. This is on my work list, but unfortunately not on the top.

@gimlichael
Copy link
Author

I am sorry but there are no updates. This is on my work list, but unfortunately not on the top.

Fair - thank you for the honest response.
Fingers crossed it will be solved with a patch after .NET 9 launch next month 🤞🏼

@fabianoliver
Copy link

fabianoliver commented Nov 1, 2024

I am sorry but there are no updates. This is on my work list, but unfortunately not on the top.

Thank you for the update!

We've run into the same issue with RC2, also on Jenkins on an on-prem cluster.

Until fixed this will categorically prevent us from moving over to NET9 whatsoever once released - I'd imagine a good bunch of enterprise users might be in the same boat there, though hard to judge if it just affects some small fraction of CI build setups, or is a wider problem. Either way, thanks of looking into this!

@gimlichael gimlichael changed the title dotnet test hangs with .NET9 RC1 in CI build on Ubuntu dotnet test hangs with .NET9 RC1/RC2 in CI build on Ubuntu Nov 4, 2024
@GOATS2K
Copy link

GOATS2K commented Nov 4, 2024

.NET 9 RC2 - dotnet test running on a self-hosted Act runner, running in a Docker container using an Ubuntu image, seems to hang on one host but runs OK on another.

It hangs when I run my tests using this command (works fine in .NET 8):

dotnet test --no-build --verbosity normal

And crashes when I run it with this command

dotnet test --configuration Release --verbosity normal --no-build --no-restore

with the following log

Build started 11/04/2024 19:13:23.
     1>Project "/workspace/Starlight/Starlight/Starlight.sln" on node 1 (VSTest target(s)).
     1>ValidateSolutionConfiguration:
         Building solution configuration "Release|Any CPU".
     1>Project "/workspace/Starlight/Starlight/Starlight.sln" (1) is building "/workspace/Starlight/Starlight/tests/Starlight.Services.Tests/Starlight.Services.Tests.csproj" (6) on node 7 (VSTest target(s)).
     6>_VSTestConsole:
         MSB4181: The "VSTestTask" task returned false but did not log an error.
     6>Done Building Project "/workspace/Starlight/Starlight/tests/Starlight.Services.Tests/Starlight.Services.Tests.csproj" (VSTest target(s)) -- FAILED.
     1>Project "/workspace/Starlight/Starlight/Starlight.sln" (1) is building "/workspace/Starlight/Starlight/tests/Starlight.Updater.Tests/Starlight.Updater.Tests.csproj" (8) on node 2 (VSTest target(s)).
     8>_VSTestConsole:
         MSB4181: The "VSTestTask" task returned false but did not log an error.
     8>Done Building Project "/workspace/Starlight/Starlight/tests/Starlight.Updater.Tests/Starlight.Updater.Tests.csproj" (VSTest target(s)) -- FAILED.
     1>Project "/workspace/Starlight/Starlight/Starlight.sln" (1) is building "/workspace/Starlight/Starlight/tests/Starlight.Models.Tests/Starlight.Models.Tests.csproj" (9) on node 3 (VSTest target(s)).
     9>_VSTestConsole:
         MSB4181: The "VSTestTask" task returned false but did not log an error.
     9>Done Building Project "/workspace/Starlight/Starlight/tests/Starlight.Models.Tests/Starlight.Models.Tests.csproj" (VSTest target(s)) -- FAILED.
     1>Done Building Project "/workspace/Starlight/Starlight/Starlight.sln" (VSTest target(s)) -- FAILED.
Build FAILED.
    0 Warning(s)
    0 Error(s)
Time Elapsed 00:00:01.24

Changing the log level to quiet does not help.

Finally, if I just try to run dotnet test, the test runner only discovers a single test out of many.

I cannot reproduce issue locally on my macOS machine.

@nohwnd
Copy link
Member

nohwnd commented Nov 5, 2024

Finally getting back to this. If someone has a local better repro, binary log or diagnostic logs that they can share, please let me know.

@nohwnd
Copy link
Member

nohwnd commented Nov 5, 2024

I was able to download the original logs from Cuemon (from the github action, they show that vstest task is running but that is about it).

2024-09-23T18:48:38.1132167Z   1:19>Done building target "ShowCallOfVSTestTaskWithParameter" in project "Cuemon.AspNetCore.Mvc.FunctionalTests.csproj".
2024-09-23T18:48:38.1132353Z        Done executing task "CallTarget".
2024-09-23T18:48:38.1133033Z        Using "VSTestTask" task from assembly "/usr/share/dotnet/sdk/9.0.100-rc.1.24452.12/Microsoft.TestPlatform.Build.dll".
2024-09-23T18:48:38.1133182Z        Task "VSTestTask"
2024-09-23T18:48:38.1133375Z          Task Parameter:VSTestBlameHang=True
2024-09-23T18:48:38.1133584Z          Task Parameter:VSTestVerbosity=detailed
2024-09-23T18:48:38.1134227Z          Task Parameter:VSTestTraceDataCollectorDirectoryPath=/home/runner/.nuget/packages/microsoft.codecoverage/17.11.1/build/netstandard2.0/
2024-09-23T18:48:38.1134573Z          Task Parameter:VSTestResultsDirectory=/home/runner/work/_temp/TestResults
2024-09-23T18:48:38.1135189Z          Task Parameter:VSTestTestAdapterPath=/home/runner/.nuget/packages/coverlet.collector/6.0.2/build/netstandard2.0/
2024-09-23T18:48:38.1135410Z          Task Parameter:VSTestBlameHangTimeout=10m
2024-09-23T18:48:38.1135922Z          Task Parameter:VSTestConsolePath=/usr/share/dotnet/sdk/9.0.100-rc.1.24452.12/vstest.console.dll
2024-09-23T18:48:38.1136191Z          Task Parameter:VSTestFramework=.NETCoreApp,Version=v9.0
2024-09-23T18:48:38.1136448Z          Task Parameter:VSTestArtifactsProcessingMode=collect
2024-09-23T18:48:38.1136627Z          Task Parameter:VSTestLogger=trx
2024-09-23T18:48:38.1136935Z          Task Parameter:VSTestCollect=XPlat Code Coverage;Format=opencover
2024-09-23T18:48:38.1137390Z          Task Parameter:VSTestSessionCorrelationId=2566_e2458d0a-316a-4f7c-bc36-809795f7b744
2024-09-23T18:48:38.1138235Z          Task Parameter:TestFileFullPath=/home/runner/work/Cuemon/Cuemon/test/Cuemon.AspNetCore.Mvc.FunctionalTests/bin/Debug/net9.0/Cuemon.AspNetCore.Mvc.FunctionalTests.dll
2024-09-23T18:48:38.1138458Z          Task Parameter:VSTestBlameHangDumpType=mini
2024-09-23T19:00:29.2866221Z ##[error]The operation was canceled.
2024-09-23T19:00:29.2965128Z Post job cleanup.
2024-09-23T19:00:29.3022727Z Post job cleanup.
2024-09-23T19:00:29.3078135Z Post job cleanup.
2024-09-23T19:00:29.3133174Z Post job cleanup.
2024-09-23T19:00:29.4096216Z [command]/usr/bin/git version
2024-09-23T19:00:29.4130361Z git version 2.46.0
2024-09-23T19:00:29.4173260Z Temporarily overriding HOME='/home/runner/work/_temp/a32d568f-9d17-48cb-af35-577639717d18' before making global git config changes
2024-09-23T19:00:29.4175549Z Adding repository directory to the temporary git global config as a safe directory
2024-09-23T19:00:29.4179724Z [command]/usr/bin/git config --global --add safe.directory /home/runner/work/Cuemon/Cuemon
2024-09-23T19:00:29.4214951Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2024-09-23T19:00:29.4245963Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2024-09-23T19:00:29.4907938Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2024-09-23T19:00:29.4909542Z http.https://github.com/.extraheader
2024-09-23T19:00:29.4913024Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader
2024-09-23T19:00:29.4917309Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2024-09-23T19:00:29.5160414Z Cleaning up orphan processes
2024-09-23T19:00:29.5462333Z Terminate orphan process: pid (2566) (dotnet)
2024-09-23T19:00:29.5511293Z Terminate orphan process: pid (2596) (dotnet)
2024-09-23T19:00:29.5581880Z Terminate orphan process: pid (2597) (dotnet)
2024-09-23T19:00:29.5638717Z Terminate orphan process: pid (2598) (dotnet)
2024-09-23T19:00:29.5692699Z Terminate orphan process: pid (2885) (dotnet)
2024-09-23T19:00:29.5781556Z Terminate orphan process: pid (2886) (dotnet)

So pretty much, tests running, some work is being done, and then the parent kills it.

Will try to run on GH, to see if I can get the diag logs;

@Youssef1313
Copy link
Member

This is probably the same as microsoft/vstest#5091 ? cc @nohwnd

@nohwnd
Copy link
Member

nohwnd commented Nov 5, 2024

Image
i am seeing the abort when I have build logging to detailed, but not when I set it to normal.

@gimlichael
Copy link
Author

gimlichael commented Nov 5, 2024

i am seeing the abort when I have build logging to detailed, but not when I set it to normal.

This is interesting - and on-par with what @McMlok is also experiencing.
However, prior to this bug, my default verbosity level was set to quiet, so there might be some different use cases that ties into logging.

@GOATS2K
Copy link

GOATS2K commented Nov 5, 2024

@nohwnd I can probably produce some more detailed logs today - if you tell me the exact dotnet test command you'd want me to run.

I was able to download the original logs from Cuemon (from the github action, they show that vstest task is running but that is about it).

I also saw this with diagnostic logging yesterday, it says a VSTest task is running but it's hung.

@nohwnd
Copy link
Member

nohwnd commented Nov 5, 2024

just add --diag:logs/log.txt to your command and it will produce logs into logs folder in the current dir. Or you can specify any other relative or absolute path. Please grab all the logs in the folder, not just the one called log.txt

@nohwnd
Copy link
Member

nohwnd commented Nov 5, 2024

This is probably the same as microsoft/vstest#5091 ? cc @nohwnd

it might or might not be. Hard to say without consistent repro :/

@gimlichael
Copy link
Author

@nohwnd I did yet another CI run .. same result with log level set to normal.

Test completes on Windows; but does not complete on Ubuntu.

Compared to my workaround, two additional thing to mention:

  • test on Windows completes in about 2.5 minutes; this takes around 6 minutes with normal dotnet test
  • test on Ubuntu completes in about 1 minute; this hangs completely with normal dotnet test

I hope this help you further. Since you are at Microsoft, maybe you can get the full view here: https://github.com/gimlichael/Cuemon/actions/runs/11697884725

logs_30547850241.zip

I did use the -bl swtich, but it does not look like its part of the zip file.

@nohwnd
Copy link
Member

nohwnd commented Nov 6, 2024

thanks for the logs, I don't see any diag logs in there, without them I am pretty blind, I tried yesterday to get them but they did not upload I must have did the paths wrong. I tried to do that on my fork (I am syncing your repro branch to my main). https://github.com/nohwnd/Cuemon/actions/runs/11691191041

@gimlichael
Copy link
Author

gimlichael commented Nov 6, 2024

thanks for the logs, I don't see any diag logs in there, without them I am pretty blind, I tried yesterday to get them but they did not upload I must have did the paths wrong. I tried to do that on my fork (I am syncing your repro branch to my main). https://github.com/nohwnd/Cuemon/actions/runs/11691191041

I can try to enable the --diag:/home/runner/work/_temp/TestResults/logs/log.txt, but I am afraid that I am unable to download from GitHub (for instance, detailed logs cannot be donwloaded due to size - another bug on the GitHub platform) ..

Update: the --diag switch did not look like it provided any additional content.

@nohwnd
Copy link
Member

nohwnd commented Nov 6, 2024

you should be able to upload them as artifacts . I also had better luck clicking the cog wheel and selecting "see raw logs" which opens the whole giant log in browser, which takes some time to load it, but then searches it just fine.

your upload needs to be set to contine after previous step fails btw. (mine was not of course, so I don't have the logs yet)

@gimlichael
Copy link
Author

you should be able to upload them as artifacts . I also had better luck clicking the cog wheel and selecting "see raw logs" which opens the whole giant log in browser, which takes some time to load it, but then searches it just fine.

your upload needs to be set to contine after previous step fails btw. (mine was not of course, so I don't have the logs yet)

I have tweaked it further, though I doubt it will make any changes, as I already put failfast; false (hence, everything is a success). I will download the full log as suggested ans ZIP it.

job-logs.zip

@gimlichael
Copy link
Author

After a little dicking around, it looks like I need to use always() as per https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/evaluate-expressions-in-workflows-and-actions#always .. I leave it here for others to be inspired by.

Here are the logs you requested:
logs.zip

@nohwnd
Copy link
Member

nohwnd commented Nov 6, 2024

Thanks for the tip, I was struggling with that as well.

@gimlichael
Copy link
Author

@nohwnd any findings that might lead to an update? Will a fix be ready for .NET 9 launch or will it be planned for later bugfix? Cheers - and happy .NET 9 launch 🚀

@nietras
Copy link

nietras commented Nov 14, 2024

I am having this exact issue but for dotnet format (!) with .NET 9.0.100 RTM for CI build on Windows on Azure DevOps (see mastodon thread https://mastodon.social/@nietras/113476144331935898). It seems to hang just after restore finished at the point where .editorconfig paths should be listed. Same exact command at same exact working directory works fine when run manually.

This is stopping us from adopting .NET 9. 👾

E:\agent-default-pool\_work\_tool\dotnet\dotnet.exe format --verify-no-changes -v diagnostic
  The dotnet runtime version is '9.0.0'.
  The dotnet CLI version is '9.0.100'.
  Using MSBuild.exe located in 'E:\agent-default-pool\_work\_tool/dotnet\sdk\9.0.100\'.
  Formatting code files in workspace 'E:\agent-default-pool\_work\15\s\BaseApp.sln'.
  Loading workspace.
    Determining projects to restore...

Image

@nohwnd
Copy link
Member

nohwnd commented Nov 18, 2024

My logs from this run: https://github.com/nohwnd/Cuemon/actions/runs/11706551074/job/32604198919

Looks like vstest does really exit correctly and all diag logs from vstest are sayin that we exited (below excerp of the MSBuild log, and VsTest.console log.

2024-11-06T15:27:55.4921545Z   Passed Cuemon.AspNetCore.Mvc.Filters.Diagnostics.FaultDescriptorFilterTest.OnException_ShouldCaptureException_RenderAsDefault_UsingNewtonsoftJson(sensitivity: FailureWithStackTraceAndData) [311 ms]
2024-11-06T15:27:55.5262843Z Results File: /home/runner/work/_temp/TestResults/_fv-az1198-407_2024-11-06_15_27_53.trx
2024-11-06T15:27:55.5265169Z 
2024-11-06T15:27:55.5301505Z Test Run Successful.
2024-11-06T15:27:55.5302178Z Total tests: 42
2024-11-06T15:27:55.5302621Z      Passed: 42
2024-11-06T15:27:55.5362843Z  Total time: 26.8984 Seconds
2024-11-06T15:27:55.5488688Z        Done executing task "VSTestTask".
2024-11-06T15:27:55.5489829Z   1:25>Done building target "_VSTestConsole" in project "Cuemon.AspNetCore.Mvc.FunctionalTests.csproj".
2024-11-06T15:27:55.5490966Z        Done executing task "CallTarget".
2024-11-06T15:27:55.5492151Z        Task "CallTarget" skipped, due to false condition; ($(_MSBUILDTLENABLED) == '1' AND $(VsTestUseMSBuildOutput)) was evaluated as (0 == '1' AND true).
2024-11-06T15:27:55.5493884Z   1:25>Done building target "VSTest" in project "Cuemon.AspNetCore.Mvc.FunctionalTests.csproj".
2024-11-06T15:27:55.5495760Z        Target "GenerateCoverageResultAfterTest" skipped, due to false condition; ('$(CollectCoverage)' == 'true') was evaluated as ('false' == 'true').
2024-11-06T15:27:55.5497933Z   1:25>Done Building Project "/home/runner/work/Cuemon/Cuemon/test/Cuemon.AspNetCore.Mvc.FunctionalTests/Cuemon.AspNetCore.Mvc.FunctionalTests.csproj" (VSTest target(s)).
2024-11-06T15:40:53.1572103Z ##[error]The operation was canceled.
2024-11-06T15:40:53.1628936Z ##[group]Run actions/upload-artifact@v4
2024-11-06T15:40:53.1629282Z with:
2024-11-06T15:40:53.1629510Z   name: ubuntu-22.04-Debug-logs
2024-11-06T15:40:53.1629821Z   path: /home/runner/work/_temp/logs/
2024-11-06T15:40:53.1630145Z   if-no-files-found: warn
2024-11-06T15:40:53.1630420Z   compression-level: 6
2024-11-06T15:40:53.1630678Z   overwrite: false
2024-11-06T15:40:53.1630928Z   include-hidden-files: false
2024-11-06T15:40:53.1631195Z env:
2024-11-06T15:40:53.1631407Z   DOTNET_ROOT: /usr/share/dotnet
2024-11-06T15:40:53.1631694Z ##[endgroup]
2024-11-06T15:40:53.4806521Z With the provided path, there will be 7 files uploaded
2024-11-06T15:40:53.4807219Z Artifact name is valid!
2024-11-06T15:40:53.4807576Z Root directory input is valid!
2024-11-06T15:40:53.6311281Z Beginning upload of artifact content to blob storage
2024-11-06T15:40:54.1823327Z Uploaded bytes 231015
2024-11-06T15:40:54.2536151Z Finished uploading artifact content to blob storage!
2024-11-06T15:40:54.2537073Z SHA256 hash of uploaded artifact zip is 3b700e62ab9af74c1a44ba335da310a3de0ead7c0250e7b52e7ed36bffd93255
2024-11-06T15:40:54.2537805Z Finalizing artifact upload
2024-11-06T15:40:54.4309537Z Artifact ubuntu-22.04-Debug-logs.zip successfully finalized. Artifact ID 2153014557
2024-11-06T15:40:54.4310915Z Artifact ubuntu-22.04-Debug-logs has been successfully uploaded! Final size is 231015 bytes. Artifact ID is 2153014557
2024-11-06T15:40:54.4312121Z Artifact download URL: https://github.com/nohwnd/Cuemon/actions/runs/11706551074/artifacts/2153014557
2024-11-06T15:40:54.4424776Z Post job cleanup.
2024-11-06T15:40:54.4514382Z Post job cleanup.
2024-11-06T15:40:54.4572762Z Post job cleanup.
2024-11-06T15:40:54.5486358Z [command]/usr/bin/git version
2024-11-06T15:40:54.5521271Z git version 2.47.0
2024-11-06T15:40:54.5564062Z Temporarily overriding HOME='/home/runner/work/_temp/81f2968f-510d-43ef-96a6-5607076ac348' before making global git config changes
2024-11-06T15:40:54.5566093Z Adding repository directory to the temporary git global config as a safe directory
2024-11-06T15:40:54.5577579Z [command]/usr/bin/git config --global --add safe.directory /home/runner/work/Cuemon/Cuemon
2024-11-06T15:40:54.5611550Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2024-11-06T15:40:54.5642263Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2024-11-06T15:40:54.5888391Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2024-11-06T15:40:54.5910120Z http.https://github.com/.extraheader
2024-11-06T15:40:54.5922449Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader
2024-11-06T15:40:54.5955002Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2024-11-06T15:40:54.6402854Z Cleaning up orphan processes
2024-11-06T15:40:54.6758026Z Terminate orphan process: pid (2465) (dotnet)
2024-11-06T15:40:54.6786955Z Terminate orphan process: pid (2499) (dotnet)
2024-11-06T15:40:54.6824802Z Terminate orphan process: pid (2500) (dotnet)
2024-11-06T15:40:54.6862192Z Terminate orphan process: pid (2501) (dotnet)
2024-11-06T15:40:54.6899776Z Terminate orphan process: pid (2866) (dotnet)
2024-11-06T15:40:54.6936786Z Terminate orphan process: pid (2869) (dotnet)
TpTrace Verbose: 0 : 2878, 9, 2024/11/06, 15:27:55.538, 239860798567, vstest.console.dll, MulticastDelegateUtilities.SafeInvoke: LengthPrefixCommunicationChannel: MessageReceived: Invoking callback 1/1 for Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TestRequestSender+<>c__DisplayClass34_0., took 689 ms.
TpTrace Information: 0 : 2878, 1, 2024/11/06, 15:27:55.538, 239860841048, vstest.console.dll, TestRequestManager.RunTests: run tests completed.
TpTrace Information: 0 : 2878, 9, 2024/11/06, 15:27:55.539, 239860887793, vstest.console.dll, SocketServer.PrivateStop: Stopping server endPoint: 127.0.0.1:42493 error: 
TpTrace Information: 0 : 2878, 1, 2024/11/06, 15:27:55.539, 239860928030, vstest.console.dll, RunTestsArgumentProcessor:Execute: Test run is completed.
TpTrace Verbose: 0 : 2878, 1, 2024/11/06, 15:27:55.539, 239861144944, vstest.console.dll, Executor.Execute: Exiting with exit code of 0
TpTrace Verbose: 0 : 2878, 9, 2024/11/06, 15:27:55.539, 239861142307, vstest.console.dll, LengthPrefixCommunicationChannel.Dispose: Dispose reader and writer.
TpTrace Information: 0 : 2878, 9, 2024/11/06, 15:27:55.539, 239861249137, vstest.console.dll, SocketServer.Stop: Raise disconnected event endPoint: 127.0.0.1:42493 error: 
TpTrace Verbose: 0 : 2878, 9, 2024/11/06, 15:27:55.539, 239861307896, vstest.console.dll, TestRequestSender: OnTestRunAbort: Operation is already complete. Skip error message.
TpTrace Verbose: 0 : 2878, 9, 2024/11/06, 15:27:55.539, 239861333924, vstest.console.dll, MulticastDelegateUtilities.SafeInvoke: SocketServer: ClientDisconnected: Invoking callback 1/1 for Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TestRequestSender., took 0 ms.
TpTrace Verbose: 0 : 2878, 9, 2024/11/06, 15:27:55.539, 239861352359, vstest.console.dll, TcpClientExtensions.MessageLoopAsync: exiting MessageLoopAsync remoteEndPoint: 127.0.0.1:42738 localEndPoint: 127.0.0.1:42493

@nohwnd
Copy link
Member

nohwnd commented Nov 19, 2024

@gimlichael I cannot repro the hang anymore on my github actions, I am running exactly the same code as 14 days ago, are your tests still hanging?

I thought it might be related to how we capture or forward the output, but we are capturing the standard output for a long time, and only recently started forwarding it. But we are using the mechanics that have been there forever.

microsoft/vstest#5091 (comment)

@gimlichael
Copy link
Author

@nohwnd - this is interesting. Given nothing has changed, should we assume that GitHub has patched their runners or fixed the issue together with you guys? Or maybe provided better hardware?

Because you are right - what has failed consistently since preview-7 now seems to work.

This at least suggest SOMETHING has changed on the GitHub side.

@nohwnd
Copy link
Member

nohwnd commented Nov 19, 2024

Well there was an update to net9 latest release on the agents, so looks like someone did a fix, that fixed us as well. At least I hope. All I could find is that the processes remain lingering and that prevents the task from finishing, but from the logs vstest.console finishes correctly, or at least it is not blocking on waiting for the children. Presumably the vstest.console process exits (you cannot see the 2878 process id in the list of killed child processes). And we also did not do any change to how we handle the child processes, we capture the output all the same as we did before, but now we send forward more info. But the mechanics were all there and used even before (going back to net7 at least).

@gimlichael
Copy link
Author

Thanks for the update - and for prioritizing this now hopefully resolved issue.

Is it possible for you to share some links to the updated agents? What was done? Maybe to be transparent to the other guys that experienced this issue.

I might be damaged by my field of work - but I am really curious to why it failed on GitHub runners, so any links that could help draw the full picture would be appreciated.

@nohwnd
Copy link
Member

nohwnd commented Nov 19, 2024

I don't have any, I can just see that the last failing run was running on net9 rc2, and the newest passing runs are on the GA version 9.0.100. Finding which exact commit fixed this would be difficult, and I don't see any issue on sdk or runtime that would look relevant. As much as I would like to know why exactly this works now I cannot spend half more day figuring out why. :/

@nohwnd nohwnd closed this as completed Nov 19, 2024
@McMlok
Copy link

McMlok commented Nov 19, 2024

Same on my side. After I updated on-prem JenkinsX images to GA version (9.0.100) tests are passed without a problem. Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-DotNet Test untriaged Request triage from a team member
Projects
None yet
Development

No branches or pull requests

7 participants