dotnet test hangs with .NET9 RC1/RC2 in CI build on Ubuntu #43432
Comments
@gimlichael, would it be possible to create a PR in your repository that shows the problem and produces a binary log, please?
@nohwnd I will try to set aside some time to fulfill your request, hopefully before the end of this week.
@nohwnd I have created this branch for you: https://github.com/gimlichael/Cuemon/tree/v9.0.0/dotnet-sdk-issue-43432 I did notice (looking in the source associated with
Anyway, feel free to clone it and run it; it still fails only on Ubuntu, while Windows runs fine. I did enable the -bl build switch, but I don't see any binary log files added to the GitHub artifacts. https://github.com/gimlichael/Cuemon/actions/runs/11000415914/job/30542914518
@nohwnd did you have a chance to look into the issue? |
I just tried to do it, but after checking out the branch and building it, I get errors such as:
S:\c\Cuemon\test\Cuemon.AspNetCore.Mvc.FunctionalTests\Filters\Diagnostics\FaultDescriptorFilterTest.cs(1342,21): error CS0012: The type 'Disposable' is defined in an assembly that is not referenced. You must add a reference to assembly 'Cuemon.Core, Version=9.0.0.0, Culture=neutral, PublicKeyToken=9f6823cab47d945f'.
Build failed with 5525 error(s) and 4 warning(s) in 13,8s
I don't see a contributing.md or any other description of the build steps, and your CI seems to do the same. What do I need to do to build, please?
@nohwnd thank you for getting back. I just verified your discovery and realized that the branch I used had too many experimental changes for the forthcoming v9.0.0 release, so I replaced the branch in question with one from a little earlier in the development process. I have verified that:
Now works as expected. I also verified that the bug is still present. Sorry for the inconvenience, and thank you for hinting that I should have a look at my community health pages while I am at it 🙂
I've updated to the latest, and I don't know what I am doing wrong, but I still see the same errors. I also tried deleting the repo and cloning it again, but that did not help either. Have all the changes you wanted to push been pushed?
I see you target both net9 and net8. I am on net9-rc2, but I also tried downgrading to net9-preview7 and got the same problem. The errors also don't seem to be warnings promoted to errors.
I have now double- and triple-checked the branch. Expectations to meet:
If these TFMs are in place, the build will succeed. I found a typo in my earlier post: DO REMOVE -b (as this will check out a new branch from main). Here are the adjusted steps:
Sorry for the typo. 😪
@nohwnd, do let me know, despite the setbacks, whether you were able to proceed with bug hunting 🙂
I was able to make it run, but I don't see any hangs yet.
Keep in mind that I cannot provoke the error in my local development environment. Consider running this workflow; it is tweaked to include only the two failing projects.
Hello, I have the same problem, but on a different CI. We are using JenkinsX on an on-premises K8s (1.29) cluster, with our own tailored build images based on Ubuntu 22.04. When I update the .NET SDK to 9.0 RC1, the pipeline gets stuck on the test step. As in @gimlichael's case, I run dotnet test with the --no-build parameter. I tried SDK versions
First, I suspected a breaking change related to the terminal logger, but adding MSBUILDTERMINALLOGGER=off to the image does not help. P.S. I cannot share logs from our internal project, but if I manage to create a minimal repro, I will publish it.
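For reference, a minimal sketch of the setup described above, assuming a bash build step: the terminal logger is disabled through the `MSBUILDTERMINALLOGGER` environment variable (in a Dockerfile, the equivalent would be an `ENV MSBUILDTERMINALLOGGER=off` line), and the tests run with `--no-build`. The helper name `test_without_build` is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Disable the MSBuild terminal logger for every dotnet command started from this shell.
export MSBUILDTERMINALLOGGER=off

# Hypothetical helper: run the tests against existing build output only.
# Extra arguments (e.g. --configuration Release) are passed through unchanged.
test_without_build() {
  dotnet test --no-build "$@"
}
```

Usage would be a separate `dotnet build` step followed by `test_without_build --configuration Release`, with the configuration matching the build.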
@nohwnd, any update on this issue? My workaround works, but I still think you should fix the issue introduced sometime between preview7 and rc1. For your convenience, maybe this can help get you on track: https://github.com/dotnet/sdk/commits/main/src/Cli/dotnet/commands/dotnet-test?since=2024-08-13&until=2024-09-10
I did a few experiments, and it looks like the problem is related to logging. First, I tried blame-hang to check whether the problem is in the tests, but without luck. Then I tried
And the build on the CI agent passed successfully.
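The blame-hang experiment mentioned above can be sketched as follows, assuming an SDK where `dotnet test` supports the `--blame-hang*` options; the 10-minute timeout and the helper name are illustrative only. Note that the hang timeout applies per test, so blame-hang would not be expected to fire if every test finishes and the hang happens afterwards.

```shell
#!/usr/bin/env bash
set -euo pipefail

# --blame-hang              run in blame mode and collect a hang dump when a test exceeds the timeout
# --blame-hang-timeout 10m  per-test timeout (units such as h, m, min, s, ms are accepted)
# --blame-hang-dump-type    full, mini, or none (none terminates the test host without a dump)
hang_args=(--blame-hang --blame-hang-timeout 10m --blame-hang-dump-type mini)

# Hypothetical helper: run the already-built tests with hang detection enabled.
run_tests_with_hang_dump() {
  dotnet test --no-build "${hang_args[@]}" "$@"
}
```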
Interesting, especially because prior to this bug I always used a verbosity level of quiet. GitHub Actions DOES NOT like the diagnostic log level, haha; you cannot even download the logs afterwards due to their size. That said, and as you have discovered, it might still be related to logging.
@nohwnd, any updates? It fails with both RC1 and RC2.
I am sorry, but there are no updates. This is on my work list, but unfortunately not at the top.
Fair, thank you for the honest response.
Thank you for the update! We've run into the same issue with RC2, also on Jenkins on an on-prem cluster. Until it is fixed, this will categorically prevent us from moving to .NET 9 once it is released. I'd imagine a good number of enterprise users are in the same boat, though it is hard to judge whether it affects only a small fraction of CI build setups or is a wider problem. Either way, thanks for looking into this!
.NET 9 RC2: dotnet test, running on a self-hosted Act runner in a Docker container using an Ubuntu image, seems to hang on one host but runs OK on another. It hangs when I run my tests using this command (it works fine in .NET 8):
And it crashes when I run it with this command
with the following log:
Changing the log level to quiet does not help. Finally, if I just try to run
I cannot reproduce the issue locally on my macOS machine.
Finally getting back to this. If anyone has a better local repro, a binary log, or diagnostic logs that they can share, please let me know.
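A sketch of how both kinds of logs could be collected in one run, assuming `dotnet test` forwards MSBuild's `-bl` switch (as `dotnet build` does) and that `RUNNER_TEMP` is set, as it is on GitHub Actions runners; the directory, file names, and helper name are hypothetical. On GitHub Actions, the directory would then be uploaded as an artifact from a step that runs even when an earlier step fails.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Collect everything in one directory so it can be uploaded as a single artifact.
log_dir="${RUNNER_TEMP:-/tmp}/test-logs"
mkdir -p "$log_dir"

# -bl:<file>     MSBuild binary log of the test run
# --diag <file>  VSTest diagnostic log; host/data collector logs may be written next to it
diag_args=("-bl:$log_dir/test.binlog" --diag "$log_dir/vstest-diag.txt")

# Hypothetical helper: run the tests with both logs enabled.
run_tests_with_logs() {
  dotnet test "${diag_args[@]}" "$@"
}
```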
I was able to download the original logs from Cuemon (from the GitHub action); they show that the vstest task is running, but that is about it.
So pretty much: the tests are running, some work is being done, and then the parent kills it. I will try to run it on GH to see if I can get the diag logs.
This is probably the same as microsoft/vstest#5091? cc @nohwnd
This is interesting, and on par with what @McMlok is also experiencing.
@nohwnd I can probably produce some more detailed logs today if you tell me the exact
I also saw this with diagnostic logging yesterday: it says a VSTest task is running, but it's hung.
just add
It might or might not be. Hard to say without a consistent repro :/
@nohwnd I did yet another CI run, with the same result with log level set to
The test completes on Windows but does not complete on Ubuntu. Compared to my workaround, there are two additional things to mention:
I hope this helps you further. Since you are at Microsoft, maybe you can get the full view here: https://github.com/gimlichael/Cuemon/actions/runs/11697884725 I did use the -bl switch, but it does not look like the binary log is part of the zip file.
Thanks for the logs. I don't see any diag logs in there, and without them I am pretty blind. I tried to get them yesterday, but they did not upload; I must have gotten the paths wrong. I tried to do that on my fork (I am syncing your repro branch to my main): https://github.com/nohwnd/Cuemon/actions/runs/11691191041
I can try to enable the
Update: the --diag switch did not look like it provided any additional content.
You should be able to upload them as artifacts. I also had better luck clicking the cog wheel and selecting "see raw logs", which opens the whole giant log in the browser; it takes some time to load, but then it searches just fine. Your upload step needs to be set to continue after a previous step fails, by the way (mine was not, of course, so I don't have the logs yet).
I have tweaked it further, though I doubt it will make any difference, as I already set failfast to false (hence, everything is a success). I will download the full log as suggested and ZIP it.
After a little messing around, it looks like I need to use
Here are the logs you requested:
Thanks for the tip, I was struggling with that as well. |
@nohwnd, any findings that might lead to an update? Will a fix be ready for the .NET 9 launch, or is it planned for a later bugfix release? Cheers, and happy .NET 9 launch 🚀
I am having this exact issue, but for
This is stopping us from adopting .NET 9. 👾
My logs from this run: https://github.com/nohwnd/Cuemon/actions/runs/11706551074/job/32604198919
It looks like vstest really does exit correctly, and all the diag logs from vstest say that we exited (below is an excerpt of the MSBuild log and the vstest.console log):
@gimlichael I cannot repro the hang on my GitHub Actions anymore, and I am running exactly the same code as 14 days ago. Are your tests still hanging? I thought it might be related to how we capture or forward the output, but we have been capturing the standard output for a long time and only recently started forwarding it, using mechanics that have been there forever.
@nohwnd, this is interesting. Given that nothing has changed, should we assume that GitHub has patched their runners or fixed the issue together with you? Or maybe provided better hardware? Because you are right: what has failed consistently since preview-7 now seems to work. This at least suggests SOMETHING has changed on the GitHub side.
Well, the agents were updated to the latest net9 release, so it looks like someone made a fix that fixed us as well. At least I hope so. All I could find is that processes remain lingering, which prevents the task from finishing, but according to the logs, vstest.console finishes correctly, or at least it is not blocked waiting for its children. Presumably the vstest.console process exits (process ID 2878 does not appear in the list of killed child processes). We also did not change how we handle child processes: we capture the output the same way as before, but now we forward more information. The mechanics were all there and in use before (going back to net7 at least).
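To check whether processes are still lingering once the test step returns, a hedged diagnostic sketch is to list matching processes right after `dotnet test` (or from a step that always runs). It assumes a Linux runner with `pgrep` from procps; the name pattern and the helper name are hypothetical and may need adjusting for a given setup.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical pattern for processes that might outlive the test run.
pattern='dotnet|testhost|vstest'

# -f matches against the full command line, -a prints it next to each PID.
# pgrep exits with 1 when nothing matches; report that instead of failing the step.
list_lingering_processes() {
  pgrep -af "$pattern" || echo "no processes matching '$pattern'"
}
```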
Thanks for the update, and for prioritizing this now hopefully resolved issue. Could you share some links about the updated agents? What was done? That would also be transparent to the others who experienced this issue. I might be damaged by my field of work, but I am really curious why it failed on GitHub runners, so any links that could help draw the full picture would be appreciated.
I don't have any. I can only see that the last failing run was on net9 rc2 and that the newest passing runs are on the GA version 9.0.100. Finding the exact commit that fixed this would be difficult, and I don't see any issue in sdk or runtime that looks relevant. As much as I would like to know exactly why this works now, I cannot spend another half day figuring it out. :/
Same on my side. After I updated our on-prem JenkinsX images to the GA version (9.0.100), the tests pass without a problem. Thanks
Describe the bug
In some cases, dotnet test hangs during or after a build with .NET 9 RC1. Prior to .NET 9 RC1, this worked without a problem, and the same goes for previous versions.
The full "log" can be found here (as I was unsure which team to file the issue with): dotnet/core#9496 (comment)
To Reproduce
Make a CI build in GitHub Actions using Ubuntu and this repo: https://github.com/gimlichael/Cuemon
The main branch will fail on these two projects:
If you use this branch instead, https://github.com/gimlichael/Cuemon/tree/v9.0.0/net9rc1-gha-troubleshoot, where all .NET 9 RC1 updates have been reverted to .NET 9 preview 7, the two test projects mentioned above run fine.
Exceptions (if any)
No exceptions, but the test will hang and, for me, time out after 15 minutes (because I don't like the built-in timeout of 6 hours or so).
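On GitHub Actions, such a limit is usually set with the job or step `timeout-minutes` key. As an alternative sketch at the shell level, GNU coreutils `timeout` can bound the test command itself; the helper name `run_with_limit` and the `TEST_TIMEOUT` variable are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a command with an upper time limit (default 15 minutes).
# GNU timeout sends SIGTERM when the limit is reached and then exits with 124;
# --kill-after=30s sends SIGKILL if the command is still alive 30 seconds later,
# in which case the exit status is 137 instead.
run_with_limit() {
  local limit="${TEST_TIMEOUT:-15m}" status=0
  timeout --kill-after=30s "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "command did not finish within $limit and was terminated" >&2
  fi
  return "$status"
}
```

For example, `run_with_limit dotnet test --no-build` fails the step after 15 minutes instead of waiting for the runner's default job timeout.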
Further technical details
After a thorough investigation over the last two days, my conclusion is that something has changed for dotnet test with .NET 9 RC1. Oddly enough, it works fine on a Windows 2022 CI build.
Locally, using Windows 11, Docker, and WSL (Ubuntu), it works as expected as well.
Workaround for .NET9 RC1 and RC2
Segregate dotnet build and dotnet test.
NOK:
dotnet test --configuration Debug --verbosity normal --logger trx --results-directory $RUNNER_TEMP/TestResults --collect:"XPlat Code Coverage;Format=opencover" -p:CoverletOutputFormat=opencover -p:UseSourceLink=true -p:SkipSignAssembly=true
OK:
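A minimal sketch of the segregation idea (not necessarily the exact OK command), assuming the test arguments from the NOK command are kept and the MSBuild properties `UseSourceLink` and `SkipSignAssembly` move to the build step; `build_then_test` is a hypothetical name.

```shell
#!/usr/bin/env bash
set -euo pipefail

results_dir="${RUNNER_TEMP:-/tmp}/TestResults"

# Step 1: build once; the MSBuild properties from the NOK command are applied here.
build_args=(--configuration Debug -p:UseSourceLink=true -p:SkipSignAssembly=true)

# Step 2: test the existing build output (--no-build), keeping the logger and coverage options.
test_args=(--no-build --configuration Debug --verbosity normal
  --logger trx --results-directory "$results_dir"
  --collect:"XPlat Code Coverage;Format=opencover" -p:CoverletOutputFormat=opencover)

# Hypothetical helper: returns early with the build's exit status if the build fails.
build_then_test() {
  dotnet build "${build_args[@]}" || return
  dotnet test "${test_args[@]}"
}
```

In a CI workflow, the two commands could equally be two separate steps; the point of the sketch is only that `dotnet test` no longer performs the build itself.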