Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpp/Ice/timeout failure (Windows) #3048

Closed
bernardnormier opened this issue Nov 4, 2024 · 7 comments · Fixed by #3110 or #3158
Closed

cpp/Ice/timeout failure (Windows) #3048

bernardnormier opened this issue Nov 4, 2024 · 7 comments · Fixed by #3110 or #3158
Assignees
Labels
Milestone

Comments

@bernardnormier
Copy link
Member

From https://github.com/zeroc-ice/ice/actions/runs/11666208467/job/32480630322

*** [60/93] Running cpp/Ice/timeout tests ***
[ running client/server test - 11/04/24 15:08:44 ]
- Config: ssl,Win32
(D:\a\ice\ice\cpp\test\Ice\timeout\msbuild\server\Win32\Release\server.exe --Ice.Default.Host=127.0.0.1 --Test.BasePort=14100 --Ice.Warn.Connections=1 --Ice.Default.Protocol=ssl --Ice.IPv6=0 --IceSSL.*** --IceSSL.DefaultDir=D:\a\ice\ice\certs --IceSSL.CAs=cacert.pem --IceSSL.CertFile=server.p12 --Ice.PrintStackTraces=1 --Ice.ThreadPool.Server.Size=1 --Ice.ThreadPool.Server.SizeMax=3 --Ice.ThreadPool.Server.SizeWarn=0 --Ice.PrintAdapterReady=1 --Ice.Trace.Network=2 --Ice.Trace.Retry=1 --Ice.Trace.Protocol=1 --Ice.LogFile=D:\a\ice\ice\cpp\test\Ice\timeout\server-110424-1508.log env={'PATH': 'D:\\a\\ice\\ice\\cpp\\test\\Common\\msbuild\\Win32\\Release;D:\\a\\ice\\ice\\cpp\\bin\\Win32\\Release'})
(D:\a\ice\ice\cpp\test\Ice\timeout\msbuild\client\Win32\Release\client.exe --Ice.Default.Host=127.0.0.1 --Test.BasePort=14100 --Ice.Warn.Connections=1 --Ice.Default.Protocol=ssl --Ice.IPv6=0 --IceSSL.*** --IceSSL.DefaultDir=D:\a\ice\ice\certs --IceSSL.CAs=cacert.pem --IceSSL.CertFile=client.p12 --Ice.PrintStackTraces=1 --Ice.Trace.Network=2 --Ice.Trace.Retry=1 --Ice.Trace.Protocol=1 --Ice.LogFile=D:\a\ice\ice\cpp\test\Ice\timeout\client-110424-1508.log env={'PATH': 'D:\\a\\ice\\ice\\cpp\\test\\Common\\msbuild\\Win32\\Release;D:\\a\\ice\\ice\\cpp\\bin\\Win32\\Release'})
testing connect timeout... ok
testing invocation timeout... ok
testing close timeout... ok
testing invocation timeouts with collocated calls... ok
[ running client/server test - 11/04/24 15:08:51 ]
- Config: ws,ipv6,serialize,mx,Win32
(D:\a\ice\ice\cpp\test\Ice\timeout\msbuild\server\Win32\Release\server.exe --Ice.Default.Host=::1 --Test.BasePort=14100 --Ice.Warn.Connections=1 --Ice.Default.Protocol=ws --Ice.ThreadPool.Server.Serialize=1 --Ice.IPv6=1 --Ice.PreferIPv6Address=1 --Ice.Admin.Endpoints="tcp -h \"::1\"" --Ice.Admin.InstanceName=Server --IceMX.Metrics.Debug.GroupBy=id --IceMX.Metrics.Parent.GroupBy=parent --IceMX.Metrics.All.GroupBy=none --Ice.PrintStackTraces=1 --Ice.ThreadPool.Server.Size=1 --Ice.ThreadPool.Server.SizeMax=3 --Ice.ThreadPool.Server.SizeWarn=0 --Ice.PrintAdapterReady=1 --Ice.Trace.Network=2 --Ice.Trace.Retry=1 --Ice.Trace.Protocol=1 --Ice.LogFile=D:\a\ice\ice\cpp\test\Ice\timeout\server-110424-1508.log env={'PATH': 'D:\\a\\ice\\ice\\cpp\\test\\Common\\msbuild\\Win32\\Release;D:\\a\\ice\\ice\\cpp\\bin\\Win32\\Release'})
(D:\a\ice\ice\cpp\test\Ice\timeout\msbuild\client\Win32\Release\client.exe --Ice.Default.Host=::1 --Test.BasePort=14100 --Ice.Warn.Connections=1 --Ice.Default.Protocol=ws --Ice.ThreadPool.Server.Serialize=1 --Ice.IPv6=1 --Ice.PreferIPv6Address=1 --Ice.Admin.Endpoints="tcp -h \"::1\"" --Ice.Admin.InstanceName=Client --IceMX.Metrics.Debug.GroupBy=id --IceMX.Metrics.Parent.GroupBy=parent --IceMX.Metrics.All.GroupBy=none --Ice.PrintStackTraces=1 --Ice.Trace.Network=2 --Ice.Trace.Retry=1 --Ice.Trace.Protocol=1 --Ice.LogFile=D:\a\ice\ice\cpp\test\Ice\timeout\client-110424-1508.log env={'PATH': 'D:\\a\\ice\\ice\\cpp\\test\\Common\\msbuild\\Win32\\Release;D:\\a\\ice\\ice\\cpp\\bin\\Win32\\Release'})
testing connect timeout... error: D:\a\ice\ice\cpp\src\Ice\ConnectionI.cpp:2387 ::Ice::ConnectTimeoutException connection establishment timed out
stack trace:
  0 Ice::LocalException::LocalException at D:\a\ice\ice\cpp\include\Ice\LocalException.h:25
  1 Ice::TimeoutException::TimeoutException
  2 Ice::ConnectTimeoutException::ConnectTimeoutException at D:\a\ice\ice\cpp\include\Ice\LocalExceptions.h:313
  3 `anonymous namespace'::ConnectTimerTask::runTimerTask at D:\a\ice\ice\cpp\src\Ice\ConnectionI.cpp:50
  4 IceInternal::ThreadObserverTimer::runTimerTask at D:\a\ice\ice\cpp\src\Ice\Instance.cpp:254
  5 IceInternal::Timer::run at D:\a\ice\ice\cpp\src\Ice\Timer.cpp:125
  6 std::thread::_Invoke<std::tuple<void (__thiscall `anonymous namespace'::ServiceStatusManager::*)(void),A0xf1684611::ServiceStatusManager *>,0,1> at C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.41.34120\include\thread:61
  7 00000000762b6403
  8 0000000076b667f9
  9 0000000077e67f6d
 10 0000000077e67f3b
saved D:\a\ice\ice\cpp\test\Ice\timeout\client-110424-1508.log
!! 11/04/24 15:08:55353 error: communicator not destroyed during global destruction.
saved D:\a\ice\ice\cpp\test\Ice\timeout\server-110424-1508.log
@bernardnormier bernardnormier added this to the 3.8.0 milestone Nov 4, 2024
@bernardnormier bernardnormier self-assigned this Nov 4, 2024
@bernardnormier
Copy link
Member Author

Hard to say what's happening:

client

-- 11/04/24 15:08:53078 D:/a/ice/ice/cpp/test/Ice/timeout/msbuild/client/Win32/Release/client.exe: Protocol: sending request 
   message type = 0 (request)
   compression status = 0 (not compressed; do not compress response, if any)
   message size = 53
   request id = 2
   identity = controller
   facet = 
   operation = resumeAdapter
   mode = 0 (normal)
   context = 
   encoding = 1.1
-- 11/04/24 15:08:53078 D:/a/ice/ice/cpp/test/Ice/timeout/msbuild/client/Win32/Release/client.exe: Network: received ws data frame with payload length of 25 bytes
   local address = ::1:55538
   remote address = ::1:14101
-- 11/04/24 15:08:53078 D:/a/ice/ice/cpp/test/Ice/timeout/msbuild/client/Win32/Release/client.exe: Protocol: received reply 
   message type = 2 (reply)
   compression status = 0 (not compressed; do not compress response, if any)
   message size = 25
   request id = 2
   reply status = 0 (ok)
   encoding = 1.1
-- 11/04/24 15:08:53079 D:/a/ice/ice/cpp/test/Ice/timeout/msbuild/client/Win32/Release/client.exe: Network: trying to establish ws connection to ::1:14100

... full second before next message...

-- 11/04/24 15:08:54082 D:/a/ice/ice/cpp/test/Ice/timeout/msbuild/client/Win32/Release/client.exe: Network: failed to establish ws connection
   local address = <not available>
   remote address = ::1:14100
   connection establishment timed out
-- 11/04/24 15:08:54082 D:/a/ice/ice/cpp/test/Ice/timeout/msbuild/client/Win32/Release/client.exe: Network: connection to endpoint failed and no more endpoints to try
   connection establishment timed out
-- 11/04/24 15:08:54082 D:/a/ice/ice/cpp/test/Ice/timeout/msbuild/client/Win32/Release/client.exe: Retry: cannot retry operation call because retry limit has been exceeded
   connection establishment timed out


server

-- 11/04/24 15:08:53078 D:/a/ice/ice/cpp/test/Ice/timeout/msbuild/server/Win32/Release/server.exe: Protocol: received request 
   message type = 0 (request)
   compression status = 0 (not compressed; do not compress response, if any)
   message size = 53
   request id = 2
   identity = controller
   facet = 
   operation = resumeAdapter
   mode = 0 (normal)
   context = 
   encoding = 1.1
-- 11/04/24 15:08:53078 D:/a/ice/ice/cpp/test/Ice/timeout/msbuild/server/Win32/Release/server.exe: Network: accepting ws connections at ::1:14100
-- 11/04/24 15:08:53078 D:/a/ice/ice/cpp/test/Ice/timeout/msbuild/server/Win32/Release/server.exe: Protocol: sending reply 
   message type = 2 (reply)
   compression status = 0 (not compressed; do not compress response, if any)
   message size = 25
   request id = 2
   reply status = 0 (ok)
   encoding = 1.1
-- 11/04/24 15:08:53078 D:/a/ice/ice/cpp/test/Ice/timeout/msbuild/server/Win32/Release/server.exe: Network: trying to accept ws connection
   local address = ::1:14100
   remote address = ::1:55539

... full second before next trace message...

-- 11/04/24 15:08:54082 D:/a/ice/ice/cpp/test/Ice/timeout/msbuild/server/Win32/Release/server.exe: Network: received ws data frame with payload length of 53 bytes
   local address = ::1:14101
   remote address = ::1:55538

@pepone
Copy link
Member

pepone commented Nov 4, 2024

I think the connect timeout happens in

timeout->ice_connectionId("connection-3")->op();

And the message 1 second later correspond to the resumeAdapter call from

catch (const Ice::Exception&)
{
// Ensure the adapter is not in the holding state when an unexpected exception occurs to prevent the test
// from hanging on exit in case a connection which disables timeouts is still opened.
controller->resumeAdapter();
throw;

@bernardnormier
Copy link
Member Author

Yes, that's also my understanding.

@bernardnormier
Copy link
Member Author

With more tracing from https://github.com/zeroc-ice/ice/actions/runs/11673396573?pr=3060, we have:

client

-- 11/04/24 22:42:19395 D:/a/ice/ice/cpp/test/Ice/timeout/msbuild/client/Win32/Release/client.exe: Protocol: received reply 
   message type = 2 (reply)
   compression status = 0 (not compressed; do not compress response, if any)
   message size = 25
   request id = 2
   reply status = 0 (ok)
   encoding = 1.1
-- 11/04/24 22:42:19395 D:/a/ice/ice/cpp/test/Ice/timeout/msbuild/client/Win32/Release/client.exe: Network: trying to establish ws connection to ::1:14200
-- 11/04/24 22:42:19395 D:/a/ice/ice/cpp/test/Ice/timeout/msbuild/client/Win32/Release/client.exe: Network: sent ws connection HTTP upgrade request
   local address = ::1:60244
   remote address = ::1:14200

== hang ==

-- 11/04/24 22:42:20576 D:/a/ice/ice/cpp/test/Ice/timeout/msbuild/client/Win32/Release/client.exe: Network: failed to establish ws connection
   local address = <not available>
   remote address = ::1:14200
   connection establishment timed out

server

-- 11/04/24 22:42:19394 D:/a/ice/ice/cpp/test/Ice/timeout/msbuild/server/Win32/Release/server.exe: Network: accepting ws connections at ::1:14200
-- 11/04/24 22:42:19394 D:/a/ice/ice/cpp/test/Ice/timeout/msbuild/server/Win32/Release/server.exe: Protocol: sending reply 
   message type = 2 (reply)
   compression status = 0 (not compressed; do not compress response, if any)
   message size = 25
   request id = 2
   reply status = 0 (ok)
   encoding = 1.1
-- 11/04/24 22:42:19394 D:/a/ice/ice/cpp/test/Ice/timeout/msbuild/server/Win32/Release/server.exe: Network: trying to accept ws connection
   local address = ::1:14200
   remote address = ::1:60207

== hang ==

-- 11/04/24 22:42:20577 D:/a/ice/ice/cpp/test/Ice/timeout/msbuild/server/Win32/Release/server.exe: Network: received ws data frame with payload length of 53 bytes
   local address = ::1:14201
   remote address = ::1:60206

Interestingly, the server hangs accepting another connection (from port 60207).

This connection attempt from the client timed out earlier:

# client trace
-- 11/04/24 22:42:17377 D:/a/ice/ice/cpp/test/Ice/timeout/msbuild/client/Win32/Release/client.exe: Network: trying to establish ws connection to ::1:14200
-- 11/04/24 22:42:17378 D:/a/ice/ice/cpp/test/Ice/timeout/msbuild/client/Win32/Release/client.exe: Network: sent ws connection HTTP upgrade request
   local address = ::1:60207
   remote address = ::1:14200
-- 11/04/24 22:42:18384 D:/a/ice/ice/cpp/test/Ice/timeout/msbuild/client/Win32/Release/client.exe: Network: failed to establish ws connection
   local address = <not available>
   remote address = ::1:14200
   connection establishment timed out

@bernardnormier
Copy link
Member Author

Debugging on a slow Windows VM shows the root cause is the slow collection of stack traces on Windows (ws + Win32). It should be fixed by #3110.

@bernardnormier
Copy link
Member Author

The root cause was indeed stack trace printing on Windows, specifically:

@bernardnormier
Copy link
Member Author

#3110 didn't fix this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment