Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CommissioningComplete command fails with CHIP Error 0x00000032: Timeout in python controller #37075

Open
Harshith-GRL opened this issue Jan 15, 2025 · 3 comments
Labels

Comments

@Harshith-GRL
Copy link
Contributor

Harshith-GRL commented Jan 15, 2025

Description

When executing the script TC_Darwin_pair.py as a part of stress testing, in the CASE session when sending CommissionComplete command there is a exception being thrown which is chip.exceptions.ChipStackError: src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:124: CHIP Error 0x00000032: Timeout. This script will mimic on a high level how the pairing occurs in the Darwin ecosystem. Here there are two fabrics that will be created, in Some cases commission complete fails for the first fabric but in some scenarios it fails for the second fabric.

GitHub hash of the SDK that were being used are : f8d457a.

Platforms

I Have performed the execution on Two different DUT (Raspberrypi and Nordic board), using ble-wifi and ble-thread provisioning
Test Bed -1
Controller Platform - python controller running on Raspberrypi 4B -8GB,
Nordic Board as DUT - all-cluster-app running on Nordic nrf52840-DK board
Provisioning method - Ble-thread

Test-bed 2
Controller Platform - python controller running on linux Laptop,
Raspberrypi as DUT - all-cluster-app running on Raspberrypi 4B -8GB
Provisioning method - Ble-Wifi

Test scenario

  1. Pair/Provision the DUT (All-cluster-app running on raspi) as per drawiin pairing flow using ble-wifi/ ble-thread.
  2. Read device information after the device is provisioned
  3. sleep for 2 secs
  4. Unpair the DUT (i.e. removing the fabric)
  5. Factory reset the DUT
  6. repeat step1 - 6 for 500 iterations
  7. Script reference: https://github.com/CHIP-Specifications/matter-qa/blob/main/src/matter_qa/scripts/reliability_scripts/TC_Darwin_Pair.py

Execution Details

Test-Bed 1
NOTE : In this Test bed for Iteration 11 the first controller is not able send commissionComplete command and in Iteration 8 the second controller is not able to send commissionComplete Command.
Git hub hash id: f8d457a
Iterations scheduled to run: 500
Iterations Failed: 3
Iterations Passed: 22
Python controller is running on raspberrpi with Nordic nrf52840 as dut with commissioning method is ble-thread.
Execution Logs (logs consists of two Pass iterations and two Fail Iterations):
nordic_device_as_dut.zip

Test Bed 2
Git hub hash id: f8d457a
Iterations scheduled to run: 500
Iterations Passed: 225
Iterations Failed: 38
Python controller is running on linux Laptop with chip-all-cluster-app on rapsberrypi machine as dut with commissioning method is ble-wifi.
Execution Logs (logs consists of two Pass iterations and one Fail Iterations):
laptop_python_contorller_logs.zip

@bzbarsky-apple
Copy link
Contributor

So looking at the nordic_device_as_dut.zip bits, under fail_log and iteration_8_fail, the python_contorller_log_2025-01-07_12-12-01.log, what I see is that we AddNOC, then do the operational resolve and send the Sigma1, but then never get an ack or response:

2025-01-07 12:12:46,228 - root - INFO - matter_testing_support.py:print_step:1327 - ***** Test Step 8-26 : Send AddNoC(0x06) from OperationalCredendials (0x3e) 
...
2025-01-07 12:12:46,235 - chip.native.EM - INFO - __init__.py:_RedirectToPythonLogging:41 - <<< [E:43878i S:48125 M:10597041] (S) Msg TX from 000000000001B669 to 1:000000009B2CD111 [73CB] [UDP:[fd6b:a34d:88a5:1:bec1:e9f5:429d:2345]:5540] --- Type 0001:08 (IM:InvokeCommandRequest) (B:569)
...
2025-01-07 12:12:49,134 - chip.native.EM - INFO - __init__.py:_RedirectToPythonLogging:41 - <<< [E:43879i S:0 M:137805163] (U) Msg TX from 57D12704719A79E5 to 0:0000000000000000 [0000] [UDP:[fd6b:a34d:88a5:1:5114:d666:807f:dd6b]:5540] --- Type 0000:30 (SecureChannel:CASE_Sigma1) (B:234)
...
2025-01-07 12:13:12,891 - chip.native.EM - ERROR - __init__.py:_RedirectToPythonLogging:39 - <<5 [E:43879i S:0 M:137805163] (U) Msg Retransmission to 0:0000000000000000 failure (max retries:4)

On the server side, Dut_log_8_2025-01-07T12_12_01_346617.log, I see us get the AddNOC but then nothing received on exchange 43879:

I: 51752 [ZCL]OpCreds: Received an AddNOC command
...
I: 52169 [EM]<<< [E:43878r S:18456 M:87125946 (Ack:10597041)] (S) Msg TX from 000000009B2CD111 to 1:000000000001B669 [73CB] [UDP:[fd00:db8::ee0f:8608:a097:849e]:33639] --- Type 0001:09 (IM:InvokeCommandResponse) (B:70)
...
I: 52943 [DL]SRP Client was started, detected server: fd00:0db9:0000:0000:b607:10d5:4b5a:13d7
D: 52952 [DL]OpenThread State Changed (Flags: 0x00000200)
...
E: 55059 [DL]SRP update error: timed out waiting on server response
...
D: 57409 [DL]SRP update succeeded

Someone needs to loop at the pcap file there to see what's going on with the Thread packets, but that's a pcap from the controller, not the Thread border router, right @Harshith-GRL ?

@Harshith-GRL
Copy link
Contributor Author

You're right, the file I attached above is from controller side only, let me try to capture the tcpdump on border router, is there any specific filter that i need to lookout for when capturing the tcpdump or do you need me to capture all packets?

@bzbarsky-apple
Copy link
Contributor

@Harshith-GRL I don't know... might be best to capture all packets and do the filtering later when looking at the capture.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

2 participants