Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

debug: workaround for sporadic failures of some tests due to unexpected data present in pexpect match #555

Merged
merged 1 commit into from
May 16, 2024

Conversation

aap-sc
Copy link
Collaborator

@aap-sc aap-sc commented May 14, 2024

Problem was observed on UnavailableMultiTest - this test was sporadically failing. When the failure was observed the log of the failing test looked as follows:

  File "/whatever/RISCVTests/debug/testlib.py", line 504, in <genexpr>
    if all(targets[hart.id]["State"] == "running" for hart in harts):
           ~~~~~~~~~~~~~~~~^^^^^^^^^
KeyError: 'State'

Adding this modification to testlib.py

--- a/debug/testlib.py
+++ b/debug/testlib.py
@@ -498,6 +498,10 @@ class Openocd:
         for line in lines[2:]:
             if line.strip():
                 data.append(dict(zip(headers, line.split()[1:])))
+        str_data = str(data)
+        sys.stdout.flush()
+        sys.stdout.write(f"parsed targets:\n{result}\n===\n{str_data}\n---\n")
+        sys.stdout.flush()
         return data

Allowed me to root cause the issue. Namely we have the following situation:

parsed targets:
Exception ignored in: <function _TemporaryFileCloser.__del__ at 0x7f2dee64d1c0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/tempfile.py", line 450, in __del__
    self.close()
  File "/usr/local/lib/python3.11/tempfile.py", line 446, in close
    unlink(self.name)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/[email protected]'
...
    TargetName         Type       Endian TapName            State
--  ------------------ ---------- ------ ------------------ ------------
 0  riscv.cpu0         riscv      little riscv.cpu          running
 1* riscv.cpu1         riscv      little riscv.cpu          running
===
[{'Exception': '"/usr/local/lib/python3.11/tempfile.py",', 'ignored': 'line', 'in:': '450,', ...

The only reasonable (to me) explanation for the observed behavior is below.

Here is how we connect to TCL-RPC server:

  self.openocd_cli = pexpect.spawn(f"nc localhost {self.tclrpc_port}")
  tty.setraw(self.openocd_cli.child_fd)

Later we request target list by issuing "targets" command:

  self.command("targets")

Internally, pexpect.spawn implemented as follows:

  • we fork the process
  • set up pty and then call execve
  • all these steps are written in python

"Exception ignored" messages are result of exceptions thrown from finalizers of NamedTemporaryFile objects. When exception is thrown from the finalizer - python unconditionally prints a "warning" to stderr. It seems that these messages are polluting our output from "nc" since python GC can be invoked before the execve syscall.

The workaround is just to make sure that execve was executed before we rely on the format of command output. To have such a guarantee we just issue a dummy "echo" command and check that we have a proper reply in the output stream.

While this explanation looks convincing, the behavior above still looks strange, given that we have https://bugs.python.org/issue14548 which was resolved long ago.

However, the proposed workaround fixes the issue.

@aap-sc aap-sc requested a review from en-sc May 14, 2024 13:39
@aap-sc
Copy link
Collaborator Author

aap-sc commented May 14, 2024

@TommyMurphyTM1234 FYI

…ed data present in pexpect match

Problem was observed on UnavailableMultiTest - this test was sporadically failing.
When the failure was observed the log of the failing test looked as follows:

```
  File "/whatever/RISCVTests/debug/testlib.py", line 504, in <genexpr>
    if all(targets[hart.id]["State"] == "running" for hart in harts):
           ~~~~~~~~~~~~~~~~^^^^^^^^^
KeyError: 'State'
```

Adding this modification to testlib.py

```
--- a/debug/testlib.py
+++ b/debug/testlib.py
@@ -498,6 +498,10 @@ class Openocd:
         for line in lines[2:]:
             if line.strip():
                 data.append(dict(zip(headers, line.split()[1:])))
+        str_data = str(data)
+        sys.stdout.flush()
+        sys.stdout.write(f"parsed targets:\n{result}\n===\n{str_data}\n---\n")
+        sys.stdout.flush()
         return data
```

Allowed me to root cause the issue. Namely we have the following
situation:

```
parsed targets:
Exception ignored in: <function _TemporaryFileCloser.__del__ at 0x7f2dee64d1c0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/tempfile.py", line 450, in __del__
    self.close()
  File "/usr/local/lib/python3.11/tempfile.py", line 446, in close
    unlink(self.name)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/[email protected]'
...
    TargetName         Type       Endian TapName            State
--  ------------------ ---------- ------ ------------------ ------------
 0  riscv.cpu0         riscv      little riscv.cpu          running
 1* riscv.cpu1         riscv      little riscv.cpu          running
===
[{'Exception': '"/usr/local/lib/python3.11/tempfile.py",', 'ignored': 'line', 'in:': '450,', ...
```

The only reasonable (to me) explanation for the observed behavior is below.

Here is how we connect to TCL-RPC server:

```
  self.openocd_cli = pexpect.spawn(f"nc localhost {self.tclrpc_port}")
  tty.setraw(self.openocd_cli.child_fd)
```

Later we request target list by issuing "targets" command:

```
  self.command("targets")
```

Internally, pexpect.spawn implemented as follows:

- we fork the process
- set up pty and then call execve
- all these steps are written in python

"Exception ignored" messages are result of exceptions thrown from
finalizers of NamedTemporaryFile objects. When exception is thrown from
the finalizer - python unconditionally prints a "warning" to stderr. It
seems that these messages are polluting our output from "nc" since python
GC can be invoked before the execve syscall.

The workaround is just to make sure that execve was executed before we
rely on the format of command output. To have such a guarantee we just
issue a dummy "echo" command and check that we have a proper reply in the
output stream.

While this explanation looks convincing, the behavior above still looks
strange, given that we have https://bugs.python.org/issue14548 which
was resolved long ago.

However, the proposed workaround fixes the issue.
@aap-sc aap-sc force-pushed the aap-sc/sporadic_faulure_fix branch from cdd91da to db7e451 Compare May 14, 2024 14:09
@en-sc
Copy link
Collaborator

en-sc commented May 14, 2024

@aap-sc, I think it would be better to use Python's socket API with socket_pexpect instead of netcat.
AFAIU, this will eliminate the fork and therefore fix the issue.

@aap-sc
Copy link
Collaborator Author

aap-sc commented May 14, 2024

@aap-sc, I think it would be better to use Python's socket API with socket_pexpect instead of netcat. AFAIU, this will eliminate the fork and therefore fix the issue.

Yes, it will. Though I'm not sure that it is easy to to manage socket in python layer. I'm somewhat scared of low-level socket programming. will investigate, though

@aap-sc
Copy link
Collaborator Author

aap-sc commented May 14, 2024

@en-sc I've investigated the situation. The thing is that socket_pexpect is available only in pexpect 4.9. This is not a stable version. I'm not sure we should use a non-stable version of packages, given that the existing solution works. I pinky-promise to keep my eyes open and transition to pexpect 4.9 once it is considered stable.

@en-sc en-sc merged commit 408e461 into riscv-software-src:master May 16, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants