Resilience Scenarios
xzfc edited this page Dec 27, 2020
The following unwanted scenarios may happen during a test, interrupting the test flow and/or invalidating its results.
- The connection between the client and the server is lost due to a network failure.
- The client crashes.
- The workload crashes.
- The server crashes.
- An analyzer or PTDaemon issue occurs:
  - The USB/network connection between the server and the analyzer is lost.
  - The connection between the server and PTDaemon fails.
  - The analyzer is switched off, unplugged, or crashes.
At a bare minimum, we should detect each of these scenarios, interrupt the test, and let the user know that the test was interrupted unexpectedly. The message "Test completed successfully" must never appear in case of failure, so that it can be trusted.
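A minimal Python sketch of this bare-minimum behavior, assuming each failure scenario surfaces as an exception (for example `ConnectionError` when the socket to the server drops). `run_test` and the phase list are hypothetical names for illustration, not part of the project. The success message is produced only after every phase has returned normally; any failure replaces it with an interruption message.

```python
from typing import Callable, Sequence, Tuple

# A phase is a (name, callable) pair; the callable raises on any failure.
Phase = Tuple[str, Callable[[], None]]


def run_test(phases: Sequence[Phase]) -> str:
    """Run the phases in order and return the final status message.

    The success message is returned only if every phase finished without
    raising; otherwise the message names the phase that was interrupted.
    """
    for name, phase in phases:
        try:
            phase()
        except Exception as exc:  # any detected failure interrupts the test
            return f"Test interrupted unexpectedly during {name}: {exc!r}"
    return "Test completed successfully"


if __name__ == "__main__":
    def lose_connection() -> None:
        # Stand-in for a dropped client/server connection.
        raise ConnectionError("connection to the server lost")

    print(run_test([("handshake", lambda: None), ("ranging", lose_connection)]))
```

Running it prints `Test interrupted unexpectedly during ranging: ConnectionError('connection to the server lost')`. The broad `except Exception` is deliberate here: an unexpected error must never be reported as a successful run.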
Going a step further, we may attempt to recover from these situations where possible. The table below lists the applicable recovery methods for each failure scenario in each test phase.
| | handshake | ranging | prepare logs | send logs | testing | prepare logs | send logs |
|---|---|---|---|---|---|---|---|
| Network failure | - | | | | | | |
| Client crashed | - | 1 | 1 | 1 | 1 | 1 | |
| Workload crashed | - | - | - | - | - | | |
| Server crashed | - | 2 | 2 | 2 | 2 | 2 | |
| Analyzer issue | | | | | | | |
- Restart the workload and start the power measurement again. The current phase logs are invalidated since they are incomplete. This may be done either fully automatically (e.g. `--max-tries 5`) or manually (e.g. `--continue`).
- The client should reconnect to the server and resend any lost messages.
- 1: The client should be able to store its state on disk rather than in memory. This would require a manual restart of the client.
- 2: The server should be able to store its state on disk rather than in memory. This would require a restart of the server.
- Restart PTDaemon and reconnect to the analyzer.
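The recovery methods above could be combined roughly as in the following minimal Python sketch. It is not the project's implementation: `WorkloadCrashed`, `run_phase_with_retries`, and `Outbox` are hypothetical names, and the `max_tries` argument only mirrors the example `--max-tries` option by assumption. It shows three things: restarting a phase while discarding its incomplete logs, buffering unacknowledged client messages with sequence numbers so they can be resent after a reconnect, and saving that buffer to disk atomically so a restarted client can resume.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


class WorkloadCrashed(Exception):
    """Hypothetical: raised when the workload process exits abnormally."""


def run_phase_with_retries(run_phase: Callable[[List[str]], None],
                           max_tries: int) -> List[str]:
    """Run one phase, restarting it after a workload crash.

    A crashed attempt leaves incomplete logs, so they are thrown away and
    the next attempt starts from an empty log.
    """
    for attempt in range(1, max_tries + 1):
        logs: List[str] = []
        try:
            run_phase(logs)
            return logs
        except WorkloadCrashed:
            if attempt == max_tries:
                raise  # out of tries: report the test as interrupted
    raise ValueError("max_tries must be at least 1")


class Outbox:
    """Client-side buffer of messages the server has not acknowledged yet.

    Each message gets a sequence number; after reconnecting, the client
    resends everything the server has not confirmed.
    """

    def __init__(self) -> None:
        self.next_seq = 0
        self.unacked: Dict[int, str] = {}

    def send(self, message: str) -> int:
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = message
        return seq

    def ack(self, last_seq: int) -> None:
        # The server confirms every message up to and including last_seq.
        for seq in [s for s in self.unacked if s <= last_seq]:
            del self.unacked[seq]

    def pending(self) -> List[str]:
        # Messages to resend after a reconnect, in their original order.
        return [self.unacked[s] for s in sorted(self.unacked)]

    def save(self, path: str) -> None:
        # Write to a temporary file, then rename it over the target, so a
        # crash mid-write never leaves a truncated state file behind.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "w") as f:
                json.dump({"next_seq": self.next_seq,
                           "unacked": self.unacked}, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp_path, path)
        except BaseException:
            os.unlink(tmp_path)
            raise

    @classmethod
    def load(cls, path: str) -> "Outbox":
        with open(path) as f:
            data = json.load(f)
        box = cls()
        box.next_seq = data["next_seq"]
        # JSON object keys are strings; convert them back to ints.
        box.unacked = {int(k): v for k, v in data["unacked"].items()}
        return box
```

The write-then-rename step in `save` is the important design choice: a client that crashes while saving must not leave a half-written state file for the restarted process to load. A server-side equivalent would follow the same pattern.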