Add suspend stats related check (Bugfix) #1700

hanhsuan · 2025-01-24T03:01:01Z

Description

WARNING: This modifies com.canonical.certification::sru-server

According to the suspend_advanced_auto is the dependency for all after suspend jobs, it would be better to do more tests for this concept to fix #1579.
Therefore, this PR is the first step to collect information by:

valid_suspend_status is used to check success should be non zero
any_fail_during_supend is used to check fail should be zero

If everything woks well, valid_suspend_status will be merge into suspend_advanced_auto in the future.

Resolved issues

#1579

Documentation

Tests

succ 202001-27667
fail 202412-36131

1. test case is_suspend_success will be merged into suspend_advanced_auto 2. is_device_supend_success is used to check any fail

codecov · 2025-01-24T03:03:10Z

Codecov Report

Attention: Patch coverage is 92.15686% with 4 lines in your changes missing coverage. Please review.

Project coverage is 49.16%. Comparing base (e5b7c6f) to head (6021828).
Report is 26 commits behind head on main.

Files with missing lines	Patch %	Lines
providers/base/bin/suspend_stats.py	92.15%	3 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1700      +/-   ##
==========================================
+ Coverage   49.01%   49.16%   +0.15%     
==========================================
  Files         372      374       +2     
  Lines       40348    40481     +133     
  Branches     6817     6845      +28     
==========================================
+ Hits        19777    19903     +126     
- Misses      19849    19853       +4     
- Partials      722      725       +3

Flag	Coverage Δ
provider-base	`25.38% <92.15%> (+0.58%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Hook25

Well done, I really like what I see and I wouldn't be against merging (the half that checks we did actually suspend) in suspend advanced auto.

I have 2 big worries:

Is this supported on any kernel we have to support? Even xenial? Can you check? If not, what do we do about it?
Does this work from a strict snap? Can you check?

providers/base/bin/suspend_stats.py

Hook25 · 2025-01-29T17:09:37Z

providers/base/bin/suspend_stats.py

+        return (
+            self.contents["success"] != "0"
+            and self.contents["failed_prepare"] == "0"
+            and self.contents["failed_suspend"] == "0"
+            and self.contents["failed_resume"] == "0"
+        )


Big questoion is, why are we comparing these values as strings? Is it even possible that these are not valid integers? if that happens I suppose its a big deal, for this reason I've also suggested to remove the utf8/ignore errors above, if sysfs is broken we want to know

Another big question is what do we do if one or more of these files is missing? is it possible? if it is, when does it happen? should we have a default for those situations?

providers/base/bin/suspend_stats.py

providers/base/units/suspend/suspend.pxu

providers/base/units/suspend/test-plan.pxu

Hook25 · 2025-01-31T16:35:53Z

Adding this I dug up from the kernel docs: https://github.com/torvalds/linux/blob/69e858e0b8b2ea07759e995aa383e8780d9d140c/Documentation/ABI/testing/sysfs-power#L317C9-L317C40

If this means what I think it means, then we need a fallback mechanism

hanhsuan · 2025-02-12T03:36:19Z

Adding this I dug up from the kernel docs: https://github.com/torvalds/linux/blob/69e858e0b8b2ea07759e995aa383e8780d9d140c/Documentation/ABI/testing/sysfs-power#L317C9-L317C40

If this means what I think it means, then we need a fallback mechanism

I've checked it on the 201604-21849

ubuntu@201604-21849:/sys/power$ grep -r "" *
disk:[platform] shutdown reboot suspend 
image_size:6662103040
pm_async:1
pm_freeze_timeout:20000
pm_print_times:0
pm_test:[none] core processors platform devices freezer
pm_trace:0
pm_trace_dev_match:usb
grep: pm_wakeup_irq: No data available
reserved_size:1048576
resume:0:0
state:freeze mem disk
wake_lock:
wake_unlock:
wakeup_count:0

The things might be used to check are

ubuntu@201604-21849:~$ grep "PM\:" 1604_suspend.log 
Feb 11 22:13:20 201604-21849 kernel: PM: Syncing filesystems ... done.
Feb 11 22:13:20 201604-21849 kernel: PM: Preparing system for sleep (mem)
Feb 11 22:13:51 201604-21849 kernel: PM: Suspending system (mem)
Feb 11 22:13:51 201604-21849 kernel: PM: suspend of devices complete after 1155.793 msecs
Feb 11 22:13:51 201604-21849 kernel: PM: late suspend of devices complete after 20.104 msecs
Feb 11 22:13:51 201604-21849 kernel: PM: noirq suspend of devices complete after 17.105 msecs
Feb 11 22:13:51 201604-21849 kernel: PM: Saving platform NVS memory
Feb 11 22:13:51 201604-21849 kernel: PM: Restoring platform NVS memory
Feb 11 22:13:51 201604-21849 kernel: PM: noirq resume of devices complete after 15.017 msecs
Feb 11 22:13:51 201604-21849 kernel: PM: early resume of devices complete after 17.481 msecs
Feb 11 22:13:51 201604-21849 kernel: PM: resume of devices complete after 449.473 msecs
Feb 11 22:13:51 201604-21849 kernel: PM: Finishing wakeup.

and

ubuntu@201604-21849:~$ grep "systemd-sleep" 1604_suspend.log 
Feb 11 22:13:20 201604-21849 systemd-sleep[16925]: Selected interface 'mlan0'
Feb 11 22:13:20 201604-21849 systemd-sleep[16925]: OK
Feb 11 22:13:20 201604-21849 systemd-sleep[16925]: Suspending system...
Feb 11 22:13:51 201604-21849 systemd-sleep[16925]: System resumed.
Feb 11 22:13:51 201604-21849 systemd-sleep[16925]: Selected interface 'mlan0'
Feb 11 22:13:51 201604-21849 systemd-sleep[16925]: OK

Do you think it is enough to check the output of journal -b 0 -g systemd-sleep should not include error for validating the suspend status in 16.04 and 18.04 ? The log file is uploaded to CHECKBOX-1642

hanhsuan · 2025-02-17T00:08:38Z

According to this, I think the failed_* indicates one or more device failed. Therefore the kernel changes the success from 0 to 1.
The device failed log:

Feb 17 07:49:24 ubuntu-202412-36132 kernel: spd5118 2-0050: PM: dpm_run_callback(): spd5118_resume [spd5118] returns -6
Feb 17 07:49:24 ubuntu-202412-36132 kernel: spd5118 2-0050: PM: failed to resume async: error -6

Therefore, I would like to change the method is_after_suspend to

    def is_after_suspend(self) -> bool:
        """
        The system is under after suspend status or not

        :returns: return Ture while system is under after suspend status
        """
        return self.contents["success"] != "0"

and is_any_failed to

    def is_any_failed(self) -> bool:
        """
        Is any failed during suspend

        :returns: return Ture while one failed during suspend
        """
        for c, v in self.contents.items():
            if c.startswith("fail") and v != "0":
                return True
        return False

@Hook25 How do you think ?

1. is_after_suspend --> success != 0 2. is_any_failed --> fail* != 0

hanhsuan · 2025-02-21T00:38:01Z

New check condition test result:

PASS
-- 16.04
FAIL
-- suspend failed
-- device failed

hanhsuan added 2 commits January 24, 2025 10:45

This is the first step to fix #1579.

9b1cd14

1. test case is_suspend_success will be merged into suspend_advanced_auto 2. is_device_supend_success is used to check any fail

fix black error

c0c2536

hanhsuan added 2 commits January 24, 2025 13:30

Change to check fail only

10872f3

fix black error

3684954

hanhsuan marked this pull request as ready for review January 24, 2025 05:38

hanhsuan changed the title ~~Add suspend stats releated check (Bugfix)~~ Add suspend stats related check (Bugfix) Jan 24, 2025

pieqq assigned Hook25 Jan 24, 2025

Hook25 requested changes Jan 29, 2025

View reviewed changes

hanhsuan added 3 commits February 16, 2025 16:33

Following the comments to fix the scripts and pxu files

95742b5

Fix black error

6a31743

Fix script error on python 3.5

b347f1d

hanhsuan added 2 commits February 17, 2025 08:09

change user to root to make it could read debugfs

16c1844

Change the check conditions:

6021828

1. is_after_suspend --> success != 0 2. is_any_failed --> fail* != 0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add suspend stats related check (Bugfix) #1700

Add suspend stats related check (Bugfix) #1700

hanhsuan commented Jan 24, 2025 •

edited

Loading

codecov bot commented Jan 24, 2025 •

edited

Loading

Hook25 left a comment •

edited

Loading

Hook25 Jan 29, 2025

Hook25 commented Jan 31, 2025

hanhsuan commented Feb 12, 2025 •

edited by jira bot

Loading

hanhsuan commented Feb 17, 2025 •

edited

Loading

hanhsuan commented Feb 21, 2025

Add suspend stats related check (Bugfix) #1700

Are you sure you want to change the base?

Add suspend stats related check (Bugfix) #1700

Conversation

hanhsuan commented Jan 24, 2025 • edited Loading

Description

WARNING: This modifies com.canonical.certification::sru-server

Resolved issues

Documentation

Tests

codecov bot commented Jan 24, 2025 • edited Loading

Codecov Report

Hook25 left a comment • edited Loading

Choose a reason for hiding this comment

Hook25 Jan 29, 2025

Choose a reason for hiding this comment

Hook25 commented Jan 31, 2025

hanhsuan commented Feb 12, 2025 • edited by jira bot Loading

hanhsuan commented Feb 17, 2025 • edited Loading

hanhsuan commented Feb 21, 2025

hanhsuan commented Jan 24, 2025 •

edited

Loading

codecov bot commented Jan 24, 2025 •

edited

Loading

Hook25 left a comment •

edited

Loading

hanhsuan commented Feb 12, 2025 •

edited by jira bot

Loading

hanhsuan commented Feb 17, 2025 •

edited

Loading