drm/stm: ltdc: fix pinctrl recovery after sleep #17

Open: 1 commit to merge into base branch v5.10-stm32mp

dougg3 commented Apr 28, 2022

I encountered a problem where an RGB TFT LCD panel attached to an STM32MP1 would stop working after the framebuffer was put to sleep and woken back up. The reason was that the sleep pinctrl was applied when the framebuffer went to sleep, but the default pinctrl wasn't being restored after the LTDC woke up. This is because an older commit moved the LTDC default pinctrl configuration to ltdc_encoder_mode_set, which doesn't get called again after it wakes up.

This fix was tested in the v5.4-stm32mp branch on a custom STM32MP1 board. I tested by blanking and then unblanking the framebuffer. Before this fix, the LCD display got garbled after unblanking. With this fix, it recovers correctly.

echo 1 > /sys/class/graphics/fb0/blank
echo 0 > /sys/class/graphics/fb0/blank

When the encoder is disabled, the pinctrl goes to sleep state. If the
encoder is re-enabled after that, the pinctrl needs to go back to
default state. This wasn't happening after the pinctrl setup was moved
to ltdc_encoder_mode_set.

Without this fix, if the framebuffer is put to sleep and then awakened
(e.g. using /sys/class/graphics/fb0/blank), it doesn't recover properly.

Signed-off-by: Doug Brown <[email protected]>
Fixes: f412af1 ("drm/stm: ltdc: move pinctrl to encoder mode set")
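
The state sequence described in the commit message can be sketched as a small user-space model. This is an illustration only, not the driver code: `model_encoder`, `model_mode_set`, `model_disable`, `model_enable`, and the `restore_on_enable` flag are hypothetical names, and the real driver would switch states through the kernel pinctrl API rather than by assigning a field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
enum pin_state { PINS_DEFAULT, PINS_SLEEP };

struct model_encoder {
	enum pin_state pins;
	int restore_on_enable; /* 0 = behaviour reported as broken, 1 = proposed fix */
};

/* Runs when a display mode is programmed; selects the default pin state. */
static void model_mode_set(struct model_encoder *enc)
{
	assert(enc != NULL);
	enc->pins = PINS_DEFAULT;
}

/* Runs when the encoder is disabled (blank); selects the sleep pin state. */
static void model_disable(struct model_encoder *enc)
{
	assert(enc != NULL);
	enc->pins = PINS_SLEEP;
}

/* Runs when the encoder is re-enabled (unblank). */
static void model_enable(struct model_encoder *enc)
{
	assert(enc != NULL);
	if (enc->restore_on_enable)
		enc->pins = PINS_DEFAULT;
}

/* Blank, then unblank, with no new mode set in between. */
static enum pin_state model_blank_unblank(int restore_on_enable)
{
	struct model_encoder enc = { PINS_SLEEP, restore_on_enable };

	model_mode_set(&enc); /* initial mode set */
	model_disable(&enc);  /* echo 1 > blank */
	model_enable(&enc);   /* echo 0 > blank */
	return enc.pins;
}
```

In this model, `model_blank_unblank(0)` ends in `PINS_SLEEP`, which corresponds to the display that fails to recover, while `model_blank_unblank(1)` ends in `PINS_DEFAULT`.
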
mcarlin-ds pushed a commit to DatumSystems/linux that referenced this pull request Oct 2, 2022
[ Upstream commit 4224cfd ]

When bringing down the netdevice or system shutdown, a panic can be
triggered while accessing the sysfs path because the device is already
removed.

    [  755.549084] mlx5_core 0000:12:00.1: Shutdown was called
    [  756.404455] mlx5_core 0000:12:00.0: Shutdown was called
    ...
    [  757.937260] BUG: unable to handle kernel NULL pointer dereference at           (null)
    [  758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280

    crash> bt
    ...
    PID: 12649  TASK: ffff8924108f2100  CPU: 1   COMMAND: "amsd"
    ...
     #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778
        [exception RIP: dma_pool_alloc+0x1ab]
        RIP: ffffffff8ee11acb  RSP: ffff89240e1a3968  RFLAGS: 00010046
        RAX: 0000000000000246  RBX: ffff89243d874100  RCX: 0000000000001000
        RDX: 0000000000000000  RSI: 0000000000000246  RDI: ffff89243d874090
        RBP: ffff89240e1a39c0   R8: 000000000001f080   R9: ffff8905ffc03c00
        R10: ffffffffc04680d4  R11: ffffffff8edde9fd  R12: 00000000000080d0
        R13: ffff89243d874090  R14: ffff89243d874080  R15: 0000000000000000
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core]
    #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core]
    #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core]
    #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core]
    #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core]
    #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core]
    #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core]
    #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46
    #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208
    #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3
    #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf
    #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596
    #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10
    #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5
    #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff
    #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f
    #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92

    crash> net_device.state ffff89443b0c0000
      state = 0x5  (__LINK_STATE_START| __LINK_STATE_NOCARRIER)

To prevent this scenario, we also make sure that the netdevice is present.

Signed-off-by: suresh kumar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
mcarlin-ds pushed a commit to DatumSystems/linux that referenced this pull request Oct 2, 2022
[ Upstream commit f22881d ]

In calipso_map_cat_ntoh(), in the for loop, if the return value of
netlbl_bitmap_walk() is equal to (net_clen_bits - 1), when
netlbl_bitmap_walk() is called next time, out-of-bounds memory accesses
of bitmap[byte_offset] occurs.

The bug was found during fuzzing. The following is the fuzzing report
 BUG: KASAN: slab-out-of-bounds in netlbl_bitmap_walk+0x3c/0xd0
 Read of size 1 at addr ffffff8107bf6f70 by task err_OH/252

 CPU: 7 PID: 252 Comm: err_OH Not tainted 5.17.0-rc7+ #17
 Hardware name: linux,dummy-virt (DT)
 Call trace:
  dump_backtrace+0x21c/0x230
  show_stack+0x1c/0x60
  dump_stack_lvl+0x64/0x7c
  print_address_description.constprop.0+0x70/0x2d0
  __kasan_report+0x158/0x16c
  kasan_report+0x74/0x120
  __asan_load1+0x80/0xa0
  netlbl_bitmap_walk+0x3c/0xd0
  calipso_opt_getattr+0x1a8/0x230
  calipso_sock_getattr+0x218/0x340
  calipso_sock_getattr+0x44/0x60
  netlbl_sock_getattr+0x44/0x80
  selinux_netlbl_socket_setsockopt+0x138/0x170
  selinux_socket_setsockopt+0x4c/0x60
  security_socket_setsockopt+0x4c/0x90
  __sys_setsockopt+0xbc/0x2b0
  __arm64_sys_setsockopt+0x6c/0x84
  invoke_syscall+0x64/0x190
  el0_svc_common.constprop.0+0x88/0x200
  do_el0_svc+0x88/0xa0
  el0_svc+0x128/0x1b0
  el0t_64_sync_handler+0x9c/0x120
  el0t_64_sync+0x16c/0x170

Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Wang Yufen <[email protected]>
Acked-by: Paul Moore <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
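
One way to picture the failure mode is a bitmap walk that is asked to start at or past the end of the bitmap. The sketch below is a simplified user-space model, not the netlabel implementation: `model_bitmap_walk` and `model_count_bits` are hypothetical, and the bounds handling shown is an assumption about how such a walk can be kept in range, not a copy of the upstream change.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the index of the first set bit at or after `offset`, or -1.
 * Bits are numbered MSB-first within each byte. The loop condition
 * keeps every access inside the first `len_bits` bits, so a start
 * offset equal to `len_bits` never touches memory.
 */
static int model_bitmap_walk(const unsigned char *bitmap,
			     unsigned int len_bits, unsigned int offset)
{
	assert(bitmap != NULL);
	for (unsigned int bit = offset; bit < len_bits; bit++) {
		if (bitmap[bit / 8] & (0x80u >> (bit % 8)))
			return (int)bit;
	}
	return -1;
}

/* Caller loop in the style described above: restart after each hit. */
static int model_count_bits(const unsigned char *bitmap,
			    unsigned int len_bits)
{
	int count = 0;
	int spot = -1;

	for (;;) {
		/* After a hit on the last bit, spot + 1 equals len_bits. */
		spot = model_bitmap_walk(bitmap, len_bits,
					 (unsigned int)(spot + 1));
		if (spot < 0)
			break;
		count++;
	}
	return count;
}
```

With a one-byte bitmap of `0x81`, the walk finds bits 0 and 7; the next call starts at offset 8, which equals the bitmap length, and returns -1 without reading past the buffer.
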
mcarlin-ds pushed a commit to DatumSystems/linux that referenced this pull request Apr 18, 2023
[ Upstream commit 4e264be ]

When a system with E810 with existing VFs gets rebooted the following
hang may be observed.

 Pid 1 is hung in iavf_remove(), part of a network driver:
 PID: 1        TASK: ffff965400e5a340  CPU: 24   COMMAND: "systemd-shutdow"
  #0 [ffffaad04005fa50] __schedule at ffffffff8b3239cb
  #1 [ffffaad04005fae8] schedule at ffffffff8b323e2d
  #2 [ffffaad04005fb00] schedule_hrtimeout_range_clock at ffffffff8b32cebc
  #3 [ffffaad04005fb80] usleep_range_state at ffffffff8b32c930
  #4 [ffffaad04005fbb0] iavf_remove at ffffffffc12b9b4c [iavf]
  #5 [ffffaad04005fbf0] pci_device_remove at ffffffff8add7513
  #6 [ffffaad04005fc10] device_release_driver_internal at ffffffff8af08baa
  #7 [ffffaad04005fc40] pci_stop_bus_device at ffffffff8adcc5fc
  #8 [ffffaad04005fc60] pci_stop_and_remove_bus_device at ffffffff8adcc81e
  #9 [ffffaad04005fc70] pci_iov_remove_virtfn at ffffffff8adf9429
 #10 [ffffaad04005fca8] sriov_disable at ffffffff8adf98e4
 #11 [ffffaad04005fcc8] ice_free_vfs at ffffffffc04bb2c8 [ice]
 #12 [ffffaad04005fd10] ice_remove at ffffffffc04778fe [ice]
 #13 [ffffaad04005fd38] ice_shutdown at ffffffffc0477946 [ice]
 #14 [ffffaad04005fd50] pci_device_shutdown at ffffffff8add58f1
 #15 [ffffaad04005fd70] device_shutdown at ffffffff8af05386
 #16 [ffffaad04005fd98] kernel_restart at ffffffff8a92a870
 #17 [ffffaad04005fda8] __do_sys_reboot at ffffffff8a92abd6
 #18 [ffffaad04005fee0] do_syscall_64 at ffffffff8b317159
 #19 [ffffaad04005ff08] __context_tracking_enter at ffffffff8b31b6fc
 #20 [ffffaad04005ff18] syscall_exit_to_user_mode at ffffffff8b31b50d
 #21 [ffffaad04005ff28] do_syscall_64 at ffffffff8b317169
 #22 [ffffaad04005ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b40009b
     RIP: 00007f1baa5c13d7  RSP: 00007fffbcc55a98  RFLAGS: 00000202
     RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00007f1baa5c13d7
     RDX: 0000000001234567  RSI: 0000000028121969  RDI: 00000000fee1dead
     RBP: 00007fffbcc55ca0   R8: 0000000000000000   R9: 00007fffbcc54e90
     R10: 00007fffbcc55050  R11: 0000000000000202  R12: 0000000000000005
     R13: 0000000000000000  R14: 00007fffbcc55af0  R15: 0000000000000000
     ORIG_RAX: 00000000000000a9  CS: 0033  SS: 002b

During reboot all drivers PM shutdown callbacks are invoked.
In iavf_shutdown() the adapter state is changed to __IAVF_REMOVE.
In ice_shutdown() the call chain above is executed, which at some point
calls iavf_remove(). However iavf_remove() expects the VF to be in one
of the states __IAVF_RUNNING, __IAVF_DOWN or __IAVF_INIT_FAILED. If
that's not the case it sleeps forever.
So if iavf_shutdown() gets invoked before iavf_remove() the system will
hang indefinitely because the adapter is already in state __IAVF_REMOVE.

Fix this by returning from iavf_remove() if the state is __IAVF_REMOVE,
as we already went through iavf_shutdown().

Fixes: 9745780 ("iavf: Add waiting so the port is initialized in remove")
Fixes: a841733 ("iavf: Fix race condition between iavf_shutdown and iavf_remove")
Reported-by: Marius Cornea <[email protected]>
Signed-off-by: Stefan Assmann <[email protected]>
Reviewed-by: Michal Kubiak <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
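
The hang and the early return described in this commit message can be modelled in user space. This is a sketch under stated assumptions, not the iavf driver: the enum values, `model_remove`, and the `max_polls` bound (which stands in for "sleeps forever") are all hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-ins for the adapter states named in the message. */
enum model_vf_state { VF_RUNNING, VF_DOWN, VF_INIT_FAILED, VF_REMOVE };

enum model_remove_result { REMOVE_DONE, REMOVE_SKIPPED, REMOVE_HUNG };

/*
 * Model of the remove path. With `early_return` set, a VF that is
 * already in VF_REMOVE (shutdown ran first) is skipped. Otherwise the
 * function polls for one of the accepted states; nothing changes the
 * state in this model, so an unexpected state exhausts `max_polls`,
 * which represents the indefinite hang.
 */
static enum model_remove_result model_remove(enum model_vf_state state,
					     int early_return,
					     int max_polls)
{
	assert(max_polls > 0);

	if (early_return && state == VF_REMOVE)
		return REMOVE_SKIPPED;

	for (int i = 0; i < max_polls; i++) {
		if (state == VF_RUNNING || state == VF_DOWN ||
		    state == VF_INIT_FAILED)
			return REMOVE_DONE;
		/* The driver would sleep here and check again. */
	}
	return REMOVE_HUNG;
}
```

In this model, `model_remove(VF_REMOVE, 0, 1000)` yields `REMOVE_HUNG`, while the same call with `early_return` set yields `REMOVE_SKIPPED`; a VF in `VF_DOWN` completes normally either way.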

dougg3 commented Apr 25, 2023

Is there a realistic world in which this PR gets merged at this point? Or should I try submitting it to the upstream kernel instead?

@BernardPuel

Hello Doug, sorry for the delay and thanks for your contribution. I will push internally for a rapid analysis and a return to the normal process.

@BernardPuel

The fix is already part of the last delivery (V4.1). It will also be backported, as you requested, to the V3.1.3 release (this summer).

dougg3 commented Apr 28, 2023

Hi Bernard,

Thanks for looking into this. Looking at the latest code in the v5.15-stm32mp branch of this repository it looks like the problem is still there (ltdc_encoder_enable doesn't restore the default pinctrl) -- is there another place I should be looking at for the fix?

@BernardPuel

After review, the first patch was not in V4.1, but it is also planned for delivery in V4.1.1. A review will be done to check that the two patches for V4.1.1 cover all the use cases.

no111u3 commented May 12, 2023

@BernardPuel I see the same behaviour with DSI as well, on a different kernel version (5.15.67) running on an STM32MP157F-DK2 board, but the patch fixes only the non-DSI part of the code.

@BernardPuel

Here is the patch planned for V4.1.1:

diff --git a/drivers/gpu/drm/stm/ltdc.c b/drivers/gpu/drm/stm/ltdc.c
index 3dcd63f..f7b9258 100644
--- a/drivers/gpu/drm/stm/ltdc.c
+++ b/drivers/gpu/drm/stm/ltdc.c
@@ -1060,6 +1060,20 @@
 	}
 }
 
+static int ltdc_crtc_atomic_check(struct drm_crtc *crtc,
+				  struct drm_atomic_state *state)
+{
+	struct drm_crtc_state *crtc_state = drm_atomic_get_new_crtc_state(state, crtc);
+
+	DRM_DEBUG_ATOMIC("\n");
+
+	/* force a full mode set if active state changed */
+	if (crtc_state->active_changed)
+		crtc_state->mode_changed = true;
+
+	return 0;
+}
+
 static bool ltdc_crtc_get_scanout_position(struct drm_crtc *crtc,
 					   bool in_vblank_irq,
 					   int *vpos, int *hpos,
@@ -1120,6 +1134,7 @@
 	.atomic_flush = ltdc_crtc_atomic_flush,
 	.atomic_enable = ltdc_crtc_atomic_enable,
 	.atomic_disable = ltdc_crtc_atomic_disable,
+	.atomic_check = ltdc_crtc_atomic_check,
 	.get_scanout_position = ltdc_crtc_get_scanout_position,
 };
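
The effect of the `atomic_check` hook in the patch can be modelled in user space. This sketch assumes, as a simplification of the DRM atomic helpers, that the mode-set hooks run during a commit only when `mode_changed` is set; `model_crtc_state`, `model_atomic_check`, and `model_mode_set_runs` are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduction of the CRTC state to the two flags involved. */
struct model_crtc_state {
	bool active_changed;
	bool mode_changed;
};

/* Same rule as the patch: an active change forces a full mode set. */
static int model_atomic_check(struct model_crtc_state *state)
{
	assert(state != NULL);
	if (state->active_changed)
		state->mode_changed = true;
	return 0;
}

/*
 * Assumption: the commit path calls the mode-set hooks only when
 * mode_changed is true. Returns whether they would run for a commit
 * that toggles the active state without changing the mode.
 */
static bool model_mode_set_runs(bool active_changed, bool use_atomic_check)
{
	struct model_crtc_state state = {
		.active_changed = active_changed,
		.mode_changed = false,
	};

	if (use_atomic_check)
		model_atomic_check(&state);
	return state.mode_changed;
}
```

Under that assumption, an unblank (active change only) triggers the mode-set path only when the check is installed: `model_mode_set_runs(true, false)` is false and `model_mode_set_runs(true, true)` is true.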

no111u3 commented May 12, 2023

@BernardPuel thank you for the patch; it works in a manual test. Do you plan to release it to this repo?
Thank you.

@BernardPuel

Thanks for your feedback. Yes, it will be released in this repo, but only in the 5.15 kernel branch, not in 5.10.

mcarlin-ds pushed a commit to DatumSystems/linux that referenced this pull request Sep 13, 2024
[ Upstream commit e3e82fc ]

When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a
cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when
removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be
dereferenced as wrong struct in irdma_free_pending_cqp_request().

  PID: 3669   TASK: ffff88aef892c000  CPU: 28  COMMAND: "kworker/28:0"
   #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34
   #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2
   #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f
   #3 [fffffe0000549eb8] do_nmi at ffffffff81079582
   #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4
      [exception RIP: native_queued_spin_lock_slowpath+1291]
      RIP: ffffffff8127e72b  RSP: ffff88aa841ef778  RFLAGS: 00000046
      RAX: 0000000000000000  RBX: ffff88b01f849700  RCX: ffffffff8127e47e
      RDX: 0000000000000000  RSI: 0000000000000004  RDI: ffffffff83857ec0
      RBP: ffff88afe3e4efc8   R8: ffffed15fc7c9dfa   R9: ffffed15fc7c9dfa
      R10: 0000000000000001  R11: ffffed15fc7c9df9  R12: 0000000000740000
      R13: ffff88b01f849708  R14: 0000000000000003  R15: ffffed1603f092e1
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
  -- <NMI exception stack> --
   #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b
   #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4
   #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363
   #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma]
   #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma]
   #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma]
   #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma]
   #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb
   #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6
   #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278
   #15 [ffff88aa841efb88] device_del at ffffffff82179d23
   #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice]
   #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice]
   #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a
   #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff
   #20 [ffff88aa841eff10] kthread at ffffffff811d87a0
   #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f

Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions")
Link: https://lore.kernel.org/r/[email protected]
Suggested-by: "Ismail, Mustafa" <[email protected]>
Signed-off-by: Shifeng Li <[email protected]>
Reviewed-by: Shiraz Saleem <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>