Feature: Non-hanging semihosting #1507

ALTracer · 2023-06-07T00:01:48Z

Detailed description

BMP firmware already implements a debugging mode: when building via make PROBE_HOST=native ENABLE_DEBUG=1,
the corresponding newlib/libgloss rdimon (rdimon_nano) is linked, routing many syscalls to an ARM RDI Monitor (Angel SWI).
In the presence of an upstream debug adapter, the developer should issue a monitor semihosting enable command in the relevant gdbserver implementation (OpenOCD, JLinkGDBServer, or even another Blackmagic) to start debugging the BMP firmware itself.

However, on ARMv7-M cores, the initialise_monitor_handles function, called from usb_serial_set_config(), issues a bkpt 0xAB breakpoint. In the absence of an upstream debugger, this escalates to a HardFault immediately. libopencm3 provides a default blocking_handler as a weak symbol for hard_fault_handler, so the probe firmware hangs until reset.

All other semihosting operations use the same 0xBEAB instruction (T16 breakpoint), too. Only Cortex-A/R MPUs and older ARM9 cores can use the swi software interrupt, hence the name AngelSWI. Whether SVCall (svc) is available and codegen-able in supported libc implementations for ARMv6-M/v7-M is a valid discussion topic. Note that libopencm3 requires gcc-6.

The new feature is a proper hard_fault_handler that detects semihosting breakpoints and skips over them when the firmware was built with debugging enabled but no external debugger is attached.
This solves the existing problem of BMP hanging on ENABLE_DEBUG=1 builds.
Additionally, the debug_print/platform_printf traces of BMP become available on the USB CDC-ACM endpoint, as on BMDA-hosted, without attaching an upstream debugger. This allows observing the blackmagic remote protocol part implemented in the probe firmware without halting, which may help in developing BMDA+BMP remote protocol features. If flash space allows, we could even enable all the debug levels for target/gdb/proto/wire in the probe.
This PR solves the problem by adding a couple of small-ish functions in the cold path and replacing one vector. The .text section grows by roughly 200 bytes, and none of this code is needed in release builds.
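The skip-over step described above can be sketched in C roughly as follows. This is a minimal illustration and not the code from this PR: the names exception_frame_s and semihosting_skip_breakpoint are hypothetical, the frame layout assumes the basic eight-word ARMv7-M exception frame without FPU state, and uintptr_t stands in for the 32-bit registers so the sketch can also be exercised on a host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Registers stacked by the core on exception entry (basic frame, no FPU state).
 * uintptr_t is 32 bits wide on the target; it keeps this sketch testable on a host. */
typedef struct exception_frame {
	uintptr_t r0;
	uintptr_t r1;
	uintptr_t r2;
	uintptr_t r3;
	uintptr_t r12;
	uintptr_t lr;
	uintptr_t pc;
	uintptr_t xpsr;
} exception_frame_s;

static_assert(sizeof(exception_frame_s) == 8U * sizeof(uintptr_t), "frame must be eight packed words");

#define THUMB_BKPT_SEMIHOSTING 0xbeabU /* T16 encoding of `bkpt 0xab` */

/* Hypothetical helper: returns true if the faulting instruction was a semihosting breakpoint and was skipped */
bool semihosting_skip_breakpoint(exception_frame_s *const frame)
{
	/* The stacked PC points at the instruction that caused the fault */
	const uint16_t insn = *(const uint16_t *)frame->pc;
	if (insn != THUMB_BKPT_SEMIHOSTING)
		return false; /* A genuine fault: the caller should fall back to blocking */
	frame->r0 = (uintptr_t)-1; /* Report failure to whoever issued the semihosting request */
	frame->pc += 2U;           /* Resume after the 16-bit breakpoint instruction */
	return true;
}
```

On the target, a naked hard_fault_handler would pass the active stack pointer (MSP or PSP, selected from EXC_RETURN) to a C function like this one and keep blocking whenever it returns false.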

Your checklist for this pull request

I've read the Code of Conduct
I've read the guidelines for contributing to this repository
It builds for hardware native (make PROBE_HOST=native)
It builds as BMDA (make PROBE_HOST=hosted)
I've tested it to the best of my ability
My commit messages provide a useful short description of what the commits do

I tested builds with GNU Tools for STM32 10.3-2021 from the STM32CubeIDE 1.10 package, as well as Ubuntu/jammy gcc-10.3-2021 (insufficient newlib optimization). For native I had to drop about 5 targets from the Makefile to fit into ROM space.

I did not want to keep two debuggers on my desk just to tinker with adding target support while working off a blackpill-f411ce.
This PR should stay a draft until it is properly tested on debug-enabled platforms; the closest thing to native hardware I have is a bluepill/stm32f103cbt6. I'd like to move the semihosting and ARM interrupt handling stuff into a separate file, as it does not really belong in usb_serial.c apart from _write calling debug_serial_debug_write(). A far simpler solution would be to decouple serial traces from semihosting.

Actual semihosting, when enabled, should not break. A probe platform MCU from reset used to either quickly reach a hardfault, or break into the already connected upstream debugger.

Code inspiration from McuOnEclipse, uOS++ IIIe, NuttX and maybe SEGGER wiki.

* Provide a naked hard_fault_handler in GCC inline assembly
* Provide a C handler to detect failed semihosting breakpoints
* Spoof the one RDI_SYS_OPEN call in case the Angel is missing
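As an illustration of the last bullet, spoofing could look roughly like the sketch below. It is not the PR's code: the function name, the fake handle value and the choice to spoof only the ":tt" console file are assumptions made for the example. The operation numbers and the three-word SYS_OPEN parameter block (name pointer, mode, name length) follow the Arm semihosting specification.

```c
#include <stdint.h>
#include <string.h>

#define SEMIHOSTING_SYS_OPEN   0x01U
#define SPOOFED_CONSOLE_HANDLE 1 /* Hypothetical non-zero handle handed back to rdimon */

/* Hypothetical helper: picks the r0 return value for a request nobody is servicing.
 * `operation` is the stacked r0, `params` is the parameter block the stacked r1 points to. */
intptr_t semihosting_spoof_request(const uintptr_t operation, const uintptr_t *const params)
{
	if (operation == SEMIHOSTING_SYS_OPEN) {
		const char *const name = (const char *)params[0];
		const size_t name_len = (size_t)params[2];
		/* ":tt" is the special console file that initialise_monitor_handles opens */
		if (name_len == 3U && memcmp(name, ":tt", 3U) == 0)
			return SPOOFED_CONSOLE_HANDLE;
	}
	/* Everything else fails, as it would with no host on the other end */
	return -1;
}
```

The returned value would be written into the stacked r0 before the handler resumes execution after the breakpoint.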

tlyu · 2024-01-19T19:28:12Z

It seems like a cleaner approach is to ensure that no IRQs are enabled at priority 0. A hard fault implies a serious programming error, and making a nontrivial handler for it seems unwise.

I don't currently experience hard faults or lockups with debug output enabled on my native BMP (hardware 2.3), even without my libnosys debug output patch. Which probe platforms were you seeing the hardfault on? On native, only IRQ_PRI_TRACE is 0.

ALTracer · 2024-01-19T20:33:57Z

> I don't currently experience hard faults or lockups with debug output enabled on my native BMP (hardware 2.3), even without my libnosys debug output patch. Which probe platforms were you seeing the hardfault on? On native, only IRQ_PRI_TRACE is 0.

I saw it on blackpillv2, or blackpill-f411ce as it's called now. When I wrote this PR more than half a year ago, I was not familiar with this codebase (the BMF-specific part, not BMDA), and did not understand on first reading how a build with debug output enabled changes the modes of operation. The problem for the blackpill-f4 family of $4 boards is fixed in my #1715; more details are there. TL;DR: set the MON_EN bit (SCS->DEMCR |= MON_EN) on ARMv7-M platforms, and then Thumb breakpoints, including the one used to implement semihosting calls, get caught by the DebugMon exception handler instead of escalating to a HardFault exception. The platform was missing that setup, and the breakage got propagated/uncovered after the meson merge (previously I stashed/unstashed some patches to pull rdimon etc. into the workdir every time).
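For reference, the monitor enable step amounts to a single read-modify-write. The sketch below is an assumption-laden illustration rather than the code from #1715: debug_monitor_enable is a hypothetical name, and the register is passed in as a pointer so the function can be tried on a host. On ARMv7-M, MON_EN is bit 16 of the Debug Exception and Monitor Control Register (DEMCR) at 0xE000EDFC.

```c
#include <stdint.h>

#define DEMCR_ADDRESS 0xe000edfcU   /* Debug Exception and Monitor Control Register */
#define DEMCR_MON_EN  (1U << 16U)   /* Enables the DebugMonitor exception */

/* Hypothetical helper: route debug events such as BKPT to DebugMon instead of HardFault */
void debug_monitor_enable(volatile uint32_t *const demcr)
{
	/* Read-modify-write so that other bits (e.g. TRCENA) stay untouched */
	*demcr |= DEMCR_MON_EN;
}
```

On the target this would be called as debug_monitor_enable((volatile uint32_t *)DEMCR_ADDRESS) early in platform setup. When halting debug is enabled by an attached debugger, a breakpoint halts the core instead, so actual semihosting should keep working.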

Note that a) I have not been able to buy a real native BMP hardware adapter in all this time, due to pricing/payment/shipping complications; b) trying to stuff 20 KiB of debug messages into the already full 120 KiB payload flash requires dropping multiple target drivers on top of whatever is excluded by default in the meson configuration, which is what I have been doing recently when building swlink/stlink variants. These behave normally. A 128/512 KiB F411CE does not require any hoop-jumping, and just fits everything enabled out-of-the-box. It is inferior in other ways, but that's outside this topic.

> A hard fault implies a serious programming error, and making a nontrivial handler for it seems unwise.

There are a few clever applications for HardFault handler, which are definitely not a serious programming error. Some notable ones include trapping access to unmapped external memory as if it's mapped (see https://github.com/yocto-8/yocto-8/blob/main/doc/extmem.md but they use MemManage) and emulating instructions missing from ISA (UsageFault) (see series of blogposts, shorter one describing mechanism https://dmitry.gr/?r=05.Projects&proj=27.%20m0FaultDispatch , longer one describing application https://dmitry.gr/?r=05.Projects&proj=27.%20rePalm )

> It seems like a cleaner approach is to ensure that no IRQs are enabled at priority 0.

As far as BMF is concerned, there are no current problems with interrupt/exception priority selection or inversion. Timer-based Manchester capture driver for SWO (in streaming mode to USB bulk pipe endpoint) has to be the highest urgency for reasons better discussed separately.

This PR is not fully obsolete, either: note that the in-tree F072-IF platform is a Cortex-M0, and cannot vector to DebugMon, which is absent from ARMv6-M (possible future out-of-tree rp2040 ports cannot either). Trapping breakpoint insns, checking for C_DEBUGEN and handing them over to a "fake" debug_monitor_handler_cm0 allows for using the same mechanism without deleting any existing tested/proven working code. I could rework the PR that way (split out breakpoint handling) and keep the semihosting machinery optionally working (compile-time option). Or we could drop Newlib libgloss/rdimon, ask for just libnosys and implement _swiopen / _swiwrite more cheaply, because BMF AFAIK only ever wants to open() the stdout FD and write() logging output into it, with or without _REENT stdio buffering. Other syscalls could fail by default (or an upstream inception BMP can service them now that that functionality is unbroken, restored and refactored to be safer and more correct).
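The libnosys route could be sketched as below, purely as an illustration. debug_write_stub and debug_sink_write are hypothetical names, and the sink is a plain buffer standing in for whatever the probe's debug serial path really is, so that the sketch runs on a host. On the target the function would presumably be named _write so that it takes the place of the libnosys stub.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define STDOUT_FD 1
#define STDERR_FD 2

/* Hypothetical stand-in for the debug serial sink: captures output in a buffer */
static char debug_sink[64];
static size_t debug_sink_used = 0U;

static size_t debug_sink_write(const char *const buf, const size_t len)
{
	/* Copy as much as still fits and report how many bytes were taken */
	const size_t space = sizeof(debug_sink) - debug_sink_used;
	const size_t count = len < space ? len : space;
	memcpy(debug_sink + debug_sink_used, buf, count);
	debug_sink_used += count;
	return count;
}

/* Shaped like newlib's _write(): only stdout and stderr are routed, everything else fails */
int debug_write_stub(const int fd, const char *const buf, const size_t len)
{
	if (fd != STDOUT_FD && fd != STDERR_FD) {
		errno = EBADF;
		return -1;
	}
	return (int)debug_sink_write(buf, len);
}
```

With this shape no breakpoint is ever executed for logging, so the behaviour no longer depends on whether a debugger is attached.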

I have other plans for hard_fault_handler(), including making it blink all the LEDs in Linux kernel panic Caps+Scroll Lock fashion (instead of just locking up without reboots), which plugs below the other HF cases. Or one could imagine SPI "OTG" Flash is "mapped" into memory space and trap accesses to that region into calling SPI read functions -- this provides if not code space extension, then at least message strings storage, when linked that way (at the cost of a ton of interrupts). Using a proper FS is superior but costs flash space.

common/usb_serial: Implemented non-hanging semihosting

c71f3a5

* Provide a naked hard_fault_handler in GCC inline assembly
* Provide a C handler to detect failed semihosting breakpoints
* Spoof the one RDI_SYS_OPEN call in case the Angel is missing

ALTracer mentioned this pull request Jun 13, 2023

Fix: Blackpill-F4 and BMP DFU #1508

Merged
