MCUBoot swap using scratch bricks after power cut on STM32H5xx using external flash NOR for update storage #2217

Open
MBorto opened this issue Feb 26, 2025 · 10 comments

Comments

@MBorto

MBorto commented Feb 26, 2025

I am developing a firmware update system for a project built around an STM32H5xx microcontroller running Zephyr, with an external NOR flash connected over SPI.
The microcontroller's internal flash is 2 MB, organized as two banks of 128 sectors, each sector 8 KB in size. The external NOR flash has a capacity of 4 MB, divided into 8192 pages of 512 bytes each.
Because the two memories have different geometries, I chose the swap-using-scratch mechanism.
An area of non-initialized SRAM is used to share measured boot data between the bootloader and the main application.

Here is the current DTS configuration:

&flash0 {
	partitions {
		compatible = "fixed-partitions";
		#address-cells = <1>;
		#size-cells = <1>;

		//SWAP USING SCRATCH EXTERNAL NOR
		boot_partition: partition@0 {
			label = "mcuboot";
			reg = <0x00000000 DT_SIZE_K(128)>;
		};
		slot0_partition: partition@20000 {
			label = "image-0";
			reg = <0x00020000 DT_SIZE_K(1920)>;
		};
	};
};

&spi1 {
	pinctrl-0 = <&spi1_sck_pa5 &spi1_miso_pa6 &spi1_mosi_pd7>;
	pinctrl-names = "default";
	cs-gpios = <&gpioa 4 GPIO_ACTIVE_LOW>;
	status = "okay";

	nor0: at45db321e@0 {
		compatible = "atmel,at45";
		reg = <0>;
		jedec-id = [1F 27 01];
		size = <0x2000000>; // in bits
		sector-size = <DT_SIZE_K(64)>; // in bytes
		block-size = <DT_SIZE_K(4)>; // in bytes
		page-size = <512>; // in bytes
		spi-max-frequency = <DT_FREQ_M(16)>;
		status = "okay";

		partitions {
			compatible = "fixed-partitions";
			#address-cells = <1>;
			#size-cells = <1>;
	
			slot1_partition: partition@10000 {
				label = "image-1";
				reg = <0x00010000 DT_SIZE_K(1920)>;
			};
	
			scratch_partition: partition@1f0000 {
				label = "scratch";
				reg = <0x001f0000 DT_SIZE_K(64)>;
			};
		};
	};
};
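
As a quick sanity check on the partition map above, the offsets and sizes can be verified with a few lines of C. This is only a sketch: `struct part`, `KIB`, `part_end`, `parts_disjoint` and `check_layout` are hypothetical names introduced for illustration and are not Zephyr or MCUboot APIs. The numbers are copied from the DTS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KIB(n) ((uint32_t)(n) * 1024u)

/* Hypothetical descriptor for illustration only (not a Zephyr type). */
struct part {
    uint32_t off;   /* offset from the start of the device, in bytes */
    uint32_t size;  /* size in bytes */
};

/* Values copied from the DTS above. */
static const struct part slot0   = { 0x00020000u, KIB(1920) }; /* internal flash */
static const struct part slot1   = { 0x00010000u, KIB(1920) }; /* external NOR */
static const struct part scratch = { 0x001f0000u, KIB(64) };   /* external NOR */

/* The DTS "size" property of the AT45 node is expressed in bits. */
#define NOR_SIZE_BYTES (0x2000000u / 8u)

/* First byte offset past the end of a partition. */
static uint32_t part_end(const struct part *p)
{
    return p->off + p->size;
}

/* True when two partitions on the same device do not overlap. */
static bool parts_disjoint(const struct part *a, const struct part *b)
{
    return part_end(a) <= b->off || part_end(b) <= a->off;
}

/* Asserts the properties the swap relies on for this layout. */
static void check_layout(void)
{
    assert(slot0.size == slot1.size);             /* both slots are 1920 KiB */
    assert(parts_disjoint(&slot1, &scratch));     /* no overlap on the NOR */
    assert(part_end(&scratch) <= NOR_SIZE_BYTES); /* fits in the 4 MiB part */
}
```

With these values, slot 1 ends exactly where the scratch partition starts (0x1f0000), and the scratch partition ends at 0x200000, well inside the 4 MiB device.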

Downloading the firmware image to the external flash works fine.
However, when a power cut occurs during the image swap, MCUBoot encounters a BusFault on the next reboot. This seems to be due to an invalid value (0xFFFFFFFF) in the swap_size field, causing the BusFault in the swap_run(...) function in the swap_scratch.c file.
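
For context, 0xFFFFFFFF is what a flash word reads back as after an erase and before it is rewritten. A minimal sketch of a defensive check is shown below, assuming the bad value comes from a field that was erased but never rewritten (this is an assumption, not something established here). `swap_size_is_plausible` and `ERASED_WORD` are hypothetical names and are not part of MCUboot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value an erased 32-bit flash word reads back as. */
#define ERASED_WORD 0xFFFFFFFFu

/*
 * Hypothetical helper, not part of MCUboot: reject a swap size that is
 * still erased, zero, or larger than the slot it is supposed to describe.
 */
static bool swap_size_is_plausible(uint32_t swap_size, uint32_t slot_size)
{
    assert(slot_size != 0u); /* precondition: the slot has a real size */

    if (swap_size == ERASED_WORD) {
        return false; /* field was never written */
    }
    if (swap_size == 0u) {
        return false; /* nothing to swap */
    }
    return swap_size <= slot_size;
}
```

A check of this kind would let the bootloader fail in a controlled way instead of using the value as a loop bound.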

Here are the MCUBoot configuration variables I’m using with Zephyr sysbuild:

# Support for action hooks and led indication on board
CONFIG_MCUBOOT_ACTION_HOOKS=y

# Enable support for serial recovery
CONFIG_MCUBOOT_SERIAL=y
CONFIG_BOOT_SERIAL_UART=y
CONFIG_BOOT_SERIAL_BOOT_MODE=y
CONFIG_BOOT_SERIAL_NO_APPLICATION=y

# We do not have a GPIO to trigger the recovery
CONFIG_BOOT_SERIAL_ENTRANCE_GPIO=n
CONFIG_BOOT_SERIAL_IMG_GRP_IMAGE_STATE=y

# Info logs slow down the update
#CONFIG_MCUBOOT_LOG_LEVEL_WRN=y
CONFIG_MCUBOOT_LOG_LEVEL_DBG=y

# Enable the user to update slot partition 1
CONFIG_MCUBOOT_SERIAL_DIRECT_IMAGE_UPLOAD=y

# Update algorithm
CONFIG_BOOT_SWAP_USING_SCRATCH=y

# Turn on the led1 when in DFU
CONFIG_MCUBOOT_INDICATION_LED=y

# Enable communication between MCUBoot bootloader and MA
CONFIG_RETAINED_MEM=y
CONFIG_RETENTION=y
CONFIG_BOOT_SHARE_DATA=y
CONFIG_BOOT_SHARE_DATA_BOOTINFO=y
CONFIG_BOOT_SHARE_BACKEND_RETENTION=y

# Enable Measured Boot for sharing running image hash (SHA256)
CONFIG_MEASURED_BOOT=y
CONFIG_MEASURED_BOOT_MAX_CBOR_SIZE=256

# MAX image size is: (512B flash sector * 8192 sectors) = 4096 KB -> External flash NOR
CONFIG_BOOT_MAX_IMG_SECTORS=8192
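
The sector counts behind CONFIG_BOOT_MAX_IMG_SECTORS can be worked out as follows. This sketch assumes that MCUboot sees the external flash with 512-byte sectors (that depends on the page layout the flash driver reports) and the internal flash with 8 KB sectors. `sectors_needed` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Number of sectors needed to hold slot_bytes, rounded up. */
static uint32_t sectors_needed(uint32_t slot_bytes, uint32_t sector_bytes)
{
    assert(sector_bytes != 0u); /* avoid division by zero */
    return (slot_bytes + sector_bytes - 1u) / sector_bytes;
}
```

For the 1920 KB slots in this configuration, that gives 240 sectors on the internal flash and 3840 on the external flash, so under these assumptions the configured value of 8192 leaves headroom.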

Has anyone encountered this behavior before?
Can someone provide support with this issue?

Thank you in advance for your assistance.

@MBorto MBorto changed the title MCUBoot swap using scratch brick after power cut on STM32H5xx using external flash NOR for update storage MCUBoot swap using scratch bricks after power cut on STM32H5xx using external flash NOR for update storage Feb 26, 2025
@nordicjm
Collaborator

I assume your part has ECC on its flash memory, which results in a fault if you read from memory that was only partially written and is therefore corrupt.

@MBorto
Author

MBorto commented Feb 26, 2025

Hi, and thank you for your prompt response. Based on the datasheet, the external flash memory I’m using (AT45DB321E) doesn't appear to have an ECC mechanism. Could you further clarify the concept you described?

@nordicjm
Collaborator

> Hi, and thank you for your prompt response. Based on the datasheet, the external flash memory I’m using (AT45DB321E) doesn't appear to have an ECC mechanism. Could you further clarify the concept you described?

Your internal flash does.

@MBorto
Author

MBorto commented Feb 26, 2025

The internal uC flash has ECC protection, as you correctly said.
When swap_size has the value 0xFFFFFFFF, I see that the magic number at the end of the image is present and correct. From what I have seen while debugging the bootloader, it seems to locate swap_size starting from this magic value.
So I suppose that if the magic value is present, the flash is not corrupted.

@nordicjm
Collaborator

> The internal uC flash has ECC protection, as you correctly said. When swap_size has the value 0xFFFFFFFF, I see that the magic number at the end of the image is present and correct. From what I have seen while debugging the bootloader, it seems to locate swap_size starting from this magic value. So I suppose that if the magic value is present, the flash is not corrupted.

Try reading all flash pages in a loop and see what happens. I would guess you will hit the same fault on one of them; that page would be the one that was partially written during the move/swap.
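
A page-by-page scan of this kind could look roughly like the sketch below. `read_fn`, `first_unreadable_page` and `fake_read` are hypothetical names; on target, the callback would wrap a real flash read such as Zephyr's `flash_read()`. Note that on hardware with ECC, a corrupt page may raise an exception instead of returning an error code, in which case the last offset logged before the fault identifies the page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read callback: returns 0 on success, non-zero on error. */
typedef int (*read_fn)(uint32_t off, void *dst, size_t len);

/*
 * Read len bytes starting at start, one page at a time. Returns the offset
 * of the first page whose read fails, or -1 if every page was readable.
 */
static int64_t first_unreadable_page(read_fn rd, uint32_t start,
                                     uint32_t len, uint32_t page_size)
{
    uint8_t buf[64];

    assert(rd != NULL);
    assert(page_size != 0u);

    for (uint32_t page = 0; page < len / page_size; page++) {
        uint32_t base = start + page * page_size;
        uint32_t done = 0;

        /* Read the page in chunks that fit the small stack buffer. */
        while (done < page_size) {
            uint32_t chunk = page_size - done;

            if (chunk > sizeof(buf)) {
                chunk = (uint32_t)sizeof(buf);
            }
            if (rd(base + done, buf, chunk) != 0) {
                return (int64_t)base;
            }
            done += chunk;
        }
    }
    return -1;
}

/* Stand-in reader for host testing: offsets 0x4000..0x5fff fail. */
static int fake_read(uint32_t off, void *dst, size_t len)
{
    (void)dst;
    (void)len;
    return (off >= 0x4000u && off < 0x6000u) ? -5 : 0;
}
```

Called as `first_unreadable_page(fake_read, 0, 0x10000, 0x2000)`, the scan stops at the simulated bad page at offset 0x4000.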

@taltenbach
Contributor

taltenbach commented Feb 28, 2025

@MBorto On an H7 MCU, this might have been an ECC-related issue, as @nordicjm said. Unfortunately, on the H5, according to RM0481 §7.9.10, an ECC error generates an NMI, not a bus fault as on the H7:

> When two ECC errors are detected during a read, the flash interface sets the double error detection flag ECCD in FLASH_ECCDETR register. [...] When the ECCD flag is raised, an NMI is generated.

When you say:

> This seems to be due to an invalid value (0xFFFFFFFF) in the swap_size field, causing the BusFault in the swap_run(...) function in the swap_scratch.c file.

Do you mean you were able to confirm that after these lines copy_size == 0xFFFFFFFF?

rc = boot_read_swap_size(fap, &bs->swap_size);
assert(rc == 0);
copy_size = bs->swap_size;

Also, FYI I recently fixed two bugs with the swap-scratch that apply to your configuration:

I doubt they are related to the issue you described, but I guess you will need those two fixes anyway. If the issue is easily reproducible, you might want to give it a try with at least bbd0ee1, just to be sure. The latter MR won't have any effect unless you have a very large image in one of the slots.

@MBorto
Author

MBorto commented Feb 28, 2025

Thanks @taltenbach for the feedback!

You are right that an NMI is generated on an ECC error.

I can confirm that after line 1740, copy_size is assigned the value 0xFFFFFFFF.

The BusFault is generated afterwards, when the swap_run function is called, precisely when calling boot_img_sector_size(...) in the first if condition:

(screenshot not reproduced)
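
One plausible reading of this symptom (an assumption, not confirmed in this thread) is that a loop accumulating sector sizes until they cover copy_size can never reach 0xFFFFFFFF, so its sector index keeps growing past the end of the sector table. The following is a simplified model of that behavior with an explicit bounds check added. It is not the actual MCUboot code, and `sectors_to_cover` and `demo_sectors` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Demo table: four 8 KB sectors, as on the internal flash. */
static const uint32_t demo_sectors[4] = { 8192u, 8192u, 8192u, 8192u };

/*
 * Simplified model: count how many sectors are needed to cover copy_size.
 * Returns the sector count, or -1 if the table is too small to cover it.
 */
static int sectors_to_cover(const uint32_t *sector_sizes, size_t num_sectors,
                            uint32_t copy_size)
{
    uint64_t covered = 0; /* 64-bit so the running sum cannot wrap */
    size_t idx = 0;

    assert(sector_sizes != NULL);

    while (covered < copy_size) {
        if (idx >= num_sectors) {
            /* Without this check the loop would index past the table. */
            return -1;
        }
        covered += sector_sizes[idx];
        idx++;
    }
    return (int)idx;
}
```

With the demo table, a 20000-byte copy needs three sectors, while a copy_size of 0xFFFFFFFF is rejected instead of walking off the end of the array.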

How can I easily integrate your fixes into the Zephyr environment?
Currently I'm fetching MCUboot through the west.yml file:

# Copyright (c) 2021 Nordic Semiconductor ASA
# SPDX-License-Identifier: Apache-2.0

manifest:
  self:
    west-commands: scripts/west-commands.yml

  remotes:
    - name: zephyrproject-rtos
      url-base: https://github.com/zephyrproject-rtos

  projects:
    - name: zephyr
      remote: zephyrproject-rtos
      revision: v3.6.0
      import:
        # By using name-allowlist we can clone only the modules that are
        # strictly needed by the application.
        name-allowlist:
          - cmsis      # required by the ARM port
          - hal_nordic # required by the custom_plank board (Nordic based)
          - hal_stm32  # required by the nucleo_f302r8 board (STM32 based)
          - mcuboot    # required for board bootloader implementation (MCUBoot based)
          - mbedtls    # required as MCUBoot dependencies
          - zcbor      # required as MCUBoot dependencies

I've tried the following approach, but it gives me a compile error:

# Copyright (c) 2021 Nordic Semiconductor ASA
# SPDX-License-Identifier: Apache-2.0

manifest:
  self:
    west-commands: scripts/west-commands.yml

  remotes:
    - name: zephyrproject-rtos
      url-base: https://github.com/zephyrproject-rtos

  projects:
    - name: zephyr
      remote: zephyrproject-rtos
      revision: v3.6.0
      import:
        # By using name-allowlist we can clone only the modules that are
        # strictly needed by the application.
        name-allowlist:
          - cmsis      # required by the ARM port
          - hal_nordic # required by the custom_plank board (Nordic based)
          - hal_stm32  # required by the nucleo_f302r8 board (STM32 based)
          #- mcuboot    # required for board bootloader implementation (MCUBoot based)
          - mbedtls    # required as MCUBoot dependencies
          - zcbor      # required as MCUBoot dependencies

    - name: mcuboot
      revision: bbd0ee1ecce3d7e7be6baf606027710052dd4e13
      url: https://github.com/taltenbach/mcuboot
      path: bootloader/mcuboot

@taltenbach
Contributor

@MBorto Thanks for those details. I tried to reproduce your issue in the simulator. With your configuration, and without at least the first fix I mentioned, I was not even able to perform a simple upgrade except with very small images (< 90 KiB). But I might have done something wrong.

Are you able to do an upgrade if you don't reset the device in the middle of the upgrade? And if you do reset the device, does the issue occur consistently, or is it reproducible only if you reset at a very precise time?

> How I can integrate in easy way your fixes into Zephyr environment?

Tomorrow I will create a branch with my fixes, based on the MCUboot version used by Zephyr 3.6.0 :)

@taltenbach
Contributor

@MBorto
Author

MBorto commented Mar 3, 2025

Thank you @taltenbach for your availability !!
As soon as I have a moment, I'll try to see if it works now.
