Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NXP LPSPI: Convert to native zephyr driver, remove MCUX shim #85010

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

decsny
Copy link
Member

@decsny decsny commented Feb 1, 2025

Convert last pieces of the LPSPI driver completely to a native Zephyr driver since mostly it already is.

Main commit message:

Remove the shim to the SDK driver, and only use direct register writes in the zephyr driver instead. This will save flash space.

Remove mcux branding from all code.

While doing this part of the rewrite, I discovered also some issues that I went ahead and fixed:
- The driver was hardcoded to always only use PCS 0 no matter what the user provided spi_config set slave to.
- Active high CS configuration was not respected.

Fixes an issue where SDK driver performed an althgorithm on 2,048 possible clock configurations before every transfer.
This was replaced with a binary search in the native version.
And make it so that this configuration does not need to happen every transfer, if the same spi_cfg is used, to improve performance.
Various optimizations also due to the fact that the use case is more clear than the SDK was written for, so we can be more straightforward and simpler.

Also, I have calculated the difference in the code size of the LPSPI driver in 3.7.0 (including HAL + shim driver) vs the version in this PR after all the recent changes, and have the following results:

Code Size Difference

Zephyr 3.7.0 LPSPI:
- 2.793 KB from HAL driver itself
- 1.488 KB from Zephyr shim driver
- 188 bytes from common Zephyr SPI code
= Total 4.465 KB (4,572 bytes)

Latest Changes + This PR (Hopefully Zephyr 4.1.0 LPSPI):
- 1.352 KB from CPU-based LPSPI specific driver (spi_nxp_lpspi.c)
- 1.230 KB from common LPSPI driver code (spi_nxp_lpspi_common.c)
- 190 bytes from common Zephyr SPI code
= Total 2.768 KB (2,834 bytes)

Tested on MCXN, pending testing on RT and MCXW

Depends on #84931
Depends on zephyrproject-rtos/hal_nxp#504
Depends on hardware testing for MCXW, and RT

@zephyrbot
Copy link
Collaborator

zephyrbot commented Feb 1, 2025

The following west manifest projects have changed revision in this Pull Request:

Name Old Revision New Revision Diff
hal_nxp zephyrproject-rtos/hal_nxp@49ff7e3 (master) zephyrproject-rtos/hal_nxp#504 zephyrproject-rtos/hal_nxp#504/files

DNM label due to: 1 project with PR revision

Note: This message is automatically posted and updated by the Manifest GitHub Action.

@zephyrbot zephyrbot added manifest manifest-hal_nxp DNM This PR should not be merged (Do Not Merge) labels Feb 1, 2025
@decsny decsny force-pushed the feature/lpspi_enhancements_4 branch from b25696f to eb1c476 Compare February 1, 2025 05:50
@decsny decsny force-pushed the feature/lpspi_enhancements_4 branch from eb1c476 to 52f48c1 Compare February 1, 2025 06:00
@butok
Copy link
Contributor

butok commented Feb 3, 2025

Is it possible to add improvements to the original MCUX driver and avoid adding and maintaining new Zephyr specific drivers?

@decsny
Copy link
Member Author

decsny commented Feb 3, 2025

Is it possible to add improvements to the original MCUX driver and avoid adding and maintaining new Zephyr specific drivers?

It was internally requested for this driver to enhance the API for this SDK driver 5 years ago to be compatible with zephyr and rejected by SDK team due to having to rewrite the entire fsl_lpspi to achieve it basically. So why write a new fsl_lpspi specifically designed for the purpose of working with zephyr and then shim to it, when you can just write it properly in zephyr?

In my experience, the shim drivers cause the most maintainance overhead by like 10 fold, specifically the ones with heavy abstraction. Having zephyr native is pretty much always simpler to maintain, that's the main advantage. Not to mention removing coupling and dependency is always a positive thing in software engineering practice. This coupling can cause configuration management problems and "spaghetti" code due to needing to design an implementation around an existing API and having to pull from upstream projects.

Another difference between zephyr driver and shim driver, is when a user finds a problem, for the shim driver, we get bug reports. Because the HAL is much harder to deal with for them. For the zephyr native, they just make a PR.

I also think that as shown by the numbers in my original post, zephyr native always reduces code size by a mega amount, in this case we have more than halved throughout the last 2 months by this conversion. That's normal whenever this happens. The new zephyr driver is smaller than the Hal was alone. Partly maybe due to the fact that it can be tailored to the zephyr specific driver implementation rather than as general as possible like the HAL needs to achieve.

So it is generally desirable not to have shim drivers. The choice to use shim drivers was an early zephyr project choice when there was only a small amount of contributors and they needed to move fast to make the project viable with quick hardware support. Now at this point it's just not desirable to have these dependencies for those reasons I said above.

@decsny decsny added this to the v4.1.0 milestone Feb 3, 2025
@Raymond0225
Copy link
Collaborator

I spotted that a lot of modifications are from "CONFIG_SPI_MCUX_LPSPI" to "CONFIG_SPI_NXP_LPSPI".
why we need to rename it?

@ehughes
Copy link

ehughes commented Feb 3, 2025

@decsny FYI. I am tracking an obscure bug with the shim based driver for LPSPI on the MCXA156. Even though the LPSPI driver passes the test, I was able to get it to fail in interesting ways. I am tagging you as I saw you submitted the updated to the FRDM-MCXA156. I can try this driver.

Issue : Using the LPSPI driver on MCXA156 with the generic ""jedec,spi-nor" and a external flash memory only works with optimizations disabled.

Setup:

I have a FRDM-MCXA156 with a W25q64 SPI Flash memory wired to LPSPI1.

I am using the "SPI_FLASH" sample as a basis for the my test.

Note: As a controller, I also tested the same Flash chip wired to a FRDM-MCX947 (which also uses LPSPI) and a LPC55S69Xpresso (which has flexcomm). The spi_flash runs OK on both of them. I narrowed it down to some interaction w/ the MCXA156 / LPSPI

I created an overlay :

image

The driver works through boot and most of the test. It can read the the JEDEC flash ID, issue erase commands, etc. However, I noticed that writes would fail. After some debugging I observed this on my logic analyzer:
image

The specific transaction for the flash write had the chip select active >200mS, most of that idle. The transaction after that was a command to shut down the flash as the SPI transaction for the write returned a timeout from the driver.

image

image

I turned on logging and saw this message:

image

The flash write is successfull if all optimizations are disabled.
image

Looking through the LP SPI Flash driver, I found this:

image

If you follow the spi_transcieve_dt function:

image

I noticed when optimizations are on, This if statement never evaluates to true

image

With optimizations off, it will evaluate to true for the write transcations.

Even though this appears to be an issue with the SPI NOR driver for MCXA156. I could wire the same chip to other boards (MCXN947 and LPC55S69) and it worked with optimizations turned on.

This does appear to be timing related. If I turn on all the SPI Dbg Messages, the sample will work.

I noticed that MCXN driver isntance had a fifo size of 8 while MCXA had a size of 4. If I reduce this size, the MCXA156 version works

image

Values of 1,2,3 yield a successful test result with optimizations enabled. A value of 4 will result in the timeout error.

I am going to modify my west manifest to point to your new branch/driver and see if makes a difference.

@decsny
Copy link
Member Author

decsny commented Feb 3, 2025

I spotted that a lot of modifications are from "CONFIG_SPI_MCUX_LPSPI" to "CONFIG_SPI_NXP_LPSPI". why we need to rename it?

Anything in zephyr named mcux means it is a shim to the mcux ecosystem, and so this is no longer a shim we remove mcux .

@decsny
Copy link
Member Author

decsny commented Feb 3, 2025

@ehughes I also found issues with using LPSPI with the SPI nor driver, because I was trying to enable the QSPI flash on from mcxw71 board. So far the new driver doesn't have any difference from the shim version as far as trying to communicate to the flash, so I doubt if you'll find it better. But thanks for the information because I'm going to get back look into this too eventually. Can you create a GitHub issue with the information you have about it?

@ehughes
Copy link

ehughes commented Feb 3, 2025

@decsny Updated my manifest to point to nxp-upstream, commit 52f48c1

The spi_loopback test has a bus fault (as does the spi_flash sample) for the FRDM-MCXA156 board. I used Segger Ozone to trace:

image

image

@ehughes
Copy link

ehughes commented Feb 3, 2025

@ehughes I also found issues with using LPSPI with the SPI nor driver, because I was trying to enable the QSPI flash on from mcxw71 board, so far the new driver doesn't have any difference from the shim version as far as trying bti communicate to the flash, so I doubt if you'll find it better but thanks for the information because I'm going to get back look into this too eventually. Can you create a GitHub issue with the information you have about it?

Yes, I'll create a new issue with what I posted.

uint8_t best_scaler = 0, scaler = 0;
uint32_t diff, min_diff = 0xFFFFFFFF;

while (scaler < 256 && min_diff != 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scaler is uint8_t, it is definitely smaller than 256.

uint32_t real_freq = 0;

/* maximum freq won't get better than what we got with previous prescaler */
if (clock_freq / (TWO_EXP(prescaler) * 2) < best_freq) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about to use a right shift operator? clock_freq >> prescaler


while (div > 0) {
div = low + (high - low) / 2;
real_freq = (clock_freq / (TWO_EXP(prescaler) * (div + 2)));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(clock_freq >> prescaler) /(div + 2) is better?

Remove the shim to the SDK driver, and only use direct register writes
in the zephyr driver instead. This will save flash space.

Remove mcux branding from all code.

While doing this part of the rewrite, I discovered also some issues that
I went ahead and fixed:
- The driver was hardcoded to always only use PCS 0 no matter
  what the user provided spi_config set slave to.
- Active high CS configuration was not respected.

Also fixes an issue where SDK driver performed an althgorithm on 2,048
possible clock configurations before every transfer. This was replaced
with a binary search in the native version. And make it so that this
configuration does not need to happen every transfer, if the same
spi_cfg is used, to improve performance. Various optimizations also
due to the fact that the use case is more clear than the SDK was
written for, so we can be more straightforward and simpler.

Signed-off-by: Declan Snyder <[email protected]>
@decsny decsny force-pushed the feature/lpspi_enhancements_4 branch from 52f48c1 to 75e9544 Compare February 4, 2025 13:48
@ehughes
Copy link

ehughes commented Feb 4, 2025

@ehughes I also found issues with using LPSPI with the SPI nor driver, because I was trying to enable the QSPI flash on from mcxw71 board. So far the new driver doesn't have any difference from the shim version as far as trying to communicate to the flash, so I doubt if you'll find it better. But thanks for the information because I'm going to get back look into this too eventually. Can you create a GitHub issue with the information you have about it?

@decsny FYI:

#85081

I did test w/ the W71 and was able to get the onboard SPI chip to work with te spi_flash sample. The onboard flash is It is similar to what I was testing with. (macronix vs winbond).

It did not have the timeout issue that was dependent on the compiler optimization setting or the tx/rx-fifo size. It appears to be the similar to the MCX N instances (which also worked).

However, I mention in that issue that default transfer-delay is problematic. It should probably default to a value that yeilds 50% SCLK duty cycle as it is a reasonable default for most SPI peripherals.

@fabiobaltieri fabiobaltieri added DNM (manifest) This PR should not be merged (controlled by action-manifest) and removed DNM This PR should not be merged (Do Not Merge) labels Feb 4, 2025
@decsny
Copy link
Member Author

decsny commented Feb 4, 2025

However, I mention in that issue that default transfer-delay is problematic. It should probably default to a value that yeilds 50% SCLK duty cycle as it is a reasonable default for most SPI peripherals.

I definitely agree with this, I was just talking to somebody about these properties on a separate issue (#69658 (comment)) and was thinking the same thing as you actually. Either to make a default that is not the minimum or to make it a required property. I am leaning towards the latter for the theoretical principle (these delays need to be known and decided by board designer). But in a practical sense maybe leaning toward the former, because for the upstream EVK enablement, maybe we don't know how the SPI will be used on many of the boards and not sure what a sensible default is anyways, so would be easier to just update the binding default in one place.

@decsny
Copy link
Member Author

decsny commented Feb 4, 2025

Rebased the PR since dependent one merged, will address more comments soon, still doing testing

@ehughes
Copy link

ehughes commented Feb 4, 2025

However, I mention in that issue that default transfer-delay is problematic. It should probably default to a value that yeilds 50% SCLK duty cycle as it is a reasonable default for most SPI peripherals.

I definitely agree with this, I was just talking to somebody about these properties on a separate issue (#69658 (comment)) and was thinking the same thing as you actually. Either to make a default that is not the minimum or to make it a required property. I am leaning towards the latter for the theoretical principle (these delays need to be known and decided by board designer). But in a practical sense maybe leaning toward the former, because for the upstream EVK enablement, maybe we don't know how the SPI will be used on many of the boards and not sure what a sensible default is anyways, so would be easier to just update the binding default in one place.

This is a feature particular to the NXP LPSPI IP. From a generic SPI /Driver / Interface point of view, I think the expectation from a Zephyr user will that a stream of contiguous bytes should have 50% as a API default. In my experience I have never need the the timing as showing in the picture (a gap between bytes that is less than the 1/2 the period of the SPI frequency). If there are special adjustments, always has been to make the gap between bytes wider for devices that need more time in between bytes.

I think you will generally always pass a loopback test regardless, but the signals are essentially synchronous in the test case. However, I think the real world implications here bad (i.e. it was broken for a very simple SPI flash device on the FRDM-W71)

From a board level timing / slack / setup/hold POV, analysis is always done assuming 50%. Then you apply allowed tolerances form specific devices. Many devices don't really care as long as setup/hold isn't violated, but the current approach is the least compatible with anything that could be in the wild.

Feel that is OK to be optional as someone who wants this feature would be looking at NXP specific docs (vs generic SPI API). It just needs to have a default that yields timings that are generic.

The bare metal MCUX driver example start use a 50% duty as a default.

image

@decsny
Copy link
Member Author

decsny commented Feb 4, 2025

However, I mention in that issue that default transfer-delay is problematic. It should probably default to a value that yeilds 50% SCLK duty cycle as it is a reasonable default for most SPI peripherals.

I definitely agree with this, I was just talking to somebody about these properties on a separate issue (#69658 (comment)) and was thinking the same thing as you actually. Either to make a default that is not the minimum or to make it a required property. I am leaning towards the latter for the theoretical principle (these delays need to be known and decided by board designer). But in a practical sense maybe leaning toward the former, because for the upstream EVK enablement, maybe we don't know how the SPI will be used on many of the boards and not sure what a sensible default is anyways, so would be easier to just update the binding default in one place.

This is a feature particular to the NXP LPSPI IP. From a generic SPI /Driver / Interface point of view, I think the expectation from a Zephyr user will that a stream of contiguous bytes should have 50% as a API default. In my experience I have never need the the timing as showing in the picture (a gap between bytes that is less than the 1/2 the period of the SPI frequency). If there are special adjustments, always has been to make the gap between bytes wider for devices that need more time in between bytes.

I think you will generally always pass a loopback test regardless, but the signals are essentially synchronous in the test case. However, I think the real world implications here bad (i.e. it was broken for a very simple SPI flash device on the FRDM-W71)

From a board level timing / slack / setup/hold POV, analysis is always done assuming 50%. Then you apply allowed tolerances form specific devices. Many devices don't really care as long as setup/hold isn't violated, but the current approach is the least compatible with anything that could be in the wild.

Feel that is OK to be optional as someone who wants this feature would be looking at NXP specific docs (vs generic SPI API). It just needs to have a default that yields timings that are generic.

The bare metal MCUX driver example start use a 50% duty as a default.

image

So it sounds like what you are suggesting is actually to programmatically decide the default instead of from DT. This I didn't consider and seems like a good idea to me. So what I could do as part of this PR is to remove the default value from DT, keep the property unrequired, and then if it is not present then to programmatically determine what is 50%, otherwise use DT value. Does this sound like a reasonable approach to you?

@ehughes
Copy link

ehughes commented Feb 4, 2025

However, I mention in that issue that default transfer-delay is problematic. It should probably default to a value that yeilds 50% SCLK duty cycle as it is a reasonable default for most SPI peripherals.

I definitely agree with this, I was just talking to somebody about these properties on a separate issue (#69658 (comment)) and was thinking the same thing as you actually. Either to make a default that is not the minimum or to make it a required property. I am leaning towards the latter for the theoretical principle (these delays need to be known and decided by board designer). But in a practical sense maybe leaning toward the former, because for the upstream EVK enablement, maybe we don't know how the SPI will be used on many of the boards and not sure what a sensible default is anyways, so would be easier to just update the binding default in one place.

This is a feature particular to the NXP LPSPI IP. From a generic SPI /Driver / Interface point of view, I think the expectation from a Zephyr user will that a stream of contiguous bytes should have 50% as a API default. In my experience I have never need the the timing as showing in the picture (a gap between bytes that is less than the 1/2 the period of the SPI frequency). If there are special adjustments, always has been to make the gap between bytes wider for devices that need more time in between bytes.
I think you will generally always pass a loopback test regardless, but the signals are essentially synchronous in the test case. However, I think the real world implications here bad (i.e. it was broken for a very simple SPI flash device on the FRDM-W71)
From a board level timing / slack / setup/hold POV, analysis is always done assuming 50%. Then you apply allowed tolerances form specific devices. Many devices don't really care as long as setup/hold isn't violated, but the current approach is the least compatible with anything that could be in the wild.
Feel that is OK to be optional as someone who wants this feature would be looking at NXP specific docs (vs generic SPI API). It just needs to have a default that yields timings that are generic.
The bare metal MCUX driver example start use a 50% duty as a default.
image

So it sounds like what you are suggesting is actually to programmatically decide the default instead of from DT. This I didn't consider and seems like a good idea to me. So what I could do as part of this PR is to remove the default value from DT, keep the property unrequired, and then if it is not present then to programmatically determine what is 50%, otherwise use DT value. Does this sound like a reasonable approach to you?

Yep, I think copying what the bare metal sample code does is a good idea as it has been test a lot and yields good values. If the there is a value in the DT / Overlay, then use that, else use the computation.

@decsny decsny modified the milestones: v4.1.0, v4.2.0 Feb 11, 2025
@decsny
Copy link
Member Author

decsny commented Feb 18, 2025

This PR was kind of poorly made, because there ended up being way too much in one commit, I'm gonna put DNM and spend the time to refactor the commit history

@decsny decsny added the DNM This PR should not be merged (Do Not Merge) label Feb 18, 2025
@Raymond0225
Copy link
Collaborator

I think this file should also be updated:
\zephyr\tests\drivers\spi\spi_loopback\boards\mimxrt700_evk_mimxrt798s_cm33_cpu0.conf

@decsny decsny marked this pull request as draft February 18, 2025 17:08
@ehughes
Copy link

ehughes commented Feb 19, 2025

@decsny Let me know when it is good to retest this.

@decsny
Copy link
Member Author

decsny commented Feb 19, 2025

@decsny Let me know when it is good to retest this.

Okay, it's delayed for a while until after release , trying to stabilize things as they are for LPSPI first

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Clock Control area: SPI SPI bus block: HW Test Testing on hardware required before merging DNM (manifest) This PR should not be merged (controlled by action-manifest) DNM This PR should not be merged (Do Not Merge) manifest manifest-hal_nxp platform: NXP Drivers NXP Semiconductors, drivers platform: NXP NXP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants