Prune the compiled code to its minimum expression #639

Draft
wants to merge 32 commits into
base: main

JuanSapriza
Contributor

@JuanSapriza JuanSapriza commented Feb 5, 2025

Objective

The objective of this PR is to minimize the size of the compiled code as much as possible.
Some of the improvements are transparent to the user, such as adding --gc-sections, which simply removes unused sections and can free several kB.

Other changes might be more drastic, and I will set them as optional and strongly discouraged configurations.
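
For reference, the effect of --gc-sections mentioned above can be sketched as follows. This is only an illustration: the function names are hypothetical, and it assumes the sources are also built with -ffunction-sections and -fdata-sections (whether the project already does so is not stated here), so that each function gets its own section the linker can discard.

```c
#include <stdint.h>

/*
 * Assumed build line (GCC + GNU ld):
 *   gcc -Os -ffunction-sections -fdata-sections \
 *       -Wl,--gc-sections -Wl,--print-gc-sections app.c
 * --print-gc-sections makes the linker list every section it dropped.
 */

/* Referenced by the application, so its section is kept. */
uint32_t checksum(const uint8_t *buf, uint32_t len) {
    uint32_t sum = 0;
    for (uint32_t i = 0; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}

/*
 * Hypothetical helper that nothing calls. With -ffunction-sections it is
 * placed in its own .text.unused_scale section, which the linker is free
 * to drop when --gc-sections is passed.
 */
int32_t unused_scale(int32_t x) {
    return (x * 3) / 4;
}
```

Without -ffunction-sections, both functions would typically share one .text section per object file, and the linker could only keep or drop them together.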

Background

Leaving this here as documentation. I am compiling the following code:

int main() { return 0; }

The initial size was 24 kB. By removing the unused sections it goes down to 12 kB.

The remaining 12 kB are mostly due to:

  1. Interrupt handlers
  2. Standard libraries
    Screenshot from 2025-02-05 21-40-48
    Screenshot from 2025-02-05 21-40-38

Tasks

🟩 Remove unused sections
🔲 Do not compile unused interrupt handlers
🔲 Optionally remove standard libraries
🔲 Remove unnecessary calls to standard libraries
🔲 Optionally make data come immediately after text
🟩 Allow having only 1 memory bank
❓ Move the power manager from 0x3000

Important ⚠️

This PR builds on top of #636. It is blocked until that one is merged (otherwise I will have to cherry-pick the changes).

@davidmallasen davidmallasen added the software Software and application label Feb 6, 2025
@JuanSapriza
Contributor Author

@davidmallasen @davideschiavone

With the latest modifications I see that the whole code + data of a small application could very well fit inside 1 memory bank. Yet there is a limitation that requires at least 2.

Why is there such a limitation?

@davideschiavone
Member

davideschiavone commented Feb 6, 2025

> @davidmallasen @davideschiavone
>
> With the latest modifications I see that the whole code + data of a small application could very well fit inside 1 memory bank. Yet there is a limitation that requires at least 2.
>
> Why is there such a limitation?

In principle there is not, it's legacy. If you want, we can remove it. Still, having 2 banks would be more efficient than one bank (Harvard architecture), so maybe two 16 kB memories are better than one single 32 kB one. However, two 16 kB memories together are bigger than one bank of 32 kB, so it's a trade-off. Having only one bank would be very inefficient, though: 2 cycles to fetch an instruction every time the fetch coincides with any other load or DMA transaction, and 3/4 cycles for a lw/sw from the cve2 CPU.

@JuanSapriza
Contributor Author

I agree, but for HEEPidermis, for example, there is a chance that accesses to memory from peripherals are sparse.
I will try to remove this restriction so we can simulate both options (two smaller or one bigger) once we have the full application.

@davideschiavone
Member

> I agree, but for HEEPidermis, for example, there is a chance that accesses to memory from peripherals are sparse. I will try to remove this restriction so we can simulate both options (two smaller or one bigger) once we have the full application.

Having two small ones should already be possible (AFAIK). Having only 1 may be trickier, as mcu-gen has probably been tailored around having at least 2, not just for the HW but also for the generation of the linker scripts. Nothing too hard to fix, but not just a parameter.

@davideschiavone
Member

btw - question @JuanSapriza, why do we have the print functions if we are not using them in your example? Shouldn't they be pruned?

@JuanSapriza
Contributor Author

Tomorrow I will test having two smaller ones. I think I tried it a while ago but don't remember the result. I will also see how to have only one.

Regarding the printfs, it's sadly not that simple. The stdlibs are included in some of the drivers, so we might need to do some heavy pruning.

Today I managed to remove several things, but realized that making the compilation flow so much more complex and sensitive was probably not worth it right now, compared to keeping 4 kB of printf definitions.

@JuanSapriza
Contributor Author

JuanSapriza commented Feb 7, 2025

With 2x16 kB banks everything works fine. For a Hello World:

Total space: 32.0 kB = Continuous: [16, 16] kB + Interleaved: [0] kB
Region  Start   End     Sz(kB)  Usd(kB) Req(kB) Utilz(%)
Code:   0.0     16.0    16.0    9.6     12.3    76.8
Data:   16.0    32.0    16.0    5.5     5.5     34.7

Cont 0  CCCCCCCCC-------  56.2%
Cont 1  ddddd-----------  31.2%

@JuanSapriza
Contributor Author

@davideschiavone is suggesting to try this out: https://github.com/Velko/FsLibc/tree/master
I think it will help a lot in bringing both code and data space down.

@JuanSapriza
Contributor Author

> Having two small ones should already be possible (AFAIK). Having only 1 may be trickier, as mcu-gen has probably been tailored around having at least 2, not just for the HW but also for the generation of the linker scripts. Nothing too hard to fix, but not just a parameter.

Just tried with 1 memory bank. I only needed to change a check in system.py that verified that the number of memory banks was >= 2. Beyond that, everything else worked out of the box, hahaha.

@JuanSapriza
Contributor Author

I went down to the syscalls, handlers and asserts and removed the use of <stdio.h>. This effectively frees 4 kB. Now I am at 8 kB of code + 5.3 kB of data that still need pruning.
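
As an illustration of what reporting from a handler without <stdio.h> could look like, here is a minimal sketch. All names are hypothetical and this is not the code in this PR: the output sink is a RAM buffer standing in for a UART transmit routine, and only a fixed-width hex formatter is provided instead of printf.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sink: a RAM buffer standing in for the UART TX path. */
static char log_buf[64];
static size_t log_len = 0;

static void log_putc(char c) {
    /* Always leave room for the terminating NUL. */
    if (log_len + 1 < sizeof(log_buf)) {
        log_buf[log_len++] = c;
        log_buf[log_len] = '\0';
    }
}

static void log_str(const char *s) {
    while (*s) {
        log_putc(*s++);
    }
}

/* Fixed-width hex output: no division, no format-string parsing. */
static void log_hex32(uint32_t v) {
    static const char digits[] = "0123456789abcdef";
    for (int shift = 28; shift >= 0; shift -= 4) {
        log_putc(digits[(v >> shift) & 0xFu]);
    }
}

/* Hypothetical exception report: "exc 0x" + 8 hex digits + newline. */
const char *report_exception(uint32_t cause) {
    log_len = 0;
    log_buf[0] = '\0';
    log_str("exc 0x");
    log_hex32(cause);
    log_putc('\n');
    return log_buf;
}
```

A formatter like this only depends on a character-output routine, so it does not drag the printf family into the binary.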

Things that i need to check:

🔲 Changing these exception printouts might break everything - we need tests for exceptions!
🔲 Do not include unused files in the compilation. The interrupt handlers are being overridden by their non-weak implementations, even if your application doesn't use that peripheral. This comes from the problem of including all files, e.g. if you comment out the contents of dma.h you will get compilation errors from dma_sdk.h (??!?!?)
🔲 We have several circular dependencies in the drivers
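
The weak / non-weak handler issue in the list above can be sketched as follows. The names are hypothetical, and `__attribute__((weak))` is the GCC/Clang extension assumed here; this is not the actual driver code.

```c
#include <stdint.h>

/* Counts how often the default (weak) handler ran. Illustration only. */
static volatile uint32_t default_handler_hits = 0;

/*
 * Weak default: small, and used only if no other object file provides
 * a strong definition of the same symbol.
 */
__attribute__((weak)) void handler_irq_dma(void) {
    default_handler_hits++;
}

/*
 * A driver source file (hypothetical, e.g. dma.c) would contain the strong
 * version:
 *
 *     void handler_irq_dma(void) { ...full driver logic... }
 *
 * If that file is always compiled and linked, the strong definition wins
 * and its code (plus whatever it calls) stays in the binary, even when the
 * application never touches the peripheral.
 */

/* Stand-in for the interrupt dispatch path. */
uint32_t simulate_dma_irq(void) {
    handler_irq_dma();
    return default_handler_hits;
}
```

Because the handler is referenced from the interrupt dispatch path, --gc-sections cannot discard the strong version. Leaving the driver file out of the build is what lets the small weak default be used instead.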

@JuanSapriza
Contributor Author

JuanSapriza commented Feb 7, 2025

Removed 1.7 kB of program memory by adding a macro that resolves the UART NCO at compile time instead of before every printf.

This does not stop you from changing the baud rate at runtime, but you then need to recompute the NCO.
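
A minimal sketch of the compile-time NCO idea, not the macro added in this PR: the macro, constant, and function names are hypothetical, and the formula assumes an OpenTitan-style UART where NCO = 2^20 * baud / f_clk.

```c
#include <stdint.h>

/* Assumed OpenTitan-style relation: NCO = (baud << 20) / clk_hz. */
#define UART_NCO(baud, clk_hz) \
    ((uint32_t)((((uint64_t)(baud)) << 20) / (uint64_t)(clk_hz)))

/* Hypothetical build-time settings. */
#define SYS_CLK_HZ    16000000u
#define UART_BAUDRATE 1000000u

/*
 * Both arguments are integer constants, so this initializer is a constant
 * expression that the compiler folds: no division is executed at runtime.
 */
static const uint32_t uart_default_nco = UART_NCO(UART_BAUDRATE, SYS_CLK_HZ);

uint32_t uart_default_nco_value(void) {
    return uart_default_nco;
}

/*
 * Runtime path for changing the baud rate later. Here the 64-bit division
 * really is executed, so it is only worth calling when needed.
 */
uint32_t uart_nco_for(uint32_t baud, uint32_t clk_hz) {
    return UART_NCO(baud, clk_hz);
}
```

With the values above the constant evaluates to 65536. One plausible reason for the saving is that the runtime path needs 64-bit division support routines on a 32-bit core, which the constant path avoids, but that was not measured here.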

Labels
software Software and application
3 participants