Linking in libliteeth causes binary to fail to execute #2157

Open
kscz opened this issue Jan 13, 2025 · 5 comments


kscz commented Jan 13, 2025

I have the following code base: https://github.com/kscz/wyrm/tree/debug_bridge

Generating the SoC and then loading the software works as long as I do not call any function in libliteeth (see the instructions in the readme). As soon as I link in any function from libliteeth, the binary no longer appears to run at all.

I have the same issue regardless of what code I use. For example, I generate the SoC by invoking the following (note this expects this PR):

../litex-boards/litex_boards/targets/colorlight_5a_75x.py --revision "8.2" --sys-clk-freq 50e6 --with-ethernet --cpu-type vexriscv --with-uart --build

Loading the code from demo then works as expected. If I add a call to eth_init() in helloc.c, the whole program produces no output at all.

I am unclear on what's happening. The linker script feels like it must be related to the issue, but using the linker script from the BIOS, modified to place code in main_ram, seems to have the same issue.

Edit: some extra context! This issue reproduces in the simulator and with both vexiiriscv and vexriscv. It reproduces whether the executable is in ROM or loaded by the BIOS into RAM. I have been unable to find a configuration where I can attach a debugger and see where the code is getting stuck. I followed the directions on the wiki and tried to get a simulator session of a vexiiriscv CPU with JTAG, but I could not get OpenOCD to connect.

Contributor

FlyGoat commented Jan 14, 2025

JTAG on vexriscv_smp is indeed broken. @Dolu1990, do you have any clue?

litex_sim --cpu-type=vexriscv_smp --with-rvc --with-privileged-debug --hardware-breakpoints 4 --jtag-tap --with-jtagremote

This resulted in:

Info : Initializing remote_bitbang driver
Info : Connecting to localhost:44853
Info : remote_bitbang driver initialized
Info : This adapter doesn't support configurable speed
Info : JTAG tap: riscv.cpu tap/device found: 0x10003fff (mfg: 0x7ff (<invalid>), part: 0x0003, ver: 0x1)
Error: IR capture error at bit 5, saw 0x41 not 0x...3
Warn : Bypassing JTAG setup events due to errors
Error: dtmcontrol is 0. Check JTAG connectivity/board power.
Warn : target riscv.cpu examination failed
Info : starting gdb server for riscv.cpu on 3333
Info : Listening on port 3333 for gdb connections

With mainline OpenOCD.

I think JTAG works for OpenC906 and naxriscv, so you can try those cores.

Author

kscz commented Jan 15, 2025

I was able to get OpenOCD to attach to a litex_sim session:

litex_sim --cpu-type=vexriscv_smp --with-ethernet --with-jtagremote --with-privileged-debug --hardware-breakpoints 4 --jtag-tap --rom-init ./software/wyrm.bin

OpenOCD invocation:

openocd -f jtag_remote.cfg -f riscv_jtag_tunneled.tcl

jtag_remote.cfg contents:

adapter speed 10000
adapter driver remote_bitbang
remote_bitbang host localhost
remote_bitbang port 44853

riscv_jtag_tunneled.tcl contents:

set _CHIPNAME vexriscv_smp
set _TARGETNAME $_CHIPNAME.cpu
set cpu_count 1
# Allow the core count to be overridden from the environment
if { [info exists ::env(RISCV_COUNT)] } {
    set cpu_count $::env(RISCV_COUNT)
}

if { [info exists TAP_NAME] } {
    set _TAP_NAME $TAP_NAME
} else {
    set _TAP_NAME $_TARGETNAME
}

adapter speed 500

# external jtag probe
if {$_TAP_NAME eq $_TARGETNAME} {
    jtag newtap $_CHIPNAME cpu -irlen 6 -expected-id 0x10003FFF
}

for {set i 0} {$i < $cpu_count} {incr i} {
  target create $_TARGETNAME.$i riscv -coreid $i -chain-position $_TAP_NAME
  riscv use_bscan_tunnel 6 1
  #riscv set_bscan_tunnel_ir 0x23
}

for {set i 0} {$i < $cpu_count} {incr i} {
    targets $_TARGETNAME.$i
    init
    halt
}

echo "Ready for Remote Connections"

The experience was not good, though: single stepping took ~45 seconds per step, and I couldn't set breakpoints. Since I couldn't halt the CPU effectively, I couldn't determine concretely what was wrong.

Execution seems to be caught in trap_entry before ever calling main, which would implicate some code in crt0.S. I'm not seeing anything obvious in the map file that would cause an access error, or anything wrong in the .data and .bss section initialization.

I am very open to ideas about what could be happening here.

Author

kscz commented Jan 15, 2025

After much more debugging I have tracked down the issue to an unaligned load. It appears that linking the libliteeth archive causes the load address of the .data section to become non-word-aligned: in my case I had _fdata_rom == 0x27c2.
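
To make the arithmetic concrete, here is a small Python sketch showing that 0x27c2 is not a multiple of 4 and how much padding would be needed to reach the next word boundary. The helper names are made up for illustration; only the 0x27c2 value comes from the failing build.

```python
WORD_SIZE = 4  # bytes per 32-bit word, the unit lw/sw operate on


def is_aligned(addr: int, alignment: int = WORD_SIZE) -> bool:
    """Return True if addr is a multiple of alignment (a power of two)."""
    return addr & (alignment - 1) == 0


def align_up(addr: int, alignment: int = WORD_SIZE) -> int:
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


fdata_rom = 0x27C2  # load address of .data observed in the failing build

print(is_aligned(fdata_rom))            # False
print(hex(align_up(fdata_rom)))         # 0x27c4
print(align_up(fdata_rom) - fdata_rom)  # 2
```

So the load address is off by two bytes from the nearest following word boundary.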

This meant the word-alignment requirement of crt0.S was not met - see here

I fixed the issue for my application by simply swapping from lw and sw to lb and sb (and changing the pointer advance to 1) so that alignment was no longer an issue, but I imagine that would be an unacceptable increase in boot time in other applications.
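
As a rough illustration of why a byte-wise copy works where a word-wise one traps, and what it costs, here is a toy Python model of a .data copy loop. It is not the crt0.S code: the `Memory` class, its trap-on-misalignment behaviour, the destination address, and the 16-byte length are all assumptions made for the sketch.

```python
class MisalignedAccess(Exception):
    """Stands in for the trap a CPU may raise on an unaligned word access."""


class Memory:
    """Toy byte-addressed memory that rejects unaligned word accesses."""

    def __init__(self, size: int):
        self.data = bytearray(size)

    def load_word(self, addr: int) -> bytes:
        if addr % 4:
            raise MisalignedAccess(hex(addr))
        return bytes(self.data[addr:addr + 4])

    def store_word(self, addr: int, value: bytes) -> None:
        if addr % 4:
            raise MisalignedAccess(hex(addr))
        self.data[addr:addr + 4] = value


def copy_words(mem: Memory, src: int, dst: int, dst_end: int) -> int:
    """Word-at-a-time copy (lw/sw style). Returns the iteration count."""
    iterations = 0
    while dst < dst_end:
        mem.store_word(dst, mem.load_word(src))
        src += 4
        dst += 4
        iterations += 1
    return iterations


def copy_bytes(mem: Memory, src: int, dst: int, dst_end: int) -> int:
    """Byte-at-a-time copy (lb/sb style). Returns the iteration count."""
    iterations = 0
    while dst < dst_end:
        mem.data[dst] = mem.data[src]
        src += 1
        dst += 1
        iterations += 1
    return iterations


mem = Memory(0x4000)
SRC, DST, LENGTH = 0x27C2, 0x3000, 16   # DST and LENGTH are hypothetical
mem.data[SRC:SRC + LENGTH] = bytes(range(LENGTH))

try:
    copy_words(mem, SRC, DST, DST + LENGTH)
except MisalignedAccess as exc:
    print("word copy trapped at", exc)

print("byte copy iterations:", copy_bytes(mem, SRC, DST, DST + LENGTH))
```

In this model the byte loop needs four times as many iterations as an aligned word loop would for the same data, which is the boot-time cost mentioned above.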

I am unable to find a way to force the linker to align the LOADADDR, which would be the more universal way to solve this issue. I am open to suggestions for how to patch this!
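
Independently of any linker-side fix, one way to at least catch the problem at build time might be to check the symbol's address after linking. The sketch below parses text in the default `nm` output format (address, type, name) and reports symbols that are not aligned. The function name is hypothetical, and the sample lines are made up apart from _fdata_rom at 0x27c2; the other names and addresses are placeholders.

```python
def misaligned_symbols(nm_output, names, alignment=4):
    """Return {name: hex address} for listed symbols that are not aligned."""
    found = {}
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # e.g. undefined symbols, which have no address column
        addr_text, _kind, name = parts
        if name not in names:
            continue
        addr = int(addr_text, 16)
        if addr % alignment:
            found[name] = hex(addr)
    return found


# Hypothetical nm-style listing; only the _fdata_rom address is from the report.
sample = """\
000027c2 A _fdata_rom
10000000 D _fdata
10000010 D _edata
"""

print(misaligned_symbols(sample, {"_fdata_rom", "_fdata"}))
# {'_fdata_rom': '0x27c2'}
```

In a real build the input would be the output of the toolchain's `nm` run on the linked ELF, and a non-empty result could be turned into a build failure.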

Contributor

FlyGoat commented Jan 15, 2025

Thanks for tracking it down. Can you try this?

diff --git a/litex/soc/software/bios/linker.ld b/litex/soc/software/bios/linker.ld
index 66f8b9aff882..c4679ac4ab10 100644
--- a/litex/soc/software/bios/linker.ld
+++ b/litex/soc/software/bios/linker.ld
@@ -18,6 +18,8 @@ SECTIONS
 		KEEP(*(.text.isr))
 
 		*(.text .stub .text.* .gnu.linkonce.t.*)
+		FILL(0);
+		. = ALIGN(8);
 		_etext = .;
 	} > rom
 
@@ -39,16 +41,22 @@ SECTIONS
 
 	.commands :
 	{
+		. = ALIGN(8);
 		PROVIDE_HIDDEN (__bios_cmd_start = .);
 		KEEP(*(.bios_cmd))
 		PROVIDE_HIDDEN (__bios_cmd_end = .);
+		FILL(0);
+		. = ALIGN(8);
 	} > rom
 
 	.init :
 	{
+		. = ALIGN(8);
 		PROVIDE_HIDDEN (__bios_init_start = .);
 		KEEP(*(.bios_init))
 		PROVIDE_HIDDEN (__bios_init_end = .);
+		FILL(0);
+		. = ALIGN(8);
 	} > rom
 
 	.data :

Author

kscz commented Jan 16, 2025

The BIOS links in some other symbols, which cause it not to experience this misalignment issue. Could you provide your fixes against this linker script? https://github.com/kscz/wyrm/blob/debug_misalignment/software/linker.ld

I actually had the same thought as you about what might fix the issue, and added alignment in the .rodata section in the hope that it would help, but alas it did not.

I'm actually willing to believe this may be some GNU ld issue, but let me know if you have other thoughts on how to fix this.
