Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic: runtime error: index out of range [3] with length 0 [recovered] #3881

Open
Validatus opened this issue Oct 23, 2024 · 2 comments
Open
Labels
bug Something isn't working external Issues created by non node team members

Comments

@Validatus
Copy link

Celestia Node version

Semantic version: v0.16.0 Commit: 6744f64

OS

Rocky Linux 9.4

Install tools

We are building binaries from source to ensure complete control over the compilation process. All needed tools are present on the system.

Others

There are no known resource limitations.

Steps to reproduce it

celestia bridge start --metrics.tls=true --metrics --metrics.endpoint otel.celestia.observer --p2p.network celestia --node.store /storage/.celestia-bridge

Expected result

The expected result is to have the Celestia bridge node fully operational and syncing to the latest network head. This ensures that the node is connected to the network, validating and relaying blocks in real time, and staying up to date with the current state of the blockchain.
This allows the bridge node to play its role in cross-chain communication or light client verification within the Celestia network.

Actual result

The panic message "index out of range [3] with length 0" suggests that the program tried to access an index in an array that doesn't exist, causing the program to crash.

panic: runtime error: index out of range [3] with length 0 [recovered]
	panic: 
	== Recovering from initIndex crash ==
	File Info: [ID: 9310013, Size: 502439, Zeros: 502439]
	isEnrypted: false checksumLen: 0 checksum:  indexLen: 0 index: [] 
	== Recovered ==

Relevant log output

/home/user/go/bin/celestia bridge start --metrics.tls=true --metrics --metrics.endpoint otel.celestia.observer --p2p.network celestia --node.store /storage-vault/.celestia-bridge
2024-10-23T18:46:40.159Z	INFO	node	nodebuilder/module.go:27	Accessing keyring...
2024-10-23T18:46:40.196Z	INFO	badger4	[email protected]/levels.go:171	All 163 tables opened in 4ms

2024-10-23T18:46:40.201Z	INFO	badger4	[email protected]/discard.go:66	Discard stats nextEmptySlot: 2979

2024-10-23T18:46:40.201Z	INFO	badger4	[email protected]/db.go:368	Set nextTxnTs to 108712042
2024-10-23T18:46:40.562Z	INFO	module/p2p	p2p/tls.go:54	using a default ws transport
2024/10/23 18:46:40 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 7168 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.
2024-10-23T18:46:43.881Z	INFO	badger4	[email protected]/levels.go:118	2631 tables out of 20890 opened in 3.002s

2024-10-23T18:46:46.882Z	INFO	badger4	[email protected]/levels.go:118	5222 tables out of 20890 opened in 6.004s

2024-10-23T18:46:49.882Z	INFO	badger4	[email protected]/levels.go:118	7676 tables out of 20890 opened in 9.004s

2024-10-23T18:46:50.570Z	INFO	basichost	basic/natmgr.go:112	DiscoverNAT error:no NAT found
panic: runtime error: index out of range [3] with length 0 [recovered]
	panic: 
	== Recovering from initIndex crash ==
	File Info: [ID: 9310013, Size: 502439, Zeros: 502439]
	isEnrypted: false checksumLen: 0 checksum:  indexLen: 0 index: [] 
	== Recovered ==
	

goroutine 12328 [running]:
github.com/dgraph-io/badger/v4/table.(*Table).initBiggestAndSmallest.func1.1()
	/home/user/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/table/table.go:354 +0x9c
github.com/dgraph-io/badger/v4/table.(*Table).initBiggestAndSmallest.func1()
	/home/user/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/table/table.go:400 +0xa5
panic({0x2ab2460?, 0xc001a06510?})
	/usr/local/go/src/runtime/panic.go:785 +0x132
github.com/google/flatbuffers/go.GetInt32(...)
	/home/user/go/pkg/mod/github.com/google/[email protected]/go/encode.go:85
github.com/google/flatbuffers/go.GetUOffsetT(...)
	/home/user/go/pkg/mod/github.com/google/[email protected]/go/encode.go:121
github.com/dgraph-io/badger/v4/fb.GetRootAsTableIndex(...)
	/home/user/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/fb/TableIndex.go:14
github.com/dgraph-io/badger/v4/table.(*Table).readTableIndex(0xc002d92480)
	/home/user/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/table/table.go:708 +0x154
github.com/dgraph-io/badger/v4/table.(*Table).initIndex(0xc002d92480)
	/home/user/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/table/table.go:465 +0x19b
github.com/dgraph-io/badger/v4/table.(*Table).initBiggestAndSmallest(0xc002d92480)
	/home/user/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/table/table.go:404 +0x6a
github.com/dgraph-io/badger/v4/table.OpenTable(0xc0025a9260, {0x0, 0x1, 0x200000, 0x0, 0x0, 0x3f847ae147ae147b, 0x1000, 0x0, 0x0, ...})
	/home/user/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/table/table.go:310 +0x23d
github.com/dgraph-io/badger/v4.newLevelsController.func1({0xc005276940, 0x3a}, {0x50?, 0xc011394780?, 0x6aa240?})
	/home/user/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/levels.go:150 +0x1d9
created by github.com/dgraph-io/badger/v4.newLevelsController in goroutine 1
	/home/user/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/levels.go:129 +0x565


Oct 23 18:56:34 m29593.contaboserver.net bash[779971]: 2024-10-23T18:56:34.825Z        INFO        badger4        [email protected]/levels.go:171        All 164 tables opened in 4ms
Oct 23 18:56:34 m29593.contaboserver.net bash[779971]: 2024-10-23T18:56:34.826Z        INFO        badger4        [email protected]/discard.go:66        Discard stats nextEmptySlot: 3015
Oct 23 18:56:34 m29593.contaboserver.net bash[779971]: 2024-10-23T18:56:34.826Z        INFO        badger4        [email protected]/db.go:368        Set nextTxnTs to 108727503
Oct 23 18:56:34 m29593.contaboserver.net bash[779971]: 2024-10-23T18:56:34.995Z        INFO        module/p2p        p2p/tls.go:54        using a default ws transport
Oct 23 18:56:35 m29593.contaboserver.net bash[779971]: 2024/10/23 18:56:35 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 7168 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.
Oct 23 18:56:35 m29593.contaboserver.net bash[779971]: 2024-10-23T18:56:35.289Z        ERROR        badger4        [email protected]/db.go:263        Received err: Opening table: "/storage-vault/.celestia-bridge/inverted_index/9309990.sst" error: failed to initialize table error: failed to read index. error: failed to verify checksum for table: /storage-vault/.celestia-bridge/inverted_index/9309990.sst error: actual: 1265260451, expected: 48412823 error: checksum mismatch
Oct 23 18:56:35 m29593.contaboserver.net bash[779971]: github.com/dgraph-io/badger/v4/y.init
Oct 23 18:56:35 m29593.contaboserver.net bash[779971]:         <autogenerated>:1
Oct 23 18:56:35 m29593.contaboserver.net bash[779971]: runtime.doInit1
Oct 23 18:56:35 m29593.contaboserver.net bash[779971]:         /usr/local/go/src/runtime/proc.go:7290
Oct 23 18:56:35 m29593.contaboserver.net bash[779971]: runtime.doInit
Oct 23 18:56:35 m29593.contaboserver.net bash[779971]:         /usr/local/go/src/runtime/proc.go:7257
Oct 23 18:56:35 m29593.contaboserver.net bash[779971]: runtime.main
Oct 23 18:56:35 m29593.contaboserver.net bash[779971]:         /usr/local/go/src/runtime/proc.go:254
Oct 23 18:56:35 m29593.contaboserver.net bash[779971]: runtime.goexit
Oct 23 18:56:35 m29593.contaboserver.net bash[779971]:         /usr/local/go/src/runtime/asm_amd64.s:1700. Cleaning up...
Oct 23 18:56:35 m29593.contaboserver.net bash[779971]: 2024-10-23T18:56:35.290Z        INFO        badger4        [email protected]/db.go:546        Lifetime L0 stalled for: 0s
Oct 23 18:56:39 m29593.contaboserver.net bash[779971]: 2024-10-23T18:56:39.786Z        INFO        badger4        [email protected]/db.go:625
Oct 23 18:56:39 m29593.contaboserver.net bash[779971]: Level 0 [ ]: NumTables: 00. Size: 0 B of 0 B. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 16 MiB
Oct 23 18:56:39 m29593.contaboserver.net bash[779971]: Level 1 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Oct 23 18:56:39 m29593.contaboserver.net bash[779971]: Level 2 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Oct 23 18:56:39 m29593.contaboserver.net bash[779971]: Level 3 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Oct 23 18:56:39 m29593.contaboserver.net bash[779971]: Level 4 [ ]: NumTables: 00. Size: 0 B of 11 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 4.0 MiB
Oct 23 18:56:39 m29593.contaboserver.net bash[779971]: Level 5 [B]: NumTables: 18. Size: 74 MiB of 108 MiB. Score: 0.00->0.00 StaleData: 686 KiB Target FileSize: 8.0 MiB
Oct 23 18:56:39 m29593.contaboserver.net bash[779971]: Level 6 [ ]: NumTables: 144. Size: 1.0 GiB of 1.0 GiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 16 MiB
Oct 23 18:56:39 m29593.contaboserver.net bash[779971]: Level Done
Oct 23 18:56:39 m29593.contaboserver.net bash[779971]: 2024-10-23T18:56:39.789Z        ERROR        peerstore/ds        pstoreds/keybook.go:81        error while updating pubkey in datastore for peer 12D3KooWRH8GqxDAbFgcor9bcs6nAvT7SVk54Yd7fB3M8BsMsDxg: datastore closed
Oct 23 18:56:39 m29593.contaboserver.net bash[779971]: Error: could not build arguments for function "github.com/celestiaorg/celestia-node/nodebuilder/share".shrexServerComponents.func1 (/home/cosmos/celestia-node/nodebuilder/share/module.go:156): failed to build *shrexeds.Server: could not build arguments for function "reflect".makeFuncStub (/usr/local/go/src/reflect/asm_amd64.s:28): failed to build *eds.Store: received non-nil error from function "reflect".makeFuncStub (/usr/local/go/src/reflect/asm_amd64.s:28): failed to create index: can't open Badger Datastore: Opening table: "/storage-vault/.celestia-bridge/inverted_index/9309990.sst" error: failed to initialize table error: failed to read index. error: failed to verify checksum for table: /storage-vault/.celestia-bridge/inverted_index/9309990.sst error: actual: 1265260451, expected: 48412823 error: checksum mismatch
Oct 23 18:56:39 m29593.contaboserver.net bash[779971]: github.com/dgraph-io/badger/v4/y.init
Oct 23 18:56:39 m29593.contaboserver.net bash[779971]:         <autogenerated>:1
Oct 23 18:56:39 m29593.contaboserver.net bash[779971]: runtime.doInit1
Oct 23 18:56:39 m29593.contaboserver.net bash[779971]:         /usr/local/go/src/runtime/proc.go:7290
Oct 23 18:56:39 m29593.contaboserver.net bash[779971]: runtime.doInit
Oct 23 18:56:39 m29593.contaboserver.net bash[779971]:         /usr/local/go/src/runtime/proc.go:7257
Oct 23 18:56:39 m29593.contaboserver.net bash[779971]: runtime.main
Oct 23 18:56:39 m29593.contaboserver.net bash[779971]:         /usr/local/go/src/runtime/proc.go:254
Oct 23 18:56:39 m29593.contaboserver.net bash[779971]: runtime.goexit
Oct 23 18:56:39 m29593.contaboserver.net bash[779971]:         /usr/local/go/src/runtime/asm_amd64.s:1700
Oct 23 18:56:39 m29593.contaboserver.net bash[779971]: 2024-10-23T18:56:39.789Z        ERROR        peerstore/ds        pstoreds/keybook.go:81        error while updating pubkey in datastore for peer 12D3KooWEV8gPcFACEwV94g6ftREXykfvg8Ez97z9CDVvp4Bu8px: datastore closed

Is the node "stuck"? Has it stopped syncing?

The log indicates a runtime panic that occurred while using BadgerDB, specifically during the initialization of BadgerDB tables, causing the binary to crash.

Notes

We can easily provide additional logs if necessary. Should you require more detailed information or further diagnostics to help resolve the issue, just let us know, and we’ll be happy to share any logs that can assist with troubleshooting.

@Validatus Validatus added the bug Something isn't working label Oct 23, 2024
@github-actions github-actions bot added the external Issues created by non node team members label Oct 23, 2024
@Wondertan
Copy link
Member

Did the panic happen during the restart or during regular operation?

This panic happened somewhere deep in the DB we use. My current hypothesis is that the node was ungracefully terminated causing DB corruption

@Validatus
Copy link
Author

Did the panic happen during the restart or during regular operation?
During "regular operation".

This panic happened somewhere deep in the DB we use. My current hypothesis is that the node was ungracefully terminated causing DB corruption

We share the same working hypothesis. Although the datacenter has denied any sudden power loss or server shutdowns, we’ve observed that during ongoing construction work at the site, these issues tend to occur intermittently. This suggests that the node might have been ungracefully terminated, potentially leading to the database corruption.

We’re currently working to reproduce the error to confirm this theory and will update you once we have more information.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working external Issues created by non node team members
Projects
None yet
Development

No branches or pull requests

2 participants