Investigate ipld: could not find errors #10260

Open
8 of 18 tasks
ZenGround0 opened this issue Feb 13, 2023 · 3 comments
Assignees
Labels
area/chain Area: Chain kind/bug Kind: Bug need/team-input Hint: Needs Team Input P1 P1: Must be resolved

Comments

@ZenGround0
Contributor

Checklist

  • This is not a security-related bug/issue. If it is, please follow the security policy.
  • This is not a question or a support request. If you have any lotus related questions, please ask in the lotus forum.
  • This is not a new feature request. If it is, please file a feature request instead.
  • This is not an enhancement request. If it is, please file an improvement suggestion instead.
  • I have searched on the issue tracker and the lotus forum, and there is no existing related issue or discussion.
  • I am running the latest release, or the most recent RC (release candidate) for the upcoming release or the dev branch (master), or have an issue updating to any of these.
  • I did not make any code changes to lotus.

Lotus component

  • lotus daemon - chain sync
  • lotus miner - mining and block production
  • lotus miner/worker - sealing
  • lotus miner - proving(WindowPoSt)
  • lotus miner/market - storage deal
  • lotus miner/market - retrieval deal
  • lotus miner/market - data transfer
  • lotus client
  • lotus JSON-RPC API
  • lotus message management (mpool)
  • Other

Lotus Version

idk

Describe the Bug

Thanks to @ribasushi we have some evidence of compaction erroring on startup because receipts are not added to the snapshot. It appears that this happens in messages mode, but maybe in discard mode (need to examine the config below).

This is a bug which we can fix. However, this problem resolves itself after ~4 finalities, when the receipts are computed, so it's probably not the root cause of discard not discarding.

Logging Information

Compaction error logs


2023-02-04T21:14:37.473Z    ERROR   splitstore  splitstore/splitstore_compact.go:536    COMPACTION ERROR: error marking: error walking block (cid: bafy2bzaceczpzccm2faelihcxwtyhc5zjz6xrim22irzur7j5yrs7krd6fohu): error walking message receipts (cid: bafy2bzaced6qwy3qvlykm23olkomk7sqvpj3vqboueewavqz2oaryi5qj3pbq): error scanning linked block (cid: bafy2bzaced6qwy3qvlykm23olkomk7sqvpj3vqboueewavqz2oaryi5qj3pbq): ipld: could not find bafy2bzaced6qwy3qvlykm23olkomk7sqvpj3vqboueewavqz2oaryi5qj3pbq
2023-02-05T00:02:56.098Z    ERROR   splitstore  splitstore/splitstore_compact.go:536    COMPACTION ERROR: error marking: error walking block (cid: bafy2bzacecwes64aagmt5eyerscpbicvshfzfqrribh2sx6mw5za5bd24mx6u): error walking message receipts (cid: bafy2bzaced6qwy3qvlykm23olkomk7sqvpj3vqboueewavqz2oaryi5qj3pbq): error scanning linked block (cid: bafy2bzaced6qwy3qvlykm23olkomk7sqvpj3vqboueewavqz2oaryi5qj3pbq): ipld: could not find bafy2bzaced6qwy3qvlykm23olkomk7sqvpj3vqboueewavqz2oaryi5qj3pbq
2023-02-05T03:36:34.031Z    ERROR   splitstore  splitstore/splitstore_compact.go:536    COMPACTION ERROR: error marking: error walking block (cid: bafy2bzaceczpzccm2faelihcxwtyhc5zjz6xrim22irzur7j5yrs7krd6fohu): error walking message receipts (cid: bafy2bzaced6qwy3qvlykm23olkomk7sqvpj3vqboueewavqz2oaryi5qj3pbq): error scanning linked block (cid: bafy2bzaced6qwy3qvlykm23olkomk7sqvpj3vqboueewavqz2oaryi5qj3pbq): ipld: could not find bafy2bzaced6qwy3qvlykm23olkomk7sqvpj3vqboueewavqz2oaryi5qj3pbq
2023-02-05T08:21:33.125Z    ERROR   splitstore  splitstore/splitstore_compact.go:536    COMPACTION ERROR: error marking: error walking block (cid: bafy2bzaceb73otita5kkb33tc7afwbhkqjf34b7criguoxjlmc5xguufjz2be): error walking message receipts (cid: bafy2bzaced6qwy3qvlykm23olkomk7sqvpj3vqboueewavqz2oaryi5qj3pbq): error scanning linked block (cid: bafy2bzaced6qwy3qvlykm23olkomk7sqvpj3vqboueewavqz2oaryi5qj3pbq): ipld: could not find bafy2bzaced6qwy3qvlykm23olkomk7sqvpj3vqboueewavqz2oaryi5qj3pbq
2023-02-05T14:09:33.845Z    ERROR   splitstore  splitstore/splitstore_compact.go:536    COMPACTION ERROR: error marking: error walking block (cid: bafy2bzacecwes64aagmt5eyerscpbicvshfzfqrribh2sx6mw5za5bd24mx6u): error walking message receipts (cid: bafy2bzaced6qwy3qvlykm23olkomk7sqvpj3vqboueewavqz2oaryi5qj3pbq): error scanning linked block (cid: bafy2bzaced6qwy3qvlykm23olkomk7sqvpj3vqboueewavqz2oaryi5qj3pbq): ipld: could not find bafy2bzaced6qwy3qvlykm23olkomk7sqvpj3vqboueewavqz2oaryi5qj3pbq
2023-02-05T21:33:47.433Z    ERROR   splitstore  splitstore/splitstore_compact.go:536    COMPACTION ERROR: error marking: error walking block (cid: bafy2bzaceb73otita5kkb33tc7afwbhkqjf34b7criguoxjlmc5xguufjz2be): error walking message receipts (cid: bafy2bzaced6qwy3qvlykm23olkomk7sqvpj3vqboueewavqz2oaryi5qj3pbq): error scanning linked block (cid: bafy2bzaced6qwy3qvlykm23olkomk7sqvpj3vqboueewavqz2oaryi5qj3pbq): ipld: could not find bafy2bzaced6qwy3qvlykm23olkomk7sqvpj3vqboueewavqz2oaryi5qj3pbq
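
When triaging logs like the ones above, it can help to check how many distinct objects are actually missing. The following is a minimal sketch, not part of lotus: the regular expression assumes the message wording stays exactly as quoted, and the helper names (`rebuild_line`, `summarize`) are hypothetical. The timestamps and CIDs are copied from the six lines above.

```python
import re
from collections import Counter

# The receipts CID that the quoted errors report as missing.
RECEIPTS = "bafy2bzaced6qwy3qvlykm23olkomk7sqvpj3vqboueewavqz2oaryi5qj3pbq"

# (timestamp, block CID) pairs copied from the six log lines above.
QUOTED = [
    ("2023-02-04T21:14:37.473Z", "bafy2bzaceczpzccm2faelihcxwtyhc5zjz6xrim22irzur7j5yrs7krd6fohu"),
    ("2023-02-05T00:02:56.098Z", "bafy2bzacecwes64aagmt5eyerscpbicvshfzfqrribh2sx6mw5za5bd24mx6u"),
    ("2023-02-05T03:36:34.031Z", "bafy2bzaceczpzccm2faelihcxwtyhc5zjz6xrim22irzur7j5yrs7krd6fohu"),
    ("2023-02-05T08:21:33.125Z", "bafy2bzaceb73otita5kkb33tc7afwbhkqjf34b7criguoxjlmc5xguufjz2be"),
    ("2023-02-05T14:09:33.845Z", "bafy2bzacecwes64aagmt5eyerscpbicvshfzfqrribh2sx6mw5za5bd24mx6u"),
    ("2023-02-05T21:33:47.433Z", "bafy2bzaceb73otita5kkb33tc7afwbhkqjf34b7criguoxjlmc5xguufjz2be"),
]


def rebuild_line(ts, block):
    """Reassemble one quoted log line from its timestamp and block CID."""
    return (
        f"{ts}    ERROR   splitstore  splitstore/splitstore_compact.go:536    "
        f"COMPACTION ERROR: error marking: error walking block (cid: {block}): "
        f"error walking message receipts (cid: {RECEIPTS}): "
        f"error scanning linked block (cid: {RECEIPTS}): "
        f"ipld: could not find {RECEIPTS}"
    )


# Assumption: the wording of the error chain is stable across log lines.
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+ERROR\s+splitstore\s+\S+\s+COMPACTION ERROR: .*?"
    r"error walking block \(cid: (?P<block>\w+)\).*"
    r"ipld: could not find (?P<missing>\w+)\s*$"
)


def summarize(lines):
    """Return (missing-CID counts, set of block CIDs being walked)."""
    missing = Counter()
    blocks = set()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not a compaction error line
        missing[m.group("missing")] += 1
        blocks.add(m.group("block"))
    return missing, blocks


if __name__ == "__main__":
    missing, blocks = summarize(rebuild_line(ts, b) for ts, b in QUOTED)
    print(len(missing), "distinct missing CID(s);", len(blocks), "distinct block(s)")
```

Run against the six lines above, this reports one missing receipts CID referenced while walking three different blocks, which is consistent with a single absent receipts object rather than widespread loss.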

node configured:

LOTUS_CHAIN_BADGERSTORE_DISABLE_FSYNC=1 \
LOTUS_CHAINSTORE_ENABLESPLITSTORE=1 \
LOTUS_CHAINSTORE_SPLITSTORE_COLDSTORETYPE=discard \
LOTUS_CHAINSTORE_SPLITSTORE_MARKSETTYPE=badger \
LOTUS_CHAINSTORE_SPLITSTORE_HOTSTOREFULLGCFREQUENCY=1 \
LOTUS_CHAINSTORE_SPLITSTORE_COLDSTOREFULLGCFREQUENCY=0 \


Repro Steps

Start splitstore with the above config and look at logs for 2 days
@ZenGround0
Contributor Author

I'm starting to suspect that this is an issue with not entering warmup properly, since I think this is exactly the point of warmup. Will need to dig further.

@TippyFlitsUK TippyFlitsUK added P1 P1: Must be resolved need/team-input Hint: Needs Team Input area/chain Area: Chain and removed need/triage labels Apr 11, 2023
@rjan90 rjan90 changed the title Splitstore fails early compactions on receipts not found Investigate ipld: could not find errors Feb 20, 2025
@github-project-automation github-project-automation bot moved this to 📌 Triage in FilOz Feb 20, 2025
@rjan90
Contributor

rjan90 commented Feb 20, 2025

I'm going to repurpose this ticket to track and investigate the recurring issue of excessive ipld: could not find errors that have been reported across various components of the system.

Background

Over the past year, there have been multiple reports of excessive ipld: could not find bafy... errors appearing in logs. These issues manifest in different contexts and have been observed by multiple users and developers.

Reported Cases

  1. Magik's Report (BLS Message Issue)

A bug appears to be causing splitstore to remove signed BLS messages prematurely (or before they properly land on-chain), leading to issues in boost/curio/lotus-miner. The typical sequence:

  • A BLS message is sent
  • StateWaitMsg is initiated
  • Splitstore compacts
  • Node restarts
  • StateWaitMsg fails with: "failed to load message: ipld: could not find bafy..."
  2. Slash Detector User Report

A user reported excessive ERROR-level logs during slash detector usage:

2025-02-19T10:33:00.304+0300    ERROR   slashsvc        slashsvc/slashservice.go:72     slash detector errored: chain get block error:ipld: could not find bafy2bzacebkhsecmxeb5l4rl4iztdtqw6u2opbt3psvrvhsasu775l5w36vle
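
The sequence in the first report (BLS message sent, compaction, then a failed lookup) can be illustrated with a toy model. This is a sketch under strong assumptions and does not reflect the real splitstore code: `ToyHotStore`, `wait_for_message`, and the placeholder CID strings are all hypothetical, compaction is reduced to "drop everything not referenced from the chain", and the node restart step is not modeled.

```python
class NotFound(Exception):
    """Stands in for the 'ipld: could not find <cid>' error."""


class ToyHotStore:
    """Hypothetical, heavily simplified stand-in for a hot blockstore."""

    def __init__(self):
        self.objects = {}        # cid -> raw bytes
        self.chain_refs = set()  # cids reachable from the chain head

    def put(self, cid, data):
        self.objects[cid] = data

    def include_on_chain(self, cid):
        self.chain_refs.add(cid)

    def get(self, cid):
        try:
            return self.objects[cid]
        except KeyError:
            raise NotFound(f"ipld: could not find {cid}") from None

    def compact(self):
        """Toy mark-and-sweep: keep only objects referenced from the chain."""
        self.objects = {
            cid: data
            for cid, data in self.objects.items()
            if cid in self.chain_refs
        }


def wait_for_message(store, cid):
    """Toy analogue of a message wait: it has to load the message first."""
    try:
        return store.get(cid)
    except NotFound as err:
        raise NotFound(f"failed to load message: {err}") from None


if __name__ == "__main__":
    store = ToyHotStore()
    store.put("placeholder-msg-cid", b"signed BLS message")  # sent, not yet on-chain
    store.compact()                                          # sweep runs too early
    try:
        wait_for_message(store, "placeholder-msg-cid")
    except NotFound as err:
        print(err)
```

In this model the message disappears because nothing on-chain references it yet when the sweep runs. Whether the real bug works this way is exactly what this ticket is meant to find out.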

Additional Context

Next Steps

While the team's immediate bandwidth for a deep dive into this issue may be limited, we will use this ticket to collect additional reports and context for future investigation.
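
For anyone gathering reports for this ticket, a small script can show which components emit the error most often. This is a hedged sketch: it assumes log entries follow the `<timestamp> <LEVEL> <logger> <file:line> <message>` layout seen in the two excerpts quoted in this issue, and `tally_by_logger` is a hypothetical helper, not a lotus tool.

```python
import re
from collections import Counter

# Assumed layout, matching the log excerpts quoted in this issue:
# <timestamp> <LEVEL> <logger> <file:line> <message>
ENTRY_RE = re.compile(r"^(\S+)\s+(\S+)\s+(\S+)\s+(\S+:\d+)\s+(.*)$")
MISSING_RE = re.compile(r"ipld: could not find (\w+)")


def tally_by_logger(lines):
    """Count 'ipld: could not find' entries per logger name."""
    counts = Counter()
    for line in lines:
        entry = ENTRY_RE.match(line.strip())
        if entry and MISSING_RE.search(entry.group(5)):
            counts[entry.group(3)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python tally.py < lotus.log
    for logger, n in tally_by_logger(sys.stdin).most_common():
        print(f"{logger}\t{n}")
```

Attaching the per-logger counts to a report makes it easier to tell whether a node is hitting the splitstore case, the slash detector case, or something new.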

@rvagg
Member

rvagg commented Feb 20, 2025

@rjan90's mention of this issue in #12907 (comment) made me connect it with #12897. That's an instance where a BLS message wasn't found by the ChainIndexer, and in that case it caused a panic because it's assumed that it can always find messages for new tipsets. It also happened on an archival node, so we're not even dealing with splitstore. Not sure it's connected, but it sounds eerily related.

Projects
Status: 📌 Triage
Development

No branches or pull requests

4 participants