Do not unnecessarily retransmit `commitment_signed` in dual funding #1214

TheBlueMatt · 2024-12-09T19:47:07Z

On reconnection in the middle of the dual-funding flow, if both nodes have exchanged the initial `commitment_signed` and node A had sent its (initial) `tx_signatures` but node B never received them, both nodes should send a `channel_reestablish` with `next_funding_txid` set and a `next_commitment_number` of 1 (as they've already received the commitment transaction for commitment number 0).

However, the spec indicates that in this case both nodes should retransmit their `commitment_signed`, as retransmission is only gated on `next_funding_txid` and not on the `next_commitment_number` field. This may cause implementations which assume that each new `commitment_signed` is for a new state to error out and potentially fail the channel.

Instead, we should rely on both the presence of `next_funding_txid` *and* `next_commitment_number` being zero to decide if we need to resend our `commitment_signed`. Sadly, we cannot rely on just `next_commitment_number`, as that is used to request a force-closure in a non-standard way to work around implementations not honoring the `error` message.
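A minimal C sketch of the proposed gating rule, for the dual-funding case only. The function name and parameters are hypothetical: `has_next_funding_txid` stands for whether the peer's `channel_reestablish` carried `next_funding_txid`, and `next_commitment_number` is the peer's field as received.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: should we resend our initial commitment_signed
 * (commitment number 0) after a channel_reestablish received mid
 * dual-funding?
 */
static bool should_retransmit_initial_commitment_signed(bool has_next_funding_txid,
							uint64_t next_commitment_number)
{
	/* next_funding_txid alone is not enough: the peer may already hold
	 * commitment 0 (it then sends 1) and only be missing tx_signatures. */
	if (!has_next_funding_txid)
		return false;
	/* Zero means the peer never received our commitment_signed for
	 * commitment number 0. */
	return next_commitment_number == 0;
}
```

Checking `next_funding_txid` first leaves a zero `next_commitment_number` without `next_funding_txid` free for the non-standard force-closure request mentioned above.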

02-peer-protocol.md

…umber` As pointed out in lightning/bolts#1214, when reconnecting a partially signed `interactive-tx` session, we should set `next_commitment_number` to the current commitment number if we haven't received our peer's `commit_sig`, which tells them they need to retransmit it. That's not what we're currently doing: we're currently setting this value to the next commitment number, regardless of whether or not we have received our peer's `commit_sig`. And we always retransmit our `commit_sig` if our peer is setting `next_funding_txid`, even if they have already received it. More importantly, if our peer behaves correctly and sends us the current commitment number, we will think that they're late and will halt, waiting for them to send `error`. This commit fixes that by allowing our peers to use the current commitment number when they set `next_funding_txid`. Note that this doesn't yet make us spec-compliant, but in order to guarantee backwards-compatibility, we must first deploy that change before we can start removing spurious `commit_sig` retransmissions.
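The tolerance this commit adds can be sketched as a single predicate. This is a hypothetical model, not eclair's code: `current_commitment_number` is the commitment number of the latest remote commitment, it assumes no other `commitment_signed` is in flight, and it only answers whether the peer's value is acceptable. Values it rejects are assumed to fall through to whatever lagging-peer or error handling the implementation already has.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check of the peer's next_commitment_number on reconnection.
 * Assumes no regular (non-interactive-tx) commitment_signed is in flight.
 */
static bool accept_next_commitment_number(uint64_t current_commitment_number,
					  uint64_t next_commitment_number,
					  bool has_next_funding_txid)
{
	/* Usual case: the peer already holds the current commitment. */
	if (next_commitment_number == current_commitment_number + 1)
		return true;
	/* Partially signed interactive-tx session: the peer is telling us it
	 * never received our commit_sig, so it is not actually late. */
	if (has_next_funding_txid &&
	    next_commitment_number == current_commitment_number)
		return true;
	/* Anything else is left to the existing error handling. */
	return false;
}
```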

We fully implement lightning/bolts#1214 to stop retransmitting `commit_sig` when our peer has already received it. We also correctly set `next_commitment_number` to let our peer know whether we have received their `commit_sig` or not.
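On the sending side, the value a node puts in its own `channel_reestablish` reduces to one comparison. A sketch under the assumptions in the commit messages; the function and parameter names are hypothetical, and `current_commitment_number` is taken to be the commitment number that the interactive-tx `commit_sig` applies to (0 for dual funding).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: next_commitment_number to send while an
 * interactive-tx session is partially signed.
 */
static uint64_t our_next_commitment_number(uint64_t current_commitment_number,
					   bool received_peer_commit_sig)
{
	/* Asking for the current number tells the peer to retransmit. */
	if (!received_peer_commit_sig)
		return current_commitment_number;
	/* We already hold it, so we expect the next one. */
	return current_commitment_number + 1;
}
```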

As pointed out in lightning/bolts#1214, when reconnecting a partially signed `interactive-tx` session, we should set `next_commitment_number` to the current commitment number if we haven't received our peer's `commit_sig`, which tells them they need to retransmit it. More importantly, if our peer behaves correctly and sends us the current commitment number, we must not think that they're late and halt, waiting for them to send `error`. This commit fixes that by allowing our peers to use the current commitment number when they set `next_funding_txid`. Note that we keep retransmitting our `commit_sig` regardless of the value our peer set in `next_commitment_number`, because we need to wait for them to have an opportunity to upgrade. In a future commit we will stop sending spurious `commit_sig`.

ddustin · 2025-01-06T22:41:59Z

>   - if `next_commitment_number` is equal to the commitment number of
>   the last `commitment_signed` message the receiving node has sent:
>     - MUST reuse the same commitment number for its next `commitment_signed`.
>   - otherwise:
>     - if `next_commitment_number` is not 1 greater than the
>   commitment number of the last `commitment_signed` message the receiving
>   node has sent:
>       - SHOULD send an `error` and fail the channel.
>     - if it has not sent `commitment_signed`, AND `next_commitment_number`
>     is not equal to 1:
>       - SHOULD send an `error` and fail the channel.

Core Lightning currently follows these rules during `channel_reestablish` and will fail the channel on reestablish if `next_commitment_number` is 0.

```c
	/* BOLT #2:
	 *
	 *   - if `next_commitment_number` is equal to the commitment
	 *     number of the last `commitment_signed` message the receiving node
	 *     has sent:
	 *     - MUST reuse the same commitment number for its next
	 *       `commitment_signed`.
	 */
	if (next_commitment_number == peer->next_index[REMOTE] - 1) {
		/* We completed opening, we don't re-transmit that one! */
		if (next_commitment_number == 0)
			peer_failed_err(peer->pps,
					 &peer->channel_id,
					 "bad reestablish commitment_number: %"
					 PRIu64,
					 next_commitment_number);

		retransmit_commitment_signed = true;

	/* BOLT #2:
	 *
	 *   - otherwise:
	 *     - if `next_commitment_number` is not 1 greater than the
	 *       commitment number of the last `commitment_signed` message the
	 *       receiving node has sent:
	 *       - SHOULD send an `error` and fail the channel.
	 */
	} else if (next_commitment_number != peer->next_index[REMOTE])
		peer_failed_err(peer->pps,
				&peer->channel_id,
				"bad reestablish commitment_number: %"PRIu64
				" vs %"PRIu64,
				next_commitment_number,
				peer->next_index[REMOTE]);
	else
		retransmit_commitment_signed = false;
```
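To make the failure mode concrete, here is a standalone C model of the excerpt above. It is a sketch, not CLN code: `peer->next_index[REMOTE]` is passed in as the hypothetical parameter `next_index_remote` (the commitment number of the next `commitment_signed` this node will send), and each `peer_failed_err` call is replaced by a returned enum value.

```c
#include <stdint.h>

/* Hypothetical result type standing in for CLN's control flow. */
enum cln_reestablish_result {
	CLN_RETRANSMIT,    /* retransmit_commitment_signed = true */
	CLN_NO_RETRANSMIT, /* retransmit_commitment_signed = false */
	CLN_FAIL_CHANNEL,  /* peer_failed_err(...) */
};

/* next_index_remote models peer->next_index[REMOTE]. */
static enum cln_reestablish_result
cln_check_next_commitment_number(uint64_t next_commitment_number,
				 uint64_t next_index_remote)
{
	if (next_commitment_number == next_index_remote - 1) {
		/* The opening commitment (number 0) is never retransmitted. */
		if (next_commitment_number == 0)
			return CLN_FAIL_CHANNEL;
		return CLN_RETRANSMIT;
	}
	if (next_commitment_number != next_index_remote)
		return CLN_FAIL_CHANNEL;
	return CLN_NO_RETRANSMIT;
}
```

Right after opening, `next_index_remote` is 1, so a peer that sends 0 because it never received the initial `commitment_signed` takes the `CLN_FAIL_CHANNEL` path.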

niftynei · 2025-01-06T22:49:25Z

> This may cause implementations which assume that each new commitment_signed is for a new state to fail and potentially fail the channel.

It's worth noting that the proposed change (setting next_commitment_number to zero) will fail channels for existing CLN deployments.

rustyrussell · 2025-01-13T10:24:40Z

But if I understand correctly it will only fail on the reconnect corner case, so maybe nobody will notice?

Technically it's still experimental for CLN, so we could change it. Better would be to change the feature bit, but that is pretty intrusive as it's been merged in the spec already.

t-bast · 2025-01-13T10:29:25Z

> But if I understand correctly it will only fail on the reconnect corner case, so maybe nobody will notice?

Yes, exactly. And I just combed our node's logs, and it never happened that we were in this reconnection case with a non-Phoenix node. So I really feel that we shouldn't bother with backwards-compat and that nobody will run into this issue in practice.

> Better would be to change the feature bit, but that is pretty intrusive as it's been merged in the spec already.

Agreed, using a feature bit here would be annoying... I think it's safe to update this without it and consider it an implementation issue?

Sending a `channel_reestablish` with `next_commitment_number` of zero is used in practice to request that a peer fail a channel and broadcast the latest state (for implementations which continue to ignore the `error` message). Because it's used in practice, we should document it to avoid accidentally writing up incompatible things in the future.
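The convention can be sketched as a pair of hypothetical helpers. The struct below models only the two `channel_reestablish` fields discussed in this thread; a real message carries more fields, which are deliberately left out rather than guessed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of a channel_reestablish message. */
struct reestablish_view {
	uint64_t next_commitment_number;
	bool has_next_funding_txid;
};

/* Build the non-standard "please fail the channel and broadcast your
 * latest state" request: zero next_commitment_number, no
 * next_funding_txid. */
static struct reestablish_view make_force_close_request(void)
{
	struct reestablish_view msg = {
		.next_commitment_number = 0,
		.has_next_funding_txid = false,
	};
	return msg;
}

/* A zero next_commitment_number only means "force-close" when
 * next_funding_txid is absent; with it, the peer is instead asking for
 * our initial commitment_signed. */
static bool is_force_close_request(const struct reestablish_view *msg)
{
	return msg->next_commitment_number == 0 && !msg->has_next_funding_txid;
}
```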

TheBlueMatt · 2025-01-14T16:54:37Z

Fixed the issue dusty pointed out:

```
$ git diff-tree -U5 90772ff d8cfa95
diff --git a/02-peer-protocol.md b/02-peer-protocol.md
index e9ba8bc..3e7ecb2 100644
--- a/02-peer-protocol.md
+++ b/02-peer-protocol.md
@@ -2480,16 +2480,12 @@ A node:
     - MUST ignore any redundant `channel_ready` it receives.
   - if `next_commitment_number` is equal to the commitment number of
   the last `commitment_signed` message the receiving node has sent:
     - MUST reuse the same commitment number for its next `commitment_signed`.
   - otherwise:
-    - if `next_commitment_number` is not 1 greater than the
-  commitment number of the last `commitment_signed` message the receiving
-  node has sent:
-      - SHOULD send an `error` and fail the channel.
-    - if it has not sent `commitment_signed`, AND `next_commitment_number`
-    is not equal to 1:
+    - if `next_commitment_number` is not equal to the commitment number of the
+      next `commitment_signed` the receiving node will send:
       - SHOULD send an `error` and fail the channel.
   - if `next_revocation_number` is equal to the commitment number of
   the last `revoke_and_ack` the receiving node sent, AND the receiving node
   hasn't already received a `closing_signed`:
     - MUST re-send the `revoke_and_ack`.
```
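The revised requirement reads as a three-way decision. A sketch with hypothetical names: `has_sent` says whether this node has sent any `commitment_signed`, `last_sent` is that message's commitment number (ignored when `has_sent` is false), and `next_to_send` is the commitment number of the next `commitment_signed` it will send.

```c
#include <stdbool.h>
#include <stdint.h>

enum commitment_check {
	COMMITMENT_REUSE_LAST, /* MUST reuse the last commitment number */
	COMMITMENT_IN_SYNC,    /* peer expects our next commitment_signed */
	COMMITMENT_FAIL,       /* SHOULD send an error and fail the channel */
};

static enum commitment_check
check_next_commitment_number(uint64_t next_commitment_number, bool has_sent,
			     uint64_t last_sent, uint64_t next_to_send)
{
	/* Peer is missing our last commitment_signed: resend it. */
	if (has_sent && next_commitment_number == last_sent)
		return COMMITMENT_REUSE_LAST;
	/* Otherwise the peer must expect exactly our next one. */
	if (next_commitment_number != next_to_send)
		return COMMITMENT_FAIL;
	return COMMITMENT_IN_SYNC;
}
```

In the dual-funding scenario from the PR description, `last_sent` is 0 and `next_to_send` is 1, so a peer sending 0 gets the initial `commitment_signed` again rather than a channel failure. Assuming, as the removed clause did, that a node which has not sent `commitment_signed` will next send commitment number 1, the single comparison against `next_to_send` also covers that case.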

t-bast

ACK d8cfa95, thanks!

dunxen · 2025-01-20T14:59:24Z

ACK d8cfa95

TheBlueMatt mentioned this pull request Dec 9, 2024: Follow-ups to PR 3137 lightningdevkit/rust-lightning#3423 (Open)

t-bast mentioned this pull request Dec 11, 2024: Lightning Specification Meeting 2024/12/16 #1213 (Closed)

t-bast reviewed Dec 11, 2024

t-bast mentioned this pull request Dec 12, 2024: Add more splice RBF reconnection tests ACINQ/eclair#2964 (Merged)

t-bast mentioned this pull request Dec 12, 2024: Reestablish partially signed splice with current remote_commitment_number ACINQ/eclair#2965 (Merged)

t-bast mentioned this pull request Dec 13, 2024: Remove spurious interactive-tx commit_sig retransmission ACINQ/eclair#2966 (Draft)

t-bast mentioned this pull request Dec 13, 2024: Correctly set next_commitment_number during splice reconnect ACINQ/lightning-kmp#736 (Merged)

t-bast mentioned this pull request Dec 13, 2024: Remove spurious interactive-tx commit_sig retransmission ACINQ/lightning-kmp#737 (Draft)

ziggie1984 mentioned this pull request Dec 18, 2024: [bug]: Remove Soft-Error Handling during ChannelLink reestablishment. lightningnetwork/lnd#9370 (Open)

This was referenced Jan 2, 2025:

- Channel Splicing (feature 62/63) #1160 (Open)
- Lightning Specification Meeting 2025/01/13 #1216 (Closed)

TheBlueMatt added 2 commits January 14, 2025 16:53

TheBlueMatt force-pushed the 2024-12-no-spurious-retransmit branch from 90772ff to d8cfa95 January 14, 2025 16:53

t-bast approved these changes Jan 15, 2025

t-bast mentioned this pull request Jan 22, 2025: Lightning Specification Meeting 2025/01/27 #1221 (Open)