Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not unnecessarily retransmit commitment_signed in dual funding #1214

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

TheBlueMatt
Copy link
Collaborator

On reconnection in the middle of the dual-funding flow, if both nodes have exchanged the initial commitment_signed and node A had sent its (initial) tx_signatures but node B never received them, both nodes should send a channel_reestablish with next_funding_txid set and a next_commitment_number of 1 (as they've already received the commitment transaction for commitment number 0).

The spec indicates in this case that both nodes should retransmit their commitment_signed, however, as this is only gated on next_funding_txid and not the next_commitment_number field. This may cause implementations which assume that each new commitment_signed is for a new state to fail and potentially fail the channel.

Instead, we should rely both the presence of next_funding_txid and next_commitment_number being zero to decide if we need to resend our commitment_signed. Sadly, we cannot rely on just next_commitment_number as that is used to request a force-closure in a non-standard way to work around implementations not honoring the error message.

02-peer-protocol.md Show resolved Hide resolved
02-peer-protocol.md Show resolved Hide resolved
t-bast added a commit to ACINQ/eclair that referenced this pull request Dec 12, 2024
…umber`

As pointed out in lightning/bolts#1214, when
reconnecting a partially signed `interactive-tx` session, we should
set `next_commitment_number` to the current commitment number if we
haven't received our peer's `commit_sig`, which tells them they need
to retransmit it.

That's not what we're currently doing: we're currently setting this
value to the next commitment number, regardless of whether or not
we have received our peer's `commit_sig`. And we always retransmit
our `commit_sig` if our peer is setting `next_funding_txid`, even
if they have already received it.

More importantly, if our peer behaves correctly and sends us the
current commitment number, we will think that they're late and will
halt, waiting for them to send `error`. This commit fixes that by
allowing our peers to use the current commitment number when they
set `next_funding_txid`.

Note that this doesn't yet make us spec-compliant, but in order to
guarantee backwards-compatibility, we must first deploy that change
before we can start removing spurious `commit_sig` retransmissions.
t-bast added a commit to ACINQ/eclair that referenced this pull request Dec 13, 2024
We fully implement lightning/bolts#1214 to stop
retransmitting `commit_sig` when our peer has already received it. We
also correctly set `next_commitment_number` to let our peer know whether
we have received their `commit_sig` or not.
t-bast added a commit to ACINQ/lightning-kmp that referenced this pull request Dec 13, 2024
As pointed out in lightning/bolts#1214, when
reconnecting a partially signed `interactive-tx` session, we should
set `next_commitment_number` to the current commitment number if we
haven't received our peer's `commit_sig`, which tells them they need
to retransmit it.

More importantly, if our peer behaves correctly and sends us the
current commitment number, we must not think that they're late and
halt, waiting for them to send `error`. This commit fixes that by
allowing our peers to use the current commitment number when they
set `next_funding_txid`.

Note that we keep retransmitting our `commit_sig` regardless of the
value our peer set in `next_commitment_number`, because we need to
wait for them to have an opportunity to upgrade. In a future commit
we will stop sending spurious `commit_sig`.
t-bast added a commit to ACINQ/lightning-kmp that referenced this pull request Dec 13, 2024
We fully implement lightning/bolts#1214 to stop
retransmitting `commit_sig` when our peer has already received it. We
also correctly set `next_commitment_number` to let our peer know whether
we have received their `commit_sig` or not.
@ddustin
Copy link
Contributor

ddustin commented Jan 6, 2025

  - if `next_commitment_number` is equal to the commitment number of
  the last `commitment_signed` message the receiving node has sent:
    - MUST reuse the same commitment number for its next `commitment_signed`.
  - otherwise:
    - if `next_commitment_number` is not 1 greater than the
  commitment number of the last `commitment_signed` message the receiving
  node has sent:
      - SHOULD send an `error` and fail the channel.
    - if it has not sent `commitment_signed`, AND `next_commitment_number`
    is not equal to 1:
      - SHOULD send an `error` and fail the channel.

Core lightning currently follows these rules during channel_reestablish and will fail the channel on reestablish with a 0 value of next_commitment_number.

	/* BOLT #2:
	 *
	 *   - if `next_commitment_number` is equal to the commitment
	 *     number of the last `commitment_signed` message the receiving node
	 *     has sent:
	 *     - MUST reuse the same commitment number for its next
	 *       `commitment_signed`.
	 */
	if (next_commitment_number == peer->next_index[REMOTE] - 1) {
		/* We completed opening, we don't re-transmit that one! */
		if (next_commitment_number == 0)
			peer_failed_err(peer->pps,
					 &peer->channel_id,
					 "bad reestablish commitment_number: %"
					 PRIu64,
					 next_commitment_number);

		retransmit_commitment_signed = true;

	/* BOLT #2:
	 *
	 *   - otherwise:
	 *     - if `next_commitment_number` is not 1 greater than the
	 *       commitment number of the last `commitment_signed` message the
	 *       receiving node has sent:
	 *       - SHOULD send an `error` and fail the channel.
	 */
	} else if (next_commitment_number != peer->next_index[REMOTE])
		peer_failed_err(peer->pps,
				&peer->channel_id,
				"bad reestablish commitment_number: %"PRIu64
				" vs %"PRIu64,
				next_commitment_number,
				peer->next_index[REMOTE]);
	else
		retransmit_commitment_signed = false;

@niftynei
Copy link
Collaborator

niftynei commented Jan 6, 2025

This may cause implementations which assume that each new commitment_signed is for a new state to fail and potentially fail the channel.

It's worth noting that the proposed change (setting next_commitment_number to zero) will fail channels for existing CLN deployments.

t-bast added a commit to ACINQ/eclair that referenced this pull request Jan 9, 2025
…umber` (#2965)

As pointed out in lightning/bolts#1214, when
reconnecting a partially signed `interactive-tx` session, we should
set `next_commitment_number` to the current commitment number if we
haven't received our peer's `commit_sig`, which tells them they need
to retransmit it.

That's not what we're currently doing: we're currently setting this
value to the next commitment number, regardless of whether or not
we have received our peer's `commit_sig`. And we always retransmit
our `commit_sig` if our peer is setting `next_funding_txid`, even
if they have already received it.

More importantly, if our peer behaves correctly and sends us the
current commitment number, we will think that they're late and will
halt, waiting for them to send `error`. This commit fixes that by
allowing our peers to use the current commitment number when they
set `next_funding_txid`.

Note that this doesn't yet make us spec-compliant, but in order to
guarantee backwards-compatibility, we must first deploy that change
before we can start removing spurious `commit_sig` retransmissions.
@rustyrussell
Copy link
Collaborator

But if I understand correctly it will only fail on the reconnect corner case, so maybe nobody will notice?

Technically it's still experimental for CLN, so we could change it. Better would be to change the feature bit, but that is pretty intrusive as it's been merged in the spec already.

@t-bast
Copy link
Collaborator

t-bast commented Jan 13, 2025

But if I understand correctly it will only fail on the reconnect corner case, so maybe nobody will notice?

Yes exactly. And I just combed our node's logs, and it never happened that we were in this reconnection case with a non-phoenix node. So I really feel that we shouldn't bother with backwards-compat and that nobody will run into this issue in practice.

Better would be to change the feature bit, but that is pretty intrusive as it's been merged in the spec already.

Agreed, using a feature bit here would be annoying...I think it's safe to update this without it and consider it an implementation issue?

On reconnection in the middle of the dual-funding flow, if both
nodes have exchanged the initial `commitment_signed` and node A
had sent its (initial) `tx_signatures` but node B never received
them, both nodes should send a `channel_reestablish` with
`next_funding_txid` set and a `next_commitment_number` of 1 (as
they've already received the commitment transaction for commitment
number 0).

The spec indicates in this case that both nodes should retransmit
their `commitment_signed`, however, as this is only gated on
`next_funding_txid` and not the `next_commitment_number` field.
This may cause implementations which assume that each new
`commitment_signed` is for a new state to fail and potentially
fail the channel.

Instead, we should rely both the presence of `next_funding_txid`
*and* `next_commitment_number` being zero to decide if we need to
resend our `commitment_signed`. Sadly, we cannot rely on just
`next_commitment_number` as that is used to request a force-closure
in a non-standard way to work around implementations not honoring
the `error` message.
Sending a `channel_reestablish` with `next_commitment_number` of
zero is used in practice to request a peer fail a channel and
broadcast the latest state (for implementations which continue to
ignore the `error` message).

Because its used in practice we should document it to avoid
accidentally writing up incompatible things in the future.
@TheBlueMatt TheBlueMatt force-pushed the 2024-12-no-spurious-retransmit branch from 90772ff to d8cfa95 Compare January 14, 2025 16:53
@TheBlueMatt
Copy link
Collaborator Author

Fixed the issue dusty pointed out:

$ git diff-tree -U5 90772ff d8cfa95
diff --git a/02-peer-protocol.md b/02-peer-protocol.md
index e9ba8bc..3e7ecb2 100644
--- a/02-peer-protocol.md
+++ b/02-peer-protocol.md
@@ -2480,16 +2480,12 @@ A node:
     - MUST ignore any redundant `channel_ready` it receives.
   - if `next_commitment_number` is equal to the commitment number of
   the last `commitment_signed` message the receiving node has sent:
     - MUST reuse the same commitment number for its next `commitment_signed`.
   - otherwise:
-    - if `next_commitment_number` is not 1 greater than the
-  commitment number of the last `commitment_signed` message the receiving
-  node has sent:
-      - SHOULD send an `error` and fail the channel.
-    - if it has not sent `commitment_signed`, AND `next_commitment_number`
-    is not equal to 1:
+    - if `next_commitment_number` is not equal to the commitment number of the
+      next `commitment_signed` the receiving node will send:
       - SHOULD send an `error` and fail the channel.
   - if `next_revocation_number` is equal to the commitment number of
   the last `revoke_and_ack` the receiving node sent, AND the receiving node
   hasn't already received a `closing_signed`:
     - MUST re-send the `revoke_and_ack`.

Copy link
Collaborator

@t-bast t-bast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK d8cfa95, thanks!

t-bast added a commit to ACINQ/lightning-kmp that referenced this pull request Jan 17, 2025
As pointed out in lightning/bolts#1214, when
reconnecting a partially signed `interactive-tx` session, we should
set `next_commitment_number` to the current commitment number if we
haven't received our peer's `commit_sig`, which tells them they need
to retransmit it.

More importantly, if our peer behaves correctly and sends us the
current commitment number, we must not think that they're late and
halt, waiting for them to send `error`. This commit fixes that by
allowing our peers to use the current commitment number when they
set `next_funding_txid`.

Note that we keep retransmitting our `commit_sig` regardless of the
value our peer set in `next_commitment_number`, because we need to
wait for them to have an opportunity to upgrade. In a future commit
we will stop sending spurious `commit_sig`.
t-bast added a commit to ACINQ/lightning-kmp that referenced this pull request Jan 17, 2025
As pointed out in lightning/bolts#1214, when
reconnecting a partially signed `interactive-tx` session, we should
set `next_commitment_number` to the current commitment number if we
haven't received our peer's `commit_sig`, which tells them they need
to retransmit it.

More importantly, if our peer behaves correctly and sends us the
current commitment number, we must not think that they're late and
halt, waiting for them to send `error`. This commit fixes that by
allowing our peers to use the current commitment number when they
set `next_funding_txid`.

Note that we keep retransmitting our `commit_sig` regardless of the
value our peer set in `next_commitment_number`, because we need to
wait for them to have an opportunity to upgrade. In a future commit
we will stop sending spurious `commit_sig`.
t-bast added a commit to ACINQ/lightning-kmp that referenced this pull request Jan 17, 2025
We fully implement lightning/bolts#1214 to stop
retransmitting `commit_sig` when our peer has already received it. We
also correctly set `next_commitment_number` to let our peer know whether
we have received their `commit_sig` or not.
@dunxen
Copy link
Contributor

dunxen commented Jan 20, 2025

ACK d8cfa95

pm47 pushed a commit to ACINQ/lightning-kmp that referenced this pull request Jan 20, 2025
As pointed out in lightning/bolts#1214, when
reconnecting a partially signed `interactive-tx` session, we should
set `next_commitment_number` to the current commitment number if we
haven't received our peer's `commit_sig`, which tells them they need
to retransmit it.

More importantly, if our peer behaves correctly and sends us the
current commitment number, we must not think that they're late and
halt, waiting for them to send `error`. This commit fixes that by
allowing our peers to use the current commitment number when they
set `next_funding_txid`.

Note that we keep retransmitting our `commit_sig` regardless of the
value our peer set in `next_commitment_number`, because we need to
wait for them to have an opportunity to upgrade. In a future commit
we will stop sending spurious `commit_sig`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants