Gracefully recover from wedged session (discard session for next distribution, or reshare as m.room.key) #1992

pmaier1 · 2023-08-17T12:38:56Z

High Level:

Explore a way to (selectively!) re-request encryption keys to increase crypto reliability.
Given that olm session can get wedged, could it be possible find a way to recover from that: i.e be able to get back the failed to decrypt message.

IT IS NOT the re-implementation of key-requesting that was terrible (creating a lot of traffic and slowing down everything)

Scenario:

As a sender I am notified that a recipient wasn't able to decrypt a message I sent
Given that this user was supposed to get that key, we could reset the olm session then re-share the current outbound session as a m.room_key (not a forwarded key)

The recipient would send the to_device failed to decrypt ack only to the exact sender of the to device! (different from key requesting that would ask to all the current user devices)

The text was updated successfully, but these errors were encountered:

richvdh · 2023-10-19T22:00:52Z

@pmaier1 I'm not really sure what's to be done here. We already have recovery from wedged olm sessions. Can you explain what precise scenario we're trying to cover here?

pmaier1 · 2023-10-25T15:30:35Z

Well, to my understanding there still are cases where a user has to manually type /discardsession in order to recover. The idea here was to automate this, if possible.

richvdh · 2023-10-25T15:47:32Z

Hrm. /discardsession replaces the megolm session, and megolm sessions can't really get "wedged" in the same way as olm sessions. /discardsession won't do anything to help with a wedged olm session.

Generally I'd say we should figure out what cases a /discardsession actually helps with, and propose real fixes for them on a case-by-case basis, rather than automating a /discardsession (which I suspect would be hard, in any case).

Any idea what those cases are?

kegsay · 2024-01-17T15:42:35Z

We only expect to see this when clients or servers need to rollback their database.

There are other cases where /discardsession will help fix things, but those other cases should be fixable, whereas server/client rollbacks aren't.

pmaier1 · 2024-01-19T16:04:12Z

The solution to this ticket is supposed to also solve #2155.

BillCarsonFr · 2024-02-26T15:33:41Z

We think this would be usefull for robustness. But for now we focus efforts on finding root causes of wedging

richvdh · 2024-02-28T17:25:29Z

It isn't made explicit anywhere above: I believe the intention of this issue is to improve the current olm session recovery: the current implementation does nothing to help with Olm messages that have already been sent. (Which is why /discardsession helps: it ensures that the next message the user sends will cause a new megolm keyshare, over a new Olm session.)

As a recipient, if we detect a wedged olm session (which causes us to make a new olm session), we need to tell the sender about the situation so that they know they still need to send us the key. Ideally, the sender needs to send us all megolm keys they already tried to send, but at the very least they need to mark all existing keys as "not yet shared with this device".

richvdh · 2024-04-10T15:56:09Z

As a partial solution to this, we could not worry about past messages, but at least improve the situation for future messages.

Currently: the olm-session unwedging doesn't help with existing megolm sessions until the sending user does a /discardsession, or the megolm session is rotated for another reason (eg, someone leaving the room).

Instead we could:

When a recipient receives an undecryptable Olm message: instead of sending an Olm-encrypted m.dummy message (as now), they instead send a more explicit "failure" message.
When the sender receives a "failure" message, then: for all active outgoing megolm sessions, they clear the flag that says the session has been successfully shared with the recipient device.
This will then mean that, next time the sender sends a message, the megolm session will be *re-*shared using a new Olm session (though note that it will only be shared in its ratcheted state, so this won't help decrypt messages which have already been sent). In other words, we are automating the work currently done by a /discardsession in this scenario, in a rather more efficient way.

⚠️ we would need to be careful to still rotate the megolm session if the device leaves the room after sending the "failure" notification, in case it was lying about the failure.

richvdh · 2024-04-11T09:54:41Z

As a partial solution to this, we could not worry about past messages, but at least improve the situation for future messages....

I have split this out to #2389.

BillCarsonFr · 2024-04-24T12:54:26Z

Some more context for the record.
Issues like the following element-hq/synapse#17117 (following mx.org outage), could result also in dlivery failures. This is a bug that need to be fixed, but currently there is no way to recover from it (until next session rotation)

kegsay · 2024-04-25T16:41:55Z

I asked some questions about how we recover from wedged Olm sessions. The purpose of sending an m.dummy Olm message is:

to try to make the receiver use the new session and hence not cause the sender to get a UTD from them by continuing to use the wedged session

There are potentially problems around this mechanism though. Rust SDK seems to not follow the spec which says:

If a client has multiple sessions established with another device, it should use the session from which it last received and successfully decrypted a message.

It seems to use creation time instead. We need to align on which is correct. In addition, vdh points out that client session timestamps have been corrupted so we likely need to consider that too.

In all cases, recovering from a wedged OIm session involves a new /keys/claim which itself may cause UTDs which is particularly ironic, given we're doing this to fix UTDs.

BillCarsonFr · 2024-05-06T15:19:25Z

Crosslink matrix-org/matrix-rust-sdk#3356

pmaier1 mentioned this issue Aug 17, 2023

Unable To Decrypt meta issue #245

Open

82 tasks

pmaier1 added the Team: Crypto label Aug 17, 2023

pmaier1 changed the title ~~🕛 Gracefully recover from wedged session (discard session for next distribution, or reshare as m.room.key)~~ Gracefully recover from wedged session (discard session for next distribution, or reshare as m.room.key) Aug 28, 2023

richvdh mentioned this issue Jan 12, 2024

Rolling a homeserver's database back via backup could cause duplicate OTKs and hence UISIs #2155

Open

richvdh mentioned this issue Feb 28, 2024

Users whose servers were unreachable when you logged in will send you undecryptable messages #2314

Open

richvdh mentioned this issue Apr 11, 2024

Ensure that existing megolm sessions are re-shared after a wedged olm session #2389

Closed

richvdh added the A-E2EE label Apr 11, 2024

kegsay mentioned this issue Apr 25, 2024

Epic: One-Time Keys can desync between client and server causing UTDs #2406

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Gracefully recover from wedged session (discard session for next distribution, or reshare as m.room.key) #1992

Gracefully recover from wedged session (discard session for next distribution, or reshare as m.room.key) #1992

pmaier1 commented Aug 17, 2023 •

edited by BillCarsonFr

Loading

richvdh commented Oct 19, 2023

pmaier1 commented Oct 25, 2023

richvdh commented Oct 25, 2023 •

edited

Loading

kegsay commented Jan 17, 2024 •

edited

Loading

pmaier1 commented Jan 19, 2024

BillCarsonFr commented Feb 26, 2024

richvdh commented Feb 28, 2024

richvdh commented Apr 10, 2024 •

edited

Loading

richvdh commented Apr 11, 2024

BillCarsonFr commented Apr 24, 2024

kegsay commented Apr 25, 2024 •

edited

Loading

BillCarsonFr commented May 6, 2024

Gracefully recover from wedged session (discard session for next distribution, or reshare as m.room.key) #1992

Gracefully recover from wedged session (discard session for next distribution, or reshare as m.room.key) #1992

Comments

pmaier1 commented Aug 17, 2023 • edited by BillCarsonFr Loading

richvdh commented Oct 19, 2023

pmaier1 commented Oct 25, 2023

richvdh commented Oct 25, 2023 • edited Loading

kegsay commented Jan 17, 2024 • edited Loading

pmaier1 commented Jan 19, 2024

BillCarsonFr commented Feb 26, 2024

richvdh commented Feb 28, 2024

richvdh commented Apr 10, 2024 • edited Loading

richvdh commented Apr 11, 2024

BillCarsonFr commented Apr 24, 2024

kegsay commented Apr 25, 2024 • edited Loading

BillCarsonFr commented May 6, 2024

pmaier1 commented Aug 17, 2023 •

edited by BillCarsonFr

Loading

richvdh commented Oct 25, 2023 •

edited

Loading

kegsay commented Jan 17, 2024 •

edited

Loading

richvdh commented Apr 10, 2024 •

edited

Loading

kegsay commented Apr 25, 2024 •

edited

Loading