-
-
Notifications
You must be signed in to change notification settings - Fork 674
Holes
Following on from the state resolution v2 blog post we did, the Dendrite team is here to try to explain another mystical aspect of Federation: holes.
Events in a room form a Directed Acyclic Graph (DAG), where each event "points" to the previous event before it.
A ----- B ----- C
[] [A] [B] prev_events
Events can also branch out and form forks, which can then later be merged:
.--- D --.
| [B] |
| |
A ----- B -+--- C --+--- E
[] [A] [B] [C,D] prev_events
When a server receives an event, it probably has the prev_events
and therefore it can perform all the calculations
it needs to. However, if it doesn't have them it can request them via /get_missing_events:
A ---- B ---- C ---- D ---- E
Server 1 has? ✓ ✓ ✓
Server 2 has? ✓ ✓ ✓ ✓ ✓
Server 1 has A,B and then receives E. It needs to request C,D:
POST /_matrix/federation/v1/get_missing_events/%21SomeRoom%3Amatrix.org HTTP/1.1
Content-Type: application/json
{
"limit": 10,
"min_depth": 0,
"earliest_events": [
"$event_B:example.org"
],
"latest_events": [
"$event_E:example.org"
]
}
NB: Be aware of the misleading key names. /get_missing_events
does not return events "between" the two events given, but instead
walks the DAG, backwards from latest_events
and ignoring earliest_events
. This distinction is important for DAGs like this:
.----- D ---- E ---- F Server 2
A --- B --|
`----- C Server 1
Server 1 has A,B,C and then receives F
Server 2 has A,B,D,E,F
Server 1 needs to request D,E
POST /_matrix/federation/v1/get_missing_events/%21SomeRoom%3Amatrix.org HTTP/1.1
Content-Type: application/json
{
"limit": 10,
"min_depth": 0,
"earliest_events": [
"$event_C:example.org"
],
"latest_events": [
"$event_F:example.org"
]
}
If /get_missing_events
returned events "between" the two events, it would not return D,E as they are not between C,F in the DAG.
This API works well when the server is missing a few events, but sometimes the server is missing a lot of events. When this happens,
the server may decide to give up trying to call /get_missing_events
over and over and instead a hole is born.
A hole refers to the "hole" in the DAG that is formed when a server has missed many events from Federation and decides to give up
backfilling via /get_missing_events
e.g. the server has been offline for a long time.
These DAGs can have many different shapes: (Server previously had A,B,C and wants to now store X,Y,Z) - we'll refer to these events throughout.
A A A
| / \ |
B B D B
| | | |
C C E C
| | / \
... ... ... ...
| | \ /
X X X
| | |
Y Y Y
| | |
Z Z Z
Linear Early fork Fork
Servers need to care about holes so they can perform auth checks and verify the room state for all the messages it wants to persist. This means it needs to know the state
of the room before X to work out if X is allowed, and so on for Y and Z. Clients need to care about holes so they can show the gap on their UI and backfill via /messages
requests.
Servers need to perform auth checks to work out if the event being processed is "allowed" according to the rules of the room. These checks are done on a snapshot of the room state at a particular point in time: just before the event being processed is added to the DAG.
A frequent gotcha is the distinction between before/after/at when referring to the "state of the room" for a particular event. The state "at" an event is an ambigous term as it could refer to before or after the event has been applied. Ideally, no code or specification should refer to the state "at" an event. The terms "before" and "after" are also somewhat suggestive and unfortunately the intuition is wrong: the state after B is not always the state before C.
Consider the following graphs, one linear and one fork:
Before A ___ ________ Before B
After A ____A A B ________ After B - no state from A involved, B applied.
Before B ___| \ /__________ Before C - state resolution over A,B applied, C not involved.
After B ____B C___________ After C - state resolution over A,B applied, C applied.
Before C ___|
After C ____C
For the linear DAG, the state after B is the same as the state before C. However, for the fork DAG, the state after B has no knowledge of A and hence isn't included. The state before C is the result of the state resolution algorithm being applied over the two branches A,B and hence will result in different state.
- We want to persist X,Y,Z
- You need to know the state before X.
- Get the state before X (e.g via
/state_ids
)
This is intuitive and simple but unfortunately is wrong because it means that we are entirely trusting the room state in /state_ids
. Consider the example:
TODO: Explain an attack.
We need to perform state resolution to resolve this. So instead, server implementations should do the following...
- We want to persist X,Y,Z
- You need to know the state before X. Steps:
- Get the state AFTER X's prev_events
- Apply your current known state after event C.
- Apply state resolution over it.
- You now have the state before X.
Yep, we mix in what we previously knew about the room into the state resolution algorithm, and rely on the algorithm to come to a sensible conclusion. Effectively, we're pretending that C is a prev_event to X, even when it isn't! This fixes the attack whereby:
TODO: Explain how this fixes the attack
TODO
- It can still be referenced in the DAG by other servers
- But also, just because the event is rejected doesn't mean its useless
- (eg the prev events might still be useful for ordering purposes, and for figuring out state of any events pointing at it)