Initial sync uses a ton of memory on Synapse's synchrotron #258

Open
turt2live opened this issue Aug 17, 2023 · 7 comments

Comments

@turt2live
Member

I set up a brand new sliding sync proxy to test Element Android X, and when it actually started the poller for the first time it slowly ate all 7.5gb of memory I was able to give the synapse synchrotron worker, eventually causing OOM issues.

For comparison, an initial sync for my account on Element Desktop only uses 2-3gb of Synapse's synchrotron.

I suspect this is related to the use of an inline filter on the initial sync, but haven't confirmed. Using a pre-uploaded filter might yield better results, as I believe Synapse does the filtering in-database when using a pre-uploaded filter.

@turt2live
Member Author

[synchrotron_1] 2023-08-17 21:50:00,271 - synapse.access.http.8050 - 465 - INFO - GET-1- 172.18.0.3 - 8050 - {@travis:t2l.io} Processed request: 646.819sec/-345.729sec (269.006sec, 42.309sec) (133.925sec/1309.064sec/34008) 0B 200! "GET /_matrix/client/r0/sync?timeout=0&filter=%7B%22room%22%3A%7B%22timeline%22%3A%7B%22limit%22%3A1%7D%7D%7D HTTP/1.0" "sync-v3-proxy-0.99.4" [1093305 dbevts]

final tally for memory usage was 11gb.

@DMRobertson
Contributor

Sounds like a Synapse bug to me.

func (v *HTTPClient) createSyncURL(since string, isFirst, toDeviceOnly bool) string {
	qps := "?"
	if isFirst { // first time polling for v2-sync in this process
		qps += "timeout=0"
	} else {
		qps += "timeout=30000"
	}
	if since != "" {
		qps += "&since=" + since
	}
	// To reduce the likelihood of a gappy v2 sync, ask for a large timeline by default.
	// Synapse's default is 10; 50 is the maximum allowed, by my reading of
	// https://github.com/matrix-org/synapse/blob/89a71e73905ffa1c97ae8be27d521cd2ef3f3a0c/synapse/handlers/sync.py#L576-L577
	// NB: this is a stopgap to reduce the likelihood of hitting
	// https://github.com/matrix-org/sliding-sync/issues/18
	timelineLimit := 50
	if since == "" {
		// First time the poller has sync v2-ed for this user
		timelineLimit = 1
	}
	room := map[string]interface{}{}
	room["timeline"] = map[string]interface{}{"limit": timelineLimit}
	if toDeviceOnly {
		// no rooms match this filter, so we get everything but room data
		room["rooms"] = []string{}
	}
	filter := map[string]interface{}{
		"room": room,
	}
	filterJSON, _ := json.Marshal(filter)
	qps += "&filter=" + url.QueryEscape(string(filterJSON))
	return v.DestinationServer + "/_matrix/client/r0/sync" + qps
}
is the logic for generating a sync URL.

We use a filter for two purposes:

  • setting a timeline limit (1 on the initial sync, 50 otherwise)
  • requesting no rooms, when we start a poller and already have another poller running for that user.
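
For reference, the filter JSON this logic produces can be reproduced in isolation. The sketch below is a standalone rewrite of only the filter-building part of `createSyncURL`; the helper name `buildFilterParam` and the `since` token `s123` are hypothetical and not part of the proxy. For a first sync it yields the same `filter=` value that appears in the access log line above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// buildFilterParam is a hypothetical helper that mirrors only the
// filter-building part of createSyncURL quoted above. It returns the
// raw filter JSON and its query-escaped form.
func buildFilterParam(since string, toDeviceOnly bool) (raw, escaped string) {
	timelineLimit := 50
	if since == "" {
		// First sync for this user: ask for a single timeline event per room.
		timelineLimit = 1
	}
	room := map[string]interface{}{
		"timeline": map[string]interface{}{"limit": timelineLimit},
	}
	if toDeviceOnly {
		// An empty room list matches no rooms.
		room["rooms"] = []string{}
	}
	// The error is ignored here, as in the original snippet.
	b, _ := json.Marshal(map[string]interface{}{"room": room})
	return string(b), url.QueryEscape(string(b))
}

func main() {
	raw, escaped := buildFilterParam("", false)
	fmt.Println(raw)     // {"room":{"timeline":{"limit":1}}}
	fmt.Println(escaped) // %7B%22room%22%3A%7B%22timeline%22%3A%7B%22limit%22%3A1%7D%7D%7D

	raw, _ = buildFilterParam("s123", true)
	fmt.Println(raw) // {"room":{"rooms":[],"timeline":{"limit":50}}}
}
```

In every variant the filter only carries a timeline limit and, optionally, an empty room list.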

@DMRobertson
Contributor

> Using a pre-uploaded filter might yield better results, as I believe Synapse does the filtering in-database when using a pre-uploaded filter.

I can't see any difference in the filtering logic for these two cases: https://github.com/matrix-org/synapse/blob/54317d34b76adb1e8f694acd91f631b3abe38947/synapse/rest/client/sync.py#L166-L187
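
As a rough illustration of how the two cases are told apart: the Matrix client-server API lets the `filter` query parameter be either a filter ID or an inline JSON object, and the server distinguishes them by whether the value starts with `{`. The Go sketch below only models that dispatch; the function name `filterKind` is hypothetical, and treating an empty string as "no filter given" is an assumption made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// filterKind is a hypothetical helper that classifies the value of the
// `filter` query parameter on /sync.
func filterKind(param string) string {
	switch {
	case param == "":
		// Assumption for this sketch: empty means the parameter was absent.
		return "default"
	case strings.HasPrefix(param, "{"):
		// Inline JSON filter, as sent by the proxy.
		return "inline"
	default:
		// Anything else is treated as the ID of a pre-uploaded filter.
		return "filter_id"
	}
}

func main() {
	fmt.Println(filterKind(`{"room":{"timeline":{"limit":1}}}`)) // inline
	fmt.Println(filterKind("66696p746572"))                      // filter_id
	fmt.Println(filterKind(""))                                  // default
}
```

Either way the server ends up holding a filter definition, which fits the observation that the filtering logic is the same in both cases.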

@turt2live
Member Author

From the sliding sync internal room, a realization: the filter sliding sync uses does not lazy load room members, while Element Desktop's does. This almost certainly explains the 11gb of memory required to process the initial sync.

If it's not strictly required to have all the member events, I'd suggest the proxy aggressively lazy load members.
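
For illustration, a filter that lazy loads members might look like the output of the sketch below. `lazy_load_members` is a field of the room state filter in the Matrix client-server API; the helper name `lazyLoadFilter` is hypothetical, and this is only a sketch of the suggestion, not something the proxy does.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lazyLoadFilter is a hypothetical helper that builds a sync filter with
// a timeline limit and lazy-loaded room members.
func lazyLoadFilter(timelineLimit int) string {
	filter := map[string]interface{}{
		"room": map[string]interface{}{
			"timeline": map[string]interface{}{"limit": timelineLimit},
			// Ask the server to omit membership events that are not
			// needed for the events in the response.
			"state": map[string]interface{}{"lazy_load_members": true},
		},
	}
	// Marshalling maps of basic values does not fail, so the error is ignored.
	b, _ := json.Marshal(filter)
	return string(b)
}

func main() {
	fmt.Println(lazyLoadFilter(1))
	// {"room":{"state":{"lazy_load_members":true},"timeline":{"limit":1}}}
}
```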

@kegsay
Member

kegsay commented Aug 21, 2023

The proxy needs the member events at every event in order to locally calculate history visibility. E.g., consider:

  • Alice joins the room. We're lazy loading so do not see Bob in the room.
  • 20 events go by in the room.
  • Bob starts using the proxy. As we already have a snapshot for the room, we do not create a new one. We do prepend unknown state events, which will include his join event.
  • Bob requests a timeline limit of 10. From the proxy's pov, Bob is not in the room prior to his sync so he sees nothing prior to his initial v2 sync.

The proxy was not designed to handle partial room state, and adding that in would be a significant, risky and costly change.
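
The scenario above can be sketched as a toy model. Everything here is hypothetical (the `event` type, `visibleTimeline`, and the use of a single "join position" per user); real history visibility also depends on the room's `m.room.history_visibility` setting, and the proxy's actual data structures differ. The sketch only shows how learning about Bob's join late hides the earlier timeline from him.

```go
package main

import "fmt"

// event is a hypothetical, heavily simplified timeline event.
type event struct {
	pos int // position in the room timeline
}

// makeTimeline returns n events at positions 1..n.
func makeTimeline(n int) []event {
	out := make([]event, 0, n)
	for i := 1; i <= n; i++ {
		out = append(out, event{pos: i})
	}
	return out
}

// visibleTimeline returns up to `limit` of the most recent events that sit at
// or after the position where the proxy believes `user` joined the room.
func visibleTimeline(timeline []event, joinPos map[string]int, user string, limit int) []event {
	joined, ok := joinPos[user]
	if !ok {
		return nil // the proxy does not know the user is in the room at all
	}
	var out []event
	for _, ev := range timeline {
		if ev.pos >= joined {
			out = append(out, ev)
		}
	}
	if len(out) > limit {
		out = out[len(out)-limit:]
	}
	return out
}

func main() {
	timeline := makeTimeline(20) // the 20 events that went by in the room

	// Full member list: Bob's join is known at its true position.
	full := map[string]int{"@alice:example.org": 0, "@bob:example.org": 0}
	// Lazy loaded: Bob's join is only learned when his poller starts, after event 20.
	lazy := map[string]int{"@alice:example.org": 0, "@bob:example.org": 21}

	fmt.Println(len(visibleTimeline(timeline, full, "@bob:example.org", 10))) // 10
	fmt.Println(len(visibleTimeline(timeline, lazy, "@bob:example.org", 10))) // 0
}
```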

@kegsay
Member

kegsay commented Mar 5, 2024

The scenario above is mitigated somewhat because of the cache invalidation work, coupled with #366 - the proxy tries really hard NOT to do history visibility checks so it will cut off serving events up to the user's join event.

There are still numerous pitfalls:

  • required_state: [["m.room.member","*"]] needs the entire member list to serve up the response, and clients need this for E2EE.
  • If there are no new events when Bob starts syncing, we will correctly update the current room state with his join event (as we always check whether the state block has new events), but we won't touch the timeline. From the proxy's pov, Bob will then not be able to see ANY events in the lazy loaded room, as there are no timeline events with Bob as a joined user.

We ultimately need the entire member list. Synapse ideally should stream the list back if it's too large.

@kegsay
Member

kegsay commented Apr 10, 2024

To further emphasise why we cannot use Synapse lazy loading: it's not even accurate. See element-hq/synapse#17050 and related issues.
