
Tracker pool and dirty list #79

Merged

Conversation

@pzel (Contributor) commented Jun 8, 2017

Tracker Pool & dirty_list API

Summary

This work is aimed at increasing the capacity of applications using
Phoenix Presence.

Specifically, it addresses two issues:

  1. A single Tracker server becoming a bottleneck under high throughput
  2. Tracker.list calls invoking GenServer.call to get a list of
    presences

The first issue is resolved by starting a pool of named Tracker.Shards,
and dispatching calls to them based on the topic in question. The default
pool size is 1, and therefore the current behavior of Phoenix.Presence is
not affected.
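
As a rough illustration (not taken from this PR's diff; the module and function names below are hypothetical, and the use of :erlang.phash2/2 is an assumption about the hashing scheme), topic-based dispatch can be done by hashing each topic onto a shard index, so every operation on a given topic consistently reaches the same shard:

defmodule ShardDispatch do
  # Hypothetical sketch of per-topic dispatch to a pooled shard.
  # :erlang.phash2/2 deterministically maps the topic onto an integer in
  # 0..pool_size-1, so repeated calls for the same topic always yield the
  # same index, and therefore the same shard process.
  def shard_index(topic, pool_size) when is_integer(pool_size) and pool_size > 0 do
    :erlang.phash2(topic, pool_size)
  end
end

# ShardDispatch.shard_index("rooms:lobby", 36) always returns the same index for this topic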

The second issue is resolved by introducing a Tracker.dirty_list function
with the same API as Tracker.list, but with much less overhead and less
precise results.

Comments & Caveats

This likely addresses #71.

A corresponding change will need to be made to the Phoenix Framework in order
for the default Phoenix.Presence handler to expose the new
Tracker.dirty_list functionality. In the meantime, applications that need
to use dirty_list can implement it directly, as shown below:

# file: myapp/lib/myapp/presence_tracker.ex
#

defmodule MyApp.PresenceTracker do
  use Phoenix.Presence,
    otp_app: :myapp,
    pubsub_server: MyApp.PubSub,
    pool_size: 128

  def dirty_list(%Phoenix.Socket{topic: topic}), do: dirty_list(topic)
  def dirty_list(topic), do: dirty_list(__MODULE__, topic)

  def dirty_list(module, topic) do
    grouped =
      module
      |> Phoenix.Tracker.dirty_list(topic)
      |> group()
    module.fetch(topic, grouped)
  end

  defp group(presences) do
    presences
    |> Enum.reverse()
    |> Enum.reduce(%{}, fn {key, meta}, acc ->
      Map.update(acc, to_string(key), %{metas: [meta]}, fn %{metas: metas} ->
        %{metas: [meta | metas]}
      end)
    end)
  end
end
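
With a module like the one above in place, callers pass either a %Phoenix.Socket{} or a raw topic string (the topic and key below are only examples):

# from a channel or any other application code
MyApp.PresenceTracker.dirty_list("rooms:lobby")
# => %{"some_user_id" => %{metas: [meta1, ...]}, ...}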

Graphs from the application running under target production load

The graphs compare the same scenario under different configurations of the proposed changes; the default configuration is one of them.

Pool Size Load Test Results:

  • 150K channels connected via websocket;
  • A single channel lifetime uniformly distributed between 0 and 300 seconds;
  • Every channel subscribes to presence updates of 100 other channels.

At pool_size = 1, the track time tail latencies are much higher than the corresponding metric at pool_size = 36.

Pool size: 1

[graph: pool size 1]

Pool size: 36

[graph: pool size 36]

List Function Load Test Results

  • 137.5K channels connected via websocket;
  • A single channel lifetime uniformly distributed between 0 and 300 seconds;
  • Every channel subscribes to presence updates of 500 other channels.

List times are dramatically lower when using dirty_list, due to the complete bypass of GenServer.call.

Tracker.list @ pool size = 128

[graph: presence list]

Tracker.dirty_list @ pool size = 128

[graph: presence dirty_list]

@pzel pzel force-pushed the tracker-pool-and-dirty-list branch 2 times, most recently from 10324f2 to 14e4512 on June 8, 2017 17:37
@pzel pzel force-pushed the tracker-pool-and-dirty-list branch from 14e4512 to 7206a18 on July 4, 2017 12:39
@pzel pzel force-pushed the tracker-pool-and-dirty-list branch from 7206a18 to 1c9ab71 on July 4, 2017 12:53
@pzel (Contributor, Author) commented Jul 4, 2017

Hi @chrismccord!
We rebased this PR on top of your clouds work on the Tracker. We think it's ready to go -- please take a look when you find a moment.

Happy 4th! 🇺🇸

@josevalim (Member)

Hi @pzel, let me hijack this thread briefly. I have just watched your excellent ElixirConf talk. At some point in the talk you mentioned the use of phx_requests for tracking subscription and I want to be sure I understood the design. Previously, you would join a channel for every friend you had. Then you changed it to have a single channel that subscribes to the topic of every friend. Is this correct?

@josevalim (Member)

I just got to the questions part of the talk and Chris mentioned fastlaning and I would like to point out you can still fastlane if you pass the proper options on subscribe.

@chrismccord, it may be worth adding a subscribe function to channels that always passes the proper fastlane options. I feel like we have talked about it in the past. Any downsides?

@OvermindDL1

Then you changed it to have a single channel that subscribes to the topic of every friend.

That is what I do, prevents process-explosion. ^.^;

I just got to the questions part of the talk and Chris mentioned fastlaning and I would like to point out you can still fastlane if you pass the proper options on subscribe.

Ooo, I did not know about those options...

it may be worth adding a subscribe function to channels that always pass the proper fastlane options.

Please yes? :-)

@chrismccord (Member)

@josevalim I implemented this but then reverted because the clients are not aware of the fastlaned topics. Broadcasts are matched on the client via the topic, so the only way to handle fastlaned messages would be to use socket.onMessage and try to parse out what special topics/events you are looking for. So including it as a generalized, easy-to-use option doesn't seem straightforward enough to me as a feature. As you noted, it's all still possible today by passing the options yourself if you have special handling needs. Are there other options for handling on the client that I'm missing?

@chrismccord (Member)

I'll also note that the "subscribe to many topics via a single channel process" approach is definitely a pattern we promote for different use cases. Some references to this in the guides would be nice.

@josevalim (Member) commented Sep 30, 2017 via email

@chrismccord (Member) commented Sep 30, 2017 via email

@pzel (Contributor, Author) commented Oct 1, 2017

@josevalim: Chris pretty much summed up what we ran into.

We developed a fastlane module that does some sanitization, and plugged it in globally. It worked great, except for the fact that the topic was now hardcoded to the topic of the originator of the broadcast. The clients were not listening to these (they only reacted to messages coming from OurPrefix:MyId, their home-channel topic), and so we had to move back to fastlane: nil to maintain backward compatibility.

This is what our setup looked like before we had to revert:

supervisor(Phoenix.PubSub.PG2,
  [OurApp.PubSub,
   [fastlane: OurApp.PrivacyPreservingFastLane,
    broadcast_strategy: Phoenix.PubSub.Strategy.Serial]]),

@lucaspolonio commented Apr 16, 2018

Hey @chrismccord, are there any plans to merge this PR? I'd love to see it released. Phoenix.Presence really suffers under high throughput right now and I believe this PR addresses most of the issues.

@chrismccord (Member)

Yes, it is still on my plate!

@chrismccord chrismccord merged commit 1c9ab71 into phoenixframework:master Jul 18, 2018
@chrismccord (Member)

Sorry this took so long to get merged in. Thanks so much!!! Note: I have decided to remove the dirty_list API for now so as not to increase the API surface area. Thank you! ❤️❤️❤️🐥🔥

@pzel (Contributor, Author) commented Jul 18, 2018

Sweet! Hope people enjoy the speed benefits!

@indrekj (Contributor) commented Jun 21, 2019

@pzel do you have recommendations on how to choose a pool size for the tracker?

Also, I understand dirty_list was removed from the tracker API. I still see it in the Shard module. Is it possible to still use it?

@pzel (Contributor, Author) commented Jun 21, 2019

@indrekj I've been out of the loop regarding PubSub performance for a while now, so I can't make any recommendations about how to measure total system performance for your use case. However, I think that using a pool size equal to the number of CPUs on your target deployment machine is a reasonable choice. If you don't know this number, use 16 and see how the system behaves ;)
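
As a rough sketch of that heuristic (not code from this PR; MyApp.Tracker and its option handling are illustrative), a tracker's start_link can derive pool_size from the number of online schedulers, which normally equals the CPU count:

defmodule MyApp.Tracker do
  @behaviour Phoenix.Tracker

  def start_link(opts) do
    opts =
      opts
      |> Keyword.put_new(:name, __MODULE__)
      |> Keyword.put_new(:pubsub_server, MyApp.PubSub)
      # one shard per online scheduler (usually one per CPU core);
      # fall back to a fixed number such as 16 if the target hardware is unknown
      |> Keyword.put_new(:pool_size, System.schedulers_online())

    Phoenix.Tracker.start_link(__MODULE__, opts, opts)
  end

  def init(opts) do
    server = Keyword.fetch!(opts, :pubsub_server)
    {:ok, %{pubsub_server: server, node_name: Phoenix.PubSub.node_name(server)}}
  end

  def handle_diff(_diff, state), do: {:ok, state}
end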

Regarding dirty_list, I think the only way to access it now is to keep a local fork of phoenix_pubsub and add
https://github.com/phoenixframework/phoenix_pubsub/pull/79/files#diff-fc3d1e8efd95176c339eaafe5ab24ad6R232 to the tracker source. Not elegant, I know, but I understand why the Phoenix project wouldn't want a 'broken' function in their API.

@chasers commented Oct 15, 2019

If anyone else needs this, you don't need to fork it; just implement a dirty_list function in your app that looks like list, except it uses Shard.dirty_list. Like:

def dirty_list(tracker_name, topic) do
  pool_size = your_pool_size

  tracker_name
  |> Phoenix.Tracker.Shard.name_for_topic(topic, pool_size)
  |> Phoenix.Tracker.Shard.dirty_list(topic)
end

@indrekj (Contributor) commented Oct 15, 2019

I also made a PR #127 which removes the need for dirty_list.

@chasers commented Oct 15, 2019

@indrekj I saw that! I was trying not to diverge from what's here, as I'm pretty new to all this. I hope they merge it! The whole thing is eventually consistent, so why serialize the reads in the first place?
