
feat: replaced werkzeug/flask with uvicorn/starlette #375

Merged · 41 commits · May 3, 2023
Conversation

viniarck
Member

@viniarck viniarck commented Apr 20, 2023

Closes #347
Closes #372
Closes #301
Closes #168
Closes #280
Closes #225

Summary

See updated changelog file.

Summary of upcoming starlette/uvicorn changes for NApp developers to be aware:

  • Any unit test that calls a REST endpoint of the kytos-ng platform should use an async client; `get_test_client` will return an instance of `httpx.AsyncClient`. Although it's an async client, it can test both sync and async endpoints, but the test case itself must be async. This is a constraint to ensure that pytest and event loops play well together (as mentioned earlier, `unittest.TestCase` isn't compatible).
  • By default, prefer async endpoints over sync endpoints for IO-bound work. If you're using pymongo or any other blocking lib, you should stick with sync endpoints to avoid blocking the event loop, since starlette runs sync endpoints in a threadpool. Notice that race conditions can still happen in async code, but they're easier to manage since context switching is explicit. However, `threading.Lock` isn't compatible with async endpoints, so if you have a dependency that uses a `threading.Lock` and can't be migrated or moved, you should stick with a sync endpoint too.
  • starlette with uvicorn generally outperforms flask with werkzeug in most cases. Latency has also improved even for cases where sync endpoints are used, since uvicorn's threadpool machinery is a bit more optimized.
  • kytos core dependencies will ship httpx, which provides both a sync and an async version of the requests API with roughly the same usability. You don't need to replace existing requests usage in our NApps, but new HTTP calls should preferably use httpx, since by default it works synchronously but is compatible with asyncio too.
  • uvicorn supports auto reload, but auto reloading the entire process isn't trivial, especially considering the foreground mode and how NApps start/stop over their life cycle. uvicorn provides a slightly better experience when serving the UI files, so even though we won't yet have full-blown hot reload for UI changes, the workflow when developing the UI for a NApp won't have that much friction, since a page refresh is expected to pick up any new changes in .kytos files.
  • starlette unlocks a Python-based WebSocket implementation; in the future, we could also allow WebSocket routes for bidirectional communication with certain NApps.
  • The last significant IO-blocking lib we have is pymongo. One day we might introduce an async option with motor, but APM instrumentation doesn't work with it yet, and its implementation just wraps threads with asyncio (it could still be handy, but it's worth waiting to see how it evolves). Other than that, you should be able to reach for asyncio and asyncio-compatible libs for pretty much any other IO parts of our code base.

Benchmark

Here are the benchmark request stress tests with uvicorn (with all of the recent draft PRs) and werkzeug. In summary, uvicorn outperforms in most cases and overall has lower latency metrics for both async and ThreadPool-based (sync) routes:

  • GET topology/v3 500 reqs/sec over 60 secs with uvicorn, sync route:
❯ jq -ncM '{method: "GET", url: "http://localhost:8181/api/kytos/topology/v3/"}' | vegeta attack -format=json -rate 500/1s -duration=60s -timeout=60s | tee results.bin | vegeta report
Requests      [total, rate, throughput]         30000, 500.02, 500.01
Duration      [total, attack, wait]             59.999s, 59.998s, 1.184ms
Latencies     [min, mean, 50, 90, 95, 99, max]  564.334µs, 1.922ms, 1.26ms, 2.258ms, 3.878ms, 18.72ms, 70.42ms
Bytes In      [total, mean]                     258030000, 8601.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:30000  
Error Set:
  • GET topology/v3 500 reqs/sec over 60 secs with werkzeug, sync route (this was the case where werkzeug led to instability):
❯ jq -ncM '{method: "GET", url: "http://localhost:8181/api/kytos/topology/v3/"}' | vegeta attack -format=json -rate 500/1s -duration=60s -timeout=60s | tee results.bin | vegeta report
Requests      [total, rate, throughput]         30000, 500.01, 256.45
Duration      [total, attack, wait]             1m47s, 59.999s, 46.706s
Latencies     [min, mean, 50, 90, 95, 99, max]  2.356ms, 6.704s, 329.991ms, 32.181s, 55.272s, 1m0s, 1m0s
Bytes In      [total, mean]                     224220616, 7474.02
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           91.21%
Status Codes  [code:count]                      0:2636  200:27364  
Error Set:
Get "http://localhost:8181/api/kytos/topology/v3/": read tcp 127.0.0.1:49031->127.0.0.1:8181: read: connection reset by peer
Get "http://localhost:8181/api/kytos/topology/v3/": read tcp 127.0.0.1:36911->127.0.0.1:8181: read: connection reset by peer
Get "http://localhost:8181/api/kytos/topology/v3/": read tcp 127.0.0.1:53479->127.0.0.1:8181: read: connection reset by peer
Get "http://localhost:8181/api/kytos/topology/v3/": read tcp 127.0.0.1:33709->127.0.0.1:8181: read: connection reset by peer
  • POST of_lldp/v1/polling_time 200 reqs/sec over 60 secs with uvicorn, async route (note this endpoint doesn't perform additional IO, which would make the difference even more evident):
❯ jq -ncM '{method: "POST", url: "http://localhost:8181/api/kytos/of_lldp/v1/polling_time", body: { "polling_time": 4 } | @base64, header: {"Content-Type": ["application/json"]}}' | vegeta attack -format=json -rate 200/1s -duration=60s -timeout=120s | tee results.bin | vegeta report
Requests      [total, rate, throughput]         12000, 200.02, 200.01
Duration      [total, attack, wait]             59.996s, 59.995s, 950.506µs
Latencies     [min, mean, 50, 90, 95, 99, max]  469.586µs, 1.12ms, 1.008ms, 1.603ms, 1.805ms, 2.232ms, 17.973ms
Bytes In      [total, mean]                     384000, 32.00
Bytes Out     [total, mean]                     216000, 18.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:12000  
Error Set:
  • POST of_lldp/v1/polling_time 200 reqs/sec over 60 secs with werkzeug, sync route (as expected, latency metrics are worse with werkzeug compared to uvicorn):
❯ jq -ncM '{method: "POST", url: "http://localhost:8181/api/kytos/of_lldp/v1/polling_time", body: { "polling_time": 4 } | @base64, header: {"Content-Type": ["application/json"]}}' | vegeta attack -format=json -rate 200/1s -duration=60s -timeout=120s | tee results.bin | vegeta report
Requests      [total, rate, throughput]         12000, 200.02, 200.01
Duration      [total, attack, wait]             59.998s, 59.995s, 2.914ms
Latencies     [min, mean, 50, 90, 95, 99, max]  715.538µs, 3.044ms, 3.116ms, 3.668ms, 3.831ms, 4.229ms, 19.844ms
Bytes In      [total, mean]                     396000, 33.00
Bytes Out     [total, mean]                     216000, 18.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:12000  
Error Set:
  • POST kytos/flow_manager/v2/flows/{dpid} 200 reqs/sec over 60 secs with uvicorn, sync route (even though the pymongo driver is still IO-blocking, it performed better, especially comparing the mean and the latencies below the 50th percentile):
❯ jq -ncM '{method: "POST", url: "http://localhost:8181/api/kytos/flow_manager/v2/flows/00:00:00:00:00:00:00:01", body: { "force": true, "flows": [ { "priority": 10, "match": { "in_port": 1, "dl_vlan": 100 }, "actions": [ { "action_type": "output", "port": 1 } ] } ] } | @base64, header: {"Content-Type": ["application/json"]}}' | vegeta attack -format=json -rate 200/1s -duration=60s -timeout=120s | tee results.bin | vegeta report
Requests      [total, rate, throughput]         12000, 200.01, 140.00
Duration      [total, attack, wait]             1m26s, 59.996s, 25.718s
Latencies     [min, mean, 50, 90, 95, 99, max]  11.722ms, 12.601s, 632.181ms, 52.68s, 1m8s, 1m21s, 1m24s
Bytes In      [total, mean]                     432000, 36.00
Bytes Out     [total, mean]                     1464000, 122.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      202:12000  
Error Set:


  • POST kytos/flow_manager/v2/flows/{dpid} 200 reqs/sec over 60 secs with werkzeug:
❯ jq -ncM '{method: "POST", url: "http://localhost:8181/api/kytos/flow_manager/v2/flows/00:00:00:00:00:00:00:01", body: { "force": true, "flows": [ { "priority": 10, "match": { "in_port": 1, "dl_vlan": 100 }, "actions": [ { "action_type": "output", "port": 1 } ] } ] } | @base64, header: {"Content-Type": ["application/json"]}}' | vegeta attack -format=json -rate 200/1s -duration=60s -timeout=120s | tee results.bin | vegeta report
Requests      [total, rate, throughput]         12000, 200.02, 98.24
Duration      [total, attack, wait]             2m2s, 59.995s, 1m2s
Latencies     [min, mean, 50, 90, 95, 99, max]  75.724ms, 58.496s, 1m0s, 1m17s, 1m23s, 1m47s, 1m58s
Bytes In      [total, mean]                     444000, 37.00
Bytes Out     [total, mean]                     1464000, 122.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      202:12000  
Error Set:

Local Tests

I ran local tests with all linked PRs (check out their PR summaries for more info).

End-to-End Tests

E2E tests with this PR and related starlette PRs can be found here; they're passing.

viniarck added 20 commits April 5, 2023 13:42
removed flask, flask-socketio, flask_cors
api_client
dead_letter
auth
Deleted autouse ev_loop fixture to avoid ev loop conflicts
Replaced werkzeug with uvicorn
Adapted APIServer methods accordingly
Used httpx when fetching ui web latest release tag
Refactored DeadLetter endpoints to be async
Introduced  to validate async routes
Broken down functions for reusability
@viniarck viniarck requested a review from a team as a code owner April 20, 2023 19:12
@viniarck viniarck marked this pull request as draft April 20, 2023 19:13
@viniarck viniarck marked this pull request as ready for review May 1, 2023 20:18