feat: support continuous batching in llama.cpp backend #659

Merged (2 commits) on Oct 29, 2023

@wsxiaoys wsxiaoys commented Oct 28, 2023

Load test on a Modal T4 with StarCoder-1B (k6, tests/*.loadtest.js):

0.4.0 (ctranslate2)

     ✗ success
      ↳  95% — ✓ 60 / ✗ 3

     checks.........................: 95.23% ✓ 60       ✗ 3
     data_received..................: 63 kB  2.1 kB/s
     data_sent......................: 16 kB  506 B/s
     http_req_blocked...............: avg=24.06ms  min=0s      med=1µs     max=322.84ms p(90)=169.29ms p(95)=170.47ms
     http_req_connecting............: avg=10.3ms   min=0s      med=0s      max=83.21ms  p(90)=80.51ms  p(95)=81ms
   ✗ http_req_duration..............: avg=2.87s    min=964.2ms med=3.1s    max=4.18s    p(90)=3.73s    p(95)=4.04s
       { expected_response:true }...: avg=2.82s    min=964.2ms med=3.05s   max=4.18s    p(90)=3.42s    p(95)=3.9s
   ✗ http_req_failed................: 4.76%  ✓ 3        ✗ 60
     http_req_receiving.............: avg=90.54ms  min=55µs    med=43.88ms max=576.66ms p(90)=193.14ms p(95)=336.46ms
     http_req_sending...............: avg=212.76µs min=72µs    med=208µs   max=446µs    p(90)=277.4µs  p(95)=312.4µs
     http_req_tls_handshaking.......: avg=11.38ms  min=0s      med=0s      max=91.95ms  p(90)=88.63ms  p(95)=89.04ms
     http_req_waiting...............: avg=2.78s    min=920ms   med=2.99s   max=4.04s    p(90)=3.57s    p(95)=3.93s
     http_reqs......................: 63     2.049418/s
     iteration_duration.............: avg=3.39s    min=1.63s   med=3.6s    max=4.68s    p(90)=4.23s    p(95)=4.54s
     iterations.....................: 63     2.049418/s
     vus............................: 3      min=2      max=8
     vus_max........................: 8      min=8      max=8

this PR (Parallelism = 4)

     ✓ success

     checks.........................: 100.00% ✓ 82       ✗ 0
     data_received..................: 72 kB   2.3 kB/s
     data_sent......................: 19 kB   602 B/s
     http_req_blocked...............: avg=16.74ms  min=0s    med=1µs     max=177.16ms p(90)=2µs     p(95)=170.24ms
     http_req_connecting............: avg=8.05ms   min=0s    med=0s      max=86.69ms  p(90)=0s      p(95)=81.76ms
   ✗ http_req_duration..............: avg=2.05s    min=1.12s med=2.07s   max=3.21s    p(90)=2.51s   p(95)=2.76s
       { expected_response:true }...: avg=2.05s    min=1.12s med=2.07s   max=3.21s    p(90)=2.51s   p(95)=2.76s
   ✓ http_req_failed................: 0.00%   ✓ 0        ✗ 82
     http_req_receiving.............: avg=54.06ms  min=28µs  med=41.33ms max=672.14ms p(90)=98.34ms p(95)=143.51ms
     http_req_sending...............: avg=194.36µs min=49µs  med=194.5µs max=441µs    p(90)=281µs   p(95)=351.89µs
     http_req_tls_handshaking.......: avg=8.65ms   min=0s    med=0s      max=90.61ms  p(90)=0s      p(95)=88.78ms
     http_req_waiting...............: avg=1.99s    min=1.12s med=2.05s   max=3.21s    p(90)=2.43s   p(95)=2.67s
     http_reqs......................: 82      2.622244/s
     iteration_duration.............: avg=2.56s    min=1.62s med=2.57s   max=3.71s    p(90)=3.04s   p(95)=3.26s
     iterations.....................: 82      2.622244/s
     vus............................: 1       min=1      max=8
     vus_max........................: 8       min=8      max=8
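
To make the two summaries easier to compare, here is a minimal sketch that computes the relative change in throughput and latency. The numbers are copied from the k6 output above; the dictionary keys and the `pct_change` / `compare` helpers are illustrative names, not part of this PR or of k6.

```python
# Figures copied by hand from the two k6 summaries above.
# Key names are illustrative assumptions, not k6 field names.
BASELINE = {"reqs_per_s": 2.049418, "avg_s": 2.87, "p95_s": 4.04, "ok": 60, "total": 63}
THIS_PR = {"reqs_per_s": 2.622244, "avg_s": 2.05, "p95_s": 2.76, "ok": 82, "total": 82}


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def compare(before: dict, after: dict) -> dict:
    """Summarize how `after` differs from `before`."""
    success_before = before["ok"] / before["total"] * 100.0
    success_after = after["ok"] / after["total"] * 100.0
    return {
        # Positive means more requests served per second.
        "throughput_pct": pct_change(before["reqs_per_s"], after["reqs_per_s"]),
        # Negative means lower (better) http_req_duration.
        "avg_latency_pct": pct_change(before["avg_s"], after["avg_s"]),
        "p95_latency_pct": pct_change(before["p95_s"], after["p95_s"]),
        # Difference in success rate, in percentage points.
        "success_rate_points": success_after - success_before,
    }


if __name__ == "__main__":
    for name, value in compare(BASELINE, THIS_PR).items():
        print(f"{name}: {value:+.2f}")
```

On these numbers, throughput is roughly 28% higher, average request duration roughly 29% lower, and p95 roughly 32% lower, with no failed requests. Each configuration was run once, so treat the figures as indicative rather than as a rigorous benchmark.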

@wsxiaoys wsxiaoys changed the title refactor: switch back to llama batch interface feat: support continuous batching in llama.cpp backend Oct 29, 2023
@wsxiaoys wsxiaoys marked this pull request as ready for review October 29, 2023 06:30
@wsxiaoys wsxiaoys merged commit 7bd99d1 into main Oct 29, 2023
5 checks passed
@wsxiaoys wsxiaoys deleted the support-cont-batch-llama-cpp branch October 29, 2023 06:37