Unexpected throughput results: increasing instance_group count vs. deploying the same count across pods on one card using shared computing windows #7956
Labels
performance
Description
Hey there, we are running Triton on OpenShift.
When increasing the instance_group count beyond a certain value X (6 in my case), throughput does not improve, GPU utilization stays low, and memory usage does not even reach half of the card's capacity.
On the other hand, when deploying two pods on the same GPU (using shared computing windows and fractional memory partitioning via Run:ai fractions), both throughput and GPU utilization were much higher.
Triton Information
r23.08
We were using Model Analyzer (24.10) to check the throughput (an example measurement command is sketched below).
Are you using the Triton container or did you build it yourself?
Both: the prebuilt Triton container and one we built ourselves.
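For context, the throughput numbers were collected roughly along these lines; the model name, endpoint, input shape, and concurrency range here are illustrative placeholders, not our exact settings:

# Sweep client concurrency against the running server with perf_analyzer.
# --shape is needed because the "input" tensor has a variable dimension.
perf_analyzer -m my_model -u localhost:8001 -i grpc --shape input:16,1 --concurrency-range 1:16:1

# Model Analyzer 24.10 was used similarly; its automatic config search also
# varies the instance_group count while profiling.
model-analyzer profile --model-repository /models --profile-models my_model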
To Reproduce
Steps to reproduce the behavior.
Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well):
platform: "pytorch_libtorch"
max_batch_size: 128
input {
  name: "input"
  data_type: TYPE_FP32
  dims: [ -1, 1 ]
}
input {
  name: "lengths"
  data_type: TYPE_INT64
  dims: [ 1 ]
}
output {
  name: "output"
  data_type: TYPE_FP32
  dims: [ -1, -1 ]
}
instance_group {
  count: 3
  kind: KIND_GPU
}
dynamic_batching {
  max_queue_delay_microseconds: 4000
}
Backend: "pytorch"
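For the comparison described above, the only change on the single-instance-group side was raising the instance count. A sketch of that stanza at the point where throughput stopped improving (X = 6), assuming the rest of the config stays the same:

instance_group {
  # Raising count beyond 6 gave no further throughput gain in our tests.
  count: 6
  kind: KIND_GPU
}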
Expected behavior
My expectation was that the increased instance count approach would achieve at least the same throughput as running multiple pods on the same GPU.