Unexpected throughput results: increasing instance_group count vs. deploying the same count across pods on one card using shared computing windows #7956
Labels
performance
Description
Hey there, we are running Triton on OpenShift.
When increasing the instance_group count beyond a certain value X (6 in my case), throughput does not improve, GPU utilization stays low, and memory usage does not even reach half of the card's capacity.
On the other hand, when deploying two pods on the same GPU (using shared computing windows and fractional memory partitioning via Run:ai fractions), both throughput and GPU utilization were much higher.
Triton Information
r23.08
We were using Model Analyzer (24.10) to check the throughput (an example measurement command is sketched below).
Are you using the Triton container or did you build it yourself?
Both: the prebuilt Triton container and one we built ourselves.
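For context, the throughput numbers were collected roughly along these lines; the model name, endpoint, input shape, and concurrency range here are illustrative placeholders, not our exact settings:

# Sweep client concurrency against the running server with perf_analyzer.
# --shape is needed because the "input" tensor has a variable dimension.
perf_analyzer -m my_model -u localhost:8001 -i grpc --shape input:16,1 --concurrency-range 1:16:1

# Model Analyzer 24.10 was used similarly; its automatic config search also
# varies the instance_group count while profiling.
model-analyzer profile --model-repository /models --profile-models my_model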
To Reproduce
Steps to reproduce the behavior.
Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well):
platform: "pytorch_libtorch"
max_batch_size: 128
input {
  name: "input"
  data_type: TYPE_FP32
  dims: [ -1, 1 ]
}
input {
  name: "lengths"
  data_type: TYPE_INT64
  dims: [ 1 ]
}
output {
  name: "output"
  data_type: TYPE_FP32
  dims: [ -1, -1 ]
}
instance_group {
  count: 3
  kind: KIND_GPU
}
dynamic_batching {
  max_queue_delay_microseconds: 4000
}
Backend: "pytorch"
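For the comparison described above, the only change on the single-instance-group side was raising the instance count. A sketch of that stanza at the point where throughput stopped improving (X = 6), assuming the rest of the config stays the same:

instance_group {
  # Raising count beyond 6 gave no further throughput gain in our tests.
  count: 6
  kind: KIND_GPU
}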
Expected behavior
My expectation was that the increased instance count approach would achieve at least the same throughput as running multiple pods on the same GPU.