ERROR: expected number of inputs between 1 and 3 but got 9 inputs for model #38
I think I know the problem: my Triton backend uses Triton with vLLM. Is there a plan to support it?
It's not planned yet, but I think it's trivial to adapt the code for your use case.
Do you have any suggestions? I can try to implement it, and if I can, I'll contribute this part of the code.
Can you show how to call a vLLM-based Triton backend? For example, the gRPC interface and the parameters used to call the service.
Okay, @npuichigo, you can see an example here:

import argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-m",
        "--model",
        type=str,
        required=False,
        default="vllm_model",
        help="Model name",
    )
    parser.add_argument(
        "-v",
        "--verbose",
        action="store_true",
        required=False,
        default=False,
        help="Enable verbose output",
    )
    parser.add_argument(
        "-u",
        "--url",
        type=str,
        required=False,
        default="localhost:8001",
        help="Inference server URL and its gRPC port. Default is localhost:8001.",
    )
    parser.add_argument(
        "-t",
        "--stream-timeout",
        type=float,
        required=False,
        default=None,
        help="Stream timeout in seconds. Default is None.",
    )
    parser.add_argument(
        "--offset",
        type=int,
        required=False,
        default=0,
        help="Add offset to request IDs used",
    )
    parser.add_argument(
        "--input-prompts",
        type=str,
        required=False,
        default="prompts.txt",
        help="Text file with input prompts",
    )
    parser.add_argument(
        "--results-file",
        type=str,
        required=False,
        default="results.txt",
        help="The file with output results",
    )
    parser.add_argument(
        "--iterations",
        type=int,
        required=False,
        default=1,
        help="Number of iterations through the prompts file",
    )
    parser.add_argument(
        "-s",
        "--streaming-mode",
        action="store_true",
        required=False,
        default=False,
        help="Enable streaming mode",
    )
    parser.add_argument(
        "--exclude-inputs-in-outputs",
        action="store_true",
        required=False,
        default=False,
        help="Exclude prompt from outputs",
    )
    FLAGS = parser.parse_args()  # the rest of the example client is omitted here
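The argparse block above only covers the CLI flags; the actual request in that example client goes through Triton's async gRPC streaming API. Below is a minimal sketch of that call, assuming the stock Triton vLLM backend's tensor names (text_input, stream, text_output) and a model named vllm_model; sampling options are passed as request parameters rather than extra input tensors.

import asyncio

import numpy as np
import tritonclient.grpc.aio as grpcclient


async def main():
    client = grpcclient.InferenceServerClient(url="localhost:8001")

    async def request_iterator():
        # The vLLM backend takes the prompt as a "text_input" BYTES tensor
        # and an optional "stream" BOOL tensor.
        text = np.array(["What is Triton Inference Server?".encode("utf-8")], dtype=np.object_)
        stream = np.array([True], dtype=bool)

        text_input = grpcclient.InferInput("text_input", [1], "BYTES")
        text_input.set_data_from_numpy(text)
        stream_input = grpcclient.InferInput("stream", [1], "BOOL")
        stream_input.set_data_from_numpy(stream)

        yield {
            "model_name": "vllm_model",
            "inputs": [text_input, stream_input],
            "outputs": [grpcclient.InferRequestedOutput("text_output")],
            "request_id": "1",
            # Sampling options travel as request parameters, not tensors.
            "parameters": {"temperature": "0.7", "max_tokens": "64"},
        }

    # stream_infer yields (result, error) pairs; with stream=True the
    # backend sends one response per generated chunk.
    async for result, error in client.stream_infer(inputs_iterator=request_iterator()):
        if error:
            raise error
        print(result.as_numpy("text_output"))

    await client.close()


if __name__ == "__main__":
    asyncio.run(main())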
My Triton backend also uses Triton with vLLM. Is there a plan to support it?
It would be great if the vLLM option were supported.
I made some changes to let it support the vLLM backend.
{"timestamp":"2024-04-15T05:20:55.796456Z","level":"ERROR","error":"AppError(error message received from triton: [request id: <id_unknown>] expected number of inputs between 1 and 3 but got 9 inputs for model 'myserving')","target":"openai_trtllm::routes::completions","span":{"headers":"{"host": "localhost:3030", "user-agent": "OpenAI/Python 1.17.1", "content-length": "55", "accept": "application/json", "accept-encoding": "gzip, deflate", "authorization": "Bearer test", "content-type": "application/json", "x-stainless-arch": "arm64", "x-stainless-async": "false", "x-stainless-lang": "python", "x-stainless-os": "MacOS", "x-stainless-package-version": "1.17.1", "x-stainless-runtime": "CPython", "x-stainless-runtime-version": "3.10.5"}","name":"non-streaming completions"},"spans":[{"http.request.method":"POST","http.route":"/v1/completions","network.protocol.version":"1.1","otel.kind":"Server","otel.name":"POST /v1/completions","server.address":"localhost:3030","span.type":"web","url.path":"/v1/completions","url.scheme":"","user_agent.original":"OpenAI/Python 1.17.1","name":"HTTP request"},{"headers":"{"host": "localhost:3030", "user-agent": "OpenAI/Python 1.17.1", "content-length": "55", "accept": "application/json", "accept-encoding": "gzip, deflate", "authorization": "Bearer test", "content-type": "application/json", "x-stainless-arch": "arm64", "x-stainless-async": "false", "x-stainless-lang": "python", "x-stainless-os": "MacOS", "x-stainless-package-version": "1.17.1", "x-stainless-runtime": "CPython", "x-stainless-runtime-version": "3.10.5"}","name":"completions"},{"headers":"{"host": "localhost:3030", "user-agent": "OpenAI/Python 1.17.1", "content-length": "55", "accept": "application/json", "accept-encoding": "gzip, deflate", "authorization": "Bearer test", "content-type": "application/json", "x-stainless-arch": "arm64", "x-stainless-async": "false", "x-stainless-lang": "python", "x-stainless-os": "MacOS", "x-stainless-package-version": "1.17.1", "x-stainless-runtime": "CPython", "x-stainless-runtime-version": "3.10.5"}","name":"non-streaming completions"}]}
This was produced using client/openai_completion.py.
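For context, the error in that log is the input-count mismatch itself: openai_trtllm's completions route builds the TensorRT-LLM backend's tensor set (text_input plus separate tensors such as max_tokens, bad_words, stop_words, and temperature, nine in total here), while the stock vLLM backend accepts between one and three inputs, so Triton rejects the request. openai_trtllm itself is written in Rust, but the required input mapping is easy to sketch in Python; build_vllm_inputs below is a hypothetical helper, with tensor names per the stock vLLM backend.

import numpy as np
import tritonclient.grpc.aio as grpcclient


def build_vllm_inputs(prompt: str, stream: bool = False):
    """Collapse the request down to what the vLLM backend expects: just the
    prompt plus an optional stream flag. Sampling options should be sent as
    request parameters rather than one tensor per field."""
    text = np.array([prompt.encode("utf-8")], dtype=np.object_)
    text_input = grpcclient.InferInput("text_input", [1], "BYTES")
    text_input.set_data_from_numpy(text)

    stream_flag = np.array([stream], dtype=bool)
    stream_input = grpcclient.InferInput("stream", [1], "BOOL")
    stream_input.set_data_from_numpy(stream_flag)

    # Two inputs, within the 1..3 range the model's config allows.
    return [text_input, stream_input]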