
ERROR: expected number of inputs between 1 and 3 but got 9 inputs for model #38

Open
samzong opened this issue Apr 15, 2024 · 8 comments

Comments

@samzong

samzong commented Apr 15, 2024

{"timestamp":"2024-04-15T05:20:55.796456Z","level":"ERROR","error":"AppError(error message received from triton: [request id: <id_unknown>] expected number of inputs between 1 and 3 but got 9 inputs for model 'myserving')","target":"openai_trtllm::routes::completions","span":{"headers":"{"host": "localhost:3030", "user-agent": "OpenAI/Python 1.17.1", "content-length": "55", "accept": "application/json", "accept-encoding": "gzip, deflate", "authorization": "Bearer test", "content-type": "application/json", "x-stainless-arch": "arm64", "x-stainless-async": "false", "x-stainless-lang": "python", "x-stainless-os": "MacOS", "x-stainless-package-version": "1.17.1", "x-stainless-runtime": "CPython", "x-stainless-runtime-version": "3.10.5"}","name":"non-streaming completions"},"spans":[{"http.request.method":"POST","http.route":"/v1/completions","network.protocol.version":"1.1","otel.kind":"Server","otel.name":"POST /v1/completions","server.address":"localhost:3030","span.type":"web","url.path":"/v1/completions","url.scheme":"","user_agent.original":"OpenAI/Python 1.17.1","name":"HTTP request"},{"headers":"{"host": "localhost:3030", "user-agent": "OpenAI/Python 1.17.1", "content-length": "55", "accept": "application/json", "accept-encoding": "gzip, deflate", "authorization": "Bearer test", "content-type": "application/json", "x-stainless-arch": "arm64", "x-stainless-async": "false", "x-stainless-lang": "python", "x-stainless-os": "MacOS", "x-stainless-package-version": "1.17.1", "x-stainless-runtime": "CPython", "x-stainless-runtime-version": "3.10.5"}","name":"completions"},{"headers":"{"host": "localhost:3030", "user-agent": "OpenAI/Python 1.17.1", "content-length": "55", "accept": "application/json", "accept-encoding": "gzip, deflate", "authorization": "Bearer test", "content-type": "application/json", "x-stainless-arch": "arm64", "x-stainless-async": "false", "x-stainless-lang": "python", "x-stainless-os": "MacOS", "x-stainless-package-version": "1.17.1", "x-stainless-runtime": "CPython", "x-stainless-runtime-version": "3.10.5"}","name":"non-streaming completions"}]}

Reproduced with client/openai_completion.py.
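
For reference, a minimal sketch of the failing call, roughly what client/openai_completion.py does (endpoint, API key, and model name taken from the log above; the prompt is a placeholder):

from openai import OpenAI

# Proxy endpoint and credentials from the log above; the prompt is hypothetical.
client = OpenAI(base_url="http://localhost:3030/v1", api_key="test")

completion = client.completions.create(
    model="myserving",           # Triton model name behind openai_trtllm
    prompt="Hello, my name is",  # placeholder prompt
    max_tokens=16,
)
print(completion.choices[0].text)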

@samzong
Author

samzong commented Apr 15, 2024

I think I know the problem: my Triton backend uses Triton with the vLLM backend.

Is there a plan to support it?

@npuichigo
Owner

It's not planned yet, but I think it's trivial to adapt the code for your use case.

@samzong
Author

samzong commented Apr 15, 2024

> It's not planned yet, but I think it's trivial to adapt the code for your use case.

Do you have any suggestions? I can try to implement it, and if it works I can contribute this part of the code.

@npuichigo
Owner

Can you show how to call the vLLM-based Triton backend? For example, the gRPC interface and the parameters used to call the service.

@samzong
Author

samzong commented Apr 16, 2024

Okay, @npuichigo, you can see an example here:

https://github.com/triton-inference-server/vllm_backend/blob/a01475157290bdf6fd0f50688f69aafea41b04c5/samples/client.py#L192

import argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-m",
        "--model",
        type=str,
        required=False,
        default="vllm_model",
        help="Model name",
    )
    parser.add_argument(
        "-v",
        "--verbose",
        action="store_true",
        required=False,
        default=False,
        help="Enable verbose output",
    )
    parser.add_argument(
        "-u",
        "--url",
        type=str,
        required=False,
        default="localhost:8001",
        help="Inference server URL and its gRPC port. Default is localhost:8001.",
    )
    parser.add_argument(
        "-t",
        "--stream-timeout",
        type=float,
        required=False,
        default=None,
        help="Stream timeout in seconds. Default is None.",
    )
    parser.add_argument(
        "--offset",
        type=int,
        required=False,
        default=0,
        help="Add offset to request IDs used",
    )
    parser.add_argument(
        "--input-prompts",
        type=str,
        required=False,
        default="prompts.txt",
        help="Text file with input prompts",
    )
    parser.add_argument(
        "--results-file",
        type=str,
        required=False,
        default="results.txt",
        help="The file with output results",
    )
    parser.add_argument(
        "--iterations",
        type=int,
        required=False,
        default=1,
        help="Number of iterations through the prompts file",
    )
    parser.add_argument(
        "-s",
        "--streaming-mode",
        action="store_true",
        required=False,
        default=False,
        help="Enable streaming mode",
    )
    parser.add_argument(
        "--exclude-inputs-in-outputs",
        action="store_true",
        required=False,
        default=False,
        help="Exclude prompt from outputs",
    )
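
For context, the argparse block above is only the sample's CLI; the request itself is built further down in that file. A minimal sketch of the gRPC call the vLLM backend expects, assuming the input/output names from the sample (text_input, stream, sampling_parameters, text_output), a placeholder model name, and the streaming gRPC API (the vLLM backend runs in decoupled mode):

import json
from functools import partial

import numpy as np
import tritonclient.grpc as grpcclient

responses = []

def callback(collector, result, error):
    # Collect either the decoded text_output tensor or the error.
    collector.append(error if error else result.as_numpy("text_output"))

client = grpcclient.InferenceServerClient(url="localhost:8001")

# The three inputs the vLLM backend accepts (hence "between 1 and 3"):
text_input = grpcclient.InferInput("text_input", [1], "BYTES")
text_input.set_data_from_numpy(
    np.array(["Hello, my name is".encode("utf-8")], dtype=np.object_)
)

stream = grpcclient.InferInput("stream", [1], "BOOL")
stream.set_data_from_numpy(np.array([False], dtype=bool))

sampling_parameters = grpcclient.InferInput("sampling_parameters", [1], "BYTES")
sampling_parameters.set_data_from_numpy(
    np.array(
        [json.dumps({"temperature": 0.7, "max_tokens": 64}).encode("utf-8")],
        dtype=np.object_,
    )
)

client.start_stream(callback=partial(callback, responses))
client.async_stream_infer(
    model_name="vllm_model",  # placeholder; use your deployed model name
    inputs=[text_input, stream, sampling_parameters],
    outputs=[grpcclient.InferRequestedOutput("text_output")],
)
client.stop_stream()  # closes the stream and drains remaining responses

for r in responses:
    print(r)

openai_trtllm currently sends the nine tensors the TensorRT-LLM ensemble expects, which is presumably why the vLLM model rejects the request with the error above.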

@liyan77

liyan77 commented Jul 5, 2024

My Triton backend also uses Triton with vLLM. Is there a plan to support it?

@crslen

crslen commented Jul 28, 2024

It would be great if the vLLM option were supported.

@ChaseDreamInfinity

I made some changes to support the vLLM backend:
https://github.com/ChaseDreamInfinity/openai_triton_vllm
