
Consolidate perf testing tools #23

Open
kfswain opened this issue Feb 19, 2025 · 7 comments

@kfswain

kfswain commented Feb 19, 2025

xref: kubernetes-sigs/gateway-api-inference-extension#332

Heya Perf folks! We have a need for specific perf testing in GIE. This issue is just to centralize discussion in the inf-perf repo, so that we can all be on the same page and working towards the same goals. Thanks!!

cc: @liu-cong @Kuromesi

@Kuromesi

My llmperf performance testing was conducted in an Istio service mesh, and I made some adaptations to get the inference gateway to run in the mesh. However, when I tried to reproduce my performance test based on the community version, some issues blocked my testing.

  • llmperf requests /metrics without a request body to evaluate pod usage; however, it seems that if no body is found in the request, the inference gateway fails to receive the request with this error:
2025-02-20T04:53:37Z    ERROR   handlers/server.go:95   Cannot receive stream request
sigs.k8s.io/gateway-api-inference-extension/pkg/epp/handlers.(*Server).Process
        /src/pkg/epp/handlers/server.go:95
github.com/envoyproxy/go-control-plane/envoy/service/ext_proc/v3._ExternalProcessor_Process_Handler
        /go/pkg/mod/github.com/envoyproxy/go-control-plane/[email protected]/service/ext_proc/v3/external_processor_grpc.pb.go:106
google.golang.org/grpc.(*Server).processStreamingRPC
        /go/pkg/mod/google.golang.org/[email protected]/server.go:1690
google.golang.org/grpc.(*Server).handleStream
        /go/pkg/mod/google.golang.org/[email protected]/server.go:1814
google.golang.org/grpc.(*Server).serveStreams.func2.1
        /go/pkg/mod/google.golang.org/[email protected]/server.go:1030
  • And if I manually set a request body, the inference gateway also fails to handle the request, with this error:
2025-02-20T04:53:53Z    LEVEL(-3)       handlers/server.go:182  Response generated      {"response": "request_headers:{response:{clear_route_cache:true}}"}
2025-02-20T04:53:53Z    LEVEL(-3)       handlers/request.go:45  Handling request body
2025-02-20T04:53:53Z    LEVEL(-3)       handlers/request.go:54  Request body unmarshalled       {"body": {"test":"test"}}
2025-02-20T04:53:53Z    LEVEL(-3)       handlers/server.go:111  Request context after HandleRequestBody {"context": {"TargetPod":"","TargetEndpoint":"","Model":"","ResolvedTargetModel":"","RequestReceivedTimestamp":"2025-02-20T04:53:53.605204501Z","ResponseCompleteTimestamp":"0001-01-01T00:00:00Z","RequestSize":0,"Response":{"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}},"ResponseSize":0,"ResponseComplete":false,"ResponseStatusCode":""}}
2025-02-20T04:53:53Z    ERROR   handlers/server.go:131  Failed to process request       {"request": "request_body:{body:\"{\\\"test\\\": \\\"test\\\"}\"  end_of_stream:true}  metadata_context:{}", "error": "inference gateway: BadRequest - model not found in request"}

So I have to resolve those issues first, or the performance testing with llmperf cannot proceed. cc @danehans
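
For reference, both failures above can be reproduced with plain curl against the gateway. This is a hypothetical repro sketch; the gateway address and port are placeholders:

# Empty body: triggers "Cannot receive stream request"
curl ${GATEWAY_IP}:${GATEWAY_PORT}/metrics

# Non-empty body without a "model" field: triggers "model not found in request"
curl ${GATEWAY_IP}:${GATEWAY_PORT}/metrics -d '{"test":"test"}'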

@LiorLieberman
Member

Hi @Kuromesi 👋
Do you have a branch or a reference for where you implemented it in Istio?

@danehans

The model is not in the request body. Can you share the details of the manual request being sent?

@Kuromesi

The model is not in the request body. Can you share the details of the manual request being sent?

My request looks like:

curl ${GATEWAY_IP}:${GATEWAY_PORT}/metrics -d '{"test":"test"}'

Yes, I did not set the request model; I just wanted to illustrate that the error Cannot receive stream request seems to be caused by the empty request body.

BTW, does this work as expected? Should we require the request to have a request body and a request model? Can we randomly return a pod if they are not set?

@Kuromesi

The model is not in the request body. Can you share the details of the manual request being sent?

Sorry, my bad. I had misconfigured some settings, which caused llmperf to not work as expected; llmperf now works fine.

But I think we should provide a way to randomly return a pod if the target model is not specified?

@Kuromesi

Hi @Kuromesi 👋 do you have a branch or a reference for where you implemented it in istio?

We made some enhancements based on the Istio community version, which support adding a header like x-ip: ${IP} to route to the backend. I'm afraid this may not work in the community version. :(

@danehans

But I think we should provide a way to randomly return a pod if the target model is not specified?

If the model is not specified in the request body, then the request does not comply with the OpenAI spec, e.g. v1/chat/completions.
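
For reference, a minimal spec-compliant chat completion request would look roughly like this (the model name, gateway address, and port are placeholders):

curl ${GATEWAY_IP}:${GATEWAY_PORT}/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "hi"}]}'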

@ahg-g @kfswain thoughts on how this issue should be handled, e.g. update TargetModels to state that requests must comply with a specific OpenAI spec, provide better logging, etc.?
