
Consolidate perf testing tools #23

Open
kfswain opened this issue Feb 19, 2025 · 7 comments

@kfswain

kfswain commented Feb 19, 2025

xref: kubernetes-sigs/gateway-api-inference-extension#332

Heya Perf folks! We have a need for specific perf testing in GIE. This issue is just to centralize discussion in the inf-perf repo, so that we can all be on the same page and working towards the same goals. Thanks!!

cc: @liu-cong @Kuromesi

@Kuromesi

My llmperf performance testing was conducted in an Istio service mesh, and I made some adaptations to get the inference gateway to run in the mesh. However, when I tried to reproduce my performance test based on the community version, some issues blocked my testing.

  • llmperf requests /metrics without a request body to evaluate pod usage; however, it seems that if no body is found in the request, the inference gateway fails to receive the request with this error:
2025-02-20T04:53:37Z    ERROR   handlers/server.go:95   Cannot receive stream request
sigs.k8s.io/gateway-api-inference-extension/pkg/epp/handlers.(*Server).Process
        /src/pkg/epp/handlers/server.go:95
github.com/envoyproxy/go-control-plane/envoy/service/ext_proc/v3._ExternalProcessor_Process_Handler
        /go/pkg/mod/github.com/envoyproxy/go-control-plane/[email protected]/service/ext_proc/v3/external_processor_grpc.pb.go:106
google.golang.org/grpc.(*Server).processStreamingRPC
        /go/pkg/mod/google.golang.org/[email protected]/server.go:1690
google.golang.org/grpc.(*Server).handleStream
        /go/pkg/mod/google.golang.org/[email protected]/server.go:1814
google.golang.org/grpc.(*Server).serveStreams.func2.1
        /go/pkg/mod/google.golang.org/[email protected]/server.go:1030
  • And if I manually set a request body, the inference gateway also fails to handle the request, with this error:
2025-02-20T04:53:53Z    LEVEL(-3)       handlers/server.go:182  Response generated      {"response": "request_headers:{response:{clear_route_cache:true}}"}
2025-02-20T04:53:53Z    LEVEL(-3)       handlers/request.go:45  Handling request body
2025-02-20T04:53:53Z    LEVEL(-3)       handlers/request.go:54  Request body unmarshalled       {"body": {"test":"test"}}
2025-02-20T04:53:53Z    LEVEL(-3)       handlers/server.go:111  Request context after HandleRequestBody {"context": {"TargetPod":"","TargetEndpoint":"","Model":"","ResolvedTargetModel":"","RequestReceivedTimestamp":"2025-02-20T04:53:53.605204501Z","ResponseCompleteTimestamp":"0001-01-01T00:00:00Z","RequestSize":0,"Response":{"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}},"ResponseSize":0,"ResponseComplete":false,"ResponseStatusCode":""}}
2025-02-20T04:53:53Z    ERROR   handlers/server.go:131  Failed to process request       {"request": "request_body:{body:\"{\\\"test\\\": \\\"test\\\"}\"  end_of_stream:true}  metadata_context:{}", "error": "inference gateway: BadRequest - model not found in request"}

So I have to resolve those issues first, or the performance testing with llmperf cannot proceed. cc @danehans
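
For reference, both failures above can be reproduced with plain curl against the gateway. This is a hypothetical repro sketch; the gateway address and port are placeholders:

# Empty body: triggers "Cannot receive stream request"
curl ${GATEWAY_IP}:${GATEWAY_PORT}/metrics

# Non-empty body without a "model" field: triggers "model not found in request"
curl ${GATEWAY_IP}:${GATEWAY_PORT}/metrics -d '{"test":"test"}'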

@LiorLieberman
Member

Hi @Kuromesi 👋
Do you have a branch or a reference for where you implemented it in Istio?

@danehans

The model is not in the request body. Can you share the details of the manual request being sent?

@Kuromesi

The model is not in the request body. Can you share the details of the manual request being sent?

My request looks like:

curl ${GATEWAY_IP}:${GATEWAY_PORT}/metrics -d '{"test":"test"}'

Yes, I did not set the request model; I just wanted to illustrate that the error Cannot receive stream request seems to be caused by the empty request body.

BTW, does this work as expected? Should we require the request to have a request body and a request model? Can we randomly return a pod if they are not set?

@Kuromesi

The model is not in the request body. Can you share the details of the manual request being sent?

Sorry, my bad. I had misconfigured some settings, which caused llmperf to not work as expected; llmperf now works fine.

But I think we should provide a way to randomly return a pod if the target model is not specified?

@Kuromesi

Hi @Kuromesi 👋 do you have a branch or a reference for where you implemented it in istio?

We made some enhancements based on the Istio community version, which support adding a header like x-ip: ${IP} to route to the backend. I'm afraid this may not work in the community version. :(

@danehans

But I think we should provide a way to randomly return a pod if the target model is not specified?

If the model is not specified in the request body, then the request does not comply with the OpenAI spec, e.g. v1/chat/completions.
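
For reference, a minimal spec-compliant chat completion request would look roughly like this (the model name, gateway address, and port are placeholders):

curl ${GATEWAY_IP}:${GATEWAY_PORT}/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "hi"}]}'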

@ahg-g @kfswain thoughts on how this issue should be handled, e.g. update TargetModels to state that requests must comply with a specific OpenAI spec, provide better logging, etc.?
