Add Kubernetes Orchestration Library for Model Server Deployment and Benchmarking #22

Open
wangchen615 opened this issue Feb 13, 2025 · 2 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@wangchen615

Currently, inference-perf provides libraries for client requests, dataset handling, load generation, and result reporting. However, there's a need to add Kubernetes orchestration capabilities to deploy and manage model servers for benchmarking purposes.

Current Status:

  • The project has established libraries for:
    • Client request handling
    • Dataset management
    • Load generation
    • Benchmark reporting
  • Missing functionality for Kubernetes deployment and orchestration of model servers and benchmarking jobs

Requirements:

  1. Create a Python-based Kubernetes orchestration library that can:
    • Deploy model servers to a specified Kubernetes cluster
    • Manage the lifecycle of model server deployments
    • Support co-location of benchmarking tools with model servers
    • Configure and manage benchmarking jobs
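
A minimal sketch of what such a library's surface could look like, using the official `kubernetes` Python client. The class and field names (`ModelServerSpec`, `ModelServerOrchestrator`) and their exact fields are illustrative assumptions, not existing inference-perf code:

```python
# Illustrative sketch only: names and fields are assumptions, not inference-perf APIs.
from dataclasses import dataclass, field
from kubernetes import client, config


@dataclass
class ModelServerSpec:
    """Minimal description of a model server to benchmark (assumed shape)."""
    name: str
    image: str
    namespace: str = "default"
    replicas: int = 1
    args: list = field(default_factory=list)


class ModelServerOrchestrator:
    """Deploys, watches, and tears down model servers on a target cluster."""

    def __init__(self) -> None:
        # Use the local kubeconfig; a co-located deployment would call
        # config.load_incluster_config() instead.
        config.load_kube_config()
        self.apps = client.AppsV1Api()

    def deploy(self, spec: ModelServerSpec) -> None:
        container = client.V1Container(name=spec.name, image=spec.image, args=spec.args)
        template = client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": spec.name}),
            spec=client.V1PodSpec(containers=[container]),
        )
        deployment = client.V1Deployment(
            api_version="apps/v1",
            kind="Deployment",
            metadata=client.V1ObjectMeta(name=spec.name),
            spec=client.V1DeploymentSpec(
                replicas=spec.replicas,
                selector=client.V1LabelSelector(match_labels={"app": spec.name}),
                template=template,
            ),
        )
        self.apps.create_namespaced_deployment(namespace=spec.namespace, body=deployment)

    def is_ready(self, spec: ModelServerSpec) -> bool:
        status = self.apps.read_namespaced_deployment(spec.name, spec.namespace).status
        return (status.available_replicas or 0) >= spec.replicas

    def teardown(self, spec: ModelServerSpec) -> None:
        self.apps.delete_namespaced_deployment(spec.name, spec.namespace)
```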

Design Considerations:

  1. Integration with Existing Structure:

    • How should the K8s orchestration library fit within the current project structure?
    • Should it be a separate module similar to loadgen, dataset, etc.?
  2. Deployment Architecture:

    • Consider whether to enforce co-location of load testers and model servers (similar to fmperf)
    • Evaluate the option to support flexible deployment patterns (separated or co-located); see the sketch after this list
    • Define clear interfaces for different deployment scenarios
  3. Reference Implementation:
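
On point 2 (Deployment Architecture), one hedged way to keep both patterns open without complicating the API is a small placement abstraction that only decides where the load generator runs. Everything below is an illustrative assumption, not existing code:

```python
# Illustrative: DeploymentPattern / BenchmarkPlacement are assumed names.
import enum
from dataclasses import dataclass


class DeploymentPattern(enum.Enum):
    CO_LOCATED = "co-located"  # load generator scheduled onto the model server's node
    SEPARATED = "separated"    # load generator runs in its own pod, anywhere


@dataclass
class BenchmarkPlacement:
    pattern: DeploymentPattern

    def load_generator_affinity(self, model_server_app_label: str) -> dict:
        """Return a pod-affinity stanza (as a plain dict) for the benchmark pod."""
        if self.pattern is DeploymentPattern.CO_LOCATED:
            return {
                "podAffinity": {
                    "requiredDuringSchedulingIgnoredDuringExecution": [{
                        "labelSelector": {"matchLabels": {"app": model_server_app_label}},
                        "topologyKey": "kubernetes.io/hostname",
                    }]
                }
            }
        return {}  # separated: no affinity constraint
```

A single switch like this keeps the interface small while leaving room to standardize on co-location later, as fmperf does.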

Next Steps:

  1. Design the API interface for the K8s orchestration library
  2. Define the integration points with existing components (a possible sketch follows this list)
  3. Implement deployment patterns for both co-located and separated benchmarking scenarios
  4. Create documentation for deployment configurations and usage
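
For steps 2 and 3, one possible integration point is to run inference-perf itself as a Kubernetes Job aimed at the deployed model server's Service. The image name and CLI flags below are placeholders, since the container interface has not been defined yet:

```python
# Placeholder image and flags: the real container interface is still to be designed.
from kubernetes import client


def build_benchmark_job(name: str, namespace: str, target_url: str,
                        image: str = "example.com/inference-perf:latest") -> client.V1Job:
    container = client.V1Container(
        name="inference-perf",
        image=image,
        args=["--config", "/etc/inference-perf/config.yml", "--target", target_url],  # hypothetical flags
    )
    template = client.V1PodTemplateSpec(
        metadata=client.V1ObjectMeta(labels={"app": name}),
        spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
    )
    return client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1JobSpec(template=template, backoff_limit=0),
    )


# Usage (assuming a cluster connection is already configured):
# client.BatchV1Api().create_namespaced_job(namespace="benchmarks", body=build_benchmark_job(...))
```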

Questions to Address:

  1. Should we maintain flexibility in deployment patterns or standardize on a specific approach?
  2. What level of customization should we support for K8s deployments?
  3. How should we handle different cluster configurations and requirements?

Please share your thoughts and suggestions on the proposed approach.

/kind feature
/priority important-soon
/area orchestration

@k8s-ci-robot added the kind/feature and priority/important-soon labels on Feb 13, 2025
@k8s-ci-robot
Contributor

@wangchen615: The label(s) area/orchestration cannot be applied, because the repository doesn't have them.

In response to this:

[The issue description above, quoted in full.]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@achandrasekar
Contributor

My thoughts:

  1. Having it as a separate module called orchestrator or something similar makes sense. I'm wondering whether it has to live within the inference-perf library itself, or whether it should be a separate Python library in the same repo, since the orchestrator deploys inference-perf as a container on the cluster alongside the model servers.

  2. Prescribing a specific way to deploy the model server and co-locating it with the benchmarking tool makes sense. Customizability can expand the scope considerably and isn't the ideal focus for benchmarking; if a lot of specific customization is needed, it should come from outside, e.g. serving-catalog, to deploy model servers with specific models, configurations, etc. This keeps the benchmarking tool focused on benchmarking.

  3. For handling different cluster configurations, it would be good to expose a small set of basic flags for deploying the model servers that can be tweaked to target different clusters.
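
For instance, something like the following set of flags could cover most cluster differences; the flag names are suggestions only, nothing here exists in inference-perf yet:

```python
# Suggested flag names only; purely illustrative.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser("inference-perf-orchestrator")
    parser.add_argument("--namespace", default="inference-perf")
    parser.add_argument("--model-server-image", required=True)
    parser.add_argument("--replicas", type=int, default=1)
    parser.add_argument("--accelerator", default=None,
                        help="node selector value to target, e.g. a GPU/TPU type")
    parser.add_argument("--co-locate", action="store_true",
                        help="schedule the load generator next to the model server")
    return parser
```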

cc @smarterclayton @terrytangyuan @sjmonson @ahg-g for thoughts on this.
