Add Kubernetes Orchestration Library for Model Server Deployment and Benchmarking #22

Open
wangchen615 opened this issue Feb 13, 2025 · 2 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@wangchen615

Currently, inference-perf provides libraries for client requests, dataset handling, load generation, and result reporting. However, there's a need to add Kubernetes orchestration capabilities to deploy and manage model servers for benchmarking purposes.

Current Status:

  • The project has established libraries for:
    • Client request handling
    • Dataset management
    • Load generation
    • Benchmark reporting
  • Missing functionality for Kubernetes deployment and orchestration of model servers and benchmarking jobs

Requirements:

  1. Create a Python-based Kubernetes orchestration library that can:
    • Deploy model servers to a specified Kubernetes cluster
    • Manage the lifecycle of model server deployments
    • Support co-location of benchmarking tools with model servers
    • Configure and manage benchmarking jobs
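
A minimal sketch of what such a library's surface could look like, using the official `kubernetes` Python client. The class and field names (`ModelServerSpec`, `ModelServerOrchestrator`) and their exact fields are illustrative assumptions, not existing inference-perf code:

```python
# Illustrative sketch only: names and fields are assumptions, not inference-perf APIs.
from dataclasses import dataclass, field
from kubernetes import client, config


@dataclass
class ModelServerSpec:
    """Minimal description of a model server to benchmark (assumed shape)."""
    name: str
    image: str
    namespace: str = "default"
    replicas: int = 1
    args: list = field(default_factory=list)


class ModelServerOrchestrator:
    """Deploys, watches, and tears down model servers on a target cluster."""

    def __init__(self) -> None:
        # Use the local kubeconfig; a co-located deployment would call
        # config.load_incluster_config() instead.
        config.load_kube_config()
        self.apps = client.AppsV1Api()

    def deploy(self, spec: ModelServerSpec) -> None:
        container = client.V1Container(name=spec.name, image=spec.image, args=spec.args)
        template = client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": spec.name}),
            spec=client.V1PodSpec(containers=[container]),
        )
        deployment = client.V1Deployment(
            api_version="apps/v1",
            kind="Deployment",
            metadata=client.V1ObjectMeta(name=spec.name),
            spec=client.V1DeploymentSpec(
                replicas=spec.replicas,
                selector=client.V1LabelSelector(match_labels={"app": spec.name}),
                template=template,
            ),
        )
        self.apps.create_namespaced_deployment(namespace=spec.namespace, body=deployment)

    def is_ready(self, spec: ModelServerSpec) -> bool:
        status = self.apps.read_namespaced_deployment(spec.name, spec.namespace).status
        return (status.available_replicas or 0) >= spec.replicas

    def teardown(self, spec: ModelServerSpec) -> None:
        self.apps.delete_namespaced_deployment(spec.name, spec.namespace)
```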

Design Considerations:

  1. Integration with Existing Structure:

    • How should the K8s orchestration library fit within the current project structure?
    • Should it be a separate module similar to loadgen, dataset, etc.?
  2. Deployment Architecture:

    • Consider whether to enforce co-location of load testers and model servers (similar to fmperf)
    • Evaluate the option to support flexible deployment patterns (separated or co-located); see the sketch after this list
    • Define clear interfaces for different deployment scenarios
  3. Reference Implementation:
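
On point 2 (Deployment Architecture), one hedged way to keep both patterns open without complicating the API is a small placement abstraction that only decides where the load generator runs. Everything below is an illustrative assumption, not existing code:

```python
# Illustrative: DeploymentPattern / BenchmarkPlacement are assumed names.
import enum
from dataclasses import dataclass


class DeploymentPattern(enum.Enum):
    CO_LOCATED = "co-located"  # load generator scheduled onto the model server's node
    SEPARATED = "separated"    # load generator runs in its own pod, anywhere


@dataclass
class BenchmarkPlacement:
    pattern: DeploymentPattern

    def load_generator_affinity(self, model_server_app_label: str) -> dict:
        """Return a pod-affinity stanza (as a plain dict) for the benchmark pod."""
        if self.pattern is DeploymentPattern.CO_LOCATED:
            return {
                "podAffinity": {
                    "requiredDuringSchedulingIgnoredDuringExecution": [{
                        "labelSelector": {"matchLabels": {"app": model_server_app_label}},
                        "topologyKey": "kubernetes.io/hostname",
                    }]
                }
            }
        return {}  # separated: no affinity constraint
```

A single switch like this keeps the interface small while leaving room to standardize on co-location later, as fmperf does.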

Next Steps:

  1. Design the API interface for the K8s orchestration library
  2. Define the integration points with existing components (a possible sketch follows this list)
  3. Implement deployment patterns for both co-located and separated benchmarking scenarios
  4. Create documentation for deployment configurations and usage
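
For steps 2 and 3, one possible integration point is to run inference-perf itself as a Kubernetes Job aimed at the deployed model server's Service. The image name and CLI flags below are placeholders, since the container interface has not been defined yet:

```python
# Placeholder image and flags: the real container interface is still to be designed.
from kubernetes import client


def build_benchmark_job(name: str, namespace: str, target_url: str,
                        image: str = "example.com/inference-perf:latest") -> client.V1Job:
    container = client.V1Container(
        name="inference-perf",
        image=image,
        args=["--config", "/etc/inference-perf/config.yml", "--target", target_url],  # hypothetical flags
    )
    template = client.V1PodTemplateSpec(
        metadata=client.V1ObjectMeta(labels={"app": name}),
        spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
    )
    return client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1JobSpec(template=template, backoff_limit=0),
    )


# Usage (assuming a cluster connection is already configured):
# client.BatchV1Api().create_namespaced_job(namespace="benchmarks", body=build_benchmark_job(...))
```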

Questions to Address:

  1. Should we maintain flexibility in deployment patterns or standardize on a specific approach?
  2. What level of customization should we support for K8s deployments?
  3. How should we handle different cluster configurations and requirements?

Please share your thoughts and suggestions on the proposed approach.

/kind feature
/priority important-soon
/area orchestration

@k8s-ci-robot added the kind/feature and priority/important-soon labels on Feb 13, 2025
@k8s-ci-robot
Contributor

@wangchen615: The label(s) area/orchestration cannot be applied, because the repository doesn't have them.

In response to this:

[The issue description above, quoted in full.]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@achandrasekar
Contributor

My thoughts:

  1. Having it as a separate module called orchestrator or something similar makes sense. I'm wondering whether it has to live within the inference-perf library itself, or whether it should be a separate Python library in the same repo, since the orchestrator deploys inference-perf as a container on the cluster alongside the model servers.

  2. Prescribing a specific way to deploy the model server and co-locating it with the benchmarking tool makes sense. Customizability can expand the scope considerably and isn't the ideal focus for benchmarking; if a lot of specific customization is needed, it should come from outside, e.g. serving-catalog, to deploy model servers with specific models, configurations, etc. This keeps the benchmarking tool focused on benchmarking.

  3. For handling different cluster configurations, it would be good to expose a small set of basic flags for deploying the model servers that can be tweaked to target different clusters.
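
For instance, something like the following set of flags could cover most cluster differences; the flag names are suggestions only, nothing here exists in inference-perf yet:

```python
# Suggested flag names only; purely illustrative.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser("inference-perf-orchestrator")
    parser.add_argument("--namespace", default="inference-perf")
    parser.add_argument("--model-server-image", required=True)
    parser.add_argument("--replicas", type=int, default=1)
    parser.add_argument("--accelerator", default=None,
                        help="node selector value to target, e.g. a GPU/TPU type")
    parser.add_argument("--co-locate", action="store_true",
                        help="schedule the load generator next to the model server")
    return parser
```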

cc @smarterclayton @terrytangyuan @sjmonson @ahg-g for thoughts on this.
