Multi model deployment #208 (Draft)

TosinSeg wants to merge 74 commits into main from multi-model-deployment.

Changes shown from 37 of 74 commits.

Commits
4eac006
Removing load balancing config
TosinSeg Jun 19, 2023
c68e999
Reformatting tests
TosinSeg Jun 20, 2023
5ce1a92
Fixed the formatting
TosinSeg Jun 20, 2023
fa10e19
Removed print statement
TosinSeg Jun 20, 2023
f9cbd74
Merging main
TosinSeg Jun 26, 2023
8970f4e
Removing unused import
TosinSeg Jun 26, 2023
517bea8
Fixing tests
TosinSeg Jun 26, 2023
58dd2b2
Fixing merge issue
TosinSeg Jun 26, 2023
bb0d551
Creating hostfile when one is not provided
TosinSeg Jun 26, 2023
e2bb9d5
Merge branch 'main' into Always_enable_load_balancing
TosinSeg Jun 26, 2023
3823534
Fixing import statements removed by merge
TosinSeg Jun 26, 2023
6f9b4ad
Removing load_balancing check
TosinSeg Jun 26, 2023
499b9ad
Removing redundant definitions
TosinSeg Jun 26, 2023
5419ef6
Removing hostfile from test
TosinSeg Jun 26, 2023
a70b6de
Removing hostfile from non-persistent test
TosinSeg Jun 26, 2023
eea658b
initial changes
TosinSeg Jun 27, 2023
20f0878
Merge branch 'main' into multi-model-deployment
TosinSeg Jun 27, 2023
c21c31b
Maintaining current behavior
TosinSeg Jun 28, 2023
f525329
Reading from score file
TosinSeg Jun 28, 2023
3c0937f
fixing syntax errors
TosinSeg Jun 28, 2023
156ac83
Fixing more syntax errors
TosinSeg Jun 28, 2023
38e270e
Fixing more syntax issues
TosinSeg Jun 29, 2023
4d4e0d8
initial lb changes
TosinSeg Jun 29, 2023
01c8e59
Merge branch 'main' into multi-model-deployment
TosinSeg Jun 29, 2023
f801b36
More load balancing changes
TosinSeg Jun 29, 2023
fd4e2ed
LB changes and syntax
TosinSeg Jun 30, 2023
0a3b7e5
Refactor client, and unpack request in load balancer
TosinSeg Jun 30, 2023
6523c04
First working queries
TosinSeg Jul 3, 2023
06b40f5
Fixing conversational and q&a args
TosinSeg Jul 3, 2023
96d0dcb
Updates to _allocate_processes and fixing example
TosinSeg Jul 5, 2023
ab41d24
Adding host map for allocating processes and formatting
TosinSeg Jul 5, 2023
8673a9a
Fixing terminate functionality
TosinSeg Jul 5, 2023
8d09b37
Refactored client
TosinSeg Jul 6, 2023
7a136d6
More Refactoring and q/a example
TosinSeg Jul 6, 2023
2c6ec08
Reformatting to maintain previous syntax
TosinSeg Jul 6, 2023
0cb88a9
Removing print/debug statements
TosinSeg Jul 6, 2023
7c0ee12
Fixing non-persistent deployments
TosinSeg Jul 6, 2023
7a956d5
Refactoring Load balancer launch
TosinSeg Jul 7, 2023
f8cfe28
Fixing restful gateway client
TosinSeg Jul 10, 2023
079807d
Fixing replica issue
TosinSeg Jul 10, 2023
ea1e47e
Fixing non persistent client
TosinSeg Jul 10, 2023
98b6129
Adding trust_remote_code support (#203)
msinha251 Jul 11, 2023
daab5e6
Refactoring
TosinSeg Jul 12, 2023
84073f9
Update mii/models/score/generate.py
TosinSeg Jul 12, 2023
3ee3410
Merge branch 'multi-model-deployment' of github.com:TosinSeg/DeepSpee…
Jul 13, 2023
b4edc2b
Refactoring Load Balancer and request_proto
Jul 13, 2023
6346194
Formatting
Jul 13, 2023
94b6699
Fixing the client
Jul 14, 2023
710c20b
Initial partial deployment commit
Jul 21, 2023
c2636b7
More partial deploy updates
Jul 21, 2023
189e75c
Partial deploy started
Jul 21, 2023
adee843
fixing add deploy api queries
Jul 24, 2023
a145be5
Support for empty deployment 'group'
Jul 24, 2023
082c05e
Support for empty deployment 'group'
Jul 24, 2023
3ce77d2
Partial Termination
Jul 25, 2023
b40ecbd
Refactoring
Jul 25, 2023
72dd95c
formatting
Jul 25, 2023
a4e3d56
fixing bug for partial termination
Jul 25, 2023
4b5bb47
Removing comments
Jul 25, 2023
30d2b03
Including GPU index map in score file
Jul 26, 2023
c5d5996
Refactoring deployment
Jul 26, 2023
3ae1781
Refactoring and formatting
Jul 26, 2023
4b8f02f
Refactoring
Jul 28, 2023
c51ce37
Fixing Readme
Jul 28, 2023
43479db
Refactoring GRPC
Jul 28, 2023
e1b6d23
Fixing LB process not terminating
Jul 28, 2023
1675bd8
Adding multi_deployment and partial deploy/terminate unit tests
Jul 31, 2023
8684a61
Removing comments
Jul 31, 2023
56a7fce
Fixing spelling issues
Aug 1, 2023
fb70c3d
Update mii/client.py
TosinSeg Aug 1, 2023
e2cfe8a
Update mii/client.py
TosinSeg Aug 1, 2023
1312738
Removing AML from addDeploy
Aug 1, 2023
b0f0da4
Refactoring MIIConfig and DeploymentConfig
Aug 2, 2023
b78068e
Partial deploy/termination example
Aug 11, 2023
42 changes: 42 additions & 0 deletions examples/multi_model/query.py
@@ -0,0 +1,42 @@
# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team

import mii
import time

generator = mii.mii_query_handle("first_test")
result = generator.query(
{"query": ["DeepSpeed is",
"Seattle is"]},
"bloom560m_deployment",
do_sample=True,
max_new_tokens=30,
)
print(result)

time.sleep(5)
result = generator.query({'query': "DeepSpeed is the greatest"},
"microsoft/DialogRPT-human-vs-rand_deployment")
print(result)

time.sleep(5)

result = generator.query(
{
'text': "DeepSpeed is the greatest",
'conversation_id': 3,
'past_user_inputs': [],
'generated_responses': []
},
"microsoft/DialoGPT-large_deployment")
print(result)

results = generator.query(
{
'question': "What is the greatest?",
'context': "DeepSpeed is the greatest"
},
"deepset/roberta-large-squad2" + "-qa-deployment")
print(results)
7 changes: 7 additions & 0 deletions examples/multi_model/shutdown.py
@@ -0,0 +1,7 @@
# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team
import mii

mii.terminate("first_test")
46 changes: 46 additions & 0 deletions examples/multi_model/text-generation-bloom560m-example.py
@@ -0,0 +1,46 @@
# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team
import mii

gpu_index_map1 = {'master': [0]}
gpu_index_map2 = {'master': [1]}
gpu_index_map3 = {'master': [0, 1]}
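# Each map assigns a hostname to the GPU indices a deployment may use;
# 'master' here presumably refers to the launch node.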

deployments = []
mii_configs1 = {"tensor_parallel": 2, "dtype": "fp16"}
deployments.append(
mii.Deployment(task='text-generation',
model="bigscience/bloom-560m",
deployment_name="bloom560m_deployment",
GPU_index_map=gpu_index_map3,
mii_config=mii.config.MIIConfig(**mii_configs1)))

# gpt2
name = "microsoft/DialogRPT-human-vs-rand"
deployments.append(
mii.Deployment(task='text-classification',
model=name,
deployment_name=name + "_deployment",
GPU_index_map=gpu_index_map2))

mii_configs2 = {"tensor_parallel": 1}

name = "microsoft/DialoGPT-large"

deployments.append(
mii.Deployment(task='conversational',
model=name,
deployment_name=name + "_deployment",
GPU_index_map=gpu_index_map1,
mii_config=mii.config.MIIConfig(**mii_configs2)))

name = "deepset/roberta-large-squad2"
deployments.append(
mii.Deployment(task="question-answering",
model=name,
deployment_name=name + "-qa-deployment",
GPU_index_map=gpu_index_map2))

mii.deploy(deployment_tag="first_test", deployments=deployments)
2 changes: 1 addition & 1 deletion mii/__init__.py
@@ -10,7 +10,7 @@
from .constants import DeploymentType, Tasks
from .aml_related.utils import aml_output_path

from .config import MIIConfig, LoadBalancerConfig
from .config import MIIConfig, LoadBalancerConfig, Deployment
from .grpc_related.proto import modelresponse_pb2_grpc

__version__ = "0.0.0"
95 changes: 70 additions & 25 deletions mii/client.py
@@ -12,17 +12,27 @@
from mii.method_table import GRPC_METHOD_TABLE


def _get_deployment_info(deployment_name):
configs = mii.utils.import_score_file(deployment_name).configs
task = configs[mii.constants.TASK_NAME_KEY]
mii_configs_dict = configs[mii.constants.MII_CONFIGS_KEY]
def _get_deployment_info(deployment_tag):
deployments = []
configs = mii.utils.import_score_file(deployment_tag).configs
for deployment in configs:
if not isinstance(configs[deployment], dict):
Contributor: Is there a reason that these would not be a dict?

TosinSeg (Author): When the data is written to the score file, it stores the dictionaries of all the deployments along with the load balancer, model_path, and deployment_tag. The `deployment` variable in the for loop iterates over all of these entries, so I determine whether an entry is a model by checking whether it is a dictionary. (A sketch of the assumed score-file layout follows this function.)

continue
deployments.append(configs[deployment])
mii_configs_dict = configs[deployment][mii.constants.MII_CONFIGS_KEY]
mii_configs = mii.config.MIIConfig(**mii_configs_dict)
return deployments
"""
task = configs[deployment_name][mii.constants.TASK_NAME_KEY]
mii_configs_dict = configs[deployment_name][mii.constants.MII_CONFIGS_KEY]
mii_configs = mii.config.MIIConfig(**mii_configs_dict)

assert task is not None, "The task name should be set before calling init"
return task, mii_configs
"""


def mii_query_handle(deployment_name):
def mii_query_handle(deployment_tag):
"""Get a query handle for a local deployment:

mii/examples/local/gpt2-query-example.py
@@ -35,12 +35,14 @@ def mii_query_handle(deployment_name):
query_handle: A query handle with a single method `.query(request_dictionary)` using which queries can be sent to the model.
"""

if deployment_name in mii.non_persistent_models:
inference_pipeline, task = mii.non_persistent_models[deployment_name]
return MIINonPersistentClient(task, deployment_name)
if deployment_tag in mii.non_persistent_models:
inference_pipeline, task = mii.non_persistent_models[deployment_tag]
return MIINonPersistentClient(task, deployment_tag)

task_name, mii_configs = _get_deployment_info(deployment_name)
return MIIClient(task_name, "localhost", mii_configs.port_number)
deployments = _get_deployment_info(deployment_tag)
mii_configs_dict = deployments[0][mii.constants.MII_CONFIGS_KEY]
mii_configs = mii.config.MIIConfig(**mii_configs_dict)
return MIIClient(deployments, "localhost", mii_configs.port_number)


def create_channel(host, port):
Expand All @@ -55,24 +67,36 @@ class MIIClient():
"""
Client to send queries to a single endpoint.
"""
def __init__(self, task_name, host, port):
def __init__(self, deployments, host, port):
self.asyncio_loop = asyncio.get_event_loop()
channel = create_channel(host, port)
self.stub = modelresponse_pb2_grpc.ModelResponseStub(channel)
self.task = get_task(task_name)
#self.task = get_task(task_name)
self.deployments = deployments

async def _request_async_response(self, request_dict, **query_kwargs):
if self.task not in GRPC_METHOD_TABLE:
raise ValueError(f"unknown task: {self.task}")
async def _request_async_response(self, request_dict, task, **query_kwargs):
if task not in GRPC_METHOD_TABLE:
raise ValueError(f"unknown task: {task}")

task_methods = GRPC_METHOD_TABLE[self.task]
task_methods = GRPC_METHOD_TABLE[task]
proto_request = task_methods.pack_request_to_proto(request_dict, **query_kwargs)
proto_response = await getattr(self.stub, task_methods.method)(proto_request)
return task_methods.unpack_response_from_proto(proto_response)

def query(self, request_dict, **query_kwargs):
def query(self, request_dict, deployment_name=None, **query_kwargs):
task = None
if deployment_name is None: #mii.terminate() or single model
deployment_name = self.deployments[0][mii.constants.DEPLOYMENT_NAME_KEY]
TosinSeg (Author), Jul 7, 2023: Check for multiple models when no deployment name is passed in. (A usage sketch follows this method.)

task = get_task(self.deployments[0][mii.constants.TASK_NAME_KEY])
else:
for deployment in self.deployments:
if deployment[mii.constants.DEPLOYMENT_NAME_KEY] == deployment_name:
task = get_task(deployment[mii.constants.TASK_NAME_KEY])
break
query_kwargs['deployment_name'] = deployment_name
return self.asyncio_loop.run_until_complete(
self._request_async_response(request_dict,
task,
**query_kwargs))
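
A short usage sketch of the routing behavior above, assuming the deployments defined in the example files in this PR:

import mii

generator = mii.mii_query_handle("first_test")
# Explicit routing: the task is looked up for the named deployment.
print(generator.query({"query": "DeepSpeed is"}, "bloom560m_deployment"))
# No deployment_name: the client falls back to the first registered deployment.
print(generator.query({"query": "Seattle is"}))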

async def terminate_async(self):
@@ -86,17 +86,35 @@ async def create_session_async(self, session_id):
return await self.stub.CreateSession(
modelresponse_pb2.SessionID(session_id=session_id))

def create_session(self, session_id):
assert self.task == Tasks.TEXT_GENERATION, f"Session creation only available for task '{Tasks.TEXT_GENERATION}'."
def create_session(self, session_id, deployment_name=None):
task = None
if deployment_name is None: #mii.terminate() or single model
deployment_name = self.deployments[0][mii.constants.DEPLOYMENT_NAME_KEY]
task = get_task(self.deployments[0][mii.constants.TASK_NAME_KEY])
else:
for deployment in self.deployments:
if deployment[mii.constants.DEPLOYMENT_NAME_KEY] == deployment_name:
task = get_task(deployment[mii.constants.TASK_NAME_KEY])
break
assert task == Tasks.TEXT_GENERATION, f"Session creation only available for task '{Tasks.TEXT_GENERATION}'."
return self.asyncio_loop.run_until_complete(
self.create_session_async(session_id))

async def destroy_session_async(self, session_id):
await self.stub.DestroySession(modelresponse_pb2.SessionID(session_id=session_id)
)

def destroy_session(self, session_id):
assert self.task == Tasks.TEXT_GENERATION, f"Session deletion only available for task '{Tasks.TEXT_GENERATION}'."
def destroy_session(self, session_id, deployment_name=None):
task = None
if deployment_name is None: #mii.terminate() or single model
deployment_name = self.deployments[0][mii.constants.DEPLOYMENT_NAME_KEY]
task = get_task(self.deployments[0][mii.constants.TASK_NAME_KEY])
else:
for deployment in self.deployments:
if deployment[mii.constants.DEPLOYMENT_NAME_KEY] == deployment_name:
task = get_task(deployment[mii.constants.TASK_NAME_KEY])
break
assert task == Tasks.TEXT_GENERATION, f"Session deletion only available for task '{Tasks.TEXT_GENERATION}'."
self.asyncio_loop.run_until_complete(self.destroy_session_async(session_id))


@@ -188,7 +230,10 @@ def terminate(self):
del mii.non_persistent_models[self.deployment_name]


def terminate_restful_gateway(deployment_name):
_, mii_configs = _get_deployment_info(deployment_name)
if mii_configs.enable_restful_api:
requests.get(f"http://localhost:{mii_configs.restful_api_port}/terminate")
def terminate_restful_gateway(deployment_tag):
deployments = _get_deployment_info(deployment_tag)
for deployment in deployments:
Contributor: Do we support having separate RESTful APIs for each deployment in a multi-model deployment? I think it is tied to the load balancer, not the inference servers.

mii_configs_dict = deployment[mii.constants.MII_CONFIGS_KEY]
mii_configs = mii.config.MIIConfig(**mii_configs_dict)
if mii_configs.enable_restful_api:
requests.get(f"http://localhost:{mii_configs.restful_api_port}/terminate")
19 changes: 17 additions & 2 deletions mii/config.py
@@ -6,7 +6,6 @@
from typing import Union, List
from enum import Enum
from pydantic import BaseModel, validator, root_validator

from deepspeed.launcher.runner import DLTS_HOSTFILE


@@ -107,6 +106,8 @@ class Config:


class ReplicaConfig(BaseModel):
task: str = ""
deployment_name: str = ""
hostname: str = ""
tensor_parallel_ports: List[int] = []
torch_dist_port: int = None
@@ -123,4 +124,18 @@ class LoadBalancerConfig(BaseModel):

class Config:
validate_all = True
validate_assignment = True


validate_assignment = True


class Deployment(BaseModel):
deployment_name: str
task: str
model: str
enable_deepspeed: bool = True
enable_zero: bool = False
GPU_index_map: dict = None
mii_config: MIIConfig = MIIConfig.parse_obj({})
ds_config: dict = None
version: int = 1
2 changes: 1 addition & 1 deletion mii/constants.py
@@ -94,7 +94,7 @@ class ModelProvider(enum.Enum):
DEPLOYMENT_NAME_KEY = 'deployment_name'
MODEL_PATH_KEY = 'model_path'
LOAD_BALANCER_CONFIG_KEY = 'load_balancer_config'

DEPLOYMENT_TAG_KEY = 'deployment_tag'
ENABLE_DEEPSPEED_KEY = 'ds_optimize'
ENABLE_DEEPSPEED_ZERO_KEY = 'ds_zero'
DEEPSPEED_CONFIG_KEY = 'ds_config'