
prompt for agent evaluation tasks #32

Open
Yutong-Dai opened this issue Feb 3, 2025 · 3 comments

Comments

@Yutong-Dai

Thanks for the nice work!

I am wondering whether the prompts shared in the README are also the ones used for the online/offline agent evaluations reported in the paper.

Specifically, I tried the prompt template for the computer setting. On the one hand, the grounding is very accurate. On the other hand, the model can easily get stuck in certain states, for example, repeatedly issuing clicks on non-interactive elements such as a plain text string. As the prompt suggests, I do append the history of performed actions to the user instruction, but this does not resolve the issue.

Given the SOTA performance on the online/offline agent benchmarks, I would like to learn what else I might be missing.

Thanks.


More details are provided below.

ui_tars_prompt = r"""You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task. 

## Output Format

Thought: ...
Action: ...

## Action Space

click(start_box='<|box_start|>(x1,y1)<|box_end|>')
left_double(start_box='<|box_start|>(x1,y1)<|box_end|>')
right_single(start_box='<|box_start|>(x1,y1)<|box_end|>')
drag(start_box='<|box_start|>(x1,y1)<|box_end|>', end_box='<|box_start|>(x3,y3)<|box_end|>')
hotkey(key='')
type(content='') # If you want to submit your input, use "\n" at the end of `content`.
scroll(start_box='<|box_start|>(x1,y1)<|box_end|>', direction='down or up or right or left')
wait() #Sleep for 5s and take a screenshot to check for any changes.
finished()
call_user() # Submit the task and call the user when the task is unsolvable, or when you need the user's help.


## Note
- Use English in `Thought` part.
- Summarize your next action (with its target element) in one sentence in `Thought` part.

## User Instruction
"""    

user_prompt = f"GOAL: {task_description}\n"
user_prompt += '<a list of actions performed so far>'
content = [
    {"type": "text", "text": ui_tars_prompt + user_prompt + "\nCURRENT SCREENSHOT:"},
    {"type": "image_url", "image_url": {"url": pil_to_b64(current_screenshots)}},
]
message = [{"role": "user", "content": content}]

... 
send the message to the 72B-DPO model served with vLLM
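For reference, the last step can be sketched as a plain HTTP request against vLLM's OpenAI-compatible chat endpoint. The URL, model name, and sampling settings below are illustrative assumptions (use whatever `vllm serve` was launched with), not values from the repo.

```python
import json
import urllib.request

def build_payload(message, model="ui-tars-72b-dpo", temperature=0.0, max_tokens=512):
    """Assemble the chat-completions payload for the message list built above."""
    return {
        "model": model,
        "messages": message,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

def send(payload, url="http://localhost:8000/v1/chat/completions"):
    """POST the payload and return the model reply, e.g. 'Thought: ...\\nAction: ...'."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Greedy decoding (`temperature=0.0`) is a common default for agent evaluation, but the paper's exact sampling settings are not stated in this thread.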
@Yutong-Dai Yutong-Dai changed the title prompt for navigation tasks prompt for agent evaluation tasks Feb 3, 2025
@pooruss
Contributor

pooruss commented Feb 8, 2025

Hi, we have updated the online inference logic here: https://github.com/bytedance/UI-TARS/blob/feat/oswd_infer/infer/osworld.py

In the meantime, we are adding support for the UI-TARS model in the original OSWorld repo; please stay tuned.

@lemonhall

lemonhall commented Feb 8, 2025 via email

@AHEADer
Contributor

AHEADer commented Feb 8, 2025

Volcano Engine is not supported for now; the deployment docs in the README include a one-click deployment option via ModelScope + Alibaba Cloud.
