
prompt for agent evaluation tasks #32

Open
Yutong-Dai opened this issue Feb 3, 2025 · 3 comments

Comments

@Yutong-Dai

Thanks for the nice work!

I am wondering whether the prompts shared in the README are also the ones used for the online/offline agent evaluations reported in the paper.

Specifically, I tried the prompt template for the computer setting. On the one hand, the grounding is very accurate. On the other hand, the model can easily get stuck in certain states, for example, repeatedly issuing clicks on non-interactive elements such as a plain text string. As the prompt suggests, I do append the history of performed actions to the user instruction, but this does not resolve the issue.

Given the SOTA performance on the online/offline agent benchmarks, I would like to learn what else I might be missing.

Thanks.


More details are provided below.

ui_tars_prompt = r"""You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task. 

## Output Format

Thought: ...
Action: ...

## Action Space

click(start_box='<|box_start|>(x1,y1)<|box_end|>')
left_double(start_box='<|box_start|>(x1,y1)<|box_end|>')
right_single(start_box='<|box_start|>(x1,y1)<|box_end|>')
drag(start_box='<|box_start|>(x1,y1)<|box_end|>', end_box='<|box_start|>(x3,y3)<|box_end|>')
hotkey(key='')
type(content='') # If you want to submit your input, use "\n" at the end of `content`.
scroll(start_box='<|box_start|>(x1,y1)<|box_end|>', direction='down or up or right or left')
wait() #Sleep for 5s and take a screenshot to check for any changes.
finished()
call_user() # Submit the task and call the user when the task is unsolvable, or when you need the user's help.


## Note
- Use English in `Thought` part.
- Summarize your next action (with its target element) in one sentence in `Thought` part.

## User Instruction
"""    

user_prompt = f"GOAL: {task_description}\n"
user_prompt += '<a list of actions performed so far>'
content = [
    {"type": "text", "text": ui_tars_prompt + user_prompt + "\nCURRENT SCREENSHOT:"},
    {"type": "image_url", "image_url": {"url": pil_to_b64(current_screenshots)}},
]
message = [{"role": "user", "content": content}]

... 
send the message to the 72B-DPO model served with vLLM
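For reference, the last step can be sketched as a plain HTTP request against vLLM's OpenAI-compatible chat endpoint. The URL, model name, and sampling settings below are illustrative assumptions (use whatever `vllm serve` was launched with), not values from the repo.

```python
import json
import urllib.request

def build_payload(message, model="ui-tars-72b-dpo", temperature=0.0, max_tokens=512):
    """Assemble the chat-completions payload for the message list built above."""
    return {
        "model": model,
        "messages": message,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

def send(payload, url="http://localhost:8000/v1/chat/completions"):
    """POST the payload and return the model reply, e.g. 'Thought: ...\\nAction: ...'."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Greedy decoding (`temperature=0.0`) is a common default for agent evaluation, but the paper's exact sampling settings are not stated in this thread.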
@Yutong-Dai Yutong-Dai changed the title prompt for navigation tasks prompt for agent evaluation tasks Feb 3, 2025
@pooruss
Contributor

pooruss commented Feb 8, 2025

Hi, we have updated the online inference logic here: https://github.com/bytedance/UI-TARS/blob/feat/oswd_infer/infer/osworld.py

In the meantime, we are adding support for the UI-TARS model in the original OSWorld repo; please stay tuned.

@lemonhall

lemonhall commented Feb 8, 2025 via email

@AHEADer
Contributor

AHEADer commented Feb 8, 2025

Volcano Engine is not supported for now; the deployment docs in the README include a one-click deployment option via ModelScope + Alibaba Cloud.
