feat: added attempt_complete tool, giving agent ability to signal completion #406

ErikBjare · 2025-01-16T18:12:18Z

Played with this a couple of days ago, idea was to mimic a similar tool I saw in one of the SWE-Bench submissions.

Could be a more reliable stopping-condition (and a good place to run appropriate checks) than waiting for a non-tooluse message.

Not sure if this should be merged here or in gptme-contrib, but since it required some changes to the Python tool it'll go here for now.

Important

Add attempt_complete tool to signal task completion, integrated into Python execution flow in python.py.

New Tool:
- Added attempt_complete.py with Completed class to signal task completion.
- attempt_complete() function attempts task completion and returns Completed.
- check_completion() placeholder for task completion checks.
Integration:
- In python.py, import Completed from attempt_complete.py.
- Update execute_python() to handle Completed by exiting process.
Misc:
- attempt_complete tool is disabled by default in ToolSpec.

^{This description was created by}^{for 70e5b91. It will automatically update as commits are pushed.}

…pletion

ellipsis-dev

👍 Looks good to me! Reviewed everything up to 70e5b91 in 1 minute and 9 seconds

More details

Looked at 69 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 2 drafted comments based on config settings.

1. gptme/tools/python.py:145

Draft comment:
Using sys.exit(0) in a library function can be problematic as it abruptly terminates the program. Consider raising an exception or handling this more gracefully.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The use of sys.exit(0) here seems intentional as part of handling a specific completion signal. This is a tool for executing Python code interactively, and the Completed signal appears to be a deliberate way to terminate the session. The comment's suggestion might make the code more library-friendly, but it would complicate the simple and clear termination logic currently in place.
I might be overlooking potential use cases where this code needs to be used as a library without terminating the entire program. The comment could be raising a valid architectural concern.
While the concern is theoretically valid, the code appears to be part of an interactive tool where immediate termination on completion is the desired behavior. The current implementation is clear and serves its purpose.
Delete the comment. The use of sys.exit(0) appears intentional for this interactive tool's use case, and the suggested change would complicate the code without clear benefit.

2. gptme/tools/attempt_complete.py:27

Draft comment:
check_completion always returns True, which means it doesn't perform any actual checks. Implement meaningful checks here.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment is technically correct that check_completion() always returns True. However, the code already has a TODO comment explaining this is temporary and listing what real checks will be added. The automated comment is therefore redundant and doesn't add new information or insight beyond what's already documented in the code.
Maybe the TODO comment isn't sufficient and having an additional review comment would put more pressure on fixing this soon?
The TODO comment is actually more specific and helpful than the review comment, as it suggests concrete examples of what checks to add. The review comment doesn't add meaningful pressure beyond the TODO.
Delete this comment as it's redundant with the existing TODO comment that already acknowledges and better explains the placeholder implementation.

Workflow ID: wflow_UmZ3ogCrZ4bZ4gyR

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

codecov-commenter · 2025-01-16T19:08:17Z

Codecov Report

Attention: Patch coverage is 50.00000% with 9 lines in your changes missing coverage. Please review.

Project coverage is 69.86%. Comparing base (d27da5e) to head (70e5b91).

✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
gptme/tools/attempt_complete.py	46.15%	7 Missing ⚠️
gptme/tools/python.py	60.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #406      +/-   ##
==========================================
- Coverage   70.10%   69.86%   -0.25%     
==========================================
  Files          70       71       +1     
  Lines        5761     5778      +17     
==========================================
- Hits         4039     4037       -2     
- Misses       1722     1741      +19

Flag	Coverage Δ
anthropic/claude-3-haiku-20240307	`68.76% <50.00%> (-0.16%)`	⬇️
openai/gpt-4o-mini	`67.68% <50.00%> (-0.29%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

feat: added attempt_complete tool, giving agent ability to signal com…

70e5b91

…pletion

ellipsis-dev bot reviewed Jan 16, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: added attempt_complete tool, giving agent ability to signal completion #406

feat: added attempt_complete tool, giving agent ability to signal completion #406

ErikBjare commented Jan 16, 2025 •

edited by ellipsis-dev bot

Loading

ellipsis-dev bot left a comment

codecov-commenter commented Jan 16, 2025

feat: added attempt_complete tool, giving agent ability to signal completion #406

Are you sure you want to change the base?

feat: added attempt_complete tool, giving agent ability to signal completion #406

Conversation

ErikBjare commented Jan 16, 2025 • edited by ellipsis-dev bot Loading

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

codecov-commenter commented Jan 16, 2025

Codecov Report

ErikBjare commented Jan 16, 2025 •

edited by ellipsis-dev bot

Loading