This project aims to replicate how a person naturally interacts with a computer to solve tasks. The system utilizes multiple agents to handle tasks such as web browsing, terminal operations, GUI interactions, and memory storage for adaptive decision-making. The Memory Agent is pivotal in recalling previous actions, which enhances the efficiency and accuracy of solving related tasks in the future.
The following diagram outlines the high-level interactions between the different agents:
This project is currently in the development stage, but significant progress has been made on its core components:
- Web Search Agent: Fully functional and effectively automates web interactions to retrieve data.
- System Agent: Almost fully functional, capable of handling GUI tasks. Currently supported only on Windows operating systems.
- Terminal Agent: Fully operational for executing terminal commands, including advanced or specialized ones.
- Memory Agent: Still under development. The open question is whether a centralized Memory Agent or dedicated memory for each agent would be more efficient and functional.
This program runs directly on your machine and uses a large language model (LLM) to execute commands. Use it with caution: it can interact with your system files, settings, and applications, and improper use or unintended commands may harm your system.
The system is structured around the following agents, each with a specialized role:
- System Agent: Handles GUI-based tasks such as opening files, manipulating windows, and interacting with desktop elements. Currently supported only on Windows and nearing full functionality.
- Web Search Agent: Automates web browsing to gather data or solve tasks using online resources. This agent is fully functional and reliable.
- Terminal Agent: Executes terminal commands for advanced operations. This component is fully operational and performs tasks efficiently.
- Memory Agent: Stores and recalls past actions to improve decision-making and avoid redundant operations. Still under development, with ongoing discussions about the best architecture for memory management.
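As a rough illustration of how a coordinator might hand a request to one of these agents, here is a minimal sketch. The function names, the `AGENTS` registry, and `dispatch` are hypothetical and not taken from the repository; the real agents drive a browser, the desktop, or a shell rather than returning a tagged string.

```python
from typing import Callable

# Hypothetical stand-ins for the real agents: each takes a request
# string and returns a response string.
def web_agent(request: str) -> str:
    return f"[web] {request}"

def system_agent(request: str) -> str:
    return f"[system] {request}"

def terminal_agent(request: str) -> str:
    return f"[terminal] {request}"

# Assumed registry keyed by agent name.
AGENTS: dict[str, Callable[[str], str]] = {
    "Web Agent": web_agent,
    "System Agent": system_agent,
    "Terminal Agent": terminal_agent,
}

def dispatch(agent_name: str, request: str) -> str:
    """Route a request to the named agent, failing loudly on unknown names."""
    try:
        agent = AGENTS[agent_name]
    except KeyError:
        raise ValueError(f"Unknown agent: {agent_name!r}") from None
    return agent(request)

response = dispatch("Web Agent", "Search for the current weather in Singapore")
```

Keeping the agents behind a single lookup means a new agent can be registered without touching the routing logic.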
- The Memory Agent (or agent-specific memory) stores methodologies and steps used for solving tasks.
- When a new query is received, the system checks past stored actions to replicate or adapt a similar solution.
- The system combines the System, Web Search, and Terminal Agents as needed to solve the task, mimicking a human approach.
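The recall step above can be sketched as follows. Since the Memory Agent is still under development, this is only one possible approach under stated assumptions: `MemoryStore`, its methods, and the use of string similarity from the standard library's `difflib` are all hypothetical choices, not the project's design.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class MemoryStore:
    """Hypothetical store mapping past queries to the steps that solved them."""
    records: dict[str, list[str]] = field(default_factory=dict)

    def remember(self, query: str, steps: list[str]) -> None:
        self.records[query.lower()] = steps

    def recall(self, query: str, threshold: float = 0.75) -> Optional[list[str]]:
        """Return the steps of the most similar past query, or None if none is close enough."""
        best_steps, best_score = None, threshold
        for past_query, steps in self.records.items():
            score = SequenceMatcher(None, query.lower(), past_query).ratio()
            if score >= best_score:
                best_steps, best_score = steps, score
        return best_steps


memory = MemoryStore()
memory.remember(
    "write the weather in Singapore into Notepad",
    ["Web Agent: search current weather", "System Agent: open Notepad and type result"],
)
similar = memory.recall("write the weather in Tokyo into Notepad")   # close match: steps are reused
unrelated = memory.recall("list files in the home directory")        # no close match: None
```

Character-level similarity is a crude proxy for task similarity; an embedding-based lookup would likely generalize better, at the cost of an extra dependency.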
- Clone the repository:
git clone https://github.com/jeomon/computer-agent.git
- Install dependencies:
pip install -r requirements.txt
- Start the agent system:
python app.py
Enter your query: <your query goes here>
Entering Computer Agent
Thought: First, I need to get the weather information for Singapore. I can use the Web Agent for this task. Once I have the weather information, I'll use the System Agent to open Notepad and write the information into it.
Agent Name: Web Agent
Agent Request: Search for the current weather in Singapore and provide the information.
Entering Web Agent
Thought: I need to search for the current weather in Singapore. I will use the search engine to find this information. I will start by going to google.com.
Action Name: GoTo Tool
Action Input: {'url': 'https://www.google.com'}
Observation: Navigated to https://www.google.com
Thought: I'm on the Google homepage. I need to type "current weather in Singapore" in the search box and then click the Google Search button.
Action Name: Type Tool
Action Input: {'index': 6, 'text': 'current weather in Singapore'}
Observation: Typed current weather in Singapore in element 6
Thought: I have typed the query in the search box. Now I need to click on the Google Search button to get the results.
Action Name: Click Tool
Action Input: {'index': 21}
Observation: Clicked element 21
Thought: I have successfully searched for the current weather in Singapore. The current temperature is 28 degrees Celsius, with light rain. The precipitation is 45%, humidity is 75%, and wind is 13 km/h. I can provide this information to the user.
Final Answer: The current weather in Singapore is 28 degrees Celsius with light rain. The precipitation is 45%, humidity is 75%, and wind is 13 km/h.
Agent Response: The current weather in Singapore is 28 degrees Celsius with light rain. The precipitation is 45%, humidity is 75%, and wind is 13 km/h.
Thought: Now that I have the weather information, I need to open Notepad and write the weather information into it. I will use the System Agent for this task.
Agent Name: System Agent
Agent Request: Open Notepad. Then type the following: "The current weather in Singapore is 28 degrees Celsius with light rain. The precipitation is 45%, humidity is 75%, and wind is 13 km/h."
Entering System Agent
Thought: First, I need to open Notepad. I will use the search bar to find and open it.
Action Name: Click Tool
Action Input: {'index': 2, 'button': 'left'}
Final Answer: The text "The current weather in Singapore is 28 degrees Celsius with light rain. The precipitation is 45%, humidity is 75%, and wind is 13 km/h." has been typed into Notepad.
Agent Response: The text "The current weather in Singapore is 28 degrees Celsius with light rain. The precipitation is 45%, humidity is 75%, and wind is 13 km/h." has been typed into Notepad.
Thought: I have successfully retrieved the weather information using the Web Agent and then written it into Notepad using the System Agent. The task is complete.
Final Answer: The weather information for Singapore has been written into notepad. The current weather in Singapore is 28 degrees Celsius with light rain. The precipitation is 45%, humidity is 75%, and wind is 13 km/h
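The trace above follows a regular `Key: value` layout, so it can be post-processed for logging or debugging. The sketch below assumes that layout holds in general, which is inferred only from this sample; `parse_trace` is a hypothetical helper and not part of the project.

```python
import ast

def parse_trace(lines):
    """Group 'Key: value' lines into steps; a new step starts at each 'Thought:' line."""
    steps, current = [], {}
    for line in lines:
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # banners such as "Entering Web Agent" carry no key/value pair
        if key == "Thought" and current:
            steps.append(current)
            current = {}
        if key == "Action Input":
            # The sample prints inputs as Python dict literals, so parse them safely.
            value = ast.literal_eval(value)
        current[key] = value
    if current:
        steps.append(current)
    return steps

sample = [
    "Entering Web Agent",
    "Thought: I need to search for the current weather in Singapore.",
    "Action Name: GoTo Tool",
    "Action Input: {'url': 'https://www.google.com'}",
    "Observation: Navigated to https://www.google.com",
    "Thought: I have typed the query in the search box.",
    "Action Name: Click Tool",
    "Action Input: {'index': 21}",
    "Observation: Clicked element 21",
]
steps = parse_trace(sample)
```

Here `ast.literal_eval` is used instead of `eval` so that only literals are accepted when reading the action input.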
- Enhance the System Agent to handle more advanced and complex GUI tasks.
- Finalize the design and implementation of the Memory Agent, determining whether centralized or agent-specific memory is more effective.
- Continue refining the Web Search Agent and Terminal Agent for robust performance across a wide range of scenarios.
This project is licensed under the MIT License. See the LICENSE file for details.