Website search tool #160

Merged on Dec 6, 2024 (24 commits)

anthonydevs17 (Collaborator) commented Nov 26, 2024

RAG-based Search Tools Release 🚀

This PR introduces four new powerful search tools leveraging Retrieval-Augmented Generation (RAG) technology.

What's New

  • SimpleRAG: Foundational RAG implementation with langchain components
  • WebsiteSearch: Semantic search capabilities for web content
  • PDFSearch: Comprehensive PDF document analysis tool
  • TextFileSearch: Optimized plain text document search

Key Features

  • Built on RAG technology with OpenAI integration
  • Flexible vector store support (Memory, Pinecone)
  • Customizable chunking and processing options
  • Server-side execution support
  • Comprehensive documentation for each tool
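As a rough illustration of the chunk, embed, and retrieve flow that these features describe, here is a minimal dependency-free sketch. It is not the code from this PR: `embed`, `chunkText`, and `MemoryStore` are hypothetical names, and the bag-of-words "embedding" is only a toy stand-in for a real embeddings model such as the OpenAI one the tools integrate with.

```javascript
// Toy stand-in for an embedding model: a bag-of-words term-frequency map.
// A real setup would call an embeddings API instead (assumption for illustration).
function embed(text) {
  const counts = new Map();
  for (const token of text.toLowerCase().match(/[a-z0-9]+/g) || []) {
    counts.set(token, (counts.get(token) || 0) + 1);
  }
  return counts;
}

// Cosine similarity between two sparse term-frequency vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  for (const [token, weight] of a) dot += weight * (b.get(token) || 0);
  const norm = (v) => Math.sqrt([...v.values()].reduce((sum, w) => sum + w * w, 0));
  const denominator = norm(a) * norm(b);
  return denominator === 0 ? 0 : dot / denominator;
}

// Split text into fixed-size character chunks with a configurable overlap.
function chunkText(text, { chunkSize = 200, chunkOverlap = 20 } = {}) {
  if (chunkOverlap >= chunkSize) {
    throw new Error('chunkOverlap must be smaller than chunkSize');
  }
  const chunks = [];
  for (let start = 0; start < text.length; start += chunkSize - chunkOverlap) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}

// Minimal in-memory vector store: keeps each chunk next to its vector.
class MemoryStore {
  constructor() {
    this.entries = [];
  }

  add(chunks) {
    for (const chunk of chunks) {
      this.entries.push({ chunk, vector: embed(chunk) });
    }
  }

  // Return the k chunks most similar to the query, best match first.
  search(query, k = 1) {
    const queryVector = embed(query);
    return this.entries
      .map(({ chunk, vector }) => ({ chunk, score: cosineSimilarity(queryVector, vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k);
  }
}

// Usage: index a short text, then retrieve the chunk closest to a question.
const store = new MemoryStore();
store.add(
  chunkText('Cats purr when content. Pinecone is a hosted vector database.', {
    chunkSize: 30,
    chunkOverlap: 0,
  })
);
const [best] = store.search('hosted vector database');
console.log(best.chunk);
```

In a full RAG pipeline the retrieved chunks would then be passed to the LLM as context for answering the query; that generation step is omitted here.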

Testing

  • Unit tests added for all new tools
  • Integration tests with different vector stores
  • Browser compatibility tests for relevant tools
  • Documentation examples verified

Documentation

  • Added README files for each tool with:
    • Installation instructions
    • Usage examples
    • Advanced configuration options
    • Integration guides

Dependencies

Added:

  • cheerio for HTML parsing
  • pdf-parse for Node.js PDF processing
  • pdfjs-dist for browser PDF processing
  • Core langchain components

Related Issues

Closes #141

if (!this.content || this.content === '') {
  throw new Error('Please provide content to process.');
}
if (!query || query === '') {
Contributor

Hi! I'd like to suggest updating the error messages in SimpleRAG to be more explicit for agent decision-making.

Currently, the tool throws errors, but since this tool is meant to be used by agents, we should return structured error messages that help agents make decisions. Here's the proposed change:

js
async call(input) {
  const { content = this.content, query } = input;

  // Return structured error strings instead of throwing, so the agent can react.
  if (!content) {
    return "ERROR_MISSING_CONTENT: No text content was provided for analysis. Agent should provide content in the 'content' field.";
  }

  if (!query) {
    return "ERROR_MISSING_QUERY: No question was provided. Agent should provide a question in the 'query' field.";
  }

  try {
    const ragToolkit = this.ragToolkit;
    await ragToolkit.addDocuments([{ source: content, type: 'string' }]);
    const response = await ragToolkit.askQuestion(query);
    return response;
  } catch (error) {
    // Template literal so the underlying error message is interpolated.
    return `ERROR_RAG_PROCESSING: RAG processing failed. Details: ${error.message}. Agent should verify content format and query validity.`;
  }
}

Key improvements:

  • Returns errors as strings instead of throwing exceptions
  • Adds clear ERROR_ prefixes for easy error type identification
  • Makes error messages explicit about what's missing or wrong
  • Provides direct guidance on what the agent should do next
  • Makes error states machine-parseable while remaining human-readable
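To illustrate the "machine-parseable" point, here is one way calling code might split such a result into an error code and a message. This is only a sketch under assumptions: `parseToolResult` and `ERROR_PATTERN` are hypothetical names, not part of KaibanJS, and the pattern assumes every error string follows the `ERROR_<CODE>: <message>` shape proposed above.

```javascript
// Matches results shaped like "ERROR_SOME_CODE: human-readable message".
const ERROR_PATTERN = /^(ERROR_[A-Z0-9_]+):\s*([\s\S]*)$/;

// Classify a tool result as either a normal value or a structured error.
function parseToolResult(result) {
  const match = typeof result === 'string' ? ERROR_PATTERN.exec(result) : null;
  if (!match) {
    return { ok: true, value: result };
  }
  return { ok: false, code: match[1], message: match[2] };
}

// Usage: branch on the error code instead of catching an exception.
const parsed = parseToolResult(
  "ERROR_MISSING_QUERY: No question was provided. Agent should provide a question in the 'query' field."
);
if (!parsed.ok && parsed.code === 'ERROR_MISSING_QUERY') {
  console.log('Retry with a query:', parsed.message);
}
```

One trade-off of string-based errors is that a legitimate answer that happens to start with `ERROR_` would be misclassified; returning an object with an explicit status field would avoid that, at the cost of changing the tool's return type.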

Let me know if you'd like me to explain any part of these changes!

this.url = url;
}
if (!this.url || this.url === '') {
throw new Error('Please provide url to process.');
Contributor

this.chunkOptions = fields.chunkOptions;
this.embeddings = fields.embeddings;
this.vectorStore = fields.vectorStore;
this.llm = fields.llm;
Contributor

Let's call it llmInstance, to speak the same "language" as the KaibanJS framework.

this.llm =
  options.llm ||
  new ChatOpenAI({
    model: 'gpt-4',
Contributor

Can we use the 4o-mini as the default instead?
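A minimal sketch of what the two LLM-related suggestions (the `llmInstance` name and a smaller default model) could look like together. It assumes "4o-mini" refers to the `gpt-4o-mini` model name. `FakeChatModel` and `RagToolkitSketch` are hypothetical stand-ins so the example runs without dependencies; in the PR the default is built with `ChatOpenAI`.

```javascript
// Hypothetical stand-in for a chat model client (ChatOpenAI plays this role in the PR).
class FakeChatModel {
  constructor({ model }) {
    this.model = model;
  }
}

// Assumed default, reading the reviewer's "4o-mini" as the gpt-4o-mini model name.
const DEFAULT_MODEL = 'gpt-4o-mini';

class RagToolkitSketch {
  constructor(options = {}) {
    // Use the caller-supplied instance when present; otherwise build the default.
    this.llmInstance =
      options.llmInstance || new FakeChatModel({ model: DEFAULT_MODEL });
  }
}

// Usage: default model versus an explicitly injected instance.
const withDefault = new RagToolkitSketch();
const withCustom = new RagToolkitSketch({
  llmInstance: new FakeChatModel({ model: 'gpt-4' }),
});
console.log(withDefault.llmInstance.model, withCustom.llmInstance.model);
```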


this.loaders = {
  string: (source) => new TextInputLoader(source),
  // text: source => new TextLoader(source),
Contributor

Let's remove this from here if we are not using it yet
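For context, a loader map like this is typically used to dispatch on the document's `type` field (the suggested `call` implementation earlier passes `{ source: content, type: 'string' }`). The sketch below shows that pattern with only the loader that is actually in use registered. `TextInputLoaderSketch` and `loadDocument` are hypothetical names, and the synchronous `load()` is a simplification; real loaders are usually asynchronous.

```javascript
// Hypothetical stand-in for the PR's TextInputLoader: wraps a raw string as one document.
class TextInputLoaderSketch {
  constructor(source) {
    this.source = source;
  }

  load() {
    return [{ pageContent: this.source, metadata: { type: 'string' } }];
  }
}

// Only loaders that are actually wired up are registered; no commented-out placeholders.
const loaders = {
  string: (source) => new TextInputLoaderSketch(source),
};

// Look up the loader factory for the given type and load the document.
function loadDocument({ source, type }) {
  const hasLoader = Object.prototype.hasOwnProperty.call(loaders, type);
  if (!hasLoader) {
    throw new Error(`Unsupported document type: ${type}`);
  }
  return loaders[type](source).load();
}

// Usage: load a plain string as a single document.
const docs = loadDocument({ source: 'Hello, RAG.', type: 'string' });
console.log(docs.length);
```

Adding a new source type later then becomes a one-line addition to `loaders`, which is easier to review than re-enabling commented-out code.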

…de.js and Browser environments

refactor(pdf-search): rename PDF File Analyzer to PDF File Searcher for consistency
refactor(textfile-search): rename Text File Analyzer to Text File Searcher for consistency
refactor(textfile-search): update task description and expected output for semantic search
fix(website-search): add installation instructions for cheerio dependency
chore: add server.js to .gitignore
darielnoel merged commit 7c163d0 into kaiban-ai:main on Dec 6, 2024
2 checks passed
Linked issue: Add RAG Tools to the kaibanjs/tools Package