Website search tool #160

anthonydevs17 · 2024-11-26T18:09:37Z

RAG-based Search Tools Release 🚀

This PR introduces four new powerful search tools leveraging Retrieval-Augmented Generation (RAG) technology.

What's New

- SimpleRAG: Foundational RAG implementation with langchain components
- WebsiteSearch: Semantic search capabilities for web content
- PDFSearch: Comprehensive PDF document analysis tool
- TextFileSearch: Optimized plain text document search

Key Features

- Built on RAG technology with OpenAI integration
- Flexible vector store support (Memory, Pinecone)
- Customizable chunking and processing options
- Server-side execution support
- Comprehensive documentation for each tool
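
To illustrate what "customizable chunking" usually means in a RAG pipeline, here is a minimal, library-free sketch of overlapping fixed-size chunking. The helper name `chunkText` and the option names `chunkSize` and `chunkOverlap` are assumptions for illustration only; they are not taken from this PR's code, which may delegate chunking to langchain components.

```javascript
// Hypothetical helper (not part of this PR): split text into fixed-size
// chunks that overlap, so text cut at one boundary reappears in the next chunk.
function chunkText(text, { chunkSize = 1000, chunkOverlap = 200 } = {}) {
  if (chunkOverlap >= chunkSize) {
    throw new Error('chunkOverlap must be smaller than chunkSize');
  }
  const chunks = [];
  const step = chunkSize - chunkOverlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}

// Each chunk shares its last two characters with the start of the next one.
const exampleChunks = chunkText('abcdefghij', { chunkSize: 4, chunkOverlap: 2 });
```

A larger overlap keeps more context around each boundary at the cost of storing and embedding more chunks.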

Testing

- Unit tests added for all new tools
- Integration tests with different vector stores
- Browser compatibility tests for relevant tools
- Documentation examples verified

Documentation

Added README files for each tool with:
- Installation instructions
- Usage examples
- Advanced configuration options
- Integration guides

Dependencies

Added:
- cheerio for HTML parsing
- pdf-parse for Node.js PDF processing
- pdfjs-dist for browser PDF processing
- Core langchain components

Related Issues

Closes #141

darielnoel · 2024-11-26T21:38:28Z

packages/tools/src/simple-rag/index.js

+    if (!this.content || this.content === '') {
+      throw new Error('Please provide content to process.');
+    }
+    if (!query || query === '') {


Hi! I'd like to suggest updating the error messages in SimpleRAG to be more explicit for agent decision-making.

Currently, the tool throws errors, but since this tool is meant to be used by agents, we should return structured error messages that help agents make decisions. Here's the proposed change:

```js
async call(input) {
  // Fall back to the content configured on the tool instance.
  const { content = this.content, query } = input;

  if (!content) {
    return "ERROR_MISSING_CONTENT: No text content was provided for analysis. Agent should provide content in the 'content' field.";
  }
  if (!query) {
    return "ERROR_MISSING_QUERY: No question was provided. Agent should provide a question in the 'query' field.";
  }

  try {
    const ragToolkit = this.ragToolkit;
    await ragToolkit.addDocuments([{ source: content, type: 'string' }]);
    const response = await ragToolkit.askQuestion(query);
    return response;
  } catch (error) {
    // Return the failure as a string so the agent can react to it.
    return `ERROR_RAG_PROCESSING: RAG processing failed. Details: ${error.message}. Agent should verify content format and query validity.`;
  }
}
```

Key improvements:

- Returns errors as strings instead of throwing exceptions
- Adds clear ERROR_ prefixes for easy error type identification
- Makes error messages explicit about what's missing or wrong
- Provides direct guidance on what the agent should do next
- Makes error states machine-parseable while remaining human-readable

Let me know if you'd like me to explain any part of these changes!
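
As a rough illustration of the "machine-parseable" point, the sketch below shows how a caller could separate an `ERROR_`-prefixed string from a normal answer. `parseToolResult` and the shape of its return value are hypothetical names chosen for this example; nothing like them is confirmed to exist in KaibanJS.

```javascript
// Hypothetical helper (not part of KaibanJS): classify a tool result that
// may be an ERROR_-prefixed string of the form "ERROR_CODE: message".
const ERROR_PATTERN = /^(ERROR_[A-Z_]+):\s*(.*)$/s;

function parseToolResult(result) {
  const match = typeof result === 'string' ? ERROR_PATTERN.exec(result) : null;
  if (!match) {
    // Anything that does not carry the prefix is treated as a normal answer.
    return { ok: true, value: result };
  }
  return { ok: false, code: match[1], message: match[2] };
}

const outcome = parseToolResult(
  "ERROR_MISSING_QUERY: No question was provided. Agent should provide a question in the 'query' field."
);
// outcome.code can now drive a retry decision; outcome.message stays readable.
```

The trade-off of returning strings is that every caller has to check the prefix, because a failure no longer interrupts control flow the way a thrown exception does.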

darielnoel · 2024-11-26T21:38:52Z

packages/tools/src/website-search/index.js

+      this.url = url;
+    }
+    if (!this.url || this.url === '') {
+      throw new Error('Please provide url to process.');


Same here -> https://github.com/kaiban-ai/KaibanJS/pull/160/files#r1859259421

darielnoel · 2024-11-26T21:39:32Z

packages/tools/src/simple-rag/index.js

+    this.chunkOptions = fields.chunkOptions;
+    this.embeddings = fields.embeddings;
+    this.vectorStore = fields.vectorStore;
+    this.llm = fields.llm;


Let's call it llmInstance, to speak the same "language" as the KaibanJS framework.

darielnoel · 2024-11-26T21:39:45Z

packages/tools/src/website-search/index.js

+    this.chunkOptions = fields.chunkOptions;
+    this.embeddings = fields.embeddings;
+    this.vectorStore = fields.vectorStore;
+    this.llm = fields.llm;


Let's call it llmInstance, to speak the same "language" as the KaibanJS framework.

darielnoel · 2024-11-26T21:44:30Z

packages/tools/src/_utils/rag/ragToolkit.js

+    this.llm =
+      options.llm ||
+      new ChatOpenAI({
+        model: 'gpt-4',


Can we use gpt-4o-mini as the default instead?

darielnoel · 2024-11-26T21:47:05Z

packages/tools/src/_utils/rag/ragToolkit.js

+
+    this.loaders = {
+      string: (source) => new TextInputLoader(source),
+      // text: source => new TextLoader(source),


Let's remove this commented-out loader if we are not using it yet.
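
One way to keep the loader map limited to what is actually supported is to register only working loaders and fail loudly on anything else. The sketch below assumes a stand-in `TextInputLoader` and a hypothetical `createLoader` helper; neither reflects the real implementation in `ragToolkit.js`.

```javascript
// Hypothetical stand-in for the TextInputLoader referenced in the diff.
class TextInputLoader {
  constructor(source) {
    this.source = source;
  }
}

// Only loaders that are wired up are registered; no commented-out entries.
const loaders = {
  string: (source) => new TextInputLoader(source),
};

// Hypothetical helper: look up a loader factory by document type.
function createLoader(type, source) {
  const known = Object.prototype.hasOwnProperty.call(loaders, type);
  if (!known) {
    throw new Error(`Unsupported document type: ${type}`);
  }
  return loaders[type](source);
}

const loader = createLoader('string', 'Some inline text to index.');
```

Adding a new type later is then a one-line change to the map, together with the loader it points to.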

…de.js and Browser environments
refactor(pdf-search): rename PDF File Analyzer to PDF File Searcher for consistency
refactor(textfile-search): rename Text File Analyzer to Text File Searcher for consistency
refactor(textfile-search): update task description and expected output for semantic search
fix(website-search): add installation instructions for cheerio dependency
chore: add server.js to .gitignore

anthonydevs17 added 6 commits November 25, 2024 23:53

new simple-rag tool

2b32bbe

feat(tools): Add Website Search tool with RAG capabilities

1924b14

fix(tools): Update RagToolkit import path in SimpleRAG and WebsiteSearch

fa141d2

docs(tools): Update README to include Simple RAG and Website Search tool descriptions

0fa1620

add dependency installation to README file

dc0b24d

change build bundle order

2c1db0d

darielnoel reviewed Nov 26, 2024

anthonydevs17 added 18 commits November 26, 2024 22:09

refactoring code

8dd2e08

refactor(ragToolkit): rename llmInstance to llm for consistency; update query in tool stories

1ccbd4c

feat(tools): add PDF and text file search tools; update rollup config and stories

6b767d7

feat(pdf-search): remove BrowserPDFLoader; enhance PDF fetching and error handling

e7100f5

feat(textfile-search): integrate ky for HTTP requests and enhance error handling

056ecc1

feat(pdf-search): add BrowserPDFLoader for PDF loading in browser environment

476f546

test(textfile-search): enhance tests to verify mockRagToolkit calls and error handling

a7e873b

fix(tools): update rollup config to include pdfjs-dist and restore PDF and textfile search exports

62020e5

fix(tools): update rollup config to include additional pdf.js paths and restore replace plugin functionality

0da6ee9

fix(loaders): refactor BrowserPDFLoader to dynamically import pdfjs-dist for improved compatibility

4de54c1

feat(server): implement HTTP server with PDF search functionality using PdfSearch

240cdb1

refactor(server): remove deprecated HTTP server implementation

5936ab4

fix(tools): simplify commonjs plugin configuration by removing redundant exclusions for pdf-parse and pdfjs-dist

91a1b20

fix(tools): remove redundant exclusions for pdf-parse and pdfjs-dist in Rollup configuration

ff15e95

remove unnecessary dependency

3cf2756

fix(tools): update Rollup configuration to replace 'node:fs/promises' with 'fs/promises'

2bea254

docs(tools): remove installation instructions for deprecated dependencies in README files

0f09b2b

darielnoel merged commit 7c163d0 into kaiban-ai:main Dec 6, 2024
2 checks passed

anthonydevs17 commented Nov 26, 2024 • edited by darielnoel
