Chat is very slow when using with the llama.cpp server #2845

Open
1 of 2 tasks
lehuythangit opened this issue Nov 8, 2024 · 1 comment
Labels: area:chat (Relates to chat interface), area:configuration (Relates to configuration options)

Comments

@lehuythangit

Validations

  • I believe this is a way to improve. I'll try to join the Continue Discord for questions
  • I'm not able to find an open issue that requests the same enhancement

Problem

Chat becomes very slow with the llama.cpp server as the message history grows, because cache_prompt = true is missing from the calls to llama.cpp's /completion API. Without it, llama.cpp re-processes the entire previous message history on every prompt instead of reusing it from the cache.

Solution

Please add the cache_prompt = true property to the /completion API call,
or expose it as a configuration property in config.json (see the sketch below).
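
A minimal sketch of what such a /completion call could look like with prompt caching enabled, assuming a local llama.cpp server at http://localhost:8080; the URL, prompt, and generation parameters are placeholders, not Continue's actual request code:

```typescript
// Sketch: a /completion request to a llama.cpp server with prompt caching enabled.
// The endpoint URL and parameter values are illustrative placeholders.
async function completeWithCache(prompt: string): Promise<string> {
  const response = await fetch("http://localhost:8080/completion", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      prompt,
      n_predict: 512,
      stream: false,
      // Ask llama.cpp to reuse the common prefix of the previous prompt from its
      // KV cache instead of re-evaluating the whole conversation history.
      cache_prompt: true,
    }),
  });
  const data = await response.json();
  return data.content;
}
```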

Thanks

@dosubot dosubot bot added the area:chat (Relates to chat interface) and area:configuration (Relates to configuration options) labels on Nov 8, 2024
@lehuythangit (Author)

I just worked around it by modifying .vscode\extensions\continue.continue-0.8.55-win32-x64\out\extension.js:

[screenshot of the modification to extension.js]
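
For illustration only, the workaround amounts to adding cache_prompt wherever the bundled extension.js builds the JSON body for the llama.cpp request; the function and variable names below are hypothetical sketches, not the actual contents of that file:

```typescript
// Hypothetical sketch of the workaround: when building the body for the
// llama.cpp /completion request, include cache_prompt before sending.
function buildCompletionBody(prompt: string, maxTokens: number) {
  return {
    prompt,
    n_predict: maxTokens,
    stream: true,
    // Added by the workaround so llama.cpp reuses its KV cache across turns.
    cache_prompt: true,
  };
}
```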
