Chat is very slow when using with the llama.cpp server #2845

Open
1 of 2 tasks
lehuythangit opened this issue Nov 8, 2024 · 1 comment
Labels: area:chat (Relates to chat interface), area:configuration (Relates to configuration options)

Comments

@lehuythangit

Validations

  • I believe this is a way to improve. I'll try to join the Continue Discord for questions
  • I'm not able to find an open issue that requests the same enhancement

Problem

Chat becomes very slow with the llama.cpp server as the message history grows, because cache_prompt = true is missing from the calls to llama.cpp's /completion API. Without it, llama.cpp re-processes the entire previous message history on every prompt instead of reusing it from the cache.

Solution

Please add the cache_prompt = true property to the /completion API call,
or expose it as a configuration property in config.json (see the sketch below).
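
A minimal sketch of what such a /completion call could look like with prompt caching enabled, assuming a local llama.cpp server at http://localhost:8080; the URL, prompt, and generation parameters are placeholders, not Continue's actual request code:

```typescript
// Sketch: a /completion request to a llama.cpp server with prompt caching enabled.
// The endpoint URL and parameter values are illustrative placeholders.
async function completeWithCache(prompt: string): Promise<string> {
  const response = await fetch("http://localhost:8080/completion", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      prompt,
      n_predict: 512,
      stream: false,
      // Ask llama.cpp to reuse the common prefix of the previous prompt from its
      // KV cache instead of re-evaluating the whole conversation history.
      cache_prompt: true,
    }),
  });
  const data = await response.json();
  return data.content;
}
```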

Thanks

@dosubot dosubot bot added the area:chat (Relates to chat interface) and area:configuration (Relates to configuration options) labels on Nov 8, 2024
@lehuythangit (Author)

I just worked around it by modifying .vscode\extensions\continue.continue-0.8.55-win32-x64\out\extension.js:

[screenshot of the modification to extension.js]
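
For illustration only, the workaround amounts to adding cache_prompt wherever the bundled extension.js builds the JSON body for the llama.cpp request; the function and variable names below are hypothetical sketches, not the actual contents of that file:

```typescript
// Hypothetical sketch of the workaround: when building the body for the
// llama.cpp /completion request, include cache_prompt before sending.
function buildCompletionBody(prompt: string, maxTokens: number) {
  return {
    prompt,
    n_predict: maxTokens,
    stream: true,
    // Added by the workaround so llama.cpp reuses its KV cache across turns.
    cache_prompt: true,
  };
}
```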
