Is there an existing issue for the same feature request?
I have checked the existing issues.
Is your feature request related to a problem?
When I use the Deepseek-R1 model as the system chat model for document parsing, the generated chunks still contain thinking blocks, even though my prompt explicitly instructs the model to remove them. This happens with auto-keyword, auto-question, RAPTOR, and similar features.
For example, a chunk may look like: "<think>The user requests me to generate keywords...</think> keyword1, keyword2..."
Describe the feature you'd like
Using reasoning models for these format-constrained generation tasks may not bring much improvement, but removing the thinking blocks from the generated content is straightforward with a regex replacement, so I suggest adding code that filters them out.
Describe implementation you've considered
I added this one line of code to keyword generation and to knowledge graph entity extraction, and the improvement was significant: token consumption per chunk during entity extraction dropped from over 10k to roughly 3k, and it also stopped erroneous chunks from being generated.
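For reference, a minimal sketch of the kind of one-line regex cleanup described above; the helper name and where it would be wired in are my own assumptions, not existing RAGFlow code:

```python
import re

# Matches Deepseek-R1 style reasoning blocks, including multi-line content.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def remove_think_blocks(text: str) -> str:
    """Strip <think>...</think> blocks from a reasoning model's output."""
    return THINK_BLOCK.sub("", text).strip()
```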
So perhaps a new chat function could be added in rag/llm/chat_model.py that strips the think blocks from a reasoning model's output before returning it; the functions in graphrag/ and in dialog_service.py could then call this new function directly.
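A rough sketch of what such a wrapper might look like, built on the remove_think_blocks helper above; the wrapper name and the chat() signature used here are assumptions for illustration rather than the exact chat_model.py API:

```python
def chat_clean(chat_mdl, system, history, gen_conf):
    """Call the underlying chat model, then drop <think>...</think> blocks
    so keyword generation / entity extraction only sees the usable answer."""
    answer = chat_mdl.chat(system, history, gen_conf)  # assumed existing chat() call
    return remove_think_blocks(answer)
```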
Documentation, adoption, use case
Additional information
Additionally, I suspect the root cause is the reasoning model's response latency. While generating the knowledge graph I frequently received 408 timeout responses; my guess is that Deepseek-R1 takes so long to produce an answer that the client's pending request never gets a reply and times out. After switching every chat() call in the knowledge graph generation code to chat_streamly(), I never saw the 408 timeouts again, so it may be worth using chat_streamly for the heavier, long-running dialogues.
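To illustrate, a hedged sketch of how a chat() call site could be switched to the streaming variant and still return a single string; the chat_streamly() signature and whether it yields cumulative text or deltas are assumptions here, not confirmed API behavior:

```python
def chat_collect_stream(chat_mdl, system, history, gen_conf):
    """Drive the streaming variant to completion and return the final text,
    keeping the connection active so long generations don't hit 408 timeouts."""
    final = ""
    # Assumption: chat_streamly() yields progressively longer partial answers;
    # if it yields deltas instead, concatenate the chunks here.
    for partial in chat_mdl.chat_streamly(system, history, gen_conf):
        if isinstance(partial, str):
            final = partial
    return remove_think_blocks(final)
```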
(I'm using Deepseek-R1 deployed on Azure AI Foundry, which seems slower than the official API. I searched the existing issues and found no similar 408 reports, so my situation may be an edge case.)