Feature Request : data_preparation.py to add an option to create chunks by page for PDF #402

koichino · 2023-11-20T06:34:00Z

It would be helpful to have an option in data_preparation.py to create chunks by page for PDFs. Currently, chunking is done only by tokens (1024 tokens); there is no other option.
I think index quality might improve if a PDF were split by page, so that each page covers one topic. My customer is cooperating by preparing PDFs in which each page covers exactly one topic (no topic spans multiple pages).
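
To make the request concrete, here is a minimal sketch of what page-based chunking could look like. It is not taken from data_preparation.py: the names `Chunk` and `chunk_by_page` are hypothetical, the input is assumed to be a list of page texts already extracted from the PDF, and tokens are approximated by whitespace-separated words rather than a real tokenizer. A page that exceeds the limit is still split, so the existing 1024-token cap would keep applying.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Chunk:
    page_number: int  # 1-based page the chunk came from
    part: int         # 0 unless an oversized page had to be split
    text: str


def chunk_by_page(pages: Iterable[str], max_tokens: int = 1024) -> List[Chunk]:
    """Hypothetical helper: one chunk per page, splitting only oversized pages.

    Assumption: "tokens" are whitespace-separated words. A real tokenizer
    would give different counts.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")

    chunks: List[Chunk] = []
    for page_number, page_text in enumerate(pages, start=1):
        words = page_text.split()
        if not words:
            continue  # skip blank pages, but keep the original page numbering
        # A page within the limit yields exactly one chunk (part 0).
        # Re-joining with single spaces normalizes the page's whitespace.
        for part, start in enumerate(range(0, len(words), max_tokens)):
            piece = " ".join(words[start:start + max_tokens])
            chunks.append(Chunk(page_number, part, piece))
    return chunks
```

With this shape, a chunk never crosses a page boundary, so a PDF prepared with one topic per page would produce one topic per chunk, and the page number could be stored alongside each chunk in the index.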

github-actions · 2025-01-30T02:00:06Z

This issue is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label Jan 30, 2025
