units.sort(key=lambda a: a[0][1]) #4

Open
Aaron-Ge opened this issue Jul 12, 2024 · 2 comments

Comments

@Aaron-Ge

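In the _get_units method, why is this line of code necessary?
I found that it scrambles the original order returned by Baidu OCR, which causes lines to be split incorrectly.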

@hiroi-sora
Owner

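Hello. This algorithm starts from the assumption that the order of the original OCR results is wrong, so it re-sorts everything from scratch.

If Baidu OCR has already given you the correct order, there is no need to run the results through this algorithm.

This algorithm is mechanical, rule-based matching. It is suited to OCR setups that have no layout analysis model of their own. Commercial APIs such as Baidu OCR may already include a built-in layout analysis model, which may be more flexible and more accurate than rule matching.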
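
For illustration, here is a minimal sketch of what the line in the issue title does to an already ordered result. It assumes each unit is a (bbox, text) pair whose bbox is (x0, y0, x2, y2), so that a[0][1] is the top edge. That data layout, the helper name sort_units_by_top, and the sample coordinates are assumptions made for this example, not taken from the repository.

```python
# Assumption: each unit is (bbox, text) with bbox = (x0, y0, x2, y2),
# so a[0][1] is y0, the top edge of the box. Hypothetical layout.

def sort_units_by_top(units):
    """Return a new list ordered top to bottom by y0 (ties keep input order)."""
    return sorted(units, key=lambda a: a[0][1])


# A two-column page, listed in the column-aware reading order that an
# OCR service with layout analysis might return (sample data, made up).
engine_order = [
    ((10, 10, 100, 30), "left 1"),
    ((10, 40, 100, 60), "left 2"),
    ((120, 12, 210, 32), "right 1"),
    ((120, 42, 210, 62), "right 2"),
]

resorted = sort_units_by_top(engine_order)
texts = [text for _, text in resorted]
print(texts)  # ['left 1', 'right 1', 'left 2', 'right 2']
```

Under these assumptions the column-aware order is replaced by a plain top-to-bottom order that interleaves the two columns. This is consistent with the reordering described in the question and with the owner's point that the algorithm discards the incoming order and rebuilds it from scratch.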

@Aaron-Ge
Author

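Thanks for the explanation. I am mainly using your algorithm to solve line-splitting and column-splitting problems, and I am now adjusting the code to fit my own situation.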
