Crashed in a multi-threaded environment #225
Comments
That's expected, as one of marker's dependencies (pypdfium2/pdfium) is not thread-compatible.
I have already used a thread lock during parsing, but the core dump occurred during GC.
Hmm, pypdfium2 auto-closes pdfium objects on garbage collection. A possible workaround/test might be to add explicit close calls to all pdfium root objects throughout the dependencies and see if that fixes the issue. Unfortunately, threading/GC-related issues are hard to debug.
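A minimal sketch of that workaround, assuming the idea is to close each root object explicitly while the parsing lock is still held, so that nothing is left for the garbage collector to close later from another thread. `FakeDocument`, `PDFIUM_LOCK` and `count_pages` are hypothetical names for illustration only; with pypdfium2 the opener would be something like `pypdfium2.PdfDocument`, and child objects (pages, text pages) would need the same treatment.

```python
import threading
from contextlib import closing


class FakeDocument:
    """Hypothetical stand-in for a pdfium root object (e.g. a document handle)."""

    def __init__(self, path):
        self.path = path
        self.closed = False

    def page_count(self):
        if self.closed:
            raise ValueError("document is already closed")
        return 3  # placeholder value for the sketch

    def close(self):
        self.closed = True


# One process-wide lock that serializes every call into the native library.
PDFIUM_LOCK = threading.Lock()


def count_pages(path, opener=FakeDocument):
    # Open, use and close the object inside the same locked region.
    # closing() calls doc.close() on exit, even if an exception is raised,
    # so the object is already closed by the time it becomes garbage.
    with PDFIUM_LOCK:
        with closing(opener(path)) as doc:
            return doc.page_count()
```

If the crash disappears with explicit closes everywhere, that would point at the auto-close path rather than at caller-side corruption.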
I had a similar error with the corrupted double-linked list (Colab A100). Running this code fixed the problem for me:
@aj8907 Sorry, I'm not much into threading, but I don't logically see how this is supposed to fix the above issue? And why is a bare RLock not sufficient?
If the cause is indeed simultaneous calls due to GC, and not other caller-caused corruption, I figured we may be able to add an API to plug a caller-provided lock into our auto-close machinery.
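To make the idea concrete, here is a rough sketch of what such a hook could look like. It is not pypdfium2's actual implementation or API: `AutoCloseable` and `set_close_lock` are hypothetical names, and the sketch assumes the auto-close machinery is built on `weakref.finalize`.

```python
import threading
import weakref


class AutoCloseable:
    """Hypothetical wrapper that closes a raw handle on GC or on close()."""

    _close_lock = None  # caller-provided lock; None means "close unguarded"

    @staticmethod
    def set_close_lock(lock):
        # The caller passes the same lock it already holds around library calls.
        AutoCloseable._close_lock = lock

    def __init__(self, raw, close_func):
        # The callback must not reference self, or the object could never be
        # collected; only the raw handle and the close function are captured.
        self._finalizer = weakref.finalize(
            self, AutoCloseable._locked_close, close_func, raw
        )

    @staticmethod
    def _locked_close(close_func, raw):
        lock = AutoCloseable._close_lock
        if lock is None:
            close_func(raw)
        else:
            with lock:
                close_func(raw)

    def close(self):
        # Calling a finalizer runs its callback at most once.
        self._finalizer()
```

One design note, stated as an assumption rather than a tested fact about pypdfium2: the plugged-in lock would probably have to be reentrant (`threading.RLock`), because a finalizer can fire in the thread that already holds the lock, and a plain `threading.Lock` would deadlock there.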
I have switched from multithreading to multiprocessing for inference, and my project version is currently at 0.2.17. Since the project seems to be undergoing a restructuring for version 2, I don't need a solution to this issue for now.
pypdfium2==4.30.0
marker-pdf==0.2.13
I call the code as follows:
An error occurred when triggering Python's GC: