Releases: VikParuchuri/marker
v1.5.3
Windows fixes
- Fix issue with streamlit app and permissions
- Fix torch classes issue
Memory leak fix
Fixed a memory leak that occurred when repeatedly reusing the same converter.
convert.py enhancements
- Disable tqdm progress bars when converting multiple files
What's Changed
- Fix issue with reopening and deleting pdf file on Windows by @xiaoyao9184 in #463
- Fix SyntaxWarning: invalid escape sequence '\c' by using string r-prefix by @dantetemplar in #555
- Dev by @VikParuchuri in #560
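The escape-sequence fix above is the standard Python remedy for this warning: an `r`-prefixed (raw) string keeps backslashes literal, so sequences like `\c` no longer trigger a SyntaxWarning. A minimal illustration:

```python
# "\c" is not a recognized escape sequence, so recent Python versions
# emit a SyntaxWarning and keep the backslash literally. A raw string
# removes the ambiguity: backslashes are always literal.
plain = "\\c"   # explicit escaping: backslash followed by "c"
raw = r"\c"     # raw string: the same two-character value

assert plain == raw
assert len(raw) == 2
```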
New Contributors
- @xiaoyao9184 made their first contribution in #463
- @dantetemplar made their first contribution in #555
Full Changelog: v1.5.2...v1.5.3
Fix LLM service issue
Fix issue with initializing the LLM service with no default specified.
Fix OCR issue
Fix issue with OCRing documents with a mix of good and bad pages.
Inline math; speed up LLM calls; allow local models
Inline math
Marker will handle inline math if --use_llm is set. This makes reading scientific papers a lot nicer! The feature has been optimized for speed.

Local LLMs
We now support Ollama. When you're passing the --use_llm flag, you can select the Ollama inference service like this:
marker_single FILEPATH --use_llm --llm_service marker.services.ollama.OllamaService
You can set the options --ollama_base_url and --ollama_model. By default, it will use llama3.2-vision.
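Putting the flags above together, a full invocation might look like the sketch below. The file name is a placeholder, and the base URL shown is Ollama's conventional local endpoint, included only as an illustration:

```shell
# Convert one PDF with Ollama as the LLM backend.
# paper.pdf is a placeholder; http://localhost:11434 is Ollama's
# usual local address, shown here as an example override.
marker_single paper.pdf \
  --use_llm \
  --llm_service marker.services.ollama.OllamaService \
  --ollama_base_url http://localhost:11434 \
  --ollama_model llama3.2-vision
```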
Batch LLM calls
LLM calls are now batched across processors, for a significant speedup when you're passing --use_llm.
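A minimal sketch of the batching idea (not marker's actual code): instead of each processor issuing its own request, prompts are collected and dispatched in a single call, amortizing per-request overhead. `fake_llm_batch` stands in for a real batched inference endpoint:

```python
def fake_llm_batch(prompts):
    # Stand-in for a real batched LLM endpoint: one round trip,
    # many prompts answered at once.
    return [f"answer:{p}" for p in prompts]

class BatchedLLM:
    """Collects prompts from many processors, answers them in one call."""
    def __init__(self):
        self.pending = []

    def queue(self, prompt):
        # Returns a ticket (index) for retrieving the answer later.
        self.pending.append(prompt)
        return len(self.pending) - 1

    def flush(self):
        # One batched call instead of len(pending) separate calls.
        results = fake_llm_batch(self.pending)
        self.pending = []
        return results

llm = BatchedLLM()
tickets = [llm.queue(p) for p in ["table", "equation", "caption"]]
answers = llm.flush()
assert answers[tickets[1]] == "answer:equation"
```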
Misc fixes
- Biology PDFs now work a lot better - leading line numbers are stripped
- Improved OCR heuristics
- Updated the examples
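Leading line numbers (common in biology preprints, where manuscript lines are numbered for review) can be stripped with a heuristic along these lines. This is only a sketch of the shape of such a heuristic, not marker's implementation, and the threshold is an arbitrary illustration:

```python
import re

def strip_leading_line_numbers(lines):
    """Drop a leading integer from each line, but only when most lines
    have one -- a single numbered line is probably real content."""
    numbered = re.compile(r"^\s*\d{1,4}\s+")
    hits = sum(1 for line in lines if numbered.match(line))
    if hits < 0.8 * len(lines):  # threshold chosen for illustration
        return lines
    return [numbered.sub("", line) for line in lines]

page = [
    "12 Mitochondria are the powerhouse",
    "13 of the cell, as reviewed in",
    "14 previous work.",
]
print(strip_leading_line_numbers(page))
```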
What's Changed
- Batch together llm inference requests by @VikParuchuri in #536
- Add another heuristic to clean up line numbers by @iammosespaulr in #538
- Add Inline Math Support by @tarun-menta in #517
- Factor out llm services, enable local models by @VikParuchuri in #544
- Improve LLM speed; handle inline math; allow local models by @VikParuchuri in #537
Full Changelog: v1.4.0...v1.5.0
LLM fixes; new benchmarks
New benchmarks
Overall
Benchmarked against llamaparse, docling, and mathpix (see the README for how to run the benchmarks). Marker performs favorably against these alternatives in speed, LLM-as-judge scoring, and heuristic scoring.
Table
We also benchmarked table extraction against gemini flash.
Update gemini model
- Use the new genai library
- Update to gemini flash 2.0
Misc bugfixes
- Fix bug with OCR heuristics not being aggressive enough
- Fix bug with empty tables
- Ensure references get passed through in llm processors
What's Changed
- Add llm text support for references, superscripts etc by @iammosespaulr in #523
- Update overall benchmark by @VikParuchuri in #515
- Benchmarks by @VikParuchuri in #531
Full Changelog: v1.3.5...v1.4.0
Bump gemini version
When using the optional LLM mode, gemini flash 1.5 exhibited a bug. This release bumps the model to gemini flash 2.0, which appears to resolve the issue.
Fix pytorch bug
There was a bug with pytorch 2.6 and MPS that caused errors in inference - this has been fixed.
New LaTeX OCR model; block visualizer; better links/references
Improved LaTeX OCR
We trained a new LaTeX OCR model that works a lot better overall. It will reliably output KaTeX-compatible math. It also operates on longer sequences than before.
The rendered output is on the right, original document on the left:

Block visualization
You can now visualize blocks in the streamlit app, thanks to @jazzido. Select JSON output and check "show blocks" to get a visualization of how marker parsed the page. Clicking a block shows its HTML.

Links and references
We fixed a bug with links and references; they now render as one block. You can see the extracted references here:

Misc bugfixes
- Fixed some bugs with tables and row splitting
- Escaped $ inside text and tables so we don't accidentally render things as equations
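The idea behind the `$` escaping, as a sketch (marker's actual renderer logic may differ): a bare `$` in markdown output can be misread as an inline math delimiter, so literal dollar signs are backslash-escaped before rendering:

```python
import re

def escape_dollars(text):
    # Escape $ characters that are not already escaped, so markdown
    # math renderers don't treat "$5 to $10" as an inline equation.
    return re.sub(r"(?<!\\)\$", r"\\$", text)

print(escape_dollars("costs $5 to $10"))  # costs \$5 to \$10
```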
What's Changed
- [streamlit_app] Visualize extracted blocks by @jazzido in #502
- Texify by @VikParuchuri in #513
- Update texify by @VikParuchuri in #514
New Contributors
- @jazzido made their first contribution in #502
Full Changelog: v1.3.2...v1.3.3
Fix table bugs
- Fix issue where some blocks were hidden when they were around tables
- Fix span id issue with --use_llm and tables
- Fix problem with tables not OCRing when needed
Improved equations, bugfixes
- Equations in tables now render properly with --use_llm
- Fix how block equations render
- Fix bug with markdown table rendering and --use_llm
- Fix bug with convert.py CLI script