Skips images while converting PDF to Markdown #526

baburaoapte124421 · 2025-02-04T08:52:12Z

While converting a PDF file to a Markdown (.md) file using the Marker library, the output Markdown file does not include images from the original PDF. The images are entirely skipped, with no references or placeholders left in the Markdown output.

I tried:

```python
from marker.converters.pdf import PdfConverter
from marker.models import create_model_dict
from marker.config.parser import ConfigParser

config = {
    "output_format": "markdown",
    "disable_image_extraction": False
}

config_parser = ConfigParser(config)
converter = PdfConverter(
    config=config_parser.generate_config_dict(),
    artifact_dict=create_model_dict(),
    processor_list=config_parser.get_processors(),
    renderer=config_parser.get_renderer()
)

rendered = converter("/path/to/sample.pdf")
print(rendered)
```

Expected vs. Actual Behavior:
Expected: The Markdown file should contain image references (`![alt](path)`) along with the extracted images saved separately.
Actual: The output Markdown file does not include any images, and no references to images are present.
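
To confirm the "Actual" behavior on the Markdown text itself rather than on the printed `rendered` object, a small check like the one below may help. It is only a sketch: `find_image_refs` and `missing_images` are hypothetical helper names, not part of Marker, and the regex covers the common `![alt](target)` form only, not reference-style or HTML images.

```python
import re

# Matches inline Markdown images: ![alt text](target "optional title")
IMAGE_REF = re.compile(r'!\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)')


def find_image_refs(markdown: str) -> list[tuple[str, str]]:
    """Return (alt_text, target) pairs for each inline image reference."""
    return IMAGE_REF.findall(markdown)


def missing_images(markdown: str, image_names) -> list[str]:
    """Return extracted image names that the Markdown never references."""
    referenced = {target for _, target in find_image_refs(markdown)}
    return [name for name in image_names if name not in referenced]
```

If the converter output gives you the Markdown as a string and the extracted images as a name-to-image mapping (Marker's README shows `text_from_rendered` from `marker.output` for this, though that may differ by version), passing the string and the mapping's keys to `missing_images` would show whether images were extracted but not referenced, or not extracted at all.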

VikParuchuri · 2025-02-14T02:21:23Z

That's strange - do you have the PDF you were testing with? I haven't seen this issue.
