Skips images while converting PDF to Markdown #526

Open
baburaoapte124421 opened this issue Feb 4, 2025 · 1 comment

Comments

@baburaoapte124421

While converting a PDF file to a Markdown (.md) file using the Marker library, the output Markdown file does not include images from the original PDF. The images are entirely skipped, with no references or placeholders left in the Markdown output.

I tried:

```python
from marker.converters.pdf import PdfConverter
from marker.models import create_model_dict
from marker.config.parser import ConfigParser

config = {
    "output_format": "markdown",
    "disable_image_extraction": False,
}

config_parser = ConfigParser(config)
converter = PdfConverter(
    config=config_parser.generate_config_dict(),
    artifact_dict=create_model_dict(),
    processor_list=config_parser.get_processors(),
    renderer=config_parser.get_renderer(),
)

rendered = converter("/path/to/sample.pdf")
print(rendered)
```

Expected vs. Actual Behavior:
Expected: The Markdown file should contain image references (`![]()`) along with the extracted images saved separately.
Actual: The output Markdown file does not include any images, and no references to images are present.
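
To help narrow this down, here is a minimal sketch for checking whether the converted Markdown contains any image references and for writing the images to disk, since the snippet above only prints the result and saves nothing. The helper names `find_image_refs` and `save_output` are hypothetical and not part of Marker. The commented usage assumes the rendered object exposes `markdown` (a string) and `images` (a dict mapping file names to PIL-like objects with a `save` method); that is an assumption to verify against the installed Marker version.

```python
import re
from pathlib import Path

# Matches Markdown image syntax: ![alt](target "optional title")
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(\s*([^)\s]+)[^)]*\)")


def find_image_refs(markdown: str) -> list[str]:
    """Return the targets of all Markdown image references in the text."""
    return IMAGE_REF.findall(markdown)


def save_output(markdown: str, images: dict, out_dir: str) -> list[str]:
    """Write the Markdown and every image into out_dir.

    Returns the image targets referenced in the Markdown that have no
    matching entry in `images`.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "output.md").write_text(markdown, encoding="utf-8")
    for name, image in images.items():
        # Assumption: each value is a PIL-like object exposing .save(path).
        image.save(out / name)
    return [ref for ref in find_image_refs(markdown) if ref not in images]


# Hypothetical usage with the converter from the report (attribute names
# are assumptions, not confirmed in this thread):
# rendered = converter("/path/to/sample.pdf")
# print(find_image_refs(rendered.markdown))
# missing = save_output(rendered.markdown, rendered.images, "out")
```

If `find_image_refs` returns an empty list for a PDF that clearly contains figures, the references are missing from the Markdown itself rather than being lost when the output is saved.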

@VikParuchuri
Owner

That's strange - do you have the PDF you were testing with? I haven't seen this issue.
