Support Python 3.x #19

Open
johnlinp opened this issue Jul 15, 2019 · 5 comments

@johnlinp
Owner

Python 2 is going to be deprecated; let's support Python 3.x.

johnlinp self-assigned this Jul 15, 2019

@johnlinp
Owner Author

Some issues were pointed out in #17 (comment)

@nidhi-wgl

nidhi-wgl commented Dec 25, 2019

I converted the existing code base to Python 3 using 2to3, installed the dist, and tried running it. It gives this error:

Traceback (most recent call last):
  File "/usr/local/bin/pdf2md", line 4, in <module>
    __import__('pkg_resources').run_script('pdf-to-markdown==0.1.0', 'pdf2md')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1469, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python3.7/dist-packages/pdf_to_markdown-0.1.0-py3.7.egg/EGG-INFO/scripts/pdf2md", line 32, in <module>
  File "/usr/local/lib/python3.7/dist-packages/pdf_to_markdown-0.1.0-py3.7.egg/EGG-INFO/scripts/pdf2md", line 27, in main
  File "/usr/local/lib/python3.7/dist-packages/pdf_to_markdown-0.1.0-py3.7.egg/pdf2md/writer.py", line 27, in write
  File "/usr/local/lib/python3.7/dist-packages/pdf_to_markdown-0.1.0-py3.7.egg/pdf2md/writer.py", line 50, in _write_simple
  File "/usr/local/lib/python3.7/dist-packages/pdf_to_markdown-0.1.0-py3.7.egg/pdf2md/pile.py", line 74, in gen_markdown
  File "/usr/local/lib/python3.7/dist-packages/pdf_to_markdown-0.1.0-py3.7.egg/pdf2md/pile.py", line 266, in _gen_paragraph_markdown
  File "/usr/local/lib/python3.7/dist-packages/pdf_to_markdown-0.1.0-py3.7.egg/pdf2md/syntax.py", line 47, in pattern
  File "/usr/lib/python3.7/re.py", line 183, in search
    return _compile(pattern, flags).search(string)
TypeError: cannot use a string pattern on a bytes-like object

At first I thought the problem was in re.match or re.search, but I think the content is arriving as bytes rather than as a string. It looks like an encoding/decoding issue, and it happens even when parsing English-only text. I also get this error:

TypeError: can only concatenate str (not "bytes") to str

I just wanted to report the error, nothing else. I might try to work on it when I have some time.
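
Both errors can be reproduced outside pdf2md in a few lines of Python 3. This is only a sketch, not the project's code: the sample text and the regex are made up for illustration, and it assumes the text was passed through `.encode('utf8')` before reaching the regex.

```python
import re

# Hypothetical stand-in for text that was encoded before reaching the regex.
text = "1. Introduction".encode("utf8")  # bytes in Python 3


def error_message(func):
    """Call func and return the TypeError message it raises, or None."""
    try:
        func()
    except TypeError as exc:
        return str(exc)
    return None


# A str pattern cannot be matched against a bytes object.
search_error = error_message(lambda: re.search(r"^\d+\.", text))

# A str cannot be concatenated with bytes either.
concat_error = error_message(lambda: "# " + text)

print(search_error)
print(concat_error)
```

The exact wording of the messages can differ between Python 3 releases, but both calls raise `TypeError` for the same reason: `str` and `bytes` are distinct types in Python 3 and are never mixed implicitly.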

@nidhi-wgl

I am not sure if this is the correct way to do it, but adding .decode(encoding="utf-8") fixes it, and the extension works perfectly with all files, including the example file in the repo.
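
A minimal sketch of that workaround, assuming the value handed to the regex is UTF-8 encoded bytes. `ensure_text` is a hypothetical helper name, not something from the pdf2md code base, and the sample text and pattern are made up.

```python
import re


def ensure_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding=encoding)
    return value


raw = "1. Introduction".encode("utf8")  # bytes, as in the failing case
line = ensure_text(raw)                 # back to str

match = re.search(r"^\d+\.", line)      # str pattern on str now works
heading = "# " + line                   # str + str concatenation works too
print(match.group(0), heading)
```

Checking the type first means the helper is also safe to call on values that are already `str`.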

nella17 mentioned this issue Jun 5, 2020

@johnlinp
Owner Author

johnlinp commented Jun 7, 2020

Hi @nidhi-wgl,

According to @nella17's PR (#22), we can see that simply removing the .encode('utf8') part should work. Please see 6791abf.

Thanks @nella17!

@nidhi-wgl

Yeah, that is also one way around it. I didn't want to remove .encode or any existing code, so I proposed adding the decode line for anyone who wants to run the code in Python 3.
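
The two approaches should end up with the same text, which a short sketch can show. The sample string is made up. In Python 3, `str.encode` returns `bytes`, so decoding afterwards just undoes that step, while dropping the `.encode('utf8')` call keeps the value a `str` throughout.

```python
text = "Résumé: 1. Introduction"  # made-up sample with non-ASCII characters

encoded = text.encode("utf8")               # what the encode call produces on Python 3
decoded = encoded.decode(encoding="utf-8")  # workaround: decode it again

# Dropping the encode call leaves `text` untouched, and the decode
# workaround round-trips back to the same value.
print(type(encoded).__name__, type(decoded).__name__)
print(decoded == text)
```

Removing the encode call avoids the extra round trip, while adding the decode leaves the existing lines in place. Which is preferable depends on whether the code still needs to run on Python 2.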
