- Create a folder named pdf in pdfReader/ and put the PDF documents in ./pdf/
- Run:
$ python pdfReader.py
- The PDF documents will be converted into txt files in ./txt/
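
The steps above describe a folder-to-folder conversion. The following is only a minimal sketch of that idea, not the actual contents of pdfReader.py: it assumes the pdf2txt.py script from pdfminer is on the PATH and can be executed directly (as on a Unix-like system), and the function names plan_conversions, build_command and convert_all are hypothetical.

```python
import os
import subprocess


def plan_conversions(pdf_dir="pdf", txt_dir="txt"):
    """Return (source, target) path pairs for every .pdf file in pdf_dir."""
    pairs = []
    for name in sorted(os.listdir(pdf_dir)):
        base, ext = os.path.splitext(name)
        if ext.lower() == ".pdf":
            pairs.append((os.path.join(pdf_dir, name),
                          os.path.join(txt_dir, base + ".txt")))
    return pairs


def build_command(source, target):
    """Build a pdf2txt.py call; its -o option names the output file."""
    return ["pdf2txt.py", "-o", target, source]


def convert_all(pdf_dir="pdf", txt_dir="txt"):
    """Convert every PDF in pdf_dir into a text file in txt_dir."""
    # Assumption: the output folder may not exist yet, so create it.
    if not os.path.isdir(txt_dir):
        os.makedirs(txt_dir)
    for source, target in plan_conversions(pdf_dir, txt_dir):
        # Raises CalledProcessError if pdf2txt.py exits with an error.
        subprocess.check_call(build_command(source, target))
```

Under these assumptions, calling convert_all() from inside pdfReader/ would read ./pdf/ and write one .txt file per PDF into ./txt/.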
- Webpage: https://euske.github.io/pdfminer/
- Download (PyPI): https://pypi.python.org/pypi/pdfminer/
- Demo WebApp: http://pdf2html.tabesugi.net:8080/
- Install Python 2.6 or newer. (For Python 3 support, have a look at pdfminer.six.)
- Download the source code.
- Unpack it.
- Run setup.py:
$ python setup.py install
- Do the following test:
$ pdf2txt.py samples/simple1.pdf
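
The test above can also be driven from Python. The sketch below rests on a few assumptions: it uses shutil.which, so it needs Python 3.3 or newer; it assumes a Unix-like system where the installed script can be executed directly; and find_tool and smoke_test are hypothetical helper names, not part of pdfminer.

```python
import shutil
import subprocess


def find_tool(name="pdf2txt.py"):
    """Return the full path of name if it is on the PATH, else None."""
    return shutil.which(name)


def smoke_test(sample="samples/simple1.pdf"):
    """Run pdf2txt.py on the sample file and return the extracted text."""
    tool = find_tool()
    if tool is None:
        raise RuntimeError("pdf2txt.py not found on PATH; was the install step run?")
    # check_output raises CalledProcessError on a non-zero exit status.
    output = subprocess.check_output([tool, sample])
    return output.decode("utf-8", "replace")
```

If find_tool() returns None, the install step most likely did not put the script on the PATH.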
Parse English and Chinese Papers
- Webpage:
- Python 2.7.9
- Python 3.4.3