PDFMiner

Software Details:
Version: 20140328
Upload Date: 13 May 15
Developer: Yusuke Shinyama
Distribution Type: Freeware

PDFMiner first parses the content of a PDF file and converts it to a more malleable format such as HTML.

From there, text and data are extracted and analyzed, then separated according to predefined rules and either presented to the user or passed on to more powerful data analysis tools.

If text analysis is not what you need, PDFMiner can also be configured to only extract or only convert PDF data.

Its functions work independently of one another, which allows a wider range of uses.
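As a rough illustration of the conversion step, the sketch below builds a command line for pdf2txt.py, the converter script that ships with PDFMiner. The helper name build_pdf2txt_command and the file names are hypothetical; the -t, -o and -P options are those of the script itself, and it is assumed that pdf2txt.py is on the PATH.

```python
# Output types accepted by the -t option of pdf2txt.py.
OUTPUT_TYPES = ("text", "html", "xml", "tag")


def build_pdf2txt_command(pdf_path, out_path, output_type="html", password=None):
    """Return the argument list for one pdf2txt.py conversion (hypothetical helper)."""
    if output_type not in OUTPUT_TYPES:
        raise ValueError("unsupported output type: %r" % (output_type,))
    command = ["pdf2txt.py", "-t", output_type, "-o", out_path]
    if password:
        # -P supplies the password for an encrypted PDF.
        command += ["-P", password]
    command.append(pdf_path)
    return command
```

Passing the returned list to subprocess.check_call() would run the conversion. Choosing "text" instead of "html" gives plain extraction, and "tag" keeps only tagged content.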

Features:

  • 100% Python code, no C or C++
  • Parse PDFs
  • Analyze PDFs
  • Convert PDFs to other formats
  • ToC extractor
  • Get only tagged content
  • Support for a large number of PDF text features
  • Support for a large number of font types inside PDFs
  • Basic encryption (RC4) support
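The ToC extractor can be pictured with a small sketch. It assumes that PDFDocument.get_outlines() yields (level, title, dest, action, se) tuples and raises PDFNoOutlines when the file has no outline; the helper names format_outline and outline_lines are hypothetical.

```python
def format_outline(outlines, indent="  "):
    """Render (level, title, dest, action, se) tuples as indented title lines."""
    lines = []
    for level, title, _dest, _action, _se in outlines:
        # Top-level entries are assumed to arrive with level 1.
        lines.append(indent * max(level - 1, 0) + title)
    return lines


def outline_lines(document):
    """Return the formatted outline of a PDFDocument, or [] if it has none."""
    # Imported lazily so format_outline works without PDFMiner installed.
    from pdfminer.pdfdocument import PDFNoOutlines

    try:
        return format_outline(document.get_outlines())
    except PDFNoOutlines:
        return []
```

Keeping the formatting separate from the PDFMiner call makes it easy to try the indentation logic on hand-written tuples first.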

What is new in this release:

  • The PDFDocument.initialize() method has been removed and is no longer needed. A password is now passed as an argument to the PDFDocument constructor.
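A minimal sketch of what this change means for calling code, assuming the pdfminer.pdfparser and pdfminer.pdfdocument module layout of this release; the helper name open_document is hypothetical.

```python
def open_document(fp, password=""):
    """Open a PDF from a binary file object, passing the password up front."""
    # Imported lazily so the helper can be defined without PDFMiner installed.
    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdfdocument import PDFDocument

    parser = PDFParser(fp)
    # No separate document.initialize(password) call is made here;
    # per the release note, the constructor takes the password directly.
    return PDFDocument(parser, password)
```

A caller would open the file in binary mode, for example with open("file.pdf", "rb"), and keep it open for as long as the returned document is in use.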

What is new in version 20110515:

  • API changes.
  • The LTPolygon class was renamed to LTCurve.

What is new in version 20110227:

  • Bug fixes and layout analysis improvements.

What is new in version 20101226:

  • A couple of bugfixes and minor improvements.

What is new in version 20101017:

  • A couple of bugfixes and a minor improvement.

What is new in version 20100424:

  • Bugfixes and tiny improvements on TOC extraction.

Requirements:

  • Python 2.4 up to 3

Limitations:

  • PDFMiner can be 20 times slower than C/C++-based software.

