ByteScout PDF Extractor SDK

Software Details:
Version: 9.0.0.3079
Upload Date: 15 Aug 18
Developer: ByteScout
Distribution Type: Shareware
Price: $10.00
Downloads: 130
Size: 596 KB

Rating: 2.0/5 (Total Votes: 2)

PDF Extractor SDK for Windows software developers: PDF to Text, PDF to XML, Images from PDF, Read PDF information, PDF to CSV for Excel. ByteScout PDF Extractor SDK lets you convert PDF to text, XML and CSV, extract images from PDF, and read information about PDF files through .NET and ActiveX interfaces, with no additional software required.

Benefits:

  • converts PDF to plain text, including invisible text, and can follow columns (for example, when converting a newspaper in PDF format)
  • converts tables in PDF to Excel (CSV) by reading cells from a given rectangle
  • converts tables in PDF to XML files
  • extracts PDF file metadata (title, author, description) and other information about the file (number of pages, whether it is encrypted)
  • extracts embedded images from PDF documents (in ASP.NET, VB.NET, C#, VB6 and VBScript)
  • provides DocumentMerger and DocumentSplitter interfaces and classes to merge and split PDF documents
  • does not require Adobe Reader or any other PDF reader software to be installed
  • provides .NET and ActiveX interfaces
  • is written in 100% managed C# code
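Reading a table "from a given rectangle" generally means collecting the positioned text fragments that fall inside that rectangle, grouping them into rows by vertical position, and ordering each row left to right. The SDK's own algorithm is not described on this page, so the Python sketch below only illustrates the general idea under stated assumptions: `TextObject` is a hypothetical type holding a text fragment and its top-left coordinates (y growing downward), and `rect_to_csv` with its `row_tolerance` parameter are illustrative names, not part of the ByteScout API.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class TextObject:
    """Hypothetical positioned text fragment; (x, y) is its top-left corner."""
    text: str
    x: float
    y: float


def rect_to_csv(objects, left, top, right, bottom, row_tolerance=2.0):
    """Group text inside a rectangle into rows and cells, then emit CSV."""
    # Keep only fragments whose anchor point lies inside the rectangle.
    inside = [o for o in objects
              if left <= o.x <= right and top <= o.y <= bottom]
    # Sort top-to-bottom, then left-to-right (y grows downward here).
    inside.sort(key=lambda o: (o.y, o.x))
    rows = []
    for obj in inside:
        # Same row if the vertical offset from the row's first fragment is small.
        if rows and abs(rows[-1][0].y - obj.y) <= row_tolerance:
            rows[-1].append(obj)
        else:
            rows.append([obj])
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        # Each fragment becomes one cell, ordered by horizontal position.
        writer.writerow(o.text for o in sorted(row, key=lambda o: o.x))
    return buf.getvalue()


if __name__ == "__main__":
    cells = [TextObject("Name", 10, 100), TextObject("Qty", 80, 100),
             TextObject("Apple", 10, 115), TextObject("3", 80, 115.5),
             TextObject("Footer", 10, 300)]
    print(rect_to_csv(cells, 0, 90, 200, 200), end="")
```

The footer fragment lies outside the rectangle and is dropped, so the example prints two CSV rows, `Name,Qty` and `Apple,3`. A fixed row tolerance is the simplest choice; real documents with uneven baselines or multi-line cells need something more robust.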

What is new in this release:

Version 9.0.0.3079: Added filtering of extracted content by font name, font size and color. Updated the OCR engine to the latest version. Update language files from 'tessdata' folder. Improved text extraction, line grouping in tabular data, performance, XFA forms extraction and TableDetector. Fixed PDF parsing issues.

What is new in version 8.7.0.2980:

Added filtering of extracted content by font name, font size and color. Updated the OCR engine to the latest version. Update language files from 'tessdata' folder. Improved text extraction, line grouping in tabular data, performance, XFA forms extraction and TableDetector. Fixed PDF parsing issues.

What is new in version 8.6.0.2911:

Added filtering of extracted content by font name, font size and color. Updated the OCR engine to the latest version. Update language files from 'tessdata' folder. Improved text extraction, line grouping in tabular data, performance, XFA forms extraction and TableDetector. Fixed PDF parsing issues.

What is new in version 8.2.0.2699:

Version 8.2.0.2699 may include unspecified updates, enhancements, or bug fixes.

What is new in version 8.0.0.2528:

  • Added filtering of extracted content by font name, font size and color.
  • Updated the OCR engine to the latest version. Update language files from "tessdata" folder.
  • Improved text extraction.
  • Improved line grouping in tabular data.
  • Improved performance.
  • Improved XFA forms extraction.
  • Improved TableDetector.
  • Fixed PDF parsing issues.
  • Fixed JBIG image decoding.
  • ImageExtractor: fixed per-page image extraction.
  • MultimediaExtractor: fixed extraction of embedded MPEG audio.
  • TextExtractor: fixed the non-working RemoveHyphenation property.
  • Other minor improvements and bug fixes.

What is new in version 7.0.0.2474:

  • added new DocumentPrinter utility class for printing PDF documents silently (without any user dialogs)
  • added new JSONExtractor class
  • added an override of the DocumentSplitter.Split() method that lets you specify the output folder for generated files
  • fixed a multi-threading bug in DocumentSplitter
  • TableDetector now respects the extraction area set by the SetExtractionArea() method
  • new properties in extraction classes: ExtractionColumns contains the coordinates of detected columns; CustomExtractionColumns allows overriding the column detection
  • fixed GetPageRect* methods not taking the page rotation into account
  • fixed an installer bug that caused files from a previous installation to interfere with updates
  • reworked the registration checking: if RegistrationName and RegistrationKey are missing or wrong, the library no longer throws an exception but works in demo mode instead
  • PDF Multitool: added a recent document list to the "Open PDF Document" button
  • PDF Multitool: the selection can now be resized
  • PDF Multitool: added Extract JSON feature
  • PDF Multitool: improved Table Detector UI
  • PDF Multitool: greatly improved font rendering quality
  • PDF Multitool: added debug option "Show Detected Extraction Columns" to the context menu to display the detected columns on the current page. It becomes visible only after running an extraction against the currently displayed page
  • PDF Multitool: fixed font rendering issue on 32-bit Windows
  • other minor improvements and bug fixes

What is new in version 6.30.0.2421:

  • Added TextComparer utility class (available in .NET 4.0 assemblies only) that compares text in two PDF documents and generates a report.
  • Improved support of ICC color profiles.
  • Improved handling of embedded fonts.
  • Improved AttachmentExtractor.
  • Fixed XMLExtractor.SaveXMLToStream() method.
  • Fixed extracted text duplication when using the OCRCacheMode.WholePage option.
  • Other bug fixes and improvements.

What is new in version 6.20.2354:

  • PDF To Text, PDF To CSV, PDF To XML functions improved
  • new Extract Video and Extract Audio examples
  • CSV and XML extractors: improved support for tables with empty columns inside
  • new MultimediaExtractor to extract video and audio from PDF
  • new property PageDataCaching
  • new "MemoryCareProcessingOfHugeFiles" example
  • fixed null exception when trying to dispose already disposed pages
  • XLSExtractor: improved font support
  • SkipInvisibleText now also skips clipped (not visible) text
  • text output rendering improved
  • XFDF Extractor: added support for checkboxes
  • image output improved to support more sub-formats
  • Unicode text handling improved

What is new in version 6.11.2149:

  • Batch Processing samples updated to show the use of the Reset() method
  • C++ source code sample added for Pages Extraction
  • DocumentMerger adds Merge2(inputfile1, inputfile2, outputfile) method to merge 2 files
  • XLS Extractor: minor bug fixes
  • PDF Multitool now lets you enable/disable text, image and vector layers, and adds advanced settings for text extraction
  • XML, CSV and Table extraction: improved support for tables with empty cells inside columns
  • .ExtractShadowLikeText property improved: better filtering of shadow-like text

What is new in version 6.10.2136:

  • PDF to XML, PDF To CSV, PDF To Text functionality improved
  • PDF To XLS command line sample added (based on VBScript)
  • PDF To HTML SDK adds new .DetectHyperLinks property (TRUE by default) to enable/disable automatic link detection in the text
  • new SearchablePDFMaker (available for PRO licenses) to convert PDF into searchable PDF files
  • new properties in extractor: ConsiderFontNames, ConsiderFontSizes, ConsiderFontColors, ConsiderVerticalBorders in CFG files
  • header columns detection (when AutoAlighHeaderToColumns = true) improved
  • .DetectLinesInsteadOfParagraphs replaced with new .LineGroupingMode to control how lines are merged into paragraphs
  • IMPORTANT! PDF To XML fixes a long-standing issue with incorrect Y coordinate for text objects (it pointed to the bottom left instead of the top left)
  • .TableXMinIntersectionRequiredInPercents and .TableYMinIntersectionRequiredInPercents properties added
  • C++ source code sample added
  • XML Extractor fixes missing empty columns in PreserveFormatting=true mode
  • minor fixes in colors in some PDF files
  • support for multiple OCR languages added
  • PDF Multitool GUI: adds Copy to Clipboard button to TXT, CSV, XML and raster renderer dialogs
  • XLSExtractor: adds PageToWorksheet property to enable/disable generation of separate worksheets per page
  • new .TextEncodingCodePage property
  • PDFViewerControl: adds ValidateContextMenu allowing users to add custom items to the context menu
  • PDF Viewer control: adds properties ShowTextObjects, ShowImageObjects, ShowVectorObjects
  • XMLExtractor now adds "OCRConfidence" attribute for recognized text
  • PDF/A checking functionality (in beta)
  • improved checking and alignment of controls and text according to the original layout. The issue was caused by an incorrect shift of Y coordinates in controls while parsing. The correct way is to shif...
  • XML Extractor updated: now produces CONTROL tag for checkboxes and text fields
  • now uses the temp directory instead of the current directory
  • checkboxes, radio boxes, edit boxes and combo boxes are better supported
  • now allows partially trusted callers
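The 6.10 Y-coordinate fix comes down to which corner of the page is the origin. PDF user space places the origin at the bottom-left corner with y increasing upward, while a top-left convention measures y downward from the top edge. A minimal Python sketch of the conversion, assuming an unrotated page whose media box starts at (0, 0); `bottom_left_to_top_left_y` is an illustrative name, not an SDK method.

```python
def bottom_left_to_top_left_y(y_bottom, height, page_height):
    """Convert a box's y from PDF user space to a top-left-origin system.

    In PDF user space the origin is the page's bottom-left corner and y grows
    upward. A box whose bottom edge sits at y_bottom and whose height is
    `height` has its top edge at y_bottom + height; measured downward from
    the top of the page, that edge lies at page_height - (y_bottom + height).
    Assumes an unrotated page with its media box starting at (0, 0).
    """
    return page_height - (y_bottom + height)


if __name__ == "__main__":
    # A 12 pt tall text object whose bottom edge is 100 pt above the bottom
    # of a US Letter page (792 pt tall) has its top edge 680 pt below the top.
    print(bottom_left_to_top_left_y(100, 12, 792))  # 680
```

Rotated pages or a media box with a non-zero origin need an extra transformation step that this sketch leaves out.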

What is new in version 5.80.1781:

  • PDF to XML, PDF to CSV, PDF to Text functionality updated
  • OCRMode now provides 9 modes
  • .DetectLineInsteadOfParagraph now works much better. Set it to False to capture multiline text in table cells!
  • PDF controls support improved
  • FDF and XFDF data extraction

What is new in version 5.10.1747:

  • PDF to XML, PDF to CSV, PDF to Text functions improved
  • now supports text extraction from text controls
  • XML extractor now adds font style, size, name and text coordinates to tags
  • ASP.NET sample for OCR usage added
  • new property OCRLanguageDataFolder to specify the location of the "tessdata" folder
  • improved support of PDF files
  • improved support for rotated text
  • updated source code samples
  • updated documentation
  • minor improvements and fixes

What is new in version 5.00.1626:

  • OCR (text from images) functionality added: you can now extract text from embedded images and repair damaged text
  • fixed an issue where CSV and XML extractors missed the last columns with some settings
  • improved support for damaged PDF files
  • multiline text search with word matching modes is now supported
  • text can now be found even when it is hyphenated or split across lines: see the new source code sample Find Text With Hyphens
  • new property .RTLTextAutoDetectionEnabled (false by default) to auto-detect RTL languages
  • PDF Viewer GUI demo improved
  • minor improvements and fixes
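Version 5.00 added multiline search with word matching and the ability to find hyphenated text split across lines. The SDK's search API is not shown on this page, so the Python sketch below only illustrates the general technique under simple assumptions: a line ending in "-" is treated as one word broken across two lines, and whole-word matching relies on regular-expression word boundaries. `join_hyphenated_lines` and `find_whole_word` are hypothetical helpers, not ByteScout API.

```python
import re


def join_hyphenated_lines(lines):
    """Join text lines, merging words split by a trailing hyphen.

    Assumes a line ending in '-' continues the same word on the next line.
    Genuine hyphens can also fall at line ends, so this is a simplification.
    """
    text = ""
    for line in lines:
        line = line.strip()
        if text.endswith("-"):
            text = text[:-1] + line  # drop the hyphen and glue the halves
        elif text:
            text += " " + line
        else:
            text = line
    return text


def find_whole_word(lines, phrase):
    """Return start offsets of `phrase` as whole words in the joined text."""
    text = join_hyphenated_lines(lines)
    # Allow any whitespace run between the words of a multi-word phrase;
    # matching is case-insensitive.
    pattern = r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b"
    return [m.start() for m in re.finditer(pattern, text, re.IGNORECASE)]


if __name__ == "__main__":
    page = ["The invoice contains a pay-", "ment schedule for the", "next quarter."]
    print(find_whole_word(page, "payment schedule"))
    print(find_whole_word(page, "pay"))
```

The first search finds "payment schedule" even though "payment" is split across two lines; the second finds nothing, because "pay" only occurs as part of a longer word.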

Requirements:

.NET Framework 2.0 or higher

Limitations:

Nag screen, watermark on output
