Free Download Jericho HTML Parser for Linux ::: XML, RSS, CSS Tools

Jericho HTML Parser

Software Details:

Version: 3.3

Upload Date: 20 Feb 15

Developer: Martin Jericho

Distribution Type: Freeware

Downloads: 56

Rating: 1.0/5 (Total Votes: 1)

Jericho HTML Parser is a simple yet powerful open source library written entirely in Java.

It allows programmers to analyse and manipulate parts of an HTML document.

Jericho HTML Parser also incorporates high-level HTML form manipulation functions.

What is new in this release:

Bug Fixes:
[3581664] CharacterReference.decode() does not decode entities containing digits - ½ ¼ ¾ ¹ ² ³ ∴
[3311286] SourceCompactor does not respect TEXTAREA
[3519131] Renderer output incorrect when constructed with an Element object.
[3538829] Renderer output of font decoration on block boundaries incorrect.
Segment.getAllStartTags(name) and Segment.getFirstElement(name) do not work if the argument contains upper case characters.
The end delimiter of a common server tag inside an escaped server tag is falsely recognised as the end delimiter of the escaped tag.
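
As an illustration of fix [3581664], the sketch below decodes named character references whose names contain digits. It is not Jericho's CharacterReference implementation: the class name DigitEntityDecoder, the seven-entry lookup table and the regular expression are assumptions made for this example. The entity names are the standard HTML names for the characters listed in that fix.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the Jericho API.
class DigitEntityDecoder {
    // Illustrative subset: the HTML named entities whose names contain digits.
    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("frac12", "\u00BD");
        ENTITIES.put("frac14", "\u00BC");
        ENTITIES.put("frac34", "\u00BE");
        ENTITIES.put("sup1", "\u00B9");
        ENTITIES.put("sup2", "\u00B2");
        ENTITIES.put("sup3", "\u00B3");
        ENTITIES.put("there4", "\u2234");
    }

    // The name must start with a letter but may contain digits afterwards.
    // A letters-only pattern here would skip every entity in the table above.
    private static final Pattern NAMED = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    static String decode(String text) {
        Matcher m = NAMED.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = ENTITIES.get(m.group(1));
            if (replacement == null) {
                replacement = m.group(); // unknown names are left untouched
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("x&sup2; = &frac12; &there4; done"));
    }
}
```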
CHANGES THAT COULD AFFECT THE BEHAVIOUR OF EXISTING PROGRAMS:
[3427073] Segment.getStyleURISegments() now includes style element content as well as style attribute values.
[3427927] Segment.getURIAttributes() now includes the archive attributes of object and applet elements.
Comments no longer recognised inside script elements during full sequential parse. Previously they were recognised for compatibility with major browsers but modern browser behaviour has changed.
Changed the log level of all parsing errors from INFO to ERROR, and the log level of the Source.fullSequentialParse() advisory message from WARN to INFO. The previous levels gave the advisory message a higher severity than the parsing errors, preventing logging systems from hiding the advisory message while showing parsing errors. Character encoding warnings remain unchanged at WARN level.
Changed the behaviour of the Renderer.renderHyperlinkURL(StartTag) method so that relative URLs are not rendered.
Changed the behaviour of the Renderer so that hyperlink element content is not rendered if it is the same as the hyperlink URL, ignoring any http:// prefix or / suffix.
EndTag.tidy() now removes whitespace before the closing bracket.
Added Source(File) constructor.
Added OutputDocument.getSegment() method.
Added OutputDocument.remove(int begin, int end) method.
Added Renderer.setHRLineLength() method.
Added RenderToText.jsp webapp sample.
Added Segment.getRowColumnVector() method.
Encoding detection now ignores common encodings specified in meta tags that have a code unit size incompatible with the preliminary encoding.
Upgraded to the following logger APIs: slf4j-api-1.7.2, log4j-1.2.17
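
The Renderer change listed above, where hyperlink content is omitted if it matches the hyperlink URL, can be pictured with the comparison below. This is a minimal sketch of the stated rule only (ignore an http:// prefix and a / suffix, then compare), not the Renderer's actual code. The names HyperlinkLabelCheck, normalise and contentDuplicatesUrl are hypothetical, and the exact, case-sensitive comparison is an assumption.

```java
// Hypothetical helper for illustration only; not part of the Jericho API.
class HyperlinkLabelCheck {
    // Drops a leading "http://" and a single trailing "/", as described in the release note.
    static String normalise(String s) {
        String t = s;
        if (t.startsWith("http://")) {
            t = t.substring("http://".length());
        }
        if (t.endsWith("/")) {
            t = t.substring(0, t.length() - 1);
        }
        return t;
    }

    // True when the link text adds nothing beyond the URL itself.
    // Assumption: the comparison is exact and case-sensitive.
    static boolean contentDuplicatesUrl(String content, String url) {
        return normalise(content).equals(normalise(url));
    }

    public static void main(String[] args) {
        System.out.println(contentDuplicatesUrl("example.com", "http://example.com/"));
        System.out.println(contentDuplicatesUrl("Example site", "http://example.com/"));
    }
}
```

Under this rule a link whose text is just its own address would be rendered once rather than twice, while a link with descriptive text keeps both the text and the URL.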

What is new in version 3.1:

Bug Fixes:
[2793556] Infinite loop on Segment.getAllStartTags()
Infinite loop on Segment.getAllElements()
Segment.getFirst* methods returned segments outside the bounding segment.
Segment.getAllElements methods did not return all enclosed elements in some circumstances.
Fixed documentation errors in Segment.getAllElements methods.
Added StreamedSource class.
CHANGES THAT COULD AFFECT THE BEHAVIOUR OF EXISTING PROGRAMS:
Changed ParseText from class to interface.
Segment.getNodeIterator() now returns character references as separate nodes.
Added tag search methods based on attribute value regular expressions.
Added tag search methods based on HTML class attribute.
Added static Source.LegacyNodeIteratorCompatabilityMode property temporarily to restore Segment.getNodeIterator() functionality to that of previous versions.
Removed char[] based search methods in ParseText.
Added CharacterReference.appendCharTo(Appendable) method.
Added OutputDocument(Segment) constructor.
Added StreamedSourceCopy sample program.
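
A search based on the HTML class attribute, as added in this version, can be thought of as a whole-token match against the attribute value. The sketch below assumes HTML's rule that a class attribute is a whitespace-separated list of names. ClassTokenMatch and hasClass are hypothetical names, and the library's own matching rules may differ in detail.

```java
// Hypothetical helper for illustration only; not part of the Jericho API.
class ClassTokenMatch {
    // Returns true if className appears as a complete token in the class attribute value.
    static boolean hasClass(String classAttributeValue, String className) {
        if (classAttributeValue == null || className.isEmpty()) {
            return false;
        }
        for (String token : classAttributeValue.trim().split("\\s+")) {
            if (token.equals(className)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasClass("nav main-menu", "nav"));
        System.out.println(hasClass("navbar", "nav"));
    }
}
```

Matching whole tokens rather than substrings is the design point here: a plain substring test would wrongly treat "navbar" as having the class "nav".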

What is new in version 3.0:

Bug Fixes:
Character references representing Unicode supplementary characters were not decoded correctly to UTF-16 code unit pairs.
[2188446] Element.getDepth() and Element.getParentElement() returned incorrect results if called in parse on demand mode.
Comments are now recognised inside script elements.
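
To make the supplementary-character fix above concrete: a code point above U+FFFF does not fit in a single Java char and has to be stored as a surrogate pair of two UTF-16 code units. The sketch below decodes a numeric character reference with the standard Character.toChars method. It is an illustration of the encoding issue, not Jericho's own decoder, and the names SupplementaryCharDecoder and decodeNumeric are hypothetical.

```java
// Hypothetical helper for illustration only; not part of the Jericho API.
class SupplementaryCharDecoder {
    // Accepts a numeric character reference of the form "&#NNN;" or "&#xHHH;".
    static String decodeNumeric(String ref) {
        if (!ref.startsWith("&#") || !ref.endsWith(";")) {
            throw new IllegalArgumentException("Not a numeric character reference: " + ref);
        }
        String body = ref.substring(2, ref.length() - 1);
        int codePoint = (body.startsWith("x") || body.startsWith("X"))
                ? Integer.parseInt(body.substring(1), 16)
                : Integer.parseInt(body, 10);
        // Character.toChars yields one char for BMP code points and a
        // high/low surrogate pair for supplementary code points.
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String clef = decodeNumeric("&#x1D11E;"); // U+1D11E MUSICAL SYMBOL G CLEF
        System.out.println(clef.length() + " UTF-16 code units, "
                + clef.codePointCount(0, clef.length()) + " code point");
    }
}
```

Casting the code point straight to char would truncate it to 16 bits, which is the kind of error that produces a wrong character instead of a surrogate pair.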
API CHANGES THAT ARE NOT BACKWARD COMPATIBLE:
Changed package name to net.htmlparser.jericho
Attribute values must now be String rather than CharSequence.
Removed all deprecated methods/classes from previous versions.
All find* methods deprecated in favour of get* methods in order to apply a consistent naming convention across all tag search methods.
Tag, Element and HTMLElements classes no longer implement the HTMLElementName interface. (use static import instead)
All collections are now strongly typed using generics.
Changed FormControlOutputStyle class to enum.
Changed FormControlType class to enum.
Added CharStreamSource.appendTo(Appendable) method.
Added Source.iterator() method.
Source now implements Iterable.
Internally uses StringBuilder for better performance.
Added Source.getNextStartTag(StartTagType) method.
Added Source.getNextEndTag(EndTagType) method.
Added Source.getPreviousStartTag(StartTagType) method.
Added Source.getPreviousEndTag(EndTagType) method.
Added Segment.getAllStartTags(StartTagType) method.
Added all Segment.getFirst* methods.
Added Renderer.renderHyperlinkURL(StartTag) method.
Added HTMLSanitiser sample program.
Upgraded to slf4j-api-1.5.6
