lxml

Software Details:
Version: 3.4.4
Upload Date: 12 May 15
Developer: infrae.com
Distribution Type: Freeware
Downloads: 70

lxml is a Python binding for the libxml2 and libxslt C libraries. It combines the speed of those libraries with the simplicity of the Python language.

Compatible with all CPython versions from 2.4 to 3.2.

What is new in this release:

  • lxml.html.iterlinks now returns links inside meta refresh tags.
  • New XMLParser option collect_ids=False to disable ID hash table creation. This can substantially speed up parsing of documents with many different IDs that are not used.
  • The parser uses per-document hash tables for XML IDs. This reduces the load of the global parser dict and speeds up parsing for documents with many different IDs.
  • ElementTree.getelementpath(element) returns a structural ElementPath expression for the given element, which can be used for lookups later.
  • xmlfile() accepts a new argument close=True to close file(-like) objects after writing to them. Before, xmlfile() only closed the file if it had opened it internally.
  • Allow "bytearray" type for ASCII text input.
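
The following is a minimal sketch of three of the options listed above: collect_ids=False, ElementTree.getelementpath() and the close=True argument of xmlfile(). The sample document and the variable names are made up for illustration, and the sketch assumes an lxml version that includes these features.

```python
import io
from lxml import etree

# Hypothetical sample document; the IDs are never looked up.
XML = (b'<root><item id="a1"><name>x</name></item>'
       b'<item id="a2"><name>y</name></item></root>')

# collect_ids=False tells the parser not to build the ID hash table.
parser = etree.XMLParser(collect_ids=False)
root = etree.fromstring(XML, parser)
tree = root.getroottree()

# getelementpath() returns a structural path that can be looked up later.
target = root[1][0]                 # the <name> inside the second <item>
path = tree.getelementpath(target)
found = tree.find(path)

# close=True asks xmlfile() to close a file-like object it did not open.
buf = io.BytesIO()
with etree.xmlfile(buf, encoding="utf-8", close=True) as xf:
    with xf.element("log"):
        xf.write(etree.Element("entry"))
closed_after_write = buf.closed
```

Because the buffer is closed on exit, its content can no longer be read afterwards; close=True therefore suits real files better than in-memory buffers that are inspected later.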

What is new in version 3.3.2:

  • The properties resolvers and version, as well as the methods set_element_class_lookup() and makeelement(), were lost from iterparse objects.
  • Instances of XMLSchema, Schematron and RelaxNG did not clear their local error_log before running a validation.
  • lxml.doctestcompare mixed up "expected" and "actual" in attribute values.

What is new in version 3.3.1:

  • Bugs fixed:
  • HTML documents parsed with parser.feed() failed to find elements during tag iteration.
  • Building in PyPy failed due to missing support for PyUnicode_Compare() and PyByteArray_*() in PyPy's C-API.
  • Compilation in MSVC failed due to missing "stdint.h" standard header file.
  • iterparse() failed to parse BOM prefixed files.
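
As a rough illustration of the last fix, the sketch below feeds iterparse() a byte stream that starts with a UTF-8 byte order mark. The input data is invented for the example, and the behaviour shown assumes a release that contains the fix.

```python
import io
from lxml import etree

# UTF-8 BOM (EF BB BF) followed by a small, made-up document.
data = b"\xef\xbb\xbf<root><a/><b/></root>"

# By default iterparse() reports "end" events, so children come first.
tags = [element.tag for _event, element in etree.iterparse(io.BytesIO(data))]
```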

What is new in version 3.3.0:

  • Bugs fixed:
  • The heuristic that distinguishes file paths from URLs was tightened to produce fewer false negatives.

What is new in version 3.2.3:

  • Fixed support for Python 2.4 which was lost in 3.2.2.

What is new in version 3.2.1:

  • The methods apply_templates() and process_children() of XSLT extension elements have gained two new boolean options elements_only and remove_blank_text that discard either all strings or whitespace-only strings from the result list.

What is new in version 3.2.0:

  • Leading whitespace could change the behaviour of the string parsing functions in lxml.html.
  • The string parsing functions in lxml.html are more robust in the face of uncommon HTML content like framesets or missing body tags.
  • I/O errors while trying to access files with paths that contain non-ASCII characters could raise UnicodeDecodeError instead of properly reporting the IOError.
  • Parsing from in-memory strings disabled network access in the default parser and made subsequent attempts to parse from a URL fail.

What is new in version 3.1.2:

  • Passing attributes through the namespace-unaware API of the sax bridge (i.e. the handler.startElement() method) failed with a TypeError.
  • Fixed serialisation error in XSLT output when converting the result tree to a Unicode string.

What is new in version 3.0.2:

  • Fixed crash during interpreter shutdown by switching to Cython 0.17.3 for building.

What is new in version 3.0:

  • C14N allows specifying the inclusive prefixes to be promoted to top-level during exclusive serialisation.
  • Initial support for building in PyPy (through cpyext).
  • DTD objects gained an API that allows read access to their declarations.
  • xpathgrep.py gained support for parsing line-by-line (e.g. from grep output) and for surrounding the output with a new root tag.
  • E-factory in lxml.builder accepts subtypes of known data types (such as string subtypes) when building elements around them.
  • Tree iteration and iterparse() with a selective tag argument supports passing a set of tags. Tree nodes will be returned by the iterators if they match any of the tags.
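
The tag-set support mentioned in the last item might be used as in the sketch below. The document and the tag names are hypothetical, and the example assumes lxml 3.0 or later.

```python
import io
from lxml import etree

# Made-up sample document.
XML = b"<doc><title>T</title><p>one</p><note>n</note><p>two</p></doc>"
wanted = {"title", "p"}

# Tree iteration: elements are returned if they match any tag in the set.
root = etree.fromstring(XML)
iter_tags = [element.tag for element in root.iter(wanted)]

# iterparse() accepts the same set through its tag argument.
parsed_tags = [
    element.tag
    for _event, element in etree.iterparse(io.BytesIO(XML), tag=wanted)
]
```

Both calls skip the note element, since its tag is not in the set.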

What is new in version 2.3.5:

  • Crash when merging text nodes in element.remove().
  • Crash in sax/target parser when reporting empty doctype.

What is new in version 2.3.4:

  • Crash when building an nsmap (Element property) with empty namespace URIs.
  • Crash due to race condition when errors (or user messages) occur during threaded XSLT processing.
  • XSLT stylesheet compilation could ignore compilation errors.

What is new in version 2.3.3:

  • Features added:
  • lxml.html.tostring() gained new serialisation options with_tail and doctype.
  • Bugs fixed:
  • Fixed a crash when using iterparse() for HTML parsing and requesting start events.
  • Fixed parsing of more selectors in cssselect. Whitespace before pseudo-elements and pseudo-classes is significant as it is a descendant combinator. "E :pseudo" should parse the same as "E *:pseudo", not "E:pseudo".
  • lxml.html.diff no longer raises an exception when hitting 'img' tags without 'src' attribute.

What is new in version 2.3.2:

  • Features added:
  • lxml.objectify.deannotate() has a new boolean option cleanup_namespaces to remove the objectify namespace declarations (and generally clean up the namespace declarations) after removing the type annotations.
  • lxml.objectify gained its own SubElement() function as a copy of etree.SubElement to avoid an otherwise redundant import of lxml.etree on the user side.
  • Bugs fixed:
  • Fixed the "descendant" bug in cssselect a second time (after a first fix in lxml 2.3.1). The previous change resulted in a serious performance regression for the XPath based evaluation of the translated expression. Note that this breaks the usage of some of the generated XPath expressions as XSLT location paths that previously worked in 2.3.1.
  • Fixed parsing of some selectors in cssselect. Whitespace after combinators ">", "+" and "~" is now correctly ignored. Previously it was parsed as a descendant combinator. For example, "div> .foo" was parsed the same as "div>* .foo" instead of "div>.foo".
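
The two objectify features above could be combined as in this sketch. The element names are made up, and the example assumes the default annotation behaviour of lxml.objectify.

```python
from lxml import etree, objectify

# Build a small tree; assigning a Python value adds a py:pytype annotation.
root = objectify.Element("root")
root.count = 5

# objectify.SubElement() avoids a separate import of lxml.etree for this call.
extra = objectify.SubElement(root, "extra")

annotated = etree.tostring(root)

# Remove the type annotations and the now unused namespace declarations.
objectify.deannotate(root, cleanup_namespaces=True)
cleaned = etree.tostring(root)
```

Without cleanup_namespaces=True, the annotation attributes are removed but the namespace declarations on the root element stay in the serialised output.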

What is new in version 2.3.1:

  • Features added:
  • New option kill_tags in lxml.html.clean to remove specific tags and their content (i.e. their whole subtree).
  • pi.get() and pi.attrib on processing instructions to parse pseudo-attributes from the text content of processing instructions.
  • lxml.get_include() returns a list of include paths that can be used to compile external C code against lxml.etree. This is specifically required for statically linked lxml builds when code needs to compile against the exact same header file versions as lxml itself.
  • Resolver.resolve_file() takes an additional option close_file that configures if the file(-like) object will be closed after reading or not. By default, the file will be closed, as the user is not expected to keep a reference to it.
  • Bugs fixed:
  • HTML cleaning didn't remove 'data:' links.
  • The html5lib parser integration now uses the 'official' implementation in html5lib itself, which makes it work with newer releases of the library.
  • In lxml.sax, endElementNS() could incorrectly reject a plain tag name when the corresponding start event inferred the same plain tag name to be in the default namespace.
  • When an open file-like object is passed into parse() or iterparse(), the parser will no longer close it after use. This reverts a change in lxml 2.3 where all files would be closed. It is the user's responsibility to properly close the file(-like) object, also in error cases.
  • Assertion error in lxml.html.cleaner when discarding top-level elements.
  • In lxml.cssselect, use the xpath 'A//B' (short for 'A/descendant-or-self::node()/B') instead of 'A/descendant::B' for the css descendant selector ('A B'). This makes a few edge cases consistent with the selector behavior in WebKit and Firefox, and makes more css expressions valid location paths (for use in xsl:template match).
  • In lxml.html, non-selected <option> tags no longer show up in the collected form values.
  • Adding/removing <option> values to/from a multiple select form field properly selects them and unselects them.
  • Other changes:
  • Static builds can specify the download directory with the --download-dir option.
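
Two of the changes above lend themselves to a short sketch: reading pseudo-attributes from a processing instruction, and the caller keeping ownership of a file-like object passed to parse(). The processing instruction and its pseudo-attributes are invented for the example.

```python
import io
from lxml import etree

# A made-up processing instruction placed before the root element.
root = etree.fromstring(
    b'<?page-layout orientation="landscape" columns="2"?><root/>'
)
pi = root.getprevious()

# pi.get() and pi.attrib parse pseudo-attributes from the PI's text content.
orientation = pi.get("orientation")
pseudo_attributes = dict(pi.attrib)

# parse() leaves an open file-like object open; closing it is up to the caller.
stream = io.BytesIO(b"<root/>")
etree.parse(stream)
still_open = not stream.closed
stream.close()
```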

What is new in version 2.3:

  • Features added:
  • When looking for children, lxml.objectify takes '{}tag' as meaning an empty namespace, as opposed to the parent namespace.
  • Bugs fixed:
  • When finished reading from a file-like object, the parser immediately calls its .close() method.
  • When finished parsing, iterparse() immediately closes the input file.
  • Work-around for libxml2 bug that can leave the HTML parser in a non-functional state after parsing a severely broken document (fixed in libxml2 2.7.8).
  • The marque tag in the HTML cleanup code is now correctly named marquee.
  • Other changes:
  • Some public functions in the Cython-level C-API have more explicit return types.

What is new in version 2.3beta1:

  • Bugs fixed:
  • Crash in newer libxml2 versions when moving elements between documents that had attributes on replaced XInclude nodes.
  • XMLID() function was missing the optional parser and base_url parameters.
  • Searching for wildcard tags in iterparse() was broken in Py3.
  • lxml.html.open_in_browser() didn't work in Python 3 due to the use of os.tempnam. It now takes an optional 'encoding' parameter.
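
As an illustration of the XMLID() fix, the sketch below passes a custom parser through the optional parser parameter. The document and the parser option are chosen only for the example.

```python
from lxml import etree

# Hypothetical parser configuration: drop comments while parsing.
parser = etree.XMLParser(remove_comments=True)

# XMLID() returns the root element and a dict that maps id values to elements.
root, ids = etree.XMLID(
    b'<doc><!-- note --><sec id="intro"/><sec id="usage"/></doc>',
    parser=parser,
)
id_names = sorted(ids)
```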
