Software Details:
Version: 3.4.1
Upload Date: 17 Feb 15
Distribution Type: Freeware
Downloads: 72
lxml is a powerful, free Python module that binds the libxml2 and libxslt libraries, allowing Python developers to work with both XML and HTML files inside their Python code.
An XML processing library
lxml is an XML (Extensible Markup Language) processing library for the Python programming language, designed to follow the ElementTree API specification as closely as possible. It extends the ElementTree API to expose functionality of the libxml2 and libxslt libraries, such as RELAX NG, XPath, XML Schema, C14N (canonicalization) and XSLT (Extensible Stylesheet Language Transformations).
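As a minimal sketch of what the ElementTree-style API combined with XPath looks like in practice (the element names and values below are made up for illustration):

```python
from lxml import etree

# Build a small tree with the ElementTree-style API.
root = etree.Element("catalog")
book = etree.SubElement(root, "book", id="b1")
etree.SubElement(book, "title").text = "Example title"

# XPath is one of the libxml2 features lxml exposes on top of ElementTree.
titles = root.xpath("//book[@id='b1']/title/text()")

# Serialise the tree back to a text string.
xml_text = etree.tostring(root, encoding="unicode")
print(titles)
print(xml_text)
```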
Use lxml to call Python code from XSLT stylesheets
Developers can use lxml to call Python code from XSLT stylesheets and XPath expressions via extension functions. A wide range of tutorials is available on the project’s homepage (see the link at the end of the article). The software is open source and combines the feature completeness and speed of the aforementioned libraries with the simplicity of a native Python API (Application Programming Interface).
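The following sketch illustrates one way such an extension function might be registered and then called from both an XPath expression and an XSLT stylesheet. The function name shout, the prefix ex and the namespace URI are hypothetical and chosen only for this example.

```python
from lxml import etree

# Hypothetical extension function: the first argument is the evaluation
# context, the remaining arguments come from the XPath call.
def shout(context, text):
    return text.upper()

# Register the function under an illustrative namespace URI and prefix.
ns = etree.FunctionNamespace("http://example.org/functions")
ns.prefix = "ex"
ns["shout"] = shout

root = etree.XML("<greeting>hello</greeting>")

# Call the Python function from an XPath expression.
xpath_result = root.xpath("ex:shout(string(.))")

# Call the same function from an XSLT stylesheet.
transform = etree.XSLT(etree.XML('''
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ex="http://example.org/functions"
    exclude-result-prefixes="ex">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:value-of select="ex:shout(string(/greeting))"/>
  </xsl:template>
</xsl:stylesheet>
'''))
xslt_result = str(transform(root))
print(xpath_result, xslt_result)
```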
Getting started with lxml
It is quite easy to install lxml on a GNU/Linux distribution using the source archive distributed on Softoware and the project’s official website. Simply download the source package, save it in your Home directory, unpack it, open the Terminal app and navigate to the location of the extracted archive files (e.g. cd /home/softoware/lxml-3.4.1). Run the ‘python setup.py build’ command to compile the library, which should take about 1-2 minutes on a modern computer. After a successful compilation, run the ‘python setup.py install’ command as root or the ‘sudo python setup.py install’ command as a privileged user to install lxml system wide.
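Once the installation has finished, a quick check along the following lines can confirm that the module imports and show which library versions it is running against. This is only a sketch; the printed values depend on the local system.

```python
from lxml import etree

# Version tuples exposed by lxml.etree.
print("lxml:", etree.LXML_VERSION)
print("libxml2 (runtime):", etree.LIBXML_VERSION)
print("libxslt (runtime):", etree.LIBXSLT_VERSION)
```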
Supports GNU/Linux and Microsoft Windows operating systems
The software is officially supported on GNU/Linux and Microsoft Windows operating systems. It has been successfully tested on 32-bit and 64-bit computers.
What is new in this release:
- Features added:
- New htmlfile HTML generator to accompany the incremental xmlfile serialisation API. Patch by Burak Arslan.
- Bugs fixed:
- lxml.sax.ElementTreeContentHandler did not initialise its superclass.
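A minimal sketch of how the htmlfile generator mentioned above might be used, assuming it follows the same context-manager pattern as xmlfile. The in-memory buffer and the page content are illustrative only.

```python
import io
from lxml import etree

# Write HTML incrementally into an in-memory buffer.
buffer = io.BytesIO()
with etree.htmlfile(buffer) as xf:
    with xf.element("html"):
        with xf.element("body"):
            with xf.element("p"):
                xf.write("Hello, incremental HTML")

html_bytes = buffer.getvalue()
print(html_bytes)
```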
What is new in version 3.3.1:
- Bugs fixed:
- LP#1014290: HTML documents parsed with parser.feed() failed to find elements during tag iteration.
- LP#1273709: Building in PyPy failed due to missing support for PyUnicode_Compare() and PyByteArray_*() in PyPy's C-API.
- LP#1274413: Compilation in MSVC failed due to missing "stdint.h" standard header file.
- LP#1274118: iterparse() failed to parse BOM prefixed files.
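To illustrate the scenario behind the first fix in this list, the sketch below feeds an HTML document to the parser in chunks and then iterates over the result by tag name. The chunk boundaries and content are invented for the example.

```python
from lxml import etree

parser = etree.HTMLParser()

# Feed the document in pieces, as it might arrive from a network socket.
for chunk in ("<html><body><p>first</p>", "<p>second</p></body></html>"):
    parser.feed(chunk)
root = parser.close()

# Tag iteration over the tree built by the feed interface.
paragraphs = [p.text for p in root.iter("p")]
print(paragraphs)
```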
What is new in version 3.0 Alpha 2:
- Features added:
- The .iter() method of elements now accepts tag arguments like "{*}name" to search for elements with a given local name in any namespace. With this addition, all combinations of wildcards now work as expected: "{ns}name", "{}name", "{*}name", "{ns}*", "{}*" and "{*}*". Note that "name" is equivalent to "{}name", but "*" is "{*}*". The same change applies to the .getiterator(), .itersiblings(), .iterancestors(), .iterdescendants(), .iterchildren() and .itertext() methods, the strip_attributes(), strip_elements() and strip_tags() functions, as well as the iterparse() class.
- C14N allows specifying the inclusive prefixes to be promoted to top-level during exclusive serialisation.
- Bugs fixed:
- Passing long Unicode strings into the feed() parser interface failed to read the entire string.
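The wildcard tag matching described for .iter() in this release can be sketched as follows. The namespace URIs and element names are hypothetical.

```python
from lxml import etree

root = etree.XML(
    '<root xmlns:a="http://example.org/a" xmlns:b="http://example.org/b">'
    '<a:item/><b:item/><item/><a:other/>'
    '</root>'
)

# "{*}item": local name "item" in any namespace, or in none.
any_ns_items = [el.tag for el in root.iter("{*}item")]

# "{}item": only elements named "item" without a namespace.
no_ns_items = [el.tag for el in root.iter("{}item")]

# "{ns}*": every element in one specific namespace.
a_ns_all = [el.tag for el in root.iter("{http://example.org/a}*")]

print(any_ns_items, no_ns_items, a_ns_all)
```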
What is new in version 2.3.5:
- Crash when merging text nodes in element.remove().
- Crash in sax/target parser when reporting empty doctype.
What is new in version 2.3.4:
- Crash when building an nsmap (Element property) with empty namespace URIs.
- Crash due to race condition when errors (or user messages) occur during threaded XSLT processing.
- XSLT stylesheet compilation could ignore compilation errors.
What is new in version 2.3.2:
- Features added:
- lxml.objectify.deannotate() has a new boolean option cleanup_namespaces to remove the objectify namespace declarations (and generally clean up the namespace declarations) after removing the type annotations.
- lxml.objectify gained its own SubElement() function as a copy of etree.SubElement to avoid an otherwise redundant import of lxml.etree on the user side.
- Bugs fixed:
- Fixed the "descendant" bug in cssselect a second time (after a first fix in lxml 2.3.1). The previous change resulted in a serious performance regression for the XPath based evaluation of the translated expression. Note that this breaks the usage of some of the generated XPath expressions as XSLT location paths that previously worked in 2.3.1.
- Fixed parsing of some selectors in cssselect. Whitespace after the combinators ">", "+" and "~" is now correctly ignored. Previously it was parsed as a descendant combinator. For example, "div > .foo" was parsed the same as "div >* .foo" instead of "div >.foo".
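A short sketch of the two lxml.objectify additions listed for this release, objectify.SubElement() and the cleanup_namespaces option of deannotate(). The document content is invented for illustration.

```python
from lxml import etree, objectify

root = objectify.fromstring("<root><count>5</count></root>")

# objectify's own SubElement(), so lxml.etree is not needed to add children.
objectify.SubElement(root, "note")

# Add type annotations, then look at the serialised result.
objectify.annotate(root)
annotated = etree.tostring(root, encoding="unicode")

# Remove the annotations and the namespace declarations they required.
objectify.deannotate(root, cleanup_namespaces=True)
cleaned = etree.tostring(root, encoding="unicode")

print(annotated)
print(cleaned)
```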
What is new in version 2.3.1:
- Features added:
- New option kill_tags in lxml.html.clean to remove specific tags and their content (i.e. their whole subtree).
- pi.get() and pi.attrib on processing instructions to parse pseudo-attributes from the text content of processing instructions.
- lxml.get_include() returns a list of include paths that can be used to compile external C code against lxml.etree. This is specifically required for statically linked lxml builds when code needs to compile against the exact same header file versions as lxml itself.
- Resolver.resolve_file() takes an additional option close_file that configures if the file(-like) object will be closed after reading or not. By default, the file will be closed, as the user is not expected to keep a reference to it.
- Bugs fixed:
- HTML cleaning didn't remove 'data:' links.
- The html5lib parser integration now uses the 'official' implementation in html5lib itself, which makes it work with newer releases of the library.
- In lxml.sax, endElementNS() could incorrectly reject a plain tag name when the corresponding start event inferred the same plain tag name to be in the default namespace.
- When an open file-like object is passed into parse() or iterparse(), the parser will no longer close it after use. This reverts a change in lxml 2.3 where all files would be closed. It is the user's responsibility to properly close the file(-like) object, also in error cases.
- Assertion error in lxml.html.cleaner when discarding top-level elements.
- In lxml.cssselect, use the XPath 'A//B' (short for 'A/descendant-or-self::node()/B') instead of 'A/descendant::B' for the CSS descendant selector ('A B'). This makes a few edge cases consistent with the selector behavior in WebKit and Firefox, and makes more CSS expressions valid location paths (for use in xsl:template match).
- In lxml.html, non-selected tags no longer show up in the collected form values.
- Adding/removing values to/from a multiple select form field properly selects them and unselects them.
- Other changes:
- Static builds can specify the download directory with the --download-dir option.
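As an illustration of the pi.get() and pi.attrib feature listed for this release, the sketch below reads pseudo-attributes from a processing instruction. The stylesheet name is made up.

```python
from lxml import etree

# A processing instruction whose text content looks like attributes.
pi = etree.ProcessingInstruction(
    "xml-stylesheet", 'type="text/xsl" href="style.xsl"'
)

# Look up a single pseudo-attribute, or collect all of them.
href = pi.get("href")
pseudo_attrs = dict(pi.attrib)

print(href, pseudo_attrs)
```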
What is new in version 2.3:
- Features added:
- When looking for children, lxml.objectify takes '{}tag' as meaning an empty namespace, as opposed to the parent namespace.
- Bugs fixed:
- When finished reading from a file-like object, the parser immediately calls its .close() method.
- When finished parsing, iterparse() immediately closes the input file.
- Work-around for a libxml2 bug that can leave the HTML parser in a non-functional state after parsing a severely broken document (fixed in libxml2 2.7.8).
- The marque tag in the HTML cleanup code is now correctly named marquee.
- Other changes:
- Some public functions in the Cython-level C-API have more explicit return types.
What is new in version 2.2.8 / 2.3 Beta 1:
- Crash in newer libxml2 versions when moving elements between documents that had attributes on replaced XInclude nodes.
- XMLID() function was missing the optional parser and base_url parameters.
- Searching for wildcard tags in iterparse() was broken in Py3.
- lxml.html.open_in_browser() didn't work in Python 3 due to the use of os.tempnam. It now takes an optional 'encoding' parameter.
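The optional parser parameter of XMLID() mentioned in this list might be used roughly as follows. The document and its id values are hypothetical.

```python
from lxml import etree

# A custom parser passed through the optional "parser" parameter.
parser = etree.XMLParser(remove_blank_text=True)

# XMLID() returns the root element and a mapping from id values to elements.
root, id_map = etree.XMLID(
    '<doc><section id="intro"/><section id="usage"/></doc>',
    parser=parser,
)

intro = id_map["intro"]
print(sorted(id_map), intro.tag)
```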
Requirements:
- Python