Sanitize

Software Details:
Version: 4.0.0
Upload Date: 12 May 15
Developer: Ryan Grove
Distribution Type: Freeware
Downloads: 38

Sanitize is a whitelist-based HTML sanitizer for Ruby, built on the Nokogiri HTML parser. Given a block of text, it removes all HTML except what has been explicitly allowed.

Developers set up a whitelist of HTML tags, which Sanitize uses as its reference for what counts as "acceptable" HTML.

Any tag that is not on the list is removed from the parsed text.

Sanitize can work with standards-compliant or with malformed HTML.

The library can detect and filter out HTML tags, their attributes, and the protocols (URL schemes) used in attribute values.
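
As a rough illustration of filtering tags, attributes and protocols, a whitelist can be written as a plain Ruby Hash. The :elements, :attributes and :protocols keys are Sanitize config settings; the constant name and the particular tags chosen are made up for this sketch.

```ruby
# Hypothetical whitelist: the constant name and the chosen tags are
# illustrative, not one of Sanitize's bundled configurations.
COMMENT_WHITELIST = {
  # Elements that survive sanitizing; every other tag is stripped.
  :elements => %w[a b em i strong],

  # Attributes allowed per element; anything else is dropped.
  :attributes => { 'a' => %w[href title] },

  # URL schemes allowed in a given attribute of a given element.
  :protocols => { 'a' => { 'href' => %w[http https mailto] } }
}.freeze
```

With the gem installed, a Hash like this would typically be passed as the second argument to the sanitizing call, for example Sanitize.fragment(html, COMMENT_WHITELIST) in the 3.x and 4.x releases.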

The cleaned text is always output as valid HTML or XHTML.

To help developers get started, Sanitize ships with a few ready-made configurations. Check the README file for more details.

What is new in this release:

  • Added two new CSS config settings, :at_rules_with_properties and :at_rules_with_styles.
  • Added full support for CSS @page rules in the relaxed config, including support for all page-margin box rules.
  • Added further CSS at-rules to the relaxed config.
  • Added a whole bunch of CSS properties to the relaxed config.
  • Small performance improvements.
  • Upgraded Crass to 1.0.2 to pick up a fix that affected the parsing of CSS @page rules.
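
The two new CSS settings sit under the :css key of a config. The fragment below is a minimal sketch, assuming at-rule names are written in lowercase without the leading @; the constant name and the short lists are illustrative and far smaller than the real relaxed config.

```ruby
# Hypothetical config fragment; the lists are examples only.
CSS_WHITELIST = {
  :css => {
    # At-rules whose blocks hold property declarations (@font-face, @page).
    :at_rules_with_properties => %w[font-face page],

    # At-rules whose blocks hold nested style rules (@media, @supports).
    :at_rules_with_styles => %w[media supports],

    # Properties allowed inside declarations.
    :properties => %w[color font-family margin]
  }
}.freeze
```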

What is new in version 3.1.2:

  • Fixed: #document and #fragment failed on frozen strings, and could unintentionally modify unfrozen strings if they used an encoding other than UTF-8 or if they contained characters not allowed in HTML.

What is new in version 3.0.2:

  • Updated Nokogumbo to 1.1.12, because 1.1.11 silently reverted the change we were trying to pick up in the last release.

What is new in version 3.0.0:

  • Added advanced CSS sanitization support using Crass, which is fully compliant with the CSS Syntax Module Level 3 parsing spec. The contents of whitelisted <style> elements and style attributes in HTML will be sanitized as CSS, or you can use the Sanitize::CSS class to manually sanitize CSS stylesheets or properties.
  • Added an :allow_doctype setting. When true, well-formed doctype definitions will be allowed in documents. When false (the default), doctype definitions will be removed from documents. Doctype definitions are never allowed in fragments, regardless of this setting.
  • Added the following elements to the relaxed config, in addition to various attributes: article, aside, body, data, div, footer, head, header, html, main, nav, section, span, style, title.
  • The :whitespace_elements config is now a Hash, and allows you to specify the text that should be inserted before and after these elements when they're removed. The old-style Array-based config value is still supported for backwards compatibility.
  • Unsuitable Unicode characters are now removed from HTML before it's parsed.
  • Fixed:
      • Non-tag brackets in input like "1 > 2 and 2 < 1" are now parsed and escaped correctly in accordance with the HTML5 spec, becoming "1 &gt; 2 and 2 &lt; 1".
      • Siblings added after the current node during traversal are now also traversed. In previous versions they were simply skipped.
      • Nokogiri has been smacked and instructed to stop adding newlines after certain elements, because if people wanted newlines there they'd have put them there, dammit.
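
The :allow_doctype and :whitespace_elements settings from this release could be combined roughly as follows. This is a sketch only: the constant names are hypothetical, and the :before and :after key names are taken from the README as best recalled, so check them against the version you install.

```ruby
# Hypothetical config fragment for Sanitize 3.0.0 or later.
DOCUMENT_CONFIG = {
  # Keep well-formed doctypes when sanitizing whole documents (default: false).
  :allow_doctype => true,

  # Hash form: text inserted before and after each element when it is removed.
  :whitespace_elements => {
    'br' => { :before => "\n", :after => "" },
    'p'  => { :before => "\n", :after => "\n" }
  }
}.freeze

# Old Array form, still accepted for backwards compatibility.
LEGACY_WHITESPACE_ELEMENTS = %w[br p].freeze
```

Because doctypes are never allowed in fragments, :allow_doctype would only matter when a config like this is used to sanitize a full document.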

What is new in version 2.0.6:

  • Version 2.0.5 inadvertently included some work-in-progress changes that shouldn't have made their way into the master branch.

What is new in version 1.2.1:

  • Added a :remove_contents config setting. If set to true, Sanitize will remove the contents of all non-whitelisted elements in addition to the elements themselves. If set to an Array of element names, Sanitize will remove the contents of only those elements (when filtered), and leave the contents of other filtered elements. [Thanks to Rafael Souza for the Array option]
  • Added an :output_encoding config setting to allow the character encoding for HTML output to be specified. The default is 'utf-8'.
  • The environment hash passed into transformers now includes a :node_name item containing the lowercase name of the current HTML node (e.g. "div").
  • Returning anything other than a Hash or nil from a transformer will now raise a meaningful Sanitize::Error exception rather than an unintended NameError.
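
A sketch of how the 1.2.1 settings might fit together in one config. The constant names are hypothetical, and the :whitelist key returned by the transformer is an assumption based on the 1.x README rather than something stated above; later major versions changed the transformer return format, so treat this as era-specific.

```ruby
# Hypothetical transformer that whitelists <abbr> elements by name.
# It reads only env[:node_name], so it can be tried without Nokogiri.
ALLOW_ABBR = lambda do |env|
  # A transformer must return nil or a Hash; anything else raises Sanitize::Error.
  # The :whitelist key is an assumption (see the note above the block).
  env[:node_name] == 'abbr' ? { :whitelist => true } : nil
end

LEGACY_CONFIG = {
  :elements => %w[b i p],

  # Array form: drop the contents of these elements only when they are filtered.
  :remove_contents => %w[script style],

  # Character encoding of the HTML output (the default is 'utf-8').
  :output_encoding => 'utf-8',

  :transformers => [ALLOW_ABBR]
}.freeze
```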

Requirements:

  • Ruby 1.9.2 or higher
  • Nokogiri 1.4.4 or higher

