Diffstat (limited to 'lib/htmlpurifier/docs/enduser-overview.txt')
-rw-r--r-- lib/htmlpurifier/docs/enduser-overview.txt | 59
1 files changed, 0 insertions, 59 deletions
diff --git a/lib/htmlpurifier/docs/enduser-overview.txt b/lib/htmlpurifier/docs/enduser-overview.txt
deleted file mode 100644
index fe7f8705d..000000000
--- a/lib/htmlpurifier/docs/enduser-overview.txt
+++ /dev/null
@@ -1,59 +0,0 @@
-
-HTML Purifier
- by Edward Z. Yang
-
-There are a number of ad hoc HTML filtering solutions out there on the web
-(examples include HTML_Safe, kses and SafeHtmlChecker.class.php) that
-claim to filter HTML properly, preventing malicious JavaScript and
-layout-breaking HTML from getting through the parser. None of them, however,
-demonstrates a thorough knowledge of either the DTD that defines HTML
-or the caveats of HTML that cannot be expressed by a DTD. Configurable
-filters (such as kses or PHP's built-in strip_tags() function) have trouble
-validating the contents of attributes and can be subject to security attacks
-due to poor configuration. Other filters take the naive approach of
-blacklisting known threats and tags, failing to account for the introduction
-of new technologies, new tags, new attributes or quirky browser behavior.
-
-HTML Purifier, however, takes a different approach, one that doesn't use
-specification-ignorant regexes or narrow blacklists. HTML Purifier
-decomposes the whole document into tokens and rigorously processes them by
-removing non-whitelisted elements, transforming bad-practice tags like <font>
-into <span>, properly checking the nesting of tags and their children, and
-validating all attributes according to their RFCs.
-
-To my knowledge, there is nothing like this on the web yet. Not even MediaWiki,
-which allows an amazingly diverse mix of HTML and wikitext in its documents,
-gets all the nesting quirks right. Existing solutions hope that no JavaScript
-will slip through, but either do not attempt to ensure that the resulting
-output is valid XHTML or send the HTML through a draconian XML parser (and yet
-still get the nesting wrong: SafeHtmlChecker.class.php does not prevent <a>
-tags from being nested within each other).
-
-This document is no longer a detailed description of how HTML Purifier works,
-as those descriptions have been moved to the appropriate code. The first
-draft was drawn up after two rough code sketches and the implementation of a
-forgiving lexer. You may also be interested in the unit tests located in the
-tests/ folder, which provide a living document on how exactly the filter deals
-with malformed input.
-
-In summary (see corresponding classes for more details):
-
-1. Parse document into an array of tag and text tokens (Lexer)
-2. Remove all elements not on the whitelist and transform certain other elements
- into acceptable forms (e.g. <font>)
-3. Make the document well-formed while helpfully taking into account certain quirks,
- such as the fact that <p> tags traditionally are closed by other block-level
- elements.
-4. Run through all nodes and check children for proper order (especially
- important for tables).
-5. Validate attributes according to more restrictive definitions based on the
- RFCs.
-6. Translate back into a string. (Generator)
-
-HTML Purifier is best suited for documents that require a rich array of
-HTML tags. Things like blog comments are, in all likelihood, most appropriately
-written in an extremely restrictive set of markup that doesn't require
-all this functionality (or not written in HTML at all), although this may
-change in the future with the addition of levels of filtering.
-
- vim: et sw=4 sts=4
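
Note on the removed overview: the six-step summary it contained can be
illustrated with a rough sketch. The code below is not HTML Purifier's
implementation (which is written in PHP); it is a minimal Python illustration
using only the standard library. The names ALLOWED, TRANSFORMS, SAFE_SCHEMES,
Tokenizer, filter_tokens, generate and purify_sketch are all hypothetical, and
the policy tables are assumptions chosen for the example. It covers steps 1, 2,
6 and a very small part of step 5; the well-formedness and child-order checks
(steps 3 and 4) are deliberately left out.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy for illustration only; not HTML Purifier's configuration.
ALLOWED = {"p": set(), "b": set(), "a": {"href"}, "span": set()}
TRANSFORMS = {"font": "span"}           # step 2: rewrite rather than drop
SAFE_SCHEMES = ("http://", "https://")  # assumed URI policy for href


class Tokenizer(HTMLParser):
    """Step 1: turn markup into a flat list of tag and text tokens."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag, {}))

    def handle_data(self, data):
        self.tokens.append(("text", data, {}))


def filter_tokens(tokens):
    """Steps 2 and 5 (partially): transforms, whitelist, attribute checks."""
    for kind, value, attrs in tokens:
        if kind == "text":
            yield kind, value, attrs
            continue
        tag = TRANSFORMS.get(value, value)
        if tag not in ALLOWED:
            continue  # drop the tag itself; surrounding text is kept
        kept = {}
        if kind == "start":
            for name, val in attrs.items():
                val = val or ""  # valueless attributes arrive as None
                if name not in ALLOWED[tag]:
                    continue
                if name == "href" and not val.lower().startswith(SAFE_SCHEMES):
                    continue
                kept[name] = val
        yield kind, tag, kept


def generate(tokens):
    """Step 6: serialise the surviving tokens back into a string."""
    out = []
    for kind, value, attrs in tokens:
        if kind == "text":
            out.append(escape(value, quote=False))
        elif kind == "start":
            rendered = "".join(f' {k}="{escape(v)}"' for k, v in attrs.items())
            out.append(f"<{value}{rendered}>")
        else:
            out.append(f"</{value}>")
    return "".join(out)


def purify_sketch(html):
    parser = Tokenizer()
    parser.feed(html)
    parser.close()
    return generate(filter_tokens(parser.tokens))


if __name__ == "__main__":
    dirty = ('<font color="red">Hi</font> '
             '<a href="javascript:alert(1)" onclick="x()">link</a><blink>!</blink>')
    print(purify_sketch(dirty))  # prints: <span>Hi</span> <a>link</a>!
```

Because nesting is never checked here, this sketch would still let nested <a>
elements through, which is exactly the class of problem the overview says a
real filter has to handle.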