Diffstat (limited to 'lib/htmlpurifier/docs/ref-html-modularization.txt')
-rw-r--r-- | lib/htmlpurifier/docs/ref-html-modularization.txt | 166 |
1 files changed, 0 insertions, 166 deletions
diff --git a/lib/htmlpurifier/docs/ref-html-modularization.txt b/lib/htmlpurifier/docs/ref-html-modularization.txt
deleted file mode 100644
index d26d30ada..000000000
--- a/lib/htmlpurifier/docs/ref-html-modularization.txt
+++ /dev/null
@@ -1,166 +0,0 @@
-
-The Modularization of HTMLDefinition in HTML Purifier
-
-WARNING: This document was drafted before the implementation of this
-    system, and some implementation details may have evolved over time.
-
-HTML Purifier uses the modularization of XHTML
-<http://www.w3.org/TR/xhtml-modularization/> to organize the internals
-of HTMLDefinition into a more manageable and extensible fashion. Rather
-than have one super-object, HTMLDefinition is split into HTMLModules,
-each of which is responsible for defining elements, their attributes,
-and other properties (for more in-depth coverage, see
-/library/HTMLPurifier/HTMLModule.php's docblock comments). These modules
-are managed by HTMLModuleManager.
-
-Modules that we don't support but could support are:
-
-    * 5.6. Table Modules
-          o 5.6.1. Basic Tables Module [?]
-    * 5.8. Client-side Image Map Module [?]
-    * 5.9. Server-side Image Map Module [?]
-    * 5.12. Target Module [?]
-    * 5.21. Name Identification Module [deprecated]
-
-These modules would be implemented as "unsafe":
-
-    * 5.2. Core Modules
-          o 5.2.1. Structure Module
-    * 5.3. Applet Module
-    * 5.5. Forms Modules
-          o 5.5.1. Basic Forms Module
-          o 5.5.2. Forms Module
-    * 5.10. Object Module
-    * 5.11. Frames Module
-    * 5.13. Iframe Module
-    * 5.14. Intrinsic Events Module
-    * 5.15. Metainformation Module
-    * 5.16. Scripting Module
-    * 5.17. Style Sheet Module
-    * 5.19. Link Module
-    * 5.20. Base Module
-
-We will not be using W3C's XML Schemas or DTDs directly due to the lack
-of robust tools for handling them (the main problem is that all the
-current parsers are usually PHP 5 only and solely-validating, not
-correcting).
-
-This system may be generalized and ported over for CSS.
-
-== General Use-Case ==
-
-The outwards API of HTMLDefinition has been largely preserved, not
-only for backwards-compatibility but also by design. In addition,
-HTMLDefinition can be retrieved "raw", in which it loads a structure
-that closely resembles the modules of XHTML 1.1. This structure is very
-dynamic, making it easy to make cascading changes to global content
-sets or remove elements in bulk.
-
-However, once HTML Purifier needs the actual definition, it retrieves
-a finalized version of HTMLDefinition. The finalized definition involves
-processing the modules into a form that is optimized for multiple
-calls. This final version is immutable and, even if editable, would
-be extremely hard to change.
-
-So, some code taking advantage of the XHTML modularization may look
-like this:
-
-<?php
-    $config = HTMLPurifier_Config::createDefault();
-    $def =& $config->getHTMLDefinition(true); // reference to raw
-    $def->addElement('marquee', 'Block', 'Flow', 'Common');
-    $purifier = new HTMLPurifier($config);
-    $purifier->purify($html); // now the definition is finalized
-?>
-
-== Inclusions ==
-
-One of the nice features of HTMLDefinition is that piggy-backing off
-of global attribute and content sets is extremely easy to do.
-
-=== Attributes ===
-
-HTMLModule->elements[$element]->attr stores attribute information for the
-specific attributes of $element. This is quite close to the final
-API that HTML Purifier interfaces with, but there's an important
-extra feature: attr may also contain an array with a member index zero.
-
-<?php
-    HTMLModule->elements[$element]->attr[0] = array('AttrSet');
-?>
-
-Rather than map the attribute key 0 to an array (which should be
-an AttrDef), it defines a number of attribute collections that should
-be merged into this element's attribute array.
-
-Furthermore, the value of an attribute key, attribute value pair need
-not be a fully fledged AttrDef object. It can also be a string, which
-signifies an AttrDef that is looked up from a centralized registry
-AttrTypes. This allows more concise attribute definitions that look
-more like W3C's declarations, as well as offering a centralized point
-for modifying the behavior of one attribute type. And, of course, the
-old method of manually instantiating an AttrDef still works.
-
-=== Attribute Collections ===
-
-Attribute collections are stored and processed in the AttrCollections
-object, which is responsible for performing the inclusions signified
-by the 0 index. These attribute collections, too, are mutable, by
-using HTMLModule->attr_collections. You may add new attributes
-to a collection or define an entirely new collection for your module's
-use. Inclusions can also be cumulative.
-
-Attribute collections allow us to get rid of so-called "global attributes"
-(which actually aren't so global).
-
-=== Content Models and ChildDef ===
-
-An implementation of the above-mentioned attributes and attribute
-collections was applied to the ChildDef system. HTML Purifier uses
-a proprietary system called ChildDef for performance and flexibility
-reasons, but this does not line up very well with W3C's notion of
-regexps for defining the allowed children of an element.
-
-HTMLPurifier->elements[$element]->content_model and
-HTMLPurifier->elements[$element]->content_model_type store information
-about the final ChildDef that will be stored in
-HTMLPurifier->elements[$element]->child (we use a different variable
-because the two forms are sufficiently different).
-
-$content_model is an abstract, string representation of the internal
-state of ChildDef, while $content_model_type is a string identifier
-of which ChildDef subclass to instantiate. $content_model is processed
-by substituting all content set identifiers (capitalized element names)
-with their contents. It is then parsed and passed into the appropriate
-ChildDef class, as defined by the ContentSets->getChildDef() or the
-custom fallback HTMLModule->getChildDef() for custom child definitions
-not in the core.
-
-You'll need to use these facilities if you plan on referencing a content
-set like "Inline" or "Block", and using them is recommended even if you're
-not due to their conciseness.
-
-A few notes on $content_model: its structure can be as complicated
-as you want, but the pipe symbol (|) is reserved for defining possible
-choices, due to the content sets implementation. For example, a content
-model that looks like:
-
-"Inline -> Block -> a"
-
-...when the Inline content set is defined as "span | b" and the Block
-content set is defined as "div | blockquote", will expand into:
-
-"span | b -> div | blockquote -> a"
-
-The custom HTMLModule->getChildDef() function will need to be able to
-then feed this information to ChildDef in a usable manner.
-
-=== Content Sets ===
-
-Content sets can be altered using HTMLModule->content_sets, an associative
-array of content set names to content set contents. If the content set
-already exists, your values are appended on to it (great for, say,
-registering the font tag as an inline element), otherwise it is
-created. They are substituted into content_model.
-
-    vim: et sw=4 sts=4
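
The content set substitution and append behaviour described in the deleted document can be illustrated with a small standalone sketch. This is not HTML Purifier's code: the function names mergeContentSets() and expandContentModel(), the " | " join used when appending, and the regular expression for spotting capitalized identifiers are all assumptions made for illustration. The sample values come from the document's own "Inline -> Block -> a" example.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): merge one module's
// content sets into an accumulated map. Existing sets are appended to,
// new sets are created, mirroring the "Content Sets" description.
function mergeContentSets(array $existing, array $additions): array
{
    foreach ($additions as $name => $members) {
        if (isset($existing[$name]) && $existing[$name] !== '') {
            // Assumption: members are joined with the reserved pipe symbol.
            $existing[$name] .= ' | ' . $members;
        } else {
            $existing[$name] = $members;
        }
    }
    return $existing;
}

// Hypothetical helper: replace every capitalized identifier that names a
// known content set with that set's contents. Unknown identifiers and
// lowercase element names are left untouched.
function expandContentModel(string $model, array $contentSets): string
{
    return preg_replace_callback(
        '/\b[A-Z][A-Za-z]*\b/',
        function (array $match) use ($contentSets): string {
            return $contentSets[$match[0]] ?? $match[0];
        },
        $model
    );
}

// Values taken from the document's example.
$sets = array('Inline' => 'span | b', 'Block' => 'div | blockquote');
echo expandContentModel('Inline -> Block -> a', $sets), "\n";

// Registering the font tag as an inline element appends to the existing set.
$sets = mergeContentSets($sets, array('Inline' => 'font'));
echo expandContentModel('Inline', $sets), "\n";
```

With the sets above, the first call reproduces the expansion quoted in the document. The real library hands the expanded string to a ChildDef class afterwards; that step is omitted here because its interface is not described in enough detail in the text.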