author | friendica <info@friendica.com> | 2012-07-18 03:59:10 -0700 |
---|---|---|
committer | friendica <info@friendica.com> | 2012-07-18 03:59:10 -0700 |
commit | 22cf19e174bcee88b44968f2773d1bad2da2b54d (patch) | |
tree | f4e01db6f73754418438b020c2327e18c256653c /lib/htmlpurifier/docs/enduser-customize.html | |
parent | 7a40f4354b32809af3d0cfd6e3af0eda02ab0e0a (diff) | |
bad sync with github windows client
Diffstat (limited to 'lib/htmlpurifier/docs/enduser-customize.html')
-rw-r--r-- | lib/htmlpurifier/docs/enduser-customize.html | 850 |
1 files changed, 0 insertions, 850 deletions
diff --git a/lib/htmlpurifier/docs/enduser-customize.html b/lib/htmlpurifier/docs/enduser-customize.html deleted file mode 100644 index 7e1ffa260..000000000 --- a/lib/htmlpurifier/docs/enduser-customize.html +++ /dev/null @@ -1,850 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" - "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> -<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head> -<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> -<meta name="description" content="Tutorial for customizing HTML Purifier's tag and attribute sets." /> -<link rel="stylesheet" type="text/css" href="style.css" /> - -<title>Customize - HTML Purifier</title> - -</head><body> - -<h1 class="subtitled">Customize!</h1> -<div class="subtitle">HTML Purifier is a Swiss-Army Knife</div> - -<div id="filing">Filed under End-User</div> -<div id="index">Return to the <a href="index.html">index</a>.</div> -<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div> - -<p> - HTML Purifier has this quirk where if you try to allow certain elements or - attributes, HTML Purifier will tell you that it's not supported, and that - you should go to the forums to find out how to implement it. Well, this - document is how to implement elements and attributes which HTML Purifier - doesn't support out of the box. -</p> - -<h2>Is it necessary?</h2> - -<p> - Before we even write any code, it is paramount to consider whether or - not the code we're writing is necessary or not. HTML Purifier, by default, - contains a large set of elements and attributes: large enough so that - <em>any</em> element or attribute in XHTML 1.0 or 1.1 (and its HTML variants) - that can be safely used by the general public is implemented. -</p> - -<p> - So what needs to be implemented? (Feel free to skip this section if - you know what you want). 
-</p> - -<h3>XHTML 1.0</h3> - -<p> - All of the modules listed below are based off of the - <a href="http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/abstract_modules.html#sec_5.2.">modularization of - XHTML</a>, which, while technically for XHTML 1.1, is quite a useful - resource. -</p> - -<ul> - <li>Structure</li> - <li>Frames</li> - <li>Applets (deprecated)</li> - <li>Forms</li> - <li>Image maps</li> - <li>Objects</li> - <li>Frames</li> - <li>Events</li> - <li>Meta-information</li> - <li>Style sheets</li> - <li>Link (not hypertext)</li> - <li>Base</li> - <li>Name</li> -</ul> - -<p> - If you don't recognize it, you probably don't need it. But the curious - can look all of these modules up in the above-mentioned document. Note - that inline scripting comes packaged with HTML Purifier (more on this - later). -</p> - -<h3>XHTML 1.1</h3> - -<p> - As of HTMLPurifier 2.1.0, we have implemented the - <a href="http://www.w3.org/TR/2001/REC-ruby-20010531/">Ruby module</a>, - which defines a set of tags - for publishing short annotations for text, used mostly in Japanese - and Chinese school texts, but applicable for positioning any text (not - limited to translations) above or below other corresponding text. -</p> - -<h3>HTML 5</h3> - -<p> - <a href="http://www.whatwg.org/specs/web-apps/current-work/">HTML 5</a> - is a fork of HTML 4.01 by WHATWG, who believed that XHTML 2.0 was headed - in the wrong direction. It too is a working draft, and may change - drastically before publication, but it should be noted that the - <code>canvas</code> tag has been implemented by many browser vendors. -</p> - -<h3>Proprietary</h3> - -<p> - There are a number of proprietary tags still in the wild. Many of them - have been documented in <a href="ref-proprietary-tags.txt">ref-proprietary-tags.txt</a>, - but there is currently no implementation for any of them. 
-</p> - -<h3>Extensions</h3> - -<p> - There are also a number of other XML languages out there that can - be embedded in HTML documents: two of the most popular are MathML and - SVG, and I frequently get requests to implement these. But they are - expansive, comprehensive specifications, and it would take far too long - to implement them <em>correctly</em> (most systems I've seen go as far - as whitelisting tags and no further; come on, what about nesting!) -</p> - -<p> - Word of warning: HTML Purifier is currently <em>not</em> namespace - aware. -</p> - -<h2>Giving back</h2> - -<p> - As you may imagine from the details above (don't be abashed if you didn't - read it all: a glance over would have done), there's quite a bit that - HTML Purifier doesn't implement. Recent architectural changes have - allowed HTML Purifier to implement elements and attributes that are not - safe! Don't worry, they won't be activated unless you set %HTML.Trusted - to true, but they certainly help out users who need to put, say, forms - on their page and don't want to go through the trouble of reading this - and implementing it themself. -</p> - -<p> - So any of the above that you implement for your own application could - help out some other poor sap on the other side of the globe. Help us - out, and send back code so that it can be hammered into a module and - released with the core. Any code would be greatly appreciated! -</p> - -<h2>And now...</h2> - -<p> - Enough philosophical talk, time for some code: -</p> - -<pre>$config = HTMLPurifier_Config::createDefault(); -$config->set('HTML.DefinitionID', 'enduser-customize.html tutorial'); -$config->set('HTML.DefinitionRev', 1); -if ($def = $config->maybeGetRawHTMLDefinition()) { - // our code will go here -}</pre> - -<p> - Assuming that HTML Purifier has already been properly loaded (hint: - include <code>HTMLPurifier.auto.php</code>), this code will set up - the environment that you need to start customizing the HTML definition. 
- What's going on? -</p> - -<ul> - <li> - The first three lines are regular configuration code: - <ul> - <li> - %HTML.DefinitionID is set to a unique identifier for your - custom HTML definition. This prevents it from clobbering - other custom definitions on the same installation. - </li> - <li> - %HTML.DefinitionRev is a revision integer of your HTML - definition. Because HTML definitions are cached, you'll need - to increment this whenever you make a change in order to flush - the cache. - </li> - </ul> - </li> - <li> - The fourth line retrieves a raw <code>HTMLPurifier_HTMLDefinition</code> - object that we will be tweaking. Interestingly enough, we have - placed it in an if block: this is because - <code>maybeGetRawHTMLDefinition</code>, as its name suggests, may - return a NULL, in which case we should skip doing any - initialization. This, in fact, will correspond to when our fully - customized object is already in the cache. - </li> -</ul> - -<h2>Turn off caching</h2> - -<p> - To make development easier, we're going to temporarily turn off - definition caching: -</p> - -<pre>$config = HTMLPurifier_Config::createDefault(); -$config->set('HTML.DefinitionID', 'enduser-customize.html tutorial'); -$config->set('HTML.DefinitionRev', 1); -<strong>$config->set('Cache.DefinitionImpl', null); // TODO: remove this later!</strong> -$def = $config->getHTMLDefinition(true);</pre> - -<p> - A few things should be mentioned about the caching mechanism before - we move on. For performance reasons, HTML Purifier caches generated - <code>HTMLPurifier_Definition</code> objects in serialized files - stored (by default) in <code>library/HTMLPurifier/DefinitionCache/Serializer</code>. - A lot of processing is done in order to create these objects, so it - makes little sense to repeat the same processing over and over again - whenever HTML Purifier is called. 
-</p> - -<p> - In order to identify a cache entry, HTML Purifier uses three variables: - the library's version number, the value of %HTML.DefinitionRev and - a serial of relevant configuration. Whenever any of these changes, - a new HTML definition is generated. Notice that there is no way - for the definition object to track changes to customizations: here, it - is up to you to supply appropriate information to DefinitionID and - DefinitionRev. -</p> - -<h2 id="addAttribute">Add an attribute</h2> - -<p> - For this example, we're going to implement the <code>target</code> attribute found - on <code>a</code> elements. To implement an attribute, we have to - ask a few questions: -</p> - -<ol> - <li>What element is it found on?</li> - <li>What is its name?</li> - <li>Is it required or optional?</li> - <li>What are valid values for it?</li> -</ol> - -<p> - The first three are easy: the element is <code>a</code>, the attribute - is <code>target</code>, and it is not a required attribute. (If it - was required, we'd need to append an asterisk to the attribute name, - you'll see an example of this in the addElement() example). -</p> - -<p> - The last question is a little trickier. - Lets allow the special values: _blank, _self, _target and _top. - The form of this is called an <strong>enumeration</strong>, a list of - valid values, although only one can be used at a time. To translate - this into code form, we write: -</p> - -<pre>$config = HTMLPurifier_Config::createDefault(); -$config->set('HTML.DefinitionID', 'enduser-customize.html tutorial'); -$config->set('HTML.DefinitionRev', 1); -$config->set('Cache.DefinitionImpl', null); // remove this later! -$def = $config->getHTMLDefinition(true); -<strong>$def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top');</strong></pre> - -<p> - The <code>Enum#_blank,_self,_target,_top</code> does all the magic. 
- The string is split into two parts, separated by a hash mark (#): -</p> - -<ol> - <li>The first part is the name of what we call an <code>AttrDef</code></li> - <li>The second part is the parameter of the above-mentioned <code>AttrDef</code></li> -</ol> - -<p> - If that sounds vague and generic, it's because it is! HTML Purifier defines - an assortment of different attribute types one can use, and each of these - has their own specialized parameter format. Here are some of the more useful - ones: -</p> - -<table class="table"> - <thead> - <tr> - <th>Type</th> - <th>Format</th> - <th>Description</th> - </tr> - </thead> - <tbody> - <tr> - <th>Enum</th> - <td><em>[s:]</em>value1,value2,...</td> - <td> - Attribute with a number of valid values, one of which may be used. When - s: is present, the enumeration is case sensitive. - </td> - </tr> - <tr> - <th>Bool</th> - <td>attribute_name</td> - <td> - Boolean attribute, with only one valid value: the name - of the attribute. - </td> - </tr> - <tr> - <th>CDATA</th> - <td></td> - <td> - Attribute of arbitrary text. Can also be referred to as <strong>Text</strong> - (the specification makes a semantic distinction between the two). 
- </td> - </tr> - <tr> - <th>ID</th> - <td></td> - <td> - Attribute that specifies a unique ID - </td> - </tr> - <tr> - <th>Pixels</th> - <td></td> - <td> - Attribute that specifies an integer pixel length - </td> - </tr> - <tr> - <th>Length</th> - <td></td> - <td> - Attribute that specifies a pixel or percentage length - </td> - </tr> - <tr> - <th>NMTOKENS</th> - <td></td> - <td> - Attribute that specifies a number of name tokens, example: the - <code>class</code> attribute - </td> - </tr> - <tr> - <th>URI</th> - <td></td> - <td> - Attribute that specifies a URI, example: the <code>href</code> - attribute - </td> - </tr> - <tr> - <th>Number</th> - <td></td> - <td> - Attribute that specifies an positive integer number - </td> - </tr> - </tbody> -</table> - -<p> - For a complete list, consult - <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/AttrTypes.php"><code>library/HTMLPurifier/AttrTypes.php</code></a>; - more information on attributes that accept parameters can be found on their - respective includes in - <a href="http://repo.or.cz/w/htmlpurifier.git?a=tree;hb=HEAD;f=library/HTMLPurifier/AttrDef"><code>library/HTMLPurifier/AttrDef</code></a>. -</p> - -<p> - Sometimes, the restrictive list in AttrTypes just doesn't cut it. Don't - sweat: you can also use a fully instantiated object as the value. The - equivalent, verbose form of the above example is: -</p> - -<pre>$config = HTMLPurifier_Config::createDefault(); -$config->set('HTML.DefinitionID', 'enduser-customize.html tutorial'); -$config->set('HTML.DefinitionRev', 1); -$config->set('Cache.DefinitionImpl', null); // remove this later! -$def = $config->getHTMLDefinition(true); -<strong>$def->addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum( - array('_blank','_self','_target','_top') -));</strong></pre> - -<p> - Trust me, you'll learn to love the shorthand. 
-</p> - -<h2>Add an element</h2> - -<p> - Adding attributes is really small-fry stuff, though, and it was possible - to add them (albeit a bit more wordy) prior to 2.0. The real gem of - the Advanced API is adding elements. There are five questions to - ask when adding a new element: -</p> - -<ol> - <li>What is the element's name?</li> - <li>What content set does this element belong to?</li> - <li>What are the allowed children of this element?</li> - <li>What attributes does the element allow that are general?</li> - <li>What attributes does the element allow that are specific to this element?</li> -</ol> - -<p> - It's a mouthful, and you'll be slightly lost if your not familiar with - the HTML specification, so let's explain them step by step. -</p> - -<h3>Content set</h3> - -<p> - The HTML specification defines two major content sets: Inline - and Block. Each of these - content sets contain a list of elements: Inline contains things like - <code>span</code> and <code>b</code> while Block contains things like - <code>div</code> and <code>blockquote</code>. -</p> - -<p> - These content sets amount to a macro mechanism for HTML definition. Most - elements in HTML are organized into one of these two sets, and most - elements in HTML allow elements from one of these sets. If we had - to write each element verbatim into each other element's allowed - children, we would have ridiculously large lists; instead we use - content sets to compactify the declaration. 
-</p> - -<p> - Practically speaking, there are several useful values you can use here: -</p> - -<table class="table"> - <thead> - <tr> - <th>Content set</th> - <th>Description</th> - </tr> - </thead> - <tbody> - <tr> - <th>Inline</th> - <td>Character level elements, text</td> - </tr> - <tr> - <th>Block</th> - <td>Block-like elements, like paragraphs and lists</td> - </tr> - <tr> - <th><em>false</em></th> - <td> - Any element that doesn't fit into the mold, for example <code>li</code> - or <code>tr</code> - </td> - </tr> - </tbody> -</table> - -<p> - By specifying a valid value here, all other elements that use that - content set will also allow your element, without you having to do - anything. If you specify <em>false</em>, you'll have to register - your element manually. -</p> - -<h3>Allowed children</h3> - -<p> - Allowed children defines the elements that this element can contain. - The allowed values may range from none to a complex regexp depending on - your element. -</p> - -<p> - If you've ever taken a look at the HTML DTD's before, you may have - noticed declarations like this: -</p> - -<pre><!ELEMENT LI - O (%flow;)* -- list item --></pre> - -<p> - The <code>(%flow;)*</code> indicates the allowed children of the - <code>li</code> tag: <code>li</code> allows any number of flow - elements as its children. (The <code>- O</code> allows the closing tag to be - omitted, though in XML this is not allowed.) In HTML Purifier, - we'd write it like <code>Flow</code> (here's where the content sets - we were discussing earlier come into play). 
There are three shorthand - content models you can specify: -</p> - -<table class="table"> - <thead> - <tr> - <th>Content model</th> - <th>Description</th> - </tr> - </thead> - <tbody> - <tr> - <th>Empty</th> - <td>No children allowed, like <code>br</code> or <code>hr</code></td> - </tr> - <tr> - <th>Inline</th> - <td>Any number of inline elements and text, like <code>span</code></td> - </tr> - <tr> - <th>Flow</th> - <td>Any number of inline elements, block elements and text, like <code>div</code></td> - </tr> - </tbody> -</table> - -<p> - This covers 90% of all the cases out there, but what about elements that - break the mold like <code>ul</code>? This guy requires at least one - child, and the only valid children for it are <code>li</code>. The - content model is: <code>Required: li</code>. There are two parts: the - first type determines what <code>ChildDef</code> will be used to validate - content models. The most common values are: -</p> - -<table class="table"> - <thead> - <tr> - <th>Type</th> - <th>Description</th> - </tr> - </thead> - <tbody> - <tr> - <th>Required</th> - <td>Children must be one or more of the valid elements</td> - </tr> - <tr> - <th>Optional</th> - <td>Children can be any number of the valid elements</td> - </tr> - <tr> - <th>Custom</th> - <td>Children must follow the DTD-style regex</td> - </tr> - </tbody> -</table> - -<p> - You can also implement your own <code>ChildDef</code>: this was done - for a few special cases in HTML Purifier such as <code>Chameleon</code> - (for <code>ins</code> and <code>del</code>), <code>StrictBlockquote</code> - and <code>Table</code>. -</p> - -<p> - The second part specifies either valid elements or a regular expression. - Valid elements are separated with horizontal bars (|), i.e. - "<code>a | b | c</code>". Use #PCDATA to represent plain text. 
- Regular expressions are based off of DTD's style: -</p> - -<ul> - <li>Parentheses () are used for grouping</li> - <li>Commas (,) separate elements that should come one after another</li> - <li>Horizontal bars (|) indicate one or the other elements should be used</li> - <li>Plus signs (+) are used for a one or more match</li> - <li>Asterisks (*) are used for a zero or more match</li> - <li>Question marks (?) are used for a zero or one match</li> -</ul> - -<p> - For example, "<code>a, b?, (c | d), e+, f*</code>" means "In this order, - one <code>a</code> element, at most one <code>b</code> element, - one <code>c</code> or <code>d</code> element (but not both), one or more - <code>e</code> elements, and any number of <code>f</code> elements." - Regex veterans should be able to jump right in, and those not so savvy - can always copy-paste W3C's content model definitions into HTML Purifier - and hope for the best. -</p> - -<p> - A word of warning: while the regex format is extremely flexible on - the developer's side, it is - quite unforgiving on the user's side. If the user input does not <em>exactly</em> - match the specification, the entire contents of the element will - be nuked. This is why there is are specific content model types like - Optional and Required: while they could be implemented as <code>Custom: - (valid | elements)*</code>, the custom classes contain special recovery - measures that make sure as much of the user's original content gets - through. HTML Purifier's core, as a rule, does not use Custom. -</p> - -<p> - One final note: you can also use Content Sets inside your valid elements - lists or regular expressions. 
In fact, the three shorthand content models - mentioned above are just that: abbreviations: -</p> - -<table class="table"> - <thead> - <tr> - <th>Content model</th> - <th>Implementation</th> - </tr> - </thead> - <tbody> - <tr> - <th>Inline</th> - <td>Optional: Inline | #PCDATA</td> - </tr> - <tr> - <th>Flow</th> - <td>Optional: Flow | #PCDATA</td> - </tr> - </tbody> -</table> - -<p> - When the definition is compiled, Inline will be replaced with a - horizontal-bar separated list of inline elements. Also, notice that - it does not contain text: you have to specify that yourself. -</p> - -<h3>Common attributes</h3> - -<p> - Congratulations: you have just gotten over the proverbial hump (Allowed - children). Common attributes is much simpler, and boils down to - one question: does your element have the <code>id</code>, <code>style</code>, - <code>class</code>, <code>title</code> and <code>lang</code> attributes? - If so, you'll want to specify the <code>Common</code> attribute collection, - which contains these five attributes that are found on almost every - HTML element in the specification. -</p> - -<p> - There are a few more collections, but they're really edge cases: -</p> - -<table class="table"> - <thead> - <tr> - <th>Collection</th> - <th>Attributes</th> - </tr> - </thead> - <tbody> - <tr> - <th>I18N</th> - <td><code>lang</code>, possibly <code>xml:lang</code></td> - </tr> - <tr> - <th>Core</th> - <td><code>style</code>, <code>class</code>, <code>id</code> and <code>title</code></td> - </tr> - </tbody> -</table> - -<p> - Common is a combination of the above-mentioned collections. -</p> - -<p class="aside"> - Readers familiar with the modularization may have noticed that the Core - attribute collection differs from that specified by the <a - href="http://www.w3.org/TR/xhtml-modularization/abstract_modules.html#s_commonatts">abstract - modules of the XHTML Modularization 1.1</a>. 
We believe this section - to be in error, as <code>br</code> permits the use of the <code>style</code> - attribute even though it uses the <code>Core</code> collection, and - the DTD and XML Schemas supplied by W3C support our interpretation. -</p> - -<h3>Attributes</h3> - -<p> - If you didn't read the <a href="#addAttribute">earlier section on - adding attributes</a>, read it now. The last parameter is simply - an array of attribute names to attribute implementations, in the exact - same format as <code>addAttribute()</code>. -</p> - -<h3>Putting it all together</h3> - -<p> - We're going to implement <code>form</code>. Before we embark, lets - grab a reference implementation from over at the - <a href="http://www.w3.org/TR/html4/sgml/loosedtd.html">transitional DTD</a>: -</p> - -<pre><!ELEMENT FORM - - (%flow;)* -(FORM) -- interactive form --> -<!ATTLIST FORM - %attrs; -- %coreattrs, %i18n, %events -- - action %URI; #REQUIRED -- server-side form handler -- - method (GET|POST) GET -- HTTP method used to submit the form-- - enctype %ContentType; "application/x-www-form-urlencoded" - accept %ContentTypes; #IMPLIED -- list of MIME types for file upload -- - name CDATA #IMPLIED -- name of form for scripting -- - onsubmit %Script; #IMPLIED -- the form was submitted -- - onreset %Script; #IMPLIED -- the form was reset -- - target %FrameTarget; #IMPLIED -- render in this frame -- - accept-charset %Charsets; #IMPLIED -- list of supported charsets -- - ></pre> - -<p> - Juicy! With just this, we can answer four of our five questions: -</p> - -<ol> - <li>What is the element's name? <strong>form</strong></li> - <li>What content set does this element belong to? <strong>Block</strong> - (this needs a little sleuthing, I find the easiest way is to search - the DTD for <code>FORM</code> and determine which set it is in.)</li> - <li>What are the allowed children of this element? 
<strong>One - or more flow elements, but no nested <code>form</code>s</strong></li> - <li>What attributes does the element allow that are general? <strong>Common</strong></li> - <li>What attributes does the element allow that are specific to this element? <strong>A whole bunch, see ATTLIST; - we're going to do the vital ones: <code>action</code>, <code>method</code> and <code>name</code></strong></li> -</ol> - -<p> - Time for some code: -</p> - -<pre>$config = HTMLPurifier_Config::createDefault(); -$config->set('HTML.DefinitionID', 'enduser-customize.html tutorial'); -$config->set('HTML.DefinitionRev', 1); -$config->set('Cache.DefinitionImpl', null); // remove this later! -$def = $config->getHTMLDefinition(true); -$def->addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum( - array('_blank','_self','_target','_top') -)); -<strong>$form = $def->addElement( - 'form', // name - 'Block', // content set - 'Flow', // allowed children - 'Common', // attribute collection - array( // attributes - 'action*' => 'URI', - 'method' => 'Enum#get|post', - 'name' => 'ID' - ) -); -$form->excludes = array('form' => true);</strong></pre> - -<p> - Each of the parameters corresponds to one of the questions we asked. - Notice that we added an asterisk to the end of the <code>action</code> - attribute to indicate that it is required. If someone specifies a - <code>form</code> without that attribute, the tag will be axed. - Also, the extra line at the end is a special extra declaration that - prevents forms from being nested within each other. -</p> - -<p> - And that's all there is to it! Implementing the rest of the form - module is left as an exercise to the user; to see more examples - check the <a href="http://repo.or.cz/w/htmlpurifier.git?a=tree;hb=HEAD;f=library/HTMLPurifier/HTMLModule"><code>library/HTMLPurifier/HTMLModule/</code></a> directory - in your local HTML Purifier installation. 
-</p> - -<h2>And beyond...</h2> - -<p> - Perceptive users may have realized that, to a certain extent, we - have simply re-implemented the facilities of XML Schema or the - Document Type Definition. What you are seeing here, however, is - not just an XML Schema or Document Type Definition: it is a fully - expressive method of specifying the definition of HTML that is - a portable superset of the capabilities of the two above-mentioned schema - languages. What makes HTMLDefinition so powerful is the fact that - if we don't have an implementation for a content model or an attribute - definition, you can supply it yourself by writing a PHP class. -</p> - -<p> - There are many facets of HTMLDefinition beyond the Advanced API I have - walked you through today. To find out more about these, you can - check out these source files: -</p> - -<ul> - <li><a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/HTMLModule.php"><code>library/HTMLPurifier/HTMLModule.php</code></a></li> - <li><a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ElementDef.php"><code>library/HTMLPurifier/ElementDef.php</code></a></li> -</ul> - -<h2 id="optimized">Notes for HTML Purifier 4.2.0 and earlier</h3> - -<p> - Previously, this tutorial gave some incorrect template code for - editing raw definitions, and that template code will now produce the - error <q>Due to a documentation error in previous version of HTML - Purifier...</q> Here is how to mechanically transform old-style - code into new-style code. -</p> - -<p> - First, identify all code that edits the raw definition object, and - put it together. Ensure none of this code must be run on every - request; if some sub-part needs to always be run, move it outside - this block. Here is an example below, with the raw definition - object code bolded. 
-</p> - -<pre>$config = HTMLPurifier_Config::createDefault(); -$config->set('HTML.DefinitionID', 'enduser-customize.html tutorial'); -$config->set('HTML.DefinitionRev', 1); -$def = $config->getHTMLDefinition(true); -<strong>$def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top');</strong> -$purifier = new HTMLPurifier($config);</pre> - -<p> - Next, replace the raw definition retrieval with a - maybeGetRawHTMLDefinition method call inside an if conditional, and - place the editing code inside that if block. -</p> - -<pre>$config = HTMLPurifier_Config::createDefault(); -$config->set('HTML.DefinitionID', 'enduser-customize.html tutorial'); -$config->set('HTML.DefinitionRev', 1); -<strong>if ($def = $config->maybeGetRawHTMLDefinition()) { - $def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top'); -}</strong> -$purifier = new HTMLPurifier($config);</pre> - -<p> - And you're done! Alternatively, if you're OK with not ever caching - your code, the following will still work and not emit warnings. -</p> - -<pre>$config = HTMLPurifier_Config::createDefault(); -$def = $config->getHTMLDefinition(true); -$def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top'); -$purifier = new HTMLPurifier($config);</pre> - -<p> - A slightly less efficient version of this was what was going on with - old versions of HTML Purifier. -</p> - -<p> - <em>Technical notes:</em> ajh pointed out on <a - href="http://htmlpurifier.org/phorum/read.php?5,5164,5169#msg-5169">in a forum topic</a> that - HTML Purifier appeared to be repeatedly writing to the cache even - when a cache entry already existed. Investigation lead to the - discovery of the following infelicity: caching of customized - definitions didn't actually work! The problem was that even though - a cache file would be written out at the end of the process, there - was no way for HTML Purifier to say, <q>Actually, I've already got a - copy of your work, no need to reconfigure your - customizations</q>. 
This required the API to change: placing - all of the customizations to the raw definition object in a - conditional which could be skipped. -</p> - -</body></html> - -<!-- vim: et sw=4 sts=4 ---> |