1 files changed, 850 insertions, 0 deletions
diff --git a/lib/htmlpurifier/docs/enduser-customize.html b/lib/htmlpurifier/docs/enduser-customize.html
new file mode 100644
index 000000000..7e1ffa260
--- /dev/null
+++ b/lib/htmlpurifier/docs/enduser-customize.html
@@ -0,0 +1,850 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
+<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
+<meta name="description" content="Tutorial for customizing HTML Purifier's tag and attribute sets." />
+<link rel="stylesheet" type="text/css" href="style.css" />
+
+<title>Customize - HTML Purifier</title>
+
+</head><body>
+
+<h1 class="subtitled">Customize!</h1>
+<div class="subtitle">HTML Purifier is a Swiss-Army Knife</div>
+
+<div id="filing">Filed under End-User</div>
+<div id="index">Return to the <a href="index.html">index</a>.</div>
+<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
+
+<p>
+  HTML Purifier has this quirk where if you try to allow certain elements or
+  attributes, HTML Purifier will tell you that it's not supported, and that
+  you should go to the forums to find out how to implement it. Well, this
+  document is how to implement elements and attributes which HTML Purifier
+  doesn't support out of the box.
+</p>
+
+<h2>Is it necessary?</h2>
+
+<p>
+  Before we even write any code, it is paramount to consider whether or
+  not the code we're writing is necessary or not. HTML Purifier, by default,
+  contains a large set of elements and attributes: large enough so that
+  <em>any</em> element or attribute in XHTML 1.0 or 1.1 (and its HTML variants)
+  that can be safely used by the general public is implemented.
+</p>
+
+<p>
+  So what needs to be implemented? (Feel free to skip this section if
+  you know what you want).
+</p>
+
+<h3>XHTML 1.0</h3>
+
+<p>
+  All of the modules listed below are based off of the
+  <a href="http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/abstract_modules.html#sec_5.2.">modularization of
+  XHTML</a>, which, while technically for XHTML 1.1, is quite a useful
+  resource.
+</p>
+
+<ul>
+  <li>Structure</li>
+  <li>Frames</li>
+  <li>Applets (deprecated)</li>
+  <li>Forms</li>
+  <li>Image maps</li>
+  <li>Objects</li>
+  <li>Frames</li>
+  <li>Events</li>
+  <li>Meta-information</li>
+  <li>Style sheets</li>
+  <li>Link (not hypertext)</li>
+  <li>Base</li>
+  <li>Name</li>
+</ul>
+
+<p>
+  If you don't recognize it, you probably don't need it. But the curious
+  can look all of these modules up in the above-mentioned document.  Note
+  that inline scripting comes packaged with HTML Purifier (more on this
+  later).
+</p>
+
+<h3>XHTML 1.1</h3>
+
+<p>
+  As of HTMLPurifier 2.1.0, we have implemented the
+  <a href="http://www.w3.org/TR/2001/REC-ruby-20010531/">Ruby module</a>,
+  which defines a set of tags
+  for publishing short annotations for text, used mostly in Japanese
+  and Chinese school texts, but applicable for positioning any text (not
+  limited to translations) above or below other corresponding text.
+</p>
+
+<h3>HTML 5</h3>
+
+<p>
+  <a href="http://www.whatwg.org/specs/web-apps/current-work/">HTML 5</a>
+  is a fork of HTML 4.01 by WHATWG, who believed that XHTML 2.0 was headed
+  in the wrong direction.  It too is a working draft, and may change
+  drastically before publication, but it should be noted that the
+  <code>canvas</code> tag has been implemented by many browser vendors.
+</p>
+
+<h3>Proprietary</h3>
+
+<p>
+  There are a number of proprietary tags still in the wild. Many of them
+  have been documented in <a href="ref-proprietary-tags.txt">ref-proprietary-tags.txt</a>,
+  but there is currently no implementation for any of them.
+</p>
+
+<h3>Extensions</h3>
+
+<p>
+  There are also a number of other XML languages out there that can
+  be embedded in HTML documents: two of the most popular are MathML and
+  SVG, and I frequently get requests to implement these.  But they are
+  expansive, comprehensive specifications, and it would take far too long
+  to implement them <em>correctly</em> (most systems I've seen go as far
+  as whitelisting tags and no further; come on, what about nesting!)
+</p>
+
+<p>
+  Word of warning: HTML Purifier is currently <em>not</em> namespace
+  aware.
+</p>
+
+<h2>Giving back</h2>
+
+<p>
+  As you may imagine from the details above (don't be abashed if you didn't
+  read it all: a glance over would have done), there's quite a bit that
+  HTML Purifier doesn't implement.  Recent architectural changes have
+  allowed HTML Purifier to implement elements and attributes that are not
+  safe!  Don't worry, they won't be activated unless you set %HTML.Trusted
+  to true, but they certainly help out users who need to put, say, forms
+  on their page and don't want to go through the trouble of reading this
+  and implementing it themself.
+</p>
+
+<p>
+  So any of the above that you implement for your own application could
+  help out some other poor sap on the other side of the globe.  Help us
+  out, and send back code so that it can be hammered into a module and
+  released with the core.  Any code would be greatly appreciated!
+</p>
+
+<h2>And now...</h2>
+
+<p>
+  Enough philosophical talk, time for some code:
+</p>
+
+<pre>$config = HTMLPurifier_Config::createDefault();
+$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
+$config-&gt;set('HTML.DefinitionRev', 1);
+if ($def = $config-&gt;maybeGetRawHTMLDefinition()) {
+    // our code will go here
+}</pre>
+
+<p>
+  Assuming that HTML Purifier has already been properly loaded (hint:
+  include <code>HTMLPurifier.auto.php</code>), this code will set up
+  the environment that you need to start customizing the HTML definition.
+  What's going on?
+</p>
+
+<ul>
+  <li>
+    The first three lines are regular configuration code:
+    <ul>
+      <li>
+        %HTML.DefinitionID is set to a unique identifier for your
+        custom HTML definition.  This prevents it from clobbering
+        other custom definitions on the same installation.
+      </li>
+      <li>
+        %HTML.DefinitionRev is a revision integer of your HTML
+        definition.  Because HTML definitions are cached, you'll need
+        to increment this whenever you make a change in order to flush
+        the cache.
+      </li>
+    </ul>
+  </li>
+  <li>
+    The fourth line retrieves a raw <code>HTMLPurifier_HTMLDefinition</code>
+    object that we will be tweaking.  Interestingly enough, we have
+    placed it in an if block: this is because
+    <code>maybeGetRawHTMLDefinition</code>, as its name suggests, may
+    return a NULL, in which case we should skip doing any
+    initialization.  This, in fact, will correspond to when our fully
+    customized object is already in the cache.
+  </li>
+</ul>
+
+<h2>Turn off caching</h2>
+
+<p>
+  To make development easier, we're going to temporarily turn off
+  definition caching:
+</p>
+
+<pre>$config = HTMLPurifier_Config::createDefault();
+$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
+$config-&gt;set('HTML.DefinitionRev', 1);
+<strong>$config-&gt;set('Cache.DefinitionImpl', null); // TODO: remove this later!</strong>
+$def = $config-&gt;getHTMLDefinition(true);</pre>
+
+<p>
+  A few things should be mentioned about the caching mechanism before
+  we move on.  For performance reasons, HTML Purifier caches generated
+  <code>HTMLPurifier_Definition</code> objects in serialized files
+  stored (by default) in <code>library/HTMLPurifier/DefinitionCache/Serializer</code>.
+  A lot of processing is done in order to create these objects, so it
+  makes little sense to repeat the same processing over and over again
+  whenever HTML Purifier is called.
+</p>
+
+<p>
+  In order to identify a cache entry, HTML Purifier uses three variables:
+  the library's version number, the value of %HTML.DefinitionRev and
+  a serial of relevant configuration.  Whenever any of these changes,
+  a new HTML definition is generated.  Notice that there is no way
+  for the definition object to track changes to customizations: here, it
+  is up to you to supply appropriate information to DefinitionID and
+  DefinitionRev.
+</p>
+
+<h2 id="addAttribute">Add an attribute</h2>
+
+<p>
+  For this example, we're going to implement the <code>target</code> attribute found
+  on <code>a</code> elements.  To implement an attribute, we have to
+  ask a few questions:
+</p>
+
+<ol>
+  <li>What element is it found on?</li>
+  <li>What is its name?</li>
+  <li>Is it required or optional?</li>
+  <li>What are valid values for it?</li>
+</ol>
+
+<p>
+  The first three are easy: the element is <code>a</code>, the attribute
+  is <code>target</code>, and it is not a required attribute. (If it
+  was required, we'd need to append an asterisk to the attribute name,
+  you'll see an example of this in the addElement() example).
+</p>
+
+<p>
+  The last question is a little trickier.
+  Lets allow the special values: _blank, _self, _target and _top.
+  The form of this is called an <strong>enumeration</strong>, a list of
+  valid values, although only one can be used at a time.  To translate
+  this into code form, we write:
+</p>
+
+<pre>$config = HTMLPurifier_Config::createDefault();
+$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
+$config-&gt;set('HTML.DefinitionRev', 1);
+$config-&gt;set('Cache.DefinitionImpl', null); // remove this later!
+$def = $config-&gt;getHTMLDefinition(true);
+<strong>$def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top');</strong></pre>
+
+<p>
+  The <code>Enum#_blank,_self,_target,_top</code> does all the magic.
+  The string is split into two parts, separated by a hash mark (#):
+</p>
+
+<ol>
+  <li>The first part is the name of what we call an <code>AttrDef</code></li>
+  <li>The second part is the parameter of the above-mentioned <code>AttrDef</code></li>
+</ol>
+
+<p>
+  If that sounds vague and generic, it's because it is!  HTML Purifier defines
+  an assortment of different attribute types one can use, and each of these
+  has their own specialized parameter format.  Here are some of the more useful
+  ones:
+</p>
+
+<table class="table">
+  <thead>
+    <tr>
+      <th>Type</th>
+      <th>Format</th>
+      <th>Description</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>Enum</th>
+      <td><em>[s:]</em>value1,value2,...</td>
+      <td>
+        Attribute with a number of valid values, one of which may be used. When
+        s: is present, the enumeration is case sensitive.
+      </td>
+    </tr>
+    <tr>
+      <th>Bool</th>
+      <td>attribute_name</td>
+      <td>
+        Boolean attribute, with only one valid value: the name
+        of the attribute.
+      </td>
+    </tr>
+    <tr>
+      <th>CDATA</th>
+      <td></td>
+      <td>
+        Attribute of arbitrary text. Can also be referred to as <strong>Text</strong>
+        (the specification makes a semantic distinction between the two).
+      </td>
+    </tr>
+    <tr>
+      <th>ID</th>
+      <td></td>
+      <td>
+        Attribute that specifies a unique ID
+      </td>
+    </tr>
+    <tr>
+      <th>Pixels</th>
+      <td></td>
+      <td>
+        Attribute that specifies an integer pixel length
+      </td>
+    </tr>
+    <tr>
+      <th>Length</th>
+      <td></td>
+      <td>
+        Attribute that specifies a pixel or percentage length
+      </td>
+    </tr>
+    <tr>
+      <th>NMTOKENS</th>
+      <td></td>
+      <td>
+        Attribute that specifies a number of name tokens, example: the
+        <code>class</code> attribute
+      </td>
+    </tr>
+    <tr>
+      <th>URI</th>
+      <td></td>
+      <td>
+        Attribute that specifies a URI, example: the <code>href</code>
+        attribute
+      </td>
+    </tr>
+    <tr>
+      <th>Number</th>
+      <td></td>
+      <td>
+        Attribute that specifies an positive integer number
+      </td>
+    </tr>
+  </tbody>
+</table>
+
+<p>
+  For a complete list, consult
+  <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/AttrTypes.php"><code>library/HTMLPurifier/AttrTypes.php</code></a>;
+  more information on attributes that accept parameters can be found on their
+  respective includes in
+  <a href="http://repo.or.cz/w/htmlpurifier.git?a=tree;hb=HEAD;f=library/HTMLPurifier/AttrDef"><code>library/HTMLPurifier/AttrDef</code></a>.
+</p>
+
+<p>
+  Sometimes, the restrictive list in AttrTypes just doesn't cut it. Don't
+  sweat: you can also use a fully instantiated object as the value. The
+  equivalent, verbose form of the above example is:
+</p>
+
+<pre>$config = HTMLPurifier_Config::createDefault();
+$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
+$config-&gt;set('HTML.DefinitionRev', 1);
+$config-&gt;set('Cache.DefinitionImpl', null); // remove this later!
+$def = $config-&gt;getHTMLDefinition(true);
+<strong>$def-&gt;addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum(
+  array('_blank','_self','_target','_top')
+));</strong></pre>
+
+<p>
+  Trust me, you'll learn to love the shorthand.
+</p>
+
+<h2>Add an element</h2>
+
+<p>
+  Adding attributes is really small-fry stuff, though, and it was possible
+  to add them (albeit a bit more wordy) prior to 2.0. The real gem of
+  the Advanced API is adding elements. There are five questions to
+  ask when adding a new element:
+</p>
+
+<ol>
+  <li>What is the element's name?</li>
+  <li>What content set does this element belong to?</li>
+  <li>What are the allowed children of this element?</li>
+  <li>What attributes does the element allow that are general?</li>
+  <li>What attributes does the element allow that are specific to this element?</li>
+</ol>
+
+<p>
+  It's a mouthful, and you'll be slightly lost if your not familiar with
+  the HTML specification, so let's explain them step by step.
+</p>
+
+<h3>Content set</h3>
+
+<p>
+  The HTML specification defines two major content sets: Inline
+  and Block.  Each of these
+  content sets contain a list of elements: Inline contains things like
+  <code>span</code> and <code>b</code> while Block contains things like
+  <code>div</code> and <code>blockquote</code>.
+</p>
+
+<p>
+  These content sets amount to a macro mechanism for HTML definition. Most
+  elements in HTML are organized into one of these two sets, and most
+  elements in HTML allow elements from one of these sets.  If we had
+  to write each element verbatim into each other element's allowed
+  children, we would have ridiculously large lists; instead we use
+  content sets to compactify the declaration.
+</p>
+
+<p>
+  Practically speaking, there are several useful values you can use here:
+</p>
+
+<table class="table">
+  <thead>
+    <tr>
+      <th>Content set</th>
+      <th>Description</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>Inline</th>
+      <td>Character level elements, text</td>
+    </tr>
+    <tr>
+      <th>Block</th>
+      <td>Block-like elements, like paragraphs and lists</td>
+    </tr>
+    <tr>
+      <th><em>false</em></th>
+      <td>
+        Any element that doesn't fit into the mold, for example <code>li</code>
+        or <code>tr</code>
+      </td>
+    </tr>
+  </tbody>
+</table>
+
+<p>
+  By specifying a valid value here, all other elements that use that
+  content set will also allow your element, without you having to do
+  anything. If you specify <em>false</em>, you'll have to register
+  your element manually.
+</p>
+
+<h3>Allowed children</h3>
+
+<p>
+  Allowed children defines the elements that this element can contain.
+  The allowed values may range from none to a complex regexp depending on
+  your element.
+</p>
+
+<p>
+  If you've ever taken a look at the HTML DTD's before, you may have
+  noticed declarations like this:
+</p>
+
+<pre>&lt;!ELEMENT LI - O (%flow;)*             -- list item --&gt;</pre>
+
+<p>
+  The <code>(%flow;)*</code> indicates the allowed children of the
+  <code>li</code> tag: <code>li</code> allows any number of flow
+  elements as its children. (The <code>- O</code> allows the closing tag to be
+  omitted, though in XML this is not allowed.) In HTML Purifier,
+  we'd write it like <code>Flow</code> (here's where the content sets
+  we were discussing earlier come into play). There are three shorthand
+  content models you can specify:
+</p>
+
+<table class="table">
+  <thead>
+    <tr>
+      <th>Content model</th>
+      <th>Description</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>Empty</th>
+      <td>No children allowed, like <code>br</code> or <code>hr</code></td>
+    </tr>
+    <tr>
+      <th>Inline</th>
+      <td>Any number of inline elements and text, like <code>span</code></td>
+    </tr>
+    <tr>
+      <th>Flow</th>
+      <td>Any number of inline elements, block elements and text, like <code>div</code></td>
+    </tr>
+  </tbody>
+</table>
+
+<p>
+  This covers 90% of all the cases out there, but what about elements that
+  break the mold like <code>ul</code>? This guy requires at least one
+  child, and the only valid children for it are <code>li</code>. The
+  content model is: <code>Required: li</code>. There are two parts: the
+  first type determines what <code>ChildDef</code> will be used to validate
+  content models. The most common values are:
+</p>
+
+<table class="table">
+  <thead>
+    <tr>
+      <th>Type</th>
+      <th>Description</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>Required</th>
+      <td>Children must be one or more of the valid elements</td>
+    </tr>
+    <tr>
+      <th>Optional</th>
+      <td>Children can be any number of the valid elements</td>
+    </tr>
+    <tr>
+      <th>Custom</th>
+      <td>Children must follow the DTD-style regex</td>
+    </tr>
+  </tbody>
+</table>
+
+<p>
+  You can also implement your own <code>ChildDef</code>: this was done
+  for a few special cases in HTML Purifier such as <code>Chameleon</code>
+  (for <code>ins</code> and <code>del</code>), <code>StrictBlockquote</code>
+  and <code>Table</code>.
+</p>
+
+<p>
+  The second part specifies either valid elements or a regular expression.
+  Valid elements are separated with horizontal bars (|), i.e.
+  "<code>a | b | c</code>".  Use #PCDATA to represent plain text.
+  Regular expressions are based off of DTD's style:
+</p>
+
+<ul>
+  <li>Parentheses () are used for grouping</li>
+  <li>Commas (,) separate elements that should come one after another</li>
+  <li>Horizontal bars (|) indicate one or the other elements should be used</li>
+  <li>Plus signs (+) are used for a one or more match</li>
+  <li>Asterisks (*) are used for a zero or more match</li>
+  <li>Question marks (?) are used for a zero or one match</li>
+</ul>
+
+<p>
+  For example, "<code>a, b?, (c | d), e+, f*</code>" means "In this order,
+  one <code>a</code> element, at most one <code>b</code> element,
+  one <code>c</code> or <code>d</code> element (but not both), one or more
+  <code>e</code> elements, and any number of <code>f</code> elements."
+  Regex veterans should be able to jump right in, and those not so savvy
+  can always copy-paste W3C's content model definitions into HTML Purifier
+  and hope for the best.
+</p>
+
+<p>
+  A word of warning: while the regex format is extremely flexible on
+  the developer's side, it is
+  quite unforgiving on the user's side.  If the user input does not <em>exactly</em>
+  match the specification, the entire contents of the element will
+  be nuked.  This is why there is are specific content model types like
+  Optional and Required: while they could be implemented as <code>Custom:
+  (valid | elements)*</code>, the custom classes contain special recovery
+  measures that make sure as much of the user's original content gets
+  through. HTML Purifier's core, as a rule, does not use Custom.
+</p>
+
+<p>
+  One final note: you can also use Content Sets inside your valid elements
+  lists or regular expressions. In fact, the three shorthand content models
+  mentioned above are just that: abbreviations:
+</p>
+
+<table class="table">
+  <thead>
+    <tr>
+      <th>Content model</th>
+      <th>Implementation</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>Inline</th>
+      <td>Optional: Inline | #PCDATA</td>
+    </tr>
+    <tr>
+      <th>Flow</th>
+      <td>Optional: Flow | #PCDATA</td>
+    </tr>
+  </tbody>
+</table>
+
+<p>
+  When the definition is compiled, Inline will be replaced with a
+  horizontal-bar separated list of inline elements. Also, notice that
+  it does not contain text: you have to specify that yourself.
+</p>
+
+<h3>Common attributes</h3>
+
+<p>
+  Congratulations: you have just gotten over the proverbial hump (Allowed
+  children). Common attributes is much simpler, and boils down to
+  one question: does your element have the <code>id</code>, <code>style</code>,
+  <code>class</code>, <code>title</code> and <code>lang</code> attributes?
+  If so, you'll want to specify the <code>Common</code> attribute collection,
+  which contains these five attributes that are found on almost every
+  HTML element in the specification.
+</p>
+
+<p>
+  There are a few more collections, but they're really edge cases:
+</p>
+
+<table class="table">
+  <thead>
+    <tr>
+      <th>Collection</th>
+      <th>Attributes</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>I18N</th>
+      <td><code>lang</code>, possibly <code>xml:lang</code></td>
+    </tr>
+    <tr>
+      <th>Core</th>
+      <td><code>style</code>, <code>class</code>, <code>id</code> and <code>title</code></td>
+    </tr>
+  </tbody>
+</table>
+
+<p>
+  Common is a combination of the above-mentioned collections.
+</p>
+
+<p class="aside">
+  Readers familiar with the modularization may have noticed that the Core
+  attribute collection differs from that specified by the <a
+  href="http://www.w3.org/TR/xhtml-modularization/abstract_modules.html#s_commonatts">abstract
+  modules of the XHTML Modularization 1.1</a>. We believe this section
+  to be in error, as <code>br</code> permits the use of the <code>style</code>
+  attribute even though it uses the <code>Core</code> collection, and
+  the DTD and XML Schemas supplied by W3C support our interpretation.
+</p>
+
+<h3>Attributes</h3>
+
+<p>
+  If you didn't read the <a href="#addAttribute">earlier section on
+  adding attributes</a>, read it now.  The last parameter is simply
+  an array of attribute names to attribute implementations, in the exact
+  same format as <code>addAttribute()</code>.
+</p>
+
+<h3>Putting it all together</h3>
+
+<p>
+  We're going to implement <code>form</code>. Before we embark, lets
+  grab a reference implementation from over at the
+  <a href="http://www.w3.org/TR/html4/sgml/loosedtd.html">transitional DTD</a>:
+</p>
+
+<pre>&lt;!ELEMENT FORM - - (%flow;)* -(FORM)   -- interactive form --&gt;
+&lt;!ATTLIST FORM
+  %attrs;                              -- %coreattrs, %i18n, %events --
+  action      %URI;          #REQUIRED -- server-side form handler --
+  method      (GET|POST)     GET       -- HTTP method used to submit the form--
+  enctype     %ContentType;  &quot;application/x-www-form-urlencoded&quot;
+  accept      %ContentTypes; #IMPLIED  -- list of MIME types for file upload --
+  name        CDATA          #IMPLIED  -- name of form for scripting --
+  onsubmit    %Script;       #IMPLIED  -- the form was submitted --
+  onreset     %Script;       #IMPLIED  -- the form was reset --
+  target      %FrameTarget;  #IMPLIED  -- render in this frame --
+  accept-charset %Charsets;  #IMPLIED  -- list of supported charsets --
+  &gt;</pre>
+
+<p>
+  Juicy! With just this, we can answer four of our five questions:
+</p>
+
+<ol>
+  <li>What is the element's name? <strong>form</strong></li>
+  <li>What content set does this element belong to? <strong>Block</strong>
+    (this needs a little sleuthing, I find the easiest way is to search
+    the DTD for <code>FORM</code> and determine which set it is in.)</li>
+  <li>What are the allowed children of this element? <strong>One
+    or more flow elements, but no nested <code>form</code>s</strong></li>
+  <li>What attributes does the element allow that are general? <strong>Common</strong></li>
+  <li>What attributes does the element allow that are specific to this element? <strong>A whole bunch, see ATTLIST;
+    we're going to do the vital ones: <code>action</code>, <code>method</code> and <code>name</code></strong></li>
+</ol>
+
+<p>
+  Time for some code:
+</p>
+
+<pre>$config = HTMLPurifier_Config::createDefault();
+$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
+$config-&gt;set('HTML.DefinitionRev', 1);
+$config-&gt;set('Cache.DefinitionImpl', null); // remove this later!
+$def = $config-&gt;getHTMLDefinition(true);
+$def-&gt;addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum(
+  array('_blank','_self','_target','_top')
+));
+<strong>$form = $def-&gt;addElement(
+  'form',   // name
+  'Block',  // content set
+  'Flow', // allowed children
+  'Common', // attribute collection
+  array( // attributes
+    'action*' => 'URI',
+    'method' => 'Enum#get|post',
+    'name' => 'ID'
+  )
+);
+$form-&gt;excludes = array('form' => true);</strong></pre>
+
+<p>
+  Each of the parameters corresponds to one of the questions we asked.
+  Notice that we added an asterisk to the end of the <code>action</code>
+  attribute to indicate that it is required. If someone specifies a
+  <code>form</code> without that attribute, the tag will be axed.
+  Also, the extra line at the end is a special extra declaration that
+  prevents forms from being nested within each other.
+</p>
+
+<p>
+  And that's all there is to it! Implementing the rest of the form
+  module is left as an exercise to the user; to see more examples
+  check the <a href="http://repo.or.cz/w/htmlpurifier.git?a=tree;hb=HEAD;f=library/HTMLPurifier/HTMLModule"><code>library/HTMLPurifier/HTMLModule/</code></a> directory
+  in your local HTML Purifier installation.
+</p>
+
+<h2>And beyond...</h2>
+
+<p>
+  Perceptive users may have realized that, to a certain extent, we
+  have simply re-implemented the facilities of XML Schema or the
+  Document Type Definition.  What you are seeing here, however, is
+  not just an XML Schema or Document Type Definition: it is a fully
+  expressive method of specifying the definition of HTML that is
+  a portable superset of the capabilities of the two above-mentioned schema
+  languages.  What makes HTMLDefinition so powerful is the fact that
+  if we don't have an implementation for a content model or an attribute
+  definition, you can supply it yourself by writing a PHP class.
+</p>
+
+<p>
+  There are many facets of HTMLDefinition beyond the Advanced API I have
+  walked you through today.  To find out more about these, you can
+  check out these source files:
+</p>
+
+<ul>
+  <li><a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/HTMLModule.php"><code>library/HTMLPurifier/HTMLModule.php</code></a></li>
+  <li><a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ElementDef.php"><code>library/HTMLPurifier/ElementDef.php</code></a></li>
+</ul>
+
+<h2 id="optimized">Notes for HTML Purifier 4.2.0 and earlier</h3>
+
+<p>
+    Previously, this tutorial gave some incorrect template code for
+    editing raw definitions, and that template code will now produce the
+    error <q>Due to a documentation error in previous version of HTML
+    Purifier...</q>  Here is how to mechanically transform old-style
+    code into new-style code.
+</p>
+
+<p>
+    First, identify all code that edits the raw definition object, and
+    put it together.  Ensure none of this code must be run on every
+    request; if some sub-part needs to always be run, move it outside
+    this block.  Here is an example below, with the raw definition
+    object code bolded.
+</p>
+
+<pre>$config = HTMLPurifier_Config::createDefault();
+$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
+$config-&gt;set('HTML.DefinitionRev', 1);
+$def = $config-&gt;getHTMLDefinition(true);
+<strong>$def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top');</strong>
+$purifier = new HTMLPurifier($config);</pre>
+
+<p>
+    Next, replace the raw definition retrieval with a
+    maybeGetRawHTMLDefinition method call inside an if conditional, and
+    place the editing code inside that if block.
+</p>
+
+<pre>$config = HTMLPurifier_Config::createDefault();
+$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
+$config-&gt;set('HTML.DefinitionRev', 1);
+<strong>if ($def = $config-&gt;maybeGetRawHTMLDefinition()) {
+    $def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top');
+}</strong>
+$purifier = new HTMLPurifier($config);</pre>
+
+<p>
+    And you're done!  Alternatively, if you're OK with not ever caching
+    your code, the following will still work and not emit warnings.
+</p>
+
+<pre>$config = HTMLPurifier_Config::createDefault();
+$def = $config-&gt;getHTMLDefinition(true);
+$def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top');
+$purifier = new HTMLPurifier($config);</pre>
+
+<p>
+    A slightly less efficient version of this was what was going on with
+    old versions of HTML Purifier.
+</p>
+
+<p>
+    <em>Technical notes:</em> ajh pointed out on <a
+        href="http://htmlpurifier.org/phorum/read.php?5,5164,5169#msg-5169">in a forum topic</a> that
+    HTML Purifier appeared to be repeatedly writing to the cache even
+    when a cache entry already existed.  Investigation lead to the
+    discovery of the following infelicity: caching of customized
+    definitions didn't actually work!  The problem was that even though
+    a cache file would be written out at the end of the process, there
+    was no way for HTML Purifier to say, <q>Actually, I've already got a
+        copy of your work, no need to reconfigure your
+        customizations</q>.  This required the API to change: placing
+    all of the customizations to the raw definition object in a
+    conditional which could be skipped.
+</p>
+
+</body></html>
+
+<!-- vim: et sw=4 sts=4
+-->