46 files changed, 0 insertions, 7840 deletions
diff --git a/lib/htmlpurifier/docs/dev-advanced-api.html b/lib/htmlpurifier/docs/dev-advanced-api.html
deleted file mode 100644
index 5b7aaa3c8..000000000
--- a/lib/htmlpurifier/docs/dev-advanced-api.html
+++ /dev/null
@@ -1,26 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
-    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
-<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-<meta name="description" content="Specification for HTML Purifier's advanced API for defining custom filtering behavior." />
-<link rel="stylesheet" type="text/css" href="style.css" />
-
-<title>Advanced API - HTML Purifier</title>
-
-</head><body>
-
-<h1>Advanced API</h1>
-
-<div id="filing">Filed under Development</div>
-<div id="index">Return to the <a href="index.html">index</a>.</div>
-<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
-
-<p>
-  Please see <a href="enduser-customize.html">Customize!</a>
-</p>
-
-</body></html>
-
-<!-- vim: et sw=4 sts=4
--->
diff --git a/lib/htmlpurifier/docs/dev-code-quality.txt b/lib/htmlpurifier/docs/dev-code-quality.txt
deleted file mode 100644
index bceedebc4..000000000
--- a/lib/htmlpurifier/docs/dev-code-quality.txt
+++ /dev/null
@@ -1,29 +0,0 @@
-
-Code Quality Issues
-
-Okay, face it.  Programmers can get lazy, cut corners, or make mistakes. They
-also can do quick prototypes, and then forget to rewrite them later.  Well,
-while I can't list mistakes in here, I can list prototype-like segments
-of code that should be aggressively refactored.  This does not list
-optimization issues; those need to be addressed after intense profiling.
-
-docs/examples/demo.php - ad hoc HTML/PHP soup to the extreme
-
-AttrDef - a lot of duplication, more generic classes need to be created;
-a lot of strtolower() calls, no legit casing
-    Class - doesn't support Unicode characters (fringe); uses regular expressions
-    Lang - code duplication; premature optimization
-    Length - easily mistaken for CSSLength
-    URI - multiple regular expressions; missing validation for parts (?)
-    CSS - parser doesn't accept advanced CSS (fringe)
-    Number - constructor interface inconsistent with Integer
-Strategy
-    FixNesting - cannot bubble nodes out of structures, duplicated checks
-        for special-case parent node
-    RemoveForeignElements - should be run in parallel with MakeWellFormed
-URIScheme - needs to have callable generic checks
-    mailto - doesn't validate emails, doesn't validate querystring
-    news - doesn't validate opaque path
-    nntp - doesn't constrain path
-
-    vim: et sw=4 sts=4
diff --git a/lib/htmlpurifier/docs/dev-config-bcbreaks.txt b/lib/htmlpurifier/docs/dev-config-bcbreaks.txt
deleted file mode 100644
index 29a58ca2f..000000000
--- a/lib/htmlpurifier/docs/dev-config-bcbreaks.txt
+++ /dev/null
@@ -1,79 +0,0 @@
-
-Configuration Backwards-Compatibility Breaks
-
-In version 4.0.0, the configuration subsystem (composed of the outwards
-facing Config class, as well as the ConfigSchema and ConfigSchema_Interchange
-subsystems), was significantly revamped to make use of property lists.
-While most of the changes are internal, some internal APIs were changed for the
-sake of clarity. HTMLPurifier_Config was kept completely backwards compatible,
-although some of the functions were retrofitted with an unambiguous alternate
-syntax. Both of these changes are discussed in this document.
-
-
-
-1. Outwards Facing Changes
---------------------------------------------------------------------------------
-
-The HTMLPurifier_Config class now takes an alternate syntax. The general rule
-is:
-
-    If you passed $namespace, $directive, pass "$namespace.$directive"
-    instead.
-
-An example:
-
-    $config->set('HTML', 'Allowed', 'p');
-
-becomes:
-
-    $config->set('HTML.Allowed', 'p');
-
-New configuration options may have more than one namespace; they might
-look something like %Filter.YouTube.Blacklist. While you could technically
-set it with ('Filter', 'YouTube.Blacklist'), the logical extension
-('Filter', 'YouTube', 'Blacklist') does not work.
-
-The old API will still work, but will emit E_USER_NOTICEs.
-
-
-
-2. Internal API Changes
---------------------------------------------------------------------------------
-
-Some overarching notes: we've completely eliminated the notion of namespace;
-it's now an informal construct for organizing related configuration directives.
-
-Also, the validation routines for keys (formerly "$namespace.$directive")
-have been completely relaxed. I don't think it really should be necessary.
-
-2.1 HTMLPurifier_ConfigSchema
-
-First off, if you're interfacing with this class, you really shouldn't.
-HTMLPurifier_ConfigSchema_Builder_ConfigSchema is really the only class that
-should ever be creating HTMLPurifier_ConfigSchema, and HTMLPurifier_Config the
-only class that should be reading it.
-
-All namespace related methods were removed; they are completely unnecessary
-now. Any $namespace, $name arguments must be replaced with $key (where
-$key == "$namespace.$name"), including for addAlias().
-
-The $info and $defaults member variables are no longer indexed as
-[$namespace][$name]; they are now indexed as ["$namespace.$name"].
-
-All deprecated methods were finally removed, after having yelled at you as
-an E_USER_NOTICE for a while now.
-
-2.2 HTMLPurifier_ConfigSchema_Interchange
-
-Member variable $namespaces was removed.
-
-2.3 HTMLPurifier_ConfigSchema_Interchange_Id
-
-Member variables $namespace and $directive were removed; member variable $key
-was added. Any method that took $namespace, $directive now takes $key.
-
-2.4 HTMLPurifier_ConfigSchema_Interchange_Namespace
-
-Removed.
-
-    vim: et sw=4 sts=4
diff --git a/lib/htmlpurifier/docs/dev-config-naming.txt b/lib/htmlpurifier/docs/dev-config-naming.txt
deleted file mode 100644
index 66db5bce3..000000000
--- a/lib/htmlpurifier/docs/dev-config-naming.txt
+++ /dev/null
@@ -1,164 +0,0 @@
-Configuration naming
-
-HTML Purifier 4.0.0 features a new configuration naming system that
-allows arbitrary nesting of namespaces.  While there are certain cases
-in which using two namespaces is obviously better (the canonical example
-is where we were using AutoFormatParam to contain directives for AutoFormat
-parameters), it is unclear whether a general migration to highly
-namespaced directives is a good idea.
-
-== Case studies ==
-
-=== Attr.* ===
-
-We have a dead duck HTML.Attr.Name.UseCDATA which migrated before we decided
-to think this out thoroughly.
-
-We currently have a large number of directives in the Attr.* namespace.
-These directives tweak the behavior of some HTML attributes.  They have
-the properties:
-
-* While they apply to only one attribute at a time, the attribute can
-  span over multiple elements (not necessarily all elements, either).
-  The information of which elements it impacts is either omitted or
-  informally stated (EnableID applies to all elements, DefaultImageAlt
-  applies to <img> tags, AllowedRev doesn't say but only applies to <a> tags).
-
-* There is a certain degree of clustering that could be applied, especially
-  to the ID directives.  The clustering could be done with respect to
-  what element/attribute was used, i.e.
-
-    *.id -> EnableID, IDBlacklistRegexp, IDBlacklist, IDPrefixLocal, IDPrefix
-    img.src -> DefaultInvalidImage
-    img.alt -> DefaultImageAlt, DefaultInvalidImageAlt
-    bdo.dir -> DefaultTextDir
-    a.rel -> AllowedRel
-    a.rev -> AllowedRev
-    a.target -> AllowedFrameTargets
-    a.name -> Name.UseCDATA
-
-* The directives often reference generic attribute types that were specified
-  in the DTD/specification.  However, some of the behavior specifically relies
-  on the fact that other use cases of the attribute are not, at current,
-  supported by HTML Purifier.
-
-    AllowedRel, AllowedRev -> heavily <a> specific; if <link> ends up being
-        allowed, we will also have to give users specificity there (we also
-        want to preserve generality) DTD %Linktypes, HTML5 distinguishes
-        between <link> and <a>/<area>
-    AllowedFrameTargets -> heavily <a> specific, but also used by <area>
-        and <form>. Transitional DTD %FrameTarget, not present in strict,
-        HTML5 calls them "browsing contexts"
-    Default*Image* -> as a default parameter, is almost entirely exclusive
-        to <img>
-    EnableID -> global attribute
-    Name.UseCDATA -> heavily <a> specific, but has heavy other usage by
-        many things
-
-== AutoFormat.* ==
-
-These have the fairly normal pluggable architecture that lends itself to
-large amounts of namespaces (pluggability may be the key to figuring
-out when gratuitous namespacing is good.)  Properties:
-
-* Boolean directives are fair game for being namespaced: for example,
-  RemoveEmpty.RemoveNbsp triggers RemoveEmpty.RemoveNbsp.Exceptions,
-  the latter of which only makes sense when RemoveEmpty.RemoveNbsp
-  is set to true. (The same applies to RemoveNbsp too)
-
-The AutoFormat string is a bit long, but is the only bit of repeated
-context.
-
-== Core.* ==
-
-Core is the potpourri of directives, mostly regarding some minor behavioral
-tweaks for HTML handling abilities.
-
-    AggressivelyFixLt
-    ConvertDocumentToFragment
-    DirectLexLineNumberSyncInterval
-    LexerImpl
-    MaintainLineNumbers
-        Lexer
-    CollectErrors
-    Language
-        Error handling (Language is ostensibly a little more general, but
-        it's only used for error handling right now)
-    ColorKeywords
-        CSS and HTML
-    Encoding
-    EscapeNonASCIICharacters
-        Character encoding
-    EscapeInvalidChildren
-    EscapeInvalidTags
-    HiddenElements
-    RemoveInvalidImg
-        Lexing/Output
-    RemoveScriptContents
-        Deprecated
-
-== HTML.* ==
-
-    AllowedAttributes
-    AllowedElements
-    AllowedModules
-    Allowed
-    ForbiddenAttributes
-    ForbiddenElements
-        Element set tuning
-    BlockWrapper
-        Child def advanced twiddle
-    CoreModules
-    CustomDoctype
-        Advanced HTMLModuleManager twiddles
-    DefinitionID
-    DefinitionRev
-        Caching
-    Doctype
-    Parent
-    Strict
-    XHTML
-        Global environment
-    MaxImgLength
-        Attribute twiddle? (applies to two attributes)
-    Proprietary
-    SafeEmbed
-    SafeObject
-    Trusted
-        Extra functionality/tagsets
-    TidyAdd
-    TidyLevel
-    TidyRemove
-        Tidy
-
-== Output.* ==
-
-These directly affect the output of Generator. These are all advanced
-twiddles.
-
-== URI.* ==
-
-    AllowedSchemes
-    OverrideAllowedSchemes
-        Scheme tuning
-    Base
-    DefaultScheme
-    Host
-        Global environment
-    DefinitionID
-    DefinitionRev
-        Caching
-    DisableExternalResources
-    DisableExternal
-    DisableResources
-    Disable
-        Contextual/authority tuning
-    HostBlacklist
-        Authority tuning
-    MakeAbsolute
-    MungeResources
-    MungeSecretKey
-    Munge
-        Transformation behavior (munge can be grouped)
-
-
diff --git a/lib/htmlpurifier/docs/dev-config-schema.html b/lib/htmlpurifier/docs/dev-config-schema.html
deleted file mode 100644
index 07aecd35a..000000000
--- a/lib/htmlpurifier/docs/dev-config-schema.html
+++ /dev/null
@@ -1,412 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
-  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
-  <head>
-    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-    <meta name="description" content="Describes config schema framework in HTML Purifier." />
-    <link rel="stylesheet" type="text/css" href="./style.css" />
-    <title>Config Schema - HTML Purifier</title>
-  </head>
-  <body>
-
-    <h1>Config Schema</h1>
-
-    <div id="filing">Filed under Development</div>
-    <div id="index">Return to the <a href="index.html">index</a>.</div>
-    <div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
-
-    <p>
-      HTML Purifier has a fairly complex system for configuration. Users
-      interact with a <code>HTMLPurifier_Config</code> object to
-      set configuration directives. The values they set are validated according
-      to a configuration schema, <code>HTMLPurifier_ConfigSchema</code>.
-    </p>
-
-    <p>
-      The schema is mostly transparent to end-users, but if you're doing development
-      work for HTML Purifier and need to define a new configuration directive,
-      you'll need to interact with it. We'll also talk about how to define
-      userspace configuration directives at the very end.
-    </p>
-
-    <h2>Write a directive file</h2>
-
-    <p>
-      Directive files define configuration directives to be used by
-      HTML Purifier. They are placed in <code>library/HTMLPurifier/ConfigSchema/schema/</code>
-      in the form <code><em>Namespace</em>.<em>Directive</em>.txt</code> (I
-      couldn't think of a more descriptive file extension.)
-      Directive files are actually what we call <code>StringHash</code>es,
-      i.e. associative arrays represented in a string form reminiscent of
-      <a href="http://qa.php.net/write-test.php">PHPT</a> tests. Here's a
-      sample directive file, <code>Test.Sample.txt</code>:
-    </p>
-
-    <pre>Test.Sample
-TYPE: string/null
-DEFAULT: NULL
-ALLOWED: 'foo', 'bar'
-VALUE-ALIASES: 'baz' => 'bar'
-VERSION: 3.1.0
---DESCRIPTION--
-This is a sample configuration directive for the purposes of the
-&lt;code&gt;dev-config-schema.html&lt;/code&gt; documentation.
---ALIASES--
-Test.Example</pre>
-
-    <p>
-      Each of these segments has a specific meaning:
-    </p>
-
-    <table class="table">
-      <thead>
-        <tr>
-          <th>Key</th>
-          <th>Example</th>
-          <th>Description</th>
-        </tr>
-      </thead>
-      <tbody>
-        <tr>
-          <td>ID</td>
-          <td>Test.Sample</td>
-          <td>The name of the directive, in the form Namespace.Directive
-          (implicitly the first line)</td>
-        </tr>
-        <tr>
-          <td>TYPE</td>
-          <td>string/null</td>
-          <td>The type of variable this directive accepts. See below for
-          details. You can also add <code>/null</code> to the end of
-          any basic type to allow null values too.</td>
-        </tr>
-        <tr>
-          <td>DEFAULT</td>
-          <td>NULL</td>
-          <td>A parseable PHP expression of the default value.</td>
-        </tr>
-        <tr>
-          <td>DESCRIPTION</td>
-          <td>This is a...</td>
-          <td>An HTML description of what this directive does.</td>
-        </tr>
-        <tr>
-          <td>VERSION</td>
-          <td>3.1.0</td>
-          <td><em>Recommended</em>. The version of HTML Purifier in which this directive was added.
-          Directives that have been around since 1.0.0 don't have this,
-          but any new ones should.</td>
-        </tr>
-        <tr>
-          <td>ALIASES</td>
-          <td>Test.Example</td>
-          <td><em>Optional</em>. A comma separated list of aliases for this directive.
-          This is most useful for backwards compatibility and should
-          not be used otherwise.</td>
-        </tr>
-        <tr>
-          <td>ALLOWED</td>
-          <td>'foo', 'bar'</td>
-          <td><em>Optional</em>. Set of allowed values for a directive,
-          a comma separated list of parseable PHP expressions. This
-          is only allowed for string, istring, text and itext TYPEs.</td>
-        </tr>
-        <tr>
-          <td>VALUE-ALIASES</td>
-          <td>'baz' =&gt; 'bar'</td>
-          <td><em>Optional</em>. Mapping of one value to another; it
-          should be a comma separated list of key-value pairs. This
-          is only allowed for string, istring, text and itext TYPEs.</td>
-        </tr>
-        <tr>
-          <td>DEPRECATED-VERSION</td>
-          <td>3.1.0</td>
-          <td><em>Not shown</em>. Indicates that the directive was
-          deprecated in this version.</td>
-        </tr>
-        <tr>
-          <td>DEPRECATED-USE</td>
-          <td>Test.NewDirective</td>
-          <td><em>Not shown</em>. Indicates what new directive should be
-          used instead. Note that the directives will functionally be
-          different, although they should offer the same functionality.
-          If they are identical, use an alias instead.</td>
-        </tr>
-        <tr>
-          <td>EXTERNAL</td>
-          <td>CSSTidy</td>
-          <td><em>Not shown</em>. Indicates if there is an external library
-          the user will need to download and install to use this configuration
-          directive. As of right now, this is merely a Google-able name; future
-          versions may also provide links and instructions.</td>
-        </tr>
-      </tbody>
-    </table>
-
-    <p>
-      Some notes on format and style:
-    </p>
-
-    <ul>
-      <li>
-        Each of these keys can be expressed in the short format
-        (<code>KEY: Value</code>) or the long format
-        (<code>--KEY--</code> with value beneath). You must use the
-        long format if multiple lines are needed, or if a long format
-        has been used already (that's why <code>ALIASES</code> in our
-        example is in the long format); otherwise, it's user preference.
-      </li>
-      <li>
-        The HTML descriptions should be wrapped at about 80 columns; do
-        not rely on editor word-wrapping.
-      </li>
-    </ul>
-
-    <p>
-      Also, as promised, here is the set of possible types:
-    </p>
-
-    <table class="table">
-      <thead>
-        <tr>
-          <th>Type</th>
-          <th>Example</th>
-          <th>Description</th>
-        </tr>
-      </thead>
-      <tbody>
-        <tr>
-          <td>string</td>
-          <td>'Foo'</td>
-          <td><a href="http://docs.php.net/manual/en/language.types.string.php">String</a> without newlines</td>
-        </tr>
-        <tr>
-          <td>istring</td>
-          <td>'foo'</td>
-          <td>Case insensitive ASCII string without newlines</td>
-        </tr>
-        <tr>
-          <td>text</td>
-          <td>"A<em>\n</em>b"</td>
-          <td>String with newlines</td>
-        </tr>
-        <tr>
-          <td>itext</td>
-          <td>"a<em>\n</em>b"</td>
-          <td>Case insensitive ASCII string with newlines</td>
-        </tr>
-        <tr>
-          <td>int</td>
-          <td>23</td>
-          <td>Integer</td>
-        </tr>
-        <tr>
-          <td>float</td>
-          <td>3.0</td>
-          <td>Floating point number</td>
-        </tr>
-        <tr>
-          <td>bool</td>
-          <td>true</td>
-          <td>Boolean</td>
-        </tr>
-        <tr>
-          <td>lookup</td>
-          <td>array('key' =&gt; true)</td>
-          <td>Lookup array, used with <code>isset($var[$key])</code></td>
-        </tr>
-        <tr>
-          <td>list</td>
-          <td>array('f', 'b')</td>
-          <td>List array, with ordered numerical indexes</td>
-        </tr>
-        <tr>
-          <td>hash</td>
-          <td>array('key' =&gt; 'val')</td>
-          <td>Associative array of keys to values</td>
-        </tr>
-        <tr>
-          <td>mixed</td>
-          <td>new stdclass</td>
-          <td>Any PHP variable is fine</td>
-        </tr>
-      </tbody>
-    </table>
-
-    <p>
-      The examples represent what will be returned out of the configuration
-      object; users have a little bit of leeway when setting configuration
-      values (for example, a lookup value can be specified as a list;
-      HTML Purifier will flip it as necessary.) These types are defined
-      in <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/VarParser.php">
-      library/HTMLPurifier/VarParser.php</a>.
-    </p>
-
-    <p>
-      For more information on what values are allowed, and how they are parsed,
-      consult <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php">
-      library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php</a>, as well
-      as <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/Interchange/Directive.php">
-      library/HTMLPurifier/ConfigSchema/Interchange/Directive.php</a> for
-      the semantics of the parsed values.
-    </p>
-
-    <h2>Refreshing the cache</h2>
-
-    <p>
-      You may have noticed that your directive file isn't doing anything
-      yet. That's because it hasn't been added to the runtime
-      <code>HTMLPurifier_ConfigSchema</code> instance. Run
-      <code>maintenance/generate-schema-cache.php</code> to fix this.
-      If there were no errors, you're good to go! Don't forget to add
-      some unit tests for your functionality!
-    </p>
-
-    <p>
-      If you ever make changes to your configuration directives, you
-      will need to run this script again.
-    </p>
-    <h2>Adding in-house schema definitions</h2>
-
-    <p>
-      Placing stuff directly in HTML Purifier's source tree is generally not a
-      good idea, so HTML Purifier 4.0.0+ has some facilities in place to make your
-      life easier.
-    </p>
-
-    <p>
-      The first is to pass an extra parameter to <code>maintenance/generate-schema-cache.php</code>
-      with the location of your directory (relative or absolute path will do). For example,
-      if I'm storing my custom definitions in <em>/var/htmlpurifier/myschema</em>, run:
-      <code>php maintenance/generate-schema-cache.php /var/htmlpurifier/myschema</code>.
-    </p>
-
-    <p>
-      Alternatively, you can create a small loader PHP file in the HTML Purifier base
-      directory named <code>config-schema.php</code> (this is the same directory
-      you would place a <code>test-settings.php</code> file).  In this file, add
-      the following line for each directory you want to load:
-    </p>
-
-<pre>$builder-&gt;buildDir($interchange, '/var/htmlpurifier/myschema');</pre>
-
-    <p>You can even load a single file using:</p>
-
-<pre>$builder-&gt;buildFile($interchange, '/var/htmlpurifier/myschema/MyApp.Directive.txt');</pre>
-
-    <p>Storing custom definitions that you don't plan on sending back upstream in
-    a separate directory is <em>definitely</em> a good idea! Additionally, picking
-    a good namespace can go a long way to saving you grief if you want to use
-    someone else's change, but they picked the same name, or if HTML Purifier
-    decides to add support for a configuration directive that has the same name.</p>
-
-    <!-- TODO: how to name directives that rely on naming conventions -->
-
-    <h2>Errors</h2>
-
-    <p>
-      All directive files go through a rigorous validation process
-      through <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/Validator.php">
-      library/HTMLPurifier/ConfigSchema/Validator.php</a>, as well
-      as some basic checks during building. While
-      listing every error out here is out-of-scope for this document, we
-      can give some general tips for interpreting error messages.
-      There are two types of errors: builder errors and validation errors.
-    </p>
-
-    <h3>Builder errors</h3>
-
-    <blockquote>
-      <p>
-        <strong>Exception:</strong> Expected type string, got
-        integer in DEFAULT in directive hash 'Ns.Dir'
-      </p>
-    </blockquote>
-
-    <p>
-      You can identify a builder error by the keyword "directive hash."
-      These are the easiest to deal with, because they directly correspond
-      with your directive file. Find the offending directive file (which
-      is the directive hash plus the .txt extension), find the
-      offending index ("in DEFAULT" means the DEFAULT key) and fix the error.
-      This particular error would occur if your default value is not the same
-      type as TYPE.
-    </p>
-
-    <h3>Validation errors</h3>
-
-    <blockquote>
-      <p>
-        <strong>Exception:</strong> Alias 3 in valueAliases in directive
-        'Ns.Dir' must be a string
-      </p>
-    </blockquote>
-
-    <p>
-      These are a little trickier, because we're not actually validating
-      your directive file, or even the direct string hash representation.
-      We're validating an Interchange object, and the error messages do
-      not mention any string hash keys.
-    </p>
-
-    <p>
-      Nevertheless, it's not difficult to figure out what went wrong.
-      Read the "context" statements in reverse:
-    </p>
-
-    <dl>
-      <dt>in directive 'Ns.Dir'</dt>
-        <dd>This means we need to look at the directive file <code>Ns.Dir.txt</code></dd>
-      <dt>in valueAliases</dt>
-        <dd>There's no key actually called this, but there's one that's close:
-          VALUE-ALIASES. Indeed, that's where to look.</dd>
-      <dt>Alias 3</dt>
-        <dd>The value alias that is equal to 3 is the culprit.</dd>
-    </dl>
-
-    <p>
-      In this particular case, you're not allowed to alias integer values to
-      string values.
-    </p>
-
-    <p>
-      The most difficult part is translating the Interchange member variable (valueAliases)
-      into a directive file key (VALUE-ALIASES), but there's a one-to-one
-      correspondence currently. If the two formats diverge, any discrepancies
-      will be described in <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php">
-      library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php</a>.
-    </p>
-
-    <h2>Internals</h2>
-
-    <p>
-      Much of the configuration schema framework's codebase deals with
-      shuffling data from one format to another, and doing validation on this
-      data.
-      The keystone of all of this is the <code>HTMLPurifier_ConfigSchema_Interchange</code>
-      class, which represents the purest, parsed representation of the schema.
-    </p>
-
-    <p>
-      Hand-writing this data is unwieldy, however, so we write directive files.
-      These directive files are parsed by <code>HTMLPurifier_StringHashParser</code>
-      into <code>HTMLPurifier_StringHash</code>es, which then
-      are run through <code>HTMLPurifier_ConfigSchema_InterchangeBuilder</code>
-      to construct the interchange object.
-    </p>
-
-    <p>
-      From the interchange object, the data can be siphoned into other forms
-      using <code>HTMLPurifier_ConfigSchema_Builder</code> subclasses.
-      For example, <code>HTMLPurifier_ConfigSchema_Builder_ConfigSchema</code>
-      generates a runtime <code>HTMLPurifier_ConfigSchema</code> object,
-      which <code>HTMLPurifier_Config</code> uses to validate its incoming
-      data. There is also an XML serializer, which is used to build documentation.
-    </p>
-
-  </body>
-</html>
-
-<!-- vim: et sw=4 sts=4
--->
diff --git a/lib/htmlpurifier/docs/dev-flush.html b/lib/htmlpurifier/docs/dev-flush.html
deleted file mode 100644
index 4a3a78351..000000000
--- a/lib/htmlpurifier/docs/dev-flush.html
+++ /dev/null
@@ -1,68 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
-    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
-<head>
-    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-    <meta name="description" content="Discusses when to flush HTML Purifier's various caches." />
-    <link rel="stylesheet" type="text/css" href="./style.css" />
-    <title>Flushing the Purifier - HTML Purifier</title>
-</head>
-<body>
-
-<h1>Flushing the Purifier</h1>
-
-<div id="filing">Filed under Development</div>
-<div id="index">Return to the <a href="index.html">index</a>.</div>
-<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
-
-<p>
-    If you've been poking around the various folders in HTML Purifier,
-    you may have noticed the <code>maintenance</code> directory.  Almost
-    all of these scripts are devoted to flushing out the various caches
-    HTML Purifier uses.  Normal users don't have to worry about this:
-    regular library usage is transparent.  However, when doing development
-    work on HTML Purifier, you may find you have to flush one of the
-    caches.
-</p>
-
-<p>
-    As a general rule of thumb, run <code>flush.php</code> whenever you make
-    any <em>major</em> changes, or when tests start mysteriously failing.
-    In more detail, run this script if:
-</p>
-
-<ul>
-    <li>
-        You added new source files to HTML Purifier's main library.
-        (see <code>generate-includes.php</code>)
-    </li>
-    <li>
-        You modified the configuration schema (see
-        <code>generate-schema-cache.php</code>). This usually means
-        adding or modifying files in <code>HTMLPurifier/ConfigSchema/schema/</code>,
-        although in rare cases modifying <code>HTMLPurifier/ConfigSchema.php</code>
-        will also require this.
-    </li>
-    <li>
-        You modified a Definition, or its subsystems. The most usual candidate
-        is <code>HTMLPurifier/HTMLDefinition.php</code>, which also encompasses
-        the files in <code>HTMLPurifier/HTMLModule/</code> as well as if you've been
-        <a href="enduser-customize.html">customizing definitions</a> without
-        the cache disabled. (see <code>flush-generation-cache.php</code>)
-    </li>
-    <li>
-        You modified source files, and have been using the standalone
-        version from the full installation. (see <code>generate-standalone.php</code>)
-    </li>
-</ul>
-
-<p>
-    You can check out the corresponding scripts for more information on what they
-    do.
-</p>
-
-</body></html>
-
-<!-- vim: et sw=4 sts=4
--->
diff --git a/lib/htmlpurifier/docs/dev-includes.txt b/lib/htmlpurifier/docs/dev-includes.txt
deleted file mode 100644
index d3382b593..000000000
--- a/lib/htmlpurifier/docs/dev-includes.txt
+++ /dev/null
@@ -1,281 +0,0 @@
-
-INCLUDES, AUTOLOAD, BYTECODE CACHES and OPTIMIZATION
-
-The Problem
------------
-
-HTML Purifier contains a number of extra components that are not used all
-of the time, only if the user explicitly specifies that we should use
-them.
-
-Some of these optional components are optionally included (Filter,
-Language, Lexer, Printer), while others are included all the time
-(Injector, URIFilter, HTMLModule, URIScheme). We will stipulate that these
-are all developer specified: it is conceivable that certain Tokens are not
-used, but this is user-dependent and should not be trusted.
-
-We should come up with a consistent way to handle these things and ensure
-that we get the maximum performance when there are bytecode caches and
-when there are not. Unfortunately, these two goals seem contrary to each
-other.
-
-A peripheral issue is the performance of ConfigSchema, which has been
-shown to take a large, constant amount of initialization time, and is
-intricately linked to the issue of includes due to its pervasive use
-in our plugin architecture.
-
-Pros and Cons
--------------
-
-We will assume that user-based extensions will be included by them.
-
-Conditional includes:
-  Pros:
-    - User management is simplified; only a single directive needs to be set
-    - Only necessary code is included
-  Cons:
-    - Doesn't play nicely with opcode caches
-    - Adds complexity to standalone version
-    - Optional configuration directives are not exposed without a little
-      extra coaxing (not implemented yet)
-
-Include it all:
-  Pros:
-    - User management is still simple
-    - Plays nicely with opcode caches and standalone version
-    - All configuration directives are present
-  Cons:
-    - Lots of (how much?) extra code is included
-    - Classes that inherit from external libraries will cause compile
-      errors
-
-Build an include stub (Let's do this!):
-  Pros:
-    - Only necessary code is included
-    - Plays nicely with opcode caches and standalone version
-    - require (without once) can be used, see above
-    - Could further extend as a compilation to one file
-  Cons:
-    - Not implemented yet
-    - Requires user intervention and use of a command line script
-    - Standalone script must be chained to this
-    - More complex and compiled-language-like
-    - Requires a whole new class of system-wide configuration directives,
-      as configuration objects can be reused
-    - Determining what needs to be included can be complex (see above)
-    - No way of autodetecting dynamically instantiated classes
-    - Might be slow
-
-Include stubs
--------------
-
-This solution may be "just right" for users who are heavily oriented
-towards performance. However, there are a number of picky implementation
-details to work out beforehand.
-
-The number one concern is how to make the HTML Purifier files "work
-out of the box", while still being able to easily get them into a form
-that works with this setup. As the codebase stands right now, it would
-be necessary to strip out all of the require_once calls. The only way
-we could get rid of the require_once calls is to use __autoload or
-use the stub for all cases (which might not be a bad idea).
-
-    Aside
-    -----
-    An important thing to remember, however, is that these require_once's
-    are valuable data about what classes a file needs. Unfortunately, there's
-    no distinction between whether or not the file is needed all the time,
-    or whether or not it is one of our "optional" files. Thus, it is
-    effectively useless.
-
-    Deprecated
-    ----------
-    One of the things I'd like to do is have the code search for any classes
-    that are explicitly mentioned in the code. If a class isn't mentioned, I
-    get to assume that it is "optional," i.e. included via introspection.
-    The choice is either to use PHP's tokenizer or use regexps; regexps would
-    be faster but a tokenizer would be more correct. If this ends up being
-    unfeasible, adding dependency comments isn't a bad idea. (This could
-    even be done automatically by search/replacing require_once, although
-    we'd have to manually inspect the results for the optional requires.)
-
-    NOTE: This ends up not being necessary, as we're going to make the user
-    figure out all the extra classes they need, and only include the core
-    which is predetermined.
-
-Using the autoload framework with include stubs works nicely with
-introspective classes: instead of having to have require_once inside
-the function, we can let autoload do the work; we simply need to
-new $class or accept the object straight from the caller. Handling filters
-becomes a simple matter of ticking off configuration directives, and
-if ConfigSchema spits out errors, adding the necessary includes. We could
-also use the autoload framework as a fallback, in case the user forgets
-to make the include, but doesn't really care about performance.
-
-    Insight
-    -------
-    All of this talk is merely a natural extension of what our current
-    standalone functionality does. However, instead of having our code
-    perform the includes, or attempting to inline everything that possibly
-    could be used, we boot the issue to the user, making them include
-    everything or setup the fallback autoload handler.
-
-Configuration Schema
---------------------
-
-A common deficiency for all of the conditional include setups (including
-the dynamically built include PHP stub) is that if one of these
-conditionally included files includes a configuration directive, it
-is not accessible to configdoc. A stopgap solution for this problem is
-to have it piggy-back off of the data in the merge-library.php script
-to figure out what extra files it needs to include, but if the file also
-inherits classes that don't exist, we're in big trouble.
-
-I think it's high time we centralized the configuration documentation.
-However, the type checking has been a great boon for the library, and
-I'd like to keep that. The compromise is to use some other source, and
-then parse it into the ConfigSchema internal format (sans all of those
-nasty documentation strings which we really don't need at runtime) and
-serialize that for future use.
-
-The next question is that of format. XML is very verbose, and the prospect
-of setting defaults in it gives me the willies. However, this may be necessary.
-Splitting up the file into manageable chunks may alleviate this trouble,
-and we may even want to create our own format optimized for specifying
-configuration. It might look like (based off the PHPT format, which is
-nicely compact yet unambiguous and human-readable):
-
-Core.HiddenElements
-TYPE:    lookup
-DEFAULT: array('script', 'style') // auto-converted during processing
---ALIASES--
-Core.InvisibleElements, Core.StupidElements
---DESCRIPTION--
-<p>
-  Blah blah
-</p>
-
-The first line is the directive name. The lines that follow, up to the
-first --HEADER-- block, are single-line values; the blocks after that
-hold the multiline values. No value is restricted to a particular
-format: DEFAULT could very well be multiline if that would be easier.
-This would make it insanely easy, also, to add arbitrary extra parameters,
-like:
-
-VERSION:  3.0.0
-ALLOWED:  'none', 'light', 'medium', 'heavy' // this is wrapped in array()
-EXTERNAL: CSSTidy // this would be documented somewhere else with a URL
-
-The final loss would be that you wouldn't know what file the directive
-was used in; with some clever regexps it should be possible to
-figure out where $config->get($ns, $d); occurs. Reflective calls to
-the configuration object are mitigated by the fact that getBatch is
-used, so we can simply talk about that in the namespace definition page.
-This might be slow, but it would only happen when we are creating
-the documentation for consumption, and is sugar.
-
-We can put this in a schema/ directory, outside of HTML Purifier. The serialized
-data gets treated like entities.ser.
-
-The final thing that needs to be handled is user defined configurations.
-They can be added at runtime using ConfigSchema::registerDirectory()
-which globs the directory and grabs all of the directives to be incorporated
-in. Then, the result is saved. We may want to take advantage of the
-DefinitionCache framework, although it is not altogether certain what
-configuration directives would be used to generate our key (meta-directives!)
-
-    Further thoughts
-    ----------------
-    Our master configuration schema will only need to be updated once
-    every new version, so it's easily versionable. User specified
-    schema files are far more volatile, but it's far too expensive
-    to check the filemtimes of all the files, so a DefinitionRev style
-    mechanism works better. However, we can uniquely identify the
-    schema based on the directories they loaded, so there's no need
-    for a DefinitionId until we give them full programmatic control.
-
-    These variables should be directly incorporated into ConfigSchema,
-    and ConfigSchema should handle serialization. Some refactoring will be
-    necessary for the DefinitionCache classes, as they are built with
-    Config in mind. If the user changes something, the cache file gets
-    rebuilt. If the version changes, the cache file gets rebuilt. Since
-    our unit tests flush the caches before we start, and the operation is
-    pretty fast, this will not negatively impact unit testing.
-
-One last thing: certain configuration directives require that files
-get added. They may even be specified dynamically. It is not a good idea
-for the HTMLPurifier_Config object to be used directly for such matters.
-Instead, the userland code should explicitly perform the includes. We may
-put in something like:
-
-REQUIRES: HTMLPurifier_Filter_ExtractStyleBlocks
-
-To indicate that if that class doesn't exist, and the user is attempting
-to use the directive, we should fatally error out. The stub includes the core files,
-and the user includes everything else. Any reflective things like new
-$class would be required to tie in with the configuration.
-
-It would work very well with rarely used configuration options, but it
-wouldn't be so good for "core" parts that can be disabled. In such cases
-the core include file would need to be modified, and the only way
-to properly do this is to use the configuration object. Once again, our
-ability to create cache keys saves the day: we can create arbitrary
-stub files for arbitrary configurations and include those. They could
-even be the single file affairs. The only thing we'd need to include,
-then, would be HTMLPurifier_Config! Then, the configuration object would
-load the library.
-
-    An aside...
-    -----------
-    One questions, however, the wisdom of letting PHP files write other PHP
-    files. It seems like a recipe for disaster, or at least lots of headaches
-    in highly secured setups, where PHP does not have the ability to write
-    to its root. In such cases, we could use sticky bits or tell the user
-    to manually generate the file.
-
-    The other troublesome bit is actually doing the calculations necessary.
-    For certain cases, it's simple (such as URIScheme), but for AttrDef
-    and HTMLModule the dependency trees are very complex in relation to
-    %HTML.Allowed and friends. I think that this idea should be shelved
-    and looked at at a later, less insane date.
-
-An interesting dilemma presents itself when a configuration form is offered
-to the user. Normally, the configuration object is not accessible without
-editing PHP code; this facility changes things. The sensible thing to do
-is stipulate that all classes required by the directives you allow must
-be included.
-
-Unit testing
-------------
-
-Setting up the parsing and translation into our existing format would not
-be difficult to do. It might represent a good time for us to rethink our
-tests for these facilities; as creative as they are, they are often hacky
-and require public visibility for things that ought to be protected.
-This is especially applicable for our DefinitionCache tests.
-
-Migration
----------
-
-Because we are not *adding* anything essentially new, it should be trivial
-to write a script to take our existing data and dump it into the new format.
-Well, not trivial, but fairly easy to accomplish. Primary implementation
-difficulties would probably involve formatting the file nicely.
-
-Backwards-compatibility
------------------------
-
-I expect that the ConfigSchema methods should stick around for a little bit,
-but display E_USER_NOTICE warnings that they are deprecated. This will
-require documentation!
-
-New stuff
----------
-
-VERSION: Version number in which the directive was introduced
-DEPRECATED-VERSION: If the directive was deprecated, when was it deprecated?
-DEPRECATED-USE: If the directive was deprecated, what should the user use now?
-REQUIRES: What classes does this configuration directive require, but are
-    not part of the HTML Purifier core?
-
-    vim: et sw=4 sts=4
diff --git a/lib/htmlpurifier/docs/dev-naming.html b/lib/htmlpurifier/docs/dev-naming.html
deleted file mode 100644
index cea4b006f..000000000
--- a/lib/htmlpurifier/docs/dev-naming.html
+++ /dev/null
@@ -1,83 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
-    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
-<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-<meta name="description" content="Defines class naming conventions in HTML Purifier." />
-<link rel="stylesheet" type="text/css" href="./style.css" />
-
-<title>Naming Conventions - HTML Purifier</title>
-
-</head><body>
-
-<h1>Naming Conventions</h1>
-
-<div id="filing">Filed under Development</div>
-<div id="index">Return to the <a href="index.html">index</a>.</div>
-<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
-
-<p>The classes in this library follow a few naming conventions, which may
-help you find the correct functionality more quickly.  Here they are:</p>
-
-<dl>
-
-<dt>All classes occupy the HTMLPurifier pseudo-namespace.</dt>
-    <dd>This means that all classes are prefixed with HTMLPurifier_.  As such, all
-    names under HTMLPurifier_ are reserved.  I recommend that you use the name
-    HTMLPurifierX_YourName_ClassName, especially if you want to take advantage
-    of HTMLPurifier_ConfigDef.</dd>
-
-<dt>All classes correspond to their path if library/ was in the include path</dt>
-    <dd>HTMLPurifier_AttrDef is located at HTMLPurifier/AttrDef.php; replace
-    underscores with slashes and append .php and you'll have the location of
-    the class.</dd>
-
-<dt>Harness and Test are reserved class names for unit tests</dt>
-    <dd>The suffix <code>Test</code> indicates that the class is a subclass of UnitTestCase
-    (of the Simpletest library) and is testable. "Harness" indicates a subclass
-    of UnitTestCase that is not meant to be run but to be extended into
-    concrete test cases and contains custom test methods (i.e. assert*())</dd>
-
-<dt>Class names do not necessarily represent inheritance hierarchies</dt>
-    <dd>While we try to reflect inheritance in naming to some extent, it is not
-    guaranteed (for instance, none of the classes inherit from HTMLPurifier,
-    the base class).  However, all class files have the require_once
-    declarations to whichever classes they are tightly coupled to.</dd>
-
-<dt>Strategy has a meaning different from the Gang of Four pattern</dt>
-    <dd>In Design Patterns, the Gang of Four describes a Strategy object as
-    encapsulating an algorithm so that algorithms can be switched at run-time.  While
-    our strategies are indeed algorithms, they are not meant to be substituted:
-    all must be present in order for proper functioning.</dd>
-
-<dt>Abbreviations are avoided</dt>
-    <dd>We try to avoid abbreviations as much as possible, but in some cases,
-    the abbreviated version is more readable than the full version. Here, we
-    list common abbreviations:
-    <ul>
-        <li>Attr to Attributes (note that it is plural, i.e. <code>$attr = array()</code>)</li>
-        <li>Def to Definition</li>
-        <li><code>$ret</code> is the value to be returned in a function</li>
-    </ul>
-    </dd>
-
-<dt>Ambiguity concerning the definition of Def/Definition</dt>
-    <dd>While a definition normally defines the structure/acceptable values of
-    an entity, most of the definitions in this application also attempt
-    to validate and fix the value.  I am unsure of a better name, as
-    "Validator" would exclude fixing the value, "Fixer" doesn't invoke
-    the proper image of "fixing" something, and "ValidatorFixer" is too long!
-    Some other suggestions were "Handler", "Reference", "Check", "Fix",
-    "Repair" and "Heal".</dd>
-
-<dt>Transform not Transformer</dt>
-    <dd>Transform is both a noun and a verb, and thus we define a "Transform" as
-    something that "transforms," leaving "Transformer" (which sounds like an
-    electrical device/robot toy).</dd>
-
-</dl>
-
-</body></html>
-
-<!-- vim: et sw=4 sts=4
--->
diff --git a/lib/htmlpurifier/docs/dev-optimization.html b/lib/htmlpurifier/docs/dev-optimization.html
deleted file mode 100644
index 78f565813..000000000
--- a/lib/htmlpurifier/docs/dev-optimization.html
+++ /dev/null
@@ -1,33 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
-    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
-<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-<meta name="description" content="Discusses possible methods of optimizing HTML Purifier." />
-<link rel="stylesheet" type="text/css" href="./style.css" />
-
-<title>Optimization - HTML Purifier</title>
-
-</head><body>
-
-<h1>Optimization</h1>
-
-<div id="filing">Filed under Development</div>
-<div id="index">Return to the <a href="index.html">index</a>.</div>
-<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
-
-<p>Here are some possible optimization techniques we can apply to code sections if
-they turn out to be slow.  Be sure not to prematurely optimize: if you get
-that itch, put it here!</p>
-
-<ul>
-    <li>Make Tokens Flyweights (may prove problematic, probably not worth it)</li>
-    <li>Rewrite regexps into PHP code</li>
-    <li>Batch regexp validation (do as many per function call as possible)</li>
-    <li>Parallelize strategies</li>
-</ul>
-
-</body></html>
-
-<!-- vim: et sw=4 sts=4
--->
diff --git a/lib/htmlpurifier/docs/dev-progress.html b/lib/htmlpurifier/docs/dev-progress.html
deleted file mode 100644
index 105896ed6..000000000
--- a/lib/htmlpurifier/docs/dev-progress.html
+++ /dev/null
@@ -1,309 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
-    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
-<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-<meta name="description" content="Tables detailing HTML element and CSS property implementation coverage in HTML Purifier." />
-<link rel="stylesheet" type="text/css" href="./style.css" />
-
-<title>Implementation Progress - HTML Purifier</title>
-
-<style type="text/css">
-
-td {padding-right:1em;border-bottom:1px solid #000;padding-left:0.5em;}
-th {text-align:left;padding-top:1.4em;font-size:13pt;
-    border-bottom:2px solid #000;background:#FFF;}
-thead th {text-align:left;padding:0.1em;background-color:#EEE;}
-
-.impl-yes {background:#9D9;}
-.impl-partial {background:#FFA;}
-.impl-no {background:#CCC;}
-
-.danger {color:#600;}
-.css1 {color:#060;}
-.required {font-weight:bold;}
-.feature {color:#999;}
-
-</style>
-
-</head><body>
-
-<h1>Implementation Progress</h1>
-
-<div id="filing">Filed under Development</div>
-<div id="index">Return to the <a href="index.html">index</a>.</div>
-<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
-
-<p>
-  <strong>Warning:</strong> This table is kept for historical purposes and
-  is not being actively updated.
-</p>
-
-<h2>Key</h2>
-
-<table cellspacing="0"><tbody>
-<tr><td class="impl-yes">Implemented</td></tr>
-<tr><td class="impl-partial">Partially implemented</td></tr>
-<tr><td class="impl-no">Not a priority to implement</td></tr>
-<tr><td class="danger">Dangerous attribute/property</td></tr>
-<tr><td class="css1">Present in CSS1</td></tr>
-<tr><td class="feature">Feature, requires extra work</td></tr>
-</tbody></table>
-
-<h2>CSS</h2>
-
-<table cellspacing="0">
-
-<thead>
-<tr><th>Name</th><th>Notes</th></tr>
-</thead>
-
-<!--
-<tr><td>-</td><td>-</td></tr>
--->
-
-<tbody>
-<tr><th colspan="2">Standard</th></tr>
-<tr class="css1 impl-yes"><td>background-color</td><td>COMPOSITE(&lt;color&gt;, transparent)</td></tr>
-<tr class="css1 impl-yes"><td>background</td><td>SHORTHAND, currently alias for background-color</td></tr>
-<tr class="css1 impl-yes"><td>border</td><td>SHORTHAND, MULTIPLE</td></tr>
-<tr class="css1 impl-yes"><td>border-color</td><td>MULTIPLE</td></tr>
-<tr class="css1 impl-yes"><td>border-style</td><td>MULTIPLE</td></tr>
-<tr class="css1 impl-yes"><td>border-width</td><td>MULTIPLE</td></tr>
-<tr class="css1 impl-yes"><td>border-*</td><td>SHORTHAND</td></tr>
-<tr class="impl-yes"><td>border-*-color</td><td>COMPOSITE(&lt;color&gt;, transparent)</td></tr>
-<tr class="impl-yes"><td>border-*-style</td><td>ENUM(none, hidden, dotted, dashed,
-    solid, double, groove, ridge, inset, outset)</td></tr>
-<tr class="css1 impl-yes"><td>border-*-width</td><td>COMPOSITE(&lt;length&gt;, thin, medium, thick)</td></tr>
-<tr class="css1 impl-yes"><td>clear</td><td>ENUM(none, left, right, both)</td></tr>
-<tr class="css1 impl-yes"><td>color</td><td>&lt;color&gt;</td></tr>
-<tr class="css1 impl-yes"><td>float</td><td>ENUM(left, right, none), May require layout
-    precautions with clear</td></tr>
-<tr class="css1 impl-yes"><td>font</td><td>SHORTHAND</td></tr>
-<tr class="css1 impl-yes"><td>font-family</td><td>CSS validator may complain if fallback font
-    family not specified</td></tr>
-<tr class="css1 impl-yes"><td>font-size</td><td>COMPOSITE(&lt;absolute-size&gt;,
-    &lt;relative-size&gt;, &lt;length&gt;, &lt;percentage&gt;)</td></tr>
-<tr class="css1 impl-yes"><td>font-style</td><td>ENUM(normal, italic, oblique)</td></tr>
-<tr class="css1 impl-yes"><td>font-variant</td><td>ENUM(normal, small-caps)</td></tr>
-<tr class="css1 impl-yes"><td>font-weight</td><td>ENUM(normal, bold, bolder, lighter,
-    100, 200, 300, 400, 500, 600, 700, 800, 900), maybe special code for
-    in-between integers</td></tr>
-<tr class="css1 impl-yes"><td>letter-spacing</td><td>COMPOSITE(&lt;length&gt;, normal)</td></tr>
-<tr class="css1 impl-yes"><td>line-height</td><td>COMPOSITE(&lt;number&gt;,
-    &lt;length&gt;, &lt;percentage&gt;, normal)</td></tr>
-<tr class="css1 impl-yes"><td>list-style-position</td><td>ENUM(inside, outside),
-    Strange behavior in browsers</td></tr>
-<tr class="css1 impl-yes"><td>list-style-type</td><td>ENUM(...),
-    Well-supported values are: disc, circle, square,
-    decimal, lower-roman, upper-roman, lower-alpha and upper-alpha. See also
-    CSS 3. Support is lacking mostly in IE.</td></tr>
-<tr class="css1 impl-yes"><td>list-style</td><td>SHORTHAND</td></tr>
-<tr class="css1 impl-yes"><td>margin</td><td>MULTIPLE</td></tr>
-<tr class="css1 impl-yes"><td>margin-*</td><td>COMPOSITE(&lt;length&gt;,
-    &lt;percentage&gt;, auto)</td></tr>
-<tr class="css1 impl-yes"><td>padding</td><td>MULTIPLE</td></tr>
-<tr class="css1 impl-yes"><td>padding-*</td><td>COMPOSITE(&lt;length&gt;(positive),
-    &lt;percentage&gt;(positive))</td></tr>
-<tr class="css1 impl-yes"><td>text-align</td><td>ENUM(left, right,
-    center, justify)</td></tr>
-<tr class="css1 impl-yes"><td>text-decoration</td><td>No blink (argh my eyes), not
-    enum, can be combined (composite sorta): underline, overline,
-    line-through</td></tr>
-<tr class="css1 impl-yes"><td>text-indent</td><td>COMPOSITE(&lt;length&gt;,
-    &lt;percentage&gt;)</td></tr>
-<tr class="css1 impl-yes"><td>text-transform</td><td>ENUM(capitalize, uppercase,
-    lowercase, none)</td></tr>
-<tr class="css1 impl-yes"><td>width</td><td>COMPOSITE(&lt;length&gt;,
-    &lt;percentage&gt;, auto), Interesting</td></tr>
-<tr class="css1 impl-yes"><td>word-spacing</td><td>COMPOSITE(&lt;length&gt;, normal),
-    IE 5 no support</td></tr>
-</tbody>
-
-<tbody>
-<tr><th colspan="2">Table</th></tr>
-<tr class="impl-yes"><td>border-collapse</td><td>ENUM(collapse, separate)</td></tr>
-<tr class="impl-yes"><td>border-spacing</td><td>MULTIPLE</td></tr>
-<tr class="impl-yes"><td>caption-side</td><td>ENUM(top, bottom)</td></tr>
-<tr class="feature"><td>empty-cells</td><td>ENUM(show, hide), No IE support makes this useless,
-    possible fix with &amp;nbsp;? Unknown release milestone.</td></tr>
-<tr class="impl-yes"><td>table-layout</td><td>ENUM(auto, fixed)</td></tr>
-<tr class="impl-yes css1"><td>vertical-align</td><td>COMPOSITE(ENUM(baseline, sub,
-    super, top, text-top, middle, bottom, text-bottom), &lt;percentage&gt;,
-    &lt;length&gt;) Also applies to others with explicit height</td></tr>
-</tbody>
-
-<tbody>
-<tr><th colspan="2">Absolute positioning, unknown release milestone</th></tr>
-<tr class="danger impl-no"><td>bottom</td><td rowspan="4">Dangerous, must be non-negative to even be considered,
-    but it's still possible to arbitrarily position by running over.</td></tr>
-<tr class="danger impl-no"><td>left</td></tr>
-<tr class="danger impl-no"><td>right</td></tr>
-<tr class="danger impl-no"><td>top</td></tr>
-<tr class="impl-no"><td>clip</td><td>-</td></tr>
-<tr class="danger impl-no"><td>position</td><td>ENUM(static, relative, absolute, fixed)
-    relative not absolute?</td></tr>
-<tr class="danger impl-no"><td>z-index</td><td>Dangerous</td></tr>
-</tbody>
-
-<tbody>
-<tr><th colspan="2">Unknown</th></tr>
-<tr class="danger css1 impl-yes"><td>background-image</td><td>Dangerous</td></tr>
-<tr class="css1 impl-yes"><td>background-attachment</td><td>ENUM(scroll, fixed),
-    Depends on background-image</td></tr>
-<tr class="css1 impl-yes"><td>background-position</td><td>Depends on background-image</td></tr>
-<tr class="danger impl-no"><td>cursor</td><td>Dangerous but fluffy</td></tr>
-<tr class="danger impl-yes"><td>display</td><td>ENUM(...), Dangerous but interesting;
-    will not implement list-item, run-in (Opera only) or table (no IE);
-    inline-block has incomplete IE6 support and requires -moz-inline-box
-    for Mozilla. Unknown target milestone.</td></tr>
-<tr class="css1 impl-yes"><td>height</td><td>Interesting, why use it? Unknown target milestone.</td></tr>
-<tr class="danger css1 impl-yes"><td>list-style-image</td><td>Dangerous?</td></tr>
-<tr class="impl-no"><td>max-height</td><td rowspan="4">No IE 5/6</td></tr>
-<tr class="impl-no"><td>min-height</td></tr>
-<tr class="impl-no"><td>max-width</td></tr>
-<tr class="impl-no"><td>min-width</td></tr>
-<tr class="impl-no"><td>orphans</td><td>No IE support</td></tr>
-<tr class="impl-no"><td>widows</td><td>No IE support</td></tr>
-<tr><td>overflow</td><td>ENUM, IE 5/6 almost (remove visible if set). Unknown target milestone.</td></tr>
-<tr><td>page-break-after</td><td>ENUM(auto, always, avoid, left, right),
-    IE 5.5/6 and Opera. Unknown target milestone.</td></tr>
-<tr><td>page-break-before</td><td>ENUM(auto, always, avoid, left, right),
-    Mostly supported. Unknown target milestone.</td></tr>
-<tr><td>page-break-inside</td><td>ENUM(avoid, auto), Opera only. Unknown target milestone.</td></tr>
-<tr class="impl-no"><td>quotes</td><td>May be dropped from CSS2, fairly useless for inline context</td></tr>
-<tr class="danger impl-yes"><td>visibility</td><td>ENUM(visible, hidden, collapse),
-    Dangerous</td></tr>
-<tr class="css1 feature impl-partial"><td>white-space</td><td>ENUM(normal, pre, nowrap, pre-wrap,
-    pre-line), Spotty implementation:
-    pre (no IE 5/6), <em>nowrap</em> (no IE 5, supported),
-    pre-wrap (only Opera), pre-line (no support). Fixable? Unknown target milestone.</td></tr>
-</tbody>
-
-<tbody class="impl-no">
-<tr><th colspan="2">Aural</th></tr>
-<tr><td>azimuth</td><td>-</td></tr>
-<tr><td>cue</td><td>-</td></tr>
-<tr><td>cue-after</td><td>-</td></tr>
-<tr><td>cue-before</td><td>-</td></tr>
-<tr><td>elevation</td><td>-</td></tr>
-<tr><td>pause-after</td><td>-</td></tr>
-<tr><td>pause-before</td><td>-</td></tr>
-<tr><td>pause</td><td>-</td></tr>
-<tr><td>pitch-range</td><td>-</td></tr>
-<tr><td>pitch</td><td>-</td></tr>
-<tr><td>play-during</td><td>-</td></tr>
-<tr><td>richness</td><td>-</td></tr>
-<tr><td>speak-header</td><td>Table related</td></tr>
-<tr><td>speak-numeral</td><td>-</td></tr>
-<tr><td>speak-punctuation</td><td>-</td></tr>
-<tr><td>speak</td><td>-</td></tr>
-<tr><td>speech-rate</td><td>-</td></tr>
-<tr><td>stress</td><td>-</td></tr>
-<tr><td>voice-family</td><td>-</td></tr>
-<tr><td>volume</td><td>-</td></tr>
-</tbody>
-
-<tbody class="impl-no">
-<tr><th colspan="2">Will not implement</th></tr>
-<tr><td>content</td><td>Not applicable for inline styles</td></tr>
-<tr><td>counter-increment</td><td>Needs content, Opera only</td></tr>
-<tr><td>counter-reset</td><td>Needs content, Opera only</td></tr>
-<tr><td>direction</td><td>No support</td></tr>
-<tr><td>outline-color</td><td rowspan="4">IE Mac and Opera on outside,
-Mozilla on inside and needs -moz-outline, no IE support.</td></tr>
-    <tr><td>outline-style</td></tr>
-    <tr><td>outline-width</td></tr>
-    <tr><td>outline</td></tr>
-<tr><td>unicode-bidi</td><td>No support</td></tr>
-</tbody>
-
-</table>
-
-<h2>Interesting Attributes</h2>
-
-<table cellspacing="0">
-
-<thead>
-<tr><th>Attribute</th><th>Tags</th><th>Notes</th></tr>
-</thead>
-
-<!--
-<tr><th></th></tr>
-<tbody>
-<tr><td>-</td><td>-</td><td>-</td></tr>
-</tbody>
--->
-
-<tbody>
-<tr><th colspan="3">CSS</th></tr>
-<tr class="impl-yes"><td>style</td><td>All</td><td>Parser is reasonably functional. Status here doesn't count individual properties.</td></tr>
-</tbody>
-
-<tbody>
-<tr><th colspan="3">Questionable</th></tr>
-<tr class="impl-no"><td>accesskey</td><td>A</td><td>May interfere with main interface</td></tr>
-<tr class="impl-no"><td>tabindex</td><td>A</td><td>May interfere with main interface</td></tr>
-<tr class="impl-yes"><td>target</td><td>A</td><td>Config enabled, only useful for frame layouts, disallowed in strict</td></tr>
-</tbody>
-
-<tbody>
-<tr><th colspan="3">Miscellaneous</th></tr>
-<tr><td>datetime</td><td>DEL, INS</td><td>No visible effect, ISO format</td></tr>
-<tr class="impl-yes"><td>rel</td><td>A</td><td>Largely user-defined: nofollow, tag (see microformats)</td></tr>
-<tr class="impl-yes"><td>rev</td><td>A</td><td>Largely user-defined: vote-*</td></tr>
-<tr class="feature"><td>axis</td><td>TD, TH</td><td>W3C only: No browser implementation</td></tr>
-<tr class="feature"><td>char</td><td>COL, COLGROUP, TBODY, TD, TFOOT, TH, THEAD, TR</td><td>W3C only: No browser implementation</td></tr>
-<tr class="feature"><td>headers</td><td>TD, TH</td><td>W3C only: No browser implementation</td></tr>
-<tr class="impl-yes"><td>scope</td><td>TD, TH</td><td>W3C only: No browser implementation</td></tr>
-</tbody>
-
-<tbody class="impl-yes">
-<tr><th colspan="3">URI</th></tr>
-<tr><td rowspan="2">cite</td><td>BLOCKQUOTE, Q</td><td>For attribution</td></tr>
-    <tr><td>DEL, INS</td><td>Link to explanation why it changed</td></tr>
-<tr><td>href</td><td>A</td><td>-</td></tr>
-<tr><td>longdesc</td><td>IMG</td><td>-</td></tr>
-<tr class="required"><td>src</td><td>IMG</td><td>Required</td></tr>
-</tbody>
-
-<tbody>
-<tr><th colspan="3">Transform</th></tr>
-<tr class="impl-yes"><td rowspan="5">align</td><td>CAPTION</td><td>'caption-side' for top/bottom, 'text-align' for left/right</td></tr>
-    <tr class="impl-yes"><td>IMG</td><td rowspan="3">See specimens/html-align-to-css.html</td></tr>
-    <tr class="impl-yes"><td>TABLE</td></tr>
-    <tr class="impl-yes"><td>HR</td></tr>
-    <tr class="impl-yes"><td>H1, H2, H3, H4, H5, H6, P</td><td>Equivalent style 'text-align'</td></tr>
-<tr class="required impl-yes"><td>alt</td><td>IMG</td><td>Required, insert image filename if src is present or default invalid image text</td></tr>
-<tr class="impl-yes"><td rowspan="3">bgcolor</td><td>TABLE</td><td>Superset style 'background-color'</td></tr>
-    <tr class="impl-yes"><td>TR</td><td>Superset style 'background-color'</td></tr>
-    <tr class="impl-yes"><td>TD, TH</td><td>Superset style 'background-color'</td></tr>
-<tr class="impl-yes"><td>border</td><td>IMG</td><td>Equivalent style <code>border:[number]px solid</code></td></tr>
-<tr class="impl-yes"><td>clear</td><td>BR</td><td>Near-equiv style 'clear', transform 'all' into 'both'</td></tr>
-<tr class="impl-no"><td>compact</td><td>DL, OL, UL</td><td>Boolean, needs custom CSS class; rarely used anyway</td></tr>
-<tr class="required impl-yes"><td>dir</td><td>BDO</td><td>Required, insert ltr (or configuration value) if none</td></tr>
-<tr class="impl-yes"><td>height</td><td>TD, TH</td><td>Near-equiv style 'height', needs px suffix if original was in pixels</td></tr>
-<tr class="impl-yes"><td>hspace</td><td>IMG</td><td>Near-equiv styles 'margin-left' and 'margin-right', needs px suffix</td></tr>
-<tr class="impl-yes"><td>lang</td><td>*</td><td>Copy value to xml:lang</td></tr>
-<tr class="impl-yes"><td rowspan="2">name</td><td>IMG</td><td>Turn into ID</td></tr>
-    <tr class="impl-yes"><td>A</td><td>Turn into ID</td></tr>
-<tr class="impl-yes"><td>noshade</td><td>HR</td><td>Boolean, style 'border-style:solid;'</td></tr>
-<tr class="impl-yes"><td>nowrap</td><td>TD, TH</td><td>Boolean, style 'white-space:nowrap;' (not compat with IE5)</td></tr>
-<tr class="impl-yes"><td>size</td><td>HR</td><td>Near-equiv 'height', needs px suffix if original was pixels</td></tr>
-<tr class="required impl-yes"><td>src</td><td>IMG</td><td>Required, insert blank or default img if not set</td></tr>
-<tr class="impl-yes"><td>start</td><td>OL</td><td>Poorly supported 'counter-reset', allowed in loose, dropped in strict</td></tr>
-<tr class="impl-yes"><td rowspan="3">type</td><td>LI</td><td rowspan="3">Equivalent style 'list-style-type', different allowed values though. (needs testing)</td></tr>
-    <tr class="impl-yes"><td>OL</td></tr>
-    <tr class="impl-yes"><td>UL</td></tr>
-<tr class="impl-yes"><td>value</td><td>LI</td><td>Poorly supported 'counter-reset', allowed in loose, dropped in strict</td></tr>
-<tr class="impl-yes"><td>vspace</td><td>IMG</td><td>Near-equiv styles 'margin-top' and 'margin-bottom', needs px suffix, see hspace</td></tr>
-<tr class="impl-yes"><td rowspan="2">width</td><td>HR</td><td rowspan="2">Near-equiv style 'width', needs px suffix if original was pixels</td></tr>
-    <tr class="impl-yes"><td>TD, TH</td></tr>
-</tbody>
-
-</table>
-
-</body></html>
-
-<!-- vim: et sw=4 sts=4
--->
diff --git a/lib/htmlpurifier/docs/dtd/xhtml1-transitional.dtd b/lib/htmlpurifier/docs/dtd/xhtml1-transitional.dtd
deleted file mode 100644
index 628f27ac5..000000000
--- a/lib/htmlpurifier/docs/dtd/xhtml1-transitional.dtd
+++ /dev/null
@@ -1,1201 +0,0 @@
-<!--
-   Extensible HTML version 1.0 Transitional DTD
-
-   This is the same as HTML 4 Transitional except for
-   changes due to the differences between XML and SGML.
-
-   Namespace = http://www.w3.org/1999/xhtml
-
-   For further information, see: http://www.w3.org/TR/xhtml1
-
-   Copyright (c) 1998-2002 W3C (MIT, INRIA, Keio),
-   All Rights Reserved. 
-
-   This DTD module is identified by the PUBLIC and SYSTEM identifiers:
-
-   PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
-   SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
-
-   $Revision: 1.2 $
-   $Date: 2002/08/01 18:37:55 $
-
--->
-
-<!--================ Character mnemonic entities =========================-->
-
-<!ENTITY % HTMLlat1 PUBLIC
-   "-//W3C//ENTITIES Latin 1 for XHTML//EN"
-   "xhtml-lat1.ent">
-%HTMLlat1;
-
-<!ENTITY % HTMLsymbol PUBLIC
-   "-//W3C//ENTITIES Symbols for XHTML//EN"
-   "xhtml-symbol.ent">
-%HTMLsymbol;
-
-<!ENTITY % HTMLspecial PUBLIC
-   "-//W3C//ENTITIES Special for XHTML//EN"
-   "xhtml-special.ent">
-%HTMLspecial;
-
-<!--================== Imported Names ====================================-->
-
-<!ENTITY % ContentType "CDATA">
-    <!-- media type, as per [RFC2045] -->
-
-<!ENTITY % ContentTypes "CDATA">
-    <!-- comma-separated list of media types, as per [RFC2045] -->
-
-<!ENTITY % Charset "CDATA">
-    <!-- a character encoding, as per [RFC2045] -->
-
-<!ENTITY % Charsets "CDATA">
-    <!-- a space separated list of character encodings, as per [RFC2045] -->
-
-<!ENTITY % LanguageCode "NMTOKEN">
-    <!-- a language code, as per [RFC3066] -->
-
-<!ENTITY % Character "CDATA">
-    <!-- a single character, as per section 2.2 of [XML] -->
-
-<!ENTITY % Number "CDATA">
-    <!-- one or more digits -->
-
-<!ENTITY % LinkTypes "CDATA">
-    <!-- space-separated list of link types -->
-
-<!ENTITY % MediaDesc "CDATA">
-    <!-- single or comma-separated list of media descriptors -->
-
-<!ENTITY % URI "CDATA">
-    <!-- a Uniform Resource Identifier, see [RFC2396] -->
-
-<!ENTITY % UriList "CDATA">
-    <!-- a space separated list of Uniform Resource Identifiers -->
-
-<!ENTITY % Datetime "CDATA">
-    <!-- date and time information. ISO date format -->
-
-<!ENTITY % Script "CDATA">
-    <!-- script expression -->
-
-<!ENTITY % StyleSheet "CDATA">
-    <!-- style sheet data -->
-
-<!ENTITY % Text "CDATA">
-    <!-- used for titles etc. -->
-
-<!ENTITY % FrameTarget "NMTOKEN">
-    <!-- render in this frame -->
-
-<!ENTITY % Length "CDATA">
-    <!-- nn for pixels or nn% for percentage length -->
-
-<!ENTITY % MultiLength "CDATA">
-    <!-- pixel, percentage, or relative -->
-
-<!ENTITY % Pixels "CDATA">
-    <!-- integer representing length in pixels -->
-
-<!-- these are used for image maps -->
-
-<!ENTITY % Shape "(rect|circle|poly|default)">
-
-<!ENTITY % Coords "CDATA">
-    <!-- comma separated list of lengths -->
-
-<!-- used for object, applet, img, input and iframe -->
-<!ENTITY % ImgAlign "(top|middle|bottom|left|right)">
-
-<!-- a color using sRGB: #RRGGBB as Hex values -->
-<!ENTITY % Color "CDATA">
-
-<!-- There are also 16 widely known color names with their sRGB values:
-
-    Black  = #000000    Green  = #008000
-    Silver = #C0C0C0    Lime   = #00FF00
-    Gray   = #808080    Olive  = #808000
-    White  = #FFFFFF    Yellow = #FFFF00
-    Maroon = #800000    Navy   = #000080
-    Red    = #FF0000    Blue   = #0000FF
-    Purple = #800080    Teal   = #008080
-    Fuchsia= #FF00FF    Aqua   = #00FFFF
--->
-
-<!--=================== Generic Attributes ===============================-->
-
-<!-- core attributes common to most elements
-  id       document-wide unique id
-  class    space separated list of classes
-  style    associated style info
-  title    advisory title/amplification
--->
-<!ENTITY % coreattrs
- "id          ID             #IMPLIED
-  class       CDATA          #IMPLIED
-  style       %StyleSheet;   #IMPLIED
-  title       %Text;         #IMPLIED"
-  >
-
-<!-- internationalization attributes
-  lang        language code (backwards compatible)
-  xml:lang    language code (as per XML 1.0 spec)
-  dir         direction for weak/neutral text
--->
-<!ENTITY % i18n
- "lang        %LanguageCode; #IMPLIED
-  xml:lang    %LanguageCode; #IMPLIED
-  dir         (ltr|rtl)      #IMPLIED"
-  >
-
-<!-- attributes for common UI events
-  onclick     a pointer button was clicked
-  ondblclick  a pointer button was double clicked
-  onmousedown a pointer button was pressed down
-  onmouseup   a pointer button was released
-  onmousemove a pointer was moved onto the element
-  onmouseout  a pointer was moved away from the element
-  onkeypress  a key was pressed and released
-  onkeydown   a key was pressed down
-  onkeyup     a key was released
--->
-<!ENTITY % events
- "onclick     %Script;       #IMPLIED
-  ondblclick  %Script;       #IMPLIED
-  onmousedown %Script;       #IMPLIED
-  onmouseup   %Script;       #IMPLIED
-  onmouseover %Script;       #IMPLIED
-  onmousemove %Script;       #IMPLIED
-  onmouseout  %Script;       #IMPLIED
-  onkeypress  %Script;       #IMPLIED
-  onkeydown   %Script;       #IMPLIED
-  onkeyup     %Script;       #IMPLIED"
-  >
-
-<!-- attributes for elements that can get the focus
-  accesskey   accessibility key character
-  tabindex    position in tabbing order
-  onfocus     the element got the focus
-  onblur      the element lost the focus
--->
-<!ENTITY % focus
- "accesskey   %Character;    #IMPLIED
-  tabindex    %Number;       #IMPLIED
-  onfocus     %Script;       #IMPLIED
-  onblur      %Script;       #IMPLIED"
-  >
-
-<!ENTITY % attrs "%coreattrs; %i18n; %events;">
-
-<!-- text alignment for p, div, h1-h6. The default is
-     align="left" for ltr headings, "right" for rtl -->
-
-<!ENTITY % TextAlign "align (left|center|right|justify) #IMPLIED">
-
-<!--=================== Text Elements ====================================-->
-
-<!ENTITY % special.extra
-   "object | applet | img | map | iframe">
-	
-<!ENTITY % special.basic
-	"br | span | bdo">
-
-<!ENTITY % special
-   "%special.basic; | %special.extra;">
-
-<!ENTITY % fontstyle.extra "big | small | font | basefont">
-
-<!ENTITY % fontstyle.basic "tt | i | b | u
-                      | s | strike ">
-
-<!ENTITY % fontstyle "%fontstyle.basic; | %fontstyle.extra;">
-
-<!ENTITY % phrase.extra "sub | sup">
-<!ENTITY % phrase.basic "em | strong | dfn | code | q |
-                   samp | kbd | var | cite | abbr | acronym">
-
-<!ENTITY % phrase "%phrase.basic; | %phrase.extra;">
-
-<!ENTITY % inline.forms "input | select | textarea | label | button">
-
-<!-- these can occur at block or inline level -->
-<!ENTITY % misc.inline "ins | del | script">
-
-<!-- these can only occur at block level -->
-<!ENTITY % misc "noscript | %misc.inline;">
-
-<!ENTITY % inline "a | %special; | %fontstyle; | %phrase; | %inline.forms;">
-
-<!-- %Inline; covers inline or "text-level" elements -->
-<!ENTITY % Inline "(#PCDATA | %inline; | %misc.inline;)*">
-
-<!--================== Block level elements ==============================-->
-
-<!ENTITY % heading "h1|h2|h3|h4|h5|h6">
-<!ENTITY % lists "ul | ol | dl | menu | dir">
-<!ENTITY % blocktext "pre | hr | blockquote | address | center | noframes">
-
-<!ENTITY % block
-    "p | %heading; | div | %lists; | %blocktext; | isindex |fieldset | table">
-
-<!-- %Flow; mixes block and inline and is used for list items etc. -->
-<!ENTITY % Flow "(#PCDATA | %block; | form | %inline; | %misc;)*">
-
-<!--================== Content models for exclusions =====================-->
-
-<!-- a elements use %Inline; excluding a -->
-
-<!ENTITY % a.content
-   "(#PCDATA | %special; | %fontstyle; | %phrase; | %inline.forms; | %misc.inline;)*">
-
-<!-- pre uses %Inline excluding img, object, applet, big, small,
-     font, or basefont -->
-
-<!ENTITY % pre.content
-   "(#PCDATA | a | %special.basic; | %fontstyle.basic; | %phrase.basic; |
-	   %inline.forms; | %misc.inline;)*">
-
-<!-- form uses %Flow; excluding form -->
-
-<!ENTITY % form.content "(#PCDATA | %block; | %inline; | %misc;)*">
-
-<!-- button uses %Flow; but excludes a, form, form controls, iframe -->
-
-<!ENTITY % button.content
-   "(#PCDATA | p | %heading; | div | %lists; | %blocktext; |
-      table | br | span | bdo | object | applet | img | map |
-      %fontstyle; | %phrase; | %misc;)*">
-
-<!--================ Document Structure ==================================-->
-
-<!-- the namespace URI designates the document profile -->
-
-<!ELEMENT html (head, body)>
-<!ATTLIST html
-  %i18n;
-  id          ID             #IMPLIED
-  xmlns       %URI;          #FIXED 'http://www.w3.org/1999/xhtml'
-  >
-
-<!--================ Document Head =======================================-->
-
-<!ENTITY % head.misc "(script|style|meta|link|object|isindex)*">
-
-<!-- content model is %head.misc; combined with a single
-     title and an optional base element in any order -->
-
-<!ELEMENT head (%head.misc;,
-     ((title, %head.misc;, (base, %head.misc;)?) |
-      (base, %head.misc;, (title, %head.misc;))))>
-
-<!ATTLIST head
-  %i18n;
-  id          ID             #IMPLIED
-  profile     %URI;          #IMPLIED
-  >
-
-<!-- The title element is not considered part of the flow of text.
-       It should be displayed, for example as the page header or
-       window title. Exactly one title is required per document.
-    -->
-<!ELEMENT title (#PCDATA)>
-<!ATTLIST title 
-  %i18n;
-  id          ID             #IMPLIED
-  >
-
-<!-- document base URI -->
-
-<!ELEMENT base EMPTY>
-<!ATTLIST base
-  id          ID             #IMPLIED
-  href        %URI;          #IMPLIED
-  target      %FrameTarget;  #IMPLIED
-  >
-
-<!-- generic metainformation -->
-<!ELEMENT meta EMPTY>
-<!ATTLIST meta
-  %i18n;
-  id          ID             #IMPLIED
-  http-equiv  CDATA          #IMPLIED
-  name        CDATA          #IMPLIED
-  content     CDATA          #REQUIRED
-  scheme      CDATA          #IMPLIED
-  >
-
-<!--
-  Relationship values can be used in principle:
-
-   a) for document specific toolbars/menus when used
-      with the link element in document head e.g.
-        start, contents, previous, next, index, end, help
-   b) to link to a separate style sheet (rel="stylesheet")
-   c) to make a link to a script (rel="script")
-   d) by stylesheets to control how collections of
-      html nodes are rendered into printed documents
-   e) to make a link to a printable version of this document
-      e.g. a PostScript or PDF version (rel="alternate" media="print")
--->
-
-<!ELEMENT link EMPTY>
-<!ATTLIST link
-  %attrs;
-  charset     %Charset;      #IMPLIED
-  href        %URI;          #IMPLIED
-  hreflang    %LanguageCode; #IMPLIED
-  type        %ContentType;  #IMPLIED
-  rel         %LinkTypes;    #IMPLIED
-  rev         %LinkTypes;    #IMPLIED
-  media       %MediaDesc;    #IMPLIED
-  target      %FrameTarget;  #IMPLIED
-  >
-
-<!-- style info, which may include CDATA sections -->
-<!ELEMENT style (#PCDATA)>
-<!ATTLIST style
-  %i18n;
-  id          ID             #IMPLIED
-  type        %ContentType;  #REQUIRED
-  media       %MediaDesc;    #IMPLIED
-  title       %Text;         #IMPLIED
-  xml:space   (preserve)     #FIXED 'preserve'
-  >
-
-<!-- script statements, which may include CDATA sections -->
-<!ELEMENT script (#PCDATA)>
-<!ATTLIST script
-  id          ID             #IMPLIED
-  charset     %Charset;      #IMPLIED
-  type        %ContentType;  #REQUIRED
-  language    CDATA          #IMPLIED
-  src         %URI;          #IMPLIED
-  defer       (defer)        #IMPLIED
-  xml:space   (preserve)     #FIXED 'preserve'
-  >
-
-<!-- alternate content container for non script-based rendering -->
-
-<!ELEMENT noscript %Flow;>
-<!ATTLIST noscript
-  %attrs;
-  >
-
-<!--======================= Frames =======================================-->
-
-<!-- inline subwindow -->
-
-<!ELEMENT iframe %Flow;>
-<!ATTLIST iframe
-  %coreattrs;
-  longdesc    %URI;          #IMPLIED
-  name        NMTOKEN        #IMPLIED
-  src         %URI;          #IMPLIED
-  frameborder (1|0)          "1"
-  marginwidth %Pixels;       #IMPLIED
-  marginheight %Pixels;      #IMPLIED
-  scrolling   (yes|no|auto)  "auto"
-  align       %ImgAlign;     #IMPLIED
-  height      %Length;       #IMPLIED
-  width       %Length;       #IMPLIED
-  >
-
-<!-- alternate content container for non frame-based rendering -->
-
-<!ELEMENT noframes %Flow;>
-<!ATTLIST noframes
-  %attrs;
-  >
-
-<!--=================== Document Body ====================================-->
-
-<!ELEMENT body %Flow;>
-<!ATTLIST body
-  %attrs;
-  onload      %Script;       #IMPLIED
-  onunload    %Script;       #IMPLIED
-  background  %URI;          #IMPLIED
-  bgcolor     %Color;        #IMPLIED
-  text        %Color;        #IMPLIED
-  link        %Color;        #IMPLIED
-  vlink       %Color;        #IMPLIED
-  alink       %Color;        #IMPLIED
-  >
-
-<!ELEMENT div %Flow;>  <!-- generic language/style container -->
-<!ATTLIST div
-  %attrs;
-  %TextAlign;
-  >
-
-<!--=================== Paragraphs =======================================-->
-
-<!ELEMENT p %Inline;>
-<!ATTLIST p
-  %attrs;
-  %TextAlign;
-  >
-
-<!--=================== Headings =========================================-->
-
-<!--
-  There are six levels of headings from h1 (the most important)
-  to h6 (the least important).
--->
-
-<!ELEMENT h1  %Inline;>
-<!ATTLIST h1
-  %attrs;
-  %TextAlign;
-  >
-
-<!ELEMENT h2 %Inline;>
-<!ATTLIST h2
-  %attrs;
-  %TextAlign;
-  >
-
-<!ELEMENT h3 %Inline;>
-<!ATTLIST h3
-  %attrs;
-  %TextAlign;
-  >
-
-<!ELEMENT h4 %Inline;>
-<!ATTLIST h4
-  %attrs;
-  %TextAlign;
-  >
-
-<!ELEMENT h5 %Inline;>
-<!ATTLIST h5
-  %attrs;
-  %TextAlign;
-  >
-
-<!ELEMENT h6 %Inline;>
-<!ATTLIST h6
-  %attrs;
-  %TextAlign;
-  >
-
-<!--=================== Lists ============================================-->
-
-<!-- Unordered list bullet styles -->
-
-<!ENTITY % ULStyle "(disc|square|circle)">
-
-<!-- Unordered list -->
-
-<!ELEMENT ul (li)+>
-<!ATTLIST ul
-  %attrs;
-  type        %ULStyle;     #IMPLIED
-  compact     (compact)     #IMPLIED
-  >
-
-<!-- Ordered list numbering style
-
-    1   arabic numbers      1, 2, 3, ...
-    a   lower alpha         a, b, c, ...
-    A   upper alpha         A, B, C, ...
-    i   lower roman         i, ii, iii, ...
-    I   upper roman         I, II, III, ...
-
-    The style is applied to the sequence number which by default
-    is reset to 1 for the first list item in an ordered list.
--->
-<!ENTITY % OLStyle "CDATA">
-
-<!-- Ordered (numbered) list -->
-
-<!ELEMENT ol (li)+>
-<!ATTLIST ol
-  %attrs;
-  type        %OLStyle;      #IMPLIED
-  compact     (compact)      #IMPLIED
-  start       %Number;       #IMPLIED
-  >
-
-<!-- single column list (DEPRECATED) --> 
-<!ELEMENT menu (li)+>
-<!ATTLIST menu
-  %attrs;
-  compact     (compact)     #IMPLIED
-  >
-
-<!-- multiple column list (DEPRECATED) --> 
-<!ELEMENT dir (li)+>
-<!ATTLIST dir
-  %attrs;
-  compact     (compact)     #IMPLIED
-  >
-
-<!-- LIStyle is constrained to: "(%ULStyle;|%OLStyle;)" -->
-<!ENTITY % LIStyle "CDATA">
-
-<!-- list item -->
-
-<!ELEMENT li %Flow;>
-<!ATTLIST li
-  %attrs;
-  type        %LIStyle;      #IMPLIED
-  value       %Number;       #IMPLIED
-  >
-
-<!-- definition lists - dt for term, dd for its definition -->
-
-<!ELEMENT dl (dt|dd)+>
-<!ATTLIST dl
-  %attrs;
-  compact     (compact)      #IMPLIED
-  >
-
-<!ELEMENT dt %Inline;>
-<!ATTLIST dt
-  %attrs;
-  >
-
-<!ELEMENT dd %Flow;>
-<!ATTLIST dd
-  %attrs;
-  >
-
-<!--=================== Address ==========================================-->
-
-<!-- information on author -->
-
-<!ELEMENT address (#PCDATA | %inline; | %misc.inline; | p)*>
-<!ATTLIST address
-  %attrs;
-  >
-
-<!--=================== Horizontal Rule ==================================-->
-
-<!ELEMENT hr EMPTY>
-<!ATTLIST hr
-  %attrs;
-  align       (left|center|right) #IMPLIED
-  noshade     (noshade)      #IMPLIED
-  size        %Pixels;       #IMPLIED
-  width       %Length;       #IMPLIED
-  >
-
-<!--=================== Preformatted Text ================================-->
-
-<!-- content is %Inline; excluding 
-        "img|object|applet|big|small|sub|sup|font|basefont" -->
-
-<!ELEMENT pre %pre.content;>
-<!ATTLIST pre
-  %attrs;
-  width       %Number;      #IMPLIED
-  xml:space   (preserve)    #FIXED 'preserve'
-  >
-
-<!--=================== Block-like Quotes ================================-->
-
-<!ELEMENT blockquote %Flow;>
-<!ATTLIST blockquote
-  %attrs;
-  cite        %URI;          #IMPLIED
-  >
-
-<!--=================== Text alignment ===================================-->
-
-<!-- center content -->
-<!ELEMENT center %Flow;>
-<!ATTLIST center
-  %attrs;
-  >
-
-<!--=================== Inserted/Deleted Text ============================-->
-
-<!--
-  ins/del are allowed in block and inline content, but its
-  inappropriate to include block content within an ins element
-  occurring in inline content.
--->
-<!ELEMENT ins %Flow;>
-<!ATTLIST ins
-  %attrs;
-  cite        %URI;          #IMPLIED
-  datetime    %Datetime;     #IMPLIED
-  >
-
-<!ELEMENT del %Flow;>
-<!ATTLIST del
-  %attrs;
-  cite        %URI;          #IMPLIED
-  datetime    %Datetime;     #IMPLIED
-  >
-
-<!--================== The Anchor Element ================================-->
-
-<!-- content is %Inline; except that anchors shouldn't be nested -->
-
-<!ELEMENT a %a.content;>
-<!ATTLIST a
-  %attrs;
-  %focus;
-  charset     %Charset;      #IMPLIED
-  type        %ContentType;  #IMPLIED
-  name        NMTOKEN        #IMPLIED
-  href        %URI;          #IMPLIED
-  hreflang    %LanguageCode; #IMPLIED
-  rel         %LinkTypes;    #IMPLIED
-  rev         %LinkTypes;    #IMPLIED
-  shape       %Shape;        "rect"
-  coords      %Coords;       #IMPLIED
-  target      %FrameTarget;  #IMPLIED
-  >
-
-<!--===================== Inline Elements ================================-->
-
-<!ELEMENT span %Inline;> <!-- generic language/style container -->
-<!ATTLIST span
-  %attrs;
-  >
-
-<!ELEMENT bdo %Inline;>  <!-- I18N BiDi over-ride -->
-<!ATTLIST bdo
-  %coreattrs;
-  %events;
-  lang        %LanguageCode; #IMPLIED
-  xml:lang    %LanguageCode; #IMPLIED
-  dir         (ltr|rtl)      #REQUIRED
-  >
-
-<!ELEMENT br EMPTY>   <!-- forced line break -->
-<!ATTLIST br
-  %coreattrs;
-  clear       (left|all|right|none) "none"
-  >
-
-<!ELEMENT em %Inline;>   <!-- emphasis -->
-<!ATTLIST em %attrs;>
-
-<!ELEMENT strong %Inline;>   <!-- strong emphasis -->
-<!ATTLIST strong %attrs;>
-
-<!ELEMENT dfn %Inline;>   <!-- definitional -->
-<!ATTLIST dfn %attrs;>
-
-<!ELEMENT code %Inline;>   <!-- program code -->
-<!ATTLIST code %attrs;>
-
-<!ELEMENT samp %Inline;>   <!-- sample -->
-<!ATTLIST samp %attrs;>
-
-<!ELEMENT kbd %Inline;>  <!-- something user would type -->
-<!ATTLIST kbd %attrs;>
-
-<!ELEMENT var %Inline;>   <!-- variable -->
-<!ATTLIST var %attrs;>
-
-<!ELEMENT cite %Inline;>   <!-- citation -->
-<!ATTLIST cite %attrs;>
-
-<!ELEMENT abbr %Inline;>   <!-- abbreviation -->
-<!ATTLIST abbr %attrs;>
-
-<!ELEMENT acronym %Inline;>   <!-- acronym -->
-<!ATTLIST acronym %attrs;>
-
-<!ELEMENT q %Inline;>   <!-- inlined quote -->
-<!ATTLIST q
-  %attrs;
-  cite        %URI;          #IMPLIED
-  >
-
-<!ELEMENT sub %Inline;> <!-- subscript -->
-<!ATTLIST sub %attrs;>
-
-<!ELEMENT sup %Inline;> <!-- superscript -->
-<!ATTLIST sup %attrs;>
-
-<!ELEMENT tt %Inline;>   <!-- fixed pitch font -->
-<!ATTLIST tt %attrs;>
-
-<!ELEMENT i %Inline;>   <!-- italic font -->
-<!ATTLIST i %attrs;>
-
-<!ELEMENT b %Inline;>   <!-- bold font -->
-<!ATTLIST b %attrs;>
-
-<!ELEMENT big %Inline;>   <!-- bigger font -->
-<!ATTLIST big %attrs;>
-
-<!ELEMENT small %Inline;>   <!-- smaller font -->
-<!ATTLIST small %attrs;>
-
-<!ELEMENT u %Inline;>   <!-- underline -->
-<!ATTLIST u %attrs;>
-
-<!ELEMENT s %Inline;>   <!-- strike-through -->
-<!ATTLIST s %attrs;>
-
-<!ELEMENT strike %Inline;>   <!-- strike-through -->
-<!ATTLIST strike %attrs;>
-
-<!ELEMENT basefont EMPTY>  <!-- base font size -->
-<!ATTLIST basefont
-  id          ID             #IMPLIED
-  size        CDATA          #REQUIRED
-  color       %Color;        #IMPLIED
-  face        CDATA          #IMPLIED
-  >
-
-<!ELEMENT font %Inline;> <!-- local change to font -->
-<!ATTLIST font
-  %coreattrs;
-  %i18n;
-  size        CDATA          #IMPLIED
-  color       %Color;        #IMPLIED
-  face        CDATA          #IMPLIED
-  >
-
-<!--==================== Object ======================================-->
-<!--
-  object is used to embed objects as part of HTML pages.
-  param elements should precede other content. Parameters
-  can also be expressed as attribute/value pairs on the
-  object element itself when brevity is desired.
--->
-
-<!ELEMENT object (#PCDATA | param | %block; | form | %inline; | %misc;)*>
-<!ATTLIST object
-  %attrs;
-  declare     (declare)      #IMPLIED
-  classid     %URI;          #IMPLIED
-  codebase    %URI;          #IMPLIED
-  data        %URI;          #IMPLIED
-  type        %ContentType;  #IMPLIED
-  codetype    %ContentType;  #IMPLIED
-  archive     %UriList;      #IMPLIED
-  standby     %Text;         #IMPLIED
-  height      %Length;       #IMPLIED
-  width       %Length;       #IMPLIED
-  usemap      %URI;          #IMPLIED
-  name        NMTOKEN        #IMPLIED
-  tabindex    %Number;       #IMPLIED
-  align       %ImgAlign;     #IMPLIED
-  border      %Pixels;       #IMPLIED
-  hspace      %Pixels;       #IMPLIED
-  vspace      %Pixels;       #IMPLIED
-  >
-
-<!--
-  param is used to supply a named property value.
-  In XML it would seem natural to follow RDF and support an
-  abbreviated syntax where the param elements are replaced
-  by attribute value pairs on the object start tag.
--->
-<!ELEMENT param EMPTY>
-<!ATTLIST param
-  id          ID             #IMPLIED
-  name        CDATA          #REQUIRED
-  value       CDATA          #IMPLIED
-  valuetype   (data|ref|object) "data"
-  type        %ContentType;  #IMPLIED
-  >
-
-<!--=================== Java applet ==================================-->
-<!--
-  One of code or object attributes must be present.
-  Place param elements before other content.
--->
-<!ELEMENT applet (#PCDATA | param | %block; | form | %inline; | %misc;)*>
-<!ATTLIST applet
-  %coreattrs;
-  codebase    %URI;          #IMPLIED
-  archive     CDATA          #IMPLIED
-  code        CDATA          #IMPLIED
-  object      CDATA          #IMPLIED
-  alt         %Text;         #IMPLIED
-  name        NMTOKEN        #IMPLIED
-  width       %Length;       #REQUIRED
-  height      %Length;       #REQUIRED
-  align       %ImgAlign;     #IMPLIED
-  hspace      %Pixels;       #IMPLIED
-  vspace      %Pixels;       #IMPLIED
-  >
-
-<!--=================== Images ===========================================-->
-
-<!--
-   To avoid accessibility problems for people who aren't
-   able to see the image, you should provide a text
-   description using the alt and longdesc attributes.
-   In addition, avoid the use of server-side image maps.
--->
-
-<!ELEMENT img EMPTY>
-<!ATTLIST img
-  %attrs;
-  src         %URI;          #REQUIRED
-  alt         %Text;         #REQUIRED
-  name        NMTOKEN        #IMPLIED
-  longdesc    %URI;          #IMPLIED
-  height      %Length;       #IMPLIED
-  width       %Length;       #IMPLIED
-  usemap      %URI;          #IMPLIED
-  ismap       (ismap)        #IMPLIED
-  align       %ImgAlign;     #IMPLIED
-  border      %Length;       #IMPLIED
-  hspace      %Pixels;       #IMPLIED
-  vspace      %Pixels;       #IMPLIED
-  >
-
-<!-- usemap points to a map element which may be in this document
-  or an external document, although the latter is not widely supported -->
-
-<!--================== Client-side image maps ============================-->
-
-<!-- These can be placed in the same document or grouped in a
-     separate document although this isn't yet widely supported -->
-
-<!ELEMENT map ((%block; | form | %misc;)+ | area+)>
-<!ATTLIST map
-  %i18n;
-  %events;
-  id          ID             #REQUIRED
-  class       CDATA          #IMPLIED
-  style       %StyleSheet;   #IMPLIED
-  title       %Text;         #IMPLIED
-  name        CDATA          #IMPLIED
-  >
-
-<!ELEMENT area EMPTY>
-<!ATTLIST area
-  %attrs;
-  %focus;
-  shape       %Shape;        "rect"
-  coords      %Coords;       #IMPLIED
-  href        %URI;          #IMPLIED
-  nohref      (nohref)       #IMPLIED
-  alt         %Text;         #REQUIRED
-  target      %FrameTarget;  #IMPLIED
-  >
-
-<!--================ Forms ===============================================-->
-
-<!ELEMENT form %form.content;>   <!-- forms shouldn't be nested -->
-
-<!ATTLIST form
-  %attrs;
-  action      %URI;          #REQUIRED
-  method      (get|post)     "get"
-  name        NMTOKEN        #IMPLIED
-  enctype     %ContentType;  "application/x-www-form-urlencoded"
-  onsubmit    %Script;       #IMPLIED
-  onreset     %Script;       #IMPLIED
-  accept      %ContentTypes; #IMPLIED
-  accept-charset %Charsets;  #IMPLIED
-  target      %FrameTarget;  #IMPLIED
-  >
-
-<!--
-  Each label must not contain more than ONE field
-  Label elements shouldn't be nested.
--->
-<!ELEMENT label %Inline;>
-<!ATTLIST label
-  %attrs;
-  for         IDREF          #IMPLIED
-  accesskey   %Character;    #IMPLIED
-  onfocus     %Script;       #IMPLIED
-  onblur      %Script;       #IMPLIED
-  >
-
-<!ENTITY % InputType
-  "(text | password | checkbox |
-    radio | submit | reset |
-    file | hidden | image | button)"
-   >
-
-<!-- the name attribute is required for all but submit & reset -->
-
-<!ELEMENT input EMPTY>     <!-- form control -->
-<!ATTLIST input
-  %attrs;
-  %focus;
-  type        %InputType;    "text"
-  name        CDATA          #IMPLIED
-  value       CDATA          #IMPLIED
-  checked     (checked)      #IMPLIED
-  disabled    (disabled)     #IMPLIED
-  readonly    (readonly)     #IMPLIED
-  size        CDATA          #IMPLIED
-  maxlength   %Number;       #IMPLIED
-  src         %URI;          #IMPLIED
-  alt         CDATA          #IMPLIED
-  usemap      %URI;          #IMPLIED
-  onselect    %Script;       #IMPLIED
-  onchange    %Script;       #IMPLIED
-  accept      %ContentTypes; #IMPLIED
-  align       %ImgAlign;     #IMPLIED
-  >
-
-<!ELEMENT select (optgroup|option)+>  <!-- option selector -->
-<!ATTLIST select
-  %attrs;
-  name        CDATA          #IMPLIED
-  size        %Number;       #IMPLIED
-  multiple    (multiple)     #IMPLIED
-  disabled    (disabled)     #IMPLIED
-  tabindex    %Number;       #IMPLIED
-  onfocus     %Script;       #IMPLIED
-  onblur      %Script;       #IMPLIED
-  onchange    %Script;       #IMPLIED
-  >
-
-<!ELEMENT optgroup (option)+>   <!-- option group -->
-<!ATTLIST optgroup
-  %attrs;
-  disabled    (disabled)     #IMPLIED
-  label       %Text;         #REQUIRED
-  >
-
-<!ELEMENT option (#PCDATA)>     <!-- selectable choice -->
-<!ATTLIST option
-  %attrs;
-  selected    (selected)     #IMPLIED
-  disabled    (disabled)     #IMPLIED
-  label       %Text;         #IMPLIED
-  value       CDATA          #IMPLIED
-  >
-
-<!ELEMENT textarea (#PCDATA)>     <!-- multi-line text field -->
-<!ATTLIST textarea
-  %attrs;
-  %focus;
-  name        CDATA          #IMPLIED
-  rows        %Number;       #REQUIRED
-  cols        %Number;       #REQUIRED
-  disabled    (disabled)     #IMPLIED
-  readonly    (readonly)     #IMPLIED
-  onselect    %Script;       #IMPLIED
-  onchange    %Script;       #IMPLIED
-  >
-
-<!--
-  The fieldset element is used to group form fields.
-  Only one legend element should occur in the content
-  and if present should only be preceded by whitespace.
--->
-<!ELEMENT fieldset (#PCDATA | legend | %block; | form | %inline; | %misc;)*>
-<!ATTLIST fieldset
-  %attrs;
-  >
-
-<!ENTITY % LAlign "(top|bottom|left|right)">
-
-<!ELEMENT legend %Inline;>     <!-- fieldset label -->
-<!ATTLIST legend
-  %attrs;
-  accesskey   %Character;    #IMPLIED
-  align       %LAlign;       #IMPLIED
-  >
-
-<!--
- Content is %Flow; excluding a, form, form controls, iframe
---> 
-<!ELEMENT button %button.content;>  <!-- push button -->
-<!ATTLIST button
-  %attrs;
-  %focus;
-  name        CDATA          #IMPLIED
-  value       CDATA          #IMPLIED
-  type        (button|submit|reset) "submit"
-  disabled    (disabled)     #IMPLIED
-  >
-
-<!-- single-line text input control (DEPRECATED) -->
-<!ELEMENT isindex EMPTY>
-<!ATTLIST isindex
-  %coreattrs;
-  %i18n;
-  prompt      %Text;         #IMPLIED
-  >
-
-<!--======================= Tables =======================================-->
-
-<!-- Derived from IETF HTML table standard, see [RFC1942] -->
-
-<!--
- The border attribute sets the thickness of the frame around the
- table. The default units are screen pixels.
-
- The frame attribute specifies which parts of the frame around
- the table should be rendered. The values are not the same as
- CALS to avoid a name clash with the valign attribute.
--->
-<!ENTITY % TFrame "(void|above|below|hsides|lhs|rhs|vsides|box|border)">
-
-<!--
- The rules attribute defines which rules to draw between cells:
-
- If rules is absent then assume:
-     "none" if border is absent or border="0" otherwise "all"
--->
-
-<!ENTITY % TRules "(none | groups | rows | cols | all)">
-  
-<!-- horizontal placement of table relative to document -->
-<!ENTITY % TAlign "(left|center|right)">
-
-<!-- horizontal alignment attributes for cell contents
-
-  char        alignment char, e.g. char=':'
-  charoff     offset for alignment char
--->
-<!ENTITY % cellhalign
-  "align      (left|center|right|justify|char) #IMPLIED
-   char       %Character;    #IMPLIED
-   charoff    %Length;       #IMPLIED"
-  >
-
-<!-- vertical alignment attributes for cell contents -->
-<!ENTITY % cellvalign
-  "valign     (top|middle|bottom|baseline) #IMPLIED"
-  >
-
-<!ELEMENT table
-     (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))>
-<!ELEMENT caption  %Inline;>
-<!ELEMENT thead    (tr)+>
-<!ELEMENT tfoot    (tr)+>
-<!ELEMENT tbody    (tr)+>
-<!ELEMENT colgroup (col)*>
-<!ELEMENT col      EMPTY>
-<!ELEMENT tr       (th|td)+>
-<!ELEMENT th       %Flow;>
-<!ELEMENT td       %Flow;>
-
-<!ATTLIST table
-  %attrs;
-  summary     %Text;         #IMPLIED
-  width       %Length;       #IMPLIED
-  border      %Pixels;       #IMPLIED
-  frame       %TFrame;       #IMPLIED
-  rules       %TRules;       #IMPLIED
-  cellspacing %Length;       #IMPLIED
-  cellpadding %Length;       #IMPLIED
-  align       %TAlign;       #IMPLIED
-  bgcolor     %Color;        #IMPLIED
-  >
-
-<!ENTITY % CAlign "(top|bottom|left|right)">
-
-<!ATTLIST caption
-  %attrs;
-  align       %CAlign;       #IMPLIED
-  >
-
-<!--
-colgroup groups a set of col elements. It allows you to group
-several semantically related columns together.
--->
-<!ATTLIST colgroup
-  %attrs;
-  span        %Number;       "1"
-  width       %MultiLength;  #IMPLIED
-  %cellhalign;
-  %cellvalign;
-  >
-
-<!--
- col elements define the alignment properties for cells in
- one or more columns.
-
- The width attribute specifies the width of the columns, e.g.
-
-     width=64        width in screen pixels
-     width=0.5*      relative width of 0.5
-
- The span attribute causes the attributes of one
- col element to apply to more than one column.
--->
-<!ATTLIST col
-  %attrs;
-  span        %Number;       "1"
-  width       %MultiLength;  #IMPLIED
-  %cellhalign;
-  %cellvalign;
-  >
-
-<!--
-    Use thead to duplicate headers when breaking table
-    across page boundaries, or for static headers when
-    tbody sections are rendered in scrolling panel.
-
-    Use tfoot to duplicate footers when breaking table
-    across page boundaries, or for static footers when
-    tbody sections are rendered in scrolling panel.
-
-    Use multiple tbody sections when rules are needed
-    between groups of table rows.
--->
-<!ATTLIST thead
-  %attrs;
-  %cellhalign;
-  %cellvalign;
-  >
-
-<!ATTLIST tfoot
-  %attrs;
-  %cellhalign;
-  %cellvalign;
-  >
-
-<!ATTLIST tbody
-  %attrs;
-  %cellhalign;
-  %cellvalign;
-  >
-
-<!ATTLIST tr
-  %attrs;
-  %cellhalign;
-  %cellvalign;
-  bgcolor     %Color;        #IMPLIED
-  >
-
-<!-- Scope is simpler than headers attribute for common tables -->
-<!ENTITY % Scope "(row|col|rowgroup|colgroup)">
-
-<!-- th is for headers, td for data and for cells acting as both -->
-
-<!ATTLIST th
-  %attrs;
-  abbr        %Text;         #IMPLIED
-  axis        CDATA          #IMPLIED
-  headers     IDREFS         #IMPLIED
-  scope       %Scope;        #IMPLIED
-  rowspan     %Number;       "1"
-  colspan     %Number;       "1"
-  %cellhalign;
-  %cellvalign;
-  nowrap      (nowrap)       #IMPLIED
-  bgcolor     %Color;        #IMPLIED
-  width       %Length;       #IMPLIED
-  height      %Length;       #IMPLIED
-  >
-
-<!ATTLIST td
-  %attrs;
-  abbr        %Text;         #IMPLIED
-  axis        CDATA          #IMPLIED
-  headers     IDREFS         #IMPLIED
-  scope       %Scope;        #IMPLIED
-  rowspan     %Number;       "1"
-  colspan     %Number;       "1"
-  %cellhalign;
-  %cellvalign;
-  nowrap      (nowrap)       #IMPLIED
-  bgcolor     %Color;        #IMPLIED
-  width       %Length;       #IMPLIED
-  height      %Length;       #IMPLIED
-  >
-
diff --git a/lib/htmlpurifier/docs/enduser-customize.html b/lib/htmlpurifier/docs/enduser-customize.html
deleted file mode 100644
index 7e1ffa260..000000000
--- a/lib/htmlpurifier/docs/enduser-customize.html
+++ /dev/null
@@ -1,850 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
-    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
-<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-<meta name="description" content="Tutorial for customizing HTML Purifier's tag and attribute sets." />
-<link rel="stylesheet" type="text/css" href="style.css" />
-
-<title>Customize - HTML Purifier</title>
-
-</head><body>
-
-<h1 class="subtitled">Customize!</h1>
-<div class="subtitle">HTML Purifier is a Swiss-Army Knife</div>
-
-<div id="filing">Filed under End-User</div>
-<div id="index">Return to the <a href="index.html">index</a>.</div>
-<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
-
-<p>
-  HTML Purifier has this quirk where if you try to allow certain elements or
-  attributes, HTML Purifier will tell you that it's not supported, and that
-  you should go to the forums to find out how to implement it. Well, this
-  document is how to implement elements and attributes which HTML Purifier
-  doesn't support out of the box.
-</p>
-
-<h2>Is it necessary?</h2>
-
-<p>
-  Before we write any code, it is paramount to consider whether the
-  code we're about to write is necessary at all. HTML Purifier, by default,
-  contains a large set of elements and attributes: large enough so that
-  <em>any</em> element or attribute in XHTML 1.0 or 1.1 (and its HTML variants)
-  that can be safely used by the general public is implemented.
-</p>
-
-<p>
-  So what needs to be implemented? (Feel free to skip this section if
-  you know what you want).
-</p>
-
-<h3>XHTML 1.0</h3>
-
-<p>
-  All of the modules listed below are based off of the
-  <a href="http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/abstract_modules.html#sec_5.2.">modularization of
-  XHTML</a>, which, while technically for XHTML 1.1, is quite a useful
-  resource.
-</p>
-
-<ul>
-  <li>Structure</li>
-  <li>Frames</li>
-  <li>Applets (deprecated)</li>
-  <li>Forms</li>
-  <li>Image maps</li>
-  <li>Objects</li>
-  <li>Frames</li>
-  <li>Events</li>
-  <li>Meta-information</li>
-  <li>Style sheets</li>
-  <li>Link (not hypertext)</li>
-  <li>Base</li>
-  <li>Name</li>
-</ul>
-
-<p>
-  If you don't recognize it, you probably don't need it. But the curious
-  can look all of these modules up in the above-mentioned document.  Note
-  that inline scripting comes packaged with HTML Purifier (more on this
-  later).
-</p>
-
-<h3>XHTML 1.1</h3>
-
-<p>
-  As of HTMLPurifier 2.1.0, we have implemented the
-  <a href="http://www.w3.org/TR/2001/REC-ruby-20010531/">Ruby module</a>,
-  which defines a set of tags
-  for publishing short annotations for text, used mostly in Japanese
-  and Chinese school texts, but applicable for positioning any text (not
-  limited to translations) above or below other corresponding text.
-</p>
-
-<h3>HTML 5</h3>
-
-<p>
-  <a href="http://www.whatwg.org/specs/web-apps/current-work/">HTML 5</a>
-  is a fork of HTML 4.01 by WHATWG, who believed that XHTML 2.0 was headed
-  in the wrong direction.  It too is a working draft, and may change
-  drastically before publication, but it should be noted that the
-  <code>canvas</code> tag has been implemented by many browser vendors.
-</p>
-
-<h3>Proprietary</h3>
-
-<p>
-  There are a number of proprietary tags still in the wild. Many of them
-  have been documented in <a href="ref-proprietary-tags.txt">ref-proprietary-tags.txt</a>,
-  but there is currently no implementation for any of them.
-</p>
-
-<h3>Extensions</h3>
-
-<p>
-  There are also a number of other XML languages out there that can
-  be embedded in HTML documents: two of the most popular are MathML and
-  SVG, and I frequently get requests to implement these.  But they are
-  expansive, comprehensive specifications, and it would take far too long
-  to implement them <em>correctly</em> (most systems I've seen go as far
-  as whitelisting tags and no further; come on, what about nesting!)
-</p>
-
-<p>
-  Word of warning: HTML Purifier is currently <em>not</em> namespace
-  aware.
-</p>
-
-<h2>Giving back</h2>
-
-<p>
-  As you may imagine from the details above (don't be abashed if you didn't
-  read it all: a glance over would have done), there's quite a bit that
-  HTML Purifier doesn't implement.  Recent architectural changes have
-  allowed HTML Purifier to implement elements and attributes that are not
-  safe!  Don't worry, they won't be activated unless you set %HTML.Trusted
-  to true, but they certainly help out users who need to put, say, forms
-  on their page and don't want to go through the trouble of reading this
-  and implementing it themselves.
-</p>
-
-<p>
-  So any of the above that you implement for your own application could
-  help out some other poor sap on the other side of the globe.  Help us
-  out, and send back code so that it can be hammered into a module and
-  released with the core.  Any code would be greatly appreciated!
-</p>
-
-<h2>And now...</h2>
-
-<p>
-  Enough philosophical talk, time for some code:
-</p>
-
-<pre>$config = HTMLPurifier_Config::createDefault();
-$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
-$config-&gt;set('HTML.DefinitionRev', 1);
-if ($def = $config-&gt;maybeGetRawHTMLDefinition()) {
-    // our code will go here
-}</pre>
-
-<p>
-  Assuming that HTML Purifier has already been properly loaded (hint:
-  include <code>HTMLPurifier.auto.php</code>), this code will set up
-  the environment that you need to start customizing the HTML definition.
-  What's going on?
-</p>
-
-<ul>
-  <li>
-    The first three lines are regular configuration code:
-    <ul>
-      <li>
-        %HTML.DefinitionID is set to a unique identifier for your
-        custom HTML definition.  This prevents it from clobbering
-        other custom definitions on the same installation.
-      </li>
-      <li>
-        %HTML.DefinitionRev is a revision integer of your HTML
-        definition.  Because HTML definitions are cached, you'll need
-        to increment this whenever you make a change in order to flush
-        the cache.
-      </li>
-    </ul>
-  </li>
-  <li>
-    The fourth line retrieves a raw <code>HTMLPurifier_HTMLDefinition</code>
-    object that we will be tweaking.  Interestingly enough, we have
-    placed it in an if block: this is because
-    <code>maybeGetRawHTMLDefinition</code>, as its name suggests, may
-    return a NULL, in which case we should skip doing any
-    initialization.  This, in fact, will correspond to when our fully
-    customized object is already in the cache.
-  </li>
-</ul>
-
-<h2>Turn off caching</h2>
-
-<p>
-  To make development easier, we're going to temporarily turn off
-  definition caching:
-</p>
-
-<pre>$config = HTMLPurifier_Config::createDefault();
-$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
-$config-&gt;set('HTML.DefinitionRev', 1);
-<strong>$config-&gt;set('Cache.DefinitionImpl', null); // TODO: remove this later!</strong>
-$def = $config-&gt;getHTMLDefinition(true);</pre>
-
-<p>
-  A few things should be mentioned about the caching mechanism before
-  we move on.  For performance reasons, HTML Purifier caches generated
-  <code>HTMLPurifier_Definition</code> objects in serialized files
-  stored (by default) in <code>library/HTMLPurifier/DefinitionCache/Serializer</code>.
-  A lot of processing is done in order to create these objects, so it
-  makes little sense to repeat the same processing over and over again
-  whenever HTML Purifier is called.
-</p>
-
-<p>
-  In order to identify a cache entry, HTML Purifier uses three variables:
-  the library's version number, the value of %HTML.DefinitionRev and
-  a serial of relevant configuration.  Whenever any of these changes,
-  a new HTML definition is generated.  Notice that there is no way
-  for the definition object to track changes to customizations: here, it
-  is up to you to supply appropriate information to DefinitionID and
-  DefinitionRev.
-</p>
-
-<h2 id="addAttribute">Add an attribute</h2>
-
-<p>
-  For this example, we're going to implement the <code>target</code> attribute found
-  on <code>a</code> elements.  To implement an attribute, we have to
-  ask a few questions:
-</p>
-
-<ol>
-  <li>What element is it found on?</li>
-  <li>What is its name?</li>
-  <li>Is it required or optional?</li>
-  <li>What are valid values for it?</li>
-</ol>
-
-<p>
-  The first three are easy: the element is <code>a</code>, the attribute
-  is <code>target</code>, and it is not a required attribute. (If it
-  were required, we'd need to append an asterisk to the attribute name;
-  you'll see this in the addElement() example.)
-</p>
-
-<p>
-  The last question is a little trickier.
-  Let's allow the special values: _blank, _self, _parent and _top.
-  The form of this is called an <strong>enumeration</strong>, a list of
-  valid values, although only one can be used at a time.  To translate
-  this into code form, we write:
-</p>
-
-<pre>$config = HTMLPurifier_Config::createDefault();
-$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
-$config-&gt;set('HTML.DefinitionRev', 1);
-$config-&gt;set('Cache.DefinitionImpl', null); // remove this later!
-$def = $config-&gt;getHTMLDefinition(true);
-<strong>$def->addAttribute('a', 'target', 'Enum#_blank,_self,_parent,_top');</strong></pre>
-
-<p>
-  The <code>Enum#_blank,_self,_parent,_top</code> does all the magic.
-  The string is split into two parts, separated by a hash mark (#):
-</p>
-
-<ol>
-  <li>The first part is the name of what we call an <code>AttrDef</code></li>
-  <li>The second part is the parameter of the above-mentioned <code>AttrDef</code></li>
-</ol>
-
-<p>
-  If that sounds vague and generic, it's because it is!  HTML Purifier defines
-  an assortment of different attribute types one can use, and each of these
-  has its own specialized parameter format.  Here are some of the more useful
-  ones:
-</p>
-
-<table class="table">
-  <thead>
-    <tr>
-      <th>Type</th>
-      <th>Format</th>
-      <th>Description</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <th>Enum</th>
-      <td><em>[s:]</em>value1,value2,...</td>
-      <td>
-        Attribute with a number of valid values, one of which may be used. When
-        s: is present, the enumeration is case sensitive.
-      </td>
-    </tr>
-    <tr>
-      <th>Bool</th>
-      <td>attribute_name</td>
-      <td>
-        Boolean attribute, with only one valid value: the name
-        of the attribute.
-      </td>
-    </tr>
-    <tr>
-      <th>CDATA</th>
-      <td></td>
-      <td>
-        Attribute of arbitrary text. Can also be referred to as <strong>Text</strong>
-        (the specification makes a semantic distinction between the two).
-      </td>
-    </tr>
-    <tr>
-      <th>ID</th>
-      <td></td>
-      <td>
-        Attribute that specifies a unique ID
-      </td>
-    </tr>
-    <tr>
-      <th>Pixels</th>
-      <td></td>
-      <td>
-        Attribute that specifies an integer pixel length
-      </td>
-    </tr>
-    <tr>
-      <th>Length</th>
-      <td></td>
-      <td>
-        Attribute that specifies a pixel or percentage length
-      </td>
-    </tr>
-    <tr>
-      <th>NMTOKENS</th>
-      <td></td>
-      <td>
-        Attribute that specifies a number of name tokens, example: the
-        <code>class</code> attribute
-      </td>
-    </tr>
-    <tr>
-      <th>URI</th>
-      <td></td>
-      <td>
-        Attribute that specifies a URI, example: the <code>href</code>
-        attribute
-      </td>
-    </tr>
-    <tr>
-      <th>Number</th>
-      <td></td>
-      <td>
-        Attribute that specifies a positive integer
-      </td>
-    </tr>
-  </tbody>
-</table>
-
-<p>
-  For a complete list, consult
-  <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/AttrTypes.php"><code>library/HTMLPurifier/AttrTypes.php</code></a>;
-  more information on attributes that accept parameters can be found on their
-  respective includes in
-  <a href="http://repo.or.cz/w/htmlpurifier.git?a=tree;hb=HEAD;f=library/HTMLPurifier/AttrDef"><code>library/HTMLPurifier/AttrDef</code></a>.
-</p>
-
-<p>
-  Sometimes, the restrictive list in AttrTypes just doesn't cut it. Don't
-  sweat: you can also use a fully instantiated object as the value. The
-  equivalent, verbose form of the above example is:
-</p>
-
-<pre>$config = HTMLPurifier_Config::createDefault();
-$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
-$config-&gt;set('HTML.DefinitionRev', 1);
-$config-&gt;set('Cache.DefinitionImpl', null); // remove this later!
-$def = $config-&gt;getHTMLDefinition(true);
-<strong>$def-&gt;addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum(
-  array('_blank','_self','_parent','_top')
-));</strong></pre>
-
-<p>
-  Trust me, you'll learn to love the shorthand.
-</p>
-
-<h2>Add an element</h2>
-
-<p>
-  Adding attributes is really small-fry stuff, though, and it was possible
-  to add them (albeit a bit more wordy) prior to 2.0. The real gem of
-  the Advanced API is adding elements. There are five questions to
-  ask when adding a new element:
-</p>
-
-<ol>
-  <li>What is the element's name?</li>
-  <li>What content set does this element belong to?</li>
-  <li>What are the allowed children of this element?</li>
-  <li>What attributes does the element allow that are general?</li>
-  <li>What attributes does the element allow that are specific to this element?</li>
-</ol>
-
-<p>
-  It's a mouthful, and you'll be slightly lost if you're not familiar with
-  the HTML specification, so let's explain them step by step.
-</p>
-
-<h3>Content set</h3>
-
-<p>
-  The HTML specification defines two major content sets: Inline
-  and Block.  Each of these
-  content sets contains a list of elements: Inline contains things like
-  <code>span</code> and <code>b</code> while Block contains things like
-  <code>div</code> and <code>blockquote</code>.
-</p>
-
-<p>
-  These content sets amount to a macro mechanism for HTML definition. Most
-  elements in HTML are organized into one of these two sets, and most
-  elements in HTML allow elements from one of these sets.  If we had
-  to write each element verbatim into each other element's allowed
-  children, we would have ridiculously large lists; instead we use
-  content sets to compactify the declaration.
-</p>
-
-<p>
-  Practically speaking, there are several useful values you can use here:
-</p>
-
-<table class="table">
-  <thead>
-    <tr>
-      <th>Content set</th>
-      <th>Description</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <th>Inline</th>
-      <td>Character level elements, text</td>
-    </tr>
-    <tr>
-      <th>Block</th>
-      <td>Block-like elements, like paragraphs and lists</td>
-    </tr>
-    <tr>
-      <th><em>false</em></th>
-      <td>
-        Any element that doesn't fit into the mold, for example <code>li</code>
-        or <code>tr</code>
-      </td>
-    </tr>
-  </tbody>
-</table>
-
-<p>
-  By specifying a valid value here, all other elements that use that
-  content set will also allow your element, without you having to do
-  anything. If you specify <em>false</em>, you'll have to register
-  your element manually.
-</p>
-
-<h3>Allowed children</h3>
-
-<p>
-  Allowed children defines the elements that this element can contain.
-  The allowed values may range from none to a complex regexp depending on
-  your element.
-</p>
-
-<p>
-  If you've ever taken a look at the HTML DTDs before, you may have
-  noticed declarations like this:
-</p>
-
-<pre>&lt;!ELEMENT LI - O (%flow;)*             -- list item --&gt;</pre>
-
-<p>
-  The <code>(%flow;)*</code> indicates the allowed children of the
-  <code>li</code> tag: <code>li</code> allows any number of flow
-  elements as its children. (The <code>- O</code> allows the closing tag to be
-  omitted, though in XML this is not allowed.) In HTML Purifier,
-  we'd write it like <code>Flow</code> (here's where the content sets
-  we were discussing earlier come into play). There are three shorthand
-  content models you can specify:
-</p>
-
-<table class="table">
-  <thead>
-    <tr>
-      <th>Content model</th>
-      <th>Description</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <th>Empty</th>
-      <td>No children allowed, like <code>br</code> or <code>hr</code></td>
-    </tr>
-    <tr>
-      <th>Inline</th>
-      <td>Any number of inline elements and text, like <code>span</code></td>
-    </tr>
-    <tr>
-      <th>Flow</th>
-      <td>Any number of inline elements, block elements and text, like <code>div</code></td>
-    </tr>
-  </tbody>
-</table>
-
-<p>
-  This covers 90% of all the cases out there, but what about elements that
-  break the mold like <code>ul</code>? This guy requires at least one
-  child, and the only valid children for it are <code>li</code>. The
-  content model is: <code>Required: li</code>. There are two parts: the
-  first type determines what <code>ChildDef</code> will be used to validate
-  content models. The most common values are:
-</p>
-
-<table class="table">
-  <thead>
-    <tr>
-      <th>Type</th>
-      <th>Description</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <th>Required</th>
-      <td>Children must be one or more of the valid elements</td>
-    </tr>
-    <tr>
-      <th>Optional</th>
-      <td>Children can be any number of the valid elements</td>
-    </tr>
-    <tr>
-      <th>Custom</th>
-      <td>Children must follow the DTD-style regex</td>
-    </tr>
-  </tbody>
-</table>
-
-<p>
-  You can also implement your own <code>ChildDef</code>: this was done
-  for a few special cases in HTML Purifier such as <code>Chameleon</code>
-  (for <code>ins</code> and <code>del</code>), <code>StrictBlockquote</code>
-  and <code>Table</code>.
-</p>
-
-<p>
-  The second part specifies either valid elements or a regular expression.
-  Valid elements are separated with horizontal bars (|), i.e.
-  "<code>a | b | c</code>".  Use #PCDATA to represent plain text.
-  Regular expressions are based off of DTD's style:
-</p>
-
-<ul>
-  <li>Parentheses () are used for grouping</li>
-  <li>Commas (,) separate elements that should come one after another</li>
-  <li>Horizontal bars (|) indicate one or the other elements should be used</li>
-  <li>Plus signs (+) are used for a one or more match</li>
-  <li>Asterisks (*) are used for a zero or more match</li>
-  <li>Question marks (?) are used for a zero or one match</li>
-</ul>
-
-<p>
-  For example, "<code>a, b?, (c | d), e+, f*</code>" means "In this order,
-  one <code>a</code> element, at most one <code>b</code> element,
-  one <code>c</code> or <code>d</code> element (but not both), one or more
-  <code>e</code> elements, and any number of <code>f</code> elements."
-  Regex veterans should be able to jump right in, and those not so savvy
-  can always copy-paste W3C's content model definitions into HTML Purifier
-  and hope for the best.
-</p>
-
-<p>
-  A word of warning: while the regex format is extremely flexible on
-  the developer's side, it is
-  quite unforgiving on the user's side.  If the user input does not <em>exactly</em>
-  match the specification, the entire contents of the element will
-  be nuked.  This is why there are specific content model types like
-  Optional and Required: while they could be implemented as <code>Custom:
-  (valid | elements)*</code>, their dedicated classes contain special recovery
-  measures that make sure as much of the user's original content gets
-  through. HTML Purifier's core, as a rule, does not use Custom.
-</p>
-
-<p>
-  One final note: you can also use Content Sets inside your valid elements
-  lists or regular expressions. In fact, two of the three shorthand content
-  models mentioned above are just that, abbreviations:
-</p>
-
-<table class="table">
-  <thead>
-    <tr>
-      <th>Content model</th>
-      <th>Implementation</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <th>Inline</th>
-      <td>Optional: Inline | #PCDATA</td>
-    </tr>
-    <tr>
-      <th>Flow</th>
-      <td>Optional: Flow | #PCDATA</td>
-    </tr>
-  </tbody>
-</table>
-
-<p>
-  When the definition is compiled, Inline will be replaced with a
-  horizontal-bar separated list of inline elements. Also, notice that
-  it does not contain text: you have to specify that yourself.
-</p>
-
-<h3>Common attributes</h3>
-
-<p>
-  Congratulations: you have just gotten over the proverbial hump (Allowed
-  children). Common attributes is much simpler, and boils down to
-  one question: does your element have the <code>id</code>, <code>style</code>,
-  <code>class</code>, <code>title</code> and <code>lang</code> attributes?
-  If so, you'll want to specify the <code>Common</code> attribute collection,
-  which contains these five attributes that are found on almost every
-  HTML element in the specification.
-</p>
-
-<p>
-  There are a few more collections, but they're really edge cases:
-</p>
-
-<table class="table">
-  <thead>
-    <tr>
-      <th>Collection</th>
-      <th>Attributes</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <th>I18N</th>
-      <td><code>lang</code>, possibly <code>xml:lang</code></td>
-    </tr>
-    <tr>
-      <th>Core</th>
-      <td><code>style</code>, <code>class</code>, <code>id</code> and <code>title</code></td>
-    </tr>
-  </tbody>
-</table>
-
-<p>
-  Common is a combination of the above-mentioned collections.
-</p>
-
-<p class="aside">
-  Readers familiar with the modularization may have noticed that the Core
-  attribute collection differs from that specified by the <a
-  href="http://www.w3.org/TR/xhtml-modularization/abstract_modules.html#s_commonatts">abstract
-  modules of the XHTML Modularization 1.1</a>. We believe this section
-  to be in error, as <code>br</code> permits the use of the <code>style</code>
-  attribute even though it uses the <code>Core</code> collection, and
-  the DTD and XML Schemas supplied by W3C support our interpretation.
-</p>
-
-<h3>Attributes</h3>
-
-<p>
-  If you didn't read the <a href="#addAttribute">earlier section on
-  adding attributes</a>, read it now.  The last parameter is simply
-  an array of attribute names to attribute implementations, in the exact
-  same format as <code>addAttribute()</code>.
-</p>
-
-<h3>Putting it all together</h3>
-
-<p>
-  We're going to implement <code>form</code>. Before we embark, let's
-  grab a reference implementation from over at the
-  <a href="http://www.w3.org/TR/html4/sgml/loosedtd.html">transitional DTD</a>:
-</p>
-
-<pre>&lt;!ELEMENT FORM - - (%flow;)* -(FORM)   -- interactive form --&gt;
-&lt;!ATTLIST FORM
-  %attrs;                              -- %coreattrs, %i18n, %events --
-  action      %URI;          #REQUIRED -- server-side form handler --
-  method      (GET|POST)     GET       -- HTTP method used to submit the form--
-  enctype     %ContentType;  &quot;application/x-www-form-urlencoded&quot;
-  accept      %ContentTypes; #IMPLIED  -- list of MIME types for file upload --
-  name        CDATA          #IMPLIED  -- name of form for scripting --
-  onsubmit    %Script;       #IMPLIED  -- the form was submitted --
-  onreset     %Script;       #IMPLIED  -- the form was reset --
-  target      %FrameTarget;  #IMPLIED  -- render in this frame --
-  accept-charset %Charsets;  #IMPLIED  -- list of supported charsets --
-  &gt;</pre>
-
-<p>
-  Juicy! With just this, we can answer four of our five questions:
-</p>
-
-<ol>
-  <li>What is the element's name? <strong>form</strong></li>
-  <li>What content set does this element belong to? <strong>Block</strong>
-    (this needs a little sleuthing, I find the easiest way is to search
-    the DTD for <code>FORM</code> and determine which set it is in.)</li>
-  <li>What are the allowed children of this element? <strong>Zero
-    or more flow elements, but no nested <code>form</code>s</strong></li>
-  <li>What attributes does the element allow that are general? <strong>Common</strong></li>
-  <li>What attributes does the element allow that are specific to this element? <strong>A whole bunch, see ATTLIST;
-    we're going to do the vital ones: <code>action</code>, <code>method</code> and <code>name</code></strong></li>
-</ol>
-
-<p>
-  Time for some code:
-</p>
-
-<pre>$config = HTMLPurifier_Config::createDefault();
-$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
-$config-&gt;set('HTML.DefinitionRev', 1);
-$config-&gt;set('Cache.DefinitionImpl', null); // remove this later!
-$def = $config-&gt;getHTMLDefinition(true);
-$def-&gt;addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum(
-  array('_blank','_self','_parent','_top')
-));
-<strong>$form = $def-&gt;addElement(
-  'form',   // name
-  'Block',  // content set
-  'Flow', // allowed children
-  'Common', // attribute collection
-  array( // attributes
-    'action*' => 'URI',
-    'method' => 'Enum#get,post',
-    'name' => 'ID'
-  )
-);
-$form-&gt;excludes = array('form' => true);</strong></pre>
-
-<p>
-  Each of the parameters corresponds to one of the questions we asked.
-  Notice that we added an asterisk to the end of the <code>action</code>
-  attribute to indicate that it is required. If someone specifies a
-  <code>form</code> without that attribute, the tag will be axed.
-  Also, the extra line at the end is a special extra declaration that
-  prevents forms from being nested within each other.
-</p>
-
-<p>
-  And that's all there is to it! Implementing the rest of the form
-  module is left as an exercise to the user; to see more examples
-  check the <a href="http://repo.or.cz/w/htmlpurifier.git?a=tree;hb=HEAD;f=library/HTMLPurifier/HTMLModule"><code>library/HTMLPurifier/HTMLModule/</code></a> directory
-  in your local HTML Purifier installation.
-</p>
-
-<h2>And beyond...</h2>
-
-<p>
-  Perceptive users may have realized that, to a certain extent, we
-  have simply re-implemented the facilities of XML Schema or the
-  Document Type Definition.  What you are seeing here, however, is
-  not just an XML Schema or Document Type Definition: it is a fully
-  expressive method of specifying the definition of HTML that is
-  a portable superset of the capabilities of the two above-mentioned schema
-  languages.  What makes HTMLDefinition so powerful is the fact that
-  if we don't have an implementation for a content model or an attribute
-  definition, you can supply it yourself by writing a PHP class.
-</p>
-
-<p>
-  There are many facets of HTMLDefinition beyond the Advanced API I have
-  walked you through today.  To find out more about these, you can
-  check out these source files:
-</p>
-
-<ul>
-  <li><a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/HTMLModule.php"><code>library/HTMLPurifier/HTMLModule.php</code></a></li>
-  <li><a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ElementDef.php"><code>library/HTMLPurifier/ElementDef.php</code></a></li>
-</ul>
-
-<h2 id="optimized">Notes for HTML Purifier 4.2.0 and earlier</h2>
-
-<p>
-    Previously, this tutorial gave some incorrect template code for
-    editing raw definitions, and that template code will now produce the
-    error <q>Due to a documentation error in previous version of HTML
-    Purifier...</q>  Here is how to mechanically transform old-style
-    code into new-style code.
-</p>
-
-<p>
-    First, identify all code that edits the raw definition object, and
-    put it together.  Ensure none of this code must be run on every
-    request; if some sub-part needs to always be run, move it outside
-    this block.  Here is an example below, with the raw definition
-    object code bolded.
-</p>
-
-<pre>$config = HTMLPurifier_Config::createDefault();
-$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
-$config-&gt;set('HTML.DefinitionRev', 1);
-$def = $config-&gt;getHTMLDefinition(true);
-<strong>$def->addAttribute('a', 'target', 'Enum#_blank,_self,_parent,_top');</strong>
-$purifier = new HTMLPurifier($config);</pre>
-
-<p>
-    Next, replace the raw definition retrieval with a
-    maybeGetRawHTMLDefinition method call inside an if conditional, and
-    place the editing code inside that if block.
-</p>
-
-<pre>$config = HTMLPurifier_Config::createDefault();
-$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
-$config-&gt;set('HTML.DefinitionRev', 1);
-<strong>if ($def = $config-&gt;maybeGetRawHTMLDefinition()) {
-    $def-&gt;addAttribute('a', 'target', 'Enum#_blank,_self,_parent,_top');
-}</strong>
-$purifier = new HTMLPurifier($config);</pre>
-
-<p>
-    And you're done!  Alternatively, if you're OK with not ever caching
-    your code, the following will still work and not emit warnings.
-</p>
-
-<pre>$config = HTMLPurifier_Config::createDefault();
-$def = $config-&gt;getHTMLDefinition(true);
-$def-&gt;addAttribute('a', 'target', 'Enum#_blank,_self,_parent,_top');
-$purifier = new HTMLPurifier($config);</pre>
-
-<p>
-    A slightly less efficient version of this was what was going on with
-    old versions of HTML Purifier.
-</p>
-
-<p>
-    <em>Technical notes:</em> ajh pointed out <a
-        href="http://htmlpurifier.org/phorum/read.php?5,5164,5169#msg-5169">in a forum topic</a> that
-    HTML Purifier appeared to be repeatedly writing to the cache even
-    when a cache entry already existed.  Investigation led to the
-    discovery of the following infelicity: caching of customized
-    definitions didn't actually work!  The problem was that even though
-    a cache file would be written out at the end of the process, there
-    was no way for HTML Purifier to say, <q>Actually, I've already got a
-        copy of your work, no need to reconfigure your
-        customizations</q>.  This required the API to change: placing
-    all of the customizations to the raw definition object in a
-    conditional which could be skipped.
-</p>
-
-</body></html>
-
-<!-- vim: et sw=4 sts=4
--->
diff --git a/lib/htmlpurifier/docs/enduser-id.html b/lib/htmlpurifier/docs/enduser-id.html
deleted file mode 100644
index 53d2da248..000000000
--- a/lib/htmlpurifier/docs/enduser-id.html
+++ /dev/null
@@ -1,148 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
-    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
-<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-<meta name="description" content="Explains various methods for allowing IDs in documents safely in HTML Purifier." />
-<link rel="stylesheet" type="text/css" href="./style.css" />
-
-<title>IDs - HTML Purifier</title>
-
-</head><body>
-
-<h1 class="subtitled">IDs</h1>
-<div class="subtitle">What they are, why you should(n't) wear them, and how to deal with it</div>
-
-<div id="filing">Filed under End-User</div>
-<div id="index">Return to the <a href="index.html">index</a>.</div>
-<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
-
-<p>Prior to HTML Purifier 1.2.0, this library blithely accepted user input that
-looked like this:</p>
-
-<pre>&lt;a id=&quot;fragment&quot;&gt;Anchor&lt;/a&gt;</pre>
-
-<p>...presenting an attractive vector for those that would destroy standards
-compliance: simply set the ID to one that is already used elsewhere in the
-document and voila: validation breaks.  There was a half-hearted attempt to
-prevent this by allowing users to blacklist IDs, but I suspect that no one
-really bothered, and thus, with the release of 1.2.0, IDs are now <em>removed</em>
-by default.</p>
-
-<p>IDs, however, are quite useful functionality to have, so if users start
-complaining about broken anchors you'll probably want to turn them back on
-with %Attr.EnableID. But before you go mucking around with the config
-object, it's probably worth taking some precautions to keep your page
-validating. Why?</p>
-
-<ol>
-   <li>Standards-compliant pages are good</li>
-   <li>Duplicated IDs interfere with anchors.  If there are two id="foobar"s in a
-   document, which spot does a browser presented with the fragment #foobar go
-   to? Most browsers opt for the first appearing ID, making it impossible
-   to reference the second section. Similarly, duplicated IDs can hijack
-   client-side scripting that relies on the IDs of elements.</li>
-</ol>
-
-<p>You have (currently) four ways of dealing with the problem.</p>
-
-
-
-<h2 class="subtitled">Blacklisting IDs</h2>
-<div class="subsubtitle">Good for pages with single content source and stable templates</div>
-
-<p>In keeping with the
-<acronym title="Keep It Simple, Stupid">KISS</acronym> principle, let us
-deal with the most obvious solution: preventing users from using any IDs that
-appear elsewhere in the document.  The method is simple:</p>
-
-<pre>$config-&gt;set('Attr.EnableID', true);
-$config-&gt;set('Attr.IDBlacklist', array(
-    'list', 'of', 'attribute', 'values', 'that', 'are', 'forbidden'
-));</pre>
-
-<p>That being said, there are some notable drawbacks.  First of all, you have to
-know precisely which IDs are being used by the HTML surrounding the user code.
-This is easier said than done: quite often the page designer and the system
-coder work separately, so the designer has to constantly be talking with the
-coder whenever he decides to add a new anchor.  Miss one and you open yourself
-to possible standards-compliance issues.</p>
-
-<p>Furthermore, this position becomes untenable when a single web page must hold
-multiple portions of user-submitted content.  Since there's obviously no way
-to find out before-hand what IDs users will use, the blacklist is helpless.
-And since HTML Purifier validates each segment separately, perhaps doing
-so at different times, it would be extremely difficult to dynamically update
-the blacklist in between runs.</p>
-
-<p>Finally, simply destroying the ID is extremely un-userfriendly behavior: after
-all, they might have simply specified a duplicate ID by accident.</p>
-
-<p>Thus, we get to our second method.</p>
-
-
-
-<h2 class="subtitled">Namespacing IDs</h2>
-<div class="subsubtitle">Lazy developer's way, but needs user education</div>
-
-<p>This method, too, is quite simple: add a prefix to all user IDs. With this
-code:</p>
-
-<pre>$config-&gt;set('Attr.EnableID', true);
-$config-&gt;set('Attr.IDPrefix', 'user_');</pre>
-
-<p>...this:</p>
-
-<pre>&lt;a id=&quot;foobar&quot;&gt;Anchor!&lt;/a&gt;</pre>
-
-<p>...turns into:</p>
-
-<pre>&lt;a id=&quot;user_foobar&quot;&gt;Anchor!&lt;/a&gt;</pre>
-
-<p>As long as you don't have any IDs that start with user_, collisions are
-guaranteed not to happen.  The drawback is obvious: if a user submits
-id=&quot;foobar&quot;, they probably expect to be able to reference their page with
-#foobar. You'll have to tell them, &quot;No, that doesn't work, you have to add
-user_ to the beginning.&quot;</p>
-
-<p>And yes, things get hairier.  Even with a nice prefix, we still have done
-nothing about multiple HTML Purifier outputs on one page.  Thus, we have
-a second configuration value to piggy-back off of: %Attr.IDPrefixLocal:</p>
-
-<pre>$config-&gt;set('Attr.IDPrefixLocal', 'comment' . $id . '_');</pre>
-
-<p>This new directive does nothing but append to the regular IDPrefix, but is
-special in that it is volatile: its value is determined at run-time and
-cannot possibly be cordoned into, say, a .ini config file.  What you
-put into the directive is up to you, but I would recommend the ID number
-the text has been assigned in the database.  Whatever you pick, however, it
-has to be unique and stable for the text you are validating.  Note, however,
-that we require that %Attr.IDPrefix be set before you use this directive.</p>
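-
-<p>A minimal sketch of the two prefixes working together, assuming a
-hypothetical comment whose database ID is 12 (the value is made up for
-illustration) and that the local prefix is appended after the regular one:</p>
-
-<pre>$id = 12; // hypothetical database ID of the comment being purified
-$config-&gt;set('Attr.EnableID', true);
-$config-&gt;set('Attr.IDPrefix', 'user_');                    // must be set for the local prefix to apply
-$config-&gt;set('Attr.IDPrefixLocal', 'comment' . $id . '_'); // appended to IDPrefix
-// &lt;a id=&quot;foobar&quot;&gt; should then come out as &lt;a id=&quot;user_comment12_foobar&quot;&gt;</pre>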
-
-<p>And also remember: the user has to know what this prefix is too!</p>
-
-
-
-<h2>Abstinence</h2>
-
-<p>You may not want to bother. That's okay too, just don't enable IDs.</p>
-
-<p>Personally, I would take this road whenever user-submitted content could
-possibly be shown together on one page.  Why a blog comment would need to use
-anchors is beyond me.</p>
-
-
-
-<h2>Denial</h2>
-
-<p>To revert to pre-1.2.0 behavior, simply:</p>
-
-<pre>$config-&gt;set('Attr.EnableID', true);</pre>
-
-<p>Don't come crying to me when your page mysteriously stops validating, though.</p>
-
-</body>
-</html>
-
-<!-- vim: et sw=4 sts=4
--->
diff --git a/lib/htmlpurifier/docs/enduser-overview.txt b/lib/htmlpurifier/docs/enduser-overview.txt
deleted file mode 100644
index fe7f8705d..000000000
--- a/lib/htmlpurifier/docs/enduser-overview.txt
+++ /dev/null
@@ -1,59 +0,0 @@
-
-HTML Purifier
-  by Edward Z. Yang
-
-There are a number of ad hoc HTML filtering solutions out there on the web
-(some examples including HTML_Safe, kses and SafeHtmlChecker.class.php) that
-claim to filter HTML properly, preventing malicious JavaScript and layout
-breaking HTML from getting through the parser.  None of them, however,
-demonstrates a thorough knowledge of either the DTD that defines the HTML
-or the caveats of HTML that cannot be expressed by a DTD.  Configurable
-filters (such as kses or PHP's built-in strip_tags() function) have trouble
-validating the contents of attributes and can be subject to security attacks
-due to poor configuration.  Other filters take the naive approach of
-blacklisting known threats and tags, failing to account for the introduction
-of new technologies, new tags, new attributes or quirky browser behavior.
-
-However, HTML Purifier takes a different approach, one that doesn't use
-specification-ignorant regexes or narrow blacklists.  HTML Purifier will
-decompose the whole document into tokens, and rigorously process the tokens by:
-removing non-whitelisted elements, transforming bad practice tags like <font>
-into <span>, properly checking the nesting of tags and their children and
-validating all attributes according to their RFCs.
-
-To my knowledge, there is nothing like this on the web yet.  Not even MediaWiki,
-which allows an amazingly diverse mix of HTML and wikitext in its documents,
-gets all the nesting quirks right.  Existing solutions hope that no JavaScript
-will slip through, but either do not attempt to ensure that the resulting
-output is valid XHTML or send the HTML through a draconian XML parser (and yet
-still get the nesting wrong: SafeHtmlChecker.class.php does not prevent <a>
-tags from being nested within each other).
-
-This document no longer is a detailed description of how HTMLPurifier works,
-as those descriptions have been moved to the appropriate code.  The first
-draft was drawn up after two rough code sketches and the implementation of a
-forgiving lexer.  You may also be interested in the unit tests located in the
-tests/ folder, which provide a living document on how exactly the filter deals
-with malformed input.
-
-In summary (see corresponding classes for more details):
-
-1. Parse document into an array of tag and text tokens (Lexer)
-2. Remove all elements not on whitelist and transform certain other elements
-   into acceptable forms (e.g. <font>)
-3. Make document well formed while helpfully taking into account certain quirks,
-   such as the fact that <p> tags traditionally are closed by other block-level
-   elements.
-4. Run through all nodes and check children for proper order (especially
-   important for tables).
-5. Validate attributes according to more restrictive definitions based on the
-   RFCs.
-6. Translate back into a string. (Generator)
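-
-From the caller's side, all six steps run inside a single call.  A minimal
-sketch, assuming the library lives at /path/to/library and $dirty_html holds
-the untrusted input (both are placeholders, not part of this document):
-
-    require_once '/path/to/library/HTMLPurifier.auto.php';
-    $config = HTMLPurifier_Config::createDefault();
-    $purifier = new HTMLPurifier($config);
-    // lexing, filtering and string generation all happen in here
-    $clean_html = $purifier->purify($dirty_html);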
-
-HTML Purifier is best suited for documents that require a rich array of
-HTML tags.  Things like blog comments are, in all likelihood, most appropriately
-written in an extremely restrictive set of markup that doesn't require
-all this functionality (or not written in HTML at all), although this may
-be changing in the future with the addition of levels of filtering.
-
-    vim: et sw=4 sts=4
diff --git a/lib/htmlpurifier/docs/enduser-security.txt b/lib/htmlpurifier/docs/enduser-security.txt
deleted file mode 100644
index 518f092bd..000000000
--- a/lib/htmlpurifier/docs/enduser-security.txt
+++ /dev/null
@@ -1,18 +0,0 @@
-
-Security
-
-Like anything that claims to afford security, HTML Purifier can be circumvented
-through human negligence. This class will do its job: no more, no less,
-and it's up to you to provide it the proper information and proper context
-to be effective. Things to remember:
-
-1. Character Encoding: see enduser-utf8.html for more info.
-
-2. IDs: see enduser-id.html for more info
-
-3. URIs: see enduser-uri-filter.html
-
-4. CSS: document pending
-Explain which CSS styles we blocked and why.
-
-    vim: et sw=4 sts=4
diff --git a/lib/htmlpurifier/docs/enduser-slow.html b/lib/htmlpurifier/docs/enduser-slow.html
deleted file mode 100644
index f0ea02de1..000000000
--- a/lib/htmlpurifier/docs/enduser-slow.html
+++ /dev/null
@@ -1,120 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
-    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
-<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-<meta name="description" content="Explains how to speed up HTML Purifier through caching or inbound filtering." />
-<link rel="stylesheet" type="text/css" href="./style.css" />
-
-<title>Speeding up HTML Purifier - HTML Purifier</title>
-
-</head><body>
-
-<h1 class="subtitled">Speeding up HTML Purifier</h1>
-<div class="subtitle">...also known as the HELP ME LIBRARY IS TOO SLOW MY PAGE TAKE TOO LONG page</div>
-
-<div id="filing">Filed under End-User</div>
-<div id="index">Return to the <a href="index.html">index</a>.</div>
-<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
-
-<p>HTML Purifier is a very powerful library. But with power comes great
-responsibility, in the form of longer execution times.  Remember, this
-library isn't lightly grazing over submitted HTML: it's deconstructing
-the whole thing, rigorously checking the parts, and then putting it back
-together. </p>
-
-<p>So, if it so turns out that HTML Purifier is kinda too slow for outbound
-filtering, you've got a few options: </p>
-
-<h2>Inbound filtering</h2>
-
-<p>Perform filtering of HTML when it's submitted by the user. Since the
-user is already submitting something, an extra half a second tacked on
-to the load time probably isn't going to be that huge of a problem.
-Then, displaying the content is simply a matter of outputting it
-directly from your database/filesystem. The trouble with this method is
-that your user loses the original text, and when doing edits, will be
-handling the filtered text.  While this may be a good thing, especially
-if you're using a WYSIWYG editor, it can also result in data-loss if a
-user makes a typo. </p>
-
-<p>Example (non-functional):</p>
-
-<pre>&lt;?php
-    /**
-     * FORM SUBMISSION PAGE
-     * display_error($message) : displays nice error page with message
-     * display_success() : displays a nice success page
-     * display_form() : displays the HTML submission form
-     * database_insert($html) : inserts data into database as new row
-     */
-    if (!empty($_POST)) {
-        require_once '/path/to/library/HTMLPurifier.auto.php';
-        require_once 'HTMLPurifier.func.php';
-        $dirty_html = isset($_POST['html']) ? $_POST['html'] : false;
-        if (!$dirty_html) {
-            display_error('You must write some HTML!');
-            exit; // stop here so an empty submission is never stored
-        }
-        $html = HTMLPurifier($dirty_html);
-        database_insert($html);
-        display_success();
-        // notice that $dirty_html is *not* saved
-    } else {
-        display_form();
-    }
-?&gt;</pre>
-
-<h2>Caching the filtered output</h2>
-
-<p>Accept the submitted text and put it unaltered into the database, but
-then also generate a filtered version and stash that in the database.
-Serve the filtered version to readers, and the unaltered version to
-editors.  If need be, you can invalidate the cache and have the cached
-filtered version be regenerated on the first page view.  Pros? Full data
-retention. Cons? It's more complicated, and opens other editors up to
-XSS if they are using a WYSIWYG editor (to fix that, they'd have to be
-able to get their hands on the *really* original text served in
-plaintext mode). </p>
-
-<p>Example (non-functional):</p>
-
-<pre>&lt;?php
-    /**
-     * VIEW PAGE
-     * display_error($message) : displays nice error page with message
-     * cache_get($id) : retrieves HTML from fast cache (db or file)
-     * cache_insert($id, $html) : inserts good HTML into cache system
-     * database_get($id) : retrieves raw HTML from database
-     */
-    $id = isset($_GET['id']) ? (int) $_GET['id'] : false;
-    if (!$id) {
-        display_error('Must specify ID.');
-        exit;
-    }
-    $html = cache_get($id); // filesystem or database
-    if ($html === false) {
-        // cache didn't have the HTML, generate it
-        $raw_html = database_get($id);
-        require_once '/path/to/library/HTMLPurifier.auto.php';
-        require_once 'HTMLPurifier.func.php';
-        $html = HTMLPurifier($raw_html);
-        cache_insert($id, $html);
-    }
-    echo $html;
-?&gt;</pre>
-
-<h2>Summary</h2>
-
-<p>In short, inbound filtering is the simple option and caching is the
-robust option (albeit with bigger storage requirements). </p>
-
-<p>There is a third option, independent of the two we've discussed: profile
-and optimize HTMLPurifier yourself. Be sure to report back your results
-if you decide to do that! Especially if you port HTML Purifier to C++.
-<tt>;-)</tt></p>
-
-</body>
-</html>
-
-<!-- vim: et sw=4 sts=4
--->
diff --git a/lib/htmlpurifier/docs/enduser-tidy.html b/lib/htmlpurifier/docs/enduser-tidy.html
deleted file mode 100644
index a243f7fc2..000000000
--- a/lib/htmlpurifier/docs/enduser-tidy.html
+++ /dev/null
@@ -1,231 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
-    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
-<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-<meta name="description" content="Tutorial for tweaking HTML Purifier's Tidy-like behavior." />
-<link rel="stylesheet" type="text/css" href="style.css" />
-
-<title>Tidy - HTML Purifier</title>
-
-</head><body>
-
-<h1>Tidy</h1>
-
-<div id="filing">Filed under Development</div>
-<div id="index">Return to the <a href="index.html">index</a>.</div>
-<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
-
-<p>You've probably heard of HTML Tidy, Dave Raggett's little piece
-of software that cleans up poorly written HTML.  Let me say it straight
-out:</p>
-
-<p class="emphasis">This ain't HTML Tidy!</p>
-
-<p>Rather, Tidy stands for a cool set of Tidy-inspired features in HTML Purifier
-that allows users to submit deprecated elements and attributes and get
-valid strict markup back. For example:</p>
-
-<pre>&lt;center&gt;Centered&lt;/center&gt;</pre>
-
-<p>...becomes:</p>
-
-<pre>&lt;div style=&quot;text-align:center;&quot;&gt;Centered&lt;/div&gt;</pre>
-
-<p>...when this particular fix is run on the HTML. This tutorial will give
-you the lowdown on what exactly HTML Purifier will do when Tidy
-is on, and how to fine-tune this behavior. Once again, <strong>you do
-not need Tidy installed on your PHP to use these features!</strong></p>
-
-<h2>What does it do?</h2>
-
-<p>Tidy will do several things to your HTML:</p>
-
-<ul>
-    <li>Convert deprecated elements and attributes to standards-compliant
-        alternatives</li>
-    <li>Enforce XHTML compatibility guidelines and other best practices</li>
-    <li>Preserve data that would normally be removed as per W3C</li>
-</ul>
-
-<h2>What are levels?</h2>
-
-<p>Levels describe how aggressive the Tidy module should be when
-cleaning up HTML. There are four levels to pick: none, light, medium
-and heavy. Each of these levels has a well-defined set of behavior
-associated with it, although it may change depending on your doctype.</p>
-
-<dl>
-    <dt>light</dt>
-    <dd>This is the <strong>lenient</strong> level. If a tag or attribute
-        is about to be removed because it isn't supported by the
-        doctype, Tidy will step in and change into an alternative that
-        is supported.</dd>
-    <dt>medium</dt>
-    <dd>This is the <strong>correctional</strong> level. At this level,
-        all the functions of light are performed, as well as some extra,
-        non-essential best practices enforcement. Changes made on this
-        level are very benign and are unlikely to cause problems.</dd>
-    <dt>heavy</dt>
-    <dd>This is the <strong>aggressive</strong> level. If a tag or
-        attribute is deprecated, it will be converted into a non-deprecated
-        version, no ifs ands or buts.</dd>
-</dl>
-
-<p>By default, Tidy operates on the <strong>medium</strong> level. You can
-change the level of cleaning by setting the %HTML.TidyLevel configuration
-directive:</p>
-
-<pre>$config-&gt;set('HTML.TidyLevel', 'heavy'); // burn baby burn!</pre>
-
-<h2>Is the light level really light?</h2>
-
-<p>It depends on what doctype you're using. If your documents are HTML
-4.01 <em>Transitional</em>, HTML Purifier will be lazy
-and won't clean up your <code>center</code>
-or <code>font</code> tags. But if you're using HTML 4.01 <em>Strict</em>,
-HTML Purifier has no choice: it has to convert them, or they will
-be nuked out of existence. So while light on Transitional will result
-in little to no changes, light on Strict will still result in quite
-a lot of fixes.</p>
-
-<p>This is different behavior from 1.6 or before, where deprecated
-tags in transitional documents would
-always be cleaned up regardless. This is also better behavior.</p>
-
-<h2>My pages look different!</h2>
-
-<p>HTML Purifier is tasked with converting deprecated tags and
-attributes to standards-compliant alternatives, which usually
-need copious amounts of CSS. It's also not foolproof: sometimes
-things do get lost in the translation. This is why when HTML Purifier
-can get away with not doing cleaning, it won't; this is why
-the default value is <strong>medium</strong> and not heavy.</p>
-
-<p>Fortunately, only a few attributes have problems with the switch
-over. They are described below:</p>
-
-<table class="table">
-    <thead><tr>
-        <th>Element@Attr</th>
-        <th>Changes</th>
-    </tr></thead>
-    <tbody>
-        <tr>
-            <td>caption@align</td>
-            <td>Firefox supports stuffing the caption on the
-                left and right side of the table, a feature that
-                Internet Explorer, understandably, does not have.
-                When align equals right or left, the text will simply
-                be aligned on the left or right side.</td>
-        </tr>
-        <tr>
-            <td>img@align</td>
-            <td>The implementation for align bottom is good, but not
-            perfect. There are a few pixel differences.</td>
-        </tr>
-        <tr>
-            <td>br@clear</td>
-            <td>Clear both gets a little wonky in Internet Explorer. Haven't
-                really been able to figure out why.</td>
-        </tr>
-        <tr>
-            <td>hr@noshade</td>
-            <td>All browsers implement this slightly differently: we've
-                chosen to make noshade horizontal rules gray.</td>
-        </tr>
-    </tbody>
-</table>
-
-<p>There are a few more minor, although irritating, bugs.
-Some older browsers support deprecated attributes,
-but not CSS. Transformed elements and attributes will look unstyled
-to said browsers. Also, CSS precedence is slightly different for
-inline styles versus presentational markup. In increasing precedence:</p>
-
-<ol>
-    <li>Presentational attributes</li>
-    <li>External style sheets</li>
-    <li>Inline styling</li>
-</ol>
-
-<p>This means that styling that may have been masked by external CSS
-declarations will start showing up (a good thing, perhaps). Finally,
-if you've turned off the style attribute, almost all of
-these transformations will not work. Sorry mates.</p>
-
-<p>You can review the rendering before and after of these transformations
-by consulting the <a
-href="http://htmlpurifier.org/live/smoketests/attrTransform.php">attrTransform.php
-smoketest</a>.</p>
-
-<h2>I like the general idea, but the specifics bug me!</h2>
-
-<p>So you want HTML Purifier to clean up your HTML, but you're not
-so happy about the br@clear implementation. That's perfectly fine!
-HTML Purifier will make accommodations:</p>
-
-<pre>$config-&gt;set('HTML.Doctype', 'XHTML 1.0 Transitional');
-$config-&gt;set('HTML.TidyLevel', 'heavy'); // all changes, minus...
-<strong>$config-&gt;set('HTML.TidyRemove', 'br@clear');</strong></pre>
-
-<p>That third line does the magic, removing the br@clear fix
-from the module, ensuring that <code>&lt;br clear="all" /&gt;</code>
-will pass through unharmed. The reverse is possible too:</p>
-
-<pre>$config-&gt;set('HTML.Doctype', 'XHTML 1.0 Transitional');
-$config-&gt;set('HTML.TidyLevel', 'none'); // no changes, plus...
-<strong>$config-&gt;set('HTML.TidyAdd', 'p@align');</strong></pre>
-
-<p>In this case, all transformations are shut off, except for the p@align
-one, which you found handy.</p>
-
-<p>To find out what the names of fixes you want to turn on or off are,
-you'll have to consult the source code, specifically the files in
-<code>HTMLPurifier/HTMLModule/Tidy/</code>. There is, however, a
-general syntax:</p>
-
-<table class="table">
-    <thead>
-        <tr>
-            <th>Name</th>
-            <th>Example</th>
-            <th>Interpretation</th>
-        </tr>
-    </thead>
-    <tbody>
-        <tr>
-            <td>element</td>
-            <td>font</td>
-            <td>Tag transform for <em>element</em></td>
-        </tr>
-        <tr>
-            <td>element@attr</td>
-            <td>br@clear</td>
-            <td>Attribute transform for <em>attr</em> on <em>element</em></td>
-        </tr>
-        <tr>
-            <td>@attr</td>
-            <td>@lang</td>
-            <td>Global attribute transform for <em>attr</em></td>
-        </tr>
-        <tr>
-            <td>e#content_model_type</td>
-            <td>blockquote#content_model_type</td>
-            <td>Change of child processing implementation for <em>e</em></td>
-        </tr>
-    </tbody>
-</table>
-
-<h2>So... what's the lowdown?</h2>
-
-<p>The lowdown is, quite frankly, HTML Purifier's default settings are
-probably good enough. The next step is to bump the level up to heavy,
-and if that still doesn't satisfy your appetite, do some fine-tuning.
-Other than that, don't worry about it: this all works silently and
-effectively in the background.</p>
-
-</body></html>
-
-<!-- vim: et sw=4 sts=4
--->
diff --git a/lib/htmlpurifier/docs/enduser-uri-filter.html b/lib/htmlpurifier/docs/enduser-uri-filter.html
deleted file mode 100644
index d1b3354a3..000000000
--- a/lib/htmlpurifier/docs/enduser-uri-filter.html
+++ /dev/null
@@ -1,204 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
-    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
-<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-<meta name="description" content="Tutorial for creating custom URI filters." />
-<link rel="stylesheet" type="text/css" href="style.css" />
-
-<title>URI Filters - HTML Purifier</title>
-
-</head><body>
-
-<h1>URI Filters</h1>
-
-<div id="filing">Filed under End-User</div>
-<div id="index">Return to the <a href="index.html">index</a>.</div>
-<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
-
-<p>
-  This is a quick and dirty document to get you on your way to writing
-  custom URI filters for your own URL filtering needs.  Why would you
-  want to write a URI filter?  If you need URIs your users put into
-  HTML to magically change into a different URI, this is
-  exactly what you need!
-</p>
-
-<h2>Creating the class</h2>
-
-<p>
-  Any URI filter you make will be a subclass of <code>HTMLPurifier_URIFilter</code>.
-  The scaffolding is thus:
-</p>
-
-<pre>class HTMLPurifier_URIFilter_<strong>NameOfFilter</strong> extends HTMLPurifier_URIFilter
-{
-    public $name = '<strong>NameOfFilter</strong>';
-    public function prepare($config) {}
-    public function filter(&amp;$uri, $config, $context) {}
-}</pre>
-
-<p>
-  Fill in the variable <code>$name</code> with the name of your filter, and
-  take a look at the two methods. <code>prepare()</code> is an initialization
-  method that is called only once, before any filtering of the HTML has been
-  done. Use it to perform any costly setup work that only needs to be done
-  once. <code>filter()</code> is the guts and innards of our filter:
-  it takes the URI and does whatever needs to be done to it.
-</p>
-
-<p>
-  If you've worked with HTML Purifier, you'll recognize the <code>$config</code>
-  and <code>$context</code> parameters.  On the other hand, <code>$uri</code>
-  is something unique to this section of the application: it's a
-  <code>HTMLPurifier_URI</code> object. The interface is thus:
-</p>
-
-<pre>class HTMLPurifier_URI
-{
-    public $scheme, $userinfo, $host, $port, $path, $query, $fragment;
-    public function __construct($scheme, $userinfo, $host, $port, $path, $query, $fragment);
-    public function toString();
-    public function copy();
-    public function getSchemeObj($config, $context);
-    public function validate($config, $context);
-}</pre>
-
-<p>
-  The first three methods are fairly self-explanatory: you have a constructor,
-  a serializer, and a cloner.  Generally, you won't be using them when
-  you are manipulating the URI objects themselves.
-  <code>getSchemeObj()</code> is a special purpose method that returns
-  a <code>HTMLPurifier_URIScheme</code> object corresponding to the specific
-  URI at hand. <code>validate()</code> performs general-purpose validation
-  on the internal components of a URI. Once again, you don't need to
-  worry about these: they've already been handled for you.
-</p>
-
-<h2>URI format</h2>
-
-<p>
-  As a URIFilter, we're interested in the member variables of the URI object.
-</p>
-
-<table class="quick"><tbody>
-  <tr><th>Scheme</th>   <td>The protocol for identifying (and possibly locating) a resource (http, ftp, https)</td></tr>
-  <tr><th>Userinfo</th> <td>User information such as a username (bob)</td></tr>
-  <tr><th>Host</th>     <td>Domain name or IP address of the server (example.com, 127.0.0.1)</td></tr>
-  <tr><th>Port</th>     <td>Network port number for the server (80, 12345)</td></tr>
-  <tr><th>Path</th>     <td>Data that identifies the resource, possibly hierarchical (/path/to, ed@example.com)</td></tr>
-  <tr><th>Query</th>    <td>String of information to be interpreted by the resource (?q=search-term)</td></tr>
-  <tr><th>Fragment</th> <td>Additional information for the resource after retrieval (#bookmark)</td></tr>
-</tbody></table>
-
-<p>
-  Because the URI is presented to us in this form, and not
-  <code>http://bob@example.com:8080/foo.php?q=string#hash</code>, it saves us
-  a lot of trouble in having to parse the URI every time we want to filter
-  it. For the record, the above URI has the following components:
-</p>
-
-<table class="quick"><tbody>
-  <tr><th>Scheme</th>   <td>http</td></tr>
-  <tr><th>Userinfo</th> <td>bob</td></tr>
-  <tr><th>Host</th>     <td>example.com</td></tr>
-  <tr><th>Port</th>     <td>8080</td></tr>
-  <tr><th>Path</th>     <td>/foo.php</td></tr>
-  <tr><th>Query</th>    <td>q=string</td></tr>
-  <tr><th>Fragment</th> <td>hash</td></tr>
-</tbody></table>
-
-<p>
-  Note that there is no question mark or octothorpe in the query or
-  fragment: these get removed during parsing.
-</p>
-
-<p>
-  With this information, you can get straight to implementing your
-  <code>filter()</code> method. But one more thing...
-</p>
-
-<h2>Return value: Boolean, not URI</h2>
-
-<p>
-  You may have noticed that the URI is being passed in by reference.
-  This means that whatever changes you make to it will
-  be reflected in the URI object the caller holds.  <strong>Do not
-  return the URI object: it is unnecessary and will cause bugs.</strong>
-  Instead, return a boolean value, true if the filtering was successful,
-  or false if the URI is beyond repair and needs to be axed.
-</p>
-
-<p>
-  Let's suppose I wanted to write a filter that converted links with a
-  custom <code>image</code> scheme to their corresponding real paths on
-  our website:
-</p>
-
-<pre>class HTMLPurifier_URIFilter_TransformImageScheme extends HTMLPurifier_URIFilter
-{
-    public $name = 'TransformImageScheme';
-    public function filter(&$uri, $config, $context) {
-        if ($uri->scheme !== 'image') return true;
-        $img_name = $uri->path;
-        // Overwrite the previous URI object
-        $uri = new HTMLPurifier_URI('http', null, null, null, '/img/' . $img_name . '.png', null, null);
-        return true;
-    }
-}</pre>
-
-<p>
-  Notice I did not <code>return $uri;</code>. This filter would turn
-  <code>image:Foo</code> into <code>/img/Foo.png</code>.
-</p>
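-
-<p>
-  For contrast, here is a minimal sketch of a filter that rejects a URI
-  outright by returning false. The class name
-  <code>HTMLPurifier_URIFilter_RejectUserinfo</code> is hypothetical and
-  not part of the library; it assumes only the <code>userinfo</code>
-  member described in the URI format table:
-</p>
-
-<pre>class HTMLPurifier_URIFilter_RejectUserinfo extends HTMLPurifier_URIFilter
-{
-    public $name = 'RejectUserinfo';
-    public function filter(&amp;$uri, $config, $context) {
-        // No userinfo component: nothing to object to, keep the URI
-        if ($uri-&gt;userinfo === null) return true;
-        // Returning false tells HTML Purifier to discard the URI
-        return false;
-    }
-}</pre>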
-
-<h2>Activating your filter</h2>
-
-<p>
-  Having a filter is all well and good, but you need to tell HTML Purifier
-  to use it. Fortunately, this part's simple:
-</p>
-
-<pre>$uri = $config->getDefinition('URI');
-$uri->addFilter(new HTMLPurifier_URIFilter_<strong>NameOfFilter</strong>(), $config);</pre>
-
-<p>
-    After adding a filter, you won't be able to set configuration directives.
-    Structure your code accordingly.
-</p>
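-
-<p>
-    Put together, the ordering might look like the sketch below. It assumes
-    the <code>TransformImageScheme</code> filter from the earlier example and
-    the two-argument <code>set()</code> form of HTML Purifier 4.x; the
-    directive chosen is only an illustration:
-</p>
-
-<pre>$config = HTMLPurifier_Config::createDefault();
-// 1. Set every directive you need first
-$config-&gt;set('Core.Encoding', 'UTF-8');
-// 2. Only then retrieve the URI definition and add the filter
-$uri = $config-&gt;getDefinition('URI');
-$uri-&gt;addFilter(new HTMLPurifier_URIFilter_TransformImageScheme(), $config);
-// 3. Build the purifier from the finished configuration
-$purifier = new HTMLPurifier($config);</pre>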
-
-<!-- XXX: link to new documentation system -->
-
-<h2>Post-filter</h2>
-
-<p>
-    Remember our TransformImageScheme filter? That filter acted before we had
-    performed scheme validation; otherwise, the URI would have been filtered
-    out when it was discovered that there was no image scheme. Well, a post-filter
-    is run after scheme specific validation, so it's ideal for bulk
-    post-processing of URIs, including munging. To specify a filter as a post-filter,
-    set the <code>$post</code> member variable to TRUE.
-</p>
-
-<pre>class HTMLPurifier_URIFilter_MyPostFilter extends HTMLPurifier_URIFilter
-{
-    public $name = 'MyPostFilter';
-    public $post = true;
-    // ... extra code here
-}
-</pre>
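-
-<p>
-    As a fuller sketch, the hypothetical post-filter below (not part of the
-    library) rewrites the scheme of URIs that have already passed scheme
-    validation:
-</p>
-
-<pre>class HTMLPurifier_URIFilter_ForceHttps extends HTMLPurifier_URIFilter
-{
-    public $name = 'ForceHttps';
-    public $post = true; // run after scheme-specific validation
-    public function filter(&amp;$uri, $config, $context) {
-        // URIs whose scheme is not exactly 'http' are left untouched
-        if ($uri-&gt;scheme === 'http') $uri-&gt;scheme = 'https';
-        return true;
-    }
-}</pre>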
-
-<h2>Examples</h2>
-
-<p>
-  Check the
-  <a href="http://repo.or.cz/w/htmlpurifier.git?a=tree;hb=HEAD;f=library/HTMLPurifier/URIFilter">URIFilter</a>
-  directory for more implementation examples, and see <a href="proposal-new-directives.txt">the
-  new directives proposal document</a> for ideas on what could be implemented
-  as a filter.
-</p>
-
-</body></html>
-
-<!-- vim: et sw=4 sts=4
--->
diff --git a/lib/htmlpurifier/docs/enduser-utf8.html b/lib/htmlpurifier/docs/enduser-utf8.html
deleted file mode 100644
index 9b01a302a..000000000
--- a/lib/htmlpurifier/docs/enduser-utf8.html
+++ /dev/null
@@ -1,1060 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
-    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"><head>
-<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-<meta name="description" content="Describes the rationale for using UTF-8, the ramifications otherwise, and how to make the switch." />
-<link rel="stylesheet" type="text/css" href="./style.css" />
-<style type="text/css">
-    .minor td {font-style:italic;}
-</style>
-
-<title>UTF-8: The Secret of Character Encoding - HTML Purifier</title>
-
-<!-- Note to users: this document, though professing to be UTF-8, attempts
-to use only ASCII characters, because most webservers are configured
-to send HTML as ISO-8859-1. So I will, many times, go against my
-own advice for sake of portability.  -->
-
-</head><body>
-
-<h1>UTF-8: The Secret of Character Encoding</h1>
-
-<div id="filing">Filed under End-User</div>
-<div id="index">Return to the <a href="index.html">index</a>.</div>
-<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
-
-<p>Character encoding and character sets are not that
-difficult to understand, but so many people blithely stumble
-through the worlds of programming without knowing what to actually
-do about it, or say &quot;Ah, it's a job for those <em>internationalization</em>
-experts.&quot; No, it is not! This document will walk you through
-determining the encoding of your system and how you should handle
-this information. It will stay away from excessive discussion on
-the internals of character encoding.</p>
-
-<p>This document is not designed to be read in its entirety: it
-slowly introduces concepts that build on each other, so you need not get to
-the bottom to have learned something new. However, I strongly
-recommend you read all the way to <strong>Why UTF-8?</strong>, because at
-that point, if you decide not to migrate, it will at least be a conscious
-decision. Migration can be a rewarding (but difficult) task.</p>
-
-<blockquote class="aside">
-<div class="label">Asides</div>
-    <p>Text in this formatting is an <strong>aside</strong>,
-    interesting tidbits for the curious but not strictly necessary material to
-    do the tutorial. If you read this text, you'll come out
-    with a greater understanding of the underlying issues.</p>
-</blockquote>
-
-<h2>Table of Contents</h2>
-
-<ol id="toc">
-    <li><a href="#findcharset">Finding the real encoding</a></li>
-    <li><a href="#findmetacharset">Finding the embedded encoding</a></li>
-    <li><a href="#fixcharset">Fixing the encoding</a><ol>
-        <li><a href="#fixcharset-none">No embedded encoding</a></li>
-        <li><a href="#fixcharset-diff">Embedded encoding disagrees</a></li>
-        <li><a href="#fixcharset-server">Changing the server encoding</a><ol>
-            <li><a href="#fixcharset-server-php">PHP header() function</a></li>
-            <li><a href="#fixcharset-server-phpini">PHP ini directive</a></li>
-            <li><a href="#fixcharset-server-nophp">Non-PHP</a></li>
-            <li><a href="#fixcharset-server-htaccess">.htaccess</a></li>
-            <li><a href="#fixcharset-server-ext">File extensions</a></li>
-        </ol></li>
-        <li><a href="#fixcharset-xml">XML</a></li>
-        <li><a href="#fixcharset-internals">Inside the process</a></li>
-    </ol></li>
-    <li><a href="#whyutf8">Why UTF-8?</a><ol>
-        <li><a href="#whyutf8-i18n">Internationalization</a></li>
-        <li><a href="#whyutf8-user">User-friendly</a></li>
-        <li><a href="#whyutf8-forms">Forms</a><ol>
-            <li><a href="#whyutf8-forms-urlencoded">application/x-www-form-urlencoded</a></li>
-            <li><a href="#whyutf8-forms-multipart">multipart/form-data</a></li>
-        </ol></li>
-        <li><a href="#whyutf8-support">Well supported</a></li>
-        <li><a href="#whyutf8-htmlpurifier">HTML Purifiers</a></li>
-    </ol></li>
-    <li><a href="#migrate">Migrate to UTF-8</a><ol>
-        <li><a href="#migrate-db">Configuring your database</a><ol>
-            <li><a href="#migrate-db-legit">Legit method</a></li>
-            <li><a href="#migrate-db-binary">Binary</a></li>
-        </ol></li>
-        <li><a href="#migrate-editor">Text editor</a></li>
-        <li><a href="#migrate-bom">Byte Order Mark (headers already sent!)</a></li>
-        <li><a href="#migrate-fonts">Fonts</a><ol>
-            <li><a href="#migrate-fonts-obscure">Obscure scripts</a></li>
-            <li><a href="#migrate-fonts-occasional">Occasional use</a></li>
-        </ol></li>
-        <li><a href="#migrate-variablewidth">Dealing with variable width in functions</a></li>
-    </ol></li>
-    <li><a href="#externallinks">Further Reading</a></li>
-</ol>
-
-<h2 id="findcharset">Finding the real encoding</h2>
-
-<p>In the beginning, there was ASCII, and things were simple. But they
-weren't good, for no one could write in Cyrillic or Thai. So there
-exploded a proliferation of character encodings to remedy the problem
-by extending the characters ASCII could express. This ridiculously
-simplified version of the history of character encodings shows us that
-there are now many character encodings floating around.</p>
-
-<blockquote class="aside">
-    <p>A <strong>character encoding</strong> tells the computer how to
-    interpret raw zeroes and ones into real characters. It
-    usually does this by pairing numbers with characters.</p>
-    <p>There are many different types of character encodings floating
-    around, but the ones we deal most frequently with are ASCII,
-    8-bit encodings, and Unicode-based encodings.</p>
-    <ul>
-        <li><strong>ASCII</strong> is a 7-bit encoding based on the
-            English alphabet.</li>
-        <li><strong>8-bit encodings</strong> are extensions to ASCII
-            that add a potpourri of useful, non-standard characters
-            like &eacute; and &aelig;. They can only add 128 characters,
-            so usually only support one script at a time. When you
-            see a page on the web, chances are it's encoded in one
-            of these encodings.</li>
-        <li><strong>Unicode-based encodings</strong> implement the
-            Unicode standard and include UTF-8, UTF-16 and UTF-32/UCS-4.
-            They go beyond 8-bits and support almost
-            every language in the world. UTF-8 is gaining traction
-            as the dominant international encoding of the web.</li>
-    </ul>
-</blockquote>
-
-<p>The first step of our journey is to find out what the encoding of
-your website is. The most reliable way is to ask your
-browser:</p>
-
-<dl>
-    <dt>Mozilla Firefox</dt>
-    <dd>Tools &gt; Page Info: Encoding</dd>
-    <dt>Internet Explorer</dt>
-    <dd>View &gt; Encoding: bulleted item is unofficial name</dd>
-</dl>
-
-<p>Internet Explorer won't give you the MIME (i.e. useful/real) name of the
-character encoding, so you'll have to look it up using their description.
-Some common ones:</p>
-
-<table class="table">
-    <thead><tr>
-        <th>IE's Description</th>
-        <th>MIME Name</th>
-    </tr></thead>
-    <tbody>
-        <tr><th colspan="2">Windows</th></tr>
-        <tr><td>Arabic (Windows)</td><td>Windows-1256</td></tr>
-        <tr><td>Baltic (Windows)</td><td>Windows-1257</td></tr>
-        <tr><td>Central European (Windows)</td><td>Windows-1250</td></tr>
-        <tr><td>Cyrillic (Windows)</td><td>Windows-1251</td></tr>
-        <tr><td>Greek (Windows)</td><td>Windows-1253</td></tr>
-        <tr><td>Hebrew (Windows)</td><td>Windows-1255</td></tr>
-        <tr><td>Thai (Windows)</td><td>TIS-620</td></tr>
-        <tr><td>Turkish (Windows)</td><td>Windows-1254</td></tr>
-        <tr><td>Vietnamese (Windows)</td><td>Windows-1258</td></tr>
-        <tr><td>Western European (Windows)</td><td>Windows-1252</td></tr>
-    </tbody>
-    <tbody>
-        <tr><th colspan="2">ISO</th></tr>
-        <tr><td>Arabic (ISO)</td><td>ISO-8859-6</td></tr>
-        <tr><td>Baltic (ISO)</td><td>ISO-8859-4</td></tr>
-        <tr><td>Central European (ISO)</td><td>ISO-8859-2</td></tr>
-        <tr><td>Cyrillic (ISO)</td><td>ISO-8859-5</td></tr>
-        <tr class="minor"><td>Estonian (ISO)</td><td>ISO-8859-13</td></tr>
-        <tr class="minor"><td>Greek (ISO)</td><td>ISO-8859-7</td></tr>
-        <tr><td>Hebrew (ISO-Logical)</td><td>ISO-8859-8-I</td></tr>
-        <tr><td>Hebrew (ISO-Visual)</td><td>ISO-8859-8</td></tr>
-        <tr class="minor"><td>Latin 9 (ISO)</td><td>ISO-8859-15</td></tr>
-        <tr class="minor"><td>Turkish (ISO)</td><td>ISO-8859-9</td></tr>
-        <tr><td>Western European (ISO)</td><td>ISO-8859-1</td></tr>
-    </tbody>
-    <tbody>
-        <tr><th colspan="2">Other</th></tr>
-        <tr><td>Chinese Simplified (GB18030)</td><td>GB18030</td></tr>
-        <tr><td>Chinese Simplified (GB2312)</td><td>GB2312</td></tr>
-        <tr><td>Chinese Simplified (HZ)</td><td>HZ</td></tr>
-        <tr><td>Chinese Traditional (Big5)</td><td>Big5</td></tr>
-        <tr><td>Japanese (Shift-JIS)</td><td>Shift_JIS</td></tr>
-        <tr><td>Japanese (EUC)</td><td>EUC-JP</td></tr>
-        <tr><td>Korean</td><td>EUC-KR</td></tr>
-        <tr><td>Unicode (UTF-8)</td><td>UTF-8</td></tr>
-    </tbody>
-</table>
-
-<p>Internet Explorer does not recognize some of the more obscure
-character encodings, and having to look up the real names with a table
-is a pain, so I recommend using Mozilla Firefox to find out your
-character encoding.</p>
-
-<h2 id="findmetacharset">Finding the embedded encoding</h2>
-
-<p>At this point, you may be asking, &quot;Didn't we already find out our
-encoding?&quot; Well, as it turns out, there are multiple places where
-a web developer can specify a character encoding, and one such place
-is in a <code>META</code> tag:</p>
-
-<pre>&lt;meta http-equiv=&quot;Content-Type&quot; content=&quot;text/html; charset=UTF-8&quot; /&gt;</pre>
-
-<p>You'll find this in the <code>HEAD</code> section of an HTML document.
-The text to the right of <code>charset=</code> is the &quot;claimed&quot;
-encoding: the HTML claims to be this encoding, but whether or not this
-is actually the case depends on other factors. For now, take note
-if your <code>META</code> tag claims that either:</p>
-
-<ol>
-    <li>The character encoding is the same as the one reported by the
-        browser,</li>
-    <li>The character encoding is different from the browser's, or</li>
-    <li>There is no <code>META</code> tag at all! (horror, horror!)</li>
-</ol>
-
-<h2 id="fixcharset">Fixing the encoding</h2>
-
-<p class="aside">The advice given here is for pages being served as
-vanilla <code>text/html</code>.  Different practices must be used
-for <code>application/xml</code> or <code>application/xhtml+xml</code>; see
-<a href="http://www.w3.org/TR/2002/NOTE-xhtml-media-types-20020430/">W3C's
-document on XHTML media types</a> for more information.</p>
-
-<p>If your <code>META</code> encoding and your real encoding match,
-savvy! You can skip this section. If they don't...</p>
-
-<h3 id="fixcharset-none">No embedded encoding</h3>
-
-<p>If this is the case, you'll want to add in the appropriate
-<code>META</code> tag to your website. It's as simple as copy-pasting
-the code snippet above and replacing UTF-8 with whatever is the MIME name
-of your real encoding.</p>
-
-<blockquote class="aside">
-    <p>For all those skeptics out there, there is a very good reason
-    why the character encoding should be explicitly stated. When the
-    browser isn't told what the character encoding of a text is, it
-    has to guess: and sometimes the guess is wrong. Hackers can manipulate
-    this guess in order to slip XSS past filters and then fool the
-    browser into executing it as active code. A great example of this
-    is the <a href="http://shiflett.org/archive/177">Google UTF-7
-    exploit</a>.</p>
-    <p>You might be able to get away with not specifying a character
-    encoding with the <code>META</code> tag as long as your webserver
-    sends the right Content-Type header, but why risk it? Besides, if
-    the user downloads the HTML file, there is no longer any webserver
-    to define the character encoding.</p>
-</blockquote>
-
-<h3 id="fixcharset-diff">Embedded encoding disagrees</h3>
-
-<p>This is an extremely common mistake: another source is telling
-the browser what the
-character encoding is and is overriding the embedded encoding. This
-source usually is the Content-Type HTTP header that the webserver (e.g.
-Apache) sends. A usual Content-Type header sent with a page might
-look like this:</p>
-
-<pre>Content-Type: text/html; charset=ISO-8859-1</pre>
-
-<p>Notice how there is a charset parameter: this is the webserver's
-way of telling a browser what the character encoding is, much like
-the <code>META</code> tags we touched upon previously.</p>
-
-<blockquote class="aside"><p>In fact, the <code>META</code> tag is
-designed as a substitute for the HTTP header for contexts where
-sending headers is impossible (such as locally stored files without
-a webserver). Thus the name <code>http-equiv</code> (HTTP equivalent).
-</p></blockquote>
-
-<p>There are two ways to go about fixing this: changing the <code>META</code>
-tag to match the HTTP header, or changing the HTTP header to match
-the <code>META</code> tag. How do we know which to do? It depends
-on the website's content: after all, headers and tags are only ways of
-describing the actual characters on the web page.</p>
-
-<p>If your website:</p>
-
-<dl>
-    <dt>...only uses ASCII characters,</dt>
-    <dd>Either way is fine, but I recommend switching both to
-        UTF-8 (more on this later).</dd>
-    <dt>...uses special characters, and they display
-        properly,</dt>
-    <dd>Change the embedded encoding to the server encoding.</dd>
-    <dt>...uses special characters, but users often complain that
-        they come out garbled,</dt>
-    <dd>Change the server encoding to the embedded encoding.</dd>
-</dl>
-
-<p>Changing a META tag is easy: just swap out the old encoding
-for the new. Changing the server (HTTP header) encoding, however,
-is slightly more difficult.</p>
-
-<h3 id="fixcharset-server">Changing the server encoding</h3>
-
-<h4 id="fixcharset-server-php">PHP header() function</h4>
-
-<p>The simplest way to handle this problem is to send the encoding
-yourself, via your programming language. Since you're using HTML
-Purifier, I'll assume PHP, although it's not too difficult to do
-similar things in
-<a href="http://www.w3.org/International/O-HTTP-charset#scripting">other
-languages</a>. The appropriate code is:</p>
-
-<pre><a href="http://php.net/function.header">header</a>('Content-Type: text/html; charset=UTF-8');</pre>
-
-<p>...replacing UTF-8 with whatever your embedded encoding is.
-This code must come before any output, so be careful about
-stray whitespace in your application (i.e., any whitespace before
-output excluding whitespace within &lt;?php ?&gt; tags).</p>
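-
-<p>If you are unsure whether output has already started, a defensive
-sketch like the following can help track down the culprit. It relies on
-PHP's <code>headers_sent()</code>; the warning text is only an
-illustration:</p>
-
-<pre>if (!headers_sent($file, $line)) {
-    header('Content-Type: text/html; charset=UTF-8');
-} else {
-    // $file and $line report where output began
-    trigger_error("Cannot set charset: output started at $file:$line", E_USER_WARNING);
-}</pre>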
-
-<h4 id="fixcharset-server-phpini">PHP ini directive</h4>
-
-<p>PHP also has a neat little ini directive that can save you a
-header call: <code><a href="http://php.net/ini.core#ini.default-charset">default_charset</a></code>. Using this code:</p>
-
-<pre><a href="http://php.net/function.ini_set">ini_set</a>('default_charset', 'UTF-8');</pre>
-
-<p>...will also do the trick. If PHP is running as an Apache module (and
-not as FastCGI, consult
-<a href="http://php.net/phpinfo">phpinfo</a>() for details), you can even use htaccess to apply this property
-across many PHP files:</p>
-
-<pre><a href="http://php.net/configuration.changes#configuration.changes.apache">php_value</a> default_charset &quot;UTF-8&quot;</pre>
-
-<blockquote class="aside"><p>As with all INI directives, this can
-also go in your php.ini file. Some hosting providers allow you to customize
-your own php.ini file; ask your support for details. Use:</p>
-<pre>default_charset = &quot;utf-8&quot;</pre></blockquote>
-
-<h4 id="fixcharset-server-nophp">Non-PHP</h4>
-
-<p>You may, for whatever reason, need to set the character encoding
-on non-PHP files, usually plain ol' HTML files. Doing this
-is more of a hit-or-miss process: depending on the software being
-used as a webserver and the configuration of that software, certain
-techniques may work, or may not work.</p>
-
-<h4 id="fixcharset-server-htaccess">.htaccess</h4>
-
-<p>On Apache, you can use an .htaccess file to change the character
-encoding. I'll defer to
-<a href="http://www.w3.org/International/questions/qa-htaccess-charset">W3C</a>
-for the in-depth explanation, but it boils down to creating a file
-named .htaccess with the contents:</p>
-
-<pre><a href="http://httpd.apache.org/docs/1.3/mod/mod_mime.html#addcharset">AddCharset</a> UTF-8 .html</pre>
-
-<p>Where UTF-8 is replaced with the character encoding you want to
-use and .html is a file extension that this will be applied to. This
-character encoding will then be set for any file directly in
-or in the subdirectories of the directory you place this file in.</p>
-
-<p>If you're feeling particularly courageous, you can use:</p>
-
-<pre><a href="http://httpd.apache.org/docs/1.3/mod/core.html#adddefaultcharset">AddDefaultCharset</a> UTF-8</pre>
-
-<p>...which changes the character set Apache adds to any document that
-doesn't have any Content-Type parameters. This directive, which the
-default configuration file sets to iso-8859-1 for security
-reasons, is probably why your headers mismatch
-with the <code>META</code> tag. If you would prefer Apache not to be
-butting in on your character encodings, you can tell it not
-to send anything at all:</p>
-
-<pre><a href="http://httpd.apache.org/docs/1.3/mod/core.html#adddefaultcharset">AddDefaultCharset</a> Off</pre>
-
-<p>...making your internal charset declaration (usually the <code>META</code> tags)
-the sole source of character encoding
-information. In these cases, it is <em>especially</em> important to make
-sure you have valid <code>META</code> tags on your pages and all the
-text before them is ASCII.</p>
-
-<blockquote class="aside"><p>These directives can also be
-placed in the httpd.conf file for Apache, but
-in most shared hosting situations you won't be able to edit this file.
-</p></blockquote>
-
-<h4 id="fixcharset-server-ext">File extensions</h4>
-
-<p>If you're not allowed to use .htaccess files, you can often
-piggy-back off of Apache's default AddCharset declarations by giving
-your files the appropriate extension. Here are Apache's default
-character set declarations:</p>
-
-<table class="table">
-    <thead><tr>
-        <th>Charset</th>
-        <th>File extension(s)</th>
-    </tr></thead>
-    <tbody>
-        <tr><td>ISO-8859-1</td><td>.iso8859-1 .latin1</td></tr>
-        <tr><td>ISO-8859-2</td><td>.iso8859-2 .latin2 .cen</td></tr>
-        <tr><td>ISO-8859-3</td><td>.iso8859-3 .latin3</td></tr>
-        <tr><td>ISO-8859-4</td><td>.iso8859-4 .latin4</td></tr>
-        <tr><td>ISO-8859-5</td><td>.iso8859-5 .latin5 .cyr .iso-ru</td></tr>
-        <tr><td>ISO-8859-6</td><td>.iso8859-6 .latin6 .arb</td></tr>
-        <tr><td>ISO-8859-7</td><td>.iso8859-7 .latin7 .grk</td></tr>
-        <tr><td>ISO-8859-8</td><td>.iso8859-8 .latin8 .heb</td></tr>
-        <tr><td>ISO-8859-9</td><td>.iso8859-9 .latin9 .trk</td></tr>
-        <tr><td>ISO-2022-JP</td><td>.iso2022-jp .jis</td></tr>
-        <tr><td>ISO-2022-KR</td><td>.iso2022-kr .kis</td></tr>
-        <tr><td>ISO-2022-CN</td><td>.iso2022-cn .cis</td></tr>
-        <tr><td>Big5</td><td>.Big5 .big5 .b5</td></tr>
-        <tr><td>WINDOWS-1251</td><td>.cp-1251 .win-1251</td></tr>
-        <tr><td>CP866</td><td>.cp866</td></tr>
-        <tr><td>KOI8-r</td><td>.koi8-r .koi8-ru</td></tr>
-        <tr><td>KOI8-ru</td><td>.koi8-uk .ua</td></tr>
-        <tr><td>ISO-10646-UCS-2</td><td>.ucs2</td></tr>
-        <tr><td>ISO-10646-UCS-4</td><td>.ucs4</td></tr>
-        <tr><td>UTF-8</td><td>.utf8</td></tr>
-        <tr><td>GB2312</td><td>.gb2312 .gb </td></tr>
-        <tr><td>utf-7</td><td>.utf7</td></tr>
-        <tr><td>EUC-TW</td><td>.euc-tw</td></tr>
-        <tr><td>EUC-JP</td><td>.euc-jp</td></tr>
-        <tr><td>EUC-KR</td><td>.euc-kr</td></tr>
-        <tr><td>shift_jis</td><td>.sjis</td></tr>
-    </tbody>
-</table>
-
-<p>So, for example, a file named <code>page.utf8.html</code> or
-<code>page.html.utf8</code> will probably be sent with the UTF-8 charset
-attached, the difference being that if there is an
-<code>AddCharset charset .html</code> declaration, it will override
-the .utf8 extension in <code>page.utf8.html</code> (precedence moves
-from right to left). By default, Apache has no such declaration.</p>
-
-<h4 id="fixcharset-server-iis">Microsoft IIS</h4>
-
-<p>If anyone can contribute information on how to configure Microsoft
-IIS to change character encodings, I'd be grateful.</p>
-
-<h3 id="fixcharset-xml">XML</h3>
-
-<p><code>META</code> tags are the most common source of embedded
-encodings, but they can also come from somewhere else: XML
-Declarations. They look like:</p>
-
-<pre>&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;</pre>
-
-<p>...and are most often found in XML documents (including XHTML).</p>
-
-<p>For XHTML, this XML Declaration theoretically
-overrides the <code>META</code> tag. In reality, this happens only when the
-XHTML is actually served as legit XML and not HTML, which almost
-never happens due to Internet Explorer's lack of support for
-<code>application/xhtml+xml</code> (even though doing so is often
-argued to be <a href="http://www.hixie.ch/advocacy/xhtml">good
-practice</a> and is required by the XHTML 1.1 specification).</p>
-
-<p>For XML, however, this XML Declaration is extremely important.
-Since most webservers are not configured to send charsets for .xml files,
-this is the only thing a parser has to go on. Furthermore, the default
-for XML files is UTF-8, which often butts heads with the more common
-ISO-8859-1 encoding (you see this in garbled RSS feeds).</p>
-
-<p>In short, if you use XHTML and have gone through the
-trouble of adding the XML Declaration, make sure it jibes
-with your <code>META</code> tags (which should only be present
-if served in text/html) and HTTP headers.</p>
-
-<h3 id="fixcharset-internals">Inside the process</h3>
-
-<p>This section is not required reading,
-but may answer some of your questions on what's going on in all
-this character encoding hocus pocus. If you're interested in
-moving on to the next phase, skip this section.</p>
-
-<p>A logical question that follows all of our wheeling and dealing
-with multiple sources of character encodings is &quot;Why are there
-so many options?&quot; To answer this question, we have to turn
-back to our definition of character encodings: they allow a program
-to interpret bytes into human-readable characters.</p>
-
-<p>Thus, a chicken-egg problem: a character encoding
-is necessary to interpret the
-text of a document. A <code>META</code> tag is in the text of a document.
-The <code>META</code> tag gives the character encoding. How can we
-determine the contents of a <code>META</code> tag, inside the text,
-if we don't know its character encoding? And how do we figure out
-the character encoding, if we don't know the contents of the
-<code>META</code> tag?</p>
-
-<p>Fortunately for us, the characters we need to write the
-<code>META</code> tag are in ASCII, which is pretty much universal
-over every character encoding that is in common use today. So,
-all the web-browser has to do is parse all the way down until
-it gets to the Content-Type tag, extract the character encoding,
-then re-parse the document according to this new information.</p>
-
-<p>Obviously this is complicated, so browsers prefer the simpler
-and more efficient solution: get the character encoding from
-somewhere other than the document itself, i.e. the HTTP headers,
-much to the chagrin of HTML authors who can't set these headers.</p>
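-
-<p>To make the sniffing step concrete, here is a rough sketch in PHP.
-The function name <code>sniff_meta_charset()</code> and the 1024-byte
-window are assumptions for illustration; real browsers use a far more
-elaborate algorithm:</p>
-
-<pre>function sniff_meta_charset($html) {
-    // Only the ASCII-compatible start of the document is inspected
-    $head = substr($html, 0, 1024);
-    $pattern = '/&lt;meta[^&gt;]+charset\s*=\s*["\']?([A-Za-z0-9_-]+)/i';
-    if (preg_match($pattern, $head, $matches)) {
-        return $matches[1]; // the claimed encoding, e.g. UTF-8
-    }
-    return null; // no META declaration found
-}</pre>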
-
-<h2 id="whyutf8">Why UTF-8?</h2>
-
-<p>So, you've gone through all the trouble of ensuring that your
-server and embedded character encodings all line up properly and are
-present.  Good job: at
-this point, you could quit and rest easy knowing that your pages
-are not vulnerable to character encoding style XSS attacks.
-However, just as having a character encoding is better than
-having no character encoding at all, having UTF-8 as your
-character encoding is better than having some other random
-character encoding, and the next step is to convert to UTF-8.
-But why?</p>
-
-<h3 id="whyutf8-i18n">Internationalization</h3>
-
-<p>Many software projects, at one point or another, suddenly realize
-that they should be supporting more than one language. Even regular
-usage in one language sometimes requires the occasional special character
-that, without surprise, is not available in your character set. Sometimes
-developers get around this by adding support for multiple encodings: when
-using Chinese, use Big5; when using Japanese, use Shift-JIS; when
-using Greek, use yet another; and so on. Other times, they use character references with great
-zeal.</p>
-
-<p>UTF-8, however, obviates the need for any of these complicated
-measures. After getting the system to use UTF-8 and adjusting for
-sources that are outside the hand of the browser (more on this later),
-UTF-8 just works. You can use it for any language, even many languages
-at once; you don't have to worry about managing multiple encodings,
-and you don't have to use those user-unfriendly entities.</p>
-
-<h3 id="whyutf8-user">User-friendly</h3>
-
-<p>Websites encoded in Latin-1 (ISO-8859-1) which occasionally need
-a special character outside of their scope often will use a character
-entity reference to achieve the desired effect. For instance, &theta; can be
-written <code>&amp;theta;</code>, regardless of the character encoding's
-support of Greek letters.</p>
-
-<p>This works nicely for limited use of special characters, but
-say you wanted this sentence of Chinese text: &#28608;&#20809;,
-&#36889;&#20841;&#20491;&#23383;&#26159;&#29978;&#40636;&#24847;&#24605;.
-The ampersand encoded version would look like this:</p>
-
-<pre>&amp;#28608;&amp;#20809;, &amp;#36889;&amp;#20841;&amp;#20491;&amp;#23383;&amp;#26159;&amp;#29978;&amp;#40636;&amp;#24847;&amp;#24605;</pre>
-
-<p>Extremely inconvenient for those of us who actually know what
-character entities are, totally unintelligible to poor users who don't!
-Even the slightly more user-friendly, &quot;intelligible&quot; character
-entities like <code>&amp;theta;</code> will leave users who are
-uninterested in learning HTML scratching their heads. On the other
-hand, if they see &theta; in an edit box, they'll know that it's a
-special character, and treat it accordingly, even if they don't know
-how to write that character themselves.</p>
-
-<blockquote class="aside"><p>Wikipedia is a great case study for
-an application that originally used ISO-8859-1 but switched to UTF-8
-when it became far too cumbersome to support foreign languages. Bots
-will now actually go through articles and convert character entities
-to their corresponding real characters for the sake of user-friendliness
-and searchability. See
-<a href="http://meta.wikimedia.org/wiki/Help:Special_characters">Meta's
-page on special characters</a> for more details.
-</p></blockquote>
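-
-<p>If you already have text littered with entities, PHP can convert them
-to real UTF-8 characters. A minimal sketch, assuming the text is destined
-for a UTF-8 page (the variable names are illustrative). Be aware that this
-call also decodes markup-significant entities such as
-<code>&amp;lt;</code>, so it is not a substitute for proper escaping:</p>
-
-<pre>$encoded = '&amp;theta; and &amp;#28608;&amp;#20809;';
-// The third argument names the encoding of the returned string
-$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');
-// $decoded now holds the raw UTF-8 bytes for those three characters</pre>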
-
-<h3 id="whyutf8-forms">Forms</h3>
-
-<p>While we're on the tack of users, how do non-UTF-8 web forms deal
-with characters that are outside of their character set? Rather than
-discuss what UTF-8 does right, we're going to show what could go wrong
-if you didn't use UTF-8 and people tried to use characters outside
-of your character encoding.</p>
-
-<p>The troubles are large, extensive, and extremely difficult to fix (or,
-at least, difficult enough that if you had the time and resources to invest
-in doing the fix, you would be probably better off migrating to UTF-8).
-There are two types of form submission: <code>application/x-www-form-urlencoded</code>
-which is used for GET and by default for POST, and <code>multipart/form-data</code>
-which may be used by POST, and is required when you want to upload
-files.</p>
-
-<p>The following is a summarization of notes from
-<a href="http://web.archive.org/web/20060427015200/ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html">
-<code>FORM</code> submission and i18n</a>. That document contains lots
-of useful information, but is written in a rambly manner, so
-here I try to get right to the point. (Note: the original has
-disappeared off the web, so I am linking to the Web Archive copy.)</p>
-
-<h4 id="whyutf8-forms-urlencoded"><code>application/x-www-form-urlencoded</code></h4>
-
-<p>This is the Content-Type that GET requests must use, and POST requests
-use by default. It involves the ubiquitous percent encoding format that
-looks something like: <code>%C3%86</code>. There is no official way of
-determining the character encoding of such a request, since the percent
-encoding operates on a byte level, so it is usually assumed that it
-is the same as the encoding of the page that contained the form.
-(<a href="http://tools.ietf.org/html/rfc3986#section-2.5">RFC 3986</a>
-recommends that textual identifiers be translated to UTF-8; however, browser
-compliance is spotty.) You'll run into very few problems
-if you only use characters in the character encoding you chose.</p>
-
-<p>However, once you start adding characters outside of your encoding
-(and this is a lot more common than you may think: take curly
-&quot;smart&quot; quotes from Microsoft as an example),
-all manner of strange things start to happen. Depending on the
-browser you're using, they might:</p>
-
-<ul>
-    <li>Replace the unsupported characters with useless question marks,</li>
-    <li>Attempt to fix the characters (example: smart quotes to regular quotes),</li>
-    <li>Replace the character with a character entity reference, or</li>
-    <li>Send it anyway as a different character encoding mixed in
-        with the original encoding (usually Windows-1252 rather than
-        iso-8859-1 or UTF-8 interspersed in 8-bit)</li>
-</ul>
-
-<p>To properly guard against these behaviors, you'd have to sniff out
-the browser agent, compile a database of different behaviors, and
-take appropriate conversion action against the string (disregarding
-a spate of extremely mysterious, random and devastating bugs Internet
-Explorer manifests every once in a while). Or you could
-use UTF-8 and rest easy knowing that none of this could possibly happen
-since UTF-8 supports every character.</p>
-
-<h4 id="whyutf8-forms-multipart"><code>multipart/form-data</code></h4>
-
-<p>Multipart form submission takes away a lot of the ambiguity
-that percent-encoding had: the server now can explicitly ask for
-certain encodings, and the client can explicitly tell the server
-during the form submission what encoding the fields are in.</p>
-
-<p>There are two ways you can go with this functionality: leave it
-unset and have the browser send in the same encoding as the page,
-or set it to UTF-8 and then do another conversion server-side.
-Each method has deficiencies, especially the former.</p>
-
-<p>If you tell the browser to send the form in the same encoding as
-the page, you still have the trouble of what to do with characters
-that are outside of the character encoding's range. The behavior, once
-again, varies: Firefox 2.0 converts them to character entity references
-while Internet Explorer 7.0 mangles them beyond intelligibility. For
-serious internationalization purposes, this is not an option.</p>
-
-<p>The other possibility is to set the form's <code>accept-charset</code>
-attribute to UTF-8, which raises the question: Why aren't you using UTF-8 for everything then?
-This route is more palatable, but there's a notable caveat: your data
-will come in as UTF-8, so you will have to explicitly convert it into
-your favored local character encoding.</p>
-
-<p>I object to this approach on ideological grounds: you're
-digging yourself deeper into
-the hole when you could have been converting to UTF-8
-instead. And, of course, you can't use this method for GET requests.</p>
-
-<h3 id="whyutf8-support">Well supported</h3>
-
-<p>Almost every modern browser in the wild today has full UTF-8 and Unicode
-support: the number of troublesome cases can be counted on the
-fingers of one hand, and these browsers usually have trouble with
-other character encodings too. Problems users usually encounter stem
-from the lack of appropriate fonts to display the characters (once
-again, this applies to all character encodings and HTML entities) or
-Internet Explorer's lack of intelligent font picking (which can be
-worked around).</p>
-
-<p>We will go into more detail about how to deal with edge cases in
-the browser world in the Migration section, but rest assured that
-converting to UTF-8, if done correctly, will not result in users
-hounding you about broken pages.</p>
-
-<h3 id="whyutf8-htmlpurifier">HTML Purifier</h3>
-
-<p>And finally, we get to HTML Purifier.  HTML Purifier is built to
-deal with UTF-8: any indications otherwise are the result of an
-encoder that converts text from your preferred encoding to UTF-8, and
-back again.  HTML Purifier never touches anything else, and leaves
-it up to the module iconv to do the dirty work.</p>
-
-<p>This approach, however, is not perfect. iconv is blithely unaware
-of HTML character entities. HTML Purifier, in order to
-protect against sophisticated escaping schemes, normalizes all character
-and numeric entity references before processing the text. This leads to
-one important ramification:</p>
-
-<p><strong>Any character that is not supported by the target character
-set, regardless of whether or not it is in the form of a character
-entity reference or a raw character, will be silently ignored.</strong></p>
-
-<p>Here is an example of this principle at work. Say you have <code>&amp;theta;</code>
-in your HTML, but the output is in Latin-1 (which, understandably,
-does not understand Greek). The following process will occur (assuming you've
-set the encoding correctly using %Core.Encoding):</p>
-
-<ul>
-    <li>The <code>Encoder</code> will transform the text from ISO 8859-1 to UTF-8
-        (note that theta is preserved here since it doesn't actually use
-        any non-ASCII characters): <code>&amp;theta;</code></li>
-    <li>The <code>EntityParser</code> will transform all named and numeric
-        character entities to their corresponding raw UTF-8 equivalents:
-        <code>&theta;</code></li>
-    <li>HTML Purifier processes the code: <code>&theta;</code></li>
-    <li>The <code>Encoder</code> now transforms the text back from UTF-8
-        to ISO 8859-1. Since Greek is not supported by ISO 8859-1, it
-        will be either ignored or replaced with a question mark:
-        <code>?</code></li>
-</ul>
-
-<p>This behaviour is quite unsatisfactory. It is a deal-breaker for
-international applications, and it can be mildly annoying for the provincial
-soul who occasionally needs a special character. Since 1.4.0, HTML
-Purifier has provided a slightly more palatable workaround using
-%Core.EscapeNonASCIICharacters. The process now looks like:</p>
-
-<ul>
-    <li>The <code>Encoder</code> transforms encoding to UTF-8: <code>&amp;theta;</code></li>
-    <li>The <code>EntityParser</code> transforms entities: <code>&theta;</code></li>
-    <li>HTML Purifier processes the code: <code>&theta;</code></li>
-    <li>The <code>Encoder</code> replaces all non-ASCII characters
-        with numeric entity reference: <code>&amp;#952;</code></li>
-    <li>For good measure, <code>Encoder</code> transforms encoding back to
-        original (which is strictly unnecessary for 99% of encodings
-        out there): <code>&amp;#952;</code> (remember, it's all ASCII!)</li>
-</ul>
-
-<p>...which means that this is only good for an occasional foray into
-the land of Unicode characters, and is totally unacceptable for Chinese
-or Japanese texts. The even bigger kicker is that, supposing the
-input encoding was actually ISO-8859-7, which <em>does</em> support
-theta, the character would get converted into a character entity reference
-anyway! (The Encoder does not discriminate.)</p>
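-
-<p>For reference, turning on that workaround might look roughly like the
-following. This is a minimal sketch, assuming the library has already been
-loaded and that your pages are served as ISO-8859-1; the
-<code>$dirty_html</code> variable is a placeholder for your own input:</p>
-
-<pre>&lt;?php
-    // Assumption: HTML Purifier is already included or autoloaded.
-    $config = HTMLPurifier_Config::createDefault();
-    $config-&gt;set('Core.Encoding', 'ISO-8859-1');         // encoding of your input and output
-    $config-&gt;set('Core.EscapeNonASCIICharacters', true); // emit numeric entities instead of dropping characters
-    $purifier = new HTMLPurifier($config);
-    $clean_html = $purifier-&gt;purify($dirty_html);        // $dirty_html: placeholder for user input
-?&gt;</pre>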
-
-<p>The current functionality is about where HTML Purifier will be for
-the rest of eternity. HTML Purifier could attempt to preserve the original
-form of the character references so that they could be substituted back in, only the
-DOM extension kills them off irreversibly. HTML Purifier could also attempt
-to be smart and only convert non-ASCII characters that weren't supported
-by the target encoding, but that would require reimplementing iconv
-with HTML awareness, something I will not do.</p>
-
-<p>So there: either it's UTF-8 or crippled international support. Your pick! (and I'm
-not being sarcastic here: some people couldn't care less about other languages).</p>
-
-<h2 id="migrate">Migrate to UTF-8</h2>
-
-<p>So, you've decided to bite the bullet, and want to migrate to UTF-8.
-Note that this is not for the faint-hearted, and you should expect
-the process to take longer than you think it will take.</p>
-
-<p>The general idea is that you convert all existing text to UTF-8,
-and then you set all the headers and META tags we discussed earlier
-to UTF-8. There are many ways of going about this: you could
-write a conversion script that runs through the database and re-encodes
-everything as UTF-8 or you could do the conversion on the fly when someone
-reads the page. The details depend on your system, but I will cover
-some of the more subtle points of migration that may trip you up.</p>
-
-<h3 id="migrate-db">Configuring your database</h3>
-
-<p>Most modern databases, the most prominent open-source ones being MySQL
-4.1+ and PostgreSQL, support character encodings. If you're switching
-to UTF-8, logically speaking, you'd want to make sure your database
-knows about the change too. There are some caveats though:</p>
-
-<h4 id="migrate-db-legit">Legit method</h4>
-
-<p>Standardization in terms of SQL syntax for specifying character
-encodings is notoriously spotty. Refer to your respective database's
-documentation on how to do this properly.</p>
-
-<p>For <a href="http://dev.mysql.com/doc/refman/5.0/en/charset-conversion.html">MySQL</a>, <code>ALTER</code> will magically perform the
-character encoding conversion for you. However, you have
-to make sure that the text inside the column is what it says it is:
-if you had put Shift-JIS in an ISO 8859-1 column, MySQL will irreversibly mangle
-the text when you try to convert it to UTF-8. You'll have to convert
-it to a binary field, convert it to a Shift-JIS field (the real encoding),
-and then finally to UTF-8. Many a website had pages irreversibly mangled
-because they didn't realize that they'd been deluding themselves about
-the character encoding all along; don't become the next victim.</p>
-
-<p>For <a href="http://www.postgresql.org/docs/8.2/static/multibyte.html">PostgreSQL</a>, there appears to be no direct way to change the
-encoding of a database (as of 8.2). You will have to dump the data, and then reimport
-it into a new table. Make sure that your client encoding is set properly:
-this is how PostgreSQL knows to perform an encoding conversion.</p>
-
-<p>Many times, you will also be asked about the &quot;collation&quot; of
-the new column. Collation is how a DBMS sorts text, like ordering
-B, C and A into A, B and C (the problem gets surprisingly complicated
-when you get to languages like Thai and Japanese). If in doubt,
-going with the default setting is usually a safe bet.</p>
-
-<p>Once the conversion is all said and done, you still have to remember
-to set the client encoding (your encoding) properly on each database
-connection using <code>SET NAMES</code> (which is standard SQL and is
-usually supported).</p>
-
-<h4 id="migrate-db-binary">Binary</h4>
-
-<p>Due to the aforementioned compatibility issues, a more interoperable
-way of storing UTF-8 text is to stuff it in a binary datatype.
-<code>CHAR</code> becomes <code>BINARY</code>, <code>VARCHAR</code> becomes
-<code>VARBINARY</code> and <code>TEXT</code> becomes <code>BLOB</code>.
-Doing so can save you some huge headaches:</p>
-
-<ul>
-    <li>The syntax for binary data types is very portable,</li>
-    <li>MySQL 4.0 has <em>no</em> support for character encodings, so
-        if you want to support it you <em>have</em> to use binary,</li>
-    <li>MySQL, as of 5.1, has no support for four byte UTF-8 characters,
-        which represent characters beyond the basic multilingual
-        plane, and</li>
-    <li>You will never have to worry about your DBMS being too smart
-        and attempting to convert your text when you don't want it to.</li>
-</ul>
-
-<p>MediaWiki, a very prominent international application, uses binary fields
-for storing its data because of point three.</p>
-
-<p>There are drawbacks, of course:</p>
-
-<ul>
-    <li>Database tools like phpMyAdmin won't be able to offer you inline
-        text editing, since it is declared as binary,</li>
-    <li>It's not semantically correct: it's really text not binary
-        (lying to the database),</li>
-    <li>Unless you use the not-very-portable wizardry mentioned above,
-        you have to change the encoding yourself (usually, you'd do
-        it on the fly), and</li>
-    <li>You will not have collation.</li>
-</ul>
-
-<p>Choose based on your circumstances.</p>
-
-<h3 id="migrate-editor">Text editor</h3>
-
-<p>For more flat-file oriented systems, you will often be tasked with
-converting reams of existing text and HTML files into UTF-8, as well as
-making sure that all new files uploaded are properly encoded. Once again,
-I can only point vaguely in the right direction for converting your
-existing files: make sure you back up, make sure you use
-<a href="http://php.net/ref.iconv">iconv</a>(), and
-make sure you know what the original character encoding of the files
-is (or are, depending on the tidiness of your system).</p>
-
-<p>However, I can proffer more specific advice on the subject of
-text editors. Many text editors have notoriously spotty Unicode support.
-To find out how your editor is doing, you can check out <a
-href="http://www.alanwood.net/unicode/utilities_editors.html">this list</a>
-or <a href="http://en.wikipedia.org/wiki/Comparison_of_text_editors#Encoding_support">Wikipedia's list.</a>
-I personally use Notepad++, which works like a charm when it comes to UTF-8.
-Usually, you will have to <strong>explicitly</strong> tell the editor through some dialogue
-(usually Save as or Format) what encoding you want it to use. An editor
-will often offer &quot;Unicode&quot; as a method of saving, which is
-ambiguous. Make sure you know whether or not they really mean UTF-8
-or UTF-16 (which is another flavor of Unicode).</p>
-
-<p>The two things to look out for are whether or not the editor
-supports <strong>font mixing</strong> (multiple
-fonts in one document) and whether or not it adds a <strong>BOM</strong>.
-Font mixing is important because fonts rarely have support for every
-language known to mankind: in order to be flexible, an editor must
-be able to take a little from here and a little from there, otherwise
-all your Chinese characters will come as nice boxes. We'll discuss
-BOM below.</p>
-
-<h3 id="migrate-bom">Byte Order Mark (headers already sent!)</h3>
-
-<p>The BOM, or <a href="http://en.wikipedia.org/wiki/Byte_Order_Mark">Byte
-Order Mark</a>, is a magical, invisible character placed at
-the beginning of UTF-8 files to tell people what the encoding is and
-what the endianness of the text is. It is also unnecessary.</p>
-
-<p>Because it's invisible, it often
-catches people by surprise when it starts doing things it shouldn't
-be doing. For example, this PHP file:</p>
-
-<pre><strong>BOM</strong>&lt;?php
-header('Location: index.php');
-?&gt;</pre>
-
-<p>...will fail with the all too familiar <strong>Headers already sent</strong>
-PHP error. And because the BOM is invisible, this culprit will go unnoticed.
-My suggestion is to only use ASCII in PHP pages, but if you must, make
-sure the page is saved WITHOUT the BOM.</p>
-
-<blockquote class="aside">
-    <p>The headers the error is referring to are <strong>HTTP headers</strong>,
-       which are sent to the browser before any HTML to tell it various
-       information. The moment any regular text (and yes, a BOM counts as
-       ordinary text) is output, the headers must be sent, and you are
-       not allowed to send any more. Thus, the error.</p>
-</blockquote>
-
-<p>If you are reading in text files to insert into the middle of another
-page, it is strongly advised (but not strictly necessary) that you strip out the UTF-8 byte
-sequence for BOM <code>&quot;\xEF\xBB\xBF&quot;</code> before inserting it in,
-via:</p>
-
-<pre>$text = str_replace(&quot;\xEF\xBB\xBF&quot;, '', $text);</pre>
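-
-<p>Note that <code>str_replace()</code> removes the byte sequence wherever
-it occurs in the string. If you would rather strip only a leading BOM, a
-sketch along these lines should do (the function name
-<code>strip_leading_bom</code> is made up for illustration):</p>
-
-<pre>&lt;?php
-// Hypothetical helper: removes a UTF-8 BOM only at the very start of $text.
-function strip_leading_bom($text) {
-    $bom = &quot;\xEF\xBB\xBF&quot;;
-    if (strncmp($text, $bom, 3) === 0) {
-        // The cast covers older PHP versions, where substr() can return false.
-        return (string) substr($text, 3);
-    }
-    return $text;
-}
-?&gt;</pre>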
-
-<h3 id="migrate-fonts">Fonts</h3>
-
-<p>Generally speaking, people who are having trouble with fonts fall
-into two categories:</p>
-
-<ul>
-<li>Those who want to
-use an extremely obscure language for which there is very little
-support even among native speakers of the language, and</li>
-<li>Those where the primary language of the text is
-well-supported but there are occasional characters
-that aren't supported.</li>
-</ul>
-
-<p>Yes, there's always a chance that an English user happens across
-a Sinhalese website and doesn't have the right font. But an English user
-who happens not to have the right fonts probably has no business reading Sinhalese
-anyway. So we'll deal with the other two edge cases.</p>
-
-<h4 id="migrate-fonts-obscure">Obscure scripts</h4>
-
-<p>If you run a Bengali website, you may get comments from users who
-would like to read your website but get heaps of question marks or
-other meaningless characters. Fixing this problem requires the
-installation of a font or language pack which is often highly
-dependent on what the language is. <a href="http://bn.wikipedia.org/wiki/%E0%A6%89%E0%A6%87%E0%A6%95%E0%A6%BF%E0%A6%AA%E0%A7%87%E0%A6%A1%E0%A6%BF%E0%A6%AF%E0%A6%BC%E0%A6%BE:Bangla_script_display_and_input_help">Here is an example</a>
-of such a help file for the Bengali language; I am sure there are
-others out there too. You just have to point users to the appropriate
-help file.</p>
-
-<h4 id="migrate-fonts-occasional">Occasional use</h4>
-
-<p>A prime example of when you'll see some very obscure Unicode
-characters embedded in what otherwise would be very bland ASCII are
-letters of the
-<a href="http://en.wikipedia.org/wiki/International_Phonetic_Alphabet">International
-Phonetic Alphabet (IPA)</a>, used to designate pronunciations in a very standard
-manner (you probably see them all the time in your dictionary). Your
-average font probably won't have support for all of the IPA characters
-like &#664; (bilabial click) or &#658; (voiced postalveolar fricative).
-So what's a poor browser to do? Font mix! Smart browsers like Mozilla Firefox
-and Internet Explorer 7 will borrow glyphs from other fonts in order
-to make sure that all the characters display properly.</p>
-
-<p>But what happens when the browser isn't smart and happens to be the
-most widely used browser in the entire world? Microsoft IE 6
-is not smart enough to borrow from other fonts when a character isn't
-present, so more often than not you'll be slapped with a nice big &#65533;.
-To get things to work, MSIE 6 needs a little nudge. You could configure it
-to use a different font to render the text, but you can achieve the same
-effect by selectively changing the font for blocks of special characters
-to known good Unicode fonts.</p>
-
-<p>Fortunately, the folks over at Wikipedia have already done all the
-heavy lifting for you. Get the CSS from the horse's mouth here:
-<a href="http://en.wikipedia.org/wiki/MediaWiki:Common.css">Common.css</a>,
-and search for &quot;.IPA&quot;. There are also a smattering of
-other classes you can use for other purposes; check out
-<a href="http://meta.wikimedia.org/wiki/Help:Special_characters#Displaying_Special_Characters">this page</a>
-for more details. For you lazy ones, this should work:</p>
-
-<pre>.Unicode {
-        font-family: Code2000, &quot;TITUS Cyberbit Basic&quot;, &quot;Doulos SIL&quot;,
-            &quot;Chrysanthi Unicode&quot;, &quot;Bitstream Cyberbit&quot;,
-            &quot;Bitstream CyberBase&quot;, Thryomanes, Gentium, GentiumAlt,
-            &quot;Lucida Grande&quot;, &quot;Arial Unicode MS&quot;, &quot;Microsoft Sans Serif&quot;,
-            &quot;Lucida Sans Unicode&quot;;
-        font-family /**/:inherit; /* resets fonts for everyone but IE6 */
-}</pre>
-
-<p>The standard usage goes along the lines of <code>&lt;span class=&quot;Unicode&quot;&gt;Crazy
-Unicode stuff here&lt;/span&gt;</code>. Characters in the
-<a href="http://en.wikipedia.org/wiki/Windows_Glyph_List_4">Windows Glyph List</a>
-usually don't need to be fixed, but for anything else you probably
-want to play it safe. Unless, of course, you don't care about IE6
-users.</p>
-
-<h3 id="migrate-variablewidth">Dealing with variable width in functions</h3>
-
-<p>When people claim that PHP6 will solve all our Unicode problems, they're
-misinformed. It will not fix any of the aforementioned troubles. It will,
-however, fix the problem we are about to discuss: processing UTF-8 text
-in PHP.</p>
-
-<p>PHP (as of PHP5) is blithely unaware of the existence of UTF-8 (with a few
-notable exceptions). Sometimes this will cause problems; other times
-it won't. So far, we've avoided discussing the architecture of
-UTF-8, so, we must first ask, what is UTF-8? Yes, it supports Unicode,
-and yes, it is variable width. Other traits:</p>
-
-<ul>
-    <li>Every character's byte sequence is unique and will never be found
-        inside the byte sequence of another character,</li>
-    <li>UTF-8 may use up to four bytes to encode a character,</li>
-    <li>UTF-8 text must be checked for well-formedness,</li>
-    <li>Pure ASCII is also valid UTF-8, and</li>
-    <li>Binary sorting will sort UTF-8 in the same order as Unicode.</li>
-</ul>
-
-<p>Each of these traits affects different domains of text processing
-in different ways. It is beyond the scope of this document to explain
-what precisely these implications are. PHPWact provides
-a very good <a href="http://www.phpwact.org/php/i18n/utf-8">reference document</a>
-on what to expect from each function, although coverage is spotty in
-some areas. Their more general notes on
-<a href="http://www.phpwact.org/php/i18n/charsets">character sets</a>
-are also worth looking at for information on UTF-8. Some rules of thumb
-when dealing with Unicode text:</p>
-
-<ul>
-    <li>Do not EVER use functions that:<ul>
-        <li>...convert case (strtolower, strtoupper, ucfirst, ucwords)</li>
-        <li>...claim to be case-insensitive (str_ireplace, stristr, strcasecmp)</li>
-    </ul></li>
-    <li>Think twice before using functions that:<ul>
-        <li>...count characters (strlen will return bytes, not characters;
-            str_split and wordwrap may corrupt)</li>
-        <li>...convert characters to entity references (UTF-8 doesn't need entities)</li>
-        <li>...do very complex string processing (*printf)</li>
-    </ul></li>
-</ul>
-
-<p>Note: this list applies to UTF-8 encoded text only: if you have
-a string that you are 100% sure is ASCII, be my guest and use
-<code>strtolower</code> (HTML Purifier uses this function).</p>
-
-<p>Regardless, always think in bytes, not characters. If you use strpos()
-to find the position of a character, it will be in bytes, but this
-usually won't matter since substr() also operates with byte indices!</p>
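-
-<p>To make the byte/character distinction concrete, here is a small sketch.
-It assumes the mbstring extension is available for the character-based
-count; the sample string is &AElig; (<code>\xC3\x86</code> in UTF-8)
-followed by a plain ASCII letter:</p>
-
-<pre>&lt;?php
-$text  = &quot;\xC3\x86a&quot;;              // two bytes for the first character, one for 'a'
-$bytes = strlen($text);             // 3: strlen() counts bytes
-$chars = mb_strlen($text, 'UTF-8'); // 2: mb_strlen() counts characters
-$pos   = strpos($text, 'a');        // 2: a byte offset, not a character offset
-$tail  = substr($text, $pos);       // 'a': substr() takes byte offsets too
-?&gt;</pre>
-
-<p>Because <code>strpos()</code> and <code>substr()</code> agree on what an
-offset means, passing one to the other stays safe even though neither knows
-about characters.</p>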
-
-<p>You'll also need to make sure your UTF-8 is well-formed and will
-probably need replacements for some of these functions. I recommend
-using Harry Fuecks' <a href="http://phputf8.sourceforge.net/">PHP
-UTF-8</a> library, rather than using mbstring directly. HTML Purifier
-also defines a few useful UTF-8 compatible functions: check out
-<code>Encoder.php</code> in the <code>/library/HTMLPurifier/</code>
-directory.</p>
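-
-<p>As a quick illustration of a well-formedness check, the sketch below
-leans on PCRE's <code>u</code> modifier. It is not how HTML Purifier itself
-validates text; the function name is made up, and it assumes your PCRE
-build has UTF-8 support:</p>
-
-<pre>&lt;?php
-// Hypothetical helper: returns true if $text is well-formed UTF-8.
-function is_well_formed_utf8($text) {
-    // With the /u modifier, PCRE refuses subjects that are not valid UTF-8,
-    // and preg_match() then returns false instead of 1.
-    return preg_match('//u', $text) === 1;
-}
-?&gt;</pre>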
-
-<h2 id="externallinks">Further Reading</h2>
-
-<p>Well, that's it. Hopefully this document has served as a very
-practical springboard into knowledge of how UTF-8 works.  You may have
-decided that you don't want to migrate yet: that's fine, just know
-what will happen to your output and what bug reports you may receive.</p>
-
-<p>Many other developers have already discussed the subject of Unicode,
-UTF-8 and internationalization, and I would like to defer to them for
-a more in-depth look into character sets and encodings.</p>
-
-<ul>
-    <li><a href="http://www.joelonsoftware.com/articles/Unicode.html">
-        The Absolute Minimum Every Software Developer Absolutely,
-        Positively Must Know About Unicode and Character Sets
-        (No Excuses!)</a> by Joel Spolsky, provides a <em>very</em>
-        good high-level look at Unicode and character sets in general.</li>
-    <li><a href="http://en.wikipedia.org/wiki/UTF-8">UTF-8 on Wikipedia</a>,
-        provides a lot of useful details into the innards of UTF-8, although
-        it may be a little off-putting to people who don't know much
-        about Unicode to begin with.</li>
-</ul>
-
-</body>
-</html>
-
-<!-- vim: et sw=4 sts=4
--->
diff --git a/lib/htmlpurifier/docs/enduser-youtube.html b/lib/htmlpurifier/docs/enduser-youtube.html
deleted file mode 100644
index 87a36b9aa..000000000
--- a/lib/htmlpurifier/docs/enduser-youtube.html
+++ /dev/null
@@ -1,153 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
-    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
-<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-<meta name="description" content="Explains how to safely allow the embedding of flash from trusted sites in HTML Purifier." />
-<link rel="stylesheet" type="text/css" href="./style.css" />
-
-<title>Embedding YouTube Videos - HTML Purifier</title>
-
-</head><body>
-
-<h1 class="subtitled">Embedding YouTube Videos</h1>
-<div class="subtitle">...as well as other dangerous active content</div>
-
-<div id="filing">Filed under End-User</div>
-<div id="index">Return to the <a href="index.html">index</a>.</div>
-<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
-
-<p>Clients like their YouTube videos. It gives them a warm fuzzy feeling when
-they see a neat little embedded video player on their websites that can play
-the latest clips from their documentary &quot;Fido and the Bones of Spring&quot;.
-All joking aside, the ability to embed YouTube videos or other active
-content in their pages is something that a lot of people like.</p>
-
-<p>This is a <em>bad</em> idea. The moment you embed anything untrusted,
-you will definitely be slammed by all manner of nasties that can be
-embedded in things from your run of the mill Flash movie to
-<a href="http://blog.spywareguide.com/2006/12/myspace_phish_attack_leads_use.html">Quicktime movies</a>.
-Even <code>img</code> tags, which HTML Purifier allows by default, can be
-dangerous. Be distrustful of anything that tells a browser to load content
-from another website automatically.</p>
-
-<p>Luckily for us, however, whitelisting saves the day. Sure, letting users
-include any old random flash file could be dangerous, but if it's
-from a specific website, it probably is okay. If no amount of pleading will
-convince the people upstairs that they should settle for just linking
-to their movies, you may find this technique very useful.</p>
-
-<h2>Looking in</h2>
-
-<p>Below is custom code that allows users to embed
-YouTube videos. This is not favoritism: this trick can easily be adapted for
-other forms of embeddable content.</p>
-
-<p>Usually, websites like YouTube give us boilerplate code that you can insert
-into your documents. YouTube's code goes like this:</p>
-
-<pre>
-&lt;object width=&quot;425&quot; height=&quot;350&quot;&gt;
-  &lt;param name=&quot;movie&quot; value=&quot;http://www.youtube.com/v/AyPzM5WK8ys&quot; /&gt;
-  &lt;param name=&quot;wmode&quot; value=&quot;transparent&quot; /&gt;
-  &lt;embed src=&quot;http://www.youtube.com/v/AyPzM5WK8ys&quot;
-         type=&quot;application/x-shockwave-flash&quot;
-         wmode=&quot;transparent&quot; width=&quot;425&quot; height=&quot;350&quot; /&gt;
-&lt;/object&gt;
-</pre>
-
-<p>There are two things to note about this code:</p>
-
-<ol>
-    <li><code>&lt;embed&gt;</code> is not recognized by W3C, so if you want
-        standards-compliant code, you'll have to get rid of it.</li>
-    <li>The code is exactly the same for all instances, except for the
-        identifier <tt>AyPzM5WK8ys</tt> which tells us which movie file
-        to retrieve.</li>
-</ol>
-
-<p>What point 2 means is that if we have code like <code>&lt;span
-class=&quot;youtube-embed&quot;&gt;AyPzM5WK8ys&lt;/span&gt;</code> your
-application can reconstruct the full object from this small snippet that
-passes through HTML Purifier <em>unharmed</em>.
-<a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/Filter/YouTube.php">Show me the code!</a></p>
-
-<p>And the corresponding usage:</p>
-
-<pre>&lt;?php
-    $config-&gt;set('Filter.YouTube', true);
-?&gt;</pre>
-
-<p>There is a bit going on in the two code snippets, so let's explain.</p>
-
-<ol>
-    <li>This is a Filter object, which intercepts the HTML that is
-        coming into and out of the purifier. You can add as many
-        filter objects as you like. <code>preFilter()</code>
-        processes the code before it gets purified, and <code>postFilter()</code>
-        processes the code afterwards. So, we'll use <code>preFilter()</code> to
-        replace the object tag with a <code>span</code>, and <code>postFilter()</code>
-        to restore it.</li>
-    <li>The first preg_replace call replaces any YouTube code users may have
-        embedded into the benign span tag. Span is used because it is inline,
-        and objects are inline too. We are very careful to be extremely
-        restrictive on what goes inside the span tag, because if errant code
-        gets in there it could get messy.</li>
-    <li>The HTML is then purified as usual.</li>
-    <li>Then, another preg_replace replaces the span tag with a fully fledged
-        object. Note that the embed is removed, and, in its place, a data
-        attribute was added to the object. This makes the tag standards
-        compliant! It also breaks Internet Explorer, so we add in a bit of
-        conditional comments with the old embed code to make it work again.
-        It's all quite convoluted but works.</li>
-</ol>
-
-<h2>Warning</h2>
-
-<p>There are a number of possible problems with the code above, depending
-on how you look at it.</p>
-
-<h3>Cannot change width and height</h3>
-
-<p>The width and height of the final YouTube movie cannot be adjusted. This
-is because I am lazy. If you really insist on letting users change the size
-of the movie, what you need to do is package up the attributes inside the
-span tag (along with the movie ID). It gets complicated though: a malicious
-user can specify an outrageously large height and width and attempt to crash
-the user's operating system/browser. You need to cap it, either by limiting
-the number of digits allowed in the regex or by using a callback to check the
-number.</p>
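-
-<p>A rough sketch of such a check follows. It is not part of the YouTube
-filter: the function name, the four-digit limit and the idea of falling
-back to a default are all assumptions chosen for illustration.</p>
-
-<pre>&lt;?php
-// Hypothetical helper: validates a user-supplied dimension and caps it at $max.
-function clamp_dimension($value, $max, $default) {
-    // Accept one to four ASCII digits only; anything else gets the default.
-    if (!preg_match('/^[0-9]{1,4}\z/', $value)) {
-        return $default;
-    }
-    return min((int) $value, $max);
-}
-?&gt;</pre>
-
-<p>You would call something like this from the callback that rebuilds the
-object tag, so that only the checked numbers ever reach the final HTML.</p>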
-
-<h3>Trusts media's host's security</h3>
-
-<p>By allowing this code onto our website, we are trusting that YouTube has
-tech-savvy enough people not to allow their users to inject malicious
-code into the Flash files.  An exploit on YouTube means an exploit on your
-site.  Even though YouTube is run by the reputable Google, it
-<a href="http://ha.ckers.org/blog/20061213/google-xss-vuln/">doesn't</a>
-mean they are
-<a href="http://ha.ckers.org/blog/20061208/xss-in-googles-orkut/">invulnerable.</a>
-You're putting a certain measure of the job on an external provider (just as
-you have by entrusting your user input to HTML Purifier), and
-it is important that you are cognizant of the risk.</p>
-
-<h3>Poorly written adaptations compromise security</h3>
-
-<p>This should go without saying, but if you're going to adapt this code
-for Google Video or the like, make sure you do it <em>right</em>. It's
-extremely easy to allow a character too many in <code>postFilter()</code> and
-suddenly you're introducing XSS into HTML Purifier's XSS-free output. HTML
-Purifier may be well written, but it cannot guard against vulnerabilities
-introduced after it has finished.</p>
-
-<h2>Help out!</h2>
-
-<p>If you write a filter for your favorite video destination (or anything
-like that, for that matter), send it over and it might get included
-with the core!</p>
-
-</body>
-</html>
-
-<!-- vim: et sw=4 sts=4
--->
diff --git a/lib/htmlpurifier/docs/entities/xhtml-lat1.ent b/lib/htmlpurifier/docs/entities/xhtml-lat1.ent
deleted file mode 100644
index ffee223eb..000000000
--- a/lib/htmlpurifier/docs/entities/xhtml-lat1.ent
+++ /dev/null
@@ -1,196 +0,0 @@
-<!-- Portions (C) International Organization for Standardization 1986
-     Permission to copy in any form is granted for use with
-     conforming SGML systems and applications as defined in
-     ISO 8879, provided this notice is included in all copies.
--->
-<!-- Character entity set. Typical invocation:
-    <!ENTITY % HTMLlat1 PUBLIC
-       "-//W3C//ENTITIES Latin 1 for XHTML//EN"
-       "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
-    %HTMLlat1;
--->
-
-<!ENTITY nbsp   "&#160;"> <!-- no-break space = non-breaking space,
-                                  U+00A0 ISOnum -->
-<!ENTITY iexcl  "&#161;"> <!-- inverted exclamation mark, U+00A1 ISOnum -->
-<!ENTITY cent   "&#162;"> <!-- cent sign, U+00A2 ISOnum -->
-<!ENTITY pound  "&#163;"> <!-- pound sign, U+00A3 ISOnum -->
-<!ENTITY curren "&#164;"> <!-- currency sign, U+00A4 ISOnum -->
-<!ENTITY yen    "&#165;"> <!-- yen sign = yuan sign, U+00A5 ISOnum -->
-<!ENTITY brvbar "&#166;"> <!-- broken bar = broken vertical bar,
-                                  U+00A6 ISOnum -->
-<!ENTITY sect   "&#167;"> <!-- section sign, U+00A7 ISOnum -->
-<!ENTITY uml    "&#168;"> <!-- diaeresis = spacing diaeresis,
-                                  U+00A8 ISOdia -->
-<!ENTITY copy   "&#169;"> <!-- copyright sign, U+00A9 ISOnum -->
-<!ENTITY ordf   "&#170;"> <!-- feminine ordinal indicator, U+00AA ISOnum -->
-<!ENTITY laquo  "&#171;"> <!-- left-pointing double angle quotation mark
-                                  = left pointing guillemet, U+00AB ISOnum -->
-<!ENTITY not    "&#172;"> <!-- not sign = angled dash,
-                                  U+00AC ISOnum -->
-<!ENTITY shy    "&#173;"> <!-- soft hyphen = discretionary hyphen,
-                                  U+00AD ISOnum -->
-<!ENTITY reg    "&#174;"> <!-- registered sign = registered trade mark sign,
-                                  U+00AE ISOnum -->
-<!ENTITY macr   "&#175;"> <!-- macron = spacing macron = overline
-                                  = APL overbar, U+00AF ISOdia -->
-<!ENTITY deg    "&#176;"> <!-- degree sign, U+00B0 ISOnum -->
-<!ENTITY plusmn "&#177;"> <!-- plus-minus sign = plus-or-minus sign,
-                                  U+00B1 ISOnum -->
-<!ENTITY sup2   "&#178;"> <!-- superscript two = superscript digit two
-                                  = squared, U+00B2 ISOnum -->
-<!ENTITY sup3   "&#179;"> <!-- superscript three = superscript digit three
-                                  = cubed, U+00B3 ISOnum -->
-<!ENTITY acute  "&#180;"> <!-- acute accent = spacing acute,
-                                  U+00B4 ISOdia -->
-<!ENTITY micro  "&#181;"> <!-- micro sign, U+00B5 ISOnum -->
-<!ENTITY para   "&#182;"> <!-- pilcrow sign = paragraph sign,
-                                  U+00B6 ISOnum -->
-<!ENTITY middot "&#183;"> <!-- middle dot = Georgian comma
-                                  = Greek middle dot, U+00B7 ISOnum -->
-<!ENTITY cedil  "&#184;"> <!-- cedilla = spacing cedilla, U+00B8 ISOdia -->
-<!ENTITY sup1   "&#185;"> <!-- superscript one = superscript digit one,
-                                  U+00B9 ISOnum -->
-<!ENTITY ordm   "&#186;"> <!-- masculine ordinal indicator,
-                                  U+00BA ISOnum -->
-<!ENTITY raquo  "&#187;"> <!-- right-pointing double angle quotation mark
-                                  = right pointing guillemet, U+00BB ISOnum -->
-<!ENTITY frac14 "&#188;"> <!-- vulgar fraction one quarter
-                                  = fraction one quarter, U+00BC ISOnum -->
-<!ENTITY frac12 "&#189;"> <!-- vulgar fraction one half
-                                  = fraction one half, U+00BD ISOnum -->
-<!ENTITY frac34 "&#190;"> <!-- vulgar fraction three quarters
-                                  = fraction three quarters, U+00BE ISOnum -->
-<!ENTITY iquest "&#191;"> <!-- inverted question mark
-                                  = turned question mark, U+00BF ISOnum -->
-<!ENTITY Agrave "&#192;"> <!-- latin capital letter A with grave
-                                  = latin capital letter A grave,
-                                  U+00C0 ISOlat1 -->
-<!ENTITY Aacute "&#193;"> <!-- latin capital letter A with acute,
-                                  U+00C1 ISOlat1 -->
-<!ENTITY Acirc  "&#194;"> <!-- latin capital letter A with circumflex,
-                                  U+00C2 ISOlat1 -->
-<!ENTITY Atilde "&#195;"> <!-- latin capital letter A with tilde,
-                                  U+00C3 ISOlat1 -->
-<!ENTITY Auml   "&#196;"> <!-- latin capital letter A with diaeresis,
-                                  U+00C4 ISOlat1 -->
-<!ENTITY Aring  "&#197;"> <!-- latin capital letter A with ring above
-                                  = latin capital letter A ring,
-                                  U+00C5 ISOlat1 -->
-<!ENTITY AElig  "&#198;"> <!-- latin capital letter AE
-                                  = latin capital ligature AE,
-                                  U+00C6 ISOlat1 -->
-<!ENTITY Ccedil "&#199;"> <!-- latin capital letter C with cedilla,
-                                  U+00C7 ISOlat1 -->
-<!ENTITY Egrave "&#200;"> <!-- latin capital letter E with grave,
-                                  U+00C8 ISOlat1 -->
-<!ENTITY Eacute "&#201;"> <!-- latin capital letter E with acute,
-                                  U+00C9 ISOlat1 -->
-<!ENTITY Ecirc  "&#202;"> <!-- latin capital letter E with circumflex,
-                                  U+00CA ISOlat1 -->
-<!ENTITY Euml   "&#203;"> <!-- latin capital letter E with diaeresis,
-                                  U+00CB ISOlat1 -->
-<!ENTITY Igrave "&#204;"> <!-- latin capital letter I with grave,
-                                  U+00CC ISOlat1 -->
-<!ENTITY Iacute "&#205;"> <!-- latin capital letter I with acute,
-                                  U+00CD ISOlat1 -->
-<!ENTITY Icirc  "&#206;"> <!-- latin capital letter I with circumflex,
-                                  U+00CE ISOlat1 -->
-<!ENTITY Iuml   "&#207;"> <!-- latin capital letter I with diaeresis,
-                                  U+00CF ISOlat1 -->
-<!ENTITY ETH    "&#208;"> <!-- latin capital letter ETH, U+00D0 ISOlat1 -->
-<!ENTITY Ntilde "&#209;"> <!-- latin capital letter N with tilde,
-                                  U+00D1 ISOlat1 -->
-<!ENTITY Ograve "&#210;"> <!-- latin capital letter O with grave,
-                                  U+00D2 ISOlat1 -->
-<!ENTITY Oacute "&#211;"> <!-- latin capital letter O with acute,
-                                  U+00D3 ISOlat1 -->
-<!ENTITY Ocirc  "&#212;"> <!-- latin capital letter O with circumflex,
-                                  U+00D4 ISOlat1 -->
-<!ENTITY Otilde "&#213;"> <!-- latin capital letter O with tilde,
-                                  U+00D5 ISOlat1 -->
-<!ENTITY Ouml   "&#214;"> <!-- latin capital letter O with diaeresis,
-                                  U+00D6 ISOlat1 -->
-<!ENTITY times  "&#215;"> <!-- multiplication sign, U+00D7 ISOnum -->
-<!ENTITY Oslash "&#216;"> <!-- latin capital letter O with stroke
-                                  = latin capital letter O slash,
-                                  U+00D8 ISOlat1 -->
-<!ENTITY Ugrave "&#217;"> <!-- latin capital letter U with grave,
-                                  U+00D9 ISOlat1 -->
-<!ENTITY Uacute "&#218;"> <!-- latin capital letter U with acute,
-                                  U+00DA ISOlat1 -->
-<!ENTITY Ucirc  "&#219;"> <!-- latin capital letter U with circumflex,
-                                  U+00DB ISOlat1 -->
-<!ENTITY Uuml   "&#220;"> <!-- latin capital letter U with diaeresis,
-                                  U+00DC ISOlat1 -->
-<!ENTITY Yacute "&#221;"> <!-- latin capital letter Y with acute,
-                                  U+00DD ISOlat1 -->
-<!ENTITY THORN  "&#222;"> <!-- latin capital letter THORN,
-                                  U+00DE ISOlat1 -->
-<!ENTITY szlig  "&#223;"> <!-- latin small letter sharp s = ess-zed,
-                                  U+00DF ISOlat1 -->
-<!ENTITY agrave "&#224;"> <!-- latin small letter a with grave
-                                  = latin small letter a grave,
-                                  U+00E0 ISOlat1 -->
-<!ENTITY aacute "&#225;"> <!-- latin small letter a with acute,
-                                  U+00E1 ISOlat1 -->
-<!ENTITY acirc  "&#226;"> <!-- latin small letter a with circumflex,
-                                  U+00E2 ISOlat1 -->
-<!ENTITY atilde "&#227;"> <!-- latin small letter a with tilde,
-                                  U+00E3 ISOlat1 -->
-<!ENTITY auml   "&#228;"> <!-- latin small letter a with diaeresis,
-                                  U+00E4 ISOlat1 -->
-<!ENTITY aring  "&#229;"> <!-- latin small letter a with ring above
-                                  = latin small letter a ring,
-                                  U+00E5 ISOlat1 -->
-<!ENTITY aelig  "&#230;"> <!-- latin small letter ae
-                                  = latin small ligature ae, U+00E6 ISOlat1 -->
-<!ENTITY ccedil "&#231;"> <!-- latin small letter c with cedilla,
-                                  U+00E7 ISOlat1 -->
-<!ENTITY egrave "&#232;"> <!-- latin small letter e with grave,
-                                  U+00E8 ISOlat1 -->
-<!ENTITY eacute "&#233;"> <!-- latin small letter e with acute,
-                                  U+00E9 ISOlat1 -->
-<!ENTITY ecirc  "&#234;"> <!-- latin small letter e with circumflex,
-                                  U+00EA ISOlat1 -->
-<!ENTITY euml   "&#235;"> <!-- latin small letter e with diaeresis,
-                                  U+00EB ISOlat1 -->
-<!ENTITY igrave "&#236;"> <!-- latin small letter i with grave,
-                                  U+00EC ISOlat1 -->
-<!ENTITY iacute "&#237;"> <!-- latin small letter i with acute,
-                                  U+00ED ISOlat1 -->
-<!ENTITY icirc  "&#238;"> <!-- latin small letter i with circumflex,
-                                  U+00EE ISOlat1 -->
-<!ENTITY iuml   "&#239;"> <!-- latin small letter i with diaeresis,
-                                  U+00EF ISOlat1 -->
-<!ENTITY eth    "&#240;"> <!-- latin small letter eth, U+00F0 ISOlat1 -->
-<!ENTITY ntilde "&#241;"> <!-- latin small letter n with tilde,
-                                  U+00F1 ISOlat1 -->
-<!ENTITY ograve "&#242;"> <!-- latin small letter o with grave,
-                                  U+00F2 ISOlat1 -->
-<!ENTITY oacute "&#243;"> <!-- latin small letter o with acute,
-                                  U+00F3 ISOlat1 -->
-<!ENTITY ocirc  "&#244;"> <!-- latin small letter o with circumflex,
-                                  U+00F4 ISOlat1 -->
-<!ENTITY otilde "&#245;"> <!-- latin small letter o with tilde,
-                                  U+00F5 ISOlat1 -->
-<!ENTITY ouml   "&#246;"> <!-- latin small letter o with diaeresis,
-                                  U+00F6 ISOlat1 -->
-<!ENTITY divide "&#247;"> <!-- division sign, U+00F7 ISOnum -->
-<!ENTITY oslash "&#248;"> <!-- latin small letter o with stroke,
-                                  = latin small letter o slash,
-                                  U+00F8 ISOlat1 -->
-<!ENTITY ugrave "&#249;"> <!-- latin small letter u with grave,
-                                  U+00F9 ISOlat1 -->
-<!ENTITY uacute "&#250;"> <!-- latin small letter u with acute,
-                                  U+00FA ISOlat1 -->
-<!ENTITY ucirc  "&#251;"> <!-- latin small letter u with circumflex,
-                                  U+00FB ISOlat1 -->
-<!ENTITY uuml   "&#252;"> <!-- latin small letter u with diaeresis,
-                                  U+00FC ISOlat1 -->
-<!ENTITY yacute "&#253;"> <!-- latin small letter y with acute,
-                                  U+00FD ISOlat1 -->
-<!ENTITY thorn  "&#254;"> <!-- latin small letter thorn,
-                                  U+00FE ISOlat1 -->
-<!ENTITY yuml   "&#255;"> <!-- latin small letter y with diaeresis,
-                                  U+00FF ISOlat1 -->
diff --git a/lib/htmlpurifier/docs/entities/xhtml-special.ent b/lib/htmlpurifier/docs/entities/xhtml-special.ent
deleted file mode 100644
index ca358b2fe..000000000
--- a/lib/htmlpurifier/docs/entities/xhtml-special.ent
+++ /dev/null
@@ -1,80 +0,0 @@
-<!-- Special characters for XHTML -->
-
-<!-- Character entity set. Typical invocation:
-     <!ENTITY % HTMLspecial PUBLIC
-        "-//W3C//ENTITIES Special for XHTML//EN"
-        "http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent">
-     %HTMLspecial;
--->
-
-<!-- Portions (C) International Organization for Standardization 1986:
-     Permission to copy in any form is granted for use with
-     conforming SGML systems and applications as defined in
-     ISO 8879, provided this notice is included in all copies.
--->
-
-<!-- Relevant ISO entity set is given unless names are newly introduced.
-     New names (i.e., not in ISO 8879 list) do not clash with any
-     existing ISO 8879 entity names. ISO 10646 character numbers
-     are given for each character, in hex. values are decimal
-     conversions of the ISO 10646 values and refer to the document
-     character set. Names are Unicode names. 
--->
-
-<!-- C0 Controls and Basic Latin -->
-<!ENTITY quot    "&#34;"> <!--  quotation mark, U+0022 ISOnum -->
-<!ENTITY amp     "&#38;#38;"> <!--  ampersand, U+0026 ISOnum -->
-<!ENTITY lt      "&#38;#60;"> <!--  less-than sign, U+003C ISOnum -->
-<!ENTITY gt      "&#62;"> <!--  greater-than sign, U+003E ISOnum -->
-<!ENTITY apos	 "&#39;"> <!--  apostrophe = APL quote, U+0027 ISOnum -->
-
-<!-- Latin Extended-A -->
-<!ENTITY OElig   "&#338;"> <!--  latin capital ligature OE,
-                                    U+0152 ISOlat2 -->
-<!ENTITY oelig   "&#339;"> <!--  latin small ligature oe, U+0153 ISOlat2 -->
-<!-- ligature is a misnomer, this is a separate character in some languages -->
-<!ENTITY Scaron  "&#352;"> <!--  latin capital letter S with caron,
-                                    U+0160 ISOlat2 -->
-<!ENTITY scaron  "&#353;"> <!--  latin small letter s with caron,
-                                    U+0161 ISOlat2 -->
-<!ENTITY Yuml    "&#376;"> <!--  latin capital letter Y with diaeresis,
-                                    U+0178 ISOlat2 -->
-
-<!-- Spacing Modifier Letters -->
-<!ENTITY circ    "&#710;"> <!--  modifier letter circumflex accent,
-                                    U+02C6 ISOpub -->
-<!ENTITY tilde   "&#732;"> <!--  small tilde, U+02DC ISOdia -->
-
-<!-- General Punctuation -->
-<!ENTITY ensp    "&#8194;"> <!-- en space, U+2002 ISOpub -->
-<!ENTITY emsp    "&#8195;"> <!-- em space, U+2003 ISOpub -->
-<!ENTITY thinsp  "&#8201;"> <!-- thin space, U+2009 ISOpub -->
-<!ENTITY zwnj    "&#8204;"> <!-- zero width non-joiner,
-                                    U+200C NEW RFC 2070 -->
-<!ENTITY zwj     "&#8205;"> <!-- zero width joiner, U+200D NEW RFC 2070 -->
-<!ENTITY lrm     "&#8206;"> <!-- left-to-right mark, U+200E NEW RFC 2070 -->
-<!ENTITY rlm     "&#8207;"> <!-- right-to-left mark, U+200F NEW RFC 2070 -->
-<!ENTITY ndash   "&#8211;"> <!-- en dash, U+2013 ISOpub -->
-<!ENTITY mdash   "&#8212;"> <!-- em dash, U+2014 ISOpub -->
-<!ENTITY lsquo   "&#8216;"> <!-- left single quotation mark,
-                                    U+2018 ISOnum -->
-<!ENTITY rsquo   "&#8217;"> <!-- right single quotation mark,
-                                    U+2019 ISOnum -->
-<!ENTITY sbquo   "&#8218;"> <!-- single low-9 quotation mark, U+201A NEW -->
-<!ENTITY ldquo   "&#8220;"> <!-- left double quotation mark,
-                                    U+201C ISOnum -->
-<!ENTITY rdquo   "&#8221;"> <!-- right double quotation mark,
-                                    U+201D ISOnum -->
-<!ENTITY bdquo   "&#8222;"> <!-- double low-9 quotation mark, U+201E NEW -->
-<!ENTITY dagger  "&#8224;"> <!-- dagger, U+2020 ISOpub -->
-<!ENTITY Dagger  "&#8225;"> <!-- double dagger, U+2021 ISOpub -->
-<!ENTITY permil  "&#8240;"> <!-- per mille sign, U+2030 ISOtech -->
-<!ENTITY lsaquo  "&#8249;"> <!-- single left-pointing angle quotation mark,
-                                    U+2039 ISO proposed -->
-<!-- lsaquo is proposed but not yet ISO standardized -->
-<!ENTITY rsaquo  "&#8250;"> <!-- single right-pointing angle quotation mark,
-                                    U+203A ISO proposed -->
-<!-- rsaquo is proposed but not yet ISO standardized -->
-
-<!-- Currency Symbols -->
-<!ENTITY euro   "&#8364;"> <!--  euro sign, U+20AC NEW -->
diff --git a/lib/htmlpurifier/docs/entities/xhtml-symbol.ent b/lib/htmlpurifier/docs/entities/xhtml-symbol.ent
deleted file mode 100644
index 63c2abfa6..000000000
--- a/lib/htmlpurifier/docs/entities/xhtml-symbol.ent
+++ /dev/null
@@ -1,237 +0,0 @@
-<!-- Mathematical, Greek and Symbolic characters for XHTML -->
-
-<!-- Character entity set. Typical invocation:
-     <!ENTITY % HTMLsymbol PUBLIC
-        "-//W3C//ENTITIES Symbols for XHTML//EN"
-        "http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent">
-     %HTMLsymbol;
--->
-
-<!-- Portions (C) International Organization for Standardization 1986:
-     Permission to copy in any form is granted for use with
-     conforming SGML systems and applications as defined in
-     ISO 8879, provided this notice is included in all copies.
--->
-
-<!-- Relevant ISO entity set is given unless names are newly introduced.
-     New names (i.e., not in ISO 8879 list) do not clash with any
-     existing ISO 8879 entity names. ISO 10646 character numbers
-     are given for each character, in hex. values are decimal
-     conversions of the ISO 10646 values and refer to the document
-     character set. Names are Unicode names. 
--->
-
-<!-- Latin Extended-B -->
-<!ENTITY fnof     "&#402;"> <!-- latin small letter f with hook = function
-                                    = florin, U+0192 ISOtech -->
-
-<!-- Greek -->
-<!ENTITY Alpha    "&#913;"> <!-- greek capital letter alpha, U+0391 -->
-<!ENTITY Beta     "&#914;"> <!-- greek capital letter beta, U+0392 -->
-<!ENTITY Gamma    "&#915;"> <!-- greek capital letter gamma,
-                                    U+0393 ISOgrk3 -->
-<!ENTITY Delta    "&#916;"> <!-- greek capital letter delta,
-                                    U+0394 ISOgrk3 -->
-<!ENTITY Epsilon  "&#917;"> <!-- greek capital letter epsilon, U+0395 -->
-<!ENTITY Zeta     "&#918;"> <!-- greek capital letter zeta, U+0396 -->
-<!ENTITY Eta      "&#919;"> <!-- greek capital letter eta, U+0397 -->
-<!ENTITY Theta    "&#920;"> <!-- greek capital letter theta,
-                                    U+0398 ISOgrk3 -->
-<!ENTITY Iota     "&#921;"> <!-- greek capital letter iota, U+0399 -->
-<!ENTITY Kappa    "&#922;"> <!-- greek capital letter kappa, U+039A -->
-<!ENTITY Lambda   "&#923;"> <!-- greek capital letter lamda,
-                                    U+039B ISOgrk3 -->
-<!ENTITY Mu       "&#924;"> <!-- greek capital letter mu, U+039C -->
-<!ENTITY Nu       "&#925;"> <!-- greek capital letter nu, U+039D -->
-<!ENTITY Xi       "&#926;"> <!-- greek capital letter xi, U+039E ISOgrk3 -->
-<!ENTITY Omicron  "&#927;"> <!-- greek capital letter omicron, U+039F -->
-<!ENTITY Pi       "&#928;"> <!-- greek capital letter pi, U+03A0 ISOgrk3 -->
-<!ENTITY Rho      "&#929;"> <!-- greek capital letter rho, U+03A1 -->
-<!-- there is no Sigmaf, and no U+03A2 character either -->
-<!ENTITY Sigma    "&#931;"> <!-- greek capital letter sigma,
-                                    U+03A3 ISOgrk3 -->
-<!ENTITY Tau      "&#932;"> <!-- greek capital letter tau, U+03A4 -->
-<!ENTITY Upsilon  "&#933;"> <!-- greek capital letter upsilon,
-                                    U+03A5 ISOgrk3 -->
-<!ENTITY Phi      "&#934;"> <!-- greek capital letter phi,
-                                    U+03A6 ISOgrk3 -->
-<!ENTITY Chi      "&#935;"> <!-- greek capital letter chi, U+03A7 -->
-<!ENTITY Psi      "&#936;"> <!-- greek capital letter psi,
-                                    U+03A8 ISOgrk3 -->
-<!ENTITY Omega    "&#937;"> <!-- greek capital letter omega,
-                                    U+03A9 ISOgrk3 -->
-
-<!ENTITY alpha    "&#945;"> <!-- greek small letter alpha,
-                                    U+03B1 ISOgrk3 -->
-<!ENTITY beta     "&#946;"> <!-- greek small letter beta, U+03B2 ISOgrk3 -->
-<!ENTITY gamma    "&#947;"> <!-- greek small letter gamma,
-                                    U+03B3 ISOgrk3 -->
-<!ENTITY delta    "&#948;"> <!-- greek small letter delta,
-                                    U+03B4 ISOgrk3 -->
-<!ENTITY epsilon  "&#949;"> <!-- greek small letter epsilon,
-                                    U+03B5 ISOgrk3 -->
-<!ENTITY zeta     "&#950;"> <!-- greek small letter zeta, U+03B6 ISOgrk3 -->
-<!ENTITY eta      "&#951;"> <!-- greek small letter eta, U+03B7 ISOgrk3 -->
-<!ENTITY theta    "&#952;"> <!-- greek small letter theta,
-                                    U+03B8 ISOgrk3 -->
-<!ENTITY iota     "&#953;"> <!-- greek small letter iota, U+03B9 ISOgrk3 -->
-<!ENTITY kappa    "&#954;"> <!-- greek small letter kappa,
-                                    U+03BA ISOgrk3 -->
-<!ENTITY lambda   "&#955;"> <!-- greek small letter lamda,
-                                    U+03BB ISOgrk3 -->
-<!ENTITY mu       "&#956;"> <!-- greek small letter mu, U+03BC ISOgrk3 -->
-<!ENTITY nu       "&#957;"> <!-- greek small letter nu, U+03BD ISOgrk3 -->
-<!ENTITY xi       "&#958;"> <!-- greek small letter xi, U+03BE ISOgrk3 -->
-<!ENTITY omicron  "&#959;"> <!-- greek small letter omicron, U+03BF NEW -->
-<!ENTITY pi       "&#960;"> <!-- greek small letter pi, U+03C0 ISOgrk3 -->
-<!ENTITY rho      "&#961;"> <!-- greek small letter rho, U+03C1 ISOgrk3 -->
-<!ENTITY sigmaf   "&#962;"> <!-- greek small letter final sigma,
-                                    U+03C2 ISOgrk3 -->
-<!ENTITY sigma    "&#963;"> <!-- greek small letter sigma,
-                                    U+03C3 ISOgrk3 -->
-<!ENTITY tau      "&#964;"> <!-- greek small letter tau, U+03C4 ISOgrk3 -->
-<!ENTITY upsilon  "&#965;"> <!-- greek small letter upsilon,
-                                    U+03C5 ISOgrk3 -->
-<!ENTITY phi      "&#966;"> <!-- greek small letter phi, U+03C6 ISOgrk3 -->
-<!ENTITY chi      "&#967;"> <!-- greek small letter chi, U+03C7 ISOgrk3 -->
-<!ENTITY psi      "&#968;"> <!-- greek small letter psi, U+03C8 ISOgrk3 -->
-<!ENTITY omega    "&#969;"> <!-- greek small letter omega,
-                                    U+03C9 ISOgrk3 -->
-<!ENTITY thetasym "&#977;"> <!-- greek theta symbol,
-                                    U+03D1 NEW -->
-<!ENTITY upsih    "&#978;"> <!-- greek upsilon with hook symbol,
-                                    U+03D2 NEW -->
-<!ENTITY piv      "&#982;"> <!-- greek pi symbol, U+03D6 ISOgrk3 -->
-
-<!-- General Punctuation -->
-<!ENTITY bull     "&#8226;"> <!-- bullet = black small circle,
-                                     U+2022 ISOpub  -->
-<!-- bullet is NOT the same as bullet operator, U+2219 -->
-<!ENTITY hellip   "&#8230;"> <!-- horizontal ellipsis = three dot leader,
-                                     U+2026 ISOpub  -->
-<!ENTITY prime    "&#8242;"> <!-- prime = minutes = feet, U+2032 ISOtech -->
-<!ENTITY Prime    "&#8243;"> <!-- double prime = seconds = inches,
-                                     U+2033 ISOtech -->
-<!ENTITY oline    "&#8254;"> <!-- overline = spacing overscore,
-                                     U+203E NEW -->
-<!ENTITY frasl    "&#8260;"> <!-- fraction slash, U+2044 NEW -->
-
-<!-- Letterlike Symbols -->
-<!ENTITY weierp   "&#8472;"> <!-- script capital P = power set
-                                     = Weierstrass p, U+2118 ISOamso -->
-<!ENTITY image    "&#8465;"> <!-- black-letter capital I = imaginary part,
-                                     U+2111 ISOamso -->
-<!ENTITY real     "&#8476;"> <!-- black-letter capital R = real part symbol,
-                                     U+211C ISOamso -->
-<!ENTITY trade    "&#8482;"> <!-- trade mark sign, U+2122 ISOnum -->
-<!ENTITY alefsym  "&#8501;"> <!-- alef symbol = first transfinite cardinal,
-                                     U+2135 NEW -->
-<!-- alef symbol is NOT the same as hebrew letter alef,
-     U+05D0 although the same glyph could be used to depict both characters -->
-
-<!-- Arrows -->
-<!ENTITY larr     "&#8592;"> <!-- leftwards arrow, U+2190 ISOnum -->
-<!ENTITY uarr     "&#8593;"> <!-- upwards arrow, U+2191 ISOnum-->
-<!ENTITY rarr     "&#8594;"> <!-- rightwards arrow, U+2192 ISOnum -->
-<!ENTITY darr     "&#8595;"> <!-- downwards arrow, U+2193 ISOnum -->
-<!ENTITY harr     "&#8596;"> <!-- left right arrow, U+2194 ISOamsa -->
-<!ENTITY crarr    "&#8629;"> <!-- downwards arrow with corner leftwards
-                                     = carriage return, U+21B5 NEW -->
-<!ENTITY lArr     "&#8656;"> <!-- leftwards double arrow, U+21D0 ISOtech -->
-<!-- Unicode does not say that lArr is the same as the 'is implied by' arrow
-    but also does not have any other character for that function. So lArr can
-    be used for 'is implied by' as ISOtech suggests -->
-<!ENTITY uArr     "&#8657;"> <!-- upwards double arrow, U+21D1 ISOamsa -->
-<!ENTITY rArr     "&#8658;"> <!-- rightwards double arrow,
-                                     U+21D2 ISOtech -->
-<!-- Unicode does not say this is the 'implies' character but does not have 
-     another character with this function so rArr can be used for 'implies'
-     as ISOtech suggests -->
-<!ENTITY dArr     "&#8659;"> <!-- downwards double arrow, U+21D3 ISOamsa -->
-<!ENTITY hArr     "&#8660;"> <!-- left right double arrow,
-                                     U+21D4 ISOamsa -->
-
-<!-- Mathematical Operators -->
-<!ENTITY forall   "&#8704;"> <!-- for all, U+2200 ISOtech -->
-<!ENTITY part     "&#8706;"> <!-- partial differential, U+2202 ISOtech  -->
-<!ENTITY exist    "&#8707;"> <!-- there exists, U+2203 ISOtech -->
-<!ENTITY empty    "&#8709;"> <!-- empty set = null set, U+2205 ISOamso -->
-<!ENTITY nabla    "&#8711;"> <!-- nabla = backward difference,
-                                     U+2207 ISOtech -->
-<!ENTITY isin     "&#8712;"> <!-- element of, U+2208 ISOtech -->
-<!ENTITY notin    "&#8713;"> <!-- not an element of, U+2209 ISOtech -->
-<!ENTITY ni       "&#8715;"> <!-- contains as member, U+220B ISOtech -->
-<!ENTITY prod     "&#8719;"> <!-- n-ary product = product sign,
-                                     U+220F ISOamsb -->
-<!-- prod is NOT the same character as U+03A0 'greek capital letter pi' though
-     the same glyph might be used for both -->
-<!ENTITY sum      "&#8721;"> <!-- n-ary summation, U+2211 ISOamsb -->
-<!-- sum is NOT the same character as U+03A3 'greek capital letter sigma'
-     though the same glyph might be used for both -->
-<!ENTITY minus    "&#8722;"> <!-- minus sign, U+2212 ISOtech -->
-<!ENTITY lowast   "&#8727;"> <!-- asterisk operator, U+2217 ISOtech -->
-<!ENTITY radic    "&#8730;"> <!-- square root = radical sign,
-                                     U+221A ISOtech -->
-<!ENTITY prop     "&#8733;"> <!-- proportional to, U+221D ISOtech -->
-<!ENTITY infin    "&#8734;"> <!-- infinity, U+221E ISOtech -->
-<!ENTITY ang      "&#8736;"> <!-- angle, U+2220 ISOamso -->
-<!ENTITY and      "&#8743;"> <!-- logical and = wedge, U+2227 ISOtech -->
-<!ENTITY or       "&#8744;"> <!-- logical or = vee, U+2228 ISOtech -->
-<!ENTITY cap      "&#8745;"> <!-- intersection = cap, U+2229 ISOtech -->
-<!ENTITY cup      "&#8746;"> <!-- union = cup, U+222A ISOtech -->
-<!ENTITY int      "&#8747;"> <!-- integral, U+222B ISOtech -->
-<!ENTITY there4   "&#8756;"> <!-- therefore, U+2234 ISOtech -->
-<!ENTITY sim      "&#8764;"> <!-- tilde operator = varies with = similar to,
-                                     U+223C ISOtech -->
-<!-- tilde operator is NOT the same character as the tilde, U+007E,
-     although the same glyph might be used to represent both  -->
-<!ENTITY cong     "&#8773;"> <!-- approximately equal to, U+2245 ISOtech -->
-<!ENTITY asymp    "&#8776;"> <!-- almost equal to = asymptotic to,
-                                     U+2248 ISOamsr -->
-<!ENTITY ne       "&#8800;"> <!-- not equal to, U+2260 ISOtech -->
-<!ENTITY equiv    "&#8801;"> <!-- identical to, U+2261 ISOtech -->
-<!ENTITY le       "&#8804;"> <!-- less-than or equal to, U+2264 ISOtech -->
-<!ENTITY ge       "&#8805;"> <!-- greater-than or equal to,
-                                     U+2265 ISOtech -->
-<!ENTITY sub      "&#8834;"> <!-- subset of, U+2282 ISOtech -->
-<!ENTITY sup      "&#8835;"> <!-- superset of, U+2283 ISOtech -->
-<!ENTITY nsub     "&#8836;"> <!-- not a subset of, U+2284 ISOamsn -->
-<!ENTITY sube     "&#8838;"> <!-- subset of or equal to, U+2286 ISOtech -->
-<!ENTITY supe     "&#8839;"> <!-- superset of or equal to,
-                                     U+2287 ISOtech -->
-<!ENTITY oplus    "&#8853;"> <!-- circled plus = direct sum,
-                                     U+2295 ISOamsb -->
-<!ENTITY otimes   "&#8855;"> <!-- circled times = vector product,
-                                     U+2297 ISOamsb -->
-<!ENTITY perp     "&#8869;"> <!-- up tack = orthogonal to = perpendicular,
-                                     U+22A5 ISOtech -->
-<!ENTITY sdot     "&#8901;"> <!-- dot operator, U+22C5 ISOamsb -->
-<!-- dot operator is NOT the same character as U+00B7 middle dot -->
-
-<!-- Miscellaneous Technical -->
-<!ENTITY lceil    "&#8968;"> <!-- left ceiling = APL upstile,
-                                     U+2308 ISOamsc  -->
-<!ENTITY rceil    "&#8969;"> <!-- right ceiling, U+2309 ISOamsc  -->
-<!ENTITY lfloor   "&#8970;"> <!-- left floor = APL downstile,
-                                     U+230A ISOamsc  -->
-<!ENTITY rfloor   "&#8971;"> <!-- right floor, U+230B ISOamsc  -->
-<!ENTITY lang     "&#9001;"> <!-- left-pointing angle bracket = bra,
-                                     U+2329 ISOtech -->
-<!-- lang is NOT the same character as U+003C 'less than sign' 
-     or U+2039 'single left-pointing angle quotation mark' -->
-<!ENTITY rang     "&#9002;"> <!-- right-pointing angle bracket = ket,
-                                     U+232A ISOtech -->
-<!-- rang is NOT the same character as U+003E 'greater than sign' 
-     or U+203A 'single right-pointing angle quotation mark' -->
-
-<!-- Geometric Shapes -->
-<!ENTITY loz      "&#9674;"> <!-- lozenge, U+25CA ISOpub -->
-
-<!-- Miscellaneous Symbols -->
-<!ENTITY spades   "&#9824;"> <!-- black spade suit, U+2660 ISOpub -->
-<!-- black here seems to mean filled as opposed to hollow -->
-<!ENTITY clubs    "&#9827;"> <!-- black club suit = shamrock,
-                                     U+2663 ISOpub -->
-<!ENTITY hearts   "&#9829;"> <!-- black heart suit = valentine,
-                                     U+2665 ISOpub -->
-<!ENTITY diams    "&#9830;"> <!-- black diamond suit, U+2666 ISOpub -->
diff --git a/lib/htmlpurifier/docs/examples/basic.php b/lib/htmlpurifier/docs/examples/basic.php
deleted file mode 100644
index b51096d2d..000000000
--- a/lib/htmlpurifier/docs/examples/basic.php
+++ /dev/null
@@ -1,23 +0,0 @@
-<?php
-
-// This file demonstrates basic usage of HTML Purifier.
-
-// replace this with the path to the HTML Purifier library
-require_once '../../library/HTMLPurifier.auto.php';
-
-$config = HTMLPurifier_Config::createDefault();
-
-// configuration goes here:
-$config->set('Core.Encoding', 'UTF-8'); // replace with your encoding
-$config->set('HTML.Doctype', 'XHTML 1.0 Transitional'); // replace with your doctype
-
-$purifier = new HTMLPurifier($config);
-
-// untrusted input HTML
-$html = '<b>Simple and short';
-
-$pure_html = $purifier->purify($html);
-
-echo '<pre>' . htmlspecialchars($pure_html) . '</pre>';
-
-// vim: et sw=4 sts=4
diff --git a/lib/htmlpurifier/docs/fixquotes.htc b/lib/htmlpurifier/docs/fixquotes.htc
deleted file mode 100644
index 80dda2dc2..000000000
--- a/lib/htmlpurifier/docs/fixquotes.htc
+++ /dev/null
@@ -1,9 +0,0 @@
-<public:attach event="oncontentready" onevent="init();" />
-<script>
-function init() {
-  element.innerHTML = '&#8220;'+element.innerHTML+'&#8221;';
-}
-</script>
-
-<!-- vim: et sw=4 sts=4
--->
diff --git a/lib/htmlpurifier/docs/index.html b/lib/htmlpurifier/docs/index.html
deleted file mode 100644
index 3c4ecc716..000000000
--- a/lib/htmlpurifier/docs/index.html
+++ /dev/null
@@ -1,188 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
-    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
-<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-<meta name="description" content="Index to all HTML Purifier documentation." />
-<link rel="stylesheet" type="text/css" href="./style.css" />
-
-<title>Documentation - HTML Purifier</title>
-
-</head>
-<body>
-
-<h1>Documentation</h1>
-
-<p><strong><a href="http://htmlpurifier.org/">HTML Purifier</a></strong> has documentation for all types of people.
-Here is an index of all of them.</p>
-
-<h2>End-user</h2>
-<p>End-user documentation that contains articles, tutorials and useful
-information for casual developers using HTML Purifier.</p>
-
-<dl>
-
-<dt><a href="enduser-id.html">IDs</a></dt>
-<dd>Explains various methods for allowing IDs in documents safely.</dd>
-
-<dt><a href="enduser-youtube.html">Embedding YouTube videos</a></dt>
-<dd>Explains how to safely allow the embedding of flash from trusted sites.</dd>
-
-<dt><a href="enduser-slow.html">Speeding up HTML Purifier</a></dt>
-<dd>Explains how to speed up HTML Purifier through caching or inbound filtering.</dd>
-
-<dt><a href="enduser-utf8.html">UTF-8: The Secret of Character Encoding</a></dt>
-<dd>Describes the rationale for using UTF-8, the ramifications otherwise, and how to make the switch.</dd>
-
-<dt><a href="enduser-tidy.html">Tidy</a></dt>
-<dd>Tutorial for tweaking HTML Purifier's Tidy-like behavior.</dd>
-
-<dt><a href="enduser-customize.html">Customize</a></dt>
-<dd>Tutorial for customizing HTML Purifier's tag and attribute sets.</dd>
-
-<dt><a href="enduser-uri-filter.html">URI Filters</a></dt>
-<dd>Tutorial for creating custom URI filters.</dd>
-
-</dl>
-
-<h2>Development</h2>
-<p>Developer documentation detailing code issues, roadmaps and project
-conventions.</p>
-
-<dl>
-
-<dt><a href="dev-progress.html">Implementation Progress</a></dt>
-<dd>Tables detailing HTML element and CSS property implementation coverage.</dd>
-
-<dt><a href="dev-naming.html">Naming Conventions</a></dt>
-<dd>Defines class naming conventions.</dd>
-
-<dt><a href="dev-optimization.html">Optimization</a></dt>
-<dd>Discusses possible methods of optimizing HTML Purifier.</dd>
-
-<dt><a href="dev-flush.html">Flushing the Purifier</a></dt>
-<dd>Discusses when to flush HTML Purifier's various caches.</dd>
-
-<dt><a href="dev-advanced-api.html">Advanced API</a></dt>
-<dd>Specification for HTML Purifier's advanced API for defining
-custom filtering behavior.</dd>
-
-<dt><a href="dev-config-schema.html">Config Schema</a></dt>
-<dd>Describes config schema framework in HTML Purifier.</dd>
-
-</dl>
-
-<h2>Proposals</h2>
-<p>Proposed features, as well as the associated rambling to get a clear
-objective in place before attempted implementation.</p>
-
-<dl>
-<dt><a href="proposal-colors.html">Colors</a></dt>
-<dd>Proposal to allow for color constraints.</dd>
-</dl>
-
-<h2>Reference</h2>
-<p>Miscellaneous essays, research pieces and other reference type material
-that may not directly discuss HTML Purifier.</p>
-
-<dl>
-<dt><a href="ref-devnetwork.html">DevNetwork Credits</a></dt>
-<dd>Credits and links to DevNetwork forum topics.</dd>
-</dl>
-
-<h2>Internal memos</h2>
-
-<p>Plaintext documents that are more for use by active developers of
-the code. They may be upgraded to HTML files or stay as TXT scratchpads.</p>
-
-<table class="table">
-
-<thead><tr>
-    <th style="width:10%">Type</th>
-    <th style="width:20%">Name</th>
-    <th>Description</th>
-</tr></thead>
-
-<tbody>
-
-<tr>
-    <td>End-user</td>
-    <td><a href="enduser-overview.txt">Overview</a></td>
-    <td>High level overview of the general control flow (mostly obsolete).</td>
-</tr>
-
-<tr>
-    <td>End-user</td>
-    <td><a href="enduser-security.txt">Security</a></td>
-    <td>Common security issues that may still arise (half-baked).</td>
-</tr>
-
-<tr>
-    <td>Development</td>
-    <td><a href="dev-config-bcbreaks.txt">Config BC Breaks</a></td>
-    <td>Backwards-incompatible changes in HTML Purifier 4.0.0</td>
-</tr>
-
-<tr>
-    <td>Development</td>
-    <td><a href="dev-code-quality.txt">Code Quality Issues</a></td>
-    <td>Enumerates code quality issues and places that need to be refactored.</td>
-</tr>
-
-<tr>
-    <td>Proposal</td>
-    <td><a href="proposal-filter-levels.txt">Filter levels</a></td>
-    <td>Outlines details of projected configurable level of filtering.</td>
-</tr>
-
-<tr>
-    <td>Proposal</td>
-    <td><a href="proposal-language.txt">Language</a></td>
-    <td>Specification of I18N for error messages derived from MediaWiki (half-baked).</td>
-</tr>
-
-<tr>
-    <td>Proposal</td>
-    <td><a href="proposal-new-directives.txt">New directives</a></td>
-    <td>Assorted configuration options that could be implemented.</td>
-</tr>
-
-<tr>
-    <td>Proposal</td>
-    <td><a href="proposal-css-extraction.txt">CSS extraction</a></td>
-    <td>Taking the inline CSS out of documents and into <code>style</code>.</td>
-</tr>
-
-<tr>
-    <td>Reference</td>
-    <td><a href="ref-content-models.txt">Handling Content Model Changes</a></td>
-    <td>Discusses how to tidy up content model changes using custom ChildDef classes.</td>
-</tr>
-
-<tr>
-    <td>Reference</td>
-    <td><a href="ref-proprietary-tags.txt">Proprietary tags</a></td>
-    <td>List of vendor-specific tags we may want to transform to W3C compliant markup.</td>
-</tr>
-
-<tr>
-    <td>Reference</td>
-    <td><a href="ref-html-modularization.txt">Modularization of HTMLDefinition</a></td>
-    <td>Provides a high-level overview of the concepts behind HTMLModules.</td>
-</tr>
-
-<tr>
-    <td>Reference</td>
-    <td><a href="ref-whatwg.txt">WHATWG</a></td>
-    <td>How WHATWG plays into what we need to do.</td>
-</tr>
-
-</tbody>
-
-</table>
-
-</body>
-</html>
-
-<!-- vim: et sw=4 sts=4
--->
diff --git a/lib/htmlpurifier/docs/proposal-colors.html b/lib/htmlpurifier/docs/proposal-colors.html
deleted file mode 100644
index 657633882..000000000
--- a/lib/htmlpurifier/docs/proposal-colors.html
+++ /dev/null
@@ -1,49 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
-    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
-<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-<meta name="description" content="Proposal to allow for color constraints in HTML Purifier." />
-<link rel="stylesheet" type="text/css" href="./style.css" />
-
-<title>Proposal: Colors - HTML Purifier</title>
-
-</head><body>
-
-<h1 class="subtitled">Colors</h1>
-<div class="subtitle">Hammering some sense into those color-blind newbies</div>
-
-<div id="filing">Filed under Proposals</div>
-<div id="index">Return to the <a href="index.html">index</a>.</div>
-<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
-
-<p>Your website probably has a color-scheme.
-<span style="color:#090; background:#FFF;">Green on white</span>,
-<span style="color:#A0F; background:#FF0;">purple on yellow</span>,
-whatever. When you give users the ability to style their content, you may
-want them to keep in line with your styling. If your website is all
-about light colors, you don't want a user to come in and vandalize your
-page with a deep maroon.</p>
-
-<p>This is an extremely silly feature proposal, but I'm writing it down anyway.</p>
-
-<p>What if the user could constrain the colors specified in inline styles? You
-are only allowed to use these shades of dark green for text and these shades
-of light yellow for the background. At the very least, you could ensure
-that we did not have pale yellow on white text.</p>
-
-<h2>Implementation issues</h2>
-
-<ol>
-<li>Requires the color attribute definition to know, currently, what the text
-and background colors are. This becomes difficult when classes are thrown
-into the mix.</li>
-<li>The user still has to define the permissible colors, how does one do
-something like that?</li>
-</ol>
-
-</body>
-</html>
-
-<!-- vim: et sw=4 sts=4
--->
diff --git a/lib/htmlpurifier/docs/proposal-config.txt b/lib/htmlpurifier/docs/proposal-config.txt
deleted file mode 100644
index 4e031c586..000000000
--- a/lib/htmlpurifier/docs/proposal-config.txt
+++ /dev/null
@@ -1,23 +0,0 @@
-
-Configuration
-
-Configuration is documented on a per-use-case basis: if a class uses a certain
-value from the configuration object, it has to define its name and what the
-value is used for.  This means decentralized configuration declarations that
-are nevertheless error-checked, alongside a centralized configuration object.
-
-Directives are divided into namespaces, indicating the major portion of
-functionality they cover (although there may be overlaps).  Please consult
-the documentation in ConfigDef for more information on these namespaces.
-
-Since configuration is dependent on context, internal classes require a
-configuration object to be passed as a parameter.  (They also require a
-Context object). A majority of classes do not need the config object,
-but for those who do, it is a lifesaver.
-
-Definition objects are complex datatypes influenced by their respective
-directive namespaces (HTMLDefinition with HTML and CSSDefinition with CSS).
-If any of these directives is updated, HTML Purifier forces the definition
-to be regenerated.
-
-    vim: et sw=4 sts=4
diff --git a/lib/htmlpurifier/docs/proposal-css-extraction.txt b/lib/htmlpurifier/docs/proposal-css-extraction.txt
deleted file mode 100644
index 9933c96b8..000000000
--- a/lib/htmlpurifier/docs/proposal-css-extraction.txt
+++ /dev/null
@@ -1,34 +0,0 @@
-
-Extracting inline CSS from HTML Purifier
-    voodoofied: Assigning semantics to elements
-
-Sander Tekelenburg brought to my attention the poor programming style of
-inline CSS in HTML documents.  In an ideal world, we wouldn't be using inline
-CSS at all: everything would be assigned using semantic class attributes
-from an external stylesheet.
-
-With ExtractStyleBlocks and CSSTidy, this is now possible (when allowed, users
-can specify a style element which gets extracted from the user-submitted HTML, which
-the application can place in the head of the HTML document).  But there still
-is the issue of inline CSS that refuses to go away.
-
-The basic idea behind this feature is to assign every element a unique identifier,
-and then move all of the CSS data to a stylesheet. This would turn this HTML:
-
-<div style="text-align:center">Big <span style="color:red;">things</span>!</div>
-
-into
-
-<div id="hp-12345">Big <span id="hp-12346">things</span>!</div>
-
-and a stylesheet that is:
-
-#hp-12345 {text-align:center;}
-#hp-12346 {color:red;}
-
-Beyond that, HTML Purifier can magically merge common CSS values together,
-and all manner of other heuristic things.  HTML Purifier should also
-make it easy for an admin to re-style the HTML semantically. Speed is not
-an issue. Also, better WYSIWYG editors are needed.
-
-    vim: et sw=4 sts=4
diff --git a/lib/htmlpurifier/docs/proposal-errors.txt b/lib/htmlpurifier/docs/proposal-errors.txt
deleted file mode 100644
index 87cb2ac19..000000000
--- a/lib/htmlpurifier/docs/proposal-errors.txt
+++ /dev/null
@@ -1,211 +0,0 @@
-Considerations for ErrorCollection
-
-Presently, HTML Purifier takes a code-execution centric approach to handling
-errors. Errors are organized and grouped according to which segment of the
-code triggers them, not necessarily the portion of the input document that
-triggered the error. This means that errors are pseudo-sorted by category,
-rather than location in the document.
-
-One easy way to "fix" this problem would be to re-sort according to line number.
-However, the "category" style information we derive from naively following
-program execution is still useful. After all, each of the strategies which
-can report errors still process the document mostly linearly. Furthermore,
-not only do they process linearly, but the way they pass off operations to
-sub-systems mirrors that of the document. For example, AttrValidator will
-linearly proceed through elements, and on each element will use AttrDef to
-validate those contents. From there, the attribute might have more
-sub-components, which have execution passed off accordingly.
-
-In fact, each strategy handles a very specific class of "error."
-
-RemoveForeignElements   - element tokens
-MakeWellFormed          - element token ordering
-FixNesting              - element token ordering
-ValidateAttributes      - attributes of elements
-
-The crucial point is that while we care about the hierarchy governing these
-different errors, we *don't* care about any other information about what actually
-happens to the elements. This brings up another point: if HTML Purifier fixes
-something, this is not really a notice/warning/error; it's really a suggestion
-of a way to fix the aforementioned defects.
-
-In short, the refactoring to take this into account kinda sucks.
-
-Errors should not be recorded in the order in which they are reported. Instead,
-they should be bound to the line (and preferably element) in which they were found.
-This means we need some way to uniquely identify every element in the document,
-which doesn't presently exist. An easy way of adding this would be to track
-line columns. An important ramification of this is that we *must* use the
-DirectLex implementation.
-
-    1. Implement column numbers for DirectLex [DONE!]
-    2. Disable error collection when not using DirectLex [DONE!]
-
-Next, we need to re-orient all of the error declarations to give CurrentToken
-utmost importance. Since this is passed via Context, it's not always clear
-if that's available. ErrorCollector should complain HARD if it isn't available.
-There are some locations where we don't have a token available. These include:
-
-    * Lexing - this can actually have a row and column, but NOT correspond to
-      a token
-    * End of document errors - bump this to the end
-
-Actually, we *don't* have to complain if CurrentToken isn't available; we just
-set it as a document-wide error. And actually, nothing needs to be done here.
-
-Something interesting to consider is whether or not we care about the locations
-of attributes and CSS properties, i.e. the sub-objects that compose these things.
-In terms of consistency, at the very least attributes should have column/line
-numbers attached to them. However, this may be overkill, as attributes are
-uniquely identifiable. You could go even further, with CSS, but they are also
-uniquely identifiable.
-
-Bottom-line is, however, this information must be available, in form of the
-CurrentAttribute and CurrentCssProperty (theoretical) context variables, and
-it must be used to organize the errors that the sub-processes may throw.
-There is also a hierarchy of sorts that might have made merging these into one
-context variable sensible, were it not for HTML's reasonably rigid structure.
-A CSS property will never contain an HTML attribute. So we won't ever get
-recursive relations, and having multiple depths won't ever make sense. Leave
-this be.
-
-We already have this information, and consequently, using start and end is
-*unnecessary*, so long as the context variables are set appropriately. We don't
-care if an error was thrown by an attribute transform or an attribute definition;
-to the end user these are the same (for a developer, they are different, but
-they're better off with a stack trace (which we should add support for) in such
-cases).
-
-    3. Remove start()/end() code. Don't get rid of recursion, though [DONE]
-    4. Set up ErrorCollector to use context information to set up hierarchies.
-       This may require a different internal format. Use objects if it gets
-       complex. [DONE]
-
-       ASIDE
-            More on this topic: since we are now binding errors to lines
-            and columns, a particular error can have three relationships to that
-            specific location:
-
-            1. The token at that location directly
-                RemoveForeignElements
-                AttrValidator (transforms)
-                MakeWellFormed
-            2. A "component" of that token (i.e. attribute)
-                AttrValidator (removals)
-            3. A modification to that node (i.e. contents from start to end
-               token) as a whole
-                FixNesting
-
-            This needs to be marked accordingly. In the presentation, it might
-            make sense to keep (3) separate and have (2) as a sublist of (1). (1) can
-            be a closing tag, in which case (3) makes no sense at all, OR it
-            should be related with its opening tag (this may not necessarily
-            be possible before MakeWellFormed is run).
-
-            So, the line and column count as our identifier, so:
-
-            $errors[$line][$col] = ...
-
-            Then, we need to identify case 1, 2 or 3. They are identified as
-            such:
-
-            1. Need some sort of semaphore in RemoveForeignElements, etc.
-            2. If CurrentAttr/CurrentCssProperty is non-null
-            3. Default (FixNesting, MakeWellFormed)
-
-            One consideration about (1) is that it usually is actually a
-            (3) modification, but we have no way of knowing about that because
-            of various optimizations. However, they can probably be treated
-            the same. The other difficulty is that (3) is never a line and
-            column; rather, it is a range (i.e. a duple) and telling the user
-            the very start of the range may confuse them. For example,
-
-            <b>Foo<div>bar</div></b>
-            ^     ^
-
-            The node being operated on is <b>, so the error would be assigned
-            to the first caret, with a "node reorganized" error. Then, the
-            ChildDef would have submitted its own suggestions and errors with
-            regard to what's going on in the internals.  So I suppose this is
-            ok. :-)
-
-            Now, the structure of the earlier mentioned ... would be something
-            like this:
-
-            object {
-                type = (token|attr|property),
-                value, // appropriate for type
-                errors = array(),
-                sub-errors = [recursive],
-            }
-
-            This helps us keep things agnostic. It is also complex enough to
-            warrant an object.
-
-So, more wanking about the object format is in order. The way HTML Purifier is
-currently set up, the only possible hierarchy is:
-
-    token -> attr -> css property
-
-These relations do not exist all of the time; a comment or end token would not
-ever have any attributes, and non-style attributes would never have CSS properties
-associated with them.
-
-I believe that it is worth supporting multiple paths. At some point, we might
-have a hierarchy like:
-
-    * -> syntax
-      -> token -> attr -> css property
-                       -> url
-               -> css stylesheet <style>
-
-et cetera. Now, one of the practical implications of this is that every "node"
-on our tree is well-defined, so in theory it should be possible to either 1.
-create a separate class for each error struct, or 2. embed this information
-directly into HTML Purifier's token stream.  Embedding the information in the
-token stream is not a terribly good idea, since tokens can be removed, etc.
-So that leaves us with 1... and if we use a generic interface we can cut down
-on a lot of code we might need. So let's leave it like this.
-
-~~~~
-
-Then we set up suggestions.
-
-    5. Set up a separate error class which tells the user about any modifications
-       HTML Purifier made.
-
-Some information about this:
-
-Our current paradigm is to tell the user what HTML Purifier did to the HTML.
-This is the most natural mode of operation, since that's what HTML Purifier
-is all about; it was not meant to be a validator.
-
-However, most other people have experience dealing with a validator. In cases
-where HTML Purifier unambiguously does the right thing, simply giving the user
-the correct version isn't a bad idea, but problems arise when:
-
-- The user has such bad HTML we do something odd, when we should have just
-  flagged the HTML as an error. Such examples are when we do things like
-  remove text from directly inside a <table> tag. It was probably meant to
-  be in a <td> tag or be outside the table, but we're not smart enough to
-  realize this so we just remove it. In such a case, we should tell the user
-  that there was foreign data in the table, but then we shouldn't "demand"
-  the user remove the data; it's more of a "here's a possible way of
-  rectifying the problem"
-
-- Giving line context for input is hard enough, but feasible; giving output
-  line context will be extremely difficult due to shifting lines; we'd probably
-  have to track what the tokens are and then find the appropriate out context
-  and it's not guaranteed to work etc etc etc.
-
-````````````
-
-Don't forget to spruce up output.
-
-    6. Output needs to automatically give line and column numbers, basically
-       "at line" on steroids. Look at W3C's output; it's ok. [PARTIALLY DONE]
-
-       - We need a standard CSS to apply (check demo.css for some starting
-         styling; some buttons would also be hip)
-
-    vim: et sw=4 sts=4
diff --git a/lib/htmlpurifier/docs/proposal-filter-levels.txt b/lib/htmlpurifier/docs/proposal-filter-levels.txt
deleted file mode 100644
index b78b898b4..000000000
--- a/lib/htmlpurifier/docs/proposal-filter-levels.txt
+++ /dev/null
@@ -1,137 +0,0 @@
-
-Filter Levels
-    When one size *does not* fit all
-
-It makes little sense to constrain users to one set of HTML elements and
-attributes and tell them that they are not allowed to mold this in
-any fashion.  Many users demand to be able to custom-select which elements
-and attributes they want.  This is fine: because HTML Purifier keeps close
-track of what elements are safe to use, there is no way for them to
-accidentally allow an XSS-able tag.
-
-However, combing through the HTML spec to make your own whitelist can
-be a daunting task.  HTML Purifier ought to offer pre-canned filter levels
-that amateur users can select based on what they think is their use-case.
-
-Here are some fuzzy levels you could set:
-
-1. Comments - Wordpress recommends a, abbr, acronym, b, blockquote, cite,
-    code, em, i, strike, strong; however, you could get away with only a, em and
-    p; also having blockquote and pre tags would be helpful.
-2. BBCode - Emulate the usual tagset for forums: b, i, img, a, blockquote,
-    pre, div, span and h[2-6] (the last three are for specially formatted
-    posts, div and span require associated classes or inline styling enabled
-    to be useful)
-3. Pages - As permissive as possible without allowing XSS.  No protection
-    against bad design sense, unfortunately.  Suitable for wiki and page
-    environments. (probably what we have now)
-4. Lint - Accept everything in the spec, a Tidy wannabe. (This probably won't
-    get implemented as it would require routines for things like <object>
-    and friends to be implemented, which is a lot of work for not a lot of
-    benefit)
-
-One final note: when you start axing tags that are more commonly used, you
-run the risk of accidentally destroying user data, especially if the data
-is incoming from a WYSIWYG editor that hasn't been synced accordingly. This may
-make forbidden element to text transformations desirable (for example, images).
-
-
-
-== Element Risk Analysis ==
-
-Although none of the currently supported elements presents a security
-threat per se, some can cause problems for page layouts or be
-extremely complicated.
-
-Legend:
-    [danger level] - regular tags / uncommon tags ~ deprecated tags
-    [danger level]* - rare tags
-
-1 - blockquote, code, em, i, p, tt / strong, sub, sup
-1* - abbr, acronym, bdo, cite, dfn, kbd, q, samp
-2 - b, br, del, div, pre, span / ins, s, strike ~ u
-3 - h2, h3, h4, h5, h6 ~ center
-4 - h1, big ~ font
-5 - a
-7 - area, map
-
-These are special-use tags; they should be enabled on a blanket basis.
-
-Lists - dd, dl, dt, li, ol, ul ~ menu, dir
-Tables - caption, table, td, th, tr / col, colgroup, tbody, tfoot, thead
-
-Forms - fieldset, form, input, label, legend, optgroup, option, select, textarea
-XSS - noscript, object, script ~ applet
-Meta - base, basefont, body, head, html, link, meta, style, title
-Frames - frame, frameset, iframe
-
-And tag specific notes:
-
-a   - general problems involving linkspam
-b   - too much bold is bad, typographically speaking bold is discouraged
-br  - often misused
-center - CSS, usually no legit use
-del - only useful in editing context
-div - little meaning in certain contexts i.e. blog comment
-h1  - usually no legit use, as header is already set by application
-h*  - not needed in blog comments
-hr  - usually not necessary in blog comments
-img - could be extremely undesirable if linking to external pics (CSRF, goatse)
-pre - could use formatting, only useful in code contexts
-q   - very little support
-s   - transform into span with styling or del?
-small - technically presentational
-span - depends on attribute allowances
-sub, sup - specialized
-u   - little legit use, prefer class with text-decoration
-
-Based on the riskiness of the items, we may want to offer %HTML.DisableImages
-attribute and put URI filtering higher up on the priority list.
-
-
-== Attribute Risk Analysis ==
-
-We actually have a surprisingly small assortment of allowed attributes (the
-rest are deprecated in strict, and thus we opted not to allow them, even
-though our output is XHTML Transitional by default.)
-
-Required URI - img.alt, img.src, a.href
-Medium risk - *.class, *.dir
-High risk - img.height, img.width, *.id, *.style
-
-Table - colgroup/col.span, td/th.rowspan, td/th.colspan
-Uncommon - *.title, *.lang, *.xml:lang
-Rare - td/th.abbr, table.summary, {table}.charoff
-Rare URI - del.cite, ins.cite, blockquote.cite, q.cite, img.longdesc
-Presentational - {table}.align, {table}.valign, table.frame, table.rules,
-    table.border
-Partially presentational - table.cellpadding, table.cellspacing,
-    table.width, col.width, colgroup.width
-
-
-== CSS Risk Analysis ==
-
-Currently, there is no support for fine-grained "allowed CSS" specification,
-mainly because I'm lazy, partially because no one has asked for it. However,
-this will be added eventually.
-
-There are certain CSS elements that are extremely useful inline, but then
-as you get to more presentation oriented styling it may not always be
-appropriate to inline them.
-
-Useful - clear, float, border-collapse, caption-side
-
-These CSS properties can break layouts if used improperly. We have excluded
-any CSS properties that are not currently implemented (such as position).
-
-Dangerous, can go outside container - float
-Easy to abuse - font-size, font-family (font), width
-Colored - background-color (background), border-color (border), color
-    (see proposal-colors.html)
-Dramatic - border, list-style-position (list-style), margin, padding,
-    text-align, text-indent, text-transform, vertical-align, line-height
-
-Dramatic elements substantially change the look of text in ways that should
-probably have been reserved to other areas.
-
-    vim: et sw=4 sts=4
diff --git a/lib/htmlpurifier/docs/proposal-language.txt b/lib/htmlpurifier/docs/proposal-language.txt
deleted file mode 100644
index 149701cd3..000000000
--- a/lib/htmlpurifier/docs/proposal-language.txt
+++ /dev/null
@@ -1,64 +0,0 @@
-We are going to model our I18N/L10N off of MediaWiki's system.  Theirs is
-obviously quite complicated, so we're going to simplify it a bit for our needs.
-
-== Caching ==
-
-MediaWiki has lots of caching mechanisms built in, which make the code somewhat
-more difficult to understand.  Before doing any loading, MediaWiki will check
-the following places to see if we can be lazy:
-
-1. $mLocalisationCache[$code] -  just a variable where it may have been stashed
-2. serialized/$code.ser -  compiled serialized language file
-3. Memcached version of file (with expiration checking)
-
-Expiration checking consists of ensuring all dependencies have filemtimes
-that match the ones bundled with the cached copy. Similar checking could be
-implemented for serialized versions, as it seems that they are not updated
-until manually recompiled.
-
-== Behavior ==
-
-Things that are localizable:
-
--  Weekdays (and abbrev)
--  Months (and abbrev)
--  Bookstores
--  Skin names
--  Date preferences / Custom date format
--  Default date format
--  Default user option overrides
--+ Language names
--  Timezones
--+ Character encoding conversion via iconv
--  UpperLowerCase first (needs casemaps for some)
--  UpperLowerCase
--  Uppercase words
--  Uppercase word breaks
--  Case folding
--  Strip punctuation for MySQL search
--  Get first character
--+ Alternate encoding
--+ Recoding for edit (and then recode input)
--+ RTL
--+ Direction mark character depending on RTL
--? Arrow depending on RTL
--  Languages where italics cannot be used
--+ Number formatting (commafy, transform digits, transform separators)
--  Truncate (multibyte)
--  Grammar conversions for inflected languages
--  Plural transformations
--  Formatting expiry times
--  Segmenting for diffs (Chinese)
--  Convert to variants of language
--  Language specific user preference options
--  Link trails [[foo]]bar
--+ Language code (RFC 3066)
-
-Neat functionality:
-
--  I18N sprintfDate
--  Roman numeral formatting
-
-Items marked with a + likely need to be addressed by HTML Purifier
-
-    vim: et sw=4 sts=4
diff --git a/lib/htmlpurifier/docs/proposal-new-directives.txt b/lib/htmlpurifier/docs/proposal-new-directives.txt
deleted file mode 100644
index f54ee2d8d..000000000
--- a/lib/htmlpurifier/docs/proposal-new-directives.txt
+++ /dev/null
@@ -1,44 +0,0 @@
-
-Configuration Ideas
-
-Here are some theoretical configuration ideas that we could implement some
-time.  Note the naming convention: %Namespace.Directive. If you want one
-implemented, give us a ring, and we'll move it up the priority chain.
-
-%Attr.RewriteFragments - if there's %Attr.IDPrefix we may want to transparently
-    rewrite the URLs we parse too.  However, we can only do it when it's a pure
-    anchor link, so it's not foolproof
-
-%Attr.ClassBlacklist,
-%Attr.ClassWhitelist,
-%Attr.ClassPolicy - determines what classes are allowed. When
-    %Attr.ClassPolicy is set to Blacklist, only allow those not in
-    %Attr.ClassBlacklist. When it's Whitelist, only allow those in
-    %Attr.ClassWhitelist.
-
-%Attr.MaxWidth,
-%Attr.MaxHeight - caps for width and height related checks.
-    (the hack in Pixels for an image crashing attack could be replaced by this)
-
-%URI.AddRelNofollow - will add rel="nofollow" to all links, preventing the
-    spread of ill-gotten pagerank
-
-%URI.HostBlacklistRegex - regexes that if matching the host are disallowed
-%URI.HostWhitelist - domain names that are excluded from the host blacklist
-%URI.HostPolicy - determines whether we reject all hosts and then whitelist,
-    or allow all hosts and then apply specific blacklists, with the whitelist
-    intervening. 'DenyAll' or 'AllowAll' (default)
-
-%URI.DisableIPHosts - URIs that have IP addresses for hosts are disallowed.
-    Be sure to also grab unusual encodings (dword, hex and octal), which may
-    currently be caught by regular DNS
-%URI.DisableIDN - Disallow raw internationalized domain names. Punycode
-    will still be permitted.
-
-%URI.ConvertUnusualIPHosts - transform dword/hex/octal IP addresses to the
-    regular form
-%URI.ConvertAbsoluteDNS - Remove extra dots after host names that trigger
-    absolute DNS.  While this is actually the preferred method according to
-    the RFC, most people opt to use a relative domain name relative to . (root).
-
-    vim: et sw=4 sts=4
diff --git a/lib/htmlpurifier/docs/proposal-plists.txt b/lib/htmlpurifier/docs/proposal-plists.txt
deleted file mode 100644
index eef8ade61..000000000
--- a/lib/htmlpurifier/docs/proposal-plists.txt
+++ /dev/null
@@ -1,218 +0,0 @@
-THE UNIVERSAL DESIGN PATTERN: PROPERTIES
-Steve Yegge
-
-Implementation:
-    get(name)
-    put(name, value)
-    has(name)
-    remove(name)
-    iteration, with filtering [this will be our namespaces]
-    parent
-
-Representations:
-    - Keys are strings
-    - It's nice to not need to quote keys (if we formulate our own language,
-      consider this)
-    - Property not present representation (key missing)
-    - With frequent removal/re-add, a null value may help. If null is valid,
-      use another value. (PHP semantics are weird here)
-
-Data structures:
-    - LinkedHashMap is wonderful (O(1) access and maintains order)
-    - Using a special property that points to the parent is usual
-    - Multiple inheritance possible, need rules for which to lookup first
-    - Iterative inheritance is best
-    - Consider performance!
-
-Deletion
-    - Tricky problem with inheritance
-    - Distinguish between "not found" and "look in my parent for the property"
-    [Maybe HTML Purifier won't allow deletion]
-
-Read/write asymmetry (it's correct!)
-
-Read-only plists
-    - Allow ability to freeze [this is what we have already]
-    - Don't overuse it
-
-Performance:
-    - Intern strings (PHP does this already)
-    - Don't be case-insensitive
-    - If all properties in a plist are known a-priori, you can use a "perfect"
-      hash function. Often overkill.
-    - Copy-on-read caching "plundering" reduces lookup, but uses memory and can
-      grow stale. Use as last resort.
-    - Refactoring to fields. Watch for API compatibility, system complexity,
-      and lack of flexibility.
-    - Refrigerator: external data-structure to hold plists
-
-Transient properties:
-    [Don't need to worry about this]
-    - Use a separate plist for transient properties
-    - Non-numeric override; numeric should ADD
-    - Deletion: removeTransientProperty() and transientlyRemoveProperty()
-
-Persistence:
-    - XML/JSON are good
-    - Text-based is good for readability, maintainability and bootstrapping
-    - Compressed binary format for network transport [not necessary]
-    - RDBMS or XML database
-
-Querying: [not relevant]
-    - XML database is nice for XPath/XQuery
-    - jQuery for JSON
-    - Just load it all into a program
-
-Backfills/Data integrity:
-    - Use usual methods
-    - Lazy backfill is a nice hack
-
-Type systems:
-    - Flags: ReadOnly, Permanent, DontEnum
-    - Typed properties aren't that useful [It's also Not-PHP]
-    - Separate meta-list of directive properties IS useful
-    - Duck typing is useful for systems designed fully around properties pattern
-
-Trade-off:
-    + Flexibility
-    + Extensibility
-    + Unit-testing/prototype-speed
-    - Performance
-    - Data integrity
-    - Navigability/Query-ability
-    - Reversibility (hard to go back)
-
-HTML Purifier
-
-We are not happy with our current system of defining configuration directives,
-because it has become clear that things will get a lot nicer if we allow
-multiple namespaces, and there are some features that naturally lend themselves
-to inheritance, which we do not really support well.
-
-One of the considered implementation changes would be to go from a structure
-like:
-
-array(
-    'Namespace' => array(
-        'Directive' => 'val1',
-        'Directive2' => 'val2',
-    )
-)
-
-to:
-
-array(
-    'Namespace.Directive' => 'val1',
-    'Namespace.Directive2' => 'val2',
-)
-
-The latter implementation takes more memory, however, and it makes it a bit
-complicated to grab all values from a namespace.
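-
-For illustration, grabbing a namespace out of the flat structure might look
-like the prefix filter below. The function name get_batch() and the $plist
-layout are assumptions made for this sketch, not the final interface:
-
-// Hypothetical: returns all directives in $namespace from a flat plist.
-function get_batch(array $plist, $namespace) {
-    $prefix = $namespace . '.';
-    $length = strlen($prefix);
-    $batch = array();
-    foreach ($plist as $key => $value) {
-        // strncmp() returns 0 when the first $length bytes match
-        if (strncmp($key, $prefix, $length) === 0) {
-            $batch[substr($key, $length)] = $value;
-        }
-    }
-    return $batch;
-}
-
-$plist = array('HTML.Doctype' => 'XHTML 1.0 Strict', 'URI.Disable' => false);
-$html = get_batch($plist, 'HTML'); // array('Doctype' => 'XHTML 1.0 Strict')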
-
-The alternate implementation choice is to allow nested plists. This keeps
-iteration easy, but is problematic for inheritance (it would be difficult
-to distinguish a plist from an array) and retrieval (when specifying multiple
-namespaces we would need some multiple de-referencing).
-
-----
-
-We can bite the performance hit, and just do iteration with filter
-(the strncmp call should be relatively cheap). Then, users should be able
-to optimize doing something like:
-
-$config = HTMLPurifier_Config::createDefault();
-if (!file_exists('config.php')) {
-    // set up $config
-    $config->save('config.php');
-} else {
-    $config->load('config.php');
-}
-
-Or maybe memcache, or something. This means that "// set up $config" must
-not have any dynamic parts, or the user has to invalidate the cache when
-they do update it. We have to think about this a little more carefully; the
-file call might be more expensive.
-
-----
-
-This might get expensive, however, when we actually care about iterating
-over the configuration and want the actual values. So what about nesting the
-lists?
-
-"ns.sub.directive" => values['ns']['sub']['directive']
-
-We can distinguish between plists and arrays by using ArrayObjects for the
-plists, and regular arrays for the arrays? Alternatively, use ArrayObjects
-for the arrays, and regular arrays for the plists.
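-
-The dotted-key lookup above could be sketched as follows, assuming plain
-nested arrays (the function name is hypothetical):
-
-function get_nested(array $values, $key) {
-    foreach (explode('.', $key) as $part) {
-        if (!is_array($values) || !array_key_exists($part, $values)) {
-            return null; // not found
-        }
-        $values = $values[$part];
-    }
-    return $values;
-}
-
-$values = array('ns' => array('sub' => array('directive' => 1)));
-$found = get_nested($values, 'ns.sub.directive'); // 1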
-
-----
-
-Implementation demands, and what has caused them:
-
-1. DefinitionCache, the HTML, CSS and URI namespaces have caches attached to them
-   Results:
-    - getBatchSerial()
-        - getBatch() : in general, the ability to traverse just a namespace
-
-2. AutoFormat/Filter, this is a plugin architecture, directives not hard-coded
-    - getBatch()
-
-3. Configuration form
-    - Namespaces used to organize directives
-
-Other than that, we have a pure plist. PERHAPS we should maintain separate things
-for these different demands.
-
-Issue 2: Directives for configuring the plugins are regular plists, but
-when enabling them, while it's "plist-ish", what you're really doing is adding
-them to an array of "autoformatters"/"filters" to enable. We can set up
-magic BC as well as in the new interface, but there should also be an
-add('AutoFormat', 'AutoParagraph'); which does the right thing.
-
-One thing to consider is whether or not inheritance rules will apply to these.
-I'd say yes. That means that they're still plisty, in fact, the underlying
-implementation will probably be a plist. However, they will get their OWN
-plists, and will NOT support nesting.
-
-Issue 1: Our current implementation is generally not efficient; md5(serialize($foo))
-is pretty expensive. So, I don't think there will be any problems if it
-gets "less" efficient, as long as we give users a properly fast alternative;
-DefinitionRev gives us a way to do this, by simply telling the user they must
-update it whenever they update Configuration directives as well. (There are
-obvious BC concerns here).
-
-In such a case, we simply iterate over our plist (performing full retrievals
-for each value), grab the entries we care about, and then serialize and hash.
-It's going to be slow either way, due to the ability of plists to inherit.
-If we ksort(), we don't have to traverse the entire array, however, the
-cost of a ksort() call may not be worth it.
-
-At this point, last time, I started worrying about the performance implications
-of allowing inheritance, and wondering whether or not I wanted to squash
-the plist. At first blush, our code might be under the assumption that
-accessing properties is cheap; but actually we prefer to copy out the value
-into a member variable if it's going to be used many times. With this in mind
-I don't think CPU consumption from a few nested function calls is going to
-be a problem. We *are* going to enforce a function-only interface.
-
-The next issue at hand is how we're going to manage the "special" plists,
-which should still be able to be inherited. Basically, it means that multiple
-plists would be attached to the configuration object, which is not the
-best for memory performance. The alternative is to keep them all in one
-big plist, and then eat the one-time cost of traversing the entire plist
-to grab the appropriate values.
-
-I think at this point we can write the generic interface, and then set up separate
-plists if that ends up being necessary for performance (it probably won't). Now
-let's code our generic plist implementation.
-
-----
-
-Iterating over the plist presents some problems. The way we've chosen to solve
-this is to squash all of the parents.
-
-----
-
-But I don't need iteration.
-
-    vim: et sw=4 sts=4
diff --git a/lib/htmlpurifier/docs/ref-content-models.txt b/lib/htmlpurifier/docs/ref-content-models.txt
deleted file mode 100644
index 19f84d526..000000000
--- a/lib/htmlpurifier/docs/ref-content-models.txt
+++ /dev/null
@@ -1,50 +0,0 @@
-
-Handling Content Model Changes
-
-
-1. Context
-
-The distinction between Transitional and Strict document types is somewhat
-of an anomaly in the lineage of XHTML document types (following 1.0,
-doctypes do not have flavors: instead, modularization is used to let
-document authors vary their elements).  This transition is usually quite
-straightforward, as W3C usually deprecates attributes or elements, which
-are quite easily handled using tag and attribute transforms.
-
-However, for three elements, <blockquote>, <body> and <address>, W3C elected
-to also change the content model.  <blockquote> and <body> originally
-accepted both inline and block elements, but in the strict doctype they
-only allow block elements.  With <address>, the situation is inverted:
-<p> tags were now forbidden from appearing within this tag.
-
-
-2. Current situation
-
-Currently, HTML Purifier treats <blockquote> specially during Tidy mode
-using a custom ChildDef class StrictBlockquote.  StrictBlockquote
-operates similarly to Required, except that when it encounters an inline
-element, it will wrap it in a block tag (as specified by
-%HTML.BlockWrapper, the default is <p>).  The naming suggests it can
-only be used for <blockquote>s, although it may be possible to
-genericize it to work on other cases of this nature (this would be of
-little practical application, as no other element in XHTML 1.1 or earlier
-has a block-only content model).
-
-Tidy currently contains no custom, lenient implementation for <address>.
-If one were to be written, it would likely operate on the principle that,
-when a <p> tag were to be encountered, it would be replaced with a
-leading and trailing <br /> tag (the contents of <p>, being inline, are
-not an issue).  There is no prior work with this sort of operation.
-
-
-3. Outside applicability
-
-There are a number of other elements that contain restrictive content
-models, such as <ul> or <span> (the latter is restrictive in that it
-does not allow block elements).  In the former case, an errant node
-is eliminated completely; in the latter case, the text of the node
-is preserved (as the parent node does allow PCDATA).  Custom
-content model implementations probably are not the best way of handling
-these cases; node bubbling should be implemented instead.
-
-    vim: et sw=4 sts=4
diff --git a/lib/htmlpurifier/docs/ref-css-length.txt b/lib/htmlpurifier/docs/ref-css-length.txt
deleted file mode 100644
index aa40559e3..000000000
--- a/lib/htmlpurifier/docs/ref-css-length.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-
-CSS Length Reference
-  To bound, or not to bound, that is the question
-
-It's quite a reasonable request, really, and it's already been implemented
-for HTML.  That is, length bounding.  It makes little sense to let users
-define text blocks that have a font-size of 63,360 inches (that's a mile,
-by the way) or a width of forty-fold the parent container.
-
-But it's a little more complicated than that. There are multiple units
-one can use, and we have to do a little unit conversion to get things working.
-Here's what we have:
-
-Absolute:
-    1 in = 2.54 cm
-    1 cm = 10 mm
-    1 pt = 1/72 in
-    1 pc = 12 pt
-
-Relative:
-    1 em ~= 10.0667 px
-    1 ex ~= 0.5 em, though Mozilla Firefox says 1 ex = 6px
-    1 px ~= 1 pt
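-
-As a rough sketch, conversion between the absolute units above could go
-through a common base unit. The function and table names are made up for
-this example, and the relative units (em, ex, px) are left out because they
-depend on context:
-
-<?php
-    // Hypothetical table: number of points in one of each absolute unit.
-    $points_per_unit = array(
-        'pt' => 1.0,
-        'pc' => 12.0,        // 1 pc = 12 pt
-        'in' => 72.0,        // 1 pt = 1/72 in
-        'cm' => 72.0 / 2.54, // 1 in = 2.54 cm
-        'mm' => 72.0 / 25.4, // 1 cm = 10 mm
-    );
-    function convert_length($n, $from, $to, array $points_per_unit) {
-        return $n * $points_per_unit[$from] / $points_per_unit[$to];
-    }
-    // convert_length(63360, 'in', 'pt', $points_per_unit) is 4561920 points
-?>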
-
-Watch out: font-sizes can also be nested to get successively larger
-(although I do not relish having to keep track of context font-sizes,
-this may be necessary, especially for some of the more advanced features
-for preventing things like white on white).
-
-    vim: et sw=4 sts=4
diff --git a/lib/htmlpurifier/docs/ref-devnetwork.html b/lib/htmlpurifier/docs/ref-devnetwork.html
deleted file mode 100644
index 2e9d142e5..000000000
--- a/lib/htmlpurifier/docs/ref-devnetwork.html
+++ /dev/null
@@ -1,47 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
-    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
-<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-<meta name="description" content="Credits and links to DevNetwork forum topics on HTML Purifier." />
-<link rel="stylesheet" type="text/css" href="./style.css" />
-
-<title>DevNetwork Credits - HTML Purifier</title>
-
-</head>
-<body>
-
-<h1>DevNetwork Credits</h1>
-
-<div id="filing">Filed under Reference</div>
-<div id="index">Return to the <a href="index.html">index</a>.</div>
-<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
-
-<p>Many thanks to the DevNetwork community for answering questions,
-theorizing about design, and offering encouragement during
-the development of this library in these forum threads:</p>
-
-<ul>
-    <li><a href="http://forums.devnetwork.net/viewtopic.php?t=52905">HTMLPurifier PHP Library hompeage</a></li>
-    <li><a href="http://forums.devnetwork.net/viewtopic.php?t=53056">How much of CSS to implement?</a></li>
-    <li><a href="http://forums.devnetwork.net/viewtopic.php?t=53083">Parsing URL only according to URI : Security Risk?</a></li>
-    <li><a href="http://forums.devnetwork.net/viewtopic.php?t=53096">Gimme a name : URI and friends</a></li>
-    <li><a href="http://forums.devnetwork.net/viewtopic.php?t=53415">How to document configuration directives</a></li>
-    <li><a href="http://forums.devnetwork.net/viewtopic.php?t=53479">IPv6</a></li>
-    <li><a href="http://forums.devnetwork.net/viewtopic.php?t=53539">http and ftp versus news and mailto</a></li>
-    <li><a href="http://forums.devnetwork.net/viewtopic.php?t=53579">HTMLPurifier - Take your best shot</a></li>
-    <li><a href="http://forums.devnetwork.net/viewtopic.php?t=53664">Need help optimizing a block of code</a></li>
-    <li><a href="http://forums.devnetwork.net/viewtopic.php?t=53861">Non-SGML characters</a></li>
-    <li><a href="http://forums.devnetwork.net/viewtopic.php?t=54283">Wordpress makes me cry</a></li>
-    <li><a href="http://forums.devnetwork.net/viewtopic.php?t=54478">Parameter Object vs. Parameter Array vs. Parameter Functions</a></li>
-    <li><a href="http://forums.devnetwork.net/viewtopic.php?t=54521">Convert encoding where output cannot represent characters</a></li>
-    <li><a href="http://forums.devnetwork.net/viewtopic.php?t=56411">Reporting errors in a document without line numbers</a></li>
-</ul>
-
-<p>...as well as any I may have forgotten.</p>
-
-</body>
-</html>
-
-<!-- vim: et sw=4 sts=4
--->
diff --git a/lib/htmlpurifier/docs/ref-html-modularization.txt b/lib/htmlpurifier/docs/ref-html-modularization.txt
deleted file mode 100644
index d26d30ada..000000000
--- a/lib/htmlpurifier/docs/ref-html-modularization.txt
+++ /dev/null
@@ -1,166 +0,0 @@
-
-The Modularization of HTMLDefinition in HTML Purifier
-
-WARNING: This document was drafted before the implementation of this
-    system, and some implementation details may have evolved over time.
-
-HTML Purifier uses the modularization of XHTML
-<http://www.w3.org/TR/xhtml-modularization/> to organize the internals
-of HTMLDefinition into a more manageable and extensible fashion. Rather
-than have one super-object, HTMLDefinition is split into HTMLModules,
-each of which is responsible for defining elements, their attributes,
-and other properties (for more in-depth coverage, see
-/library/HTMLPurifier/HTMLModule.php's docblock comments). These modules
-are managed by HTMLModuleManager.
-
-Modules that we don't support but could support are:
-
-    * 5.6. Table Modules
-          o 5.6.1. Basic Tables Module [?]
-    * 5.8. Client-side Image Map Module [?]
-    * 5.9. Server-side Image Map Module [?]
-    * 5.12. Target Module [?]
-    * 5.21. Name Identification Module [deprecated]
-
-These modules would be implemented as "unsafe":
-
-    * 5.2. Core Modules
-          o 5.2.1. Structure Module
-    * 5.3. Applet Module
-    * 5.5. Forms Modules
-          o 5.5.1. Basic Forms Module
-          o 5.5.2. Forms Module
-    * 5.10. Object Module
-    * 5.11. Frames Module
-    * 5.13. Iframe Module
-    * 5.14. Intrinsic Events Module
-    * 5.15. Metainformation Module
-    * 5.16. Scripting Module
-    * 5.17. Style Sheet Module
-    * 5.19. Link Module
-    * 5.20. Base Module
-
-We will not be using W3C's XML Schemas or DTDs directly due to the lack
-of robust tools for handling them (the main problem is that all the
-current parsers are usually PHP 5 only and solely-validating, not
-correcting).
-
-This system may be generalized and ported over for CSS.
-
-== General Use-Case ==
-
-The outwards API of HTMLDefinition has been largely preserved, not
-only for backwards-compatibility but also by design. In addition,
-HTMLDefinition can be retrieved "raw", in which it loads a structure
-that closely resembles the modules of XHTML 1.1. This structure is very
-dynamic, making it easy to make cascading changes to global content
-sets or remove elements in bulk.
-
-However, once HTML Purifier needs the actual definition, it retrieves
-a finalized version of HTMLDefinition. The finalized definition involves
-processing the modules into a form that is optimized for multiple
-calls. This final version is immutable and, even if editable, would
-be extremely hard to change.
-
-So, some code taking advantage of the XHTML modularization may look
-like this:
-
-<?php
-    $config = HTMLPurifier_Config::createDefault();
-    $def =& $config->getHTMLDefinition(true); // reference to raw
-    $def->addElement('marquee', 'Block', 'Flow', 'Common');
-    $purifier = new HTMLPurifier($config);
-    $purifier->purify($html); // now the definition is finalized
-?>
-
-== Inclusions ==
-
-One of the nice features of HTMLDefinition is that piggy-backing off
-of global attribute and content sets is extremely easy to do.
-
-=== Attributes ===
-
-HTMLModule->elements[$element]->attr stores attribute information for the
-specific attributes of $element. This is quite close to the final
-API that HTML Purifier interfaces with, but there's an important
-extra feature: attr may also contain an array with a member index zero.
-
-<?php
-    HTMLModule->elements[$element]->attr[0] = array('AttrSet');
-?>
-
-Rather than map the attribute key 0 to an array (which should be
-an AttrDef), it defines a number of attribute collections that should
-be merged into this element's attribute array.
-
-Furthermore, the value of an attribute key, attribute value pair need
-not be a fully fledged AttrDef object. They can also be a string, which
-signifies an AttrDef that is looked up from a centralized registry
-AttrTypes. This allows more concise attribute definitions that look
-more like W3C's declarations, as well as offering a centralized point
-for modifying the behavior of one attribute type. And, of course, the
-old method of manually instantiating an AttrDef still works.
-
-=== Attribute Collections ===
-
-Attribute collections are stored and processed in the AttrCollections
-object, which is responsible for performing the inclusions signified
-by the 0 index. These attribute collections, too, are mutable, by
-using HTMLModule->attr_collections. You may add new attributes
-to a collection or define an entirely new collection for your module's
-use. Inclusions can also be cumulative.
-
-Attribute collections allow us to get rid of so-called "global attributes"
-(which actually aren't so global).
-
-=== Content Models and ChildDef ===
-
-An implementation of the above-mentioned attributes and attribute
-collections was applied to the ChildDef system. HTML Purifier uses
-a proprietary system called ChildDef for performance and flexibility
-reasons, but this does not line up very well with W3C's notion of
-regexps for defining the allowed children of an element.
-
-HTMLPurifier->elements[$element]->content_model and
-HTMLPurifier->elements[$element]->content_model_type store information
-about the final ChildDef that will be stored in
-HTMLPurifier->elements[$element]->child (we use a different variable
-because the two forms are sufficiently different).
-
-$content_model is an abstract, string representation of the internal
-state of ChildDef, while $content_model_type is a string identifier
-of which ChildDef subclass to instantiate. $content_model is processed
-by substituting all content set identifiers (capitalized element names)
-with their contents. It is then parsed and passed into the appropriate
-ChildDef class, as defined by the ContentSets->getChildDef() or the
-custom fallback HTMLModule->getChildDef() for custom child definitions
-not in the core.
-
-You'll need to use these facilities if you plan on referencing a content
-set like "Inline" or "Block", and using them is recommended even if you're
-not due to their conciseness.
-
-A few notes on $content_model: its structure can be as complicated
-as you want, but the pipe symbol (|) is reserved for defining possible
-choices, due to the content sets implementation. For example, a content
-model that looks like:
-
-"Inline -> Block -> a"
-
-...when the Inline content set is defined as "span | b" and the Block
-content set is defined as "div | blockquote", will expand into:
-
-"span | b -> div | blockquote -> a"
-
-The custom HTMLModule->getChildDef() function will need to be able to
-then feed this information to ChildDef in a usable manner.
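-
-A small sketch of the substitution step, assuming content sets are kept in
-a plain name-to-string array (the function name is hypothetical and this is
-not the actual ContentSets code):
-
-<?php
-    function expand_content_model($model, array $content_sets) {
-        // strtr() replaces each content set name with its contents and
-        // does not rescan text it has already substituted
-        return strtr($model, $content_sets);
-    }
-    $sets = array('Inline' => 'span | b', 'Block' => 'div | blockquote');
-    echo expand_content_model('Inline -> Block -> a', $sets);
-    // span | b -> div | blockquote -> a
-?>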
-
-=== Content Sets ===
-
-Content sets can be altered using HTMLModule->content_sets, an associative
-array of content set names to content set contents. If the content set
-already exists, your values are appended on to it (great for, say,
-registering the font tag as an inline element), otherwise it is
-created. They are substituted into content_model.
-
-    vim: et sw=4 sts=4
diff --git a/lib/htmlpurifier/docs/ref-proprietary-tags.txt b/lib/htmlpurifier/docs/ref-proprietary-tags.txt
deleted file mode 100644
index 5849eb04d..000000000
--- a/lib/htmlpurifier/docs/ref-proprietary-tags.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-Proprietary Tags
-    <nobr> and friends
-
-Here are some proprietary tags that W3C does not define but occasionally show
-up in the wild.  We have only included tags that would make sense in an
-HTML Purifier context.
-
-<align>, block element that aligns (extremely rare)
-<blackface>, inline that double-bolds text (extremely rare)
-<comment>, hidden comment for IE and WebTV
-<multicol cols=number gutter=pixels width=pixels>, multiple columns
-<nobr>, no linebreaks
-<spacer align=* type="vertical|horizontal|block">, whitespace in doc,
-    use width/height for block and size for vertical/horizontal (attributes)
-    (extremely rare)
-<wbr>, potential word break point: allows linebreaks. Only works in <nobr>
-
-<listing>, monospace pre-variant (extremely rare)
-<plaintext>, escapes all tags to the end of document
-<xmp>, monospace, replace with pre
-
-These should be put into their own Tidy module, not loaded by default(?). These
-all qualify as "lenient" transforms.
-
-    vim: et sw=4 sts=4
diff --git a/lib/htmlpurifier/docs/ref-whatwg.txt b/lib/htmlpurifier/docs/ref-whatwg.txt
deleted file mode 100644
index 4bb4984f2..000000000
--- a/lib/htmlpurifier/docs/ref-whatwg.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-Web Hypertext Application Technology Working Group
-    WHATWG
-
-== HTML 5 ==
-
-URL: http://www.whatwg.org/specs/web-apps/current-work/
-
-HTML 5 defines a kaboodle of new elements and attributes, as well as
-some well-defined, "quirks mode" HTML parsing.  Although WHATWG professes
-to be targeted towards web applications, many of their semantic additions
-would be quite useful in regular documents. Eventually, HTML
-Purifier will need to audit their lists and figure out what changes need
-to be made.  This process is complicated by the fact that the WHATWG
-doesn't buy into W3C's modularization of XHTML 1.1: we may need
-to remodularize HTML 5 (probably done by section name). No sense in
-committing ourselves till the spec stabilizes, though.
-
-More immediately relevant, however, is the well-defined parsing
-behavior that HTML 5 adds. While I have little interest in writing
-another DirectLex parser, other parsers like ph5p
-<http://jero.net/lab/ph5p/> can be adapted to DOMLex to support much more
-flexible HTML parsing (a cool feature I've seen is how they resolve
-<b>bold<i>both</b>italic</i>).
-
-    vim: et sw=4 sts=4
diff --git a/lib/htmlpurifier/docs/specimens/LICENSE b/lib/htmlpurifier/docs/specimens/LICENSE
deleted file mode 100644
index 0bfad771e..000000000
--- a/lib/htmlpurifier/docs/specimens/LICENSE
+++ /dev/null
@@ -1,10 +0,0 @@
-Licensing of Specimens
-
-Some files in this directory have different licenses:
-
-windows-live-mail-desktop-beta.html - donated by laacz, public domain
-img.png - LGPL, from <http://commons.wikimedia.org/wiki/Image:Pastille_chrome.png>
-
-All other files are by me, and are licensed under LGPL.
-
-    vim: et sw=4 sts=4
diff --git a/lib/htmlpurifier/docs/specimens/html-align-to-css.html b/lib/htmlpurifier/docs/specimens/html-align-to-css.html
deleted file mode 100644
index 0adf76aaa..000000000
--- a/lib/htmlpurifier/docs/specimens/html-align-to-css.html
+++ /dev/null
@@ -1,165 +0,0 @@
-<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
-   "http://www.w3.org/TR/html4/loose.dtd">
-<html>
-<head>
-<title>HTML align attribute to CSS - HTML Purifier Specimen</title>
-<style type="text/css">
-div.container {position:relative;height:110px;}
-div.container.legend .test {text-align:center;line-height:100px;}
-div.test {width:100px;height:100px;border:1px solid black;
-position:absolute;top:10px;}
-div.test.html {left:10px;}
-div.test.css  {left:140px;}
-table {background:#F00;}
-img {border:1px solid #000;}
-hr {width:50px;}
-div.segment {width:250px; float:left; margin-top:1em;}
-</style>
-</head>
-<body>
-
-<h1>HTML align attribute to CSS</h1>
-
-<p>Inspect source for methodology.</p>
-
-<div class="container legend">
-<div class="test html">
-    HTML
-</div>
-<div class="test css">
-    CSS
-</div>
-</div>
-
-<div class="segment">
-
-<h2>table.align</h2>
-
-<h3>left</h3>
-<div class="container">
-<div class="test html">
-    a<table align="left"><tr><td>O</td></tr></table>a
-</div>
-<div class="test css">
-    a<table style="float:left;"><tr><td>O</td></tr></table>a
-</div>
-</div>
-
-<h3>center</h3>
-<div class="container">
-<div class="test html">
-    a<table align="center"><tr><td>O</td></tr></table>a
-</div>
-<div class="test css">
-    a<table style="margin-left:auto; margin-right:auto;"><tr><td>O</td></tr></table>a
-</div>
-</div>
-
-<h3>right</h3>
-<div class="container">
-<div class="test html">
-    a<table align="right"><tr><td>O</td></tr></table>a
-</div>
-<div class="test css">
-    a<table style="float:right;"><tr><td>O</td></tr></table>a
-</div>
-</div>
-
-</div>
-
-<!-- ################################################################## -->
-
-<div class="segment">
-<h2>img.align</h2>
-<h3>left</h3>
-<div class="container">
-<div class="test html">
-    a<img src="img.png" align="left">a
-</div>
-<div class="test css">
-    a<img src="img.png" style="float:left;">a
-</div>
-</div>
-
-<h3>right</h3>
-<div class="container">
-<div class="test html">
-    a<img src="img.png" align="right">a
-</div>
-<div class="test css">
-    a<img src="img.png" style="float:right;">a
-</div>
-</div>
-
-<h3>bottom</h3>
-<div class="container">
-<div class="test html">
-    a<img src="img.png" align="bottom">a
-</div>
-<div class="test css">
-    a<img src="img.png" style="vertical-align:baseline;">a
-</div>
-</div>
-
-<h3>middle</h3>
-<div class="container">
-<div class="test html">
-    a<img src="img.png" align="middle">a
-</div>
-<div class="test css">
-    a<img src="img.png" style="vertical-align:middle;">a
-</div>
-</div>
-
-<h3>top</h3>
-<div class="container">
-<div class="test html">
-    a<img src="img.png" align="top">a
-</div>
-<div class="test css">
-    a<img src="img.png" style="vertical-align:top;">a
-</div>
-</div>
-
-</div>
-
-<!-- ################################################################## -->
-
-<div class="segment">
-
-<h2>hr.align</h2>
-
-<h3>left</h3>
-<div class="container">
-<div class="test html">
-    <hr align="left" />
-</div>
-<div class="test css">
-    <hr style="margin-right:auto; margin-left:0; text-align:left;" />
-</div>
-</div>
-
-<h3>center</h3>
-<div class="container">
-<div class="test html">
-    <hr align="center" />
-</div>
-<div class="test css">
-    <hr style="margin-right:auto; margin-left:auto; text-align:center;" />
-</div>
-</div>
-
-<h3>right</h3>
-<div class="container">
-<div class="test html">
-    <hr align="right" />
-</div>
-<div class="test css">
-    <hr style="margin-right:0; margin-left:auto; text-align:right;" />
-</div>
-</div>
-
-</div>
-
-</body>
-</html>
diff --git a/lib/htmlpurifier/docs/specimens/img.png b/lib/htmlpurifier/docs/specimens/img.png
deleted file mode 100644
index a755bcb5e..000000000
--- a/lib/htmlpurifier/docs/specimens/img.png
+++ /dev/null
diff --git a/lib/htmlpurifier/docs/specimens/jochem-blok-word.html b/lib/htmlpurifier/docs/specimens/jochem-blok-word.html
deleted file mode 100644
index 1cc08f888..000000000
--- a/lib/htmlpurifier/docs/specimens/jochem-blok-word.html
+++ /dev/null
@@ -1,129 +0,0 @@
-<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
-
-<head>
-<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
-<meta name=Generator content="Microsoft Word 12 (filtered medium)">
-<!--[if !mso]>
-<style>
-v\:* {behavior:url(#default#VML);}
-o\:* {behavior:url(#default#VML);}
-w\:* {behavior:url(#default#VML);}
-..shape {behavior:url(#default#VML);}
-</style>
-<![endif]-->
-<style>
-<!--
- /* Font Definitions */
- @font-face
-	{font-family:"Cambria Math";
-	panose-1:2 4 5 3 5 4 6 3 2 4;}
-@font-face
-	{font-family:Calibri;
-	panose-1:2 15 5 2 2 2 4 3 2 4;}
-@font-face
-	{font-family:Tahoma;
-	panose-1:2 11 6 4 3 5 4 4 2 4;}
-@font-face
-	{font-family:Verdana;
-	panose-1:2 11 6 4 3 5 4 4 2 4;}
- /* Style Definitions */
- p.MsoNormal, li.MsoNormal, div.MsoNormal
-	{margin:0cm;
-	margin-bottom:.0001pt;
-	font-size:10.0pt;
-	font-family:"Verdana","sans-serif";}
-a:link, span.MsoHyperlink
-	{mso-style-priority:99;
-	color:blue;
-	text-decoration:underline;}
-a:visited, span.MsoHyperlinkFollowed
-	{mso-style-priority:99;
-	color:purple;
-	text-decoration:underline;}
-p.MsoAcetate, li.MsoAcetate, div.MsoAcetate
-	{mso-style-priority:99;
-	mso-style-link:"Balloon Text Char";
-	margin:0cm;
-	margin-bottom:.0001pt;
-	font-size:8.0pt;
-	font-family:"Tahoma","sans-serif";}
-span.EmailStyle17
-	{mso-style-type:personal-compose;
-	font-family:"Verdana","sans-serif";
-	color:windowtext;}
-span.BalloonTextChar
-	{mso-style-name:"Balloon Text Char";
-	mso-style-priority:99;
-	mso-style-link:"Balloon Text";
-	font-family:"Tahoma","sans-serif";}
-..MsoChpDefault
-	{mso-style-type:export-only;}
-@page Section1
-	{size:612.0pt 792.0pt;
-	margin:70.85pt 70.85pt 70.85pt 70.85pt;}
-div.Section1
-	{page:Section1;}
--->
-</style>
-<!--[if gte mso 9]><xml>
- <o:shapedefaults v:ext="edit" spidmax="2050" />
-</xml><![endif]--><!--[if gte mso 9]><xml>
- <o:shapelayout v:ext="edit">
-  <o:idmap v:ext="edit" data="1" />
- </o:shapelayout></xml><![endif]-->
-</head>
-
-<body lang=NL link=blue vlink=purple>
-
-<div class=Section1>
-
-<p class=MsoNormal><img width=1277 height=994 id="Picture_x0020_1"
-src="cid:image001.png@01C8CBDF.5D1BAEE0"><o:p></o:p></p>
-
-<p class=MsoNormal><o:p>&nbsp;</o:p></p>
-
-<p class=MsoNormal><b>Name<o:p></o:p></b></p>
-
-<p class=MsoNormal>E-mail : <a href="mailto:mail@example.com"><span
-style='color:windowtext'>mail@example.com</span></a><o:p></o:p></p>
-
-<p class=MsoNormal><o:p>&nbsp;</o:p></p>
-
-<p class=MsoNormal><b>Company<o:p></o:p></b></p>
-
-<p class=MsoNormal>Address 1<o:p></o:p></p>
-
-<p class=MsoNormal>Address 2<o:p></o:p></p>
-
-<p class=MsoNormal><o:p>&nbsp;</o:p></p>
-
-<p class=MsoNormal>Telefoon&nbsp; : +xx xx xxx xxx xx <span style='color:black'><o:p></o:p></span></p>
-
-<p class=MsoNormal><span lang=EN-US style='color:black'>Fax&nbsp; : +xx xx xxx xx xx<o:p></o:p></span></p>
-
-<p class=MsoNormal><span lang=EN-US style='color:black'>Internet : </span><span
-style='color:black'><a href="http://www.example.com/"><span lang=EN-US
-style='color:black'>http://www.example.com</span></a></span><span
-lang=EN-US style='color:black'><o:p></o:p></span></p>
-
-<p class=MsoNormal><span lang=EN-US style='color:black'>Kamer van koophandel
-xxxxxxxxx<o:p></o:p></span></p>
-
-<p class=MsoNormal><span lang=EN-US style='color:black'><o:p>&nbsp;</o:p></span></p>
-
-<p class=MsoNormal><span lang=EN-US style='font-size:7.5pt;color:black'>Op deze
-e-mail is een disclaimer van toepassing, ga naar </span><span lang=EN-US
-style='font-size:7.5pt'><a
-href="http://www.example.com/disclaimer"><span
-style='color:black'>www.example.com/disclaimer</span></a><br>
-<span style='color:black'>A disclaimer is applicable to this email, please
-refer to </span><a href="http://www.example.com/disclaimer"><span
-style='color:black'>www.example.com/disclaimer</span></a><o:p></o:p></span></p>
-
-<p class=MsoNormal><span lang=EN-US><o:p>&nbsp;</o:p></span></p>
-
-</div>
-
-</body>
-
-</html>
diff --git a/lib/htmlpurifier/docs/specimens/windows-live-mail-desktop-beta.html b/lib/htmlpurifier/docs/specimens/windows-live-mail-desktop-beta.html
deleted file mode 100644
index 735b4bd95..000000000
--- a/lib/htmlpurifier/docs/specimens/windows-live-mail-desktop-beta.html
+++ /dev/null
@@ -1,74 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
-<HTML ChildAreas="4" xmlns:canvas><HEAD>
-<META http-equiv=Content-Type content=text/html;charset=windows-1257>
-<STYLE></STYLE>
-
-<META content="MSHTML 6.00.6000.16414" name=GENERATOR></HEAD>
-<BODY id=MailContainerBody
-style="PADDING-RIGHT: 10px; PADDING-LEFT: 10px; FONT-SIZE: 10pt; COLOR: #000000; PADDING-TOP: 15px; FONT-FAMILY: Arial"
-bgColor=#ff6600 leftMargin=0 background="" topMargin=0
-name="Compose message area" acc_role="text" CanvasTabStop="false">
-<DIV
-style="BORDER-TOP: #dddddd 1px solid; FONT-SIZE: 10pt; WIDTH: 100%; MARGIN-RIGHT: 10px; PADDING-TOP: 5px; BORDER-BOTTOM: #dddddd 1px solid; FONT-FAMILY: Verdana; HEIGHT: 25px; BACKGROUND-COLOR: #ffffff"><NOBR><SPAN
-title="View a slideshow of the pictures in this e-mail message."
-style="PADDING-RIGHT: 20px"><A style="COLOR: #0088e4"
-href="http://g.msn.com/5meen_us/171?path=/photomail/{6fc0065f-ffdd-4ca6-9a4c-cc5a93dc122f}&amp;image=47D7B182CFEFB10!127&amp;imagehi=47D7B182CFEFB10!125&amp;CID=323550092004883216">Play
-slideshow </A></SPAN><SPAN style="COLOR: #909090"><SPAN>|</SPAN><SPAN
-style="PADDING-LEFT: 20px"> Download the highest quality version of a picture by
-clicking the + above it </SPAN></SPAN></NOBR></DIV>
-<DIV
-style="PADDING-RIGHT: 5px; PADDING-LEFT: 7px; PADDING-BOTTOM: 2px; WIDTH: 100%; PADDING-TOP: 2px">
-<OL>
-  <LI><IMG title="Angry smile emoticon"
-  style="FLOAT: none; MARGIN: 0px; POSITION: static" tabIndex=-1
-  alt="Angry smile emoticon" src="cid:49F0C856199E4D688D2D740680733D74@wc"
-  MSNNonUserImageOrEmoticon="true">Un ka <FONT style="BACKGROUND-COLOR: #800000"
-  color=#cc99ff><STRONG>Tev</STRONG></FONT> iet, un ko tu dari?
-  <LI>Aha!</LI></OL>
-
-<UL>
-  <LI>Buletets
-  <LI>
-  <DIV align=justify><A title=http://laacz.lv/blog/
-  href="http://laacz.lv/blog/">http://laacz.lv/blog/</A> un <A
-  title=http://google.com/ href="http://google.com/">gugle</A></DIV>
-  <LI>Sarakstucitis</LI></UL></DIV><SPAN><SPAN xmlns:canvas="canvas-namespace-id"
-layoutEmptyTextWellFont="Tahoma"><SPAN
-style="MARGIN-BOTTOM: 15px; OVERFLOW: visible; HEIGHT: 16px"></SPAN><SPAN
-style="MARGIN-BOTTOM: 25px; VERTICAL-ALIGN: top; OVERFLOW: visible; MARGIN-RIGHT: 25px; HEIGHT: 234px">
-<TABLE style="DISPLAY: inline">
-  <TBODY>
-  <TR>
-
-    <TD>
-      <DIV
-      style="FONT-WEIGHT: bold; FONT-SIZE: 12pt; FONT-FAMILY: arial; TEXT-ALIGN: center"><A
-      id=HiresARef
-      title="Click here to view or download a high resolution version of this picture"
-      style="COLOR: #0088e4; TEXT-DECORATION: none"
-      href="http://byfiles.storage.msn.com/x1pMvt0I80jTgT6DuaCpEMbprX3nk3jNv_vjigxV_EYVSMyM_PKgEvDEUtuNhQC-F-23mTTcKyqx6eGaeK2e_wMJ0ikwpDdFntk4SY7pfJUv2g2Ck6R2S2vAA?download">+</A></DIV>
-      <DIV
-      title="Click here to view the full image using the online photo viewer."
-      style="DISPLAY: inline; OVERFLOW: hidden; WIDTH: 140px; HEIGHT: 140px"><A
-      href="http://g.msn.com/5meen_us/171?path=/photomail/{6fc0065f-ffdd-4ca6-9a4c-cc5a93dc122f}&amp;image=47D7B182CFEFB10!127&amp;imagehi=47D7B182CFEFB10!125&amp;CID=323550092004883216"
-      border="0"><IMG
-      style="MARGIN-TOP: 15px; DISPLAY: inline-block; MARGIN-LEFT: 0px"
-      height=109 src="cid:006A71303B80404E9FB6184E55D6A446@wc" width=140
-      border=0></A></DIV></TD></TR>
-  <TR>
-    <TD>
-      <DIV
-      style="FONT-SIZE: 10pt; WIDTH: 140px; FONT-FAMILY: verdana; TEXT-ALIGN: center"><EM><STRONG>This
-      <U>is </U></STRONG><U>tit</U>le</EM> fo<STRONG>r <FONT
-      face="Arial Black">t<FONT color=#800000 size=7>h<U>i</U></FONT>s
-      </FONT>picture</STRONG></DIV></TD></TR></TBODY></TABLE></SPAN></SPAN></SPAN>
-
-<DIV
-style="PADDING-RIGHT: 5px; PADDING-LEFT: 7px; PADDING-BOTTOM: 2px; WIDTH: 100%; PADDING-TOP: 2px; HEIGHT: 50px">
-<DIV>&nbsp;</DIV></DIV>
-<DIV
-style="BORDER-TOP: #dddddd 1px solid; FONT-SIZE: 10pt; MARGIN-BOTTOM: 10px; WIDTH: 100%; COLOR: #909090; MARGIN-RIGHT: 10px; PADDING-TOP: 9px; FONT-FAMILY: Verdana; HEIGHT: 42px; BACKGROUND-COLOR: #ffffff"><NOBR><SPAN
-title="Join Windows Live to share photos using Windows Live Photo E-mail.">Online
-pictures are available for 30 days. <A style="COLOR: #0088e4"
-href="http://g.msn.com/5meen_us/175">Get Windows Live Mail desktop to create
-your own photo e-mails. </A></SPAN></NOBR></DIV></BODY></HTML>
diff --git a/lib/htmlpurifier/docs/style.css b/lib/htmlpurifier/docs/style.css
deleted file mode 100644
index bd79c8a00..000000000
--- a/lib/htmlpurifier/docs/style.css
+++ /dev/null
@@ -1,76 +0,0 @@
-html {font-size:1em; font-family:serif; }
-body {margin-left:4em; margin-right:4em; }
-
-dt {font-weight:bold; }
-pre {margin-left:2em; }
-pre, code, tt {font-family:monospace; font-size:1em; }
-
-h1 {text-align:center; font-family:Garamond, serif;
-  font-variant:small-caps;}
-h2 {border-bottom:1px solid #CCC; font-family:sans-serif; font-weight:normal;
-    font-size:1.3em;}
-h3 {font-family:sans-serif; font-size:1.1em; font-weight:bold; }
-h4 {font-family:sans-serif; font-size:0.9em; font-weight:bold; }
-
-/* For witty quips */
-.subtitled {margin-bottom:0em;}
-.subtitle , .subsubtitle {font-size:.8em; margin-bottom:1em;
-    font-style:italic; margin-top:-.2em;text-align:center;}
-.subsubtitle {text-align:left;margin-left:2em;}
-
-/* Used for special "See also" links. */
-.reference {font-style:italic;margin-left:2em;}
-
-/* Marks off asides, discussions on why something is the way it is */
-.aside {margin-left:2em; font-family:sans-serif; font-size:0.9em; }
-blockquote .label {font-weight:bold; font-size:1em; margin:0 0 .1em;
-    border-bottom:1px solid #CCC;}
-.emphasis {font-weight:bold; text-align:center; font-size:1.3em;}
-
-/* A regular table */
-.table {border-collapse:collapse; border-bottom:2px solid #888; margin-left:2em; }
-.table thead th {margin:0; background:#888; color:#FFF; }
-.table thead th:first-child {-moz-border-radius-topleft:1em;}
-.table tbody td {border-bottom:1px solid #CCC; padding-right:0.6em;padding-left:0.6em;}
-
-/* A quick table*/
-table.quick tbody th {text-align:right; padding-right:1em;}
-
-/* Category of the file */
-#filing {font-weight:bold; font-size:smaller; }
-
-/* Contains, without exception, Return to index. */
-#index {font-size:smaller; }
-
-#home {font-size:smaller;}
-
-/* Contains, without exception, $Id$, for SVN version info. */
-#version {text-align:right; font-style:italic; margin:2em 0;}
-
-#toc ol ol {list-style-type:lower-roman;}
-#toc ol {list-style-type:decimal;}
-#toc {list-style-type:upper-alpha;}
-
-q {
-  behavior: url(fixquotes.htc); /* IE fix */
-  quotes: '\201C' '\201D' '\2018' '\2019';
-}
-q:before {
-  content: open-quote;
-}
-q:after {
-  content: close-quote;
-}
-
-/* Marks off implementation details interesting only to the person writing
-   the class described in the spec. */
-.technical {margin-left:2em; }
-.technical:before {content:"Technical note: "; font-weight:bold; color:#061; }
-
-/* Marks off sections that are lacking. */
-.fixme {margin-left:2em; }
-.fixme:before {content:"Fix me: "; font-weight:bold; color:#C00; }
-
-#applicability {margin: 1em 5%; font-style:italic;}
-
-/* vim: et sw=4 sts=4 */