diff options
Diffstat (limited to 'lib/htmlpurifier/docs/dev-config-schema.html')
-rw-r--r-- | lib/htmlpurifier/docs/dev-config-schema.html | 412 |
1 files changed, 412 insertions, 0 deletions
diff --git a/lib/htmlpurifier/docs/dev-config-schema.html b/lib/htmlpurifier/docs/dev-config-schema.html new file mode 100644 index 000000000..07aecd35a --- /dev/null +++ b/lib/htmlpurifier/docs/dev-config-schema.html @@ -0,0 +1,412 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" + "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> + <meta name="description" content="Describes config schema framework in HTML Purifier." /> + <link rel="stylesheet" type="text/css" href="./style.css" /> + <title>Config Schema - HTML Purifier</title> + </head> + <body> + + <h1>Config Schema</h1> + + <div id="filing">Filed under Development</div> + <div id="index">Return to the <a href="index.html">index</a>.</div> + <div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div> + + <p> + HTML Purifier has a fairly complex system for configuration. Users + interact with a <code>HTMLPurifier_Config</code> object to + set configuration directives. The values they set are validated according + to a configuration schema, <code>HTMLPurifier_ConfigSchema</code>. + </p> + + <p> + The schema is mostly transparent to end-users, but if you're doing development + work for HTML Purifier and need to define a new configuration directive, + you'll need to interact with it. We'll also talk about how to define + userspace configuration directives at the very end. + </p> + + <h2>Write a directive file</h2> + + <p> + Directive files define configuration directives to be used by + HTML Purifier. They are placed in <code>library/HTMLPurifier/ConfigSchema/schema/</code> + in the form <code><em>Namespace</em>.<em>Directive</em>.txt</code> (I + couldn't think of a more descriptive file extension.) + Directive files are actually what we call <code>StringHash</code>es, + i.e. associative arrays represented in a string form reminiscent of + <a href="http://qa.php.net/write-test.php">PHPT</a> tests. Here's a + sample directive file, <code>Test.Sample.txt</code>: + </p> + + <pre>Test.Sample +TYPE: string/null +DEFAULT: NULL +ALLOWED: 'foo', 'bar' +VALUE-ALIASES: 'baz' => 'bar' +VERSION: 3.1.0 +--DESCRIPTION-- +This is a sample configuration directive for the purposes of the +<code>dev-config-schema.html<code> documentation. +--ALIASES-- +Test.Example</pre> + + <p> + Each of these segments has a specific meaning: + </p> + + <table class="table"> + <thead> + <tr> + <th>Key</th> + <th>Example</th> + <th>Description</th> + </tr> + </thead> + <tbody> + <tr> + <td>ID</td> + <td>Test.Sample</td> + <td>The name of the directive, in the form Namespace.Directive + (implicitly the first line)</td> + </tr> + <tr> + <td>TYPE</td> + <td>string/null</td> + <td>The type of variable this directive accepts. See below for + details. You can also add <code>/null</code> to the end of + any basic type to allow null values too.</td> + </tr> + <tr> + <td>DEFAULT</td> + <td>NULL</td> + <td>A parseable PHP expression of the default value.</td> + </tr> + <tr> + <td>DESCRIPTION</td> + <td>This is a...</td> + <td>An HTML description of what this directive does.</td> + </tr> + <tr> + <td>VERSION</td> + <td>3.1.0</td> + <td><em>Recommended</em>. The version of HTML Purifier this directive was added. + Directives that have been around since 1.0.0 don't have this, + but any new ones should.</td> + </tr> + <tr> + <td>ALIASES</td> + <td>Test.Example</td> + <td><em>Optional</em>. A comma separated list of aliases for this directive. + This is most useful for backwards compatibility and should + not be used otherwise.</td> + </tr> + <tr> + <td>ALLOWED</td> + <td>'foo', 'bar'</td> + <td><em>Optional</em>. Set of allowed value for a directive, + a comma separated list of parseable PHP expressions. This + is only allowed string, istring, text and itext TYPEs.</td> + </tr> + <tr> + <td>VALUE-ALIASES</td> + <td>'baz' => 'bar'</td> + <td><em>Optional</em>. Mapping of one value to another, and + should be a comma separated list of keypair duples. This + is only allowed string, istring, text and itext TYPEs.</td> + </tr> + <tr> + <td>DEPRECATED-VERSION</td> + <td>3.1.0</td> + <td><em>Not shown</em>. Indicates that the directive was + deprecated this version.</td> + </tr> + <tr> + <td>DEPRECATED-USE</td> + <td>Test.NewDirective</td> + <td><em>Not shown</em>. Indicates what new directive should be + used instead. Note that the directives will functionally be + different, although they should offer the same functionality. + If they are identical, use an alias instead.</td> + </tr> + <tr> + <td>EXTERNAL</td> + <td>CSSTidy</td> + <td><em>Not shown</em>. Indicates if there is an external library + the user will need to download and install to use this configuration + directive. As of right now, this is merely a Google-able name; future + versions may also provide links and instructions.</td> + </tr> + </tbody> + </table> + + <p> + Some notes on format and style: + </p> + + <ul> + <li> + Each of these keys can be expressed in the short format + (<code>KEY: Value</code>) or the long format + (<code>--KEY--</code> with value beneath). You must use the + long format if multiple lines are needed, or if a long format + has been used already (that's why <code>ALIASES</code> in our + example is in the long format); otherwise, it's user preference. + </li> + <li> + The HTML descriptions should be wrapped at about 80 columns; do + not rely on editor word-wrapping. + </li> + </ul> + + <p> + Also, as promised, here is the set of possible types: + </p> + + <table class="table"> + <thead> + <tr> + <th>Type</th> + <th>Example</th> + <th>Description</th> + </tr> + </thead> + <tbody> + <tr> + <td>string</td> + <td>'Foo'</td> + <td><a href="http://docs.php.net/manual/en/language.types.string.php">String</a> without newlines</td> + </tr> + <tr> + <td>istring</td> + <td>'foo'</td> + <td>Case insensitive ASCII string without newlines</td> + </tr> + <tr> + <td>text</td> + <td>"A<em>\n</em>b"</td> + <td>String with newlines</td> + </tr> + <tr> + <td>itext</td> + <td>"a<em>\n</em>b"</td> + <td>Case insensitive ASCII string without newlines</td> + </tr> + <tr> + <td>int</td> + <td>23</td> + <td>Integer</td> + </tr> + <tr> + <td>float</td> + <td>3.0</td> + <td>Floating point number</td> + </tr> + <tr> + <td>bool</td> + <td>true</td> + <td>Boolean</td> + </tr> + <tr> + <td>lookup</td> + <td>array('key' => true)</td> + <td>Lookup array, used with <code>isset($var[$key])</code></td> + </tr> + <tr> + <td>list</td> + <td>array('f', 'b')</td> + <td>List array, with ordered numerical indexes</td> + </tr> + <tr> + <td>hash</td> + <td>array('key' => 'val')</td> + <td>Associative array of keys to values</td> + </tr> + <tr> + <td>mixed</td> + <td>new stdclass</td> + <td>Any PHP variable is fine</td> + </tr> + </tbody> + </table> + + <p> + The examples represent what will be returned out of the configuration + object; users have a little bit of leeway when setting configuration + values (for example, a lookup value can be specified as a list; + HTML Purifier will flip it as necessary.) These types are defined + in <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/VarParser.php"> + library/HTMLPurifier/VarParser.php</a>. + </p> + + <p> + For more information on what values are allowed, and how they are parsed, + consult <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php"> + library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php</a>, as well + as <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/Interchange/Directive.php"> + library/HTMLPurifier/ConfigSchema/Interchange/Directive.php</a> for + the semantics of the parsed values. + </p> + + <h2>Refreshing the cache</h2> + + <p> + You may have noticed that your directive file isn't doing anything + yet. That's because it hasn't been added to the runtime + <code>HTMLPurifier_ConfigSchema</code> instance. Run + <code>maintenance/generate-schema-cache.php</code> to fix this. + If there were no errors, you're good to go! Don't forget to add + some unit tests for your functionality! + </p> + + <p> + If you ever make changes to your configuration directives, you + will need to run this script again. + </p> + <h2>Adding in-house schema definitions</h2> + + <p> + Placing stuff directly in HTML Purifier's source tree is generally not a + good idea, so HTML Purifier 4.0.0+ has some facilities in place to make your + life easier. + </p> + + <p> + The first is to pass an extra parameter to <code>maintenance/generate-schema-cache.php</code> + with the location of your directory (relative or absolute path will do). For example, + if I'm storing my custom definitions in <em>/var/htmlpurifier/myschema</em>, run: + <code>php maintenance/generate-schema-cache.php /var/htmlpurifier/myschema</code>. + </p> + + <p> + Alternatively, you can create a small loader PHP file in the HTML Purifier base + directory named <code>config-schema.php</code> (this is the same directory + you would place a <code>test-settings.php</code> file). In this file, add + the following line for each directory you want to load: + </p> + +<pre>$builder->buildDir($interchange, '/var/htmlpurifier/myschema');</pre> + + <p>You can even load a single file using:</p> + +<pre>$builder->buildFile($interchange, '/var/htmlpurifier/myschema/MyApp.Directive.txt');</pre> + + <p>Storing custom definitions that you don't plan on sending back upstream in + a separate directory is <em>definitely</em> a good idea! Additionally, picking + a good namespace can go a long way to saving you grief if you want to use + someone else's change, but they picked the same name, or if HTML Purifier + decides to add support for a configuration directive that has the same name.</p> + + <!-- TODO: how to name directives that rely on naming conventions --> + + <h2>Errors</h2> + + <p> + All directive files go through a rigorous validation process + through <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/Validator.php"> + library/HTMLPurifier/ConfigSchema/Validator.php</a>, as well + as some basic checks during building. While + listing every error out here is out-of-scope for this document, we + can give some general tips for interpreting error messages. + There are two types of errors: builder errors and validation errors. + </p> + + <h3>Builder errors</h3> + + <blockquote> + <p> + <strong>Exception:</strong> Expected type string, got + integer in DEFAULT in directive hash 'Ns.Dir' + </p> + </blockquote> + + <p> + You can identify a builder error by the keyword "directive hash." + These are the easiest to deal with, because they directly correspond + with your directive file. Find the offending directive file (which + is the directive hash plus the .txt extension), find the + offending index ("in DEFAULT" means the DEFAULT key) and fix the error. + This particular error would occur if your default value is not the same + type as TYPE. + </p> + + <h3>Validation errors</h3> + + <blockquote> + <p> + <strong>Exception:</strong> Alias 3 in valueAliases in directive + 'Ns.Dir' must be a string + </p> + </blockquote> + + <p> + These are a little trickier, because we're not actually validating + your directive file, or even the direct string hash representation. + We're validating an Interchange object, and the error messages do + not mention any string hash keys. + </p> + + <p> + Nevertheless, it's not difficult to figure out what went wrong. + Read the "context" statements in reverse: + </p> + + <dl> + <dt>in directive 'Ns.Dir'</dt> + <dd>This means we need to look at the directive file <code>Ns.Dir.txt</code></dd> + <dt>in valueAliases</dt> + <dd>There's no key actually called this, but there's one that's close: + VALUE-ALIASES. Indeed, that's where to look.</dd> + <dt>Alias 3</dt> + <dd>The value alias that is equal to 3 is the culprit.</dd> + </dl> + + <p> + In this particular case, you're not allowed to alias integers values to + strings values. + </p> + + <p> + The most difficult part is translating the Interchange member variable (valueAliases) + into a directive file key (VALUE-ALIASES), but there's a one-to-one + correspondence currently. If the two formats diverge, any discrepancies + will be described in <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php"> + library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php</a>. + </p> + + <h2>Internals</h2> + + <p> + Much of the configuration schema framework's codebase deals with + shuffling data from one format to another, and doing validation on this + data. + The keystone of all of this is the <code>HTMLPurifier_ConfigSchema_Interchange</code> + class, which represents the purest, parsed representation of the schema. + </p> + + <p> + Hand-writing this data is unwieldy, however, so we write directive files. + These directive files are parsed by <code>HTMLPurifier_StringHashParser</code> + into <code>HTMLPurifier_StringHash</code>es, which then + are run through <code>HTMLPurifier_ConfigSchema_InterchangeBuilder</code> + to construct the interchange object. + </p> + + <p> + From the interchange object, the data can be siphoned into other forms + using <code>HTMLPurifier_ConfigSchema_Builder</code> subclasses. + For example, <code>HTMLPurifier_ConfigSchema_Builder_ConfigSchema</code> + generates a runtime <code>HTMLPurifier_ConfigSchema</code> object, + which <code>HTMLPurifier_Config</code> uses to validate its incoming + data. There is also an XML serializer, which is used to build documentation. + </p> + + </body> +</html> + +<!-- vim: et sw=4 sts=4 +--> |