<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta name="description" content="Describes config schema framework in HTML Purifier." />
    <link rel="stylesheet" type="text/css" href="./style.css" />
    <title>Config Schema - HTML Purifier</title>
  </head>
  <body>

    <h1>Config Schema</h1>

    <div id="filing">Filed under Development</div>
    <div id="index">Return to the <a href="index.html">index</a>.</div>
    <div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>

    <p>
      HTML Purifier has a fairly complex system for configuration. Users
      interact with a <code>HTMLPurifier_Config</code> object to
      set configuration directives. The values they set are validated according
      to a configuration schema, <code>HTMLPurifier_ConfigSchema</code>.
    </p>

    <p>
      The schema is mostly transparent to end-users, but if you're doing development
      work for HTML Purifier and need to define a new configuration directive,
      you'll need to interact with it. We'll also talk about how to define
      userspace configuration directives at the very end.
    </p>

    <h2>Write a directive file</h2>

    <p>
      Directive files define configuration directives to be used by
      HTML Purifier. They are placed in <code>library/HTMLPurifier/ConfigSchema/schema/</code>
      in the form <code><em>Namespace</em>.<em>Directive</em>.txt</code> (I
      couldn't think of a more descriptive file extension.)
      Directive files are actually what we call <code>StringHash</code>es,
      i.e. associative arrays represented in a string form reminiscent of
      <a href="http://qa.php.net/write-test.php">PHPT</a> tests. Here's a
      sample directive file, <code>Test.Sample.txt</code>:
    </p>

    <pre>Test.Sample
TYPE: string/null
DEFAULT: NULL
ALLOWED: 'foo', 'bar'
VALUE-ALIASES: 'baz' => 'bar'
VERSION: 3.1.0
--DESCRIPTION--
This is a sample configuration directive for the purposes of the
&lt;code&gt;dev-config-schema.html&lt;/code&gt; documentation.
--ALIASES--
Test.Example</pre>

    <p>
      Each of these segments has a specific meaning:
    </p>

    <table class="table">
      <thead>
        <tr>
          <th>Key</th>
          <th>Example</th>
          <th>Description</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td>ID</td>
          <td>Test.Sample</td>
          <td>The name of the directive, in the form Namespace.Directive
          (implicitly the first line)</td>
        </tr>
        <tr>
          <td>TYPE</td>
          <td>string/null</td>
          <td>The type of variable this directive accepts. See below for
          details. You can also add <code>/null</code> to the end of
          any basic type to allow null values too.</td>
        </tr>
        <tr>
          <td>DEFAULT</td>
          <td>NULL</td>
          <td>A parseable PHP expression of the default value.</td>
        </tr>
        <tr>
          <td>DESCRIPTION</td>
          <td>This is a...</td>
          <td>An HTML description of what this directive does.</td>
        </tr>
        <tr>
          <td>VERSION</td>
          <td>3.1.0</td>
          <td><em>Recommended</em>. The version of HTML Purifier in which this
          directive was added. Directives that have been around since 1.0.0
          don't have this, but any new ones should.</td>
        </tr>
        <tr>
          <td>ALIASES</td>
          <td>Test.Example</td>
          <td><em>Optional</em>. A comma separated list of aliases for this directive.
          This is most useful for backwards compatibility and should
          not be used otherwise.</td>
        </tr>
        <tr>
          <td>ALLOWED</td>
          <td>'foo', 'bar'</td>
          <td><em>Optional</em>. Set of allowed values for a directive,
          a comma separated list of parseable PHP expressions. This
          is only allowed for string, istring, text and itext TYPEs.</td>
        </tr>
        <tr>
          <td>VALUE-ALIASES</td>
          <td>'baz' =&gt; 'bar'</td>
          <td><em>Optional</em>. Mapping of one value to another, given
          as a comma separated list of key-value pairs. This
          is only allowed for string, istring, text and itext TYPEs.</td>
        </tr>
        <tr>
          <td>DEPRECATED-VERSION</td>
          <td>3.1.0</td>
          <td><em>Not shown</em>. Indicates the version in which the
          directive was deprecated.</td>
        </tr>
        <tr>
          <td>DEPRECATED-USE</td>
          <td>Test.NewDirective</td>
          <td><em>Not shown</em>. Indicates what new directive should be
          used instead. Note that the two directives will not behave
          identically, although the new one should offer the same
          functionality. If they are identical, use an alias instead.</td>
        </tr>
        <tr>
          <td>EXTERNAL</td>
          <td>CSSTidy</td>
          <td><em>Not shown</em>. Indicates if there is an external library
          the user will need to download and install to use this configuration
          directive. As of right now, this is merely a Google-able name; future
          versions may also provide links and instructions.</td>
        </tr>
      </tbody>
    </table>
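
    <p>
      The three keys marked <em>Not shown</em> above could appear in a
      directive file roughly as follows. This is only a sketch: the directive
      names <code>MyApp.OldSetting</code> and <code>MyApp.NewSetting</code>
      and the version numbers are made up for illustration.
    </p>

    <pre>MyApp.OldSetting
TYPE: bool
DEFAULT: false
VERSION: 4.0.0
DEPRECATED-VERSION: 4.1.0
DEPRECATED-USE: MyApp.NewSetting
EXTERNAL: CSSTidy
--DESCRIPTION--
Hypothetical directive that has been superseded by MyApp.NewSetting.</pre>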

    <p>
      Some notes on format and style:
    </p>

    <ul>
      <li>
        Each of these keys can be expressed in the short format
        (<code>KEY: Value</code>) or the long format
        (<code>--KEY--</code> with value beneath). You must use the
        long format if multiple lines are needed, or if a long format
        has been used already (that's why <code>ALIASES</code> in our
        example is in the long format); otherwise, it's user preference.
      </li>
      <li>
        The HTML descriptions should be wrapped at about 80 columns; do
        not rely on editor word-wrapping.
      </li>
    </ul>
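
    <p>
      To illustrate the two formats, the hypothetical directive below uses
      the short format for its one-line keys and then switches to the long
      format for a default value that spans several lines. The name
      <code>MyApp.Colors</code> and its values are assumptions chosen for
      the example. Note that every key after <code>--DEFAULT--</code> also
      has to use the long format.
    </p>

    <pre>MyApp.Colors
TYPE: hash
VERSION: 4.0.0
--DEFAULT--
array(
  'red' =&gt; '#F00',
  'green' =&gt; '#0F0',
)
--DESCRIPTION--
Hypothetical mapping of color names to hexadecimal color codes.</pre>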

    <p>
      Also, as promised, here is the set of possible types:
    </p>

    <table class="table">
      <thead>
        <tr>
          <th>Type</th>
          <th>Example</th>
          <th>Description</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td>string</td>
          <td>'Foo'</td>
          <td><a href="http://docs.php.net/manual/en/language.types.string.php">String</a> without newlines</td>
        </tr>
        <tr>
          <td>istring</td>
          <td>'foo'</td>
          <td>Case insensitive ASCII string without newlines</td>
        </tr>
        <tr>
          <td>text</td>
          <td>"A<em>\n</em>b"</td>
          <td>String with newlines</td>
        </tr>
        <tr>
          <td>itext</td>
          <td>"a<em>\n</em>b"</td>
          <td>Case insensitive ASCII string with newlines</td>
        </tr>
        <tr>
          <td>int</td>
          <td>23</td>
          <td>Integer</td>
        </tr>
        <tr>
          <td>float</td>
          <td>3.0</td>
          <td>Floating point number</td>
        </tr>
        <tr>
          <td>bool</td>
          <td>true</td>
          <td>Boolean</td>
        </tr>
        <tr>
          <td>lookup</td>
          <td>array('key' =&gt; true)</td>
          <td>Lookup array, used with <code>isset($var[$key])</code></td>
        </tr>
        <tr>
          <td>list</td>
          <td>array('f', 'b')</td>
          <td>List array, with ordered numerical indexes</td>
        </tr>
        <tr>
          <td>hash</td>
          <td>array('key' =&gt; 'val')</td>
          <td>Associative array of keys to values</td>
        </tr>
        <tr>
          <td>mixed</td>
          <td>new stdclass</td>
          <td>Any PHP variable is fine</td>
        </tr>
      </tbody>
    </table>

    <p>
      The examples represent what will be returned out of the configuration
      object; users have a little bit of leeway when setting configuration
      values (for example, a lookup value can be specified as a list;
      HTML Purifier will flip it as necessary.) These types are defined
      in <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/VarParser.php">
      library/HTMLPurifier/VarParser.php</a>.
    </p>
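
    <p>
      To illustrate the leeway mentioned above, here is a minimal sketch that
      sets a lookup directive from a plain list. It assumes HTML Purifier
      4.0.0+ (for the two-argument <code>set()</code>), that the autoloader
      lives at the path shown, and that <code>HTML.AllowedElements</code> is
      declared with a lookup type; adjust these to your installation.
    </p>

<pre>&lt;?php
// Assumed path to the HTML Purifier autoloader; adjust as needed.
require_once 'library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// Set a lookup-type directive using an ordinary list of values.
$config-&gt;set('HTML.AllowedElements', array('p', 'b'));

// The value read back should be in lookup form, keyed by element name,
// so it can be tested with isset() as described in the table above.
$allowed = $config-&gt;get('HTML.AllowedElements');
var_dump(isset($allowed['p']));</pre>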

    <p>
      For more information on what values are allowed, and how they are parsed,
      consult <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php">
      library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php</a>, as well
      as <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/Interchange/Directive.php">
      library/HTMLPurifier/ConfigSchema/Interchange/Directive.php</a> for
      the semantics of the parsed values.
    </p>

    <h2>Refreshing the cache</h2>

    <p>
      You may have noticed that your directive file isn't doing anything
      yet. That's because it hasn't been added to the runtime
      <code>HTMLPurifier_ConfigSchema</code> instance. Run
      <code>maintenance/generate-schema-cache.php</code> to fix this.
      If there were no errors, you're good to go! Don't forget to add
      some unit tests for your functionality!
    </p>

    <p>
      If you ever make changes to your configuration directives, you
      will need to run this script again.
    </p>
    <h2>Adding in-house schema definitions</h2>

    <p>
      Placing stuff directly in HTML Purifier's source tree is generally not a
      good idea, so HTML Purifier 4.0.0+ has some facilities in place to make your
      life easier.
    </p>

    <p>
      The first is to pass an extra parameter to <code>maintenance/generate-schema-cache.php</code>
      with the location of your directory (relative or absolute path will do). For example,
      if I'm storing my custom definitions in <em>/var/htmlpurifier/myschema</em>, run:
      <code>php maintenance/generate-schema-cache.php /var/htmlpurifier/myschema</code>.
    </p>

    <p>
      Alternatively, you can create a small loader PHP file in the HTML Purifier base
      directory named <code>config-schema.php</code> (this is the same directory
      you would place a <code>test-settings.php</code> file).  In this file, add
      the following line for each directory you want to load:
    </p>

<pre>$builder-&gt;buildDir($interchange, '/var/htmlpurifier/myschema');</pre>

    <p>You can even load a single file using:</p>

<pre>$builder-&gt;buildFile($interchange, '/var/htmlpurifier/myschema/MyApp.Directive.txt');</pre>

    <p>Storing custom definitions that you don't plan on sending back upstream in
    a separate directory is <em>definitely</em> a good idea! Additionally, picking
    a good namespace can go a long way to saving you grief if you want to use
    someone else's change, but they picked the same name, or if HTML Purifier
    decides to add support for a configuration directive that has the same name.</p>
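
    <p>
      As a sketch of what such a separately stored definition might look
      like, the file below would be saved as
      <code>/var/htmlpurifier/myschema/MyApp.Greeting.txt</code> and picked up
      by either of the loading methods above. The namespace
      <code>MyApp</code> and the directive itself are hypothetical.
    </p>

    <pre>MyApp.Greeting
TYPE: string
DEFAULT: 'hello'
--DESCRIPTION--
Hypothetical application-specific directive. It lives in the MyApp
namespace so that it does not collide with upstream directive names.</pre>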

    <!-- TODO: how to name directives that rely on naming conventions -->

    <h2>Errors</h2>

    <p>
      All directive files go through a rigorous validation process
      through <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/Validator.php">
      library/HTMLPurifier/ConfigSchema/Validator.php</a>, as well
      as some basic checks during building. While
      listing every error out here is out-of-scope for this document, we
      can give some general tips for interpreting error messages.
      There are two types of errors: builder errors and validation errors.
    </p>

    <h3>Builder errors</h3>

    <blockquote>
      <p>
        <strong>Exception:</strong> Expected type string, got
        integer in DEFAULT in directive hash 'Ns.Dir'
      </p>
    </blockquote>

    <p>
      You can identify a builder error by the phrase "directive hash."
      These are the easiest to deal with, because they directly correspond
      with your directive file. Find the offending directive file (which
      is the directive hash plus the .txt extension), find the
      offending index ("in DEFAULT" means the DEFAULT key) and fix the error.
      This particular error would occur if your default value is not the same
      type as TYPE.
    </p>
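
    <p>
      For instance, a directive file along the following lines would be
      expected to trigger the error quoted above, because the default value
      parses as an integer while TYPE asks for a string. The contents of
      <code>Ns.Dir.txt</code> here are assumed for illustration.
    </p>

    <pre>Ns.Dir
TYPE: string
DEFAULT: 3
--DESCRIPTION--
The default above is an integer, but TYPE says string.</pre>

    <p>
      Writing the default as a quoted PHP string (<code>DEFAULT: '3'</code>)
      makes it agree with TYPE.
    </p>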

    <h3>Validation errors</h3>

    <blockquote>
      <p>
        <strong>Exception:</strong> Alias 3 in valueAliases in directive
        'Ns.Dir' must be a string
      </p>
    </blockquote>

    <p>
      These are a little trickier, because we're not actually validating
      your directive file, or even the direct string hash representation.
      We're validating an Interchange object, and the error messages do
      not mention any string hash keys.
    </p>

    <p>
      Nevertheless, it's not difficult to figure out what went wrong.
      Read the "context" statements in reverse:
    </p>

    <dl>
      <dt>in directive 'Ns.Dir'</dt>
        <dd>This means we need to look at the directive file <code>Ns.Dir.txt</code></dd>
      <dt>in valueAliases</dt>
        <dd>There's no key actually called this, but there's one that's close:
          VALUE-ALIASES. Indeed, that's where to look.</dd>
      <dt>Alias 3</dt>
        <dd>The value alias that is equal to 3 is the culprit.</dd>
    </dl>

    <p>
      In this particular case, you're not allowed to alias integer values to
      string values.
    </p>
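
    <p>
      A directive file like the following sketch would be expected to produce
      the validation error quoted above; its contents are assumed for
      illustration. Bear in mind that PHP casts numeric string array keys
      to integers, so quoting the alias as <code>'3'</code> is unlikely to
      help; use a non-numeric string alias such as <code>'baz'</code>
      instead.
    </p>

    <pre>Ns.Dir
TYPE: string
DEFAULT: 'bar'
VALUE-ALIASES: 3 =&gt; 'bar'
--DESCRIPTION--
The alias 3 is an integer, but value aliases must be strings.</pre>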

    <p>
      The most difficult part is translating the Interchange member variable (valueAliases)
      into a directive file key (VALUE-ALIASES), but there's a one-to-one
      correspondence currently. If the two formats diverge, any discrepancies
      will be described in <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php">
      library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php</a>.
    </p>

    <h2>Internals</h2>

    <p>
      Much of the configuration schema framework's codebase deals with
      shuffling data from one format to another, and doing validation on this
      data.
      The keystone of all of this is the <code>HTMLPurifier_ConfigSchema_Interchange</code>
      class, which represents the purest, parsed representation of the schema.
    </p>

    <p>
      Hand-writing this data is unwieldy, however, so we write directive files.
      These directive files are parsed by <code>HTMLPurifier_StringHashParser</code>
      into <code>HTMLPurifier_StringHash</code>es, which then
      are run through <code>HTMLPurifier_ConfigSchema_InterchangeBuilder</code>
      to construct the interchange object.
    </p>

    <p>
      From the interchange object, the data can be siphoned into other forms
      using <code>HTMLPurifier_ConfigSchema_Builder</code> subclasses.
      For example, <code>HTMLPurifier_ConfigSchema_Builder_ConfigSchema</code>
      generates a runtime <code>HTMLPurifier_ConfigSchema</code> object,
      which <code>HTMLPurifier_Config</code> uses to validate its incoming
      data. There is also an XML serializer, which is used to build documentation.
    </p>
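
    <p>
      The flow described above can be sketched in a few lines of PHP. This is
      an illustration only, not the actual maintenance script: the autoloader
      path and the schema directory are assumptions, and while
      <code>buildDir()</code> appears earlier in this document, the
      <code>build()</code> method name on the builder subclass is assumed and
      should be checked against the source. Validation is omitted.
    </p>

<pre>&lt;?php
// Assumed path to the HTML Purifier autoloader; adjust as needed.
require_once 'library/HTMLPurifier.auto.php';

// Directive files are parsed into string hashes and folded into the
// interchange object by the interchange builder.
$interchange = new HTMLPurifier_ConfigSchema_Interchange();
$builder = new HTMLPurifier_ConfigSchema_InterchangeBuilder();
$builder-&gt;buildDir($interchange, '/var/htmlpurifier/myschema');

// A Builder subclass then converts the interchange object into another
// form, here a runtime schema. The build() method name is an assumption.
$schemaBuilder = new HTMLPurifier_ConfigSchema_Builder_ConfigSchema();
$schema = $schemaBuilder-&gt;build($interchange);</pre>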

  </body>
</html>

<!-- vim: et sw=4 sts=4
-->