Configuration naming

HTML Purifier 4.0.0 features a new configuration naming system that
allows arbitrary nesting of namespaces.  While there are certain cases
in which using two namespaces is obviously better (the canonical example
is AutoFormatParam, which we used to contain directives for AutoFormat
parameters), it is unclear whether a general migration to highly
namespaced directives is a good idea.

== Case studies ==

=== Attr.* ===

We have one dead duck, HTML.Attr.Name.UseCDATA, which was migrated before
we decided to think this through thoroughly.

We currently have a large number of directives in the Attr.* namespace.
These directives tweak the behavior of some HTML attributes.  They have
the properties:

* While they apply to only one attribute at a time, the attribute can
  span multiple elements (not necessarily all elements, either).
  The information about which elements it impacts is either omitted or
  informally stated (EnableID applies to all elements, DefaultImageAlt
  applies to <img> tags, AllowedRev doesn't say but only applies to <a>
  tags).

* There is a certain degree of clustering that could be applied, especially
  to the ID directives.  The clustering could be done with respect to
  what element/attribute was used, i.e.

    *.id -> EnableID, IDBlacklistRegexp, IDBlacklist, IDPrefixLocal, IDPrefix
    img.src -> DefaultInvalidImage
    img.alt -> DefaultImageAlt, DefaultInvalidImageAlt
    bdo.dir -> DefaultTextDir
    a.rel -> AllowedRel
    a.rev -> AllowedRev
    a.target -> AllowedFrameTargets
    a.name -> Name.UseCDATA

* The directives often reference generic attribute types that were specified
  in the DTD/specification.  However, some of the behavior specifically relies
  on the fact that other use cases of the attribute are not, at present,
  supported by HTML Purifier.

    AllowedRel, AllowedRev -> heavily <a> specific; if <link> ends up being
        allowed, we will also have to give users specificity there (we also
        want to preserve generality). DTD %Linktypes; HTML5 distinguishes
        between <link> and <a>/<area>
    AllowedFrameTargets -> heavily <a> specific, but also used by <area>
        and <form>. Transitional DTD %FrameTarget, not present in strict,
        HTML5 calls them "browsing contexts"
    Default*Image* -> as a default parameter, is almost entirely exclusive
        to <img>
    EnableID -> global attribute
    Name.UseCDATA -> heavily <a> specific, but the name attribute is also
        heavily used by many other elements
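
The clustering above can be made concrete as a lookup table.  The following
is a minimal Python sketch for illustration only, not part of HTML Purifier:
the CLUSTERS table and the directives_for() helper are hypothetical names,
and the table simply transcribes the element/attribute list given earlier.

```python
# Hypothetical lookup table: "element.attribute" -> Attr.* directives,
# transcribed from the clustering list above. "*" means "any element".
CLUSTERS = {
    "*.id": ["EnableID", "IDBlacklistRegexp", "IDBlacklist",
             "IDPrefixLocal", "IDPrefix"],
    "img.src": ["DefaultInvalidImage"],
    "img.alt": ["DefaultImageAlt", "DefaultInvalidImageAlt"],
    "bdo.dir": ["DefaultTextDir"],
    "a.rel": ["AllowedRel"],
    "a.rev": ["AllowedRev"],
    "a.target": ["AllowedFrameTargets"],
    "a.name": ["Name.UseCDATA"],
}


def directives_for(element, attribute):
    """Return the directives clustered under element.attribute.

    Wildcard ("*") clusters are listed first, then element-specific ones.
    """
    wildcard = CLUSTERS.get(f"*.{attribute}", [])
    exact = CLUSTERS.get(f"{element}.{attribute}", [])
    return wildcard + exact


if __name__ == "__main__":
    print(directives_for("img", "alt"))
    print(directives_for("form", "target"))
```

Note that a lookup for form.target comes back empty even though
AllowedFrameTargets is also used by <form>; this is exactly the loss of
generality that a strict element/attribute clustering would introduce.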

=== AutoFormat.* ===

These have a fairly typical pluggable architecture, which lends itself to
a large number of namespaces (pluggability may be the key to figuring
out when gratuitous namespacing is good).  Properties:

* Boolean directives are fair game for being namespaced: for example,
  RemoveEmpty.RemoveNbsp triggers RemoveEmpty.RemoveNbsp.Exceptions,
  the latter of which only makes sense when RemoveEmpty.RemoveNbsp
  is set to true. (The same applies to RemoveNbsp itself, which only
  makes sense when RemoveEmpty is set to true.)

The AutoFormat string is a bit long, but is the only bit of repeated
context.
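
The parent/child relationship described in the bullet above can be read
directly off the dotted name.  The following Python sketch is illustrative
only: ancestors() and is_effective() are hypothetical helpers, the config
dict is a stand-in for real configuration, and HTML Purifier itself does
not necessarily enforce the dependency this way.

```python
def ancestors(name):
    """Return every dotted prefix of a directive name, shortest first."""
    parts = name.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def is_effective(name, config):
    """True if `name` is set and no ancestor directive is switched off.

    Prefixes that are plain namespaces (absent from `config`) are skipped;
    only prefixes that are themselves directives can disable a child.
    """
    for parent in ancestors(name):
        if parent in config and not config[parent]:
            return False
    return bool(config.get(name))


if __name__ == "__main__":
    # Assumed example values, using the short names from the text above.
    config = {
        "RemoveEmpty": True,
        "RemoveEmpty.RemoveNbsp": False,
        "RemoveEmpty.RemoveNbsp.Exceptions": {"td": True, "th": True},
    }
    # Exceptions is set, but has no effect while RemoveNbsp is off.
    print(is_effective("RemoveEmpty.RemoveNbsp.Exceptions", config))
```

Under this reading, nesting a directive beneath a boolean parent documents
the dependency in the name itself, which is the argument for namespacing
boolean directives.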

=== Core.* ===

Core is a potpourri of directives, mostly minor behavioral tweaks to
HTML handling.

    AggressivelyFixLt
    ConvertDocumentToFragment
    DirectLexLineNumberSyncInterval
    LexerImpl
    MaintainLineNumbers
        Lexer
    CollectErrors
    Language
        Error handling (Language is ostensibly a little more general, but
        it's only used for error handling right now)
    ColorKeywords
        CSS and HTML
    Encoding
    EscapeNonASCIICharacters
        Character encoding
    EscapeInvalidChildren
    EscapeInvalidTags
    HiddenElements
    RemoveInvalidImg
        Lexing/Output
    RemoveScriptContents
        Deprecated

=== HTML.* ===

    AllowedAttributes
    AllowedElements
    AllowedModules
    Allowed
    ForbiddenAttributes
    ForbiddenElements
        Element set tuning
    BlockWrapper
        Child def advanced twiddle
    CoreModules
    CustomDoctype
        Advanced HTMLModuleManager twiddles
    DefinitionID
    DefinitionRev
        Caching
    Doctype
    Parent
    Strict
    XHTML
        Global environment
    MaxImgLength
        Attribute twiddle? (applies to two attributes)
    Proprietary
    SafeEmbed
    SafeObject
    Trusted
        Extra functionality/tagsets
    TidyAdd
    TidyLevel
    TidyRemove
        Tidy

=== Output.* ===

These directly affect the output of Generator. These are all advanced
twiddles.

=== URI.* ===

    AllowedSchemes
    OverrideAllowedSchemes
        Scheme tuning
    Base
    DefaultScheme
    Host
        Global environment
    DefinitionID
    DefinitionRev
        Caching
    DisableExternalResources
    DisableExternal
    DisableResources
    Disable
        Contextual/authority tuning
    HostBlacklist
        Authority tuning
    MakeAbsolute
    MungeResources
    MungeSecretKey
    Munge
        Transformation behavior (munge can be grouped)