Diffstat (limited to 'lib/htmlpurifier/docs/dev-includes.txt')
-rw-r--r-- lib/htmlpurifier/docs/dev-includes.txt 281
1 files changed, 0 insertions, 281 deletions
diff --git a/lib/htmlpurifier/docs/dev-includes.txt b/lib/htmlpurifier/docs/dev-includes.txt
deleted file mode 100644
index d3382b593..000000000
--- a/lib/htmlpurifier/docs/dev-includes.txt
+++ /dev/null
@@ -1,281 +0,0 @@
-
-INCLUDES, AUTOLOAD, BYTECODE CACHES and OPTIMIZATION
-
-The Problem
------------
-
-HTML Purifier contains a number of extra components that are not used all
-of the time, only if the user explicitly specifies that we should use
-them.
-
-Some of these optional components are included conditionally (Filter,
-Language, Lexer, Printer), while others are included all the time
-(Injector, URIFilter, HTMLModule, URIScheme). We will stipulate that these
-are all developer specified: it is conceivable that certain Tokens are not
-used, but this is user-dependent and should not be trusted.
-
-We should come up with a consistent way to handle these things and ensure
-that we get the maximum performance both when bytecode caches are present
-and when they are not. Unfortunately, these two goals seem contrary to each
-other.
-
-A peripheral issue is the performance of ConfigSchema, which has been
-shown to take a large, constant amount of initialization time, and is
-intricately linked to the issue of includes due to its pervasive use
-in our plugin architecture.
-
-Pros and Cons
--------------
-
-We will assume that users include their own extensions themselves.
-
-Conditional includes:
- Pros:
- - User management is simplified; only a single directive needs to be set
- - Only necessary code is included
- Cons:
- - Doesn't play nicely with opcode caches
- - Adds complexity to standalone version
- - Optional configuration directives are not exposed without a little
- extra coaxing (not implemented yet)
-
-Include it all:
- Pros:
- - User management is still simple
- - Plays nicely with opcode caches and standalone version
- - All configuration directives are present
- Cons:
- - Lots of (how much?) extra code is included
- - Classes that inherit from external libraries will cause compile
- errors
-
-Build an include stub (Let's do this!):
- Pros:
- - Only necessary code is included
- - Plays nicely with opcode caches and standalone version
- - require (without once) can be used, see above
- - Could further extend as a compilation to one file
- Cons:
- - Not implemented yet
- - Requires user intervention and use of a command line script
- - Standalone script must be chained to this
- - More complex and compiled-language-like
- - Requires a whole new class of system-wide configuration directives,
- as configuration objects can be reused
- - Determining what needs to be included can be complex (see above)
- - No way of autodetecting dynamically instantiated classes
- - Might be slow
-
-Include stubs
--------------
-
-This solution may be "just right" for users who are heavily oriented
-towards performance. However, there are a number of picky implementation
-details to work out beforehand.
-
-The number one concern is how to make the HTML Purifier files "work
-out of the box", while still being able to easily get them into a form
-that works with this setup. As the codebase stands right now, it would
-be necessary to strip out all of the require_once calls. The only way
-we could get rid of the require_once calls is to use __autoload or
-use the stub for all cases (which might not be a bad idea).
-
- Aside
- -----
- An important thing to remember, however, is that these require_once's
- are valuable data about what classes a file needs. Unfortunately, they
- do not distinguish between files that are needed all the time and files
- that are among our "optional" files. Thus, this data is effectively
- useless.
-
- Deprecated
- ----------
- One of the things I'd like to do is have the code search for any classes
- that are explicitly mentioned in the code. If a class isn't mentioned, I
- get to assume that it is "optional," i.e. included via introspection.
- The choice is either to use PHP's tokenizer or use regexps; regexps would
- be faster but a tokenizer would be more correct. If this ends up being
- unfeasible, adding dependency comments isn't a bad idea. (This could
- even be done automatically by search/replacing require_once, although
- we'd have to manually inspect the results for the optional requires.)
-
- NOTE: This ends up not being necessary, as we're going to make the user
- figure out all the extra classes they need, and only include the core
- which is predetermined.
-
-Using the autoload framework with include stubs works nicely with
-introspective classes: instead of having to have require_once inside
-the function, we can let autoload do the work; we simply need to
-new $class or accept the object straight from the caller. Handling filters
-becomes a simple matter of ticking off configuration directives, and
-if ConfigSchema spits out errors, adding the necessary includes. We could
-also use the autoload framework as a fallback, in case the user forgets
-to make the include, but doesn't really care about performance.
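The fallback described above could be sketched roughly as follows. This is only an illustration, not the library's actual loader: the helper name hp_class_to_path, the use of __DIR__ as the library root, and the choice of spl_autoload_register (rather than the __autoload function mentioned earlier) are all assumptions. It relies on the underscore-to-directory naming that class names such as HTMLPurifier_Filter_ExtractStyleBlocks suggest.

```php
<?php
// Hypothetical helper: turn an underscore-separated class name into a
// relative file path. Returns null for classes we do not manage.
function hp_class_to_path(string $class): ?string
{
    // 'HTMLPurifier_' is 13 characters long.
    if ($class !== 'HTMLPurifier' && strncmp($class, 'HTMLPurifier_', 13) !== 0) {
        return null;
    }
    return str_replace('_', '/', $class) . '.php';
}

// Fallback autoloader: it only runs for classes that neither the include
// stub nor the user has already loaded.
spl_autoload_register(function (string $class): void {
    $path = hp_class_to_path($class);
    if ($path === null) {
        return; // not ours; let other autoloaders try
    }
    $file = __DIR__ . '/' . $path; // assumed library root
    if (is_file($file)) {
        require $file; // plain require: autoload fires at most once per class
    }
});
```

Because the autoloader is only consulted for classes that are still missing, users who build a complete include stub would never reach it, while users who forget an include pay a small lookup cost instead of hitting a fatal error.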
-
- Insight
- -------
- All of this talk is merely a natural extension of what our current
- standalone functionality does. However, instead of having our code
- perform the includes, or attempting to inline everything that possibly
- could be used, we boot the issue to the user, making them include
- everything or set up the fallback autoload handler.
-
-Configuration Schema
---------------------
-
-A common deficiency for all of the conditional include setups (including
-the dynamically built include PHP stub) is that if one of these
-conditionally included files includes a configuration directive, it
-is not accessible to configdoc. A stopgap solution for this problem is
-to have it piggy-back off of the data in the merge-library.php script
-to figure out what extra files it needs to include, but if the file also
-inherits classes that don't exist, we're in big trouble.
-
-I think it's high time we centralized the configuration documentation.
-However, the type checking has been a great boon for the library, and
-I'd like to keep that. The compromise is to use some other source, and
-then parse it into the ConfigSchema internal format (sans all of those
-nasty documentation strings which we really don't need at runtime) and
-serialize that for future use.
-
-The next question is that of format. XML is very verbose, and the prospect
-of setting defaults in it gives me the willies. However, this may be necessary.
-Splitting up the file into manageable chunks may alleviate this trouble,
-and we may even want to create our own format optimized for specifying
-configuration. It might look like (based off the PHPT format, which is
-nicely compact yet unambiguous and human-readable):
-
-Core.HiddenElements
-TYPE: lookup
-DEFAULT: array('script', 'style') // auto-converted during processing
---ALIASES--
-Core.InvisibleElements, Core.StupidElements
---DESCRIPTION--
-<p>
- Blah blah
-</p>
-
-The first line is the directive name. The lines that follow, up to the
-first --HEADER-- block, are single-line values; everything after that
-point is a multiline value. No value is restricted to a particular
-format: DEFAULT could very well be multiline if that would be easier.
-This would make it insanely easy, also, to add arbitrary extra parameters,
-like:
-
-VERSION: 3.0.0
-ALLOWED: 'none', 'light', 'medium', 'heavy' // this is wrapped in array()
-EXTERNAL: CSSTidy // this would be documented somewhere else with a URL
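As a rough illustration of how little code this format needs, here is a minimal parsing sketch. The function name parse_directive and the 'ID' key used for the directive name are made up for this example. It keeps every value as a raw string, so trailing // comments and the conversion of DEFAULT or ALLOWED into PHP values are deliberately left to a later processing step.

```php
<?php
// Hypothetical parser for one directive file in the proposed format.
function parse_directive(string $text): array
{
    $lines = preg_split('/\R/', trim($text));
    // The first line is always the directive name.
    $directive = ['ID' => trim(array_shift($lines))];
    $blocks = [];    // multiline values, keyed by header name
    $current = null; // name of the multiline block being read, if any

    foreach ($lines as $line) {
        if (preg_match('/^--([A-Z]+(?:-[A-Z]+)*)--$/', rtrim($line), $m)) {
            // A --HEADER-- line starts a new multiline value.
            $current = $m[1];
            $blocks[$current] = [];
        } elseif ($current !== null) {
            // Inside a multiline block: keep the line verbatim.
            $blocks[$current][] = $line;
        } elseif (preg_match('/^([A-Z]+(?:-[A-Z]+)*):\s*(.*)$/', $line, $m)) {
            // Before the first header: single-line KEY: value pairs.
            $directive[$m[1]] = rtrim($m[2]);
        }
    }
    foreach ($blocks as $name => $body) {
        $directive[$name] = trim(implode("\n", $body), "\n");
    }
    return $directive;
}
```

Unknown keys such as VERSION or EXTERNAL fall out of the same two patterns, which is what makes adding arbitrary extra parameters cheap.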
-
-The final loss would be that you wouldn't know what file the directive
-was used in; with some clever regexps it should be possible to
-figure out where $config->get($ns, $d); occurs. Reflective calls to
-the configuration object are mitigated by the fact that getBatch is
-used, so we can simply talk about that in the namespace definition page.
-This might be slow, but it would only happen when we are creating
-the documentation for consumption, and is sugar.
-
-We can put this in a schema/ directory, outside of HTML Purifier. The serialized
-data gets treated like entities.ser.
-
-The final thing that needs to be handled is user defined configurations.
-They can be added at runtime using ConfigSchema::registerDirectory()
-which globs the directory and grabs all of the directives to be incorporated
-in. Then, the result is saved. We may want to take advantage of the
-DefinitionCache framework, although it is not altogether certain what
-configuration directives would be used to generate our key (meta-directives!)
-
- Further thoughts
- ----------------
- Our master configuration schema will only need to be updated once
- every new version, so it's easily versionable. User specified
- schema files are far more volatile, but it's far too expensive
- to check the filemtimes of all the files, so a DefinitionRev style
- mechanism works better. However, we can uniquely identify the
- schema based on the directories they loaded, so there's no need
- for a DefinitionId until we give them full programmatic control.
-
- These variables should be directly incorporated into ConfigSchema,
- and ConfigSchema should handle serialization. Some refactoring will be
- necessary for the DefinitionCache classes, as they are built with
- Config in mind. If the user changes something, the cache file gets
- rebuilt. If the version changes, the cache file gets rebuilt. Since
- our unit tests flush the caches before we start, and the operation is
- pretty fast, this will not negatively impact unit testing.
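One way the "version plus revision plus loaded directories" idea might translate into a cache key is sketched below. The function name schema_cache_key and the key layout are invented for illustration; sorting the directory list assumes that load order does not affect the resulting schema, which may not hold if later directories are allowed to override earlier ones.

```php
<?php
// Hypothetical cache key for a serialized schema: it changes whenever the
// library version, the user-bumped revision, or the set of loaded
// directories changes, without touching any file modification times.
function schema_cache_key(string $version, int $rev, array $dirs): string
{
    $dirs = array_values(array_unique($dirs));
    sort($dirs, SORT_STRING); // assumption: load order is irrelevant
    return $version . ',' . $rev . ',' . md5(implode("\n", $dirs));
}
```

A user who edits a schema file would bump the revision number, which yields a new key and therefore a rebuilt cache file.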
-
-One last thing: certain configuration directives require that files
-get added. They may even be specified dynamically. It is not a good idea
-for the HTMLPurifier_Config object to be used directly for such matters.
-Instead, the userland code should explicitly perform the includes. We may
-put in something like:
-
-REQUIRES: HTMLPurifier_Filter_ExtractStyleBlocks
-
-To indicate that if that class doesn't exist, and the user is attempting
-to use the directive, we should fatally error out. The stub includes the core files,
-and the user includes everything else. Any reflective things like new
-$class would be required to tie in with the configuration.
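A REQUIRES check along these lines might look like the sketch below. The function name check_requires and the shape of its arguments are assumptions made for this example, and it throws an exception where the text above speaks of a fatal error, purely so that the failure can be caught and tested.

```php
<?php
// Hypothetical check: $requires maps a directive name to the class it
// needs; $used lists the directives the user has actually set.
function check_requires(array $requires, array $used): void
{
    foreach ($used as $directive) {
        if (!isset($requires[$directive])) {
            continue; // directive needs nothing outside the core
        }
        $class = $requires[$directive];
        // class_exists() triggers autoloading by default, so a fallback
        // autoloader still gets a chance before we give up.
        if (!class_exists($class)) {
            throw new RuntimeException(
                "Directive $directive requires class $class, " .
                "which has not been included"
            );
        }
    }
}
```

The check only looks at directives that are actually in use, so rarely used options cost nothing for users who never touch them.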
-
-It would work very well with rarely used configuration options, but it
-wouldn't be so good for "core" parts that can be disabled. In such cases
-the core include file would need to be modified, and the only way
-to properly do this is to use the configuration object. Once again, our
-ability to create cache keys saves the day: we can create arbitrary
-stub files for arbitrary configurations and include those. They could
-even be the single file affairs. The only thing we'd need to include,
-then, would be HTMLPurifier_Config! Then, the configuration object would
-load the library.
-
- An aside...
- -----------
- One questions, however, the wisdom of letting PHP files write other PHP
- files. It seems like a recipe for disaster, or at least lots of headaches
- in highly secured setups, where PHP does not have the ability to write
- to its root. In such cases, we could use sticky bits or tell the user
- to manually generate the file.
-
- The other troublesome bit is actually doing the calculations necessary.
- For certain cases, it's simple (such as URIScheme), but for AttrDef
- and HTMLModule the dependency trees are very complex in relation to
- %HTML.Allowed and friends. I think that this idea should be shelved
- and revisited at a later, less insane date.
-
-An interesting dilemma presents itself when a configuration form is offered
-to the user. Normally, the configuration object is not accessible without
-editing PHP code; this facility changes things. The sensible thing to do
-is stipulate that all classes required by the directives you allow must
-be included.
-
-Unit testing
-------------
-
-Setting up the parsing and translation into our existing format would not
-be difficult to do. It might represent a good time for us to rethink our
-tests for these facilities; as creative as they are, they are often hacky
-and require public visibility for things that ought to be protected.
-This is especially applicable for our DefinitionCache tests.
-
-Migration
----------
-
-Because we are not *adding* anything essentially new, it should be trivial
-to write a script to take our existing data and dump it into the new format.
-Well, not trivial, but fairly easy to accomplish. Primary implementation
-difficulties would probably involve formatting the file nicely.
-
-Backwards-compatibility
------------------------
-
-I expect that the ConfigSchema methods should stick around for a little bit,
-but display E_USER_NOTICE warnings that they are deprecated. This will
-require documentation!
-
-New stuff
----------
-
-VERSION: Version number in which the directive was introduced
-DEPRECATED-VERSION: If the directive was deprecated, when was it deprecated?
-DEPRECATED-USE: If the directive was deprecated, what should the user use now?
-REQUIRES: What classes does this configuration directive require, but are
- not part of the HTML Purifier core?
-
- vim: et sw=4 sts=4