Diffstat (limited to 'lib/htmlpurifier/docs/proposal-plists.txt')
-rw-r--r-- | lib/htmlpurifier/docs/proposal-plists.txt | 218 |
1 files changed, 218 insertions, 0 deletions
diff --git a/lib/htmlpurifier/docs/proposal-plists.txt b/lib/htmlpurifier/docs/proposal-plists.txt
new file mode 100644
index 000000000..eef8ade61
--- /dev/null
+++ b/lib/htmlpurifier/docs/proposal-plists.txt
@@ -0,0 +1,218 @@
+THE UNIVERSAL DESIGN PATTERN: PROPERTIES
+Steve Yegge
+
+Implementation:
+    get(name)
+    put(name, value)
+    has(name)
+    remove(name)
+    iteration, with filtering [this will be our namespaces]
+    parent
+
+Representations:
+    - Keys are strings
+    - It's nice to not need to quote keys (if we formulate our own language,
+      consider this)
+    - Property not present representation (key missing)
+    - With frequent removal/re-add, marking with null may help. If null is
+      valid, use another value. (PHP semantics are weird here)
+
+Data structures:
+    - LinkedHashMap is wonderful (O(1) access and maintains order)
+    - Using a special property that points to the parent is usual
+    - Multiple inheritance possible, need rules for which to look up first
+    - Iterative inheritance is best
+    - Consider performance!
+
+Deletion
+    - Tricky problem with inheritance
+    - Distinguish between "not found" and "look in my parent for the property"
+      [Maybe HTML Purifier won't allow deletion]
+
+Read/write asymmetry (it's correct!)
+
+Read-only plists
+    - Allow ability to freeze [this is what we have already]
+    - Don't overuse it
+
+Performance:
+    - Intern strings (PHP does this already)
+    - Don't be case-insensitive
+    - If all properties in a plist are known a priori, you can use a "perfect"
+      hash function. Often overkill.
+    - Copy-on-read caching "plundering" reduces lookup, but uses memory and can
+      grow stale. Use as last resort.
+    - Refactoring to fields. Watch for API compatibility, system complexity,
+      and lack of flexibility.
+    - Refrigerator: external data-structure to hold plists
+
+Transient properties:
+    [Don't need to worry about this]
+    - Use a separate plist for transient properties
+    - Non-numeric override; numeric should ADD
+    - Deletion: removeTransientProperty() and transientlyRemoveProperty()
+
+Persistence:
+    - XML/JSON are good
+    - Text-based is good for readability, maintainability and bootstrapping
+    - Compressed binary format for network transport [not necessary]
+    - RDBMS or XML database
+
+Querying: [not relevant]
+    - XML database is nice for XPath/XQuery
+    - jQuery for JSON
+    - Just load it all into a program
+
+Backfills/Data integrity:
+    - Use usual methods
+    - Lazy backfill is a nice hack
+
+Type systems:
+    - Flags: ReadOnly, Permanent, DontEnum
+    - Typed properties aren't that useful [It's also Not-PHP]
+    - Separate meta-list of directive properties IS useful
+    - Duck typing is useful for systems designed fully around properties pattern
+
+Trade-off:
+    + Flexibility
+    + Extensibility
+    + Unit-testing/prototype-speed
+    - Performance
+    - Data integrity
+    - Navigability/Query-ability
+    - Reversibility (hard to go back)
+
+HTML Purifier
+
+We are not happy with our current system of defining configuration directives,
+because it has become clear that things will get a lot nicer if we allow
+multiple namespaces, and there are some features that naturally lend themselves
+to inheritance, which we do not really support well.
+
+One of the considered implementation changes would be to go from a structure
+like:
+
+array(
+    'Namespace' => array(
+        'Directive' => 'val1',
+        'Directive2' => 'val2',
+    )
+)
+
+to:
+
+array(
+    'Namespace.Directive' => 'val1',
+    'Namespace.Directive2' => 'val2',
+)
+
+The latter implementation takes more memory, however, and it makes it a bit
+complicated to grab all values from a namespace.
+
+The alternate implementation choice is to allow nested plists.
+This keeps iteration easy, but is problematic for inheritance (it would be
+difficult to distinguish a plist from an array) and retrieval (when specifying
+multiple namespaces we would need some multiple de-referencing).
+
+----
+
+We can take the performance hit, and just do iteration with filter
+(the strncmp call should be relatively cheap). Then, users should be able
+to optimize by doing something like:
+
+$config = HTMLPurifier_Config::createDefault();
+if (!file_exists('config.php')) {
+    // set up $config
+    $config->save('config.php');
+} else {
+    $config->load('config.php');
+}
+
+Or maybe memcache, or something. This means that "// set up $config" must
+not have any dynamic parts, or the user has to invalidate the cache when
+they do update it. We have to think about this a little more carefully; the
+file call might be more expensive.
+
+----
+
+This might get expensive, however, when we actually care about iterating
+over the configuration and want the actual values. So what about nesting the
+lists?
+
+"ns.sub.directive" => values['ns']['sub']['directive']
+
+We can distinguish between plists and arrays by using ArrayObjects for the
+plists, and regular arrays for the arrays? Alternatively, use ArrayObjects
+for the arrays, and regular arrays for the plists.
+
+----
+
+Implementation demands, and what has caused them:
+
+1. DefinitionCache: the HTML, CSS and URI namespaces have caches attached to them
+   Results:
+    - getBatchSerial()
+    - getBatch() : in general, the ability to traverse just a namespace
+
+2. AutoFormat/Filter: this is a plugin architecture, directives not hard-coded
+    - getBatch()
+
+3. Configuration form
+    - Namespaces used to organize directives
+
+Other than that, we have a pure plist. PERHAPS we should maintain separate things
+for these different demands.
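Reviewer sketch (not part of the patch): the "iteration with filter" idea and the getBatch() demand above could look roughly like the following on a flat plist keyed by "Namespace.Directive". The free function getBatch() and the sample values are assumptions made for illustration; this is not HTML Purifier's actual API.

```php
<?php
// Hypothetical helper: return every directive in one namespace from a flat
// plist whose keys have the form "Namespace.Directive".
function getBatch(array $plist, string $namespace): array
{
    $prefix = $namespace . '.';
    $length = strlen($prefix);
    $batch = array();
    foreach ($plist as $key => $value) {
        // strncmp() returns 0 when the first $length bytes are identical.
        if (strncmp($key, $prefix, $length) === 0) {
            // Strip the namespace so callers see bare directive names.
            $batch[substr($key, $length)] = $value;
        }
    }
    return $batch;
}

// Illustrative values only.
$plist = array(
    'HTML.Doctype'    => 'XHTML 1.0 Transitional',
    'HTML.TidyLevel'  => 'medium',
    'CSS.Proprietary' => false,
);
$html = getBatch($plist, 'HTML');
```

Every call scans the whole plist, which is the performance hit the note says it is willing to take; the trailing dot in the prefix keeps a namespace such as "HTM" from matching "HTML" keys.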
+
+Issue 2: Directives for configuring the plugins are regular plists, but
+when enabling them, while it's "plist-ish", what you're really doing is adding
+them to an array of "autoformatters"/"filters" to enable. We can set up
+magic BC as well as in the new interface, but there should also be an
+add('AutoFormat', 'AutoParagraph'); which does the right thing.
+
+One thing to consider is whether or not inheritance rules will apply to these.
+I'd say yes. That means that they're still plisty; in fact, the underlying
+implementation will probably be a plist. However, they will get their OWN
+plists, and will NOT support nesting.
+
+Issue 1: Our current implementation is generally not efficient; md5(serialize($foo))
+is pretty expensive. So, I don't think there will be any problems if it
+gets "less" efficient, as long as we give users a properly fast alternative;
+DefinitionRev gives us a way to do this, by simply telling the user they must
+update it whenever they update Configuration directives as well. (There are
+obvious BC concerns here).
+
+In such a case, we simply iterate over our plist (performing full retrievals
+for each value), grab the entries we care about, and then serialize and hash.
+It's going to be slow either way, due to the ability of plists to inherit.
+If we ksort(), we don't have to traverse the entire array; however, the
+cost of a ksort() call may not be worth it.
+
+At this point, last time, I started worrying about the performance implications
+of allowing inheritance, and wondering whether or not I wanted to squash
+the plist. At first blush, our code might be under the assumption that
+accessing properties is cheap; but actually we prefer to copy out the value
+into a member variable if it's going to be used many times. With this in mind
+I don't think CPU consumption from a few nested function calls is going to
+be a problem. We *are* going to enforce a function-only interface.
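Reviewer sketch (not part of the patch): a minimal plist with single inheritance and a function-only interface, to make the cost discussed above concrete. The class name Plist and its methods are hypothetical, loosely following the get/put/has list at the top of the note; nothing here is taken from HTML Purifier's code.

```php
<?php
// Hypothetical plist: reads fall back to the parent chain, writes stay local.
class Plist
{
    private $data = array();
    private $parent;

    public function __construct(?Plist $parent = null)
    {
        $this->parent = $parent;
    }

    // Walk up the chain iteratively until some plist defines the key.
    public function get(string $name)
    {
        for ($plist = $this; $plist !== null; $plist = $plist->parent) {
            // array_key_exists() counts a stored null as present,
            // which isset() would not.
            if (array_key_exists($name, $plist->data)) {
                return $plist->data[$name];
            }
        }
        throw new OutOfBoundsException("Property '$name' is not set");
    }

    public function has(string $name): bool
    {
        for ($plist = $this; $plist !== null; $plist = $plist->parent) {
            if (array_key_exists($name, $plist->data)) {
                return true;
            }
        }
        return false;
    }

    // Read/write asymmetry: a write never touches the parent.
    public function put(string $name, $value): void
    {
        $this->data[$name] = $value;
    }
}

// Illustrative values only.
$defaults = new Plist();
$defaults->put('HTML.TidyLevel', 'medium');
$defaults->put('CSS.Proprietary', false);

$config = new Plist($defaults);
$config->put('HTML.TidyLevel', 'heavy'); // shadows the inherited value
```

A lookup costs one array probe per ancestor, which is the "few nested function calls" overhead the note expects to be acceptable when hot values are copied into member variables.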
+
+The next issue at hand is how we're going to manage the "special" plists,
+which should still be able to be inherited. Basically, it means that multiple
+plists would be attached to the configuration object, which is not the
+best for memory performance. The alternative is to keep them all in one
+big plist, and then eat the one-time cost of traversing the entire plist
+to grab the appropriate values.
+
+I think at this point we can write the generic interface, and then set up separate
+plists if that ends up being necessary for performance (it probably won't). Now
+let's code our generic plist implementation.
+
+----
+
+Iterating over the plist presents some problems. The way we've chosen to solve
+this is to squash all of the parents.
+
+----
+
+But I don't need iteration.
+
+    vim: et sw=4 sts=4
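Reviewer sketch (not part of the patch): one way to read "squash all of the parents" is to flatten the inheritance chain into a single array before iterating. The class SquashablePlist and its squash() method are assumptions for illustration, and the sketch deliberately recomputes on every call instead of caching, since a cached copy could grow stale when a parent changes.

```php
<?php
// Hypothetical plist whose squash() flattens itself and all ancestors.
class SquashablePlist
{
    private $data = array();
    private $parent;

    public function __construct(?SquashablePlist $parent = null)
    {
        $this->parent = $parent;
    }

    public function put(string $name, $value): void
    {
        $this->data[$name] = $value;
    }

    // Returns a plain array in which local values override inherited ones.
    public function squash(): array
    {
        $inherited = $this->parent !== null ? $this->parent->squash() : array();
        // array_replace() overwrites matching keys and appends new ones,
        // without renumbering keys the way array_merge() can.
        return array_replace($inherited, $this->data);
    }
}

// Illustrative values only.
$defaults = new SquashablePlist();
$defaults->put('HTML.TidyLevel', 'medium');
$defaults->put('CSS.Proprietary', false);

$config = new SquashablePlist($defaults);
$config->put('HTML.TidyLevel', 'heavy');
$config->put('URI.Disable', true);

foreach ($config->squash() as $name => $value) {
    // Each directive appears exactly once, with the most specific value.
}
```

The cost is a full copy per ancestor on each call, so this only pays off when iteration is rare, which fits the note's closing remark that iteration may not be needed at all.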