Diffstat (limited to 'lib/htmlpurifier/docs/proposal-plists.txt')
-rw-r--r-- | lib/htmlpurifier/docs/proposal-plists.txt | 218 |
1 files changed, 0 insertions, 218 deletions
diff --git a/lib/htmlpurifier/docs/proposal-plists.txt b/lib/htmlpurifier/docs/proposal-plists.txt
deleted file mode 100644
index eef8ade61..000000000
--- a/lib/htmlpurifier/docs/proposal-plists.txt
+++ /dev/null
@@ -1,218 +0,0 @@
-THE UNIVERSAL DESIGN PATTERN: PROPERTIES
-Steve Yegge
-
-Implementation:
-    get(name)
-    put(name, value)
-    has(name)
-    remove(name)
-    iteration, with filtering [this will be our namespaces]
-    parent
-
-Representations:
-    - Keys are strings
-    - It's nice to not need to quote keys (if we formulate our own language,
-      consider this)
-    - Property not present representation (key missing)
-    - Frequent removal/re-add may have null help. If null is valid, use
-      another value. (PHP semantics are weird here)
-
-Data structures:
-    - LinkedHashMap is wonderful (O(1) access and maintains order)
-    - Using a special property that points to the parent is usual
-    - Multiple inheritance possible, need rules for which to lookup first
-    - Iterative inheritance is best
-    - Consider performance!
-
-Deletion
-    - Tricky problem with inheritance
-    - Distinguish between "not found" and "look in my parent for the property"
-      [Maybe HTML Purifier won't allow deletion]
-
-Read/write asymmetry (it's correct!)
-
-Read-only plists
-    - Allow ability to freeze [this is what we have already]
-    - Don't overuse it
-
-Performance:
-    - Intern strings (PHP does this already)
-    - Don't be case-insensitive
-    - If all properties in a plist are known a-priori, you can use a "perfect"
-      hash function. Often overkill.
-    - Copy-on-read caching "plundering" reduces lookup, but uses memory and can
-      grow stale. Use as last resort.
-    - Refactoring to fields. Watch for API compatibility, system complexity,
-      and lack of flexibility.
-    - Refrigerator: external data-structure to hold plists
-
-Transient properties:
-    [Don't need to worry about this]
-    - Use a separate plist for transient properties
-    - Non-numeric override; numeric should ADD
-    - Deletion: removeTransientProperty() and transientlyRemoveProperty()
-
-Persistence:
-    - XML/JSON are good
-    - Text-based is good for readability, maintainability and bootstrapping
-    - Compressed binary format for network transport [not necessary]
-    - RDBMS or XML database
-
-Querying: [not relevant]
-    - XML database is nice for XPath/XQuery
-    - jQuery for JSON
-    - Just load it all into a program
-
-Backfills/Data integrity:
-    - Use usual methods
-    - Lazy backfill is a nice hack
-
-Type systems:
-    - Flags: ReadOnly, Permanent, DontEnum
-    - Typed properties isn't that useful [It's also Not-PHP]
-    - Seperate meta-list of directive properties IS useful
-    - Duck typing is useful for systems designed fully around properties pattern
-
-Trade-off:
-    + Flexibility
-    + Extensibility
-    + Unit-testing/prototype-speed
-    - Performance
-    - Data integrity
-    - Navagability/Query-ability
-    - Reversability (hard to go back)
-
-HTML Purifier
-
-We are not happy with our current system of defining configuration directives,
-because it has become clear that things will get a lot nicer if we allow
-multiple namespaces, and there are some features that naturally lend themselves
-to inheritance, which we do not really support well.
-
-One of the considered implementation changes would be to go from a structure
-like:
-
-array(
-    'Namespace' => array(
-        'Directive' => 'val1',
-        'Directive2' => 'val2',
-    )
-)
-
-to:
-
-array(
-    'Namespace.Directive' => 'val1',
-    'Namespace.Directive2' => 'val2',
-)
-
-The below implementation takes more memory, however, and it makes it a bit
-complicated to grab all values from a namespace.
-
-The alternate implementation choice is to allow nested plists. This keeps
-iteration easy, but is problematic for inheritance (it would be difficult
-to distinguish a plist from an array) and retrieval (when specifying multiple
-namespaces we would need some multiple de-referencing).
-
-----
-
-We can bite the performance hit, and just do iteration with filter
-(the strncmp call should be relatively cheap). Then, users should be able
-to optimize doing something like:
-
-$config = HTMLPurifier_Config::createDefault();
-if (!file_exists('config.php')) {
-    // set up $config
-    $config->save('config.php');
-} else {
-    $config->load('config.php');
-}
-
-Or maybe memcache, or something. This means that "// set up $config" must
-not have any dynamic parts, or the user has to invalidate the cache when
-they do update it. We have to think about this a little more carefully; the
-file call might be more expensive.
-
-----
-
-This might get expensive, however, when we actually care about iterating
-over the configuration and want the actual values. So what about nesting the
-lists?
-
-"ns.sub.directive" => values['ns']['sub']['directive']
-
-We can distinguish between plists and arrays by using ArrayObjects for the
-plists, and regular arrays for the arrays? Alternatively, use ArrayObjects
-for the arrays, and regular arrays for the plists.
-
-----
-
-Implementation demands, and what has caused them:
-
-1. DefinitionCache, the HTML, CSS and URI namespaces have caches attached to them
-   Results:
-   - getBatchSerial()
-   - getBatch() : in general, the ability to traverse just a namespace
-
-2. AutoFormat/Filter, this is a plugin architecture, directives not hard-coded
-   - getBatch()
-
-3. Configuration form
-   - Namespaces used to organize directives
-
-Other than that, we have a pure plist. PERHAPS we should maintain separate things
-for these different demands.
-
-Issue 2: Directives for configuring the plugins are regular plists, but
-when enabling them, while it's "plist-ish", what you're really doing is adding
-them to an array of "autoformatters"/"filters" to enable. We can setup
-magic BC as well as in the new interface, but there should also be an
-add('AutoFormat', 'AutoParagraph'); which does the right thing.
-
-One thing to consider is whether or not inheritance rules will apply to these.
-I'd say yes. That means that they're still plisty, in fact, the underlying
-implementation will probably be a plist. However, they will get their OWN
-plists, and will NOT support nesting.
-
-Issue 1: Our current implementation is generally not efficient; md5(serialize($foo))
-is pretty expensive. So, I don't think there will be any problems if it
-gets "less" efficient, as long as we give users a properly fast alternative;
-DefinitionRev gives us a way to do this, by simply telling the user they must
-update it whenever they update Configuration directives as well. (There are
-obvious BC concerns here).
-
-In such a case, we simply iterate over our plist (performing full retrievals
-for each value), grab the entries we care about, and then serialize and hash.
-It's going to be slow either way, due to the ability of plists to inherit.
-If we ksort(), we don't have to traverse the entire array, however, the
-cost of a ksort() call may not be worth it.
-
-At this point, last time, I started worrying about the performance implications
-of allowing inheritance, and wondering whether or not I wanted to squash
-the plist. At first blush, our code might be under the assumption that
-accessing properties is cheap; but actually we prefer to copy out the value
-into a member variable if it's going to be used many times. With this is mind
-I don't think CPU consumption from a few nested function calls is going to
-be a problem. We *are* going to enforce a function only interface.
-
-The next issue at hand is how we're going to manage the "special" plists,
-which should still be able to be inherited. Basically, it means that multiple
-plists would be attached to the configuration object, which is not the
-best for memory performance. The alternative is to keep them all in one
-big plist, and then eat the one-time cost of traversing the entire plist
-to grab the appropriate values.
-
-I think at this point we can write the generic interface, and then set up separate
-plists if that ends up being necessary for performance (it probably won't.) Now
-lets code our generic plist implementation.
-
-----
-
-Iterating over the plist presents some problems. The way we've chosen to solve
-this is to squash all of the parents.
-
-----
-
-But I don't need iteration.
-
-    vim: et sw=4 sts=4
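
For readers of this removal, the design the deleted notes settle on (flat string keys such as 'Namespace.Directive', a parent pointer for inheritance, a function-only interface, namespace traversal by strncmp prefix filter, and squashing the parents before iterating) can be sketched roughly as below. This is only an illustration under stated assumptions: the class name SketchPlist and the exact method signatures are hypothetical and are not taken from HTML Purifier's code; get/put/has follow the "Implementation" list in the notes, and getBatch() borrows the name the notes use for namespace traversal.

```php
<?php
// Hypothetical sketch of the property-list idea described in the notes.
// SketchPlist is an illustrative name, not a class from HTML Purifier.
final class SketchPlist
{
    /** @var array<string, mixed> entries set directly on this plist */
    private array $data = [];
    private ?SketchPlist $parent;

    public function __construct(?SketchPlist $parent = null)
    {
        $this->parent = $parent;
    }

    /** Writes always land on this plist, never on a parent. */
    public function put(string $name, $value): void
    {
        $this->data[$name] = $value;
    }

    /** True if this plist or any ancestor defines the key. */
    public function has(string $name): bool
    {
        for ($p = $this; $p !== null; $p = $p->parent) {
            // array_key_exists, so a stored null still counts as present.
            if (array_key_exists($name, $p->data)) {
                return true;
            }
        }
        return false;
    }

    /** Iterative lookup: own entries first, then each parent in turn. */
    public function get(string $name)
    {
        for ($p = $this; $p !== null; $p = $p->parent) {
            if (array_key_exists($name, $p->data)) {
                return $p->data[$name];
            }
        }
        throw new OutOfBoundsException("No such property: $name");
    }

    /** Flatten the whole parent chain into one array; children win. */
    public function squash(): array
    {
        $chain = [];
        for ($p = $this; $p !== null; $p = $p->parent) {
            array_unshift($chain, $p->data); // root ends up first
        }
        return array_replace(...$chain);
    }

    /** Iteration with filter: all entries under "$namespace." */
    public function getBatch(string $namespace): array
    {
        $prefix = $namespace . '.';
        $len = strlen($prefix);
        $out = [];
        foreach ($this->squash() as $key => $value) {
            if (strncmp((string) $key, $prefix, $len) === 0) {
                $out[substr((string) $key, $len)] = $value;
            }
        }
        return $out;
    }
}

// Usage with the placeholder keys from the notes.
$defaults = new SketchPlist();
$defaults->put('Namespace.Directive', 'val1');
$defaults->put('Namespace.Directive2', 'val2');
$defaults->put('Other.Directive', 'val3');

$config = new SketchPlist($defaults);
$config->put('Namespace.Directive', 'override');

$batch = $config->getBatch('Namespace');
```

Here the child's write shadows the inherited value without touching $defaults, which is the read/write asymmetry the notes call correct. Squashing on every getBatch() call is the simple choice; caching the squashed array would trade memory and staleness risk for speed, as the notes warn.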