diff options
Diffstat (limited to 'lib/htmlpurifier/docs/enduser-slow.html')
-rw-r--r-- | lib/htmlpurifier/docs/enduser-slow.html | 120 |
1 files changed, 120 insertions, 0 deletions
diff --git a/lib/htmlpurifier/docs/enduser-slow.html b/lib/htmlpurifier/docs/enduser-slow.html new file mode 100644 index 000000000..f0ea02de1 --- /dev/null +++ b/lib/htmlpurifier/docs/enduser-slow.html @@ -0,0 +1,120 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" + "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head> +<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> +<meta name="description" content="Explains how to speed up HTML Purifier through caching or inbound filtering." /> +<link rel="stylesheet" type="text/css" href="./style.css" /> + +<title>Speeding up HTML Purifier - HTML Purifier</title> + +</head><body> + +<h1 class="subtitled">Speeding up HTML Purifier</h1> +<div class="subtitle">...also known as the HELP ME LIBRARY IS TOO SLOW MY PAGE TAKE TOO LONG page</div> + +<div id="filing">Filed under End-User</div> +<div id="index">Return to the <a href="index.html">index</a>.</div> +<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div> + +<p>HTML Purifier is a very powerful library. But with power comes great +responsibility, in the form of longer execution times. Remember, this +library isn't lightly grazing over submitted HTML: it's deconstructing +the whole thing, rigorously checking the parts, and then putting it back +together. </p> + +<p>So, if it so turns out that HTML Purifier is kinda too slow for outbound +filtering, you've got a few options: </p> + +<h2>Inbound filtering</h2> + +<p>Perform filtering of HTML when it's submitted by the user. Since the +user is already submitting something, an extra half a second tacked on +to the load time probably isn't going to be that huge of a problem. +Then, displaying the content is a simple a manner of outputting it +directly from your database/filesystem. The trouble with this method is +that your user loses the original text, and when doing edits, will be +handling the filtered text. While this may be a good thing, especially +if you're using a WYSIWYG editor, it can also result in data-loss if a +user makes a typo. </p> + +<p>Example (non-functional):</p> + +<pre><?php + /** + * FORM SUBMISSION PAGE + * display_error($message) : displays nice error page with message + * display_success() : displays a nice success page + * display_form() : displays the HTML submission form + * database_insert($html) : inserts data into database as new row + */ + if (!empty($_POST)) { + require_once '/path/to/library/HTMLPurifier.auto.php'; + require_once 'HTMLPurifier.func.php'; + $dirty_html = isset($_POST['html']) ? $_POST['html'] : false; + if (!$dirty_html) { + display_error('You must write some HTML!'); + } + $html = HTMLPurifier($dirty_html); + database_insert($html); + display_success(); + // notice that $dirty_html is *not* saved + } else { + display_form(); + } +?></pre> + +<h2>Caching the filtered output</h2> + +<p>Accept the submitted text and put it unaltered into the database, but +then also generate a filtered version and stash that in the database. +Serve the filtered version to readers, and the unaltered version to +editors. If need be, you can invalidate the cache and have the cached +filtered version be regenerated on the first page view. Pros? Full data +retention. Cons? It's more complicated, and opens other editors up to +XSS if they are using a WYSIWYG editor (to fix that, they'd have to be +able to get their hands on the *really* original text served in +plaintext mode). </p> + +<p>Example (non-functional):</p> + +<pre><?php + /** + * VIEW PAGE + * display_error($message) : displays nice error page with message + * cache_get($id) : retrieves HTML from fast cache (db or file) + * cache_insert($id, $html) : inserts good HTML into cache system + * database_get($id) : retrieves raw HTML from database + */ + $id = isset($_GET['id']) ? (int) $_GET['id'] : false; + if (!$id) { + display_error('Must specify ID.'); + exit; + } + $html = cache_get($id); // filesystem or database + if ($html === false) { + // cache didn't have the HTML, generate it + $raw_html = database_get($id); + require_once '/path/to/library/HTMLPurifier.auto.php'; + require_once 'HTMLPurifier.func.php'; + $html = HTMLPurifier($raw_html); + cache_insert($id, $html); + } + echo $html; +?></pre> + +<h2>Summary</h2> + +<p>In short, inbound filtering is the simple option and caching is the +robust option (albeit with bigger storage requirements). </p> + +<p>There is a third option, independent of the two we've discussed: profile +and optimize HTMLPurifier yourself. Be sure to report back your results +if you decide to do that! Especially if you port HTML Purifier to C++. +<tt>;-)</tt></p> + +</body> +</html> + +<!-- vim: et sw=4 sts=4 +--> |