From fa48de33c2f6cefbac8bfec7cde75b75390d5f39 Mon Sep 17 00:00:00 2001 From: redmatrix Date: Wed, 15 Jun 2016 19:44:15 -0700 Subject: provide syntax based [colour] highlighting on code blocks for popular languages. I'm not happy with the line height on the list elements but couldn't see where this was defaulted. This uses the syntax [code=xxx]some code snippet[/code], where xxx represents a code/language style - with about 18 builtins. --- library/Text_Highlighter/README | 455 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 455 insertions(+) create mode 100644 library/Text_Highlighter/README (limited to 'library/Text_Highlighter/README') diff --git a/library/Text_Highlighter/README b/library/Text_Highlighter/README new file mode 100644 index 000000000..88f71aed2 --- /dev/null +++ b/library/Text_Highlighter/README @@ -0,0 +1,455 @@ +# $Id$ + +Introduction +============ + +Text_Highlighter is a class for syntax highlighting. The main idea is to +simplify creation of subclasses implementing syntax highlighting for +particular language. Subclasses do not implement any new functioanality, they +just provide syntax highlighting rules. The rules sources are in XML format. +To create a highlighter for a language, there is no need to code a new class +manually. Simply describe the rules in XML file and use Text_Highlighter_Generator +to create a new class. + + +This document does not contain a formal description of API - it is very +simple, and I believe providing some examples of code is sufficient. + + +Highlighter XML source +====================== + +Basics +------ + +Creating a new syntax highlighter begins with describing the highlighting +rules. There are two basic elements: block and region. A block is just a +portion of text matching a regular expression and highlighted with a single +color. Keyword is an example of a block. A region is defined by two regular +expressions: one for start of region, and another for the end. The main +difference from a block is that a region can contain blocks and regions +(including same-named regions). An example of a region is a group of +statements enclosed in curly brackets (this is used in many languages, for +example PHP and C). Also, characters matching start and end of a region may be +highlighted with their own color, and region contents with another. + +Blocks and regions may be declared as contained. Contained blocks and regions +can only appear inside regions. If a region or a block is not declared as +contained, it can appear both on top level and inside regions. Block or region +declared as not-contained can only appear on top level. + +For any region, a list of blocks and regions that can appear inside this +region can be specified. + +In this document, the term "color group" is used. Chunks of text assigned to +same color group will be highlighted with same color. Note that in versions +prior 0.5.0 color goups were refered as CSS classes, but since 0.5.0 not only +HTML output is supported, so "color group" is more appropriate term. + +Elements +-------- + +The toplevel element is . Attribute lang is required and denotes +the name of the language. Its value is used as a part of generated class name, +and must only contain letters, digits and underscores. Optional attribute +case, when given value yes, makes the language case sensitive (default is case +insensitive). Allowed subelements are: + + * : Information about the authors of the file. + : Information about a single author of the file. (May be used + multiple times, one per author.) + - name="...": Author's name. Required. + - email="...": Author's email address. Optional. + + * : Default color group. + - innerGroup="...": color group name. Required. + + * : Region definition + - name="...": Region name. Required. + - innerGroup="...": Default color group of region contents. Required. + - delimGroup="...": color group of start and end of region. Optional, + defaults to value of innerGroup attribute. + - start="...", end="...": Regular expression matching start and end + of region. Required. Regular expression delimiters are optional, but + if you need to specify delimiter, use /. The only case when the + delimiters are needed, is specifying regular expression modifiers, + such as m or U. Examples: \/\* or /$/m. + - contained="yes": Marks region as contained. + - never-contained="yes": Marks region as not-contained. + - : Elements allowed inside this region. + - all="yes" Region can contain any other region or block + (except not-contained). May be used multiple times. + - Do not allow certain regions or blocks. + - region="..." Name of region not allowed within + current region. + - block="..." Name of block not allowed within + current region. + - region="..." Name of region allowed within current region. + - block="..." Name of block allowed within current region. + - Only allow this region within certain regions. May be + used multiple times. + - block="..." Name of parent region + + * : Block definition + - name="...": Block name. Required. + - innerGroup="...": color group of block contents. Optional. If not + specified, color group of parent region or default color group will be + used. One would only want to omit this attribute if there are + keyword groups (see below) inherited from this block, and no special + highlighting should apply when the block does not match the keyword. + - match="..." Regular expression matching the block. Required. + Regular expression delimiters are optional, but if you need to + specify delimiter, use /. The only case when the delimiters are + needed, is specifying regular expression modifiers, such as m or U. + Examples: #|\/\/ or /$/m. + - contained="yes": Marks block as contained. + - never-contained="yes": Marks block as not-contained. + - Only allow this block within certain regions. May be used + multiple times. + - block="..." Name of parent region + - multiline="yes": Marks block as multi-line. By default, whole + blocks are assumed to reside in a single line. This make the things + faster. If you need to declare a multi-line block, use this + attribute. + - : Assigns another color group to a part of the block that + matched a subpattern. + - index="n": Subpattern index. Required. + - innerGroup="...": color group name. Required. + + This is an example from CSS highlighter: the measure is matched as + a whole, but the measurement units are highlighted with different + color. + + + + + + + * : Keyword group definition. Keyword groups are useful when you + want to highlight some words that match a condition for a block with a + different color. Keywords are defined with literal match, not regular + expressions. For example, you have a block named identifier matching a + general identifier, and want to highlight reserved words (which match + this block as well) with different color. You inherit a keyword group + "reserved" from "identifier" block. + - name="...": Keyword group. Required. + - ifdef="...", ifndef="..." : Conditional declaration. See + "Conditions" below. + - inherits="...": Inherited block name. Required. + - innerGroup="...": color group of keyword group. Required. + - case="yes|no": Overrides case-sensitivity of the language. + Optional, defaults to global value. + - : Single keyword definition. + - match="..." The keyword. Note: this is not a regular + expression, but literal match (possibly case insensitive). + +Note that for BC reasons element partClass is alias for partGroup, and +attributes innerClass and delimClass are aliases of innerGroup and +delimGroup, respectively. + + +Conditions +---------- + +Conditional declarations allow enabling or disabling certain highlighting +rules at runtime. For example, Java highlighter has a very big list of +keywords matching Java standard classes. Finding a match in this list can take +much time. For that reason, corresponding keyword group is declared with +"ifdef" attribute : + + + + + + ... + ... + + + + + +This keyword group will be only enabled when "java.builtins" is passed as an +element of "defines" option: + + $options = array( + 'defines' => array( + 'java.builtins', + ), + 'numbers' => HL_NUMBERS_TABLE, + ); + $highlighter = Text_Highlighter::factory('java', $options); + +"ifndef" attribute has reverse meaning. + +Currently, "ifdef" and "ifndef" attributes are only supported for +tag. + + + +Class generation +================ + +Creating XML description of highlighting rules is the most complicated part of +the process. To generate the class, you need just few lines of code: + + generate(); + $generator->saveCode('PHP.php'); + ?> + + + +Command-line class generation tool +================================== + +Example from previous section looks pretty simple, but it does not handle any +errors which may occur during parsing of XML source. The package provides a +command-line script to make generation of classes even more simple, and takes +care of possible errors. It is called generate (on Unix/Linux) or generate.bat +(on Windows). This script is able to process multiple files in one run, and +also to process XML from standard input and write generated code to standard +output. + + Usage: + generate options + + Options: + -x filename, --xml=filename + source XML file. Multiple input files can be specified, in which + case each -x option must be followed by -p unless -d is specified + Defaults to stdin + -p filename, --php=filename + destination PHP file. Defaults to stdout. If specied multiple times, + each -p must follow -x + -d dirname, --dir=dirname + Default destination directory. File names will be taken from XML input + ("lang" attribute of tag) + -h, --help + This help + +Examples + + Read from php.xml, write to PHP.php + + generate -x php.xml -p PHP.php + + Read from php.xml, write to standard output + + generate -x php.xml + + Read from php.xml, write to PHP.php, read from xml.xml, write to XML.php + + generate -x php.xml -p PHP.php -x xml.xml -p XML.php + + Read from php.xml, write to /some/dir/PHP.php, read from xml.xml, write to + /some/dir/XML.php (assuming that xml.xml contains , and + php.xml contains ) + + generate -x php.xml -x xml.xml -d /some/dir/ + + + +Renderers +========= + +Introduction +------------ + +Text_Highlighter supports renderes. Using renderers, you can get output in +different formats. Two renderers are included in the package: + + - HTML renderer. Generates HTML output. A style sheet should be linked to + the document to display colored text + + - Console renderer. Can be used to output highlighted text to + color-capable terminals, either directly or trough less -r + + +Renderers API +------------- + +Renderers are subclasses of Text_Highlighter_Renderer. Renderer should +override at least two methods - acceptToken and getOutput. Overriding other +methods is optional, depending on the nature of renderer's output and details +of implementation. + + string reset() + resets renderer state. This method is called every time before a new + source file is highlighted. + + string preprocess(string $code) + preprocesses code. Can be used, for example, to normalize whitespace + before highlighting. Returns preprocessed string. + + void acceptToken(string $group, string $content) + the core method of the renderer. Highlighter passes chunks of text to + this method in $content, and color group in $group + + void finalize() + signals the renderer that no more tokens are available. + + mixed getOutput() + returns generated output. + + +Setting renderer options +-------------------------------- + +Renderers accept an optional argument to their constructor - options array. +Elements of this array are renderer-specific. + +HTML renderer +------------- + +HTML renderer produces HTML output with optional line numbering. The renderer +itself does not provide information about actual colors of highlighted text. +Instead, is used, where XXX is replaced with color group +name (hl-var, hl-string, etc.). It is up to you to create a CSS stylesheet. +If 'use_language' option with value evaluating to true was passed, class names +will be formatted as "LANG-hl-XXX", where LANG is language name as defined in +highlighter XML source ("lang" attribute of tag) in lower case. + +There are 3 special CSS classes: + + hl-main - this class applies to whole output or right table column, + depending on 'numbers' option + hl-gutter - applies to left column in table + hl-table - applies to whole table + +HTML renderer accepts following options (each being optional): + + * numbers - line numbering style. + 0 - no numbering (default) + HL_NUMBERS_LI - use
    for line numbering + HL_NUMBERS_TABLE - create a 2-column table, with line numbers in left + column and highlighted text in right column + + * tabsize - tabulation size. Defaults to 4 + + Example: + + require_once 'Text/Highlighter/Renderer/Html.php'; + $options = array( + 'numbers' => HL_NUMBERS_LI, + 'tabsize' => 8, + ); + $renderer = new Text_Highlighter_Renderer_HTML($options); + +Console renderer +---------------- + +Console renderer produces output for displaying on a color-capable terminal, +either directly or through less -r, using ANSI escape sequences. By default, +this renderer only highlights most common color groups. Additional colors +can be specified using 'colors' option. This renderer also accepts 'numbers' +option - a boolean value, and 'tabsize' option. + + Example : + + require_once 'Text/Highlighter/Renderer/Console.php'; + $colors = array( + 'prepro' => "\033[35m", + 'types' => "\033[32m", + ); + $options = array( + 'numbers' => true, + 'tabsize' => 8, + 'colors' => $colors, + ); + $renderer = new Text_Highlighter_Renderer_Console($options); + + +ANSI color escape sequences have the following format: + + ESC[#;#;....;#m + +where ESC is character with ASCII code 27 (033 octal, 0x1B hexadecimal). # is +one of the following: + + 0 for normal display + 1 for bold on + 4 underline (mono only) + 5 blink on + 7 reverse video on + 8 nondisplayed (invisible) + 30 black foreground + 31 red foreground + 32 green foreground + 33 yellow foreground + 34 blue foreground + 35 magenta foreground + 36 cyan foreground + 37 white foreground + 40 black background + 41 red background + 42 green background + 43 yellow background + 44 blue background + 45 magenta background + 46 cyan background + 47 white background + + +How to use Text_Highlighter class +================================= + +Creating a highlighter object +----------------------------- + +To create a highlighter for a certain language, use Text_Highlighter::factory() +static method: + + require_once 'Text/Highlighter.php'; + $hl = Text_Highlighter::factory('php'); + + +Setting a renderer +------------------ + +Actual output is produced by a renderer. + + require_once 'Text/Highlighter.php'; + require_once 'Text/Highlighter/Renderer/Html.php'; + $options = array( + 'numbers' => HL_NUMBERS_LI, + 'tabsize' => 8, + ); + $renderer = new Text_Highlighter_Renderer_HTML($options); + $hl = Text_Highlighter::factory('php'); + $hl->setRenderer($renderer); + +Note that for BC reasons, it is possible to use highlighter without setting a +renderer. If no renderer is set, HTML renderer will be used by default. In +this case, you should pass options as second parameter to factory method. The +following example works exactly as previous one: + + require_once 'Text/Highlighter.php'; + $options = array( + 'numbers' => HL_NUMBERS_LI, + 'tabsize' => 8, + ); + $hl = Text_Highlighter::factory('php', $options); + + +Getting output +-------------- + +And finally, do the highlighting and get the output: + + require_once 'Text/Highlighter.php'; + require_once 'Text/Highlighter/Renderer/Html.php'; + $options = array( + 'numbers' => HL_NUMBERS_LI, + 'tabsize' => 8, + ); + $renderer = new Text_Highlighter_Renderer_HTML($options); + $hl = Text_Highlighter::factory('php'); + $hl->setRenderer($renderer); + $html = $hl->highlight(file_get_contents('example.php')); + +# vim: set autoindent tabstop=4 shiftwidth=4 softtabstop=4 tw=78: */ + -- cgit v1.2.3