Diffstat (limited to 'vendor/sabre/dav/docs/rfc5051.txt')
-rw-r--r-- | vendor/sabre/dav/docs/rfc5051.txt | 395 |
1 files changed, 0 insertions, 395 deletions
diff --git a/vendor/sabre/dav/docs/rfc5051.txt b/vendor/sabre/dav/docs/rfc5051.txt
deleted file mode 100644
index 0a4479cad..000000000
--- a/vendor/sabre/dav/docs/rfc5051.txt
+++ /dev/null
@@ -1,395 +0,0 @@

Network Working Group                                         M. Crispin
Request for Comments: 5051                      University of Washington
Category: Standards Track                                   October 2007


         i;unicode-casemap - Simple Unicode Collation Algorithm

Status of This Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements. Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol. Distribution of this memo is unlimited.

Abstract

   This document describes "i;unicode-casemap", a simple
   case-insensitive collation for Unicode strings. It provides
   equality, substring, and ordering operations.

1. Introduction

   The "i;ascii-casemap" collation described in [COMPARATOR] is quite
   simple to implement and provides case-independent comparisons for the
   26 Latin alphabetics. It is specified as the default and/or baseline
   comparator in some application protocols, e.g., [IMAP-SORT].

   However, the "i;ascii-casemap" collation does not produce
   satisfactory results with non-ASCII characters. It is possible, with
   a modest extension, to provide a more sophisticated collation with
   greater multilingual applicability than "i;ascii-casemap". This
   extension provides case-independent comparisons for a much greater
   number of characters. It also collates characters with diacriticals
   with the non-diacritical character forms.

   This collation, "i;unicode-casemap", is intended to be an alternative
   to, and preferred over, "i;ascii-casemap". It does not replace the
   "i;basic" collation described in [BASIC].

2. Unicode Casemap Collation Description

   The "i;unicode-casemap" collation is a simple collation which is
   case-insensitive in its treatment of characters. It provides
   equality, substring, and ordering operations. The validity test
   operation returns "valid" for any input.


Crispin                     Standards Track                     [Page 1]

RFC 5051                   i;unicode-casemap                October 2007


   This collation allows strings in arbitrary (and mixed) character
   sets, as long as the character set for each string is identified and
   it is possible to convert the string to Unicode. Strings which have
   an unidentified character set and/or cannot be converted to Unicode
   are not rejected, but are treated as binary.

   Each input string is prepared by converting it to a "titlecased
   canonicalized UTF-8" string according to the following steps, using
   UnicodeData.txt ([UNICODE-DATA]):

   (1) A Unicode codepoint is obtained from the input string.

       (a) If the input string is in a known charset that can be
           converted to Unicode, a sequence in the string's charset
           is read and checked for validity according to the rules of
           that charset. If the sequence is valid, it is converted
           to a Unicode codepoint. Note that for input strings in
           UTF-8, the UTF-8 sequence must be valid according to the
           rules of [UTF-8]; e.g., overlong UTF-8 sequences are
           invalid.

       (b) If the input string is in an unknown charset, or an
           invalid sequence occurs in step (1)(a), conversion ceases.
           No further preparation is performed, and any partial
           preparation results are discarded. The original string is
           used unchanged with the i;octet comparator.

   (2) The following steps, using UnicodeData.txt ([UNICODE-DATA]),
       are performed on the resulting codepoint from step (1)(a).

       (a) If the codepoint has a titlecase property in
           UnicodeData.txt (this is normally the same as the
           uppercase property), the codepoint is converted to the
           codepoints in the titlecase property.

       (b) If the resulting codepoint from (2)(a) has a decomposition
           property of any type in UnicodeData.txt, the codepoint is
           converted to the codepoints in the decomposition property.
           This step is recursively applied to each of the resulting
           codepoints until no more decomposition is possible
           (effectively Normalization Form KD).

       Example: codepoint U+01C4 (LATIN CAPITAL LETTER DZ WITH CARON)
       has a titlecase property of U+01C5 (LATIN CAPITAL LETTER D
       WITH SMALL LETTER Z WITH CARON). Codepoint U+01C5 has a
       decomposition property of U+0044 (LATIN CAPITAL LETTER D)
       U+017E (LATIN SMALL LETTER Z WITH CARON). U+017E has a
       decomposition property of U+007A (LATIN SMALL LETTER Z) U+030C
       (COMBINING CARON). None of U+0044, U+007A, or U+030C has any
       decomposition property. Therefore, U+01C4 is converted to
       U+0044 U+007A U+030C by this step.

   (3) The resulting codepoint(s) from step (2) is/are appended, in
       UTF-8 format, to the "titlecased canonicalized UTF-8" string.

   (4) Repeat from step (1) until there is no more data in the input
       string.

   Following the above preparation process on each string, the equality,
   ordering, and substring operations are as for i;octet.

   It is permitted to use an alternative implementation of the above
   preparation process if it produces the same results. For example, it
   may be more convenient for an implementation to convert all input
   strings to a sequence of UTF-16 or UTF-32 values prior to performing
   any of the step (2) actions. Similarly, if all input strings are (or
   are convertible to) Unicode, it may be possible to use UTF-32 as an
   alternative to UTF-8 in step (3).

   Note: UTF-16 is unsuitable as an alternative to UTF-8 in step (3),
   because UTF-16 surrogates will cause i;octet to collate codepoints
   U+E000 through U+FFFF after non-BMP codepoints.

   This collation is not locale sensitive.
   Consequently, care should be taken when using OS-supplied functions
   to implement this collation. Functions such as strcasecmp and
   toupper are sometimes locale sensitive and may inconsistently casemap
   letters.

   The i;unicode-casemap collation is well suited to use with many
   Internet protocols and computer languages. Use with natural language
   is often inappropriate; even though the collation apparently supports
   languages such as Swahili and English, in real-world use it tends to
   mis-sort a number of types of string:

   o  people and place names containing scripts that are not collated
      according to "alphabetical order".
   o  words with characters that have diacriticals. However,
      i;unicode-casemap generally does a better job than i;ascii-casemap
      for most (but not all) languages. For example, German umlaut
      letters will sort correctly, but some Scandinavian letters will
      not.
   o  names such as "Lloyd" (which in Welsh sorts after "Lyon", unlike
      in English).
   o  strings containing other non-letter symbols; e.g., euro and pound
      sterling symbols, quotation marks other than '"', dashes/hyphens,
      etc.

3. Unicode Casemap Collation Registration

   <?xml version='1.0'?>
   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
   <collation rfc="5051" scope="global" intendedUse="common">
     <identifier>i;unicode-casemap</identifier>
     <title>Unicode Casemap</title>
     <operations>equality order substring</operations>
     <specification>RFC 5051</specification>
     <owner>IETF</owner>
     <submitter>mrc@cac.washington.edu</submitter>
   </collation>

4. Security Considerations

   The security considerations for [UTF-8], [STRINGPREP], and
   [UNICODE-SECURITY] apply and are normative to this specification.

   The results from this comparator will vary depending upon the
   implementation for several reasons.
   Implementations MUST consider whether these possibilities are a
   problem for their use case:

   1) New characters added in Unicode may have decomposition or
      titlecase properties that will not be known to an implementation
      based upon an older revision of Unicode. This impacts step (2).

   2) Step (2)(b) defines a subset of Normalization Form KD (NFKD) that
      does not require normalization of out-of-order diacriticals.
      However, an implementation MAY use an NFKD library routine that
      does such normalization. This impacts step (2)(b) and possibly
      also step (1)(a), and is an issue only with ill-formed UTF-8
      input.

   3) The set of charsets handled in step (1)(a) is open-ended. UTF-8
      (and, by extension, US-ASCII) are the only mandatory-to-implement
      charsets. This impacts step (1)(a).

      Implementations SHOULD, as far as feasible, support all the
      charsets they are likely to encounter in the input data, in order
      to avoid poor collation caused by the fall through to the (1)(b)
      rule.

   4) Other charsets may have revisions which add new characters that
      are not known to an implementation based upon an older revision.
      This impacts step (1)(a) and possibly also step (1)(b).

   An attacker may create input that is ill-formed or in an unknown
   charset, with the intention of impacting the results of this
   comparator or exploiting other parts of the system which process this
   input in different ways. Note, however, that even well-formed data
   in a known charset can impact the result of this comparator in
   unexpected ways. For example, an attacker can substitute U+0041
   (LATIN CAPITAL LETTER A) with U+0391 (GREEK CAPITAL LETTER ALPHA) or
   U+0410 (CYRILLIC CAPITAL LETTER A) with the intention of causing a
   non-match of strings which visually appear the same and/or causing
   the string to appear elsewhere in a sort.

5. IANA Considerations

   The i;unicode-casemap collation defined in section 2 has been added
   to the registry of collations defined in [COMPARATOR].

6. Normative References

   [COMPARATOR]        Newman, C., Duerst, M., and A. Gulbrandsen,
                       "Internet Application Protocol Collation
                       Registry", RFC 4790, February 2007.

   [STRINGPREP]        Hoffman, P. and M. Blanchet, "Preparation of
                       Internationalized Strings ("stringprep")",
                       RFC 3454, December 2002.

   [UTF-8]             Yergeau, F., "UTF-8, a transformation format of
                       ISO 10646", STD 63, RFC 3629, November 2003.

   [UNICODE-DATA]      <http://www.unicode.org/Public/UNIDATA/UnicodeData.txt>

                       Although the UnicodeData.txt file referenced
                       here is part of the Unicode standard, it is
                       subject to change as new characters are added
                       to Unicode and errors are corrected in Unicode
                       revisions. As a result, it may be less stable
                       than might otherwise be implied by the
                       standards status of this specification.

   [UNICODE-SECURITY]  Davis, M. and M. Suignard, "Unicode Security
                       Considerations", February 2006,
                       <http://www.unicode.org/reports/tr36/>.

7. Informative References

   [BASIC]             Newman, C., Duerst, M., and A. Gulbrandsen,
                       "i;basic - the Unicode Collation Algorithm",
                       Work in Progress, March 2007.

   [IMAP-SORT]         Crispin, M. and K. Murchison, "Internet Message
                       Access Protocol - SORT and THREAD Extensions",
                       Work in Progress, September 2007.

Author's Address

   Mark R. Crispin
   Networks and Distributed Computing
   University of Washington
   4545 15th Avenue NE
   Seattle, WA 98105-4527

   Phone: +1 (206) 543-5762
   EMail: MRC@CAC.Washington.EDU

Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.
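The preparation steps (1) through (4) of section 2, followed by i;octet comparison, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a reference implementation: it relies on Python's `unicodedata` module (whose Unicode version, `unicodedata.unidata_version`, depends on the Python release; see item 1 of section 4), it approximates the UnicodeData.txt titlecase field with `str.title()`, and it treats any charset name that Python's codecs do not recognize as "unknown". The helper names `titlecase`, `decompose`, `prepare`, and `compare` are hypothetical, not defined by the RFC.

```python
import unicodedata
from typing import Optional


def titlecase(ch: str) -> str:
    """Step (2)(a), approximated with str.title() (an assumption)."""
    # str.title() may apply SpecialCasing.txt and return several
    # codepoints (e.g. U+00DF -> "Ss"); the RFC uses only the single
    # UnicodeData.txt titlecase field, so keep the codepoint in that case.
    mapped = ch.title()
    return mapped if len(mapped) == 1 else ch


def decompose(ch: str) -> str:
    """Step (2)(b): recursive decomposition of any type."""
    fields = unicodedata.decomposition(ch).split()
    if fields and fields[0].startswith("<"):
        fields = fields[1:]  # drop a compatibility tag such as "<compat>"
    if not fields:
        return ch  # no decomposition property: keep the codepoint
    return "".join(decompose(chr(int(f, 16))) for f in fields)


def prepare(data: bytes, charset: Optional[str]) -> bytes:
    """Return the "titlecased canonicalized UTF-8" form of data.

    Falls back to the original octets (step (1)(b)) when the charset
    is unknown or the input is invalid in that charset.
    """
    if charset is None:
        return data
    try:
        # Step (1)(a): strict decoding rejects e.g. overlong UTF-8.
        text = data.decode(charset)
        # Steps (2)-(4): titlecase, decompose, append each result as UTF-8.
        return "".join(decompose(titlecase(ch)) for ch in text).encode("utf-8")
    except (LookupError, UnicodeError):
        return data


def compare(a: bytes, a_charset: Optional[str],
            b: bytes, b_charset: Optional[str]) -> int:
    """i;octet ordering of the prepared strings: -1, 0, or 1."""
    pa, pb = prepare(a, a_charset), prepare(b, b_charset)
    return (pa > pb) - (pa < pb)


if __name__ == "__main__":
    # The section 2 example: U+01C4 prepares to U+0044 U+007A U+030C.
    print(prepare("\u01c4".encode("utf-8"), "utf-8"))
    # Mixed charsets compare equal after preparation.
    print(compare("Grüße".encode("latin-1"), "iso-8859-1",
                  "grüße".encode("utf-8"), "utf-8"))
```

Equality and substring tests work directly on the prepared bytes (`==`, `in`), and Python orders `bytes` octet by octet as unsigned values, which matches i;octet. Python's `str` case methods do not consult the process locale, which sidesteps the strcasecmp/toupper pitfall described above. Swapping `decompose` for `unicodedata.normalize("NFKD", ...)` would also reorder combining marks, which item 2 of section 4 permits but which can change results for some inputs.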