idnits 2.17.00 (12 Aug 2021)

/tmp/idnits59311/draft-ietf-idnabis-mappings-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** The document seems to lack a License Notice according IETF Trust
     Provisions of 28 Dec 2009, Section 6.b.i or Provisions of 12 Sep 2009
     Section 6.b -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

     (You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Feb 2009 rather than one of the newer Notices. See
     https://trustee.ietf.org/license-info/.)

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to contain a disclaimer for pre-RFC5378 work, but was
     first submitted on or after 10 November 2008. The disclaimer is usually
     necessary only for documents that revise or obsolete older RFCs, and that
     take significant amounts of text from those RFCs. If you can contact all
     authors of the source material and they are willing to grant the BCP78
     rights to the IETF Trust, you can and should remove the disclaimer.
     Otherwise, the disclaimer is needed and you can ignore this comment.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 3, 2009) is 4704 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: draft-ietf-idnabis-protocol has been published as
     RFC 5891

  ** Obsolete normative reference: RFC 3490 (Obsoleted by RFC 5890, RFC 5891)

  ** Obsolete normative reference: RFC 3491 (Obsoleted by RFC 5891)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode51'

     Summary: 3 errors (**), 0 flaws (~~), 3 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

IDNABIS                                                  P. Resnick, Ed.
Internet-Draft                                     Qualcomm Incorporated
Intended status: Standards Track                              P. Hoffman
Expires: January 4, 2010                                  VPN Consortium
                                                            July 3, 2009

                       Mapping Characters in IDNA
                     draft-ietf-idnabis-mappings-01

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79. This document may contain material
   from IETF Documents or IETF Contributions published or made publicly
   available before November 10, 2008. The person(s) controlling the
   copyright in some of this material may not have granted the IETF
   Trust the right to allow modifications of such material outside the
   IETF Standards Process. Without obtaining an adequate license from
   the person(s) controlling the copyright in such materials, this
   document may not be modified outside the IETF Standards Process, and
   derivative works of it may not be created outside the IETF Standards
   Process, except to format it for publication as an RFC or to
   translate it into languages other than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.
   Note that other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 4, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   In the original version of the Internationalized Domain Names in
   Applications (IDNA) protocol, any Unicode code points taken from user
   input were mapped into a set of Unicode code points that "make
   sense", which were then encoded and passed to the domain name system
   (DNS). The current version of IDNA presumes that the input to the
   protocol comes from a set of "permitted" code points, which it then
   encodes and passes to the DNS, but does not specify what to do with
   the result of user input. This document describes the actions taken
   by an implementation between user input and passing permitted code
   points to the new IDNA protocol.

1.
Introduction

   This document describes the operations that can be applied to user
   input in order to get it into a form acceptable to the
   Internationalized Domain Names in Applications (IDNA) protocol
   [I-D.ietf-idnabis-protocol]. The document describes the underlying
   architectural principles (in section 2) and the general
   implementation procedure (in section 3).

   It should be noted that this document does not specify the behavior
   of a protocol that appears "on the wire". It describes an operation
   that is to be applied to user input in order to prepare that user
   input for use in an "on the network" protocol. As unusual as this
   may be for an IETF protocol document, it is a necessary operation to
   maintain interoperability.

2. Architectural Principles

   An application that implements the IDNA protocol
   [I-D.ietf-idnabis-protocol] will always take any user input and
   convert it to a set of Unicode code points. That user input may be
   acquired by any of several different input methods, all with
   differing conversion processes to be taken into consideration (e.g.,
   typed on a keyboard, written by hand onto some sort of digitizer,
   spoken into a microphone and interpreted by a speech-to-text engine,
   etc.). The process of taking any particular user input and mapping
   it into a Unicode code point may be a simple one: If a user strikes
   the "A" key on a US English keyboard, without any modifiers such as
   the "Shift" key held down, in order to draw a Latin small letter A
   ("a"), many (perhaps most) modern operating system input methods will
   return to the calling application the code point U+0061, encoded in
   a single octet.

   Sometimes the process is somewhat more complicated: a user might
   strike a particular set of keys to represent a combining macron
   followed by striking the "A" key in order to draw a Latin small
   letter A with a macron above it.
   Depending on the operating system, the input method chosen by the
   user, and even the parameters with which the application
   communicates with the input method, the result might be the code
   point U+0101 (encoded as two octets in UTF-8 or UTF-16, four octets
   in UTF-32, etc.), the code point U+0061 followed by the code point
   U+0304 (again, encoded in three or more octets, depending upon the
   encoding used) or even the code point U+FF41 followed by the code
   point U+0304 (and encoded in some form). And these examples leave
   aside the issue of operating systems and input methods that do not
   use Unicode code points for their character set.

   In every case, applications (with the help of the operating systems
   on which they run and the input methods used) need to perform a
   mapping from user input into Unicode code points.

   The original version of the IDNA protocol [RFC3490] used a model
   whereby input was taken from the user, mapped (via whatever input
   method mechanisms were used) to a set of Unicode code points, and
   then further mapped to a set of Unicode code points using the
   Nameprep profile specified in [RFC3491]. In this procedure, there
   are two separate mapping steps: First, a mapping done by the input
   method (which might be controlled by the operating system, the
   application, or some combination) and then a second mapping performed
   by the Nameprep portion of the IDNA protocol. The mapping done in
   Nameprep includes a particular mapping table to re-map some
   characters to other characters, a particular normalization, and a set
   of prohibited characters.

   Note that the result of the two step mapping process means that the
   mapping chosen by the operating system or application in the first
   step might differ significantly from the mapping supplied by the
   Nameprep profile in the second step. This has advantages and
   disadvantages.
   Of course, the second mapping regularizes what gets looked up in the
   DNS, making for better interoperability between implementations that
   use the Nameprep mapping. However, the application or operating
   system may choose mappings in its input methods that, when passed
   through the second (Nameprep) mapping, result in characters that are
   "surprising" to the end user.

   The other important feature of the original version of the IDNA
   protocol is that, with very few exceptions, it assumes that any set
   of Unicode code points provided to the Nameprep mapping can be mapped
   into a string of Unicode code points that are "sensible", even if
   that means mapping some code points to nothing (that is, removing the
   code points from the string). This allowed maximum flexibility in
   input strings.

   The present version of IDNA differs significantly in approach from
   the original version. First and foremost, it does not provide
   explicit mapping instructions. Instead, it assumes that the
   application (perhaps via an operating system input method) will do
   whatever mapping it requires to convert input into Unicode code
   points. This has the advantage of giving flexibility to the
   application to choose a mapping that is suitable for its user given
   specific user requirements, and avoids the two-step mapping of the
   original protocol. Instead of a mapping, the current version of IDNA
   provides a set of categories that can be used to specify the valid
   code points allowed in a domain name.

   In principle, an application ought to take user input of a domain
   name and convert it to the set of Unicode code points that represent
   the domain name the user intends. As a practical matter, of course,
   determining user intent is a tricky business, so an application needs
   to choose a reasonable mapping from user input.
   That may differ based on the particular circumstances of a user,
   depending on locale, language, type of input method, etc. It is up
   to the application to make a reasonable choice.

3. The General Procedure

   This section defines a general algorithm that applications ought to
   implement in order to produce Unicode code points that will be valid
   under the IDNA protocol. An application might implement the full
   mapping as described below, or can choose a different mapping. In
   fact, an application might want to implement a full mapping that is
   substantially compatible with the original IDNA protocol instead of
   the algorithm given here.

   The general algorithm that an application (or the input method
   provided by an operating system) ought to use is relatively
   straightforward and generally follows section 5 of
   [I-D.ietf-idnabis-protocol]:

   1. All characters are mapped using Unicode Normalization Form C
      (NFC).

   2. Upper case characters are mapped to their lower case equivalents
      by using the algorithm for mapping Unicode characters.

   3. Full-width and half-width characters (those defined with
      Decomposition Types and ) are mapped to their
      decomposition mappings as shown in the Unicode character
      database.

   Definitions for the rules in this algorithm can be found in
   [Unicode51]. Specifically:

   o Unicode Normalization Form C can be found in Annex #15 of
     [Unicode51].

   o In order to map upper case characters to their lower case
     equivalents (defined in section 3.13 of [Unicode51]), first map
     characters to the "Lowercase_Mapping" property (the ""
     entry in the second column) in
     , if any.
     Then, map characters to the "Simple_Lowercase_Mapping" property
     (the fourteenth column) in
     , if any.

   o In order to map full-width and half-width characters to their
     decomposition mappings, map any character whose
     "Decomposition_Type" (contained in the first part of the sixth
     column) in
     is either "" or "" to the "Decomposition_Mapping" of
     that character (contained in the second part of the sixth column)
     in .

   o The web page has
     useful descriptions of the contents of these files.

   If the mappings in this document are applied to versions of Unicode
   later than Unicode 5.1, the later versions of the Unicode Standard
   should be consulted.

   These are a minimal set of mappings that an application should
   strongly consider doing. Of course, there are many others that might
   be done.

4. IANA Considerations

   This memo includes no request to IANA.

5. Security Considerations

   This document suggests creating mappings that might cause confusion
   for some users while alleviating confusion in other users. Such
   confusion is not covered in any depth in this document (nor in the
   other IDNA-related documents).

6. Normative References

   [I-D.ietf-idnabis-protocol]
              Klensin, J., "Internationalized Domain Names in
              Applications (IDNA): Protocol",
              draft-ietf-idnabis-protocol-12 (work in progress),
              May 2009.

   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
              "Internationalizing Domain Names in Applications (IDNA)",
              RFC 3490, March 2003.

   [RFC3491]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
              Profile for Internationalized Domain Names (IDN)",
              RFC 3491, March 2003.

   [Unicode51]
              The Unicode Consortium, "The Unicode Standard, Version
              5.1.0", 2008.

              defined by: The Unicode Standard, Version 5.0, Boston, MA,
              Addison-Wesley, 2007, ISBN 0-321-48091-0, as amended by
              Unicode 5.1.0
              ().

Authors' Addresses

   Peter W.
   Resnick (editor)
   Qualcomm Incorporated
   5775 Morehouse Drive
   San Diego, CA 92121-1714
   US

   Phone: +1 858 651 4478
   Email: presnick@qualcomm.com
   URI:   http://www.qualcomm.com/~presnick/

   Paul Hoffman
   VPN Consortium
   127 Segre Place
   Santa Cruz, CA 95060
   US

   Phone: 1-831-426-9827
   Email: paul.hoffman@vpnc.org
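
Illustrative sketch of the general procedure in section 3

   The three steps of section 3 can be sketched in a few lines of
   Python. This is a minimal illustration under stated assumptions,
   not a normative implementation. The function names map_width and
   map_user_input are hypothetical. Python's unicodedata module uses
   the Unicode version bundled with the interpreter rather than
   Unicode 5.1, and str.lower() is used here as a stand-in for the
   two-stage lower case lookup that section 3 describes, so results
   may differ in context-sensitive cases. The sketch relies on
   unicodedata.decomposition() reporting the tags "<wide>" and
   "<narrow>" for full-width and half-width characters.

```python
import unicodedata

# Tags that unicodedata.decomposition() reports for full-width and
# half-width compatibility characters.
WIDTH_TAGS = ("<wide>", "<narrow>")


def map_width(ch: str) -> str:
    """Map one full-width or half-width character to its decomposition
    mapping; return any other character unchanged."""
    parts = unicodedata.decomposition(ch).split()
    if parts and parts[0] in WIDTH_TAGS:
        # The remaining fields are hexadecimal code points.
        return "".join(chr(int(cp, 16)) for cp in parts[1:])
    return ch


def map_user_input(text: str) -> str:
    """Hypothetical helper: apply steps 1-3 in the order section 3 lists."""
    text = unicodedata.normalize("NFC", text)     # step 1: NFC
    text = text.lower()                           # step 2: lower case (stand-in)
    return "".join(map_width(ch) for ch in text)  # step 3: width mapping
```

   For example, map_user_input("\uFF21") (FULLWIDTH LATIN CAPITAL
   LETTER A) yields "a", and map_user_input("A\u0304") yields the
   single code point U+0101.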