idnits 2.17.00 (12 Aug 2021)

/tmp/idnits60074/draft-ietf-idnabis-mappings-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** The document seems to lack a License Notice according IETF Trust
     Provisions of 28 Dec 2009, Section 6.b.i or Provisions of 12 Sep 2009
     Section 6.b -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

     (You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Feb 2009 rather than one of the newer Notices.  See
     https://trustee.ietf.org/license-info/.)

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to contain a disclaimer for pre-RFC5378 work, but was
     first submitted on or after 10 November 2008.  The disclaimer is usually
     necessary only for documents that revise or obsolete older RFCs, and that
     take significant amounts of text from those RFCs.  If you can contact all
     authors of the source material and they are willing to grant the BCP78
     rights to the IETF Trust, you can and should remove the disclaimer.
     Otherwise, the disclaimer is needed and you can ignore this comment.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (May 25, 2009) is 4743 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: draft-ietf-idnabis-protocol has been published as
     RFC 5891

  ** Obsolete normative reference: RFC 3454 (Obsoleted by RFC 7564)

  ** Obsolete normative reference: RFC 3490 (Obsoleted by RFC 5890, RFC 5891)

  ** Obsolete normative reference: RFC 3491 (Obsoleted by RFC 5891)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode51'

     Summary: 4 errors (**), 0 flaws (~~), 3 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

IDNABIS                                                  P. Resnick, Ed.
Internet-Draft                                     Qualcomm Incorporated
Intended status: Standards Track                            May 25, 2009
Expires: November 26, 2009

                       Mapping Characters in IDNA
                     draft-ietf-idnabis-mappings-00

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.  This document may contain material
   from IETF Documents or IETF Contributions published or made publicly
   available before November 10, 2008.  The person(s) controlling the
   copyright in some of this material may not have granted the IETF
   Trust the right to allow modifications of such material outside the
   IETF Standards Process.  Without obtaining an adequate license from
   the person(s) controlling the copyright in such materials, this
   document may not be modified outside the IETF Standards Process, and
   derivative works of it may not be created outside the IETF Standards
   Process, except to format it for publication as an RFC or to
   translate it into languages other than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on November 26, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   In the original version of the Internationalized Domain Names in
   Applications (IDNA) protocol, any Unicode code points taken from user
   input were mapped into a set of Unicode code points that "make
   sense", which were then encoded and passed to the domain name system
   (DNS).  The current version of IDNA presumes that the input to the
   protocol comes from a set of "permitted" code points, which it then
   encodes and passes to the DNS, but does not specify what to do with
   the result of user input.  This document specifies the actions taken
   by an implementation between user input and passing permitted code
   points to the new IDNA protocol.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Architectural Principles
   3.  The General Procedure
   4.  IANA Considerations
   5.  Security Considerations
   Appendix A.  Backwards-compatible Mapping Algorithm
   Appendix B.  Acknowledgements
   6.  Normative References
   Author's Address

1.  Introduction

   This document specifies the operations that applications apply to
   user input in order to get it into a form acceptable to the
   Internationalized Domain Names in Applications (IDNA) protocol
   [I-D.ietf-idnabis-protocol].  The document describes the
   architectural principles that underlie this function in section 2,
   describes a general procedure that an application SHOULD implement in
   section 3, and specifies an algorithm and mapping that an application
   MAY implement in order to remain reasonably backward compatible with
   the original version of the IDNA protocol in appendix A.

   Note that this document does NOT specify the behavior of a protocol
   that appears "on the wire".  It specifies an operation that is to be
   applied to user input in order to prepare that user input for use in
   an "on the network" protocol.  As unusual as this may be for an IETF
   protocol document, it is a necessary operation to maintain
   interoperability.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Architectural Principles

   An application that implements the IDNA protocol
   [I-D.ietf-idnabis-protocol] must take user input and convert that
   input to a set of Unicode code points.  That user input might be
   typed on a keyboard, written by hand onto some sort of digitizer,
   spoken into a microphone and interpreted by a speech-to-text engine,
   or otherwise.  The process of taking any particular user input and
   mapping it into a Unicode code point may be a simple one: If a user
   strikes the "A" key on a US English keyboard, without any modifiers
   such as the "Shift" key held down, in order to draw a Latin small
   letter A ("a"), many (perhaps most) modern operating system input
   methods will return to the calling application the code point
   U+0061, encoded in a single octet.  Sometimes the process is somewhat
   more complicated: A user might strike a particular set of keys to
   represent a combining macron followed by striking the "A" key in
   order to draw a Latin small letter A with a macron above it.
   Depending on the operating system, the input method chosen by the
   user, and even the parameters with which the application communicates
   with the input method, the result might be the code point U+0101
   (encoded as two octets in UTF-8 or UTF-16, four octets in UTF-32,
   etc.), the code point U+0061 followed by the code point U+0304
   (again, encoded in three or more octets, depending upon the encoding
   used), or even the code point U+FF41 followed by the code point
   U+0304 (and encoded in some form).  And these examples leave aside
   the issue of operating systems and input methods that do not use
   Unicode code points for their character set.  In every case,
   applications (with the help of the operating systems on which they
   run and the input methods used) MUST perform a mapping from user
   input into Unicode code points.
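
   As an illustration only, the following Python sketch takes the three
   code point sequences mentioned above for "Latin small letter A with
   macron" and shows how many octets each occupies in UTF-8 and what
   each becomes under Unicode Normalization Forms C and KC.  The names
   CANDIDATES, describe, and utf8_octets are hypothetical helpers
   introduced for this example; they are not part of any IDNA
   specification.

```python
import unicodedata

# The three code point sequences the paragraph above gives for
# "Latin small letter A with macron".  Names are illustrative only.
CANDIDATES = {
    "precomposed": "\u0101",
    "a + combining macron": "\u0061\u0304",
    "fullwidth a + combining macron": "\uff41\u0304",
}


def describe(text: str) -> str:
    """Return the code points of text in U+XXXX notation."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def utf8_octets(text: str) -> int:
    """Return the number of octets text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    for name, text in CANDIDATES.items():
        nfc = unicodedata.normalize("NFC", text)
        nfkc = unicodedata.normalize("NFKC", text)
        print(f"{name}: {describe(text)} ({utf8_octets(text)} octets in UTF-8)")
        print(f"  NFC  -> {describe(nfc)}")
        print(f"  NFKC -> {describe(nfkc)}")
```

   With Python's standard unicodedata module, NFC composes U+0061
   U+0304 into U+0101 but leaves the full-width sequence U+FF41 U+0304
   unchanged; only the compatibility form (NFKC) folds all three inputs
   to U+0101.  This is one way to see why the choice of mapping matters
   for what is eventually looked up.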

   The original version of the IDNA protocol [RFC3490] used a model
   whereby input was taken from the user, mapped (via whatever input
   method mechanisms were used) to a set of Unicode code points, and
   then further mapped to a set of Unicode code points using the
   Nameprep profile specified in [RFC3491].  In this procedure, there
   are two separate mapping steps: first, a mapping done by the input
   method (which might be controlled by the operating system, the
   application, or some combination), and then a second mapping
   performed by the Nameprep portion of the IDNA protocol.  The mapping
   done in Nameprep includes a particular mapping table to re-map some
   characters to other characters, a particular normalization, and a
   set of prohibited characters.

   Note that, because of the two-step mapping process, the mapping
   chosen by the operating system or application in the first step
   might differ significantly from the mapping supplied by the Nameprep
   profile in the second step.  This has advantages and disadvantages.
   On the one hand, the second mapping regularizes what gets looked up
   in the DNS, making for better interoperability between
   implementations that use the Nameprep mapping.  On the other hand,
   the application or operating system may choose mappings in its input
   methods that, when passed through the second (Nameprep) mapping,
   result in characters that are "surprising" to the end user.

   The other important feature of the original version of the IDNA
   protocol is that, with very few exceptions, it assumes that any set
   of Unicode code points provided to the Nameprep mapping can be mapped
   into a string of Unicode code points that are "sensible", even if
   that means mapping some code points to nothing (that is, removing the
   code points from the string).  This allowed maximum flexibility in
   input strings.
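
   The second (Nameprep) mapping step described above can be sketched
   with Python's standard stringprep module, which exposes the RFC 3454
   tables as functions.  This is a minimal sketch under stated
   assumptions, not a conforming Nameprep implementation: the function
   name nameprep_sketch and the tuple PROHIBITED_TABLES are
   hypothetical, the bidirectional-text check of the profile is
   omitted, and normalization uses the Unicode version shipped with the
   interpreter rather than the version that Nameprep specifies.

```python
import stringprep
import unicodedata

# Prohibition tables listed by the Nameprep profile (RFC 3491, section 5).
PROHIBITED_TABLES = (
    stringprep.in_table_c12,
    stringprep.in_table_c22,
    stringprep.in_table_c3,
    stringprep.in_table_c4,
    stringprep.in_table_c5,
    stringprep.in_table_c6,
    stringprep.in_table_c7,
    stringprep.in_table_c8,
    stringprep.in_table_c9,
)


def nameprep_sketch(label: str) -> str:
    """Apply the map, normalize, and prohibit steps to one label.

    Hypothetical helper for illustration; the bidi check is omitted.
    """
    # Map: drop table B.1 characters ("mapped to nothing") and apply
    # the table B.2 case folding to everything else.
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in label
        if not stringprep.in_table_b1(ch)
    )
    # Normalize: Unicode Normalization Form KC (interpreter's Unicode
    # version, which is an assumption of this sketch).
    normalized = unicodedata.normalize("NFKC", mapped)
    # Prohibit: reject the label if any prohibited character remains.
    for ch in normalized:
        if any(in_table(ch) for in_table in PROHIBITED_TABLES):
            raise ValueError(f"prohibited code point U+{ord(ch):04X}")
    return normalized
```

   For example, nameprep_sketch("Ex\u00adample") removes the soft
   hyphen (U+00AD, a table B.1 character) and folds case, and
   nameprep_sketch("stra\u00dfe") rewrites U+00DF as "ss", which is the
   kind of result a user may find "surprising".  A label containing a
   private-use code point such as U+E000 is rejected.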

   The present version of IDNA differs significantly in approach from
   the original version.  First and foremost, it does not provide
   explicit mapping instructions.  Instead, it assumes that the
   application (perhaps via an operating system input method) will do
   whatever mapping it requires to convert input into Unicode code
   points.  This gives the application the flexibility to choose a
   mapping suited to its users' specific requirements, and it avoids
   the two-step mapping of the original protocol.  Instead of a
   mapping, the current version of IDNA provides a set of categories
   that can be used to specify the valid code points allowed in a
   domain name.

   In principle, an application ought to take user input of a domain
   name and convert it to the set of Unicode code points that represent
   the domain name the user _intends_.  As a practical matter, of
   course, determining user desires is a tricky business, so an
   application needs to choose a reasonable mapping from user input.
   That may differ based on the particular circumstances of a user,
   depending on locale, language, type of input method, etc.  It is up
   to the application to make a reasonable choice.

   In the next section, this document specifies a general algorithm that
   applications SHOULD implement in order to produce Unicode code points
   that will be valid under the IDNA protocol.  Then, in appendix A, a
   full mapping is specified that is substantially compatible with the
   original IDNA protocol.  An application MAY implement the full
   mapping or MAY choose a different mapping.

3.  The General Procedure

   The general algorithm that an application (or the input method
   provided by an operating system) should use is relatively
   straightforward and generally follows section 5 of
   [I-D.ietf-idnabis-protocol]:

   1.  All characters are mapped using Unicode Normalization Form C
       (NFC).  [Unicode51]

   2.  Capital (upper case) characters are mapped to their small (lower
       case) equivalents.  [[anchor2: Need reference to "toLowerCase"]]

   3.  Full-width and half-width CJK characters are mapped to their
       equivalents.  [[anchor3: Handwaving for how that's supposed to
       happen]]

   These are the minimal mappings that an application SHOULD do.  Of
   course, there are many others that MAY be done.  In particular, a
   mapping that is substantially compatible with [RFC3490] appears below
   in appendix A.

4.  IANA Considerations

   This memo includes no request to IANA.

5.  Security Considerations

Appendix A.  Backwards-compatible Mapping Algorithm

   The following mapping is mostly backwards-compatible with the
   original version of the IDNA protocol [RFC3490].  One important
   change is that the original IDNA specification mapped to nothing
   some characters that the current IDNA specification permits.  Those
   characters are not re-mapped in this algorithm.

   [[anchor4: This is filler; needs to be completed.]]

   1.  Map using tables B.1 and B.2 from [RFC3454].

   2.  Normalize using Unicode Normalization Form KC.  [Unicode51]

   3.  Prohibit using tables C.1.2, C.3, C.4, C.5, C.6, C.7, C.8, and
       C.9 from [RFC3454].

Appendix B.  Acknowledgements

6.  Normative References

   [I-D.ietf-idnabis-protocol]
              Klensin, J., "Internationalized Domain Names in
              Applications (IDNA): Protocol",
              draft-ietf-idnabis-protocol-12 (work in progress),
              May 2009.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3454]  Hoffman, P. and M. Blanchet, "Preparation of
              Internationalized Strings ("stringprep")", RFC 3454,
              December 2002.

   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
              "Internationalizing Domain Names in Applications (IDNA)",
              RFC 3490, March 2003.

   [RFC3491]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
              Profile for Internationalized Domain Names (IDN)",
              RFC 3491, March 2003.

   [Unicode51]
              The Unicode Consortium, "The Unicode Standard, Version
              5.1.0", 2008.

              defined by: The Unicode Standard, Version 5.0, Boston, MA,
              Addison-Wesley, 2007, ISBN 0-321-48091-0, as amended by
              Unicode 5.1.0
              (http://www.unicode.org/versions/Unicode5.1.0/).

Author's Address

   Peter W. Resnick (editor)
   Qualcomm Incorporated
   5775 Morehouse Drive
   San Diego, CA  92121-1714
   US

   Phone: +1 858 651 4478
   Email: presnick@qualcomm.com
   URI:   http://www.qualcomm.com/~presnick/