Internet Engineering Task Force                       K. K. Ramakrishnan
INTERNET DRAFT                                        TeraOptic Networks
draft-ietf-tsvwg-ecn-03.txt                                  Sally Floyd
                                                                   ACIRI
                                                                D. Black
                                                                     EMC
                                                             March, 2001
                                                Expires: September, 2001

      The Addition of Explicit Congestion Notification (ECN) to IP

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document specifies the incorporation of ECN (Explicit Congestion
   Notification) into TCP and IP, including ECN's use of two bits in the
   IP header.  We begin by describing TCP's use of packet drops as an
   indication of congestion.  Next we explain that with the addition of
   active queue management (e.g., RED) to the Internet infrastructure,
   where routers detect congestion before the queue overflows, routers
   are no longer limited to packet drops as an indication of congestion.
   Routers can instead set the Congestion Experienced (CE) codepoint in
   the IP header of packets from ECN-capable transports.  We describe
   when the CE codepoint is to be set in routers, and describe
   modifications needed to TCP to make it ECN-capable.  Modifications to
   other transport protocols (e.g., unreliable unicast or multicast,
   reliable multicast, other reliable unicast transport protocols) could
   be considered as those protocols are developed and advance through
   the standards process.

   We also describe in this document the issues involving the use of ECN
   within IP tunnels, and within IPsec tunnels in particular.

   One of the guiding principles for this document is that all the
   mechanisms specified here are incrementally deployable.

Table of Contents

   1. Introduction
   2. Conventions and Acronyms
   3. Assumptions and General Principles
   4. Active Queue Management (AQM)
   5. Explicit Congestion Notification in IP
   5.1. ECN as an Indication of Persistent Congestion
   5.2. Dropped or Corrupted Packets
   5.3. Fragmentation
   6. Support from the Transport Protocol
   6.1. TCP
   6.1.1. TCP Initialization
   6.1.1.1. Robust TCP Initialization with an Echoed Reserve Field
   6.1.2. The TCP Sender
   6.1.3. The TCP Receiver
   6.1.4. Congestion on the ACK-path
   6.1.5. Retransmitted TCP packets
   6.1.6. TCP Window Probes
   7. Non-compliance by the End Nodes
   8. Non-compliance in the Network
   8.1. Complications Introduced by Split Paths
   9. Encapsulated Packets
   9.1. IP packets encapsulated in IP
   9.1.1. The Limited-functionality and Full-functionality Options
   9.1.2. Changes to the ECN Field within an IP Tunnel
   9.2. IPsec Tunnels
   9.2.1. Negotiation between Tunnel Endpoints
   9.2.1.1. ECN Tunnel Security Association Database Field
   9.2.1.2. ECN Tunnel Security Association Attribute
   9.2.1.3. Changes to IPsec Tunnel Header Processing
   9.2.2. Changes to the ECN Field within an IPsec Tunnel
   9.2.3. Comments for IPsec Support
   9.3. IP packets encapsulated in non-IP packet headers
   10. Issues Raised by Monitoring and Policing Devices
   11. Evaluations of ECN
   11.1. Related Work Evaluating ECN
   11.2. A Discussion of the ECN nonce
   11.2.1. The Incremental Deployment of ECT(1) in Routers
   12. Summary of changes required in IP and TCP
   13. Conclusions
   14. Acknowledgements
   15. References
   16. Security Considerations
   17. IPv4 Header Checksum Recalculation
   18. Possible Changes to the ECN Field in the Network
   18.1. Possible Changes to the IP Header
   18.1.1. Erasing the Congestion Indication
   18.1.2. Falsely Reporting Congestion
   18.1.3. Disabling ECN-Capability
   18.1.4. Falsely Indicating ECN-Capability
   18.2. Information carried in the Transport Header
   18.3. Split Paths
   19. Implications of Subverting End-to-End Congestion Control
   19.1. Implications for the Network and for Competing Flows
   19.2. Implications for the Subverted Flow
   19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion
         Control
   20. The Motivation for the ECT Codepoints
   20.1. The Motivation for an ECT Codepoint
   20.2. The Motivation for two ECT Codepoints
   21. Why use Two Bits in the IP Header?
   22. Historical Definitions for the IPv4 TOS Octet
   23. IANA Considerations

   RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - To compare
   this with draft-ietf-tsvwg-ecn-02, compare the following:
   "http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-02.troff"
   "http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-03.troff"
   Changes from draft-ietf-tsvwg-ecn-02:
      Revised Section 5.3 on fragmentation.
   Changes from draft-ietf-tsvwg-ecn-01:
      Added the ECT(1) codepoint, and changed references about bits to
      references about codepoints in many places.  Also added Section
      11.2 on "A Discussion of the ECN nonce", and Section 20.2 on "The
      Motivation for two ECT Codepoints".
      Added a paragraph saying that by default, the discussion of setting
      the CE codepoint applies to all Differentiated Services Per-Hop
      Behaviors.
      Added Section 5.3 on fragmentation.
      Added "A host MUST NOT set ECT on SYN or SYN-ACK packets." to the
      end of Section 6.1.1, just to be explicit.
      Corrected some references to "Section 19" to "Section 22".
      Clarified that ECN is defined identically in IPv4 and in IPv6.

1. Introduction

   TCP's congestion control and avoidance algorithms are based on the
   notion that the network is a black-box [Jacobson88, Jacobson90].  The
   network's state of congestion or otherwise is determined by
   end-systems probing for the network state, by gradually increasing
   the load on the network (by increasing the window of packets that are
   outstanding in the network) until the network becomes congested and a
   packet is lost.  Treating the network as a "black-box" and treating
   loss as an indication of congestion in the network is appropriate for
   pure best-effort data carried by TCP, with little or no sensitivity
   to delay or loss of individual packets.
   In addition, TCP's congestion management algorithms have techniques
   built in (such as Fast Retransmit and Fast Recovery) to minimize the
   impact of losses, from a throughput perspective.  However, these
   mechanisms are not intended to help applications that are in fact
   sensitive to the delay or loss of one or more individual packets.
   Interactive traffic such as telnet, web-browsing, and transfer of
   audio and video data can be sensitive to packet losses (especially
   when using an unreliable data delivery transport such as UDP) or to
   the increased latency of the packet caused by the need to retransmit
   the packet after a loss (with the reliable data delivery semantics
   provided by TCP).

   Because TCP determines the appropriate congestion window to use by
   gradually increasing the window size until it experiences a dropped
   packet, the queues at the bottleneck router build up.  With packet
   drop policies at the router that are not sensitive to the load placed
   by each individual flow (e.g., tail-drop on queue overflow), some of
   the packets of latency-sensitive flows may be dropped.  In addition,
   such drop policies lead to synchronization of loss across multiple
   flows.

   Active queue management mechanisms detect congestion before the queue
   overflows, and provide an indication of this congestion to the end
   nodes.  Thus, active queue management can reduce unnecessary queueing
   delay for all traffic sharing that queue.  The advantages of active
   queue management are discussed in RFC 2309 [RFC2309].  Active queue
   management avoids some of the bad properties of dropping on queue
   overflow, including the undesirable synchronization of loss across
   multiple flows.
   More importantly, active queue management means that transport
   protocols with mechanisms for congestion control (e.g., TCP) do not
   have to rely on buffer overflow as the only indication of congestion.

   Active queue management mechanisms may use one of several methods for
   indicating congestion to end-nodes.  One is to use packet drops, as is
   currently done.  However, active queue management allows the router to
   separate policies of queueing or dropping packets from the policies
   for indicating congestion.  Thus, active queue management allows
   routers to use the Congestion Experienced (CE) codepoint in a packet
   header as an indication of congestion, instead of relying solely on
   packet drops.  This has the potential of reducing the impact of loss
   on latency-sensitive flows.

   This document is intended to obsolete RFC 2481, "A Proposal to add
   Explicit Congestion Notification (ECN) to IP", which defined ECN as
   an Experimental Protocol for the Internet Community.

   RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - This
   document obsoletes three subsequent internet-drafts on ECN, "IPsec
   Interactions with ECN", "ECN Interactions with IP Tunnels", and "TCP
   with ECN: The Treatment of Retransmitted Data Packets".  This
   document is intended largely to merge the earlier documents into a
   single document, for greater clarity, in preparation for becoming a
   Proposed Standard.

2. Conventions and Acronyms

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
   document, are to be interpreted as described in [B97].

3. Assumptions and General Principles

   In this section, we describe some of the important design principles
   and assumptions that guided the design choices in this proposal.

   * Because ECN is likely to be adopted gradually, accommodating
     migration is essential.
     Some routers may still only drop packets to indicate congestion,
     and some end-systems may not be ECN-capable.  The most viable
     strategy is one that accommodates incremental deployment without
     having to resort to "islands" of ECN-capable and non-ECN-capable
     environments.
   * New mechanisms for congestion control and avoidance need to
     co-exist and cooperate with existing mechanisms for congestion
     control.  In particular, new mechanisms have to co-exist with TCP's
     current methods of adapting to congestion and with routers' current
     practice of dropping packets in periods of congestion.
   * Congestion may persist over different time-scales.  The time
     scales that concern us here are those of congestion events that
     may last longer than a round-trip time.
   * The number of packets in an individual flow (e.g., TCP connection
     or an exchange using UDP) may range from a small number of packets
     to quite a large number.  We are interested in managing the
     congestion caused by flows that send enough packets so that they
     are still active when network feedback reaches them.
   * Asymmetric routing is likely to be a normal occurrence in the
     Internet.  The path (sequence of links and routers) followed by
     data packets may be different from the path followed by the
     acknowledgment packets in the reverse direction.
   * Many routers process the "regular" headers in IP packets more
     efficiently than they process the header information in IP
     options.  This suggests keeping congestion experienced information
     in the regular headers of an IP packet.
   * It must be recognized that not all end-systems will cooperate in
     mechanisms for congestion control.  However, new mechanisms
     shouldn't make it easier for TCP applications to disable TCP
     congestion control.  The benefit of lying about participating in
     new mechanisms such as ECN-capability should be small.

4. Active Queue Management (AQM)

   Random Early Detection (RED) is one mechanism for Active Queue
   Management (AQM) that has been proposed to detect incipient
   congestion [FJ93], and is currently being deployed in the Internet
   [RFC2309].  AQM is meant to be a general mechanism using one of
   several alternatives for congestion indication, but in the absence of
   ECN, AQM is restricted to using packet drops as a mechanism for
   congestion indication.  AQM drops packets based on the average queue
   length exceeding a threshold, rather than only when the queue
   overflows.  Because AQM may drop packets before the queue actually
   overflows, AQM is not always forced by memory limitations to discard
   the packet.

   AQM can set a Congestion Experienced (CE) codepoint in the packet
   header instead of dropping the packet, when such a field is provided
   in the IP header and understood by the transport protocol.  The use
   of the CE codepoint with ECN allows the receiver(s) to receive the
   packet, avoiding the potential for excessive delays due to
   retransmissions after packet losses.  We use the term 'CE packet' to
   denote a packet that has the CE codepoint set.

5. Explicit Congestion Notification in IP

   This document specifies that the Internet provide a congestion
   indication for incipient congestion (as in RED and earlier work
   [RJ90]) where the notification can sometimes be through marking
   packets rather than dropping them.  This uses an ECN field in the IP
   header with two bits, making four ECN codepoints, '00' to '11'.  The
   ECN-Capable Transport (ECT) codepoints '10' and '01' are set by the
   data sender to indicate that the end-points of the transport protocol
   are ECN-capable; we call them ECT(0) and ECT(1) respectively.  The
   phrase "the ECT codepoint" in this document refers to either of the
   two ECT codepoints.  Routers treat the ECT(0) and ECT(1) codepoints
   as equivalent.
   Senders are free to use either the ECT(0) or the ECT(1) codepoint to
   indicate ECT, on a packet-by-packet basis.

   The use of two codepoints for ECT, ECT(0) and ECT(1), is motivated
   primarily by the desire to allow mechanisms for the data sender to
   verify that network elements are not erasing the CE codepoint, and
   that data receivers are properly reporting to the sender the receipt
   of packets with the CE codepoint set, as required by the transport
   protocol.  Guidelines for the senders and receivers to differentiate
   between the ECT(0) and ECT(1) codepoints will be addressed in
   separate documents, for each transport protocol.  In particular, this
   document does not address mechanisms for TCP end-nodes to
   differentiate between the ECT(0) and ECT(1) codepoints.

   Protocols and senders that only require a single ECT codepoint SHOULD
   use ECT(0).

   The not-ECT codepoint '00' indicates a packet that is not using ECN.
   The CE codepoint '11' is set by a router to indicate congestion to
   the end nodes.  Routers that have a packet arriving at a full queue
   drop the packet, just as they do in the absence of ECN.

      +-----+-----+
      | ECN FIELD |
      +-----+-----+
        ECT   CE         The ECT and CE bits defined in RFC 2481.
         0     0         Not-ECT
         0     1         ECT(1)
         1     0         ECT(0)
         1     1         CE

           Figure 1: The ECN Field in IP.

   The use of two ECT codepoints essentially gives a one-bit ECN nonce
   in packet headers, and routers necessarily "erase" the nonce when
   they set the CE codepoint [SCWA99].  Thus, routers that erased the CE
   codepoint would face additional difficulty in reconstructing the
   original nonce, and repeated erasure of the CE codepoint would be
   more likely to be detected by the end-nodes.  The ECN nonce also can
   address the problem of misbehaving transport receivers lying to the
   transport sender about whether or not the CE codepoint was set in a
   packet.
   The motivations for the use of two ECT codepoints are
   discussed in more detail in Section 20, along with some discussion of
   alternate possibilities for the fourth ECN codepoint (that is, '01').
   Backwards compatibility with earlier ECN implementations that do not
   understand the ECT(1) codepoint is discussed in Section 11.

   In RFC 2481 [RFC2481], the ECN field was divided into the ECN-Capable
   Transport (ECT) bit and the CE bit.  The ECN field with only the
   ECN-Capable Transport (ECT) bit set in RFC 2481 corresponds to the
   ECT(0) codepoint in this document, and the ECN field with both the
   ECT and CE bits set in RFC 2481 corresponds to the CE codepoint in
   this document.  The '01' codepoint was left undefined in RFC 2481,
   and this is the reason for recommending the use of ECT(0) when only a
   single ECT codepoint is needed.

         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |          DS FIELD, DSCP           | ECN FIELD |
      +-----+-----+-----+-----+-----+-----+-----+-----+

        DSCP: differentiated services codepoint
        ECN:  Explicit Congestion Notification

        Figure 2: The Differentiated Services and ECN Fields in IP.

   Bits 6 and 7 in the IPv4 TOS octet are designated as the ECN field.
   The IPv4 TOS octet corresponds to the Traffic Class octet in IPv6,
   and the ECN field is defined identically in both cases.  The
   definitions for the IPv4 TOS octet [RFC791] and the IPv6 Traffic
   Class octet have been superseded by the six-bit DS (Differentiated
   Services) Field [RFC2474, RFC2780].  Bits 6 and 7 are listed in
   [RFC2474] as Currently Unused, and are specified in RFC 2780 as
   approved for experimental use for ECN.  Section 22 gives a brief
   history of the TOS octet.

   Because of the unstable history of the TOS octet, the use of the ECN
   field as specified in this document cannot be guaranteed to be
   backwards compatible with those past uses of these two bits that
   pre-date ECN.
   The potential dangers of this lack of backwards compatibility are
   discussed in Section 22.

   Upon the receipt by an ECN-Capable transport of a single CE packet,
   the congestion control response of the end-systems MUST be
   essentially the same as their response to a *single* dropped packet.
   For example, for ECN-Capable TCP the source TCP is required to halve
   its congestion window for any window of data containing either a
   packet drop or an ECN indication.

   One reason for requiring that the congestion-control response to the
   CE packet be essentially the same as the response to a dropped packet
   is to accommodate the incremental deployment of ECN in both
   end-systems and routers.  Some routers may drop ECN-Capable packets
   (e.g., using the same AQM policies for congestion detection) while
   other routers set the CE codepoint, for equivalent levels of
   congestion.  Similarly, a router might drop a non-ECN-Capable packet
   but set the CE codepoint in an ECN-Capable packet, for equivalent
   levels of congestion.  If there were different congestion control
   responses to a CE codepoint than to a packet drop, this could result
   in unfair treatment for different flows.

   An additional goal is that the end-systems should react to congestion
   at most once per window of data (i.e., at most once per round-trip
   time), to avoid reacting multiple times to multiple indications of
   congestion within a round-trip time.

   For a router, the CE codepoint of an ECN-Capable packet SHOULD only
   be set if the router would otherwise have dropped the packet as an
   indication of congestion to the end nodes.  When the router's buffer
   is not yet full and the router is prepared to drop a packet to inform
   end nodes of incipient congestion, the router should first check to
   see if the ECT codepoint is set in that packet's IP header.
   If so, then instead of dropping the packet, the router MAY set the CE
   codepoint in the IP header.

   An environment where all end nodes were ECN-Capable could allow new
   criteria to be developed for setting the CE codepoint, and new
   congestion control mechanisms for end-node reaction to CE packets.
   However, this is a research issue, and as such is not addressed in
   this document.

   When a CE packet (i.e., a packet that has the CE codepoint set) is
   received by a router, the CE codepoint is left unchanged, and the
   packet is transmitted as usual.  When severe congestion has occurred
   and the router's queue is full, then the router has no choice but to
   drop some packet when a new packet arrives.  We anticipate that such
   packet losses will become relatively infrequent when a majority of
   end-systems become ECN-Capable and participate in TCP or other
   compatible congestion control mechanisms.  In an ECN-Capable
   environment that is adequately provisioned, packet losses should
   occur primarily during transients or in the presence of
   non-cooperating sources.

   The above discussion of when CE may be set instead of dropping a
   packet applies by default to all Differentiated Services Per-Hop
   Behaviors (PHBs) [RFC2475].  Specifications for PHBs MAY provide
   more specifics on how a compliant implementation is to choose between
   setting CE and dropping a packet, but this is not required.  A router
   MUST NOT set CE instead of dropping a packet when the drop that would
   occur is caused by reasons other than congestion or the desire to
   indicate incipient congestion to end nodes (e.g., a diffserv edge
   node may be configured to unconditionally drop certain classes of
   traffic to prevent them from entering its diffserv domain).
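
   The Python sketch below illustrates the router behavior described
   above under stated assumptions; it is not a specification of any
   particular AQM.  It reads the ECN field from the two low-order bits
   (bits 6 and 7) of the TOS/Traffic Class octet, keeps a RED-style
   exponentially weighted average of the queue length, and, when that
   average signals incipient congestion, sets CE in ECT(0) and ECT(1)
   packets while dropping Not-ECT packets.  The class and function
   names, the parameter names (min_th, max_th, max_p, weight), and the
   simplified linear marking probability are hypothetical choices made
   for illustration; real RED includes refinements that are omitted.

```python
import random
from enum import IntEnum


class ECN(IntEnum):
    """The four codepoints of the two-bit ECN field (Figure 1)."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def ecn_of(tos: int) -> ECN:
    """ECN field = bits 6 and 7 of the octet, i.e. its two low-order bits."""
    return ECN(tos & 0b11)


def dscp_of(tos: int) -> int:
    """DS field = bits 0-5 of the octet, i.e. its six high-order bits."""
    return tos >> 2


def with_ecn(tos: int, ecn: ECN) -> int:
    """Rewrite only the ECN field, leaving the DSCP bits untouched."""
    return (tos & 0xFC) | int(ecn)


class RedEcnQueue:
    """Hypothetical RED-style AQM that marks ECN-capable packets instead
    of dropping them. Parameter values are illustrative only."""

    def __init__(self, capacity, min_th, max_th, max_p=0.1, weight=0.002,
                 rng=None):
        self.capacity = capacity
        self.min_th = min_th
        self.max_th = max_th
        self.max_p = max_p
        self.weight = weight
        self.avg = 0.0          # moving average of the queue length
        self.queue = []
        self.rng = rng or random.Random()

    def _incipient_congestion(self) -> bool:
        # Decide on the *average* queue size so that transient bursts in
        # the instantaneous queue do not trigger a congestion signal.
        self.avg = (1 - self.weight) * self.avg + self.weight * len(self.queue)
        if self.avg < self.min_th:
            return False
        if self.avg >= self.max_th:
            return True
        p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
        return self.rng.random() < p

    def enqueue(self, tos: int):
        """Return (action, tos) with action 'enqueued', 'marked' or 'dropped'."""
        if len(self.queue) >= self.capacity:
            # Full buffer: no choice but to drop, exactly as without ECN.
            return "dropped", tos
        if self._incipient_congestion():
            if ecn_of(tos) is ECN.NOT_ECT:
                return "dropped", tos   # not ECN-capable: signal by dropping
            # ECT(0) and ECT(1) are treated alike; a packet that already
            # carries CE keeps CE unchanged.
            tos = with_ecn(tos, ECN.CE)
            self.queue.append(tos)
            return "marked", tos
        self.queue.append(tos)
        return "enqueued", tos
```

   In this sketch a full buffer always causes a drop, whatever the ECN
   codepoint, and a packet arriving already marked CE keeps that
   codepoint.  Drops made for reasons other than congestion (e.g., a
   diffserv edge policy) are deliberately not modeled: as stated above,
   such drops must never be turned into CE marks, so they would belong
   on a separate code path that bypasses the marking logic.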

   We expect that routers will set the CE codepoint in response to
   incipient congestion as indicated by the average queue size, using
   the RED algorithms suggested in [FJ93, RFC2309].  To the best of our
   knowledge, this is the only proposal currently under discussion in
   the IETF for routers to drop packets proactively, before the buffer
   overflows.  However, this document does not attempt to specify a
   particular mechanism for active queue management, leaving that
   endeavor, if needed, to other areas of the IETF.  While ECN is
   inextricably tied up with the need to have a reasonable active queue
   management mechanism at the router, the reverse does not hold; active
   queue management mechanisms have been developed and deployed
   independently of ECN, using packet drops as indications of congestion
   in the absence of ECN in the IP architecture.

5.1. ECN as an Indication of Persistent Congestion

   We emphasize that a *single* IP packet with the CE codepoint set
   causes the transport layer to respond, in terms of congestion
   control, as it would to a packet drop.  The instantaneous queue size
   is likely to see considerable variations even when the router does
   not experience persistent congestion.  As such, it is important that
   transient congestion at a router, reflected by the instantaneous
   queue size reaching a threshold much smaller than the capacity of the
   queue, not trigger a reaction at the transport layer.  Therefore, the
   CE codepoint should not be set by a router based on the instantaneous
   queue size.

   For example, since the ATM and Frame Relay mechanisms for congestion
   indication have typically been defined without an associated notion
   of average queue size as the basis for determining that an
   intermediate node is congested, we believe that they provide a very
   noisy signal.
   The TCP-sender reaction specified in this document for ECN is NOT the
   appropriate reaction for such a noisy signal of congestion
   notification.  However, if the routers that interface to the ATM
   network have a way of maintaining the average queue at the
   interface, and use it to come to a reliable determination that the
   ATM subnet is congested, they may use the ECN notification that is
   defined here.

   We continue to encourage experiments in techniques at layer 2 (e.g.,
   in ATM switches or Frame Relay switches) to take advantage of ECN.
   For example, using a scheme such as RED (where packet marking is
   based on the average queue length exceeding a threshold), layer 2
   devices could provide a reasonably reliable indication of congestion.
   When all the layer 2 devices in a path set that layer's own
   Congestion Experienced codepoint (e.g., the EFCI bit for ATM, the
   FECN bit in Frame Relay) in this reliable manner, then the interface
   router to the layer 2 network could copy the state of that layer 2
   Congestion Experienced codepoint into the CE codepoint in the IP
   header.  We recognize that this is not the current practice, nor is
   it in current standards.  However, encouraging experimentation in this
   manner may provide the information needed to enable evolution of
   existing layer 2 mechanisms to provide a more reliable means of
   congestion indication, when they use a single bit for indicating
   congestion.

5.2. Dropped or Corrupted Packets

   For the proposed use for ECN in this document (that is, for a
   transport protocol such as TCP for which a dropped data packet is an
   indication of congestion), end nodes detect dropped data packets, and
   the congestion response of the end nodes to a dropped data packet is
   at least as strong as the congestion response to a received CE
   packet.
To ensure the reliable delivery of the congestion indication of the CE codepoint, an ECT codepoint MUST NOT be set in a packet unless the loss of that packet in the network would be detected by the end nodes and interpreted as an indication of congestion.

Transport protocols such as TCP do not necessarily detect all packet drops, such as the drop of a "pure" ACK packet; for example, TCP does not reduce the arrival rate of subsequent ACK packets in response to an earlier dropped ACK packet. Any proposal for extending ECN-Capability to such packets would have to address issues such as the case of an ACK packet that was marked with the CE codepoint but was later dropped in the network. We believe that this aspect is still the subject of research, so this document specifies that at this time, "pure" ACK packets MUST NOT indicate ECN-Capability.

Similarly, if a CE packet is dropped later in the network due to corruption (bit errors), the end nodes should still invoke congestion control, just as TCP would today in response to a dropped data packet. This issue of corrupted CE packets would have to be considered in any proposal for the network to distinguish between packets dropped due to corruption, and packets dropped due to congestion or buffer overflow. In particular, the ubiquitous deployment of ECN would not, in and of itself, be a sufficient development to allow end-nodes to interpret packet drops as indications of corruption rather than congestion.

5.3. Fragmentation

ECN-capable packets MAY have the DF (Don't Fragment) bit set. Reassembly of a fragmented packet MUST NOT lose indications of congestion. In other words, if any fragment of an IP packet to be reassembled has the CE codepoint set, then one of two actions MUST be taken:
* Set the CE codepoint on the reassembled packet.
  However, this MUST NOT occur if any of the other fragments contributing to this reassembly carries the Not-ECT codepoint.
* The packet is dropped, instead of being reassembled, for any other reason.
If both actions are applicable, either MAY be chosen. Reassembly of a fragmented packet MUST NOT change the ECN codepoint when all of the fragments carry the same codepoint.

We note that because RFC 2481 did not specify reassembly behavior, older ECN implementations conformant with that Experimental RFC do not necessarily perform reassembly correctly, in terms of preserving the CE codepoint in a fragment. The sender could avoid the consequences of this behavior by setting the DF bit in ECN-Capable packets.

Situations may arise in which the above reassembly specification is insufficiently precise. For example, if there is a malicious or broken entity in the path at or after the fragmentation point, packet fragments could carry a mixture of ECT(0), ECT(1), and/or Not-ECT codepoints. The reassembly specification above does not place requirements on reassembly of fragments in this case. In situations where more precise reassembly behavior would be required, protocol specifications SHOULD instead specify that DF MUST be set in all ECN-capable packets sent by the protocol.

6. Support from the Transport Protocol

ECN requires support from the transport protocol, in addition to the functionality given by the ECN field in the IP packet header. First, the transport protocol might require negotiation between the endpoints during setup to determine that all of the endpoints are ECN-capable, so that the sender can set the ECT codepoint in transmitted packets. Second, the transport protocol must be capable of reacting appropriately to the receipt of CE packets.
This reaction could be in the form of the data receiver informing the data sender of the received CE packet (e.g., TCP), of the data receiver unsubscribing from a layered multicast group (e.g., RLM [MJV96]), or of some other action that ultimately reduces the arrival rate of that flow on that congested link. CE packets indicate persistent rather than transient congestion (see Section 5.1), and hence reactions to the receipt of CE packets should be those appropriate for persistent congestion.

This document only addresses the addition of ECN Capability to TCP, leaving issues of ECN in other transport protocols to further research. For TCP, ECN requires three new pieces of functionality: negotiation between the endpoints during connection setup to determine if they are both ECN-capable; an ECN-Echo (ECE) flag in the TCP header so that the data receiver can inform the data sender when a CE packet has been received; and a Congestion Window Reduced (CWR) flag in the TCP header so that the data sender can inform the data receiver that the congestion window has been reduced. The support required from other transport protocols is likely to be different, particularly for unreliable or reliable multicast transport protocols, and will have to be determined as other transport protocols are brought to the IETF for standardization.

6.1. TCP

The following sections describe in detail the proposed use of ECN in TCP. This proposal is described in essentially the same form in [Floyd94]. We assume that the source TCP uses the standard congestion control algorithms of Slow-start, Fast Retransmit and Fast Recovery [RFC 2001].

This proposal specifies two new flags in the Reserved field of the TCP header. The TCP mechanism for negotiating ECN-Capability uses the ECN-Echo (ECE) flag in the TCP header.
Bit 9 in the Reserved field of the TCP header is designated as the ECN-Echo flag. The location of the 6-bit Reserved field in the TCP header is shown in Figure 4 of RFC 793 [RFC793] (and is reproduced below for completeness). This specification of the ECN Field leaves the Reserved field as a 4-bit field using bits 4-7.

To enable the TCP receiver to determine when to stop setting the ECN-Echo flag, we introduce a second new flag in the TCP header, the CWR flag. The CWR flag is assigned to Bit 8 in the Reserved field of the TCP header.

    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
  |               |                       | U | A | P | R | S | F |
  | Header Length |        Reserved       | R | C | S | S | Y | I |
  |               |                       | G | K | H | T | N | N |
  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

    Figure 3: The old definition of bytes 13 and 14 of the TCP
              header.

    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
  |               |               | C | E | U | A | P | R | S | F |
  | Header Length |    Reserved   | W | C | R | C | S | S | Y | I |
  |               |               | R | E | G | K | H | T | N | N |
  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

    Figure 4: The new definition of bytes 13 and 14 of the TCP
              Header.

Thus, ECN uses the ECT and CE codepoints in the IP header (as shown in Figure 1) for signaling between routers and connection endpoints, and uses the ECN-Echo and CWR flags in the TCP header (as shown in Figure 4) for TCP-endpoint to TCP-endpoint signaling. For a TCP connection, a typical sequence of events in an ECN-based reaction to congestion is as follows:

* An ECT codepoint is set in packets transmitted by the sender to indicate that ECN is supported by the transport entities for these packets.
* An ECN-capable router detects impending congestion and detects that an ECT codepoint is set in the packet it is about to drop. Instead of dropping the packet, the router chooses to set the CE codepoint in the IP header and forwards the packet.
* The receiver receives the packet with the CE codepoint set, and sets the ECN-Echo flag in its next TCP ACK sent to the sender.
* The sender receives the TCP ACK with ECN-Echo set, and reacts to the congestion as if a packet had been dropped.
* The sender sets the CWR flag in the TCP header of the next packet sent to the receiver to acknowledge its receipt of and reaction to the ECN-Echo flag.

The negotiation for using ECN by the TCP transport entities and the use of the ECN-Echo and CWR flags are described in more detail in the sections below.

6.1.1. TCP Initialization

In the TCP connection setup phase, the source and destination TCPs exchange information about their willingness to use ECN. Subsequent to the completion of this negotiation, the TCP sender sets an ECT codepoint in the IP header of data packets to indicate to the network that the transport is capable and willing to participate in ECN for this packet. This indicates to the routers that they may mark this packet with the CE codepoint, if they would like to use that as a method of congestion notification. If the TCP connection does not wish to use ECN notification for a particular packet, the sending TCP sets the ECN codepoint to not-ECT, and the TCP receiver ignores the CE codepoint in the received packet.

For this discussion, we designate the initiating host as Host A and the responding host as Host B. We call a SYN packet with the ECE and CWR flags set an "ECN-setup SYN packet", and we call a SYN packet with at least one of the ECE and CWR flags not set a "non-ECN-setup SYN packet".
Similarly, we call a SYN-ACK packet with only the ECE flag set but the CWR flag not set an "ECN-setup SYN-ACK packet", and we call a SYN-ACK packet with any other configuration of the ECE and CWR flags a "non-ECN-setup SYN-ACK packet".

Before a TCP connection can use ECN, Host A sends an ECN-setup SYN packet, and Host B sends an ECN-setup SYN-ACK packet. For a SYN packet, the setting of both ECE and CWR in the ECN-setup SYN packet is defined as an indication that the sending TCP is ECN-Capable, rather than as an indication of congestion or of response to congestion. More precisely, an ECN-setup SYN packet indicates that the TCP implementation transmitting the SYN packet will participate in ECN as both a sender and receiver. Specifically, as a receiver, it will respond to incoming data packets that have the CE codepoint set in the IP header by setting ECE in outgoing TCP Acknowledgement (ACK) packets. As a sender, it will respond to incoming packets that have ECE set by reducing the congestion window and setting CWR when appropriate. An ECN-setup SYN packet does not commit the TCP sender to setting the ECT codepoint in any or all of the packets it may transmit. However, the commitment to respond appropriately to incoming packets with the CE codepoint set remains even if the TCP sender in a later transmission, within this TCP connection, sends a SYN packet without ECE and CWR set.

When Host B sends an ECN-setup SYN-ACK packet, it sets the ECE flag but not the CWR flag. An ECN-setup SYN-ACK packet is defined as an indication that the TCP transmitting the SYN-ACK packet is ECN-Capable. As with the SYN packet, an ECN-setup SYN-ACK packet does not commit the TCP host to setting the ECT codepoint in transmitted packets.
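The setup-packet definitions above reduce to simple bit tests on the TCP flags octet. The following Python sketch is illustrative only: the function names are hypothetical, and the flag masks assume the layout of Figure 4, in which CWR and ECE occupy the two high-order bits of the second of the two bytes shown (so CWR = 0x80 and ECE = 0x40).

```python
# Flag masks for the second byte shown in Figure 4 (CWR is the high-order bit).
CWR, ECE, URG, ACK, PSH, RST, SYN, FIN = 0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01


def is_ecn_setup_syn(flags: int) -> bool:
    """A SYN (without ACK) is ECN-setup only if both ECE and CWR are set."""
    if not (flags & SYN) or (flags & ACK):
        raise ValueError("not a SYN packet")
    return (flags & (ECE | CWR)) == (ECE | CWR)


def is_ecn_setup_syn_ack(flags: int) -> bool:
    """A SYN-ACK is ECN-setup only if ECE is set and CWR is clear."""
    if (flags & (SYN | ACK)) != (SYN | ACK):
        raise ValueError("not a SYN-ACK packet")
    return (flags & (ECE | CWR)) == ECE
```

Note that a SYN-ACK carrying both ECE and CWR is classified as a non-ECN-setup SYN-ACK, because only the ECE-without-CWR combination qualifies.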
The following rules apply to the sending of ECN-setup packets:

* If a host has received an ECN-setup SYN packet, then it MAY send an ECN-setup SYN-ACK packet. Otherwise, it MUST NOT send an ECN-setup SYN-ACK packet.
* A host MUST NOT set ECT on data packets unless it has sent at least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has received at least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has sent no non-ECN-setup SYN or non-ECN-setup SYN-ACK packet. If a host has received at least one non-ECN-setup SYN or non-ECN-setup SYN-ACK packet, then it SHOULD NOT set ECT on data packets.
* If a host ever sets the ECT codepoint on a data packet, then that host MUST correctly set/clear the CWR TCP bit on all subsequent packets in the connection.
* If a host has sent at least one ECN-setup SYN or ECN-setup SYN-ACK packet, has received no non-ECN-setup SYN or non-ECN-setup SYN-ACK packet, and receives TCP data packets with ECT and CE codepoints set in the IP header, then that host MUST process these packets as specified for an ECN-capable connection.
* A host that is not willing to use ECN on a TCP connection SHOULD clear both the ECE and CWR flags in all non-ECN-setup SYN and/or SYN-ACK packets that it sends to indicate this unwillingness. Receivers MUST correctly handle all forms of the non-ECN-setup SYN and SYN-ACK packets.
* A host MUST NOT set ECT on SYN or SYN-ACK packets.

6.1.1.1. Robust TCP Initialization with an Echoed Reserved Field

There is the question of why we chose to have the TCP sending the SYN set two ECN-related flags in the Reserved field of the TCP header for the SYN packet, while the responding TCP sending the SYN-ACK sets only one ECN-related flag in the SYN-ACK packet. This asymmetry is necessary for the robust negotiation of ECN-capability with some deployed TCP implementations.
There exists at least one faulty TCP implementation in which TCP receivers set the Reserved field of the TCP header in ACK packets (and hence the SYN-ACK) simply to reflect the Reserved field of the TCP header in the received data packet. Because the TCP SYN packet sets the ECN-Echo and CWR flags to indicate ECN-capability, while the SYN-ACK packet sets only the ECN-Echo flag, the sending TCP correctly interprets a receiver's reflection of its own flags in the Reserved field as an indication that the receiver is not ECN-capable. The sending TCP is not misled by a faulty TCP implementation sending a SYN-ACK packet that simply reflects the Reserved field of the incoming SYN packet.

6.1.2. The TCP Sender

For a TCP connection using ECN, new data packets are transmitted with an ECT codepoint set in the IP header. When only one ECT codepoint is needed by a sender for all packets sent on a TCP connection, ECT(0) SHOULD be used. If the sender receives an ECN-Echo (ECE) ACK packet (that is, an ACK packet with the ECN-Echo flag set in the TCP header), then the sender knows that congestion was encountered in the network on the path from the sender to the receiver. The indication of congestion should be treated just as a congestion loss in non-ECN-Capable TCP. That is, the TCP source halves the congestion window "cwnd" and reduces the slow start threshold "ssthresh". The sending TCP SHOULD NOT increase the congestion window in response to the receipt of an ECN-Echo ACK packet.

TCP should not react to congestion indications more than once every window of data (or more loosely, more than once every round-trip time). That is, the TCP sender's congestion window should be reduced only once in response to a series of dropped and/or CE packets from a single window of data.
In addition, the TCP source should not decrease the slow-start threshold, ssthresh, if it has been decreased within the last round trip time. However, if any retransmitted packets are dropped, then this is interpreted by the source TCP as a new instance of congestion.

After the source TCP reduces its congestion window in response to a CE packet, incoming acknowledgements that continue to arrive can "clock out" outgoing packets as allowed by the reduced congestion window. If the congestion window consists of only one MSS (maximum segment size), and the sending TCP receives an ECN-Echo ACK packet, then the sending TCP should in principle still halve its congestion window. However, the value of the congestion window is bounded below by a value of one MSS. If the sending TCP were to continue to send using a congestion window of 1 MSS, this would result in the transmission of one packet per round-trip time. It is necessary to reduce the sending rate of the TCP sender still further on receipt of an ECN-Echo packet when the congestion window is one. We use the retransmit timer as a means of reducing the rate further in this circumstance. Therefore, the sending TCP MUST reset the retransmit timer on receiving the ECN-Echo packet when the congestion window is one. The sending TCP will then be able to send a new packet only when the retransmit timer expires.

When an ECN-Capable TCP sender reduces its congestion window for any reason (because of a retransmit timeout, a Fast Retransmit, or in response to an ECN Notification), the TCP sender sets the CWR flag in the TCP header of the first new data packet sent after the window reduction. If that data packet is dropped in the network, then the sending TCP will have to reduce the congestion window again and retransmit the dropped packet.
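The sender rules of this section can be sketched as a small state machine. This is a hedged Python illustration, not a reference implementation: `EcnSender`, its fields, and the `restart_rto` hook are hypothetical; sequence numbers are plain byte counts without wraparound; "once per window of data" is approximated by ignoring ECN-Echo ACKs that do not acknowledge data sent after the last reduction; and setting ssthresh to the halved window is a simplifying assumption, since this section does not give an exact ssthresh formula. Reductions for timeouts or Fast Retransmit, which also set CWR, are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EcnSender:
    mss: int                        # maximum segment size, bytes
    cwnd: int                       # congestion window, bytes
    ssthresh: int                   # slow start threshold, bytes
    snd_nxt: int = 0                # next new sequence number to send
    recover: Optional[int] = None   # snd_nxt at the time of the last reduction
    cwr_pending: bool = False       # put CWR on the next new data segment
    restart_rto: bool = False       # hypothetical hook: reset the retransmit timer

    def on_ece_ack(self, ack_seq: int) -> None:
        """React to an ACK with ECN-Echo set; this never grows cwnd."""
        # ACKs that do not cover data sent after the last reduction belong
        # to the window already responded to, so they are ignored.
        if self.recover is not None and ack_seq <= self.recover:
            return
        if self.cwnd <= self.mss:
            # cwnd is already at its one-MSS floor: slow down further by
            # resetting the retransmit timer.
            self.restart_rto = True
        self.cwnd = max(self.cwnd // 2, self.mss)   # halve, floor at one MSS
        self.ssthresh = self.cwnd                   # simplifying assumption
        self.recover = self.snd_nxt
        self.cwr_pending = True

    def send(self, nbytes: int, retransmission: bool = False) -> bool:
        """Record a transmission and return whether CWR is set on it."""
        if retransmission:
            return False            # CWR is not set on retransmissions
        self.snd_nxt += nbytes
        cwr, self.cwr_pending = self.cwr_pending, False
        return cwr
```

In this sketch CWR stays pending across a retransmission and is placed on the first new data segment that follows, matching the rule that CWR goes on new data rather than on retransmitted packets.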
We ensure that the "Congestion Window Reduced" information is reliably delivered to the TCP receiver. This comes about from the fact that if the new data packet carrying the CWR flag is dropped, then the TCP sender will have to again reduce its congestion window, and send another new data packet with the CWR flag set. Thus, the CWR bit in the TCP header SHOULD NOT be set on retransmitted packets. When the TCP data sender is ready to set the CWR bit after reducing the congestion window, it SHOULD set the CWR bit only on the first new data packet that it transmits.

[Floyd94] discusses TCP's response to ECN in more detail. [Floyd98] discusses the validation test in the ns simulator, which illustrates a wide range of ECN scenarios. These scenarios include the following: an ECN followed by another ECN, a Fast Retransmit, or a Retransmit Timeout; a Retransmit Timeout or a Fast Retransmit followed by an ECN; and a congestion window of one packet followed by an ECN.

TCP follows existing algorithms for sending data packets in response to incoming ACKs, multiple duplicate acknowledgements, or retransmit timeouts [RFC2581]. TCP also follows the normal procedures for increasing the congestion window when it receives ACK packets without the ECN-Echo bit set [RFC2581].

6.1.3. The TCP Receiver

When TCP receives a CE data packet at the destination end-system, the TCP data receiver sets the ECN-Echo flag in the TCP header of the subsequent ACK packet. If there is any ACK withholding implemented, as in current "delayed-ACK" TCP implementations where the TCP receiver can send an ACK for two arriving data packets, then the ECN-Echo flag in the ACK packet will be set to '1' if the CE codepoint is set in any of the data packets being acknowledged. That is, if any of the received data packets are CE packets, then the returning ACK has the ECN-Echo flag set.
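The receiver's rule for delayed ACKs amounts to an OR over the CE state of the data packets being acknowledged. The Python sketch below assumes the RFC 3168 codepoint values (Not-ECT = 00, ECT(1) = 01, ECT(0) = 10, CE = 11) carried in the two low-order bits of the IP header octet that holds the ECN field; the function names are hypothetical.

```python
from typing import Sequence

# ECN codepoints (RFC 3168 values), held in the two low-order bits of the octet.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def ecn_field(octet: int) -> int:
    """Extract the 2-bit ECN field from the IP header octet that carries it."""
    return octet & 0x03


def ece_for_delayed_ack(octets: Sequence[int]) -> bool:
    """ECE is set on an ACK if any data packet it acknowledges arrived with CE."""
    return any(ecn_field(o) == CE for o in octets)
```

For example, an ACK covering one ECT(0) packet and one CE packet carries ECE, so `ece_for_delayed_ack([0x02, 0x03])` is True, while `ece_for_delayed_ack([0x02, 0x02])` is False.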
To provide robustness against the possibility of a dropped ACK packet carrying an ECN-Echo flag, the TCP receiver sets the ECN-Echo flag in a series of ACK packets sent subsequently. The TCP receiver uses the CWR flag received from the TCP sender to determine when to stop setting the ECN-Echo flag.

After a TCP receiver sends an ACK packet with the ECN-Echo bit set, that TCP receiver continues to set the ECN-Echo flag in all the ACK packets it sends (whether they acknowledge CE data packets or non-CE data packets) until it receives a CWR packet (a packet with the CWR flag set). After the receipt of the CWR packet, acknowledgements for subsequent non-CE data packets do not have the ECN-Echo flag set. If another CE packet is received by the data receiver, the receiver would once again send ACK packets with the ECN-Echo flag set. While the receipt of a CWR packet does not guarantee that the data sender received the ECN-Echo message, this does suggest that the data sender reduced its congestion window at some point *after* it sent the data packet for which the CE codepoint was set.

We have already specified that a TCP sender is not required to reduce its congestion window more than once per window of data. Some care is required if the TCP sender is to avoid unnecessary reductions of the congestion window when a window of data includes both dropped packets and (marked) CE packets. This is illustrated in [Floyd98].

6.1.4. Congestion on the ACK-path

For the current generation of TCP congestion control algorithms, pure acknowledgement packets (i.e., packets that do not contain any accompanying data) should be sent with the not-ECT codepoint. Current TCP receivers have no mechanisms for reducing traffic on the ACK-path in response to congestion notification. Mechanisms for responding to congestion on the ACK-path are areas for current and future research.
(One simple possibility would be for the sender to reduce its congestion window when it receives a pure ACK packet with the CE codepoint set.) For current TCP implementations, a single dropped ACK generally has only a very small effect on TCP's sending rate.

6.1.5. Retransmitted TCP packets

This document specifies that ECN-capable TCP implementations MUST NOT set either ECT codepoint (ECT(0) or ECT(1)) in the IP header for retransmitted data packets, and that the TCP data receiver SHOULD ignore the ECN field on arriving data packets that are outside of the receiver's current window. This is for greater security against denial-of-service attacks, as well as for robustness of the ECN congestion indication with packets that are dropped later in the network.

First, we note that if the TCP sender were to set an ECT codepoint on a retransmitted packet, then if an unnecessarily-retransmitted packet was later dropped in the network, the end nodes would never receive the indication of congestion from the router setting the CE codepoint. Thus, setting an ECT codepoint on retransmitted data packets is not consistent with the robust delivery of the congestion indication even for packets that are later dropped in the network.

In addition, an attacker capable of spoofing the IP source address of the TCP sender could send data packets with arbitrary sequence numbers, with the CE codepoint set in the IP header. On receiving this spoofed data packet, the TCP data receiver would determine that the data does not lie in the current receive window, and return a duplicate acknowledgement. We define an out-of-window packet at the TCP data receiver as a data packet that lies outside the receiver's current window.
On receiving an out-of-window packet, the TCP data receiver has to decide whether or not to treat the CE codepoint in the packet header as a valid indication of congestion, and therefore whether to return ECN-Echo indications to the TCP data sender. If the TCP data receiver ignored the CE codepoint in an out-of-window packet, then the TCP data sender would not receive this possibly-legitimate indication of congestion from the network, resulting in a violation of end-to-end congestion control. On the other hand, if the TCP data receiver honors the CE indication in the out-of-window packet, and reports the indication of congestion to the TCP data sender, then the malicious node that created the spoofed, out-of-window packet has successfully "attacked" the TCP connection by forcing the data sender to unnecessarily reduce (halve) its congestion window. To prevent such a denial-of-service attack, we specify that a legitimate TCP data sender MUST NOT set an ECT codepoint on retransmitted data packets, and that the TCP data receiver SHOULD ignore the CE codepoint on out-of-window packets.

One drawback of not setting ECT(0) or ECT(1) on retransmitted packets is that it denies ECN protection for retransmitted packets. However, for an ECN-capable TCP connection in a fully-ECN-capable environment with mild congestion, packets should rarely be dropped due to congestion in the first place, and so instances of retransmitted packets should rarely arise. If packets are being retransmitted, then there are already packet losses (from corruption or from congestion) that ECN has been unable to prevent.
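Taken together, the receiver rules in Sections 6.1.3 and 6.1.5 keep ECN-Echo set until a CWR packet arrives while ignoring CE on out-of-window packets, and the sender never sets ECT on retransmissions. The Python sketch below illustrates these rules under stated assumptions: `EcnReceiver` and `ect_for_data_segment` are hypothetical names, the window check ignores sequence-number wraparound, the receive window is held fixed, and only in-order segments advance `rcv_nxt`.

```python
from dataclasses import dataclass

NOT_ECT, ECT_0 = 0b00, 0b10  # RFC 3168 codepoint values


def ect_for_data_segment(ecn_negotiated: bool, retransmission: bool) -> int:
    """ECN codepoint for an outgoing data segment (ECT(0) when one ECT suffices)."""
    if ecn_negotiated and not retransmission:
        return ECT_0
    return NOT_ECT  # retransmissions never carry an ECT codepoint


@dataclass
class EcnReceiver:
    rcv_nxt: int              # next in-order byte expected
    rcv_wnd: int              # advertised window, held fixed here
    ece_latched: bool = False

    def in_window(self, seq: int, length: int) -> bool:
        # Does any byte of the segment fall in [rcv_nxt, rcv_nxt + rcv_wnd)?
        end = seq + max(length, 1)
        return seq < self.rcv_nxt + self.rcv_wnd and end > self.rcv_nxt

    def on_data(self, seq: int, length: int, ce: bool, cwr: bool) -> bool:
        """Process one data segment and return the ECE bit for the ACK it triggers."""
        in_win = self.in_window(seq, length)
        if cwr:
            self.ece_latched = False   # sender reports that it reduced cwnd
        if ce and in_win:
            self.ece_latched = True    # CE on out-of-window packets is ignored
        if seq == self.rcv_nxt:
            self.rcv_nxt += length     # in-order data advances the window
        return self.ece_latched
```

Clearing the latch on CWR before checking CE means that a packet carrying both CWR and CE still produces an ECN-Echo, treating the new CE as a fresh indication of congestion.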
We note that if the router sets the CE codepoint for an ECN-capable data packet within a TCP connection, then the TCP connection is guaranteed to receive that indication of congestion, or to receive some other indication of congestion within the same window of data, even if this packet is dropped or reordered in the network. We consider two cases: when the packet is later retransmitted, and when the packet is not later retransmitted.

In the first case, if the packet is either dropped or delayed, and at some point retransmitted by the data sender, then the retransmission is a result of a Fast Retransmit or a Retransmit Timeout for either that packet or for some prior packet in the same window of data. In this case, because the data sender already has retransmitted this packet, we know that the data sender has already responded to an indication of congestion for some packet within the same window of data as the original packet. Thus, even if the first transmission of the packet is dropped in the network, or is delayed, if it had the CE codepoint set, and is later ignored by the data receiver as an out-of-window packet, this is not a problem, because the sender has already responded to an indication of congestion for that window of data.

In the second case, if the packet is never retransmitted by the data sender, then this data packet is the only copy of this data received by the data receiver, and therefore arrives at the data receiver as an in-window packet, regardless of how much the packet might be delayed or reordered. In this case, if the CE codepoint is set on the packet within the network, this will be treated by the data receiver as a valid indication of congestion.

6.1.6. TCP Window Probes

When the TCP data receiver advertises a zero window, the TCP data sender sends window probes to determine if the receiver's window has increased.
Window probe packets do not contain any user data except for the sequence number, which is a byte. If a window probe packet is dropped in the network, this loss is not detected by the receiver. Therefore, the TCP data sender MUST NOT set either an ECT codepoint or the CWR bit on window probe packets.

However, because window probes use exact sequence numbers, they cannot be easily spoofed in denial-of-service attacks. Therefore, if a window probe arrives with the CE codepoint set, then the receiver SHOULD respond to the ECN indications.

7. Non-compliance by the End Nodes

This section discusses concerns about the vulnerability of ECN to non-compliant end-nodes (i.e., end nodes that set the ECT codepoint in transmitted packets but do not respond to received CE packets). We argue that the addition of ECN to the IP architecture will not significantly increase the current vulnerability of the architecture to unresponsive flows.

Even for non-ECN environments, there are serious concerns about the damage that can be done by non-compliant or unresponsive flows (that is, flows that do not respond to congestion control indications by reducing their arrival rate at the congested link). For example, an end-node could "turn off congestion control" by not reducing its congestion window in response to packet drops. This is a concern for the current Internet. It has been argued that routers will have to deploy mechanisms to detect and differentially treat packets from non-compliant flows [RFC2309,FF99]. It has also been suggested that techniques such as end-to-end per-flow scheduling and isolation of one flow from another, differentiated services, or end-to-end reservations could remove some of the more damaging effects of unresponsive flows.
It might seem that dropping packets in itself is an adequate deterrent for non-compliance, and that the use of ECN removes this deterrent. We would argue in response that (1) ECN-capable routers preserve packet-dropping behavior in times of high congestion; and (2) even in times of high congestion, dropping packets in itself is not an adequate deterrent for non-compliance.

First, ECN-Capable routers will only mark packets (as opposed to dropping them) when the packet marking rate is reasonably low. During periods where the average queue size exceeds an upper threshold, and therefore the potential packet marking rate would be high, our recommendation is that routers drop packets rather than set the CE codepoint in packet headers.

During the periods of low or moderate packet marking rates when ECN would be deployed, there would be little deterrent effect on unresponsive flows of dropping rather than marking those packets. For example, delay-insensitive flows using reliable delivery might have an incentive to increase rather than to decrease their sending rate in the presence of dropped packets. Similarly, delay-sensitive flows using unreliable delivery might increase their use of FEC in response to an increased packet drop rate, increasing rather than decreasing their sending rate. For the same reasons, we do not believe that packet dropping itself is an effective deterrent for non-compliance even in an environment of high packet drop rates, when all flows are sharing the same packet drop rate.

Several methods have been proposed to identify and restrict non-compliant or unresponsive flows. The addition of ECN to the network environment would not in any way increase the difficulty of designing and deploying such mechanisms.
If anything, the addition of ECN to the architecture would make the job of identifying unresponsive flows slightly easier. For example, in an ECN-Capable environment routers are not limited to information about packets that are dropped or have the CE codepoint set at that router itself; in such an environment, routers could also take note of arriving CE packets that indicate congestion encountered by that packet earlier in the path.

8. Non-compliance in the Network

This section considers the issues when a router is operating, possibly maliciously, to modify either of the bits in the ECN field.

By tampering with the bits in the ECN field, an adversary (or a broken router) could do one or more of the following: falsely report congestion, disable ECN-Capability for an individual packet, erase the ECN congestion indication, or falsely indicate ECN-Capability. Section 18 systematically examines the various cases by which the ECN field could be modified. The important criterion considered in determining the consequences of such modifications is whether it is likely to lead to poorer behavior in any dimension (throughput, delay, fairness or functionality) than if a router were to drop a packet.

The first two possible changes, falsely reporting congestion or disabling ECN-Capability for an individual packet, are no worse than if the router were to simply drop the packet. From a congestion control point of view, setting the CE codepoint in the absence of congestion by a non-compliant router would be no worse than a router dropping a packet unnecessarily. By "erasing" an ECT codepoint of a packet that is later dropped in the network, a router's actions could result in an unnecessary packet drop for that packet later in the network.
However, as discussed in Section 18, a router that erases the ECN congestion indication or falsely indicates ECN-Capability could potentially do more damage to the flow than if it had simply dropped the packet. A rogue or broken router that "erased" the CE codepoint in arriving CE packets would prevent that indication of congestion from reaching downstream receivers. This could result in the failure of congestion control for that flow and a resulting increase in congestion in the network, ultimately causing subsequent packets of this flow to be dropped as the average queue size increases at the congested gateway.

Section 19 considers the potential repercussions of subverting end-to-end congestion control either by falsely indicating ECN-Capability or by erasing the congestion indication in ECN (the CE codepoint). We observe in Section 19 that subverting ECN-based congestion control may lead to potential unfairness, but this is likely to be no worse than the subversion of either ECN-based or packet-based congestion control by the end nodes.

8.1. Complications Introduced by Split Paths

If a router or other network element has access to all of the packets of a flow, then that router could do no more damage to a flow by altering the ECN field than it could by simply dropping all of the packets from that flow. However, in some cases, a malicious or broken router might have access to only a subset of the packets from a flow. The question is as follows: can this router, by altering the ECN field in this subset of the packets, do more damage to that flow than if it had simply dropped that subset of the packets?

This is also discussed in detail in Section 18, which concludes as follows: it is true that an adversary that has access only to a subset of packets in an aggregate might, by subverting ECN-based congestion control, be able to deny the benefits of ECN to the other packets in the aggregate. While this is undesirable, it is not a sufficient concern to warrant disabling ECN.

9. Encapsulated Packets

9.1. IP packets encapsulated in IP

The encapsulation of IP packet headers in tunnels is used in many places, including IPsec and IP in IP [RFC2003]. This section considers issues related to interactions between ECN and IP tunnels, and specifies two alternative solutions. This discussion is complemented by RFC 2983's discussion of interactions between Differentiated Services and IP tunnels of various forms [RFC2983], as Differentiated Services uses the remaining six bits of the IP header octet that is used by ECN (see Figure 2 in Section 5).

Some IP tunnel modes are based on adding a new "outer" IP header that encapsulates the original, or "inner", IP header and its associated packet. In many cases, the new "outer" IP header may be added and removed at intermediate points along a connection, enabling the network to establish a tunnel without requiring endpoint participation. We denote tunnels that specify that the outer header be discarded at tunnel egress as "simple tunnels".

ECN uses the ECN field in the IP header for signaling between routers and connection endpoints. ECN interacts with IP tunnels based on the treatment of the ECN field in the IP header. In simple IP tunnels the octet containing the ECN field is copied or mapped from the inner IP header to the outer IP header at IP tunnel ingress, and the outer header's copy of this field is discarded at IP tunnel egress.
If the outer header were simply discarded without handling the ECN field, and an ECN-capable router were to set the CE (Congestion Experienced) codepoint within a packet in a simple IP tunnel, this indication would be discarded at tunnel egress, and the indication of congestion would be lost.

Thus, the use of ECN over simple IP tunnels would result in routers attempting to use the outer IP header to signal congestion to endpoints, but those congestion warnings would never arrive because the outer header is discarded at the tunnel egress point. This problem was encountered with ECN and IPsec in tunnel mode, and RFC 2481 recommended that ECN not be used with the older simple IPsec tunnels in order to avoid this behavior and its consequences. When ECN becomes widely deployed, simple tunnels likely to carry ECN-capable traffic will have to be changed.

The use of ECN in the outer header of an IP tunnel might raise security concerns because an adversary could tamper with the ECN information that propagates beyond the tunnel endpoint. Based on an analysis of these concerns and the resultant risks in Sections 18 and 19, our overall approach is to make support for ECN an option for IP tunnels, so that an IP tunnel can be specified or configured either to use ECN or not to use ECN in the outer header of the tunnel. Thus, in environments or tunneling protocols where the risks of using ECN are judged to outweigh its benefits, the tunnel can simply not use ECN in the outer header. Then the only indication of congestion experienced at routers within the tunnel would be through packet loss.

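The following Python sketch reproduces the simple-tunnel failure described earlier in this section in a few lines. It models only the ECN field and assumes the codepoint values of Section 5. The function names are hypothetical and do not correspond to any real tunnel implementation.

```python
from typing import Tuple

NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11  # assumed Section 5 values


def simple_ingress(inner_ecn: int) -> Tuple[int, int]:
    """Simple tunnel ingress: copy the inner ECN field into the outer header.

    Returns (outer_ecn, inner_ecn).
    """
    return inner_ecn, inner_ecn


def congested_ecn_router(outer_ecn: int) -> int:
    """A congested ECN-capable router marks ECN-capable packets with CE."""
    return CE if outer_ecn in (ECT0, ECT1) else outer_ecn


def simple_egress(outer_ecn: int, inner_ecn: int) -> int:
    """Simple tunnel egress: the outer header is discarded along with any CE mark.

    outer_ecn is deliberately ignored; that is exactly the problem.
    """
    return inner_ecn


outer, inner = simple_ingress(ECT0)
outer = congested_ecn_router(outer)      # the router signals congestion: outer is CE
delivered = simple_egress(outer, inner)  # the receiver sees ECT(0), not CE
```

The congestion signal set inside the tunnel never reaches the receiver. This is why tunnels need one of the options defined in Section 9.1.1.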
The result is that there are two viable options for the behavior of ECN-capable connections over an IP tunnel, especially IPsec tunnels:

* A limited-functionality option in which ECN is preserved in the inner header, but disabled in the outer header. The only mechanism available for signaling congestion occurring within the tunnel in this case is dropped packets.

* A full-functionality option that supports ECN in both the inner and outer headers, and propagates congestion warnings from nodes within the tunnel to endpoints.

Support for these options requires varying degrees of change to IP header processing at tunnel ingress and egress. A small subset of these changes, enough to support only the limited-functionality option, would be sufficient to eliminate any incompatibility between ECN and IP tunnels.

One goal of this document is to give guidance about the tradeoffs between the limited-functionality and full-functionality options. A full discussion of the potential effects of an adversary's modifications of the ECN field is given in Sections 18 and 19.

9.1.1. The Limited-functionality and Full-functionality Options

The limited-functionality option for ECN encapsulation in IP tunnels is for the not-ECT codepoint to be set in the outside (encapsulating) header regardless of the value of the ECN field in the inside (encapsulated) header. With this option, the ECN field in the inner header is not altered upon decapsulation. The disadvantage of this approach is that the flow does not have ECN support for that part of the path that is using IP tunneling, even if the encapsulated packet (from the original TCP sender) is ECN-Capable.
That is, if the encapsulated packet arrives at a congested ECN-capable router that would otherwise be able to choose between dropping and marking the packet as an indication of congestion to the end nodes, the router will not be permitted to set the CE codepoint in the packet header, but instead will have to drop the packet.

The full-functionality option for ECN encapsulation is to copy the ECN codepoint of the inside header to the outside header on encapsulation if the inside header is not-ECT or ECT, and to set the ECN codepoint of the outside header to ECT(0) if the ECN codepoint of the inside header is CE. On decapsulation, if the CE codepoint is set in the outside header, then the CE codepoint is also set in the inner header. Otherwise, the ECN codepoint in the inner header is left unchanged. That is, for full ECN support the encapsulation and decapsulation processing involves the following. At tunnel ingress, the full-functionality option sets the ECN codepoint in the outer header. If the ECN codepoint in the inner header is not-ECT or ECT, then it is copied to the ECN codepoint in the outer header. If the ECN codepoint in the inner header is CE, then the ECN codepoint in the outer header is set to ECT(0). Upon decapsulation at the tunnel egress, the full-functionality option sets the CE codepoint in the inner header if the CE codepoint is set in the outer header. Otherwise, no change is made to this field of the inner header.

With the full-functionality option, a flow can take advantage of ECN in those parts of the path that might use IP tunneling. The disadvantage of the full-functionality option from a security perspective is that the IP tunnel cannot protect the flow from certain modifications to the ECN bits in the IP header within the tunnel.
The potential dangers from modifications to the ECN bits in the IP header are described in detail in Sections 18 and 19.

(1) An IP tunnel MUST modify the handling of the DS field octet at IP tunnel endpoints by implementing either the limited-functionality or the full-functionality option.
(2) Optionally, an IP tunnel MAY enable the endpoints of an IP tunnel to negotiate the choice between the limited-functionality and the full-functionality option for ECN in the tunnel.

The minimum required to make ECN usable with IP tunnels is the limited-functionality option, which prevents ECN from being enabled in the outer header of the tunnel. Full support for ECN requires the use of the full-functionality option. If there are no optional mechanisms for the tunnel endpoints to negotiate a choice between the limited-functionality and full-functionality options, there can be a pre-existing agreement between the tunnel endpoints about whether to support the limited-functionality or the full-functionality ECN option.

In addition, it is RECOMMENDED that packets with the CE codepoint in the outer header be dropped if they arrive at the tunnel egress point for a tunnel that uses the limited-functionality option, or for a tunnel that uses the full-functionality option but for which the not-ECT codepoint is set in the inner header. This recommendation is motivated by backwards compatibility and by the need to ensure that no unauthorized modifications of the ECN field take place; it is discussed further in the next section (9.1.2).

9.1.2. Changes to the ECN Field within an IP Tunnel

The presence of a copy of the ECN field in the inner header of an IP tunnel mode packet provides an opportunity for detection of unauthorized modifications to the ECN field in the outer header.
Comparison of the ECN fields in the inner and outer headers falls into two categories for implementations that conform to this document:
* If the IP tunnel uses the full-functionality option, then the not-ECT codepoint should be set in the outer header if and only if it is also set in the inner header.
* If the tunnel uses the limited-functionality option, then the not-ECT codepoint should be set in the outer header.

Receipt of a packet not satisfying the appropriate condition could be a cause of concern.

Consider the case of an IP tunnel where the tunnel ingress point has not been updated to this document's requirements, while the tunnel egress point has been updated to support ECN. In this case, the IP tunnel is not explicitly configured to support the full-functionality ECN option. However, because it copies the ECN field from the inner header to the outer header, the tunnel ingress point behaves much like a tunnel ingress point that supports the full-functionality option. If packets from an ECN-capable connection use this tunnel, the ECT codepoint will be set in the outer header at the tunnel ingress point. Congestion within the tunnel may then result in ECN-capable routers setting CE in the outer header. Because the tunnel has not been explicitly configured to support the full-functionality option, the tunnel egress point expects the not-ECT codepoint to be set in the outer header. When an ECN-capable tunnel egress point receives a packet with the ECT or CE codepoint in the outer header, in a tunnel that has not been configured to support the full-functionality option, that packet should be processed, according to whether the CE codepoint was set, as follows.
It is RECOMMENDED that, on a tunnel that has not been configured to support the full-functionality option, packets be dropped at the egress point if the CE codepoint is set in the outer header but not in the inner header, and be forwarded otherwise.

An IP tunnel cannot provide protection against erasure of congestion indications based on changing the ECN codepoint from CE to ECT. The erasure of congestion indications may impact the network and other flows in ways that would not be possible in the absence of ECN. It is important to note that erasure can only be applied to congestion indications placed by nodes within the tunnel; the copy of the ECN field in the inner header preserves congestion notifications from nodes upstream of the tunnel ingress (unless the CE codepoint in the inner header is also erased). If erasure of congestion notifications is judged to be a security risk that exceeds the congestion management benefits of ECN, then tunnels could be specified or configured to use the limited-functionality option.

9.2. IPsec Tunnels

IPsec supports secure communication over potentially insecure network components such as intermediate routers. IPsec protocols support two operating modes, transport mode and tunnel mode, that span a wide range of security requirements and operating environments. Transport mode security protocol header(s) are inserted between the IP (IPv4 or IPv6) header and higher layer protocol headers (e.g., TCP), and hence transport mode can only be used for end-to-end security on a connection. IPsec tunnel mode is based on adding a new "outer" IP header that encapsulates the original, or "inner", IP header and its associated packet. Tunnel mode security headers are inserted between these two IP headers.
In contrast to transport mode, the new "outer" IP header and tunnel mode security headers can be added and removed at intermediate points along a connection, enabling security gateways to secure vulnerable portions of a connection without requiring endpoint participation in the security protocols. An important aspect of tunnel mode security is that, in the original specification, the outer header is discarded at tunnel egress, ensuring that security threats based on modifying the IP header do not propagate beyond that tunnel endpoint. Further discussion of IPsec can be found in [RFC2401].

The IPsec protocol as originally defined in [ESP, AH] required that the inner header's ECN field not be changed by IPsec decapsulation processing at a tunnel egress node; this would have ruled out the possibility of full-functionality mode for ECN. At the same time, this requirement ensures that an adversary's modifications to the ECN field cannot be used to launch theft- or denial-of-service attacks across an IPsec tunnel endpoint, as any such modifications will be discarded at the tunnel endpoint.

In principle, permitting the use of ECN functionality in the outer header of an IPsec tunnel raises security concerns because an adversary could tamper with the information that propagates beyond the tunnel endpoint. Based on an analysis (included in Sections 18 and 19) of these concerns and the associated risks, our overall approach has been to provide configuration support for IPsec changes to remove the conflict with ECN.

In particular, in tunnel mode the IPsec tunnel MUST support either the limited-functionality or the full-functionality mode outlined in Section 9.1.1.

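The two modes referenced above can be summarized in a short Python sketch. The sketch also covers the egress checks of Sections 9.1.1 and 9.1.2. It models only the two-bit ECN field, assumes the codepoint values of Section 5, and uses hypothetical function names; it is an illustration, not a description of any IPsec implementation. Sections 9.1.1 and 9.1.2 differ in one case: a limited-functionality tunnel that receives CE in both the outer and inner headers. For that case the sketch follows the more specific Section 9.1.2 rule and forwards the packet.

```python
from enum import Enum
from typing import Optional

NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11  # assumed Section 5 values
ECT = (ECT0, ECT1)


class Mode(Enum):
    LIMITED = "limited-functionality"
    FULL = "full-functionality"


def encapsulate(inner: int, mode: Mode) -> int:
    """Return the ECN field to place in the outer header at tunnel ingress."""
    if mode is Mode.LIMITED:
        return NOT_ECT  # ECN is never enabled in the outer header
    # Full functionality: copy not-ECT/ECT, but map an inner CE to ECT(0).
    return ECT0 if inner == CE else inner


def decapsulate(outer: int, inner: int, mode: Mode) -> Optional[int]:
    """Return the inner ECN field after egress processing, or None to drop."""
    if outer == CE:
        if mode is Mode.FULL and inner in ECT:
            return CE  # propagate congestion experienced inside the tunnel
        if inner != CE:
            return None  # RECOMMENDED drop: CE appeared where ECN was not in use
    return inner  # otherwise the inner header is left unchanged


def outer_is_consistent(outer: int, inner: int, mode: Mode) -> bool:
    """Section 9.1.2 comparison; False could indicate tampering."""
    if mode is Mode.LIMITED:
        return outer == NOT_ECT
    return (outer == NOT_ECT) == (inner == NOT_ECT)
```

On a full-functionality tunnel, an inner CE is mapped to ECT(0) in the outer header rather than copied. As a result, a CE seen in the outer header at egress can only have been set inside the tunnel, while the inner header still carries any congestion indication from upstream of the ingress.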
This makes permission to use ECN functionality in the outer header of an IPsec tunnel a configurable part of the corresponding IPsec Security Association (SA), so that it can be disabled in situations where the risks are judged to outweigh the benefits. The result is that an IPsec security administrator is presented with two alternatives for the behavior of ECN-capable connections within an IPsec tunnel: the limited-functionality alternative and the full-functionality alternative described earlier. All IPsec implementations MUST implement either the limited-functionality or the full-functionality alternative in order to eliminate incompatibility between ECN and IPsec tunnels, but implementers MAY choose which alternative to implement.

In addition, this document specifies how the endpoints of an IPsec tunnel could negotiate enabling ECN functionality in the outer headers of that tunnel based on security policy. The ability to negotiate ECN usage between tunnel endpoints would enable a security administrator to disable ECN in situations where she believes the risks (e.g., of lost congestion notifications) outweigh the benefits of ECN.

The IPsec protocol, as defined in [ESP, AH], does not include the IP header's ECN field in any of its cryptographic calculations (in the case of tunnel mode, the outer IP header's ECN field is not included). Hence modification of the ECN field by a network node has no effect on IPsec's end-to-end security, because it cannot cause any IPsec integrity check to fail. As a consequence, IPsec does not provide any defense against an adversary's modification of the ECN field (i.e., a man-in-the-middle attack), as the adversary's modification will also have no effect on IPsec's end-to-end security.
In some environments, the ability to modify the ECN field without affecting IPsec integrity checks may constitute a covert channel; if it is necessary to eliminate such a channel or reduce its bandwidth, then the IPsec tunnel should be run in limited-functionality mode.

9.2.1. Negotiation between Tunnel Endpoints

This section describes the detailed changes to enable usage of ECN over IPsec tunnels, including the negotiation of ECN support between tunnel endpoints. This is supported by three changes to IPsec:
* An optional Security Association Database (SAD) field indicating whether tunnel encapsulation and decapsulation processing allows or forbids ECN usage in the outer IP header.
* An optional Security Association Attribute that enables negotiation of this SAD field between the two endpoints of an SA that supports tunnel mode.
* Changes to tunnel mode encapsulation and decapsulation processing to allow or forbid ECN usage in the outer IP header based on the value of the SAD field. When ECN usage is allowed in the outer IP header, the ECT codepoint is set in the outer header for ECN-capable connections and congestion notifications (indicated by the CE codepoint) from such connections are propagated to the inner header at tunnel egress.

If negotiation of ECN usage is implemented, then the SAD field SHOULD also be implemented. On the other hand, negotiation of ECN usage is OPTIONAL in all cases, even for implementations that support the SAD field. The encapsulation and decapsulation processing changes are REQUIRED, but MAY be implemented without the other two changes by assuming that ECN usage is always forbidden.
The full-functionality alternative for ECN usage over IPsec tunnels consists of the SAD field and the full version of encapsulation and decapsulation processing changes, with or without the OPTIONAL negotiation support. The limited-functionality alternative consists of a subset of the encapsulation and decapsulation changes that always forbids ECN usage.

These changes are covered further in the following three subsections.

9.2.1.1. ECN Tunnel Security Association Database Field

Full ECN functionality adds a new field to the SAD (see [RFC2401]):

   ECN Tunnel: allowed or forbidden.

   Indicates whether ECN-capable connections using this SA in tunnel mode are permitted to receive ECN congestion notifications for congestion occurring within the tunnel. The allowed value enables ECN congestion notifications. The forbidden value disables such notifications, causing all congestion to be indicated via dropped packets.

   [OPTIONAL. The value of this field SHOULD be assumed to be "forbidden" in implementations that do not support it.]

If this SAD field is implemented, then the SA specification in a Security Policy Database (SPD) entry MUST support a corresponding attribute, and this SPD attribute MUST be covered by the SPD administrative interface (currently described in Section 4.4.1 of [RFC2401]).

9.2.1.2. ECN Tunnel Security Association Attribute

A new IPsec Security Association Attribute is defined to enable the support for ECN congestion notifications based on the outer IP header to be negotiated for IPsec tunnels (see [RFC2407]). This attribute is OPTIONAL, although implementations that support it SHOULD also support the SAD field defined in Section 9.2.1.1.

Attribute Type

           class               value           type
     -------------------------------------------------
     ECN Tunnel                 10             Basic

The IPsec SA Attribute value 10 has been allocated by IANA to indicate that the ECN Tunnel SA Attribute is being negotiated; the type of this attribute is Basic (see Section 4.5 of [RFC2407]). The Class Values are used to conduct the negotiation. See [RFC2407, RFC2408, RFC2409] for further information, including encoding formats and requirements for negotiating this SA attribute.

Class Values

   ECN Tunnel

   Specifies whether ECN functionality is allowed to be used with Tunnel Encapsulation Mode. This affects tunnel encapsulation and decapsulation processing - see Section 9.2.1.3.

      RESERVED     0
      Allowed      1
      Forbidden    2

   Values 3-61439 are reserved to IANA. Values 61440-65535 are for private use.

   If unspecified, the default shall be assumed to be Forbidden.

ECN Tunnel is a new SA attribute, and hence initiators that use it can expect to encounter responders that do not understand it, and therefore reject proposals containing it. For backwards compatibility with such implementations, initiators SHOULD always also include a proposal without the ECN Tunnel attribute to enable such a responder to select a transform or proposal that does not contain the ECN Tunnel attribute. RFC 2407 currently requires responders to reject all proposals if any proposal contains an unknown attribute; this requirement is expected to be changed to require a responder not to select proposals or transforms containing unknown attributes.

9.2.1.3. Changes to IPsec Tunnel Header Processing

For full ECN support, the encapsulation and decapsulation processing for the IPv4 TOS field and the IPv6 Traffic Class field is changed from that specified in [RFC2401] to the following:

                     <-- How Outer Hdr Relates to Inner Hdr -->
                     Outer Hdr at                 Inner Hdr at
     IPv4            Encapsulator                 Decapsulator
       Header fields:  --------------------         ------------
         DS Field      copied from inner hdr (5)    no change
         ECN Field     constructed (7)              constructed (8)

     IPv6
       Header fields:
         DS Field      copied from inner hdr (6)    no change
         ECN Field     constructed (7)              constructed (8)

(5)(6) If the packet will immediately enter a domain for which the DSCP value in the outer header is not appropriate, that value MUST be mapped to an appropriate value for the domain [RFC2474]. Also see [RFC2475] for further information.

(7) If the value of the ECN Tunnel field in the SAD entry for this SA is "allowed" and the ECN field in the inner header is set to any value other than CE, copy this ECN field to the outer header. If the ECN field in the inner header is set to CE, then set the ECN field in the outer header to ECT(0).

(8) If the value of the ECN Tunnel field in the SAD entry for this SA is "allowed", the ECN field in the inner header is set to ECT(0) or ECT(1), and the ECN field in the outer header is set to CE, then copy the ECN field from the outer header to the inner header. Otherwise, make no change to the ECN field in the inner header.

(5) and (6) are identical to match usage in [RFC2401], although they are different in [RFC2401].

The above description applies to implementations that support the ECN Tunnel field in the SAD; such implementations MUST implement this processing instead of the processing of the IPv4 TOS octet and IPv6 Traffic Class octet defined in [RFC2401].
This constitutes the full-functionality alternative for ECN usage with IPsec tunnels.

An implementation that does not support the ECN Tunnel field in the SAD MUST implement this processing by assuming that the value of the ECN Tunnel field of the SAD is "forbidden" for every SA. In this case, the processing of the ECN field reduces to:

(7) Set the ECN field to not-ECT in the outer header.
(8) Make no change to the ECN field in the inner header.

This constitutes the limited-functionality alternative for ECN usage with IPsec tunnels.

For backwards compatibility, packets with the CE codepoint set in the outer header SHOULD be dropped if they arrive on an SA that is using the limited-functionality option, or that is using the full-functionality option with the not-ECT codepoint set in the inner header.

9.2.2. Changes to the ECN Field within an IPsec Tunnel

If the ECN field is changed inappropriately within an IPsec tunnel, and this change is detected at the tunnel egress, then the receipt of a packet not satisfying the appropriate condition for its SA is an auditable event. An implementation MAY create audit records with per-SA counts of incorrect packets over some time period rather than creating an audit record for each erroneous packet. Any such audit record SHOULD contain the headers from at least one erroneous packet, but need not contain the headers from every packet represented by the entry.

9.2.3. Comments for IPsec Support

Substantial comments were received on two areas of this document during review by the IPsec working group. This section describes these comments and explains why the proposed changes were not incorporated.

The first comment indicated that per-node configuration is easier to implement than per-SA configuration.
After serious thought, and despite some initial encouragement of per-node configuration, we no longer consider per-node configuration a good idea. The concern is that as ECN-awareness is progressively deployed in IPsec, many ECN-aware IPsec implementations will find themselves communicating with a mixture of ECN-aware and ECN-unaware IPsec tunnel endpoints. In such an environment with per-node configuration, the only reasonable thing to do is forbid ECN usage for all IPsec tunnels, which is not the desired outcome.

In the second area, several reviewers noted that SA negotiation is complex, and adding to it is non-trivial. One reviewer suggested using ICMP after tunnel setup as a possible alternative. The addition to SA negotiation in this document is OPTIONAL and will remain so; implementers are free to ignore it. The authors believe that the assurance it provides can be useful in a number of situations. If in practice this option is not implemented, it can be deleted at a subsequent stage in the standards process. Extending ICMP to negotiate ECN after tunnel setup is more complex than extending SA attribute negotiation. Some tunnels do not permit traffic to be addressed to the tunnel egress endpoint, hence the ICMP packet would have to be addressed to somewhere else, scanned for by the egress endpoint, and discarded there or at its actual destination. In addition, ICMP delivery is unreliable, and hence there is a possibility of an ICMP packet being dropped, entailing the invention of yet another ack/retransmit mechanism. It seems better simply to specify an OPTIONAL extension to the existing SA negotiation mechanism.

9.3. IP packets encapsulated in non-IP packet headers

A different set of issues is raised, relative to ECN, when IP packets are encapsulated in tunnels with non-IP packet headers.
This occurs with MPLS [MPLS], GRE [GRE], L2TP [L2TP], and PPTP [PPTP]. For these protocols, there is no conflict with ECN; it is just that ECN cannot be used within the tunnel unless an ECN codepoint can be specified for the header of the encapsulating protocol. Earlier work considered a preliminary proposal for incorporating ECN into MPLS, and proposals for incorporating ECN into GRE, L2TP, or PPTP will be considered as the need arises.

10. Issues Raised by Monitoring and Policing Devices

One possibility is that monitoring and policing devices (or, more informally, "penalty boxes") will be installed in the network to monitor whether best-effort flows are appropriately responding to congestion, and to preferentially drop packets from flows determined not to be using adequate end-to-end congestion control procedures.

We recommend that any "penalty box" that detects a flow or an aggregate of flows that is not responding to end-to-end congestion control first change from marking to dropping packets from that flow, before taking any additional action to restrict the bandwidth available to that flow. Thus, initially, the router may drop packets for which it would otherwise have set the CE codepoint. This could include dropping those arriving packets for that flow that are ECN-Capable and that already have the CE codepoint set. In this way, any congestion indications seen by that router for that flow will be guaranteed to also be seen by the end nodes, even in the presence of malicious or broken routers elsewhere in the path.
If we assume that the first action taken at any "penalty box" for an ECN-capable flow will be to drop packets instead of marking them, then there is no way that an adversary that subverts ECN-based end-to-end congestion control can cause a flow to be characterized as non-cooperative and subjected to more severe action within the "penalty box".

The monitoring and policing devices that are actually deployed could fall short of the `ideal' monitoring device described above, in that the monitoring is applied not to a single flow, but to an aggregate of flows (e.g., those sharing a single IPsec tunnel). In this case, the switch from marking to dropping would apply to all of the flows in that aggregate, denying the benefits of ECN to the other flows in the aggregate as well. At the highest level of aggregation, another form of the disabling of ECN happens even in the absence of monitoring and policing devices, when ECN-Capable RED queues switch from marking to dropping packets as an indication of congestion when the average queue size has exceeded some threshold.

11. Evaluations of ECN

11.1. Related Work Evaluating ECN

This section discusses some of the related work evaluating the use of ECN. The ECN Web Page [ECN] has pointers to other papers, as well as to implementations of ECN.

[Floyd94] considers the advantages and drawbacks of adding ECN to the TCP/IP architecture. As shown in the simulation-based comparisons, one advantage of ECN is to avoid unnecessary packet drops for short or delay-sensitive TCP connections. A second advantage of ECN is in avoiding some unnecessary retransmit timeouts in TCP. This paper discusses in detail the integration of ECN into TCP's congestion control mechanisms.
The possible disadvantages of ECN discussed in the paper are that a non-compliant TCP connection could falsely advertise itself as ECN-capable, and that a TCP ACK packet carrying an ECN-Echo message could itself be dropped in the network. The first of these two issues is discussed in the appendix of this document, and the second is addressed by the addition of the CWR flag in the TCP header.

Experimental evaluations of ECN include [RFC2884, K98]. The conclusions of [K98] and [RFC2884] are that ECN TCP gets moderately better throughput than non-ECN TCP; that ECN TCP flows are fair towards non-ECN TCP flows; and that ECN TCP is robust with two-way traffic (with congestion in both directions) and with multiple congested gateways. Experiments with many short web transfers show that, while most of the short connections have similar transfer times with or without ECN, a small percentage of the short connections have very long transfer times for the non-ECN experiments as compared to the ECN experiments.

11.2. A Discussion of the ECN Nonce

The use of two ECT codepoints, ECT(0) and ECT(1), can provide a one-bit ECN nonce in packet headers [SCWA99]. The primary motivation for this is the desire to allow mechanisms for the data sender to verify that network elements are not erasing the CE codepoint, and that data receivers are properly reporting to the sender the receipt of packets with the CE codepoint set, as required by the transport protocol. This section discusses issues of backwards compatibility with IP ECN implementations in routers conformant with RFC 2481, in which only one ECT codepoint was defined. We do not believe that the incremental deployment of ECN implementations that understand the ECT(1) codepoint will cause significant operational problems.
This is particularly likely to be the case when the deployment of the ECT(1) codepoint begins with routers, before the ECT(1) codepoint starts to be used by end-nodes.

11.2.1. The Incremental Deployment of ECT(1) in Routers

ECN has been an Experimental standard since January 1999, and there are already implementations of ECN in routers that do not understand the ECT(1) codepoint. When the use of the ECT(1) codepoint is standardized for TCP or for other transport protocols, this could mean that a data sender is using the ECT(1) codepoint, but that this codepoint is not understood by a congested router on the path.

If allowed by the transport protocol, a data sender would be free not to make use of ECT(1) at all, and to send all ECN-capable packets with the codepoint ECT(0). However, if an ECN-capable sender is using ECT(1), and the congested router on the path does not understand the ECT(1) codepoint, then the router would end up marking some of the ECT(0) packets, and dropping some of the ECT(1) packets, as indications of congestion. Since TCP is required to react to both marked and dropped packets, this behavior of dropping packets that could have been marked poses no significant threat to the network, and is consistent with the overall approach to ECN that allows routers to determine when and whether to mark packets as they see fit (see Section 5).

12. Summary of Changes Required in IP and TCP

This document specified two bits in the IP header to be used for ECN. The not-ECT codepoint indicates that the transport protocol will ignore the CE codepoint. This is the default value for the ECN codepoint. The ECT codepoints indicate that the transport protocol is willing and able to participate in ECN.

The router sets the CE codepoint to indicate congestion to the end nodes.
The CE codepoint in a packet header MUST NOT be reset by a router.

TCP requires three changes for ECN: a setup phase and two new flags in the TCP header. The ECN-Echo flag is used by the data receiver to inform the data sender of a received CE packet. The Congestion Window Reduced (CWR) flag is used by the data sender to inform the data receiver that the congestion window has been reduced.

When ECN (Explicit Congestion Notification [RFC2481]) is used, it is required that congestion indications generated within an IP tunnel not be lost at the tunnel egress. We specified a minor modification to the IP protocol's handling of the ECN field during encapsulation and de-capsulation to allow flows that will undergo IP tunneling to use ECN.

Two options for ECN in tunnels were specified:

   1) A limited-functionality option that does not use ECN inside the IP tunnel, by setting the ECN field in the outer header to not-ECT, and not altering the inner header at the time of decapsulation.

   2) The full-functionality option, which sets the ECN field in the outer header to either not-ECT or to one of the ECT codepoints, depending on the ECN field in the inner header. At decapsulation, if the CE codepoint is set in the outer header, and the inner header is set to one of the ECT codepoints, then the CE codepoint is copied to the inner header.

All IP tunnels MUST implement one of the two alternative approaches described above. For IPsec tunnels, this document also defines an optional IPsec Security Association (SA) attribute that enables negotiation of ECN usage within IPsec tunnels and an optional field in the Security Association Database to indicate whether ECN is permitted in tunnel mode on an SA.
The required changes to IPsec tunnels for ECN usage modify RFC 2401 [RFC2401], which defines the IPsec architecture and specifies some aspects of its implementation. The new IPsec SA attribute is in addition to those already defined in Section 4.5 of [RFC2407].

This document is intended to obsolete RFC 2481, "A Proposal to add Explicit Congestion Notification (ECN) to IP", which defined ECN as an Experimental Protocol for the Internet Community. The rest of this section describes the relationship between this document and its predecessor.

RFC 2481 included a brief discussion of the use of ECN with encapsulated packets, and noted that for the IPsec specifications at the time (January 1999), flows could not safely use ECN if they were to traverse IPsec tunnels. RFC 2481 also described the changes that could be made to IPsec tunnel specifications to make them compatible with ECN.

This document also incorporates work that was done after RFC 2481. The first was to describe the changes to IPsec tunnels in detail, and to discuss extensively the security implications of ECN (now included as Sections 18 and 19 of this document). The second was to extend the discussion of IPsec tunnels to include all IP tunnels. Because older IP tunnels are not compatible with a flow's use of ECN, the deployment of ECN in the Internet will create strong pressure for older IP tunnels to be updated to an ECN-compatible version, using either the limited-functionality or the full-functionality option.

This document does not address the issue of including ECN in non-IP tunnels such as MPLS, GRE, L2TP, or PPTP. An earlier preliminary document about adding ECN support to MPLS was not advanced.

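As a rough illustration, the two tunnel options summarized above can be sketched as a pair of encapsulation and decapsulation functions over the 2-bit ECN field. This is a minimal sketch, not a normative implementation. The function names are hypothetical. The codepoint values (not-ECT '00', ECT(1) '01', ECT(0) '10', CE '11') follow the ECN field definition used in this document. One detail is not restated in this summary: how an inner CE codepoint is carried in the outer header. The sketch assumes it is mapped to ECT(0).

```python
from enum import IntEnum


class ECN(IntEnum):
    """The 2-bit ECN field of the IP header."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


ECT_CODEPOINTS = (ECN.ECT0, ECN.ECT1)


def encapsulate(inner: ECN, full_functionality: bool) -> ECN:
    """Return the ECN field for the outer header at tunnel ingress."""
    if not full_functionality:
        # Limited-functionality option: ECN is not used inside the tunnel.
        return ECN.NOT_ECT
    if inner == ECN.CE:
        # Assumption: an inner CE is carried as ECT(0) in the outer header,
        # so that routers inside the tunnel can still mark the outer header.
        return ECN.ECT0
    # not-ECT, ECT(0) and ECT(1) are copied from the inner header.
    return inner


def decapsulate(outer: ECN, inner: ECN, full_functionality: bool) -> ECN:
    """Return the inner header's ECN field after tunnel egress."""
    if full_functionality and outer == ECN.CE and inner in ECT_CODEPOINTS:
        # Congestion experienced inside the tunnel must not be lost at egress.
        return ECN.CE
    # Otherwise the inner header is left unaltered.
    return inner
```

For example, a packet entering a full-functionality tunnel as ECT(1) carries ECT(1) in its outer header. If a router inside the tunnel sets CE, `decapsulate(ECN.CE, ECN.ECT1, True)` yields `ECN.CE`. A limited-functionality tunnel never exposes an ECN-capable outer header that a router could mark.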
A third new piece of work after RFC 2481 was to specify the ECN procedure for retransmitted data packets, namely that an ECT codepoint should not be set on retransmitted data packets. The motivation for this additional specification is to eliminate a possible avenue for denial-of-service attacks on an existing TCP connection. Some prior deployments of ECN-capable TCP might not conform to the (new) requirement not to set an ECT codepoint on retransmitted packets; we do not believe this will cause significant problems in practice.

This document also expands slightly on the specification of the use of SYN packets for the negotiation of ECN. While some prior deployments of ECN-capable TCP might not conform to the requirements specified in this document, we do not believe that this will lead to any performance or compatibility problems for TCP connections with a combination of TCP implementations at the endpoints.

This document also includes the specification of the ECT(1) codepoint, which may be used by TCP as part of the implementation of an ECN nonce.

13. Conclusions

Given the current effort to implement AQM, we believe this is the right time to deploy congestion avoidance mechanisms that do not depend on packet drops alone. With the increased deployment of applications and transports sensitive to the delay and loss of a single packet (e.g., realtime traffic, short web transfers), depending on packet loss as a normal congestion notification mechanism appears to be insufficient (or at the very least, non-optimal).

We examined the consequence of modifications of the ECN field within the network, analyzing all the opportunities for an adversary to change the ECN field. In many cases, the change to the ECN field is no worse than dropping a packet.
However, we noted that some changes have the more serious consequence of subverting end-to-end congestion control. Even in those cases, the potential damage is limited, and is similar to the threat posed by end-systems intentionally failing to cooperate with end-to-end congestion control.

14. Acknowledgements

Many people have made contributions to this work and this document, including many that we have not managed to directly acknowledge in this document. In addition, we would like to thank Kenjiro Cho for the proposal for the TCP mechanism for negotiating ECN-Capability, Kevin Fall for the proposal of the CWR bit, Steve Blake for material on IPv4 Header Checksum Recalculation, Jamal Hadi-Salim for discussions of ECN issues, and Steve Bellovin, Jim Bound, Brian Carpenter, Paul Ferguson, Stephen Kent, Greg Minshall, and Vern Paxson for discussions of security issues. We also thank the Internet End-to-End Research Group for ongoing discussions of these issues.

Email discussions with a number of people, including Alexey Kuznetsov, Jamal Hadi-Salim, and Venkat Venkatsubra, have addressed the issues raised by non-conformant equipment in the Internet that does not respond to TCP SYN packets with the ECE and CWR flags set. We thank Mark Handley, Jitendra Padhye, and others for discussions on the TCP initialization procedures.

The discussion of ECN and IP tunnel considerations draws heavily on related discussions and documents from the Differentiated Services Working Group. We thank Tabassum Bint Haque from Dhaka, Bangladesh, for feedback on IP tunnels. We thank Derrell Piper and Kero Tivinen for proposing modifications to RFC 2407 that improve the usability of negotiating the ECN Tunnel SA attribute.

We thank David Wetherall, David Ely, and Neil Spring for the proposal for the ECN nonce.
We also thank Stefan Savage for discussions on this issue. We thank Bob Briscoe and Jon Crowcroft for raising the issue of fragmentation in IP, for discussions of alternate semantics for the fourth ECN codepoint, and for several other topics. We thank Richard Wendland for feedback on several issues in the draft.

15. References

[AH] Kent, S. and R. Atkinson, "IP Authentication Header", RFC 2402, November 1998.

[B97] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[ECN] "The ECN Web Page", URL "http://www.aciri.org/floyd/ecn.html". Reference for informational purposes only.

[ESP] Kent, S. and R. Atkinson, "IP Encapsulating Security Payload", RFC 2406, November 1998.

[FJ93] Floyd, S. and V. Jacobson, "Random Early Detection gateways for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1 N.4, August 1993, p. 397-413.

[Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23.

[Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator", URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all-ecn. Reference for informational purposes only.

[FF99] Floyd, S. and K. Fall, "Promoting the Use of End-to-End Congestion Control in the Internet", IEEE/ACM Transactions on Networking, August 1999.

[FRED] Lin, D. and R. Morris, "Dynamics of Random Early Detection", SIGCOMM '97, September 1997.

[GRE] Hanks, S., Li, T., Farinacci, D. and P. Traina, "Generic Routing Encapsulation (GRE)", RFC 1701, October 1994.

[Jacobson88] Jacobson, V., "Congestion Avoidance and Control", Proc. ACM SIGCOMM '88, pp. 314-329.

[Jacobson90] Jacobson, V., "Modified TCP Congestion Avoidance Algorithm", Message to end2end-interest mailing list, April 1990. URL "ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt".

[K98] Krishnan, H., "Analyzing Explicit Congestion Notification (ECN) benefits for TCP", Master's thesis, UCLA, 1998, URL "http://www.cs.ucla.edu/~hari/software/ecn/ecn_report.ps.gz".

[L2TP] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, G. and B. Palter, "Layer Two Tunneling Protocol "L2TP"", RFC 2661, August 1999.

[MJV96] McCanne, S., Jacobson, V. and M. Vetterli, "Receiver-driven Layered Multicast", SIGCOMM '96, August 1996, pp. 117-130.

[MPLS] Awduche, D., Malcolm, J., Agogbua, J., O'Dell, M. and J. McManus, "Requirements for Traffic Engineering Over MPLS", RFC 2702, September 1999.

[PPTP] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, W. and G. Zorn, "Point-to-Point Tunneling Protocol (PPTP)", RFC 2637, July 1999.

[RFC791] Postel, J., "Internet Protocol", STD 5, RFC 791, September 1981.

[RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.

[RFC1141] Mallory, T. and A. Kullberg, "Incremental Updating of the Internet Checksum", RFC 1141, January 1990.

[RFC1349] Almquist, P., "Type of Service in the Internet Protocol Suite", RFC 1349, July 1992.

[RFC1455] Eastlake, D., "Physical Link Security Type of Service", RFC 1455, May 1993.

[RFC1701] Hanks, S., Li, T., Farinacci, D. and P. Traina, "Generic Routing Encapsulation (GRE)", RFC 1701, October 1994.

[RFC1702] Hanks, S., Li, T., Farinacci, D. and P. Traina, "Generic Routing Encapsulation over IPv4 networks", RFC 1702, October 1994.

[RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, October 1996.

[RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119, March 1997.

[RFC2309] Braden, B., et al., "Recommendations on Queue Management and Congestion Avoidance in the Internet", RFC 2309, April 1998.

[RFC2401] Kent, S. and R.
Atkinson, "Security Architecture for the Internet Protocol", RFC 2401, November 1998.

[RFC2407] Piper, D., "The Internet IP Security Domain of Interpretation for ISAKMP", RFC 2407, November 1998.

[RFC2408] Maughan, D., Schertler, M., Schneider, M. and J. Turner, "Internet Security Association and Key Management Protocol (ISAKMP)", RFC 2408, November 1998.

[RFC2409] Harkins, D. and D. Carrel, "The Internet Key Exchange (IKE)", RFC 2409, November 1998.

[RFC2474] Nichols, K., Blake, S., Baker, F. and D. Black, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474, December 1998.

[RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z. and W. Weiss, "An Architecture for Differentiated Services", RFC 2475, December 1998.

[RFC2481] Ramakrishnan, K. and S. Floyd, "A Proposal to add Explicit Congestion Notification (ECN) to IP", RFC 2481, January 1999.

[RFC2581] Allman, M., Paxson, V. and W. Stevens, "TCP Congestion Control", RFC 2581, April 1999.

[RFC2780] Bradner, S. and V. Paxson, "IANA Allocation Guidelines For Values In the Internet Protocol and Related Headers", RFC 2780, March 2000.

[RFC2884] Hadi Salim, J. and U. Ahmed, "Performance Evaluation of Explicit Congestion Notification (ECN) in IP Networks", RFC 2884, July 2000.

[RFC2983] Black, D., "Differentiated Services and Tunnels", RFC 2983, October 2000.

[RJ90] Ramakrishnan, K. K. and R. Jain, "A Binary Feedback Scheme for Congestion Avoidance in Computer Networks", ACM Transactions on Computer Systems, Vol. 8, No. 2, pp. 158-181, May 1990.

[SCWA99] Savage, S., Cardwell, N., Wetherall, D. and T. Anderson, "TCP Congestion Control with a Misbehaving Receiver", ACM Computer Communications Review, October 1999.

16. Security Considerations

Security considerations have been discussed in Sections 7, 8, 18, and 19.

17. IPv4 Header Checksum Recalculation

IPv4 header checksum recalculation is an issue with some high-end router architectures using an output-buffered switch, since most if not all of the header manipulation is performed on the input side of the switch, while the ECN decision would need to be made local to the output buffer. This is not an issue for IPv6, since there is no IPv6 header checksum. The IPv4 TOS octet is the last byte of a 16-bit half-word, and the ECN field occupies its two least significant bits, so changing the ECN field of an ECT(0) packet to CE increases that 16-bit word by exactly one.

RFC 1141 [RFC1141] discusses the incremental updating of the IPv4 checksum after the TTL field is decremented. The incremental updating of the IPv4 checksum after the CE codepoint is set would work as follows: Let HC be the original header checksum for an ECT(0) packet, and let HC' be the new header checksum after the CE codepoint has been set. That is, the ECN field has changed from '10' to '11'. Then for header checksums calculated with one's complement subtraction, HC' would be recalculated as follows:

   HC' = { HC - 1     HC > 1
         { 0x0000     HC = 1

For header checksums calculated on two's complement machines, HC' would be recalculated as follows after the CE codepoint was set:

   HC' = { HC - 1     HC > 0
         { 0xFFFE     HC = 0

A similar incremental updating of the IPv4 checksum can be carried out when the ECN field is changed from ECT(1) to CE, that is, from '01' to '11'.

18. Possible Changes to the ECN Field in the Network

This section discusses in detail possible changes to the ECN field in the network, such as falsely reporting congestion, disabling ECN-Capability for an individual packet, erasing the ECN congestion indication, or falsely indicating ECN-Capability.

18.1. Possible Changes to the IP Header

18.1.1. Erasing the Congestion Indication

First, we consider the changes that a router could make that would result in effectively erasing the congestion indication after it had been set by a router upstream. The convention followed is:

   ECN codepoint of received packet -> ECN codepoint of packet transmitted.

Replacing the CE codepoint with the ECT(0) or ECT(1) codepoint effectively erases the congestion indication. However, with the use of two ECT codepoints, a router erasing the CE codepoint has no way to know whether the original ECT codepoint was ECT(0) or ECT(1). Thus, it is possible for the transport protocol to deploy mechanisms to detect such erasures of the CE codepoint.

The consequence of the erasure of the CE codepoint for the upstream router is that there is a potential for congestion to build for a time, because the congestion indication does not reach the source. However, the packet would be received and acknowledged.

The potential effect of erasing the congestion indication is complex, and is discussed in depth in Section 19 below. Note that the effect of erasing the congestion indication is different from dropping a packet in the network. When a data packet is dropped, the drop is detected by the TCP sender, and interpreted as an indication of congestion. Similarly, if a sufficient number of consecutive acknowledgement packets are dropped, causing the cumulative acknowledgement field not to be advanced at the sender, the sender is limited by the congestion window from sending additional packets, and ultimately the retransmit timer expires.

In contrast, a systematic erasure of the CE bit by a downstream router can have the effect of causing a queue buildup at an upstream router, including the possible loss of packets due to buffer overflow.
There is a potential for unfairness in that another flow that goes through the congested router could react to the CE bit set while the flow that has the CE bit erased could see better performance. The limitations on this potential unfairness are discussed in more detail in Section 19 below.

The last of the three changes is to replace the CE codepoint with the not-ECT codepoint, thus erasing the congestion indication and disabling ECN-Capability at the same time.

The `erasure' of the congestion indication is only effective if the packet does not end up being marked or dropped again by a downstream router. If the CE codepoint is replaced by an ECT codepoint, the packet remains ECN-Capable, and could be either marked or dropped by a downstream router as an indication of congestion. If the CE codepoint is replaced by the not-ECT codepoint, the packet is no longer ECN-capable, and can therefore be dropped but not marked by a downstream router as an indication of congestion.

18.1.2. Falsely Reporting Congestion

This change is to set the CE codepoint when an ECT codepoint was already set, even though there was no congestion. This change does not affect the treatment of that packet along the rest of the path. In particular, a router does not examine the CE codepoint in deciding whether to drop or mark an arriving packet.

However, this could result in the application unnecessarily invoking end-to-end congestion control, and reducing its arrival rate. By itself, this is no worse (for the application or for the network) than if the tampering router had actually dropped the packet.

18.1.3. Disabling ECN-Capability

This change is to turn off the ECT codepoint of a packet.
This means that if the packet later encounters congestion (e.g., by arriving at a RED queue with a moderate average queue size), it will be dropped instead of being marked. By itself, this is no worse (for the application) than if the tampering router had actually dropped the packet. The saving grace in this particular case is that there is no congested router upstream expecting a reaction from setting the CE bit.

18.1.4. Falsely Indicating ECN-Capability

This change would incorrectly label a packet as ECN-Capable, by changing its not-ECT codepoint either to an ECT codepoint or to the CE codepoint. The packet may have been sent either by an ECN-Capable transport or a transport that is not ECN-Capable.

If the packet later encounters moderate congestion at an ECN-Capable router, the router could set the CE codepoint instead of dropping the packet. If the transport protocol in fact is not ECN-Capable, then the transport will never receive this indication of congestion, and will not reduce its sending rate in response. The potential consequences of falsely indicating ECN-capability are discussed further in Section 19 below.

If the packet never later encounters congestion at an ECN-Capable router, then the first of these two changes (not-ECT to ECT) would have no effect, other than possibly interfering with the use of the ECN nonce by the transport protocol. The second change (not-ECT to CE), however, would have the effect of giving false reports of congestion to a monitoring device along the path. If the transport protocol is ECN-Capable, then this change could also have an effect at the transport level, by combining falsely indicating ECN-Capability with falsely reporting congestion. For an ECN-capable transport, this would cause the transport to unnecessarily react to congestion. In this particular case, the router that is incorrectly changing the ECN field could have dropped the packet.
Thus, for this case of an ECN-capable transport, the consequence of this change to the ECN field is no worse than dropping the packet.

18.2. Information Carried in the Transport Header

For TCP, an ECN-capable TCP receiver informs its TCP peer that it is ECN-capable at the TCP level, conveying this information in the TCP header at the time the connection is set up. This document does not consider potential dangers introduced by changes in the transport header within the network. In the case of IPsec tunnels, the IPsec tunnel protects the transport header.

Another issue concerns TCP packets with a spoofed IP source address carrying invalid ECN information in the transport header. For completeness, we examine here some possible ways that a node spoofing the IP source address of another node could use the two ECN flags in the TCP header to launch a denial-of-service attack. However, these attacks would require an ability for the attacker to use valid TCP sequence numbers, and any attacker with this ability and with the ability to spoof IP source addresses could damage the TCP connection without using the ECN flags. Therefore, ECN does not add any new vulnerabilities in this respect.

An acknowledgement packet with a spoofed IP source address of the TCP data receiver could have the ECE bit set. If accepted by the TCP data sender as a valid packet, this spoofed acknowledgement packet could result in the TCP data sender unnecessarily halving its congestion window. However, to be accepted by the data sender, such a spoofed acknowledgement packet would have to have the correct 32-bit sequence number as well as a valid acknowledgement number. An attacker that could successfully send such a spoofed acknowledgement packet could also send a spoofed RST packet, or do other equally damaging operations to the TCP connection.

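The acceptance argument can be made concrete with a sketch. It shows a TCP data sender that honors ECE only on an acknowledgement that passes two checks: the sequence number must fall in the receive window, and the acknowledgement number must lie between SND.UNA and SND.NXT. This is a simplified sketch under stated assumptions. The checks are a reduced form of the segment-acceptance tests of [RFC793]. The class and field names are hypothetical. The congestion window is counted in segments. The rule that a sender reacts to ECE at most once per window of data is omitted.

```python
MOD = 2 ** 32  # size of the TCP sequence number space


def seq_lt(a: int, b: int) -> bool:
    """a < b in 32-bit modular (wrapping) sequence arithmetic."""
    return (a - b) % MOD >= 2 ** 31


def seq_leq(a: int, b: int) -> bool:
    """a <= b in 32-bit modular sequence arithmetic."""
    return a == b or seq_lt(a, b)


class EcnSender:
    """Hypothetical, simplified state of an ECN-capable TCP data sender."""

    def __init__(self, cwnd, snd_una, snd_nxt, rcv_nxt, rcv_wnd):
        self.cwnd = cwnd          # congestion window, in segments
        self.snd_una = snd_una    # oldest unacknowledged sequence number
        self.snd_nxt = snd_nxt    # next sequence number to send
        self.rcv_nxt = rcv_nxt    # next sequence number expected from peer
        self.rcv_wnd = rcv_wnd    # receive window advertised to peer

    def on_ack(self, seg_seq: int, seg_ack: int, ece: bool) -> bool:
        """Process an incoming ACK; return True if it was accepted."""
        window_end = (self.rcv_nxt + self.rcv_wnd) % MOD
        seq_ok = seq_leq(self.rcv_nxt, seg_seq) and seq_lt(seg_seq, window_end)
        ack_ok = (seq_leq(self.snd_una, seg_ack)
                  and seq_leq(seg_ack, self.snd_nxt))
        if not (seq_ok and ack_ok):
            # A spoofed ACK with guessed numbers is discarded here,
            # so its ECE flag never reaches congestion control.
            return False
        if seq_lt(self.snd_una, seg_ack):
            self.snd_una = seg_ack
        if ece:
            # React as to a dropped packet: halve the congestion window.
            self.cwnd = max(self.cwnd // 2, 1)
        return True
```

An off-path attacker would therefore have to guess both numbers within these ranges. An attacker able to do that could equally send a spoofed RST, which is why the ECE flag adds no new vulnerability.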
Packets with a spoofed IP source address of the TCP data sender could have the CWR bit set. Again, to be accepted, such a packet would have to have a valid sequence number. In addition, such a spoofed packet would have a limited performance impact. Spoofing a data packet with the CWR bit set could result in the TCP data receiver sending fewer ECE packets than it would otherwise, if the data receiver was sending ECE packets when it received the spoofed CWR packet.

18.3. Split Paths

In some cases, a malicious or broken router might have access to only a subset of the packets from a flow. The question is as follows: can this router, by altering the ECN field in this subset of the packets, do more damage to that flow than if it had simply dropped that set of packets?

We will classify the packets in the flow as A packets and B packets, and assume that the adversary only has access to A packets. Assume that the adversary is subverting end-to-end congestion control along the path traveled by A packets only, by either falsely indicating ECN-Capability upstream of the point where congestion occurs, or erasing the congestion indication downstream. Consider also that there exists a monitoring device that sees both the A and B packets, and will "punish" both the A and B packets if the total flow is determined not to be properly responding to indications of congestion. Another key characteristic that we believe is likely to be true is that the monitoring device, before `punishing' the A&B flow, will first drop packets instead of setting the CE codepoint, and will drop arriving packets of that flow that already have the CE codepoint set.
If the end nodes are in fact using end-to-end congestion control, they will see all of the indications of congestion seen by the monitoring device, and will begin to respond to these indications of congestion. Thus, the monitoring device is successful in providing the indications to the flow at an early stage.

It is true that the adversary that has access only to the A packets might, by subverting ECN-based congestion control, be able to deny the benefits of ECN to the other packets in the A&B aggregate. While this is unfortunate, this is not a reason to disable ECN within an IPsec tunnel.

A variant of falsely reporting congestion occurs when there are two adversaries along a path, where the first adversary falsely reports congestion, and the second adversary `erases' those reports. (Unlike packet drops, ECN congestion reports can be `reversed' later in the network by a malicious or broken router. However, the use of the ECN nonce could help the transport to detect this behavior.) While this would be transparent to the end node, it is possible that a monitoring device between the first and second adversaries would see the false indications of congestion. Recall our recommendation in this document that, before `punishing' a flow for not responding appropriately to congestion, the router will first switch to dropping rather than marking as an indication of congestion for that flow. When this includes dropping arriving packets from that flow that have the CE codepoint set, this ensures that these indications of congestion are being seen by the end nodes. Thus, there is no additional harm that we are able to postulate as a result of multiple conflicting adversaries.

19. Implications of Subverting End-to-End Congestion Control

This section focuses on the potential repercussions of subverting end-to-end congestion control by either falsely indicating ECN-Capability, or by erasing the congestion indication in ECN (the CE codepoint). Subverting end-to-end congestion control by either of these two methods can have consequences both for the application and for the network. We discuss these separately below.

The first method to subvert end-to-end congestion control, that of falsely indicating ECN-Capability, effectively subverts end-to-end congestion control only if the packet later encounters congestion that results in the setting of the CE codepoint. In this case, the transport protocol (which may not be ECN-capable) does not receive the indication of congestion from these downstream congested routers.

The second method to subvert end-to-end congestion control, `erasing' the CE codepoint in a packet, effectively subverts end-to-end congestion control only when the CE codepoint in the packet was set earlier by a congested router. In this case, the transport protocol does not receive the indication of congestion from the upstream congested routers.

Either of these two methods of subverting end-to-end congestion control can potentially introduce more damage to the network (and possibly to the flow itself) than if the adversary had simply dropped packets from that flow. However, as we discuss later in this section and in Section 7, this potential damage is limited.

19.1. Implications for the Network and for Competing Flows

The CE codepoint of the ECN field is only used by routers as an indication of congestion during periods of *moderate* congestion. ECN-capable routers should drop rather than mark packets during heavy congestion even if the router's queue is not yet full.
For example, for routers using active queue management based on RED, the router should drop rather than mark packets that arrive while the average queue size exceeds the RED queue's maximum threshold.

One consequence for the network of subverting end-to-end congestion control is that flows that do not receive the congestion indications from the network might increase their sending rate until they drive the network into heavier congestion. Then, the congested router could begin to drop rather than mark arriving packets. For flows that are not isolated by some form of per-flow scheduling or other per-flow mechanisms, but are instead aggregated with other flows in a single queue in an undifferentiated fashion, this packet-dropping at the congested router would apply to all flows that share that queue. Thus, the consequences would be to increase the level of congestion in the network.

In some cases, the increase in the level of congestion will lead to a substantial buffer buildup at the congested queue that will be sufficient to drive the congested queue from the packet-marking to the packet-dropping regime. This transition could occur either because of buffer overflow, or because of the active queue management policy described above that drops packets when the average queue is above RED's maximum threshold. At this point, all flows, including the subverted flow, will begin to see packet drops instead of packet marks, and a malicious or broken router will no longer be able to `erase' these indications of congestion in the network. If the end nodes are deploying appropriate end-to-end congestion control, then the subverted flow will reduce its arrival rate in response to congestion. When the level of congestion is sufficiently reduced, the congested queue can return from the packet-dropping regime to the packet-marking regime.
The steady-state pattern could be one of the congested queue oscillating between these two regimes.

In other cases, the consequences of subverting end-to-end congestion control will not be severe enough to drive the congested link into sufficiently heavy congestion that packets are dropped instead of being marked. In this case, the implications for competing flows in the network will be a slightly increased rate of packet marking or dropping, and a corresponding decrease in the bandwidth available to those flows. This can be a stable state if the arrival rate of the subverted flow is sufficiently small, relative to the link bandwidth, that the average queue size at the congested router remains under control. In particular, the subverted flow could have a limited bandwidth demand on the link at this router, while still getting more than its "fair" share of the link. This limited demand could be due to a limited demand from the data source; a limitation from the TCP advertised window; a lower-bandwidth access pipe; or other factors. Thus the subversion of ECN-based congestion control can still lead to unfairness, which we believe is appropriate to note here.

The threat to the network posed by the subversion of ECN-based congestion control in the network is essentially the same as the threat posed by an end-system that intentionally fails to cooperate with end-to-end congestion control. The deployment of mechanisms in routers to address this threat is an open research question, and is discussed further in Section 10.

Let us take the example described in Section 18.1.1, where the CE codepoint that was set in a packet is erased: {'11' -> '10' or '11' -> '01'}. The consequence for the congested upstream router that set the CE codepoint is that this congestion indication does not reach the end nodes for that flow.
The source (even one which is completely cooperative and not malicious) is thus allowed to continue to increase its sending rate (if it is a TCP flow, by increasing its congestion window). The flow potentially achieves better throughput than the other flows that also share the congested router, especially if there are no policing mechanisms or per-flow queueing mechanisms at that router. Consider the behavior of the other flows, especially if they are cooperative: that is, the flows that do not experience subverted end-to-end congestion control. They are likely to reduce their load (e.g., by reducing their window size) on the congested router, thus benefiting our subverted flow. This results in unfairness. As we discussed above, this unfairness could be transient (because the congested queue is driven into the packet-dropping regime), oscillatory (because the congested queue oscillates between the packet-marking and the packet-dropping regimes), or a more moderate but persistent stable state (because the congested queue is never driven to the packet-dropping regime).

The results would be similar if the subverted flow were intentionally avoiding end-to-end congestion control. One difference is that a flow that is intentionally avoiding end-to-end congestion control at the end nodes can avoid end-to-end congestion control even when the congested queue is in packet-dropping mode, by refusing to reduce its sending rate in response to packet drops in the network. Thus the problems for the network from the subversion of ECN-based congestion control are less severe than the problems caused by the intentional avoidance of end-to-end congestion control in the end nodes. It is also the case that it is considerably more difficult to control the behavior of the end nodes than it is to control the behavior of the infrastructure itself.
This is not to say that the problems for the network posed by the network's subversion of ECN-based congestion control are small; just that they are dwarfed by the problems for the network posed by the subversion of either ECN-based or other currently known packet-based congestion control mechanisms by the end nodes.

19.2.  Implications for the Subverted Flow

When a source indicates that it is ECN-capable, there is an expectation that the routers in the network that are capable of participating in ECN will use the CE codepoint for indication of congestion. There is the potential benefit of using ECN in reducing the amount of packet loss (in addition to the reduced queueing delays because of active queue management policies). When the packet flows through a tunnel where the nodes that the tunneled packets traverse are untrusted in some way, the expectation is that IPsec will protect the flow from subversion that results in undesirable consequences.

In many cases, a subverted flow will benefit from the subversion of end-to-end congestion control for that flow in the network, by receiving more bandwidth than it would have otherwise, relative to competing non-subverted flows. If the congested queue reaches the packet-dropping stage, then the subversion of end-to-end congestion control might or might not be of overall benefit to the subverted flow, depending on that flow's relative tradeoffs between throughput, loss, and delay.

One form of subverting end-to-end congestion control is to falsely indicate ECN-capability by setting the ECT codepoint. This has the consequence of downstream congested routers setting the CE codepoint in vain.
However, as described in Section 9.1.2, if an ECT codepoint is changed in an IP tunnel, this can be detected at the egress point of the tunnel, as long as the inner header was not changed within the tunnel.

The second form of subverting end-to-end congestion control is to erase the congestion indication by erasing the CE codepoint. In this case, it is the upstream congested routers that set the CE codepoint in vain.

If an ECT codepoint is erased within an IP tunnel, then this can be detected at the egress point of the tunnel, as long as the inner header was not changed within the tunnel. If the CE codepoint is set upstream of the IP tunnel, then any erasure of the outer header's CE codepoint within the tunnel will have no effect because the inner header preserves the set value of the CE codepoint. However, if the CE codepoint is set within the tunnel, and erased either within or downstream of the tunnel, this is not necessarily detected at the egress point of the tunnel.

With this subversion of end-to-end congestion control, an end-system transport does not respond to the congestion indication. Along with the increased unfairness for the non-subverted flows described in the previous section, the congested router's queue could continue to build, resulting in packet loss at the congested router, which is a means for indicating congestion to the transport in any case. In the interim, the flow might experience higher queueing delays, possibly along with an increased bandwidth relative to other non-subverted flows. But transports do not inherently make assumptions of consistently experiencing carefully managed queueing in the path. We believe that these forms of subverting end-to-end congestion control are no worse for the subverted flow than if the adversary had simply dropped the packets of that flow itself.

19.3.  Non-ECN-Based Methods of Subverting End-to-End Congestion Control

We have shown that, in many cases, a malicious or broken router that is able to change the bits in the ECN field can do no more damage than if it had simply dropped the packet in question. However, this is not true in all cases, in particular in the cases where the broken router subverted end-to-end congestion control by either falsely indicating ECN-Capability or by erasing the ECN congestion indication (in the CE codepoint). While there are many ways that a router can harm a flow by dropping packets, a router cannot subvert end-to-end congestion control by dropping packets. As an example, a router cannot subvert TCP congestion control by dropping data packets, acknowledgement packets, or control packets.

Even though packet-dropping cannot be used to subvert end-to-end congestion control, there *are* non-ECN-based methods for subverting end-to-end congestion control that a broken or malicious router could use. For example, a broken router could duplicate data packets, thus effectively negating the effects of end-to-end congestion control along some portion of the path. (For a router that duplicated packets within an IPsec tunnel, the security administrator can cause the duplicate packets to be discarded by configuring anti-replay protection for the tunnel.) This duplication of packets within the network would have similar implications for the network and for the subverted flow as those described in Sections 18.1.1 and 18.1.4 above.

20.  The Motivation for the ECT Codepoints.

20.1.  The Motivation for an ECT Codepoint.

The need for an ECT codepoint is motivated by the fact that ECN will be deployed incrementally in an Internet where some transport protocols and routers understand ECN and some do not.
With an ECT codepoint, the router can drop packets from flows that are not ECN-capable, but can *instead* set the CE codepoint in packets that *are* ECN-capable. Because an ECT codepoint allows an end node to have the CE codepoint set in a packet *instead* of having the packet dropped, an end node might have some incentive to deploy ECN.

If there were no ECT codepoint, then the router would have to set the CE codepoint for packets from both ECN-capable and non-ECN-capable flows. In this case, there would be no incentive for end nodes to deploy ECN, and no viable path of incremental deployment from a non-ECN world to an ECN-capable world. Consider the first stages of such an incremental deployment, where a subset of the flows are ECN-capable. At the onset of congestion, when the packet dropping/marking rate would be low, routers would only set CE codepoints, rather than dropping packets. However, only those flows that are ECN-capable would understand and respond to CE packets. The result is that the ECN-capable flows would back off, and the non-ECN-capable flows would be unaware of the ECN signals and would continue to open their congestion windows.

In this case, there are two possible outcomes: (1) the ECN-capable flows back off, the non-ECN-capable flows get all of the bandwidth, and congestion remains mild, or (2) the ECN-capable flows back off, the non-ECN-capable flows don't, and congestion increases until the router transitions from setting the CE codepoint to dropping packets. While this second outcome evens out the fairness, the ECN-capable flows would still receive little benefit from being ECN-capable, because the increased congestion would drive the router to packet-dropping behavior.
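The per-packet choice that an ECT codepoint makes possible can be sketched in a few lines. The following Python fragment is a minimal, non-normative illustration with several assumptions. It assumes that the router's active queue management has already selected the packet for a congestion indication. It uses this document's codepoint assignments (Not-ECT '00', ECT(1) '01', ECT(0) '10', CE '11'), carried in bits 6 and 7 of the IPv4 TOS or IPv6 Traffic Class octet. It also applies the RED rule that packets are dropped rather than marked while the average queue size exceeds the maximum threshold. The names congestion_action, avg_queue and max_th are hypothetical. RED's probabilistic selection below the maximum threshold is not modeled.

```python
from enum import IntEnum


class ECN(IntEnum):
    """Two-bit ECN field: bits 6 and 7 (the two low-order bits) of the
    IPv4 TOS / IPv6 Traffic Class octet."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def ecn_field(tos_octet: int) -> ECN:
    # IETF bit numbering counts from the most significant bit, so bits 6
    # and 7 are the two least-significant bits of the octet.
    return ECN(tos_octet & 0b11)


def congestion_action(tos_octet: int, avg_queue: float, max_th: float):
    """Hypothetical handler for a packet the AQM has already selected for a
    congestion indication. Returns (action, tos_octet_to_forward)."""
    if avg_queue > max_th:
        # Heavy congestion: drop rather than mark, even for ECN-capable
        # packets, while the average queue exceeds RED's maximum threshold.
        return "drop", tos_octet
    if ecn_field(tos_octet) == ECN.NOT_ECT:
        # The transport cannot understand CE, so a drop is the only signal.
        return "drop", tos_octet
    # ECT(0), ECT(1), or already CE: set CE instead of dropping, keeping
    # the six DSCP bits. A packet that is already CE is left unchanged.
    return "mark", (tos_octet & 0xFC) | int(ECN.CE)
```

For instance, congestion_action(0x02, avg_queue=5, max_th=15) returns ('mark', 3), while the same call with a Not-ECT octet of 0x00 returns ('drop', 0). Without the ECT distinction, the router would have no per-packet basis for choosing between these two outcomes.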

A flow that advertises itself as ECN-Capable but does not respond to CE codepoints is functionally equivalent to a flow that turns off congestion control, as discussed earlier in this document.

Thus, in a world where a subset of the flows are ECN-capable, but where ECN-capable flows have no mechanism for indicating that fact to the routers, there would be less effective and less fair congestion control in the Internet, resulting in a strong incentive for end nodes not to deploy ECN.

20.2.  The Motivation for two ECT Codepoints.

The primary motivation for the two ECT codepoints is to provide a one-bit ECN nonce. The ECN nonce allows the development of mechanisms for the sender to probabilistically verify that network elements are not erasing the CE codepoint, and that data receivers are properly reporting to the sender the receipt of packets with the CE codepoint set.

Another possibility for senders to detect misbehaving network elements or receivers would be for the data sender to occasionally send a data packet with the CE codepoint set, to see if the receiver reports receiving the CE codepoint. Of course, if these packets encountered congestion in the network, the router might make no change in the packets, because the CE codepoint would already be set. Thus, for packets sent with the CE codepoint set, the TCP end-nodes could not determine if some router intended to set the CE codepoint in these packets. For this reason, sending packets with the CE codepoint set would have to be done sparingly, and would be a less effective check against misbehaving network elements and receivers than would be the ECN nonce.

The assignment of the fourth ECN codepoint to ECT(1) precludes the use of this codepoint for other purposes. For clarity, we briefly list those possible purposes here.

One possibility might have been for the data sender to use the fourth ECN codepoint to indicate an alternate semantics for ECN. However, it seems to us more appropriate to signal this using a differentiated services codepoint in the DS field.

A second possible use for the fourth ECN codepoint would have been to give the router two separate codepoints for the indication of congestion, CE(0) and CE(1), for mild and severe congestion respectively. While this could be useful in some cases, this certainly does not seem a compelling requirement at this point. If there were judged to be a compelling need for this, the complications of incremental deployment would most likely necessitate more than just one codepoint for this function.

A third use that has been informally proposed for the fourth ECN codepoint is for use in some forms of multicast congestion control, based on randomized procedures for duplicating marked packets at routers. Some proposed multicast packet duplication procedures are based on a new ECN codepoint that (1) conveys the fact that congestion occurred upstream of the duplication point that marked the packet with this codepoint and (2) can be used to detect congestion downstream of that duplication point. ECT(1) can serve this purpose because it is both distinct from ECT(0) and is replaced by CE when ECN marking occurs in response to congestion or incipient congestion. Explanation of how this enhanced version of ECN would be used by multicast congestion control is beyond the scope of this document, as are ECN-aware multicast packet duplication procedures and the processing of the ECN field at multicast receivers in all cases (i.e., irrespective of the multicast packet duplication procedure(s) used).

The specification of IP tunnel modifications for ECN in this document assumes that the only change made to the outer IP header's ECN field between tunnel endpoints is to set the CE codepoint to indicate congestion. This is not consistent with some of the proposed uses of ECT(1) by the multicast duplication procedures in the previous paragraph, and such procedures SHOULD NOT be deployed within tunnels configured for full ECN functionality. Limited ECN functionality may be used instead, although in practice many tunnel protocols (including IPsec) will not work correctly if multicast traffic duplication occurs within the tunnel.

21.  Why use Two Bits in the IP Header?

Given the need for an ECT indication in the IP header, there still remains the question of whether the ECT (ECN-Capable Transport) and CE (Congestion Experienced) codepoints should have been overloaded on a single bit. This overloaded-one-bit alternative, explored in [Floyd94], would have involved a single bit with two values. One value, "ECT and not CE", would represent an ECN-Capable Transport, and the other value, "CE or not ECT", would represent either Congestion Experienced or a non-ECN-Capable transport.

One difference between the one-bit and two-bit implementations concerns packets that traverse multiple congested routers. Consider a CE packet that arrives at a second congested router, and is selected by the active queue management at that router for either marking or dropping. In the one-bit implementation, the second congested router has no choice but to drop the CE packet, because it cannot distinguish between a CE packet and a non-ECT packet. In the two-bit implementation, the second congested router has the choice of either dropping the CE packet, or of leaving it alone with the CE codepoint set.
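The difference at a second congested router can be made concrete with a small sketch. The Python below is illustrative only. For a packet selected by the router's active queue management, it enumerates the actions open to a congested router under two schemes: the two-bit encoding and the overloaded one-bit alternative. The mapping of the one-bit values to 1 and 0 is an assumption made for this example, since the document does not assign bit values to the one-bit scheme. All function names are likewise hypothetical.

```python
# Two-bit ECN codepoints, as assigned in this document.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11

# Overloaded one-bit alternative. These bit values are ASSUMED for
# illustration only; the one-bit scheme is not specified here.
ECT_AND_NOT_CE = 1
CE_OR_NOT_ECT = 0


def to_one_bit(ecn: int) -> int:
    """Collapse a two-bit codepoint into the hypothetical one-bit encoding."""
    return ECT_AND_NOT_CE if ecn in (ECT_0, ECT_1) else CE_OR_NOT_ECT


def one_bit_options(bit: int) -> set:
    """Actions a congested router may take on an AQM-selected packet."""
    if bit == ECT_AND_NOT_CE:
        return {"mark", "drop"}
    # CE and non-ECT packets carry the same value, so the router cannot
    # tell whether leaving the packet alone is safe and must drop it.
    return {"drop"}


def two_bit_options(ecn: int) -> set:
    """Actions a congested router may take on an AQM-selected packet."""
    if ecn == NOT_ECT:
        return {"drop"}
    if ecn == CE:
        # Already marked upstream: drop it, or leave it alone with CE set.
        return {"drop", "leave"}
    return {"mark", "drop"}  # ECT(0) or ECT(1)
```

Because to_one_bit(CE) == to_one_bit(NOT_ECT), a CE packet reaching a second congested router in the one-bit scheme has only {'drop'} available. In contrast, two_bit_options(CE) also allows the packet to be forwarded with CE still set.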

Another difference between the one-bit and two-bit implementations comes from the fact that with the one-bit implementation, receivers in a single flow cannot distinguish between CE and non-ECT packets. Thus, in the one-bit implementation an ECN-capable data sender would have to unambiguously indicate to the receiver or receivers whether each packet had been sent as ECN-Capable or as non-ECN-Capable. One possibility would be for the sender to indicate in the transport header whether the packet was sent as ECN-Capable. A second possibility, which would impose a functional limitation on the one-bit implementation, would be for the sender to unambiguously indicate that it was going to send *all* of its packets as ECN-Capable or as non-ECN-Capable. For a multicast transport protocol, this unambiguous indication would have to be apparent to receivers joining an ongoing multicast session.

A further consideration, described earlier in this document, is the recommendation that transports (particularly TCP) should not mark pure ACK packets or retransmitted packets as being ECN-Capable. A pure ACK packet from a non-ECN-capable transport could be dropped, without necessarily having an impact on the transport from a congestion control perspective (because subsequent ACKs are cumulative). An ECN-capable transport reacting to the CE codepoint in a pure ACK packet by reducing the window would be at a disadvantage in comparison to a non-ECN-capable transport. For this reason (and for reasons described earlier in relation to retransmitted packets), it is desirable to have the ECT codepoint set on a per-packet basis.

Another advantage of the two-bit approach is that it is somewhat more robust. The most critical issue, discussed in Section 8, is that the default indication should be that of a non-ECN-Capable transport.
In a two-bit implementation, this requirement for the default value simply means that the non-ECT codepoint should be the default. In the one-bit implementation, this means that the single overloaded bit should by default be in the "CE or not ECT" position. This is less clear and straightforward, and possibly more open to incorrect implementations either in the end nodes or in the routers.

In summary, while the one-bit approach would have been workable, it has the following significant limitations relative to the two-bit implementation. First, the one-bit implementation has more limited functionality for the treatment of CE packets at a second congested router. Second, the one-bit implementation requires either that extra information be carried in the transport header of packets from ECN-Capable flows (to convey the functionality of the second bit elsewhere, namely in the transport header), or that senders in ECN-Capable flows accept the limitation that receivers must be able to determine a priori which packets are ECN-Capable and which are not ECN-Capable. Third, the one-bit implementation is possibly more open to errors from faulty implementations that choose the wrong default value for the ECN bit. We believe that the use of the extra bit in the IP header for the ECT-bit is extremely valuable to overcome these limitations.

22.  Historical Definitions for the IPv4 TOS Octet

RFC 791 [RFC791] defined the ToS (Type of Service) octet in the IP header. In RFC 791, bits 6 and 7 of the ToS octet are listed as "Reserved for Future Use", and are shown set to zero. The first two fields of the ToS octet were defined as the Precedence and Type of Service (TOS) fields.

      0     1     2     3     4     5     6     7
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |   PRECEDENCE    |       TOS       |  0  |  0  |   RFC 791
   +-----+-----+-----+-----+-----+-----+-----+-----+

RFC 1122 included bits 6 and 7 in the TOS field, though it did not discuss any specific use for those two bits:

      0     1     2     3     4     5     6     7
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |   PRECEDENCE    |             TOS             |   RFC 1122
   +-----+-----+-----+-----+-----+-----+-----+-----+

The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows:

      0     1     2     3     4     5     6     7
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |   PRECEDENCE    |          TOS          | MBZ |   RFC 1349
   +-----+-----+-----+-----+-----+-----+-----+-----+

Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary Cost". In addition to the Precedence and Type of Service (TOS) fields, the last field, MBZ (for "must be zero"), was defined as currently unused. RFC 1349 stated that "The originator of a datagram sets [the MBZ] field to zero (unless participating in an Internet protocol experiment which makes use of that bit)."

RFC 1455 [RFC1455] defined an experimental standard that used all four bits in the TOS field to request a guaranteed level of link security.

RFC 1349 and RFC 1455 have been obsoleted by "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers" [RFC2474] in which bits 6 and 7 of the DS field are listed as Currently Unused (CU). RFC 2780 [RFC2780] specified ECN as an experimental use of the two-bit CU field.
RFC 2780 updated the definition of the DS Field to only encompass the first six bits of this octet rather than all eight bits; these first six bits are defined as the Differentiated Services CodePoint (DSCP):

      0     1     2     3     4     5     6     7
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |               DSCP                |    CU     |   RFCs 2474,
   +-----+-----+-----+-----+-----+-----+-----+-----+        2780

Because of this unstable history, the definition of the ECN field in this document cannot be guaranteed to be backwards compatible with all past uses of these two bits.

Prior to RFC 2474, routers were not permitted to modify bits in either the DSCP or ECN field of packets forwarded through them, and hence routers that comply only with RFCs prior to 2474 should have no effect on ECN. For end nodes, bit 7 (the second ECN bit) must be transmitted as zero for any implementation compliant only with RFCs prior to 2474. Such nodes may transmit bit 6 (the first ECN bit) as one for the "Minimize Monetary Cost" provision of RFC 1349 or the experiment authorized by RFC 1455; neither this aspect of RFC 1349 nor the experiment in RFC 1455 was widely implemented or used. The damage that could be done by a broken, non-conformant router would include "erasing" the CE codepoint for an ECN-capable packet that arrived at the router with the CE codepoint set, or setting the CE codepoint even in the absence of congestion. This has been discussed in the section on "Non-compliance in the Network".

The damage that could be done in an ECN-capable environment by a non-ECN-capable end-node transmitting packets with the ECT codepoint set has been discussed in the section on "Non-compliance by the End Nodes".

23.  IANA Considerations

The codepoints for the ECN Field of the IP header and the bits for CWR and ECE in the TCP header are specified by the Standards Action of this RFC, as is required by RFC 2780.

IANA allocated the IPSEC Security Association Attribute value 10 for the ECN Tunnel use described in Section 9.2.1.2 above at the request of David Black in November 1999. If this draft is approved for publication as an RFC, IANA should change the Reference for this allocation from David Black's request to this RFC based on its RFC number.

AUTHORS' ADDRESSES

   K. K. Ramakrishnan
   TeraOptic Networks, Inc.
   Phone: +1 (408) 666-8650
   Email: kk@teraoptic.com

   Sally Floyd
   Phone: +1 (510) 666-2989
   ACIRI
   Email: floyd@aciri.org
   URL: http://www.aciri.org/floyd/

   David L. Black
   EMC Corporation
   42 South St.
   Hopkinton, MA 01748
   Phone: +1 (508) 435-1000 x75140
   Email: black_david@emc.com

This draft was created in March 2001.
It expires September 2001.