Internet Engineering Task Force                       K. K. Ramakrishnan
INTERNET DRAFT                                        TeraOptic Networks
draft-ietf-tsvwg-ecn-02.txt                                  Sally Floyd
                                                                   ACIRI
                                                                D. Black
                                                                     EMC
                                                          February, 2001
                                                   Expires: August, 2001

      The Addition of Explicit Congestion Notification (ECN) to IP

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document specifies the incorporation of ECN (Explicit Congestion
   Notification) to TCP and IP, including ECN's use of two bits in the
   IP header.  We begin by describing TCP's use of packet drops as an
   indication of congestion.  Next we explain that with the addition of
   active queue management (e.g., RED) to the Internet infrastructure,
   where routers detect congestion before the queue overflows, routers
   are no longer limited to packet drops as an indication of congestion.
   Routers can instead set the Congestion Experienced (CE) codepoint in
   the IP header of packets from ECN-capable transports.  We describe
   when the CE codepoint is to be set in routers, and describe
   modifications needed to TCP to make it ECN-capable.  Modifications to
   other transport protocols (e.g., unreliable unicast or multicast,
   reliable multicast, other reliable unicast transport protocols) could
   be considered as those protocols are developed and advance through
   the standards process.

   We also describe in this document the issues involving the use of ECN
   within IP tunnels, and within IPsec tunnels in particular.

   One of the guiding principles for this document is that all the
   mechanisms specified here are incrementally deployable.

Table of Contents

   1. Introduction
   2. Conventions and Acronyms
   3. Assumptions and General Principles
   4. Active Queue Management (AQM)
   5. Explicit Congestion Notification in IP
   5.1. ECN as an Indication of Persistent Congestion
   5.2. Dropped or Corrupted Packets
   5.3. Fragmentation
   6. Support from the Transport Protocol
   6.1. TCP
   6.1.1. TCP Initialization
   6.1.1.1. Robust TCP Initialization with an Echoed Reserve Field
   6.1.2. The TCP Sender
   6.1.3. The TCP Receiver
   6.1.4. Congestion on the ACK-path
   6.1.5. Retransmitted TCP packets
   6.1.6. TCP Window Probes.
   7. Non-compliance by the End Nodes
   8. Non-compliance in the Network
   8.1. Complications Introduced by Split Paths
   9. Encapsulated Packets
   9.1. IP packets encapsulated in IP
   9.1.1. The Limited-functionality and Full-functionality Options
   9.1.2. Changes to the ECN Field within an IP Tunnel.
   9.2. IPsec Tunnels
   9.2.1. Negotiation between Tunnel Endpoints
   9.2.1.1. ECN Tunnel Security Association Database Field
   9.2.1.2. ECN Tunnel Security Association Attribute
   9.2.1.3. Changes to IPsec Tunnel Header Processing
   9.2.2. Changes to the ECN Field within an IPsec Tunnel.
   9.2.3. Comments for IPsec Support
   9.3. IP packets encapsulated in non-IP packet headers.
   10. Issues Raised by Monitoring and Policing Devices
   11. Evaluations of ECN
   11.1. Related Work Evaluating ECN
   11.2. A Discussion of the ECN nonce.
   11.2.1. The Incremental Deployment of ECT(1) in Routers.
   12. Summary of changes required in IP and TCP
   13. Conclusions
   14. Acknowledgements
   15. References
   16. Security Considerations
   17. IPv4 Header Checksum Recalculation
   18. Possible Changes to the ECN Field in the Network
   18.1. Possible Changes to the IP Header
   18.1.1. Erasing the Congestion Indication
   18.1.2. Falsely Reporting Congestion
   18.1.3. Disabling ECN-Capability
   18.1.4. Falsely Indicating ECN-Capability
   18.2. Information carried in the Transport Header
   18.3. Split Paths
   19. Implications of Subverting End-to-End Congestion Control
   19.1. Implications for the Network and for Competing Flows
   19.2. Implications for the Subverted Flow
   19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion Control
   20. The Motivation for the ECT Codepoints.
   20.1. The Motivation for an ECT Codepoint.
   20.2. The Motivation for two ECT Codepoints.
   21. Why use Two Bits in the IP Header?
   22. Historical Definitions for the IPv4 TOS Octet
   23. IANA Considerations

   RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - To compare
   this with draft-ietf-tsvwg-ecn-01, compare the following:
   "http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-01.troff"
   "http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-02.troff"
   Changes from draft-ietf-tsvwg-ecn-01:
     Added the ECT(1) codepoint, and changed references about bits to
   references about codepoints in many places.  Also added Section 11.2 on
   "A Discussion of the ECN nonce", and Section 20.2 on "The Motivation for
   two ECT Codepoints".
     Added a paragraph saying that by default, the discussion of setting
   the CE codepoint applies to all Differentiated Services Per-Hop
   Behaviors.
     Added Section 5.3 on fragmentation.
     Added "A host MUST NOT set ECT on SYN or SYN-ACK packets." to the end
   of Section 6.1.1, just to be explicit.
     Corrected some references to "Section 19" to "Section 22".
     Clarified that ECN is defined identically in IPv4 and in IPv6.

1. Introduction

   TCP's congestion control and avoidance algorithms are based on the
   notion that the network is a black-box [Jacobson88, Jacobson90].  The
   network's state of congestion or otherwise is determined by end-
   systems probing for the network state, by gradually increasing the
   load on the network (by increasing the window of packets that are
   outstanding in the network) until the network becomes congested and a
   packet is lost.  Treating the network as a "black-box" and treating
   loss as an indication of congestion in the network is appropriate for
   pure best-effort data carried by TCP, with little or no sensitivity
   to delay or loss of individual packets.  In addition, TCP's
   congestion management algorithms have techniques built-in (such as
   Fast Retransmit and Fast Recovery) to minimize the impact of losses,
   from a throughput perspective.
   However, these mechanisms are not
   intended to help applications that are in fact sensitive to the delay
   or loss of one or more individual packets.  Interactive traffic such
   as telnet, web-browsing, and transfer of audio and video data can be
   sensitive to packet losses (especially when using an unreliable data
   delivery transport such as UDP) or to the increased latency of the
   packet caused by the need to retransmit the packet after a loss (with
   the reliable data delivery semantics provided by TCP).

   Since TCP determines the appropriate congestion window to use by
   gradually increasing the window size until it experiences a dropped
   packet, this causes the queues at the bottleneck router to build up.
   With most packet drop policies at the router that are not sensitive
   to the load placed by each individual flow (e.g., tail-drop on queue
   overflow), this means that some of the packets of latency-sensitive
   flows may be dropped.  In addition, such drop policies lead to
   synchronization of loss across multiple flows.

   Active queue management mechanisms detect congestion before the queue
   overflows, and provide an indication of this congestion to the end
   nodes.  Thus, active queue management can reduce unnecessary queueing
   delay for all traffic sharing that queue.  The advantages of active
   queue management are discussed in RFC 2309 [RFC2309].  Active queue
   management avoids some of the bad properties of dropping on queue
   overflow, including the undesirable synchronization of loss across
   multiple flows.  More importantly, active queue management means that
   transport protocols with mechanisms for congestion control (e.g.,
   TCP) do not have to rely on buffer overflow as the only indication of
   congestion.

   Active queue management mechanisms may use one of several methods for
   indicating congestion to end-nodes.  One is to use packet drops, as is
   currently done.
   However, active queue management allows the router to
   separate policies of queueing or dropping packets from the policies
   for indicating congestion.  Thus, active queue management allows
   routers to use the Congestion Experienced (CE) codepoint in a packet
   header as an indication of congestion, instead of relying solely on
   packet drops.  This has the potential of reducing the impact of loss
   on latency-sensitive flows.

   This document is intended to obsolete RFC 2481, "A Proposal to add
   Explicit Congestion Notification (ECN) to IP", which defined ECN as
   an Experimental Protocol for the Internet Community.

   RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - This
   document obsoletes three subsequent internet-drafts on ECN, "IPsec
   Interactions with ECN", "ECN Interactions with IP Tunnels", and "TCP
   with ECN: The Treatment of Retransmitted Data Packets".  This
   document is intended largely to merge the earlier documents all into
   a single document, for greater clarity, in preparation for becoming a
   Proposed Standard.

2. Conventions and Acronyms

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
   document, are to be interpreted as described in [B97].

3. Assumptions and General Principles

   In this section, we describe some of the important design principles
   and assumptions that guided the design choices in this proposal.

   * Because ECN is likely to be adopted gradually, accommodating
     migration is essential.  Some routers may still only drop packets to
     indicate congestion, and some end-systems may not be ECN-capable.  The
     most viable strategy is one that accommodates incremental deployment
     without having to resort to "islands" of ECN-capable and non-ECN-
     capable environments.
   * New mechanisms for congestion control and avoidance need to co-
     exist and cooperate with existing mechanisms for congestion control.
     In particular, new mechanisms have to co-exist with TCP's current
     methods of adapting to congestion and with routers' current practice
     of dropping packets in periods of congestion.
   * Congestion may persist over different time-scales.  The time scales
     that we are concerned with are congestion events that may last longer
     than a round-trip time.
   * The number of packets in an individual flow (e.g., TCP connection
     or an exchange using UDP) may range from a small number of packets to
     quite a large number.  We are interested in managing the congestion
     caused by flows that send enough packets so that they are still
     active when network feedback reaches them.
   * Asymmetric routing is likely to be a normal occurrence in the
     Internet.  The path (sequence of links and routers) followed by data
     packets may be different from the path followed by the acknowledgment
     packets in the reverse direction.
   * Many routers process the "regular" headers in IP packets more
     efficiently than they process the header information in IP options.
     This suggests keeping congestion experienced information in the
     regular headers of an IP packet.
   * It must be recognized that not all end-systems will cooperate in
     mechanisms for congestion control.  However, new mechanisms shouldn't
     make it easier for TCP applications to disable TCP congestion
     control.  The benefit of lying about participating in new mechanisms
     such as ECN-capability should be small.

4. Active Queue Management (AQM)

   Random Early Detection (RED) is one mechanism for Active Queue
   Management (AQM) that has been proposed to detect incipient
   congestion [FJ93], and is currently being deployed in the Internet
   [RFC2309].
   AQM is meant to be a general mechanism using one of
   several alternatives for congestion indication, but in the absence of
   ECN, AQM is restricted to using packet drops as a mechanism for
   congestion indication.  AQM drops packets based on the average queue
   length exceeding a threshold, rather than only when the queue
   overflows.  However, because AQM may drop packets before the queue
   actually overflows, AQM is not always forced by memory limitations to
   discard the packet.

   AQM can set a Congestion Experienced (CE) codepoint in the packet
   header instead of dropping the packet, when such a field is provided
   in the IP header and understood by the transport protocol.  The use
   of the CE codepoint with ECN allows the receiver(s) to receive the
   packet, avoiding the potential for excessive delays due to
   retransmissions after packet losses.  We use the term 'CE packet' to
   denote a packet that has the CE codepoint set.

5. Explicit Congestion Notification in IP

   This document specifies that the Internet provide a congestion
   indication for incipient congestion (as in RED and earlier work
   [RJ90]) where the notification can sometimes be through marking
   packets rather than dropping them.  This uses an ECN field in the IP
   header with two bits, making four ECN codepoints, '00' to '11'.  The
   ECN-Capable Transport (ECT) codepoints '10' and '01' are set by the
   data sender to indicate that the end-points of the transport protocol
   are ECN-capable; we call them ECT(0) and ECT(1) respectively.  The
   phrase "the ECT codepoint" in this document refers to either of the
   two ECT codepoints.  Routers treat the ECT(0) and ECT(1) codepoints
   as equivalent.  Senders are free to use either the ECT(0) or the
   ECT(1) codepoint to indicate ECT, on a packet-by-packet basis.
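
   As an informal illustration only, the four codepoints can be read
   from and written to the octet that carries the ECN field.  The
   sketch below assumes the field occupies the two low-order bits of
   the IPv4 TOS or IPv6 Traffic Class octet (bits 6 and 7, as described
   later in this section), and uses the Not-ECT ('00') and CE ('11')
   values given there.  The names EcnCodepoint, ecn_from_tos, set_ecn
   and is_ect are hypothetical and are not part of this specification.

```python
from enum import IntEnum


class EcnCodepoint(IntEnum):
    """The four values of the two-bit ECN field."""
    NOT_ECT = 0b00  # packet is not using ECN
    ECT_1 = 0b01    # ECN-Capable Transport, ECT(1)
    ECT_0 = 0b10    # ECN-Capable Transport, ECT(0)
    CE = 0b11       # Congestion Experienced


ECN_MASK = 0b11  # assumption: ECN field is the two low-order bits of the octet


def ecn_from_tos(tos_octet: int) -> EcnCodepoint:
    """Extract the ECN codepoint from a TOS / Traffic Class octet."""
    return EcnCodepoint(tos_octet & ECN_MASK)


def set_ecn(tos_octet: int, codepoint: EcnCodepoint) -> int:
    """Return the octet with its ECN field replaced and the DSCP bits untouched."""
    return (tos_octet & ~ECN_MASK & 0xFF) | int(codepoint)


def is_ect(codepoint: EcnCodepoint) -> bool:
    """Routers treat ECT(0) and ECT(1) as equivalent."""
    return codepoint in (EcnCodepoint.ECT_0, EcnCodepoint.ECT_1)
```

   A sender that needs only one ECT codepoint would, under this sketch,
   call set_ecn(tos, EcnCodepoint.ECT_0).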

   The use of two codepoints for ECT, ECT(0) and ECT(1), is
   motivated primarily by the desire to allow mechanisms for the data
   sender to verify that network elements are not erasing the CE
   codepoint, and that data receivers are properly reporting to the
   sender the receipt of packets with the CE codepoint set, as required
   by the transport protocol.  Guidelines for the senders and receivers
   to differentiate between the ECT(0) and ECT(1) codepoints will be
   addressed in separate documents, for each transport protocol.  In
   particular, this document does not address mechanisms for TCP end-
   nodes to differentiate between the ECT(0) and ECT(1) codepoints.

   Protocols and senders that only require a single ECT codepoint SHOULD
   use ECT(0).

   The not-ECT codepoint '00' indicates a packet that is not using ECN.
   The CE codepoint '11' is set by a router to indicate congestion to
   the end nodes.  Routers that have a packet arriving at a full queue
   drop the packet, just as they do in the absence of ECN.

      +-----+-----+
      | ECN FIELD |
      +-----+-----+
        ECT   CE         The ECT and CE bits defined in RFC 2481.
         0     0         Not-ECT
         0     1         ECT(1)
         1     0         ECT(0)
         1     1         CE

      Figure 1: The ECN Field in IP.

   The use of two ECT codepoints essentially gives a one-bit ECN nonce
   in packet headers, and routers necessarily "erase" the nonce when
   they set the CE codepoint [SCWA99].  For example, routers that erased
   the CE codepoint would face additional difficulty in reconstructing
   the original nonce, and thus repeated erasure of the CE codepoint
   would be more likely to be detected by the end-nodes.  The ECN nonce
   also can address the problem of misbehaving transport receivers lying
   to the transport sender about whether or not the CE codepoint was set
   in a packet.
   The motivation for the use of two ECT codepoints is
   discussed in more detail in Section 20, along with some discussion of
   alternate possibilities for the fourth ECT codepoint.  Backwards
   compatibility with earlier ECN implementations that do not understand
   the ECT(1) codepoint is discussed in Section 11.

   In RFC 2481 [RFC2481], the ECN field was divided into the ECN-Capable
   Transport (ECT) bit and the CE bit.  The ECN field with only the ECN-
   Capable Transport (ECT) bit set in RFC 2481 corresponds to the ECT(0)
   codepoint in this document, and the ECN field with both the ECT and
   CE bits set in RFC 2481 corresponds to the CE codepoint in this
   document.  The '01' codepoint was left undefined in RFC 2481, and this
   is the reason for recommending the use of ECT(0) when only a single
   ECT codepoint is needed.

         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |          DS FIELD, DSCP           | ECN FIELD |
      +-----+-----+-----+-----+-----+-----+-----+-----+

        DSCP: differentiated services codepoint
        ECN:  Explicit Congestion Notification

      Figure 2: The Differentiated Services and ECN Fields in IP.

   Bits 6 and 7 in the IPv4 TOS octet are designated as the ECN field.
   The IPv4 TOS octet corresponds to the Traffic Class octet in IPv6,
   and the ECN field is defined identically in both cases.  The
   definitions for the IPv4 TOS octet [RFC791] and the IPv6 Traffic
   Class octet have been superseded by the six-bit DS (Differentiated
   Services) Field [RFC2474, RFC2780].  Bits 6 and 7 are listed in
   [RFC2474] as Currently Unused, and are specified in RFC 2780 as
   approved for experimental use for ECN.  Section 22 gives a brief
   history of the TOS octet.

   Because of the unstable history of the TOS octet, the use of the ECN
   field as specified in this document cannot be guaranteed to be
   backwards compatible with those past uses of these two bits that pre-
   date ECN.
   The potential dangers of this lack of backwards
   compatibility are discussed in Section 22.

   Upon the receipt by an ECN-Capable transport of a single CE packet,
   the congestion control algorithms followed at the end-systems MUST be
   essentially the same as the congestion control response to a *single*
   dropped packet.  For example, for ECN-Capable TCP the source TCP is
   required to halve its congestion window for any window of data
   containing either a packet drop or an ECN indication.

   One reason for requiring that the congestion-control response to the
   CE packet be essentially the same as the response to a dropped packet
   is to accommodate the incremental deployment of ECN in both end-
   systems and in routers.  Some routers may drop ECN-Capable packets
   (e.g., using the same AQM policies for congestion detection) while
   other routers set the CE codepoint, for equivalent levels of
   congestion.  Similarly, a router might drop a non-ECN-Capable packet
   but set the CE codepoint in an ECN-Capable packet, for equivalent
   levels of congestion.  If there were different congestion control
   responses to a CE codepoint than to a packet drop, this could result
   in unfair treatment for different flows.

   An additional goal is that the end-systems should react to congestion
   at most once per window of data (i.e., at most once per round-trip
   time), to avoid reacting multiple times to multiple indications of
   congestion within a round-trip time.

   For a router, the CE codepoint of an ECN-Capable packet SHOULD only
   be set if the router would otherwise have dropped the packet as an
   indication of congestion to the end nodes.  When the router's buffer
   is not yet full and the router is prepared to drop a packet to inform
   end nodes of incipient congestion, the router should first check to
   see if the ECT codepoint is set in that packet's IP header.
   If so,
   then instead of dropping the packet, the router MAY instead set the
   CE codepoint in the IP header.

   An environment where all end nodes were ECN-Capable could allow new
   criteria to be developed for setting the CE codepoint, and new
   congestion control mechanisms for end-node reaction to CE packets.
   However, this is a research issue, and as such is not addressed in
   this document.

   When a CE packet (i.e., a packet that has the CE codepoint set) is
   received by a router, the CE codepoint is left unchanged, and the
   packet is transmitted as usual.  When severe congestion has occurred
   and the router's queue is full, then the router has no choice but to
   drop some packet when a new packet arrives.  We anticipate that such
   packet losses will become relatively infrequent when a majority of
   end-systems become ECN-Capable and participate in TCP or other
   compatible congestion control mechanisms.  In an ECN-Capable
   environment that is adequately-provisioned, packet losses should
   occur primarily during transients or in the presence of non-
   cooperating sources.

   The above discussion of when CE may be set instead of dropping a
   packet applies by default to all Differentiated Services Per-Hop
   Behaviors (PHBs) [RFC2475].  Specifications for PHBs MAY provide
   more specifics on how a compliant implementation is to choose between
   setting CE and dropping a packet, but this is not required.  A router
   MUST NOT set CE instead of dropping a packet when the drop that would
   occur is caused by reasons other than congestion or the desire to
   indicate incipient congestion to end nodes (e.g., a diffserv edge
   node may be configured to unconditionally drop certain classes of
   traffic to prevent them from entering its diffserv domain).
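
   The router behavior described above can be summarized in a short,
   non-normative sketch.  It assumes that the active queue management
   algorithm has already decided whether it wants to signal incipient
   congestion on this packet; how that decision is made is outside the
   sketch.  The function name router_action and its parameters are
   hypothetical.  The sketch always marks where this document says the
   router MAY mark, and it forwards an already-marked CE packet
   unchanged when the AQM selects it, which is an assumption rather
   than a stated requirement.

```python
from typing import Tuple

NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def router_action(ecn: int, queue_full: bool, aqm_signal: bool,
                  policy_drop: bool = False) -> Tuple[str, int]:
    """Return ("drop" | "forward", resulting ECN codepoint) for one packet.

    ecn         -- ECN codepoint carried by the arriving packet
    queue_full  -- True if the packet arrives at a full queue
    aqm_signal  -- True if AQM would drop this packet to signal congestion
    policy_drop -- True if the drop has a cause other than congestion
    """
    if policy_drop:
        # Drops unrelated to congestion MUST NOT be turned into CE marks.
        return ("drop", ecn)
    if queue_full:
        # With a full queue the router has no choice but to drop.
        return ("drop", ecn)
    if aqm_signal:
        if ecn in (ECT_0, ECT_1):
            # ECN-capable packet: mark instead of dropping (a MAY in the text).
            return ("forward", CE)
        if ecn == NOT_ECT:
            # Not ECN-capable: a drop is the only available signal.
            return ("drop", ecn)
        # ecn == CE: already marked upstream; forwarded unchanged (assumption).
    # No congestion signal needed; a CE mark set upstream is left unchanged.
    return ("forward", ecn)
```

   For example, router_action(ECT_0, queue_full=False, aqm_signal=True)
   yields a forwarded packet carrying CE, while the same call with
   NOT_ECT yields a drop.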

   We expect that routers will set the CE codepoint in response to
   incipient congestion as indicated by the average queue size, using
   the RED algorithms suggested in [FJ93, RFC2309].  To the best of our
   knowledge, this is the only proposal currently under discussion in
   the IETF for routers to drop packets proactively, before the buffer
   overflows.  However, this document does not attempt to specify a
   particular mechanism for active queue management, leaving that
   endeavor, if needed, to other areas of the IETF.  While ECN is
   inextricably tied up with the need to have a reasonable active queue
   management mechanism at the router, the reverse does not hold; active
   queue management mechanisms have been developed and deployed
   independent of ECN, using packet drops as indications of congestion
   in the absence of ECN in the IP architecture.

5.1. ECN as an Indication of Persistent Congestion

   We emphasize that a *single* packet with the CE codepoint set in an
   IP packet causes the transport layer to respond, in terms of
   congestion control, as it would to a packet drop.  The instantaneous
   queue size is likely to see considerable variations even when the
   router does not experience persistent congestion.  As such, it is
   important that transient congestion at a router, reflected by the
   instantaneous queue size reaching a threshold much smaller than the
   capacity of the queue, not trigger a reaction at the transport layer.
   Therefore, the CE codepoint should not be set by a router based on
   the instantaneous queue size.

   For example, since the ATM and Frame Relay mechanisms for congestion
   indication have typically been defined without an associated notion
   of average queue size as the basis for determining that an
   intermediate node is congested, we believe that they provide a very
   noisy signal.
   The TCP-sender reaction specified in this document for
   ECN is NOT the appropriate reaction for such a noisy signal of
   congestion notification.  However, if the routers that interface to
   the ATM network have a way of maintaining the average queue at the
   interface, and use it to come to a reliable determination that the
   ATM subnet is congested, they may use the ECN notification that is
   defined here.

   We continue to encourage experiments in techniques at layer 2 (e.g.,
   in ATM switches or Frame Relay switches) to take advantage of ECN.
   For example, using a scheme such as RED (where packet marking is
   based on the average queue length exceeding a threshold), layer 2
   devices could provide a reasonably reliable indication of congestion.
   When all the layer 2 devices in a path set that layer's own
   Congestion Experienced codepoint (e.g., the EFCI bit for ATM, the
   FECN bit in Frame Relay) in this reliable manner, then the interface
   router to the layer 2 network could copy the state of that layer 2
   Congestion Experienced codepoint into the CE codepoint in the IP
   header.  We recognize that this is not the current practice, nor is
   it in current standards.  However, encouraging experimentation in this
   manner may provide the information needed to enable evolution of
   existing layer 2 mechanisms to provide a more reliable means of
   congestion indication, when they use a single bit for indicating
   congestion.

5.2. Dropped or Corrupted Packets

   For the proposed use for ECN in this document (that is, for a
   transport protocol such as TCP for which a dropped data packet is an
   indication of congestion), end nodes detect dropped data packets, and
   the congestion response of the end nodes to a dropped data packet is
   at least as strong as the congestion response to a received CE
   packet.
To ensure the reliable delivery of the congestion indication 491 of the CE codepoint, an ECT codepoint MUST NOT be set in a packet 492 unless the loss of that packet in the network would be detected by 493 the end nodes and interpreted as an indication of congestion. 495 Transport protocols such as TCP do not necessarily detect all packet 496 drops, such as the drop of a "pure" ACK packet; for example, TCP does 497 not reduce the arrival rate of subsequent ACK packets in response to 498 an earlier dropped ACK packet. Any proposal for extending ECN- 499 Capability to such packets would have to address issues such as the 500 case of an ACK packet that was marked with the CE codepoint but was 501 later dropped in the network. We believe that this aspect is still 502 the subject of research, so this document specifies that at this 503 time, "pure" ACK packets MUST NOT indicate ECN-Capability. 505 Similarly, if a CE packet is dropped later in the network due to 506 corruption (bit errors), the end nodes should still invoke congestion 507 control, just as TCP would today in response to a dropped data 508 packet. This issue of corrupted CE packets would have to be 509 considered in any proposal for the network to distinguish between 510 packets dropped due to corruption, and packets dropped due to 511 congestion or buffer overflow. In particular, the ubiquitous 512 deployment of ECN would not, in and of itself, be a sufficient 513 development to allow end-nodes to interpret packet drops as 514 indications of corruption rather than congestion. 516 5.3. Fragmentation 518 All ECN-capable packets SHOULD have the DF (Don't Fragment) bit set. 519 Reassembly of a fragmented packet MUST NOT lose indications of 520 congestion. In other words, if any fragment of an IP packet to be 521 reassembled has the CE codepoint set, then one of two actions MUST be 522 taken: 523 * The reassembled packet has the CE codepoint set. 
This MUST NOT 524 occur if any of the other fragments contributing to this 525 reassembly carries the Not-ECT codepoint. 526 * The packet is dropped instead of being reassembled. 527 If both actions are applicable, either MAY be chosen. Reassembly of 528 a fragmented packet MUST NOT change the ECN codepoint when all of the 529 fragments carry the same codepoint. 531 Situations may arise in which the above specification is 532 insufficiently precise. For example, it does not place requirements 533 on reassembly of fragments that carry a mixture of ECT(0), ECT(1) 534 and/or Not-ECT. In situations where more precise reassembly behavior 535 would be required, protocol specifications SHOULD instead specify 536 that DF MUST be set in all packets sent by the protocol. 538 6. Support from the Transport Protocol 540 ECN requires support from the transport protocol, in addition to the 541 functionality given by the ECN field in the IP packet header. First, the 542 transport protocol might require negotiation between the endpoints 543 during setup to determine that all of the endpoints are ECN-capable, 544 so that the sender can set the ECT codepoint in transmitted packets. 545 Second, the transport protocol must be capable of reacting 546 appropriately to the receipt of CE packets. This reaction could be 547 in the form of the data receiver informing the data sender of the 548 received CE packet (e.g., TCP), of the data receiver unsubscribing from 549 a layered multicast group (e.g., RLM [MJV96]), or of some other 550 action that ultimately reduces the arrival rate of that flow on that 551 congested link. CE packets indicate persistent rather than transient 552 congestion (see Section 5.1), and hence reactions to the receipt of 553 CE packets should be those appropriate for persistent congestion. 555 This document only addresses the addition of ECN Capability to TCP, 556 leaving issues of ECN in other transport protocols to further 557 research.
For TCP, ECN requires three new pieces of functionality: 558 negotiation between the endpoints during connection setup to 559 determine if they are both ECN-capable; an ECN-Echo (ECE) flag in the 560 TCP header so that the data receiver can inform the data sender when 561 a CE packet has been received; and a Congestion Window Reduced (CWR) 562 flag in the TCP header so that the data sender can inform the data 563 receiver that the congestion window has been reduced. The support 564 required from other transport protocols is likely to be different, 565 particularly for unreliable or reliable multicast transport 566 protocols, and will have to be determined as other transport 567 protocols are brought to the IETF for standardization. 569 6.1. TCP 571 The following sections describe in detail the proposed use of ECN in 572 TCP. This proposal is described in essentially the same form in 573 [Floyd94]. We assume that the source TCP uses the standard congestion 574 control algorithms of Slow-start, Fast Retransmit and Fast Recovery 575 [RFC 2001]. 577 This proposal specifies two new flags in the Reserved field of the 578 TCP header. The TCP mechanism for negotiating ECN-Capability uses 579 the ECN-Echo (ECE) flag in the TCP header. Bit 9 in the Reserved 580 field of the TCP header is designated as the ECN-Echo flag. The 581 location of the 6-bit Reserved field in the TCP header is shown in 582 Figure 4 of RFC 793 [RFC793] (and is reproduced below for 583 completeness). This specification of the ECN Field leaves the 584 Reserved field as a 4-bit field using bits 4-7. 586 To enable the TCP receiver to determine when to stop setting the ECN- 587 Echo flag, we introduce a second new flag in the TCP header, the CWR 588 flag. The CWR flag is assigned to Bit 8 in the Reserved field of the 589 TCP header. 
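As an illustration of the bit assignments just described, the following is a minimal sketch, not part of the specification, that treats bytes 13 and 14 of the TCP header as one 16-bit word with bit 0 as the leftmost bit. The helper names (bit_mask, header_length, reserved, flag_names) are hypothetical and chosen only for this example; the positions of the six older flags follow the figures of bytes 13 and 14.

```python
from typing import List

# Illustrative sketch only; the helper names are assumptions of this example.


def bit_mask(bit: int) -> int:
    """Mask for one bit of the 16-bit word, numbering bit 0 as the leftmost."""
    if not 0 <= bit <= 15:
        raise ValueError("bit must be in 0..15")
    return 1 << (15 - bit)


CWR = bit_mask(8)    # Congestion Window Reduced, Bit 8
ECE = bit_mask(9)    # ECN-Echo, Bit 9
URG = bit_mask(10)
ACK = bit_mask(11)
PSH = bit_mask(12)
RST = bit_mask(13)
SYN = bit_mask(14)
FIN = bit_mask(15)

_FLAGS = [(CWR, "CWR"), (ECE, "ECE"), (URG, "URG"), (ACK, "ACK"),
          (PSH, "PSH"), (RST, "RST"), (SYN, "SYN"), (FIN, "FIN")]


def header_length(word: int) -> int:
    """Bits 0-3: the Header Length field."""
    return (word >> 12) & 0xF


def reserved(word: int) -> int:
    """Bits 4-7: what remains of the Reserved field."""
    return (word >> 8) & 0xF


def flag_names(word: int) -> List[str]:
    """Names of the flags set in the word, in left-to-right bit order."""
    return [name for mask, name in _FLAGS if word & mask]
```

For example, an ACK that echoes congestion could be built as (5 << 12) | ECE | ACK and inspected with flag_names(). Numbering from the leftmost bit keeps the code aligned with the bit numbers used in the text.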
591      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
592    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
593    |               |                       | U | A | P | R | S | F |
594    | Header Length |        Reserved       | R | C | S | S | Y | I |
595    |               |                       | G | K | H | T | N | N |
596    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

598 Figure 3: The old definition of bytes 13 and 14 of the TCP 599 header.

601      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
602    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
603    |               |               | C | E | U | A | P | R | S | F |
604    | Header Length |    Reserved   | W | C | R | C | S | S | Y | I |
605    |               |               | R | E | G | K | H | T | N | N |
606    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

608 Figure 4: The new definition of bytes 13 and 14 of the TCP 609 Header.

611 Thus, ECN uses the ECT and CE flags in the IP header (as shown in 612 Figure 1) for signaling between routers and connection endpoints, and 613 uses the ECN-Echo and CWR flags in the TCP header (as shown in Figure 614 4) for TCP-endpoint to TCP-endpoint signaling. For a TCP connection, 615 a typical sequence of events in an ECN-based reaction to congestion 616 is as follows:
617 * An ECT codepoint is set in packets transmitted by the sender to 618 indicate that ECN is supported by the transport entities for these 619 packets.
620 * An ECN-capable router detects impending congestion and detects 621 that an ECT codepoint is set in the packet it is about to drop. 622 Instead of dropping the packet, the router chooses to set the CE 623 codepoint in the IP header and forwards the packet.
624 * The receiver receives the packet with the CE codepoint set, and 625 sets the ECN-Echo flag in its next TCP ACK sent to the sender.
626 * The sender receives the TCP ACK with ECN-Echo set, and reacts to 627 the congestion as if a packet had been dropped.
628 * The sender sets the CWR flag in the TCP header of the next 629 packet sent to the receiver to acknowledge its receipt of and 630 reaction to the ECN-Echo flag. 632 The negotiation for using ECN by the TCP transport entities and the 633 use of the ECN-Echo and CWR flags is described in more detail in the 634 sections below. 636 6.1.1 TCP Initialization 638 In the TCP connection setup phase, the source and destination TCPs 639 exchange information about their willingness to use ECN. Subsequent 640 to the completion of this negotiation, the TCP sender sets an ECT 641 codepoint in the IP header of data packets to indicate to the network 642 that the transport is capable and willing to participate in ECN for 643 this packet. This indicates to the routers that they may mark this 644 packet with the CE codepoint, if they would like to use that as a 645 method of congestion notification. If the TCP connection does not 646 wish to use ECN notification for a particular packet, the sending TCP 647 sets the ECN codepoint to not-ECT, and the TCP receiver ignores the 648 CE codepoint in the received packet. 650 For this discussion, we designate the initiating host as Host A and 651 the responding host as Host B. We call a SYN packet with the ECE and 652 CWR flags set an "ECN-setup SYN packet", and we call a SYN packet 653 with at least one of the ECE and CWR flags not set a "non-ECN-setup 654 SYN packet". Similarly, we call a SYN-ACK packet with only the ECE 655 flag set but the CWR flag not set an "ECN-setup SYN-ACK packet", and 656 we call a SYN-ACK packet with any other configuration of the ECE and 657 CWR flags a "non-ECN-setup SYN-ACK packet". 659 Before a TCP connection can use ECN, Host A sends an ECN-setup SYN 660 packet, and Host B sends an ECN-setup SYN-ACK packet. 
For a SYN 661 packet, the setting of both ECE and CWR in the ECN-setup SYN packet 662 is defined as an indication that the sending TCP is ECN-Capable, 663 rather than as an indication of congestion or of response to 664 congestion. More precisely, an ECN-setup SYN packet indicates that 665 the TCP implementation transmitting the SYN packet will participate 666 in ECN as both a sender and receiver. Specifically, as a receiver, 667 it will respond to incoming data packets that have the CE codepoint 668 set in the IP header by setting ECE in outgoing TCP Acknowledgement 669 (ACK) packets. As a sender, it will respond to incoming packets that 670 have ECE set by reducing the congestion window and setting CWR when 671 appropriate. An ECN-setup SYN packet does not commit the TCP sender 672 to setting the ECT codepoint in any or all of the packets it may 673 transmit. However, the commitment to respond appropriately to 674 incoming packets with the CE codepoint set remains even if the TCP 675 sender in a later transmission, within this TCP connection, sends a 676 SYN packet without ECE and CWR set. 678 When Host B sends an ECN-setup SYN-ACK packet, it sets the ECE flag 679 but not the CWR flag. An ECN-setup SYN-ACK packet is defined as an 680 indication that the TCP transmitting the SYN-ACK packet is ECN- 681 Capable. As with the SYN packet, an ECN-setup SYN-ACK packet does 682 not commit the TCP host to setting the ECT codepoint in transmitted 683 packets. 685 The following rules apply to the sending of ECN-setup packets: 687 * If a host has received an ECN-setup SYN packet, then it MAY send an 688 ECN-setup SYN-ACK packet. Otherwise, it MUST NOT send an ECN-setup 689 SYN-ACK packet. 690 * A host MUST NOT set ECT on data packets unless it has sent at least 691 one ECN-setup SYN or ECN-setup SYN-ACK packet, and has received at 692 least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has sent no 693 non-ECN-setup SYN or non-ECN-setup SYN-ACK packet. 
If a host has 694 received at least one non-ECN-setup SYN or non-ECN-setup SYN-ACK 695 packet, then it SHOULD NOT set ECT on data packets. 696 * If a host ever sets the ECT codepoint on a data packet, then that 697 host MUST correctly set/clear the CWR TCP bit on all subsequent 698 packets in the connection. 699 * If a host has sent at least one ECN-setup SYN or ECN-setup SYN-ACK 700 packet, and has received no non-ECN-setup SYN or non-ECN-setup SYN- 701 ACK packet, then if that host receives TCP data packets with ECT and 702 CE codepoints set in the IP header, then that host MUST process these 703 packets as specified for an ECN-capable connection. 704 * A host that is not willing to use ECN on a TCP connection SHOULD 705 clear both the ECE and CWR flags in all non-ECN-setup SYN and/or SYN- 706 ACK packets that it sends to indicate this unwillingness. Receivers 707 MUST correctly handle all forms of the non-ECN-setup SYN and SYN-ACK 708 packets. 709 * A host MUST NOT set ECT on SYN or SYN-ACK packets. 711 6.1.1.1. Robust TCP Initialization with an Echoed Reserve Field 713 There is the question of why we chose to have the TCP sending the SYN 714 set two ECN-related flags in the Reserved field of the TCP header for 715 the SYN packet, while the responding TCP sending the SYN-ACK sets 716 only one ECN-related flag in the SYN-ACK packet. This asymmetry is 717 necessary for the robust negotiation of ECN-capability with some 718 deployed TCP implementations. There exists at least one faulty TCP 719 implementation in which TCP receivers set the Reserved field of the 720 TCP header in ACK packets (and hence the SYN-ACK) simply to reflect 721 the Reserved field of the TCP header in the received data packet. 
722 Because the TCP SYN packet sets the ECN-Echo and CWR flags to 723 indicate ECN-capability, while the SYN-ACK packet sets only the ECN- 724 Echo flag, the sending TCP correctly interprets a receiver's 725 reflection of its own flags in the Reserved field as an indication 726 that the receiver is not ECN-capable. The sending TCP is not misled 727 by a faulty TCP implementation sending a SYN-ACK packet that simply 728 reflects the Reserved field of the incoming SYN packet. 730 6.1.2. The TCP Sender 732 For a TCP connection using ECN, new data packets are transmitted with 733 an ECT codepoint set in the IP header. When only one ECT codepoint 734 is needed by a sender for all packets sent on a TCP connection, 735 ECT(0) SHOULD be used. If the sender receives an ECN-Echo (ECE) ACK 736 packet (that is, an ACK packet with the ECN-Echo flag set in the TCP 737 header), then the sender knows that congestion was encountered in the 738 network on the path from the sender to the receiver. The indication 739 of congestion should be treated just as a congestion loss in non-ECN- 740 Capable TCP. That is, the TCP source halves the congestion window 741 "cwnd" and reduces the slow start threshold "ssthresh". The sending 742 TCP SHOULD NOT increase the congestion window in response to the 743 receipt of an ECN-Echo ACK packet. 745 TCP should not react to congestion indications more than once every 746 window of data (or more loosely, more than once every round-trip 747 time). That is, the TCP sender's congestion window should be reduced 748 only once in response to a series of dropped and/or CE packets from a 749 single window of data. In addition, the TCP source should not 750 decrease the slow-start threshold, ssthresh, if it has been decreased 751 within the last round trip time. However, if any retransmitted 752 packets are dropped, then this is interpreted by the source TCP as a 753 new instance of congestion.
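The sender behavior described above (halve cwnd, reduce ssthresh, and react at most once per window of data) might be sketched as follows. This is a simplified model under stated assumptions rather than a definitive implementation: windows are counted in whole segments instead of bytes, the class name EcnSender and the "recover" marker are hypothetical, the once-per-window rule is approximated by ignoring ECN-Echo on ACKs for segments sent before the last reduction, and the floors of 2 segments for ssthresh and 1 segment for cwnd are choices made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class EcnSender:
    """Toy model of a TCP sender's reaction to ECN-Echo (units: segments)."""

    cwnd: int = 16        # congestion window
    ssthresh: int = 64    # slow start threshold
    next_seq: int = 0     # number of the next new segment to send
    recover: int = -1     # highest segment sent when cwnd was last reduced

    def send_segment(self) -> int:
        """Send one new segment and return its segment number."""
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def on_ack(self, acked: int, ece: bool) -> bool:
        """Handle an ACK covering segment `acked`; return True if cwnd was reduced."""
        if not ece:
            return False
        if acked <= self.recover:
            # Same window of data as the previous reduction: do not react again.
            return False
        # Assumed floors: 2 segments for ssthresh, 1 segment for cwnd.
        self.ssthresh = max(self.cwnd // 2, 2)
        self.cwnd = max(self.cwnd // 2, 1)
        self.recover = self.next_seq - 1
        return True
```

A caller would invoke send_segment() for each new segment and on_ack() for each arriving ACK. Only the first ECN-Echo ACK for a given window reduces cwnd; later ones are ignored until an ACK arrives for data sent after the reduction.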
755 After the source TCP reduces its congestion window in response to a 756 CE packet, incoming acknowledgements that continue to arrive can 757 "clock out" outgoing packets as allowed by the reduced congestion 758 window. If the congestion window consists of only one MSS (maximum 759 segment size), and the sending TCP receives an ECN-Echo ACK packet, 760 then the sending TCP should in principle still reduce its congestion 761 window in half. However, the value of the congestion window is 762 bounded below by a value of one MSS. If the sending TCP were to 763 continue to send, using a congestion window of 1 MSS, this results in 764 the transmission of one packet per round-trip time. It is necessary 765 to still reduce the sending rate of the TCP sender even further, on 766 receipt of an ECN-Echo packet when the congestion window is one. We 767 use the retransmit timer as a means of reducing the rate further in 768 this circumstance. Therefore, the sending TCP MUST reset the 769 retransmit timer on receiving the ECN-Echo packet when the congestion 770 window is one. The sending TCP will then be able to send a new 771 packet only when the retransmit timer expires. 773 When an ECN-Capable TCP sender reduces its congestion window for any 774 reason (because of a retransmit timeout, a Fast Retransmit, or in 775 response to an ECN Notification), the TCP sender sets the CWR flag in 776 the TCP header of the first new data packet sent after the window 777 reduction. If that data packet is dropped in the network, then the 778 sending TCP will have to reduce the congestion window again and 779 retransmit the dropped packet. 781 We ensure that the "Congestion Window Reduced" information is 782 reliably delivered to the TCP receiver. This comes about from the 783 fact that if the new data packet carrying the CWR flag is dropped, 784 then the TCP sender will have to again reduce its congestion window, 785 and send another new data packet with the CWR flag set. 
Thus, the 786 CWR bit in the TCP header SHOULD NOT be set on retransmitted packets. 787 When the TCP data sender is ready to set the CWR bit after reducing 788 the congestion window, it SHOULD set the CWR bit only on the first 789 new data packet that it transmits. 791 [Floyd94] discusses TCP's response to ECN in more detail. [Floyd98] 792 discusses the validation test in the ns simulator, which illustrates 793 a wide range of ECN scenarios. These scenarios include the following: 794 an ECN followed by another ECN, a Fast Retransmit, or a Retransmit 795 Timeout; a Retransmit Timeout or a Fast Retransmit followed by an 796 ECN; and a congestion window of one packet followed by an ECN. 798 TCP follows existing algorithms for sending data packets in response 799 to incoming ACKs, multiple duplicate acknowledgements, or retransmit 800 timeouts [RFC2581]. TCP also follows the normal procedures for 801 increasing the congestion window when it receives ACK packets without 802 the ECN-Echo bit set [RFC2581]. 804 6.1.3. The TCP Receiver 806 When TCP receives a CE data packet at the destination end-system, the 807 TCP data receiver sets the ECN-Echo flag in the TCP header of the 808 subsequent ACK packet. If there is any ACK withholding implemented, 809 as in current "delayed-ACK" TCP implementations where the TCP 810 receiver can send an ACK for two arriving data packets, then the ECN- 811 Echo flag in the ACK packet will be set to '1' if the CE codepoint is 812 set in any of the data packets being acknowledged. That is, if any 813 of the received data packets are CE packets, then the returning ACK 814 has the ECN-Echo flag set. 816 To provide robustness against the possibility of a dropped ACK packet 817 carrying an ECN-Echo flag, the TCP receiver sets the ECN-Echo flag in 818 a series of ACK packets sent subsequently. The TCP receiver uses the 819 CWR flag received from the TCP sender to determine when to stop 820 setting the ECN-Echo flag. 
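The receiver rules above can be sketched as a small state machine. This is an illustrative model, not the specified implementation: the class EcnReceiver and its method names are hypothetical, and a separate per-ACK latch is an assumption added so that a delayed ACK still carries ECN-Echo when any of the packets it acknowledges was a CE packet.

```python
class EcnReceiver:
    """Toy model of when a TCP data receiver sets ECN-Echo in its ACKs."""

    def __init__(self) -> None:
        self.echoing = False       # True from a CE packet until a CWR packet
        self.ce_since_ack = False  # a CE packet arrived since the last ACK

    def on_data(self, ce: bool, cwr: bool) -> None:
        """Record one arriving data packet (CE codepoint, CWR flag)."""
        if cwr:
            # The sender reports that it reduced its window: stop echoing.
            self.echoing = False
        if ce:
            # A CE packet (even one that also carries CWR) starts echoing again.
            self.echoing = True
            self.ce_since_ack = True

    def make_ack(self) -> bool:
        """Return the ECN-Echo flag for the next ACK to be sent."""
        ece = self.echoing or self.ce_since_ack
        self.ce_since_ack = False
        return ece
```

Calling on_data() once per arriving data packet and make_ack() once per ACK covers both immediate and delayed ACKs. Checking CWR before CE means that a packet carrying both leaves the receiver echoing.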
822 After a TCP receiver sends an ACK packet with the ECN-Echo bit set, 823 that TCP receiver continues to set the ECN-Echo flag in all the ACK 824 packets it sends (whether they acknowledge CE data packets or non-CE 825 data packets) until it receives a CWR packet (a packet with the CWR 826 flag set). After the receipt of the CWR packet, acknowledgements for 827 subsequent non-CE data packets do not have the ECN-Echo flag set. If 828 another CE packet is received by the data receiver, the receiver 829 would once again send ACK packets with the ECN-Echo flag set. While 830 the receipt of a CWR packet does not guarantee that the data sender 831 received the ECN-Echo message, this does suggest that the data sender 832 reduced its congestion window at some point *after* it sent the data 833 packet for which the CE codepoint was set. 835 We have already specified that a TCP sender is not required to reduce 836 its congestion window more than once per window of data. Some care 837 is required if the TCP sender is to avoid unnecessary reductions of 838 the congestion window when a window of data includes both dropped 839 packets and (marked) CE packets. This is illustrated in [Floyd98]. 841 6.1.4. Congestion on the ACK-path 843 For the current generation of TCP congestion control algorithms, pure 844 acknowledgement packets (e.g., packets that do not contain any 845 accompanying data) should be sent with the not-ECT codepoint. 846 Current TCP receivers have no mechanisms for reducing traffic on the 847 ACK-path in response to congestion notification. Mechanisms for 848 responding to congestion on the ACK-path are areas for current and 849 future research. (One simple possibility would be for the sender to 850 reduce its congestion window when it receives a pure ACK packet with 851 the CE codepoint set). For current TCP implementations, a single 852 dropped ACK generally has only a very small effect on the TCP's 853 sending rate. 855 6.1.5. 
Retransmitted TCP packets 857 This document specifies ECN-capable TCP implementations MUST NOT set 858 either ECT codepoint (ECT(0) or ECT(1)) in the IP header for 859 retransmitted data packets, and that the TCP data receiver SHOULD 860 ignore the ECN field on arriving data packets that are outside of the 861 receiver's current window. This is for greater security against 862 denial-of-service attacks, as well as for robustness of the ECN 863 congestion indication with packets that are dropped later in the 864 network. 866 First, we note that if the TCP sender were to set an ECT codepoint on 867 a retransmitted packet, then if an unnecessarily-retransmitted packet 868 was later dropped in the network, the end nodes would never receive 869 the indication of congestion from the router setting the CE 870 codepoint. Thus, setting an ECT codepoint on retransmitted data 871 packets is not consistent with the robust delivery of the congestion 872 indication even for packets that are later dropped in the network. 874 In addition, an attacker capable of spoofing the IP source address of 875 the TCP sender could send data packets with arbitrary sequence 876 numbers, with the CE codepoint set in the IP header. On receiving 877 this spoofed data packet, the TCP data receiver would determine that 878 the data does not lie in the current receive window, and return a 879 duplicate acknowledgement. We define an out-of-window packet at the 880 TCP data receiver as a data packet that lies outside the receiver's 881 current window. On receiving an out-of-window packet, the TCP data 882 receiver has to decide whether or not to treat the CE codepoint in 883 the packet header as a valid indication of congestion, and therefore 884 whether to return ECN-Echo indications to the TCP data sender. 
If 885 the TCP data receiver ignored the CE codepoint in an out-of-window 886 packet, then the TCP data sender would not receive this possibly- 887 legitimate indication of congestion from the network, resulting in a 888 violation of end-to-end congestion control. On the other hand, if 889 the TCP data receiver honors the CE indication in the out-of-window 890 packet, and reports the indication of congestion to the TCP data 891 sender, then the malicious node that created the spoofed, out-of- 892 window packet has successfully "attacked" the TCP connection by 893 forcing the data sender to unnecessarily reduce (halve) its 894 congestion window. To prevent such a denial-of-service attack, we 895 specify that a legitimate TCP data sender MUST NOT set an ECT 896 codepoint on retransmitted data packets, and that the TCP data 897 receiver SHOULD ignore the CE codepoint on out-of-window packets. 899 One drawback of not setting ECT(0) or ECT(1) on retransmitted packets 900 is that it denies ECN protection for retransmitted packets. However, 901 for an ECN-capable TCP connection in a fully-ECN-capable environment 902 with mild congestion, packets should rarely be dropped due to 903 congestion in the first place, and so instances of retransmitted 904 packets should rarely arise. If packets are being retransmitted, 905 then there are already packet losses (from corruption or from 906 congestion) that ECN has been unable to prevent. 908 We note that if the router sets the CE codepoint for an ECN-capable 909 data packet within a TCP connection, then the TCP connection is 910 guaranteed to receive that indication of congestion, or to receive 911 some other indication of congestion within the same window of data, 912 even if this packet is dropped or reordered in the network. We 913 consider two cases, when the packet is later retransmitted, and when 914 the packet is not later retransmitted. 
916 In the first case, if the packet is either dropped or delayed, and at 917 some point retransmitted by the data sender, then the retransmission 918 is a result of a Fast Retransmit or a Retransmit Timeout for either 919 that packet or for some prior packet in the same window of data. In 920 this case, because the data sender already has retransmitted this 921 packet, we know that the data sender has already responded to an 922 indication of congestion for some packet within the same window of 923 data as the original packet. Thus, even if the first transmission of 924 the packet is dropped in the network, or is delayed, if it had the CE 925 codepoint set, and is later ignored by the data receiver as an out- 926 of-window packet, this is not a problem, because the sender has 927 already responded to an indication of congestion for that window of 928 data. 930 In the second case, if the packet is never retransmitted by the data 931 sender, then this data packet is the only copy of this data received 932 by the data receiver, and therefore arrives at the data receiver as 933 an in-window packet, regardless of how much the packet might be 934 delayed or reordered. In this case, if the CE codepoint is set on 935 the packet within the network, this will be treated by the data 936 receiver as a valid indication of congestion. 938 6.1.6. TCP Window Probes. 940 When the TCP data receiver advertises a zero window, the TCP data 941 sender sends window probes to determine if the receiver's window has 942 increased. Window probe packets do not contain any user data except 943 for the sequence number, which is a byte. If a window probe packet 944 is dropped in the network, this loss is not detected by the receiver. 945 Therefore, the TCP data sender MUST NOT set either an ECT codepoint 946 or the CWR bit on window probe packets. 948 However, because window probes use exact sequence numbers, they 949 cannot be easily spoofed in denial-of-service attacks. 
Therefore, if 950 a window probe arrives with the CE codepoint set, then the receiver 951 SHOULD respond to the ECN indications. 953 7. Non-compliance by the End Nodes 955 This section discusses concerns about the vulnerability of ECN to 956 non-compliant end-nodes (i.e., end nodes that set the ECT codepoint 957 in transmitted packets but do not respond to received CE packets). 958 We argue that the addition of ECN to the IP architecture will not 959 significantly increase the current vulnerability of the architecture 960 to unresponsive flows. 962 Even for non-ECN environments, there are serious concerns about the 963 damage that can be done by non-compliant or unresponsive flows (that 964 is, flows that do not respond to congestion control indications by 965 reducing their arrival rate at the congested link). For example, an 966 end-node could "turn off congestion control" by not reducing its 967 congestion window in response to packet drops. This is a concern for 968 the current Internet. It has been argued that routers will have to 969 deploy mechanisms to detect and differentially treat packets from 970 non-compliant flows [RFC2309,FF99]. It has also been suggested that 971 techniques such as end-to-end per-flow scheduling and isolation of 972 one flow from another, differentiated services, or end-to-end 973 reservations could remove some of the more damaging effects of 974 unresponsive flows. 976 It might seem that dropping packets in itself is an adequate 977 deterrent for non-compliance, and that the use of ECN removes this 978 deterrent. We would argue in response that (1) ECN-capable routers 979 preserve packet-dropping behavior in times of high congestion; and 980 (2) even in times of high congestion, dropping packets in itself is 981 not an adequate deterrent for non-compliance. 983 First, ECN-Capable routers will only mark packets (as opposed to 984 dropping them) when the packet marking rate is reasonably low. 
During 985 periods where the average queue size exceeds an upper threshold, and 986 therefore the potential packet marking rate would be high, our 987 recommendation is that routers drop packets rather than set the CE 988 codepoint in packet headers. 990 During the periods of low or moderate packet marking rates when ECN 991 would be deployed, there would be little deterrent effect on 992 unresponsive flows of dropping rather than marking those packets. For 993 example, delay-insensitive flows using reliable delivery might have 994 an incentive to increase rather than to decrease their sending rate 995 in the presence of dropped packets. Similarly, delay-sensitive flows 996 using unreliable delivery might increase their use of FEC in response 997 to an increased packet drop rate, increasing rather than decreasing 998 their sending rate. For the same reasons, we do not believe that 999 packet dropping itself is an effective deterrent for non-compliance 1000 even in an environment of high packet drop rates, when all flows are 1001 sharing the same packet drop rate. 1003 Several methods have been proposed to identify and restrict non- 1004 compliant or unresponsive flows. The addition of ECN to the network 1005 environment would not in any way increase the difficulty of designing 1006 and deploying such mechanisms. If anything, the addition of ECN to 1007 the architecture would make the job of identifying unresponsive flows 1008 slightly easier. For example, in an ECN-Capable environment routers 1009 are not limited to information about packets that are dropped or have 1010 the CE codepoint set at that router itself; in such an environment, 1011 routers could also take note of arriving CE packets that indicate 1012 congestion encountered by that packet earlier in the path. 1014 8. Non-compliance in the Network 1016 This section considers the issues when a router is operating, 1017 possibly maliciously, to modify either of the bits in the ECN field.
1019 By tampering with the bits in the ECN field, an adversary (or a 1020 broken router) could do one or more of the following: falsely report 1021 congestion, disable ECN-Capability for an individual packet, erase 1022 the ECN congestion indication, or falsely indicate ECN-Capability. 1023 Section 18 systematically examines the various cases by which the ECN 1024 field could be modified. The important criterion considered in 1025 determining the consequences of such modifications is whether it is 1026 likely to lead to poorer behavior in any dimension (throughput, 1027 delay, fairness or functionality) than if a router were to drop a 1028 packet. 1030 The first two possible changes, falsely reporting congestion or 1031 disabling ECN-Capability for an individual packet, are no worse than 1032 if the router were to simply drop the packet. From a congestion 1033 control point of view, setting the CE codepoint in the absence of 1034 congestion by a non-compliant router would be no worse than a router 1035 dropping a packet unnecessarily. By "erasing" an ECT codepoint of a 1036 packet that is later dropped in the network, a router's actions could 1037 result in an unnecessary packet drop for that packet later in the 1038 network. 1040 However, as discussed in Section 18, a router that erases the ECN 1041 congestion indication or falsely indicates ECN-Capability could 1042 potentially do more damage to the flow than if it had simply dropped 1043 the packet. A rogue or broken router that "erased" the CE codepoint 1044 in arriving CE packets would prevent that indication of congestion 1045 from reaching downstream receivers. This could result in the failure 1046 of congestion control for that flow and a resulting increase in 1047 congestion in the network, ultimately resulting in subsequent packets 1048 dropped for this flow as the average queue size increased at the 1049 congested gateway.
1051 Section 19 considers the potential repercussions of subverting end- 1052 to-end congestion control by either falsely indicating ECN- 1053 Capability, or by erasing the congestion indication in ECN (the CE- 1054 codepoint). We observe in Section 19 that the consequence of 1055 subverting ECN-based congestion control may lead to potential 1056 unfairness, but this is likely to be no worse than the subversion of 1057 either ECN-based or packet-based congestion control by the end nodes. 1059 8.1. Complications Introduced by Split Paths 1061 If a router or other network element has access to all of the packets 1062 of a flow, then that router could do no more damage to a flow by 1063 altering the ECN field than it could by simply dropping all of the 1064 packets from that flow. However, in some cases, a malicious or 1065 broken router might have access to only a subset of the packets from 1066 a flow. The question is as follows: can this router, by altering 1067 the ECN field in this subset of the packets, do more damage to that 1068 flow than if it had simply dropped that set of the packets? 1070 This is also discussed in detail in Section 18, which concludes as 1071 follows: It is true that the adversary that has access only to a 1072 subset of packets in an aggregate might, by subverting ECN-based 1073 congestion control, be able to deny the benefits of ECN to the other 1074 packets in the aggregate. While this is undesirable, this is not a 1075 sufficient concern to result in disabling ECN. 1077 9. Encapsulated Packets 1079 9.1. IP packets encapsulated in IP 1081 The encapsulation of IP packet headers in tunnels is used in many 1082 places, including IPsec and IP in IP [RFC2003]. This section 1083 considers issues related to interactions between ECN and IP tunnels, 1084 and specifies two alternative solutions.
This discussion is 1085 complemented by RFC 2983's discussion of interactions between 1086 Differentiated Services and IP tunnels of various forms [RFC 2983], 1087 as Differentiated Services uses the remaining six bits of the IP 1088 header octet that is used by ECN (see Figure 2 in Section 5). 1090 Some IP tunnel modes are based on adding a new "outer" IP header that 1091 encapsulates the original, or "inner" IP header and its associated 1092 packet. In many cases, the new "outer" IP header may be added and 1093 removed at intermediate points along a connection, enabling the 1094 network to establish a tunnel without requiring endpoint 1095 participation. We denote tunnels that specify that the outer header 1096 be discarded at tunnel egress as "simple tunnels". 1098 ECN uses the ECN field in the IP header for signaling between routers 1099 and connection endpoints. ECN interacts with IP tunnels based on the 1100 treatment of the ECN field in the IP header. In simple IP tunnels 1101 the octet containing the ECN field is copied or mapped from the inner 1102 IP header to the outer IP header at IP tunnel ingress, and the outer 1103 header's copy of this field is discarded at IP tunnel egress. If the 1104 outer header were to be simply discarded without taking care to deal 1105 with the ECN field, and an ECN-capable router were to set the CE 1106 (Congestion Experienced) codepoint within a packet in a simple IP 1107 tunnel, this indication would be discarded at tunnel egress, losing 1108 the indication of congestion. 1110 Thus, the use of ECN over simple IP tunnels would result in routers 1111 attempting to use the outer IP header to signal congestion to 1112 endpoints, but those congestion warnings never arriving because the 1113 outer header is discarded at the tunnel egress point. 
This problem 1114 was encountered with ECN and IPsec in tunnel mode, and RFC 2481 1115 recommended that ECN not be used with the older simple IPsec tunnels 1116 in order to avoid this behavior and its consequences. When ECN 1117 becomes widely deployed, then simple tunnels likely to carry ECN- 1118 capable traffic will have to be changed. 1120 From a security point of view, the use of ECN in the outer header of 1121 an IP tunnel might raise security concerns because an adversary could 1122 tamper with the ECN information that propagates beyond the tunnel 1123 endpoint. Based on an analysis in Sections 18 and 19 of these 1124 concerns and the resultant risks, our overall approach is to make 1125 support for ECN an option for IP tunnels, so that an IP tunnel can be 1126 specified or configured either to use ECN or not to use ECN in the 1127 outer header of the tunnel. Thus, in environments or tunneling 1128 protocols where the risks of using ECN are judged to outweigh its 1129 benefits, the tunnel can simply not use ECN in the outer header. 1130 Then the only indication of congestion experienced at routers within 1131 the tunnel would be through packet loss. 1133 The result is that there are two viable options for the behavior of 1134 ECN-capable connections over an IP tunnel, especially IPsec tunnels: 1135 * A limited-functionality option in which ECN is preserved in the 1136 inner header, but disabled in the outer header. The only 1137 mechanism available for signaling congestion occurring within the 1138 tunnel in this case is dropped packets. 1139 * A full-functionality option that supports ECN in both the inner 1140 and outer headers, and propagates congestion warnings from nodes 1141 within the tunnel to endpoints. 1143 Support for these options requires varying amounts of changes to IP 1144 header processing at tunnel ingress and egress. 
A small subset of 1145 these changes sufficient to support only the limited-functionality 1146 option would be sufficient to eliminate any incompatibility between 1147 ECN and IP tunnels. 1149 One goal of this document is to give guidance about the tradeoffs 1150 between the limited-functionality and full-functionality options. A 1151 full discussion of the potential effects of an adversary's 1152 modifications of the ECN field is given in Sections 18 and 19. 1154 9.1.1. The Limited-functionality and Full-functionality Options 1156 The limited-functionality option for ECN encapsulation in IP tunnels 1157 is for the non-ECT codepoint to be set in the outside (encapsulating) 1158 header regardless of the value of the ECN field in the inside 1159 (encapsulated) header. With this option, the ECN field in the inner 1160 header is not altered upon de-capsulation. The disadvantage of this 1161 approach is that the flow does not have ECN support for that part of 1162 the path that is using IP tunneling, even if the encapsulated packet 1163 (from the original TCP sender) is ECN-Capable. That is, if the 1164 encapsulated packet arrives at a congested router that is ECN- 1165 capable, and the router can decide to drop or mark the packet as an 1166 indication of congestion to the end nodes, the router will not be 1167 permitted to set the CE codepoint in the packet header, but instead 1168 will have to drop the packet. 1170 The full-functionality option for ECN encapsulation is to copy the 1171 ECN codepoint of the inside header to the outside header on 1172 encapsulation if the inside header is not-ECT or ECT, and to set the 1173 ECN codepoint of the outside header to ECT(0) if the ECN codepoint of 1174 the inside header is CE. On decapsulation, if the CE codepoint is 1175 set on the outside header, then the CE codepoint is also set in the 1176 inner header. Otherwise, the ECN codepoint on the inner header is 1177 left unchanged. 
That is, for full ECN support the encapsulation and 1178 decapsulation processing involves the following: At tunnel ingress, 1179 the full-functionality option sets the ECN codepoint in the outer 1180 header. If the ECN codepoint in the inner header is not-ECT or ECT, 1181 then it is copied to the ECN codepoint in the outer header. If the 1182 ECN codepoint in the inner header is CE, then the ECN codepoint in 1183 the outer header is set to ECT(0). Upon decapsulation at the tunnel 1184 egress, the full-functionality option sets the CE codepoint in the 1185 inner header if the CE codepoint is set in the outer header. 1186 Otherwise, no change is made to this field of the inner header. 1188 With the full-functionality option, a flow can take advantage of ECN 1189 in those parts of the path that might use IP tunneling. The 1190 disadvantage of the full-functionality option from a security 1191 perspective is that the IP tunnel cannot protect the flow from 1192 certain modifications to the ECN bits in the IP header within the 1193 tunnel. The potential dangers from modifications to the ECN bits in 1194 the IP header are described in detail in Sections 18 and 19. 1196 (1) An IP tunnel MUST modify the handling of the DS field octet at 1197 IP tunnel endpoints by implementing either the limited- 1198 functionality or the full-functionality option. 1199 (2) Optionally, an IP tunnel MAY enable the endpoints of an IP 1200 tunnel to negotiate the choice between the limited-functionality 1201 and the full-functionality option for ECN in the tunnel. 1203 The minimum required to make ECN usable with IP tunnels is the 1204 limited-functionality option, which prevents ECN from being enabled 1205 in the outer header of an IPsec tunnel. Full support for ECN 1206 requires the use of the full-functionality option. 
If there are no 1207 optional mechanisms for the tunnel endpoints to negotiate a choice 1208 between the limited-functionality or full-functionality option, there 1209 can be a pre-existing agreement between the tunnel endpoints about 1210 whether to support the limited-functionality or the full- 1211 functionality ECN option. 1213 In addition, it is RECOMMENDED that packets with the CE codepoint in 1214 the outer header be dropped if they arrive at the tunnel egress point 1215 for a tunnel that uses the limited-functionality option, or for a 1216 tunnel that uses the full-functionality option but for which the not- 1217 ECT codepoint is set in the inner header. This is motivated by 1218 backwards compatibility and to ensure that no unauthorized 1219 modifications of the ECN field take place, and is discussed further 1220 in the next Section (9.1.2). 1222 9.1.2. Changes to the ECN Field within an IP Tunnel. 1224 The presence of a copy of the ECN field in the inner header of an IP 1225 tunnel mode packet provides an opportunity for detection of 1226 unauthorized modifications to the ECN field in the outer header. 1227 Comparison of the ECT fields in the inner and outer headers falls 1228 into two categories for implementations that conform to this 1229 document: 1230 * If the IP tunnel uses the full-functionality option, then the 1231 not-ECT codepoint should be set in the outer header if and only if 1232 it is also set in the inner header. 1233 * If the tunnel uses the limited-functionality option, then the 1234 not-ECT codepoint should be set in the outer header. 1236 Receipt of a packet not satisfying the appropriate condition could be 1237 a cause of concern. 1239 Consider the case of an IP tunnel where the tunnel ingress point has 1240 not been updated to this document's requirements, while the tunnel 1241 egress point has been updated to support ECN. 
In this case, the IP 1242 tunnel is not explicitly configured to support the full-functionality 1243 ECN option. However, the tunnel ingress point is behaving identically 1244 to a tunnel ingress point that supports the full-functionality 1245 option. If packets from an ECN-capable connection use this tunnel, 1246 the ECT codepoint will be set in the outer header at the tunnel 1247 ingress point. Congestion within the tunnel may then result in ECN- 1248 capable routers setting CE in the outer header. Because the tunnel 1249 has not been explicitly configured to support the full-functionality 1250 option, the tunnel egress point expects the not-ECT codepoint to be 1251 set in the outer header. When an ECN-capable tunnel egress point 1252 receives a packet with the ECT or CE codepoint in the outer header, 1253 in a tunnel that has not been configured to support the full- 1254 functionality option, that packet should be processed, according to 1255 whether the CE codepoint was set, as follows. It is RECOMMENDED that 1256 on a tunnel that has not been configured to support the full- 1257 functionality option, packets should be dropped at the egress point 1258 if the CE codepoint is set in the outer header but not in the inner 1259 header, and should be forwarded otherwise. 1261 An IP tunnel cannot provide protection against erasure of congestion 1262 indications based on changing the ECN codepoint from CE to ECT. The 1263 erasure of congestion indications may impact the network and other 1264 flows in ways that would not be possible in the absence of ECN. It 1265 is important to note that erasure of congestion indications can only 1266 be performed to congestion indications placed by nodes within the 1267 tunnel; the copy of the ECN field in the inner header preserves 1268 congestion notifications from nodes upstream of the tunnel ingress 1269 (unless the inner header is also erased). 
If erasure of congestion 1270 notifications is judged to be a security risk that exceeds the 1271 congestion management benefits of ECN, then tunnels could be 1272 specified or configured to use the limited-functionality option. 1274 9.2. IPsec Tunnels 1276 IPsec supports secure communication over potentially insecure network 1277 components such as intermediate routers. IPsec protocols support two 1278 operating modes, transport mode and tunnel mode, that span a wide 1279 range of security requirements and operating environments. Transport 1280 mode security protocol header(s) are inserted between the IP (IPv4 or 1281 IPv6) header and higher layer protocol headers (e.g., TCP), and hence 1282 transport mode can only be used for end-to-end security on a 1283 connection. IPsec tunnel mode is based on adding a new "outer" IP 1284 header that encapsulates the original, or "inner" IP header and its 1285 associated packet. Tunnel mode security headers are inserted between 1286 these two IP headers. In contrast to transport mode, the new "outer" 1287 IP header and tunnel mode security headers can be added and removed 1288 at intermediate points along a connection, enabling security gateways 1289 to secure vulnerable portions of a connection without requiring 1290 endpoint participation in the security protocols. An important 1291 aspect of tunnel mode security is that in the original specification, 1292 the outer header is discarded at tunnel egress, ensuring that 1293 security threats based on modifying the IP header do not propagate 1294 beyond that tunnel endpoint. Further discussion of IPsec can be 1295 found in [RFC2401]. 1297 The IPsec protocol as originally defined in [ESP, AH] required that 1298 the inner header's ECN field not be changed by IPsec decapsulation 1299 processing at a tunnel egress node; this would have ruled out the 1300 possibility of full-functionality mode for ECN. 
At the same time, 1301 this would ensure that an adversary's modifications to the ECN field 1302 cannot be used to launch theft- or denial-of-service attacks across 1303 an IPsec tunnel endpoint, as any such modifications will be discarded 1304 at the tunnel endpoint. 1306 In principle, permitting the use of ECN functionality in the outer 1307 header of an IPsec tunnel raises security concerns because an 1308 adversary could tamper with the information that propagates beyond 1309 the tunnel endpoint. Based on an analysis (included in Sections 18 1310 and 19) of these concerns and the associated risks, our overall 1311 approach has been to provide configuration support for IPsec changes 1312 to remove the conflict with ECN. 1314 In particular, in tunnel mode the IPsec tunnel MUST support either 1315 the limited-functionality or the full-functionality mode outlined in 1316 Section 9.1.1. 1318 This makes permission to use ECN functionality in the outer header of 1319 an IPsec tunnel a configurable part of the corresponding IPsec 1320 Security Association (SA), so that it can be disabled in situations 1321 where the risks are judged to outweigh the benefits. The result is 1322 that an IPsec security administrator is presented with two 1323 alternatives for the behavior of ECN-capable connections within an 1324 IPsec tunnel, the limited-functionality alternative and full- 1325 functionality alternative described earlier. All IPsec 1326 implementations MUST implement either the limited-functionality or 1327 the full-functionality alternative in order to eliminate 1328 incompatibility between ECN and IPsec tunnels, but implementers MAY 1329 choose to implement either alternative. 1331 In addition, this document specifies how the endpoints of an IPsec 1332 tunnel could negotiate enabling ECN functionality in the outer 1333 headers of that tunnel based on security policy. 
The ability to 1334 negotiate ECN usage between tunnel endpoints would enable a security 1335 administrator to disable ECN in situations where she believes the 1336 risks (e.g., of lost congestion notifications) outweigh the benefits 1337 of ECN. 1339 The IPsec protocol, as defined in [ESP, AH], does not include the IP 1340 header's ECN field in any of its cryptographic calculations (in the 1341 case of tunnel mode, the outer IP header's ECN field is not 1342 included). Hence modification of the ECN field by a network node has 1343 no effect on IPsec's end-to-end security, because it cannot cause any 1344 IPsec integrity check to fail. As a consequence, IPsec does not 1345 provide any defense against an adversary's modification of the ECN 1346 field (i.e., a man-in-the-middle attack), as the adversary's 1347 modification will also have no effect on IPsec's end-to-end security. 1348 In some environments, the ability to modify the ECN field without 1349 affecting IPsec integrity checks may constitute a covert channel; if 1350 it is necessary to eliminate such a channel or reduce its bandwidth, 1351 then the IPsec tunnel should be run in limited-functionality mode. 1353 9.2.1. Negotiation between Tunnel Endpoints 1355 This section describes the detailed changes to enable usage of ECN 1356 over IPsec tunnels, including the negotiation of ECN support between 1357 tunnel endpoints. This is supported by three changes to IPsec: 1358 * An optional Security Association Database (SAD) field indicating 1359 whether tunnel encapsulation and decapsulation processing allows 1360 or forbids ECN usage in the outer IP header. 1361 * An optional Security Association Attribute that enables 1362 negotiation of this SAD field between the two endpoints of an SA 1363 that supports tunnel mode. 1364 * Changes to tunnel mode encapsulation and decapsulation 1365 processing to allow or forbid ECN usage in the outer IP header 1366 based on the value of the SAD field. 
When ECN usage is allowed in 1367 the outer IP header, the ECT codepoint is set in the outer header 1368 for ECN-capable connections and congestion notifications 1369 (indicated by the CE codepoint) from such connections are 1370 propagated to the inner header at tunnel egress. 1372 If negotiation of ECN usage is implemented, then the SAD field SHOULD 1373 also be implemented. On the other hand, negotiation of ECN usage is 1374 OPTIONAL in all cases, even for implementations that support the SAD 1375 field. The encapsulation and decapsulation processing changes are 1376 REQUIRED, but MAY be implemented without the other two changes by 1377 assuming that ECN usage is always forbidden. The full-functionality 1378 alternative for ECN usage over IPsec tunnels consists of the SAD 1379 field and the full version of encapsulation and decapsulation 1380 processing changes, with or without the OPTIONAL negotiation support. 1381 The limited-functionality alternative consists of a subset of the 1382 encapsulation and decapsulation changes that always forbids ECN 1383 usage. 1385 These changes are covered further in the following three subsections. 1387 9.2.1.1. ECN Tunnel Security Association Database Field 1389 Full ECN functionality adds a new field to the SAD (see [RFC2401]): 1391 ECN Tunnel: allowed or forbidden. 1393 Indicates whether ECN-capable connections using this SA in tunnel 1394 mode are permitted to receive ECN congestion notifications for 1395 congestion occurring within the tunnel. The allowed value enables 1396 ECN congestion notifications. The forbidden value disables such 1397 notifications, causing all congestion to be indicated via dropped 1398 packets. 1400 [OPTIONAL. The value of this field SHOULD be assumed to be 1401 "forbidden" in implementations that do not support it.] 
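As an illustration of how the ECN Tunnel field could drive header processing, the following Python sketch combines the field with the limited-functionality and full-functionality rules of Section 9.1.1 and the egress drop recommendation given there. It is a sketch under stated assumptions and not a normative algorithm: the names Ecn, encapsulate and decapsulate are hypothetical, the SAD field is reduced to a boolean, and returning None to mean "drop the packet" is a convention chosen only for this example.

```python
from enum import Enum
from typing import Optional


class Ecn(Enum):
    """ECN codepoints by name (bit encodings deliberately omitted)."""
    NOT_ECT = "not-ECT"
    ECT0 = "ECT(0)"
    ECT1 = "ECT(1)"
    CE = "CE"


def encapsulate(inner: Ecn, ecn_tunnel_allowed: bool) -> Ecn:
    """Choose the outer-header ECN codepoint at tunnel ingress."""
    if not ecn_tunnel_allowed:
        # Limited functionality: the outer header is always not-ECT.
        return Ecn.NOT_ECT
    if inner is Ecn.CE:
        # Full functionality: an inner CE is carried as ECT(0) outside.
        return Ecn.ECT0
    # Full functionality: not-ECT and ECT codepoints are copied.
    return inner


def decapsulate(inner: Ecn, outer: Ecn,
                ecn_tunnel_allowed: bool) -> Optional[Ecn]:
    """Return the inner ECN codepoint after egress, or None to drop."""
    if outer is Ecn.CE and (not ecn_tunnel_allowed or inner is Ecn.NOT_ECT):
        # RECOMMENDED drop: CE in the outer header is unexpected here.
        return None
    if ecn_tunnel_allowed and outer is Ecn.CE and inner in (Ecn.ECT0, Ecn.ECT1):
        # Propagate congestion marked inside the tunnel to the inner header.
        return Ecn.CE
    # Otherwise the inner header is left unchanged.
    return inner
```

With the field treated as "forbidden" (ecn_tunnel_allowed=False), encapsulate always yields not-ECT and decapsulate never alters the inner header, which corresponds to the limited-functionality alternative.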
1403 If this attribute is implemented, then the SA specification in a 1404 Security Policy Database (SPD) entry MUST support a corresponding 1405 attribute, and this SPD attribute MUST be covered by the SPD 1406 administrative interface (currently described in Section 4.4.1 of 1407 [RFC2401]). 1409 9.2.1.2. ECN Tunnel Security Association Attribute 1411 A new IPsec Security Association Attribute is defined to enable the 1412 support for ECN congestion notifications based on the outer IP header 1413 to be negotiated for IPsec tunnels (see [RFC2407]). This attribute 1414 is OPTIONAL, although implementations that support it SHOULD also 1415 support the SAD field defined in Section 9.2.1.1. 1417 Attribute Type 1419 class value type 1420 ------------------------------------------------- 1421 ECN Tunnel 10 Basic 1423 The IPsec SA Attribute value 10 has been allocated by IANA to 1424 indicate that the ECN Tunnel SA Attribute is being negotiated; the 1425 type of this attribute is Basic (see Section 4.5 of [RFC2407]). The 1426 Class Values are used to conduct the negotiation. See [RFC2407, 1427 RFC2408, RFC2409] for further information including encoding formats 1428 and requirements for negotiating this SA attribute. 1430 Class Values 1432 ECN Tunnel 1434 Specifies whether ECN functionality is allowed to 1435 be used with Tunnel Encapsulation Mode. 1436 This affects tunnel encapsulation and decapsulation processing - 1437 see Section 9.2.1.3. 1439 RESERVED 0 1440 Allowed 1 1441 Forbidden 2 1442 Values 3-61439 are reserved to IANA. Values 61440-65535 are for 1443 private use. 1445 If unspecified, the default shall be assumed to be Forbidden. 1447 ECN Tunnel is a new SA attribute, and hence initiators that use it 1448 can expect to encounter responders that do not understand it, and 1449 therefore reject proposals containing it. 
For backwards 1450 compatibility with such implementations initiators SHOULD always also 1451 include a proposal without the ECN Tunnel attribute to enable such a 1452 responder to select a transform or proposal that does not contain the 1453 ECN Tunnel attribute. RFC 2407 currently requires responders to 1454 reject all proposals if any proposal contains an unknown attribute; 1455 this requirement is expected to be changed to require a responder not 1456 to select proposals or transforms containing unknown attributes. 1458 9.2.1.3. Changes to IPsec Tunnel Header Processing 1460 For full ECN support, the encapsulation and decapsulation processing 1461 for the IPv4 TOS field and the IPv6 Traffic Class field are changed 1462 from that specified in [RFC2401] to the following: 1464 <-- How Outer Hdr Relates to Inner Hdr --> 1465 Outer Hdr at Inner Hdr at 1466 IPv4 Encapsulator Decapsulator 1467 Header fields: -------------------- ------------ 1468 DS Field copied from inner hdr (5) no change 1469 ECN Field constructed (7) constructed (8) 1471 IPv6 1472 Header fields: 1473 DS Field copied from inner hdr (6) no change 1474 ECN Field constructed (7) constructed (8) 1476 (5)(6) If the packet will immediately enter a domain for which the 1477 DSCP value in the outer header is not appropriate, that value MUST 1478 be mapped to an appropriate value for the domain [RFC 2474]. Also 1479 see [RFC 2475] for further information. 1481 (7) If the value of the ECN Tunnel field in the SAD entry for this 1482 SA is "allowed" and the ECN field in the inner header is set to 1483 any value other than CE, copy this ECN field to the outer header. 1484 If the ECN field in the inner header is set to CE, then set the 1485 ECN field in the outer header to ECT(0). 
1487 (8) If the value of the ECN tunnel field in the SAD entry for this 1488 SA is "allowed" and the ECN field in the inner header is set to 1489 ECT(0) or ECT(1) and the ECN field in the outer header is set to 1490 CE, then copy the ECN field from the outer header to the inner 1491 header. Otherwise, make no change to the ECN field in the inner 1492 header. 1494 (5) and (6) are identical to match usage in [RFC2401], although 1495 they are different in [RFC2401]. 1497 The above description applies to implementations that support the ECN 1498 Tunnel field in the SAD; such implementations MUST implement this 1499 processing instead of the processing of the IPv4 TOS octet and IPv6 1500 Traffic Class octet defined in [RFC2401]. This constitutes the full- 1501 functionality alternative for ECN usage with IPsec tunnels. 1503 An implementation that does not support the ECN Tunnel field in the 1504 SAD MUST implement this processing by assuming that the value of the 1505 ECN Tunnel field of the SAD is "forbidden" for every SA. In this 1506 case, the processing of the ECN field reduces to: 1508 (7) Set the ECN field to not-ECT in the outer header. 1509 (8) Make no change to the ECN field in the inner header. 1511 This constitutes the limited functionality alternative for ECN usage 1512 with IPsec tunnels. 1514 For backwards compatibility, packets with the CE codepoint set in the 1515 outer header SHOULD be dropped if they arrive on an SA that is using 1516 the limited-functionality option, or that is using the full- 1517 functionality option with the not-ECT codepoint set in the inner 1518 header. 1520 9.2.2. Changes to the ECN Field within an IPsec Tunnel. 1522 If the ECN Field is changed inappropriately within an IPsec tunnel, 1523 and this change is detected at the tunnel egress, then the receipt of 1524 a packet not satisfying the appropriate condition for its SA is an 1525 auditable event.
An implementation MAY create audit records with 1526 per-SA counts of incorrect packets over some time period rather than 1527 creating an audit record for each erroneous packet. Any such audit 1528 record SHOULD contain the headers from at least one erroneous packet, 1529 but need not contain the headers from every packet represented by the 1530 entry. 1532 9.2.3. Comments for IPsec Support 1534 Substantial comments were received on two areas of this document 1535 during review by the IPsec working group. This section describes 1536 these comments and explains why the proposed changes were not 1537 incorporated. 1539 The first comment indicated that per-node configuration is easier to 1540 implement than per-SA configuration. After serious thought and 1541 despite some initial encouragement of per-node configuration, it no 1542 longer seems to be a good idea. The concern is that as ECN-awareness 1543 is progressively deployed in IPsec, many ECN-aware IPsec 1544 implementations will find themselves communicating with a mixture of 1545 ECN-aware and ECN-unaware IPsec tunnel endpoints. In such an 1546 environment with per-node configuration, the only reasonable thing to 1547 do is forbid ECN usage for all IPsec tunnels, which is not the 1548 desired outcome. 1550 In the second area, several reviewers noted that SA negotiation is 1551 complex, and adding to it is non-trivial. One reviewer suggested 1552 using ICMP after tunnel setup as a possible alternative. The 1553 addition to SA negotiation in this document is OPTIONAL and will 1554 remain so; implementers are free to ignore it. The authors believe 1555 that the assurance it provides can be useful in a number of 1556 situations. In practice, if this is not implemented, it can be 1557 deleted at a subsequent stage in the standards process. Extending 1558 ICMP to negotiate ECN after tunnel setup is more complex than 1559 extending SA attribute negotiation. 
Some tunnels do not permit 1560 traffic to be addressed to the tunnel egress endpoint, hence the ICMP 1561 packet would have to be addressed to somewhere else, scanned for by 1562 the egress endpoint, and discarded there or at its actual 1563 destination. In addition, ICMP delivery is unreliable, and hence 1564 there is a possibility of an ICMP packet being dropped, entailing the 1565 invention of yet another ack/retransmit mechanism. It seems better 1566 simply to specify an OPTIONAL extension to the existing SA 1567 negotiation mechanism. 1569 9.3. IP packets encapsulated in non-IP packet headers. 1571 A different set of issues are raised, relative to ECN, when IP 1572 packets are encapsulated in tunnels with non-IP packet headers. This 1573 occurs with MPLS [MPLS], GRE [GRE], L2TP [L2TP], and PPTP [PPTP]. 1574 For these protocols, there is no conflict with ECN; it is just that 1575 ECN cannot be used within the tunnel unless an ECN codepoint can be 1576 specified for the header of the encapsulating protocol. Earlier work 1577 considered a preliminary proposal for incorporating ECN into MPLS, 1578 and proposals for incorporating ECN into GRE, L2TP, or PPTP will be 1579 considered as the need arises. 1581 10. Issues Raised by Monitoring and Policing Devices 1583 One possibility is that monitoring and policing devices (or more 1584 informally, "penalty boxes") will be installed in the network to 1585 monitor whether best-effort flows are appropriately responding to 1586 congestion, and to preferentially drop packets from flows determined 1587 not to be using adequate end-to-end congestion control procedures. 1589 We recommend that any "penalty box" that detects a flow or an 1590 aggregate of flows that is not responding to end-to-end congestion 1591 control first change from marking to dropping packets from that flow, 1592 before taking any additional action to restrict the bandwidth 1593 available to that flow. 
Thus, initially, the router may drop packets 1594 in which the router would otherwise have set the CE codepoint. 1595 This could include dropping those arriving packets for that flow that 1596 are ECN-Capable and that already have the CE codepoint set. In this 1597 way, any congestion indications seen by that router for that flow 1598 will be guaranteed to also be seen by the end nodes, even in the 1599 presence of malicious or broken routers elsewhere in the path. If we 1600 assume that the first action taken at any "penalty box" for an ECN- 1601 capable flow will be to drop packets instead of marking them, then 1602 there is no way that an adversary that subverts ECN-based end-to-end 1603 congestion control can cause a flow to be characterized as being non- 1604 cooperative and placed into a more severe action within the "penalty 1605 box". 1607 The monitoring and policing devices that are actually deployed could 1608 fall short of the `ideal' monitoring device described above, in that 1609 the monitoring is applied not to a single flow, but to an aggregate 1610 of flows (e.g., those sharing a single IPsec tunnel). In this case, 1611 the switch from marking to dropping would apply to all of the flows 1612 in that aggregate, denying the benefits of ECN to the other flows in 1613 the aggregate also. At the highest level of aggregation, another 1614 form of the disabling of ECN happens even in the absence of 1615 monitoring and policing devices, when ECN-Capable RED queues switch 1616 from marking to dropping packets as an indication of congestion when 1617 the average queue size has exceeded some threshold. 1619 11. Evaluations of ECN 1621 11.1. Related Work Evaluating ECN 1623 This section discusses some of the related work evaluating the use of 1624 ECN. The ECN Web Page [ECN] has pointers to other papers, as well as 1625 to implementations of ECN. 1627 [Floyd94] considers the advantages and drawbacks of adding ECN to the 1628 TCP/IP architecture.
As shown in the simulation-based comparisons, 1629 one advantage of ECN is to avoid unnecessary packet drops for short 1630 or delay-sensitive TCP connections. A second advantage of ECN is in 1631 avoiding some unnecessary retransmit timeouts in TCP. This paper 1632 discusses in detail the integration of ECN into TCP's congestion 1633 control mechanisms. The possible disadvantages of ECN discussed in 1634 the paper are that a non-compliant TCP connection could falsely 1635 advertise itself as ECN-capable, and that a TCP ACK packet carrying 1636 an ECN-Echo message could itself be dropped in the network. The 1637 first of these two issues is discussed in the appendix of this 1638 document, and the second is addressed by the addition of the CWR flag 1639 in the TCP header. 1641 Experimental evaluations of ECN include [RFC2884,K98]. The 1642 conclusions of [K98] and [RFC2884] are that ECN TCP gets moderately 1643 better throughput than non-ECN TCP; that ECN TCP flows are fair 1644 towards non-ECN TCP flows; and that ECN TCP is robust with two-way 1645 traffic (with congestion in both directions) and with multiple 1646 congested gateways. Experiments with many short web transfers show 1647 that, while most of the short connections have similar transfer times 1648 with or without ECN, a small percentage of the short connections have 1649 very long transfer times for the non-ECN experiments as compared to 1650 the ECN experiments. 1652 11.2. A Discussion of the ECN nonce. 1654 The use of two ECT codepoints, ECT(0) and ECT(1), can provide a one- 1655 bit ECN nonce in packet headers [SCWA99]. The primary motivation for 1656 this is the desire to allow mechanisms for the data sender to verify 1657 that network elements are not erasing the CE codepoint, and that data 1658 receivers are properly reporting to the sender the receipt of packets 1659 with the CE codepoint set, as required by the transport protocol. 
1660 This section discusses issues of backwards compatibility with IP ECN 1661 implementations in routers conformant with RFC 2481, in which only 1662 one ECT codepoint was defined. We do not believe that the 1663 incremental deployment of ECN implementations that understand the 1664 ECT(1) codepoint will cause significant operational problems. This 1665 is particularly likely to be the case when the deployment of the 1666 ECT(1) codepoint begins with routers, before the ECT(1) codepoint 1667 starts to be used by end-nodes. 1669 11.2.1. The Incremental Deployment of ECT(1) in Routers. 1671 ECN has been an Experimental standard since January 1999, and there 1672 are already implementations of ECN in routers that do not understand 1673 the ECT(1) codepoint. When the use of the ECT(1) codepoint is 1674 standardized for TCP or for other transport protocols, this could 1675 mean that a data sender is using the ECT(1) codepoint, but that this 1676 codepoint is not understood by a congested router on the path. 1678 If allowed by the transport protocol, a data sender would be free not 1679 to make use of ECT(1) at all, and to send all ECN-capable packets 1680 with the codepoint ECT(0). However, if an ECN-capable sender is 1681 using ECT(1), and the congested router on the path did not understand 1682 the ECT(1) codepoint, then the router would end up marking some of 1683 the ECT(0) packets, and dropping some of the ECT(1) packets, as 1684 indications of congestion. Since TCP is required to react to both 1685 marked and dropped packets, this behavior of dropping packets that 1686 could have been marked poses no significant threat to the network, 1687 and is consistent with the overall approach to ECN that allows 1688 routers to determine when and whether to mark packets as they see fit 1689 (see Section 5). 1691 12. Summary of changes required in IP and TCP 1693 This document specified two bits in the IP header to be used for ECN. 
1694 The not-ECT codepoint indicates that the transport protocol will 1695 ignore the CE codepoint. This is the default value for the ECN 1696 codepoint. The ECT codepoints indicate that the transport protocol 1697 is willing and able to participate in ECN. 1699 The router sets the CE codepoint to indicate congestion to the end 1700 nodes. The CE codepoint in a packet header MUST NOT be reset by a 1701 router. 1703 TCP requires three changes for ECN, a setup phase and two new flags 1704 in the TCP header. The ECN-Echo flag is used by the data receiver to 1705 inform the data sender of a received CE packet. The Congestion 1706 Window Reduced (CWR) flag is used by the data sender to inform the 1707 data receiver that the congestion window has been reduced. 1709 When ECN (Explicit Congestion Notification [RFC2481]) is used, it is 1710 required that congestion indications generated within an IP tunnel 1711 not be lost at the tunnel egress. We specified a minor modification 1712 to the IP protocol's handling of the ECN field during encapsulation 1713 and de-capsulation to allow flows that will undergo IP tunneling to 1714 use ECN. 1716 Two options for ECN in tunnels were specified: 1717 1) A limited-functionality option that does not use ECN inside the IP 1718 tunnel, by setting the ECN field in the outer header to not-ECT, and 1719 not altering the inner header at the time of decapsulation. 1720 2) The full-functionality option, which sets the ECN field in the 1721 outer header to either not-ECT or to one of the ECT codepoints, 1722 depending on the ECN field in the inner header. At decapsulation, if 1723 the CE codepoint is set in the outer header, and the inner header is 1724 set to one of the ECT codepoints, then the CE codepoint is copied to 1725 the inner header. 1727 All IP tunnels MUST implement one of the two alternative approaches 1728 described above. 
For IPsec tunnels, this document also defines an 1729 optional IPsec Security Association (SA) attribute that enables 1730 negotiation of ECN usage within IPsec tunnels and an optional field 1731 in the Security Association Database to indicate whether ECN is 1732 permitted in tunnel mode on an SA. The required changes to IPsec 1733 tunnels for ECN usage modify RFC 2401 [RFC2401], which defines the 1734 IPsec architecture and specifies some aspects of its implementation. 1735 The new IPsec SA attribute is in addition to those already defined in 1736 Section 4.5 of [RFC2407]. 1738 This document is intended to obsolete RFC 2481, "A Proposal to add 1739 Explicit Congestion Notification (ECN) to IP", which defined ECN as 1740 an Experimental Protocol for the Internet Community. The rest of 1741 this section describes the relationship between this document and its 1742 predecessor. 1744 RFC 2481 included a brief discussion of the use of ECN with 1745 encapsulated packets, and noted that for the IPsec specifications at 1746 the time (January 1999), flows could not safely use ECN if they were 1747 to traverse IPsec tunnels. RFC 2481 also described the changes that 1748 could be made to IPsec tunnel specifications to make them compatible 1749 with ECN. 1751 This document also incorporates work that was done after RFC 2481. 1752 First was to describe the changes to IPsec tunnels in detail, and 1753 extensively discuss the security implications of ECN (now included as 1754 Sections 18 and 19 of this document). Second was to extend the 1755 discussion of IPsec tunnels to include all IP tunnels. Because older 1756 IP tunnels are not compatible with a flow's use of ECN, the 1757 deployment of ECN in the Internet will create strong pressure for 1758 older IP tunnels to be updated to an ECN-compatible version, using 1759 either the limited-functionality or the full-functionality option.
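As an illustration only, the two tunnel options summarized in this section can be sketched in Python as follows. The function names and constants are hypothetical names chosen for this sketch. The two-bit values follow the encoding used in Section 17 ('10' for ECT(0), '01' for ECT(1), '11' for CE). Mapping an inner CE codepoint to an outer ECT(0) at encapsulation is an assumption of the sketch; the summary above says only that the outer header carries not-ECT or one of the ECT codepoints.

```python
# ECN field codepoints (two bits in the IP header).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def encapsulate(inner_ecn, full_functionality):
    """Return the ECN field to place in the outer header at tunnel ingress."""
    if not full_functionality:
        # Limited-functionality option: ECN is not used inside the tunnel.
        return NOT_ECT
    if inner_ecn == CE:
        # Assumption of this sketch: the outer header starts as ECT(0), so
        # that only congestion met inside the tunnel sets the outer CE.
        return ECT0
    # not-ECT, ECT(0) and ECT(1) are copied to the outer header.
    return inner_ecn


def decapsulate(outer_ecn, inner_ecn, full_functionality):
    """Return the ECN field of the inner header after tunnel egress."""
    if not full_functionality:
        # Limited-functionality option: the inner header is not altered.
        return inner_ecn
    if outer_ecn == CE and inner_ecn in (ECT0, ECT1):
        # Congestion indicated inside the tunnel is copied to the inner header.
        return CE
    return inner_ecn
```

Under these assumptions, a CE mark set on the outer header of an ECN-capable packet survives decapsulation, while a packet whose inner header is not-ECT is never turned into a CE packet at the egress.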
1761 This document does not address the issue of including ECN in non-IP 1762 tunnels such as MPLS, GRE, L2TP, or PPTP. An earlier preliminary 1763 document about adding ECN support to MPLS was not advanced. 1765 A third new piece of work after RFC 2481 was to describe the ECN 1766 procedure with retransmitted data packets, namely that an ECT codepoint 1767 should not be set on retransmitted data packets. The motivation for 1768 this additional specification is to eliminate a possible avenue for 1769 denial-of-service attacks on an existing TCP connection. Some prior 1770 deployments of ECN-capable TCP might not conform to the (new) 1771 requirement not to set an ECT codepoint on retransmitted packets; we 1772 do not believe this will cause significant problems in practice. 1774 This document also expands slightly on the specification of the use 1775 of SYN packets for the negotiation of ECN. While some prior 1776 deployments of ECN-capable TCP might not conform to the requirements 1777 specified in this document, we do not believe that this will lead to 1778 any performance or compatibility problems for TCP connections with a 1779 combination of TCP implementations at the endpoints. 1781 This document also includes the specification of the ECT(1) 1782 codepoint, which may be used by TCP as part of the implementation of 1783 an ECN nonce. 1785 13. Conclusions 1787 Given the current effort to implement AQM, we believe this is the 1788 right time to deploy congestion avoidance mechanisms that do not 1789 depend on packet drops alone. With the increased deployment of 1790 applications and transports sensitive to the delay and loss of a 1791 single packet (e.g., realtime traffic, short web transfers), 1792 depending on packet loss as a normal congestion notification 1793 mechanism appears to be insufficient (or at the very least, non- 1794 optimal).
1796 We examined the consequence of modifications of the ECN field within 1797 the network, analyzing all the opportunities for an adversary to 1798 change the ECN field. In many cases, the change to the ECN field is 1799 no worse than dropping a packet. However, we noted that some changes 1800 have the more serious consequence of subverting end-to-end congestion 1801 control. However, we point out that even then the potential damage 1802 is limited, and is similar to the threat posed by end-systems 1803 intentionally failing to cooperate with end-to-end congestion 1804 control. 1806 14. Acknowledgements 1808 Many people have made contributions to this work and this document, 1809 including many that we have not managed to directly acknowledge in 1810 this document. In addition, we would like to thank Kenjiro Cho for 1811 the proposal for the TCP mechanism for negotiating ECN-Capability, 1812 Kevin Fall for the proposal of the CWR bit, Steve Blake for material 1813 on IPv4 Header Checksum Recalculation, Jamal Hadi-Salim for 1814 discussions of ECN issues, and Steve Bellovin, Jim Bound, Brian 1815 Carpenter, Paul Ferguson, Stephen Kent, Greg Minshall, and Vern 1816 Paxson for discussions of security issues. We also thank the 1817 Internet End-to-End Research Group for ongoing discussions of these 1818 issues. 1820 Email discussions with a number of people, including Alexey 1821 Kuznetsov, Jamal Hadi-Salim, and Venkat Venkatsubra, have addressed 1822 the issues raised by non-conformant equipment in the Internet that 1823 does not respond to TCP SYN packets with the ECE and CWR flags set. 1824 We thank Mark Handley, Jitentra Padhye, and others for discussions on 1825 the TCP initialization procedures. 1827 The discussion of ECN and IP tunnel considerations draws heavily on 1828 related discussions and documents from the Differentiated Services 1829 Working Group. We thank Tabassum Bint Haque from Dhaka, Bangladesh, 1830 for feedback on IP tunnels. 
We thank Derrell Piper and Kero Tivinen 1831 for proposing modifications to RFC 2407 that improve the usability of 1832 negotiating the ECN Tunnel SA attribute. 1834 We thank David Wetherall, David Ely, and Neil Spring for the proposal 1835 for the ECN nonce. We also thank Stefan Savage for discussions on 1836 this issue. We thank Bob Briscoe and Jon Crowcroft for raising the 1837 issue of fragmentation in IP, on alternate semantics for the fourth 1838 ECN codepoint, and several other topics. We thank Richard Wendland 1839 for feedback on several issues in the draft. 1841 15. References 1843 [AH] Kent, S. and R. Atkinson, "IP Authentication Header", RFC 2402, 1844 November 1998. 1846 [B97] Bradner, S., "Key words for use in RFCs to Indicate Requirement 1847 Levels", BCP 14, RFC 2119, March 1997. 1849 [ECN] "The ECN Web Page", URL "http://www.aciri.org/floyd/ecn.html". 1850 Reference for informational purposes only. 1852 [ESP] Kent, S. and R. Atkinson, "IP Encapsulating Security Payload", 1853 RFC 2406, November 1998. 1855 [FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways 1856 for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1 1857 N.4, August 1993, p. 397-413. 1859 [Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM 1860 Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23. 1862 [Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator", 1863 URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all- 1864 ecn. Reference for informational purposes only. 1866 [FF99] Floyd, S., and Fall, K., "Promoting the Use of End-to-End 1867 Congestion Control in the Internet", IEEE/ACM Transactions on 1868 Networking, August 1999. 1870 [FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection", 1871 SIGCOMM '97, September 1997. 1873 [GRE] S. Hanks, T. Li, D. Farinacci, and P. Traina, Generic Routing 1874 Encapsulation (GRE), RFC 1701, October 1994. 1876 [Jacobson88] V. 
Jacobson, "Congestion Avoidance and Control", Proc. 1877 ACM SIGCOMM '88, pp. 314-329. 1879 [Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance 1880 Algorithm", Message to end2end-interest mailing list, April 1990. URL 1881 "ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt". 1883 [K98] Krishnan, H., "Analyzing Explicit Congestion Notification (ECN) 1884 benefits for TCP", Master's thesis, UCLA, 1998, URL 1885 "http://www.cs.ucla.edu/~hari/software/ecn/ ecn_report.ps.gz". 1887 [L2TP] W. Townsley, A. Valencia, A. Rubens, G. Pall, G. Zorn, and B. 1888 Palter Layer Two Tunneling Protocol "L2TP", RFC 2661, August 1999. 1890 [MJV96] S. McCanne, V. Jacobson, and M. Vetterli, "Receiver-driven 1891 Layered Multicast", SIGCOMM '96, August 1996, pp. 117-130. 1893 [MPLS] D. Awduche, J. Malcolm, J. Agogbua, M. O'Dell, J. McManus, 1894 Requirements for Traffic Engineering Over MPLS, RFC 2702, September 1895 1999. 1897 [PPTP] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, W. 1898 and G. Zorn, "Point-to-Point Tunneling Protocol (PPTP)", RFC 2637, 1899 July 1999. 1901 [RFC791] Postel, J., "Internet Protocol", STD 5, RFC 791, September 1902 1981. 1904 [RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, 1905 September 1981. 1907 [RFC1141] Mallory, T. and A. Kullberg, "Incremental Updating of the 1908 Internet Checksum", RFC 1141, January 1990. 1910 [RFC1349] Almquist, P., "Type of Service in the Internet Protocol 1911 Suite", RFC 1349, July 1992. 1913 [RFC1455] Eastlake, D., "Physical Link Security Type of Service", RFC 1914 1455, May 1993. 1916 [RFC1701] Hanks, S., Li, T., Farinacci, D., and P. Traina, Generic 1917 Routing Encapsulation (GRE), RFC 1701, October 1994. 1919 [RFC1702] Hanks, S., Li, T., Farinacci, D., and P. Traina, Generic 1920 Routing Encapsulation over IPv4 networks, RFC 1702, October 1994. 1922 [RFC2003] Perkins, C., IP Encapsulation within IP, RFC 2003, October 1923 1996. 1925 [RFC 2119] S. 
Bradner, Key words for use in RFCs to Indicate 1926 Requirement Levels, RFC 2119, March 1997. 1928 [RFC2309] Braden, B., et al., "Recommendations on Queue Management 1929 and Congestion Avoidance in the Internet", RFC 2309, April 1998. 1931 [RFC2401] S. Kent and R. Atkinson, Security Architecture for the 1932 Internet Protocol, RFC 2401, November 1998. 1934 [RFC2407] D. Piper, The Internet IP Security Domain of Interpretation 1935 for ISAKMP, RFC 2407, November 1998. 1937 [RFC2408] D. Maughan, M. Schertler, M. Schneider, and J. Turner, 1938 Internet Security Association and Key Management Protocol (ISAKMP), 1939 RFC 2408, November 1998. 1941 [RFC2409] D. Harkins and D. Carrel, The Internet Key Exchange (IKE), 1942 RFC 2409, November 1998. 1944 [RFC2474] Nichols, K., Blake, S., Baker, F. and D. Black, "Definition 1945 of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 1946 Headers", RFC 2474, December 1998. 1948 [RFC2475] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. 1949 Weiss, An Architecture for Differentiated Services, RFC 2475, 1950 December 1998. 1952 [RFC2481] K. Ramakrishnan and S. Floyd, A Proposal to add Explicit 1953 Congestion Notification (ECN) to IP, RFC 2481, January 1999. 1955 [RFC2581] M. Allman, V. Paxson, W. Stevens, "TCP Congestion Control", 1956 RFC 2581, April 1999. 1958 [RFC2884] Jamal Hadi Salim and Uvaiz Ahmed, "Performance Evaluation 1959 of Explicit Congestion Notification (ECN) in IP Networks", RFC 2884, 1960 July 2000. 1962 [RFC2983] D. Black, "Differentiated Services and Tunnels", RFC 2983, 1963 October 2000. 1965 [RFC2780] S. Bradner and V. Paxson, "IANA Allocation Guidelines For 1966 Values In the Internet Protocol and Related Headers", RFC 2780, March 1967 2000. 1969 [RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for 1970 Congestion Avoidance in Computer Networks", ACM Transactions on 1971 Computer Systems, Vol.8, No.2, pp. 158-181, May 1990.
1973 [SCWA99] Stefan Savage, Neal Cardwell, David Wetherall, and Tom 1974 Anderson, TCP Congestion Control with a Misbehaving Receiver, ACM 1975 Computer Communications Review, October 1999. 1977 16. Security Considerations 1979 Security considerations have been discussed in Sections 7, 8, 18, and 1980 19. 1982 17. IPv4 Header Checksum Recalculation 1984 IPv4 header checksum recalculation is an issue with some high-end 1985 router architectures using an output-buffered switch, since most if 1986 not all of the header manipulation is performed on the input side of 1987 the switch, while the ECN decision would need to be made local to the 1988 output buffer. This is not an issue for IPv6, since there is no IPv6 1989 header checksum. The IPv4 TOS octet is the last byte of a 16-bit 1990 half-word. 1992 RFC 1141 [RFC1141] discusses the incremental updating of the IPv4 1993 checksum after the TTL field is decremented. The incremental 1994 updating of the IPv4 checksum after the CE codepoint was set would 1995 work as follows: Let HC be the original header checksum for an ECT(0) 1996 packet, and let HC' be the new header checksum after the CE bit 1997 has been set. That is, the ECN field has changed from '10' to '11'. 1998 Then for header checksums calculated with one's complement 1999 subtraction, HC' would be recalculated as follows: 2001 HC' = { HC - 1 if HC > 1 2002 { 0x0000 if HC = 1 2004 For header checksums calculated on two's complement machines, HC' would 2005 be recalculated as follows after the CE bit was set: 2007 HC' = { HC - 1 if HC > 0 2008 { 0xFFFE if HC = 0 2010 A similar incremental updating of the IPv4 checksum can be carried out 2011 when the ECN field is changed from ECT(1) to CE, that is, from '01' to 2012 '11'. 2014 18.
Possible Changes to the ECN Field in the Network 2016 This section discusses in detail possible changes to the ECN field in 2017 the network, such as falsely reporting congestion, disabling ECN- 2018 Capability for an individual packet, erasing the ECN congestion 2019 indication, or falsely indicating ECN-Capability. 2021 18.1. Possible Changes to the IP Header 2023 18.1.1. Erasing the Congestion Indication 2025 First, we consider the changes that a router could make that would 2026 result in effectively erasing the congestion indication after it had 2027 been set by a router upstream. The convention followed is: 2028 ECN codepoint of received packet -> ECN codepoint of packet 2029 transmitted. 2031 Replacing the CE codepoint with the ECT(0) or ECT(1) codepoint 2032 effectively erases the congestion indication. However, with the use 2033 of two ECT codepoints, a router erasing the CE codepoint has no way 2034 to know whether the original ECT codepoint was ECT(0) or ECT(1). 2035 Thus, it is possible for the transport protocol to deploy mechanisms 2036 to detect such erasures of the CE codepoint. 2038 The consequence of the erasure of the CE codepoint for the upstream 2039 router is that there is a potential for congestion to build for a 2040 time, because the congestion indication does not reach the source. 2041 However, the packet would be received and acknowledged. 2043 The potential effect of erasing the congestion indication is complex, 2044 and is discussed in depth in Section 19 below. Note that the effect 2045 of erasing the congestion indication is different from dropping a 2046 packet in the network. When a data packet is dropped, the drop is 2047 detected by the TCP sender, and interpreted as an indication of 2048 congestion. 
Similarly, if a sufficient number of consecutive 2049 acknowledgement packets are dropped, causing the cumulative 2050 acknowledgement field not to be advanced at the sender, the sender is 2051 limited by the congestion window from sending additional packets, and 2052 ultimately the retransmit timer expires. 2054 In contrast, a systematic erasure of the CE bit by a downstream 2055 router can have the effect of causing a queue buildup at an upstream 2056 router, including the possible loss of packets due to buffer 2057 overflow. There is a potential of unfairness in that another flow 2058 that goes through the congested router could react to the CE bit set 2059 while the flow that has the CE bit erased could see better 2060 performance. The limitations on this potential unfairness are 2061 discussed in more detail in Section 19 below. 2063 The last of the three changes is to replace the CE codepoint with the 2064 not-ECT codepoint, thus erasing the congestion indication and 2065 disabling ECN-Capability at the same time. 2067 The `erasure' of the congestion indication is only effective if the 2068 packet does not end up being marked or dropped again by a downstream 2069 router. If the CE codepoint is replaced by an ECT codepoint, the 2070 packet remains ECN-Capable, and could be either marked or dropped by 2071 a downstream router as an indication of congestion. If the CE 2072 codepoint is replaced by the not-ECT codepoint, the packet is no 2073 longer ECN-capable, and can therefore be dropped but not marked by a 2074 downstream router as an indication of congestion. 2076 18.1.2. Falsely Reporting Congestion 2078 This change is to set the CE codepoint when an ECT codepoint was 2079 already set, even though there was no congestion. This change does 2080 not affect the treatment of that packet along the rest of the path. 2081 In particular, a router does not examine the CE codepoint in deciding 2082 whether to drop or mark an arriving packet.
2084 However, this could result in the application unnecessarily invoking 2085 end-to-end congestion control, and reducing its arrival rate. By 2086 itself, this is no worse (for the application or for the network) 2087 than if the tampering router had actually dropped the packet. 2089 18.1.3. Disabling ECN-Capability 2091 This change is to turn off the ECT codepoint of a packet. This means 2092 that if the packet later encounters congestion (e.g., by arriving to 2093 a RED queue with a moderate average queue size), it will be dropped 2094 instead of being marked. By itself, this is no worse (for the 2095 application) than if the tampering router had actually dropped the 2096 packet. The saving grace in this particular case is that there is no 2097 congested router upstream expecting a reaction from setting the CE 2098 bit. 2100 18.1.4. Falsely Indicating ECN-Capability 2101 This change would incorrectly label a packet as ECN-Capable. The 2102 packet may have been sent either by an ECN-Capable transport or a 2103 transport that is not ECN-Capable. 2105 If the packet later encounters moderate congestion at an ECN-Capable 2106 router, the router could set the CE codepoint instead of dropping the 2107 packet. If the transport protocol in fact is not ECN-Capable, then 2108 the transport will never receive this indication of congestion, and 2109 will not reduce its sending rate in response. The potential 2110 consequences of falsely indicating ECN-capability are discussed 2111 further in Section 19 below. 2113 If the packet never later encounters congestion at an ECN-Capable 2114 router, then the first of these two changes would have no effect, 2115 other than possibly interfering with the use of the ECN nonce by the 2116 transport protocol. The last change, however, would have the effect 2117 of giving false reports of congestion to a monitoring device along 2118 the path. 
If the transport protocol is ECN-Capable, then this change 2119 could also have an effect at the transport level, by combining 2120 falsely indicating ECN-Capability with falsely reporting congestion. 2121 For an ECN-capable transport, this would cause the transport to 2122 unnecessarily react to congestion. In this particular case, the 2123 router that is incorrectly changing the ECN field could have dropped 2124 the packet. Thus for this case of an ECN-capable transport, the 2125 consequence of this change to the ECN field is no worse than dropping 2126 the packet. 2128 18.2. Information carried in the Transport Header 2130 For TCP, an ECN-capable TCP receiver informs its TCP peer that it is 2131 ECN-capable at the TCP level, conveying this information in the TCP 2132 header at the time the connection is set up. This document does not 2133 consider potential dangers introduced by changes in the transport 2134 header within the network. In the case of IPsec tunnels, the IPsec 2135 tunnel protects the transport header. 2137 Another issue concerns TCP packets with a spoofed IP source address 2138 carrying invalid ECN information in the transport header. For 2139 completeness, we examine here some possible ways that a node spoofing 2140 the IP source address of another node could use the two ECN flags in 2141 the TCP header to launch a denial-of-service attack. However, these 2142 attacks would require an ability for the attacker to use valid TCP 2143 sequence numbers, and any attacker with this ability and with the 2144 ability to spoof IP source addresses could damage the TCP connection 2145 without using the ECN flags. Therefore, ECN does not add any new 2146 vulnerabilities in this respect. 2148 An acknowledgement packet with a spoofed IP source address of the TCP 2149 data receiver could include the ECE bit set.
If accepted by the TCP 2150 data sender as a valid packet, this spoofed acknowledgement packet 2151 could result in the TCP data sender unnecessarily halving its 2152 congestion window. However, to be accepted by the data sender, such 2153 a spoofed acknowledgement packet would have to have the correct 2154 32-bit sequence number as well as a valid acknowledgement number. An 2155 attacker that could successfully send such a spoofed acknowledgement 2156 packet could also send a spoofed RST packet, or do other equally 2157 damaging operations to the TCP connection. 2159 Packets with a spoofed IP source address of the TCP data sender could 2160 include the CWR bit set. Again, to be accepted, such a packet would 2161 have to have a valid sequence number. In addition, such a spoofed 2162 packet would have a limited performance impact. Spoofing a data 2163 packet with the CWR bit set could result in the TCP data receiver 2164 sending fewer ECE packets than it would otherwise, if the data 2165 receiver was sending ECE packets when it received the spoofed CWR 2166 packet. 2168 18.3. Split Paths 2170 In some cases, a malicious or broken router might have access to only 2171 a subset of the packets from a flow. The question is as follows: 2172 can this router, by altering the ECN field in this subset of the 2173 packets, do more damage to that flow than if it had simply dropped 2174 that set of packets? 2176 We will classify the packets in the flow as A packets and B packets, 2177 and assume that the adversary only has access to A packets. Assume 2178 that the adversary is subverting end-to-end congestion control along 2179 the path traveled by A packets only, by either falsely indicating 2180 ECN-Capability upstream of the point where congestion occurs, or 2181 erasing the congestion indication downstream. 
Consider also that 2182 there exists a monitoring device that sees both the A and B packets, 2183 and will "punish" both the A and B packets if the total flow is 2184 determined not to be properly responding to indications of 2185 congestion. Another key characteristic that we believe is likely to 2186 be true is that the monitoring device, before `punishing' the A&B 2187 flow, will first drop packets instead of setting the CE codepoint, 2188 and will drop arriving packets of that flow that already have the CE 2189 codepoint set. If the end nodes are in fact using end-to-end 2190 congestion control, they will see all of the indications of 2191 congestion seen by the monitoring device, and will begin to respond 2192 to these indications of congestion. Thus, the monitoring device is 2193 successful in providing the indications to the flow at an early 2194 stage. 2196 It is true that the adversary that has access only to the A packets 2197 might, by subverting ECN-based congestion control, be able to deny 2198 the benefits of ECN to the other packets in the A&B aggregate. While 2199 this is unfortunate, this is not a reason to disable ECN within an 2200 IPsec tunnel. 2202 A variant of falsely reporting congestion occurs when there are two 2203 adversaries along a path, where the first adversary falsely reports 2204 congestion, and the second adversary `erases' those reports. (Unlike 2205 packet drops, ECN congestion reports can be `reversed' later in the 2206 network by a malicious or broken router. However, the use of the ECN 2207 nonce could help the transport to detect this behavior.) While this 2208 would be transparent to the end node, it is possible that a 2209 monitoring device between the first and second adversaries would see 2210 the false indications of congestion. 
Keep in mind our recommendation 2211 in this document, that before `punishing' a flow for not responding 2212 appropriately to congestion, the router will first switch to dropping 2213 rather than marking as an indication of congestion, for that flow. 2214 When this includes dropping arriving packets from that flow that have 2215 the CE codepoint set, this ensures that these indications of 2216 congestion are being seen by the end nodes. Thus, there is no 2217 additional harm that we are able to postulate as a result of multiple 2218 conflicting adversaries. 2220 19. Implications of Subverting End-to-End Congestion Control 2222 This section focuses on the potential repercussions of subverting 2223 end-to-end congestion control by either falsely indicating ECN- 2224 Capability, or by erasing the congestion indication in ECN (the CE 2225 codepoint). Subverting end-to-end congestion control by either of 2226 these two methods can have consequences both for the application and 2227 for the network. We discuss these separately below. 2229 The first method to subvert end-to-end congestion control, that of 2230 falsely indicating ECN-Capability, effectively subverts end-to-end 2231 congestion control only if the packet later encounters congestion 2232 that results in the setting of the CE codepoint. In this case, the 2233 transport protocol (which may not be ECN-capable) does not receive 2234 the indication of congestion from these downstream congested routers. 2236 The second method to subvert end-to-end congestion control, `erasing' 2237 the CE codepoint in a packet, effectively subverts end-to-end 2238 congestion control only when the CE codepoint in the packet was set 2239 earlier by a congested router. In this case, the transport protocol 2240 does not receive the indication of congestion from the upstream 2241 congested routers. 
2243 Either of these two methods of subverting end-to-end congestion 2244 control can potentially introduce more damage to the network (and 2245 possibly to the flow itself) than if the adversary had simply dropped 2246 packets from that flow. However, as we discuss later in this section 2247 and in Section 7, this potential damage is limited. 2249 19.1. Implications for the Network and for Competing Flows 2251 The CE codepoint of the ECN field is only used by routers as an 2252 indication of congestion during periods of *moderate* congestion. 2253 ECN-capable routers should drop rather than mark packets during heavy 2254 congestion even if the router's queue is not yet full. For example, 2255 for routers using active queue management based on RED, the router 2256 should drop rather than mark packets that arrive while the average 2257 queue sizes exceed the RED queue's maximum threshold. 2259 One consequence for the network of subverting end-to-end congestion 2260 control is that flows that do not receive the congestion indications 2261 from the network might increase their sending rate until they drive 2262 the network into heavier congestion. Then, the congested router 2263 could begin to drop rather than mark arriving packets. For flows 2264 that are not isolated by some form of per-flow scheduling or other 2265 per-flow mechanisms, but are instead aggregated with other flows in a 2266 single queue in an undifferentiated fashion, this packet-dropping at 2267 the congested router would apply to all flows that share that queue. 2268 Thus, the consequences would be to increase the level of congestion 2269 in the network. 2271 In some cases, the increase in the level of congestion will lead to a 2272 substantial buffer buildup at the congested queue that will be 2273 sufficient to drive the congested queue from the packet-marking to 2274 the packet-dropping regime. 
This transition could occur either 2275 because of buffer overflow, or because of the active queue management 2276 policy described above that drops packets when the average queue is 2277 above RED's maximum threshold. At this point, all flows, including 2278 the subverted flow, will begin to see packet drops instead of packet 2279 marks, and a malicious or broken router will no longer be able to 2280 `erase' these indications of congestion in the network. If the end 2281 nodes are deploying appropriate end-to-end congestion control, then 2282 the subverted flow will reduce its arrival rate in response to 2283 congestion. When the level of congestion is sufficiently reduced, 2284 the congested queue can return from the packet-dropping regime to the 2285 packet-marking regime. The steady-state pattern could be one of the 2286 congested queue oscillating between these two regimes. 2288 In other cases, the consequences of subverting end-to-end congestion 2289 control will not be severe enough to drive the congested link into 2290 sufficiently-heavy congestion that packets are dropped instead of 2291 being marked. In this case, the implications for competing flows in 2292 the network will be a slightly-increased rate of packet marking or 2293 dropping, and a corresponding decrease in the bandwidth available to 2294 those flows. This can be a stable state if the arrival rate of the 2295 subverted flow is sufficiently small, relative to the link bandwidth, 2296 that the average queue size at the congested router remains under 2297 control. In particular, the subverted flow could have a limited 2298 bandwidth demand on the link at this router, while still getting more 2299 than its "fair" share of the link. This limited demand could be due 2300 to a limited demand from the data source; a limitation from the TCP 2301 advertised window; a lower-bandwidth access pipe; or other factors. 
2302 Thus the subversion of ECN-based congestion control can still lead to 2303 unfairness, which we believe is appropriate to note here. 2305 The threat to the network posed by the subversion of ECN-based 2306 congestion control in the network is essentially the same as the 2307 threat posed by an end-system that intentionally fails to cooperate 2308 with end-to-end congestion control. The deployment of mechanisms in 2309 routers to address this threat is an open research question, and is 2310 discussed further in Section 10. 2312 Let us take the example described in Section 18.1.1, where the CE 2313 codepoint that was set in a packet is erased: {'11' -> '10' or '11' 2314 -> '01'}. The consequence for the congested upstream router that set 2315 the CE codepoint is that this congestion indication does not reach 2316 the end nodes for that flow. The source (even one which is completely 2317 cooperative and not malicious) is thus allowed to continue to 2318 increase its sending rate (if it is a TCP flow, by increasing its 2319 congestion window). The flow potentially achieves better throughput 2320 than the other flows that also share the congested router, especially 2321 if there are no policing mechanisms or per-flow queueing mechanisms 2322 at that router. Consider the behavior of the other flows, especially 2323 if they are cooperative: that is, the flows that do not experience 2324 subverted end-to-end congestion control. They are likely to reduce 2325 their load (e.g., by reducing their window size) on the congested 2326 router, thus benefiting our subverted flow. This results in 2327 unfairness. 
As we discussed above, this unfairness could either be 2328 transient (because the congested queue is driven into the packet- 2329 dropping regime), oscillatory (because the congested queue oscillates 2330 between the packet-marking and the packet-dropping regimes), or more 2331 moderate but persistent as a stable state (because the congested queue 2332 is never driven to the packet-dropping regime). 2334 The results would be similar if the subverted flow was intentionally 2335 avoiding end-to-end congestion control. One difference is that a 2336 flow that is intentionally avoiding end-to-end congestion control at 2337 the end nodes can avoid end-to-end congestion control even when the 2338 congested queue is in packet-dropping mode, by refusing to reduce its 2339 sending rate in response to packet drops in the network. Thus the 2340 problems for the network from the subversion of ECN-based congestion 2341 control are less severe than the problems caused by the intentional 2342 avoidance of end-to-end congestion control in the end nodes. It is 2343 also the case that it is considerably more difficult to control the 2344 behavior of the end nodes than it is to control the behavior of the 2345 infrastructure itself. This is not to say that the problems for the 2346 network posed by the network's subversion of ECN-based congestion 2347 control are small; just that they are dwarfed by the problems for the 2348 network posed by the subversion of either ECN-based or other 2349 currently known packet-based congestion control mechanisms by the end 2350 nodes. 2352 19.2. Implications for the Subverted Flow 2354 When a source indicates that it is ECN-capable, there is an 2355 expectation that the routers in the network that are capable of 2356 participating in ECN will use the CE codepoint for indication of 2357 congestion.
There is the potential benefit of using ECN in reducing 2358 the amount of packet loss (in addition to the reduced queueing delays 2359 because of active queue management policies). When the packet flows 2360 through a tunnel where the nodes that the tunneled packets traverse 2361 are untrusted in some way, the expectation is that IPsec will protect 2362 the flow from subversion that results in undesirable consequences. 2364 In many cases, a subverted flow will benefit from the subversion of 2365 end-to-end congestion control for that flow in the network, by 2366 receiving more bandwidth than it would have otherwise, relative to 2367 competing non-subverted flows. If the congested queue reaches the 2368 packet-dropping stage, then the subversion of end-to-end congestion 2369 control might or might not be of overall benefit to the subverted 2370 flow, depending on that flow's relative tradeoffs between throughput, 2371 loss, and delay. 2373 One form of subverting end-to-end congestion control is to falsely 2374 indicate ECN-capability by setting the ECT codepoint. This has the 2375 consequence of downstream congested routers setting the CE codepoint 2376 in vain. However, as described in Section 9.1.2, if an ECT codepoint 2377 is changed in an IP tunnel, this can be detected at the egress point 2378 of the tunnel, as long as the inner header was not changed within the 2379 tunnel. 2381 The second form of subverting end-to-end congestion control is to 2382 erase the congestion indication by erasing the CE codepoint. In this 2383 case, it is the upstream congested routers that set the CE codepoint 2384 in vain. 2386 If an ECT codepoint is erased within an IP tunnel, then this can be 2387 detected at the egress point of the tunnel, as long as the inner 2388 header was not changed within the tunnel. 
If the CE codepoint is set 2389 upstream of the IP tunnel, then any erasure of the outer header's CE 2390 codepoint within the tunnel will have no effect because the inner 2391 header preserves the set value of the CE codepoint. However, if the 2392 CE codepoint is set within the tunnel, and erased either within or 2393 downstream of the tunnel, this is not necessarily detected at the 2394 egress point of the tunnel. 2396 With this subversion of end-to-end congestion control, an end-system 2397 transport does not respond to the congestion indication. Along with 2398 the increased unfairness for the non-subverted flows described in the 2399 previous section, the congested router's queue could continue to 2400 build, resulting in packet loss at the congested router - which is a 2401 means for indicating congestion to the transport in any case. In the 2402 interim, the flow might experience higher queueing delays, possibly 2403 along with an increased bandwidth relative to other non-subverted 2404 flows. But transports do not inherently make assumptions of 2405 consistently experiencing carefully managed queueing in the path. We 2406 believe that these forms of subverting end-to-end congestion control 2407 are no worse for the subverted flow than if the adversary had simply 2408 dropped the packets of that flow itself. 2410 19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion Control 2412 We have shown that, in many cases, a malicious or broken router that 2413 is able to change the bits in the ECN field can do no more damage 2414 than if it had simply dropped the packet in question. However, this 2415 is not true in all cases, in particular in the cases where the broken 2416 router subverted end-to-end congestion control by either falsely 2417 indicating ECN-Capability or by erasing the ECN congestion indication 2418 (in the CE codepoint). 
While there are many ways that a router can 2419 harm a flow by dropping packets, a router cannot subvert end-to-end 2420 congestion control by dropping packets. As an example, a router 2421 cannot subvert TCP congestion control by dropping data packets, 2422 acknowledgement packets, or control packets. 2424 Even though packet-dropping cannot be used to subvert end-to-end 2425 congestion control, there *are* non-ECN-based methods for subverting 2426 end-to-end congestion control that a broken or malicious router could 2427 use. For example, a broken router could duplicate data packets, thus 2428 effectively negating the effects of end-to-end congestion control 2429 along some portion of the path. (For a router that duplicated 2430 packets within an IPsec tunnel, the security administrator can cause 2431 the duplicate packets to be discarded by configuring anti-replay 2432 protection for the tunnel.) This duplication of packets within the 2433 network would have similar implications for the network and for the 2434 subverted flow as those described in Sections 18.1.1 and 18.1.4 2435 above. 2437 20. The Motivation for the ECT Codepoints. 2439 20.1. The Motivation for an ECT Codepoint. 2441 The need for an ECT codepoint is motivated by the fact that ECN will 2442 be deployed incrementally in an Internet where some transport 2443 protocols and routers understand ECN and some do not. With an ECT 2444 codepoint, the router can drop packets from flows that are not ECN- 2445 capable, but can *instead* set the CE codepoint in packets that *are* 2446 ECN-capable. Because an ECT codepoint allows an end node to have the 2447 CE codepoint set in a packet *instead* of having the packet dropped, 2448 an end node might have some incentive to deploy ECN. 2450 If there was no ECT codepoint, then the router would have to set the 2451 CE codepoint for packets from both ECN-capable and non-ECN-capable 2452 flows. 
In this case, there would be no incentive for end-nodes to 2453 deploy ECN, and no viable path of incremental deployment from a non- 2454 ECN world to an ECN-capable world. Consider the first stages of such 2455 an incremental deployment, where a subset of the flows are ECN- 2456 capable. At the onset of congestion, when the packet 2457 dropping/marking rate would be low, routers would only set CE 2458 codepoints, rather than dropping packets. However, only those flows 2459 that are ECN-capable would understand and respond to CE packets. The 2460 result is that the ECN-capable flows would back off, and the non-ECN- 2461 capable flows would be unaware of the ECN signals and would continue 2462 to open their congestion windows. 2464 In this case, there are two possible outcomes: (1) the ECN-capable 2465 flows back off, the non-ECN-capable flows get all of the bandwidth, 2466 and congestion remains mild, or (2) the ECN-capable flows back off, 2467 the non-ECN-capable flows don't, and congestion increases until the 2468 router transitions from setting the CE codepoint to dropping packets. 2469 While this second outcome evens out the fairness, the ECN-capable 2470 flows would still receive little benefit from being ECN-capable, 2471 because the increased congestion would drive the router to packet- 2472 dropping behavior. 2474 A flow that advertises itself as ECN-Capable but does not respond to 2475 CE codepoints is functionally equivalent to a flow that turns off 2476 congestion control, as discussed earlier in this document. 2478 Thus, in a world where a subset of the flows are ECN-capable, but 2479 where ECN-capable flows have no mechanism for indicating that fact to 2480 the routers, there would be less effective and less fair congestion 2481 control in the Internet, resulting in a strong incentive for end 2482 nodes not to deploy ECN. 2484 20.2. The Motivation for two ECT Codepoints.
2486 The primary motivation for the two ECT codepoints is to provide a 2487 one-bit ECN nonce. The ECN nonce allows the development of 2488 mechanisms for the sender to probabilistically verify that network 2489 elements are not erasing the CE codepoint, and that data receivers 2490 are properly reporting to the sender the receipt of packets with the 2491 CE codepoint set. 2493 Another possibility for senders to detect misbehaving network 2494 elements or receivers would be for the data sender to occasionally 2495 send a data packet with the CE codepoint set, to see if the receiver 2496 reports receiving the CE codepoint. Of course, if these packets 2497 encountered congestion in the network, the router might make no 2498 change in the packets, because the CE codepoint would already be set. 2499 Thus, for packets sent with the CE codepoint set, the TCP end-nodes 2500 could not determine if some router intended to set the CE codepoint 2501 in these packets. For this reason, sending packets with the CE 2502 codepoint would have to be done sparingly, and would be a less 2503 effective check against misbehaving network elements and receivers 2504 than would be the ECN nonce. 2506 The assignment of the fourth ECN codepoint to ECT(1) precludes the 2507 use of this codepoint for other purposes. For clarity, we briefly 2508 list those possible purposes here. 2510 One possibility might have been for the data sender to use the fourth 2511 ECN codepoint to indicate an alternate semantics for ECN. However, 2512 this seems to us more appropriate to be signalled using a 2513 differentiated services codepoint in the DS field. 2515 A second possible use for the fourth ECN codepoint would have been to 2516 give the router two separate codepoints for the indication of 2517 congestion, CE(0) and CE(1), for mild and severe congestion 2518 respectively. While this could be useful in some cases, this 2519 certainly does not seem a compelling requirement at this point. 
If 2520 there was judged to be a compelling need for this, the complications 2521 of incremental deployment would most likely necessitate more than 2522 just one codepoint for this function. 2524 A third use that has been informally proposed for the fourth ECN codepoint 2525 is for use in some forms of multicast congestion control, based on 2526 randomized procedures for duplicating marked packets at routers. 2527 Some proposed multicast packet duplication procedures are based on a 2528 new ECN codepoint that (1) conveys the fact that congestion occurred 2529 upstream of the duplication point that marked the packet with this 2530 codepoint and (2) can detect congestion downstream of that 2531 duplication point. ECT(1) can serve this purpose because it is both 2532 distinct from ECT(0) and is replaced by CE when ECN marking occurs in 2533 response to congestion or incipient congestion. Explanation of how 2534 this enhanced version of ECN would be used by multicast congestion 2535 control is beyond the scope of this document, as are ECN-aware 2536 multicast packet duplication procedures and the processing of the ECN 2537 field at multicast receivers in all cases (i.e., irrespective of the 2538 multicast packet duplication procedure(s) used). 2540 The specification of IP tunnel modifications for ECN in this document 2541 assumes that the only change made to the outer IP header's ECN field 2542 between tunnel endpoints is to set the CE codepoint to indicate 2543 congestion. This is not consistent with some of the proposed uses of 2544 ECT(1) by the multicast duplication procedures in the previous 2545 paragraph, and such procedures SHOULD NOT be deployed within tunnels 2546 configured for full ECN functionality. Limited ECN functionality may 2547 be used instead, although in practice many tunnel protocols 2548 (including IPsec) will not work correctly if multicast traffic 2549 duplication occurs within the tunnel. 2551 21. Why use Two Bits in the IP Header?
2553 Given the need for an ECT indication in the IP header, there still 2554 remains the question of whether the ECT (ECN-Capable Transport) and 2555 CE (Congestion Experienced) codepoints should have been overloaded on 2556 a single bit. This overloaded-one-bit alternative, explored in 2557 [Floyd94], would have involved a single bit with two values. One 2558 value, "ECT and not CE", would represent an ECN-Capable Transport, 2559 and the other value, "CE or not ECT", would represent either 2560 Congestion Experienced or a non-ECN-Capable transport. 2562 One difference between the one-bit and two-bit implementations 2563 concerns packets that traverse multiple congested routers. Consider 2564 a CE packet that arrives at a second congested router, and is 2565 selected by the active queue management at that router for either 2566 marking or dropping. In the one-bit implementation, the second 2567 congested router has no choice but to drop the CE packet, because it 2568 cannot distinguish between a CE packet and a non-ECT packet. In the 2569 two-bit implementation, the second congested router has the choice of 2570 either dropping the CE packet, or of leaving it alone with the CE 2571 codepoint set. 2573 Another difference between the one-bit and two-bit implementations 2574 comes from the fact that with the one-bit implementation, receivers 2575 in a single flow cannot distinguish between CE and non-ECT packets. 2576 Thus, in the one-bit implementation an ECN-capable data sender would 2577 have to unambiguously indicate to the receiver or receivers whether 2578 each packet had been sent as ECN-Capable or as non-ECN-Capable. One 2579 possibility would be for the sender to indicate in the transport 2580 header whether the packet was sent as ECN-Capable. 
A second 2581 possibility that would involve a functional limitation for the one- 2582 bit implementation would be for the sender to unambiguously indicate 2583 that it was going to send *all* of its packets as ECN-Capable or as 2584 non-ECN-Capable. For a multicast transport protocol, this 2585 unambiguous indication would have to be apparent to receivers joining 2586 an on-going multicast session. 2588 Another concern that was described earlier (and recommended in this 2589 document) is that transports (particularly TCP) should not mark pure 2590 ACK packets or retransmitted packets as being ECN-Capable. A pure 2591 ACK packet from a non-ECN-capable transport could be dropped, without 2592 necessarily having an impact on the transport from a congestion 2593 control perspective (because subsequent ACKs are cumulative). An 2594 ECN-capable transport reacting to the CE codepoint in a pure ACK 2595 packet by reducing the window would be at a disadvantage in 2596 comparison to a non-ECN-capable transport. For this reason (and for 2597 reasons described earlier in relation to retransmitted packets), it 2598 is desirable to have the ECT codepoint set on a per-packet basis. 2600 Another advantage of the two-bit approach is that it is somewhat more 2601 robust. The most critical issue, discussed in Section 8, is that the 2602 default indication should be that of a non-ECN-Capable transport. In 2603 a two-bit implementation, this requirement for the default value 2604 simply means that the non-ECT codepoint should be the default. In 2605 the one-bit implementation, this means that the single overloaded bit 2606 should by default be in the "CE or not ECT" position. This is less 2607 clear and straightforward, and possibly more open to incorrect 2608 implementations either in the end nodes or in the routers. 
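The difference between the two encodings at a second congested router, and the role of the default value, can be sketched as follows. This is an illustrative model only: the class and function names are hypothetical, the numeric value chosen for each one-bit state is an arbitrary assumption, and the two-bit router is shown taking the "leave CE alone" option even though it may also choose to drop.

```python
from enum import Enum
from typing import Optional


class TwoBit(Enum):
    """Two-bit ECN field; Not-ECT ('00') is the default."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


class OneBit(Enum):
    """Overloaded single bit; the numeric values are an assumption.
    The default must be the 'CE or not ECT' position."""
    CE_OR_NOT_ECT = 0
    ECT_AND_NOT_CE = 1


def two_bit_router(cp: TwoBit, selected: bool) -> Optional[TwoBit]:
    """Returns the forwarded codepoint, or None if the packet is dropped.
    'selected' means active queue management picked this packet."""
    if not selected:
        return cp
    if cp is TwoBit.NOT_ECT:
        return None           # not ECN-capable: drop
    return TwoBit.CE          # ECT becomes CE; an existing CE is left set


def one_bit_router(bit: OneBit, selected: bool) -> Optional[OneBit]:
    """Same decision with the overloaded bit."""
    if not selected:
        return bit
    if bit is OneBit.ECT_AND_NOT_CE:
        return OneBit.CE_OR_NOT_ECT   # mark
    # Cannot tell a CE packet from a non-ECT packet: must drop.
    return None


def through_two_congested_routers_two_bit(cp: TwoBit) -> Optional[TwoBit]:
    first = two_bit_router(cp, selected=True)
    return None if first is None else two_bit_router(first, selected=True)


def through_two_congested_routers_one_bit(bit: OneBit) -> Optional[OneBit]:
    first = one_bit_router(bit, selected=True)
    return None if first is None else one_bit_router(first, selected=True)
```

Under these assumptions, an ECN-capable packet selected at two successive congested routers survives with the CE codepoint set in the two-bit model, but is dropped at the second router in the one-bit model.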
2610 In summary, while the one-bit implementation could be a possible 2611 implementation, it has the following significant limitations relative 2612 to the two-bit implementation. First, the one-bit implementation has 2613 more limited functionality for the treatment of CE packets at a 2614 second congested router. Second, the one-bit implementation requires 2615 either that extra information be carried in the transport header of 2616 packets from ECN-Capable flows (to convey the functionality of the 2617 second bit elsewhere, namely in the transport header), or that 2618 senders in ECN-Capable flows accept the limitation that receivers 2619 must be able to determine a priori which packets are ECN-Capable and 2620 which are not ECN-Capable. Third, the one-bit implementation is 2621 possibly more open to errors from faulty implementations that choose 2622 the wrong default value for the ECN bit. We believe that the use of 2623 the extra bit in the IP header for the ECT-bit is extremely valuable 2624 to overcome these limitations. 2626 22. Historical Definitions for the IPv4 TOS Octet 2628 RFC 791 [RFC791] defined the ToS (Type of Service) octet in the IP 2629 header. In RFC 791, bits 6 and 7 of the ToS octet are listed as 2630 "Reserved for Future Use", and are shown set to zero. The first two 2631 fields of the ToS octet were defined as the Precedence and Type of 2632 Service (TOS) fields. 
2634        0     1     2     3     4     5     6     7
2635     +-----+-----+-----+-----+-----+-----+-----+-----+
2636     |   PRECEDENCE    |       TOS       |  0  |  0  |    RFC 791
2637     +-----+-----+-----+-----+-----+-----+-----+-----+

2639  RFC 1122 included bits 6 and 7 in the TOS field, though it did not
2640  discuss any specific use for those two bits:

2642        0     1     2     3     4     5     6     7
2643     +-----+-----+-----+-----+-----+-----+-----+-----+
2644     |   PRECEDENCE    |             TOS             |    RFC 1122
2645     +-----+-----+-----+-----+-----+-----+-----+-----+

2647  The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows:

2649        0     1     2     3     4     5     6     7
2650     +-----+-----+-----+-----+-----+-----+-----+-----+
2651     |   PRECEDENCE    |          TOS          | MBZ |    RFC 1349
2652     +-----+-----+-----+-----+-----+-----+-----+-----+

2654  Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary
2655  Cost". In addition to the Precedence and Type of Service (TOS)
2656  fields, the last field, MBZ (for "must be zero") was defined as
2657  currently unused. RFC 1349 stated that "The originator of a datagram
2658  sets [the MBZ] field to zero (unless participating in an Internet
2659  protocol experiment which makes use of that bit)."

2661  RFC 1455 [RFC1455] defined an experimental standard that used all
2662  four bits in the TOS field to request a guaranteed level of link
2663  security.

2665  RFC 1349 and RFC 1455 have been obsoleted by "Definition of the
2666  Differentiated Services Field (DS Field) in the IPv4 and IPv6
2667  Headers" [RFC2474] in which bits 6 and 7 of the DS field are listed
2668  as Currently Unused (CU). RFC 2780 [RFC2780] specified ECN as an
2669  experimental use of the two-bit CU field. RFC 2780 updated the
2670  definition of the DS Field to only encompass the first six bits of
2671  this octet rather than all eight bits; these first six bits are
2672  defined as the Differentiated Services CodePoint (DSCP):

2674        0     1     2     3     4     5     6     7
2675     +-----+-----+-----+-----+-----+-----+-----+-----+
2676     |               DSCP                |    CU     |    RFCs 2474,
2677                                                          2780
2678     +-----+-----+-----+-----+-----+-----+-----+-----+

2680  Because of this unstable history, the definition of the ECN field in
2681  this document cannot be guaranteed to be backwards compatible with
2682  all past uses of these two bits.

2684  Prior to RFC 2474, routers were not permitted to modify bits in
2685  either the DSCP or ECN field of packets forwarded through them, and
2686  hence routers that comply only with RFCs prior to 2474 should have no
2687  effect on ECN. For end nodes, bit 7 (the second ECN bit) must be
2688  transmitted as zero for any implementation compliant only with RFCs
2689  prior to 2474. Such nodes may transmit bit 6 (the first ECN bit) as
2690  one for the "Minimize Monetary Cost" provision of RFC 1349 or the
2691  experiment authorized by RFC 1455; neither this aspect of RFC 1349
2692  nor the experiment in RFC 1455 was widely implemented or used. The
2693  damage that could be done by a broken, non-conformant router would
2694  include "erasing" the CE codepoint for an ECN-capable packet that
2695  arrived at the router with the CE codepoint set, or setting the CE
2696  codepoint even in the absence of congestion. This has been discussed
2697  in the section on "Non-compliance in the Network".

2699  The damage that could be done in an ECN-capable environment by a non-
2700  ECN-capable end-node transmitting packets with the ECT codepoint set
2701  has been discussed in the section on "Non-compliance by the End
2702  Nodes".

2704  23. IANA Considerations

2706  The codepoints for the ECN Field of the IP header and the bits for
2707  CWR and ECE in the TCP header are specified by the Standards Action
2708  of this RFC, as is required by RFC 2780.

2710  IANA allocated the IPSEC Security Association Attribute value 10 for
2711  the ECN Tunnel use described in Section 9.2.1.2 above at the request
2712  of David Black in November 1999. If this draft is approved for
2713  publication as an RFC, IANA should change the Reference for this
2714  allocation from David Black's request to this RFC based on its RFC
2715  number.

2717  AUTHORS' ADDRESSES

2719  K. K. Ramakrishnan
2720  TeraOptic Networks, Inc.
2721  Phone: +1 (408) 666-8650
2722  Email: kk@teraoptic.com

2724  Sally Floyd
2725  Phone: +1 (510) 666-2989
2726  ACIRI
2727  Email: floyd@aciri.org
2728  URL: http://www.aciri.org/floyd/

2730  David L. Black
2731  EMC Corporation
2732  42 South St.
2733  Hopkinton, MA 01748
2734  Phone: +1 (508) 435-1000 x75140
2735  Email: black_david@emc.com

2737  This draft was created in February 2001.
2738  It expires August 2001.