Network Working Group                                           C. Hopps
Internet-Draft                                   LabN Consulting, L.L.C.
Intended status: Standards Track                        November 8, 2021
Expires: May 12, 2022

  IP-TFS: Aggregation and Fragmentation Mode for ESP and its Use for IP
                         Traffic Flow Security
                      draft-ietf-ipsecme-iptfs-12

Abstract

   This document describes a mechanism for aggregation and fragmentation
   of IP packets when they are being encapsulated in ESP payload. This
   new payload type can be used for various purposes such as decreasing
   encapsulation overhead for small IP packets; however, the focus in
   this document is to enhance IPsec traffic flow security (IP-TFS) by
   adding Traffic Flow Confidentiality (TFC) to encrypted IP
   encapsulated traffic. TFC is provided by obscuring the size and
   frequency of IP traffic using a fixed-sized, constant-send-rate IPsec
   tunnel. The solution allows for congestion control as well as
   non-constant send-rate usage.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 12, 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology & Concepts
   2.  The AGGFRAG Tunnel
     2.1.  Tunnel Content
     2.2.  Payload Content
       2.2.1.  Data Blocks
       2.2.2.  End Padding
       2.2.3.  Fragmentation, Sequence Numbers and All-Pad Payloads
       2.2.4.  Empty Payload
       2.2.5.  IP Header Value Mapping
       2.2.6.  IP Time-To-Live (TTL) and Tunnel errors
       2.2.7.  Effective MTU of the Tunnel
     2.3.  Exclusive SA Use
     2.4.  Modes of Operation
       2.4.1.  Non-Congestion Controlled Mode
       2.4.2.  Congestion Controlled Mode
     2.5.  Summary of Receiver Processing
   3.  Congestion Information
     3.1.  ECN Support
   4.  Configuration of AGGFRAG Tunnels for IP-TFS
     4.1.  Bandwidth
     4.2.  Fixed Packet Size
     4.3.  Congestion Control
   5.  IKEv2
     5.1.  USE_AGGFRAG Notification Message
   6.  Packet and Data Formats
     6.1.  AGGFRAG_PAYLOAD Payload
       6.1.1.  Non-Congestion Control AGGFRAG_PAYLOAD Payload Format
       6.1.2.  Congestion Control AGGFRAG_PAYLOAD Payload Format
       6.1.3.  Data Blocks
       6.1.4.  IKEv2 USE_AGGFRAG Notification Message
   7.  IANA Considerations
     7.1.  AGGFRAG_PAYLOAD Sub-Type Registry
     7.2.  USE_AGGFRAG Notify Message Status Type
   8.  Security Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Example Of An Encapsulated IP Packet Flow
   Appendix B.  A Send and Loss Event Rate Calculation
   Appendix C.  Comparisons of IP-TFS
     C.1.  Comparing Overhead
       C.1.1.  IP-TFS Overhead
       C.1.2.  ESP with Padding Overhead
     C.2.  Overhead Comparison
     C.3.  Comparing Available Bandwidth
       C.3.1.  Ethernet
   Appendix D.  Acknowledgements
   Appendix E.  Contributors
   Author's Address

1.  Introduction

   Traffic Analysis ([RFC4301], [AppCrypt]) is the act of extracting
   information about data being sent through a network.
   While encryption [RFC4303] directly obscures the data, the traffic
   pattern itself exposes information due to variations in its shape and
   timing ([RFC8546], [AppCrypt]). Hiding the size and frequency of
   traffic is referred to as Traffic Flow Confidentiality (TFC) per
   [RFC4303].

   [RFC4303] provides for TFC by allowing padding to be added to
   encrypted IP packets and allowing for transmission of all-pad packets
   (indicated using protocol 59). This method has the major limitation
   that it can significantly under-utilize the available bandwidth.

   This document defines an aggregation and fragmentation (AGGFRAG) mode
   for ESP, and its use for IP Traffic Flow Security (IP-TFS). This
   solution provides for full TFC without the aforementioned bandwidth
   limitation. This is accomplished by using a constant-send-rate IPsec
   [RFC4303] tunnel with fixed-sized encapsulating packets; however,
   these fixed-sized packets can contain partial, whole or multiple IP
   packets to maximize the bandwidth of the tunnel. A non-constant
   send-rate is allowed, but the confidentiality properties of its use
   are outside the scope of this document.

   For a comparison of the overhead of IP-TFS with the RFC4303
   prescribed TFC solution see Appendix C.

   Additionally, IP-TFS provides for operating fairly within congested
   networks [RFC2914]. This is important for when the IP-TFS user is
   not in full control of the domain through which the IP-TFS tunnel
   path flows.

   The mechanisms, such as the AGGFRAG mode, defined in this document
   are generic with the intent of allowing for non-TFS uses, but such
   uses are outside the scope of this document.

1.1.  Terminology & Concepts

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   This document assumes familiarity with IP security concepts including
   TFC as described in [RFC4301].

2.  The AGGFRAG Tunnel

   As mentioned in Section 1, AGGFRAG mode utilizes an IPsec [RFC4303]
   tunnel as its transport. For the purpose of IP-TFS, fixed-sized
   encapsulating packets are sent at a constant rate on the AGGFRAG
   tunnel.

   The primary input to the tunnel algorithm is the requested bandwidth
   to be used by the tunnel. Two values are then required to provide
   for this bandwidth use: the fixed size of the encapsulating packets
   and the rate at which to send them.

   The fixed packet size MAY either be specified manually or be
   determined through other methods such as Packetization Layer MTU
   Discovery (PLMTUD) ([RFC4821], [RFC8899]) or Path MTU discovery
   (PMTUD) ([RFC1191], [RFC8201]). PMTUD is known to have issues so
   PLMTUD is considered the more robust option. For PLMTUD, congestion
   control payloads can be used as in-band probes (see Section 6.1.2 and
   [RFC8899]).

   Given the encapsulating packet size and the requested bandwidth to be
   used, the corresponding packet send rate can be calculated. The
   packet send rate is the requested bandwidth to be used divided by the
   size of the encapsulating packet.

   The egress (receiving) side of the AGGFRAG tunnel MUST allow for and
   expect the ingress (sending) side of the AGGFRAG tunnel to vary the
   size and rate of sent encapsulating packets, unless constrained by
   other policy.

2.1.  Tunnel Content

   As previously mentioned, one issue with the TFC padding solution in
   [RFC4303] is the large amount of wasted bandwidth as only one IP
   packet can be sent per encapsulating packet. In order to maximize
   bandwidth, IP-TFS breaks this one-to-one association by introducing
   an AGGFRAG mode for ESP.

   AGGFRAG mode aggregates as well as fragments the inner IP traffic
   flow into encapsulating IPsec tunnel packets. For IP-TFS, the IPsec
   encapsulating tunnel packets are a fixed size. Padding is only added
   to the tunnel packets if there is no data available to be sent at
   the time of tunnel packet transmission, or if fragmentation has been
   disabled by the receiver.

   This is accomplished using a new Encapsulating Security Payload (ESP,
   [RFC4303]) Next Header field value AGGFRAG_PAYLOAD (Section 6.1).

   Other non-IP-TFS uses of this AGGFRAG mode have been suggested, such
   as increased performance through packet aggregation, as well as
   handling MTU issues using fragmentation. These uses are not defined
   here, but are also not restricted by this document.

2.2.  Payload Content

   The AGGFRAG_PAYLOAD payload content defined in this document
   consists of a 4- or 24-octet header followed by either a partial
   datablock, a full datablock, or multiple partial or full datablocks.
   The following diagram illustrates this payload within the ESP packet.
   See Section 6.1 for the exact formats of the AGGFRAG_PAYLOAD payload.

   . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
   . Outer Encapsulating Header ...                                .
   . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
   . ESP Header...                                                 .
   +---------------------------------------------------------------+
   |    [AGGFRAG subtype/flags]    :          BlockOffset          |
   +---------------------------------------------------------------+
   :                  [Optional Congestion Info]                   :
   +---------------------------------------------------------------+
   |       DataBlocks ...                                          ~
   ~                                                               ~
   ~                                                               |
   +---------------------------------------------------------------|
   . ESP Trailer...                                                .
   . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

             Figure 1: Layout of an AGGFRAG mode IPsec Packet

   The "BlockOffset" value is either zero or some offset into or past
   the end of the "DataBlocks" data.

   If the "BlockOffset" value is zero it means that the "DataBlocks"
   data begins with a new data block.

   Conversely, if the "BlockOffset" value is non-zero it points to the
   start of the new data block, and the initial "DataBlocks" data
   belongs to the data block that is still being re-assembled.

   If the "BlockOffset" points past the end of the "DataBlocks" data
   then the next data block occurs in a subsequent encapsulating packet.

   Having the "BlockOffset" always point at the next available data
   block allows for recovering the next inner packet in the presence of
   outer encapsulating packet loss.

   An example AGGFRAG mode packet flow can be found in Appendix A.

2.2.1.  Data Blocks

   +---------------------------------------------------------------+
   | Type  | rest of IPv4, IPv6 or pad.
   +--------

                     Figure 2: Layout of a DataBlock

   A data block is defined by a 4-bit type code followed by the data
   block data. The type values have been carefully chosen to coincide
   with the IPv4/IPv6 version field values so that no per-data block
   type overhead is required to encapsulate an IP packet. Likewise, the
   length of the data block is extracted from the encapsulated IPv4's
   "Total Length" or IPv6's "Payload Length" fields.

2.2.2.  End Padding

   Since a data block's type is identified in its first 4 bits, the only
   time padding is required is when there is no data to encapsulate.
   For this end padding a "Pad Data Block" is used.

2.2.3.  Fragmentation, Sequence Numbers and All-Pad Payloads

   In order for a receiver to reassemble fragmented inner-packets, the
   sender MUST send the inner-packet fragments back-to-back in the
   logical outer packet stream (i.e., using consecutive ESP sequence
   numbers). However, the sender is allowed to insert "all-pad"
   payloads (i.e., payloads with a "BlockOffset" of zero and a single
   pad "DataBlock") in between the packets carrying the inner-packet
   fragment payloads. This interleaving of all-pad payloads allows the
   sender to always send a tunnel packet, regardless of the
   encapsulation computational requirements.

   When a receiver is reassembling an inner-packet, and it receives an
   "all-pad" payload, it increments the expected sequence number that
   the next inner-packet fragment is expected to arrive in.

   Given the above, the receiver will need to handle out-of-order
   arrival of outer ESP packets prior to reassembly processing. ESP
   already provides for optionally detecting replay attacks. Detecting
   replay attacks normally utilizes a window method. A similar sequence
   number based sliding window can be used to correct re-ordering of the
   outer packet stream. Receiving a larger (newer) sequence number
   packet advances the window, and received older ESP packets whose
   sequence numbers the window has passed by are dropped. A good choice
   for the size of this window depends on the amount of misordering the
   user may normally experience.

   As the amount of misordering that may be present is hard to predict,
   the window size SHOULD be configurable by the user.
   Implementations MAY also dynamically adjust the reordering window
   based on actual misordering seen in arriving packets.

   Note that when IP-TFS sends a continuous stream of packets, there is
   no requirement for an explicit lost packet timer; however, using a
   lost packet timer is RECOMMENDED. If an implementation does not use
   a lost packet timer and only considers an outer packet lost when the
   reorder window moves by it, the inner traffic can be delayed by up to
   the reorder window size times the per-packet send interval (the
   inverse of the packet send rate). This amount of delay could be
   significant for slower send rates or when larger reorder window sizes
   are in use. As the lost packet timer affects delay of inner packet
   delivery, one could choose to set it proportionate to the tunnel
   rate.

   While ESP guarantees an increasing sequence number with subsequently
   sent packets, it does not actually require the sequence numbers to be
   generated with no gaps (e.g., sending only even numbered sequence
   numbers would be allowed as long as they are always increasing).
   Gaps in the sequence numbers are incompatible with the mechanisms in
   this document, so the sequence number stream MUST increase
   monotonically by 1 for each subsequent packet.

   When using the AGGFRAG_PAYLOAD in conjunction with replay detection,
   the window size for both MAY be reduced to the smaller of the two
   window sizes. This is because packets outside of the smaller window
   but inside the larger would still be dropped by the mechanism with
   the smaller window size. However, there is also no requirement to
   make these values the same. Indeed, in some cases, such as slow
   tunnels where a very small or zero reorder window size is
   appropriate, the user may still want a large replay detection window
   to log replayed packets. Additionally, large replay windows can be
   implemented with very little overhead compared to large reorder
   windows.
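
   As a non-normative illustration, the sequence-number-based reorder
   window described in this section might be sketched as follows. This
   is a minimal sketch under stated assumptions, not a definitive
   implementation: the class and method names are hypothetical,
   sequence number wraparound and the lost packet timer are not
   modeled, and an outer packet is declared lost only when the window
   moves past it.

```python
class ReorderWindow:
    """Hypothetical sliding window that restores outer-packet order."""

    def __init__(self, size, first_seq=1):
        self.size = size            # user-configurable window size, in packets
        self.next_seq = first_seq   # next ESP sequence number to deliver
        self.pending = {}           # seq -> payload, buffered early arrivals

    def receive(self, seq, payload):
        """Return the (seq, payload) pairs that are now deliverable in order.

        A payload of None marks a sequence number that was declared lost.
        """
        if seq < self.next_seq or seq in self.pending:
            return []               # window has passed it, or duplicate: drop
        self.pending[seq] = payload
        out = []
        # Deliver while the next expected packet is buffered, or while the
        # newest buffered packet forces the window to advance past a gap.
        while self.pending and (
            self.next_seq in self.pending
            or max(self.pending) - self.next_seq >= self.size
        ):
            out.append((self.next_seq, self.pending.pop(self.next_seq, None)))
            self.next_seq += 1
        return out
```

   With a window size of zero, every arrival is delivered immediately
   and any skipped sequence numbers are reported as lost, which matches
   the "very small or zero reorder window" case mentioned above.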

   Finally, as sequence numbers are reset when switching SAs (e.g., when
   re-keying a child SA), senders MUST NOT send initial fragments of an
   inner packet using one SA and subsequent fragments in a different SA.

2.2.3.1.  Optional Extra Padding

   When the tunnel bandwidth is not being fully utilized, a sender MAY
   pad out the current encapsulating packet in order to deliver an inner
   packet un-fragmented in the following outer packet. The benefit
   would be to avoid inner-packet fragmentation in the presence of a
   bursty offered load (non-bursty traffic will naturally not fragment).
   Senders MAY also choose to allow for a minimum fragment size to be
   configured (e.g., as a percentage of the AGGFRAG_PAYLOAD payload
   size) to avoid fragmentation at the cost of tunnel bandwidth. The
   cost with these methods is complexity and added delay of inner
   traffic. The main advantage to avoiding fragmentation is to minimize
   inner packet loss in the presence of outer packet loss. When this is
   worthwhile (e.g., how much loss and what type of loss is required,
   given different inner traffic shapes and utilization, for this to
   make sense), and what values to use for the allowable/added delay,
   may be worth researching, but both questions are outside the scope of
   this document.

   While use of padding to avoid fragmentation does not impact
   interoperability, used inappropriately it can reduce the effective
   throughput of a tunnel. Senders implementing either of the above
   approaches will need to take care to not reduce the effective
   capacity, and overall utility, of the tunnel through the overuse of
   padding.

2.2.4.  Empty Payload

   To support reporting of congestion control information (described
   later) using a non-AGGFRAG_PAYLOAD enabled SA, it is allowed to send
   an AGGFRAG_PAYLOAD payload with no data blocks (i.e., the ESP payload
   length is equal to the AGGFRAG_PAYLOAD header length).
   This special payload is called an empty payload.

   Currently this situation is only applicable in non-IKEv2 use cases.

2.2.5.  IP Header Value Mapping

   [RFC4301] provides some direction on when and how to map various
   values from an inner IP header to the outer encapsulating header,
   namely the Don't-Fragment (DF) bit ([RFC0791] and [RFC8200]), the
   Differentiated Services (DS) field [RFC2474] and the Explicit
   Congestion Notification (ECN) field [RFC3168]. Unlike [RFC4301],
   AGGFRAG mode may and often will be encapsulating more than one IP
   packet per ESP packet. To deal with this, these mappings are
   restricted further.

2.2.5.1.  DF bit

   AGGFRAG mode never maps the inner DF bit as it is unrelated to the
   AGGFRAG tunnel functionality; AGGFRAG mode never needs to IP fragment
   the inner packets and the inner packets will not affect the
   fragmentation of the outer encapsulation packets.

2.2.5.2.  ECN value

   The ECN value need not be mapped as any congestion related to the
   constant-send-rate IP-TFS tunnel is unrelated (by design) to the
   inner traffic flow. The sender MAY still set the ECN value of inner
   packets based on the normal ECN specification [RFC3168].

2.2.5.3.  DS field

   By default the DS field SHOULD NOT be copied, although a sender MAY
   choose to allow for configuration to override this behavior. A
   sender SHOULD also allow the DS value to be set by configuration.

2.2.6.  IP Time-To-Live (TTL) and Tunnel errors

   [RFC4301] specifies how to modify the inner packet TTL [RFC0791].

   Any errors (e.g., ICMP errors arriving back at the tunnel ingress due
   to tunnel traffic) are handled the same as with non-AGGFRAG IPsec
   tunnels.

2.2.7.  Effective MTU of the Tunnel

   Unlike [RFC4301], there is normally no effective MTU (EMTU) on an
   AGGFRAG tunnel as all IP packet sizes are properly transmitted
   without requiring IP fragmentation prior to tunnel ingress.
   That said, a sender MAY allow for explicitly configuring an MTU for
   the tunnel.

   If fragmentation has been disabled on the AGGFRAG tunnel, then the
   tunnel's EMTU and behaviors are the same as normal IPsec tunnels
   [RFC4301].

2.3.  Exclusive SA Use

   This document does not specify mixed use of an AGGFRAG_PAYLOAD
   enabled SA. A sender MUST only send AGGFRAG_PAYLOAD payloads over an
   SA configured for AGGFRAG mode.

2.4.  Modes of Operation

   Just as with normal IPsec/ESP tunnels, AGGFRAG tunnels are
   unidirectional. Bidirectional IP-TFS functionality is achieved by
   setting up 2 AGGFRAG tunnels, one in either direction.

   An AGGFRAG tunnel used for IP-TFS can operate in 2 modes, a
   non-congestion controlled mode and a congestion controlled mode.

2.4.1.  Non-Congestion Controlled Mode

   In the non-congestion controlled mode, IP-TFS sends fixed-sized
   packets over an AGGFRAG tunnel at a constant rate. The packet send
   rate is constant and is not automatically adjusted regardless of any
   network congestion (e.g., packet loss).

   For similar reasons as given in [RFC7510] the non-congestion
   controlled mode should only be used where the user has full
   administrative control over the path the tunnel will take. This is
   required so the user can guarantee the bandwidth and also be sure
   not to negatively affect network congestion [RFC2914]. In this
   case packet loss should be reported to the administrator (e.g., via
   syslog, YANG notification, SNMP traps, etc.) so that any failures due
   to a lack of bandwidth can be corrected.

   Non-congestion control mode is also appropriate if ESP over TCP is in
   use [RFC8229].

2.4.2.  Congestion Controlled Mode

   With the congestion controlled mode, IP-TFS adapts to network
   congestion by lowering the packet send rate to accommodate the
   congestion, as well as raising the rate when congestion subsides.
   Since overhead is per packet, by allowing for maximal fixed-size
   packets and varying the send rate, transport overhead is minimized.

   The output of the congestion control algorithm will adjust the rate
   at which the ingress sends packets. While this document does not
   require a specific congestion control algorithm, best current
   practice RECOMMENDS that the algorithm conform to [RFC5348].
   Congestion control principles are documented in [RFC2914] as well.
   [RFC4342] provides an example of the [RFC5348] algorithm which
   matches the requirements of IP-TFS (i.e., designed for fixed-size
   packets and a send rate varied based on congestion).

   The required inputs for the TCP friendly rate control algorithm
   described in [RFC5348] are the receiver's loss event rate and the
   sender's estimated round-trip time (RTT). These values are provided
   by IP-TFS using the congestion information header fields described in
   Section 3. In particular, these values are sufficient to implement
   the algorithm described in [RFC5348].

   At a minimum, the congestion information MUST be sent, from the
   receiver and from the sender, at least once per RTT. Prior to
   establishing an RTT the information SHOULD be sent constantly from
   the sender and the receiver so that an RTT estimate can be
   established. Not receiving this information over multiple
   consecutive RTT intervals should be considered a congestion event
   that causes the sender to adjust its sending rate lower. For
   example, [RFC4342] calls this the "no feedback timeout" and it is
   equal to 4 RTT intervals. When a "no feedback timeout" has occurred
   [RFC4342] halves the sending rate.

   An implementation MAY choose to always include the congestion
   information in its AGGFRAG payload header if sending on an IP-TFS
   enabled SA.
   Since IP-TFS normally will operate with a large packet
   size, the congestion information should represent a small portion of
   the available tunnel bandwidth. An implementation choosing to always
   send the data MAY, however, choose to update the "LossEventRate" and
   "RTT" header field values it sends only once per "RTT".

   When choosing a congestion control algorithm (or a selection of
   algorithms) note that IP-TFS is not providing for reliable delivery
   of IP traffic, and so per packet ACKs are not required and are not
   provided.

   It is worth noting that the variable send-rate of a congestion
   controlled AGGFRAG tunnel is not private; however, this send-rate is
   being driven by network congestion, and as long as the encapsulated
   (inner) traffic flow shape and timing are not directly affecting the
   (outer) network congestion, the variations in the tunnel rate will
   not weaken the provided inner traffic flow confidentiality.

2.4.2.1.  Circuit Breakers

   In addition to congestion control, implementations MAY choose to
   define and implement circuit breakers [RFC8084] as a recovery method
   of last resort. Enabling circuit breakers is also a reason a user
   may wish to enable congestion information reports even when using the
   non-congestion controlled mode of operation. The definition of
   circuit breakers is outside the scope of this document.

2.5.  Summary of Receiver Processing

   An AGGFRAG enabled SA receiver has a few tasks to perform.

   The receiver MAY process incoming AGGFRAG_PAYLOAD payloads as far as
   it can as soon as they arrive; i.e., if the incoming AGGFRAG_PAYLOAD
   packet contains complete inner packet(s), the receiver should extract
   and transmit them immediately.
   For partial packets, the receiver
   needs to keep the partial packets in memory until they fall out of
   the reordering window, or until the missing parts of the packets are
   received, in which case it will reassemble and transmit them. If an
   AGGFRAG_PAYLOAD payload contains multiple packets they SHOULD be sent
   out in the order they are in the AGGFRAG_PAYLOAD (i.e., keep the
   original order they were received on the other end). The cost of
   using this method is that an amplification of out-of-order delivery
   of inner packets can occur due to inner packet aggregation.

   Instead of the method described in the previous paragraph, the
   receiver MAY reorder out-of-order AGGFRAG_PAYLOAD payloads received
   into in-sequence-order AGGFRAG_PAYLOAD payloads (Section 2.2.3), and
   only after it has an in-order AGGFRAG_PAYLOAD payload stream does the
   receiver transmit the inner-packets. Using this method will make
   sure the packets are sent in-order, i.e., there is no reordering
   possible, but the cost is that a lost packet will cause delay of up
   to the lost packet timer interval (or the full reorder window if no
   lost packet timer is used), and there will be extra burstiness in the
   output stream (when a lost packet is dropped out from the reorder
   window, all outer packets received after that are then immediately
   processed, and sent out back to back).

   Additionally, if congestion control is enabled, the receiver sends
   congestion control data (Section 6.1.2) back to the sender as
   described in Section 2.4.2 and Section 3.

3.  Congestion Information

   In order to support the congestion control mode, the sender needs to
   know the loss event rate and to approximate the RTT [RFC5348]. In
   order to obtain these values, the receiver sends congestion control
   information on its SA back to the sender.
   Thus, to support
   congestion control the receiver must have a paired SA back to the
   sender (this is always the case when the tunnel was created using
   IKEv2). If the SA back to the sender is a non-AGGFRAG_PAYLOAD
   enabled SA then an AGGFRAG_PAYLOAD empty payload (i.e., header only)
   is used to convey the information.

   In order to calculate a loss event rate compatible with [RFC5348],
   the receiver needs to have a round-trip time estimate. Thus the
   sender communicates this estimate in the "RTT" header field. On
   startup this value will be zero as no RTT estimate is yet known.

   In order for the sender to estimate its "RTT" value, the sender
   places a timestamp value in the "TVal" header field. On first
   receipt of this "TVal", the receiver records the new "TVal" value
   along with the time it arrived locally; subsequent receipt of the
   same "TVal" MUST NOT update the recorded time.

   When the receiver sends its CC header it places this latest recorded
   "TVal" in the "TEcho" header field, along with 2 delay values, "Echo
   Delay" and "Transmit Delay". The "Echo Delay" value is the time
   delta, in microseconds, between the recorded arrival time of "TVal"
   and the current clock. The second value, "Transmit Delay", is the
   receiver's current transmission delay on the tunnel (i.e., the
   average time between sending packets on its half of the AGGFRAG
   tunnel).

   When the sender receives back its "TVal" in the "TEcho" header field
   it calculates 2 RTT estimates. The first is the actual delay found
   by subtracting the "TEcho" value from its current clock and then
   subtracting "Echo Delay" as well. The second RTT estimate is found
   by adding the received "Transmit Delay" header value to the sender's
   own transmission delay (i.e., the average time between sending
   packets on its half of the AGGFRAG tunnel). The larger of these 2
   RTT estimates SHOULD be used as the "RTT" value.
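
   The timestamp echo and the two RTT estimates described above can be
   sketched as follows. This is a non-normative sketch under stated
   assumptions: the function and class names are hypothetical, all
   clocks and delays are taken to be integer microseconds, "TVal" is
   assumed to be the sender's local clock reading, and wraparound of
   the header fields is not modeled.

```python
class TValRecorder:
    """Receiver side: remember the latest TVal and when it first arrived."""

    def __init__(self):
        self.tval = None
        self.arrival_us = None

    def on_tval(self, tval, now_us):
        # Only the first receipt of a given TVal records an arrival time;
        # repeats of the same TVal must not update it.
        if tval != self.tval:
            self.tval, self.arrival_us = tval, now_us

    def echo(self, now_us):
        """Return (TEcho, Echo Delay) for the next CC header."""
        return self.tval, now_us - self.arrival_us


def estimate_rtt_us(now_us, techo_us, echo_delay_us,
                    peer_transmit_delay_us, own_transmit_delay_us):
    """Sender side: return the larger of the two RTT estimates."""
    # First estimate: elapsed time since TVal was sent, minus the time
    # the receiver held it before echoing.
    path_rtt = now_us - techo_us - echo_delay_us
    # Second estimate: sum of both directions' per-packet send intervals.
    rate_rtt = peer_transmit_delay_us + own_transmit_delay_us
    return max(path_rtt, rate_rtt)
```

   For example, with a measured path delay of 30000 microseconds and
   transmit delays of 40000 microseconds in each direction, the sketch
   returns 80000, so the slow tunnel rate rather than the path
   dominates the "RTT" value.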

   The two RTT estimates are required to handle different combinations
   of faster or slower tunnel packet paths with faster or slower fixed
   tunnel rates.  Choosing the larger of the two values guarantees that
   the "RTT" is never considered faster than the aggregate transmission
   delay based on the IP-TFS send rate (the second estimate), as well as
   never being considered faster than the actual RTT along the tunnel
   packet path (the first estimate).

   The receiver also calculates, and communicates in the "LossEventRate"
   header field, the loss event rate for use by the sender.  This is
   slightly different from [RFC4342] which periodically sends all the
   loss interval data back to the sender so that it can do the
   calculation.  See Appendix B for a suggested way to calculate the
   loss event rate value.  Initially this value will be zero (indicating
   no loss) until enough data has been collected by the receiver to
   update it.

3.1.  ECN Support

   In addition to normal packet loss information, AGGFRAG mode supports
   use of the ECN bits in the encapsulating IP header [RFC3168] for
   identifying congestion.  If ECN use is enabled and a packet arrives
   at the egress (receiving) side with the Congestion Experienced (CE)
   value set, then the receiver considers that packet as being dropped,
   although it does not drop it.  The receiver MUST set the E bit in any
   AGGFRAG_PAYLOAD payload header containing a "LossEventRate" value
   derived from a CE value being considered.

   As noted in [RFC3168] the ECN bits are not protected by IPsec and
   thus may constitute a covert channel.  For this reason, ECN use
   SHOULD NOT be enabled by default.

4.  Configuration of AGGFRAG Tunnels for IP-TFS

   IP-TFS is meant to be deployable with a minimal amount of
   configuration.  All IP-TFS specific configuration should be specified
   at the unidirectional tunnel ingress (sending) side.
   It is intended that non-IKEv2 operation is supported, at least, with
   local static configuration.

4.1.  Bandwidth

   Bandwidth is a local configuration option.  For non-congestion
   controlled mode, the bandwidth SHOULD be configured.  For congestion
   controlled mode, the bandwidth can be configured or the congestion
   control algorithm discovers and uses the maximum bandwidth available.
   No standardized configuration method is required.

4.2.  Fixed Packet Size

   The fixed packet size to be used for the tunnel encapsulation packets
   MAY be configured manually or can be automatically determined using
   other methods such as PLMTUD ([RFC4821], [RFC8899]) or PMTUD
   ([RFC1191], [RFC8201]).  As PMTUD is known to have issues, PLMTUD is
   considered the more robust option.  No standardized configuration
   method is required.

4.3.  Congestion Control

   Congestion control is a local configuration option.  No standardized
   configuration method is required.

5.  IKEv2

5.1.  USE_AGGFRAG Notification Message

   As mentioned previously, AGGFRAG tunnels utilize ESP payloads of type
   AGGFRAG_PAYLOAD.

   When using IKEv2, a new "USE_AGGFRAG" Notification Message enables
   the AGGFRAG_PAYLOAD payload on a Child SA pair.  The method used is
   similar to how USE_TRANSPORT_MODE is negotiated, as described in
   [RFC7296].

   To request use of the AGGFRAG_PAYLOAD payload on the Child SA pair,
   the initiator includes the USE_AGGFRAG notification in an SA payload
   requesting a new Child SA (either during the initial IKE_AUTH or
   during CREATE_CHILD_SA exchanges).  If the request is accepted then
   the response MUST also include a notification of type USE_AGGFRAG.
   If the responder declines the request the Child SA will be
   established without AGGFRAG_PAYLOAD payload use enabled.  If this is
   unacceptable to the initiator, the initiator MUST delete the Child
   SA.
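
   The initiator's side of this negotiation can be summarized with a
   small, non-normative Python sketch.  The function name, the
   representation of received notifications as a set of names, and the
   returned labels are all hypothetical; they do not correspond to any
   particular IKEv2 implementation.

```python
def initiator_outcome(response_notifies, aggfrag_required):
    """Decide what the initiator does after a Child SA response.

    response_notifies: set of notification names found in the response
    (an illustrative representation, assumed for this sketch).
    aggfrag_required: True if a plain ESP Child SA is unacceptable.
    """
    # The responder accepted: its response carries USE_AGGFRAG.
    if "USE_AGGFRAG" in response_notifies:
        return "aggfrag-enabled"
    # The responder declined, and the Child SA exists without
    # AGGFRAG_PAYLOAD; an initiator that cannot accept this deletes it.
    if aggfrag_required:
        return "delete-child-sa"
    # Otherwise the initiator keeps the Child SA without AGGFRAG_PAYLOAD.
    return "plain-esp"
```

   The sketch only models the decision; sending the Delete payload and
   any retry policy are left to the implementation.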

   As the use of the AGGFRAG_PAYLOAD payload is currently only defined
   for non-transport mode tunnels, the USE_AGGFRAG notification MUST NOT
   be combined with the USE_TRANSPORT_MODE notification.

   The USE_AGGFRAG notification contains a 1 octet payload of flags that
   specify requirements from the sender of the notification.  If any
   requirement flags are not understood or cannot be supported by the
   receiver then the receiver SHOULD NOT enable use of AGGFRAG_PAYLOAD
   (either by not responding with the USE_AGGFRAG notification, or in
   the case of the initiator, by deleting the Child SA if the now
   established non-AGGFRAG_PAYLOAD using SA is unacceptable).

   The notification type and payload flag values are defined in
   Section 6.1.4.

6.  Packet and Data Formats

   The packet and data formats defined below are generic with the intent
   of allowing for non-IP-TFS uses, but such uses are outside the scope
   of this document.

6.1.  AGGFRAG_PAYLOAD Payload

   ESP Next Header value: 0x5

   An AGGFRAG payload is identified by the ESP Next Header value
   AGGFRAG_PAYLOAD which has the value 0x5.  The value 5 was chosen so
   as not to conflict with other values in use.  The first octet of this
   payload indicates the format of the remaining payload data.

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+-+-+-
   |   Sub-type    | ...
   +-+-+-+-+-+-+-+-+-+-+-

   Sub-type:
      An 8-bit value indicating the payload format.

   This document defines 2 payload sub-types.  These payload formats are
   defined in the following sections.

6.1.1.  Non-Congestion Control AGGFRAG_PAYLOAD Payload Format

   The non-congestion control AGGFRAG_PAYLOAD payload consists of a
   4 octet header followed by a variable amount of "DataBlocks" data as
   shown below.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Sub-Type (0) |   Reserved    |          BlockOffset          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       DataBlocks ...
   +-+-+-+-+-+-+-+-+-+-+-

   Sub-type:
      An octet indicating the payload format.  For this non-congestion
      control format, the value is 0.

   Reserved:
      An octet set to 0 on generation, and ignored on receipt.

   BlockOffset:
      A 16-bit unsigned integer counting the number of octets of
      "DataBlocks" data before the start of a new data block.  If the
      start of a new data block occurs in a subsequent payload the
      "BlockOffset" will point past the end of the "DataBlocks" data.
      In this case all the "DataBlocks" data belongs to the current data
      block being assembled.  When the "BlockOffset" extends into
      subsequent payloads it continues to only count "DataBlocks" data
      (i.e., it does not count subsequent packets' non-"DataBlocks" data
      such as header octets).

   DataBlocks:
      Variable number of octets that begins with the start of a data
      block, or the continuation of a previous data block, followed by
      zero or more additional data blocks.

6.1.2.  Congestion Control AGGFRAG_PAYLOAD Payload Format

   The congestion control AGGFRAG_PAYLOAD payload consists of a 24
   octet header followed by a variable amount of "DataBlocks" data as
   shown below.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Sub-type (1) |  Reserved |P|E|          BlockOffset          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         LossEventRate                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    RTT                    |   Echo Delay ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          ... Echo Delay |             Transmit Delay              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             TVal                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             TEcho                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       DataBlocks ...
   +-+-+-+-+-+-+-+-+-+-+-

   Sub-type:
      An octet indicating the payload format.  For this congestion
      control format, the value is 1.

   Reserved:
      A 6-bit field set to 0 on generation, and ignored on receipt.

   P:
      A 1-bit value that, if set, indicates that PLMTUD probing is in
      progress.  This information can be used to avoid treating missing
      packets as loss events by the CC algorithm when running the PLMTUD
      probe algorithm.

   E:
      A 1-bit value that, if set, indicates that Congestion Experienced
      (CE) ECN bits were received and used in deriving the reported
      "LossEventRate".

   BlockOffset:
      The same value as the non-congestion controlled payload format
      value.

   LossEventRate:
      A 32-bit value specifying the inverse of the current loss event
      rate as calculated by the receiver.  A value of zero indicates no
      loss.  Otherwise the loss event rate is "1/LossEventRate".

   RTT:
      A 22-bit value specifying the sender's current round-trip time
      estimate in microseconds.  The value MAY be zero prior to the
      sender having calculated a round-trip time estimate.  The value
      SHOULD be set to zero on non-AGGFRAG_PAYLOAD enabled SAs.  If the
      value is equal to or larger than "0x3FFFFF" it MUST be set to
      "0x3FFFFF".

   Echo Delay:
      A 21-bit value specifying the delay in microseconds incurred
      between the receiver first receiving the "TVal" value and sending
      it back in "TEcho".  If the value is equal to or larger than
      "0x1FFFFF" it MUST be set to "0x1FFFFF".

   Transmit Delay:
      A 21-bit value specifying the transmission delay in microseconds.
      This is the fixed (or average) delay on the receiver between it
      sending packets on the IP-TFS tunnel.  If the value is equal to or
      larger than "0x1FFFFF" it MUST be set to "0x1FFFFF".

   TVal:
      An opaque 32-bit value that will be echoed back by the receiver in
      later packets in the "TEcho" field, along with an "Echo Delay"
      value of how long that echo took.

   TEcho:
      The opaque 32-bit value from a received packet's "TVal" field.
      The received "TVal" is placed in "TEcho" along with an "Echo
      Delay" value indicating how long it has been since receiving the
      "TVal" value.

   DataBlocks:
      Variable number of octets that begins with the start of a data
      block, or the continuation of a previous data block, followed by
      zero or more additional data blocks.  For the special case of
      sending congestion control information on a non-IP-TFS enabled SA
      this value MUST be empty (i.e., be zero octets long).

6.1.3.  Data Blocks

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type  | IPv4, IPv6 or pad...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

   Type:
      A 4-bit field where 0x0 identifies a pad data block, 0x4 indicates
      an IPv4 data block, and 0x6 indicates an IPv6 data block.

6.1.3.1.  IPv4 Data Block

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  0x4  |  IHL  | TypeOfService |          TotalLength          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Rest of the inner packet ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

   These values are the actual values within the encapsulated IPv4
   header.  In other words, the start of this data block is the start of
   the encapsulated IP packet.

   Type:
      A 4-bit value of 0x4 indicating IPv4 (i.e., first nibble of the
      IPv4 packet).

   TotalLength:
      The 16-bit unsigned integer "Total Length" field of the IPv4 inner
      packet.

6.1.3.2.  IPv6 Data Block

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  0x6  | TrafficClass  |               FlowLabel               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         PayloadLength         | Rest of the inner packet ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

   These values are the actual values within the encapsulated IPv6
   header.  In other words, the start of this data block is the start of
   the encapsulated IP packet.

   Type:
      A 4-bit value of 0x6 indicating IPv6 (i.e., first nibble of the
      IPv6 packet).

   PayloadLength:
      The 16-bit unsigned integer "Payload Length" field of the IPv6
      inner packet.

6.1.3.3.  Pad Data Block

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  0x0  | Padding ...
   +-+-+-+-+-+-+-+-+-+-+-

   Type:
      A 4-bit value of 0x0 indicating a padding data block.

   Padding:
      Extends to end of the encapsulating packet.

6.1.4.  IKEv2 USE_AGGFRAG Notification Message

   As discussed in Section 5.1, a notification message USE_AGGFRAG is
   used to negotiate use of the ESP AGGFRAG_PAYLOAD Next Header value.

   The USE_AGGFRAG Notification Message Status Type is (TBD2).

   The notification payload contains 1 octet of requirement flags.
   There are currently 2 requirement flags defined.  This may be revised
   by later specifications.

   +-+-+-+-+-+-+-+-+
   |0|0|0|0|0|0|C|D|
   +-+-+-+-+-+-+-+-+

   0:
      6 bits - reserved, MUST be zero on send, unless defined by later
      specifications.

   C:
      Congestion Control bit.
      If set, the sender requires that congestion control information
      MUST be returned to it periodically as defined in Section 3.

   D:
      Don't Fragment bit.  If set, indicates the sender of the notify
      message does not support receiving packet fragments (i.e., inner
      packets MUST be sent using a single "Data Block").  This value
      only applies to what the sender is capable of receiving; the
      sender MAY still send packet fragments unless similarly restricted
      by the receiver in its USE_AGGFRAG notification.

7.  IANA Considerations

7.1.  AGGFRAG_PAYLOAD Sub-Type Registry

   This document requests IANA create a registry called "AGGFRAG_PAYLOAD
   Sub-Type Registry" under a new category named "ESP AGGFRAG_PAYLOAD
   Parameters".  The registration policy for this registry is "Expert
   Review" ([RFC8126] and [RFC7120]).

   Name:
      AGGFRAG_PAYLOAD Sub-Type Registry

   Description:
      AGGFRAG_PAYLOAD Payload Formats.

   Reference:
      This document

   The initial content for this registry is as follows:

      Sub-Type  Name                           Reference
      --------------------------------------------------------
             0  Non-Congestion Control Format  This document
             1  Congestion Control Format      This document
         2-255  Reserved

7.2.  USE_AGGFRAG Notify Message Status Type

   This document requests a status type USE_AGGFRAG be allocated from
   the "IKEv2 Notify Message Types - Status Types" registry.

   Value:
      TBD2

   Name:
      USE_AGGFRAG

   Reference:
      This document

8.  Security Considerations

   This document describes an aggregation and fragmentation mechanism
   and its use to add TFC to IP traffic.  The use described is expected
   to increase the security of the traffic being transported.  Other
   than the additional security afforded by using this mechanism, IP-TFS
   utilizes the security protocols [RFC4303] and [RFC7296] and so their
   security considerations apply to IP-TFS as well.

   As noted in Section 3.1, the ECN bits are not protected by IPsec and
   thus may constitute a covert channel.  For this reason, ECN use
   SHOULD NOT be enabled by default.

   As noted previously in Section 2.4.2, for TFC to be fully maintained
   the encapsulated traffic flow should not affect network congestion in
   a predictable way; if it would, use of non-congestion controlled mode
   should be considered instead.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4303]  Kent, S., "IP Encapsulating Security Payload (ESP)",
              RFC 4303, DOI 10.17487/RFC4303, December 2005.

   [RFC7296]  Kaufman, C., Hoffman, P., Nir, Y., Eronen, P., and T.
              Kivinen, "Internet Key Exchange Protocol Version 2
              (IKEv2)", STD 79, RFC 7296, DOI 10.17487/RFC7296, October
              2014.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

9.2.  Informative References

   [AppCrypt]
              Schneier, B., "Applied Cryptography: Protocols,
              Algorithms, and Source Code in C", November 2017.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              DOI 10.17487/RFC0791, September 1981.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              DOI 10.17487/RFC1191, November 1990.

   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
              "Definition of the Differentiated Services Field (DS
              Field) in the IPv4 and IPv6 Headers", RFC 2474,
              DOI 10.17487/RFC2474, December 1998.

   [RFC2914]  Floyd, S., "Congestion Control Principles", BCP 41,
              RFC 2914, DOI 10.17487/RFC2914, September 2000.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D.
              Black, "The Addition of Explicit Congestion Notification
              (ECN) to IP", RFC 3168, DOI 10.17487/RFC3168, September
              2001.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, DOI 10.17487/RFC4301,
              December 2005.

   [RFC4342]  Floyd, S., Kohler, E., and J. Padhye, "Profile for
              Datagram Congestion Control Protocol (DCCP) Congestion
              Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342,
              DOI 10.17487/RFC4342, March 2006.

   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
              Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007.

   [RFC5348]  Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP
              Friendly Rate Control (TFRC): Protocol Specification",
              RFC 5348, DOI 10.17487/RFC5348, September 2008.

   [RFC7120]  Cotton, M., "Early IANA Allocation of Standards Track Code
              Points", BCP 100, RFC 7120, DOI 10.17487/RFC7120, January
              2014.

   [RFC7510]  Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black,
              "Encapsulating MPLS in UDP", RFC 7510,
              DOI 10.17487/RFC7510, April 2015.

   [RFC8084]  Fairhurst, G., "Network Transport Circuit Breakers",
              BCP 208, RFC 8084, DOI 10.17487/RFC8084, March 2017.

   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
              Writing an IANA Considerations Section in RFCs", BCP 26,
              RFC 8126, DOI 10.17487/RFC8126, June 2017.

   [RFC8200]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", STD 86, RFC 8200,
              DOI 10.17487/RFC8200, July 2017.

   [RFC8201]  McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed.,
              "Path MTU Discovery for IP version 6", STD 87, RFC 8201,
              DOI 10.17487/RFC8201, July 2017.

   [RFC8229]  Pauly, T., Touati, S., and R. Mantha, "TCP Encapsulation
              of IKE and IPsec Packets", RFC 8229, DOI 10.17487/RFC8229,
              August 2017.

   [RFC8546]  Trammell, B. and M. Kuehlewind, "The Wire Image of a
              Network Protocol", RFC 8546, DOI 10.17487/RFC8546, April
              2019.

   [RFC8899]  Fairhurst, G., Jones, T., Tuexen, M., Ruengeler, I., and
              T. Voelker, "Packetization Layer Path MTU Discovery for
              Datagram Transports", RFC 8899, DOI 10.17487/RFC8899,
              September 2020.

Appendix A.  Example Of An Encapsulated IP Packet Flow

   Below, an example inner IP packet flow within the encapsulating
   tunnel packet stream is shown.  Notice how encapsulated IP packets
   can start and end anywhere, and more than one, or less than one, may
   occur in a single encapsulating packet.

    Offset: 0       Offset: 100     Offset: 2900    Offset: 1400
    [ ESP1  (1500) ][ ESP2  (1500) ][ ESP3  (1500) ][ ESP4  (1500) ]
    [--800--][--800--][60][-240-][--4000----------------------][pad]

                  Figure 3: Inner and Outer Packet Flow

   The encapsulated IP packet flow (lengths include IP header and
   payload) is as follows: an 800 octet packet, an 800 octet packet, a
   60 octet packet, a 240 octet packet, a 4000 octet packet.

   The "BlockOffset" values in the 4 AGGFRAG payload headers for this
   packet flow would thus be: 0, 100, 2900, 1400 respectively.  The
   first encapsulating packet ESP1 has a zero "BlockOffset" which points
   at the IP data block immediately following the AGGFRAG header.  The
   following packet ESP2's "BlockOffset" points inward 100 octets to the
   start of the 60 octet data block.  The third encapsulating packet
   ESP3 contains the middle portion of the 4000 octet data block so the
   offset points past its end and into the fourth encapsulating packet.
   The fourth packet ESP4's offset is 1400, pointing at the padding
   which follows the completion of the continued 4000 octet packet.

Appendix B.  A Send and Loss Event Rate Calculation

   The current best practice indicates that congestion control SHOULD be
   done in a TCP friendly way.
   A TCP friendly congestion control algorithm is described in
   [RFC5348].  For this IP-TFS use case (as with [RFC4342]) the (fixed)
   packet size is used as the segment size for the algorithm.  The main
   formula in the algorithm for the send rate is then as follows:

                                 1
      X = -----------------------------------------------
          R * (sqrt(2*p/3) + 12*sqrt(3*p/8)*p*(1+32*p^2))

   Where "X" is the send rate in packets per second, "R" is the round
   trip time estimate and "p" is the loss event rate (the inverse of
   which is provided by the receiver).

   In addition the algorithm in [RFC5348] also uses an "X_recv" value
   (the receiver's receive rate).  For IP-TFS one MAY set this value
   according to the sender's current tunnel send-rate ("X").

   The IP-TFS receiver, having the RTT estimate from the sender, can use
   the same method as described in [RFC5348] and [RFC4342] to collect
   the loss intervals and calculate the loss event rate value using the
   weighted average as indicated.  The receiver communicates the inverse
   of this value back to the sender in the AGGFRAG_PAYLOAD payload
   header field "LossEventRate".

   The IP-TFS sender now has both the "R" and "p" values and can
   calculate the correct sending rate.  If following [RFC5348] the
   sender should also use the slow start mechanism described therein
   when the IP-TFS SA is first established.

Appendix C.  Comparisons of IP-TFS

C.1.  Comparing Overhead

   To compare overhead, the overhead of ESP for both normal and AGGFRAG
   tunnel packets must be calculated, and so an algorithm for encryption
   and authentication must be chosen.  For the data below AES-GCM-256
   was selected.  This leads to an IP+ESP overhead of 54 octets.

      54 = 20 (IP) + 8 (ESPH) + 2 (ESPF) + 8 (IV) + 16 (ICV)

   Additionally, for IP-TFS, non-congestion control AGGFRAG_PAYLOAD
   headers were chosen, which adds 4 octets for a total overhead of 58.

C.1.1.  IP-TFS Overhead

   For comparison the overhead of AGGFRAG payload is 58 octets per outer
   packet.  Therefore the octet overhead per inner packet is 58
   multiplied by the number of outer packets required (fractional
   allowed).  The overhead as a percentage of inner packet size is a
   constant based on the Outer MTU size.

      OH = 58 / (Outer Payload Size / Inner Packet Size)
      OH % of Inner Packet Size = 100 * OH / Inner Packet Size
      OH % of Inner Packet Size = 5800 / Outer Payload Size

       Type    IP-TFS   IP-TFS   IP-TFS
       MTU        576     1500     9000
       PSize      518     1442     8942
       --------------------------------
       40      11.20%    4.02%    0.65%
       576     11.20%    4.02%    0.65%
       1500    11.20%    4.02%    0.65%
       9000    11.20%    4.02%    0.65%

      Figure 4: IP-TFS Overhead as Percentage of Inner Packet Size

C.1.2.  ESP with Padding Overhead

   The overhead per inner packet for constant-send-rate padded ESP
   (i.e., traditional IPsec TFC) is 36 octets plus any padding, unless
   fragmentation is required.

   When fragmentation of the inner packet is required to fit in the
   outer IPsec packet, overhead is the number of outer packets required
   to carry the fragmented inner packet times both the inner IP overhead
   (20) and the outer packet overhead (54) minus the initial inner IP
   overhead plus any required tail padding in the last encapsulation
   packet.  The required tail padding is the number of required packets
   times the difference of the Outer Payload Size and the IP Overhead
   minus the Inner Payload Size.
   So:

      Inner Payload Size = IP Packet Size - IP Overhead
      Outer Payload Size = MTU - IPsec Overhead

                    Inner Payload Size
      NF0 = ----------------------------------
             Outer Payload Size - IP Overhead

      NF = CEILING(NF0)

      OH = NF * (IP Overhead + IPsec Overhead)
           - IP Overhead
           + NF * (Outer Payload Size - IP Overhead)
           - Inner Payload Size

      OH = NF * (IPsec Overhead + Outer Payload Size)
           - (IP Overhead + Inner Payload Size)

      OH = NF * (IPsec Overhead + Outer Payload Size)
           - Inner Packet Size

C.2.  Overhead Comparison

   The following tables collect the overhead values for some common L3
   MTU sizes in order to compare them.  The first table is the number of
   octets of overhead for a given L3 MTU sized packet.  The second table
   is the percentage of overhead in the same MTU sized packet.

    Type     ESP+Pad  ESP+Pad  ESP+Pad   IP-TFS   IP-TFS   IP-TFS
    L3 MTU       576     1500     9000      576     1500     9000
    PSize        522     1446     8946      518     1442     8942
    -------------------------------------------------------------
    40           482     1406     8906      4.5      1.6      0.3
    128          394     1318     8818     14.3      5.1      0.8
    256          266     1190     8690     28.7     10.3      1.7
    518            4      928     8428     58.0     20.8      3.4
    576          576      870     8370     64.5     23.2      3.7
    1442         286        4     7504    161.5     58.0      9.4
    1500         228     1500     7446    168.0     60.3      9.7
    8942        1426     1558        4   1001.2    359.7     58.0
    9000        1368     1500     9000   1007.7    362.0     58.4

                 Figure 5: Overhead comparison in octets

    Type     ESP+Pad  ESP+Pad  ESP+Pad   IP-TFS   IP-TFS   IP-TFS
    MTU          576     1500     9000      576     1500     9000
    PSize        522     1446     8946      518     1442     8942
    -------------------------------------------------------------
    40       1205.0%  3515.0% 22265.0%   11.20%    4.02%    0.65%
    128       307.8%  1029.7%  6889.1%   11.20%    4.02%    0.65%
    256       103.9%   464.8%  3394.5%   11.20%    4.02%    0.65%
    518         0.8%   179.2%  1627.0%   11.20%    4.02%    0.65%
    576       100.0%   151.0%  1453.1%   11.20%    4.02%    0.65%
    1442       19.8%     0.3%   520.4%   11.20%    4.02%    0.65%
    1500       15.2%   100.0%   496.4%   11.20%    4.02%    0.65%
    8942       15.9%    17.4%     0.0%   11.20%    4.02%    0.65%
    9000       15.2%    16.7%   100.0%   11.20%    4.02%    0.65%

          Figure 6: Overhead as Percentage of Inner Packet Size

C.3.  Comparing Available Bandwidth

   Another way to compare the two solutions is to look at the amount of
   available bandwidth each solution provides.  The following sections
   consider and compare the percentage of available bandwidth.  For the
   sake of providing a well understood baseline, normal (unencrypted)
   Ethernet as well as normal ESP values are included.

C.3.1.  Ethernet

   In order to calculate the available bandwidth the per packet overhead
   is calculated first.  The total overhead of Ethernet is 14+4 octets
   of header and CRC plus an additional 20 octets of framing (preamble,
   start, and inter-packet gap) for a total of 38 octets.  Additionally
   the minimum payload is 46 octets.

    Size   E + P  E + P  E + P  IPTFS  IPTFS  IPTFS   Enet    ESP
    MTU      590   1514   9014    590   1514   9014    any    any
    OH        92     92     92     96     96     96     38     74
    -------------------------------------------------------------
    40       614   1538   9038     47     42     40     84    114
    128      614   1538   9038    151    136    129    166    202
    256      614   1538   9038    303    273    258    294    330
    518      614   1538   9038    614    552    523    574    610
    576     1228   1538   9038    682    614    582    614    650
    1442    1842   1538   9038   1709   1538   1457   1498   1534
    1500    1842   3076   9038   1777   1599   1516   1538   1574
    8942   11052  10766   9038  10599   9537   9038   8998   9034
    9000   11052  10766  18076  10667   9599   9096   9038   9074

                     Figure 7: L2 Octets Per Packet

    Size   E + P  E + P  E + P  IPTFS  IPTFS  IPTFS   Enet    ESP
    MTU      590   1514   9014    590   1514   9014    any    any
    OH        92     92     92     96     96     96     38     74
    -------------------------------------------------------------
    40      2.0M   0.8M   0.1M  26.4M  29.3M  30.9M  14.9M  11.0M
    128     2.0M   0.8M   0.1M   8.2M   9.2M   9.7M   7.5M   6.2M
    256     2.0M   0.8M   0.1M   4.1M   4.6M   4.8M   4.3M   3.8M
    518     2.0M   0.8M   0.1M   2.0M   2.3M   2.4M   2.2M   2.1M
    576     1.0M   0.8M   0.1M   1.8M   2.0M   2.1M   2.0M   1.9M
    1442    678K   812K   138K   731K   812K   857K   844K   824K
    1500    678K   406K   138K   703K   781K   824K   812K   794K
    8942    113K   116K   138K   117K   131K   138K   139K   138K
    9000    113K   116K    69K   117K   130K   137K   138K   137K

              Figure 8: Packets Per Second on 10G Ethernet

    Size   E + P  E + P  E + P  IPTFS  IPTFS  IPTFS   Enet    ESP
    MTU      590   1514   9014    590   1514   9014    any    any
    OH        92     92     92     96     96     96     38     74
    -------------------------------------------------------------
    40     6.51%  2.60%  0.44% 84.36% 93.76% 98.94% 47.62% 35.09%
    128   20.85%  8.32%  1.42% 84.36% 93.76% 98.94% 77.11% 63.37%
    256   41.69% 16.64%  2.83% 84.36% 93.76% 98.94% 87.07% 77.58%
    518   84.36% 33.68%  5.73% 84.36% 93.76% 98.94% 93.17% 87.50%
    576   46.91% 37.45%  6.37% 84.36% 93.76% 98.94% 93.81% 88.62%
    1442  78.28% 93.76% 15.95% 84.36% 93.76% 98.94% 97.43% 95.12%
    1500  81.43% 48.76% 16.60% 84.36% 93.76% 98.94% 97.53% 95.30%
    8942  80.91% 83.06% 98.94% 84.36% 93.76% 98.94% 99.58% 99.18%
    9000  81.43% 83.60% 49.79% 84.36% 93.76% 98.94% 99.58% 99.18%

            Figure 9: Percentage of Bandwidth on 10G Ethernet

   A sometimes unexpected result of using an AGGFRAG tunnel (or any
   packet aggregating tunnel) is that, for small to medium sized
   packets, the available bandwidth is actually greater than native
   Ethernet.  This is due to the reduction in Ethernet framing overhead.
   This increased bandwidth is paid for with an increase in latency.
   This latency is the time to send the unrelated octets in the outer
   tunnel frame.  The following table illustrates the latency for some
   common values on a 10G Ethernet link.  The table also includes
   latency introduced by padding if using ESP with padding.

             ESP+Pad    ESP+Pad     IP-TFS     IP-TFS
                1500       9000       1500       9000
       ----------------------------------------------
       40    1.12 us    7.12 us    1.17 us    7.17 us
       128   1.05 us    7.05 us    1.10 us    7.10 us
       256   0.95 us    6.95 us    1.00 us    7.00 us
       518   0.74 us    6.74 us    0.79 us    6.79 us
       576   0.70 us    6.70 us    0.74 us    6.74 us
       1442  0.00 us    6.00 us    0.05 us    6.05 us
       1500  1.20 us    5.96 us    0.00 us    6.00 us

                        Figure 10: Added Latency

   Notice that the latency values are very similar between the two
   solutions; however, whereas IP-TFS provides for constant high
   bandwidth, in some cases even exceeding native Ethernet, ESP with
   padding often greatly reduces available bandwidth.

Appendix D.  Acknowledgements

   We would like to thank Don Fedyk for help in reviewing and editing
   this work.  We would also like to thank Michael Richardson, Sean
   Turner, Valery Smyslov and Tero Kivinen for reviews and many
   suggestions for improvements, as well as Joseph Touch for the
   transport area review and suggested improvements.

Appendix E.  Contributors

   The following people made significant contributions to this document.

   Lou Berger
   LabN Consulting, L.L.C.

   Email: lberger@labn.net

Author's Address

   Christian Hopps
   LabN Consulting, L.L.C.

   Email: chopps@chopps.org