Network Working Group                                           C. Hopps
Internet-Draft                                   LabN Consulting, L.L.C.
4 Intended status: Standards Track February 22, 2021 5 Expires: August 26, 2021 7 IP-TFS: IP Traffic Flow Security Using Aggregation and Fragmentation 8 draft-ietf-ipsecme-iptfs-07 10 Abstract 12 This document describes a mechanism to enhance IPsec traffic flow 13 security (IP-TFS) by adding Traffic Flow Confidentiality (TFC) to 14 encrypted IP encapsulated traffic. TFC is provided by obscuring the 15 size and frequency of IP traffic using a fixed-sized, constant-send- 16 rate IPsec tunnel. The solution allows for congestion control as 17 well as non-constant send-rate usage. The mechanisms defined in this 18 document are generic with the intent of allowing for non-TFS uses, 19 but such uses are outside the scope of this document. 21 Status of This Memo 23 This Internet-Draft is submitted in full conformance with the 24 provisions of BCP 78 and BCP 79. 26 Internet-Drafts are working documents of the Internet Engineering 27 Task Force (IETF). Note that other groups may also distribute 28 working documents as Internet-Drafts. The list of current Internet- 29 Drafts is at https://datatracker.ietf.org/drafts/current/. 31 Internet-Drafts are draft documents valid for a maximum of six months 32 and may be updated, replaced, or obsoleted by other documents at any 33 time. It is inappropriate to use Internet-Drafts as reference 34 material or to cite them other than as "work in progress." 36 This Internet-Draft will expire on August 26, 2021. 38 Copyright Notice 40 Copyright (c) 2021 IETF Trust and the persons identified as the 41 document authors. All rights reserved. 43 This document is subject to BCP 78 and the IETF Trust's Legal 44 Provisions Relating to IETF Documents 45 (https://trustee.ietf.org/license-info) in effect on the date of 46 publication of this document. Please review these documents 47 carefully, as they describe your rights and restrictions with respect 48 to this document. 
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology & Concepts
   2.  The IP-TFS Tunnel
     2.1.  Tunnel Content
     2.2.  Payload Content
       2.2.1.  Data Blocks
       2.2.2.  End Padding
       2.2.3.  Fragmentation, Sequence Numbers and All-Pad Payloads
       2.2.4.  Empty Payload
       2.2.5.  IP Header Value Mapping
       2.2.6.  IP Time-To-Live (TTL) and Tunnel errors
       2.2.7.  Effective MTU of the Tunnel
     2.3.  Exclusive SA Use
     2.4.  Modes of Operation
       2.4.1.  Non-Congestion Controlled Mode
       2.4.2.  Congestion Controlled Mode
     2.5.  Summary of Receiver Processing
   3.  Congestion Information
     3.1.  ECN Support
   4.  Configuration
     4.1.  Bandwidth
     4.2.  Fixed Packet Size
     4.3.  Congestion Control
   5.  IKEv2
     5.1.  USE_AGGFRAG Notification Message
   6.  Packet and Data Formats
     6.1.  AGGFRAG_PAYLOAD Payload
       6.1.1.  Non-Congestion Control AGGFRAG_PAYLOAD Payload Format
       6.1.2.  Congestion Control AGGFRAG_PAYLOAD Payload Format
       6.1.3.  Data Blocks
       6.1.4.  IKEv2 USE_AGGFRAG Notification Message
   7.  IANA Considerations
     7.1.  AGGFRAG_PAYLOAD Sub-Type Registry
     7.2.  USE_AGGFRAG Notify Message Status Type
   8.  Security Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Example Of An Encapsulated IP Packet Flow
   Appendix B.  A Send and Loss Event Rate Calculation
   Appendix C.  Comparisons of IP-TFS
     C.1.  Comparing Overhead
       C.1.1.  IP-TFS Overhead
       C.1.2.  ESP with Padding Overhead
     C.2.  Overhead Comparison
     C.3.  Comparing Available Bandwidth
       C.3.1.  Ethernet
   Appendix D.  Acknowledgements
   Appendix E.  Contributors
   Author's Address

1.  Introduction

   Traffic Analysis ([RFC4301], [AppCrypt]) is the act of extracting
   information about data being sent through a network.
While directly 110 obscuring the data with encryption [RFC4303], the traffic pattern 111 itself exposes information due to variations in its shape and timing 112 ([RFC8546], [AppCrypt]). Hiding the size and frequency of traffic is 113 referred to as Traffic Flow Confidentiality (TFC) per [RFC4303]. 115 [RFC4303] provides for TFC by allowing padding to be added to 116 encrypted IP packets and allowing for transmission of all-pad packets 117 (indicated using protocol 59). This method has the major limitation 118 that it can significantly under-utilize the available bandwidth. 120 The IP-TFS (IP Traffic Flow Security) solution provides for full TFC 121 without the aforementioned bandwidth limitation. This is 122 accomplished by using a constant-send-rate IPsec [RFC4303] tunnel 123 with fixed-sized encapsulating packets; however, these fixed-sized 124 packets can contain partial, whole or multiple IP packets to maximize 125 the bandwidth of the tunnel. A non-constant send-rate is allowed, 126 but the confidentiality properties of its use are outside the scope 127 of this document. 129 For a comparison of the overhead of IP-TFS with the RFC4303 130 prescribed TFC solution see Appendix C. 132 Additionally, IP-TFS provides for operating fairly within congested 133 networks [RFC2914]. This is important for when the IP-TFS user is 134 not in full control of the domain through which the IP-TFS tunnel 135 path flows. 137 The mechanisms defined in this document are generic with the intent 138 of allowing for non-TFS uses, but such uses are outside the scope of 139 this document. 141 1.1. Terminology & Concepts 143 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 144 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 145 "OPTIONAL" in this document are to be interpreted as described in BCP 146 14 [RFC2119] [RFC8174] when, and only when, they appear in all 147 capitals, as shown here. 
149 This document assumes familiarity with IP security concepts including 150 TFC as described in [RFC4301]. 152 2. The IP-TFS Tunnel 154 As mentioned in Section 1 IP-TFS utilizes an IPsec [RFC4303] tunnel 155 as its transport. To provide for full TFC, fixed-sized encapsulating 156 packets are sent at a constant rate on the tunnel. 158 The primary input to the tunnel algorithm is the requested bandwidth 159 to be used by the tunnel. Two values are then required to provide 160 for this bandwidth use, the fixed size of the encapsulating packets, 161 and rate at which to send them. 163 The fixed packet size MAY either be specified manually or be 164 determined through other methods such as the Packetization Layer MTU 165 Discovery (PLMTUD) ([RFC4821], [RFC8899]) or Path MTU discovery 166 (PMTUD) ([RFC1191], [RFC8201]). PMTUD is known to have issues so 167 PLMTUD is considered the more robust option. For PLMTUD, congestion 168 control payloads can be used as in-band probes (see Section 6.1.2 and 169 [RFC8899]). 171 Given the encapsulating packet size and the requested bandwidth to be 172 used, the corresponding packet send rate can be calculated. The 173 packet send rate is the requested bandwidth to be used divided by the 174 size of the encapsulating packet. 176 The egress (receiving) side of the IP-TFS tunnel MUST allow for and 177 expect the ingress (sending) side of the IP-TFS tunnel to vary the 178 size and rate of sent encapsulating packets, unless constrained by 179 other policy. 181 2.1. Tunnel Content 183 As previously mentioned, one issue with the TFC padding solution in 184 [RFC4303] is the large amount of wasted bandwidth as only one IP 185 packet can be sent per encapsulating packet. In order to maximize 186 bandwidth, IP-TFS breaks this one-to-one association. 188 IP-TFS aggregates as well as fragments the inner IP traffic flow into 189 fixed-sized encapsulating IPsec tunnel packets. 
   Padding is only added to the tunnel packets if there is no data
   available to be sent at the time of tunnel packet transmission, or
   if fragmentation has been disabled by the receiver.

   This is accomplished using a new Encapsulating Security Payload (ESP,
   [RFC4303]) Next Header field value AGGFRAG_PAYLOAD (Section 6.1).

   Other non-IP-TFS uses of this aggregation and fragmentation
   encapsulation have been identified, such as increased performance
   through packet aggregation, as well as handling MTU issues using
   fragmentation.  These uses are not defined here, but are also not
   restricted by this document.

2.2.  Payload Content

   The AGGFRAG_PAYLOAD payload content defined in this document
   consists of a 4 or 24 octet header followed by either a partial
   data block, a full data block, or multiple partial or full data
   blocks.  The following diagram illustrates this payload within the
   ESP packet.  See Section 6.1 for the exact formats of the
   AGGFRAG_PAYLOAD payload.

   . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
   . Outer Encapsulating Header ...                                .
   . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
   . ESP Header...                                                 .
   +---------------------------------------------------------------+
   |    [AGGFRAG subtype/flags]    :          BlockOffset          |
   +---------------------------------------------------------------+
   :                  [Optional Congestion Info]                   :
   +---------------------------------------------------------------+
   |       DataBlocks ...                                          ~
   ~                                                               ~
   ~                                                               |
   +---------------------------------------------------------------|
   . ESP Trailer...                                                .
   . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

             Figure 1: Layout of an IP-TFS IPsec Packet

   The "BlockOffset" value is either zero or some offset into or past
   the end of the "DataBlocks" data.

   If the "BlockOffset" value is zero it means that the "DataBlocks"
   data begins with a new data block.
235 Conversely, if the "BlockOffset" value is non-zero it points to the 236 start of the new data block, and the initial "DataBlocks" data 237 belongs to the data block that is still being re-assembled. 239 If the "BlockOffset" points past the end of the "DataBlocks" data 240 then the next data block occurs in a subsequent encapsulating packet. 242 Having the "BlockOffset" always point at the next available data 243 block allows for recovering the next inner packet in the presence of 244 outer encapsulating packet loss. 246 An example IP-TFS packet flow can be found in Appendix A. 248 2.2.1. Data Blocks 250 +---------------------------------------------------------------+ 251 | Type | rest of IPv4, IPv6 or pad. 252 +-------- 254 Figure 2: Layout of a DataBlock 256 A data block is defined by a 4-bit type code followed by the data 257 block data. The type values have been carefully chosen to coincide 258 with the IPv4/IPv6 version field values so that no per-data block 259 type overhead is required to encapsulate an IP packet. Likewise, the 260 length of the data block is extracted from the encapsulated IPv4's 261 "Total Length" or IPv6's "Payload Length" fields. 263 2.2.2. End Padding 265 Since a data block's type is identified in its first 4-bits, the only 266 time padding is required is when there is no data to encapsulate. 267 For this end padding a "Pad Data Block" is used. 269 2.2.3. Fragmentation, Sequence Numbers and All-Pad Payloads 271 In order for a receiver to reassemble fragmented inner-packets, the 272 sender MUST send the inner-packet fragments back-to-back in the 273 logical outer packet stream (i.e., using consecutive ESP sequence 274 numbers). However, the sender is allowed to insert "all-pad" 275 payloads (i.e., payloads with a "BlockOffset" of zero and a single 276 pad "DataBlock") in between the packets carrying the inner-packet 277 fragment payloads. 
This interleaving of all-pad payloads allows the 278 sender to always send a tunnel packet, regardless of the 279 encapsulation computational requirements. 281 When a receiver is reassembling an inner-packet, and it receives an 282 "all-pad" payload, it increments the expected sequence number that 283 the next inner-packet fragment is expected to arrive in. 285 Given the above, the receiver will need to handle out-of-order 286 arrival of outer ESP packets prior to reassembly processing. ESP 287 already provides for optionally detecting replay attacks. Detecting 288 replay attacks normally utilizes a window method. A similar sequence 289 number based sliding window can be used to correct re-ordering of the 290 outer packet stream. Receiving a larger (newer) sequence number 291 packet advances the window, and received older ESP packets whose 292 sequence numbers the window has passed by are dropped. A good choice 293 for the size of this window depends on the amount of re-ordering the 294 user may normally experience. 296 As the amount of reordering that may be present is hard to predict, 297 the window size SHOULD be configurable by the user. Implementations 298 MAY also dynamically adjust the reordering window based on actual 299 reordering seen in arriving packets. Finally, note that as IP-TFS is 300 sending a continuous stream of packets there is no requirement for 301 timers (although there's no prohibition either) as newly arrived 302 packets will cause the window to advance and older packets will then 303 be processed as they leave the window. Implementations that are 304 concerned about memory use when packets are delayed (e.g., when an SA 305 deletion is delayed) can of course use timers to drop packets as 306 well. 
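   The sequence-number sliding window described above can be sketched
   as follows.  This is a minimal illustration, not a definitive
   implementation: the class name "ReorderWindow", its "push" method and
   the use of "None" to mark a sequence number that was never received
   are all hypothetical.  The sketch also assumes that payloads already
   in order may be released immediately rather than held until they
   leave the window, and it ignores sequence number wrap and timers.

```python
class ReorderWindow:
    """Sequence-number sliding window that releases payloads in order."""

    def __init__(self, size, first_seq=1):
        self.size = size            # window size (user configurable)
        self.next_seq = first_seq   # lowest sequence number not yet released
        self.pending = {}           # seq -> payload, held until released

    def push(self, seq, payload):
        """Accept one outer packet; return [(seq, payload_or_None), ...].

        A payload of None marks a sequence number that the window passed
        by without the packet ever arriving (treated as lost).
        """
        if seq < self.next_seq or seq in self.pending:
            return []  # the window already passed it, or a duplicate: drop
        self.pending[seq] = payload
        released = []
        # A newer packet advances the window; slots pushed out are released.
        while seq - self.next_seq >= self.size:
            released.append(
                (self.next_seq, self.pending.pop(self.next_seq, None)))
            self.next_seq += 1
        # Assumption: hand over any run now contiguous at the window head.
        while self.next_seq in self.pending:
            released.append(
                (self.next_seq, self.pending.pop(self.next_seq)))
            self.next_seq += 1
        return released
```

   No timer is needed in this sketch because each arriving packet
   drives the window forward, which matches the observation above that
   a continuous packet stream makes timers optional.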
308 While ESP guarantees an increasing sequence number with subsequently 309 sent packets, it does not actually require the sequence numbers to be 310 generated with no gaps (e.g., sending only even numbered sequence 311 numbers would be allowed as long as they are always increasing). 312 Gaps in the sequence numbers will not work for this document so the 313 sequence number stream MUST increase monotonically by 1 for each 314 subsequent packet. 316 When using the AGGFRAG_PAYLOAD in conjunction with replay detection, 317 the window size for both MAY be reduced to share the smaller of the 318 two window sizes. This is because packets outside of the smaller 319 window but inside the larger would still be dropped by the mechanism 320 with the smaller window size. 322 Finally, as sequence numbers are reset when switching SAs (e.g., when 323 re-keying a child SA), senders MUST NOT send initial fragments of an 324 inner packet using one SA and subsequent fragments in a different SA. 326 2.2.3.1. Optional Extra Padding 328 When the tunnel bandwidth is not being fully utilized, a sender MAY 329 pad-out the current encapsulating packet in order to deliver an inner 330 packet un-fragmented in the following outer packet. The benefit 331 would be to avoid inner-packet fragmentation in the presence of a 332 bursty offered load (non-bursty traffic will naturally not fragment). 333 Senders MAY also choose to allow for a minimum fragment size to be 334 configured (e.g., as a percentage of the AGGFRAG_PAYLOAD payload 335 size) to avoid fragmentation at the cost of tunnel bandwidth. The 336 cost with these methods is complexity and added delay of inner 337 traffic. The main advantage to avoiding fragmentation is to minimize 338 inner packet loss in the presence of outer packet loss. 
When this is 339 worthwhile (e.g., how much loss and what type of loss is required, 340 given different inner traffic shapes and utilization, for this to 341 make sense), and what values to use for the allowable/added delay may 342 be worth researching, but is outside the scope of this document. 344 While use of padding to avoid fragmentation does not impact 345 interoperability, used inappropriately it can reduce the effective 346 throughput of a tunnel. Senders implementing either of the above 347 approaches will need to take care to not reduce the effective 348 capacity, and overall utility, of the tunnel through the overuse of 349 padding. 351 2.2.4. Empty Payload 353 To support reporting of congestion control information (described 354 later) on a non-AGGFRAG_PAYLOAD enabled SA, IP-TFS allows for the 355 sending of an AGGFRAG_PAYLOAD payload with no data blocks (i.e., the 356 ESP payload length is equal to the AGGFRAG_PAYLOAD header length). 357 This special payload is called an empty payload. 359 Currently this situation is only applicable in non-IKEv2 use cases. 361 2.2.5. IP Header Value Mapping 363 [RFC4301] provides some direction on when and how to map various 364 values from an inner IP header to the outer encapsulating header, 365 namely the Don't-Fragment (DF) bit ([RFC0791] and [RFC8200]), the 366 Differentiated Services (DS) field [RFC2474] and the Explicit 367 Congestion Notification (ECN) field [RFC3168]. Unlike [RFC4301], IP- 368 TFS may and often will be encapsulating more than one IP packet per 369 ESP packet. To deal with this, these mappings are restricted 370 further. 372 2.2.5.1. DF bit 374 IP-TFS never maps the inner DF bit as it is unrelated to the IP-TFS 375 tunnel functionality; IP-TFS never needs to IP fragment the inner 376 packets and the inner packets will not affect the fragmentation of 377 the outer encapsulation packets. 379 2.2.5.2. 
ECN value 381 The ECN value need not be mapped as any congestion related to the 382 constant-send-rate IP-TFS tunnel is unrelated (by design) to the 383 inner traffic flow. The sender MAY still set the ECN value of inner 384 packets based on the normal ECN specification [RFC3168]. 386 2.2.5.3. DS field 388 By default the DS field SHOULD NOT be copied, although a sender MAY 389 choose to allow for configuration to override this behavior. A 390 sender SHOULD also allow the DS value to be set by configuration. 392 2.2.6. IP Time-To-Live (TTL) and Tunnel errors 394 [RFC4301] specifies how to modify the inner packet TTL [RFC0791]. 396 Any errors (e.g., ICMP errors arriving back at the tunnel ingress due 397 to tunnel traffic) are handled the same as with non IP-TFS IPsec 398 tunnels. 400 2.2.7. Effective MTU of the Tunnel 402 Unlike [RFC4301], there is normally no effective MTU (EMTU) on an IP- 403 TFS tunnel as all IP packet sizes are properly transmitted without 404 requiring IP fragmentation prior to tunnel ingress. That said, a 405 sender MAY allow for explicitly configuring an MTU for the tunnel. 407 If IP-TFS fragmentation has been disabled, then the tunnel's EMTU and 408 behaviors are the same as normal IPsec tunnels [RFC4301]. 410 2.3. Exclusive SA Use 412 This document does not specify mixed use of an AGGFRAG_PAYLOAD 413 enabled SA. A sender MUST only send AGGFRAG_PAYLOAD payloads over an 414 SA configured for AGGFRAG_PAYLOAD use. 416 2.4. Modes of Operation 418 Just as with normal IPsec/ESP tunnels, IP-TFS tunnels are 419 unidirectional. Bidirectional IP-TFS functionality is achieved by 420 setting up 2 IP-TFS tunnels, one in either direction. 422 An IP-TFS tunnel can operate in 2 modes, a non-congestion controlled 423 mode and congestion controlled mode. 425 2.4.1. Non-Congestion Controlled Mode 427 In the non-congestion controlled mode, IP-TFS sends fixed-sized 428 packets at a constant rate. 
   The packet send rate is constant and is not automatically adjusted,
   regardless of any network congestion (e.g., packet loss).

   For similar reasons as given in [RFC7510], the non-congestion
   controlled mode should only be used where the user has full
   administrative control over the path the tunnel will take.  This is
   required so the user can guarantee the bandwidth and also be sure
   not to negatively affect network congestion [RFC2914].  In this
   case, packet loss should be reported to the administrator (e.g., via
   syslog, YANG notification, SNMP traps, etc.) so that any failures
   due to a lack of bandwidth can be corrected.

   Non-congestion control mode is also appropriate if ESP over TCP is in
   use [RFC8229].

2.4.2.  Congestion Controlled Mode

   With the congestion controlled mode, IP-TFS adapts to network
   congestion by lowering the packet send rate to accommodate the
   congestion, as well as raising the rate when congestion subsides.
   Since overhead is per packet, allowing for maximal fixed-size
   packets and varying the send rate minimizes transport overhead.

   The output of the congestion control algorithm will adjust the rate
   at which the ingress sends packets.  While this document does not
   require a specific congestion control algorithm, best current
   practice RECOMMENDS that the algorithm conform to [RFC5348].
   Congestion control principles are documented in [RFC2914] as well.
   [RFC4342] provides an example of the [RFC5348] algorithm which
   matches the requirements of IP-TFS (i.e., designed for fixed-size
   packets and a send rate varied based on congestion).

   The required inputs for the TCP friendly rate control algorithm
   described in [RFC5348] are the receiver's loss event rate and the
   sender's estimated round-trip time (RTT).  These values are provided
   by IP-TFS using the congestion information header fields described in
   Section 3.  In particular, these values are sufficient to implement
   the algorithm described in [RFC5348].

   At a minimum, the congestion information MUST be sent, from the
   receiver and from the sender, at least once per RTT.  Prior to
   establishing an RTT, the information SHOULD be sent constantly from
   the sender and the receiver so that an RTT estimate can be
   established.  Not receiving this information over multiple
   consecutive RTT intervals should be considered a congestion event
   that causes the sender to adjust its sending rate lower.  For
   example, [RFC4342] calls this the "no feedback timeout" and it is
   equal to 4 RTT intervals.  When a "no feedback timeout" has
   occurred, [RFC4342] halves the sending rate.

   An implementation MAY choose to always include the congestion
   information in its IP-TFS payload header if sending on an IP-TFS
   enabled SA.  Since IP-TFS normally will operate with a large packet
   size, the congestion information should represent a small portion of
   the available tunnel bandwidth.  An implementation choosing to always
   send the data MAY nevertheless update the "LossEventRate" and "RTT"
   header field values it sends only once per RTT.

   When choosing a congestion control algorithm (or a selection of
   algorithms), note that IP-TFS is not providing for reliable delivery
   of IP traffic, and so per-packet ACKs are not required and are not
   provided.

   It is worth noting that the variable send-rate of a congestion
   controlled IP-TFS tunnel is not private; however, this send-rate is
   being driven by network congestion, and as long as the encapsulated
   (inner) traffic flow shape and timing are not directly affecting the
   (outer) network congestion, the variations in the tunnel rate will
   not weaken the provided inner traffic flow confidentiality.

2.4.2.1.  Circuit Breakers

   In addition to congestion control, implementations MAY choose to
   define and implement circuit breakers [RFC8084] as a recovery method
   of last resort.  Enabling circuit breakers is also a reason a user
   may wish to enable congestion information reports even when using the
   non-congestion controlled mode of operation.  The definition of
   circuit breakers is outside the scope of this document.

2.5.  Summary of Receiver Processing

   An IP-TFS receiver has a few tasks to perform.

   The receiver first reorders possibly out-of-order ESP packets
   received on an SA into in-sequence-order AGGFRAG_PAYLOAD payloads
   (Section 2.2.3).  If congestion control is enabled, the receiver
   considers a packet lost when its sequence number is abandoned (e.g.,
   pushed out of the re-ordering window, or timed-out) by the reordering
   algorithm.

   Additionally, if congestion control is enabled, the receiver sends
   congestion control data (Section 6.1.2) back to the sender as
   described in Section 2.4.2 and Section 3.

   Finally, the receiver processes the now in-order AGGFRAG_PAYLOAD
   payload stream to extract the inner-packets (Section 2.2.3,
   Section 6.1).

3.  Congestion Information

   In order to support the congestion control mode, the sender needs to
   know the loss event rate and to approximate the RTT [RFC5348].  In
   order to obtain these values, the receiver sends congestion control
   information on its SA back to the sender.  Thus, to support
   congestion control the receiver must have a paired SA back to the
   sender (this is always the case when the tunnel was created using
   IKEv2).  If the SA back to the sender is a non-AGGFRAG_PAYLOAD
   enabled SA then an AGGFRAG_PAYLOAD empty payload (i.e., header only)
   is used to convey the information.

   In order to calculate a loss event rate compatible with [RFC5348],
   the receiver needs to have a round-trip time estimate.  Thus the
   sender communicates this estimate in the "RTT" header field.  On
   startup this value will be zero as no RTT estimate is yet known.

   In order for the sender to estimate its "RTT" value, the sender
   places a timestamp value in the "TVal" header field.  On first
   receipt of this "TVal", the receiver records the new "TVal" value
   along with the time it arrived locally; subsequent receipt of the
   same "TVal" MUST NOT update the recorded time.

   When the receiver sends its CC header it places this latest recorded
   "TVal" in the "TEcho" header field, along with 2 delay values, "Echo
   Delay" and "Transmit Delay".  The "Echo Delay" value is the time
   delta, in microseconds, between the recorded arrival time of "TVal"
   and the current clock.  The second value, "Transmit Delay", is the
   receiver's current transmission delay on the tunnel (i.e., the
   average time between sending packets on its half of the IP-TFS
   tunnel).

   When the sender receives back its "TVal" in the "TEcho" header field
   it calculates 2 RTT estimates.  The first is the actual delay found
   by subtracting the "TEcho" value from its current clock and then
   subtracting "Echo Delay" as well.  The second RTT estimate is found
   by adding the received "Transmit Delay" header value to the sender's
   own transmission delay (i.e., the average time between sending
   packets on its half of the IP-TFS tunnel).  The larger of these 2 RTT
   estimates SHOULD be used as the "RTT" value.

   The two RTT estimates are required to handle different combinations
   of faster or slower tunnel packet paths with faster or slower fixed
   tunnel rates.  Choosing the larger of the two values guarantees that
   the "RTT" is never considered faster than the aggregate transmission
   delay based on the IP-TFS tunnel rate (the second estimate), as well
   as never being considered faster than the actual RTT along the tunnel
   packet path (the first estimate).
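   The "TVal"/"TEcho" exchange and the two RTT estimates can be
   sketched as below.  This is only an illustration under stated
   assumptions: the names "TEchoRecorder", "on_tval", "make_echo" and
   "estimate_rtt" are hypothetical, all times are taken to be integer
   microseconds, "TVal" is assumed to be a sender clock reading in the
   same unit, and timestamp wrap-around is ignored.

```python
class TEchoRecorder:
    """Receiver side: remember the latest TVal and when it first arrived."""

    def __init__(self):
        self.tval = None
        self.arrival_us = None

    def on_tval(self, tval, now_us):
        # Only the first receipt of a given TVal records the arrival time;
        # repeats of the same TVal must not update it.
        if tval != self.tval:
            self.tval = tval
            self.arrival_us = now_us

    def make_echo(self, now_us):
        """Return (TEcho, Echo Delay) for the next CC header."""
        return self.tval, now_us - self.arrival_us


def estimate_rtt(now_us, techo_us, echo_delay_us,
                 peer_transmit_delay_us, own_transmit_delay_us):
    """Sender side: return the larger of the two RTT estimates."""
    # First estimate: measured delay along the tunnel packet path.
    measured = now_us - techo_us - echo_delay_us
    # Second estimate: aggregate transmission delay of both tunnel halves.
    rate_based = peer_transmit_delay_us + own_transmit_delay_us
    return max(measured, rate_based)
```

   With a slow tunnel rate the second estimate dominates, for example
   "estimate_rtt(1000, 900, 60, 100, 100)" yields 200 rather than the
   measured 40.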
576 The receiver also calculates, and communicates in the "LossEventRate" 577 header field, the loss event rate for use by the sender. This is 578 slightly different from [RFC4342] which periodically sends all the 579 loss interval data back to the sender so that it can do the 580 calculation. See Appendix B for a suggested way to calculate the 581 loss event rate value. Initially this value will be zero (indicating 582 no loss) until enough data has been collected by the receiver to 583 update it. 585 3.1. ECN Support 587 In additional to normal packet loss information IP-TFS supports use 588 of the ECN bits in the encapsulating IP header [RFC3168] for 589 identifying congestion. If ECN use is enabled and a packet arrives 590 at the egress (receiving) side with the Congestion Experienced (CE) 591 value set, then the receiver considers that packet as being dropped, 592 although it does not drop it. The receiver MUST set the E bit in any 593 AGGFRAG_PAYLOAD payload header containing a "LossEventRate" value 594 derived from a CE value being considered. 596 As noted in [RFC3168] the ECN bits are not protected by IPsec and 597 thus may constitute a covert channel. For this reason, ECN use 598 SHOULD NOT be enabled by default. 600 4. Configuration 602 IP-TFS is meant to be deployable with a minimal amount of 603 configuration. All IP-TFS specific configuration should be specified 604 at the unidirectional tunnel ingress (sending) side. It is intended 605 that non-IKEv2 operation is supported, at least, with local static 606 configuration. 608 4.1. Bandwidth 610 Bandwidth is a local configuration option. For non-congestion 611 controlled mode, the bandwidth SHOULD be configured. For congestion 612 controlled mode, the bandwidth can be configured or the congestion 613 control algorithm discovers and uses the maximum bandwidth available. 614 No standardized configuration method is required. 616 4.2. 
Fixed Packet Size 618 The fixed packet size to be used for the tunnel encapsulation packets 619 MAY be configured manually or can be automatically determined using 620 other methods such as PLMTUD ([RFC4821], [RFC8899]) or PMTUD 621 ([RFC1191], [RFC8201]). As PMTUD is known to have issues, PLMTUD is 622 considered the more robust option. No standardized configuration 623 method is required. 625 4.3. Congestion Control 627 Congestion control is a local configuration option. No standardized 628 configuration method is required. 630 5. IKEv2 632 5.1. USE_AGGFRAG Notification Message 634 As mentioned previously, IP-TFS tunnels utilize ESP payloads of type 635 AGGFRAG_PAYLOAD. 637 When using IKEv2, a new "USE_AGGFRAG" Notification Message enables 638 the AGGFRAG_PAYLOAD payload on a child SA pair. The method used is 639 similar to how USE_TRANSPORT_MODE is negotiated, as described in 640 [RFC7296]. 642 To request use of the AGGFRAG_PAYLOAD payload on the Child SA pair, 643 the initiator includes the USE_AGGFRAG notification in an SA payload 644 requesting a new Child SA (either during the initial IKE_AUTH or 645 during CREATE_CHILD_SA exchanges). If the request is accepted then 646 the response MUST also include a notification of type USE_AGGFRAG. 647 If the responder declines the request the child SA will be 648 established without AGGFRAG_PAYLOAD payload use enabled. If this is 649 unacceptable to the initiator, the initiator MUST delete the child 650 SA. 652 As the use of the AGGFRAG_PAYLOAD payload is currently only defined 653 for non-transport mode tunnels, the USE_AGGFRAG notification MUST NOT 654 be combined with the USE_TRANSPORT_MODE notification. 656 The USE_AGGFRAG notification contains a 1 octet payload of flags that 657 specify requirements from the sender of the notification.
If any 658 requirement flags are not understood or cannot be supported by the 659 receiver then the receiver SHOULD NOT enable use of AGGFRAG_PAYLOAD 660 (either by not responding with the USE_AGGFRAG notification, or in 661 the case of the initiator, by deleting the child SA if the now 662 established non-AGGFRAG_PAYLOAD using SA is unacceptable). 664 The notification type and payload flag values are defined in 665 Section 6.1.4. 667 6. Packet and Data Formats 669 The packet and data formats defined below are generic with the intent 670 of allowing for non-IP-TFS uses, but such uses are outside the scope 671 of this document. 673 6.1. AGGFRAG_PAYLOAD Payload 675 ESP Next Header value: 0x5 677 An IP-TFS payload is identified by the ESP Next Header value 678 AGGFRAG_PAYLOAD which has the value 0x5. The value 5 was chosen to 679 not conflict with other used values. The first octet of this payload 680 indicates the format of the remaining payload data.
682   0 1 2 3 4 5 6 7
683  +-+-+-+-+-+-+-+-+-+-+-
684  |   Sub-type    | ...
685  +-+-+-+-+-+-+-+-+-+-+-
687 Sub-type: 688 An 8-bit value indicating the payload format. 690 This document defines 2 payload sub-types. These payload formats are 691 defined in the following sections. 693 6.1.1. Non-Congestion Control AGGFRAG_PAYLOAD Payload Format 695 The non-congestion control AGGFRAG_PAYLOAD payload is comprised of a 696 4 octet header followed by a variable amount of "DataBlocks" data as 697 shown below.
699                       1                   2                   3
700   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
701  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
702  |  Sub-Type (0) |   Reserved    |          BlockOffset          |
703  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
704  |                       DataBlocks ...
705  +-+-+-+-+-+-+-+-+-+-+-
707 Sub-type: 708 An octet indicating the payload format. For this non-congestion 709 control format, the value is 0. 711 Reserved: 712 An octet set to 0 on generation, and ignored on receipt.
714 BlockOffset: 715 A 16-bit unsigned integer counting the number of octets of 716 "DataBlocks" data before the start of a new data block. If the 717 start of a new data block occurs in a subsequent payload the 718 "BlockOffset" will point past the end of the "DataBlocks" data. 719 In this case all the "DataBlocks" data belongs to the current data 720 block being assembled. When the "BlockOffset" extends into 721 subsequent payloads it continues to only count "DataBlocks" data 722 (i.e., it does not count subsequent packets' non-"DataBlocks" data 723 such as header octets). 725 DataBlocks: 726 Variable number of octets that begins with the start of a data 727 block, or the continuation of a previous data block, followed by 728 zero or more additional data blocks. 730 6.1.2. Congestion Control AGGFRAG_PAYLOAD Payload Format 732 The congestion control AGGFRAG_PAYLOAD payload is comprised of a 24 733 octet header followed by a variable amount of "DataBlocks" data as 734 shown below.
736                       1                   2                   3
737   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
738  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
739  |  Sub-type (1) |  Reserved |P|E|          BlockOffset          |
740  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
741  |                         LossEventRate                         |
742  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
743  |                    RTT                    |    Echo Delay ...
744  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
745  ... Echo Delay        |             Transmit Delay              |
746  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
747  |                             TVal                              |
748  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
749  |                             TEcho                             |
750  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
751  |                       DataBlocks ...
752  +-+-+-+-+-+-+-+-+-+-+-
754 Sub-type: 755 An octet indicating the payload format. For this congestion 756 control format, the value is 1. 758 Reserved: 759 A 6-bit field set to 0 on generation, and ignored on receipt.
761 P: 762 A 1-bit value that, if set, indicates that PLMTUD probing is in progress. 763 This information can be used to avoid treating missing packets as 764 loss events by the CC algorithm when running the PLMTUD probe 765 algorithm. 767 E: 768 A 1-bit value that, if set, indicates that Congestion Experienced (CE) 769 ECN bits were received and used in deriving the reported 770 "LossEventRate". 772 BlockOffset: 773 The same value as the non-congestion controlled payload format 774 value. 776 LossEventRate: 777 A 32-bit value specifying the inverse of the current loss event 778 rate as calculated by the receiver. A value of zero indicates no 779 loss. Otherwise the loss event rate is "1/LossEventRate". 781 RTT: 782 A 22-bit value specifying the sender's current round-trip time 783 estimate in microseconds. The value MAY be zero prior to the 784 sender having calculated a round-trip time estimate. The value 785 SHOULD be set to zero on non-AGGFRAG_PAYLOAD enabled SAs. If the 786 value is equal to or larger than "0x3FFFFF" it MUST be set to 787 "0x3FFFFF". 789 Echo Delay: 790 A 21-bit value specifying the delay in microseconds incurred 791 between the receiver first receiving the "TVal" value and 792 sending it back in "TEcho". If the value is equal to or larger than 793 "0x1FFFFF" it MUST be set to "0x1FFFFF". 795 Transmit Delay: 796 A 21-bit value specifying the transmission delay in microseconds. 797 This is the fixed (or average) delay on the receiver between it 798 sending packets on the IP-TFS tunnel. If the value is equal to or 799 larger than "0x1FFFFF" it MUST be set to "0x1FFFFF". 801 TVal: 802 An opaque 32-bit value that will be echoed back by the receiver in 803 later packets in the "TEcho" field, along with an "Echo Delay" 804 value of how long that echo took. 806 TEcho: 807 The opaque 32-bit value from a received packet's "TVal" field.
808 The received "TVal" is placed in "TEcho" along with an "Echo 809 Delay" value indicating how long it has been since receiving the 810 "TVal" value. 812 DataBlocks: 813 Variable number of octets that begins with the start of a data 814 block, or the continuation of a previous data block, followed by 815 zero or more additional data blocks. For the special case of 816 sending congestion control information on a non-IP-TFS enabled SA 817 this value MUST be empty (i.e., be zero octets long). 819 6.1.3. Data Blocks
821                       1                   2                   3
822   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
823  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
824  | Type  | IPv4, IPv6 or pad...
825  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
827 Type: 828 A 4-bit field where 0x0 identifies a pad data block, 0x4 indicates 829 an IPv4 data block, and 0x6 indicates an IPv6 data block. 831 6.1.3.1. IPv4 Data Block
833                       1                   2                   3
834   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
835  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
836  |  0x4  |  IHL  | TypeOfService |          TotalLength          |
837  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
838  | Rest of the inner packet ...
839  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
841 These values are the actual values within the encapsulated IPv4 842 header. In other words, the start of this data block is the start of 843 the encapsulated IP packet. 845 Type: 846 A 4-bit value of 0x4 indicating IPv4 (i.e., first nibble of the 847 IPv4 packet). 849 TotalLength: 850 The 16-bit unsigned integer "Total Length" field of the IPv4 inner 851 packet. 853 6.1.3.2. IPv6 Data Block
855                       1                   2                   3
856   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
857  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
858  |  0x6  | TrafficClass  |               FlowLabel               |
859  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
860  |         PayloadLength         | Rest of the inner packet ...
861  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
863 These values are the actual values within the encapsulated IPv6 864 header. In other words, the start of this data block is the start of 865 the encapsulated IP packet. 867 Type: 868 A 4-bit value of 0x6 indicating IPv6 (i.e., first nibble of the 869 IPv6 packet). 871 PayloadLength: 872 The 16-bit unsigned integer "Payload Length" field of the inner 873 IPv6 packet. 875 6.1.3.3. Pad Data Block
877                       1                   2                   3
878   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
879  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
880  |  0x0  | Padding ...
881  +-+-+-+-+-+-+-+-+-+-+-
883 Type: 884 A 4-bit value of 0x0 indicating a padding data block. 886 Padding: 887 Extends to end of the encapsulating packet. 889 6.1.4. IKEv2 USE_AGGFRAG Notification Message 891 As discussed in Section 5.1, a notification message USE_AGGFRAG is 892 used to negotiate use of the ESP AGGFRAG_PAYLOAD Next Header value. 894 The USE_AGGFRAG Notification Message Status Type is (TBD2). 896 The notification payload contains 1 octet of requirement flags. 897 There are currently 2 requirement flags defined. This may be revised 898 by later specifications.
900  +-+-+-+-+-+-+-+-+
901  |0|0|0|0|0|0|C|D|
902  +-+-+-+-+-+-+-+-+
904 0: 905 6 bits - reserved, MUST be zero on send, unless defined by later 906 specifications. 908 C: 909 Congestion Control bit. If set, then the sender is requiring that 910 congestion control information MUST be returned to it periodically 911 as defined in Section 3. 913 D: 914 Don't Fragment bit. If set, indicates the sender of the notify 915 message does not support receiving packet fragments (i.e., inner 916 packets MUST be sent using a single "Data Block"). This value 917 only applies to what the sender is capable of receiving; the 918 sender MAY still send packet fragments unless similarly restricted 919 by the receiver in its USE_AGGFRAG notification. 921 7. IANA Considerations 923 7.1.
AGGFRAG_PAYLOAD Sub-Type Registry 925 This document requests IANA create a registry called "AGGFRAG_PAYLOAD 926 Sub-Type Registry" under a new category named "ESP AGGFRAG_PAYLOAD 927 Parameters". The registration policy for this registry is "Expert 928 Review" ([RFC8126] and [RFC7120]). 930 Name: 931 AGGFRAG_PAYLOAD Sub-Type Registry 933 Description: 934 AGGFRAG_PAYLOAD Payload Formats. 936 Reference: 937 This document 939 The initial content for this registry is as follows:
941  Sub-Type  Name                           Reference
942  --------------------------------------------------------
943  0         Non-Congestion Control Format  This document
944  1         Congestion Control Format      This document
945  2-255     Reserved
947 7.2. USE_AGGFRAG Notify Message Status Type 949 This document requests a status type USE_AGGFRAG be allocated from 950 the "IKEv2 Notify Message Types - Status Types" registry. 952 Value: 953 TBD2 955 Name: 956 USE_AGGFRAG 958 Reference: 959 This document 961 8. Security Considerations 963 This document describes a mechanism to add TFC to IP traffic. Use of 964 this mechanism is expected to increase the security of the traffic 965 being transported. Other than the additional security afforded by 966 using this mechanism, IP-TFS utilizes the security protocols 967 [RFC4303] and [RFC7296] and so their security considerations apply to 968 IP-TFS as well. 970 As noted in Section 3.1, the ECN bits are not protected by IPsec and 971 thus may constitute a covert channel. For this reason, ECN use 972 SHOULD NOT be enabled by default. 974 As noted previously in Section 2.4.2, for TFC to be fully maintained 975 the encapsulated traffic flow should not be affecting network 976 congestion in a predictable way, and if it would be then non- 977 congestion controlled mode use should be considered instead. 979 9. References 981 9.1. Normative References 983 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 984 Requirement Levels", BCP 14, RFC 2119, 985 DOI 10.17487/RFC2119, March 1997, 986 .
988 [RFC4303] Kent, S., "IP Encapsulating Security Payload (ESP)", 989 RFC 4303, DOI 10.17487/RFC4303, December 2005, 990 . 992 [RFC7296] Kaufman, C., Hoffman, P., Nir, Y., Eronen, P., and T. 993 Kivinen, "Internet Key Exchange Protocol Version 2 994 (IKEv2)", STD 79, RFC 7296, DOI 10.17487/RFC7296, October 995 2014, . 997 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 998 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 999 May 2017, . 1001 9.2. Informative References 1003 [AppCrypt] 1004 Schneier, B., "Applied Cryptography: Protocols, 1005 Algorithms, and Source Code in C", 11 2017. 1007 [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, 1008 DOI 10.17487/RFC0791, September 1981, 1009 . 1011 [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, 1012 DOI 10.17487/RFC1191, November 1990, 1013 . 1015 [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, 1016 "Definition of the Differentiated Services Field (DS 1017 Field) in the IPv4 and IPv6 Headers", RFC 2474, 1018 DOI 10.17487/RFC2474, December 1998, 1019 . 1021 [RFC2914] Floyd, S., "Congestion Control Principles", BCP 41, 1022 RFC 2914, DOI 10.17487/RFC2914, September 2000, 1023 . 1025 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 1026 of Explicit Congestion Notification (ECN) to IP", 1027 RFC 3168, DOI 10.17487/RFC3168, September 2001, 1028 . 1030 [RFC4301] Kent, S. and K. Seo, "Security Architecture for the 1031 Internet Protocol", RFC 4301, DOI 10.17487/RFC4301, 1032 December 2005, . 1034 [RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for 1035 Datagram Congestion Control Protocol (DCCP) Congestion 1036 Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342, 1037 DOI 10.17487/RFC4342, March 2006, 1038 . 1040 [RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU 1041 Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007, 1042 . 1044 [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. 
Widmer, "TCP 1045 Friendly Rate Control (TFRC): Protocol Specification", 1046 RFC 5348, DOI 10.17487/RFC5348, September 2008, 1047 . 1049 [RFC7120] Cotton, M., "Early IANA Allocation of Standards Track Code 1050 Points", BCP 100, RFC 7120, DOI 10.17487/RFC7120, January 1051 2014, . 1053 [RFC7510] Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black, 1054 "Encapsulating MPLS in UDP", RFC 7510, 1055 DOI 10.17487/RFC7510, April 2015, 1056 . 1058 [RFC8084] Fairhurst, G., "Network Transport Circuit Breakers", 1059 BCP 208, RFC 8084, DOI 10.17487/RFC8084, March 2017, 1060 . 1062 [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for 1063 Writing an IANA Considerations Section in RFCs", BCP 26, 1064 RFC 8126, DOI 10.17487/RFC8126, June 2017, 1065 . 1067 [RFC8200] Deering, S. and R. Hinden, "Internet Protocol, Version 6 1068 (IPv6) Specification", STD 86, RFC 8200, 1069 DOI 10.17487/RFC8200, July 2017, 1070 . 1072 [RFC8201] McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed., 1073 "Path MTU Discovery for IP version 6", STD 87, RFC 8201, 1074 DOI 10.17487/RFC8201, July 2017, 1075 . 1077 [RFC8229] Pauly, T., Touati, S., and R. Mantha, "TCP Encapsulation 1078 of IKE and IPsec Packets", RFC 8229, DOI 10.17487/RFC8229, 1079 August 2017, . 1081 [RFC8546] Trammell, B. and M. Kuehlewind, "The Wire Image of a 1082 Network Protocol", RFC 8546, DOI 10.17487/RFC8546, April 1083 2019, . 1085 [RFC8899] Fairhurst, G., Jones, T., Tuexen, M., Ruengeler, I., and 1086 T. Voelker, "Packetization Layer Path MTU Discovery for 1087 Datagram Transports", RFC 8899, DOI 10.17487/RFC8899, 1088 September 2020, . 1090 Appendix A. Example Of An Encapsulated IP Packet Flow 1092 Below an example inner IP packet flow within the encapsulating tunnel 1093 packet stream is shown. Notice how encapsulated IP packets can start 1094 and end anywhere, and more than one or less than 1 may occur in a 1095 single encapsulating packet. 
1097  Offset: 0       Offset: 100     Offset: 2900    Offset: 1400
1098  [ ESP1  (1500) ][ ESP2  (1500) ][ ESP3  (1500) ][ ESP4  (1500) ]
1099  [--800--][--800--][60][-240-][--4000----------------------][pad]
1101 Figure 3: Inner and Outer Packet Flow
1103 The encapsulated IP packet flow (lengths include IP header and 1104 payload) is as follows: an 800 octet packet, an 800 octet packet, a 1105 60 octet packet, a 240 octet packet, a 4000 octet packet. 1107 The "BlockOffset" values in the 4 IP-TFS payload headers for this 1108 packet flow would thus be: 0, 100, 2900, 1400 respectively. The 1109 first encapsulating packet ESP1 has a zero "BlockOffset" which points 1110 at the IP data block immediately following the IP-TFS header. The 1111 following packet ESP2's "BlockOffset" points inward 100 octets to the 1112 start of the 60 octet data block. The third encapsulating packet 1113 ESP3 contains the middle portion of the 4000 octet data block so the 1114 offset points past its end and into the fourth encapsulating packet. 1115 The fourth packet ESP4's offset is 1400 pointing at the padding which 1116 follows the completion of the continued 4000 octet packet. 1118 Appendix B. A Send and Loss Event Rate Calculation 1120 The current best practice indicates that congestion control SHOULD be 1121 done in a TCP friendly way. A TCP friendly congestion control 1122 algorithm is described in [RFC5348]. For this IP-TFS use case (as 1123 with [RFC4342]) the (fixed) packet size is used as the segment size 1124 for the algorithm. The main formula in the algorithm for the send 1125 rate is then as follows:
1127                             1
1128  X = -----------------------------------------------
1129      R * (sqrt(2*p/3) + 12*sqrt(3*p/8)*p*(1+32*p^2))
1131 Where "X" is the send rate in packets per second, "R" is the round 1132 trip time estimate and "p" is the loss event rate (the inverse of 1133 which is provided by the receiver).
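The send-rate formula above can be evaluated directly from the two header fields the sender receives. The following Python sketch is illustrative only: the function name and the choice to return None when no rate can be computed are assumptions, not part of the specification. It takes "R" from the "RTT" field (microseconds) and derives "p" as the inverse of the "LossEventRate" field.

```python
import math


def tfrc_send_rate_pps(rtt_us, loss_event_rate_field):
    """Return the send rate X in packets per second, or None.

    rtt_us is the round-trip time estimate in microseconds;
    loss_event_rate_field is the received "LossEventRate" value,
    i.e. the inverse of the loss event rate p.
    """
    # A zero field means no loss has been reported, and a zero RTT
    # means no estimate exists yet; the formula is undefined in both
    # cases, so the caller has to fall back to something else
    # (hypothetical behaviour, e.g. the slow start of RFC 5348).
    if loss_event_rate_field == 0 or rtt_us == 0:
        return None

    p = 1.0 / loss_event_rate_field
    r = rtt_us / 1_000_000.0  # seconds

    denominator = r * (math.sqrt(2 * p / 3)
                       + 12 * math.sqrt(3 * p / 8) * p * (1 + 32 * p ** 2))
    return 1.0 / denominator
```

For example, a "LossEventRate" of 100 (p = 0.01) with a 100 ms RTT yields a rate of roughly 112 packets per second, and a higher loss event rate yields a lower rate.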
1135 In addition, the algorithm in [RFC5348] uses an "X_recv" value 1136 (the receiver's receive rate). For IP-TFS one MAY set this value 1137 according to the sender's current tunnel send-rate ("X"). 1139 The IP-TFS receiver, having the RTT estimate from the sender, can use 1140 the same method as described in [RFC5348] and [RFC4342] to collect 1141 the loss intervals and calculate the loss event rate value using the 1142 weighted average as indicated. The receiver communicates the inverse 1143 of this value back to the sender in the AGGFRAG_PAYLOAD payload 1144 header field "LossEventRate". 1146 The IP-TFS sender now has both the "R" and "p" values and can 1147 calculate the correct sending rate. If following [RFC5348] the 1148 sender should also use the slow start mechanism described therein 1149 when the IP-TFS SA is first established. 1151 Appendix C. Comparisons of IP-TFS 1153 C.1. Comparing Overhead 1155 To compare overhead, the overhead of ESP for both normal and IP-TFS 1156 tunnel packets must be calculated, and so an algorithm for encryption 1157 and authentication must be chosen. For the data below AES-GCM-256 1158 was selected. This leads to an IP+ESP overhead of 54 octets. 1160 54 = 20 (IP) + 8 (ESPH) + 2 (ESPF) + 8 (IV) + 16 (ICV) 1162 Additionally, for IP-TFS, non-congestion control AGGFRAG_PAYLOAD 1163 headers were chosen, which adds 4 octets for a total overhead of 58. 1165 C.1.1. IP-TFS Overhead 1167 For comparison the overhead of IP-TFS is 58 octets per outer packet. 1168 Therefore the octet overhead per inner packet is 58 multiplied by the 1169 number of outer packets required (fractional allowed). The overhead 1170 as a percentage of inner packet size is a constant based on the Outer 1171 MTU size.
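As a rough check of the IP-TFS overhead figures in this appendix, the calculation can be sketched in Python. The names are hypothetical, and the sketch assumes the outer payload size is the outer MTU minus the 58 octets of per-packet overhead, which is consistent with the PSize values tabulated in this appendix.

```python
IP_ESP_OVERHEAD = 54   # 20 (IP) + 8 (ESPH) + 2 (ESPF) + 8 (IV) + 16 (ICV)
AGGFRAG_HEADER = 4     # non-congestion control AGGFRAG_PAYLOAD header
TOTAL_OVERHEAD = IP_ESP_OVERHEAD + AGGFRAG_HEADER  # 58 octets per outer packet


def iptfs_overhead_octets(inner_size, outer_mtu):
    """Overhead octets attributable to one inner packet."""
    outer_payload = outer_mtu - TOTAL_OVERHEAD
    # An inner packet occupies a (possibly fractional) number of outer
    # packets, each of which costs TOTAL_OVERHEAD octets.
    return TOTAL_OVERHEAD * inner_size / outer_payload


def iptfs_overhead_percent(inner_size, outer_mtu):
    """Overhead as a percentage of the inner packet size."""
    return 100.0 * iptfs_overhead_octets(inner_size, outer_mtu) / inner_size
```

Because the inner packet size cancels out of the percentage, the result depends only on the outer MTU: about 11.20% at 576, 4.02% at 1500, and 0.65% at 9000.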
1173  OH = 58 / (Outer Payload Size / Inner Packet Size)
1174  OH % of Inner Packet Size = 100 * OH / Inner Packet Size
1175  OH % of Inner Packet Size = 5800 / Outer Payload Size

1177  Type     IP-TFS   IP-TFS   IP-TFS
1178  MTU         576     1500     9000
1179  PSize       518     1442     8942
1180  -------------------------------
1181  40       11.20%    4.02%    0.65%
1182  576      11.20%    4.02%    0.65%
1183  1500     11.20%    4.02%    0.65%
1184  9000     11.20%    4.02%    0.65%
1186 Figure 4: IP-TFS Overhead as Percentage of Inner Packet Size
1188 C.1.2. ESP with Padding Overhead 1190 The overhead per inner packet for constant-send-rate padded ESP 1191 (i.e., traditional IPsec TFC) is 36 octets plus any padding, unless 1192 fragmentation is required. 1194 When fragmentation of the inner packet is required to fit in the 1195 outer IPsec packet, overhead is the number of outer packets required 1196 to carry the fragmented inner packet times both the inner IP overhead 1197 (20) and the outer packet overhead (54) minus the initial inner IP 1198 overhead plus any required tail padding in the last encapsulation 1199 packet. The required tail padding is the number of required packets 1200 times the difference of the Outer Payload Size and the IP Overhead 1201 minus the Inner Payload Size. So:
1203  Inner Payload Size = IP Packet Size - IP Overhead
1204  Outer Payload Size = MTU - IPsec Overhead

1206                Inner Payload Size
1207  NF0 = ----------------------------------
1208         Outer Payload Size - IP Overhead

1210  NF = CEILING(NF0)

1212  OH = NF * (IP Overhead + IPsec Overhead)
1213       - IP Overhead
1214       + NF * (Outer Payload Size - IP Overhead)
1215       - Inner Payload Size

1217  OH = NF * (IPsec Overhead + Outer Payload Size)
1218       - (IP Overhead + Inner Payload Size)

1220  OH = NF * (IPsec Overhead + Outer Payload Size)
1221       - Inner Packet Size
1223 C.2. Overhead Comparison 1225 The following tables collect the overhead values for some common L3 1226 MTU sizes in order to compare them.
The first table is the number of 1227 octets of overhead for a given L3 MTU sized packet. The second table 1228 is the percentage of overhead in the same MTU sized packet. 1230 XXX rerun these.
1232  Type     ESP+Pad  ESP+Pad  ESP+Pad   IP-TFS   IP-TFS   IP-TFS
1233  L3 MTU       576     1500     9000      576     1500     9000
1234  PSize        522     1446     8946      518     1442     8942
1235  -----------------------------------------------------------
1236  40           482     1406     8906      4.5      1.6      0.3
1237  128          394     1318     8818     14.3      5.1      0.8
1238  256          266     1190     8690     28.7     10.3      1.7
1239  518            4      928     8428     58.0     20.8      3.4
1240  576          576      870     8370     64.5     23.2      3.7
1241  1442         286        4     7504    161.5     58.0      9.4
1242  1500         228     1500     7446    168.0     60.3      9.7
1243  8942        1426     1558        4   1001.2    359.7     58.0
1244  9000        1368     1500     9000   1007.7    362.0     58.4
1246 Figure 5: Overhead comparison in octets
1248  Type      ESP+Pad   ESP+Pad   ESP+Pad    IP-TFS    IP-TFS    IP-TFS
1249  MTU           576      1500      9000       576      1500      9000
1250  PSize         522      1446      8946       518      1442      8942
1251  -----------------------------------------------------------
1252  40        1205.0%   3515.0%  22265.0%    11.20%     4.02%     0.65%
1253  128        307.8%   1029.7%   6889.1%    11.20%     4.02%     0.65%
1254  256        103.9%    464.8%   3394.5%    11.20%     4.02%     0.65%
1255  518          0.8%    179.2%   1627.0%    11.20%     4.02%     0.65%
1256  576        100.0%    151.0%   1453.1%    11.20%     4.02%     0.65%
1257  1442        19.8%      0.3%    520.4%    11.20%     4.02%     0.65%
1258  1500        15.2%    100.0%    496.4%    11.20%     4.02%     0.65%
1259  8942        15.9%     17.4%      0.0%    11.20%     4.02%     0.65%
1260  9000        15.2%     16.7%    100.0%    11.20%     4.02%     0.65%
1262 Figure 6: Overhead as Percentage of Inner Packet Size
1264 C.3. Comparing Available Bandwidth 1266 Another way to compare the two solutions is to look at the amount of 1267 available bandwidth each solution provides. The following sections 1268 consider and compare the percentage of available bandwidth. For the 1269 sake of providing a well understood baseline normal (unencrypted) 1270 Ethernet as well as normal ESP values are included. 1272 C.3.1. Ethernet 1274 In order to calculate the available bandwidth the per packet overhead 1275 is calculated first.
The total overhead of Ethernet is 14+4 octets 1276 of header and CRC plus an additional 20 octets of framing (preamble, 1277 start, and inter-packet gap) for a total of 38 octets. Additionally, 1278 the minimum payload is 46 octets.
1280  Size    E + P  E + P  E + P  IPTFS  IPTFS  IPTFS   Enet    ESP
1281  MTU       590   1514   9014    590   1514   9014    any    any
1282  OH         92     92     92     96     96     96     38     74
1283  ------------------------------------------------------------
1284  40        614   1538   9038     47     42     40     84    114
1285  128       614   1538   9038    151    136    129    166    202
1286  256       614   1538   9038    303    273    258    294    330
1287  518       614   1538   9038    614    552    523    574    610
1288  576      1228   1538   9038    682    614    582    614    650
1289  1442     1842   1538   9038   1709   1538   1457   1498   1534
1290  1500     1842   3076   9038   1777   1599   1516   1538   1574
1291  8942    11052  10766   9038  10599   9537   9038   8998   9034
1292  9000    11052  10766  18076  10667   9599   9096   9038   9074
1294 Figure 7: L2 Octets Per Packet
1296  Size    E + P  E + P  E + P  IPTFS  IPTFS  IPTFS   Enet    ESP
1297  MTU       590   1514   9014    590   1514   9014    any    any
1298  OH         92     92     92     96     96     96     38     74
1299  --------------------------------------------------------------
1300  40       2.0M   0.8M   0.1M  26.4M  29.3M  30.9M  14.9M  11.0M
1301  128      2.0M   0.8M   0.1M   8.2M   9.2M   9.7M   7.5M   6.2M
1302  256      2.0M   0.8M   0.1M   4.1M   4.6M   4.8M   4.3M   3.8M
1303  518      2.0M   0.8M   0.1M   2.0M   2.3M   2.4M   2.2M   2.1M
1304  576      1.0M   0.8M   0.1M   1.8M   2.0M   2.1M   2.0M   1.9M
1305  1442     678K   812K   138K   731K   812K   857K   844K   824K
1306  1500     678K   406K   138K   703K   781K   824K   812K   794K
1307  8942     113K   116K   138K   117K   131K   138K   139K   138K
1308  9000     113K   116K    69K   117K   130K   137K   138K   137K
1310 Figure 8: Packets Per Second on 10G Ethernet
1312  Size     E + P   E + P   E + P   IPTFS   IPTFS   IPTFS    Enet     ESP
1313  MTU        590    1514    9014     590    1514    9014     any     any
1314  OH          92      92      92      96      96      96      38      74
1315  ----------------------------------------------------------------------
1316  40       6.51%   2.60%   0.44%  84.36%  93.76%  98.94%  47.62%  35.09%
1317  128     20.85%   8.32%   1.42%  84.36%  93.76%  98.94%  77.11%  63.37%
1318  256     41.69%  16.64%   2.83%  84.36%  93.76%  98.94%  87.07%  77.58%
1319  518     84.36%  33.68%
                               5.73%  84.36%  93.76%  98.94%  93.17%  87.50%
1320  576     46.91%  37.45%   6.37%  84.36%  93.76%  98.94%  93.81%  88.62%
1321  1442    78.28%  93.76%  15.95%  84.36%  93.76%  98.94%  97.43%  95.12%
1322  1500    81.43%  48.76%  16.60%  84.36%  93.76%  98.94%  97.53%  95.30%
1323  8942    80.91%  83.06%  98.94%  84.36%  93.76%  98.94%  99.58%  99.18%
1324  9000    81.43%  83.60%  49.79%  84.36%  93.76%  98.94%  99.58%  99.18%
1326 Figure 9: Percentage of Bandwidth on 10G Ethernet
1328 A sometimes unexpected result of using IP-TFS (or any packet 1329 aggregating tunnel) is that, for small to medium sized packets, the 1330 available bandwidth is actually greater than native Ethernet. This 1331 is due to the reduction in Ethernet framing overhead. This increased 1332 bandwidth is paid for with an increase in latency. This latency is 1333 the time to send the unrelated octets in the outer tunnel frame. The 1334 following table illustrates the latency for some common values on a 1335 10G Ethernet link. The table also includes latency introduced by 1336 padding if using ESP with padding.
1338           ESP+Pad   ESP+Pad    IP-TFS    IP-TFS
1339              1500      9000      1500      9000
1341  ------------------------------------------
1342  40       1.12 us   7.12 us   1.17 us   7.17 us
1343  128      1.05 us   7.05 us   1.10 us   7.10 us
1344  256      0.95 us   6.95 us   1.00 us   7.00 us
1345  518      0.74 us   6.74 us   0.79 us   6.79 us
1346  576      0.70 us   6.70 us   0.74 us   6.74 us
1347  1442     0.00 us   6.00 us   0.05 us   6.05 us
1348  1500     1.20 us   5.96 us   0.00 us   6.00 us
1350 Figure 10: Added Latency
1352 Notice that the latency values are very similar between the two 1353 solutions; however, whereas IP-TFS provides for constant high 1354 bandwidth, in some cases even exceeding native Ethernet, ESP with 1355 padding often greatly reduces available bandwidth. 1357 Appendix D. Acknowledgements 1359 We would like to thank Don Fedyk for help in reviewing and editing 1360 this work.
We would also like to thank Sean Turner and Valery 1361 Smyslov for reviews and many suggestions for improvements, as well as 1362 Joseph Touch for the transport area review and suggested 1363 improvements. 1365 Appendix E. Contributors 1367 The following people made significant contributions to this document. 1369 Lou Berger 1370 LabN Consulting, L.L.C. 1372 Email: lberger@labn.net 1374 Author's Address 1376 Christian Hopps 1377 LabN Consulting, L.L.C. 1379 Email: chopps@chopps.org