idnits 2.17.00 (12 Aug 2021)

/tmp/idnits21841/draft-ietf-6lo-fragment-recovery-14.txt:

Checking boilerplate required by RFC 5378 and the IETF Trust (see
https://trustee.ietf.org/license-info):
----------------------------------------------------------------------------

   No issues found here.

Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
----------------------------------------------------------------------------

   No issues found here.

Checking nits according to https://www.ietf.org/id-info/checklist :
----------------------------------------------------------------------------

   No issues found here.

Miscellaneous warnings:
----------------------------------------------------------------------------

== The copyright year in the IETF Trust and authors Copyright Line does not
   match the current year

-- The exact meaning of the all-uppercase expression 'NOT REQUIRED' is not
   defined in RFC 2119. If it is intended as a requirements expression, it
   should be rewritten using one of the combinations defined in RFC 2119;
   otherwise it should not be all-uppercase.

   (Using the creation date from RFC4944, updated by this document, for
   RFC5378 checks: 2005-07-13)

-- The document seems to lack a disclaimer for pre-RFC5378 work, but may
   have content which was first submitted before 10 November 2008. If you
   have contacted all the original authors and they are all willing to
   grant the BCP78 rights to the IETF Trust, then this is fine, and you can
   ignore this comment. If not, you may need to add the pre-RFC5378
   disclaimer. (See the Legal Provisions document at
   https://trustee.ietf.org/license-info for more information.)

-- The document date (6 March 2020) is 806 days in the past. Is this
   intentional?

Checking references for intended status: Proposed Standard
----------------------------------------------------------------------------

   (See RFCs 3967 and 4897 for information about using normative references
   to lower-maturity documents in RFCs)

== Outdated reference: draft-ietf-6lo-minimal-fragment has been published as
   RFC 8930

== Outdated reference: A later version (-02) exists of
   draft-ietf-lwig-6lowpan-virtual-reassembly-01

== Outdated reference: draft-ietf-intarea-frag-fragile has been published as
   RFC 8900

== Outdated reference: draft-ietf-6tisch-architecture has been published as
   RFC 9030

Summary: 0 errors (**), 0 flaws (~~), 5 warnings (==), 3 comments (--).

Run idnits with the --verbose option for more detailed information about
the items above.

--------------------------------------------------------------------------------

6lo                                                      P. Thubert, Ed.
Internet-Draft                                             Cisco Systems
Updates: 4944 (if approved)                                 6 March 2020
Intended status: Standards Track
Expires: 7 September 2020

                  6LoWPAN Selective Fragment Recovery
                  draft-ietf-6lo-fragment-recovery-14

Abstract

This draft updates RFC 4944 with a simple protocol to recover
individual fragments across a route-over mesh network, with a minimal
flow control to protect the network against bloat.

Status of This Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current
Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

This Internet-Draft will expire on 7 September 2020.

Copyright Notice

Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.

Table of Contents

1. Introduction
2. Terminology
  2.1. BCP 14
  2.2. References
  2.3. New Terms
3. Updating RFC 4944
4. Extending draft-ietf-6lo-minimal-fragment
  4.1. Slack in the First Fragment
  4.2. Gap between frames
  4.3. Flow Control
  4.4. Modifying the First Fragment
5. New Dispatch types and headers
  5.1. Recoverable Fragment Dispatch type and Header
  5.2. RFRAG Acknowledgment Dispatch type and Header
6. Fragment Recovery
  6.1. Forwarding Fragments
    6.1.1. Receiving the first fragment
    6.1.2. Receiving the next fragments
  6.2. Receiving RFRAG Acknowledgments
  6.3. Aborting the Transmission of a Fragmented Packet
  6.4. Applying Recoverable Fragmentation along a Diverse Path
7. Management Considerations
  7.1. Protocol Parameters
  7.2. Observing the network
8. Security Considerations
9. IANA Considerations
10. Acknowledgments
11. Normative References
12. Informative References
Appendix A. Rationale
Appendix B. Requirements
Appendix C. Considerations on Flow Control
Author's Address

1. Introduction

In most Low Power and Lossy Network (LLN) applications, the bulk of
the traffic consists of small chunks of data (on the order of a few
bytes to a few tens of bytes) at a time. Given that an IEEE Std.
802.15.4 [IEEE.802.15.4] frame can carry a payload of 74 bytes or
more, fragmentation is usually not required. However, and though
this happens only occasionally, a number of mission critical
applications do require the capability to transfer larger chunks of
data, for instance to support the firmware upgrade of the LLN nodes
or the extraction of logs from LLN nodes. In the former case, the
large chunk of data is transferred to the LLN node, whereas in the
latter, the large chunk flows away from the LLN node.
In both cases, the size can be on the order of 10 kilobytes or more
and an end-to-end reliable transport is required.

"Transmission of IPv6 Packets over IEEE 802.15.4 Networks" [RFC4944]
defines the original 6LoWPAN datagram fragmentation mechanism for
LLNs. One critical issue with this original design is that routing
an IPv6 [RFC8200] packet across a route-over mesh requires
reassembling the full packet at each hop, which may cause latency
along a path and an overall buffer bloat in the network. The "6TiSCH
Architecture" [I-D.ietf-6tisch-architecture] recommends using a
fragment forwarding (FF) technique to alleviate those undesirable
effects.

"LLN Minimal Fragment Forwarding" [FRAG-FWD] specifies the general
behavior that all FF techniques including this specification follow,
and presents the associated caveats. In particular, the routing
information is fully indicated in the first fragment, which is always
forwarded first. A state is formed and used to forward all the next
fragments along the same path. The Datagram_Tag is locally
significant to the Layer-2 source of the packet and is swapped at
each hop (see Section 6). With this specification, the Datagram_Tag
is encoded in one byte, and will saturate if more than 256 datagrams
transit in fragmented form over the same hop at the same time. This
is not realistic at the time of this writing. Should this happen in
a new 6LoWPAN technology, a node will need to use several Link-Layer
addresses to increase its indexing capacity.

"Virtual reassembly buffers in 6LoWPAN"
[I-D.ietf-lwig-6lowpan-virtual-reassembly] (VRB) proposes an FF
technique that is compatible with [RFC4944] without the need to
define a new protocol.
However, adding that capability alone to the local implementation of
the original 6LoWPAN fragmentation would not address the inherent
fragility of fragmentation (see [I-D.ietf-intarea-frag-fragile]), in
particular the issues of resources locked on the receiver and the
wasted transmissions due to the loss of a single fragment in a whole
datagram. [Kent] compares the unreliable delivery of fragments with
a mechanism it calls "selective acknowledgements" that recovers the
loss of a fragment individually. The paper illustrates the benefits
that can be derived from such a method in figures 1, 2 and 3, on
pages 6 and 7. [RFC4944] has no selective recovery and the whole
datagram fails when one fragment is not delivered to the destination
6LoWPAN endpoint. Constrained memory resources are blocked on the
receiver until the receiver times out, possibly causing the loss of
subsequent packets that cannot be received for the lack of buffers.

That problem is exacerbated when forwarding fragments over multiple
hops since a loss at an intermediate hop will not be discovered by
either the source or the destination, and the source will keep on
sending fragments, wasting even more resources in the network and
possibly contributing to the condition that caused the loss, to no
avail since the datagram cannot arrive in its entirety. RFC 4944 is
also missing signaling to abort a multi-fragment transmission at any
time and from either end, and, if the capability to forward fragments
is implemented, to clean up the related state in the network. It
also lacks flow control capabilities to avoid participating in
congestion that may in turn cause the loss of a fragment and
potentially the retransmission of the full datagram.

This specification provides a method to forward fragments over
typically a few hops in a route-over 6LoWPAN mesh, and a selective
acknowledgment to recover individual fragments between 6LoWPAN
endpoints. The method is designed to limit congestion loss in the
network and addresses the requirements that are detailed in
Appendix B.

2. Terminology

2.1. BCP 14

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119][RFC8174] when, and only when, they appear in all
capitals, as shown here.

2.2. References

In this document, readers will encounter terms and concepts that are
discussed in "Problem Statement and Requirements for IPv6 over
Low-Power Wireless Personal Area Network (6LoWPAN) Routing" [RFC6606].

"LLN Minimal Fragment Forwarding" [FRAG-FWD] introduces the generic
concept of a Virtual Reassembly Buffer (VRB) and specifies behaviours
and caveats that are common to a large family of FF techniques
including this one, which fully inherits from that specification. It
also defines terms used in this document: 6LoWPAN endpoints,
Compressed Form, Datagram_Tag, Datagram_Size, and Fragment_Offset.

Past experience with fragmentation has shown that misassociated or
lost fragments can lead to poor network behavior and, occasionally,
trouble at the application layer. The reader is encouraged to read
"IPv4 Reassembly Errors at High Data Rates" [RFC4963] and follow the
references for more information.

That experience led to the definition of the "Path MTU discovery"
[RFC8201] (PMTUD) protocol that limits fragmentation over the
Internet.

Specifically in the case of UDP, valuable additional information can
be found in "UDP Usage Guidelines for Application Designers"
[RFC8085].

Readers are expected to be familiar with all the terms and concepts
that are discussed in "IPv6 over Low-Power Wireless Personal Area
Networks (6LoWPANs): Overview, Assumptions, Problem Statement, and
Goals" [RFC4919] and "Transmission of IPv6 Packets over IEEE 802.15.4
Networks" [RFC4944].

"The Benefits of Using Explicit Congestion Notification (ECN)"
[RFC8087] provides useful information on the potential benefits and
pitfalls of using ECN.

Quoting the "Multiprotocol Label Switching (MPLS) Architecture"
[RFC3031]: with MPLS, 'packets are "labeled" before they are
forwarded' along a Label Switched Path (LSP). "At subsequent hops,
there is no further analysis of the packet's network layer header.
Rather, the label is used as an index into a table which specifies
the next hop, and a new label". The MPLS technique is leveraged in
the present specification to forward fragments that actually do not
have a network layer header, since the fragmentation occurs below IP.

2.3. New Terms

This specification uses the following terms:

RFRAG: Recoverable Fragment

RFRAG-ACK: Recoverable Fragment Acknowledgment

RFRAG Acknowledgment Request: An RFRAG with the Acknowledgment
   Request flag ('X' flag) set.

NULL bitmap: Refers to a bitmap with all bits set to zero.

FULL bitmap: Refers to a bitmap with all bits set to one.

Forward: The direction of an LSP path, followed by the RFRAG.

Reverse: The reverse direction of an LSP path, taken by the
   RFRAG-ACK.

3. Updating RFC 4944

This specification updates the fragmentation mechanism that is
specified in "Transmission of IPv6 Packets over IEEE 802.15.4
Networks" [RFC4944] for use in route-over LLNs by providing a model
where fragments can be forwarded end-to-end across a 6LoWPAN LLN, and
where fragments that are lost on the way can be recovered
individually.
A new format for fragments is introduced and new dispatch types are
defined in Section 5.

[RFC8138] allows modifying the size of a packet en route by removing
the consumed hops in a compressed Routing Header. This requires that
Fragment_Offset and Datagram_Size (see Section 2.2) are also modified
en route, which is difficult to do in the uncompressed form. This
specification expresses those fields in the Compressed Form, which
makes it easy to modify them en route (see Section 4.4).

Note that consistent with Section 2 of [RFC6282], for the
fragmentation mechanism described in Section 5.3 of [RFC4944], any
header that cannot fit within the first fragment MUST NOT be
compressed when using the fragmentation mechanism described in this
specification.

4. Extending draft-ietf-6lo-minimal-fragment

This specification implements the generic FF technique defined in
"LLN Minimal Fragment Forwarding" [FRAG-FWD], and provides end-to-end
fragment recovery and mechanisms that can be used for flow control.

4.1. Slack in the First Fragment

[FRAG-FWD] allows for refragmenting in intermediate nodes, meaning
that some bytes from a given fragment may be left in the VRB to be
added to the next fragment. The reason for this happening would be
the need for space in the outgoing fragment that was not needed in
the incoming fragment, for instance because the 6LoWPAN Header
Compression is not as efficient on the outgoing link, e.g., if the
Interface ID (IID) of the source IPv6 address is elided by the
originator on the first hop because it matches the source Link-Layer
address, but cannot be on the next hops because the source Link-Layer
address changes.

This specification cannot allow this operation since fragments are
recovered end-to-end based on a sequence number.
This means that the fragments that contain a 6LoWPAN-compressed
header MUST have enough slack to enable a less efficient compression
in the next hops that still fits in one MAC frame. For instance, if
the IID of the source IPv6 address is elided by the originator, then
it MUST compute the Fragment_Size as if the MTU was 8 bytes less.
This way, the next hop can restore the source IID to the first
fragment without impacting the second fragment.

4.2. Gap between frames

[FRAG-FWD] requires that a configurable interval of time is inserted
between transmissions to the same next hop and in particular between
fragments of the same datagram. In the case of half duplex
interfaces, this inter-frame gap ensures that the next hop is done
forwarding the previous frame and is capable of receiving the next
one.

In the case of a mesh operating at a single frequency with
omnidirectional antennas, a larger inter-frame gap is required to
protect the frame against hidden terminal collisions with the
previous frame of the same flow that is still progressing along a
common path.

The inter-frame gap is useful even for unfragmented datagrams, but it
becomes a necessity for fragments that are typically generated in a
fast sequence and are all sent over the exact same path.

4.3. Flow Control

The inter-frame gap is the only protection that [FRAG-FWD] imposes by
default. This document makes it possible to group fragments in
windows and request intermediate acknowledgments so the number of
in-flight fragments can be bounded. This document also adds an ECN
mechanism that can be used to adapt the size of the window, the size
of the fragments, and/or the inter-frame gap to protect the network.

This specification enables the source endpoint to apply a flow
control mechanism to tune those parameters, but the mechanism itself
is out of scope.
The expectation is that most datagrams will consist of only a few
fragments, and that only the last fragment will be acknowledged. A
basic implementation of the source endpoint is NOT REQUIRED to vary
the size of the window, the duration of the inter-frame gap or the
size of a fragment in the middle of the transmission of a datagram,
and it MAY ignore the ECN signal or simply reset the window to 1 (see
Appendix C for more) until the end of this datagram upon detecting
congestion.

The size of the fragments is typically computed from the Link MTU to
maximize the size of the resulting frames. The size of the window
and the duration of the inter-frame gap SHOULD be configurable, to
roughly adapt the size of the window to the number of hops in an
average path, and to follow the general recommendations in
[FRAG-FWD], respectively.

4.4. Modifying the First Fragment

The compression of the Hop Limit, of the source and destination
addresses in the IPv6 Header, and of the Routing Header may change en
route in a Route-Over mesh LLN. If the size of the first fragment is
modified, then the intermediate node MUST adapt the Datagram_Size to
reflect that difference.

The intermediate node MUST also save the difference of Datagram_Size
of the first fragment in the VRB and add it to the Datagram_Size and
to the Fragment_Offset of all the subsequent fragments for that
datagram.

5. New Dispatch types and headers

This document specifies an alternative to the 6LoWPAN fragmentation
sublayer [RFC4944] to emulate a Link MTU up to 2048 bytes for the
upper layer, which can be the 6LoWPAN Header Compression sublayer
that is defined in the "Compression Format for IPv6 Datagrams"
[RFC6282] specification.
This specification also provides a reliable transmission of the
fragments over a multihop 6LoWPAN route-over mesh network and a
minimal flow control to reduce the chances of congestion loss.

A LoWPAN Fragment Forwarding [FRAG-FWD] technique derived from MPLS
enables the forwarding of individual fragments across a 6LoWPAN
route-over mesh without reassembly at each hop. The Datagram_Tag is
used as a label; it is locally unique to the node that owns the
source Link-Layer address of the fragment, so together the Link-Layer
address and the label can identify the fragment globally. A node may
build the Datagram_Tag in its own locally-significant way, as long as
the chosen Datagram_Tag stays unique to the particular datagram for
the lifetime of that datagram. The result is that the label does not
need to be globally unique but also that it must be swapped at each
hop as the source Link-Layer address changes.

This specification extends RFC 4944 [RFC4944] with two new Dispatch
types: one for the Recoverable Fragment (RFRAG) and one for the RFRAG
Acknowledgment that is sent back. The new 6LoWPAN Dispatch types are
taken from Page 0 [RFC8025] as indicated in Table 1 in Section 9.

In the following sections, a "Datagram_Tag" extends the semantics
defined in Section 5.3 of [RFC4944], "Fragmentation Type and Header".
The Datagram_Tag is a locally unique identifier for the datagram from
the perspective of the sender. This means that the Datagram_Tag
identifies a datagram uniquely in the network when associated with
the source of the datagram. As the datagram gets forwarded, the
source changes and the Datagram_Tag must be swapped as detailed in
[FRAG-FWD].

5.1. Recoverable Fragment Dispatch type and Header

In this specification, if the packet is compressed then the size and
offset of the fragments are expressed with respect to the Compressed
Form of the packet as opposed to the uncompressed (native) packet
form.

The format of the fragment header is shown in Figure 1. It is the
same for all fragments. The format has a length and an offset, as
well as a Sequence field. This would be redundant if the offset was
computed as the product of the Sequence by the length, but this is
not the case. The position of a fragment in the reassembly buffer is
neither correlated with the value of the Sequence field nor with the
order in which the fragments are received. This enables
refragmenting to cope with an MTU reduction; see the example of the
fragment seq. 5 that is retried end-to-end as smaller fragments seq.
13 and 14 in Section 6.2.

                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                |1 1 1 0 1 0 0|E|  Datagram_Tag |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|X| Sequence|   Fragment_Size   |        Fragment_Offset        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                                           X set == Ack-Request

             Figure 1: RFRAG Dispatch type and Header

There is no requirement on the receiver to check for contiguity of
the received fragments. The sender knows that the datagram is fully
received when the acknowledged fragments cover the whole datagram.
This may be useful in particular in the case where the MTU changes
and a fragment Sequence is retried with a smaller Fragment_Size, the
remainder of the original fragment being retried with new Sequence
values.
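
As an illustration of the layout in Figure 1, the following Python
sketch packs and parses the 6-byte RFRAG header. It is a minimal
sketch and not part of this specification: the names pack_rfrag and
parse_rfrag are hypothetical, and the unit of Fragment_Size is
assumed to be the byte.

```python
import struct

# Dispatch pattern 1110100 in the upper 7 bits; the low bit carries 'E'.
RFRAG_DISPATCH = 0b11101000


def pack_rfrag(ecn, datagram_tag, ack_request, sequence,
               fragment_size, fragment_offset):
    """Build the 6-byte RFRAG header of Figure 1 (hypothetical helper)."""
    if not 0 <= datagram_tag <= 0xFF:
        raise ValueError("Datagram_Tag is 8 bits")
    if not 0 <= sequence <= 31:
        raise ValueError("Sequence is 5 bits")
    if not 0 <= fragment_size <= 0x3FF:
        raise ValueError("Fragment_Size is 10 bits")
    if not 0 <= fragment_offset <= 0xFFFF:
        raise ValueError("Fragment_Offset is 16 bits")
    first = RFRAG_DISPATCH | (1 if ecn else 0)
    # X (1 bit) | Sequence (5 bits) | Fragment_Size (10) | Fragment_Offset (16)
    word = ((1 if ack_request else 0) << 31) | (sequence << 26) \
        | (fragment_size << 16) | fragment_offset
    # "!" selects network byte order with no padding: 1 + 1 + 4 bytes.
    return struct.pack("!BBI", first, datagram_tag, word)


def parse_rfrag(header):
    """Split a 6-byte RFRAG header back into its fields."""
    first, tag, word = struct.unpack("!BBI", header[:6])
    if first & 0xFE != RFRAG_DISPATCH:
        raise ValueError("not an RFRAG dispatch")
    return {
        "ecn": bool(first & 0x01),
        "datagram_tag": tag,
        "ack_request": bool(word >> 31),
        "sequence": (word >> 26) & 0x1F,
        "fragment_size": (word >> 16) & 0x3FF,
        # Datagram_Size when sequence == 0, abort indication when 0.
        "fragment_offset": word & 0xFFFF,
    }
```

For example, pack_rfrag(False, 0x2A, True, 5, 80, 400) yields the
bytes e8 2a 94 50 01 90, and parse_rfrag returns the same six fields.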

The first fragment is recognized by a Sequence of 0; it carries its
Fragment_Size and the Datagram_Size of the compressed packet before
it is fragmented, whereas the other fragments carry their
Fragment_Size and Fragment_Offset. The last fragment for a datagram
is recognized when its Fragment_Offset and its Fragment_Size add up
to the Datagram_Size.

Recoverable Fragments are sequenced and a bitmap is used in the RFRAG
Acknowledgment to indicate the received fragments by setting the
individual bits that correspond to their sequence.

X: 1 bit; Ack-Request: when set, the sender requires an RFRAG
   Acknowledgment from the receiver.

E: 1 bit; Explicit Congestion Notification; the "E" flag is reset by
   the source of the fragment and set by intermediate routers to
   signal that this fragment experienced congestion along its path.

Fragment_Size: 10-bit unsigned integer; the size of this fragment in
   a unit that depends on the MAC layer technology. Unless
   overridden by a more specific specification, that unit is the
   byte, which allows fragments up to 1023 bytes.

Datagram_Tag: 8 bits; an identifier of the datagram that is locally
   unique to the sender.

Sequence: 5-bit unsigned integer; the sequence number of the
   fragment in the acknowledgment bitmap. Fragments are numbered
   [0..N] where N is in [0..31]. A Sequence of 0 indicates the first
   fragment in a datagram, but non-zero values are not indicative of
   the position in the reassembly buffer.

Fragment_Offset: 16-bit unsigned integer.

   When the Fragment_Offset is set to a non-0 value, its semantics
   depend on the value of the Sequence field as follows:

   *  For a first fragment (i.e., with a Sequence of 0), this field
      indicates the Datagram_Size of the compressed datagram, to help
      the receiver allocate an adapted buffer for the reception and
      reassembly operations.
      The fragment may be stored for local reassembly.
      Alternatively, it may be routed based on the destination IPv6
      address. In that case, a VRB state must be installed as
      described in Section 6.1.1.
   *  When the Sequence is not 0, this field indicates the offset of
      the fragment in the Compressed Form of the datagram. The
      fragment may be added to a local reassembly buffer or forwarded
      based on an existing VRB as described in Section 6.1.2.

   A Fragment_Offset that is set to a value of 0 indicates an abort
   condition and all state regarding the datagram should be cleaned
   up once the processing of the fragment is complete; the processing
   of the fragment depends on whether there is a VRB already
   established for this datagram, and the next hop is still
   reachable:

   *  if a VRB already exists and is not broken, the fragment is to
      be forwarded along the associated Label Switched Path (LSP) as
      described in Section 6.1.2, but regardless of the value of the
      Sequence field;
   *  else, if the Sequence is 0, then the fragment is to be routed
      as described in Section 6.1.1, but no state is conserved
      afterwards. In that case, the session if it exists is aborted
      and the packet is also forwarded in an attempt to clean up the
      next hops along the path indicated by the IPv6 header (possibly
      including a routing header).

   If the fragment cannot be forwarded or routed, then an abort
   RFRAG-ACK is sent back to the source as described in
   Section 6.1.2.

5.2. RFRAG Acknowledgment Dispatch type and Header

This specification also defines a 4-byte RFRAG Acknowledgment bitmap
that is used by the reassembling endpoint to confirm selectively the
reception of individual fragments.
A given offset in the bitmap maps one-to-one with a given sequence
number and indicates which fragment is acknowledged as follows:

                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  RFRAG Acknowledgment Bitmap                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 ^                 ^
 |                 |    bitmap indicating whether:
 |                 +----- Fragment with Sequence 9 was received
 +----------------------- Fragment with Sequence 0 was received

          Figure 2: RFRAG Acknowledgment Bitmap Encoding

Figure 3 shows an example Acknowledgment bitmap which indicates that
all fragments from Sequence 0 to 20 were received, except for
fragments 1, 2 and 16, which were lost and must be retried.

                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1|0|0|1|1|1|1|1|1|1|1|1|1|1|1|1|0|1|1|1|1|0|0|0|0|0|0|0|0|0|0|0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 3: Example RFRAG Acknowledgment Bitmap

The RFRAG Acknowledgment Bitmap is included in an RFRAG
Acknowledgment header, as follows:

                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                |1 1 1 0 1 0 1|E|  Datagram_Tag |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             RFRAG Acknowledgment Bitmap (32 bits)             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 4: RFRAG Acknowledgment Dispatch type and Header

E: 1 bit; Explicit Congestion Notification Echo

   When set, the sender indicates that at least one of the
   acknowledged fragments was received with an Explicit Congestion
   Notification, indicating that the path followed by the fragments
   is subject to congestion. More in Appendix C.

RFRAG Acknowledgment Bitmap: An RFRAG Acknowledgment Bitmap, whereby
   setting the bit at offset x indicates that fragment x was
   received, as shown in Figure 2. A NULL bitmap indicates that the
   fragmentation process is aborted. A FULL bitmap indicates that
   the fragmentation process is complete; all fragments were received
   at the reassembly endpoint.

6. Fragment Recovery

The Recoverable Fragment header RFRAG is used to transport a fragment
and optionally request an RFRAG Acknowledgment that will confirm the
good reception of one or more fragments. An RFRAG Acknowledgment is
carried as a standalone fragment header (i.e., with no 6LoWPAN
payload) in a message that is propagated back to the 6LoWPAN endpoint
that was the originator of the fragments. To achieve this, each hop
that performed an MPLS-like operation on fragments reverses that
operation for the RFRAG-ACK by sending a frame from the next hop to
the previous hop as known by its Link-Layer address in the VRB. The
Datagram_Tag in the RFRAG-ACK is unique to the receiver and is enough
information for an intermediate hop to locate the VRB that contains
the Datagram_Tag used by the previous hop and the Layer-2 information
associated with it (interface and Link-Layer address).

The 6LoWPAN endpoint that fragments the packets at the 6LoWPAN level
(the sender) also controls the number of acknowledgments by setting
the Ack-Request flag in the RFRAG packets. The sender may set the
Ack-Request flag on any fragment to perform congestion control by
limiting the number of outstanding fragments, which are the fragments
that have been sent but for which reception or loss was not
positively confirmed by the reassembling endpoint. The maximum
number of outstanding fragments is controlled by the Window-Size. It
is configurable and may vary in case of ECN notification.
When the 6LoWPAN endpoint that reassembles the packets at the 6LoWPAN
level (the receiver) receives a fragment with the Ack-Request flag
set, it MUST send an RFRAG Acknowledgment back to the originator to
confirm reception of all the fragments it has received so far.

The Ack-Request ('X') set in an RFRAG marks the end of a window.
This flag MUST be set on the last fragment if the sender wishes to
protect the datagram, and it MAY be set in any intermediate fragment
for the purpose of flow control.

This automatic repeat request (ARQ) process MUST be protected by a
Retransmission Time Out (RTO) timer, and the fragment that carries
the 'X' flag MAY be retried upon a time out for a configurable number
of times (see Section 7.1) with an exponential backoff. Upon
exhaustion of the retries the sender may either abort the
transmission of the datagram or resend the first fragment with an 'X'
flag set in order to establish a new path for the datagram and obtain
the list of fragments that were received over the old path in the
acknowledgment bitmap. When the sender of the fragment knows that an
underlying link-layer mechanism protects the fragments, it may
refrain from using the RFRAG Acknowledgment mechanism, and never set
the Ack-Request bit.

The receiver MAY issue unsolicited acknowledgments. An unsolicited
acknowledgment signals to the sender endpoint that it can resume
sending if it had reached its maximum number of outstanding
fragments. Another use is to inform the sender that the reassembling
endpoint aborted the processing of an individual datagram.

The RFRAG Acknowledgment has an ECN indication for flow control (see
Appendix C). The receiver of a fragment with the 'E' (ECN) flag set
MUST echo that information by setting the 'E' (ECN) flag in the next
RFRAG Acknowledgment.
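
The receiver-side bookkeeping described above (bitmap update,
acknowledgment on 'X', and ECN echo) can be sketched in Python as
follows. This is a minimal sketch under stated assumptions, not a
normative implementation: the names ReassemblyAckState and
missing_sequences are hypothetical, the Datagram_Tag is assumed to be
the one seen on the last hop toward the receiver, and clearing the
pending ECN indication once it has been echoed is an assumption of
this sketch.

```python
import struct

# Dispatch pattern 1110101 in the upper 7 bits; the low bit carries 'E'.
RFRAG_ACK_DISPATCH = 0b11101010


class ReassemblyAckState:
    """Per-datagram acknowledgment state at the reassembling endpoint."""

    def __init__(self, datagram_tag):
        self.datagram_tag = datagram_tag
        self.bitmap = 0           # bit at offset x (from the left) = Sequence x
        self.ecn_pending = False  # an 'E' flag was seen and not yet echoed

    def on_fragment(self, sequence, ecn, ack_request):
        """Record one RFRAG; return an RFRAG-ACK header if 'X' was set."""
        self.bitmap |= 1 << (31 - sequence)
        self.ecn_pending = self.ecn_pending or ecn
        return self.build_ack() if ack_request else None

    def build_ack(self):
        """Build the 6-byte RFRAG Acknowledgment header of Figure 4."""
        first = RFRAG_ACK_DISPATCH | (1 if self.ecn_pending else 0)
        self.ecn_pending = False  # assumption: echo once, then clear
        return struct.pack("!BBI", first, self.datagram_tag, self.bitmap)


def missing_sequences(bitmap, sent_count):
    """Sender side: Sequences 0..sent_count-1 not set in the bitmap."""
    return [s for s in range(sent_count) if not bitmap & (1 << (31 - s))]
```

Applied to the bitmap of Figure 3 (0x9FFF7800) with 21 fragments
sent, missing_sequences returns [1, 2, 16], which are the fragments
that the sender has to retry.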
615 In order to protect the datagram, the sender transfers a controlled 616 number of fragments and flags the last fragment of a window with an 617 RFRAG Acknowledgment Request. The receiver MUST acknowledge a 618 fragment with the acknowledgment request bit set. If any fragment 619 immediately preceding an acknowledgment request is still missing, the 620 receiver MAY intentionally delay its acknowledgment to allow in- 621 transit fragments to arrive. Because it might defeat the round-trip 622 delay computation, delaying the acknowledgment should be configurable 623 and not enabled by default. 625 When enough fragments are received to cover the whole datagram, the 626 receiving endpoint reconstructs the packet, passes it to the upper 627 layer, sends an RFRAG Acknowledgment on the reverse path with a FULL 628 bitmap, and arms a short timer, e.g., on the order of an average 629 round-trip delay in the network. The FULL bitmap is used as opposed 630 to a bitmap that acknowledges only the received fragments to let the 631 intermediate nodes know that the datagram is fully received. As the 632 timer runs, the receiving endpoint absorbs the fragments that were 633 still in flight for that datagram without creating a new state. The 634 receiving endpoint aborts the communication if it keeps going on 635 beyond the duration of the timer. 637 Note that acknowledgments might consume precious resources so the use 638 of unsolicited acknowledgments should be configurable and not enabled 639 by default. 641 An observation is that streamlining forwarding of fragments generally 642 reduces the latency over the LLN mesh, providing room for retries 643 within existing upper-layer reliability mechanisms. The sender 644 protects the transmission over the LLN mesh with a retry timer that 645 is configured for a use case and may be adapted dynamically, e.g., 646 according to the method detailed in [RFC6298]. 
   It is expected that the upper layer retries obey the
   recommendations in [RFC8085], in which case a single round of
   fragment recovery should fit within the upper layer recovery timers.

   Fragments are sent in a round-robin fashion: the sender sends all the
   fragments for a first time before it retries any lost fragment; lost
   fragments are retried in sequence, oldest first. This mechanism
   enables the receiver to acknowledge fragments that were delayed in
   the network before they are retried.

   When a single frequency is used by contiguous hops, the sender should
   insert a delay between the frames (e.g., carrying fragments) that are
   sent to the same next hop. The delay should cover multiple
   transmissions so as to let a frame progress a few hops and avoid
   hidden terminal issues. This precaution is not required on channel
   hopping technologies such as Time Slotted Channel Hopping (TSCH)
   [RFC7554], where nodes that communicate at Layer-2 are scheduled to
   send and receive respectively, and different hops operate on
   different channels.

6.1. Forwarding Fragments

   It is assumed that the first fragment is large enough to carry the
   IPv6 header and make routing decisions. If that is not so, then this
   specification MUST NOT be used.

   This specification extends the Virtual Reassembly Buffer (VRB)
   technique to forward fragments with no intermediate reconstruction of
   the entire packet. It inherits operations like Datagram_Tag
   switching and using a timer to clean the VRB once the traffic ceases.
   The first fragment carries the IP header and it is routed all the way
   from the fragmenting endpoint to the reassembling endpoint. Upon
   receiving the first fragment, the routers along the path install a
   label-switched path (LSP), and the following fragments are
   label-switched along that path.
   As a consequence, the next fragments can only follow the path that
   was set up by the first fragment and cannot follow an alternate
   route. The Datagram_Tag is used to carry the label, which is swapped
   in each hop. All fragments follow the same path and fragments are
   delivered in the order in which they are sent.

6.1.1. Receiving the first fragment

   In Route-Over mode, the source and destination Link-Layer addresses
   in a frame change at each hop. The label that is formed and placed
   in the Datagram_Tag is associated with the source Link-Layer address
   and only valid (and unique) for that source Link-Layer address. Upon
   receiving the first fragment (i.e., with a Sequence of zero), an
   intermediate router creates a VRB and the associated LSP state for
   the tuple (source Link-Layer address, Datagram_Tag) and the fragment
   is forwarded along the IPv6 route that matches the destination IPv6
   address in the IPv6 header as prescribed by [FRAG-FWD], where the
   receiving endpoint allocates a reassembly buffer.

   The LSP state makes it possible to match the tuple (previous
   Link-Layer address, Datagram_Tag) in an incoming fragment to the
   tuple (next Link-Layer address, swapped Datagram_Tag) used in the
   forwarded fragment and points at the VRB. In addition, the router
   also forms a reverse LSP state indexed by the MAC address of the
   next hop and the swapped Datagram_Tag. This reverse LSP state also
   points at the VRB and enables matching the tuple (next Link-Layer
   address, swapped Datagram_Tag) found in an RFRAG Acknowledgment to
   the tuple (previous Link-Layer address, Datagram_Tag) used when
   forwarding a Fragment Acknowledgment (RFRAG-ACK) back to the sender
   endpoint.

   The first fragment may be received a second time, indicating that it
   did not reach the destination and was retried. In that case, it
   SHOULD follow the same path as the first occurrence.
   It is up to the sending endpoint to determine whether to abort a
   transmission and then retry it from scratch, which may build an
   entirely new path.

6.1.2. Receiving the next fragments

   Upon receiving the next fragment (i.e., with a non-zero Sequence), an
   intermediate router looks up an LSP indexed by the tuple (Link-Layer
   address, Datagram_Tag) found in the fragment. If it is found, the
   router forwards the fragment using the associated VRB as prescribed
   by [FRAG-FWD].

   If the VRB for the tuple is not found, the router builds an RFRAG-ACK
   to abort the transmission of the packet. The resulting message has
   the following information:

   * The source and destination Link-Layer addresses are swapped from
     those found in the fragment
   * The Datagram_Tag is set to the Datagram_Tag found in the fragment
   * A NULL bitmap is used to signal the abort condition

   The router then sends the RFRAG-ACK back to the previous router.
   The RFRAG-ACK should normally be forwarded all the way to the source
   using the reverse LSP state in the VRBs in the intermediate routers
   as described in the next section.

   [FRAG-FWD] indicates that the receiving endpoint stores "the actual
   packet data from the fragments received so far, in a form that makes
   it possible to detect when the whole packet has been received and can
   be processed or forwarded". How this is computed is implementation
   specific but relies on receiving all the bytes up to the
   Datagram_Size indicated in the first fragment. An implementation may
   receive overlapping fragments as the result of retries after an MTU
   change.

6.2. Receiving RFRAG Acknowledgments

   Upon receipt of an RFRAG-ACK, the router looks up a reverse LSP
   indexed by the tuple (Link-Layer address, Datagram_Tag), which are
   respectively the source Link-Layer address of the received frame and
   the received Datagram_Tag.
   If it is found, the router forwards the fragment using the
   associated VRB as prescribed by [FRAG-FWD], but using the reverse
   LSP so that the RFRAG-ACK flows back to the sender endpoint.

   If the reverse LSP is not found, the router MUST silently drop the
   RFRAG-ACK message.

   Either way, if the RFRAG-ACK indicates that the datagram was entirely
   received (FULL bitmap), the router arms a short timer, and upon
   timeout, the VRB and all the associated state are destroyed. Until
   the timer elapses, fragments of that datagram may still be received,
   e.g., if the RFRAG-ACK was lost on the way back and the source
   retried the last fragment. In that case, the router forwards the
   fragment according to the state in the VRB.

   This specification does not provide a method to discover the number
   of hops or the minimal value of MTU along those hops. But should the
   minimal MTU decrease, it is possible to retry a long fragment (say
   Sequence of 5) with several shorter fragments with a Sequence that
   was not used before (e.g., 13 and 14). Fragment 5 is marked as
   abandoned and will not be retried anymore. Note that when this
   mechanism is in place, it is hard to predict the total number of
   fragments that will be needed or the final shape of the bitmap that
   would cover the whole packet. This is why the FULL bitmap is used
   when the receiving endpoint gets the whole datagram regardless of
   which fragments were actually used to do so. Intermediate nodes will
   unambiguously know that the process is complete. Note that Path MTU
   Discovery is out of scope for this document.

6.3. Aborting the Transmission of a Fragmented Packet

   A reset is signaled on the forward path with a pseudo fragment that
   has the Fragment_Offset, Sequence, and Fragment_Size all set to 0,
   and no data.
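   The label-swapping state of Sections 6.1 and 6.2 and the reset test
   above can be sketched as follows. This is a simplified model under
   stated assumptions, not an implementation of the specification: the
   names Vrb, Router, and is_reset are hypothetical, the bitmap is
   assumed to be 32 bits wide, the swapped Datagram_Tag comes from a
   plain 8-bit counter with no collision check, the next hop is passed
   in rather than obtained from an IPv6 route lookup, and the clean-up
   timers are not modeled.

```python
from dataclasses import dataclass

NULL_BITMAP = 0x00000000  # signals an abort
FULL_BITMAP = 0xFFFFFFFF  # signals completion (assumed 32-bit bitmap)


@dataclass
class Vrb:
    prev_hop: str  # Link-Layer address of the previous hop
    prev_tag: int  # Datagram_Tag used by the previous hop
    next_hop: str  # Link-Layer address of the next hop
    next_tag: int  # swapped Datagram_Tag used toward the next hop


def is_reset(fragment_offset, sequence, fragment_size):
    """True for the reset pseudo fragment: all three fields are 0."""
    return fragment_offset == 0 and sequence == 0 and fragment_size == 0


class Router:
    def __init__(self):
        self.forward = {}  # (previous hop, Datagram_Tag) -> Vrb
        self.reverse = {}  # (next hop, swapped Datagram_Tag) -> Vrb
        self._tag = 0

    def first_fragment(self, src_ll, tag, next_hop):
        """Install forward and reverse LSP state, reusing it on a retry."""
        vrb = self.forward.get((src_ll, tag))
        if vrb is None:
            vrb = Vrb(src_ll, tag, next_hop, self._tag)
            self._tag = (self._tag + 1) % 256  # 8-bit Datagram_Tag
            self.forward[(src_ll, tag)] = vrb
            self.reverse[(vrb.next_hop, vrb.next_tag)] = vrb
        return ("forward", vrb.next_hop, vrb.next_tag)

    def next_fragment(self, src_ll, tag):
        """Label-switch a fragment, or abort if no state is found."""
        vrb = self.forward.get((src_ll, tag))
        if vrb is None:
            # Addresses swapped, same Datagram_Tag, NULL bitmap.
            return ("rfrag_ack", src_ll, tag, NULL_BITMAP)
        return ("forward", vrb.next_hop, vrb.next_tag)

    def rfrag_ack(self, src_ll, tag, bitmap):
        """Send an RFRAG-ACK toward the sender, or drop it silently."""
        vrb = self.reverse.get((src_ll, tag))
        if vrb is None:
            return None
        return ("rfrag_ack", vrb.prev_hop, vrb.prev_tag, bitmap)
```

   Keeping two dictionaries that point at the same Vrb object mirrors
   the text, where both the LSP state and the reverse LSP state point
   at the VRB.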
   When the sender or a router on the way decides that a packet should
   be dropped and the fragmentation process aborted, it generates a
   reset pseudo fragment and forwards it down the fragment path.

   Each router along the path forwards the pseudo fragment based on
   the VRB state. If an acknowledgment is not requested, the VRB and
   all associated state are destroyed.

   Upon reception of the pseudo fragment, the receiver cleans up all
   resources for the packet associated with the Datagram_Tag. If an
   acknowledgment is requested, the receiver responds with a NULL
   bitmap.

   The other way around, the receiver might need to abort the
   processing of a fragmented packet for internal reasons, for instance
   if it is out of reassembly buffers, already uses all 256 possible
   values of the Datagram_Tag, or if it keeps receiving fragments
   beyond a reasonable time while it considers that this packet is
   already fully reassembled and was passed to the upper layer. In that
   case, the receiver SHOULD indicate so to the sender with a NULL
   bitmap in an RFRAG Acknowledgment. The RFRAG Acknowledgment is
   forwarded all the way back to the source of the packet and cleans up
   all resources on the way. Upon an acknowledgment with a NULL bitmap,
   the sender endpoint MUST abort the transmission of the fragmented
   datagram with one exception: In the particular case of the first
   fragment, it MAY decide to retry via an alternate next hop instead.

6.4. Applying Recoverable Fragmentation along a Diverse Path

   The text above can be read with the assumption of a serial path
   between a source and a destination. Section 4.5.3 of the "6TiSCH
   Architecture" [I-D.ietf-6tisch-architecture] defines the concept of a
   Track that can be a complex path between a source and a destination
   with Packet ARQ, Replication, Elimination and Overhearing (PAREO)
   along the Track.
This specification can be used along any subset of 824 the complex Track where the first fragment is flooded. The last 825 RFRAG Acknowledgment is flooded on that same subset in the reverse 826 direction. Intermediate RFRAG Acknowledgments can be flooded on any 827 sub-subset of that reverse subset that reach back to the source. 829 7. Management Considerations 831 This specification extends "On Forwarding 6LoWPAN Fragments over a 832 Multihop IPv6 Network" [FRAG-FWD] and requires the same parameters in 833 the receiver and on intermediate nodes. There is no new parameter as 834 echoing ECN is always on. These parameters typically include the 835 reassembly time-out at the receiver and an inactivity clean-up timer 836 on the intermediate nodes, and the number of messages that can be 837 processed in parallel in all nodes. 839 The configuration settings introduced by this specification only 840 apply to the sender, which is in full control of the transmission. 841 LLNs vary a lot in size (there can be thousands of nodes in a mesh), 842 in speed (from 10 Kbps to several Mbps at the PHY layer), in traffic 843 density, and in optimizations that are desired (e.g., the selection 844 of a RPL [RFC6550] Objective Function [RFC6552] impacts the shape of 845 the routing graph). 847 For that reason, only a very generic guidance can be given on the 848 settings of the sender and on whether complex algorithms are needed 849 to perform flow control or estimate the round-trip time. To cover 850 the most complex use cases, this specification enables the sender to 851 vary the fragment size, the window size, and the inter-frame gap, 852 based on the number of losses, the observed variations of the round- 853 trip time and the setting of the ECN bit. 855 7.1. 
Protocol Parameters 857 The management system SHOULD be capable of providing the parameters 858 listed in this section and an implementation MUST abide by those 859 parameters and in particular never exceed the minimum and maximum 860 configured boundaries. 862 An implementation must control the rate at which it sends packets 863 over the same path to allow the next hop to forward a packet before 864 it gets the next. In a wireless network that uses the same frequency 865 along a path, more time must be inserted to avoid hidden terminal 866 issues between fragments (more in Section 4.2). 868 This is controlled by the following parameter: 870 inter-frame gap: Indicates the minimum amount of time between 871 transmissions. The inter-frame gap protects the propagation of 872 one transmission before the next one is triggered and creates a 873 duty cycle that controls the ratio of air time and memory in 874 intermediate nodes that a particular datagram will use. 876 An implementation should consider the generic recommendations from 877 the IETF in the matter of flow control and rate management in 878 [RFC5033]. To control the flow, an implementation may use a dynamic 879 value of the window size (Window_Size), adapt the fragment size 880 (Fragment_Size), and insert an inter-frame gap that is longer than 881 necessary. In a large network where nodes contend for the bandwidth, 882 a larger Fragment_Size consumes less bandwidth but also reduces 883 fluidity and incurs higher chances of loss in transmission. This is 884 controlled by the following parameters: 886 MinFragmentSize: The MinFragmentSize is the minimum value for the 887 Fragment_Size. 889 OptFragmentSize: The OptFragmentSize is the value for the 890 Fragment_Size that the sender should use to start with. It is 891 greater than or equal to MinFragmentSize. It is less than or 892 equal to MaxFragmentSize. 
      For the first fragment, it must account for the expansion of the
      IPv6 addresses and of the Hop Limit field within the MTU. For all
      fragments, it is a balance between the expected fluidity and the
      overhead of MAC and 6LoWPAN headers. For a small MTU, the idea is
      to keep it close to the maximum, whereas for larger MTUs, it
      might make sense to keep it short enough, so that the duty cycle
      of the transmitter is bounded, e.g., to transmit at least 10
      frames per second.

   MaxFragmentSize: The MaxFragmentSize is the maximum value for the
      Fragment_Size. It MUST be lower than the minimum MTU along the
      path. A large value increases the chances of buffer bloat and
      transmission loss. The value MUST be less than 512 if the unit
      that is defined for the PHY layer is the byte.

   MinWindowSize: The minimum value of Window_Size that the sender can
      use. A value of 1 is RECOMMENDED.

   OptWindowSize: The OptWindowSize is the value for the Window_Size
      that the sender should use to start with. It is greater than or
      equal to MinWindowSize. It is less than or equal to
      MaxWindowSize. A rule of thumb for OptWindowSize could be an
      estimation of the one-way trip time divided by the inter-frame
      gap. If sending acknowledgments back is too costly, it is
      possible to set this to 32, meaning that only the last Fragment
      is acknowledged in the first round.

   MaxWindowSize: The maximum value of Window_Size that the sender can
      use. The value MUST be strictly less than 33.

   An implementation may perform its own estimate of the RTO or use a
   configured one. The ARQ process is controlled by the following
   parameters:

   MinARQTimeOut: The minimum amount of time a node should wait for an
      RFRAG Acknowledgment before it takes the next action. It MUST be
      more than the maximum expected round-trip time in the respective
      network.
931 OptARQTimeOut: The initial value of the RTO, which is the amount of 932 time that a sender should wait for an RFRAG Acknowledgment before 933 it takes the next action. It is greater than or equal to 934 MinARQTimeOut. It is less than or equal to MaxARQTimeOut. See 935 Appendix C for recommendations on computing the round-trip time. 936 By default a value of 3 times the maximum expected round-trip time 937 in the respective network is RECOMMENDED. 939 MaxARQTimeOut: The maximum amount of time a node should wait for the 940 RFRAG Acknowledgment before it takes the next action. It must 941 cover the longest expected round-trip time, and be several times 942 less than the time-out that covers the recomposition buffer at the 943 receiver, which is typically on the order of the minute. An upper 944 bound can be estimated to ensure that the datagram is either fully 945 transmitted or dropped before an upper layer decides to retry it. 947 MaxFragRetries: The maximum number of retries for a particular 948 fragment. A default value of 3 is RECOMMENDED. An upper bound 949 can be estimated to ensure that the datagram is either fully 950 transmitted or dropped before an upper layer decides to retry it. 952 MaxDatagramRetries: The maximum number of retries from scratch for a 953 particular datagram. A default value of 1 is RECOMMENDED. An 954 upper bound can be estimated to ensure that the datagram is either 955 fully transmitted or dropped before an upper layer decides to 956 retry it. 958 An implementation may be capable of performing flow control based on 959 ECN; see in Appendix C. This is controlled by the following 960 parameter: 962 UseECN: Indicates whether the sender should react to ECN. The 963 sender may react to ECN by varying the Window_Size between 964 MinWindowSize and MaxWindowSize, varying the Fragment_Size between 965 MinFragmentSize and MaxFragmentSize, and/or by increasing or 966 reducing the inter-frame gap. 968 7.2. 
Observing the network 970 The management system should monitor the number of retries and of ECN 971 settings that can be observed from the perspective of both the sender 972 and the receiver with regards to the other endpoint. It may then 973 tune the optimum size of Fragment_Size and of Window_Size, 974 OptFragmentSize, and OptWindowSize, respectively, at the sender 975 towards a particular receiver, applicable to the next datagrams. The 976 values should be bounded by the expected number of hops and reduced 977 beyond that when the number of datagrams that can traverse an 978 intermediate point may exceed its capacity and cause a congestion 979 loss. The inter-frame gap is another tool that can be used to 980 increase the spacing between fragments of the same datagram and 981 reduce the ratio of time when a particular intermediate node holds a 982 fragment of that datagram. 984 8. Security Considerations 986 This document specifies an instantiation of a 6LoWPAN Fragment 987 Forwarding technique. [FRAG-FWD] provides the generic description of 988 Fragment Forwarding and this specification inherits from it. The 989 generic considerations in the Security sections of [FRAG-FWD] apply 990 equally to this document. 992 This specification does not recommend a particular algorithm for the 993 estimation of the duration of the RTO that covers the detection of 994 the loss of a fragment with the 'X' flag set; regardless, an attacker 995 on the path may slow down or discard packets, which in turn can 996 affect the throughput of fragmented packets. 998 Compared to "Transmission of IPv6 Packets over IEEE 802.15.4 999 Networks" [RFC4944], this specification reduces the Datagram_Tag to 8 1000 bits and the tag wraps faster than with [RFC4944]. But for a 1001 constrained network where a node is expected to be able to hold only 1002 one or a few large packets in memory, 256 is still a large number. 
   Also, the acknowledgment mechanism allows cleaning up the state
   rapidly once the packet is fully transmitted or aborted.

   The abstract Virtual Reassembly Buffer (VRB) inherited from
   [FRAG-FWD] may be exploited to perform a Denial-of-Service (DoS)
   attack against the intermediate routers since the routers need to
   maintain a state per flow. The particular VRB implementation
   technique described in [I-D.ietf-lwig-6lowpan-virtual-reassembly]
   allows realigning which data goes in which fragment, which causes
   the intermediate node to store a portion of the data, which adds an
   attack vector that is not present with this specification. With this
   specification, the data that is transported in each fragment is
   conserved and the state to keep does not include any data that would
   not fit in the previous fragment.

9. IANA Considerations

   This document allocates 2 patterns for a total of 4 dispatch values
   in Page 0 for recoverable fragments from the "Dispatch Type Field"
   registry that was created by "Transmission of IPv6 Packets over IEEE
   802.15.4 Networks" [RFC4944] and reformatted by "6LoWPAN Paging
   Dispatch" [RFC8025].

   The suggested patterns (to be confirmed by IANA) are indicated in
   Table 1.
   +-------------+------+----------------------------------+-----------+
   | Bit Pattern | Page | Header Type                      | Reference |
   +=============+======+==================================+===========+
   | 11 10100x   | 0    | RFRAG - Recoverable Fragment     | THIS RFC  |
   +-------------+------+----------------------------------+-----------+
   | 11 10100x   | 1-14 | Unassigned                       |           |
   +-------------+------+----------------------------------+-----------+
   | 11 10100x   | 15   | Reserved for Experimental Use    | RFC 8025  |
   +-------------+------+----------------------------------+-----------+
   | 11 10101x   | 0    | RFRAG-ACK - RFRAG                | THIS RFC  |
   |             |      | Acknowledgment                   |           |
   +-------------+------+----------------------------------+-----------+
   | 11 10101x   | 1-14 | Unassigned                       |           |
   +-------------+------+----------------------------------+-----------+
   | 11 10101x   | 15   | Reserved for Experimental Use    | RFC 8025  |
   +-------------+------+----------------------------------+-----------+

            Table 1: Additional Dispatch Value Bit Patterns

10. Acknowledgments

   The author wishes to thank Michel Veillette, Dario Tedeschi, Laurent
   Toutain, Carles Gomez Montenegro, Thomas Watteyne, and Michael
   Richardson for in-depth reviews and comments. Also many thanks to
   Roman Danyliw, Peter Yee, Colin Perkins, Tirumaleswar Reddy Konda,
   Eric Vyncke, Benjamin Kaduk, Warren Kumari, Magnus Westerlund, Mirja
   Kuhlewind, and Erik Nordmark for their careful reviews and for
   helping through the IETF Last Call and IESG review process, and to
   Jonathan Hui, Jay Werb, Christos Polyzois, Soumitri Kolavennu, Pat
   Kinney, Margaret Wasserman, Richard Kelsey, Carsten Bormann, and
   Harry Courtice for their various contributions in the long process
   that led to this document.

11. Normative References

   [RFC6298] Paxson, V., Allman, M., Chu, J., and M.
Sargent, 1065 "Computing TCP's Retransmission Timer", RFC 6298, 1066 DOI 10.17487/RFC6298, June 2011, 1067 . 1069 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1070 Requirement Levels", BCP 14, RFC 2119, 1071 DOI 10.17487/RFC2119, March 1997, 1072 . 1074 [RFC4944] Montenegro, G., Kushalnagar, N., Hui, J., and D. Culler, 1075 "Transmission of IPv6 Packets over IEEE 802.15.4 1076 Networks", RFC 4944, DOI 10.17487/RFC4944, September 2007, 1077 . 1079 [RFC6282] Hui, J., Ed. and P. Thubert, "Compression Format for IPv6 1080 Datagrams over IEEE 802.15.4-Based Networks", RFC 6282, 1081 DOI 10.17487/RFC6282, September 2011, 1082 . 1084 [RFC6554] Hui, J., Vasseur, JP., Culler, D., and V. Manral, "An IPv6 1085 Routing Header for Source Routes with the Routing Protocol 1086 for Low-Power and Lossy Networks (RPL)", RFC 6554, 1087 DOI 10.17487/RFC6554, March 2012, 1088 . 1090 [RFC8025] Thubert, P., Ed. and R. Cragie, "IPv6 over Low-Power 1091 Wireless Personal Area Network (6LoWPAN) Paging Dispatch", 1092 RFC 8025, DOI 10.17487/RFC8025, November 2016, 1093 . 1095 [RFC8138] Thubert, P., Ed., Bormann, C., Toutain, L., and R. Cragie, 1096 "IPv6 over Low-Power Wireless Personal Area Network 1097 (6LoWPAN) Routing Header", RFC 8138, DOI 10.17487/RFC8138, 1098 April 2017, . 1100 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 1101 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 1102 May 2017, . 1104 [FRAG-FWD] Watteyne, T., Thubert, P., and C. Bormann, "On Forwarding 1105 6LoWPAN Fragments over a Multihop IPv6 Network", Work in 1106 Progress, Internet-Draft, draft-ietf-6lo-minimal-fragment- 1107 13, 5 March 2020, . 1110 12. Informative References 1112 [RFC8201] McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed., 1113 "Path MTU Discovery for IP version 6", STD 87, RFC 8201, 1114 DOI 10.17487/RFC8201, July 2017, 1115 . 1117 [RFC7567] Baker, F., Ed. and G. 
Fairhurst, Ed., "IETF 1118 Recommendations Regarding Active Queue Management", 1119 BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015, 1120 . 1122 [RFC3031] Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol 1123 Label Switching Architecture", RFC 3031, 1124 DOI 10.17487/RFC3031, January 2001, 1125 . 1127 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 1128 Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, 1129 . 1131 [RFC2914] Floyd, S., "Congestion Control Principles", BCP 41, 1132 RFC 2914, DOI 10.17487/RFC2914, September 2000, 1133 . 1135 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 1136 of Explicit Congestion Notification (ECN) to IP", 1137 RFC 3168, DOI 10.17487/RFC3168, September 2001, 1138 . 1140 [RFC4919] Kushalnagar, N., Montenegro, G., and C. Schumacher, "IPv6 1141 over Low-Power Wireless Personal Area Networks (6LoWPANs): 1142 Overview, Assumptions, Problem Statement, and Goals", 1143 RFC 4919, DOI 10.17487/RFC4919, August 2007, 1144 . 1146 [RFC4963] Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly 1147 Errors at High Data Rates", RFC 4963, 1148 DOI 10.17487/RFC4963, July 2007, 1149 . 1151 [RFC6550] Winter, T., Ed., Thubert, P., Ed., Brandt, A., Hui, J., 1152 Kelsey, R., Levis, P., Pister, K., Struik, R., Vasseur, 1153 JP., and R. Alexander, "RPL: IPv6 Routing Protocol for 1154 Low-Power and Lossy Networks", RFC 6550, 1155 DOI 10.17487/RFC6550, March 2012, 1156 . 1158 [RFC6552] Thubert, P., Ed., "Objective Function Zero for the Routing 1159 Protocol for Low-Power and Lossy Networks (RPL)", 1160 RFC 6552, DOI 10.17487/RFC6552, March 2012, 1161 . 1163 [RFC7554] Watteyne, T., Ed., Palattella, M., and L. Grieco, "Using 1164 IEEE 802.15.4e Time-Slotted Channel Hopping (TSCH) in the 1165 Internet of Things (IoT): Problem Statement", RFC 7554, 1166 DOI 10.17487/RFC7554, May 2015, 1167 . 1169 [RFC8200] Deering, S. and R. 
Hinden, "Internet Protocol, Version 6 1170 (IPv6) Specification", STD 86, RFC 8200, 1171 DOI 10.17487/RFC8200, July 2017, 1172 . 1174 [RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage 1175 Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, 1176 March 2017, . 1178 [RFC8087] Fairhurst, G. and M. Welzl, "The Benefits of Using 1179 Explicit Congestion Notification (ECN)", RFC 8087, 1180 DOI 10.17487/RFC8087, March 2017, 1181 . 1183 [RFC5033] Floyd, S. and M. Allman, "Specifying New Congestion 1184 Control Algorithms", BCP 133, RFC 5033, 1185 DOI 10.17487/RFC5033, August 2007, 1186 . 1188 [RFC6606] Kim, E., Kaspar, D., Gomez, C., and C. Bormann, "Problem 1189 Statement and Requirements for IPv6 over Low-Power 1190 Wireless Personal Area Network (6LoWPAN) Routing", 1191 RFC 6606, DOI 10.17487/RFC6606, May 2012, 1192 . 1194 [I-D.ietf-lwig-6lowpan-virtual-reassembly] 1195 Bormann, C. and T. Watteyne, "Virtual reassembly buffers 1196 in 6LoWPAN", Work in Progress, Internet-Draft, draft-ietf- 1197 lwig-6lowpan-virtual-reassembly-01, 11 March 2019, 1198 . 1201 [I-D.ietf-intarea-frag-fragile] 1202 Bonica, R., Baker, F., Huston, G., Hinden, R., Troan, O., 1203 and F. Gont, "IP Fragmentation Considered Fragile", Work 1204 in Progress, Internet-Draft, draft-ietf-intarea-frag- 1205 fragile-17, 30 September 2019, 1206 . 1209 [I-D.ietf-6tisch-architecture] 1210 Thubert, P., "An Architecture for IPv6 over the TSCH mode 1211 of IEEE 802.15.4", Work in Progress, Internet-Draft, 1212 draft-ietf-6tisch-architecture-28, 29 October 2019, 1213 . 1216 [IEEE.802.15.4] 1217 IEEE, "IEEE Standard for Low-Rate Wireless Networks", 1218 IEEE Standard 802.15.4, DOI 10.1109/IEEE 1219 P802.15.4-REVd/D01, 1220 . 1222 [Kent] Kent, C. and J. Mogul, ""Fragmentation Considered 1223 Harmful", In Proc. SIGCOMM '87 Workshop on Frontiers in 1224 Computer Communications Technology", 1225 DOI 10.1145/55483.55524, August 1987, 1226 . 1229 Appendix A. 
Rationale 1231 There are a number of uses for large packets in Wireless Sensor 1232 Networks. Such usages may not be the most typical or represent the 1233 largest amount of traffic over the LLN; however, the associated 1234 functionality can be critical enough to justify extra care for 1235 ensuring effective transport of large packets across the LLN. 1237 The list of those usages includes: 1239 Towards the LLN node: Firmware update: For example, a new version 1240 of the LLN node software is downloaded from a system manager 1241 over unicast or multicast services. Such a reflashing 1242 operation typically involves updating a large number of similar 1243 LLN nodes over a relatively short period of time. 1245 Packages of Commands: A number of commands or 1246 a full configuration can be packaged as a single message to 1247 ensure consistency and enable atomic execution or complete roll 1248 back. Until such commands are fully received and interpreted, 1249 the intended operation will not take effect. 1251 From the LLN node: Waveform captures: A number of consecutive 1252 samples are measured at a high rate for a short time and then 1253 transferred from a sensor to a gateway or an edge server as a 1254 single large report. 1256 Data logs: LLN nodes may generate large logs of 1257 sampled data for later extraction. LLN nodes may also generate 1258 system logs to assist in diagnosing problems on the node or 1259 network. 1261 Large data packets: Rich data types might 1262 require more than one fragment. 1264 Uncontrolled firmware download or waveform upload can easily result 1265 in a massive increase of the traffic and saturate the network. 1267 When a fragment is lost in transmission, the lack of recovery in the 1268 original fragmentation system of RFC 4944 implies that all fragments 1269 would need to be resent, further contributing to the congestion that 1270 caused the initial loss, and potentially leading to congestion 1271 collapse. 
1273 This saturation may lead to excessive radio interference, or random 1274 early discard (leaky bucket) in relaying nodes. Additional queuing 1275 and memory congestion may result while waiting for a low-power next 1276 hop to emerge from its sleeping state. 1278 Considering that RFC 4944 defines an MTU of 1280 bytes and that in 1279 most incarnations (except 802.15.4g) an IEEE Std. 802.15.4 frame can 1280 limit the MAC payload to as few as 74 bytes, a packet might be 1281 fragmented into at least 18 fragments at the 6LoWPAN shim layer. 1282 Taking into account the worst-case header overhead for 6LoWPAN 1283 Fragmentation and Mesh Addressing headers increases the number of 1284 required fragments to around 32. This level of fragmentation is much 1285 higher than that traditionally experienced over the Internet with 1286 IPv4 fragments. At the same time, the use of radios increases the 1287 probability of transmission loss, and Mesh-Under techniques compound 1288 that risk over multiple hops. 1290 Mechanisms such as TCP or application-layer segmentation could be 1291 used to support end-to-end reliable transport. One option to support 1292 bulk data transfer over a frame-size-constrained LLN is to set the 1293 Maximum Segment Size to fit within the link maximum frame size. 1294 Doing so, however, can add significant header overhead to each 1295 802.15.4 frame and cause extraneous acknowledgments across the LLN 1296 compared to the method in this specification. 1298 Appendix B. Requirements 1300 For one-hop communications, a number of Low Power and Lossy Network 1301 (LLN) link-layers propose a local acknowledgment mechanism that is 1302 sufficient to detect and recover the loss of fragments. In a multihop 1303 environment, an end-to-end fragment recovery mechanism might be a 1304 good complement to a hop-by-hop MAC level recovery.
This draft 1305 introduces a simple protocol to recover individual fragments between 1306 6LoWPAN endpoints that may be multiple hops away. 1308 The method addresses the following requirements of an LLN: 1310 Number of fragments: The recovery mechanism must support highly 1311 fragmented packets, with a maximum of 32 fragments per packet. 1313 Minimum acknowledgment overhead: Because the radio is half duplex, 1314 and because of silent time spent in the various medium access 1315 mechanisms, an acknowledgment consumes roughly as many resources 1316 as a data fragment. 1318 The new end-to-end fragment recovery mechanism should be able to 1319 acknowledge multiple fragments in a single message and not require 1320 an acknowledgment at all if fragments are already protected at a 1321 lower layer. 1323 Controlled latency: The recovery mechanism must succeed or give up 1324 within the time boundary imposed by the recovery process of the 1325 Upper Layer Protocols. 1327 Optional congestion control: The aggregation of multiple concurrent 1328 flows may lead to the saturation of the radio network and 1329 congestion collapse. 1331 The recovery mechanism should provide means for controlling the 1332 number of fragments in transit over the LLN. 1334 Appendix C. Considerations on Flow Control 1336 Considering that a multi-hop LLN can be a very sensitive environment 1337 due to the limited queuing capabilities of a large population of its 1338 nodes, this draft recommends a simple and conservative approach to 1339 Congestion Control, based on TCP congestion avoidance. 1341 Congestion on the forward path is assumed in case of packet loss, and 1342 packet loss is assumed upon time out. The draft allows controlling 1343 the number of outstanding fragments that have been transmitted but 1344 for which an acknowledgment was not received yet. 1346 Congestion on the forward path can also be indicated by an Explicit 1347 Congestion Notification (ECN) mechanism. 
Though whether and how ECN 1348 [RFC3168] is carried out over the LoWPAN is out of scope, this draft 1349 provides a way for the destination endpoint to echo an ECN indication 1350 back to the source endpoint in an acknowledgment message as 1351 represented in Figure 4 in Section 5.2. While the support of echoing 1352 the ECN at the receiver is mandatory, this specification does not 1353 provide the flow control mechanism that reacts to the congestion at 1354 the sender endpoint. A minimalistic behaviour could be to reset the 1355 window to 1 so the fragments are sent and acknowledged one by one 1356 until the end of the datagram. 1358 It must be noted that congestion and collision are different topics. 1359 In particular, when a mesh operates on the same channel over multiple 1360 hops, the forwarding of a fragment over a certain hop may 1361 collide with the forwarding of the next fragment that is following 1362 over a previous hop but in the same interference domain. This draft 1363 enables end-to-end flow control, but leaves it to the sender stack to 1364 pace individual fragments within a transmit window, so that a given 1365 fragment is sent only when the previous fragment has had a chance to 1366 progress beyond the interference domain of this hop. In the case of 1367 6TiSCH [I-D.ietf-6tisch-architecture], which operates over the 1368 TimeSlotted Channel Hopping [RFC7554] (TSCH) mode of operation of 1369 IEEE 802.15.4, a fragment is forwarded over a different channel at a 1370 different time, and it makes full sense to transmit the next fragment 1371 as soon as the previous fragment has had its chance to be forwarded 1372 at the next hop. 1374 From the standpoint of a source 6LoWPAN endpoint, an outstanding 1375 fragment is a fragment that was sent but for which no explicit 1376 acknowledgment was received yet. This means that the fragment might 1377 be on the way, received but not yet acknowledged, or the 1378 acknowledgment might be on the way back.
It is also possible that 1379 either the fragment or the acknowledgment was lost on the way. 1381 From the sender standpoint, all outstanding fragments might still be 1382 in the network and contribute to its congestion. There is an 1383 assumption, though, that after a certain amount of time, a frame is 1384 either received or lost, so it is not causing congestion anymore. 1385 This amount of time can be estimated based on the round-trip time 1386 between the 6LoWPAN endpoints. For lack of a better-suited 1387 technique, the method detailed in "Computing TCP's Retransmission 1388 Timer" [RFC6298] may be used for that computation. 1390 The reader is encouraged to read through "Congestion Control 1391 Principles" [RFC2914]. Additionally, [RFC7567] and [RFC5681] provide 1392 deeper information on why this mechanism is needed and how TCP 1393 handles Congestion Control. Basically, the goal here is to manage 1394 the number of fragments present in the network; this is achieved by 1395 reducing the number of outstanding fragments over a congested path 1396 by throttling the sources. 1398 Section 6 describes how the sender decides how many fragments are 1399 (re)sent before an acknowledgment is required, and how the sender 1400 adapts that number to the network conditions. 1402 Author's Address 1404 Pascal Thubert (editor) 1405 Cisco Systems, Inc 1406 Building D 1407 45 Allee des Ormes - BP1200 1408 06254 MOUGINS - Sophia Antipolis 1409 France 1411 Phone: +33 497 23 26 34 1412 Email: pthubert@cisco.com