6lo                                                      P. Thubert, Ed.
Internet-Draft                                             Cisco Systems
Updates: 4944 (if approved)                                18 March 2020
Intended status: Standards Track
Expires: 19 September 2020


                  6LoWPAN Selective Fragment Recovery
                  draft-ietf-6lo-fragment-recovery-17

Abstract

This draft updates RFC 4944 with a simple protocol to recover individual fragments across a route-over mesh network, with a minimal flow control to protect the network against bloat.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.
It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 19 September 2020.

Copyright Notice

Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
2. Terminology
   2.1. BCP 14
   2.2. References
   2.3. Other Terms
3. Updating RFC 4944
4. Extending draft-ietf-6lo-minimal-fragment
   4.1. Slack in the First Fragment
   4.2. Gap between frames
   4.3. Flow Control
   4.4. Modifying the First Fragment
5. New Dispatch types and headers
   5.1. Recoverable Fragment Dispatch type and Header
   5.2. RFRAG Acknowledgment Dispatch type and Header
6. Fragment Recovery
   6.1. Forwarding Fragments
      6.1.1. Receiving the first fragment
      6.1.2. Receiving the next fragments
   6.2. Receiving RFRAG Acknowledgments
   6.3. Aborting the Transmission of a Fragmented Packet
   6.4. Applying Recoverable Fragmentation along a Diverse Path
7. Management Considerations
   7.1. Protocol Parameters
   7.2. Observing the network
8. Security Considerations
9. IANA Considerations
10. Acknowledgments
11. Normative References
12. Informative References
Appendix A. Rationale
Appendix B. Requirements
Appendix C. Considerations on Flow Control
Author's Address

1. Introduction

In most Low Power and Lossy Network (LLN) applications, the bulk of the traffic consists of small chunks of data (on the order of a few bytes to a few tens of bytes) at a time. Given that an IEEE Std 802.15.4 [IEEE.802.15.4] frame can carry a payload of 74 bytes or more, fragmentation is usually not required. However, though this happens only occasionally, a number of mission-critical applications do require the capability to transfer larger chunks of data, for instance, to support the firmware upgrade of the LLN nodes or the extraction of logs from LLN nodes.

In the former case, the large chunk of data is transferred to the LLN node, whereas in the latter, the large chunk flows away from the LLN node.
In both cases, the size can be on the order of 10 kilobytes or more, and an end-to-end reliable transport is required.

"Transmission of IPv6 Packets over IEEE 802.15.4 Networks" [RFC4944] defines the original 6LoWPAN datagram fragmentation mechanism for LLNs. One critical issue with this original design is that routing an IPv6 [RFC8200] packet across a route-over mesh requires the reassembly of the packet at each hop. The "6TiSCH Architecture" [I-D.ietf-6tisch-architecture] indicates that this may cause latency along a path and impact critical resources such as memory and battery; to alleviate those undesirable effects, it recommends using a 6LoWPAN Fragment Forwarding (6FF) technique.

"LLN Minimal Fragment Forwarding" [FRAG-FWD] specifies the generic behavior that all 6FF techniques, including this specification, follow, and presents the associated caveats. In particular, the routing information is fully indicated in the first fragment, which is always forwarded first. With this specification, the first fragment is identified by a Sequence of 0, as opposed to a dispatch type in [RFC4944]. A state is formed and used to forward all the next fragments along the same path. The Datagram_Tag is locally significant to the Layer-2 source of the packet and is swapped at each hop; see Section 6. This specification encodes the Datagram_Tag in one byte, which will saturate if more than 256 datagrams transit in fragmented form over a single hop at the same time. This is not realistic at the time of this writing. Should this happen in a new 6LoWPAN technology, a node will need to use several Link-Layer addresses to increase its indexing capacity.

"Virtual reassembly buffers in 6LoWPAN" (VRB) [LWIG-FRAG] proposes a 6FF technique that is compatible with [RFC4944] without the need to define a new protocol.
However, adding that capability alone to the local implementation of the original 6LoWPAN fragmentation would not address the inherent fragility of fragmentation (see [FRAG-ILE]), in particular the issues of resources locked on the reassembling endpoint and the wasted transmissions due to the loss of a single fragment in a whole datagram. [Kent] compares the unreliable delivery of fragments with a mechanism it calls "selective acknowledgements" that recovers the loss of a fragment individually. The paper illustrates the benefits that can be derived from such a method in figures 1, 2, and 3, on pages 6 and 7. [RFC4944] has no selective recovery, and the whole datagram fails when one fragment is not delivered to the reassembling endpoint. Constrained memory resources are blocked on the reassembling endpoint until it times out, possibly causing the loss of subsequent packets that cannot be received for the lack of buffers.

That problem is exacerbated when forwarding fragments over multiple hops, since a loss at an intermediate hop will be discovered by neither the fragmenting nor the reassembling endpoint, and the source will keep on sending fragments, wasting even more resources in the network since the datagram cannot arrive in its entirety, and possibly contributing to the condition that caused the loss. [RFC4944] also lacks signaling to abort a multi-fragment transmission at any time and from either end and, if the capability to forward fragments is implemented, to clean up the related state in the network. It also lacks flow control capabilities to avoid participating in congestion that may in turn cause the loss of a fragment and potentially the retransmission of the full datagram.

This specification provides a method to forward fragments across a route-over 6LoWPAN mesh, typically over a few hops, and a selective acknowledgment to recover individual fragments between 6LoWPAN endpoints. The method can help limit congestion loss in the network and addresses the requirements in Appendix B. Deployments are expected to be managed and homogeneous, and an incremental transition requires a flag day.

2. Terminology

2.1. BCP 14

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119][RFC8174] when, and only when, they appear in all capitals, as shown here.

2.2. References

This document uses 6LoWPAN terms and concepts that are presented in "IPv6 over Low-Power Wireless Personal Area Networks (6LoWPANs): Overview, Assumptions, Problem Statement, and Goals" [RFC4919], "Transmission of IPv6 Packets over IEEE 802.15.4 Networks" [RFC4944], and "Problem Statement and Requirements for IPv6 over Low-Power Wireless Personal Area Network (6LoWPAN) Routing" [RFC6606].

"LLN Minimal Fragment Forwarding" [FRAG-FWD] discusses the generic concept of a Virtual Reassembly Buffer (VRB) and specifies behaviors and caveats that are common to a large family of 6FF techniques, including the mechanism specified by this document, which fully inherits from that specification. It also defines terms used in this document: Compressed Form, Datagram_Tag, Datagram_Size, Fragment_Offset, and 6LoWPAN Fragment Forwarding endpoint (commonly abbreviated as only "endpoint").

Past experience with fragmentation has shown that misassociated or lost fragments can lead to poor network behavior and, occasionally, trouble at the application layer.
The reader is encouraged to read "IPv4 Reassembly Errors at High Data Rates" [RFC4963] and follow the references for more information. That experience led to the definition of the "Path MTU Discovery" (PMTUD) protocol [RFC8201], which limits fragmentation over the Internet. Specifically in the case of UDP, valuable additional information can be found in "UDP Usage Guidelines" [RFC8085].

"The Benefits of Using Explicit Congestion Notification (ECN)" [RFC8087] provides useful information on the potential benefits and pitfalls of using ECN.

Quoting the "Multiprotocol Label Switching (MPLS) Architecture" [RFC3031]: with MPLS, "packets are 'labeled' before they are forwarded" along a Label Switched Path (LSP). "At subsequent hops, there is no further analysis of the packet's network layer header. Rather, the label is used as an index into a table which specifies the next hop, and a new label." [FRAG-FWD] applies this MPLS model to forward fragments that actually do not have a network layer header, since the fragmentation occurs below IP, and this specification makes the operation reversible so that the reverse path can be followed as well.

2.3. Other Terms

This specification uses the following terms:

RFRAG: Recoverable Fragment.

RFRAG-ACK: Recoverable Fragment Acknowledgment.

RFRAG Acknowledgment Request: An RFRAG with the Acknowledgment Request flag ('X' flag) set.

NULL bitmap: Refers to a bitmap with all bits set to zero.

FULL bitmap: Refers to a bitmap with all bits set to one.

Reassembling endpoint: The receiving endpoint.

Fragmenting endpoint: The sending endpoint.

Forward direction: The direction of a path, which is followed by the RFRAG.

Reverse direction: The reverse direction of a path, which is taken by the RFRAG-ACK.

3. Updating RFC 4944

This specification updates the fragmentation mechanism that is specified in "Transmission of IPv6 Packets over IEEE 802.15.4 Networks" [RFC4944] for use in route-over LLNs by providing a model where fragments can be forwarded end-to-end across a 6LoWPAN LLN, and where fragments that are lost on the way can be recovered individually. A new format for fragments is introduced, and new dispatch types are defined in Section 5.

[RFC8138] allows modifying the size of a packet en route by removing the consumed hops in a compressed Routing Header. This requires that Fragment_Offset and Datagram_Size (see Section 2.2) also be modified en route, which is difficult to do in the uncompressed form. This specification expresses those fields in the Compressed Form and allows modifying them easily en route (see Section 4.4).

Consistent with Section 2 of [RFC6282] for the fragmentation mechanism described in Section 5.3 of [RFC4944], any header that cannot fit within the first fragment MUST NOT be compressed when using the fragmentation mechanism described in this specification.

4. Extending draft-ietf-6lo-minimal-fragment

This specification implements the generic 6FF technique defined in "LLN Minimal Fragment Forwarding" [FRAG-FWD] and provides end-to-end fragment recovery as well as mechanisms that can be used for flow control.

4.1. Slack in the First Fragment

[FRAG-FWD] allows for refragmenting in intermediate nodes, meaning that some bytes from a given fragment may be left in the VRB to be added to the next fragment. The need for more space in the outgoing fragment than was needed for the incoming fragment arises when the 6LoWPAN Header Compression is not as efficient on the outgoing link or when the Link MTU is reduced.

This specification cannot allow such a refragmentation operation, since the fragments are recovered end-to-end based on a sequence number. The Fragment_Size MUST be tailored to fit the minimal MTU along the path, and the first fragment that contains a 6LoWPAN-compressed header MUST have enough slack to enable a less efficient compression in the next hops to still fit within the Link MTU. If the fragmenting endpoint is also the 6LoWPAN compression endpoint, it will elide the Interface Identifier (IID) of the source IPv6 address if it matches the Link-Layer address [RFC6282]. In a network with a consistent MTU, it MUST compute the Fragment_Size as if the MTU were 8 bytes less, so that the next hop can expand the IID within the same fragment.

4.2. Gap between frames

[FRAG-FWD] requires that a configurable interval of time be inserted between transmissions to the same next hop and, in particular, between fragments of the same datagram. In the case of half-duplex interfaces, this inter-frame gap ensures that the next hop is done forwarding the previous frame and is capable of receiving the next one.

In the case of a mesh operating at a single frequency with omnidirectional antennas, a larger inter-frame gap is required to protect the frame against hidden terminal collisions with the previous frame of the same flow that is still progressing along a common path.

The inter-frame gap is useful even for unfragmented datagrams, but it becomes a necessity for fragments that are typically generated in a fast sequence and are all sent over the exact same path.

4.3. Flow Control

The inter-frame gap is the only protection that [FRAG-FWD] imposes by default. This document enables grouping fragments into windows and requesting intermediate acknowledgments so that the number of in-flight fragments can be bounded.
This document also adds an ECN mechanism that can be used to adapt the size of the window, the size of the fragments, and/or the inter-frame gap to protect the network.

This specification enables the fragmenting endpoint to apply a flow control mechanism to tune those parameters, but the mechanism itself is out of scope. In most cases, the expectation is that most datagrams will require only a few fragments, and that only the last fragment will be acknowledged. A basic implementation of the fragmenting endpoint is not required to vary the size of the window, the duration of the inter-frame gap, or the size of a fragment in the middle of the transmission of a datagram, and it MAY ignore the ECN signal or simply reset the window to 1 (see Appendix C for more) until the end of this datagram upon detecting congestion.

An intermediate node that experiences congestion MAY set the ECN bit in a fragment, and the reassembling endpoint echoes the ECN bit at most once at the next opportunity to acknowledge back.

The size of the fragments is typically computed from the Link MTU to maximize the size of the resulting frames. The size of the window and the duration of the inter-frame gap SHOULD be configurable, to roughly adapt the size of the window to the number of hops in an average path, and to follow the general recommendations in [FRAG-FWD], respectively.

4.4. Modifying the First Fragment

The compression of the Hop Limit, of the source and destination addresses in the IPv6 header, and of the Routing Header may change en route in a route-over mesh LLN. If the size of the first fragment is modified, then the intermediate node MUST adapt the Datagram_Size, which the first fragment encodes in its Fragment_Offset field (see Section 5.1), to reflect that difference.
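
The adjustment above can be sketched as follows, assuming the intermediate node knows the Fragment_Size of the first fragment both as received and as re-encoded for the next hop. The function name `adjust_first_fragment` is hypothetical and not defined by this specification; it also returns the size difference, which is the value that the following paragraph requires the node to keep in the VRB.

```python
from typing import Tuple


def adjust_first_fragment(datagram_size: int, size_in: int,
                          size_out: int) -> Tuple[int, int]:
    """Hypothetical helper: re-derive Datagram_Size after the first fragment changes.

    datagram_size -- Datagram_Size carried in the Fragment_Offset field of the
                     first fragment (Sequence 0), in Compressed Form
    size_in       -- Fragment_Size of the first fragment as received
    size_out      -- Fragment_Size of the first fragment after its compressed
                     headers were re-encoded for the next hop
    Returns (new_datagram_size, delta).
    """
    delta = size_out - size_in       # negative when the headers got smaller
    new_size = datagram_size + delta
    # The field is 16 bits wide, and a Fragment_Offset of 0 signals an abort,
    # so a valid Datagram_Size must stay within 1..65535.
    if not 1 <= new_size <= 0xFFFF:
        raise ValueError("adjusted Datagram_Size out of range")
    return new_size, delta
```

For instance, a hop that shrinks a 80-byte first fragment to 76 bytes of a 300-byte datagram would call `adjust_first_fragment(300, 80, 76)` and obtain `(296, -4)`; the -4 is then added to the Fragment_Offset of every later fragment of that datagram.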

The intermediate node MUST also save the difference of Datagram_Size of the first fragment in the VRB and add it to the Fragment_Offset of all the subsequent fragments that it forwards for that datagram.

5. New Dispatch types and headers

This document specifies an alternative to the 6LoWPAN fragmentation sublayer [RFC4944] to emulate a Link MTU of up to 2048 bytes for the upper layer, which can be the 6LoWPAN Header Compression sublayer that is defined in the "Compression Format for IPv6 Datagrams" [RFC6282] specification. This specification also provides a reliable transmission of the fragments over a multihop 6LoWPAN route-over mesh network and a minimal flow control to reduce the chances of congestion loss.

A 6LoWPAN Fragment Forwarding [FRAG-FWD] technique derived from MPLS enables the forwarding of individual fragments across a 6LoWPAN route-over mesh without reassembly at each hop. The Datagram_Tag is used as a label; it is locally unique to the node that owns the source Link-Layer address of the fragment, so together the Link-Layer address and the label can identify the fragment globally within the lifetime of the datagram. A node may build the Datagram_Tag in its own locally significant way, as long as the chosen Datagram_Tag stays unique to the particular datagram for its lifetime. The result is that the label does not need to be globally unique, but also that it must be swapped at each hop as the source Link-Layer address changes.

In the following sections, a "Datagram_Tag" extends the semantics defined in Section 5.3 ("Fragmentation Type and Header") of [RFC4944]. The Datagram_Tag is a locally unique identifier for the datagram from the perspective of the sender. This means that the Datagram_Tag identifies a datagram uniquely in the network when associated with the source of the datagram.
As the datagram gets forwarded, the source changes, and the Datagram_Tag must be swapped as detailed in [FRAG-FWD].

This specification extends RFC 4944 [RFC4944] with two new Dispatch types: one for the Recoverable Fragment (RFRAG) and one for the RFRAG Acknowledgment that is sent back. The new 6LoWPAN Dispatch types are taken from Page 0 [RFC8025], as indicated in Table 1 in Section 9.

5.1. Recoverable Fragment Dispatch type and Header

In this specification, if the packet is compressed, then the size and offset of the fragments are expressed with respect to the Compressed Form of the packet, as opposed to its uncompressed (native) form.

The format of the fragment header is shown in Figure 1. It is the same for all fragments, though the Fragment_Offset is overloaded. The format has a length and an offset, as well as a Sequence field. This would be redundant if the offset were computed as the product of the Sequence and the length, but this is not the case. The position of a fragment in the reassembly buffer is correlated neither with the value of the Sequence field nor with the order in which the fragments are received. This enables refragmenting to cope with an MTU reduction; see the example in Section 6.2 of the fragment with Sequence 5 that is retried end-to-end as smaller fragments with Sequences 13 and 14.

The first fragment is recognized by a Sequence of 0; it carries its Fragment_Size and the Datagram_Size of the compressed packet before it is fragmented, whereas the other fragments carry their Fragment_Size and Fragment_Offset. The last fragment for a datagram is recognized when its Fragment_Offset and its Fragment_Size add up to the stored Datagram_Size of the packet identified by the sender Link-Layer address and the Datagram_Tag.
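
As an illustration only, the header layout of Figure 1 could be serialized and parsed as below. The sketch assumes that the 6-byte header starts with the dispatch byte (the 7-bit pattern 1110100 followed by the 'E' bit), that multi-byte fields are in network byte order, and that a first fragment implicitly starts at offset 0 because its Fragment_Offset field holds the Datagram_Size. The names `RfragHeader`, `pack_rfrag`, `unpack_rfrag`, `is_abort`, and `ends_datagram` are hypothetical.

```python
import struct
from dataclasses import dataclass

RFRAG_DISPATCH = 0b1110100  # 7-bit dispatch pattern from Figure 1; 'E' follows


@dataclass
class RfragHeader:
    ecn: bool             # 'E' flag
    datagram_tag: int     # 8 bits
    ack_request: bool     # 'X' flag
    sequence: int         # 5 bits
    fragment_size: int    # 10 bits
    fragment_offset: int  # 16 bits; carries Datagram_Size when sequence == 0


def pack_rfrag(h: RfragHeader) -> bytes:
    """Serialize the 6-byte RFRAG header (dispatch byte included)."""
    if not (0 <= h.datagram_tag <= 0xFF and 0 <= h.sequence <= 0x1F
            and 0 <= h.fragment_size <= 0x3FF
            and 0 <= h.fragment_offset <= 0xFFFF):
        raise ValueError("field out of range")
    first = (RFRAG_DISPATCH << 1) | int(h.ecn)
    word = ((int(h.ack_request) << 31) | (h.sequence << 26)
            | (h.fragment_size << 16) | h.fragment_offset)
    return struct.pack("!BBI", first, h.datagram_tag, word)


def unpack_rfrag(data: bytes) -> RfragHeader:
    """Parse the first 6 bytes of data as an RFRAG header."""
    first, tag, word = struct.unpack("!BBI", data[:6])
    if first >> 1 != RFRAG_DISPATCH:
        raise ValueError("not an RFRAG dispatch")
    return RfragHeader(
        ecn=bool(first & 1),
        datagram_tag=tag,
        ack_request=bool(word >> 31),
        sequence=(word >> 26) & 0x1F,
        fragment_size=(word >> 16) & 0x3FF,
        fragment_offset=word & 0xFFFF,
    )


def is_abort(h: RfragHeader) -> bool:
    # A Fragment_Offset of 0 signals an abort, whatever the Sequence.
    return h.fragment_offset == 0


def ends_datagram(h: RfragHeader, datagram_size: int) -> bool:
    # Last fragment: its offset plus its size reaches the stored Datagram_Size.
    # Assumption: a first fragment (Sequence 0) starts at offset 0.
    start = 0 if h.sequence == 0 else h.fragment_offset
    return start + h.fragment_size == datagram_size
```

With this encoding, the first byte is 0xE8 when 'E' is clear and 0xE9 when it is set, and `unpack_rfrag(pack_rfrag(h)) == h` holds for any header whose fields are in range.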

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                   |1 1 1 0 1 0 0|E|  Datagram_Tag |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |X| Sequence|   Fragment_Size   |       Fragment_Offset         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      X set == Ack-Request

                Figure 1: RFRAG Dispatch type and Header

X: 1 bit; Ack-Request. When set, the fragmenting endpoint requires an RFRAG Acknowledgment from the reassembling endpoint.

E: 1 bit; Explicit Congestion Notification. The "E" flag is cleared by the source of the fragment and set by intermediate routers to signal that this fragment experienced congestion along its path.

Fragment_Size: 10-bit unsigned integer; the size of this fragment in a unit that depends on the Link-Layer technology. Unless overridden by a more specific specification, that unit is the byte, which allows fragments of up to 1023 bytes.

Datagram_Tag: 8 bits; an identifier of the datagram that is locally unique to the Link-Layer sender.

Sequence: 5-bit unsigned integer; the sequence number of the fragment in the acknowledgment bitmap. Fragments are numbered [0..N], where N is in [0..31]. A Sequence of 0 indicates the first fragment in a datagram, but nonzero values are not indicative of the position in the reassembly buffer.

Fragment_Offset: 16-bit unsigned integer.

   When the Fragment_Offset is set to a nonzero value, its semantics depend on the value of the Sequence field as follows:

   * For a first fragment (i.e., with a Sequence of 0), this field indicates the Datagram_Size of the compressed datagram, to help the reassembling endpoint allocate an adapted buffer for the reception and reassembly operations. The fragment may be stored for local reassembly. Alternatively, it may be routed based on the destination IPv6 address.
     In that case, a VRB state must be installed as described in Section 6.1.1.
   * When the Sequence is not 0, this field indicates the offset of the fragment in the Compressed Form of the datagram. The fragment may be added to a local reassembly buffer or forwarded based on an existing VRB as described in Section 6.1.2.

   A Fragment_Offset that is set to a value of 0 indicates an abort condition, and all state regarding the datagram should be cleaned up once the processing of the fragment is complete. The processing of the fragment depends on whether there is a VRB already established for this datagram and whether the next hop is still reachable:

   * If a VRB already exists and the next hop is still reachable, the fragment is to be forwarded along the associated Label Switched Path (LSP) as described in Section 6.1.2, without checking the value of the Sequence field.
   * Else, if the Sequence is 0, then the fragment is to be routed as described in Section 6.1.1, but no state is conserved afterwards. In that case, the session, if it exists, is aborted, and the packet is also forwarded in an attempt to clean up the next hops along the path indicated by the IPv6 header (possibly including a routing header).
   * Else (the Sequence is nonzero and either no VRB exists or the next hop is unavailable), the fragment cannot be forwarded or routed; the fragment is discarded, and an abort RFRAG-ACK is sent back to the source as described in Section 6.1.2.

There is no requirement on the reassembling endpoint to check that the received fragments are consecutive and non-overlapping. The fragmenting endpoint knows that the datagram is fully received when the acknowledged fragments cover the whole datagram, which is always the case with a FULL bitmap.
This may be useful, in particular, in the case where the MTU changes and a fragment Sequence is retried with a smaller Fragment_Size, the remainder of the original fragment being retried with new Sequence values.

Recoverable Fragments are sequenced, and a bitmap is used in the RFRAG Acknowledgment to indicate the received fragments by setting the individual bits that correspond to their sequence.

5.2. RFRAG Acknowledgment Dispatch type and Header

This specification also defines a 4-byte RFRAG Acknowledgment Bitmap that is used by the reassembling endpoint to selectively confirm the reception of individual fragments. A given offset in the bitmap maps one-to-one with a given sequence number and indicates which fragment is acknowledged as follows:

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                RFRAG Acknowledgment Bitmap                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    ^                 ^
    |                 |   bitmap indicating whether:
    |                 +----- Fragment with Sequence 9 was received
    +----------------------- Fragment with Sequence 0 was received

              Figure 2: RFRAG Acknowledgment Bitmap Encoding

Figure 3 shows an example Acknowledgment bitmap which indicates that all fragments from Sequence 0 to 20 were received, except for fragments 1, 2, and 16, which were lost and must be retried.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|0|0|1|1|1|1|1|1|1|1|1|1|1|1|1|0|1|1|1|1|0|0|0|0|0|0|0|0|0|0|0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 3: Example RFRAG Acknowledgment Bitmap

The RFRAG Acknowledgment Bitmap is included in an RFRAG Acknowledgment header, as follows:

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                   |1 1 1 0 1 0 1|E|  Datagram_Tag |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          RFRAG Acknowledgment Bitmap (32 bits)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

           Figure 4: RFRAG Acknowledgment Dispatch type and Header

E: 1 bit; Explicit Congestion Notification Echo.

   When set, the reassembling endpoint indicates that at least one of the acknowledged fragments was received with an Explicit Congestion Notification, indicating that the path followed by the fragments is subject to congestion. See Appendix C for more.

Datagram_Tag: 8 bits; an identifier of the datagram that is locally unique to the Link-Layer recipient.

RFRAG Acknowledgment Bitmap: An RFRAG Acknowledgment Bitmap, whereby setting the bit at offset x indicates that fragment x was received, as shown in Figure 2. A NULL bitmap indicates that the fragmentation process is aborted. A FULL bitmap indicates that the fragmentation process is complete; all fragments were received at the reassembling endpoint.

6. Fragment Recovery

The Recoverable Fragment header RFRAG is used to transport a fragment and optionally request an RFRAG Acknowledgment that will confirm the good reception of one or more fragments.
An RFRAG Acknowledgment is carried as a standalone fragment header (i.e., with no 6LoWPAN payload) in a message that is propagated back to the fragmenting endpoint. To achieve this, each hop that performed an MPLS-like operation on fragments reverses that operation for the RFRAG-ACK by sending a frame from the next hop to the previous hop, as known by its Link-Layer address in the VRB. The Datagram_Tag in the RFRAG-ACK is unique to the reassembling endpoint and is enough information for an intermediate hop to locate the VRB that contains the Datagram_Tag used by the previous hop and the Layer-2 information associated with it (interface and Link-Layer address).

The fragmenting endpoint that fragments the packets at the 6LoWPAN level also controls the number of acknowledgments by setting the Ack-Request flag in the RFRAG packets. The fragmenting endpoint may set the Ack-Request flag on any fragment to perform congestion control by limiting the number of outstanding fragments, which are the fragments that have been sent but for which reception or loss was not positively confirmed by the reassembling endpoint. The maximum number of outstanding fragments is controlled by the Window-Size. It is configurable and may vary in case of ECN notification. When the endpoint that reassembles the packets at the 6LoWPAN level receives a fragment with the Ack-Request flag set, it MUST send an RFRAG Acknowledgment back to the originator to confirm reception of all the fragments it has received so far.

The Ack-Request ('X') flag set in an RFRAG marks the end of a window. This flag MUST be set on the last fragment if the fragmenting endpoint wishes to perform an automatic repeat request (ARQ) process for the datagram, and it MAY be set in any intermediate fragment for the purpose of flow control.
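
One simple way for a fragmenting endpoint to decide where to place the 'X' flag is sketched below. It assumes a fixed window of `window_size` consecutive Sequence numbers and no retransmissions, which is only one possible policy among many; the function name and its parameters are hypothetical.

```python
from typing import List


def ack_request_flags(num_fragments: int, window_size: int,
                      arq: bool = True) -> List[bool]:
    """Hypothetical policy: return, per Sequence number, whether 'X' is set.

    num_fragments -- number of RFRAGs for the datagram (Sequence 0..N, N <= 31)
    window_size   -- maximum number of outstanding fragments per window
    arq           -- True if the sender runs ARQ for the datagram, in which
                     case the last fragment must carry the 'X' flag
    """
    if not 1 <= num_fragments <= 32:
        raise ValueError("a datagram has 1 to 32 fragments")
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    flags = []
    for seq in range(num_fragments):
        end_of_window = (seq + 1) % window_size == 0  # flow control point
        is_last = seq == num_fragments - 1            # ARQ point
        flags.append(end_of_window or (arq and is_last))
    return flags
```

With a `window_size` at least as large as the number of fragments and ARQ enabled, only the last fragment requests an acknowledgment, which matches the expectation stated in Section 4.3 for most datagrams; a sender relying on link-layer protection would disable ARQ and use such a large window so that the 'X' flag is never set.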
   This ARQ process MUST be protected by a Retransmission Timeout (RTO) timer, and the fragment that carries the 'X' flag MAY be retried upon a timeout for a configurable number of times (see Section 7.1) with an exponential backoff. Upon exhaustion of the retries, the fragmenting endpoint may either abort the transmission of the datagram or resend the first fragment with an 'X' flag set in order to establish a new path for the datagram and obtain, in the acknowledgment bitmap, the list of fragments that were received over the old path. When the fragmenting endpoint knows that an underlying link-layer mechanism protects the fragments, it may refrain from using the RFRAG Acknowledgment mechanism and never set the Ack-Request bit.

   The reassembling endpoint MAY issue unsolicited acknowledgments. An unsolicited acknowledgment signals to the fragmenting endpoint that it can resume sending in case it has reached its maximum number of outstanding fragments. Another use is to inform the fragmenting endpoint that the reassembling endpoint aborted the processing of an individual datagram.

   The RFRAG Acknowledgment carries an ECN indication for flow control (see Appendix C). The reassembling endpoint of a fragment with the 'E' (ECN) flag set MUST echo that information at most once by setting the 'E' (ECN) flag in the next RFRAG Acknowledgment.

   In order to protect the datagram, the fragmenting endpoint transfers a controlled number of fragments and flags the last fragment of a window with an RFRAG Acknowledgment Request. The reassembling endpoint MUST acknowledge a fragment with the acknowledgment request bit set. If any fragment immediately preceding an acknowledgment request is still missing, the reassembling endpoint MAY intentionally delay its acknowledgment to allow in-transit fragments to arrive.
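   As an illustration of the ARQ rules above, the following Python sketch marks which fragments carry the 'X' flag for a given Window_Size, computes the sequence of RTO values with an exponential backoff, and chooses the next action when an RTO expires. The doubling factor, the cap at MaxARQTimeOut, the use of MaxDatagramRetries to bound restarts, and all function names are assumptions made for this sketch; the text only requires an exponential backoff and a configurable number of retries.

```python
from enum import Enum
from typing import List


class Action(Enum):
    RETRY_X_FRAGMENT = "retry the fragment that carries the X flag"
    RESTART_DATAGRAM = "resend the first fragment with the X flag set"
    ABORT = "abort the transmission of the datagram"


def ack_request_flags(n_fragments: int, window_size: int) -> List[bool]:
    """'X' on the last fragment of each window and on the last fragment.

    Simplification (assumption): windows are contiguous runs of
    window_size fragments, each acknowledged before the next starts.
    """
    return [(i + 1) % window_size == 0 or i == n_fragments - 1
            for i in range(n_fragments)]


def backoff_schedule(initial_rto: float, max_rto: float,
                     max_retries: int) -> List[float]:
    """RTO armed for the first transmission and for each retry.

    Assumption: the timeout doubles after each expiry, capped at max_rto.
    """
    waits, rto = [], initial_rto
    for _ in range(max_retries + 1):
        waits.append(min(rto, max_rto))
        rto *= 2
    return waits


def on_timeout(frag_retries: int, max_frag_retries: int,
               datagram_restarts: int, max_datagram_retries: int) -> Action:
    """Next action when the RTO of an X-flagged fragment expires."""
    if frag_retries < max_frag_retries:
        return Action.RETRY_X_FRAGMENT
    if datagram_restarts < max_datagram_retries:
        return Action.RESTART_DATAGRAM
    return Action.ABORT
```

   For example, with a hypothetical OptARQTimeOut of 2 seconds, a MaxARQTimeOut of 10 seconds, and the RECOMMENDED MaxFragRetries of 3, backoff_schedule(2.0, 10.0, 3) yields [2.0, 4.0, 8.0, 10.0].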
   Because it might defeat the round-trip delay computation, delaying the acknowledgment should be configurable and not enabled by default.

   When enough fragments are received to cover the whole datagram, the reassembling endpoint reconstructs the packet, passes it to the upper layer, sends an RFRAG Acknowledgment on the reverse path with a FULL bitmap, and arms a short timer, e.g., on the order of an average round-trip delay in the network. The FULL bitmap is used, as opposed to a bitmap that acknowledges only the received fragments, to let the intermediate nodes know that the datagram is fully received. As the timer runs, the reassembling endpoint absorbs the fragments that were still in flight for that datagram without creating a new state, acknowledging those that bear an Ack-Request with an RFRAG Acknowledgment and the FULL bitmap. The reassembling endpoint aborts the communication if fragments with matching source and Datagram_Tag continue to be received after the timer expires.

   Note that acknowledgments might consume precious resources, so the use of unsolicited acknowledgments SHOULD be configurable and not enabled by default.

   An observation is that streamlining the forwarding of fragments generally reduces the latency over the LLN mesh, providing room for retries within existing upper-layer reliability mechanisms. The fragmenting endpoint protects the transmission over the LLN mesh with a retry timer that is configured for a use case and may be adapted dynamically, e.g., according to the method detailed in [RFC6298]. It is expected that the upper-layer retries obey the recommendations in [RFC8085], in which case a single round of fragment recovery should fit within the upper-layer recovery timers.
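   The paragraph above points to [RFC6298] for adapting the retry timer dynamically. A minimal Python sketch of that estimator follows. It assumes the constants of RFC 6298 (alpha = 1/8, beta = 1/4, K = 4) and substitutes MinARQTimeOut and MaxARQTimeOut (Section 7.1) for the lower and upper bounds that RFC 6298 places on the RTO; the class name is hypothetical. Following Karn's algorithm in RFC 6298, samples should only be taken from fragments that were not retried, since an acknowledgment for a retried fragment is ambiguous.

```python
from typing import Optional


class RtoEstimator:
    """RFC 6298-style RTO estimation, clamped to [min_rto, max_rto]
    (assumed here to be MinARQTimeOut and MaxARQTimeOut)."""

    ALPHA = 1 / 8
    BETA = 1 / 4
    K = 4

    def __init__(self, initial_rto: float, min_rto: float, max_rto: float,
                 granularity: float = 0.0) -> None:
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.granularity = granularity  # clock granularity G
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None
        # Before any measurement, use the configured initial value
        # (e.g., OptARQTimeOut).
        self.rto = self._clamp(initial_rto)

    def _clamp(self, value: float) -> float:
        return max(self.min_rto, min(value, self.max_rto))

    def on_sample(self, r: float) -> float:
        """Update with one RTT sample from a fragment that was not retried."""
        if self.srtt is None or self.rttvar is None:
            # First measurement.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - r))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self.rto = self._clamp(self.srtt + max(self.granularity,
                                               self.K * self.rttvar))
        return self.rto
```

   With a first sample of 1.0 second, SRTT becomes 1.0 and RTTVAR 0.5, so the RTO is 1.0 + 4 * 0.5 = 3.0 seconds, which matches the RECOMMENDED OptARQTimeOut of 3 times the round-trip time when that time is 1 second.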
   Fragments MUST be sent in a round-robin fashion: the sender MUST send all the fragments for a first time before it retries any lost fragment; lost fragments MUST be retried in sequence, oldest first. This mechanism enables the receiver to acknowledge fragments that were delayed in the network before they are retried.

   When a single radio frequency is used by contiguous hops, the fragmenting endpoint SHOULD insert a delay between the frames (e.g., carrying fragments) that are sent to the same next hop. The delay SHOULD cover multiple transmissions so as to let a frame progress a few hops and avoid hidden terminal issues. This precaution is not required on channel hopping technologies such as Time-Slotted Channel Hopping (TSCH) [RFC7554], where nodes that communicate at Layer 2 are scheduled to send and receive respectively, and different hops operate on different channels.

6.1. Forwarding Fragments

   This specification inherits from [FRAG-FWD] and proposes a Virtual Reassembly technique to forward fragments with no intermediate reconstruction of the entire datagram.

   The IPv6 Header MUST be placed in full in the first fragment to enable the routing decision. The first fragment is routed and creates an LSP from the fragmenting endpoint to the reassembling endpoint. The next fragments are label-switched along that LSP. As a consequence, the next fragments can only follow the path that was set up by the first fragment and cannot follow an alternate route. The Datagram_Tag is used to carry the label, which is swapped in each hop.

   If the first fragment is too large for the path MTU, it will repeatedly fail and never establish an LSP. In that case, the fragmenting endpoint MAY retry the same datagram with a smaller Fragment_Size, in which case it MUST abort the original attempt and use a new Datagram_Tag for the new attempt.

6.1.1. Receiving the first fragment

   In Route-Over mode, the source and destination Link-Layer addresses in a frame change at each hop. The label that is formed and placed in the Datagram_Tag by the sender is associated with the source Link-Layer address and only valid (and temporarily unique) for that source Link-Layer address.

   Upon receiving the first fragment (i.e., with a Sequence of 0), an intermediate router creates a VRB and the associated LSP state indexed by the incoming interface, the previous-hop Link-Layer address, and the Datagram_Tag, and forwards the fragment along the IPv6 route that matches the destination IPv6 address in the IPv6 header until it reaches the reassembling endpoint, as prescribed by [FRAG-FWD]. The LSP state makes it possible to match the next incoming fragments of a datagram to the abstract forwarding information of next interface, source and next-hop Link-Layer addresses, and swapped Datagram_Tag.

   In addition, the router also forms a reverse LSP state indexed by the interface to the next hop, the Link-Layer address the router uses as source for that datagram, and the swapped Datagram_Tag. This reverse LSP state enables matching the tuple (interface, destination Link-Layer address, Datagram_Tag) found in an RFRAG Acknowledgment to the abstract forwarding information (previous interface, previous Link-Layer address, Datagram_Tag) used to forward the RFRAG Acknowledgment (RFRAG-ACK) back to the fragmenting endpoint.

6.1.2. Receiving the next fragments

   Upon receiving the next fragment (i.e., with a non-zero Sequence), an intermediate router looks up an LSP indexed by the tuple (incoming interface, previous-hop Link-Layer address, Datagram_Tag) found in the fragment. If it is found, the router forwards the fragment using the associated VRB as prescribed by [FRAG-FWD].
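   To make the forward and reverse LSP states concrete, here is a Python sketch of an intermediate router. It assumes string interface names and Link-Layer addresses, a hypothetical own_addr map from each interface to the address the router uses as source on it, and a simple policy that picks the lowest Datagram_Tag not already in use on the outgoing interface; none of these names or policies come from [FRAG-FWD]. In line with this section and Section 6.2, a non-first fragment with no matching state is answered with an abort RFRAG-ACK (NULL bitmap), and an RFRAG-ACK with no matching reverse LSP is silently dropped.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

NULL_BITMAP = 0x00000000

# LSP key or value: (interface, Link-Layer address, Datagram_Tag)
Key = Tuple[str, str, int]


@dataclass
class Frame:
    """Hypothetical frame model; fragments leave bitmap as None."""
    iface: str
    src: str
    dst: str
    tag: int
    bitmap: Optional[int] = None  # set only on RFRAG-ACKs


@dataclass
class VrbRouter:
    own_addr: Dict[str, str]  # interface -> this router's Link-Layer address
    # (in iface, previous hop, tag) -> (out iface, next hop, swapped tag)
    forward: Dict[Key, Key] = field(default_factory=dict)
    # (out iface, own address, swapped tag) -> (in iface, previous hop, tag)
    reverse: Dict[Key, Key] = field(default_factory=dict)

    def _free_tag(self, iface: str) -> int:
        """Lowest Datagram_Tag not yet used as a label on iface."""
        used = {tag for (i, _, tag) in self.reverse if i == iface}
        for tag in range(256):
            if tag not in used:
                return tag
        raise RuntimeError("all 256 Datagram_Tag values are in use")

    def on_first_fragment(self, frag: Frame, out_iface: str,
                          next_hop: str) -> Frame:
        """Create forward and reverse LSP states, then swap the label.

        out_iface and next_hop come from the IPv6 route lookup.
        """
        tag = self._free_tag(out_iface)
        in_key = (frag.iface, frag.src, frag.tag)
        self.forward[in_key] = (out_iface, next_hop, tag)
        self.reverse[(out_iface, self.own_addr[out_iface], tag)] = in_key
        return Frame(out_iface, self.own_addr[out_iface], next_hop, tag)

    def on_next_fragment(self, frag: Frame) -> Frame:
        """Label-switch a non-first fragment, or answer with an abort."""
        entry = self.forward.get((frag.iface, frag.src, frag.tag))
        if entry is None:
            # No VRB: swapped addresses, same interface and tag, NULL bitmap.
            return Frame(frag.iface, frag.dst, frag.src, frag.tag, NULL_BITMAP)
        out_iface, next_hop, tag = entry
        return Frame(out_iface, self.own_addr[out_iface], next_hop, tag)

    def on_ack(self, ack: Frame) -> Optional[Frame]:
        """Relay an RFRAG-ACK over the reverse LSP; None means dropped."""
        prev = self.reverse.get((ack.iface, ack.dst, ack.tag))
        if prev is None:
            return None  # silently dropped
        prev_iface, prev_hop, prev_tag = prev
        return Frame(prev_iface, self.own_addr[prev_iface], prev_hop,
                     prev_tag, ack.bitmap)
```

   The sketch deliberately omits the payload and the timer that destroys the VRB after a FULL bitmap; it only shows how the tuple found in a frame selects the state used to swap the label in either direction.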
   If the VRB for the tuple is not found, the router builds an RFRAG-ACK to abort the transmission of the packet. The resulting message has the following information:

   *  The source and destination Link-Layer addresses are swapped from those found in the fragment, and the same interface is used.

   *  The Datagram_Tag is set to the Datagram_Tag found in the fragment.

   *  A NULL bitmap is used to signal the abort condition.

   At this point, the router is all set and can send the RFRAG-ACK back to the previous router. The RFRAG-ACK should normally be forwarded all the way to the source using the reverse LSP state in the VRBs in the intermediate routers, as described in the next section.

   [FRAG-FWD] indicates that the reassembling endpoint stores "the actual packet data from the fragments received so far, in a form that makes it possible to detect when the whole packet has been received and can be processed or forwarded". How this is computed is implementation specific but relies on receiving all the bytes up to the Datagram_Size indicated in the first fragment. An implementation may receive overlapping fragments as the result of retries after an MTU change.

6.2. Receiving RFRAG Acknowledgments

   Upon receipt of an RFRAG-ACK, the router looks up a reverse LSP indexed by the interface and destination Link-Layer address of the received frame and the received Datagram_Tag in the RFRAG-ACK. If it is found, the router forwards the RFRAG-ACK using the associated VRB as prescribed by [FRAG-FWD], but using the reverse LSP so that the RFRAG-ACK flows back to the fragmenting endpoint.

   If the reverse LSP is not found, the router MUST silently drop the RFRAG-ACK message.

   Either way, if the RFRAG-ACK indicates that the datagram was entirely received (FULL bitmap), the router arms a short timer, and upon timeout, the VRB and all the associated state are destroyed.
   Until the timer elapses, fragments of that datagram may still be received, e.g., if the RFRAG-ACK was lost on the path back and the source retried the last fragment. In that case, the router generates an RFRAG-ACK with a FULL bitmap back to the fragmenting endpoint if an acknowledgment was requested; otherwise, it silently drops the fragment.

   This specification does not provide a method to discover the number of hops or the minimal value of MTU along those hops. In a typical case, the MTU is constant and the same across the network. But should the minimal MTU along the path decrease, it is possible to retry a long fragment (say, Sequence 5) with several shorter fragments with a Sequence that was not used before (e.g., 13 and 14). Fragment 5 is marked as abandoned and will not be retried anymore. Note that when this mechanism is in place, it is hard to predict the total number of fragments that will be needed or the final shape of the bitmap that would cover the whole packet. This is why the FULL bitmap is used when the reassembling endpoint gets the whole datagram, regardless of which fragments were actually used to do so. Intermediate nodes will unambiguously know that the process is complete. Note that Path MTU Discovery is out of scope for this document.

6.3. Aborting the Transmission of a Fragmented Packet

   A reset is signaled on the forward path with a pseudo fragment that has the Fragment_Offset set to 0. The sender of a reset SHOULD also set the Sequence and Fragment_Size fields to 0.

   When the fragmenting endpoint or a router on the path decides that a packet should be dropped and the fragmentation process aborted, it generates a reset pseudo fragment and forwards it down the fragment path.

   Each router along the path forwards the pseudo fragment based on the VRB state.
   If an acknowledgment is not requested, the VRB and all associated state are destroyed.

   Upon reception of the pseudo fragment, the reassembling endpoint cleans up all resources for the packet associated with the Datagram_Tag. If an acknowledgment is requested, the reassembling endpoint responds with a NULL bitmap.

   Conversely, the reassembling endpoint might need to abort the processing of a fragmented packet for internal reasons, for instance, if it is out of reassembly buffers, already uses all 256 possible values of the Datagram_Tag, or keeps receiving fragments beyond a reasonable time while it considers that this packet is already fully reassembled and was passed to the upper layer. In that case, the reassembling endpoint SHOULD indicate so to the fragmenting endpoint with a NULL bitmap in an RFRAG Acknowledgment. The RFRAG Acknowledgment is forwarded all the way back to the source of the packet and cleans up all resources on the path. Upon an acknowledgment with a NULL bitmap, the fragmenting endpoint MUST abort the transmission of the fragmented datagram with one exception: in the particular case of the first fragment, it MAY decide to retry via an alternate next hop instead.

6.4. Applying Recoverable Fragmentation along a Diverse Path

   The text above can be read with the assumption of a serial path between a source and a destination. Section 4.5.3 of the "6TiSCH Architecture" [I-D.ietf-6tisch-architecture] defines the concept of a Track that can be a complex path between a source and a destination with Packet ARQ, Replication, Elimination and Overhearing (PAREO) along the Track. This specification can be used along any subset of the complex Track where the first fragment is flooded. The last RFRAG Acknowledgment is flooded on that same subset in the reverse direction.
   Intermediate RFRAG Acknowledgments can be flooded on any sub-subset of that reverse subset that reaches back to the source.

7. Management Considerations

   This specification extends "On Forwarding 6LoWPAN Fragments over a Multihop IPv6 Network" [FRAG-FWD] and requires the same parameters in the reassembling endpoint and on intermediate nodes. There is no new parameter, as echoing ECN is always on. These parameters typically include the reassembly timeout at the reassembling endpoint, an inactivity clean-up timer on the intermediate nodes, and the number of messages that can be processed in parallel in all nodes.

   The configuration settings introduced by this specification only apply to the fragmenting endpoint, which is in full control of the transmission. LLNs vary a lot in size (there can be thousands of nodes in a mesh), in speed (from 10 Kbps to several Mbps at the PHY layer), in traffic density, and in the optimizations that are desired (e.g., the selection of an RPL [RFC6550] Objective Function [RFC6552] impacts the shape of the routing graph).

   For that reason, only very generic guidance can be given on the settings of the fragmenting endpoint and on whether complex algorithms are needed to perform flow control or estimate the round-trip time. To cover the most complex use cases, this specification enables the fragmenting endpoint to vary the fragment size, the window size, and the inter-frame gap, based on the number of losses, the observed variations of the round-trip time, and the setting of the ECN bit.

7.1. Protocol Parameters

   The management system SHOULD be capable of providing the parameters listed in this section, and an implementation MUST abide by those parameters and, in particular, never exceed the minimum and maximum configured boundaries.
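   A minimal Python sketch of how an implementation might hold the parameters listed in this section and check the relations that their definitions impose: Min <= Opt <= Max for each triplet, MaxFragmentSize below the minimum MTU along the path and below 512 when the PHY unit is the byte, MaxWindowSize strictly less than 33, and MinARQTimeOut above the maximum expected round-trip time. The class and function names, the seconds unit, and the choice to report violations as a list of strings are hypothetical; defaults follow the RECOMMENDED values where the text gives one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RfragParams:
    min_fragment_size: int
    opt_fragment_size: int
    max_fragment_size: int
    min_window_size: int
    opt_window_size: int
    max_window_size: int
    min_arq_timeout: float        # seconds (assumed unit)
    max_arq_timeout: float        # seconds (assumed unit)
    max_expected_rtt: float       # seconds; assumed known to the operator
    opt_arq_timeout: Optional[float] = None
    max_frag_retries: int = 3      # RECOMMENDED default
    max_datagram_retries: int = 1  # RECOMMENDED default

    def __post_init__(self) -> None:
        if self.opt_arq_timeout is None:
            # RECOMMENDED default: 3 times the maximum expected RTT.
            self.opt_arq_timeout = 3 * self.max_expected_rtt


def violations(p: RfragParams, path_mtu: Optional[int] = None,
               phy_unit_is_byte: bool = True) -> List[str]:
    """Return the constraints that the configuration breaks (empty if none)."""
    errors = []
    if not p.min_fragment_size <= p.opt_fragment_size <= p.max_fragment_size:
        errors.append("MinFragmentSize <= OptFragmentSize <= MaxFragmentSize")
    if path_mtu is not None and not p.max_fragment_size < path_mtu:
        errors.append("MaxFragmentSize < minimum MTU along the path")
    if phy_unit_is_byte and not p.max_fragment_size < 512:
        errors.append("MaxFragmentSize < 512")
    # Assumption: a window smaller than one fragment is meaningless.
    if not 1 <= p.min_window_size <= p.opt_window_size <= p.max_window_size:
        errors.append("1 <= MinWindowSize <= OptWindowSize <= MaxWindowSize")
    if not p.max_window_size < 33:
        errors.append("MaxWindowSize < 33")
    if not p.min_arq_timeout > p.max_expected_rtt:
        errors.append("MinARQTimeOut > maximum expected round-trip time")
    if not p.min_arq_timeout <= p.opt_arq_timeout <= p.max_arq_timeout:
        errors.append("MinARQTimeOut <= OptARQTimeOut <= MaxARQTimeOut")
    return errors
```

   Returning every violation at once, rather than failing on the first, lets a management system report a complete diagnosis before pushing a configuration to the fragmenting endpoint.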
   An implementation must control the rate at which it sends packets over the same path to allow the next hop to forward a packet before it gets the next. In a wireless network that uses the same frequency along a path, more time must be inserted to avoid hidden terminal issues between fragments (more in Section 4.2).

   This is controlled by the following parameter:

   inter-frame gap:  Indicates the minimum amount of time between transmissions. The inter-frame gap protects the propagation of one transmission before the next one is triggered and creates a duty cycle that controls the ratio of air time and memory in intermediate nodes that a particular datagram will use.

   An implementation should consider the generic recommendations from the IETF in the matter of flow control and rate management in [RFC5033]. To control the flow, an implementation may use a dynamic value of the window size (Window_Size), adapt the fragment size (Fragment_Size), and insert an inter-frame gap that is longer than necessary. In a large network where nodes contend for the bandwidth, a larger Fragment_Size consumes less bandwidth but also reduces fluidity and incurs higher chances of loss in transmission. This is controlled by the following parameters:

   MinFragmentSize:  The MinFragmentSize is the minimum value for the Fragment_Size.

   OptFragmentSize:  The OptFragmentSize is the value for the Fragment_Size that the fragmenting endpoint should use to start with. It is greater than or equal to MinFragmentSize. It is less than or equal to MaxFragmentSize. For the first fragment, it must account for the expansion of the IPv6 addresses and of the Hop Limit field within MTU. For all fragments, it is a balance between the expected fluidity and the overhead of Link-Layer and 6LoWPAN headers.
      For a small MTU, the idea is to keep it close to the maximum, whereas for larger MTUs, it might make sense to keep it short enough so that the duty cycle of the transmitter is bounded, e.g., to transmit at least 10 frames per second.

   MaxFragmentSize:  The MaxFragmentSize is the maximum value for the Fragment_Size. It MUST be lower than the minimum MTU along the path. A large value augments the chances of buffer bloat and transmission loss. The value MUST be less than 512 if the unit that is defined for the PHY layer is the byte.

   MinWindowSize:  The minimum value of Window_Size that the fragmenting endpoint can use. A value of 1 is RECOMMENDED.

   OptWindowSize:  The OptWindowSize is the value for the Window_Size that the fragmenting endpoint should use to start with. It is greater than or equal to MinWindowSize. It is less than or equal to MaxWindowSize. A rule of thumb for OptWindowSize could be an estimation of the one-way trip time divided by the inter-frame gap. If the acknowledgment back is too costly, it is possible to set this to 32, meaning that only the last fragment is acknowledged in the first round.

   MaxWindowSize:  The maximum value of Window_Size that the fragmenting endpoint can use. The value MUST be strictly less than 33.

   An implementation may perform its own estimate of the RTO or use a configured one. The ARQ process is controlled by the following parameters:

   MinARQTimeOut:  The minimum amount of time a node should wait for an RFRAG Acknowledgment before it takes the next action. It MUST be more than the maximum expected round-trip time in the respective network.

   OptARQTimeOut:  The initial value of the RTO, which is the amount of time that a fragmenting endpoint should wait for an RFRAG Acknowledgment before it takes the next action. It is greater than or equal to MinARQTimeOut. It is less than or equal to MaxARQTimeOut.
      See Appendix C for recommendations on computing the round-trip time. By default, a value of 3 times the maximum expected round-trip time in the respective network is RECOMMENDED.

   MaxARQTimeOut:  The maximum amount of time a node should wait for the RFRAG Acknowledgment before it takes the next action. It must cover the longest expected round-trip time and be several times shorter than the timeout that covers the recomposition buffer at the reassembling endpoint, which is typically on the order of a minute. An upper bound can be estimated to ensure that the datagram is either fully transmitted or dropped before an upper layer decides to retry it.

   MaxFragRetries:  The maximum number of retries for a particular fragment. A default value of 3 is RECOMMENDED. An upper bound can be estimated to ensure that the datagram is either fully transmitted or dropped before an upper layer decides to retry it.

   MaxDatagramRetries:  The maximum number of retries from scratch for a particular datagram. A default value of 1 is RECOMMENDED. An upper bound can be estimated to ensure that the datagram is either fully transmitted or dropped before an upper layer decides to retry it.

   An implementation may be capable of performing flow control based on ECN; see Appendix C. This is controlled by the following parameter:

   UseECN:  Indicates whether the fragmenting endpoint should react to ECN. The fragmenting endpoint may react to ECN by varying the Window_Size between MinWindowSize and MaxWindowSize, varying the Fragment_Size between MinFragmentSize and MaxFragmentSize, and/or by increasing or reducing the inter-frame gap. With this specification, if UseECN is set and a fragmenting endpoint detects congestion, it resets the Window_Size to 1 until the end of the datagram, whereas if UseECN is not set, the endpoint does not react to congestion.
      Future specifications may provide additional parameters and capabilities.

7.2. Observing the network

   The management system should monitor the number of retries and of ECN settings that can be observed from the perspective of both the fragmenting endpoint and the reassembling endpoint with regard to the other endpoint. It may then tune the optimum size of Fragment_Size and of Window_Size, OptFragmentSize and OptWindowSize, respectively, at the fragmenting endpoint towards a particular reassembling endpoint, applicable to the next datagrams. The values should be bounded by the expected number of hops and reduced beyond that when the number of datagrams that can traverse an intermediate point may exceed its capacity and cause a congestion loss. The inter-frame gap is another tool that can be used to increase the spacing between fragments of the same datagram and reduce the ratio of time when a particular intermediate node holds a fragment of that datagram.

8. Security Considerations

   This document specifies an instantiation of a 6FF technique and inherits from the generic description in [FRAG-FWD]. The considerations in the Security Section of [FRAG-FWD] equally apply to this document.

   In addition to the threats detailed therein, an attacker that is on-path can prematurely end the transmission of a datagram by sending an RFRAG Acknowledgment to the fragmenting endpoint. It can also cause extra transmissions of fragments by resetting bits in the RFRAG Acknowledgment bitmap, and extra RFRAG Acknowledgments by forcing the Ack-Request bit in fragments that it forwards.

   As indicated in [FRAG-FWD], secure joining and Link-Layer security are REQUIRED to protect against those attacks, as the fragmentation protocol does not include any native security mechanisms.
   This specification does not recommend a particular algorithm for the estimation of the duration of the RTO that covers the detection of the loss of a fragment with the 'X' flag set; regardless, an attacker on the path may slow down or discard packets, which in turn can affect the throughput of fragmented packets.

   Compared to "Transmission of IPv6 Packets over IEEE 802.15.4 Networks" [RFC4944], this specification reduces the Datagram_Tag to 8 bits, so the tag wraps faster than with [RFC4944]. But for a constrained network where a node is expected to be able to hold only one or a few large packets in memory, 256 is still a large number. Also, the acknowledgment mechanism allows cleaning up the state rapidly once the packet is fully transmitted or aborted.

   The abstract Virtual Reassembly Buffer inherited from [FRAG-FWD] may be used to perform a Denial-of-Service (DoS) attack against the intermediate routers, since the routers need to maintain a state per flow. The particular VRB implementation technique described in [LWIG-FRAG] allows realigning which data goes in which fragment, which causes the intermediate node to store a portion of the data; this adds an attack vector that is not present with this specification. With this specification, the data that is transported in each fragment is conserved, and the state to keep does not include any data that would not fit in the previous fragment.

9. IANA Considerations

   This document allocates 2 patterns for a total of 4 dispatch values in Page 0 for recoverable fragments from the "Dispatch Type Field" registry that was created by "Transmission of IPv6 Packets over IEEE 802.15.4 Networks" [RFC4944] and reformatted by "6LoWPAN Paging Dispatch" [RFC8025].

   The suggested patterns (to be confirmed by IANA) are indicated in Table 1.
   +-------------+------+----------------------------------+-----------+
   | Bit Pattern | Page | Header Type                      | Reference |
   +=============+======+==================================+===========+
   | 11 10100x   | 0    | RFRAG - Recoverable Fragment     | THIS RFC  |
   +-------------+------+----------------------------------+-----------+
   | 11 10100x   | 1-14 | Unassigned                       |           |
   +-------------+------+----------------------------------+-----------+
   | 11 10100x   | 15   | Reserved for Experimental Use    | RFC 8025  |
   +-------------+------+----------------------------------+-----------+
   | 11 10101x   | 0    | RFRAG-ACK - RFRAG Acknowledgment | THIS RFC  |
   +-------------+------+----------------------------------+-----------+
   | 11 10101x   | 1-14 | Unassigned                       |           |
   +-------------+------+----------------------------------+-----------+
   | 11 10101x   | 15   | Reserved for Experimental Use    | RFC 8025  |
   +-------------+------+----------------------------------+-----------+

               Table 1: Additional Dispatch Value Bit Patterns

10. Acknowledgments

   The author wishes to thank Michel Veillette, Dario Tedeschi, Laurent Toutain, Carles Gomez Montenegro, Thomas Watteyne, and Michael Richardson for in-depth reviews and comments. Also many thanks to Roman Danyliw, Peter Yee, Colin Perkins, Tirumaleswar Reddy Konda, Eric Vyncke, Warren Kumari, Magnus Westerlund, Erik Nordmark, and especially Benjamin Kaduk and Mirja Kuhlewind for their careful reviews and for helping through the IETF Last Call and IESG review process, and to Jonathan Hui, Jay Werb, Christos Polyzois, Soumitri Kolavennu, Pat Kinney, Margaret Wasserman, Richard Kelsey, Carsten Bormann, and Harry Courtice for their various contributions in the long process that led to this document.

11. Normative References

   [RFC6298]  Paxson, V., Allman, M., Chu, J., and M.
              Sargent, "Computing TCP's Retransmission Timer", RFC 6298, DOI 10.17487/RFC6298, June 2011.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

   [RFC4944]  Montenegro, G., Kushalnagar, N., Hui, J., and D. Culler, "Transmission of IPv6 Packets over IEEE 802.15.4 Networks", RFC 4944, DOI 10.17487/RFC4944, September 2007.

   [RFC4919]  Kushalnagar, N., Montenegro, G., and C. Schumacher, "IPv6 over Low-Power Wireless Personal Area Networks (6LoWPANs): Overview, Assumptions, Problem Statement, and Goals", RFC 4919, DOI 10.17487/RFC4919, August 2007.

   [RFC6282]  Hui, J., Ed. and P. Thubert, "Compression Format for IPv6 Datagrams over IEEE 802.15.4-Based Networks", RFC 6282, DOI 10.17487/RFC6282, September 2011.

   [RFC6606]  Kim, E., Kaspar, D., Gomez, C., and C. Bormann, "Problem Statement and Requirements for IPv6 over Low-Power Wireless Personal Area Network (6LoWPAN) Routing", RFC 6606, DOI 10.17487/RFC6606, May 2012.

   [RFC8025]  Thubert, P., Ed. and R. Cragie, "IPv6 over Low-Power Wireless Personal Area Network (6LoWPAN) Paging Dispatch", RFC 8025, DOI 10.17487/RFC8025, November 2016.

   [RFC8138]  Thubert, P., Ed., Bormann, C., Toutain, L., and R. Cragie, "IPv6 over Low-Power Wireless Personal Area Network (6LoWPAN) Routing Header", RFC 8138, DOI 10.17487/RFC8138, April 2017.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

   [RFC8200]  Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", STD 86, RFC 8200, DOI 10.17487/RFC8200, July 2017.

   [FRAG-FWD] Watteyne, T., Thubert, P., and C.
              Bormann, "On Forwarding 6LoWPAN Fragments over a Multihop IPv6 Network", Work in Progress, Internet-Draft, draft-ietf-6lo-minimal-fragment-13, 5 March 2020.

12. Informative References

   [RFC8201]  McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed., "Path MTU Discovery for IP version 6", STD 87, RFC 8201, DOI 10.17487/RFC8201, July 2017.

   [RFC7567]  Baker, F., Ed. and G. Fairhurst, Ed., "IETF Recommendations Regarding Active Queue Management", BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015.

   [RFC3031]  Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol Label Switching Architecture", RFC 3031, DOI 10.17487/RFC3031, January 2001.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion Control", RFC 5681, DOI 10.17487/RFC5681, September 2009.

   [RFC2914]  Floyd, S., "Congestion Control Principles", BCP 41, RFC 2914, DOI 10.17487/RFC2914, September 2000.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, DOI 10.17487/RFC3168, September 2001.

   [RFC4963]  Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly Errors at High Data Rates", RFC 4963, DOI 10.17487/RFC4963, July 2007.

   [RFC6550]  Winter, T., Ed., Thubert, P., Ed., Brandt, A., Hui, J., Kelsey, R., Levis, P., Pister, K., Struik, R., Vasseur, JP., and R. Alexander, "RPL: IPv6 Routing Protocol for Low-Power and Lossy Networks", RFC 6550, DOI 10.17487/RFC6550, March 2012.

   [RFC6552]  Thubert, P., Ed., "Objective Function Zero for the Routing Protocol for Low-Power and Lossy Networks (RPL)", RFC 6552, DOI 10.17487/RFC6552, March 2012.

   [RFC6554]  Hui, J., Vasseur, JP., Culler, D., and V.
              Manral, "An IPv6 Routing Header for Source Routes with the
              Routing Protocol for Low-Power and Lossy Networks (RPL)",
              RFC 6554, DOI 10.17487/RFC6554, March 2012.

   [RFC7554]  Watteyne, T., Ed., Palattella, M., and L. Grieco, "Using
              IEEE 802.15.4e Time-Slotted Channel Hopping (TSCH) in the
              Internet of Things (IoT): Problem Statement", RFC 7554,
              DOI 10.17487/RFC7554, May 2015.

   [RFC8085]  Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
              Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085,
              March 2017.

   [RFC8087]  Fairhurst, G. and M. Welzl, "The Benefits of Using
              Explicit Congestion Notification (ECN)", RFC 8087,
              DOI 10.17487/RFC8087, March 2017.

   [RFC5033]  Floyd, S. and M. Allman, "Specifying New Congestion
              Control Algorithms", BCP 133, RFC 5033,
              DOI 10.17487/RFC5033, August 2007.

   [LWIG-FRAG]
              Bormann, C. and T. Watteyne, "Virtual reassembly buffers
              in 6LoWPAN", Work in Progress, Internet-Draft,
              draft-ietf-lwig-6lowpan-virtual-reassembly-01, 11 March
              2019.

   [FRAG-ILE] Bonica, R., Baker, F., Huston, G., Hinden, R., Troan, O.,
              and F. Gont, "IP Fragmentation Considered Fragile", Work
              in Progress, Internet-Draft,
              draft-ietf-intarea-frag-fragile-17, 30 September 2019.

   [I-D.ietf-6tisch-architecture]
              Thubert, P., "An Architecture for IPv6 over the TSCH mode
              of IEEE 802.15.4", Work in Progress, Internet-Draft,
              draft-ietf-6tisch-architecture-28, 29 October 2019.

   [IEEE.802.15.4]
              IEEE, "IEEE Standard for Low-Rate Wireless Networks",
              IEEE Standard 802.15.4, DOI 10.1109/IEEE
              P802.15.4-REVd/D01.

   [Kent]     Kent, C. and J. Mogul, "Fragmentation Considered Harmful",
              In Proc. SIGCOMM '87 Workshop on Frontiers in Computer
              Communications Technology, DOI 10.1145/55483.55524,
              August 1987.

Appendix A.  Rationale

   There are a number of uses for large packets in Wireless Sensor
   Networks. Such usages may not be the most typical or represent the
   largest amount of traffic over the LLN; however, the associated
   functionality can be critical enough to justify extra care for
   ensuring effective transport of large packets across the LLN.

   The list of those usages includes:

   Towards the LLN node:

      Firmware update:  For example, a new version of the LLN node
         software is downloaded from a system manager over unicast or
         multicast services. Such a reflashing operation typically
         involves updating a large number of similar LLN nodes over a
         relatively short period of time.

      Packages of Commands:  A number of commands or a full
         configuration can be packaged as a single message to ensure
         consistency and enable atomic execution or complete rollback.
         Until such commands are fully received and interpreted, the
         intended operation will not take effect.

   From the LLN node:

      Waveform captures:  A number of consecutive samples are measured
         at a high rate for a short time and then transferred from a
         sensor to a gateway or an edge server as a single large
         report.

      Data logs:  LLN nodes may generate large logs of sampled data
         for later extraction. LLN nodes may also generate system logs
         to assist in diagnosing problems on the node or network.

      Large data packets:  Rich data types might require more than one
         fragment.

   Uncontrolled firmware download or waveform upload can easily result
   in a massive increase in traffic and saturate the network.

   When a fragment is lost in transmission, the lack of recovery in the
   original fragmentation system of RFC 4944 implies that all fragments
   would need to be resent, further contributing to the congestion that
   caused the initial loss, and potentially leading to congestion
   collapse.
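   A back-of-the-envelope model makes the cost of resending every
   fragment concrete. The Python sketch below is illustrative only. It
   assumes that each fragment is lost independently with probability p,
   that acknowledgments are never lost, and that the sender retries
   until delivery. The function names are hypothetical. Under these
   assumptions, a datagram of n fragments survives an attempt with
   probability (1 - p)^n, so whole-datagram retransmission costs
   n / (1 - p)^n fragment transmissions on average. Recovering
   individual fragments costs only n / (1 - p).

```python
"""Back-of-the-envelope comparison of retransmission costs.

Illustrative assumptions (not part of this specification):
  * each fragment is lost independently with probability p,
  * acknowledgments are never lost and are not counted,
  * the sender keeps retrying until the datagram is delivered.
"""


def expected_tx_whole_datagram(n: int, p: float) -> float:
    """Expected fragment transmissions when any loss forces a resend
    of all n fragments (no recovery of individual fragments).

    One attempt succeeds with probability (1 - p) ** n, so the number
    of attempts is geometric with mean 1 / (1 - p) ** n, and every
    attempt costs n transmissions.
    """
    return n / (1.0 - p) ** n


def expected_tx_selective(n: int, p: float) -> float:
    """Expected fragment transmissions when only lost fragments are
    resent: each fragment needs 1 / (1 - p) attempts on average."""
    return n / (1.0 - p)


if __name__ == "__main__":
    n = 32  # fragments per datagram (illustrative value)
    for p in (0.01, 0.05, 0.10):
        whole = expected_tx_whole_datagram(n, p)
        selective = expected_tx_selective(n, p)
        print(f"p={p:.2f}: whole={whole:7.1f}  selective={selective:5.1f}")
```

   With n = 32 and p = 0.05, the model gives roughly 165 fragment
   transmissions per delivered datagram when all fragments must be
   resent, compared with about 34 under selective recovery. The gap
   widens quickly as n and p grow.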
   Such saturation may lead to excessive radio interference, or to
   random early discard (leaky bucket) in relaying nodes. Additional
   queuing and memory congestion may result while waiting for a
   low-power next hop to emerge from its sleeping state.

   RFC 4944 defines an MTU of 1280 bytes. In most incarnations (except
   802.15.4g), an IEEE Std 802.15.4 frame can limit the Link-Layer
   payload to as few as 74 bytes. A packet might therefore be
   fragmented into at least 18 fragments at the 6LoWPAN shim layer,
   since 1280 / 74 is about 17.3, which rounds up to 18. Taking into
   account the worst-case header overhead for the 6LoWPAN Fragmentation
   and Mesh Addressing headers increases the number of required
   fragments to around 32. This level of fragmentation is much higher
   than that traditionally experienced over the Internet with IPv4
   fragments. At the same time, the use of radios increases the
   probability of transmission loss, and Mesh-Under techniques compound
   that risk over multiple hops.

   Mechanisms such as TCP or application-layer segmentation could be
   used to support end-to-end reliable transport. One option to support
   bulk data transfer over a frame-size-constrained LLN is to set the
   Maximum Segment Size to fit within the link maximum frame size.
   Doing so, however, can add significant header overhead to each
   802.15.4 frame and cause extraneous acknowledgments across the LLN
   compared to the method in this specification.

Appendix B.  Requirements

   For one-hop communications, a number of Low-Power and Lossy Network
   (LLN) link layers provide a local acknowledgment mechanism that is
   sufficient to detect and recover from the loss of fragments. In a
   multihop environment, an end-to-end fragment recovery mechanism
   might be a good complement to hop-by-hop MAC recovery.
   This draft introduces a simple protocol to recover individual
   fragments between 6FF endpoints that may be multiple hops away.

   The method addresses the following requirements of an LLN:

   Number of fragments:  The recovery mechanism must support highly
      fragmented packets, with a maximum of 32 fragments per packet.

   Minimum acknowledgment overhead:  Because the radio is half duplex,
      and because of the silent time spent in the various medium access
      mechanisms, an acknowledgment consumes roughly as many resources
      as a data fragment.

      The new end-to-end fragment recovery mechanism should be able to
      acknowledge multiple fragments in a single message and should not
      require an acknowledgment at all if fragments are already
      protected at a lower layer.

   Controlled latency:  The recovery mechanism must succeed or give up
      within the time boundary imposed by the recovery process of the
      Upper-Layer Protocols.

   Optional congestion control:  The aggregation of multiple concurrent
      flows may lead to the saturation of the radio network and to
      congestion collapse.

      The recovery mechanism should provide means for controlling the
      number of fragments in transit over the LLN.

Appendix C.  Considerations on Flow Control

   Considering that a multihop LLN can be a very sensitive environment
   due to the limited queuing capabilities of a large population of its
   nodes, this draft recommends a simple and conservative approach to
   Congestion Control, based on TCP congestion avoidance.

   Congestion on the forward path is assumed in case of packet loss,
   and packet loss is assumed upon timeout. The draft allows
   controlling the number of outstanding fragments, that is, fragments
   that have been transmitted, for which no acknowledgment has been
   received yet, and that are still covered by the ARQ timer.
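   The accounting described above can be sketched as a hypothetical
   Python model of a fragmenting endpoint. The model caps the number of
   outstanding fragments (sent, not yet acknowledged, and still covered
   by the ARQ timer) and treats an ARQ timer expiry as a loss, and
   therefore as a congestion signal. Several details are assumptions
   and do not reproduce the rules of Section 6: the class and method
   names, the fixed timeout value, and the window rules. Those rules
   grow the window by one fragment per fully acknowledged window, halve
   it on timeout, and never let it drop below 1, loosely following TCP
   congestion avoidance.

```python
import time
from typing import Dict, List, Optional


class OutstandingWindow:
    """Hypothetical sender-side accounting of outstanding fragments.

    A fragment is outstanding from the time it is sent until it is
    acknowledged or its ARQ timer expires. The window growth and
    reduction rules are illustrative assumptions.
    """

    def __init__(self, window: int = 4, max_window: int = 32,
                 arq_timeout: float = 2.0) -> None:
        self.window = window            # max fragments allowed in flight
        self.max_window = max_window
        self.arq_timeout = arq_timeout  # seconds; could come from RTT
        self._sent_at: Dict[int, float] = {}  # fragment seq -> send time
        self._acked_in_window = 0

    @property
    def outstanding(self) -> int:
        return len(self._sent_at)

    def can_send(self) -> bool:
        return self.outstanding < self.window

    def on_send(self, seq: int, now: Optional[float] = None) -> None:
        """Record a (re)transmitted fragment; refuse if window is full."""
        if not self.can_send():
            raise RuntimeError("window full: wait for an ack or timeout")
        self._sent_at[seq] = time.monotonic() if now is None else now

    def on_ack(self, seqs: List[int]) -> None:
        """Release acknowledged fragments; one ack may cover many."""
        for seq in seqs:
            if self._sent_at.pop(seq, None) is not None:
                self._acked_in_window += 1
        # Assumed congestion avoidance: +1 per fully acknowledged window.
        if self._acked_in_window >= self.window:
            self._acked_in_window = 0
            self.window = min(self.window + 1, self.max_window)

    def expire(self, now: Optional[float] = None) -> List[int]:
        """Treat fragments whose ARQ timer has fired as lost.

        Loss is taken as a congestion signal (assumed: halve the
        window, minimum 1). The returned sequence numbers are the
        candidates for retransmission via on_send().
        """
        now = time.monotonic() if now is None else now
        lost = [seq for seq, sent in self._sent_at.items()
                if now - sent >= self.arq_timeout]
        for seq in lost:
            del self._sent_at[seq]
        if lost:
            self.window = max(1, self.window // 2)
            self._acked_in_window = 0
        return lost
```

   A single on_ack() call can release several sequence numbers, which
   reflects the requirement that one acknowledgment cover multiple
   fragments. Retransmissions go back through on_send(), so they are
   subject to the same window.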
   Congestion on the forward path can also be indicated by an Explicit
   Congestion Notification (ECN) mechanism. Although whether and how
   ECN [RFC3168] is carried over the LoWPAN is out of scope, this draft
   provides a way for the destination endpoint to echo an ECN
   indication back to the fragmenting endpoint in an acknowledgment
   message, as represented in Figure 4 in Section 5.2. Support for
   echoing the ECN at the reassembling endpoint is mandatory. However,
   this specification defines only minimal behavior at the fragmenting
   endpoint, which resets the window to 1 so that the fragments are
   sent and acknowledged one by one until the end of the datagram.

   It must be noted that congestion and collision are different topics.
   In particular, when a mesh operates on the same channel over
   multiple hops, the forwarding of a fragment over a certain hop may
   collide with the forwarding of the next fragment. That next fragment
   follows over a previous hop, but within the same interference
   domain. This draft enables end-to-end flow control but leaves it to
   the stack of the fragmenting endpoint to pace individual fragments
   within a transmit window. A given fragment is then sent only when
   the previous fragment has had a chance to progress beyond the
   interference domain of this hop. In the case of 6TiSCH
   [I-D.ietf-6tisch-architecture], which operates over the Time-Slotted
   Channel Hopping (TSCH) [RFC7554] mode of operation of IEEE Std
   802.15.4, a fragment is forwarded over a different channel at a
   different time. It then makes full sense to transmit the next
   fragment as soon as the previous fragment has had its chance to be
   forwarded at the next hop.

   From the standpoint of a source 6LoWPAN endpoint, an outstanding
   fragment is a fragment that was sent but for which no explicit
   acknowledgment has been received yet.
   This means that the fragment might be on the path, or received but
   not yet acknowledged, or that the acknowledgment might be on the path
   back. It is also possible that either the fragment or the
   acknowledgment was lost on the way.

   From the standpoint of the fragmenting endpoint, all outstanding
   fragments might still be in the network and contribute to its
   congestion. It is assumed, though, that after a certain amount of
   time a frame has either been received or been lost, so it no longer
   causes congestion. This amount of time can be estimated based on
   the round-trip time between the 6LoWPAN endpoints. In the absence of
   a more suitable technique, the method detailed in "Computing TCP's
   Retransmission Timer" [RFC6298] may be used for that computation.

   The reader is encouraged to read through "Congestion Control
   Principles" [RFC2914]. Additionally, [RFC7567] and [RFC5681] provide
   deeper information on why this mechanism is needed and how TCP
   handles Congestion Control. The goal here is to manage the number of
   fragments present in the network. This is achieved by reducing the
   number of outstanding fragments over a congested path, which in turn
   is done by throttling the sources.

   Section 6 describes how the fragmenting endpoint decides how many
   fragments are (re)sent before an acknowledgment is required, and how
   the fragmenting endpoint adapts that number to the network
   conditions.

Author's Address

   Pascal Thubert (editor)
   Cisco Systems, Inc
   Building D
   45 Allee des Ormes - BP1200
   06254 MOUGINS - Sophia Antipolis
   France

   Phone: +33 497 23 26 34
   Email: pthubert@cisco.com