Network Working Group                                         M. Bagnulo
Internet-Draft                                        A. Garcia-Martinez
Intended status: Experimental                                       UC3M
Expires: September 21, 2022                                G. Montenegro
                                                            Unaffiliated
                                                      P. Balasubramanian
                                                               Microsoft
                                                          March 20, 2022


  rLEDBAT: receiver-driven Low Extra Delay Background Transport for TCP
                    draft-irtf-iccrg-rledbat-02.txt

Abstract

   This document specifies rLEDBAT, a set of mechanisms that enable the
   execution of a less-than-best-effort congestion control algorithm
   for TCP at the receiver end.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 21, 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Motivations for rLEDBAT
   3.  rLEDBAT mechanisms
     3.1.  Controlling the receive window
       3.1.1.  Avoiding window shrinking
       3.1.2.  Window Scale Option
     3.2.  Measuring delays
       3.2.1.  Measuring the RTT to estimate the queueing delay
       3.2.2.  Measuring one way delay to estimate the queueing delay
     3.3.  Detecting packet losses and retransmissions
   4.  Security Considerations
   5.  IANA Considerations
   6.  Acknowledgements
   7.  Informative References
   Authors' Addresses

1.  Introduction

   LEDBAT (Low Extra Delay Background Transport) [RFC6817] is a
   congestion-control algorithm that implements a less-than-best-effort
   (LBE) traffic class.

   When LEDBAT traffic shares a bottleneck with one or more TCP
   connections using standard congestion control algorithms such as
   Cubic [RFC8312] (hereafter standard-TCP for short), it reduces its
   sending rate earlier and more aggressively than standard-TCP
   congestion control, allowing standard-TCP traffic to use more of the
   available capacity.  In the absence of competing standard-TCP
   traffic, LEDBAT aims to make efficient use of the available
   capacity, while keeping the queueing delay within predefined bounds.

   LEDBAT reacts both to packet loss and to variations in delay.
   Regarding packet loss, LEDBAT reacts with a multiplicative decrease,
   similar to most TCP congestion controllers.  Regarding delay, LEDBAT
   aims for a target queueing delay.
   When the measured current queueing delay is below the target, LEDBAT
   increases the sending rate, and when the delay is above the target,
   it reduces the sending rate.  LEDBAT estimates the queueing delay by
   subtracting the estimated base one-way delay (i.e., the one-way
   delay in the absence of queues) from the measured current one-way
   delay.

   The LEDBAT specification [RFC6817] defines the LEDBAT
   congestion-control algorithm, implemented in the sender to control
   its sending rate.  LEDBAT is specified in a protocol- and
   layer-agnostic manner.

   LEDBAT++ [I-D.irtf-iccrg-ledbat-plus-plus] is also an LBE congestion
   control algorithm, which is inspired by LEDBAT while addressing
   several problems identified with the original LEDBAT specification.
   In particular, the differences between LEDBAT and LEDBAT++ include:
   i) LEDBAT++ uses the round-trip time (RTT) (as opposed to the
   one-way delay used in LEDBAT) to estimate the queueing delay; ii)
   LEDBAT++ uses an Additive Increase/Multiplicative Decrease algorithm
   to achieve inter-LEDBAT++ fairness and avoid the late-comer
   advantage observed in LEDBAT; iii) LEDBAT++ performs periodic
   slowdowns to improve the measurement of the base delay; iv) LEDBAT++
   is defined for TCP.

   In this note, we describe rLEDBAT, a set of mechanisms that enable
   the execution of an LBE delay-based congestion control algorithm
   such as LEDBAT or LEDBAT++ in the receiver end of a TCP connection.

2.  Motivations for rLEDBAT

   rLEDBAT enables new use cases and new deployment models, fostering
   the use of LBE traffic and benefitting the global Internet by
   improving overall allocation of resources.
   The following scenarios are enabled by rLEDBAT:

   Content Delivery Networks and more sophisticated file distribution
      scenarios:  Consider the case where the source of a file to be
      distributed (e.g., a software developer that wishes to distribute
      a software update) would prefer to use LBE and enables
      LEDBAT/LEDBAT++ in the servers containing the source file.
      However, because the file is being distributed through a CDN
      whose surrogates do not support LBE congestion control, the file
      transfers originating from CDN surrogates will not use LBE.
      Interestingly enough, in the case of the software update, the
      developer may also control the software performing the download
      in the client, the receiver of the file, but because current
      LEDBAT/LEDBAT++ are sender-based algorithms, controlling the
      client is not enough to enable LBE congestion control in the
      communication.  rLEDBAT would enable the use of the LBE traffic
      class for file distribution in this setup.

   Interference from proxies and other middleboxes:  Proxies and other
      middleboxes are commonplace in the Internet.  For instance, in
      the case of mobile networks, proxies are frequently used.  In the
      case of enterprise networks, it is common to deploy corporate
      proxies for filtering and firewalling.  In the case of satellite
      links, Performance Enhancement Proxies (PEPs) are deployed to
      mitigate the effect of the long delay on TCP connections.  These
      proxies terminate the TCP connection on both ends and prevent the
      use of LBE congestion control in the segment between the proxy
      and the sink of the content, the client.  By enabling rLEDBAT,
      clients would be able to enable LBE traffic between them and the
      proxy.

   Receiver-defined preferences:  The bottleneck of the communication
      is frequently the access link.  This is particularly true in the
      case of mobile devices.
      It is then especially relevant for mobile devices to properly
      manage the capacity of the access link.  With current
      technologies, it is possible for the mobile device to use
      different congestion control algorithms expressing different
      preferences for the traffic.  For instance, a device can choose
      to use standard-TCP for some traffic and to use LEDBAT/LEDBAT++
      for other traffic.  However, this would only affect the outgoing
      traffic since both standard-TCP and LEDBAT/LEDBAT++ are
      sender-driven.  The mobile device has no means to manage the
      traffic in the downlink, which is, in most cases, the
      communication bottleneck for a typical eyeball end-user.  rLEDBAT
      enables the mobile device to selectively use the LBE traffic
      class for some of the incoming traffic.  For instance, by using
      rLEDBAT, a user can use regular standard-TCP/UDP for video
      streaming (e.g., YouTube) and use rLEDBAT for other background
      file downloads.

3.  rLEDBAT mechanisms

   rLEDBAT provides the mechanisms to implement an LBE congestion
   control algorithm at the receiver end of a TCP connection.  The
   rLEDBAT receiver controls the sender's rate through the Receive
   Window announced to the sender in the TCP header.

   rLEDBAT assumes that the sender is a standard TCP sender.  rLEDBAT
   does not require any rLEDBAT-specific modifications to the TCP
   sender.  The envisioned deployment model for rLEDBAT is that the
   clients implement rLEDBAT, and this enables rLEDBAT in
   communications with existing standard TCP senders.  In particular,
   the sender MUST implement [I-D.ietf-tcpm-rfc793bis] and it also MUST
   implement the Time Stamp Option as defined in [RFC7323].  Also, the
   sender SHOULD implement some of the standard congestion control
   mechanisms, such as Cubic [RFC8312] or New Reno [RFC5681].

   rLEDBAT does not define a new congestion control algorithm.
   The LBE congestion control algorithm executed in the rLEDBAT
   receiver is defined in other documents.  The rLEDBAT receiver MUST
   use an LBE congestion control algorithm.  Because rLEDBAT assumes a
   standard TCP sender, the sender will be using a "best effort"
   congestion control algorithm (such as Cubic or New Reno).  Since
   rLEDBAT uses the Receive Window to control the sender's rate and the
   sender calculates the sender's window as the minimum of the Receive
   Window and the congestion window, rLEDBAT will only be effective as
   long as the congestion control algorithm executed in the receiver
   yields a smaller window than the one calculated by the sender.  This
   is normally the case when the receiver is using an LBE congestion
   control algorithm.  The rLEDBAT receiver SHOULD use the LEDBAT
   congestion control algorithm [RFC6817] or the LEDBAT++ congestion
   control algorithm [I-D.irtf-iccrg-ledbat-plus-plus].  The rLEDBAT
   receiver MAY use other LBE congestion control algorithms defined
   elsewhere.  Irrespective of which congestion control algorithm is
   executed in the receiver, an rLEDBAT connection will never be more
   aggressive than standard TCP since it is always bounded by the
   congestion control algorithm executed at the sender.

   rLEDBAT is essentially composed of three types of mechanisms,
   namely, those that provide the means to measure the packet delay
   (either the round trip time or the one way delay, depending on the
   selected algorithm), mechanisms to detect packet loss, and the means
   to manipulate the Receive Window to control the sender's rate.  The
   first two provide input to the LBE congestion control algorithm,
   while the third uses the congestion window computed by the LBE
   congestion control algorithm to manipulate the Receive Window, as
   depicted in Figure 1.


   +------------------------------------------+
   |  TCP receiver                            |
   |                      +-----------------+ |
   |                      | +------------+  | |
   |  +---------------------| RTT        |  | |
   |  |                   | | Estimation |  | |
   |  |                   | +------------+  | |
   |  |                   |                 | |
   |  |                   | +------------+  | |
   |  |      +--------------| Loss, RTX  |  | |
   |  |      |            | | Detection  |  | |
   |  |      |            | +------------+  | |
   |  v      v            |                 | |
   | +----------------+   |                 | |
   | | LBE Congestion |   |     rLEDBAT     | |
   | |    Control     |   |                 | |
   | +----------------+   |                 | |
   |       |              | +------------+  | |
   |       |              | | RCV-WND    |  | |
   |       +--------------->| Control    |  | |
   |                      | +------------+  | |
   |                      +-----------------+ |
   +------------------------------------------+

                 Figure 1: The rLEDBAT architecture.

   We describe each of the rLEDBAT components next.

3.1.  Controlling the receive window

   rLEDBAT uses the Receive Window (RCV.WND) of TCP to enable the
   receiver to control the sender's rate.  [I-D.ietf-tcpm-rfc793bis]
   defines that the RCV.WND is used to announce the available receive
   buffer to the sender for flow control purposes.  In order to avoid
   confusion, we will call fc.WND the value that a standard RFC793bis
   TCP receiver calculates to set in the receive window for flow
   control purposes.  We call rl.WND the window value calculated by the
   rLEDBAT algorithm, and we call RCV.WND the value actually included
   in the Receive Window field of the TCP header.  For an RFC793bis
   receiver, RCV.WND == fc.WND.

   An rLEDBAT receiver MUST NOT set the RCV.WND to a value larger than
   fc.WND, and it SHOULD set the RCV.WND to the minimum of rl.WND and
   fc.WND, honoring both.

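
   The rule above can be illustrated with a minimal sketch.  The
   function name advertised_window and its parameters are hypothetical
   and chosen only for this example; window sizes are assumed to be
   expressed in bytes.

```python
def advertised_window(rl_wnd: int, fc_wnd: int) -> int:
    """Return the value to announce in RCV.WND.

    rl_wnd: window computed by the LBE congestion controller (rl.WND).
    fc_wnd: window computed for flow control purposes (fc.WND).
    The result never exceeds fc.WND and honors rl.WND when it is smaller.
    """
    if rl_wnd < 0 or fc_wnd < 0:
        raise ValueError("window sizes must be non-negative")
    return min(rl_wnd, fc_wnd)
```

   For instance, with rl.WND = 20000 and fc.WND = 65535 the receiver
   would announce 20000 bytes, while a large rl.WND is capped by the
   available receive buffer.
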

   When using rLEDBAT, two congestion controllers are in action in the
   flow of data from the sender to the receiver, namely, the congestion
   control algorithm of TCP on the sender side and the LBE congestion
   control algorithm executed in the receiver and conveyed to the
   sender through the RCV.WND.  In normal TCP operation, the sender
   uses the minimum of the congestion window cwnd and the receiver
   window RCV.WND to calculate the sender's window SND.WND.  This is
   also true for rLEDBAT, as the sender is a regular TCP sender.  This
   guarantees that the rLEDBAT flow will never transmit more
   aggressively than a TCP flow, as the sender's congestion window
   limits the sending rate.  Moreover, because an LBE congestion
   control algorithm such as LEDBAT/LEDBAT++ is designed to react
   earlier and more aggressively to congestion than regular TCP
   congestion control, the rl.WND contained in the RCV.WND field of TCP
   will in general be smaller than the congestion window calculated by
   the TCP sender, implying that the rLEDBAT congestion control
   algorithm will be effectively controlling the sender's window.

   In summary, the sender's window is:

      SND.WND = min(cwnd, rl.WND, fc.WND)

3.1.1.  Avoiding window shrinking

   The LEDBAT/LEDBAT++ algorithm executed in an rLEDBAT receiver
   increases or decreases the rl.WND according to congestion signals
   (variations in the estimates of the queueing delay, and packet
   loss).  If the new congestion window is smaller than the current
   one, then directly announcing it in the RCV.WND may result in
   shrinking the window, i.e., moving the right window edge to the
   left.  Shrinking the window is discouraged as per
   [I-D.ietf-tcpm-rfc793bis], as it may cause unnecessary packet loss
   and performance penalty.  To be consistent with
   [I-D.ietf-tcpm-rfc793bis], the rLEDBAT receiver SHOULD NOT shrink
   the receive window.

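
   The right-edge constraint can be sketched as follows.  This is an
   illustration only: the names ReceiveWindowState and
   next_advertisement are hypothetical, data is assumed to arrive in
   order so that RCV.NXT advances by the number of bytes received, and
   sequence number wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass
class ReceiveWindowState:
    rcv_nxt: int  # next sequence number expected (RCV.NXT)
    rcv_wnd: int  # window currently announced (RCV.WND), in bytes

    @property
    def right_edge(self) -> int:
        # Right window edge as seen by the sender.
        return self.rcv_nxt + self.rcv_wnd


def next_advertisement(state: ReceiveWindowState,
                       bytes_received: int,
                       target_wnd: int) -> ReceiveWindowState:
    """Move RCV.WND toward target_wnd without moving the right edge left."""
    new_nxt = state.rcv_nxt + bytes_received
    # Smallest window that keeps the right edge where it already was.
    floor_wnd = max(state.right_edge - new_nxt, 0)
    return ReceiveWindowState(new_nxt, max(target_wnd, floor_wnd))
```

   Starting from RCV.NXT = 1000 and RCV.WND = 10000, a target of 5000
   bytes is reached after four 1460-byte segments; each announcement
   reduces the window by no more than the bytes just received.
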

   In order to avoid window shrinking, upon the reception of a data
   packet, the announced window can be reduced by at most the number of
   bytes contained in the packet.  This may fall short of the newly
   calculated value of the rl.WND.  So, in order to reduce the window
   as dictated by the rLEDBAT algorithm, the receiver will
   progressively reduce the advertised RCV.WND, always ensuring that
   the reduction is less than or equal to the number of received bytes,
   until the target window determined by the rLEDBAT algorithm is
   reached.  This implies that it may take up to one RTT for the
   rLEDBAT receiver to drain enough in-flight bytes to completely close
   its receive window without shrinking it.  This is more than
   sufficient to honor the window output from the LEDBAT/LEDBAT++
   algorithms, since they allow at most one multiplicative decrease per
   RTT.

3.1.2.  Window Scale Option

   The Window Scale (WS) option [RFC7323] is a means to increase the
   maximum window size permitted by the Receive Window.  The use of the
   WS option implies that changes in the window are expressed in the
   units determined by the WS option used in the TCP connection.  This
   means that the rLEDBAT client will have to accumulate the increases
   resulting from the different received packets, and only convey a
   change in the window when the accumulated sum of increases is equal
   to or greater than one unit used to express the receive window
   according to the WS option in place for the TCP connection.

   Changes in the receive window that are smaller than 1 MSS are
   unlikely to have any immediate impact on the sender's rate, as usual
   TCP segmentation practice results in sending full segments (i.e.,
   segments of size equal to the MSS).  So, accumulating changes in the
   receive window until completing a full MSS in the sender or in the
   receiver makes little difference.

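
   The accumulation of window increases under the WS option can be
   sketched as below.  The class name ScaledWindow and its members are
   hypothetical; the sketch only handles increases, assumes the window
   unit is 2^WS bytes as defined by [RFC7323], and caps the announced
   value at the 16-bit Window field.

```python
class ScaledWindow:
    """Accumulates window increases until they fill one WS unit."""

    def __init__(self, ws_shift: int, initial_wnd_bytes: int):
        if not 0 <= ws_shift <= 14:
            raise ValueError("WS shift must be between 0 and 14")
        self.ws_shift = ws_shift
        # Value carried in the 16-bit Window field of the TCP header.
        self.field = min(initial_wnd_bytes >> ws_shift, 0xFFFF)
        # Bytes of increase computed by the controller but not yet announced.
        self.pending = 0

    def add_increase(self, delta_bytes: int) -> int:
        """Record an increase and return the Window field to announce."""
        if delta_bytes < 0:
            raise ValueError("this sketch only handles increases")
        self.pending += delta_bytes
        units = self.pending >> self.ws_shift       # whole units available
        self.pending -= units << self.ws_shift      # keep the remainder
        self.field = min(self.field + units, 0xFFFF)
        return self.field

    @property
    def advertised_bytes(self) -> int:
        return self.field << self.ws_shift
```

   With a WS shift of 8 (units of 256 bytes), three increases of 100
   bytes each change the announced field only on the third one, leaving
   44 bytes pending for later announcements.
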

   The current WS option specification [RFC7323] defines that allowed
   values for the WS option are between 0 and 14.  Assuming an MSS
   around 1500 bytes, WS option values between 0 and 11 result in the
   receive window being expressed in units that are about 1 MSS or
   smaller.  So, WS option values between 0 and 11 have no impact on
   rLEDBAT.

   WS option values higher than 11 can affect the dynamics of rLEDBAT,
   since control may become too coarse (e.g., with a WS of 14, a change
   of one unit of the receive window implies a change of roughly 11 MSS
   (16384 bytes) in the effective window).

   For the above reasons, the rLEDBAT client SHOULD set WS option
   values lower than 12.  Additional experimentation is required to
   explore the impact of larger WS values on rLEDBAT dynamics.

   Note that the recommendation for rLEDBAT to set the WS option to
   lower values does not preclude communication with servers that set
   the WS option to larger values, since the WS option value used is
   set independently for each direction of the TCP connection.

3.2.  Measuring delays

   Both LEDBAT and LEDBAT++ measure base and current delays to estimate
   the queueing delay.  LEDBAT uses the one way delay while LEDBAT++
   uses the round trip time.  In the next sections we describe how
   rLEDBAT mechanisms enable the receiver to measure the one way delay
   or the round trip time, whichever is needed depending on the
   congestion control algorithm used.

3.2.1.  Measuring the RTT to estimate the queueing delay

   LEDBAT++ uses the round trip time (RTT) to estimate the queueing
   delay.  In order to estimate the queueing delay using the RTT, the
   rLEDBAT receiver estimates the base RTT (i.e., the constant
   components of the RTT) and also measures the current RTT.  By
   subtracting the base RTT from the current RTT, we obtain the
   queueing delay to be used by the rLEDBAT controller.

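
   As a minimal sketch of this subtraction, the estimator below keeps a
   base RTT and a current RTT and returns their difference.  The class
   name RttQueueingDelay, the use of a min filter for both values, the
   tracking of the base RTT over the whole connection, and the default
   of k = 4 recent samples are assumptions made for illustration; they
   are not mandated by this document.

```python
from collections import deque
from typing import Optional


class RttQueueingDelay:
    """Queueing delay estimate: current RTT minus base RTT."""

    def __init__(self, k: int = 4):
        self.base_rtt: Optional[int] = None   # minimum RTT seen so far
        self.recent = deque(maxlen=k)         # last k RTT samples

    def add_sample(self, rtt_ms: int) -> None:
        if self.base_rtt is None or rtt_ms < self.base_rtt:
            self.base_rtt = rtt_ms
        self.recent.append(rtt_ms)

    def queueing_delay(self) -> Optional[int]:
        """Return min(recent samples) - base RTT, or None without samples."""
        if not self.recent:
            return None
        return min(self.recent) - self.base_rtt
```

   Feeding the samples 50, 80, 70 and 90 ms with k = 2 yields a base
   RTT of 50 ms, a current RTT of 70 ms, and therefore an estimated
   queueing delay of 20 ms.
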

   LEDBAT++ discovers the base RTT (RTTb) by taking the minimum value
   of the measured RTTs over a period of time.  The current RTT (RTTc)
   is estimated using a number of recent samples and applying a filter,
   such as the minimum (or the mean) of the last k samples.  Using the
   RTT to estimate the queueing delay has a number of shortcomings and
   difficulties that we discuss next.

   The queueing delay measured using the RTT also includes the queueing
   delay experienced by the return packets in the direction from the
   rLEDBAT receiver to the sender.  This is a fundamental limitation of
   this approach.  The impact of this error is that the rLEDBAT
   controller will also react to congestion in the reverse path
   direction, which results in an even more conservative mechanism.

   In order to measure the RTT, the rLEDBAT client MUST enable the Time
   Stamp (TS) option [RFC7323].  By matching the TSVal value carried in
   outgoing packets with the TSecr value observed in incoming packets,
   it is possible to measure the RTT.  This allows the rLEDBAT receiver
   to measure the RTT even if it is acting as a pure receiver.  In a
   pure receiver there is no data flowing from the rLEDBAT receiver to
   the sender, making it impossible to match data packets with
   acknowledgement packets to measure the RTT, as is usually done in
   TCP for other purposes.

   Depending on the frequency of the local clock used to generate the
   values included in the TS option, several packets may carry the same
   TSVal value.  If that happens, the rLEDBAT receiver will be unable
   to match the different outgoing packets carrying the same TSVal
   value with the different incoming packets carrying the same TSecr
   value.  However, it is not necessary for rLEDBAT to use all packets
   to estimate the RTT, and sampling a subset of in-flight packets per
   RTT is enough to properly assess the queueing delay.
   The RTT MUST then be calculated as the time elapsed between the
   sending of the first packet with a given TSVal and the reception of
   the first packet carrying the same value in the TSecr.  Other
   packets with repeated TS values SHOULD NOT be used for the RTT
   calculation.

   Several issues must be addressed in order to avoid an artificial
   increase of the observed RTT.  Different issues emerge depending on
   whether the rLEDBAT-capable host is sending data packets or pure
   ACKs to measure the RTT.  We next consider the issues separately.

3.2.1.1.  Measuring RTT sending pure ACKs

   In this scenario, the rLEDBAT node (host A) sends a pure ACK to the
   other endpoint of the TCP connection (host B), including the TS
   option.  Upon the reception of the TS option, host B will copy the
   value of the TSVal into the TSecr field of the TS option and include
   that option in the next data packet towards host A.  However, there
   are two reasons why B may not send a packet immediately back to A,
   artificially increasing the measured RTT.  The first reason is when
   B has no data to send.  The second is when B has no available window
   to put more packets in flight.  We describe next how each of these
   cases is addressed.

   The case where host B has no data to send when it receives the pure
   acknowledgement is expected to be rare in the rLEDBAT use cases.
   rLEDBAT will be used mostly for background file transfers, so the
   expected common case is that the sender will have data to send
   throughout the lifetime of the communication.  However, if, for
   example, the file is structured in blocks of data, it may
   occasionally be the case that the sender has to wait until the next
   block is available to proceed with the data transfer and momentarily
   lacks data to send.  To address this situation, the filter used by
   the congestion control algorithm executed in the receiver SHOULD
   discard the larger samples (e.g., a min filter would achieve this)
   when measuring the RTT using pure ACK packets.

   The limitation of the sender's available window to send more packets
   can come either from the TCP congestion window in host B or from the
   receive window announced by rLEDBAT in host A.  Normally, the
   receive window will be the one to limit the sender's transmission
   rate, since the LBE congestion control algorithm used by the rLEDBAT
   node is designed to be more restrictive on the sender's rate than
   standard-TCP.  If the limiting factor is the congestion window in
   the sender, it is less relevant if rLEDBAT further reduces the
   receive window due to a bloated RTT measurement, since rLEDBAT is
   not actively controlling the sender's rate.  Nevertheless, the
   proposed approach of discarding larger samples would also address
   this issue.

   To address the case in which the limiting factor is the receive
   window announced by rLEDBAT, the congestion control algorithm at the
   receiver SHOULD discard the RTT measurements done using pure ACK
   packets sent while reducing the window, and so avoid including
   bloated samples in the queueing delay estimation.  The rLEDBAT
   receiver is aware of whether a given TSVal value was sent in a pure
   ACK packet where the window was reduced, and if so, it can discard
   the corresponding RTT measurement.

3.2.1.2.  Measuring the RTT sending data packets

   In the case that the rLEDBAT node is sending data packets and
   matching them with pure ACKs to measure the RTT, a factor that can
   artificially increase the measured RTT is the presence of delayed
   acknowledgements.  According to the TS option generation rules
   [RFC7323], the value included in the TSecr for a delayed ACK is the
   one in the TSVal field of the earliest unacknowledged segment.  This
   may artificially increase the measured RTT.

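
   The sampling rules of Sections 3.2.1 and 3.2.1.1 can be sketched as
   follows.  The class TsRttSampler and its method names are
   hypothetical; the sketch assumes TSVal values never decrease
   (wraparound is ignored), takes times in integer milliseconds, and
   marks a TSVal as unusable when its first packet was a pure ACK that
   reduced the window.

```python
from typing import Dict, Optional, Tuple


class TsRttSampler:
    """Derives RTT samples by matching sent TSVal with received TSecr."""

    def __init__(self) -> None:
        self._sent: Dict[int, Tuple[int, bool]] = {}  # TSVal -> (time, usable)
        self._last_tsval: Optional[int] = None

    def on_send(self, tsval: int, now_ms: int,
                pure_ack_reducing_window: bool = False) -> None:
        # Only the first packet sent with a given TSVal is recorded.
        if tsval == self._last_tsval:
            return
        self._last_tsval = tsval
        self._sent[tsval] = (now_ms, not pure_ack_reducing_window)

    def on_receive(self, tsecr: int, now_ms: int) -> Optional[int]:
        """Return an RTT sample in ms, or None if no sample should be taken."""
        entry = self._sent.pop(tsecr, None)  # only the first echo matches
        if entry is None:
            return None
        send_time, usable = entry
        return now_ms - send_time if usable else None
```

   A caller would feed each non-None result to the RTT filter of the
   congestion controller; repeated echoes of the same TSVal and samples
   tied to window-reducing pure ACKs produce no sample.
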

   If both endpoints of the connection are sending data packets,
   acknowledgements are piggybacked on the data packets and they are
   not delayed.  Delayed ACKs only increase the RTT measurement in the
   case that the sender has no data to send.  Since the expected use
   case for rLEDBAT is that the sender will be sending background
   traffic to the rLEDBAT receiver, the cases where delayed ACKs
   increase the measured RTT are expected to be rare.

   Nevertheless, measurements done using data packets sent by the
   rLEDBAT node and matched with pure ACKs sent from the other endpoint
   of the connection will result in an increased RTT.  The additional
   increase in the measured RTT will range between the transmission
   delay of one packet and 500 ms.  The reason for this is that delayed
   ACKs are generated for every second data packet received and are not
   delayed more than 500 ms according to [I-D.ietf-tcpm-rfc793bis].
   The rLEDBAT receiver MAY discard the RTT measurements done using
   data packets from the rLEDBAT receiver and matching pure ACKs,
   especially if it has recent measurements done using other packet
   combinations.  Also, applying a filter that discards larger samples
   would address this issue (e.g., a min filter).

3.2.2.  Measuring one way delay to estimate the queueing delay

   The LEDBAT algorithm uses the one-way delay of packets as input.  A
   TCP receiver can measure the delay of incoming packets directly (as
   opposed to the sender-based LEDBAT, where the receiver measures the
   one-way delay and needs to convey it to the sender).

   In the case of TCP, the receiver can use the Time Stamp option to
   measure the one way delay by subtracting the time stamp contained in
   the incoming packet from the local time at which the packet has
   arrived.
   As noted in [RFC6817], the clock offset between the clock of the
   sender and the clock of the receiver does not affect the LEDBAT
   operation, since LEDBAT uses the difference between the base one way
   delay and the current one way delay to estimate the queueing delay,
   effectively canceling the clock offset error in the queueing delay
   estimation.  There are, however, two other issues that the rLEDBAT
   receiver needs to take into account in order to properly estimate
   the one way delay, namely, the units in which the received
   timestamps are expressed and the clock skew.  We address them next.

   In order to measure the one way delay using TCP timestamps, the
   rLEDBAT receiver needs first to discover the units in which the
   values of the TS option are expressed and second to account for the
   skew between the clocks of the two endpoints of the TCP connection.
   Note that a mismatch of 100 ppm (parts per million) in the
   receiver's estimate of the sender's clock rate accounts for 6 ms of
   variation per minute in the measured delay for a communication, just
   one order of magnitude below the target set for controlling the rate
   by rLEDBAT.  Typical skew for untrained clocks is reported to be
   around 100-200 ppm [RFC6817].

   In order to learn both the TS units and the clock skew, the rLEDBAT
   receiver compares how much local time has elapsed between the
   reception of two packets that the sender issued with different TS
   values.  By comparing the local time difference and the TS value
   difference, the receiver can assess the TS units and the relative
   clock skew.  In order for this to be accurate, the packets carrying
   the different TS values should experience equal (or at least
   similar) delay when traveling from the sender to the receiver, as
   any difference in the experienced delays would introduce error in
   the unit/skew estimation.
   One possible approach is to select packets that experienced the
   minimum delay (i.e., close to zero queueing delay) to make the
   estimations.

   An additional difficulty regarding the estimation of the TS units
   and clock skew in the context of (r)LEDBAT is that the LEDBAT
   congestion controller actions directly affect the (queueing) delay
   experienced by packets.  In particular, if there is an error in the
   estimation of the TS units/skew, the LEDBAT controller will attempt
   to compensate for it by reducing/increasing the load.  The result is
   that the LEDBAT operation interferes with the TS units/clock skew
   measurements.  Because of this, measurements are more accurate when
   there is no traffic in the connection (other than the packets used
   for the measurements).  The problem is that the receiver is unaware
   of whether the sender is injecting traffic at any point in time, and
   so cannot opportunistically seize quiet intervals to perform
   measurements.  The receiver can, however, force periodic slowdowns,
   reducing the announced receive window to a few packets, and perform
   the measurements then.

   It is possible for the rLEDBAT receiver to perform multiple
   measurements to assess both the TS units and the relative clock skew
   during the lifetime of the connection, in order to obtain more
   accurate results.  Clock skew measurements are more accurate if the
   time period used to discover the skew is larger, as the impact of
   the skew becomes more apparent.  By the same logic, accurately
   learning the clock skew becomes more pressing as the time separating
   the two delays to compare increases.  It is a reasonable approach
   for the rLEDBAT receiver to perform an early discovery of the TS
   units (and the clock skew) using the first few packets of the TCP
   connection and then improve the accuracy of the TS units/clock skew
   estimation using periodic measurements later in the lifetime of the
   connection.

3.3.  Detecting packet losses and retransmissions

   The rLEDBAT receiver is capable of detecting retransmitted packets
   in the following way.  We call RCV.HGH the highest sequence number
   corresponding to a received byte of data (not assuming that all
   bytes with smaller sequence numbers have been received already;
   there may be holes), and we call TSV.HGH the TSVal value
   corresponding to the segment in which that byte was carried.
   SEG.SEQ stands for the sequence number of a newly received segment,
   and we call TSV.SEQ the TSVal value of the newly received segment.

   If SEG.SEQ < RCV.HGH and TSV.SEQ > TSV.HGH, then the newly received
   segment is a retransmission.  This is so because the newly received
   segment was generated later than another already received segment
   which contained data with a larger sequence number.  This means that
   this segment was lost and was retransmitted.

   The proposed mechanism to detect retransmissions at the receiver
   fails when there are window tail drops.  If all packets in the tail
   of the window are lost, the receiver will not be able to detect a
   mismatch between the sequence numbers of the packets and the order
   of the timestamps.  In this case, rLEDBAT will not react to losses,
   but the TCP congestion controller at the sender will, most likely
   reducing its window to 1 MSS and taking over the control of the
   sending rate until slow start ramps up and catches up with the
   current value of the rLEDBAT window.

4.  Security Considerations

5.  IANA Considerations

6.  Acknowledgements

   This work was supported by the EU through the NGI Pointer RIM
   project and previously by the H2020 5G-RANGE project and by the
   Spanish Ministry of Economy and Competitiveness through the 5G-City
   project (TEC2016-76795-C6-3-R).

7.  Informative References

   [I-D.ietf-tcpm-rfc793bis]
              Eddy, W. M., "Transmission Control Protocol (TCP)
              Specification", draft-ietf-tcpm-rfc793bis-28 (work in
              progress), March 2022.

   [I-D.irtf-iccrg-ledbat-plus-plus]
              Balasubramanian, P., Ertugay, O., and D. Havey,
              "LEDBAT++: Congestion Control for Background Traffic",
              draft-irtf-iccrg-ledbat-plus-plus-01 (work in progress),
              August 2020.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009.

   [RFC6817]  Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind,
              "Low Extra Delay Background Transport (LEDBAT)",
              RFC 6817, DOI 10.17487/RFC6817, December 2012.

   [RFC7323]  Borman, D., Braden, B., Jacobson, V., and R.
              Scheffenegger, Ed., "TCP Extensions for High
              Performance", RFC 7323, DOI 10.17487/RFC7323,
              September 2014.

   [RFC8312]  Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and
              R. Scheffenegger, "CUBIC for Fast Long-Distance
              Networks", RFC 8312, DOI 10.17487/RFC8312, February 2018.

Authors' Addresses

   Marcelo Bagnulo
   UC3M

   Email: marcelo@it.uc3m.es


   Alberto Garcia-Martinez
   UC3M

   Email: alberto@it.uc3m.es


   Gabriel Montenegro
   Unaffiliated

   Email: g.e.montenegro@hotmail.com


   Praveen Balasubramanian
   Microsoft

   Email: pravb@microsoft.com