Network Working Group                                       M. Sridharan
Internet Draft                                                 Microsoft
Intended status: Experimental                                     K. Tan
November 3, 2008                                      Microsoft Research
Expires: April 2009                                            D. Bansal
                                                               D. Thaler
                                                               Microsoft

   Compound TCP: A New TCP Congestion Control for High-Speed and Long
                           Distance Networks

                    draft-sridharan-tcpm-ctcp-02.txt

Status of this Memo

By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups.
Note that other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

This Internet-Draft will expire on April 3, 2009.

Copyright Notice

Copyright (C) The IETF Trust (2007).

Abstract

Compound TCP (CTCP) is a modification to TCP's congestion control
mechanism for use with TCP connections with large congestion windows.
This document describes the Compound TCP algorithm in detail, and
solicits experimentation and feedback from the wider community. The
key idea behind CTCP is to add a scalable delay-based component to the
standard TCP's loss-based congestion control. The sending rate of CTCP
is controlled by both loss and delay components. The delay-based
component has a scalable window increasing rule that not only
efficiently uses the link capacity but also, on sensing queue build-up,
proactively reduces the sending rate.

Table of Contents

   1. Introduction
   2. Design Goals
   3. Compound TCP Control Law
   4. Compound TCP Response Function
   5. Automatic Selection of Gamma
   6. Implementation Issues
   7. Deployment Issues
   8. Security Considerations
   9. IANA Considerations
   10. Conclusions
   11. Acknowledgments
   12. References
      12.1. Normative References
      12.2. Informative References
   Author's Addresses
   Intellectual Property Statement
   Disclaimer of Validity

1. Introduction

In this document, we collectively refer to any TCP congestion control
algorithm that employs a linear increase function for congestion
control, including TCP Reno and all its variants, as Standard TCP. This
document describes Compound TCP, a modification to TCP's congestion
control mechanism for fast, long-distance networks. The standard TCP
congestion avoidance algorithm employs an additive increase and
multiplicative decrease (AIMD) scheme, which uses a conservative
linear growth function for increasing the congestion window and a
multiplicative decrease function on encountering a loss. For a
high-speed and long delay network, it takes standard TCP an
unreasonably long time to recover the sending rate after a single loss
event [RFC2581, RFC3649]. Moreover, it is now well known that in a
steady-state environment, with a packet loss rate of p, the current
standard TCP's average congestion window is inversely proportional to
the square root of the packet loss rate [RFC2581,PADHYE]. Therefore, it
requires an extremely small packet loss rate to sustain a large window.
As an example, Floyd
[RFC3649] pointed out that on a 10Gbps link
with 100ms delay, it will roughly take one hour for a standard TCP flow
to fully utilize the link capacity, if no packet is lost or corrupted.
This one hour error-free transmission requires a packet loss rate of
around 10^-11 with 1500-byte size packets (one packet loss over
2,600,000,000 packet transmissions!), which is not practical in today's
networks.

There are several proposals to address this fundamental limitation of
TCP. One straightforward way to overcome this limitation is to modify
TCP's increase/decrease rule in its congestion avoidance stage. More
specifically, in the absence of packet loss, the sender increases the
congestion window more quickly and decreases it more gently upon a
packet loss. In a mixed network environment, the aggressive behavior of
such an approach may severely degrade the performance of regular TCP
flows whenever the network path is already highly utilized. When an
aggressive high-speed variant flow traverses the bottleneck link with
other standard TCP flows, it may increase its own share of bandwidth by
reducing the throughput of other competing TCP flows. As a result, the
aggressive variants will cause many more self-induced packet losses on
bottleneck links, and push back the throughput of the regular TCP
flows.

There is also a class of high-speed protocols that use variations in
RTT as a congestion indicator (e.g., [AFRICA,FAST]). Such delay-based
approaches are more-or-less derived from the seminal work of TCP-Vegas
[VEGAS]. An increase in RTT is considered an early indicator of
congestion, and the sending rate is reduced to avoid buffer overflow.
The problem in this approach comes when delay-based and loss-based
flows share the same bottleneck link.
While the delay-based flows respond to
increases in RTT by cutting their sending rate, the loss-based flows
continue to increase their sending rate. As a result, a delay-based flow
obtains far less bandwidth than its fair share. This weakness is hard to
remedy for purely delay-based approaches.

Compound TCP is designed to satisfy the efficiency requirement and
the TCP friendliness requirement simultaneously. The key idea is that
if the link is under-utilized, the high-speed protocol should be
aggressive and increase the sending rate quickly. However, once the
link is fully utilized, being aggressive will not only adversely affect
standard TCP flows but will also cause instability. As noted above,
delay-based approaches already have the nice property of adjusting
aggressiveness based on the link utilization, which is observed by the
end-systems as an increase in RTT. CTCP incorporates a scalable
delay-based component into the standard TCP's congestion avoidance
algorithm. Using the delay component as an automatic tuning knob, CTCP
is scalable yet TCP friendly.

2. Design Goals

The design of CTCP is motivated by the following requirements:

   o Improve throughput by efficiently using the spare capacity in
     the network
   o Good intra-protocol fairness when competing with flows that
     have different RTTs
   o Should not impact the performance of standard TCP flows sharing
     the same bottleneck
   o No additional feedback or support required from the network

CTCP can efficiently use the network's resources and achieve high link
utilization. The aggressiveness can be controlled by adopting a rapid
increase rule in the delay-based component. We chose CTCP to have
aggressiveness similar to that of HighSpeed TCP [RFC3649].
Our design choice is
motivated by the fact that HSTCP has been tested to be aggressive
enough in real world networks while, at the same time, not exhibiting
any severe issues in deployment or testing experiences, and is now an
experimental IETF RFC. We also wanted an upper bound on the amount of
unfairness to standard TCP flows. However, as shown later, CTCP is able
to maintain TCP friendliness under high statistical multiplexing and
also while traversing poorly buffered links. CTCP has similar or, in
some cases, improved RTT fairness compared to standard TCP. As we will
demonstrate later, this is due to the fact that the amount of backlogged
packets for a connection is independent of the RTT of the connection.
Even though CTCP does not require any feedback from the network, CTCP
works well in ECN capable environments. There is also no expectation on
the queuing algorithm deployed in the routers.

As is the case with most high-speed variants today, CTCP does not
modify the slow-start behavior of standard TCP. We share the belief
that ramping up faster than slow-start without additional information
from the network can be harmful. During slow start, CTCP uses the
standard TCP congestion window (cwnd) and does not use any additional
delay component. Just like standard TCP, it exits slow start when either
a loss happens or the congestion window (cwnd) reaches ssthresh.

Similar to HSTCP, to ensure TCP compatibility, CTCP's scalable
component uses the same response function as Standard TCP when the
current congestion window is at most Low_Window. CTCP sets Low_Window
to 38 MSS-sized segments, corresponding to a packet drop rate of 10^-3
for TCP.

3. Compound TCP Control Law

CTCP modifies Standard TCP's loss-based control law with a scalable
delay-based component.
To do so, a new state variable is introduced in
the current TCP Control Block (TCB), namely dwnd (Delay Window), which
controls the delay-based component in CTCP. The conventional congestion
window, cwnd, remains untouched and controls the loss-based component
in CTCP. Thus, the CTCP sending window is now controlled by both cwnd
and dwnd. Specifically, the TCP sending window (wnd) is now calculated
as follows:

   wnd = min(cwnd + dwnd, awnd),                                    (1)

where awnd is the advertised window from the receiver.

cwnd is updated in the same way as regular TCP in the congestion
avoidance phase, i.e., cwnd is increased by 1 MSS every RTT and halved
when a packet loss is encountered. The update to dwnd will be explained
in detail later in this section. The combined window for CTCP from (1)
above allows up to (cwnd + dwnd) packets in one RTT to be injected into
the network. Therefore, the increment of cwnd on the arrival of an ACK
is modified accordingly:

   cwnd = cwnd + 1/(cwnd+dwnd)                                      (2)

Some implementations may choose to use FlightSize (as defined in RFC
2581) to handle the receiver limited or the application limited case.
As stated above, CTCP retains the same behavior during slow start. When
a connection starts up, dwnd is initialized to zero while the
connection is in the slow start phase. Thus the delay component is
only activated when the connection enters congestion avoidance. The
delay-based algorithm has the following properties. It uses a scalable
increase rule when it infers that the network is under-utilized. It
also reduces the sending rate when it senses incipient congestion. By
reducing its sending rate, the delay-based component yields to
competing TCP flows and ensures TCP fairness. It reacts to packet
losses, again by reducing its sending rate, which is necessary to avoid
congestion collapse.
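As an illustration, equations (1) and (2) can be sketched in Python as
follows. This is a minimal sketch, not a reference implementation: the
function names are hypothetical, all windows are assumed to be expressed
in packets, and the FlightSize variant mentioned above is not modeled.

```python
def send_window(cwnd: float, dwnd: float, awnd: float) -> float:
    """Equation (1): sending window, capped by the receiver's advertised window."""
    return min(cwnd + dwnd, awnd)


def cwnd_on_ack(cwnd: float, dwnd: float) -> float:
    """Equation (2): per-ACK growth of the loss-based window in congestion avoidance."""
    return cwnd + 1.0 / (cwnd + dwnd)


# Hypothetical example: 40 packets from cwnd, 60 from dwnd, large awnd.
cwnd, dwnd, awnd = 40.0, 60.0, 1000.0
acks = int(send_window(cwnd, dwnd, awnd))  # one window's worth of ACKs
for _ in range(acks):
    cwnd = cwnd_on_ack(cwnd, dwnd)
print(f"cwnd after one window of ACKs: {cwnd:.3f}")
```

Because the increment is divided by the combined window rather than by
cwnd alone, a full window of ACKs still grows cwnd by roughly one packet
per RTT, which matches the Standard TCP behavior described above.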
CTCP's control law for the delay-based component
is derived from TCP Vegas. A state variable, called basertt, tracks the
minimum round trip delay seen by a packet over the network path. The
CTCP sender also maintains a smoothed RTT, srtt, updated as specified in
[RFC2988]. Basertt is not used until the delay component is activated,
so basertt can be initialized to the smoothed RTT value that the sender
has already computed. Basertt MUST be reset to the uninitialized state
and MUST be re-measured if a retransmission timeout occurs, as the
network conditions may have changed. We provide some guidance on RTT
sampling in Section 6, as robust RTT sampling is key to how CTCP
implementations perform.

The number of backlogged packets of the connection is estimated
using,

   expected (throughput) = wnd/basertt
   actual (throughput)   = wnd/srtt
   diff = (expected - actual) * basertt

The expected throughput gives the estimation of throughput CTCP gets if
it does not overrun (induce queueing on) the network path. The actual
throughput stands for the throughput the CTCP sender really gets. Using
this, the amount of data backlogged in the bottleneck queue (diff) can
be calculated. Congestion is detected by comparing diff to a threshold
gamma. If diff < gamma, the network path is assumed to be
under-utilized; otherwise the network path is assumed to be congested
and CTCP should gracefully reduce its window.

It is to be noted that a connection should have at least gamma packets
backlogged in the bottleneck queue to be able to detect incipient
congestion. This motivates the need for gamma to be small since the
implication is that even when the bottleneck buffer size is small, CTCP
will react early enough to ensure TCP fairness. On the other hand, if
gamma is too small compared to the queue size, CTCP will falsely detect
congestion and will adversely affect the throughput.
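The backlog estimate and the comparison against gamma can be sketched
as follows. This is an illustrative sketch only: the function names are
hypothetical, wnd is assumed to be in packets, and basertt and srtt are
assumed to be in the same time unit (seconds in the example).

```python
def backlog_estimate(wnd: float, basertt: float, srtt: float) -> float:
    """Estimate of packets queued at the bottleneck (diff), following the text above."""
    expected = wnd / basertt   # throughput if no queueing were induced
    actual = wnd / srtt        # throughput actually achieved
    return (expected - actual) * basertt


def path_is_underutilized(diff: float, gamma: float) -> bool:
    """True when diff < gamma, i.e. the delay-based window may keep growing."""
    return diff < gamma


# Hypothetical sample: 100-packet window, 100 ms base RTT, 125 ms smoothed RTT.
diff = backlog_estimate(wnd=100.0, basertt=0.100, srtt=0.125)
print(f"diff = {diff:.1f} packets, under-utilized: {path_is_underutilized(diff, 30.0)}")
```

In this example one fifth of the window (about 20 packets) is estimated
to be sitting in the queue, which is below a gamma of 30, so the path is
still treated as under-utilized.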
Choosing the
appropriate value for gamma could be a problem because gamma
depends on both network configuration and the number of concurrent
flows, which are generally unknown to the end-systems. Section 5
presents an effective way to automatically estimate gamma.

The increase law of the delay-based component should make CTCP more
scalable in high-speed and long delay pipes. We choose a binomial
function to increase the delay window [BAINF01]. As explained in the
next section, we have modeled the response function for CTCP to have
comparable scalability to HighSpeed TCP. Since there is already a
loss-based component in CTCP, the delay-based component needs to be
designed to only fill the gap. The control law for CTCP's delay
component can be summarized as follows:

   dwnd(t+1) =
      dwnd(t) + alpha*dwnd(t)^k - 1,   if diff < gamma              (3)
      dwnd(t) - eta*diff,              if diff >= gamma             (4)
      dwnd(t)*(1-beta),                on packet loss               (5)

where alpha = 1/8, beta = 1/2, eta = 1 and k = 0.75. Note that dwnd MUST
be measured in packets to match the response function in Section 4.
Equation (3) shows that in the increase phase, dwnd only needs to
increase by (alpha*dwnd(t)^k - 1) packets, since the loss-based
component cwnd will also increase by 1 packet. When a packet loss occurs
(detected by three duplicate ACKs), dwnd is set to the difference
between the desired reduced window size and what can be provided by
cwnd. The rule in equation (4) is very important to preserve good RTT
and TCP fairness. Eta defines how rapidly the delay component should
reduce its window when congestion is detected. Note that dwnd MUST never
be negative, so the CTCP window is lower bounded by its loss-based
component, which is the same as Standard TCP.

If a retransmission timeout occurs, dwnd should be reset to zero and
the delay-based component is disabled.
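The following sketch transcribes equations (3), (4) and (5) literally,
with the non-negativity requirement applied as a clamp. It is meant only
as an illustration: the function names are hypothetical, dwnd and diff
are in packets, and floating point is used instead of the integer
arithmetic a kernel implementation would need.

```python
ALPHA = 1 / 8   # increase gain
BETA = 1 / 2    # multiplicative decrease on loss
ETA = 1.0       # reduction gain when congestion is sensed
K = 0.75        # exponent of the binomial increase


def dwnd_without_loss(dwnd: float, diff: float, gamma: float) -> float:
    """Per-round update of dwnd when no loss occurred: equations (3) and (4)."""
    if diff < gamma:
        new_dwnd = dwnd + ALPHA * dwnd ** K - 1   # equation (3)
    else:
        new_dwnd = dwnd - ETA * diff              # equation (4)
    return max(new_dwnd, 0.0)                     # dwnd MUST never be negative


def dwnd_on_loss(dwnd: float) -> float:
    """Equation (5): multiplicative decrease of dwnd on packet loss."""
    return max(dwnd * (1 - BETA), 0.0)


def dwnd_on_timeout() -> float:
    """A retransmission timeout resets dwnd to zero."""
    return 0.0


print(dwnd_without_loss(256.0, diff=10.0, gamma=30.0))  # increase branch
print(dwnd_without_loss(256.0, diff=40.0, gamma=30.0))  # decrease branch
print(dwnd_on_loss(256.0))
```

One observation from this literal transcription: for small dwnd the term
alpha*dwnd^k - 1 is negative, so the clamp leaves dwnd unchanged at or
near zero. The sketch does not attempt to resolve how an implementation
leaves that state.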
The reset on timeout is needed because after a timeout,
the TCP sender enters the slow-start phase. After the CTCP sender exits
the slow-start recovery state and enters congestion avoidance, dwnd
control is activated again.

4. Compound TCP Response Function

The TCP response function provides a relationship between TCP's average
congestion window w in MSS-sized segments and the steady-state packet
drop rate p. To specify a modified response function for
CTCP, we use the analytical model in [CTCPI06] to derive a relationship
between w and p. Based on this model, the response function for CTCP
provides the following relationship between w and p,

   w ~ 1/(p^(1/(2-k)))                                              (6)

As explained earlier, we modeled the response function for CTCP to have
comparable scalability to HighSpeed TCP. The response function for
HighSpeed TCP is

   w ~ 1/p^0.835                                                    (7)

Comparing (6) and (7), we get k to be around 0.8. Since it is difficult
to implement an arbitrary power, we choose k = 0.75, which can be
implemented using a fast integer algorithm for square root. Based on
extensive experimentation, we chose alpha = 1/8, beta = 1/2, and eta =
1. Substituting the above values for alpha, beta and k in (6), we get
the following response function for CTCP,

   w = 0.255/p^0.8                                                  (8)

The response function for CTCP is compared with HSTCP and is
illustrated in Table 1 below.

                               CTCP                 HSTCP
   Packet Drop Rate P   Congestion Window W   Congestion Window W
   ------------------   -------------------   -------------------
   10^-3                                 64                    38
   10^-4                                404                   263
   10^-5                               2552                  1795
   10^-6                              16107                 12279
   10^-7                             101630                 83981
   10^-8                             641245                574356
   10^-9                            4045987               3928088
   10^-10                          25528453              26864653

          Table 1: TCP Response function for CTCP & HSTCP

The values in Table 1 illustrate that our choice of parameters makes
CTCP slightly more aggressive than HSTCP in moderate and low packet
loss rates but approaches HSTCP for larger windows.
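As a quick check of the arithmetic behind equations (6) through (8), the
sketch below solves 1/(2-k) = 0.835 for k and evaluates equation (8)
over the drop rates of Table 1. The function name is hypothetical.
Because the constant 0.255 is rounded, the computed windows are expected
to agree with the CTCP column only approximately (to within a fraction
of a percent), not digit for digit.

```python
def ctcp_window(p: float) -> float:
    """Equation (8): average CTCP window in MSS-sized segments at drop rate p."""
    return 0.255 / p ** 0.8


# Matching the exponents of (6) and (7): 1/(2-k) = 0.835.
k_matched = 2 - 1 / 0.835
print(f"k matching HSTCP's exponent: {k_matched:.3f}")

# With k = 0.75 the exponent in (6) becomes 1/(2-0.75) = 0.8, as used in (8).
for e in range(3, 11):
    p = 10.0 ** -e
    print(f"p = 10^-{e}: w = {ctcp_window(p):.0f}")
```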
We chose this slightly more aggressive response because, unlike
HighSpeed TCP, CTCP's delay control
is capable of scaling back on detecting incipient congestion. As a
result, we expect CTCP to be more TCP friendly than HighSpeed TCP. We
show that this is in fact the case even under low buffering conditions
in the presence of high statistical multiplexing. The fairness
considerations and choice of gamma are detailed in Sections 5 and 6.

5. Automatic Selection of Gamma

To effectively detect early congestion, CTCP estimates the number of
backlogged packets at the bottleneck queue and compares this estimate
to a pre-defined threshold gamma. However, setting this threshold gamma
is particularly difficult for CTCP (and for many other similar
delay-based approaches) because gamma largely depends on the network
configuration and the number of concurrent flows that compete for the
same bottleneck link. Such flows are, unfortunately, unknown to
end-systems. Based on experimentation over varying conditions, we
originally selected gamma to be 30 packets. This value appeared to
provide a good tradeoff between TCP fairness and throughput. However, a
fixed gamma can still result in poor TCP friendliness over
under-buffered network links. One naive solution is to choose a very
small value for gamma. However, this can falsely detect congestion and
adversely affect throughput. To address this problem, we instead use a
method called tuning-by-emulation to dynamically adjust gamma. The basic
idea is to estimate the backlogged packets of a Standard TCP flow along
the same path by simultaneously emulating the behavior of a Standard TCP
flow. Based on this, gamma is set so as to ensure good TCP-friendliness.
CTCP can then automatically adapt to different network configurations
(i.e., buffer provisioning) and also concurrent competing flows.

To ensure the effectiveness of incipient congestion detection, our
analytical model on CTCP shows that gamma should at least be less than
B/(m+l), where B is the bottleneck buffer and m and l represent the
number of concurrent Standard TCP flows and CTCP flows, respectively,
that are competing for the same bottleneck link [CTCPI06][CTCPP06]
[CTCPT]. Generally, both B and (m+l) are unknown to end-systems. It is
very difficult to estimate these values from end-systems in real-time,
especially the number of flows, which can vary significantly over time.
Fortunately, there is a way to directly estimate the ratio B/(m+l), even
though the individual variables B and (m+l) are hard to estimate. Let's
first assume there are (m+l) regular TCP flows in the network. These
(m+l) flows should be able to fairly share the bottleneck capacity in
steady state. Therefore, they should also get roughly equal shares of
the buffers at the bottleneck, which should equal B/(m+l). For such
a Standard TCP flow, although it does not know either B or (m+l), it
can still infer B/(m+l) easily by estimating its backlogged packets,
which is a rather mature technique widely used in many delay-based
protocols. This brings us to the core idea of CTCP's algorithm: CTCP
lets the sender emulate the congestion window of a Standard TCP flow.
Using this emulated window, we can estimate the buffer occupancy
(diff_reno) for a Standard TCP flow. Diff_reno can be regarded as a
conservative estimate of B/(m+l) assuming that the high speed flow is
more aggressive than Standard TCP. By choosing gamma <= diff_reno, we
can ensure TCP fairness.

The implementation is actually fairly straightforward. This is because
CTCP already emulates Standard TCP as the loss-based component. We can
simply estimate the buffer occupancy of a competing Standard TCP flow
from state that CTCP already maintains.
We choose an initial gamma = 30
and diff_reno is calculated as follows,

   expected_reno (throughput) = cwnd/basertt
   actual_reno (throughput)   = cwnd/srtt
   diff_reno = (expected_reno - actual_reno) * basertt

The difference between diff_reno and diff is simply that diff_reno is
computed only using the loss-based component cwnd. Since Standard TCP
reaches its maximum buffer occupancy just before a loss, CTCP uses the
diff_reno value computed in the previous round to calculate the gamma
for the next round. A round corresponds to the time it takes for one
window of data to be acknowledged. It typically corresponds to one RTT.
Whenever a loss happens, gamma is chosen to be less than diff_reno and
the sample values of gamma are updated using a standard exponentially
weighted moving average. The pseudocode to calculate gamma is shown
below. Here a round tracks every window worth of data. Section 7
provides more details on how to maintain a round.

   Initialization:
      diff_reno = invalid;
      gamma = 30;

   End-of-Round:
      expected_reno = cwnd / basertt;
      actual_reno = cwnd / srtt;
      diff_reno = (expected_reno - actual_reno) * basertt;

   On-Packet-Loss:
      if diff_reno is valid then
         g_sample = 3/4 * diff_reno;
         gamma = gamma*(1-lambda) + lambda*g_sample;
         if (gamma < gamma_low)
            gamma = gamma_low;
         else if (gamma > gamma_high)
            gamma = gamma_high;
         fi
         diff_reno = invalid;
      fi

The recommended values for gamma_low and gamma_high are 5 and 30
respectively. diff_reno is set to invalid to prevent using stale
diff_reno data when there are consecutive losses between which no
samples were taken.

6. Implementation Issues

CTCP has been implemented on Microsoft Windows and there has been
extensive testing on production links and in Windows Beta deployments.
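As an illustration of the gamma tuning pseudocode in Section 5, the
following Python sketch follows the same three steps. It is not the
Windows implementation: the class and method names are hypothetical,
None stands in for the "invalid" marker, and the weight of the moving
average is a constructor parameter because this document does not state
a value for it (the 0.5 used in the example is an arbitrary assumption).

```python
from typing import Optional

GAMMA_LOW = 5.0     # recommended lower bound
GAMMA_HIGH = 30.0   # recommended upper bound and initial value


class GammaTuner:
    """Tuning-by-emulation of gamma, mirroring the Section 5 pseudocode."""

    def __init__(self, lam: float, gamma: float = GAMMA_HIGH) -> None:
        self.lam = lam                           # EWMA weight (value is an assumption)
        self.gamma = gamma
        self.diff_reno: Optional[float] = None   # None means "invalid"

    def end_of_round(self, cwnd: float, basertt: float, srtt: float) -> None:
        """Record the backlog a Standard TCP flow with window cwnd would have."""
        expected_reno = cwnd / basertt
        actual_reno = cwnd / srtt
        self.diff_reno = (expected_reno - actual_reno) * basertt

    def on_packet_loss(self) -> float:
        """Fold 3/4 of the last diff_reno into gamma, then clamp and invalidate."""
        if self.diff_reno is not None:
            g_sample = 0.75 * self.diff_reno
            gamma = self.gamma * (1 - self.lam) + self.lam * g_sample
            self.gamma = min(max(gamma, GAMMA_LOW), GAMMA_HIGH)
            self.diff_reno = None                # avoid reusing a stale sample
        return self.gamma


tuner = GammaTuner(lam=0.5)
tuner.end_of_round(cwnd=100.0, basertt=0.100, srtt=0.125)
print(f"gamma after first loss: {tuner.on_packet_loss():.2f}")
print(f"gamma after back-to-back loss: {tuner.on_packet_loss():.2f}")
```

The second call returns the same value as the first because no new round
completed between the two losses, which is the stale-sample case the
invalid marker is meant to guard against.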

The first challenge is to design a mechanism that can precisely track
the changes in round trip time with minimal overhead, and can scale
well to support many concurrent TCP connections. Naively taking RTT
samples for every packet would be overkill for both CPU
and system memory, especially for high-speed and long distance networks
where the congestion window can be very large. Therefore, CTCP needs to
limit the number of samples taken, but without compromising on
accuracy. In our implementation, we only take up to M samples per
window of data. M is chosen to scale with the round trip delay and
window size.

In order to further improve the efficiency in memory usage, we have
developed a memory allocation mechanism to dynamically allocate sample
buffers from a kernel fixed-size per-processor pool. The size should be
chosen as a function of the available system memory. As the window size
increases, M can be updated so that the samples are uniformly
distributed over the window. As M gets updated, more memory blocks are
allocated and linked to the existing sample buffers. If the sending
rate changes, either due to network conditions or due to application
behavior, the sample blocks are reclaimed to the global memory pool.
This dynamic buffer management ensures the scalability of our
implementation, so that it can work well even in a busy server which
could host tens of thousands of TCP connections simultaneously. Note
that it may also require a high-resolution timer to time RTT samples.

The rest of the implementation is rather straightforward. We add two
new state variables into the standard TCP Control Block, namely dwnd
and basertt (described in Section 3). Following the common practice of
high-speed protocols, CTCP reverts to standard TCP behavior when the
window is small.
The delay-based component only takes effect when cwnd is
larger than some threshold, currently set to 38 packets assuming a
1500-byte MTU. dwnd is updated at the end of each round. Note that no
RTT sampling or dwnd update happens during the loss recovery phase. This
is because retransmissions during the loss recovery phase may result
in inaccurate RTT samples and can adversely affect the delay-based
control.

7. Deployment Issues

There are several variations of TCP proposed for high speed and long
delay networks. We do not claim Compound TCP to be the best or the
optimal algorithm. However, based on extensive testing via
simulations and experimentation including those on production links as
well as beta deployments of a reasonable scale, we believe that
Compound TCP satisfies the design considerations outlined earlier in
this document. It effectively uses spare bandwidth in high speed
networks, achieves good intra-protocol fairness even in the presence of
differing RTTs and does not adversely impact standard TCP. Furthermore,
Compound TCP does not require any changes or any new feedback from the
network and is deployable over the current Internet in an incremental
fashion. It interoperates with Standard TCP and requires support only
on the send side of a TCP connection for it to be used.

We also note that similar to HighSpeed TCP, in environments typical of
much of the current Internet, Compound TCP behaves exactly like
Standard TCP. It does this by following the standard TCP
algorithm without any modification any time the congestion window is
less than 38 packets. Only when the congestion window is greater than
38 packets does the delay-based component of Compound TCP get invoked.
Thus, for example, for a connection with an RTT of 100 ms, the
end-to-end bandwidth must be greater than 4.8 Mbps for CTCP to respond
to network conditions any differently from standard TCP.

Further, we do not believe that the deployment of Compound TCP would
block the possible deployment of alternative experimental congestion
control algorithms such as FAST TCP [FAST] or CUBIC [CUBIC]. In
particular, Compound TCP's response has a fallback to a loss-based
function that has characteristics very similar to HS-TCP or N parallel
TCP connections.

8. Security Considerations

CTCP modifies the congestion control algorithm of the TCP protocol by
adding a delay-based component while keeping all other aspects of the
protocol intact. Hence, any additional security considerations for CTCP
are limited to those of the delay-based aspect of the algorithm.

There are a few possible security considerations for the delay-based
component of CTCP. A receiver can deliberately delay its
acknowledgements, or it can proactively acknowledge packets. In the
former case, dwnd would increase more slowly and the throughput would
be no worse than that of standard TCP. In the latter case, the sender
may end up sending traffic at a higher rate. However, because the
packets are acknowledged proactively, the sender will update its
basertt to be much lower than the actual RTT, so any increase in
measured RTT will be perceived as congestion. Further, the sender can
implement additional mitigations to detect such a malicious receiver,
e.g., by detecting whether acknowledgements arrive too soon, i.e.,
faster than the RTT and without the receiver having actually received
the packet. The delay measurements for CTCP are derived at the sender
side only, without relying on timestamps. This mitigates possible
attacks in which the receiver manipulates the timestamps echoed back to
the sender.

9. IANA Considerations

There are no IANA considerations regarding this proposal.

10. Conclusions

This document proposes a congestion control algorithm for TCP for
high-speed and long-delay networks. By introducing a delay-based
component in addition to a standard TCP-based loss component, Compound
TCP is able to detect and effectively use spare bandwidth that may be
available on a high-speed, long-delay network. Furthermore, the
delay-based component detects the onset of congestion early and
gracefully reduces the sending rate. The loss-based component, on the
other hand, ensures an effective response to losses in the network
and, in the absence of losses, keeps the throughput of CTCP
lower-bounded by that of TCP Reno. Thus, CTCP is not timid, nor does it
cause more self-induced packet loss than a single standard TCP flow.
Compound TCP is therefore efficient in consuming available bandwidth
while being friendly to standard TCP. Further, the delay-based
component does not have any RTT bias, thereby reducing the RTT bias of
Compound TCP vis-a-vis standard TCP.

Compound TCP has been implemented as an optional component in Microsoft
Windows Vista. It has been tested and experimented with through broad
Windows Vista beta deployments, where it has been verified to meet its
objectives without causing any adverse impact. The Stanford Linear
Accelerator Center (SLAC) has also evaluated Compound TCP on production
links. Based on the testing and evaluation done so far, we believe
Compound TCP is safe to deploy on the current Internet. We welcome
additional analysis, testing, and evaluation of Compound TCP by the
Internet community at large, and we continue to do additional testing
ourselves.

11. Acknowledgments

The authors would like to thank Jingmin Song for all his efforts in
evaluating the algorithm on the test beds.
We are thankful to Yee-Ting Li and Les Cottrell for testing and
evaluation of Compound TCP on Internet2 links [SLAC]. We would like to
thank Sanjay Kaniyar for his insightful comments and for driving this
project at Microsoft. We are also thankful to the Microsoft.com data
center staff who helped us evaluate Compound TCP on their production
links. In addition, several members of the Internet research community
who attended the High-Speed TCP Summit at Microsoft [MSWRK] provided
valuable feedback on Compound TCP. We would like to thank the CTCP
reviewers at ICCRG for their valuable feedback; specifically, we would
like to thank Lachlan Andrew and Doug Leith for their thorough review
and excellent feedback. Finally, we are thankful to the Windows Vista
beta program participants who helped us test and evaluate CTCP.

12. References

12.1. Normative References

[CTCPI06] K. Tan, J. Song, Q. Zhang, and M. Sridharan, "A Compound
          TCP Approach for High-speed and Long Distance Networks",
          in IEEE Infocom, April 2006, Barcelona, Spain.

[RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
          Control", RFC 2581, April 1999.

12.2. Informative References

[AFRICA]  R. King, R. Baraniuk, and R. Riedi, "TCP-Africa: An
          Adaptive and Fair Rapid Increase Rule for Scalable TCP",
          in Proc. INFOCOM 2005.

[BAINF01] D. Bansal and H. Balakrishnan, "Binomial Congestion
          Control Algorithms", in Proc. INFOCOM 2001.

[CTCPP06] K. Tan, J. Song, Q. Zhang, and M. Sridharan, "Compound
          TCP: A Scalable and TCP-friendly Congestion Control for
          High-speed Networks", in 4th International Workshop on
          Protocols for Fast Long-Distance Networks (PFLDNet), 2006,
          Nara, Japan.

[CTCPT]   K. Tan, J. Song, M. Sridharan, and C.Y. Ho, "CTCP:
          Improving TCP-Friendliness Over Low-Buffered Network
          Links", Microsoft Technical Report.

[CUBIC]   I. Rhee, L. Xu, and S. Ha, "CUBIC for fast long distance
          networks", Internet Draft, Expires Aug 31, 2007,
          draft-rhee-tcp-cubic-00.txt.

[FAST]    C. Jin, D. Wei, and S. Low, "FAST TCP: Motivation,
          Architecture, Algorithms, Performance", in IEEE Infocom
          2004.

[MSWRK]   Microsoft High-Speed TCP Summit,
          http://research.microsoft.com/events/TCPSummit/

[PADHYE]  J. Padhye, V. Firoiu, D. Towsley, and J. Kurose,
          "Modeling TCP Throughput: A Simple Model and its Empirical
          Validation", in Proc. ACM SIGCOMM 1998.

[RFC2988] V. Paxson and M. Allman, "Computing TCP's Retransmission
          Timer", RFC 2988, November 2000.

[RFC3649] S. Floyd, "HighSpeed TCP for Large Congestion Windows",
          RFC 3649, Dec 2003.

[SLAC]    Yee-Ting Li, "Evaluation of TCP Congestion Control
          Algorithms on the Windows Vista Platform", SLAC-TN-06-005,
          http://www.slac.stanford.edu/pubs/slactns/tn04/slac-tn-06-005.pdf

[VEGAS]   L. Brakmo, S. O'Malley, and L. Peterson, "TCP Vegas: New
          techniques for congestion detection and avoidance", in
          Proc. ACM SIGCOMM, 1994.
Authors' Addresses

Murari Sridharan
Microsoft Corporation
1 Microsoft Way, Redmond 98052

Email: muraris@microsoft.com

Kun Tan
Microsoft Research
5/F, Beijing Sigma Center
No.49, Zhichun Road, Hai Dian District
Beijing China 100080

Email: kuntan@microsoft.com

Deepak Bansal
Microsoft Corporation
1 Microsoft Way, Redmond 98052

Email: dbansal@microsoft.com

Dave Thaler
Microsoft Corporation
1 Microsoft Way, Redmond 98052

Email: dthaler@microsoft.com

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed
to pertain to the implementation or use of the technology described
in this document or the extent to which any license under such
rights might or might not be available; nor does it represent that
it has made any independent effort to identify any such rights.
Information on the procedures with respect to rights in RFC
documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use
of such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository
at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity

This document and the information contained herein are provided on
an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The IETF Trust (2007).

This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.

Acknowledgment

Funding for the RFC Editor function is currently provided by the
Internet Society.