idnits 2.17.00 (12 Aug 2021) /tmp/idnits45635/draft-nishida-newreno-modification-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** You're using the IETF Trust Provisions' Section 6.b License Notice from 12 Sep 2009 rather than the newer Notice from 28 Dec 2009. (See https://trustee.ietf.org/license-info/) Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document doesn't use any RFC 2119 keywords, yet seems to have RFC 2119 boilerplate text. -- The document date (March 2, 2010) is 4456 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) ** Obsolete normative reference: RFC 2581 (Obsoleted by RFC 5681) ** Obsolete normative reference: RFC 3782 (Obsoleted by RFC 6582) Summary: 3 errors (**), 0 flaws (~~), 2 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group Y. Nishida 3 Internet-Draft WIDE Project 4 Intended status: Standards Track March 2, 2010 5 Expires: September 3, 2010 7 NewReno Modification for Smooth Recovery After Fast Retransmission 8 draft-nishida-newreno-modification-02 10 Abstract 12 This memo describes a feeble point in Fast Recovery algorithm in 13 NewReno defined in RFC3782 and proposes a simple modification to 14 solve the problem. 16 Status of this Memo 18 This Internet-Draft is submitted to IETF in full conformance with the 19 provisions of BCP 78 and BCP 79. 21 Internet-Drafts are working documents of the Internet Engineering 22 Task Force (IETF), its areas, and its working groups. Note that 23 other groups may also distribute working documents as Internet- 24 Drafts. 26 Internet-Drafts are draft documents valid for a maximum of six months 27 and may be updated, replaced, or obsoleted by other documents at any 28 time. It is inappropriate to use Internet-Drafts as reference 29 material or to cite them other than as "work in progress." 31 The list of current Internet-Drafts can be accessed at 32 http://www.ietf.org/ietf/1id-abstracts.txt. 34 The list of Internet-Draft Shadow Directories can be accessed at 35 http://www.ietf.org/shadow.html. 37 This Internet-Draft will expire on September 3, 2010. 39 Copyright Notice 41 Copyright (c) 2010 IETF Trust and the persons identified as the 42 document authors. All rights reserved. 44 This document is subject to BCP 78 and the IETF Trust's Legal 45 Provisions Relating to IETF Documents 46 (http://trustee.ietf.org/license-info) in effect on the date of 47 publication of this document. Please review these documents 48 carefully, as they describe your rights and restrictions with respect 49 to this document. Code Components extracted from this document must 50 include Simplified BSD License text as described in Section 4.e of 51 the Trust Legal Provisions and are provided without warranty as 52 described in the BSD License. 54 Table of Contents 56 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 57 2. Conventions and Terminology . . . . . . . . . . . . . . . . . 4 58 3. Problem Description . . . . . . . . . . . . . . . . . . . . . 5 59 4. Possible Scenarios . . . . . . . . . . . . . . . . . . . . . . 6 60 4.1. Case 1: Small Sending Window Size at Sender . . . . . . . 6 61 4.2. Case 2: Zero Window Advertisement from Receiver . . . . . 6 62 4.3. Case 3: Lost of ACK segments . . . . . . . . . . . . . . . 7 63 5. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 9 64 6. Proposed Fix . . . . . . . . . . . . . . . . . . . . . . . . . 10 65 7. Simulation Results . . . . . . . . . . . . . . . . . . . . . . 11 66 8. Security Considerations . . . . . . . . . . . . . . . . . . . 13 67 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14 68 10. Normative References . . . . . . . . . . . . . . . . . . . . . 15 69 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 16 71 1. Introduction 73 There are some situations that NewReno cannot recover quickly after 74 the success of fast retransmission. This issue is resulted from a 75 feeble point in Fast Recovery algorithm in NewReno defined in RFC3782 76 [RFC3782]. This document describes the point in Fast Recovery and 77 presents possible scenarios. This memo also propose a simple 78 modification to fix this problem. 80 2. Conventions and Terminology 82 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 83 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 84 document are to be interpreted as described in [RFC2119]. 86 Since this document describes a potential risk in NewReno, it uses 87 the same terminology and definitions in RFC3782 [RFC3782]. Which 88 means this documents assumes that the reader is familiar with the 89 terms SENDER MAXIMUM SEGMENT SIZE (SMSS), CONGESTION WINDOW (cwnd), 90 and FLIGHT SIZE (FlightSize) defined in [RFC2581]. 92 3. Problem Description 94 This section describes a potential risk in Fast Retransmit and Fast 95 Recovery Algorithm in RFC3782. 97 Section 3 in RFC3782 describes the Fast Retransmit and Fast Recovery 98 Algorithm in NewReno. The algorithm consists of 6 steps. The 99 following lines are the description of the fifth steps which 100 describes the behavior for the arrival of the first Full ACK after 101 first retransmission. 103 5) When an ACK arrives that acknowledges new data, this ACK could be 104 the acknowledgment elicited by the retransmission from step 2, or 105 elicited by a later retransmission. 107 Full acknowledgements: 109 If this ACK acknowledges all of the data up to and including 110 "recover", then the ACK acknowledges all the intermediate 111 segments sent between the original transmission of the lost 112 segment and the receipt of the third duplicate ACK. Set cwnd to 113 either (1) min (ssthresh, FlightSize + SMSS) or (2) ssthresh 114 where ssthresh is the value set in step 1; this is termed 115 "deflating" the window. (We note that "FlightSize" in step 1 116 referred to the amount of data outstanding in step 1, when Fast 117 Recovery was entered, while "FlightSize" in step 5 refers to the 118 amount of data outstanding in step 5, when Fast Recovery is 119 exited.) 121 According to this description, the cwnd after the first FULL ACK 122 reception will be one of the followings. 124 (1) min (ssthresh, FlightSize + SMSS) 125 (2) ssthresh 127 However, there is a risk in (1) which can cause performance 128 degradation. In (1), if FlightSize is zero, the result of (1) will 129 be 1 SMSS. (ssthresh should be bigger than 1) This means TCP can 130 transmit only 1 segment in this case. This can cause the delay in 131 ACK transmission at the receiver side if the receiver use delayed ACK 132 algorithm. The FlightSize in (1) represents the amount of data 133 outstanding in the fifth step: the moment when the new Full ACK 134 arrives. The next section describes several scenarios where the 135 FlightSize becomes zero. 137 4. Possible Scenarios 139 There are several possible situations that FlightSize becomes zero 140 when the first new full ACK arrives after fast retransmission. This 141 section describe several possible cases. 143 4.1. Case 1: Small Sending Window Size at Sender 145 This is the tcpdump example of the case. This log is recorded at A. 147 1 10:41:00.000001 A > B: . 1000:2000(1000) ack 1 win 32768 148 2 10:41:00.001001 A > B: . 2000:3000(1000) ack 1 win 32768 149 3 10:41:00.002001 A > B: . 3000:4000(1000) ack 1 win 32768 150 4 10:41:00.003001 A > B: . 4000:5000(1000) ack 1 win 32768 151 5 10:41:00.010001 B > A: . ack 1000 win 16384 152 6 10:41:00.011001 B > A: . ack 1000 win 16384 153 7 10:41:00.012001 B > A: . ack 1000 win 16384 154 8 10:41:00.013001 A > B: . 1000:2000(1000) ack 1 win 32768 155 9 10:41:00.014001 A > B: . 5000:6000(1000) ack 1 win 32768 156 10 10:41:00.024001 B > A: . ack 6000 win 16384 157 11 10:41:00.025001 A > B: . 6000:7000(1000) ack 1 win 32768 159 In this example, A sends data segments to B. At the beginning of the 160 log, the cwnd of A is 4 SMSS, hence A sends 4 segments to B (line 161 1-4). Here, if the segment sent in line 1 (segment 1000:2000) is 162 lost, B sends 3 duplicated ACKs for the lost segment (line 5-7) to 163 ask retransmission. At line 8, A receives 3 duplicated ACKs then it 164 transmits the lost segment. At line 9, A sets cwnd to ssthresh plus 165 3*SMSS (as defined in the second steps in NewReno algorithm) and cwnd 166 becomes 5 SMSS as the result. This window inflation allows A to 167 transmit one new segment. 169 Since the two segments in line 8 and 9 are usually transmitted almost 170 at the same time, the receiver may send back only one ACK for these 171 two segments (line 10) The ACK received in line 10 is the first Full 172 ACK and there is no out-standing data in this moment. Hence, new 173 cwnd is set to 1 SMSS and only one new segment is sent (line 11) 175 4.2. Case 2: Zero Window Advertisement from Receiver 177 This is the tcpdump example of the case. This log is recorded at A. 179 1 11:42:00.000001 A > B: . 1000:2000(1000) ack 1 win 32768 180 2 11:42:00.001001 A > B: . 2000:3000(1000) ack 1 win 32768 181 3 11:42:00.002001 A > B: . 3000:4000(1000) ack 1 win 32768 182 4 11:42:00.003001 A > B: . 4000:5000(1000) ack 1 win 32768 183 5 11:42:00.004001 A > B: . 5000:6000(1000) ack 1 win 32768 184 6 11:42:00.005001 A > B: . 6000:7000(1000) ack 1 win 32768 185 7 11:42:00.010001 B > A: . ack 1000 win 0 186 8 11:42:00.011001 B > A: . ack 1000 win 0 187 9 11:42:00.012001 B > A: . ack 1000 win 0 188 10 11:42:00.012201 A > B: . 1000:2000(1000) ack 1 win 32768 189 11 11:42:00.013001 B > A: . ack 1000 win 0 190 12 11:42:00.014001 B > A: . ack 1000 win 0 191 13 11:42:00.022001 B > A: . ack 7000 win 16384 192 14 11:42:00.023001 A > B: . 7000:8000(1000) ack 1 win 32768 194 In this example, A sends data segments to B. At the beginning of the 195 log, the cwnd of A is 6 SMSS, hence A sends 6 segments to B (line 196 1-6). Here, if the segment sent in line 1 (segment 1000:2000) is 197 lost, B sends duplicated ACKs for the lost segment (line 7-9 and 198 11-12) to ask retransmission. However, these duplicated ACKs sent 199 from B have zero advertised window because of buffer overflow. In 200 this case, although the cwnd at A is inflated at the reception of the 201 duplicated ACKs, it cannot transmit new segments. Hence, only the 202 lost segment is retransmitted (line 10). When B receives 203 retransmitted segment, the buffer becomes empty, then B sends a Full 204 ACK with non-zero advertised window. The ACK received in line 13 is 205 the first Full ACK and there is no out-standing data in this moment. 206 Hence, new cwnd is set to 1 SMSS and only one new segment is sent 207 (line 14) 209 4.3. Case 3: Lost of ACK segments 211 This is the tcpdump example of the case. This log is recorded at A. 213 1 12:43:00.000001 A > B: . 1000:2000(1000) ack 1 win 32768 214 2 12:43:00.001001 A > B: . 2000:3000(1000) ack 1 win 32768 215 3 12:43:00.002001 A > B: . 3000:4000(1000) ack 1 win 32768 216 4 12:43:00.003001 A > B: . 4000:5000(1000) ack 1 win 32768 217 5 12:43:00.004001 A > B: . 5000:6000(1000) ack 1 win 32768 218 6 12:43:00.005001 A > B: . 6000:7000(1000) ack 1 win 32768 219 7 12:43:00.010001 B > A: . ack 1000 win 16384 220 8 12:43:00.011001 B > A: . ack 1000 win 16384 221 9 12:43:00.012001 B > A: . ack 1000 win 16384 222 10 12:43:00.012201 A > B: . 1000:2000(1000) ack 1 win 32768 223 11 12:43:00.022001 B > A: . ack 7000 win 16384 224 12 12:43:00.023001 A > B: . 7000:8000(1000) ack 1 win 32768 226 In this example, A sends data segments to B. At the beginning of the 227 log, the cwnd of A is 6 SMSS, hence A sends 6 segments to B (line 228 1-6). Here, if the segment sent in line 1 (segment 1000:2000) is 229 lost, B generates 5 duplicated ACKS, however 2 ACK segments are lost 230 in this case. Then, only 3 duplicated ACKs arrives at A (line 7-9). 231 At line 10, A transmits the lost segment and sets cwnd to ssthresh 232 plus 3*SMSS. As the result, the cwnd becomes 6 SMSS. However, this 233 cwnd does not allow A to transmit new segments. At line 11, A 234 receives the first Full ACK and there is no out-standing data in this 235 moment. Hence, new cwnd is set to 1 SMSS and only one new segment is 236 sent (line 12) 238 5. Discussion 240 Some TCP implementations such as Linux, NS-2 Network simulator do not 241 have this issue. This is because these implementations always 242 transmit more than 1 MSS right after fast recovery. In these 243 implementations, when TCP exits Fast Recovery (when the first FULL 244 ACK is received) it also calls "open cwnd" function at the same time 245 and performs Slow Start or Congestion Avoidance algorithm. Hence, 246 even though cwnd is set to 1 MSS after Fast Recovery as described in 247 Section 3, the cwnd will be increased by 1 MSS by Slow Start. (Since 248 ssthresh should be bigger than 1 MSS at this moment, Slow Start is 249 always used to increase cwnd) 251 However, this behavior can be controversial because it enters Slow- 252 Start after Fast Recovery without receiving any packets. Although 253 this point is unclear in RFC3782, we believe that this is rather 254 aggressive behavior and TCP should not open cwnd after Fast Recovery 255 without receiving another ACKs. In fact, several implementation do 256 not perform Slow Start right after Fast Recovery. With these 257 implementations, severe performance degradations can be observed over 258 lossy networks. 260 6. Proposed Fix 262 To solve the problem mentioned above, we propose a simple fix to the 263 fifth step in NewReno. 265 The proposed solution is modifying the current cwnd adjustment: 267 (1) min (ssthresh, FlightSize + SMSS) 268 to 269 (1) min (ssthresh, max(FlightSize, SMSS) + SMSS) 271 This fix ensures that cwnd is always larger than 1 SMSS. Hence, 272 sender TCP can always transmit at least two segments right after the 273 first Full ACK reception. This can avoid the delay of ACK 274 transmissions caused by delayed ACK algorithm. The new algorithm 275 increases 1 SMSS only when FlightSize becomes zero and behaves 276 completely the same as the previous algorithm does in other 277 situations. The new algorithm might add slight burstness since it 278 requires additional increase of cwnd. However, we believe this 279 burstness can be almost negligible. 281 7. Simulation Results 283 In order to verify the effect of the issue described in this 284 document, we implemented our algorithm in the TCP/Newreno agent in 285 ns-2.34 and conducted several simulations. We used a simple network 286 configuration as depicted in the below figure for our simulations. 287 There is one 10Mbps link between the sender and the receiver and link 288 delay is set to 2ms. The PLR on the link is set to 0.01 - 0.06 for 289 the traffic towards the receiver. The sender transmits 100000 290 packets to the receiver with one TCP connection. (FTP application 291 attached to TCP/Newreno agent is used) The receiver uses TCPSink/ 292 DelAck agent and delayed ack interval is set to 200ms. 294 _________ __________ 295 | | 10Mbps, 2ms | | 296 | sender |----------------------------| receiver | 297 |_________| PLR=0.01-0.06 |__________| 299 With this configuration, we measured the performance of TCP by using 300 the following three algorithms. alg1 is the algorithm adopted in the 301 original NS-2 code or linux. alg2 is the algorithm that seems to be 302 adopted in some other OSs. alg3 is the algorithm proposed in this 303 document. 305 alg1 ... always do slow start after fast recovery without receiving 306 ACKs 308 alg2 ... don't do slow start after fast recovery without receiving 309 ACKs 311 alg3 ... don't do slow start after fast recovery without receiving 312 ACKs. but, adjust cwnd to be always bigger than 1. 314 At first, we measured the number of events where flightsize becomes 315 zero after fast recovery. As showed in the below table, when 316 PLR=0.01, the ratio of the event is around 0.1% while it is around 317 2.0% when PLR=0.06. This means that the ratio of this event cannot 318 be negligible under congested situations. 320 number of events where flightsize becomes zero after fast recovery 322 PLR=0.01 PLR=0.02 PLR=0.03 PLR=0.04 PLR=0.05 PLR=0.06 323 alg1 108 333 687 1140 1537 1916 324 alg2 113 365 724 1182 1615 1939 325 alg3 107 371 717 1186 1587 1936 327 Next, we measured the throughput of each algorithm. As showed in the 328 below table, alg2 exhibits serious performance degradation compared 329 to the other two. alg1 maintains the best performance in all cases. 330 This is because it has a bit aggressive natures. Although alg3 is a 331 less aggressive algorithm than alg1, it attains mostly the same 332 performance as alg1. 334 throughput (kbps) 336 PLR=0.01 PLR=0.02 PLR=0.03 PLR=0.04 PLR=0.05 PLR=0.06 337 alg1 1028.49 697.87 491.29 356.36 257.71 198.65 338 alg2 825.57 451.96 284.25 190.13 137.99 107.05 339 alg3 1006.64 671.86 470.39 344.28 248.71 193.30 341 From these results, we recommend not to adopt alg2 and to use alg1 or 342 alg3. We also believe that alg3 is the best algorithm since it can 343 attain good performance while it keeps conservative nature as we 344 discuss in this draft. 346 8. Security Considerations 348 This document only propose simple modification in RFC3782. There are 349 no known additional security concerns for this algorithm. 351 9. IANA Considerations 353 This document does not create any new registries or modify the rules 354 for any existing registries managed by IANA. 356 10. Normative References 358 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 359 Requirement Levels", BCP 14, RFC 2119, March 1997. 361 [RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion 362 Control", RFC 2581, April 1999. 364 [RFC3782] Floyd, S., Henderson, T., and A. Gurtov, "The NewReno 365 Modification to TCP's Fast Recovery Algorithm", RFC 3782, 366 April 2004. 368 Author's Address 370 Yoshifumi Nishida 371 WIDE Project 372 Endo 5322 373 Fujisawa, Kanagawa 252-8520 374 Japan 376 Email: nishida@wide.ad.jp