idnits 2.17.00 (12 Aug 2021)

/tmp/idnits45328/draft-nishida-tcpm-rescue-retransmission-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (April 15, 2011) is 4047 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 2581 (Obsoleted by RFC 5681)

  ** Obsolete normative reference: RFC 3517 (Obsoleted by RFC 6675)

  ** Obsolete normative reference: RFC 3782 (Obsoleted by RFC 6582)

     Summary: 3 errors (**), 0 flaws (~~), 1 warning (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                         Y. Nishida
Internet-Draft                                              WIDE Project
Intended status: Standards Track                          April 15, 2011
Expires: October 17, 2011

      Rescue Retransmission for SACK-based Loss Recovery Algorithm
              draft-nishida-tcpm-rescue-retransmission-00

Abstract

   This memo describes an issue in the recovery algorithm in RFC3517 and
   proposes a simple modification to avoid unnecessary timeouts for
   performance improvement.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 17, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions and Terminology
   3.  Problem Description
   4.  Possible Scenario
   5.  Proposed Fix
   6.  Discussion
   7.  Acknowledgements
   8.  Security Considerations
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Author's Address

1.  Introduction

   RFC3517 [RFC3517] defines a conservative loss recovery algorithm
   based on the use of the selective acknowledgment (SACK) TCP option
   [RFC2018].  It is designed to follow the guidelines set in RFC2581
   [RFC2581] so that it can be used safely in TCP implementations.
   However, in some situations, the loss recovery algorithm in RFC3517
   fails to retransmit segments even though the congestion window
   leaves room to send (cwnd - pipe >= 1 SMSS).  This failure to
   retransmit can cause unnecessary timeouts, which can lead to
   performance degradation.  This document describes the issue and
   proposes a simple modification to solve it.  The proposed solution
   allows SACK-based TCP to attain the same performance as NewReno
   [RFC3782].

2.  Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Problem Description

   In RFC3517, when a sender receives DupThresh duplicate ACKs, it
   enters the loss recovery phase.
   In the loss recovery phase, whenever the sender receives an ACK, it
   recalculates pipe by calling Update() and SetPipe(), and determines
   which segments should be sent by calling NextSeg().  However, there
   are some situations where NextSeg() returns no segment although
   cwnd - pipe is at least 1 SMSS.  This behavior results from the
   following logic in NextSeg().  When NextSeg() looks for segments to
   retransmit, it uses IsLost() to identify segments that are most
   likely lost.  To increase accuracy, IsLost() determines that the
   packet with 'SeqNum' is lost when DupThresh discontiguous SACKed
   sequences have arrived above 'SeqNum' or (DupThresh * SMSS) bytes
   with sequence numbers greater than 'SeqNum' have been SACKed.  If
   IsLost() identifies no packet, NextSeg() uses new segments for the
   next transmission.

   In this logic, a problem can arise when a sender does not have new
   segments to send.  In this case, if IsLost() identifies no packet,
   NextSeg() cannot find a packet for the next transmission, and packet
   transmissions will be delayed until one of the following events
   happens.

   o  ACKs arrive and IsLost() finds new lost segments

   o  The application feeds data to TCP

   o  The retransmission timer expires

   However, in some situations, such as when the window size is small,
   the number of arriving ACKs might not be enough to identify lost
   segments.  In addition, applications might feed data intermittently
   or might have no more data to feed.  In this case, TCP has to wait
   for the retransmission timer to expire before it retransmits
   segments, even though cwnd - pipe leaves enough room to send a
   packet.

4.  Possible Scenario

   This section describes a possible scenario in which the issue
   described in this document occurs.

   The following is a hypothetical tcpdump log.

    1 10:41:00.000001 A > B: . 1000:2000(1000) ack 1 win 32768
    2 10:41:00.001001 A > B: . 2000:3000(1000) ack 1 win 32768
    3 10:41:00.002001 A > B: . 3000:4000(1000) ack 1 win 32768
    4 10:41:00.003001 A > B: . 4000:5000(1000) ack 1 win 32768
    5 10:41:00.004001 A > B: . 5000:6000(1000) ack 1 win 32768
    6 10:41:00.010001 B > A: . ack 1000 win 16384 < sack {2000:3000} >
    7 10:41:00.011001 B > A: . ack 1000 win 16384 < sack {2000:4000} >
    8 10:41:00.012001 B > A: . ack 1000 win 16384 < sack {2000:5000} >
    9 10:41:00.015001 A > B: . 1000:2000(1000) ack 1 win 32768
   10 10:41:00.018001 B > A: . ack 5000 win 16384

   In this example, A sends data segments to B.  At the beginning of the
   log, the cwnd of A is 5 SMSS (SMSS = 1000 octets), hence A sends 5
   segments to B (lines 1-5).  Here, if the segments sent in line 1
   (1000:2000) and line 5 (5000:6000) are lost, B sends 3 duplicate
   ACKs (lines 6-8) to request retransmission of segment 1000:2000.  At
   line 8, A has received DupThresh duplicate ACKs and retransmits the
   lost segment (line 9).  At this time, A enters the loss recovery
   phase and sets cwnd to 2.5 SMSS.  At line 10, A receives the ACK
   triggered by the arrival of segment 1000:2000.  Upon reception of
   the ACK at line 10, A performs the following steps to determine
   whether any segments can be sent.

   1.  Update pipe by calling Update() and SetPipe().  Since HighACK is
       5000, HighData is 6000 and IsLost(5000) returns false, the value
       of pipe is set to 1000.

   2.  Because cwnd - pipe >= 1 SMSS, it decides to send one or more
       segments.

   3.  Call NextSeg() to determine which segments to send.

   Now, if A has no unsent data, the only segment available to send is
   5000:6000.  NextSeg() checks whether this segment can be sent by
   applying the following rules, but none of them applies.

   1.  Rule (1) cannot be applied to this segment because (1.b) and
       (1.c) are false.

   2.  Rule (2) cannot be applied since there is no available unsent
       data.

   3.  Rule (3) cannot be applied to this segment because (1.b) is
       false.

   Hence NextSeg() returns no segment in this case, which means TCP has
   no segment to send until a timeout occurs.  TCP can fall into this
   situation whenever multiple packets are lost in a window and TCP has
   no new data to send at that moment.

5.  Proposed Fix

   To solve the problem described above, we propose introducing one
   variable, RescueRxt, at the TCP sender and adding the following
   logic to NextSeg() as the fourth rule.

   (4)  If the conditions for rules (1), (2) and (3) fail, but there
        exists unSACKed data, one segment of up to SMSS octets MAY be
        returned if RescueRxt is not set.  The returned segment MUST
        include the highest unSACKed sequence number.

        When a segment is returned by this rule, RescueRxt MUST be set
        to the highest octet of the segment.  Also, HighRxt MUST NOT be
        updated.

   In addition to this rule, the TCP sender MUST reset RescueRxt when it
   receives a cumulative ACK for a sequence number greater than
   RescueRxt.

6.  Discussion

   A simple approach to address this issue is to send unSACKed data
   whenever the conditions for rules (1), (2) and (3) fail, as long as
   cwnd - pipe leaves room to send.  A similar approach is also
   proposed in [I-D.scheffenegger-tcpm-sack-loss-recovery].  However,
   this approach can cause many unnecessary retransmissions when
   segments are reordered but not lost.

   The fix proposed in this document allows TCP to retransmit one
   segment per RTT when all the data TCP has available is unSACKed and
   it is uncertain whether that data is lost.  Since the objective of
   this algorithm is to avoid a retransmission timeout and maintain ACK
   clocking, not to use up the unused pipe, sending one segment per RTT
   is enough for this purpose.
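
   The interaction between rules (1)-(3) and the proposed rule (4) of
   Section 5 can be illustrated with a small sketch.  It is not taken
   from RFC3517 or from any implementation: the Scoreboard class and
   its field names are hypothetical, segments are assumed to be
   SMSS-aligned, ranges are half-open as in the tcpdump notation
   (5000:6000), HighRxt and RescueRxt hold the highest octet, and the
   receiver's advertised window is ignored.  The final lines replay the
   state at line 10 of the Section 4 log.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

SMSS = 1000        # segment size used in the Section 4 scenario
DUP_THRESH = 3     # DupThresh

Segment = Tuple[int, int]   # half-open range, written start:end as in the log


@dataclass
class Scoreboard:
    """Hypothetical, simplified sender state; not an RFC3517 data structure."""
    high_ack: int                     # HighACK
    high_data: int                    # HighData
    high_rxt: int                     # HighRxt: highest octet retransmitted
    sacked: List[Segment] = field(default_factory=list)  # disjoint SACKed ranges
    unsent: int = 0                   # octets of new data queued by the application
    rescue_rxt: Optional[int] = None  # RescueRxt; None means "not set"

    def is_sacked(self, seq: int) -> bool:
        return any(s <= seq < e for s, e in self.sacked)

    def is_lost(self, seq: int) -> bool:
        # Lost if DupThresh SACKed ranges, or DupThresh * SMSS SACKed octets,
        # lie above seq.
        above = [(s, e) for s, e in self.sacked if s > seq]
        return (len(above) >= DUP_THRESH
                or sum(e - s for s, e in above) >= DUP_THRESH * SMSS)

    def unsacked(self) -> List[int]:
        # Start octets of outstanding segments that have not been SACKed.
        return [q for q in range(self.high_ack, self.high_data, SMSS)
                if not self.is_sacked(q)]

    def next_seg(self) -> Optional[Segment]:
        top = max((e for _, e in self.sacked), default=0)  # one past highest SACKed octet
        holes = [q for q in self.unsacked() if self.high_rxt < q < top]
        for q in holes:                        # rule (1): (1.a), (1.b) and (1.c)
            if self.is_lost(q):
                return (q, min(q + SMSS, self.high_data))
        if self.unsent > 0:                    # rule (2): previously unsent data
            return (self.high_data, self.high_data + min(SMSS, self.unsent))
        if holes:                              # rule (3): (1.a) and (1.b) only
            return (holes[0], min(holes[0] + SMSS, self.high_data))
        pending = self.unsacked()
        if pending and self.rescue_rxt is None:    # proposed rule (4)
            seg = (pending[-1], min(pending[-1] + SMSS, self.high_data))
            self.rescue_rxt = seg[1] - 1       # highest octet of the segment
            return seg                         # HighRxt is deliberately not updated
        return None

    def on_cumulative_ack(self, ack: int) -> None:
        self.high_ack = max(self.high_ack, ack)
        if self.rescue_rxt is not None and ack > self.rescue_rxt:
            self.rescue_rxt = None             # permit a new rescue retransmission


# State at line 10 of the Section 4 log: 1000:5000 is cumulatively ACKed,
# only 5000:6000 is outstanding, and A has no unsent data.
sb = Scoreboard(high_ack=5000, high_data=6000, high_rxt=1999)
first = sb.next_seg()     # rules (1)-(3) fail, rule (4) returns 5000:6000
second = sb.next_seg()    # RescueRxt is now set, so no further segment
```

   In this sketch the second call returns nothing because RescueRxt is
   still set, which is how the once-per-RTT limit shows up: a further
   rescue retransmission only becomes possible after
   on_cumulative_ack() sees an ACK above RescueRxt.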

   By sending this one packet, the sending TCP has a good chance of
   receiving additional ACKs from the receiver, which can trigger
   further retransmissions in the next RTT.  The variable RescueRxt
   ensures that the retransmission made by this algorithm happens at
   most once per RTT.  This logic can drastically reduce the number of
   unnecessary retransmissions in case of reordering.

7.  Acknowledgements

   The authors gratefully acknowledge Richard Scheffenegger, who
   originally identified the issue described in this document and gave
   insightful comments.  The authors would also like to thank Mark
   Allman and Ethan Blanton for their careful review of the initial
   idea of the logic and for their valuable feedback.

8.  Security Considerations

   This document only proposes a simple modification to RFC3517.  There
   are no known additional security concerns for this algorithm.

9.  IANA Considerations

   This document does not create any new registries or modify the rules
   for any existing registries managed by IANA.

10.  References

10.1.  Normative References

   [RFC2018]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
              Selective Acknowledgment Options", RFC 2018, October 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2581]  Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
              Control", RFC 2581, April 1999.

   [RFC3517]  Blanton, E., Allman, M., Fall, K., and L. Wang, "A
              Conservative Selective Acknowledgment (SACK)-based Loss
              Recovery Algorithm for TCP", RFC 3517, April 2003.

   [RFC3782]  Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
              Modification to TCP's Fast Recovery Algorithm", RFC 3782,
              April 2004.

10.2.  Informative References

   [I-D.scheffenegger-tcpm-sack-loss-recovery]
              Scheffenegger, R., "Improving SACK-based loss recovery for
              TCP", draft-scheffenegger-tcpm-sack-loss-recovery-00 (work
              in progress), November 2010.

Author's Address

   Yoshifumi Nishida
   WIDE Project
   Endo 5322
   Fujisawa, Kanagawa  252-8520
   Japan

   Email: nishida@wide.ad.jp