idnits 2.17.00 (12 Aug 2021) /tmp/idnits23205/draft-mahesh-persist-timeout-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 16. -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on line 419. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 430. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 437. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 443. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (October 17, 2007) is 5323 days in the past. Is this intentional? 
Checking references for intended status: Informational ---------------------------------------------------------------------------- No issues found here. Summary: 1 error (**), 0 flaws (~~), 1 warning (==), 7 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 TCP Maintenance and Minor M. Jethanandani 3 Extensions Cisco Systems 4 Internet-Draft M. Bashyam 5 Intended status: Informational Ocarina Systems, Inc 6 Expires: April 19, 2008 October 17, 2007 8 TCP Robustness in Persist Condition 9 draft-mahesh-persist-timeout-02 11 Status of this Memo 13 By submitting this Internet-Draft, each author represents that any 14 applicable patent or other IPR claims of which he or she is aware 15 have been or will be disclosed, and any of which he or she becomes 16 aware will be disclosed, in accordance with Section 6 of BCP 79. 18 Internet-Drafts are working documents of the Internet Engineering 19 Task Force (IETF), its areas, and its working groups. Note that 20 other groups may also distribute working documents as Internet- 21 Drafts. 23 Internet-Drafts are draft documents valid for a maximum of six months 24 and may be updated, replaced, or obsoleted by other documents at any 25 time. It is inappropriate to use Internet-Drafts as reference 26 material or to cite them other than as "work in progress." 28 The list of current Internet-Drafts can be accessed at 29 http://www.ietf.org/ietf/1id-abstracts.txt. 31 The list of Internet-Draft Shadow Directories can be accessed at 32 http://www.ietf.org/shadow.html. 34 This Internet-Draft will expire on April 19, 2008. 36 Copyright Notice 38 Copyright (C) The IETF Trust (2007). 40 Abstract 42 This document describes how a connection can remain infinitely in 43 persist condition, and its Denial of Service (DoS) implication on the 44 system, if there is no mechanism to recover from this anomaly. 
46 Requirements Language 48 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 49 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 50 document are to be interpreted as described in RFC 2119 [RFC2119]. 52 Table of Contents 54 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 55 2. Denial of Service Experimentation . . . . . . . . . . . . . . 4 56 3. Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 57 4. Role of Application . . . . . . . . . . . . . . . . . . . . . 8 58 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8 59 6. Security Considerations . . . . . . . . . . . . . . . . . . . 9 60 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 9 61 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 9 62 8.1. Normative References . . . . . . . . . . . . . . . . . . . 9 63 8.2. Informative References . . . . . . . . . . . . . . . . . . 9 64 Appendix A. An Appendix . . . . . . . . . . . . . . . . . . . . . 9 65 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 9 66 Intellectual Property and Copyright Statements . . . . . . . . . . 11 68 1. Introduction 70 RFC 1122 [RFC1122] Section 4.2.2.17, page 92 says that: A TCP MAY 71 keep its offered receive window closed indefinitely. As long as the 72 receiving TCP continues to send acknowledgments in response to the 73 probe segments, the sending TCP MUST allow the connection to stay 74 open. 76 The RFC goes on to say that it is important to remember that ACK 77 (acknowledgement) segments that contain no data are not reliably 78 transmitted by TCP. Therefore zero window probing SHOULD be 79 supported to prevent a connection from hanging forever if an ACK 80 segment that re-opens the window is lost. 
82 While the RFC is clear why the sender needs to continue to probe the 83 receiver, it is not clear why this process needs to be indefinite, 84 particularly if the receiver continually responds with an ACK and a 85 window of zero. This draft documents a negative consequence of this 86 indefinite attempt by the sender to probe for the receiver's offered 87 window. 89 One negative consequence of this indefinite attempt is that it makes 90 the sender vulnerable to a connection and send buffer exhaustion 91 attack by one or more malicious receivers. This leads to a Denial of 92 Service (DoS) where legitimate connections stop getting established 93 and well-behaved, already established connections stop making progress 94 in terms of data transmission. 96 Having the sender accumulate buffers and connection table entries 97 when the receiver has deliberately and maliciously closed the window 98 can ultimately lead to resource exhaustion on the sender. This 99 particular dependence on the receiver to open its zero window can be 100 easily exploited by a malicious receiver to launch a DoS attack 101 against the sender. 103 The condition where the sender has at least one buffer in the send 104 queue and the receiver is advertising a zero window is referred to as persist condition. In this condition the 105 sender is waiting indefinitely for the receiver to open up its 106 window. 108 Resources that are compromised due to this sender behavior include 109 connections and send buffers, since both of these are finite pools in 110 any server. 112 The problem is applicable to TCP and TCP-derived transport protocols 113 such as SCTP. 115 We have done some experimentation to demonstrate this problem and 116 looked at how many servers on the Internet are susceptible to it. 117 The rest of the draft will detail the experiment, suggest how the 118 problem needs to be addressed, why we believe it is the right 119 solution and what role the application can play in solving this problem. 
121 Persisting indefinitely makes a TCP end point vulnerable to a 122 DoS attack. We therefore clarify the purpose of zero window as 123 described in RFC 1122 and suggest that a TCP end point SHOULD NOT keep 124 a connection in persist condition for an indefinite amount of time. 126 In most implementations, TCP runs in kernel mode as part of the 127 operating system. In this mode the operating system may share the 128 same address space as TCP. For the purposes of discussion, this 129 draft considers the TCP protocol implementation to be a separate module 130 responsible for all resources such as buffers and connection control 131 blocks that it borrows from the operating system. The operating 132 system can enforce the maximum number of buffers it is willing to 133 give to TCP but beyond that it lets TCP decide how to manage them. 135 2. Denial of Service Experimentation 137 The effect of a receiver that stops reading data is that the sender 138 continues to send data until the receiver's advertised window goes to 139 zero, at which time the connection enters persist condition. Since 140 the sender has more buffers with data for the client, it will 141 continue to probe the receiver. If the sender is servicing several 142 such clients the effect compounds itself to the extent that the 143 sender runs out of buffers and/or connection resources. The sender 144 at this point cannot service new legitimate connections and even the 145 existing connections start seeing degraded service. Further, each 146 connection reserves a connection control block, of which there is a finite 147 number. Several connections in persist condition can exhaust the 148 connection control block pool. 150 To demonstrate the problem we wrote a user-level program that puts 151 TCP connections on the HTTP server in persist condition. The client 152 can run on any machine and does not require a change in the kernel or 153 the operating system. 
155 The client opens a TCP connection to the HTTP server with an 156 advertised MSS of 1460. It then sends a GET request for a large 157 page. The page size is large enough to ensure that the connection's 158 send buffer always has more data than the receiver's maximum advertised 159 window. Once the window has been opened, the client application 160 stops reading data resulting in TCP closing the window and 161 advertising zero window towards the sender. For each request of a 162 multi-megabyte response, the connection can result in the sender 163 holding on to all the requested data minus the receiver's advertised 164 window, in its send queue. If the receiver never closes the 165 connection, the server will continue to hold that data indefinitely 166 in its send queue. 168 The same program was then run from each client with it opening one 169 thousand connections towards the HTTP server. This was run from 170 several different machines with the result that now the server was 171 holding onto several thousand connections, each with more than one 172 megabyte worth of data on the send queue. 174 After verifying this behavior in the laboratory against both an Apache 175 and an IIS server, we then proceeded to test HTTP servers on the 176 Internet. To verify this behavior we needed to open only a few 177 connections towards the servers. We chose three well known sites, 178 identified here as Site A, Site B and Site C for our test. We then 179 ran a network analyzer on the client machine to monitor the behavior 180 of the connection. These were our observations. 182 Connections to Site A went into ESTABLISHED state and after receiving 183 the receiver's advertised window worth of data went into persist 184 condition. The connection persisted in this mode for approximately 185 11 minutes and was then RST by the server. 187 Connections to Site B went and stayed in ESTABLISHED state. They 188 stayed in that state as long as the client kept the connection open. 
189 The server in this case was Apache version 2.0. The size of the file 190 requested was 12.12M. The client received 200K worth of data and the 191 rest of the data was either queued on the send queue or in 192 the application. 194 Connections to Site C went into and stayed in ESTABLISHED state. They 195 too stayed in that state as long as the client kept the connection 196 open, which was as long as five days. The server in this case was an 197 IIS server version 6.0. The size of the requested page was 1.09M (a 198 pdf file). The client had received 200K worth of data and the rest 199 of the data was either queued on the send queue or in the application. 201 As can be seen from the experimentation the behavior of TCP varied 202 greatly between different sites. Site A appears to implement a User 203 Time Out (UTO) or application timeout on its connections. That 204 allowed it to clear the connections. However, once it was known what 205 the fixed timeout was, it was easy to modify the client program to 206 open another set of connections after the timeout. We discuss the 207 role of application and the use of UTO in a later section. It was 208 difficult to establish how much data was sitting on the send queue of 209 each one of these public servers as that depends on send socket 210 buffer size and how much data was written by the application. 212 Please note that it is not required for the client to issue a request 213 for a large page or for the server to open its window completely to 214 reproduce the DoS scenario. A page size larger than the advertised 215 window size is enough. We decided to do it with a larger response 216 because it enabled us to reproduce the problem with a smaller number of 217 connections and client machines. 219 Persist condition clearly has a more significant impact on servers 220 that deal with a large number of connections (e.g. 200-300K 221 connections), than on end workstations that might deal with a few 222 connections at a time. 
This is because the server has a finite 223 number of buffers for a larger pool of connections. With dynamic 224 allocation of buffers, each connection is given resources as it needs 225 them. A high water mark set on each connection prevents the number 226 of enqueued buffers from exceeding that mark till such time that the 227 number of buffers falls below a low water mark. However, that in 228 itself does not solve the problem as the high water mark is more than 229 the advertised window size. 231 3. Solution 233 The current behavior of the connection in persist condition SHALL 234 continue to exist as the default behavior. The solution proposed 235 will control the amount of time a TCP sender will spend in persist 236 condition waiting for the receiver to open its window. Outlined are some 237 of the ways that this can be achieved. Default values are suggested 238 values and the implementor is free to choose their own value. 240 If the administrator of the system decides to use the proposed 241 solution, they will need to enable it explicitly. Optionally, the 242 administrator can configure minimum and maximum threshold values 243 for connections and buffer resources for the total pool. Default 244 values of 60% and 80% of the total pool for minimum and maximum 245 respectively are assumed. 247 While implementing the solution it is important to make sure that 248 legitimate and well behaved receivers are not penalized for offering 249 zero or reduced window. Hence the solution needs to be robust. It 250 is also important that the solution be adaptive. While resources are 251 plentiful, connections are allowed to spend more time in persist 252 condition. However, as resources become scarce the connections are 253 aborted sooner. 255 A fixed timeout value is not an effective solution. Malicious clients 256 can discover the timeout value and can (re)launch an attack after the 257 fixed timeout period. 
259 If the solution is enabled, the global persist-condition-expiry-time 260 value will be set to infinity (or a very large value). Thereafter it 261 will be adapted based on system resource availability. The persist- 262 condition-expiry-time is bounded above by the default value of 60 263 seconds and below by a minimum value of five seconds (or minimum persist 264 timeout). The administrator has the option to change the default 265 value. To prevent wild fluctuations in this timeout value, the time 266 will be recomputed only when resources change by at least 1%. If the 267 utilization of the total pool of resources is less than the minimum threshold, the persist- 268 condition-expiry-time value is set to infinity (a very large value). 269 If the resource utilization increases to being between minimum and 270 maximum, then persist-condition-expiry-time is first set to the 271 default value and thereafter decreased additively by two seconds. If 272 resources exceed the maximum, the persist-condition-expiry-time is 273 decreased multiplicatively by a factor of two. If the resource 274 utilization starts to decrease then persist-condition-expiry-time is 275 increased additively by four seconds. If the utilization falls below 276 minimum, the time is set to infinity. 278 The solution focuses on figuring out how to keep track of connections 279 in persist condition. The configured option of persist-condition- 280 expiry-time implies how long the connection will be allowed to stay 281 in persist condition. When the connection enters persist condition, 282 i.e. the receiver advertises a window of zero, the value of the current 283 time (now) is saved in the connection entry. This entry is called 284 persist-condition-entry-time. In addition, the sequence number on 285 the connection is stored as persist-condition-sequence-number. 
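As an illustration only, the adaptation rule for persist-condition-expiry-time described earlier in this section could be sketched as follows. The class and method names are hypothetical and not part of this draft. Where the text leaves behavior open, the sketch makes assumptions: utilization is given in percent of the total pool, an infinite value is first reset to the default whenever utilization is at or above the minimum threshold, and the four second increase applies to any decrease in utilization that stays at or above the minimum threshold.

```python
import math


class PersistExpiryTimer:
    """Hypothetical sketch of the adaptive persist-condition-expiry-time."""

    def __init__(self, default=60.0, minimum=5.0, low_pct=60.0, high_pct=80.0):
        self.default = default      # upper bound, in seconds
        self.minimum = minimum      # lower bound, in seconds
        self.low_pct = low_pct      # minimum threshold, percent of pool
        self.high_pct = high_pct    # maximum threshold, percent of pool
        self.expiry = math.inf      # starts at infinity when enabled
        self.last_pct = 0.0         # utilization at the last recomputation

    def update(self, util_pct):
        """Recompute the expiry time for a new utilization sample."""
        # Recompute only when utilization changed by at least 1%.
        if abs(util_pct - self.last_pct) < 1.0:
            return self.expiry
        rising = util_pct > self.last_pct
        self.last_pct = util_pct

        if util_pct < self.low_pct:
            # Plenty of resources: connections may persist indefinitely.
            self.expiry = math.inf
        elif math.isinf(self.expiry):
            # Assumption: start from the default on crossing the minimum.
            self.expiry = self.default
        elif not rising:
            # Utilization is decreasing: relax additively by four seconds.
            self.expiry = min(self.default, self.expiry + 4.0)
        elif util_pct > self.high_pct:
            # Above the maximum: tighten multiplicatively by a factor of two.
            self.expiry = max(self.minimum, self.expiry / 2.0)
        else:
            # Between minimum and maximum: tighten additively by two seconds.
            self.expiry = max(self.minimum, self.expiry - 2.0)
        return self.expiry
```

For example, under these assumptions a pool that climbs from 50% to 65% utilization moves the timer from infinity to the 60 second default, and each further recomputation above 80% halves the value until the five second floor is reached.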
286 Thereafter every time the persist timer expires or when an ACK is 287 received that continues to advertise zero window, a check is done to 288 make sure that the difference between current time and persist- 289 condition-entry-time is not more than persist-condition-expiry-time. 290 If it is, then the connection is aborted and the connection resources 291 are reclaimed. 293 The receiver's silly window avoidance mechanism will make sure that 294 the receiver cannot read a small amount of data and fool the sender 295 into taking it out of persist condition. 297 For the solution to be robust, it is also important to determine 298 which connection among the set of connections in persist condition is 299 selected to be terminated. To implement this effectively, we 300 maintain two priority queues of connections in persist condition, one 301 based on the amount of data in the send queue and another based on 302 the persist-condition-entry-time, i.e. when the connection entered 303 persist condition. 305 Whenever a buffer resource is required and the resource utilization 306 is more than the maximum, the connection with the largest amount of 307 data in the send queue is dropped, and its buffers recycled. 308 Whenever a connection resource is required and the connection 309 utilization is higher than the maximum, the connection with the 310 oldest persist-condition-entry-time is selected and dropped. This 311 achieves fairness by penalizing the connections that are consuming 312 the most resources. 314 4. Role of Application 316 Applications are agnostic to why TCP connections are not making 317 progress in terms of data transmission. TCP connections may not be 318 able to transmit data for a variety of reasons. Today TCP does not 319 provide an indication of the progress of the connection explicitly. 320 It is up to the application to conclude based on an examination of 321 the send queue backlog or implement a UTO as defined in RFC 793 322 [RFC0793]. 
Many commonly used applications do not implement the 323 UTO scheme, e.g. World Wide Web (WWW). Even if the application did 324 implement a UTO scheme, all applications running on the system need to 325 have implemented the UTO for the solution to be effective. A single 326 application that has not implemented the UTO can cause the entire 327 system to be impacted negatively. 329 There are cases where the system is application agnostic. A classic 330 case of this is a TCP proxy. In that particular case, there is no 331 end application that can be informed of the state of the connection 332 for the application to take action. 334 Resources like TCP buffers are system wide resources and are not tied 335 to any particular application. TCP needs to be able to monitor 336 resource usage system wide when connections are in persist condition. 337 The application does not have the connection's sender state knowledge 338 to implement a robust and adaptive solution such as the one outlined 339 here. 341 Applications can assist TCP's role in solving this problem. They can 342 register for an event notification when the TCP connection enters or 343 exits persist condition. They can use the notification mechanism to 344 implement their own scheme of deciding which persist connections to 345 clear. They can also suggest timeout or retry values to TCP. 347 5. IANA Considerations 349 This document makes no request of IANA. 351 6. Security Considerations 353 This document discusses one security consideration. That is the 354 possible DoS attacks discussed in Section 2. 356 7. Acknowledgements 358 Thanks to Anantha Ramaiah who spent countless hours reviewing, 359 commenting and proposing changes to the draft. Ted Faber helped us 360 in clarifying the objective of this RFC. Thanks also to Fred Baker 361 and Elliot Lear for providing their feedback on the draft. 363 Our thanks to Nanda Bhajana who helped arrange the test setup to be 364 able to reproduce the DoS scenario. 366 8. 
References 368 8.1. Normative References 370 [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, 371 RFC 793, September 1981. 373 [RFC1122] Braden, R., "Requirements for Internet Hosts - 374 Communication Layers", STD 3, RFC 1122, October 1989. 376 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 377 Requirement Levels", BCP 14, RFC 2119, March 1997. 379 8.2. Informative References 381 Appendix A. An Appendix 382 Authors' Addresses 384 Mahesh Jethanandani 385 Cisco Systems 386 170 West Tasman Drive 387 San Jose, California 95134 388 USA 390 Phone: +1-408-527-8230 391 Fax: +1-408-527-0147 392 Email: mahesh@cisco.com 393 URI: www.cisco.com 395 Murali Bashyam 396 Ocarina Systems, Inc 397 Fremont, CA 398 USA 400 Phone: 401 Fax: 402 Email: mbashyam@ocarinatech.com 403 URI: 405 Full Copyright Statement 407 Copyright (C) The IETF Trust (2007). 409 This document is subject to the rights, licenses and restrictions 410 contained in BCP 78, and except as set forth therein, the authors 411 retain all their rights. 413 This document and the information contained herein are provided on an 414 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 415 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 416 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 417 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 418 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 419 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 421 Intellectual Property 423 The IETF takes no position regarding the validity or scope of any 424 Intellectual Property Rights or other rights that might be claimed to 425 pertain to the implementation or use of the technology described in 426 this document or the extent to which any license under such rights 427 might or might not be available; nor does it represent that it has 428 made any independent effort to identify any such rights. 
Information 429 on the procedures with respect to rights in RFC documents can be 430 found in BCP 78 and BCP 79. 432 Copies of IPR disclosures made to the IETF Secretariat and any 433 assurances of licenses to be made available, or the result of an 434 attempt made to obtain a general license or permission for the use of 435 such proprietary rights by implementers or users of this 436 specification can be obtained from the IETF on-line IPR repository at 437 http://www.ietf.org/ipr. 439 The IETF invites any interested party to bring to its attention any 440 copyrights, patents or patent applications, or other proprietary 441 rights that may cover technology that may be required to implement 442 this standard. Please address the information to the IETF at 443 ietf-ipr@ietf.org. 445 Acknowledgment 447 Funding for the RFC Editor function is provided by the IETF 448 Administrative Support Activity (IASA).