Internet Engineering Task Force                             W. Eddy, Ed.
Internet-Draft                                               MTI Systems
Obsoletes: 793 (if approved)                              March 27, 2014
Intended status: Standards Track
Expires: September 28, 2014

              Transmission Control Protocol Specification
                        draft-eddy-rfc793bis-02

Abstract

   This document specifies the Internet's Transmission Control Protocol
   (TCP). TCP is an important transport layer protocol in the Internet
   stack, and has continuously evolved over decades of use and growth of
   the Internet. Over this time, a number of changes have been made to
   TCP as it was specified in RFC 793, though these have only been
   documented in a piecemeal fashion. This document collects and brings
   those changes together with the protocol specification from RFC 793.
   This document obsoletes RFC 793 and several other RFCs (TODO: list
   all actual RFCs when finished).

   RFC EDITOR NOTE: If approved for publication as an RFC, this should
   be marked additionally as "STD: 7" and replace RFC 793 in that role.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [1].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 28, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Purpose and Scope
   2.  Introduction
   3.  Functional Specification
     3.1.  Header Format
     3.2.  Terminology
     3.3.  Sequence Numbers
     3.4.  Establishing a connection
     3.5.  Closing a Connection
     3.6.  Precedence and Security
     3.7.  Data Communication
     3.8.  Interfaces
       3.8.1.  User/TCP Interface
       3.8.2.  TCP/Lower-Level Interface
     3.9.  Event Processing
     3.10. Glossary
   4.  Changes from RFC 793
   5.  IANA Considerations
   6.  Security and Privacy Considerations
   7.  Acknowledgements
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Author's Address

1. Purpose and Scope

   In 1981, RFC 793 [2] was released, documenting the Transmission
   Control Protocol (TCP) and replacing earlier published
   specifications for TCP.

   Since that time, TCP has been implemented many times, and has been
   used as a transport protocol for numerous applications on the
   Internet.

   For several decades, RFC 793 plus a number of other documents have
   combined to serve as the specification for TCP [3]. Over time, a
   number of errata have been identified on RFC 793, as well as
   deficiencies in security, performance, and other aspects. A number
   of enhancements have also been developed and documented separately.
   These were never consolidated into an update to the base
   specification.

   The purpose of this document is to bring together all of the IETF
   Standards Track changes that have been made to the basic TCP
   functional specification and unify them into an update of the RFC 793
   protocol specification. Some companion documents are referenced for
   important algorithms that TCP uses (e.g. for congestion control), but
   no attempt has been made to include them in this document. This is a
   conscious choice, as this base specification can be used with
   multiple additional algorithms that are developed and incorporated
   separately, but all TCP implementations need to implement this
   specification as a common basis in order to interoperate. As some
   additional TCP features have become quite complicated themselves
   (e.g. advanced loss recovery and congestion control), future
   companion documents may attempt to similarly bring these together.

   In addition to the protocol specification that describes the TCP
   segment format, generation, and processing rules that are to be
   implemented in code, RFC 793 and other updates also contain
   informative and descriptive text for human readers to understand
   aspects of the protocol design and operation. This document does not
   attempt to alter or update this informative text, and is focused only
   on updating the normative protocol specification. We preserve
   references to the documentation containing the important explanations
   and rationale, where appropriate.

   This document is intended to be useful both in checking existing TCP
   implementations for conformance and in writing new implementations.

2. Introduction

   RFC 793 contains a discussion of the TCP design goals and provides
   examples of its operation, including examples of connection
   establishment, closing connections, and retransmitting packets to
   repair losses.

   This document describes the basic functionality expected in modern
   implementations of TCP, and replaces the protocol specification in
   RFC 793. It does not replicate or attempt to update the examples and
   other discussion in RFC 793. Other documents are referenced to
   provide explanation of the theory of operation, rationale, and
   detailed discussion of design decisions. This document only focuses
   on the normative behavior of the protocol.

   TEMPORARY EDITOR'S NOTE: This is an early revision in the process of
   updating RFC 793. Many planned changes are not yet incorporated.

   ***Please do not use this revision as a basis for any work or
   reference.***

   TODO: describe the subsequent structure of the document to-be (e.g.
   will it follow the newtcp BSD implementation?), and mention that a
   list of changes from RFC 793 will be kept in the final section

   TEMPORARY EDITOR'S NOTE: the current revision of this document does
   not yet collect all of the changes that will be in the final version.
   The set of content changes planned for future revisions is roughly:

      -00 was a proposal for the scope of the document and description
      of the need for an update to RFC 793

      -01 incorporated the RFC 793 section 3 content with no additional
      changes into XML2RFC format for easy tracking of the changes
      between RFC 793 and future revisions of the document

      -02 incorporates the verified errata on RFC 793 as of March 20,
      2014

      -03 and beyond are intended to incorporate changes from other RFCs
      that updated 793

3. Functional Specification

3.1. Header Format

   TCP segments are sent as internet datagrams. The Internet Protocol
   header carries several information fields, including the source and
   destination host addresses [2]. A TCP header follows the internet
   header, supplying information specific to the TCP protocol. This
   division allows for the existence of host level protocols other than
   TCP.

   TCP Header Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |       Destination Port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Acknowledgment Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data |           |U|A|P|R|S|F|                               |
   | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
   |       |           |G|K|H|T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |         Urgent Pointer        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                            TCP Header Format

          Note that one tick mark represents one bit position.

                               Figure 1

   Source Port: 16 bits

     The source port number.

   Destination Port: 16 bits

     The destination port number.

   Sequence Number: 32 bits

     The sequence number of the first data octet in this segment (except
     when SYN is present). If SYN is present the sequence number is the
     initial sequence number (ISN) and the first data octet is ISN+1.

   Acknowledgment Number: 32 bits

     If the ACK control bit is set this field contains the value of the
     next sequence number the sender of the segment is expecting to
     receive. Once a connection is established this is always sent.

   Data Offset: 4 bits

     The number of 32 bit words in the TCP Header. This indicates where
     the data begins. The TCP header (even one including options) is an
     integral number of 32 bits long.

   Reserved: 6 bits

     Reserved for future use. Must be zero.

   Control Bits: 6 bits (from left to right):

     URG: Urgent Pointer field significant
     ACK: Acknowledgment field significant
     PSH: Push Function
     RST: Reset the connection
     SYN: Synchronize sequence numbers
     FIN: No more data from sender

   Window: 16 bits

     The number of data octets beginning with the one indicated in the
     acknowledgment field which the sender of this segment is willing to
     accept.

   Checksum: 16 bits

     The checksum field is the 16 bit one's complement of the one's
     complement sum of all 16 bit words in the header and text. If a
     segment contains an odd number of header and text octets to be
     checksummed, the last octet is padded on the right with zeros to
     form a 16 bit word for checksum purposes. The pad is not
     transmitted as part of the segment. While computing the checksum,
     the checksum field itself is replaced with zeros.

     The checksum also covers a 96 bit pseudo header conceptually
     prefixed to the TCP header. This pseudo header contains the Source
     Address, the Destination Address, the Protocol, and TCP length.
     This gives the TCP protection against misrouted segments. This
     information is carried in the Internet Protocol and is transferred
     across the TCP/Network interface in the arguments or results of
     calls by the TCP on the IP.

                     +--------+--------+--------+--------+
                     |           Source Address          |
                     +--------+--------+--------+--------+
                     |         Destination Address       |
                     +--------+--------+--------+--------+
                     |  zero  |  PTCL  |    TCP Length   |
                     +--------+--------+--------+--------+

     The TCP Length is the TCP header length plus the data length in
     octets (this is not an explicitly transmitted quantity, but is
     computed), and it does not count the 12 octets of the pseudo
     header.

   Urgent Pointer: 16 bits

     This field communicates the current value of the urgent pointer as
     a positive offset from the sequence number in this segment. The
     urgent pointer points to the sequence number of the last octet in a
     sequence of urgent data. This field is only to be interpreted in
     segments with the URG control bit set. EDITOR'S NOTE: TODO need to
     incorporate RFC 6093 here.

   Options: variable

     Options may occupy space at the end of the TCP header and are a
     multiple of 8 bits in length. All options are included in the
     checksum. An option may begin on any octet boundary. There are
     two cases for the format of an option:

       Case 1: A single octet of option-kind.

       Case 2: An octet of option-kind, an octet of option-length, and
       the actual option-data octets.

     The option-length counts the two octets of option-kind and
     option-length as well as the option-data octets.

     Note that the list of options may be shorter than the data offset
     field might imply. The content of the header beyond the
     End-of-Option option must be header padding (i.e., zero).

     A TCP must implement all options.
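
     The two option formats above can be illustrated with a minimal
     Python sketch that walks the options area of a header. The function
     name parse_tcp_options and its error handling are assumptions made
     for this illustration and are not part of the specification; the
     sketch treats kinds 0 and 1 (End of Option List and No-Operation,
     listed next) as the single-octet case and every other kind as the
     kind/length/data case.

```python
def parse_tcp_options(options: bytes) -> list[tuple[int, bytes]]:
    """Return (kind, data) pairs found in the options area of a TCP header.

    Hypothetical helper for illustration; 'options' holds the octets
    between the fixed 20-octet header and the offset given by Data Offset.
    """
    parsed = []
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:
            # Case 1, End of Option List: everything after it is padding.
            break
        if kind == 1:
            # Case 1, No-Operation: a single octet used for alignment.
            parsed.append((kind, b""))
            i += 1
            continue
        # Case 2: option-kind, option-length, then option-data.
        if i + 1 >= len(options):
            raise ValueError("truncated option: missing option-length")
        length = options[i + 1]
        # option-length counts the kind and length octets themselves.
        if length < 2 or i + length > len(options):
            raise ValueError("invalid option-length")
        parsed.append((kind, options[i + 2:i + length]))
        i += length
    return parsed


if __name__ == "__main__":
    # NOP, then Maximum Segment Size (kind 2, length 4, value 1460),
    # then End of Option List followed by two octets of zero padding.
    area = bytes([1, 2, 4, 0x05, 0xB4, 0, 0, 0])
    for kind, data in parse_tcp_options(area):
        print(kind, data.hex())
```

     Because the length octet is validated before it is used, a
     malformed option cannot cause the walk to run past the end of the
     header or loop forever on a zero length.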

     Currently defined options include (kind indicated in octal):

       Kind     Length    Meaning
       ----     ------    -------
        0         -       End of option list.
        1         -       No-Operation.
        2         4       Maximum Segment Size.

     Specific Option Definitions

       End of Option List

         +--------+
         |00000000|
         +--------+
          Kind=0

         This option code indicates the end of the option list. This
         might not coincide with the end of the TCP header according to
         the Data Offset field. This is used at the end of all options,
         not the end of each option, and need only be used if the end of
         the options would not otherwise coincide with the end of the TCP
         header.

       No-Operation

         +--------+
         |00000001|
         +--------+
          Kind=1

         This option code may be used between options, for example, to
         align the beginning of a subsequent option on a word boundary.
         There is no guarantee that senders will use this option, so
         receivers must be prepared to process options even if they do
         not begin on a word boundary.

       Maximum Segment Size

         +--------+--------+---------+--------+
         |00000010|00000100|   max seg size   |
         +--------+--------+---------+--------+
          Kind=2   Length=4

         Maximum Segment Size Option Data: 16 bits

         If this option is present, then it communicates the maximum
         receive segment size at the TCP which sends this segment. This
         field may be sent in the initial connection request (i.e., in
         segments with the SYN control bit set) and must not be sent in
         other segments. If this option is not used, any segment size is
         allowed.

   Padding: variable

     The TCP header padding is used to ensure that the TCP header ends
     and data begins on a 32 bit boundary. The padding is composed of
     zeros.

3.2. Terminology

   Before we can discuss very much about the operation of the TCP we
   need to introduce some detailed terminology. The maintenance of a
   TCP connection requires the remembering of several variables.
   We conceive of these variables being stored in a connection record
   called a Transmission Control Block or TCB. Among the variables
   stored in the TCB are the local and remote socket numbers, the
   security and precedence of the connection, pointers to the user's
   send and receive buffers, pointers to the retransmit queue and to the
   current segment. In addition several variables relating to the send
   and receive sequence numbers are stored in the TCB.

     Send Sequence Variables

       SND.UNA - send unacknowledged
       SND.NXT - send next
       SND.WND - send window
       SND.UP  - send urgent pointer
       SND.WL1 - segment sequence number used for last window update
       SND.WL2 - segment acknowledgment number used for last window
                 update
       ISS     - initial send sequence number

     Receive Sequence Variables

       RCV.NXT - receive next
       RCV.WND - receive window
       RCV.UP  - receive urgent pointer
       IRS     - initial receive sequence number

   The following diagrams may help to relate some of these variables to
   the sequence space.

   Send Sequence Space

                   1         2          3          4
              ----------|----------|----------|----------
                     SND.UNA    SND.NXT    SND.UNA
                                          +SND.WND

        1 - old sequence numbers which have been acknowledged
        2 - sequence numbers of unacknowledged data
        3 - sequence numbers allowed for new data transmission
        4 - future sequence numbers which are not yet allowed

                          Send Sequence Space

                               Figure 2

   The send window is the portion of the sequence space labeled 3 in
   Figure 2.

   Receive Sequence Space

                        1          2          3
                   ----------|----------|----------
                          RCV.NXT    RCV.NXT
                                    +RCV.WND

        1 - old sequence numbers which have been acknowledged
        2 - sequence numbers allowed for new reception
        3 - future sequence numbers which are not yet allowed

                         Receive Sequence Space

                               Figure 3

   The receive window is the portion of the sequence space labeled 2 in
   Figure 3.
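
   As a rough illustration of Figures 2 and 3, the following Python
   sketch holds a few of the TCB sequence variables and derives the
   sizes of the labeled regions. The class names, field names, and
   methods are assumptions made for this example only. Differences are
   taken modulo 2**32 (the arithmetic described in the Sequence Numbers
   section) so that the results stay correct when the sequence space
   wraps.

```python
from dataclasses import dataclass

MOD = 2 ** 32  # size of the TCP sequence number space


@dataclass
class SendSequenceVars:
    """Hypothetical subset of the TCB send variables (Figure 2)."""
    snd_una: int  # SND.UNA: oldest unacknowledged sequence number
    snd_nxt: int  # SND.NXT: next sequence number to be sent
    snd_wnd: int  # SND.WND: send window

    def unacknowledged(self) -> int:
        # Region 2: octets sent but not yet acknowledged.
        return (self.snd_nxt - self.snd_una) % MOD

    def allowed_new(self) -> int:
        # Region 3: from SND.NXT up to (not including) SND.UNA+SND.WND.
        return max(0, self.snd_wnd - self.unacknowledged())


@dataclass
class ReceiveSequenceVars:
    """Hypothetical subset of the TCB receive variables (Figure 3)."""
    rcv_nxt: int  # RCV.NXT: next sequence number expected
    rcv_wnd: int  # RCV.WND: receive window

    def window_edges(self) -> tuple[int, int]:
        # Region 2 spans [RCV.NXT, RCV.NXT+RCV.WND), reduced modulo 2**32.
        return self.rcv_nxt, (self.rcv_nxt + self.rcv_wnd) % MOD


if __name__ == "__main__":
    # SND.NXT has wrapped past 2**32 - 1 while SND.UNA has not.
    snd = SendSequenceVars(snd_una=4294967000, snd_nxt=200, snd_wnd=1000)
    print(snd.unacknowledged(), snd.allowed_new())
    rcv = ReceiveSequenceVars(rcv_nxt=4294967290, rcv_wnd=100)
    print(rcv.window_edges())
```

   Taking the difference first and reducing it modulo 2**32 avoids
   comparing raw sequence numbers directly, which would give the wrong
   answer once either edge of a window has wrapped.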

   There are also some variables used frequently in the discussion that
   take their values from the fields of the current segment.

     Current Segment Variables

       SEG.SEQ - segment sequence number
       SEG.ACK - segment acknowledgment number
       SEG.LEN - segment length
       SEG.WND - segment window
       SEG.UP  - segment urgent pointer
       SEG.PRC - segment precedence value

   A connection progresses through a series of states during its
   lifetime. The states are: LISTEN, SYN-SENT, SYN-RECEIVED,
   ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK,
   TIME-WAIT, and the fictional state CLOSED. CLOSED is fictional
   because it represents the state when there is no TCB, and therefore,
   no connection. Briefly, the meanings of the states are:

   LISTEN - represents waiting for a connection request from any
   remote TCP and port.

   SYN-SENT - represents waiting for a matching connection request
   after having sent a connection request.

   SYN-RECEIVED - represents waiting for a confirming connection
   request acknowledgment after having both received and sent a
   connection request.

   ESTABLISHED - represents an open connection; data received can be
   delivered to the user. The normal state for the data transfer
   phase of the connection.

   FIN-WAIT-1 - represents waiting for a connection termination
   request from the remote TCP, or an acknowledgment of the
   connection termination request previously sent.

   FIN-WAIT-2 - represents waiting for a connection termination
   request from the remote TCP.

   CLOSE-WAIT - represents waiting for a connection termination
   request from the local user.

   CLOSING - represents waiting for a connection termination request
   acknowledgment from the remote TCP.

   LAST-ACK - represents waiting for an acknowledgment of the
   connection termination request previously sent to the remote TCP
   (this termination request sent to the remote TCP already included
   an acknowledgment of the termination request sent from the remote
   TCP).

   TIME-WAIT - represents waiting for enough time to pass to be sure
   the remote TCP received the acknowledgment of its connection
   termination request.

   CLOSED - represents no connection state at all.

   A TCP connection progresses from one state to another in response to
   events. The events are the user calls, OPEN, SEND, RECEIVE, CLOSE,
   ABORT, and STATUS; the incoming segments, particularly those
   containing the SYN, ACK, RST and FIN flags; and timeouts.

   The state diagram in Figure 4 illustrates only state changes,
   together with the causing events and resulting actions, but addresses
   neither error conditions nor actions which are not connected with
   state changes. In a later section, more detail is offered with
   respect to the reaction of the TCP to events.

   NOTA BENE: this diagram is only a summary and must not be taken as
   the total specification.

                              +---------+ ---------\      active OPEN
                              |  CLOSED |            \    -----------
                              +---------+<---------\   \   create TCB
                                |     ^              \   \  snd SYN
                   passive OPEN |     |   CLOSE        \   \
                   ------------ |     | ----------       \   \
                    create TCB  |     | delete TCB         \   \
                                V     |                      \   \
            rcv RST (note 1)  +---------+            CLOSE    |    \
         -------------------->|  LISTEN |          ---------- |     |
        /                     +---------+          delete TCB |     |
       /           rcv SYN      |     |     SEND              |     |
      /           -----------   |     |    -------            |     V
 +---------+      snd SYN,ACK  /       \   snd SYN          +---------+
 |         |<-----------------           ------------------>|         |
 |   SYN   |                    rcv SYN                     |   SYN   |
 |   RCVD  |<-----------------------------------------------|   SENT  |
 |         |                  snd SYN,ACK                   |         |
 |         |------------------           -------------------|         |
 +---------+   rcv ACK of SYN  \       /  rcv SYN,ACK       +---------+
   |           --------------   |     |   -----------
   |                  x         |     |     snd ACK
   |                            V     V
   |  CLOSE                   +---------+
   | -------                  |  ESTAB  |
   | snd FIN                  +---------+
   |                   CLOSE    |     |    rcv FIN
   V                  -------   |     |    -------
 +---------+          snd FIN  /       \   snd ACK          +---------+
 |  FIN    |<-----------------           ------------------>|  CLOSE  |
 | WAIT-1  |------------------                              |   WAIT  |
 +---------+          rcv FIN  \                            +---------+
   | rcv ACK of FIN   -------   |                            CLOSE  |
   | --------------   snd ACK   |                           ------- |
   V        x                   V                           snd FIN V
 +---------+                  +---------+                   +---------+
 |FINWAIT-2|                  | CLOSING |                   | LAST-ACK|
 +---------+                  +---------+                   +---------+
   |                rcv ACK of FIN |                 rcv ACK of FIN |
   |  rcv FIN       -------------- |    Timeout=2MSL -------------- |
   |  -------              x       V    ------------        x       V
    \ snd ACK                 +---------+delete TCB         +---------+
     ------------------------>|TIME WAIT|------------------>| CLOSED  |
                              +---------+                   +---------+

   note 1: The transition from SYN-RCVD to LISTEN on receiving a RST is
   conditional on having reached SYN-RCVD after a passive open.

   note 2: An unshown transition exists from FIN-WAIT-1 to TIME-WAIT if
   a FIN is received and the local FIN is also acknowledged.

                      TCP Connection State Diagram

                               Figure 4

3.3. Sequence Numbers

   A fundamental notion in the design is that every octet of data sent
   over a TCP connection has a sequence number. Since every octet is
   sequenced, each of them can be acknowledged. The acknowledgment
   mechanism employed is cumulative so that an acknowledgment of
   sequence number X indicates that all octets up to but not including X
   have been received. This mechanism allows for straightforward
   duplicate detection in the presence of retransmission. Octets within
   a segment are numbered so that the first data octet immediately
   following the header is the lowest numbered, and the following octets
   are numbered consecutively.

   It is essential to remember that the actual sequence number space is
   finite, though very large. This space ranges from 0 to 2**32 - 1.
   Since the space is finite, all arithmetic dealing with sequence
   numbers must be performed modulo 2**32. This unsigned arithmetic
   preserves the relationship of sequence numbers as they cycle from
   2**32 - 1 to 0 again. There are some subtleties to computer modulo
   arithmetic, so great care should be taken in programming the
   comparison of such values. The symbol "=<" means "less than or
   equal" (modulo 2**32).

   The typical kinds of sequence number comparisons which the TCP must
   perform include:

   (a) Determining that an acknowledgment refers to some sequence
   number sent but not yet acknowledged.

   (b) Determining that all sequence numbers occupied by a segment
   have been acknowledged (e.g., to remove the segment from a
   retransmission queue).

   (c) Determining that an incoming segment contains sequence numbers
   which are expected (i.e., that the segment "overlaps" the receive
   window).

   In response to sending data the TCP will receive acknowledgments.
   The following comparisons are needed to process the acknowledgments.

     SND.UNA = oldest unacknowledged sequence number

     SND.NXT = next sequence number to be sent

     SEG.ACK = acknowledgment from the receiving TCP (next sequence
     number expected by the receiving TCP)

     SEG.SEQ = first sequence number of a segment

     SEG.LEN = the number of octets occupied by the data in the segment
     (counting SYN and FIN)

     SEG.SEQ+SEG.LEN-1 = last sequence number of a segment

   A new acknowledgment (called an "acceptable ack") is one for which
   the inequality below holds:

     SND.UNA < SEG.ACK =< SND.NXT

   A segment on the retransmission queue is fully acknowledged if the
   sum of its sequence number and length is less than or equal to the
   acknowledgment value in the incoming segment.

   When data is received the following comparisons are needed:

     RCV.NXT = next sequence number expected on an incoming segment,
     and is the left or lower edge of the receive window

     RCV.NXT+RCV.WND-1 = last sequence number expected on an incoming
     segment, and is the right or upper edge of the receive window

     SEG.SEQ = first sequence number occupied by the incoming segment

     SEG.SEQ+SEG.LEN-1 = last sequence number occupied by the incoming
     segment

   A segment is judged to occupy a portion of valid receive sequence
   space if

     RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND

   or

     RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND

   The first part of this test checks to see if the beginning of the
   segment falls in the window, the second part of the test checks to
   see if the end of the segment falls in the window; if the segment
   passes either part of the test it contains data in the window.

   Actually, it is a little more complicated than this.
   Due to zero windows and zero length segments, we have four cases for
   the acceptability of an incoming segment:

     Segment Receive  Test
     Length  Window
     ------- -------  -------------------------------------------

        0       0     SEG.SEQ = RCV.NXT

        0      >0     RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND

       >0       0     not acceptable

       >0      >0     RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND
                   or RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND

   Note that when the receive window is zero no segments should be
   acceptable except ACK segments. Thus, it is possible for a TCP to
   maintain a zero receive window while transmitting data and receiving
   ACKs. However, even when the receive window is zero, a TCP must
   process the RST and URG fields of all incoming segments.

   We have taken advantage of the numbering scheme to protect certain
   control information as well. This is achieved by implicitly
   including some control flags in the sequence space so they can be
   retransmitted and acknowledged without confusion (i.e., one and only
   one copy of the control will be acted upon). Control information is
   not physically carried in the segment data space. Consequently, we
   must adopt rules for implicitly assigning sequence numbers to
   control. The SYN and FIN are the only controls requiring this
   protection, and these controls are used only at connection opening
   and closing. For sequence number purposes, the SYN is considered to
   occur before the first actual data octet of the segment in which it
   occurs, while the FIN is considered to occur after the last actual
   data octet in a segment in which it occurs. The segment length
   (SEG.LEN) includes both data and the controls that occupy sequence
   space. When a SYN is present then SEG.SEQ is the sequence number of
   the SYN.

   Initial Sequence Number Selection

   The protocol places no restriction on a particular connection being
   used over and over again.
   A connection is defined by a pair of sockets. New instances of a
   connection will be referred to as incarnations of the connection.
   The problem that arises from this is -- "how does the TCP identify
   duplicate segments from previous incarnations of the connection?"
   This problem becomes apparent if the connection is being opened and
   closed in quick succession, or if the connection breaks with loss of
   memory and is then reestablished.

   To avoid confusion we must prevent segments from one incarnation of a
   connection from being used while the same sequence numbers may still
   be present in the network from an earlier incarnation. We want to
   assure this, even if a TCP crashes and loses all knowledge of the
   sequence numbers it has been using. When new connections are
   created, an initial sequence number (ISN) generator is employed which
   selects a new 32 bit ISN. The generator is bound to a (possibly
   fictitious) 32 bit clock whose low order bit is incremented roughly
   every 4 microseconds. Thus, the ISN cycles approximately every 4.55
   hours. Since we assume that segments will stay in the network no
   more than the Maximum Segment Lifetime (MSL) and that the MSL is less
   than 4.55 hours we can reasonably assume that ISN's will be unique.

   For each connection there is a send sequence number and a receive
   sequence number. The initial send sequence number (ISS) is chosen by
   the data sending TCP, and the initial receive sequence number (IRS)
   is learned during the connection establishing procedure.

   For a connection to be established or initialized, the two TCPs must
   synchronize on each other's initial sequence numbers. This is done
   in an exchange of connection establishing segments carrying a control
   bit called "SYN" (for synchronize) and the initial sequence numbers.
   As a shorthand, segments carrying the SYN bit are also called "SYNs".
753 Hence, the solution requires a suitable mechanism for picking an 754 initial sequence number and a slightly involved handshake to exchange 755 the ISN's. 757 The synchronization requires each side to send its own initial 758 sequence number and to receive a confirmation of it in acknowledgment 759 from the other side. Each side must also receive the other side's 760 initial sequence number and send a confirming acknowledgment.

762 1) A --> B SYN my sequence number is X
763 2) A <-- B ACK your sequence number is X
764 3) A <-- B SYN my sequence number is Y
765 4) A --> B ACK your sequence number is Y

767 Because steps 2 and 3 can be combined in a single message this is 768 called the three way (or three message) handshake. 770 A three way handshake is necessary because sequence numbers are not 771 tied to a global clock in the network, and TCPs may have different 772 mechanisms for picking the ISN's. The receiver of the first SYN has 773 no way of knowing whether the segment was an old delayed one or not, 774 unless it remembers the last sequence number used on the connection 775 (which is not always possible), and so it must ask the sender to 776 verify this SYN. The three way handshake and the advantages of a 777 clock-driven scheme are discussed in [3]. 779 Knowing When to Keep Quiet 781 To be sure that a TCP does not create a segment that carries a 782 sequence number which may be duplicated by an old segment remaining 783 in the network, the TCP must keep quiet for a maximum segment 784 lifetime (MSL) before assigning any sequence numbers upon starting up 785 or recovering from a crash in which memory of sequence numbers in use 786 was lost. For this specification the MSL is taken to be 2 minutes. 787 This is an engineering choice, and may be changed if experience 788 indicates it is desirable to do so.
Note that if a TCP is 789 reinitialized in some sense, yet retains its memory of sequence 790 numbers in use, then it need not wait at all; it must only be sure to 791 use sequence numbers larger than those recently used. 793 The TCP Quiet Time Concept 795 This specification provides that hosts which "crash" without 796 retaining any knowledge of the last sequence numbers transmitted on 797 each active (i.e., not closed) connection shall delay emitting any 798 TCP segments for at least the agreed Maximum Segment Lifetime (MSL) 799 in the internet system of which the host is a part. In the 800 paragraphs below, an explanation for this specification is given. 801 TCP implementors may violate the "quiet time" restriction, but only 802 at the risk of causing some old data to be accepted as new or new 803 data rejected as old duplicated by some receivers in the internet 804 system. 806 TCPs consume sequence number space each time a segment is formed and 807 entered into the network output queue at a source host. The 808 duplicate detection and sequencing algorithm in the TCP protocol 809 relies on the unique binding of segment data to sequence space to the 810 extent that sequence numbers will not cycle through all 2**32 values 811 before the segment data bound to those sequence numbers has been 812 delivered and acknowledged by the receiver and all duplicate copies 813 of the segments have "drained" from the internet. Without such an 814 assumption, two distinct TCP segments could conceivably be assigned 815 the same or overlapping sequence numbers, causing confusion at the 816 receiver as to which data is new and which is old. Remember that 817 each segment is bound to as many consecutive sequence numbers as 818 there are octets of data and SYN or FIN flags in the segment. 
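As an illustration of how segments are bound to sequence space, the following sketch computes the sequence-space length of a segment (data octets plus SYN and FIN) and applies the four-case acceptability test given earlier in this section. It is a minimal sketch, not a definitive implementation: the function names `sequence_length`, `in_window` and `acceptable` are hypothetical, and the use of modulo 2**32 subtraction for the window comparison is an assumption about how an implementation would handle wrap-around.

```python
MOD = 2**32  # size of the 32-bit sequence space


def sequence_length(data_octets: int, syn: bool = False, fin: bool = False) -> int:
    """SEG.LEN: data octets plus one for each of SYN and FIN (hypothetical helper)."""
    return data_octets + int(syn) + int(fin)


def in_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if RCV.NXT =< seq < RCV.NXT+RCV.WND, computed modulo 2**32."""
    return (seq - rcv_nxt) % MOD < rcv_wnd


def acceptable(seg_seq: int, seg_len: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """Apply the four cases of the segment acceptability table."""
    if seg_len == 0:
        if rcv_wnd == 0:
            return seg_seq == rcv_nxt          # length 0, window 0
        return in_window(seg_seq, rcv_nxt, rcv_wnd)  # length 0, window > 0
    if rcv_wnd == 0:
        return False                           # length > 0, window 0
    last = (seg_seq + seg_len - 1) % MOD       # last sequence number occupied
    return in_window(seg_seq, rcv_nxt, rcv_wnd) or in_window(last, rcv_nxt, rcv_wnd)
```

For example, a ten-octet segment starting five octets before RCV.NXT is acceptable when the window is open, because its last octet falls inside the window, while the same segment is rejected when the window is zero.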
820 Under normal conditions, TCPs keep track of the next sequence number 821 to emit and the oldest awaiting acknowledgment so as to avoid 822 mistakenly reusing a sequence number before its first use has been 823 acknowledged. This alone does not guarantee that old duplicate data 824 is drained from the net, so the sequence space has been made very 825 large to reduce the probability that a wandering duplicate will cause 826 trouble upon arrival. At 2 megabits/sec, it takes 4.5 hours to use 827 up 2**32 octets of sequence space. Since the maximum segment 828 lifetime in the net is not likely to exceed a few tens of seconds, 829 this is deemed ample protection for foreseeable nets, even if data 830 rates escalate to 10's of megabits/sec. At 100 megabits/sec, the 831 cycle time is 5.4 minutes which may be a little short, but still 832 within reason. 834 The basic duplicate detection and sequencing algorithm in TCP can be 835 defeated, however, if a source TCP does not have any memory of the 836 sequence numbers it last used on a given connection. For example, if 837 the TCP were to start all connections with sequence number 0, then 838 upon crashing and restarting, a TCP might re-form an earlier 839 connection (possibly after half-open connection resolution) and emit 840 packets with sequence numbers identical to or overlapping with 841 packets still in the network which were emitted on an earlier 842 incarnation of the same connection. In the absence of knowledge 843 about the sequence numbers used on a particular connection, the TCP 844 specification recommends that the source delay for MSL seconds before 845 emitting segments on the connection, to allow time for segments from 846 the earlier connection incarnation to drain from the system.
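The sequence-space cycle times quoted earlier in this section (4.5 hours at 2 megabits/sec, 5.4 minutes at 100 megabits/sec) can be checked with a short calculation. The sketch below assumes that one megabit is 10**6 bits and that every transmitted bit consumes sequence space (headers are ignored); the function name `wrap_seconds` is hypothetical. Under those assumptions the exact arithmetic gives slightly larger values than the rounded figures in the text, but of the same order.

```python
SEQUENCE_SPACE_OCTETS = 2**32  # one full cycle of the 32-bit sequence space


def wrap_seconds(bits_per_second: float) -> float:
    """Seconds needed to consume 2**32 octets of sequence space at a given rate."""
    return SEQUENCE_SPACE_OCTETS * 8 / bits_per_second


for label, rate in (("2 megabits/sec", 2e6), ("100 megabits/sec", 100e6)):
    seconds = wrap_seconds(rate)
    print(f"{label}: {seconds / 3600:.2f} hours ({seconds / 60:.1f} minutes)")
```

Comparing the result with the MSL (2 minutes in this specification) shows how much margin remains before a wandering duplicate could collide with reused sequence numbers.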
848 Even hosts which can remember the time of day and use it to select 849 initial sequence number values are not immune from this problem 850 (i.e., even if time of day is used to select an initial sequence 851 number for each new connection incarnation). 853 Suppose, for example, that a connection is opened starting with 854 sequence number S. Suppose that this connection is not used much and 855 that eventually the initial sequence number function (ISN(t)) takes 856 on a value equal to the sequence number, say S1, of the last segment 857 sent by this TCP on a particular connection. Now suppose, at this 858 instant, the host crashes, recovers, and establishes a new 859 incarnation of the connection. The initial sequence number chosen is 860 S1 = ISN(t) -- last used sequence number on old incarnation of 861 connection! If the recovery occurs quickly enough, any old 862 duplicates in the net bearing sequence numbers in the neighborhood of 863 S1 may arrive and be treated as new packets by the receiver of the 864 new incarnation of the connection. 866 The problem is that the recovering host may not know for how long it 867 crashed nor does it know whether there are still old duplicates in 868 the system from earlier connection incarnations. 870 One way to deal with this problem is to deliberately delay emitting 871 segments for one MSL after recovery from a crash; this is the "quiet 872 time" specification. Hosts which prefer to avoid waiting and are willing 873 to risk possible confusion of old and new packets at a given 874 destination may choose not to wait for the "quiet time". 875 Implementors may provide TCP users with the ability to select on a 876 connection by connection basis whether to wait after a crash, or may 877 informally implement the "quiet time" for all connections. 878 Obviously, even where a user selects to "wait," this is not necessary 879 after the host has been "up" for at least MSL seconds.
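The clock-driven ISN function and the quiet time rule can be sketched together. This is an illustrative sketch only: it assumes the 4 microsecond tick and the 2 minute MSL stated in this specification, and the names `isn` and `quiet_time_remaining` and their parameters are hypothetical.

```python
MSL_SECONDS = 120            # MSL is taken to be 2 minutes in this specification
ISN_TICK_MICROSECONDS = 4    # low order bit of the ISN clock advances every ~4 us


def isn(clock_microseconds: int) -> int:
    """ISN(t): a 32-bit counter driven by a (possibly fictitious) clock."""
    return (clock_microseconds // ISN_TICK_MICROSECONDS) % 2**32


def quiet_time_remaining(seconds_since_restart: float,
                         remembers_sequence_numbers: bool) -> float:
    """Seconds a restarted TCP should still keep quiet before emitting segments.

    No wait is needed if memory of the sequence numbers in use was retained,
    or once the host has been up for at least MSL seconds.
    """
    if remembers_sequence_numbers:
        return 0.0
    return max(0.0, MSL_SECONDS - seconds_since_restart)
```

A host that restarted 30 seconds ago without any memory of its sequence numbers would, under this sketch, wait a further 90 seconds before sending.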
881 To summarize: every segment emitted occupies one or more sequence 882 numbers in the sequence space; the numbers occupied by a segment are 883 "busy" or "in use" until MSL seconds have passed; upon crashing, a 884 block of space-time is occupied by the octets and SYN or FIN flags of 885 the last emitted segment; if a new connection is started too soon and 886 uses any of the sequence numbers in the space-time footprint of the 887 last segment of the previous connection incarnation, there is a 888 potential sequence number overlap area which could cause confusion at 889 the receiver. 891 3.4. Establishing a connection 893 The "three-way handshake" is the procedure used to establish a 894 connection. This procedure normally is initiated by one TCP and 895 responded to by another TCP. The procedure also works if two TCPs 896 simultaneously initiate the procedure. When a simultaneous attempt 897 occurs, each TCP receives a "SYN" segment which carries no 898 acknowledgment after it has sent a "SYN". Of course, the arrival of 899 an old duplicate "SYN" segment can potentially make it appear, to the 900 recipient, that a simultaneous connection initiation is in progress. 901 Proper use of "reset" segments can disambiguate these cases. 903 Several examples of connection initiation follow. Although these 904 examples do not show connection synchronization using data-carrying 905 segments, this is perfectly legitimate, so long as the receiving TCP 906 doesn't deliver the data to the user until it is clear the data is 907 valid (i.e., the data must be buffered at the receiver until the 908 connection reaches the ESTABLISHED state). The three-way handshake 909 reduces the possibility of false connections. It is the 910 implementation of a trade-off between memory and messages to provide 911 information for this checking. 913 The simplest three-way handshake is shown in Figure 5 below. The 914 figures should be interpreted in the following way.
Each line is 915 numbered for reference purposes. Right arrows (-->) indicate 916 departure of a TCP segment from TCP A to TCP B, or arrival of a 917 segment at B from A. Left arrows (<--), indicate the reverse. 918 Ellipsis (...) indicates a segment which is still in the network 919 (delayed). An "XXX" indicates a segment which is lost or rejected. 920 Comments appear in parentheses. TCP states represent the state AFTER 921 the departure or arrival of the segment (whose contents are shown in 922 the center of each line). Segment contents are shown in abbreviated 923 form, with sequence number, control flags, and ACK field. Other 924 fields such as window, addresses, lengths, and text have been left 925 out in the interest of clarity. 927 TCP A TCP B 929 1. CLOSED LISTEN 931 2. SYN-SENT --> --> SYN-RECEIVED 933 3. ESTABLISHED <-- <-- SYN-RECEIVED 935 4. ESTABLISHED --> --> ESTABLISHED 937 5. ESTABLISHED --> --> ESTABLISHED 939 Basic 3-Way Handshake for Connection Synchronization 941 Figure 5 943 In line 2 of Figure 5, TCP A begins by sending a SYN segment 944 indicating that it will use sequence numbers starting with sequence 945 number 100. In line 3, TCP B sends a SYN and acknowledges the SYN it 946 received from TCP A. Note that the acknowledgment field indicates 947 TCP B is now expecting to hear sequence 101, acknowledging the SYN 948 which occupied sequence 100. 950 At line 4, TCP A responds with an empty segment containing an ACK for 951 TCP B's SYN; and in line 5, TCP A sends some data. Note that the 952 sequence number of the segment in line 5 is the same as in line 4 953 because the ACK does not occupy sequence number space (if it did, we 954 would wind up ACKing ACK's!). 956 Simultaneous initiation is only slightly more complex, as is shown in 957 Figure 6. Each TCP cycles from CLOSED to SYN-SENT to SYN-RECEIVED to 958 ESTABLISHED. 960 TCP A TCP B 962 1. CLOSED CLOSED 964 2. SYN-SENT --> ... 966 3. SYN-RECEIVED <-- <-- SYN-SENT 968 4. ... 
--> SYN-RECEIVED 970 5. SYN-RECEIVED --> ... 972 6. ESTABLISHED <-- <-- SYN-RECEIVED 974 7. ... --> ESTABLISHED 976 Simultaneous Connection Synchronization 978 Figure 6 980 The principal reason for the three-way handshake is to prevent old 981 duplicate connection initiations from causing confusion. To deal 982 with this, a special control message, reset, has been devised. If 983 the receiving TCP is in a non-synchronized state (i.e., SYN-SENT, 984 SYN-RECEIVED), it returns to LISTEN on receiving an acceptable reset. 985 If the TCP is in one of the synchronized states (ESTABLISHED, FIN- 986 WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT), it 987 aborts the connection and informs its user. We discuss this latter 988 case under "half-open" connections below. 990 TCP A TCP B 992 1. CLOSED LISTEN 994 2. SYN-SENT --> ... 996 3. (duplicate) ... --> SYN-RECEIVED 998 4. SYN-SENT <-- <-- SYN-RECEIVED 1000 5. SYN-SENT --> --> LISTEN 1002 6. ... --> SYN-RECEIVED 1004 7. SYN-SENT <-- <-- SYN-RECEIVED 1006 8. ESTABLISHED --> --> ESTABLISHED 1008 Recovery from Old Duplicate SYN 1010 Figure 7 1012 As a simple example of recovery from old duplicates, consider 1013 Figure 7. At line 3, an old duplicate SYN arrives at TCP B. TCP B 1014 cannot tell that this is an old duplicate, so it responds normally 1015 (line 4). TCP A detects that the ACK field is incorrect and returns 1016 a RST (reset) with its SEQ field selected to make the segment 1017 believable. TCP B, on receiving the RST, returns to the LISTEN 1018 state. When the original SYN (pun intended) finally arrives at line 1019 6, the synchronization proceeds normally. If the SYN at line 6 had 1020 arrived before the RST, a more complex exchange might have occurred 1021 with RST's sent in both directions.
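The behavior of TCP A in Figures 5 through 7 can be sketched as a single function that handles a segment arriving in the SYN-SENT state. This is a simplified sketch under stated assumptions, not the full event processing: the `Segment` class and `syn_sent_receive` function are hypothetical, only the SYN, ACK and RST controls are modeled, and the only acceptable acknowledgment is taken to be ISS+1 (the SYN is the sole thing sent so far). The reset takes its sequence number from the ACK field of the offending segment, which is what makes it believable to the other side.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MOD = 2**32


@dataclass
class Segment:
    seq: int
    ack: Optional[int] = None  # None means the ACK control bit is off
    syn: bool = False
    rst: bool = False


def syn_sent_receive(iss: int, seg: Segment) -> Tuple[str, Optional[Segment]]:
    """Return (new state, reply) for a segment arriving in SYN-SENT."""
    expected_ack = (iss + 1) % MOD  # our SYN occupied sequence number ISS
    if seg.ack is not None and seg.ack != expected_ack:
        # The segment acknowledges something never sent (e.g. it answers an
        # old duplicate SYN): reply with a reset whose SEQ is the ACK field.
        return "SYN-SENT", Segment(seq=seg.ack, rst=True)
    if seg.syn and seg.ack is not None:
        # SYN,ACK for our SYN: acknowledge the peer's SYN and synchronize.
        return "ESTABLISHED", Segment(seq=expected_ack, ack=(seg.seq + 1) % MOD)
    if seg.syn:
        # SYN without ACK: simultaneous initiation, move to SYN-RECEIVED.
        return "SYN-RECEIVED", Segment(seq=iss, ack=(seg.seq + 1) % MOD, syn=True)
    return "SYN-SENT", None
```

With TCP A using sequence number 100 and, purely for illustration, TCP B using 300, a SYN,ACK carrying ACK 101 moves A to ESTABLISHED and is answered with sequence 101 and ACK 301, whereas a SYN,ACK carrying any other ACK value draws a reset and leaves A in SYN-SENT.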
1023 Half-Open Connections and Other Anomalies 1025 An established connection is said to be "half-open" if one of the 1026 TCPs has closed or aborted the connection at its end without the 1027 knowledge of the other, or if the two ends of the connection have 1028 become desynchronized owing to a crash that resulted in loss of 1029 memory. Such connections will automatically become reset if an 1030 attempt is made to send data in either direction. However, half-open 1031 connections are expected to be unusual, and the recovery procedure is 1032 mildly involved. 1034 If at site A the connection no longer exists, then an attempt by the 1035 user at site B to send any data on it will result in the site B TCP 1036 receiving a reset control message. Such a message indicates to the 1037 site B TCP that something is wrong, and it is expected to abort the 1038 connection. 1040 Assume that two user processes A and B are communicating with one 1041 another when a crash occurs causing loss of memory to A's TCP. 1042 Depending on the operating system supporting A's TCP, it is likely 1043 that some error recovery mechanism exists. When the TCP is up again, 1044 A is likely to start again from the beginning or from a recovery 1045 point. As a result, A will probably try to OPEN the connection again 1046 or try to SEND on the connection it believes open. In the latter 1047 case, it receives the error message "connection not open" from the 1048 local (A's) TCP. In an attempt to establish the connection, A's TCP 1049 will send a segment containing SYN. This scenario leads to the 1050 example shown in Figure 8. After TCP A crashes, the user attempts to 1051 re-open the connection. TCP B, in the meantime, thinks the 1052 connection is open. 1054 TCP A TCP B 1056 1. (CRASH) (send 300,receive 100) 1058 2. CLOSED ESTABLISHED 1060 3. SYN-SENT --> --> (??) 1062 4. (!!) <-- <-- ESTABLISHED 1064 5. SYN-SENT --> --> (Abort!!) 1066 6. SYN-SENT CLOSED 1068 7. 
SYN-SENT --> --> 1070 Half-Open Connection Discovery 1072 Figure 8 1074 When the SYN arrives at line 3, TCP B, being in a synchronized state, 1075 and the incoming segment outside the window, responds with an 1076 acknowledgment indicating what sequence it next expects to hear (ACK 1077 100). TCP A sees that this segment does not acknowledge anything it 1078 sent and, being unsynchronized, sends a reset (RST) because it has 1079 detected a half-open connection. TCP B aborts at line 5. TCP A will 1080 continue to try to establish the connection; the problem is now 1081 reduced to the basic 3-way handshake of Figure 5. 1083 An interesting alternative case occurs when TCP A crashes and TCP B 1084 tries to send data on what it thinks is a synchronized connection. 1086 This is illustrated in Figure 9. In this case, the data arriving at 1087 TCP A from TCP B (line 2) is unacceptable because no such connection 1088 exists, so TCP A sends a RST. The RST is acceptable so TCP B 1089 processes it and aborts the connection. 1091 TCP A TCP B 1093 1. (CRASH) (send 300,receive 100) 1095 2. (??) <-- <-- ESTABLISHED 1097 3. --> --> (ABORT!!) 1099 Active Side Causes Half-Open Connection Discovery 1101 Figure 9 1103 In Figure 10, we find the two TCPs A and B with passive connections 1104 waiting for SYN. An old duplicate arriving at TCP B (line 2) stirs B 1105 into action. A SYN-ACK is returned (line 3) and causes TCP A to 1106 generate a RST (the ACK in line 3 is not acceptable). TCP B accepts 1107 the reset and returns to its passive LISTEN state. 1109 TCP A TCP B 1111 1. LISTEN LISTEN 1113 2. ... --> SYN-RECEIVED 1115 3. (??) <-- <-- SYN-RECEIVED 1117 4. --> --> (return to LISTEN!) 1119 5. LISTEN LISTEN 1121 Old Duplicate SYN Initiates a Reset on two Passive Sockets 1123 Figure 10 1125 A variety of other cases are possible, all of which are accounted for 1126 by the following rules for RST generation and processing. 
1128 Reset Generation 1129 As a general rule, reset (RST) must be sent whenever a segment 1130 arrives which apparently is not intended for the current connection. 1131 A reset must not be sent if it is not clear that this is the case. 1133 There are three groups of states: 1135 1. If the connection does not exist (CLOSED) then a reset is sent 1136 in response to any incoming segment except another reset. In 1137 particular, SYNs addressed to a non-existent connection are 1138 rejected by this means. 1140 If the incoming segment has the ACK bit set, the reset takes its 1141 sequence number from the ACK field of the segment, otherwise the 1142 reset has sequence number zero and the ACK field is set to the sum 1143 of the sequence number and segment length of the incoming segment. 1144 The connection remains in the CLOSED state. 1146 2. If the connection is in any non-synchronized state (LISTEN, 1147 SYN-SENT, SYN-RECEIVED), and the incoming segment acknowledges 1148 something not yet sent (the segment carries an unacceptable ACK), 1149 or if an incoming segment has a security level or compartment 1150 which does not exactly match the level and compartment requested 1151 for the connection, a reset is sent. 1153 If our SYN has not been acknowledged and the precedence level of 1154 the incoming segment is higher than the precedence level requested 1155 then either raise the local precedence level (if allowed by the 1156 user and the system) or send a reset; or if the precedence level 1157 of the incoming segment is lower than the precedence level 1158 requested then continue as if the precedence matched exactly (if 1159 the remote TCP cannot raise the precedence level to match ours 1160 this will be detected in the next segment it sends, and the 1161 connection will be terminated then). 
If our SYN has been 1162 acknowledged (perhaps in this incoming segment) the precedence 1163 level of the incoming segment must match the local precedence 1164 level exactly; if it does not, a reset must be sent. 1166 If the incoming segment has an ACK field, the reset takes its 1167 sequence number from the ACK field of the segment, otherwise the 1168 reset has sequence number zero and the ACK field is set to the sum 1169 of the sequence number and segment length of the incoming segment. 1170 The connection remains in the same state. 1172 3. If the connection is in a synchronized state (ESTABLISHED, 1173 FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT), 1174 any unacceptable segment (out of window sequence number or 1175 unacceptable acknowledgment number) must elicit only an empty 1176 acknowledgment segment containing the current send-sequence number 1177 and an acknowledgment indicating the next sequence number expected 1178 to be received, and the connection remains in the same state. 1180 If an incoming segment has a security level, or compartment, or 1181 precedence which does not exactly match the level, and 1182 compartment, and precedence requested for the connection, a reset 1183 is sent and the connection goes to the CLOSED state. The reset 1184 takes its sequence number from the ACK field of the incoming 1185 segment. 1187 Reset Processing 1189 In all states except SYN-SENT, all reset (RST) segments are validated 1190 by checking their SEQ-fields. A reset is valid if its sequence 1191 number is in the window. In the SYN-SENT state (a RST received in 1192 response to an initial SYN), the RST is acceptable if the ACK field 1193 acknowledges the SYN. 1195 The receiver of a RST first validates it, then changes state. If the 1196 receiver was in the LISTEN state, it ignores it.
If the receiver was 1197 in SYN-RECEIVED state and had previously been in the LISTEN state, 1198 then the receiver returns to the LISTEN state, otherwise the receiver 1199 aborts the connection and goes to the CLOSED state. If the receiver 1200 was in any other state, it aborts the connection and advises the user 1201 and goes to the CLOSED state. 1203 3.5. Closing a Connection 1205 CLOSE is an operation meaning "I have no more data to send." The 1206 notion of closing a full-duplex connection is subject to ambiguous 1207 interpretation, of course, since it may not be obvious how to treat 1208 the receiving side of the connection. We have chosen to treat CLOSE 1209 in a simplex fashion. The user who CLOSEs may continue to RECEIVE 1210 until he is told that the other side has CLOSED also. Thus, a 1211 program could initiate several SENDs followed by a CLOSE, and then 1212 continue to RECEIVE until signaled that a RECEIVE failed because the 1213 other side has CLOSED. We assume that the TCP will signal a user, 1214 even if no RECEIVEs are outstanding, that the other side has closed, 1215 so the user can terminate his side gracefully. A TCP will reliably 1216 deliver all buffers SENT before the connection was CLOSED so a user 1217 who expects no data in return need only wait to hear the connection 1218 was CLOSED successfully to know that all his data was received at the 1219 destination TCP. Users must keep reading connections they close for 1220 sending until the TCP says no more data. 1222 There are essentially three cases: 1224 1) The user initiates by telling the TCP to CLOSE the connection 1225 2) The remote TCP initiates by sending a FIN control signal 1227 3) Both users CLOSE simultaneously 1229 Case 1: Local user initiates the close 1231 In this case, a FIN segment can be constructed and placed on the 1232 outgoing segment queue. No further SENDs from the user will be 1233 accepted by the TCP, and it enters the FIN-WAIT-1 state. 
RECEIVEs 1234 are allowed in this state. All segments preceding and including 1235 FIN will be retransmitted until acknowledged. When the other TCP 1236 has both acknowledged the FIN and sent a FIN of its own, the first 1237 TCP can ACK this FIN. Note that a TCP receiving a FIN will ACK 1238 but not send its own FIN until its user has CLOSED the connection 1239 also. 1241 Case 2: TCP receives a FIN from the network 1243 If an unsolicited FIN arrives from the network, the receiving TCP 1244 can ACK it and tell the user that the connection is closing. The 1245 user will respond with a CLOSE, upon which the TCP can send a FIN 1246 to the other TCP after sending any remaining data. The TCP then 1247 waits until its own FIN is acknowledged whereupon it deletes the 1248 connection. If an ACK is not forthcoming, after the user timeout 1249 the connection is aborted and the user is told. 1251 Case 3: both users close simultaneously 1253 A simultaneous CLOSE by users at both ends of a connection causes 1254 FIN segments to be exchanged. When all segments preceding the 1255 FINs have been processed and acknowledged, each TCP can ACK the 1256 FIN it has received. Both will, upon receiving these ACKs, delete 1257 the connection. 1259 TCP A TCP B 1261 1. ESTABLISHED ESTABLISHED 1263 2. (Close) 1264 FIN-WAIT-1 --> --> CLOSE-WAIT 1266 3. FIN-WAIT-2 <-- <-- CLOSE-WAIT 1268 4. (Close) 1269 TIME-WAIT <-- <-- LAST-ACK 1271 5. TIME-WAIT --> --> CLOSED 1273 6. (2 MSL) 1274 CLOSED 1276 Normal Close Sequence 1278 Figure 11 1280 TCP A TCP B 1282 1. ESTABLISHED ESTABLISHED 1284 2. (Close) (Close) 1285 FIN-WAIT-1 --> ... FIN-WAIT-1 1286 <-- <-- 1287 ... --> 1289 3. CLOSING --> ... CLOSING 1290 <-- <-- 1291 ... --> 1293 4. TIME-WAIT TIME-WAIT 1294 (2 MSL) (2 MSL) 1295 CLOSED CLOSED 1297 Simultaneous Close Sequence 1299 Figure 12 1301 3.6. 
Precedence and Security 1303 The intent is that connection be allowed only between ports operating 1304 with exactly the same security and compartment values and at the 1305 higher of the precedence level requested by the two ports. 1307 The precedence and security parameters used in TCP are exactly those 1308 defined in the Internet Protocol (IP) [2]. Throughout this TCP 1309 specification the term "security/compartment" is intended to indicate 1310 the security parameters used in IP including security, compartment, 1311 user group, and handling restriction. 1313 A connection attempt with mismatched security/compartment values or a 1314 lower precedence value must be rejected by sending a reset. 1315 Rejecting a connection due to too low a precedence only occurs after 1316 an acknowledgment of the SYN has been received. 1318 Note that TCP modules which operate only at the default value of 1319 precedence will still have to check the precedence of incoming 1320 segments and possibly raise the precedence level they use on the 1321 connection. 1323 The security parameters may be used even in a non-secure environment 1324 (the values would indicate unclassified data), thus hosts in non- 1325 secure environments must be prepared to receive the security 1326 parameters, though they need not send them. 1328 3.7. Data Communication 1330 Once the connection is established data is communicated by the 1331 exchange of segments. Because segments may be lost due to errors 1332 (checksum test failure), or network congestion, TCP uses 1333 retransmission (after a timeout) to ensure delivery of every segment. 1334 Duplicate segments may arrive due to network or TCP retransmission. 1335 As discussed in the section on sequence numbers the TCP performs 1336 certain tests on the sequence and acknowledgment numbers in the 1337 segments to verify their acceptability. 1339 The sender of data keeps track of the next sequence number to use in 1340 the variable SND.NXT. 
The receiver of data keeps track of the next 1341 sequence number to expect in the variable RCV.NXT. The sender of 1342 data keeps track of the oldest unacknowledged sequence number in the 1343 variable SND.UNA. If the data flow is momentarily idle and all data 1344 sent has been acknowledged then the three variables will be equal. 1346 When the sender creates a segment and transmits it the sender 1347 advances SND.NXT. When the receiver accepts a segment it advances 1348 RCV.NXT and sends an acknowledgment. When the data sender receives 1349 an acknowledgment it advances SND.UNA. The extent to which the 1350 values of these variables differ is a measure of the delay in the 1351 communication. The amount by which the variables are advanced is the 1352 length of the data and SYN or FIN flags in the segment. Note that 1353 once in the ESTABLISHED state all segments must carry current 1354 acknowledgment information. 1356 The CLOSE user call implies a push function, as does the FIN control 1357 flag in an incoming segment. 1359 Retransmission Timeout 1361 NOTE: TODO this needs to be updated in light of 1122 4.2.2.15 and 1362 errata 573; this will be done as part of RFC 1122 incorporation into 1363 this document. 1364 Because of the variability of the networks that compose an 1365 internetwork system and the wide range of uses of TCP connections the 1366 retransmission timeout must be dynamically determined. One procedure 1367 for determining a retransmission timeout is given here as an 1368 illustration. 1370 An Example Retransmission Timeout Procedure 1372 Measure the elapsed time between sending a data octet with a 1373 particular sequence number and receiving an acknowledgment that 1374 covers that sequence number (segments sent do not have to match 1375 segments received). This measured elapsed time is the Round Trip 1376 Time (RTT). 
Next compute a Smoothed Round Trip Time (SRTT) as: 1378 SRTT = ( ALPHA * SRTT ) + ((1-ALPHA) * RTT) 1380 and based on this, compute the retransmission timeout (RTO) as: 1382 RTO = min[UBOUND,max[LBOUND,(BETA*SRTT)]] 1384 where UBOUND is an upper bound on the timeout (e.g., 1 minute), 1385 LBOUND is a lower bound on the timeout (e.g., 1 second), ALPHA is 1386 a smoothing factor (e.g., .8 to .9), and BETA is a delay variance 1387 factor (e.g., 1.3 to 2.0). 1389 The Communication of Urgent Information 1391 The objective of the TCP urgent mechanism is to allow the sending 1392 user to stimulate the receiving user to accept some urgent data and 1393 to permit the receiving TCP to indicate to the receiving user when 1394 all the currently known urgent data has been received by the user. 1396 This mechanism permits a point in the data stream to be designated as 1397 the end of urgent information. Whenever this point is in advance of 1398 the receive sequence number (RCV.NXT) at the receiving TCP, that TCP 1399 must tell the user to go into "urgent mode"; when the receive 1400 sequence number catches up to the urgent pointer, the TCP must tell 1401 the user to go into "normal mode". If the urgent pointer is updated 1402 while the user is in "urgent mode", the update will be invisible to 1403 the user. 1405 The method employs an urgent field which is carried in all segments 1406 transmitted. The URG control flag indicates that the urgent field is 1407 meaningful and must be added to the segment sequence number to yield 1408 the urgent pointer. The absence of this flag indicates that there is 1409 no urgent data outstanding. 1411 To send an urgent indication the user must also send at least one 1412 data octet. If the sending user also indicates a push, timely 1413 delivery of the urgent information to the destination process is 1414 enhanced.
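The example retransmission timeout procedure given earlier in this section translates directly into code. The sketch below is illustrative only: the function names `update_srtt` and `compute_rto` are hypothetical, times are in seconds, and the default values are picked from within the example ranges in the text (ALPHA .8 to .9, BETA 1.3 to 2.0, LBOUND 1 second, UBOUND 1 minute) rather than being recommended settings.

```python
def update_srtt(srtt: float, rtt: float, alpha: float = 0.85) -> float:
    """SRTT = (ALPHA * SRTT) + ((1 - ALPHA) * RTT)."""
    return alpha * srtt + (1.0 - alpha) * rtt


def compute_rto(srtt: float, beta: float = 1.5,
                lbound: float = 1.0, ubound: float = 60.0) -> float:
    """RTO = min[UBOUND, max[LBOUND, (BETA * SRTT)]]."""
    return min(ubound, max(lbound, beta * srtt))


# Feed a few measured round trip times (seconds) through the smoother.
srtt = 1.0
for measured_rtt in (1.2, 0.9, 3.0):
    srtt = update_srtt(srtt, measured_rtt)
    print(f"RTT={measured_rtt:.1f}s SRTT={srtt:.3f}s RTO={compute_rto(srtt):.3f}s")
```

The bounds matter at the extremes: a very small SRTT still yields an RTO of LBOUND, and a very large one is capped at UBOUND.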
1416 Managing the Window 1418 The window sent in each segment indicates the range of sequence 1419 numbers the sender of the window (the data receiver) is currently 1420 prepared to accept. There is an assumption that this is related to 1421 the currently available data buffer space available for this 1422 connection. 1424 Indicating a large window encourages transmissions. If more data 1425 arrives than can be accepted, it will be discarded. This will result 1426 in excessive retransmissions, adding unnecessarily to the load on the 1427 network and the TCPs. Indicating a small window may restrict the 1428 transmission of data to the point of introducing a round trip delay 1429 between each new segment transmitted. 1431 The mechanisms provided allow a TCP to advertise a large window and 1432 to subsequently advertise a much smaller window without having 1433 accepted that much data. This, so called "shrinking the window," is 1434 strongly discouraged. The robustness principle dictates that TCPs 1435 will not shrink the window themselves, but will be prepared for such 1436 behavior on the part of other TCPs. 1438 The sending TCP must be prepared to accept from the user and send at 1439 least one octet of new data even if the send window is zero. The 1440 sending TCP must regularly retransmit to the receiving TCP even when 1441 the window is zero. Two minutes is recommended for the 1442 retransmission interval when the window is zero. This retransmission 1443 is essential to guarantee that when either TCP has a zero window the 1444 re-opening of the window will be reliably reported to the other. 1446 When the receiving TCP has a zero window and a segment arrives it 1447 must still send an acknowledgment showing its next expected sequence 1448 number and current window (zero). 1450 The sending TCP packages the data to be transmitted into segments 1451 which fit the current window, and may repackage segments on the 1452 retransmission queue. 
   Such repackaging is not required, but may be helpful.

   In a connection with a one-way data flow, the window information will
   be carried in acknowledgment segments that all have the same sequence
   number so there will be no way to reorder them if they arrive out of
   order. This is not a serious problem, but it will allow the window
   information to be on occasion temporarily based on old reports from
   the data receiver. A refinement to avoid this problem is to act on
   the window information from segments that carry the highest
   acknowledgment number (that is, segments with an acknowledgment
   number equal to or greater than the highest previously received).

   The window management procedure has significant influence on the
   communication performance. The following comments are suggestions to
   implementers.

   Window Management Suggestions

      Allocating a very small window causes data to be transmitted in
      many small segments when better performance is achieved using
      fewer large segments.

      One suggestion for avoiding small windows is for the receiver to
      defer updating a window until the additional allocation is at
      least X percent of the maximum allocation possible for the
      connection (where X might be 20 to 40).

      Another suggestion is for the sender to avoid sending small
      segments by waiting until the window is large enough before
      sending data. If the user signals a push function then the data
      must be sent even if it is a small segment.

      Note that the acknowledgments should not be delayed or unnecessary
      retransmissions will result. One strategy would be to send an
      acknowledgment when a small segment arrives (without updating the
      window information), and then to send another acknowledgment with
      new window information when the window is larger.
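      The receiver-side suggestion above (defer the window update until
      the additional allocation reaches X percent of the maximum) might
      be sketched as follows. This is a minimal sketch: the function
      name, its parameters, and the default of 25 percent are
      assumptions chosen for illustration, with 25 simply falling
      inside the suggested range of 20 to 40.

```python
def should_advertise(advertised, available, max_window, x_percent=25):
    """Decide whether a larger window is worth announcing yet.

    advertised -- window size most recently sent to the peer (octets)
    available  -- buffer space that could be offered now (octets)
    max_window -- maximum allocation possible for the connection
    x_percent  -- threshold X; hypothetical default within 20..40
    """
    additional = available - advertised
    # Integer arithmetic avoids rounding surprises at the threshold.
    return additional * 100 >= x_percent * max_window


# With an 8192-octet maximum and X = 25, the update is held back until
# at least 2048 additional octets can be offered.
print(should_advertise(0, 1000, 8192))  # False
print(should_advertise(0, 2048, 8192))  # True
```

      Acknowledgments are still sent promptly in this scheme; only the
      window field keeps its old value until the threshold is met.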

      The segment sent to probe a zero window may also begin a break up
      of transmitted data into smaller and smaller segments. If a
      segment containing a single data octet sent to probe a zero window
      is accepted, it consumes one octet of the window now available.
      If the sending TCP simply sends as much as it can whenever the
      window is nonzero, the transmitted data will be broken into
      alternating big and small segments. As time goes on, occasional
      pauses in the receiver making window allocation available will
      result in breaking the big segments into a small and not quite so
      big pair. And after a while the data transmission will be in
      mostly small segments.

      The suggestion here is that the TCP implementations need to
      actively attempt to combine small window allocations into larger
      windows, since the mechanisms for managing the window tend to lead
      to many small windows in the simplest-minded implementations.

3.8. Interfaces

   There are of course two interfaces of concern: the user/TCP interface
   and the TCP/lower-level interface. We have a fairly elaborate model
   of the user/TCP interface, but the interface to the lower level
   protocol module is left unspecified here, since it will be specified
   in detail by the specification of the lower level protocol. For the
   case that the lower level is IP we note some of the parameter values
   that TCPs might use.

3.8.1. User/TCP Interface

   The following functional description of user commands to the TCP is,
   at best, fictional, since every operating system will have different
   facilities. Consequently, we must warn readers that different TCP
   implementations may have different user interfaces. However, all
   TCPs must provide a certain minimum set of services to guarantee that
   all TCP implementations can support the same protocol hierarchy.
   This section specifies the functional interfaces required of all TCP
   implementations.

   TCP User Commands

      The following sections functionally characterize a USER/TCP
      interface. The notation used is similar to most procedure or
      function calls in high level languages, but this usage is not
      meant to rule out trap type service calls (e.g., SVCs, UUOs,
      EMTs).

      The user commands described below specify the basic functions the
      TCP must perform to support interprocess communication.
      Individual implementations must define their own exact format, and
      may provide combinations or subsets of the basic functions in
      single calls. In particular, some implementations may wish to
      automatically OPEN a connection on the first SEND or RECEIVE
      issued by the user for a given connection.

      In providing interprocess communication facilities, the TCP must
      not only accept commands, but must also return information to the
      processes it serves. The latter consists of:

         (a) general information about a connection (e.g., interrupts,
         remote close, binding of unspecified foreign socket).

         (b) replies to specific user commands indicating success or
         various types of failure.

      Open

         Format: OPEN (local port, foreign socket, active/passive [,
         timeout] [, precedence] [, security/compartment] [, options])
         -> local connection name

         We assume that the local TCP is aware of the identity of the
         processes it serves and will check the authority of the process
         to use the connection specified. Depending upon the
         implementation of the TCP, the local network and TCP
         identifiers for the source address will either be supplied by
         the TCP or the lower level protocol (e.g., IP). These
         considerations are the result of concern about security, to the
         extent that no TCP be able to masquerade as another one, and so
         on.
         Similarly, no process can masquerade as another without
         the collusion of the TCP.

         If the active/passive flag is set to passive, then this is a
         call to LISTEN for an incoming connection. A passive open may
         have either a fully specified foreign socket to wait for a
         particular connection or an unspecified foreign socket to wait
         for any call. A fully specified passive call can be made
         active by the subsequent execution of a SEND.

         A transmission control block (TCB) is created and partially
         filled in with data from the OPEN command parameters.

         On an active OPEN command, the TCP will begin the procedure to
         synchronize (i.e., establish) the connection at once.

         The timeout, if present, permits the caller to set up a timeout
         for all data submitted to TCP. If data is not successfully
         delivered to the destination within the timeout period, the TCP
         will abort the connection. The present global default is five
         minutes.

         The TCP or some component of the operating system will verify
         the user's authority to open a connection with the specified
         precedence or security/compartment. The absence of precedence
         or security/compartment specification in the OPEN call
         indicates the default values must be used.

         TCP will accept incoming requests as matching only if the
         security/compartment information is exactly the same and only
         if the precedence is equal to or higher than the precedence
         requested in the OPEN call.

         The precedence for the connection is the higher of the values
         requested in the OPEN call and received from the incoming
         request, and fixed at that value for the life of the
         connection. Implementers may want to give the user control of
         this precedence negotiation.
         For example, the user might be
         allowed to specify that the precedence must be exactly matched,
         or that any attempt to raise the precedence be confirmed by the
         user.

         A local connection name will be returned to the user by the
         TCP. The local connection name can then be used as a short
         hand term for the connection defined by the <local socket,
         foreign socket> pair.

      Send

         Format: SEND (local connection name, buffer address, byte
         count, PUSH flag, URGENT flag [,timeout])

         This call causes the data contained in the indicated user
         buffer to be sent on the indicated connection. If the
         connection has not been opened, the SEND is considered an
         error. Some implementations may allow users to SEND first; in
         which case, an automatic OPEN would be done. If the calling
         process is not authorized to use this connection, an error is
         returned.

         If the PUSH flag is set, the data must be transmitted promptly
         to the receiver, and the PUSH bit will be set in the last TCP
         segment created from the buffer. If the PUSH flag is not set,
         the data may be combined with data from subsequent SENDs for
         transmission efficiency.

         If the URGENT flag is set, segments sent to the destination TCP
         will have the urgent pointer set. The receiving TCP will
         signal the urgent condition to the receiving process if the
         urgent pointer indicates that data preceding the urgent pointer
         has not been consumed by the receiving process. The purpose of
         urgent is to stimulate the receiver to process the urgent data
         and to indicate to the receiver when all the currently known
         urgent data has been received. The number of times the sending
         user's TCP signals urgent will not necessarily be equal to the
         number of times the receiving user will be notified of the
         presence of urgent data.
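         The PUSH rule for SEND described above (the PUSH bit is set in
         the last segment created from the buffer) might be sketched as
         follows. This is only an illustration: the function name and
         the "max_seg" size limit are assumptions of the sketch, and it
         ignores the window, the urgent pointer, and the option of
         combining un-pushed data with later SENDs.

```python
def segmentize(data, max_seg, push):
    """Split one SEND buffer into (payload, psh_bit) pairs.

    data    -- bytes handed over by the SEND call
    max_seg -- hypothetical per-segment payload limit in octets
    push    -- value of the PUSH flag on the SEND call
    """
    chunks = [data[i:i + max_seg] for i in range(0, len(data), max_seg)]
    last = len(chunks) - 1
    # Only the final segment built from this buffer carries PUSH.
    return [(chunk, push and i == last) for i, chunk in enumerate(chunks)]


for payload, psh in segmentize(b"abcdefgh", 3, push=True):
    print(payload, "PSH" if psh else "")
```

         When push is false no segment carries the bit, which leaves the
         sending TCP free to hold the tail of the buffer and merge it
         with data from a subsequent SEND.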

         If no foreign socket was specified in the OPEN, but the
         connection is established (e.g., because a LISTENing connection
         has become specific due to a foreign segment arriving for the
         local socket), then the designated buffer is sent to the
         implied foreign socket. Users who make use of OPEN with an
         unspecified foreign socket can make use of SEND without ever
         explicitly knowing the foreign socket address.

         However, if a SEND is attempted before the foreign socket
         becomes specified, an error will be returned. Users can use
         the STATUS call to determine the status of the connection. In
         some implementations the TCP may notify the user when an
         unspecified socket is bound.

         If a timeout is specified, the current user timeout for this
         connection is changed to the new one.

         In the simplest implementation, SEND would not return control
         to the sending process until either the transmission was
         complete or the timeout had been exceeded. However, this
         simple method is both subject to deadlocks (for example, both
         sides of the connection might try to do SENDs before doing any
         RECEIVEs) and offers poor performance, so it is not
         recommended. A more sophisticated implementation would return
         immediately to allow the process to run concurrently with
         network I/O, and, furthermore, to allow multiple SENDs to be in
         progress. Multiple SENDs are served in first come, first
         served order, so the TCP will queue those it cannot service
         immediately.

         We have implicitly assumed an asynchronous user interface in
         which a SEND later elicits some kind of SIGNAL or pseudo-
         interrupt from the serving TCP. An alternative is to return a
         response immediately. For instance, SENDs might return
         immediate local acknowledgment, even if the segment sent had
         not been acknowledged by the distant TCP. We could
         optimistically assume eventual success.
         If we are wrong, the
         connection will close anyway due to the timeout. In
         implementations of this kind (synchronous), there will still be
         some asynchronous signals, but these will deal with the
         connection itself, and not with specific segments or buffers.

         In order for the process to distinguish among error or success
         indications for different SENDs, it might be appropriate for
         the buffer address to be returned along with the coded response
         to the SEND request. TCP-to-user signals are discussed below,
         indicating the information which should be returned to the
         calling process.

      Receive

         Format: RECEIVE (local connection name, buffer address, byte
         count) -> byte count, urgent flag, push flag

         This command allocates a receiving buffer associated with the
         specified connection. If no OPEN precedes this command or the
         calling process is not authorized to use this connection, an
         error is returned.

         In the simplest implementation, control would not return to the
         calling program until either the buffer was filled, or some
         error occurred, but this scheme is highly subject to deadlocks.
         A more sophisticated implementation would permit several
         RECEIVEs to be outstanding at once. These would be filled as
         segments arrive. This strategy permits increased throughput at
         the cost of a more elaborate scheme (possibly asynchronous) to
         notify the calling program that a PUSH has been seen or a
         buffer filled.

         If enough data arrive to fill the buffer before a PUSH is seen,
         the PUSH flag will not be set in the response to the RECEIVE.
         The buffer will be filled with as much data as it can hold. If
         a PUSH is seen before the buffer is filled the buffer will be
         returned partially filled and PUSH indicated.

         If there is urgent data the user will have been informed as
         soon as it arrived via a TCP-to-user signal.
         The receiving
         user should thus be in "urgent mode". If the URGENT flag is
         on, additional urgent data remains. If the URGENT flag is off,
         this call to RECEIVE has returned all the urgent data, and the
         user may now leave "urgent mode". Note that data following the
         urgent pointer (non-urgent data) cannot be delivered to the
         user in the same buffer with preceding urgent data unless the
         boundary is clearly marked for the user.

         To distinguish among several outstanding RECEIVEs and to take
         care of the case that a buffer is not completely filled, the
         return code is accompanied by both a buffer pointer and a byte
         count indicating the actual length of the data received.

         Alternative implementations of RECEIVE might have the TCP
         allocate buffer storage, or the TCP might share a ring buffer
         with the user.

      Close

         Format: CLOSE (local connection name)

         This command causes the connection specified to be closed. If
         the connection is not open or the calling process is not
         authorized to use this connection, an error is returned.
         Closing connections is intended to be a graceful operation in
         the sense that outstanding SENDs will be transmitted (and
         retransmitted), as flow control permits, until all have been
         serviced. Thus, it should be acceptable to make several SEND
         calls, followed by a CLOSE, and expect all the data to be sent
         to the destination. It should also be clear that users should
         continue to RECEIVE on CLOSING connections, since the other
         side may be trying to transmit the last of its data. Thus,
         CLOSE means "I have no more to send" but does not mean "I will
         not receive any more." It may happen (if the user level
         protocol is not well thought out) that the closing side is
         unable to get rid of all its data before timing out. In this
         event, CLOSE turns into ABORT, and the closing TCP gives up.
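         The "no more to send, but still receiving" meaning of CLOSE
         can be illustrated by analogy with the Berkeley sockets API,
         where shutting down only the write side plays a similar role.
         This is a loose sketch and not the abstract interface defined
         here: it uses Python's socket.socketpair(), which on many
         systems yields local stream sockets rather than a TCP
         connection, and the function name is hypothetical.

```python
import socket


def half_close_demo():
    """One side finishes sending, then keeps receiving the peer's data."""
    closer, peer = socket.socketpair()
    try:
        closer.sendall(b"final data")
        closer.shutdown(socket.SHUT_WR)  # "I have no more to send"

        # The peer reads until end-of-stream, then still replies.
        received = b""
        while chunk := peer.recv(1024):
            received += chunk
        peer.sendall(b"late reply")
        peer.shutdown(socket.SHUT_WR)

        # The closing side keeps receiving after its own close.
        reply = b""
        while chunk := closer.recv(1024):
            reply += chunk
        return received, reply
    finally:
        closer.close()
        peer.close()


print(half_close_demo())
```

         Everything written before the shutdown is still delivered,
         which mirrors the expectation that several SENDs followed by a
         CLOSE result in all of the data reaching the destination.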

         The user may CLOSE the connection at any time on his own
         initiative, or in response to various prompts from the TCP
         (e.g., remote close executed, transmission timeout exceeded,
         destination inaccessible).

         Because closing a connection requires communication with the
         foreign TCP, connections may remain in the closing state for a
         short time. Attempts to reopen the connection before the TCP
         replies to the CLOSE command will result in error responses.

         Close also implies push function.

      Status

         Format: STATUS (local connection name) -> status data

         This is an implementation dependent user command and could be
         excluded without adverse effect. Information returned would
         typically come from the TCB associated with the connection.

         This command returns a data block containing the following
         information:

            local socket,
            foreign socket,
            local connection name,
            receive window,
            send window,
            connection state,
            number of buffers awaiting acknowledgment,
            number of buffers pending receipt,
            urgent state,
            precedence,
            security/compartment,
            and transmission timeout.

         Depending on the state of the connection, or on the
         implementation itself, some of this information may not be
         available or meaningful. If the calling process is not
         authorized to use this connection, an error is returned. This
         prevents unauthorized processes from gaining information about
         a connection.

      Abort

         Format: ABORT (local connection name)

         This command causes all pending SENDs and RECEIVEs to be
         aborted, the TCB to be removed, and a special RESET message to
         be sent to the TCP on the other side of the connection.
         Depending on the implementation, users may receive abort
         indications for each outstanding SEND or RECEIVE, or may simply
         receive an ABORT-acknowledgment.

   TCP-to-User Messages

      It is assumed that the operating system environment provides a
      means for the TCP to asynchronously signal the user program.
      When the TCP does signal a user program, certain information is
      passed to the user. Often in the specification the information
      will be an error message. In other cases there will be
      information relating to the completion of processing a SEND or
      RECEIVE or other user call.

      The following information is provided:

         Local Connection Name                Always
         Response String                      Always
         Buffer Address                       Send & Receive
         Byte count (counts bytes received)   Receive
         Push flag                            Receive
         Urgent flag                          Receive

3.8.2. TCP/Lower-Level Interface

   The TCP calls on a lower level protocol module to actually send and
   receive information over a network. One case is that of the ARPA
   internetwork system where the lower level module is the Internet
   Protocol (IP) [2].

   If the lower level protocol is IP it provides arguments for a type of
   service and for a time to live. TCP uses the following settings for
   these parameters:

      Type of Service = Precedence: given by user, Delay: normal,
      Throughput: normal, Reliability: normal; or binary XXX00000, where
      XXX are the three bits determining precedence, e.g. 000 means
      routine precedence.

      Time to Live = one minute, or 00111100.

         Note that the assumed maximum segment lifetime is two minutes.
         Here we explicitly ask that a segment be destroyed if it cannot
         be delivered by the internet system within one minute.

   If the lower level is IP (or other protocol that provides this
   feature) and source routing is used, the interface must allow the
   route information to be communicated. This is especially important
   so that the source and destination addresses used in the TCP checksum
   be the originating source and ultimate destination.
   It is also
   important to preserve the return route to answer connection requests.

   Any lower level protocol will have to provide the source address,
   destination address, and protocol fields, and some way to determine
   the "TCP length", both to provide the functional equivalent service
   of IP and to be used in the TCP checksum.

3.9. Event Processing

   The processing depicted in this section is an example of one possible
   implementation. Other implementations may have slightly different
   processing sequences, but they should differ from those in this
   section only in detail, not in substance.

   The activity of the TCP can be characterized as responding to events.
   The events that occur can be cast into three categories: user calls,
   arriving segments, and timeouts. This section describes the
   processing the TCP does in response to each of the events. In many
   cases the processing required depends on the state of the connection.

   Events that occur:

      User Calls

         OPEN
         SEND
         RECEIVE
         CLOSE
         ABORT
         STATUS

      Arriving Segments

         SEGMENT ARRIVES

      Timeouts

         USER TIMEOUT
         RETRANSMISSION TIMEOUT
         TIME-WAIT TIMEOUT

   The model of the TCP/user interface is that user commands receive an
   immediate return and possibly a delayed response via an event or
   pseudo interrupt. In the following descriptions, the term "signal"
   means cause a delayed response.

   Error responses are given as character strings. For example, user
   commands referencing connections that do not exist receive "error:
   connection not open".

   Please note in the following that all arithmetic on sequence numbers,
   acknowledgment numbers, windows, et cetera, is modulo 2**32, the size
   of the sequence number space. Also note that "=<" means less than or
   equal to (modulo 2**32).
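   One common way to realize the modulo 2**32 arithmetic and the "=<"
   comparison is sketched below. The helper names are hypothetical, and
   the rule that two numbers are only comparable when they lie less than
   2**31 apart is an assumption of this sketch rather than a statement
   made in this section.

```python
MOD = 1 << 32  # size of the sequence number space


def seq_add(seq, n):
    """Advance a sequence number by n octets, wrapping at 2**32."""
    return (seq + n) % MOD


def seq_le(a, b):
    """a =< b (modulo 2**32); assumes a and b are under 2**31 apart."""
    return (b - a) % MOD < MOD // 2


def seq_between(low, x, high):
    """low =< x < high, measured forward around the circle from low."""
    return (x - low) % MOD < (high - low) % MOD


print(hex(seq_add(0xFFFFFFFF, 1)))           # wraps to 0x0
print(seq_le(0xFFFFFFFF, 0))                 # True: 0 follows 2**32 - 1
print(seq_between(0xFFFFFF00, 0x10, 0x100))  # True across the wrap
```

   A window test such as RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND can then be
   written as seq_between(rcv_nxt, seg_seq, seq_add(rcv_nxt, rcv_wnd)).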

   A natural way to think about processing incoming segments is to
   imagine that they are first tested for proper sequence number (i.e.,
   that their contents lie in the range of the expected "receive window"
   in the sequence number space) and then that they are generally queued
   and processed in sequence number order.

   When a segment overlaps other already received segments we
   reconstruct the segment to contain just the new data, and adjust the
   header fields to be consistent.

   Note that if no state change is mentioned the TCP stays in the same
   state.

   OPEN Call

      CLOSED STATE (i.e., TCB does not exist)

         Create a new transmission control block (TCB) to hold
         connection state information. Fill in local socket identifier,
         foreign socket, precedence, security/compartment, and user
         timeout information. Note that some parts of the foreign
         socket may be unspecified in a passive OPEN and are to be
         filled in by the parameters of the incoming SYN segment.
         Verify the security and precedence requested are allowed for
         this user; if not, return "error: precedence not allowed" or
         "error: security/compartment not allowed." If passive enter
         the LISTEN state and return. If active and the foreign socket
         is unspecified, return "error: foreign socket unspecified"; if
         active and the foreign socket is specified, issue a SYN
         segment. An initial send sequence number (ISS) is selected. A
         SYN segment of the form <SEQ=ISS><CTL=SYN> is sent. Set
         SND.UNA to ISS, SND.NXT to ISS+1, enter SYN-SENT state, and
         return.

         If the caller does not have access to the local socket
         specified, return "error: connection illegal for this process".
         If there is no room to create a new connection, return "error:
         insufficient resources".

      LISTEN STATE

         If active and the foreign socket is specified, then change the
         connection from passive to active, select an ISS.
         Send a SYN
         segment, set SND.UNA to ISS, SND.NXT to ISS+1. Enter SYN-SENT
         state. Data associated with SEND may be sent with SYN segment
         or queued for transmission after entering ESTABLISHED state.
         The urgent bit if requested in the command must be sent with
         the data segments sent as a result of this command. If there
         is no room to queue the request, respond with "error:
         insufficient resources". If Foreign socket was not specified,
         then return "error: foreign socket unspecified".

      SYN-SENT STATE
      SYN-RECEIVED STATE
      ESTABLISHED STATE
      FIN-WAIT-1 STATE
      FIN-WAIT-2 STATE
      CLOSE-WAIT STATE
      CLOSING STATE
      LAST-ACK STATE
      TIME-WAIT STATE

         Return "error: connection already exists".

   SEND Call

      CLOSED STATE (i.e., TCB does not exist)

         If the user does not have access to such a connection, then
         return "error: connection illegal for this process".

         Otherwise, return "error: connection does not exist".

      LISTEN STATE

         If the foreign socket is specified, then change the connection
         from passive to active, select an ISS. Send a SYN segment, set
         SND.UNA to ISS, SND.NXT to ISS+1. Enter SYN-SENT state. Data
         associated with SEND may be sent with SYN segment or queued for
         transmission after entering ESTABLISHED state. The urgent bit
         if requested in the command must be sent with the data segments
         sent as a result of this command. If there is no room to queue
         the request, respond with "error: insufficient resources". If
         Foreign socket was not specified, then return "error: foreign
         socket unspecified".

      SYN-SENT STATE
      SYN-RECEIVED STATE

         Queue the data for transmission after entering ESTABLISHED
         state. If no space to queue, respond with "error: insufficient
         resources".

      ESTABLISHED STATE
      CLOSE-WAIT STATE

         Segmentize the buffer and send it with a piggybacked
         acknowledgment (acknowledgment value = RCV.NXT). If there is
         insufficient space to remember this buffer, simply return
         "error: insufficient resources".

         If the urgent flag is set, then SND.UP <- SND.NXT-1 and set the
         urgent pointer in the outgoing segments.

      FIN-WAIT-1 STATE
      FIN-WAIT-2 STATE
      CLOSING STATE
      LAST-ACK STATE
      TIME-WAIT STATE

         Return "error: connection closing" and do not service request.

   RECEIVE Call

      CLOSED STATE (i.e., TCB does not exist)

         If the user does not have access to such a connection, return
         "error: connection illegal for this process".

         Otherwise return "error: connection does not exist".

      LISTEN STATE
      SYN-SENT STATE
      SYN-RECEIVED STATE

         Queue for processing after entering ESTABLISHED state. If
         there is no room to queue this request, respond with "error:
         insufficient resources".

      ESTABLISHED STATE
      FIN-WAIT-1 STATE
      FIN-WAIT-2 STATE

         If insufficient incoming segments are queued to satisfy the
         request, queue the request. If there is no queue space to
         remember the RECEIVE, respond with "error: insufficient
         resources".

         Reassemble queued incoming segments into receive buffer and
         return to user. Mark "push seen" (PUSH) if this is the case.

         If RCV.UP is in advance of the data currently being passed to
         the user notify the user of the presence of urgent data.

         When the TCP takes responsibility for delivering data to the
         user that fact must be communicated to the sender via an
         acknowledgment. The formation of such an acknowledgment is
         described below in the discussion of processing an incoming
         segment.

      CLOSE-WAIT STATE

         Since the remote side has already sent FIN, RECEIVEs must be
         satisfied by text already on hand, but not yet delivered to the
         user.
         If no text is awaiting delivery, the RECEIVE will get an
         "error: connection closing" response. Otherwise, any remaining
         text can be used to satisfy the RECEIVE.

      CLOSING STATE
      LAST-ACK STATE
      TIME-WAIT STATE

         Return "error: connection closing".

   CLOSE Call

      CLOSED STATE (i.e., TCB does not exist)

         If the user does not have access to such a connection, return
         "error: connection illegal for this process".

         Otherwise, return "error: connection does not exist".

      LISTEN STATE

         Any outstanding RECEIVEs are returned with "error: closing"
         responses. Delete TCB, enter CLOSED state, and return.

      SYN-SENT STATE

         Delete the TCB and return "error: closing" responses to any
         queued SENDs or RECEIVEs.

      SYN-RECEIVED STATE

         If no SENDs have been issued and there is no pending data to
         send, then form a FIN segment and send it, and enter FIN-WAIT-1
         state; otherwise queue for processing after entering
         ESTABLISHED state.

      ESTABLISHED STATE

         Queue this until all preceding SENDs have been segmentized,
         then form a FIN segment and send it. In any case, enter
         FIN-WAIT-1 state.

      FIN-WAIT-1 STATE
      FIN-WAIT-2 STATE

         Strictly speaking, this is an error and should receive an
         "error: connection closing" response. An "ok" response would
         be acceptable, too, as long as a second FIN is not emitted (the
         first FIN may be retransmitted though).

      CLOSE-WAIT STATE

         Queue this request until all preceding SENDs have been
         segmentized; then send a FIN segment, enter LAST-ACK state.

      CLOSING STATE
      LAST-ACK STATE
      TIME-WAIT STATE

         Respond with "error: connection closing".

   ABORT Call

      CLOSED STATE (i.e., TCB does not exist)

         If the user should not have access to such a connection, return
         "error: connection illegal for this process".

         Otherwise return "error: connection does not exist".

      LISTEN STATE

         Any outstanding RECEIVEs should be returned with "error:
         connection reset" responses. Delete TCB, enter CLOSED state,
         and return.

      SYN-SENT STATE

         All queued SENDs and RECEIVEs should be given "connection
         reset" notification, delete the TCB, enter CLOSED state, and
         return.

      SYN-RECEIVED STATE
      ESTABLISHED STATE
      FIN-WAIT-1 STATE
      FIN-WAIT-2 STATE
      CLOSE-WAIT STATE

         Send a reset segment:

            <SEQ=SND.NXT><CTL=RST>

         All queued SENDs and RECEIVEs should be given "connection
         reset" notification; all segments queued for transmission
         (except for the RST formed above) or retransmission should be
         flushed, delete the TCB, enter CLOSED state, and return.

      CLOSING STATE
      LAST-ACK STATE
      TIME-WAIT STATE

         Respond with "ok" and delete the TCB, enter CLOSED state, and
         return.

   STATUS Call

      CLOSED STATE (i.e., TCB does not exist)

         If the user should not have access to such a connection, return
         "error: connection illegal for this process".

         Otherwise return "error: connection does not exist".

      LISTEN STATE

         Return "state = LISTEN", and the TCB pointer.

      SYN-SENT STATE

         Return "state = SYN-SENT", and the TCB pointer.

      SYN-RECEIVED STATE

         Return "state = SYN-RECEIVED", and the TCB pointer.

      ESTABLISHED STATE

         Return "state = ESTABLISHED", and the TCB pointer.

      FIN-WAIT-1 STATE

         Return "state = FIN-WAIT-1", and the TCB pointer.

      FIN-WAIT-2 STATE

         Return "state = FIN-WAIT-2", and the TCB pointer.

      CLOSE-WAIT STATE

         Return "state = CLOSE-WAIT", and the TCB pointer.

      CLOSING STATE

         Return "state = CLOSING", and the TCB pointer.

      LAST-ACK STATE

         Return "state = LAST-ACK", and the TCB pointer.

      TIME-WAIT STATE

         Return "state = TIME-WAIT", and the TCB pointer.

   SEGMENT ARRIVES

      If the state is CLOSED (i.e., TCB does not exist) then

         all data in the incoming segment is discarded.
         An incoming
         segment containing a RST is discarded. An incoming segment not
         containing a RST causes a RST to be sent in response. The
         acknowledgment and sequence field values are selected to make
         the reset sequence acceptable to the TCP that sent the
         offending segment.

         If the ACK bit is off, sequence number zero is used,

            <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK>

         If the ACK bit is on,

            <SEQ=SEG.ACK><CTL=RST>

         Return.

      If the state is LISTEN then

         first check for an RST

            An incoming RST should be ignored. Return.

         second check for an ACK

            Any acknowledgment is bad if it arrives on a connection
            still in the LISTEN state. An acceptable reset segment
            should be formed for any arriving ACK-bearing segment. The
            RST should be formatted as follows:

               <SEQ=SEG.ACK><CTL=RST>

            Return.

         third check for a SYN

            If the SYN bit is set, check the security. If the security/
            compartment on the incoming segment does not exactly match
            the security/compartment in the TCB then send a reset and
            return.

               <SEQ=SEG.ACK><CTL=RST>

            If the SEG.PRC is greater than the TCB.PRC then if allowed
            by the user and the system set TCB.PRC<-SEG.PRC, if not
            allowed send a reset and return.

               <SEQ=SEG.ACK><CTL=RST>

            If the SEG.PRC is less than the TCB.PRC then continue.

            Set RCV.NXT to SEG.SEQ+1, IRS is set to SEG.SEQ and any
            other control or text should be queued for processing later.
            ISS should be selected and a SYN segment sent of the form:

               <SEQ=ISS><ACK=RCV.NXT><CTL=SYN,ACK>

            SND.NXT is set to ISS+1 and SND.UNA to ISS. The connection
            state should be changed to SYN-RECEIVED. Note that any
            other incoming control or data (combined with SYN) will be
            processed in the SYN-RECEIVED state, but processing of SYN
            and ACK should not be repeated. If the listen was not fully
            specified (i.e., the foreign socket was not fully
            specified), then the unspecified fields should be filled in
            now.
2291 fourth other text or control 2293 Any other control or text-bearing segment (not containing 2294 SYN) must have an ACK and thus would be discarded by the ACK 2295 processing. An incoming RST segment could not be valid, 2296 since it could not have been sent in response to anything 2297 sent by this incarnation of the connection. So you are 2298 unlikely to get here, but if you do, drop the segment, and 2299 return. 2301 If the state is SYN-SENT then 2303 first check the ACK bit 2305 If the ACK bit is set 2307 If SEG.ACK =< ISS, or SEG.ACK > SND.NXT, send a reset 2308 (unless the RST bit is set, if so drop the segment and 2309 return) 2311 <SEQ=SEG.ACK><CTL=RST> 2313 and discard the segment. Return. 2315 If SND.UNA < SEG.ACK =< SND.NXT then the ACK is 2316 acceptable. (TODO: in processing Errata ID 3300, it was 2317 noted that some stacks in the wild that do not send data 2318 on the SYN are just checking that SEG.ACK == SND.NXT ... 2319 think about whether anything should be said about that 2320 here) 2322 second check the RST bit 2324 If the RST bit is set 2326 If the ACK was acceptable then signal the user "error: 2327 connection reset", drop the segment, enter CLOSED state, 2328 delete TCB, and return. Otherwise (no ACK) drop the 2329 segment and return. 2331 third check the security and precedence 2333 If the security/compartment in the segment does not exactly 2334 match the security/compartment in the TCB, send a reset 2336 If there is an ACK 2338 <SEQ=SEG.ACK><CTL=RST> 2340 Otherwise 2342 <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK> 2344 If there is an ACK 2346 The precedence in the segment must match the precedence 2347 in the TCB, if not, send a reset 2349 <SEQ=SEG.ACK><CTL=RST> 2351 If there is no ACK 2353 If the precedence in the segment is higher than the 2354 precedence in the TCB then if allowed by the user and the 2355 system raise the precedence in the TCB to that in the 2356 segment, if not allowed to raise the prec then send a 2357 reset. 2359 <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK> 2361 If the precedence in the segment is lower than the 2362 precedence in the TCB continue.
2364 If a reset was sent, discard the segment and return. 2366 fourth check the SYN bit 2368 This step should be reached only if the ACK is ok, or there 2369 is no ACK, and the segment did not contain a RST. 2371 If the SYN bit is on and the security/compartment and 2372 precedence are acceptable then, RCV.NXT is set to SEG.SEQ+1, 2373 IRS is set to SEG.SEQ. SND.UNA should be advanced to equal 2374 SEG.ACK (if there is an ACK), and any segments on the 2375 retransmission queue which are thereby acknowledged should 2376 be removed. 2378 If SND.UNA > ISS (our SYN has been ACKed), change the 2379 connection state to ESTABLISHED, form an ACK segment 2381 <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> 2383 and send it. Data or controls which were queued for 2384 transmission may be included. If there are other controls 2385 or text in the segment then continue processing at the sixth 2386 step below where the URG bit is checked, otherwise return. 2388 Otherwise enter SYN-RECEIVED, form a SYN,ACK segment 2390 <SEQ=ISS><ACK=RCV.NXT><CTL=SYN,ACK> 2392 and send it. Set the variables: 2394 SND.WND <- SEG.WND 2395 SND.WL1 <- SEG.SEQ 2396 SND.WL2 <- SEG.ACK 2398 If there are other controls or text in the segment, queue 2399 them for processing after the ESTABLISHED state has been 2400 reached, return. 2402 fifth, if neither of the SYN or RST bits is set then drop the 2403 segment and return. 2405 Otherwise, 2407 first check sequence number 2409 SYN-RECEIVED STATE 2410 ESTABLISHED STATE 2411 FIN-WAIT-1 STATE 2412 FIN-WAIT-2 STATE 2413 CLOSE-WAIT STATE 2414 CLOSING STATE 2415 LAST-ACK STATE 2416 TIME-WAIT STATE 2418 Segments are processed in sequence. Initial tests on 2419 arrival are used to discard old duplicates, but further 2420 processing is done in SEG.SEQ order. If a segment's 2421 contents straddle the boundary between old and new, only the 2422 new parts should be processed.
2424 There are four cases for the acceptability test for an 2425 incoming segment: 2427 Segment Receive Test 2428 Length Window 2429 ------- ------- ------------------------------------------- 2431 0 0 SEG.SEQ = RCV.NXT 2433 0 >0 RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND 2435 >0 0 not acceptable 2437 >0 >0 RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND 2438 or RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND 2440 If the RCV.WND is zero, no segments will be acceptable, but 2441 special allowance should be made to accept valid ACKs, URGs 2442 and RSTs. 2444 If an incoming segment is not acceptable, an acknowledgment 2445 should be sent in reply (unless the RST bit is set, if so 2446 drop the segment and return): 2448 <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> 2450 After sending the acknowledgment, drop the unacceptable 2451 segment and return. 2453 In the following it is assumed that the segment is the 2454 idealized segment that begins at RCV.NXT and does not exceed 2455 the window. One could tailor actual segments to fit this 2456 assumption by trimming off any portions that lie outside the 2457 window (including SYN and FIN), and only processing further 2458 if the segment then begins at RCV.NXT. Segments with higher 2459 beginning sequence numbers should be held for later 2460 processing. 2462 second check the RST bit, 2464 SYN-RECEIVED STATE 2466 If the RST bit is set 2468 If this connection was initiated with a passive OPEN 2469 (i.e., came from the LISTEN state), then return this 2470 connection to LISTEN state and return. The user need 2471 not be informed. If this connection was initiated 2472 with an active OPEN (i.e., came from SYN-SENT state) 2473 then the connection was refused, signal the user 2474 "connection refused". In either case, all segments on 2475 the retransmission queue should be removed. And in 2476 the active OPEN case, enter the CLOSED state and 2477 delete the TCB, and return.
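The four-case acceptability test in the first step above can be sketched as a single function. This is an illustrative sketch only: in_window() and segment_acceptable() are hypothetical names, and the comparisons are done modulo 2**32 on the assumption that sequence numbers wrap as elsewhere in TCP. The special allowance for ACKs, URGs and RSTs when RCV.WND is zero is not modeled.

```python
MOD = 2**32  # size of the sequence number space


def in_window(seq, start, size):
    """True if seq lies in [start, start + size), computed modulo 2**32."""
    return (seq - start) % MOD < size


def segment_acceptable(seg_seq, seg_len, rcv_nxt, rcv_wnd):
    """Return True if the segment passes the four-case acceptability test."""
    if seg_len == 0:
        if rcv_wnd == 0:
            # Case 1: empty segment, zero window.
            return seg_seq == rcv_nxt
        # Case 2: empty segment, open window.
        return in_window(seg_seq, rcv_nxt, rcv_wnd)
    if rcv_wnd == 0:
        # Case 3: data arriving against a zero window is never acceptable.
        return False
    # Case 4: the first or the last octet must fall inside the window.
    last = (seg_seq + seg_len - 1) % MOD
    return (in_window(seg_seq, rcv_nxt, rcv_wnd)
            or in_window(last, rcv_nxt, rcv_wnd))
```

A segment that starts before RCV.NXT but ends inside the window, such as segment_acceptable(95, 10, 100, 50), is accepted by the second half of case 4; only its new part would then be processed.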
2479 ESTABLISHED 2480 FIN-WAIT-1 2481 FIN-WAIT-2 2482 CLOSE-WAIT 2484 If the RST bit is set then, any outstanding RECEIVEs and 2485 SEND should receive "reset" responses. All segment 2486 queues should be flushed. Users should also receive an 2487 unsolicited general "connection reset" signal. Enter the 2488 CLOSED state, delete the TCB, and return. 2490 CLOSING STATE 2491 LAST-ACK STATE 2492 TIME-WAIT 2494 If the RST bit is set then, enter the CLOSED state, 2495 delete the TCB, and return. 2497 third check security and precedence 2499 SYN-RECEIVED 2501 If the security/compartment and precedence in the segment 2502 do not exactly match the security/compartment and 2503 precedence in the TCB then send a reset, and return. 2505 ESTABLISHED 2506 FIN-WAIT-1 2507 FIN-WAIT-2 2508 CLOSE-WAIT 2509 CLOSING 2510 LAST-ACK 2511 TIME-WAIT 2513 If the security/compartment and precedence in the segment 2514 do not exactly match the security/compartment and 2515 precedence in the TCB then send a reset, any outstanding 2516 RECEIVEs and SEND should receive "reset" responses. All 2517 segment queues should be flushed. Users should also 2518 receive an unsolicited general "connection reset" signal. 2519 Enter the CLOSED state, delete the TCB, and return. 2521 Note this check is placed following the sequence check to 2522 prevent a segment from an old connection between these ports 2523 with a different security or precedence from causing an 2524 abort of the current connection. 
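The per-state RST handling in the second step above can be summarized as a small transition function. This is a sketch under stated assumptions: on_rst() and its return convention are hypothetical, the segment is assumed to have already passed the sequence number check, and flushing of queues and the "reset" responses to outstanding calls are left to the caller.

```python
def on_rst(state, passive_open=False):
    """Return (next_state, user_signal) for an acceptable segment with RST set.

    user_signal is None when the user need not be informed.
    """
    if state == "SYN-RECEIVED":
        if passive_open:
            # Came from LISTEN: go back to listening, user is not informed.
            return "LISTEN", None
        # Came from SYN-SENT: the connection was refused.
        return "CLOSED", "connection refused"
    if state in ("ESTABLISHED", "FIN-WAIT-1", "FIN-WAIT-2", "CLOSE-WAIT"):
        return "CLOSED", "connection reset"
    if state in ("CLOSING", "LAST-ACK", "TIME-WAIT"):
        return "CLOSED", None
    raise ValueError("state not handled by this step: " + state)
```

Returning the signal instead of delivering it keeps the sketch free of any particular user interface.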
2526 fourth, check the SYN bit, 2528 SYN-RECEIVED STATE 2529 ESTABLISHED STATE 2530 FIN-WAIT-1 STATE 2531 FIN-WAIT-2 STATE 2532 CLOSE-WAIT STATE 2533 CLOSING STATE 2534 LAST-ACK STATE 2535 TIME-WAIT STATE 2537 TODO: need to incorporate RFC 1122 4.2.2.20(e) here 2539 If the SYN is in the window it is an error, send a reset, 2540 any outstanding RECEIVEs and SEND should receive "reset" 2541 responses, all segment queues should be flushed, the user 2542 should also receive an unsolicited general "connection 2543 reset" signal, enter the CLOSED state, delete the TCB, 2544 and return. 2546 If the SYN is not in the window this step would not be 2547 reached and an ack would have been sent in the first step 2548 (sequence number check). 2550 fifth check the ACK field, 2552 if the ACK bit is off drop the segment and return 2553 if the ACK bit is on 2555 SYN-RECEIVED STATE 2557 If SND.UNA < SEG.ACK =< SND.NXT then enter ESTABLISHED 2558 state and continue processing with variables below set 2559 to: 2561 SND.WND <- SEG.WND 2562 SND.WL1 <- SEG.SEQ 2563 SND.WL2 <- SEG.ACK 2565 If the segment acknowledgment is not acceptable, 2566 form a reset segment, 2568 <SEQ=SEG.ACK><CTL=RST> 2570 and send it. 2572 ESTABLISHED STATE 2574 If SND.UNA < SEG.ACK =< SND.NXT then, set SND.UNA <- 2575 SEG.ACK. Any segments on the retransmission queue 2576 which are thereby entirely acknowledged are removed. 2577 Users should receive positive acknowledgments for 2578 buffers which have been SENT and fully acknowledged 2579 (i.e., SEND buffer should be returned with "ok" 2580 response). If the ACK is a duplicate (SEG.ACK =< 2581 SND.UNA), it can be ignored. If the ACK acks 2582 something not yet sent (SEG.ACK > SND.NXT) then send 2583 an ACK, drop the segment, and return. 2585 If SND.UNA =< SEG.ACK =< SND.NXT, the send window 2586 should be updated. If (SND.WL1 < SEG.SEQ or (SND.WL1 2587 = SEG.SEQ and SND.WL2 =< SEG.ACK)), set SND.WND <- 2588 SEG.WND, set SND.WL1 <- SEG.SEQ, and set SND.WL2 <- 2589 SEG.ACK.
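The ESTABLISHED-state ACK processing and send window update above can be sketched as follows. The names SendState, seq_lt(), seq_le() and process_ack() are hypothetical, as are the returned labels. The comparison helpers treat sequence numbers as a 32-bit circular space and consider two values ordered when they are less than 2**31 apart; that convention is an assumption of this sketch, not something the text above states. Retransmission queue handling and user notifications are omitted.

```python
from dataclasses import dataclass

MOD = 2**32
HALF = 2**31


def seq_le(a, b):
    """a =< b in circular sequence space (assumes the values are < 2**31 apart)."""
    return (b - a) % MOD < HALF


def seq_lt(a, b):
    """a < b in circular sequence space."""
    return a != b and seq_le(a, b)


@dataclass
class SendState:
    snd_una: int
    snd_nxt: int
    snd_wnd: int
    snd_wl1: int
    snd_wl2: int


def process_ack(s, seg_seq, seg_ack, seg_wnd):
    """Return 'ack-and-drop', 'advanced', or 'duplicate' and update s in place."""
    if seq_lt(s.snd_nxt, seg_ack):
        # ACK for something not yet sent: caller sends an ACK and drops it.
        return "ack-and-drop"
    # Window updates apply when SND.UNA =< SEG.ACK =< SND.NXT.
    update_ok = seq_le(s.snd_una, seg_ack)
    result = "duplicate"
    if seq_lt(s.snd_una, seg_ack):
        s.snd_una = seg_ack
        result = "advanced"
    if update_ok and (seq_lt(s.snd_wl1, seg_seq)
                      or (s.snd_wl1 == seg_seq and seq_le(s.snd_wl2, seg_ack))):
        s.snd_wnd = seg_wnd
        s.snd_wl1 = seg_seq
        s.snd_wl2 = seg_ack
    return result
```

The WL1/WL2 comparison is what keeps an old, reordered segment from overwriting a newer window value.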
2591 Note that SND.WND is an offset from SND.UNA, that 2592 SND.WL1 records the sequence number of the last 2593 segment used to update SND.WND, and that SND.WL2 2594 records the acknowledgment number of the last segment 2595 used to update SND.WND. The check here prevents using 2596 old segments to update the window. 2598 FIN-WAIT-1 STATE 2599 In addition to the processing for the ESTABLISHED 2600 state, if our FIN is now acknowledged then enter FIN- 2601 WAIT-2 and continue processing in that state. 2603 FIN-WAIT-2 STATE 2605 In addition to the processing for the ESTABLISHED 2606 state, if the retransmission queue is empty, the 2607 user's CLOSE can be acknowledged ("ok") but do not 2608 delete the TCB. 2610 CLOSE-WAIT STATE 2612 Do the same processing as for the ESTABLISHED state. 2614 CLOSING STATE 2616 In addition to the processing for the ESTABLISHED 2617 state, if the ACK acknowledges our FIN then enter the 2618 TIME-WAIT state, otherwise ignore the segment. 2620 LAST-ACK STATE 2622 The only thing that can arrive in this state is an 2623 acknowledgment of our FIN. If our FIN is now 2624 acknowledged, delete the TCB, enter the CLOSED state, 2625 and return. 2627 TIME-WAIT STATE 2629 The only thing that can arrive in this state is a 2630 retransmission of the remote FIN. Acknowledge it, and 2631 restart the 2 MSL timeout. 2633 sixth, check the URG bit, 2635 ESTABLISHED STATE 2636 FIN-WAIT-1 STATE 2637 FIN-WAIT-2 STATE 2639 If the URG bit is set, RCV.UP <- max(RCV.UP,SEG.UP), and 2640 signal the user that the remote side has urgent data if 2641 the urgent pointer (RCV.UP) is in advance of the data 2642 consumed. If the user has already been signaled (or is 2643 still in the "urgent mode") for this continuous sequence 2644 of urgent data, do not signal the user again. 2646 CLOSE-WAIT STATE 2647 CLOSING STATE 2648 LAST-ACK STATE 2649 TIME-WAIT 2651 This should not occur, since a FIN has been received from 2652 the remote side. Ignore the URG. 
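The URG handling in the sixth step above can be sketched as follows. UrgentState, its fields, and on_urg() are hypothetical names. For simplicity the sketch treats SEG.UP and the amount of data consumed as plain integers in the same sequence space and ignores 32-bit wraparound; how the user leaves "urgent mode" is outside the sketch.

```python
from dataclasses import dataclass


@dataclass
class UrgentState:
    rcv_up: int = 0            # RCV.UP, the receive urgent pointer
    consumed: int = 0          # sequence number up to which the user has read
    urgent_mode: bool = False  # True once the user has been signaled


def on_urg(u, seg_up):
    """Process a segment with URG set; return True if the user should be signaled."""
    u.rcv_up = max(u.rcv_up, seg_up)
    if u.rcv_up > u.consumed and not u.urgent_mode:
        # Urgent pointer is ahead of consumed data and the user is not yet
        # in urgent mode, so signal once.
        u.urgent_mode = True
        return True
    return False
```

A second URG segment that extends the same run of urgent data moves RCV.UP forward but does not signal the user again.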
2654 seventh, process the segment text, 2656 ESTABLISHED STATE 2657 FIN-WAIT-1 STATE 2658 FIN-WAIT-2 STATE 2660 Once in the ESTABLISHED state, it is possible to deliver 2661 segment text to user RECEIVE buffers. Text from segments 2662 can be moved into buffers until either the buffer is full 2663 or the segment is empty. If the segment empties and 2664 carries a PUSH flag, then the user is informed, when the 2665 buffer is returned, that a PUSH has been received. 2667 When the TCP takes responsibility for delivering the data 2668 to the user it must also acknowledge the receipt of the 2669 data. 2671 Once the TCP takes responsibility for the data it 2672 advances RCV.NXT over the data accepted, and adjusts 2673 RCV.WND as appropriate to the current buffer 2674 availability. The total of RCV.NXT and RCV.WND should 2675 not be reduced. 2677 Please note the window management suggestions in section 2678 3.7. 2680 Send an acknowledgment of the form: 2682 <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> 2684 This acknowledgment should be piggybacked on a segment 2685 being transmitted if possible without incurring undue 2686 delay. 2688 CLOSE-WAIT STATE 2689 CLOSING STATE 2690 LAST-ACK STATE 2691 TIME-WAIT STATE 2693 This should not occur, since a FIN has been received from 2694 the remote side. Ignore the segment text. 2696 eighth, check the FIN bit, 2698 Do not process the FIN if the state is CLOSED, LISTEN or 2699 SYN-SENT since the SEG.SEQ cannot be validated; drop the 2700 segment and return. 2702 If the FIN bit is set, signal the user "connection closing" 2703 and return any pending RECEIVEs with same message, advance 2704 RCV.NXT over the FIN, and send an acknowledgment for the 2705 FIN. Note that FIN implies PUSH for any segment text not 2706 yet delivered to the user. 2708 SYN-RECEIVED STATE 2709 ESTABLISHED STATE 2711 Enter the CLOSE-WAIT state.
2713 FIN-WAIT-1 STATE 2715 If our FIN has been ACKed (perhaps in this segment), 2716 then enter TIME-WAIT, start the time-wait timer, turn 2717 off the other timers; otherwise enter the CLOSING 2718 state. 2720 FIN-WAIT-2 STATE 2722 Enter the TIME-WAIT state. Start the time-wait timer, 2723 turn off the other timers. 2725 CLOSE-WAIT STATE 2727 Remain in the CLOSE-WAIT state. 2729 CLOSING STATE 2731 Remain in the CLOSING state. 2733 LAST-ACK STATE 2735 Remain in the LAST-ACK state. 2737 TIME-WAIT STATE 2739 Remain in the TIME-WAIT state. Restart the 2 MSL 2740 time-wait timeout. 2742 and return. 2744 USER TIMEOUT 2746 USER TIMEOUT 2748 For any state if the user timeout expires, flush all queues, 2749 signal the user "error: connection aborted due to user timeout" 2750 in general and for any outstanding calls, delete the TCB, enter 2751 the CLOSED state and return. 2753 RETRANSMISSION TIMEOUT 2755 For any state if the retransmission timeout expires on a 2756 segment in the retransmission queue, send the segment at the 2757 front of the retransmission queue again, reinitialize the 2758 retransmission timer, and return. 2760 TIME-WAIT TIMEOUT 2762 If the time-wait timeout expires on a connection delete the 2763 TCB, enter the CLOSED state and return. 2765 3.10. Glossary 2767 1822 BBN Report 1822, "The Specification of the Interconnection of 2768 a Host and an IMP". The specification of interface between a 2769 host and the ARPANET. 2771 ACK 2772 A control bit (acknowledge) occupying no sequence space, 2773 which indicates that the acknowledgment field of this segment 2774 specifies the next sequence number the sender of this segment 2775 is expecting to receive, hence acknowledging receipt of all 2776 previous sequence numbers. 2778 ARPANET message 2779 The unit of transmission between a host and an IMP in the 2780 ARPANET. The maximum size is about 1012 octets (8096 bits). 
2782 ARPANET packet 2783 A unit of transmission used internally in the ARPANET between 2784 IMPs. The maximum size is about 126 octets (1008 bits). 2786 connection 2787 A logical communication path identified by a pair of sockets. 2789 datagram 2790 A message sent in a packet switched computer communications 2791 network. 2793 Destination Address 2794 The destination address, usually the network and host 2795 identifiers. 2797 FIN 2798 A control bit (finis) occupying one sequence number, which 2799 indicates that the sender will send no more data or control 2800 occupying sequence space. 2802 fragment 2803 A portion of a logical unit of data, in particular an 2804 internet fragment is a portion of an internet datagram. 2806 FTP 2807 A file transfer protocol. 2809 header 2810 Control information at the beginning of a message, segment, 2811 fragment, packet or block of data. 2813 host 2814 A computer. In particular a source or destination of 2815 messages from the point of view of the communication network. 2817 Identification 2818 An Internet Protocol field. This identifying value assigned 2819 by the sender aids in assembling the fragments of a datagram. 2821 IMP 2822 The Interface Message Processor, the packet switch of the 2823 ARPANET. 2825 internet address 2826 A source or destination address specific to the host level. 2828 internet datagram 2829 The unit of data exchanged between an internet module and the 2830 higher level protocol together with the internet header. 2832 internet fragment 2833 A portion of the data of an internet datagram with an 2834 internet header. 2836 IP 2837 Internet Protocol. 2839 IRS 2840 The Initial Receive Sequence number. The first sequence 2841 number used by the sender on a connection. 2843 ISN 2844 The Initial Sequence Number. The first sequence number used 2845 on a connection, (either ISS or IRS). Selected on a clock 2846 based procedure. 2848 ISS 2849 The Initial Send Sequence number. 
The first sequence number 2850 used by the sender on a connection. 2852 leader 2853 Control information at the beginning of a message or block of 2854 data. In particular, in the ARPANET, the control information 2855 on an ARPANET message at the host-IMP interface. 2857 left sequence 2858 This is the next sequence number to be acknowledged by the 2859 data receiving TCP (or the lowest currently unacknowledged 2860 sequence number) and is sometimes referred to as the left 2861 edge of the send window. 2863 local packet 2864 The unit of transmission within a local network. 2866 module 2867 An implementation, usually in software, of a protocol or 2868 other procedure. 2870 MSL 2871 Maximum Segment Lifetime, the time a TCP segment can exist in 2872 the internetwork system. Arbitrarily defined to be 2 2873 minutes. 2875 octet 2876 An eight bit byte. 2878 Options 2879 An Option field may contain several options, and each option 2880 may be several octets in length. The options are used 2881 primarily in testing situations; for example, to carry 2882 timestamps. Both the Internet Protocol and TCP provide for 2883 options fields. 2885 packet 2886 A package of data with a header which may or may not be 2887 logically complete. More often a physical packaging than a 2888 logical packaging of data. 2890 port 2891 The portion of a socket that specifies which logical input or 2892 output channel of a process is associated with the data. 2894 process 2895 A program in execution. A source or destination of data from 2896 the point of view of the TCP or other host-to-host protocol. 2898 PUSH 2899 A control bit occupying no sequence space, indicating that 2900 this segment contains data that must be pushed through to the 2901 receiving user. 2903 RCV.NXT 2904 receive next sequence number 2906 RCV.UP 2907 receive urgent pointer 2909 RCV.WND 2910 receive window 2912 receive next sequence number 2913 This is the next sequence number the local TCP is expecting 2914 to receive. 
2916 receive window 2917 This represents the sequence numbers the local (receiving) 2918 TCP is willing to receive. Thus, the local TCP considers 2919 that segments overlapping the range RCV.NXT to RCV.NXT + 2920 RCV.WND - 1 carry acceptable data or control. Segments 2921 containing sequence numbers entirely outside of this range 2922 are considered duplicates and discarded. 2924 RST 2925 A control bit (reset), occupying no sequence space, 2926 indicating that the receiver should delete the connection 2927 without further interaction. The receiver can determine, 2928 based on the sequence number and acknowledgment fields of the 2929 incoming segment, whether it should honor the reset command 2930 or ignore it. In no case does receipt of a segment 2931 containing RST give rise to a RST in response. 2933 RTP 2934 Real Time Protocol: A host-to-host protocol for communication 2935 of time critical information. 2937 SEG.ACK 2938 segment acknowledgment 2940 SEG.LEN 2941 segment length 2943 SEG.PRC 2944 segment precedence value 2946 SEG.SEQ 2947 segment sequence 2949 SEG.UP 2950 segment urgent pointer field 2952 SEG.WND 2953 segment window field 2955 segment 2956 A logical unit of data, in particular a TCP segment is the 2957 unit of data transferred between a pair of TCP modules. 2959 segment acknowledgment 2960 The sequence number in the acknowledgment field of the 2961 arriving segment. 2963 segment length 2964 The amount of sequence number space occupied by a segment, 2965 including any controls which occupy sequence space. 2967 segment sequence 2968 The number in the sequence field of the arriving segment. 2970 send sequence 2971 This is the next sequence number the local (sending) TCP will 2972 use on the connection. It is initially selected from an 2973 initial sequence number curve (ISN) and is incremented for 2974 each octet of data or sequenced control transmitted.
2976 send window 2977 This represents the sequence numbers which the remote 2978 (receiving) TCP is willing to receive. It is the value of 2979 the window field specified in segments from the remote (data 2980 receiving) TCP. The range of new sequence numbers which may 2981 be emitted by a TCP lies between SND.NXT and SND.UNA + 2982 SND.WND - 1. (Retransmissions of sequence numbers between 2983 SND.UNA and SND.NXT are expected, of course.) 2985 SND.NXT 2986 send sequence 2988 SND.UNA 2989 left sequence 2991 SND.UP 2992 send urgent pointer 2994 SND.WL1 2995 segment sequence number at last window update 2997 SND.WL2 2998 segment acknowledgment number at last window update 3000 SND.WND 3001 send window 3003 socket 3004 An address which specifically includes a port identifier, 3005 that is, the concatenation of an Internet Address with a TCP 3006 port. 3008 Source Address 3009 The source address, usually the network and host identifiers. 3011 SYN 3012 A control bit in the incoming segment, occupying one sequence 3013 number, used at the initiation of a connection, to indicate 3014 where the sequence numbering will start. 3016 TCB 3017 Transmission control block, the data structure that records 3018 the state of a connection. 3020 TCB.PRC 3021 The precedence of the connection. 3023 TCP 3024 Transmission Control Protocol: A host-to-host protocol for 3025 reliable communication in internetwork environments. 3027 TOS 3028 Type of Service, an Internet Protocol field. 3030 Type of Service 3031 An Internet Protocol field which indicates the type of 3032 service for this internet fragment. 3034 URG 3035 A control bit (urgent), occupying no sequence space, used to 3036 indicate that the receiving user should be notified to do 3037 urgent processing as long as there is data to be consumed 3038 with sequence numbers less than the value indicated in the 3039 urgent pointer. 3041 urgent pointer 3042 A control field meaningful only when the URG bit is on. 
This 3043 field communicates the value of the urgent pointer which 3044 indicates the data octet associated with the sending user's 3045 urgent call. 3047 4. Changes from RFC 793 3049 TODO: this entire section will need to be edited and condensed before 3050 the document is finalized. It currently represents a plan for future 3051 updates mixed with notes on what changes have already been completed. 3053 It should likely be an appendix, and in the final RFC, only the 3054 changes should be listed, and not what particular revision of the I-D 3055 they were made within. 3057 The -00 revision of this document was merely a proposal and rough 3058 plan for updating RFC 793. 3060 The -01 revision of this document incorporates the content of RFC 793 3061 Section 3 titled "FUNCTIONAL SPECIFICATION". Other content from RFC 3062 793 has not been incorporated. The -01 revision of this document 3063 makes some minor formatting changes to the RFC 793 content in order 3064 to convert the content into XML2RFC format and account for left-out 3065 parts of RFC 793. For instance, figure numbering differs and some 3066 indentation is not exactly the same. 3068 The -02 revision of this document incorporates errata that have been 3069 verified: 3071 Errata ID 573: Reported by Bob Braden (note: This errata basically 3072 is just a reminder that RFC 1122 updates 793. Some of the 3073 associated changes are left pending to a separate revision that 3074 incorporates 1122. Bob's mention of PUSH in 793 section 2.8 was 3075 not applicable here because that section was not part of the 3076 "functional specification" and the note on the urgent pointer will 3077 be revised when changes to account for RFC 6093 are incorporated. 3078 Also the 1122 text on the retransmission timeout also has been 3079 updated by subsequent RFCs, so the change here deviates from Bob's 3080 suggestion to apply the 1122 text.) 
3081 Errata ID 574: Reported by Yin Shuming 3082 Errata ID 700: Reported by Yin Shuming 3083 Errata ID 701: Reported by Yin Shuming 3084 Errata ID 1283: Reported by Pei-chun Cheng 3085 Errata ID 1561: Reported by Constantin Hagemeier 3086 Errata ID 1562: Reported by Constantin Hagemeier 3087 Errata ID 1564: Reported by Constantin Hagemeier 3088 Errata ID 1565: Reported by Constantin Hagemeier 3089 Errata ID 1571: Reported by Constantin Hagemeier 3090 Errata ID 1572: Reported by Constantin Hagemeier 3091 Errata ID 2296: Reported by Vishwas Manral 3092 Errata ID 2297: Reported by Vishwas Manral 3093 Errata ID 2298: Reported by Vishwas Manral 3094 Errata ID 2748: Reported by Mykyta Yevstifeyev 3095 Errata ID 2749: Reported by Mykyta Yevstifeyev 3096 Errata ID 2934: Reported by Constantin Hagemeier 3097 Errata ID 3213: Reported by EungJun Yi 3098 Errata ID 3300: Reported by Botong Huang 3099 Errata ID 3301: Reported by Botong Huang 3100 Note: Some verified errata were not used in this update, as they 3101 relate to sections of RFC 793 elided from this document. These 3102 include Errata ID 572, 575, and 1569. 3103 Note: Errata ID 3602 was not applied in this revision as it is 3104 duplicative of the 1122 corrections. 3105 Errata ID 3305 is currently reported and needs to be 3106 verified, held, or rejected by the ADs; it addresses the same 3107 issue as draft-gont-tcpm-tcp-seq-validation, and no attempt was made 3108 to apply it to this document. 3110 Not related to RFC 793 content, this revision also makes small tweaks 3111 to the introductory text, fixes indentation of the pseudoheader 3112 diagram, and notes that the Security Considerations should also 3113 include privacy, when this section is written. 3115 TODO: Incomplete list of planned changes - these need to be added to 3116 and made more specific, as the document proceeds: 3118 1. incorporate 1122 additions 3119 2. point to major additional docs like 1323bis and 5681 3120 3.
incorporate relevant parts of 3168 (ECN) 3121 4. incorporate 6093 (urgent pointer) 3122 5. incorporate 6528 (sequence number) 3123 6. incorporate Fernando's new number-checking fixes (if past the 3124 IESG in time) 3125 7. point to PMTUD? 3126 8. point to 5461 (soft errors) 3127 9. mention 5961 state machine option 3128 10. mention 6191 (reducing TIME-WAIT) 3129 11. incorporate 6429 (ZWP/persist) 3130 12. incorporate 6691 (MSS) 3132 5. IANA Considerations 3134 This memo includes no request to IANA. Existing IANA registries for 3135 TCP parameters are sufficient. 3137 TODO: check whether entries pointing to 793 and other documents 3138 obsoleted by this one should be updated to point to this one instead. 3140 6. Security and Privacy Considerations 3142 TODO 3144 Editor's Note: Scott Brim mentioned that this should include a 3145 PERPASS/privacy review. 3147 7. Acknowledgements 3149 This document is largely a revision of RFC 793, which Jon Postel was 3150 the editor of. Due to his excellent work, it was able to last for 3151 three decades before we felt the need to revise it. 3153 Andre Oppermann was a contributor and helped to edit the first 3154 revision of this document. 3156 We are thankful for the assistance of the IETF TCPM working group 3157 chairs: 3159 Michael Scharf 3160 Yoshifumi Nishida 3161 Pasi Sarolahti 3163 On the TCPM mailing list, and at the IETF 88 meeting in Vancouver, 3164 helpful comments, critiques, and reviews were received from (listed 3165 alphabetically): David Borman, Yuchung Cheng, Martin Duke, Kevin 3166 Lahey, Kevin Mason, Matt Mathis, Hagen Paul Pfeifer, Anthony 3167 Sabatini, Joe Touch, Reji Varghese, Lloyd Wood, and Alex Zimmermann. 3169 This document includes content from errata that were reported by 3170 (listed chronologically): Yin Shuming, Bob Braden, Morris M. Keesan, 3171 Pei-chun Cheng, Constantin Hagemeier, Vishwas Manral, Mykyta 3172 Yevstifeyev, EungJun Yi, Botong Huang. 3174 8. References 3176 8.1.
Normative References 3178 [1] Bradner, S., "Key words for use in RFCs to Indicate 3179 Requirement Levels", BCP 14, RFC 2119, March 1997. 3181 8.2. Informative References 3183 [2] Postel, J., "Transmission Control Protocol", STD 7, RFC 3184 793, September 1981. 3186 [3] Duke, M., Braden, R., Eddy, W., Blanton, E., and A. 3187 Zimmermann, "A Roadmap for Transmission Control Protocol 3188 (TCP) Specification Documents", draft-ietf-tcpm-tcp- 3189 rfc4614bis-00 (work in progress), August 2013. 3191 Author's Address 3193 Wesley M. Eddy (editor) 3194 MTI Systems 3195 US 3197 Email: wes@mti-systems.com