Internet Engineering Task Force                                  W. Eddy
Internet-Draft                                               MTI Systems
Obsoletes: 793 (if approved)                              March 20, 2014
Intended status: Standards Track
Expires: September 21, 2014

              Transmission Control Protocol Specification
                        draft-eddy-rfc793bis-01

Abstract

   This document specifies the Internet's Transmission Control Protocol
   (TCP). TCP is an important transport layer protocol in the Internet
   stack, and has continuously evolved over decades of use and growth of
   the Internet. Over this time, a number of changes have been made to
   TCP as it was specified in RFC 793, though these have only been
   documented in a piecemeal fashion. This document collects and brings
   those changes together with the protocol specification from RFC 793.
   This document obsoletes RFC 793 and several other RFCs (TODO: list
   actual RFCs).

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [1].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts.
   The list of current Internet-Drafts is at
   http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 21, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1. Purpose and Scope
   2. Introduction
   3. Functional Specification
     3.1. Header Format
     3.2. Terminology
     3.3. Sequence Numbers
     3.4. Establishing a connection
     3.5. Closing a Connection
     3.6. Precedence and Security
     3.7. Data Communication
     3.8. Interfaces
       3.8.1. User/TCP Interface
       3.8.2. TCP/Lower-Level Interface
     3.9. Event Processing
     3.10. Glossary
   4. Changes from RFC 793
   5. IANA Considerations
   6. Security Considerations
   7. Acknowledgements
   8. References
     8.1. Normative References
     8.2. Informative References
   Author's Address

1. Purpose and Scope

   In 1981, RFC 793 [2] was released, documenting the Transmission
   Control Protocol (TCP) and replacing earlier published
   specifications for TCP.

   Since that time, TCP has been implemented many times, and has been
   used as a transport protocol for numerous applications on the
   Internet.

   For several decades, RFC 793 plus a number of other documents have
   combined to serve as the specification for TCP [3].
   Over time, errata have been identified on RFC 793, as well as
   deficiencies in security, performance, and other aspects. A number
   of enhancements have been developed and documented separately.

   The purpose of this document is to bring together all of the IETF
   Standards Track changes that have been made to the basic TCP
   functional specification and unify them into an update of the RFC 793
   protocol specification. Some companion documents are referenced for
   important algorithms that TCP uses (e.g. for congestion control), but
   no attempt has been made to include them in this document. This is a
   conscious choice, as this base specification can be used with multiple
   additional algorithms that are developed and incorporated separately,
   but all TCP implementations need to implement this specification as a
   common basis in order to interoperate. As some additional TCP
   features have become quite complicated themselves (e.g. advanced loss
   recovery and congestion control), future companion documents may
   attempt to similarly bring these together.

   In addition to the protocol specification that describes the TCP
   segment format, generation, and processing rules that are to be
   implemented in code, RFC 793 and other updates also contain
   informative and descriptive text for human readers to understand
   aspects of the protocol design and operation. This document does not
   attempt to alter or update those parts of RFC 793, and is focused
   only on updating the normative protocol specification. We preserve
   references to the documentation containing the important explanations
   and rationale, where appropriate.

   This document is intended to be useful both in checking existing TCP
   implementations for conformance and in writing new implementations.

2. Introduction

   RFC 793 contains a discussion of the TCP design goals and provides
   examples of its operation, including examples of connection
   establishment, closing connections, and retransmitting packets to
   repair losses.

   This document describes the functionality expected in modern
   implementations of TCP, and replaces the protocol specification in
   RFC 793. It does not replicate or attempt to update the examples and
   other discussion in RFC 793. Other documents are referenced to
   provide explanation of the theory of operation, rationale, and
   detailed discussion of design decisions. This document only focuses
   on the normative behavior of the protocol.

   TEMPORARY EDITOR'S NOTE: This is an early revision in the process of
   updating RFC 793. Many planned changes are not yet incorporated.
   Please do not use this revision as a basis for any work or reference.

   TODO: describe the subsequent structure of the document to-be (e.g.
   will it follow the newtcp BSD implementation?), and mention that a
   list of changes from RFC 793 will be kept in the final section

   TEMPORARY EDITOR'S NOTE: the current revision of this document does
   not yet collect all of the changes that will be in the final version.
   The set of content changes planned for future revisions is roughly:

      -00 was a proposal for the scope of the document and description
      of the need for an update to RFC 793

      -01 incorporated the RFC 793 section 3 content with no additional
      changes into XML2RFC format for easy tracking of the changes
      between RFC 793 and future revisions of the document

      -02 is planned to incorporate the verified errata on RFC 793

      -03 and beyond are intended to incorporate changes from other RFCs
      that updated 793

3. Functional Specification

3.1. Header Format

   TCP segments are sent as internet datagrams.
   The Internet Protocol
   header carries several information fields, including the source and
   destination host addresses [2]. A TCP header follows the internet
   header, supplying information specific to the TCP protocol. This
   division allows for the existence of host level protocols other than
   TCP.

   TCP Header Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |       Destination Port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Acknowledgment Number                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data |           |U|A|P|R|S|F|                               |
   | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
   |       |           |G|K|H|T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |         Urgent Pointer        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                          TCP Header Format

         Note that one tick mark represents one bit position.

                               Figure 1

   Source Port: 16 bits

     The source port number.

   Destination Port: 16 bits

     The destination port number.

   Sequence Number: 32 bits

     The sequence number of the first data octet in this segment (except
     when SYN is present). If SYN is present the sequence number is the
     initial sequence number (ISN) and the first data octet is ISN+1.

   Acknowledgment Number: 32 bits

     If the ACK control bit is set this field contains the value of the
     next sequence number the sender of the segment is expecting to
     receive. Once a connection is established this is always sent.

   Data Offset: 4 bits

     The number of 32 bit words in the TCP Header. This indicates where
     the data begins. The TCP header (even one including options) is an
     integral number of 32 bits long.

   Reserved: 6 bits

     Reserved for future use. Must be zero.

   Control Bits: 6 bits (from left to right):

     URG: Urgent Pointer field significant
     ACK: Acknowledgment field significant
     PSH: Push Function
     RST: Reset the connection
     SYN: Synchronize sequence numbers
     FIN: No more data from sender

   Window: 16 bits

     The number of data octets beginning with the one indicated in the
     acknowledgment field which the sender of this segment is willing to
     accept.

   Checksum: 16 bits

     The checksum field is the 16 bit one's complement of the one's
     complement sum of all 16 bit words in the header and text. If a
     segment contains an odd number of header and text octets to be
     checksummed, the last octet is padded on the right with zeros to
     form a 16 bit word for checksum purposes. The pad is not
     transmitted as part of the segment. While computing the checksum,
     the checksum field itself is replaced with zeros.

     The checksum also covers a 96 bit pseudo header conceptually
     prefixed to the TCP header. This pseudo header contains the Source
     Address, the Destination Address, the Protocol, and TCP length.
     This gives the TCP protection against misrouted segments. This
     information is carried in the Internet Protocol and is transferred
     across the TCP/Network interface in the arguments or results of
     calls by the TCP on the IP.

       +--------+--------+--------+--------+
       |           Source Address          |
       +--------+--------+--------+--------+
       |         Destination Address       |
       +--------+--------+--------+--------+
       |  zero  |  PTCL  |    TCP Length   |
       +--------+--------+--------+--------+

     The TCP Length is the TCP header length plus the data length in
     octets (this is not an explicitly transmitted quantity, but is
     computed), and it does not count the 12 octets of the pseudo
     header.

   Urgent Pointer: 16 bits

     This field communicates the current value of the urgent pointer as
     a positive offset from the sequence number in this segment. The
     urgent pointer points to the sequence number of the octet following
     the urgent data. This field is only to be interpreted in segments
     with the URG control bit set.

   Options: variable

     Options may occupy space at the end of the TCP header and are a
     multiple of 8 bits in length. All options are included in the
     checksum. An option may begin on any octet boundary. There are
     two cases for the format of an option:

       Case 1: A single octet of option-kind.

       Case 2: An octet of option-kind, an octet of option-length, and
       the actual option-data octets.

     The option-length counts the two octets of option-kind and
     option-length as well as the option-data octets.

     Note that the list of options may be shorter than the data offset
     field might imply. The content of the header beyond the
     End-of-Option option must be header padding (i.e., zero).

     A TCP must implement all options.

     Currently defined options include (kind indicated in octal):

       Kind     Length    Meaning
       ----     ------    -------
        0         -       End of option list.
        1         -       No-Operation.
        2         4       Maximum Segment Size.

     Specific Option Definitions

       End of Option List

         +--------+
         |00000000|
         +--------+
          Kind=0

         This option code indicates the end of the option list.
         This
         might not coincide with the end of the TCP header according to
         the Data Offset field. This is used at the end of all options,
         not the end of each option, and need only be used if the end of
         the options would not otherwise coincide with the end of the TCP
         header.

       No-Operation

         +--------+
         |00000001|
         +--------+
          Kind=1

         This option code may be used between options, for example, to
         align the beginning of a subsequent option on a word boundary.
         There is no guarantee that senders will use this option, so
         receivers must be prepared to process options even if they do
         not begin on a word boundary.

       Maximum Segment Size

         +--------+--------+---------+--------+
         |00000010|00000100|   max seg size   |
         +--------+--------+---------+--------+
          Kind=2   Length=4

         Maximum Segment Size Option Data: 16 bits

         If this option is present, then it communicates the maximum
         receive segment size at the TCP which sends this segment. This
         field must only be sent in the initial connection request (i.e.,
         in segments with the SYN control bit set). If this option is
         not used, any segment size is allowed.

   Padding: variable

     The TCP header padding is used to ensure that the TCP header ends
     and data begins on a 32 bit boundary. The padding is composed of
     zeros.

3.2. Terminology

   Before we can discuss very much about the operation of the TCP we
   need to introduce some detailed terminology. The maintenance of a
   TCP connection requires the remembering of several variables. We
   conceive of these variables being stored in a connection record
   called a Transmission Control Block or TCB. Among the variables
   stored in the TCB are the local and remote socket numbers, the
   security and precedence of the connection, pointers to the user's
   send and receive buffers, pointers to the retransmit queue and to the
   current segment.
   In addition several variables relating to the send
   and receive sequence numbers are stored in the TCB.

   Send Sequence Variables

      SND.UNA - send unacknowledged
      SND.NXT - send next
      SND.WND - send window
      SND.UP  - send urgent pointer
      SND.WL1 - segment sequence number used for last window update
      SND.WL2 - segment acknowledgment number used for last window
                update
      ISS     - initial send sequence number

   Receive Sequence Variables

      RCV.NXT - receive next
      RCV.WND - receive window
      RCV.UP  - receive urgent pointer
      IRS     - initial receive sequence number

   The following diagrams may help to relate some of these variables to
   the sequence space.

   Send Sequence Space

           1         2          3          4
      ----------|----------|----------|----------
             SND.UNA    SND.NXT    SND.UNA
                                  +SND.WND

   1 - old sequence numbers which have been acknowledged
   2 - sequence numbers of unacknowledged data
   3 - sequence numbers allowed for new data transmission
   4 - future sequence numbers which are not yet allowed

                          Send Sequence Space

                               Figure 2

   The send window is the portion of the sequence space labeled 3 in
   Figure 2.

   Receive Sequence Space

           1         2          3
      ----------|----------|----------
             RCV.NXT    RCV.NXT
                       +RCV.WND

   1 - old sequence numbers which have been acknowledged
   2 - sequence numbers allowed for new reception
   3 - future sequence numbers which are not yet allowed

                         Receive Sequence Space

                               Figure 3

   The receive window is the portion of the sequence space labeled 2 in
   Figure 3.

   There are also some variables used frequently in the discussion that
   take their values from the fields of the current segment.

   Current Segment Variables

      SEG.SEQ - segment sequence number
      SEG.ACK - segment acknowledgment number
      SEG.LEN - segment length
      SEG.WND - segment window
      SEG.UP  - segment urgent pointer
      SEG.PRC - segment precedence value

   A connection progresses through a series of states during its
   lifetime. The states are: LISTEN, SYN-SENT, SYN-RECEIVED,
   ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK,
   TIME-WAIT, and the fictional state CLOSED. CLOSED is fictional
   because it represents the state when there is no TCB, and therefore,
   no connection. Briefly the meanings of the states are:

      LISTEN - represents waiting for a connection request from any
      remote TCP and port.

      SYN-SENT - represents waiting for a matching connection request
      after having sent a connection request.

      SYN-RECEIVED - represents waiting for a confirming connection
      request acknowledgment after having both received and sent a
      connection request.

      ESTABLISHED - represents an open connection, data received can be
      delivered to the user. The normal state for the data transfer
      phase of the connection.

      FIN-WAIT-1 - represents waiting for a connection termination
      request from the remote TCP, or an acknowledgment of the
      connection termination request previously sent.

      FIN-WAIT-2 - represents waiting for a connection termination
      request from the remote TCP.

      CLOSE-WAIT - represents waiting for a connection termination
      request from the local user.

      CLOSING - represents waiting for a connection termination request
      acknowledgment from the remote TCP.

      LAST-ACK - represents waiting for an acknowledgment of the
      connection termination request previously sent to the remote TCP
      (which includes an acknowledgment of its connection termination
      request).

      TIME-WAIT - represents waiting for enough time to pass to be sure
      the remote TCP received the acknowledgment of its connection
      termination request.

      CLOSED - represents no connection state at all.

   A TCP connection progresses from one state to another in response to
   events. The events are the user calls, OPEN, SEND, RECEIVE, CLOSE,
   ABORT, and STATUS; the incoming segments, particularly those
   containing the SYN, ACK, RST and FIN flags; and timeouts.

   The state diagram in Figure 4 illustrates only state changes,
   together with the causing events and resulting actions, but addresses
   neither error conditions nor actions which are not connected with
   state changes. In a later section, more detail is offered with
   respect to the reaction of the TCP to events.

   NOTA BENE: this diagram is only a summary and must not be taken as
   the total specification.

                              +---------+ ---------\      active OPEN
                              | CLOSED  |            \    -----------
                              +---------+<---------\   \   create TCB
                                |     ^              \   \  snd SYN
                   passive OPEN |     |   CLOSE        \   \
                   ------------ |     | ----------       \   \
                    create TCB  |     | delete TCB         \   \
                                V     |                      \   \
                              +---------+            CLOSE    |    \
                              | LISTEN  |          ---------- |     |
                              +---------+          delete TCB |     |
                   rcv SYN      |     |     SEND              |     |
                  -----------   |     |    -------            |     V
 +---------+      snd SYN,ACK  /       \   snd SYN          +---------+
 |         |<-----------------           ------------------>|         |
 |   SYN   |                    rcv SYN                     |   SYN   |
 |   RCVD  |<-----------------------------------------------|   SENT  |
 |         |                    snd ACK                     |         |
 |         |------------------           -------------------|         |
 +---------+   rcv ACK of SYN  \       /  rcv SYN,ACK       +---------+
   |           --------------   |     |   -----------
   |                  x         |     |     snd ACK
   |                            V     V
   |  CLOSE                   +---------+
   | -------                  |  ESTAB  |
   | snd FIN                  +---------+
   |                   CLOSE    |     |    rcv FIN
   V                  -------   |     |    -------
 +---------+          snd FIN  /       \   snd ACK          +---------+
 |  FIN    |<-----------------           ------------------>|  CLOSE  |
 | WAIT-1  |------------------                              |   WAIT  |
 +---------+          rcv FIN  \                            +---------+
   | rcv ACK of FIN   -------   |                            CLOSE  |
   | --------------   snd ACK   |                           ------- |
   V        x                   V                           snd FIN V
 +---------+                  +---------+                   +---------+
 |FINWAIT-2|                  | CLOSING |                   | LAST-ACK|
 +---------+                  +---------+                   +---------+
   |                rcv ACK of FIN |                 rcv ACK of FIN |
   |  rcv FIN       -------------- |    Timeout=2MSL -------------- |
   |  -------              x       V    ------------        x       V
    \ snd ACK                 +---------+delete TCB         +---------+
     ------------------------>|TIME WAIT|------------------>| CLOSED  |
                              +---------+                   +---------+

                      TCP Connection State Diagram

                               Figure 4

3.3. Sequence Numbers

   A fundamental notion in the design is that every octet of data sent
   over a TCP connection has a sequence number. Since every octet is
   sequenced, each of them can be acknowledged. The acknowledgment
   mechanism employed is cumulative so that an acknowledgment of
   sequence number X indicates that all octets up to but not including X
   have been received. This mechanism allows for straight-forward
   duplicate detection in the presence of retransmission. Octets within
   a segment are numbered so that the first data octet immediately
   following the header is the lowest numbered, and the following octets
   are numbered consecutively.

   It is essential to remember that the actual sequence number space is
   finite, though very large. This space ranges from 0 to 2**32 - 1.
   Since the space is finite, all arithmetic dealing with sequence
   numbers must be performed modulo 2**32. This unsigned arithmetic
   preserves the relationship of sequence numbers as they cycle from
   2**32 - 1 to 0 again. There are some subtleties to computer modulo
   arithmetic, so great care should be taken in programming the
   comparison of such values. The symbol "=<" means "less than or
   equal" (modulo 2**32).
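
   The wraparound comparisons described above can be sketched in Python.
   This is a minimal illustration and not part of the specification. The
   helper names (seq_add, seq_lt, seq_leq) are hypothetical, and the rule
   that one number precedes another when it is less than 2**31 behind it
   is a common implementation convention assumed here; this text does
   not mandate it.

```python
MOD = 2**32  # size of the sequence number space


def seq_add(seq, n):
    """Advance a sequence number by n octets, wrapping modulo 2**32."""
    return (seq + n) % MOD


def seq_lt(a, b):
    """Return True if a precedes b.

    Assumption (not from the text): the two values are less than
    2**31 apart, so the modular distance from a to b decides order.
    """
    return 0 < (b - a) % MOD < 2**31


def seq_leq(a, b):
    """The "=<" relation: a equals b, or a precedes b (modulo 2**32)."""
    return a == b or seq_lt(a, b)
```

   With these helpers, the value 2**32 - 1 compares as preceding 0, which
   is the behavior needed when the sequence space cycles.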

   The typical kinds of sequence number comparisons which the TCP must
   perform include:

      (a) Determining that an acknowledgment refers to some sequence
      number sent but not yet acknowledged.

      (b) Determining that all sequence numbers occupied by a segment
      have been acknowledged (e.g., to remove the segment from a
      retransmission queue).

      (c) Determining that an incoming segment contains sequence numbers
      which are expected (i.e., that the segment "overlaps" the receive
      window).

   In response to sending data the TCP will receive acknowledgments.
   The following comparisons are needed to process the acknowledgments.

      SND.UNA = oldest unacknowledged sequence number

      SND.NXT = next sequence number to be sent

      SEG.ACK = acknowledgment from the receiving TCP (next sequence
      number expected by the receiving TCP)

      SEG.SEQ = first sequence number of a segment

      SEG.LEN = the number of octets occupied by the data in the segment
      (counting SYN and FIN)

      SEG.SEQ+SEG.LEN-1 = last sequence number of a segment

   A new acknowledgment (called an "acceptable ack"), is one for which
   the inequality below holds:

      SND.UNA < SEG.ACK =< SND.NXT

   A segment on the retransmission queue is fully acknowledged if the
   sum of its sequence number and length is less than or equal to the
   acknowledgment value in the incoming segment.
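
   The two acknowledgment tests above can be sketched as follows. This
   is an illustrative Python sketch under stated assumptions, not a
   normative algorithm: the function names are hypothetical, and both
   comparisons are made relative to SND.UNA so that they remain valid
   when the sequence space wraps. The second function assumes the queued
   segment begins at or after SND.UNA and that SEG.ACK has already
   passed the acceptable-ack test.

```python
MOD = 2**32  # size of the sequence number space


def is_acceptable_ack(snd_una, seg_ack, snd_nxt):
    """Test SND.UNA < SEG.ACK =< SND.NXT using offsets from SND.UNA."""
    outstanding = (snd_nxt - snd_una) % MOD  # octets sent, not yet acked
    newly_acked = (seg_ack - snd_una) % MOD  # octets this ACK would cover
    return 0 < newly_acked <= outstanding


def is_fully_acked(seg_seq, seg_len, seg_ack, snd_una):
    """Test SEG.SEQ + SEG.LEN =< SEG.ACK for a queued segment.

    Assumes the segment starts at or after SND.UNA and that seg_ack
    is an acceptable ack (hypothetical preconditions of this sketch).
    """
    segment_end = (seg_seq + seg_len - snd_una) % MOD
    ack_offset = (seg_ack - snd_una) % MOD
    return segment_end <= ack_offset
```

   For example, with SND.UNA = 2**32 - 100 and SND.NXT = 400, an incoming
   SEG.ACK of 50 is acceptable even though it is numerically smaller than
   SND.UNA, because the comparison is made modulo 2**32.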

   When data is received the following comparisons are needed:

      RCV.NXT = next sequence number expected on an incoming segment,
      and is the left or lower edge of the receive window

      RCV.NXT+RCV.WND-1 = last sequence number expected on an incoming
      segment, and is the right or upper edge of the receive window

      SEG.SEQ = first sequence number occupied by the incoming segment

      SEG.SEQ+SEG.LEN-1 = last sequence number occupied by the incoming
      segment

   A segment is judged to occupy a portion of valid receive sequence
   space if

      RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND

   or

      RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND

   The first part of this test checks to see if the beginning of the
   segment falls in the window, the second part of the test checks to
   see if the end of the segment falls in the window; if the segment
   passes either part of the test it contains data in the window.

   Actually, it is a little more complicated than this. Due to zero
   windows and zero length segments, we have four cases for the
   acceptability of an incoming segment:

      Segment Receive  Test
      Length  Window
      ------- -------  -------------------------------------------

         0       0     SEG.SEQ = RCV.NXT

         0      >0     RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND

        >0       0     not acceptable

        >0      >0     RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND
                    or RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND

   Note that when the receive window is zero no segments should be
   acceptable except ACK segments. Thus, it is possible for a TCP to
   maintain a zero receive window while transmitting data and receiving
   ACKs. However, even when the receive window is zero, a TCP must
   process the RST and URG fields of all incoming segments.

   We have taken advantage of the numbering scheme to protect certain
   control information as well.
   This is achieved by implicitly
   including some control flags in the sequence space so they can be
   retransmitted and acknowledged without confusion (i.e., one and only
   one copy of the control will be acted upon). Control information is
   not physically carried in the segment data space. Consequently, we
   must adopt rules for implicitly assigning sequence numbers to
   control. The SYN and FIN are the only controls requiring this
   protection, and these controls are used only at connection opening
   and closing. For sequence number purposes, the SYN is considered to
   occur before the first actual data octet of the segment in which it
   occurs, while the FIN is considered to occur after the last actual
   data octet in a segment in which it occurs. The segment length
   (SEG.LEN) includes both data and sequence space occupying controls.
   When a SYN is present then SEG.SEQ is the sequence number of the SYN.

   Initial Sequence Number Selection

   The protocol places no restriction on a particular connection being
   used over and over again. A connection is defined by a pair of
   sockets. New instances of a connection will be referred to as
   incarnations of the connection. The problem that arises from this is
   -- "how does the TCP identify duplicate segments from previous
   incarnations of the connection?" This problem becomes apparent if
   the connection is being opened and closed in quick succession, or if
   the connection breaks with loss of memory and is then reestablished.

   To avoid confusion we must prevent segments from one incarnation of a
   connection from being used while the same sequence numbers may still
   be present in the network from an earlier incarnation. We want to
   assure this, even if a TCP crashes and loses all knowledge of the
   sequence numbers it has been using.
   When new connections are
   created, an initial sequence number (ISN) generator is employed which
   selects a new 32 bit ISN. The generator is bound to a (possibly
   fictitious) 32 bit clock whose low order bit is incremented roughly
   every 4 microseconds. Thus, the ISN cycles approximately every 4.55
   hours. Since we assume that segments will stay in the network no
   more than the Maximum Segment Lifetime (MSL) and that the MSL is less
   than 4.55 hours we can reasonably assume that ISN's will be unique.

   For each connection there is a send sequence number and a receive
   sequence number. The initial send sequence number (ISS) is chosen by
   the data sending TCP, and the initial receive sequence number (IRS)
   is learned during the connection establishing procedure.

   For a connection to be established or initialized, the two TCPs must
   synchronize on each other's initial sequence numbers. This is done
   in an exchange of connection establishing segments carrying a control
   bit called "SYN" (for synchronize) and the initial sequence numbers.
   As a shorthand, segments carrying the SYN bit are also called "SYNs".
   Hence, the solution requires a suitable mechanism for picking an
   initial sequence number and a slightly involved handshake to exchange
   the ISN's.

   The synchronization requires each side to send its own initial
   sequence number and to receive a confirmation of it in acknowledgment
   from the other side. Each side must also receive the other side's
   initial sequence number and send a confirming acknowledgment.

      1) A --> B  SYN my sequence number is X
      2) A <-- B  ACK your sequence number is X
      3) A <-- B  SYN my sequence number is Y
      4) A --> B  ACK your sequence number is Y

   Because steps 2 and 3 can be combined in a single message this is
   called the three way (or three message) handshake.
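
   The clock-driven ISN selection and the three message exchange above
   can be sketched in Python. This is an illustrative sketch only: the
   function names and the dictionary layout of a segment are
   hypothetical, and the use of the host's monotonic clock as the
   "(possibly fictitious)" 32 bit clock is an assumption. The
   acknowledgment values are X+1 and Y+1 because the SYN occupies one
   sequence number and the Acknowledgment Number field carries the next
   sequence number expected.

```python
import time

MOD = 2**32  # size of the sequence number space


def clock_isn(now_ns=None):
    """Return an ISN from a 32 bit counter that ticks every 4 microseconds.

    Assumption: the monotonic clock stands in for the ISN clock.
    """
    if now_ns is None:
        now_ns = time.monotonic_ns()
    return (now_ns // 4000) % MOD  # 4000 ns = 4 microseconds per tick


def three_way_handshake(iss_a, iss_b):
    """Return the three segments exchanged when A actively opens to B."""
    syn = {"src": "A", "flags": "SYN", "seq": iss_a, "ack": None}
    # Steps 2 and 3 combined: B acknowledges X and sends its own ISN Y.
    syn_ack = {"src": "B", "flags": "SYN,ACK", "seq": iss_b,
               "ack": (iss_a + 1) % MOD}
    # Step 4: A acknowledges Y; its own SYN has consumed sequence number X.
    ack = {"src": "A", "flags": "ACK", "seq": (iss_a + 1) % MOD,
           "ack": (iss_b + 1) % MOD}
    return [syn, syn_ack, ack]
```

   Calling three_way_handshake(clock_isn(), clock_isn()) yields one
   possible exchange; fixed values such as 100 and 300 make the
   resulting acknowledgment numbers (101 and 301) easy to follow.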
753 A three way handshake is necessary because sequence numbers are not 754 tied to a global clock in the network, and TCPs may have different 755 mechanisms for picking the ISN's. The receiver of the first SYN has 756 no way of knowing whether the segment was an old delayed one or not, 757 unless it remembers the last sequence number used on the connection 758 (which is not always possible), and so it must ask the sender to 759 verify this SYN. The three way handshake and the advantages of a 760 clock-driven scheme are discussed in [3]. 762 Knowing When to Keep Quiet 763 To be sure that a TCP does not create a segment that carries a 764 sequence number which may be duplicated by an old segment remaining 765 in the network, the TCP must keep quiet for a maximum segment 766 lifetime (MSL) before assigning any sequence numbers upon starting up 767 or recovering from a crash in which memory of sequence numbers in use 768 was lost. For this specification the MSL is taken to be 2 minutes. 769 This is an engineering choice, and may be changed if experience 770 indicates it is desirable to do so. Note that if a TCP is 771 reinitialized in some sense, yet retains its memory of sequence 772 numbers in use, then it need not wait at all; it must only be sure to 773 use sequence numbers larger than those recently used. 775 The TCP Quiet Time Concept 777 This specification provides that hosts which "crash" without 778 retaining any knowledge of the last sequence numbers transmitted on 779 each active (i.e., not closed) connection shall delay emitting any 780 TCP segments for at least the agreed Maximum Segment Lifetime (MSL) 781 in the internet system of which the host is a part. In the 782 paragraphs below, an explanation for this specification is given. 783 TCP implementors may violate the "quiet time" restriction, but only 784 at the risk of causing some old data to be accepted as new or new 785 data rejected as old duplicated by some receivers in the internet 786 system. 
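The quiet time rule can be summarized in a few lines of Python. This is a minimal sketch under stated assumptions: the function name and its parameters are hypothetical, and the MSL value is the 2 minutes chosen in this specification.

```python
MSL_SECONDS = 120.0  # this specification takes the MSL to be 2 minutes


def quiet_time_remaining(seconds_since_restart: float,
                         remembers_sequence_numbers: bool,
                         msl: float = MSL_SECONDS) -> float:
    """Return how long a restarted TCP should still keep quiet, in seconds."""
    if remembers_sequence_numbers:
        # Memory of the sequence numbers in use survived, so no wait is
        # needed; the TCP only has to use larger sequence numbers.
        return 0.0
    # Otherwise stay silent until one full MSL has elapsed since restart.
    return max(0.0, msl - seconds_since_restart)
```

A host that has been up for at least one MSL gets 0.0 back and may emit segments immediately.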
788 TCPs consume sequence number space each time a segment is formed and 789 entered into the network output queue at a source host. The 790 duplicate detection and sequencing algorithm in the TCP protocol 791 relies on the unique binding of segment data to sequence space to the 792 extent that sequence numbers will not cycle through all 2**32 values 793 before the segment data bound to those sequence numbers has been 794 delivered and acknowledged by the receiver and all duplicate copies 795 of the segments have "drained" from the internet. Without such an 796 assumption, two distinct TCP segments could conceivably be assigned 797 the same or overlapping sequence numbers, causing confusion at the 798 receiver as to which data is new and which is old. Remember that 799 each segment is bound to as many consecutive sequence numbers as 800 there are octets of data in the segment. 802 Under normal conditions, TCPs keep track of the next sequence number 803 to emit and the oldest awaiting acknowledgment so as to avoid 804 mistakenly using a sequence number over before its first use has been 805 acknowledged. This alone does not guarantee that old duplicate data 806 is drained from the net, so the sequence space has been made very 807 large to reduce the probability that a wandering duplicate will cause 808 trouble upon arrival. At 2 megabits/sec. it takes 4.5 hours to use 809 up 2**32 octets of sequence space. Since the maximum segment 810 lifetime in the net is not likely to exceed a few tens of seconds, 811 this is deemed ample protection for foreseeable nets, even if data 812 rates escalate to 10's of megabits/sec. At 100 megabits/sec, the 813 cycle time is 5.4 minutes which may be a little short, but still 814 within reason. 816 The basic duplicate detection and sequencing algorithm in TCP can be 817 defeated, however, if a source TCP does not have any memory of the 818 sequence numbers it last used on a given connection.
For example, if 819 the TCP were to start all connections with sequence number 0, then 820 upon crashing and restarting, a TCP might re-form an earlier 821 connection (possibly after half-open connection resolution) and emit 822 packets with sequence numbers identical to or overlapping with 823 packets still in the network which were emitted on an earlier 824 incarnation of the same connection. In the absence of knowledge 825 about the sequence numbers used on a particular connection, the TCP 826 specification recommends that the source delay for MSL seconds before 827 emitting segments on the connection, to allow time for segments from 828 the earlier connection incarnation to drain from the system. 830 Even hosts which can remember the time of day and use it to select 831 initial sequence number values are not immune from this problem 832 (i.e., even if time of day is used to select an initial sequence 833 number for each new connection incarnation). 835 Suppose, for example, that a connection is opened starting with 836 sequence number S. Suppose that this connection is not used much and 837 that eventually the initial sequence number function (ISN(t)) takes 838 on a value equal to the sequence number, say S1, of the last segment 839 sent by this TCP on a particular connection. Now suppose, at this 840 instant, the host crashes, recovers, and establishes a new 841 incarnation of the connection. The initial sequence number chosen is 842 S1 = ISN(t) -- last used sequence number on old incarnation of 843 connection! If the recovery occurs quickly enough, any old 844 duplicates in the net bearing sequence numbers in the neighborhood of 845 S1 may arrive and be treated as new packets by the receiver of the 846 new incarnation of the connection. 848 The problem is that the recovering host may not know for how long it 849 crashed nor does it know whether there are still old duplicates in 850 the system from earlier connection incarnations.
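As a rough check on the sequence space cycle times quoted earlier in this discussion, the time needed to consume all 2**32 octets at a given data rate can be computed directly. The function names below are hypothetical, and the sketch assumes the link is kept continuously busy. The figures quoted in the text are approximate, so the computed values should be expected to agree with them only roughly.

```python
SEQ_SPACE_OCTETS = 2 ** 32  # one sequence number per octet of data


def wrap_time_seconds(bits_per_second: float) -> float:
    """Seconds needed to use up the whole sequence space at this rate."""
    return SEQ_SPACE_OCTETS * 8 / bits_per_second


def wrap_exceeds_msl(bits_per_second: float, msl_seconds: float) -> bool:
    """True if old duplicates expire before sequence numbers can repeat."""
    return wrap_time_seconds(bits_per_second) > msl_seconds


hours_at_2_mbps = wrap_time_seconds(2e6) / 3600       # between 4 and 5 hours
minutes_at_100_mbps = wrap_time_seconds(100e6) / 60   # between 5 and 6 minutes
```

With the MSL of 2 minutes used in this specification, wrap_exceeds_msl(100e6, 120) is still true, which matches the remark that the cycle time at 100 megabits/sec is a little short but within reason.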
852 One way to deal with this problem is to deliberately delay emitting 853 segments for one MSL after recovery from a crash; this is the "quiet 854 time" specification. Hosts which prefer to avoid waiting and are willing 855 to risk possible confusion of old and new packets at a given 856 destination may choose not to wait for the "quiet time". 857 Implementors may provide TCP users with the ability to select on a 858 connection by connection basis whether to wait after a crash, or may 859 informally implement the "quiet time" for all connections. 860 Obviously, even where a user selects to "wait," this is not necessary 861 after the host has been "up" for at least MSL seconds. 863 To summarize: every segment emitted occupies one or more sequence 864 numbers in the sequence space; the numbers occupied by a segment are 865 "busy" or "in use" until MSL seconds have passed; upon crashing, a 866 block of space-time is occupied by the octets of the last emitted 867 segment; and if a new connection is started too soon and uses any of the 868 sequence numbers in the space-time footprint of the last segment of 869 the previous connection incarnation, there is a potential sequence 870 number overlap area which could cause confusion at the receiver. 872 3.4. Establishing a Connection 874 The "three-way handshake" is the procedure used to establish a 875 connection. This procedure normally is initiated by one TCP and 876 responded to by another TCP. The procedure also works if two TCPs 877 simultaneously initiate the procedure. When a simultaneous attempt 878 occurs, each TCP receives a "SYN" segment which carries no 879 acknowledgment after it has sent a "SYN". Of course, the arrival of 880 an old duplicate "SYN" segment can potentially make it appear, to the 881 recipient, that a simultaneous connection initiation is in progress. 882 Proper use of "reset" segments can disambiguate these cases. 884 Several examples of connection initiation follow.
Although these 885 examples do not show connection synchronization using data-carrying 886 segments, this is perfectly legitimate, so long as the receiving TCP 887 doesn't deliver the data to the user until it is clear the data is 888 valid (i.e., the data must be buffered at the receiver until the 889 connection reaches the ESTABLISHED state). The three-way handshake 890 reduces the possibility of false connections. It is the 891 implementation of a trade-off between memory and messages to provide 892 information for this checking. 894 The simplest three-way handshake is shown in Figure 5 below. The 895 figures should be interpreted in the following way. Each line is 896 numbered for reference purposes. Right arrows (-->) indicate 897 departure of a TCP segment from TCP A to TCP B, or arrival of a 898 segment at B from A. Left arrows (<--), indicate the reverse. 899 Ellipsis (...) indicates a segment which is still in the network 900 (delayed). An "XXX" indicates a segment which is lost or rejected. 901 Comments appear in parentheses. TCP states represent the state AFTER 902 the departure or arrival of the segment (whose contents are shown in 903 the center of each line). Segment contents are shown in abbreviated 904 form, with sequence number, control flags, and ACK field. Other 905 fields such as window, addresses, lengths, and text have been left 906 out in the interest of clarity. 908 TCP A TCP B 910 1. CLOSED LISTEN 912 2. SYN-SENT --> --> SYN-RECEIVED 914 3. ESTABLISHED <-- <-- SYN-RECEIVED 916 4. ESTABLISHED --> --> ESTABLISHED 918 5. ESTABLISHED --> --> ESTABLISHED 920 Basic 3-Way Handshake for Connection Synchronization 922 Figure 5 924 In line 2 of Figure 5, TCP A begins by sending a SYN segment 925 indicating that it will use sequence numbers starting with sequence 926 number 100. In line 3, TCP B sends a SYN and acknowledges the SYN it 927 received from TCP A. 
Note that the acknowledgment field indicates 928 TCP B is now expecting to hear sequence 101, acknowledging the SYN 929 which occupied sequence 100. 931 At line 4, TCP A responds with an empty segment containing an ACK for 932 TCP B's SYN; and in line 5, TCP A sends some data. Note that the 933 sequence number of the segment in line 5 is the same as in line 4 934 because the ACK does not occupy sequence number space (if it did, we 935 would wind up ACKing ACK's!). 937 Simultaneous initiation is only slightly more complex, as is shown in 938 Figure 6. Each TCP cycles from CLOSED to SYN-SENT to SYN-RECEIVED to 939 ESTABLISHED. 941 TCP A TCP B 943 1. CLOSED CLOSED 945 2. SYN-SENT --> ... 947 3. SYN-RECEIVED <-- <-- SYN-SENT 949 4. ... --> SYN-RECEIVED 951 5. SYN-RECEIVED --> ... 953 6. ESTABLISHED <-- <-- SYN-RECEIVED 955 7. ... --> ESTABLISHED 957 Simultaneous Connection Synchronization 959 Figure 6 961 The principal reason for the three-way handshake is to prevent old 962 duplicate connection initiations from causing confusion. To deal 963 with this, a special control message, reset, has been devised. If 964 the receiving TCP is in a non-synchronized state (i.e., SYN-SENT, 965 SYN-RECEIVED), it returns to LISTEN on receiving an acceptable reset. 966 If the TCP is in one of the synchronized states (ESTABLISHED, FIN- 967 WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT), it 968 aborts the connection and informs its user. We discuss this latter 969 case under "half-open" connections below. 971 TCP A TCP B 973 1. CLOSED LISTEN 975 2. SYN-SENT --> ... 977 3. (duplicate) ... --> SYN-RECEIVED 979 4. SYN-SENT <-- <-- SYN-RECEIVED 981 5. SYN-SENT --> --> LISTEN 983 6. ... --> SYN-RECEIVED 985 7. SYN-SENT <-- <-- SYN-RECEIVED 987 8. ESTABLISHED --> --> ESTABLISHED 989 Recovery from Old Duplicate SYN 991 Figure 7 993 As a simple example of recovery from old duplicates, consider 994 Figure 7. At line 3, an old duplicate SYN arrives at TCP B.
TCP B 995 cannot tell that this is an old duplicate, so it responds normally 996 (line 4). TCP A detects that the ACK field is incorrect and returns 997 a RST (reset) with its SEQ field selected to make the segment 998 believable. TCP B, on receiving the RST, returns to the LISTEN 999 state. When the original SYN (pun intended) finally arrives at line 1000 6, the synchronization proceeds normally. If the SYN at line 6 had 1001 arrived before the RST, a more complex exchange might have occurred 1002 with RST's sent in both directions. 1004 Half-Open Connections and Other Anomalies 1006 An established connection is said to be "half-open" if one of the 1007 TCPs has closed or aborted the connection at its end without the 1008 knowledge of the other, or if the two ends of the connection have 1009 become desynchronized owing to a crash that resulted in loss of 1010 memory. Such connections will automatically become reset if an 1011 attempt is made to send data in either direction. However, half-open 1012 connections are expected to be unusual, and the recovery procedure is 1013 mildly involved. 1015 If at site A the connection no longer exists, then an attempt by the 1016 user at site B to send any data on it will result in the site B TCP 1017 receiving a reset control message. Such a message indicates to the 1018 site B TCP that something is wrong, and it is expected to abort the 1019 connection. 1021 Assume that two user processes A and B are communicating with one 1022 another when a crash occurs causing loss of memory to A's TCP. 1023 Depending on the operating system supporting A's TCP, it is likely 1024 that some error recovery mechanism exists. When the TCP is up again, 1025 A is likely to start again from the beginning or from a recovery 1026 point. As a result, A will probably try to OPEN the connection again 1027 or try to SEND on the connection it believes open. 
In the latter 1028 case, it receives the error message "connection not open" from the 1029 local (A's) TCP. In an attempt to establish the connection, A's TCP 1030 will send a segment containing SYN. This scenario leads to the 1031 example shown in Figure 8. After TCP A crashes, the user attempts to 1032 re-open the connection. TCP B, in the meantime, thinks the 1033 connection is open. 1035 TCP A TCP B 1037 1. (CRASH) (send 300,receive 100) 1039 2. CLOSED ESTABLISHED 1041 3. SYN-SENT --> --> (??) 1043 4. (!!) <-- <-- ESTABLISHED 1045 5. SYN-SENT --> --> (Abort!!) 1047 6. SYN-SENT CLOSED 1049 7. SYN-SENT --> --> 1051 Half-Open Connection Discovery 1053 Figure 8 1055 When the SYN arrives at line 3, TCP B, being in a synchronized state, 1056 and the incoming segment outside the window, responds with an 1057 acknowledgment indicating what sequence it next expects to hear (ACK 1058 100). TCP A sees that this segment does not acknowledge anything it 1059 sent and, being unsynchronized, sends a reset (RST) because it has 1060 detected a half-open connection. TCP B aborts at line 5. TCP A will 1061 continue to try to establish the connection; the problem is now 1062 reduced to the basic 3-way handshake of Figure 5. 1064 An interesting alternative case occurs when TCP A crashes and TCP B 1065 tries to send data on what it thinks is a synchronized connection. 1067 This is illustrated in Figure 9. In this case, the data arriving at 1068 TCP A from TCP B (line 2) is unacceptable because no such connection 1069 exists, so TCP A sends a RST. The RST is acceptable so TCP B 1070 processes it and aborts the connection. 1072 TCP A TCP B 1074 1. (CRASH) (send 300,receive 100) 1076 2. (??) <-- <-- ESTABLISHED 1078 3. --> --> (ABORT!!) 1080 Active Side Causes Half-Open Connection Discovery 1082 Figure 9 1084 In Figure 10, we find the two TCPs A and B with passive connections 1085 waiting for SYN. An old duplicate arriving at TCP B (line 2) stirs B 1086 into action. 
A SYN-ACK is returned (line 3) and causes TCP A to 1087 generate a RST (the ACK in line 3 is not acceptable). TCP B accepts 1088 the reset and returns to its passive LISTEN state. 1090 TCP A TCP B 1092 1. LISTEN LISTEN 1094 2. ... --> SYN-RECEIVED 1096 3. (??) <-- <-- SYN-RECEIVED 1098 4. --> --> (return to LISTEN!) 1100 5. LISTEN LISTEN 1102 Old Duplicate SYN Initiates a Reset on two Passive Sockets 1104 Figure 10 1106 A variety of other cases are possible, all of which are accounted for 1107 by the following rules for RST generation and processing. 1109 Reset Generation 1110 As a general rule, reset (RST) must be sent whenever a segment 1111 arrives which apparently is not intended for the current connection. 1112 A reset must not be sent if it is not clear that this is the case. 1114 There are three groups of states: 1116 1. If the connection does not exist (CLOSED) then a reset is sent 1117 in response to any incoming segment except another reset. In 1118 particular, SYNs addressed to a non-existent connection are 1119 rejected by this means. 1121 If the incoming segment has an ACK field, the reset takes its 1122 sequence number from the ACK field of the segment, otherwise the 1123 reset has sequence number zero and the ACK field is set to the sum 1124 of the sequence number and segment length of the incoming segment. 1125 The connection remains in the CLOSED state. 1127 2. If the connection is in any non-synchronized state (LISTEN, 1128 SYN-SENT, SYN-RECEIVED), and the incoming segment acknowledges 1129 something not yet sent (the segment carries an unacceptable ACK), 1130 or if an incoming segment has a security level or compartment 1131 which does not exactly match the level and compartment requested 1132 for the connection, a reset is sent. 
1134 If our SYN has not been acknowledged and the precedence level of 1135 the incoming segment is higher than the precedence level requested 1136 then either raise the local precedence level (if allowed by the 1137 user and the system) or send a reset; or if the precedence level 1138 of the incoming segment is lower than the precedence level 1139 requested then continue as if the precedence matched exactly (if 1140 the remote TCP cannot raise the precedence level to match ours 1141 this will be detected in the next segment it sends, and the 1142 connection will be terminated then). If our SYN has been 1143 acknowledged (perhaps in this incoming segment) the precedence 1144 level of the incoming segment must match the local precedence 1145 level exactly; if it does not, a reset must be sent. 1147 If the incoming segment has an ACK field, the reset takes its 1148 sequence number from the ACK field of the segment, otherwise the 1149 reset has sequence number zero and the ACK field is set to the sum 1150 of the sequence number and segment length of the incoming segment. 1151 The connection remains in the same state. 1153 3. If the connection is in a synchronized state (ESTABLISHED, 1154 FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT), 1155 any unacceptable segment (out of window sequence number or 1156 unacceptable acknowledgment number) must elicit only an empty 1157 acknowledgment segment containing the current send-sequence number 1158 and an acknowledgment indicating the next sequence number expected 1159 to be received, and the connection remains in the same state. 1161 If an incoming segment has a security level, or compartment, or 1162 precedence which does not exactly match the level, and 1163 compartment, and precedence requested for the connection, a reset 1164 is sent and the connection goes to the CLOSED state. The reset takes 1165 its sequence number from the ACK field of the incoming segment.
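The choice of sequence number and ACK field for a reset sent under groups 1 and 2 above can be sketched as follows. This is an illustration only: the Segment class and the reset_reply function are hypothetical names, and the sketch covers just the field selection, not the security, compartment, or precedence checks.

```python
from dataclasses import dataclass
from typing import Optional

SEQ_MOD = 2 ** 32  # sequence arithmetic is modulo 2**32


@dataclass(frozen=True)
class Segment:
    seq: int
    data_len: int = 0
    syn: bool = False
    fin: bool = False
    rst: bool = False
    ack: Optional[int] = None  # None means the ACK bit is off

    @property
    def seg_len(self) -> int:
        """SEG.LEN: data octets plus one each for SYN and FIN."""
        return self.data_len + int(self.syn) + int(self.fin)


def reset_reply(incoming: Segment) -> Optional[Segment]:
    """Build the RST answering an incoming segment, or None for a RST."""
    if incoming.rst:
        return None  # a reset is never sent in response to a reset
    if incoming.ack is not None:
        # The reset takes its sequence number from the incoming ACK field.
        return Segment(seq=incoming.ack, rst=True)
    # No ACK field: sequence number zero, and the ACK field is set to
    # SEG.SEQ + SEG.LEN of the incoming segment.
    return Segment(seq=0, rst=True,
                   ack=(incoming.seq + incoming.seg_len) % SEQ_MOD)
```

For example, a bare SYN with sequence number 100 arriving for a non-existent connection yields a reset with sequence number 0 and ACK field 101.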
1167 Reset Processing 1169 In all states except SYN-SENT, all reset (RST) segments are validated 1170 by checking their SEQ-fields. A reset is valid if its sequence 1171 number is in the window. In the SYN-SENT state (a RST received in 1172 response to an initial SYN), the RST is acceptable if the ACK field 1173 acknowledges the SYN. 1175 The receiver of a RST first validates it, then changes state. If the 1176 receiver was in the LISTEN state, it ignores it. If the receiver was 1177 in SYN-RECEIVED state and had previously been in the LISTEN state, 1178 then the receiver returns to the LISTEN state, otherwise the receiver 1179 aborts the connection and goes to the CLOSED state. If the receiver 1180 was in any other state, it aborts the connection and advises the user 1181 and goes to the CLOSED state. 1183 3.5. Closing a Connection 1185 CLOSE is an operation meaning "I have no more data to send." The 1186 notion of closing a full-duplex connection is subject to ambiguous 1187 interpretation, of course, since it may not be obvious how to treat 1188 the receiving side of the connection. We have chosen to treat CLOSE 1189 in a simplex fashion. The user who CLOSEs may continue to RECEIVE 1190 until he is told that the other side has CLOSED also. Thus, a 1191 program could initiate several SENDs followed by a CLOSE, and then 1192 continue to RECEIVE until signaled that a RECEIVE failed because the 1193 other side has CLOSED. We assume that the TCP will signal a user, 1194 even if no RECEIVEs are outstanding, that the other side has closed, 1195 so the user can terminate his side gracefully. A TCP will reliably 1196 deliver all buffers SENT before the connection was CLOSED so a user 1197 who expects no data in return need only wait to hear the connection 1198 was CLOSED successfully to know that all his data was received at the 1199 destination TCP. Users must keep reading connections they close for 1200 sending until the TCP says no more data. 
1202 There are essentially three cases: 1204 1) The user initiates by telling the TCP to CLOSE the connection 1205 2) The remote TCP initiates by sending a FIN control signal 1207 3) Both users CLOSE simultaneously 1209 Case 1: Local user initiates the close 1211 In this case, a FIN segment can be constructed and placed on the 1212 outgoing segment queue. No further SENDs from the user will be 1213 accepted by the TCP, and it enters the FIN-WAIT-1 state. RECEIVEs 1214 are allowed in this state. All segments preceding and including 1215 FIN will be retransmitted until acknowledged. When the other TCP 1216 has both acknowledged the FIN and sent a FIN of its own, the first 1217 TCP can ACK this FIN. Note that a TCP receiving a FIN will ACK 1218 but not send its own FIN until its user has CLOSED the connection 1219 also. 1221 Case 2: TCP receives a FIN from the network 1223 If an unsolicited FIN arrives from the network, the receiving TCP 1224 can ACK it and tell the user that the connection is closing. The 1225 user will respond with a CLOSE, upon which the TCP can send a FIN 1226 to the other TCP after sending any remaining data. The TCP then 1227 waits until its own FIN is acknowledged whereupon it deletes the 1228 connection. If an ACK is not forthcoming, after the user timeout 1229 the connection is aborted and the user is told. 1231 Case 3: both users close simultaneously 1233 A simultaneous CLOSE by users at both ends of a connection causes 1234 FIN segments to be exchanged. When all segments preceding the 1235 FINs have been processed and acknowledged, each TCP can ACK the 1236 FIN it has received. Both will, upon receiving these ACKs, delete 1237 the connection. 1239 TCP A TCP B 1241 1. ESTABLISHED ESTABLISHED 1243 2. (Close) 1244 FIN-WAIT-1 --> --> CLOSE-WAIT 1246 3. FIN-WAIT-2 <-- <-- CLOSE-WAIT 1248 4. (Close) 1249 TIME-WAIT <-- <-- LAST-ACK 1251 5. TIME-WAIT --> --> CLOSED 1253 6. 
(2 MSL) 1254 CLOSED 1256 Normal Close Sequence 1258 Figure 11 1260 TCP A TCP B 1262 1. ESTABLISHED ESTABLISHED 1264 2. (Close) (Close) 1265 FIN-WAIT-1 --> ... FIN-WAIT-1 1266 <-- <-- 1267 ... --> 1269 3. CLOSING --> ... CLOSING 1270 <-- <-- 1271 ... --> 1273 4. TIME-WAIT TIME-WAIT 1274 (2 MSL) (2 MSL) 1275 CLOSED CLOSED 1277 Simultaneous Close Sequence 1279 Figure 12 1281 3.6. Precedence and Security 1283 The intent is that a connection be allowed only between ports operating 1284 with exactly the same security and compartment values and at the 1285 higher of the precedence levels requested by the two ports. 1287 The precedence and security parameters used in TCP are exactly those 1288 defined in the Internet Protocol (IP) [2]. Throughout this TCP 1289 specification the term "security/compartment" is intended to indicate 1290 the security parameters used in IP including security, compartment, 1291 user group, and handling restriction. 1293 A connection attempt with mismatched security/compartment values or a 1294 lower precedence value must be rejected by sending a reset. 1295 Rejecting a connection due to too low a precedence only occurs after 1296 an acknowledgment of the SYN has been received. 1298 Note that TCP modules which operate only at the default value of 1299 precedence will still have to check the precedence of incoming 1300 segments and possibly raise the precedence level they use on the 1301 connection. 1303 The security parameters may be used even in a non-secure environment 1304 (the values would indicate unclassified data), thus hosts in non- 1305 secure environments must be prepared to receive the security 1306 parameters, though they need not send them. 1308 3.7. Data Communication 1310 Once the connection is established data is communicated by the 1311 exchange of segments.
Because segments may be lost due to errors 1312 (checksum test failure), or network congestion, TCP uses 1313 retransmission (after a timeout) to ensure delivery of every segment. 1314 Duplicate segments may arrive due to network or TCP retransmission. 1315 As discussed in the section on sequence numbers the TCP performs 1316 certain tests on the sequence and acknowledgment numbers in the 1317 segments to verify their acceptability. 1319 The sender of data keeps track of the next sequence number to use in 1320 the variable SND.NXT. The receiver of data keeps track of the next 1321 sequence number to expect in the variable RCV.NXT. The sender of 1322 data keeps track of the oldest unacknowledged sequence number in the 1323 variable SND.UNA. If the data flow is momentarily idle and all data 1324 sent has been acknowledged then the three variables will be equal. 1326 When the sender creates a segment and transmits it the sender 1327 advances SND.NXT. When the receiver accepts a segment it advances 1328 RCV.NXT and sends an acknowledgment. When the data sender receives 1329 an acknowledgment it advances SND.UNA. The extent to which the 1330 values of these variables differ is a measure of the delay in the 1331 communication. The amount by which the variables are advanced is the 1332 length of the data in the segment. Note that once in the ESTABLISHED 1333 state all segments must carry current acknowledgment information. 1335 The CLOSE user call implies a push function, as does the FIN control 1336 flag in an incoming segment. 1338 Retransmission Timeout 1340 Because of the variability of the networks that compose an 1341 internetwork system and the wide range of uses of TCP connections the 1342 retransmission timeout must be dynamically determined. One procedure 1343 for determining a retransmission time out is given here as an 1344 illustration. 
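The example procedure given in this section (a smoothed round trip time followed by a clamped timeout) can be sketched in Python as shown here. The function names and default values are assumptions chosen from within the example ranges this section suggests. How SRTT is seeded is not stated in the procedure; using the first RTT sample is one possible choice.

```python
def update_srtt(srtt: float, rtt: float, alpha: float = 0.875) -> float:
    """SRTT = (ALPHA * SRTT) + ((1 - ALPHA) * RTT)."""
    return alpha * srtt + (1.0 - alpha) * rtt


def retransmission_timeout(srtt: float,
                           beta: float = 2.0,
                           lbound: float = 1.0,
                           ubound: float = 60.0) -> float:
    """RTO = min[UBOUND, max[LBOUND, (BETA * SRTT)]], in seconds."""
    return min(ubound, max(lbound, beta * srtt))


def rto_after_samples(rtt_samples: list) -> float:
    """Fold a list of RTT measurements (seconds) into a final RTO."""
    srtt = rtt_samples[0]  # assumption: seed SRTT with the first sample
    for rtt in rtt_samples[1:]:
        srtt = update_srtt(srtt, rtt)
    return retransmission_timeout(srtt)
```

The lower and upper bounds keep the timer sane on both very fast and very slow paths: a 0.1 second SRTT still yields a 1 second timeout, and a 100 second SRTT is capped at 1 minute.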
1346 An Example Retransmission Timeout Procedure 1348 Measure the elapsed time between sending a data octet with a 1349 particular sequence number and receiving an acknowledgment that 1350 covers that sequence number (segments sent do not have to match 1351 segments received). This measured elapsed time is the Round Trip 1352 Time (RTT). Next compute a Smoothed Round Trip Time (SRTT) as: 1354 SRTT = ( ALPHA * SRTT ) + ((1-ALPHA) * RTT) 1356 and based on this, compute the retransmission timeout (RTO) as: 1358 RTO = min[UBOUND,max[LBOUND,(BETA*SRTT)]] 1360 where UBOUND is an upper bound on the timeout (e.g., 1 minute), 1361 LBOUND is a lower bound on the timeout (e.g., 1 second), ALPHA is 1362 a smoothing factor (e.g., .8 to .9), and BETA is a delay variance 1363 factor (e.g., 1.3 to 2.0). 1365 The Communication of Urgent Information 1367 The objective of the TCP urgent mechanism is to allow the sending 1368 user to stimulate the receiving user to accept some urgent data and 1369 to permit the receiving TCP to indicate to the receiving user when 1370 all the currently known urgent data has been received by the user. 1372 This mechanism permits a point in the data stream to be designated as 1373 the end of urgent information. Whenever this point is in advance of 1374 the receive sequence number (RCV.NXT) at the receiving TCP, that TCP 1375 must tell the user to go into "urgent mode"; when the receive 1376 sequence number catches up to the urgent pointer, the TCP must tell 1377 the user to go into "normal mode". If the urgent pointer is updated 1378 while the user is in "urgent mode", the update will be invisible to 1379 the user. 1381 The method employs an urgent field which is carried in all segments 1382 transmitted. The URG control flag indicates that the urgent field is 1383 meaningful and must be added to the segment sequence number to yield 1384 the urgent pointer. The absence of this flag indicates that there is 1385 no urgent data outstanding.
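The urgent pointer computation and the mode test described above can be sketched as follows. The function names are hypothetical. Treating "in advance of" as a comparison within half of the 32 bit sequence space is an assumption made so that the test keeps working across wrap-around; it is not spelled out in the text above.

```python
SEQ_MOD = 2 ** 32       # sequence numbers wrap modulo 2**32
HALF_SPACE = 2 ** 31    # assumption: "ahead" means less than half the space


def urgent_pointer(seg_seq: int, urgent_field: int) -> int:
    """Add the urgent field to the segment sequence number (URG set)."""
    return (seg_seq + urgent_field) % SEQ_MOD


def in_urgent_mode(rcv_nxt: int, urg_ptr: int) -> bool:
    """True while the urgent pointer is in advance of RCV.NXT."""
    distance_ahead = (urg_ptr - rcv_nxt) % SEQ_MOD
    return 0 < distance_ahead < HALF_SPACE
```

Once RCV.NXT catches up to the urgent pointer the distance becomes zero, in_urgent_mode returns False, and the user would be told to return to "normal mode".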
1387 To send an urgent indication the user must also send at least one 1388 data octet. If the sending user also indicates a push, timely 1389 delivery of the urgent information to the destination process is 1390 enhanced. 1392 Managing the Window 1394 The window sent in each segment indicates the range of sequence 1395 numbers the sender of the window (the data receiver) is currently 1396 prepared to accept. There is an assumption that this is related to 1397 the data buffer space currently available for this 1398 connection. 1400 Indicating a large window encourages transmissions. If more data 1401 arrives than can be accepted, it will be discarded. This will result 1402 in excessive retransmissions, adding unnecessarily to the load on the 1403 network and the TCPs. Indicating a small window may restrict the 1404 transmission of data to the point of introducing a round trip delay 1405 between each new segment transmitted. 1407 The mechanisms provided allow a TCP to advertise a large window and 1408 to subsequently advertise a much smaller window without having 1409 accepted that much data. This, so called "shrinking the window," is 1410 strongly discouraged. The robustness principle dictates that TCPs 1411 will not shrink the window themselves, but will be prepared for such 1412 behavior on the part of other TCPs. 1414 The sending TCP must be prepared to accept from the user and send at 1415 least one octet of new data even if the send window is zero. The 1416 sending TCP must regularly retransmit to the receiving TCP even when 1417 the window is zero. Two minutes is recommended for the 1418 retransmission interval when the window is zero. This retransmission 1419 is essential to guarantee that when either TCP has a zero window the 1420 re-opening of the window will be reliably reported to the other.
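A sender's handling of the send window, including the zero window case above, might look like the following sketch. The function names are hypothetical, and the decision to send the one-octet probe only when nothing is already outstanding (leaving later probes to the retransmission timer) is an assumption of this sketch rather than a rule stated in the text.

```python
SEQ_MOD = 2 ** 32
ZERO_WINDOW_RETRANSMIT_SECONDS = 120.0  # two minutes is recommended


def octets_to_send(queued: int, snd_una: int, snd_nxt: int,
                   snd_wnd: int) -> int:
    """Return how many new octets the sender may transmit right now."""
    if queued == 0:
        return 0
    in_flight = (snd_nxt - snd_una) % SEQ_MOD  # sent but unacknowledged
    if snd_wnd == 0:
        # Zero window: still send one octet so that the re-opening of the
        # window is reliably learned, unless a probe is already in flight.
        return 1 if in_flight == 0 else 0
    return min(queued, max(0, snd_wnd - in_flight))


def zero_window_retransmit_due(seconds_since_last_send: float) -> bool:
    """True when the zero window retransmission interval has elapsed."""
    return seconds_since_last_send >= ZERO_WINDOW_RETRANSMIT_SECONDS
```

The max(0, ...) guard also covers a peer that shrinks its window below the amount already in flight, which the sender has to tolerate even though the behavior is discouraged.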
1422 When the receiving TCP has a zero window and a segment arrives it 1423 must still send an acknowledgment showing its next expected sequence 1424 number and current window (zero). 1426 The sending TCP packages the data to be transmitted into segments 1427 which fit the current window, and may repackage segments on the 1428 retransmission queue. Such repackaging is not required, but may be 1429 helpful. 1431 In a connection with a one-way data flow, the window information will 1432 be carried in acknowledgment segments that all have the same sequence 1433 number so there will be no way to reorder them if they arrive out of 1434 order. This is not a serious problem, but it will allow the window 1435 information to be on occasion temporarily based on old reports from 1436 the data receiver. A refinement to avoid this problem is to act on 1437 the window information from segments that carry the highest 1438 acknowledgment number (that is, segments with acknowledgment number 1439 equal to or greater than the highest previously received). 1441 The window management procedure has significant influence on the 1442 communication performance. The following comments are suggestions to 1443 implementers. 1445 Window Management Suggestions 1447 Allocating a very small window causes data to be transmitted in 1448 many small segments when better performance is achieved using 1449 fewer large segments. 1451 One suggestion for avoiding small windows is for the receiver to 1452 defer updating a window until the additional allocation is at 1453 least X percent of the maximum allocation possible for the 1454 connection (where X might be 20 to 40). 1456 Another suggestion is for the sender to avoid sending small 1457 segments by waiting until the window is large enough before 1458 sending data. If the user signals a push function then the 1459 data must be sent even if it is a small segment.
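The receiver-side suggestion above, deferring a window update until the additional allocation reaches X percent of the maximum, can be sketched as a single predicate. The function and parameter names are hypothetical, and the default of 25 percent is simply one value from the suggested range of 20 to 40.

```python
def should_advertise_update(advertised: int, available: int,
                            max_window: int,
                            x_percent: float = 25.0) -> bool:
    """Decide whether the receiver should announce a larger window.

    advertised: window (octets) most recently offered to the sender
    available:  window (octets) the receiver could offer now
    max_window: maximum allocation possible for the connection
    """
    additional = available - advertised
    # Announce only when the gain is at least X percent of the maximum;
    # a negative gain never qualifies, so the window is never shrunk here.
    return additional * 100 >= x_percent * max_window
```

With a 4000 octet maximum and the default threshold, a gain of 500 octets is held back while a gain of 1000 octets is announced.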

      Note that the acknowledgments should not be delayed or unnecessary
      retransmissions will result. One strategy would be to send an
      acknowledgment when a small segment arrives (without updating the
      window information), and then to send another acknowledgment with
      new window information when the window is larger.

      The segment sent to probe a zero window may also begin a break up
      of transmitted data into smaller and smaller segments. If a
      segment containing a single data octet sent to probe a zero window
      is accepted, it consumes one octet of the window now available.
      If the sending TCP simply sends as much as it can whenever the
      window is nonzero, the transmitted data will be broken into
      alternating big and small segments. As time goes on, occasional
      pauses in the receiver making window allocation available will
      result in breaking the big segments into a small and not quite so
      big pair. And after a while the data transmission will be in
      mostly small segments.

      The suggestion here is that the TCP implementations need to
      actively attempt to combine small window allocations into larger
      windows, since the mechanisms for managing the window tend to lead
      to many small windows in the simplest-minded implementations.

3.8. Interfaces

   There are of course two interfaces of concern: the user/TCP interface
   and the TCP/lower-level interface. We have a fairly elaborate model
   of the user/TCP interface, but the interface to the lower level
   protocol module is left unspecified here, since it will be specified
   in detail by the specification of the lower level protocol. For the
   case that the lower level is IP we note some of the parameter values
   that TCPs might use.

3.8.1. User/TCP Interface

   The following functional description of user commands to the TCP is,
   at best, fictional, since every operating system will have different
   facilities. Consequently, we must warn readers that different TCP
   implementations may have different user interfaces. However, all
   TCPs must provide a certain minimum set of services to guarantee that
   all TCP implementations can support the same protocol hierarchy.
   This section specifies the functional interfaces required of all TCP
   implementations.

   TCP User Commands

      The following sections functionally characterize a USER/TCP
      interface. The notation used is similar to most procedure or
      function calls in high level languages, but this usage is not
      meant to rule out trap type service calls (e.g., SVCs, UUOs,
      EMTs).

      The user commands described below specify the basic functions the
      TCP must perform to support interprocess communication.
      Individual implementations must define their own exact format, and
      may provide combinations or subsets of the basic functions in
      single calls. In particular, some implementations may wish to
      automatically OPEN a connection on the first SEND or RECEIVE
      issued by the user for a given connection.

      In providing interprocess communication facilities, the TCP must
      not only accept commands, but must also return information to the
      processes it serves. The latter consists of:

         (a) general information about a connection (e.g., interrupts,
         remote close, binding of unspecified foreign socket).

         (b) replies to specific user commands indicating success or
         various types of failure.

      Open

         Format: OPEN (local port, foreign socket, active/passive [,
         timeout] [, precedence] [, security/compartment] [, options])
         -> local connection name

         We assume that the local TCP is aware of the identity of the
         processes it serves and will check the authority of the process
         to use the connection specified. Depending upon the
         implementation of the TCP, the local network and TCP
         identifiers for the source address will either be supplied by
         the TCP or the lower level protocol (e.g., IP). These
         considerations are the result of concern about security, to the
         extent that no TCP be able to masquerade as another one, and so
         on. Similarly, no process can masquerade as another without
         the collusion of the TCP.

         If the active/passive flag is set to passive, then this is a
         call to LISTEN for an incoming connection. A passive open may
         have either a fully specified foreign socket to wait for a
         particular connection or an unspecified foreign socket to wait
         for any call. A fully specified passive call can be made
         active by the subsequent execution of a SEND.

         A transmission control block (TCB) is created and partially
         filled in with data from the OPEN command parameters.

         On an active OPEN command, the TCP will begin the procedure to
         synchronize (i.e., establish) the connection at once.

         The timeout, if present, permits the caller to set up a timeout
         for all data submitted to TCP. If data is not successfully
         delivered to the destination within the timeout period, the TCP
         will abort the connection. The present global default is five
         minutes.

         The TCP or some component of the operating system will verify
         the user's authority to open a connection with the specified
         precedence or security/compartment.
         The absence of precedence or security/compartment specification
         in the OPEN call indicates the default values must be used.

         TCP will accept incoming requests as matching only if the
         security/compartment information is exactly the same and only
         if the precedence is equal to or higher than the precedence
         requested in the OPEN call.

         The precedence for the connection is the higher of the values
         requested in the OPEN call and received from the incoming
         request, and fixed at that value for the life of the
         connection. Implementers may want to give the user control of
         this precedence negotiation. For example, the user might be
         allowed to specify that the precedence must be exactly matched,
         or that any attempt to raise the precedence be confirmed by the
         user.

         A local connection name will be returned to the user by the
         TCP. The local connection name can then be used as a shorthand
         term for the connection defined by the <local socket, foreign
         socket> pair.

      Send

         Format: SEND (local connection name, buffer address, byte
         count, PUSH flag, URGENT flag [,timeout])

         This call causes the data contained in the indicated user
         buffer to be sent on the indicated connection. If the
         connection has not been opened, the SEND is considered an
         error. Some implementations may allow users to SEND first; in
         which case, an automatic OPEN would be done. If the calling
         process is not authorized to use this connection, an error is
         returned.

         If the PUSH flag is set, the data must be transmitted promptly
         to the receiver, and the PUSH bit will be set in the last TCP
         segment created from the buffer. If the PUSH flag is not set,
         the data may be combined with data from subsequent SENDs for
         transmission efficiency.

         If the URGENT flag is set, segments sent to the destination TCP
         will have the urgent pointer set.
         The receiving TCP will signal the urgent condition to the
         receiving process if the urgent pointer indicates that data
         preceding the urgent pointer has not been consumed by the
         receiving process. The purpose of urgent is to stimulate the
         receiver to process the urgent data and to indicate to the
         receiver when all the currently known urgent data has been
         received. The number of times the sending user's TCP signals
         urgent will not necessarily be equal to the number of times the
         receiving user will be notified of the presence of urgent data.

         If no foreign socket was specified in the OPEN, but the
         connection is established (e.g., because a LISTENing connection
         has become specific due to a foreign segment arriving for the
         local socket), then the designated buffer is sent to the
         implied foreign socket. Users who make use of OPEN with an
         unspecified foreign socket can make use of SEND without ever
         explicitly knowing the foreign socket address.

         However, if a SEND is attempted before the foreign socket
         becomes specified, an error will be returned. Users can use
         the STATUS call to determine the status of the connection. In
         some implementations the TCP may notify the user when an
         unspecified socket is bound.

         If a timeout is specified, the current user timeout for this
         connection is changed to the new one.

         In the simplest implementation, SEND would not return control
         to the sending process until either the transmission was
         complete or the timeout had been exceeded. However, this
         simple method is both subject to deadlocks (for example, both
         sides of the connection might try to do SENDs before doing any
         RECEIVEs) and offers poor performance, so it is not
         recommended.
         A more sophisticated implementation would return immediately to
         allow the process to run concurrently with network I/O, and,
         furthermore, to allow multiple SENDs to be in progress.
         Multiple SENDs are served in first come, first served order, so
         the TCP will queue those it cannot service immediately.

         We have implicitly assumed an asynchronous user interface in
         which a SEND later elicits some kind of SIGNAL or
         pseudo-interrupt from the serving TCP. An alternative is to
         return a response immediately. For instance, SENDs might return
         immediate local acknowledgment, even if the segment sent had
         not been acknowledged by the distant TCP. We could
         optimistically assume eventual success. If we are wrong, the
         connection will close anyway due to the timeout. In
         implementations of this kind (synchronous), there will still be
         some asynchronous signals, but these will deal with the
         connection itself, and not with specific segments or buffers.

         In order for the process to distinguish among error or success
         indications for different SENDs, it might be appropriate for
         the buffer address to be returned along with the coded response
         to the SEND request. TCP-to-user signals are discussed below,
         indicating the information which should be returned to the
         calling process.

      Receive

         Format: RECEIVE (local connection name, buffer address, byte
         count) -> byte count, urgent flag, push flag

         This command allocates a receiving buffer associated with the
         specified connection. If no OPEN precedes this command or the
         calling process is not authorized to use this connection, an
         error is returned.

         In the simplest implementation, control would not return to the
         calling program until either the buffer was filled, or some
         error occurred, but this scheme is highly subject to deadlocks.
         A more sophisticated implementation would permit several
         RECEIVEs to be outstanding at once. These would be filled as
         segments arrive. This strategy permits increased throughput at
         the cost of a more elaborate scheme (possibly asynchronous) to
         notify the calling program that a PUSH has been seen or a
         buffer filled.

         If enough data arrive to fill the buffer before a PUSH is seen,
         the PUSH flag will not be set in the response to the RECEIVE.
         The buffer will be filled with as much data as it can hold. If
         a PUSH is seen before the buffer is filled the buffer will be
         returned partially filled and PUSH indicated.

         If there is urgent data the user will have been informed as
         soon as it arrived via a TCP-to-user signal. The receiving
         user should thus be in "urgent mode". If the URGENT flag is
         on, additional urgent data remains. If the URGENT flag is off,
         this call to RECEIVE has returned all the urgent data, and the
         user may now leave "urgent mode". Note that data following the
         urgent pointer (non-urgent data) cannot be delivered to the
         user in the same buffer with preceding urgent data unless the
         boundary is clearly marked for the user.

         To distinguish among several outstanding RECEIVEs and to take
         care of the case that a buffer is not completely filled, the
         return code is accompanied by both a buffer pointer and a byte
         count indicating the actual length of the data received.

         Alternative implementations of RECEIVE might have the TCP
         allocate buffer storage, or the TCP might share a ring buffer
         with the user.

      Close

         Format: CLOSE (local connection name)

         This command causes the connection specified to be closed. If
         the connection is not open or the calling process is not
         authorized to use this connection, an error is returned.
         Closing connections is intended to be a graceful operation in
         the sense that outstanding SENDs will be transmitted (and
         retransmitted), as flow control permits, until all have been
         serviced. Thus, it should be acceptable to make several SEND
         calls, followed by a CLOSE, and expect all the data to be sent
         to the destination. It should also be clear that users should
         continue to RECEIVE on CLOSING connections, since the other
         side may be trying to transmit the last of its data. Thus,
         CLOSE means "I have no more to send" but does not mean "I will
         not receive any more." It may happen (if the user level
         protocol is not well thought out) that the closing side is
         unable to get rid of all its data before timing out. In this
         event, CLOSE turns into ABORT, and the closing TCP gives up.

         The user may CLOSE the connection at any time on his own
         initiative, or in response to various prompts from the TCP
         (e.g., remote close executed, transmission timeout exceeded,
         destination inaccessible).

         Because closing a connection requires communication with the
         foreign TCP, connections may remain in the closing state for a
         short time. Attempts to reopen the connection before the TCP
         replies to the CLOSE command will result in error responses.

         Close also implies push function.

      Status

         Format: STATUS (local connection name) -> status data

         This is an implementation dependent user command and could be
         excluded without adverse effect. Information returned would
         typically come from the TCB associated with the connection.

         This command returns a data block containing the following
         information:

            local socket,
            foreign socket,
            local connection name,
            receive window,
            send window,
            connection state,
            number of buffers awaiting acknowledgment,
            number of buffers pending receipt,
            urgent state,
            precedence,
            security/compartment,
            and transmission timeout.

         Depending on the state of the connection, or on the
         implementation itself, some of this information may not be
         available or meaningful. If the calling process is not
         authorized to use this connection, an error is returned. This
         prevents unauthorized processes from gaining information about
         a connection.

      Abort

         Format: ABORT (local connection name)

         This command causes all pending SENDs and RECEIVEs to be
         aborted, the TCB to be removed, and a special RESET message to
         be sent to the TCP on the other side of the connection.
         Depending on the implementation, users may receive abort
         indications for each outstanding SEND or RECEIVE, or may simply
         receive an ABORT-acknowledgment.

   TCP-to-User Messages

      It is assumed that the operating system environment provides a
      means for the TCP to asynchronously signal the user program.
      When the TCP does signal a user program, certain information is
      passed to the user. Often in the specification the information
      will be an error message. In other cases there will be
      information relating to the completion of processing a SEND or
      RECEIVE or other user call.

      The following information is provided:

         Local Connection Name                Always
         Response String                      Always
         Buffer Address                       Send & Receive
         Byte count (counts bytes received)   Receive
         Push flag                            Receive
         Urgent flag                          Receive

3.8.2. TCP/Lower-Level Interface

   The TCP calls on a lower level protocol module to actually send and
   receive information over a network.
   One case is that of the ARPA internetwork system where the lower
   level module is the Internet Protocol (IP) [2].

   If the lower level protocol is IP it provides arguments for a type of
   service and for a time to live. TCP uses the following settings for
   these parameters:

      Type of Service = Precedence: routine, Delay: normal, Throughput:
      normal, Reliability: normal; or 00000000.

      Time to Live = one minute, or 00111100.

         Note that the assumed maximum segment lifetime is two minutes.
         Here we explicitly ask that a segment be destroyed if it cannot
         be delivered by the internet system within one minute.

   If the lower level is IP (or other protocol that provides this
   feature) and source routing is used, the interface must allow the
   route information to be communicated. This is especially important
   so that the source and destination addresses used in the TCP checksum
   be the originating source and ultimate destination. It is also
   important to preserve the return route to answer connection requests.

   Any lower level protocol will have to provide the source address,
   destination address, and protocol fields, and some way to determine
   the "TCP length", both to provide a service functionally equivalent
   to that of IP and to be used in the TCP checksum.

3.9. Event Processing

   The processing depicted in this section is an example of one possible
   implementation. Other implementations may have slightly different
   processing sequences, but they should differ from those in this
   section only in detail, not in substance.

   The activity of the TCP can be characterized as responding to events.
   The events that occur can be cast into three categories: user calls,
   arriving segments, and timeouts. This section describes the
   processing the TCP does in response to each of the events. In many
   cases the processing required depends on the state of the connection.

   Events that occur:

      User Calls

         OPEN
         SEND
         RECEIVE
         CLOSE
         ABORT
         STATUS

      Arriving Segments

         SEGMENT ARRIVES

      Timeouts

         USER TIMEOUT
         RETRANSMISSION TIMEOUT
         TIME-WAIT TIMEOUT

   The model of the TCP/user interface is that user commands receive an
   immediate return and possibly a delayed response via an event or
   pseudo interrupt. In the following descriptions, the term "signal"
   means cause a delayed response.

   Error responses are given as character strings. For example, user
   commands referencing connections that do not exist receive "error:
   connection not open".

   Please note in the following that all arithmetic on sequence numbers,
   acknowledgment numbers, windows, et cetera, is modulo 2**32, the size
   of the sequence number space. Also note that "=<" means less than or
   equal to (modulo 2**32).

   A natural way to think about processing incoming segments is to
   imagine that they are first tested for proper sequence number (i.e.,
   that their contents lie in the range of the expected "receive window"
   in the sequence number space) and then that they are generally queued
   and processed in sequence number order.

   When a segment overlaps other already received segments we
   reconstruct the segment to contain just the new data, and adjust the
   header fields to be consistent.

   Note that if no state change is mentioned the TCP stays in the same
   state.

   OPEN Call

     CLOSED STATE (i.e., TCB does not exist)

       Create a new transmission control block (TCB) to hold
       connection state information. Fill in local socket identifier,
       foreign socket, precedence, security/compartment, and user
       timeout information. Note that some parts of the foreign
       socket may be unspecified in a passive OPEN and are to be
       filled in by the parameters of the incoming SYN segment.
       Verify the security and precedence requested are allowed for
       this user; if not, return "error: precedence not allowed" or
       "error: security/compartment not allowed." If passive enter
       the LISTEN state and return. If active and the foreign socket
       is unspecified, return "error: foreign socket unspecified"; if
       active and the foreign socket is specified, issue a SYN
       segment. An initial send sequence number (ISS) is selected. A
       SYN segment of the form <SEQ=ISS><CTL=SYN> is sent. Set
       SND.UNA to ISS, SND.NXT to ISS+1, enter SYN-SENT state, and
       return.

       If the caller does not have access to the local socket
       specified, return "error: connection illegal for this process".
       If there is no room to create a new connection, return "error:
       insufficient resources".

     LISTEN STATE

       If active and the foreign socket is specified, then change the
       connection from passive to active, select an ISS. Send a SYN
       segment, set SND.UNA to ISS, SND.NXT to ISS+1. Enter SYN-SENT
       state. Data associated with SEND may be sent with SYN segment
       or queued for transmission after entering ESTABLISHED state.
       The urgent bit if requested in the command must be sent with
       the data segments sent as a result of this command. If there
       is no room to queue the request, respond with "error:
       insufficient resources". If the foreign socket was not
       specified, then return "error: foreign socket unspecified".

     SYN-SENT STATE
     SYN-RECEIVED STATE
     ESTABLISHED STATE
     FIN-WAIT-1 STATE
     FIN-WAIT-2 STATE
     CLOSE-WAIT STATE
     CLOSING STATE
     LAST-ACK STATE
     TIME-WAIT STATE

       Return "error: connection already exists".

   SEND Call

     CLOSED STATE (i.e., TCB does not exist)

       If the user does not have access to such a connection, then
       return "error: connection illegal for this process".

       Otherwise, return "error: connection does not exist".

     LISTEN STATE

       If the foreign socket is specified, then change the connection
       from passive to active, select an ISS. Send a SYN segment, set
       SND.UNA to ISS, SND.NXT to ISS+1. Enter SYN-SENT state. Data
       associated with SEND may be sent with SYN segment or queued for
       transmission after entering ESTABLISHED state. The urgent bit
       if requested in the command must be sent with the data segments
       sent as a result of this command. If there is no room to queue
       the request, respond with "error: insufficient resources". If
       the foreign socket was not specified, then return "error:
       foreign socket unspecified".

     SYN-SENT STATE
     SYN-RECEIVED STATE

       Queue the data for transmission after entering ESTABLISHED
       state. If no space to queue, respond with "error: insufficient
       resources".

     ESTABLISHED STATE
     CLOSE-WAIT STATE

       Segmentize the buffer and send it with a piggybacked
       acknowledgment (acknowledgment value = RCV.NXT). If there is
       insufficient space to remember this buffer, simply return
       "error: insufficient resources".

       If the urgent flag is set, then SND.UP <- SND.NXT-1 and set the
       urgent pointer in the outgoing segments.

     FIN-WAIT-1 STATE
     FIN-WAIT-2 STATE
     CLOSING STATE
     LAST-ACK STATE
     TIME-WAIT STATE

       Return "error: connection closing" and do not service request.

   RECEIVE Call

     CLOSED STATE (i.e., TCB does not exist)

       If the user does not have access to such a connection, return
       "error: connection illegal for this process".

       Otherwise return "error: connection does not exist".

     LISTEN STATE
     SYN-SENT STATE
     SYN-RECEIVED STATE

       Queue for processing after entering ESTABLISHED state. If
       there is no room to queue this request, respond with "error:
       insufficient resources".

     ESTABLISHED STATE
     FIN-WAIT-1 STATE
     FIN-WAIT-2 STATE

       If insufficient incoming segments are queued to satisfy the
       request, queue the request. If there is no queue space to
       remember the RECEIVE, respond with "error: insufficient
       resources".

       Reassemble queued incoming segments into receive buffer and
       return to user. Mark "push seen" (PUSH) if this is the case.

       If RCV.UP is in advance of the data currently being passed to
       the user notify the user of the presence of urgent data.

       When the TCP takes responsibility for delivering data to the
       user that fact must be communicated to the sender via an
       acknowledgment. The formation of such an acknowledgment is
       described below in the discussion of processing an incoming
       segment.

     CLOSE-WAIT STATE

       Since the remote side has already sent FIN, RECEIVEs must be
       satisfied by text already on hand, but not yet delivered to the
       user. If no text is awaiting delivery, the RECEIVE will get an
       "error: connection closing" response. Otherwise, any remaining
       text can be used to satisfy the RECEIVE.

     CLOSING STATE
     LAST-ACK STATE
     TIME-WAIT STATE

       Return "error: connection closing".

   CLOSE Call

     CLOSED STATE (i.e., TCB does not exist)

       If the user does not have access to such a connection, return
       "error: connection illegal for this process".

       Otherwise, return "error: connection does not exist".

     LISTEN STATE

       Any outstanding RECEIVEs are returned with "error: closing"
       responses. Delete TCB, enter CLOSED state, and return.

     SYN-SENT STATE

       Delete the TCB and return "error: closing" responses to any
       queued SENDs, or RECEIVEs.

     SYN-RECEIVED STATE

       If no SENDs have been issued and there is no pending data to
       send, then form a FIN segment and send it, and enter FIN-WAIT-1
       state; otherwise queue for processing after entering
       ESTABLISHED state.

     ESTABLISHED STATE

       Queue this until all preceding SENDs have been segmentized,
       then form a FIN segment and send it. In any case, enter
       FIN-WAIT-1 state.

     FIN-WAIT-1 STATE
     FIN-WAIT-2 STATE

       Strictly speaking, this is an error and should receive an
       "error: connection closing" response. An "ok" response would
       be acceptable, too, as long as a second FIN is not emitted (the
       first FIN may be retransmitted though).

     CLOSE-WAIT STATE

       Queue this request until all preceding SENDs have been
       segmentized; then send a FIN segment, enter LAST-ACK state.

     CLOSING STATE
     LAST-ACK STATE
     TIME-WAIT STATE

       Respond with "error: connection closing".

   ABORT Call

     CLOSED STATE (i.e., TCB does not exist)

       If the user should not have access to such a connection, return
       "error: connection illegal for this process".

       Otherwise return "error: connection does not exist".

     LISTEN STATE

       Any outstanding RECEIVEs should be returned with "error:
       connection reset" responses. Delete TCB, enter CLOSED state,
       and return.

     SYN-SENT STATE

       All queued SENDs and RECEIVEs should be given "connection
       reset" notification, delete the TCB, enter CLOSED state, and
       return.

     SYN-RECEIVED STATE
     ESTABLISHED STATE
     FIN-WAIT-1 STATE
     FIN-WAIT-2 STATE
     CLOSE-WAIT STATE

       Send a reset segment:

         <SEQ=SND.NXT><CTL=RST>

       All queued SENDs and RECEIVEs should be given "connection
       reset" notification; all segments queued for transmission
       (except for the RST formed above) or retransmission should be
       flushed, delete the TCB, enter CLOSED state, and return.

     CLOSING STATE
     LAST-ACK STATE
     TIME-WAIT STATE

       Respond with "ok" and delete the TCB, enter CLOSED state, and
       return.

   STATUS Call

     CLOSED STATE (i.e., TCB does not exist)

       If the user should not have access to such a connection, return
       "error: connection illegal for this process".

       Otherwise return "error: connection does not exist".

     LISTEN STATE

       Return "state = LISTEN", and the TCB pointer.

     SYN-SENT STATE

       Return "state = SYN-SENT", and the TCB pointer.

     SYN-RECEIVED STATE

       Return "state = SYN-RECEIVED", and the TCB pointer.

     ESTABLISHED STATE

       Return "state = ESTABLISHED", and the TCB pointer.

     FIN-WAIT-1 STATE

       Return "state = FIN-WAIT-1", and the TCB pointer.

     FIN-WAIT-2 STATE

       Return "state = FIN-WAIT-2", and the TCB pointer.

     CLOSE-WAIT STATE

       Return "state = CLOSE-WAIT", and the TCB pointer.

     CLOSING STATE

       Return "state = CLOSING", and the TCB pointer.

     LAST-ACK STATE

       Return "state = LAST-ACK", and the TCB pointer.

     TIME-WAIT STATE

       Return "state = TIME-WAIT", and the TCB pointer.

   SEGMENT ARRIVES

     If the state is CLOSED (i.e., TCB does not exist) then

       all data in the incoming segment is discarded. An incoming
       segment containing a RST is discarded. An incoming segment not
       containing a RST causes a RST to be sent in response. The
       acknowledgment and sequence field values are selected to make
       the reset sequence acceptable to the TCP that sent the
       offending segment.

       If the ACK bit is off, sequence number zero is used,

         <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK>

       If the ACK bit is on,

         <SEQ=SEG.ACK><CTL=RST>

       Return.

     If the state is LISTEN then

       first check for an RST

         An incoming RST should be ignored. Return.

       second check for an ACK

         Any acknowledgment is bad if it arrives on a connection
         still in the LISTEN state. An acceptable reset segment
         should be formed for any arriving ACK-bearing segment. The
         RST should be formatted as follows:

           <SEQ=SEG.ACK><CTL=RST>

         Return.

       third check for a SYN

         If the SYN bit is set, check the security. If the security/
         compartment on the incoming segment does not exactly match
         the security/compartment in the TCB then send a reset and
         return.

           <SEQ=SEG.ACK><CTL=RST>

         If the SEG.PRC is greater than the TCB.PRC then if allowed
         by the user and the system set TCB.PRC<-SEG.PRC, if not
         allowed send a reset and return.

           <SEQ=SEG.ACK><CTL=RST>

         If the SEG.PRC is less than the TCB.PRC then continue.

         Set RCV.NXT to SEG.SEQ+1, IRS is set to SEG.SEQ and any
         other control or text should be queued for processing later.
         ISS should be selected and a SYN segment sent of the form:

           <SEQ=ISS><ACK=RCV.NXT><CTL=SYN,ACK>

         SND.NXT is set to ISS+1 and SND.UNA to ISS. The connection
         state should be changed to SYN-RECEIVED. Note that any
         other incoming control or data (combined with SYN) will be
         processed in the SYN-RECEIVED state, but processing of SYN
         and ACK should not be repeated. If the listen was not fully
         specified (i.e., the foreign socket was not fully
         specified), then the unspecified fields should be filled in
         now.

       fourth other text or control

         Any other control or text-bearing segment (not containing
         SYN) must have an ACK and thus would be discarded by the ACK
         processing. An incoming RST segment could not be valid,
         since it could not have been sent in response to anything
         sent by this incarnation of the connection. So you are
         unlikely to get here, but if you do, drop the segment, and
         return.

     If the state is SYN-SENT then

       first check the ACK bit

         If the ACK bit is set

           If SEG.ACK =< ISS, or SEG.ACK > SND.NXT, send a reset
           (unless the RST bit is set, if so drop the segment and
           return)

             <SEQ=SEG.ACK><CTL=RST>

           and discard the segment. Return.

           If SND.UNA =< SEG.ACK =< SND.NXT then the ACK is
           acceptable.

       second check the RST bit

         If the RST bit is set

           If the ACK was acceptable then signal the user "error:
           connection reset", drop the segment, enter CLOSED state,
           delete TCB, and return. Otherwise (no ACK) drop the
           segment and return.
       third check the security and precedence

         If the security/compartment in the segment does not exactly
         match the security/compartment in the TCB, send a reset

           If there is an ACK

             <SEQ=SEG.ACK><CTL=RST>

           Otherwise

             <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK>

         If there is an ACK

           The precedence in the segment must match the precedence
           in the TCB, if not, send a reset

             <SEQ=SEG.ACK><CTL=RST>

         If there is no ACK

           If the precedence in the segment is higher than the
           precedence in the TCB then if allowed by the user and the
           system raise the precedence in the TCB to that in the
           segment, if not allowed to raise the precedence then send
           a reset.

             <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK>

           If the precedence in the segment is lower than the
           precedence in the TCB continue.

         If a reset was sent, discard the segment and return.

       fourth check the SYN bit

         This step should be reached only if the ACK is ok, or there
         is no ACK, and the segment did not contain a RST.

         If the SYN bit is on and the security/compartment and
         precedence are acceptable then, RCV.NXT is set to SEG.SEQ+1,
         IRS is set to SEG.SEQ.  SND.UNA should be advanced to equal
         SEG.ACK (if there is an ACK), and any segments on the
         retransmission queue which are thereby acknowledged should
         be removed.

         If SND.UNA > ISS (our SYN has been ACKed), change the
         connection state to ESTABLISHED, form an ACK segment

           <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>

         and send it.  Data or controls which were queued for
         transmission may be included.  If there are other controls
         or text in the segment then continue processing at the sixth
         step below where the URG bit is checked, otherwise return.

         Otherwise enter SYN-RECEIVED, form a SYN,ACK segment

           <SEQ=ISS><ACK=RCV.NXT><CTL=SYN,ACK>

         and send it.  If there are other controls or text in the
         segment, queue them for processing after the ESTABLISHED
         state has been reached, return.

       fifth, if neither of the SYN or RST bits is set then drop the
       segment and return.
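The SYN-SENT steps above can be sketched in code.  The following is a minimal illustration, not a definitive implementation: it assumes sequence numbers are compared modulo 2**32, omits the security and precedence step, and returns a short action string instead of building segments.  The names Tcb, Segment, seq_lt, seq_le and syn_sent_arrives are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass

MOD = 2**32    # size of the sequence space (assumption: 32-bit numbers)
HALF = 2**31


def seq_lt(a: int, b: int) -> bool:
    """a < b in modulo-2**32 sequence space."""
    return 0 < (b - a) % MOD < HALF


def seq_le(a: int, b: int) -> bool:
    """a =< b in modulo-2**32 sequence space."""
    return (b - a) % MOD < HALF


@dataclass
class Tcb:                      # hypothetical, reduced TCB
    iss: int
    snd_una: int
    snd_nxt: int
    rcv_nxt: int = 0
    irs: int = 0
    state: str = "SYN-SENT"


@dataclass
class Segment:                  # hypothetical, reduced segment header
    seq: int
    ack: int = 0
    ack_bit: bool = False
    syn: bool = False
    rst: bool = False


def syn_sent_arrives(tcb: Tcb, seg: Segment) -> str:
    """Process a segment arriving in SYN-SENT; return the action taken."""
    # first: an ACK outside ISS < SEG.ACK =< SND.NXT is unacceptable
    if seg.ack_bit and (seq_le(seg.ack, tcb.iss) or seq_lt(tcb.snd_nxt, seg.ack)):
        return "drop" if seg.rst else "send RST"
    # second: a RST is honored only if it carried an acceptable ACK
    if seg.rst:
        if seg.ack_bit:
            tcb.state = "CLOSED"
            return "connection reset"
        return "drop"
    # third: security and precedence checks are omitted in this sketch
    # fourth: process the SYN
    if seg.syn:
        tcb.rcv_nxt = (seg.seq + 1) % MOD
        tcb.irs = seg.seq
        if seg.ack_bit:
            tcb.snd_una = seg.ack
        if seq_lt(tcb.iss, tcb.snd_una):        # our SYN has been ACKed
            tcb.state = "ESTABLISHED"
            return "send ACK"
        tcb.state = "SYN-RECEIVED"              # simultaneous open
        return "send SYN,ACK"
    # fifth: neither SYN nor RST
    return "drop"
```

A SYN,ACK that acknowledges ISS+1 moves the sketch to ESTABLISHED, while a bare SYN (simultaneous open) moves it to SYN-RECEIVED.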
     Otherwise,

       first check sequence number

         SYN-RECEIVED STATE
         ESTABLISHED STATE
         FIN-WAIT-1 STATE
         FIN-WAIT-2 STATE
         CLOSE-WAIT STATE
         CLOSING STATE
         LAST-ACK STATE
         TIME-WAIT STATE

           Segments are processed in sequence.  Initial tests on
           arrival are used to discard old duplicates, but further
           processing is done in SEG.SEQ order.  If a segment's
           contents straddle the boundary between old and new, only the
           new parts should be processed.

           There are four cases for the acceptability test for an
           incoming segment:

           Segment Receive  Test
           Length  Window
           ------- -------  -------------------------------------------

              0       0     SEG.SEQ = RCV.NXT

              0      >0     RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND

             >0       0     not acceptable

             >0      >0     RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND
                         or RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND

           If the RCV.WND is zero, no segments will be acceptable, but
           special allowance should be made to accept valid ACKs, URGs
           and RSTs.

           If an incoming segment is not acceptable, an acknowledgment
           should be sent in reply (unless the RST bit is set, if so
           drop the segment and return):

             <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>

           After sending the acknowledgment, drop the unacceptable
           segment and return.

           In the following it is assumed that the segment is the
           idealized segment that begins at RCV.NXT and does not exceed
           the window.  One could tailor actual segments to fit this
           assumption by trimming off any portions that lie outside the
           window (including SYN and FIN), and only processing further
           if the segment then begins at RCV.NXT.  Segments with higher
           beginning sequence numbers may be held for later processing.

       second check the RST bit,

         SYN-RECEIVED STATE

           If the RST bit is set

             If this connection was initiated with a passive OPEN
             (i.e., came from the LISTEN state), then return this
             connection to LISTEN state and return.
             The user need not be informed.  If this connection was
             initiated with an active OPEN (i.e., came from SYN-SENT
             state) then the connection was refused, signal the user
             "connection refused".  In either case, all segments on
             the retransmission queue should be removed.  And in
             the active OPEN case, enter the CLOSED state and
             delete the TCB, and return.

         ESTABLISHED
         FIN-WAIT-1
         FIN-WAIT-2
         CLOSE-WAIT

           If the RST bit is set then, any outstanding RECEIVEs and
           SEND should receive "reset" responses.  All segment
           queues should be flushed.  Users should also receive an
           unsolicited general "connection reset" signal.  Enter the
           CLOSED state, delete the TCB, and return.

         CLOSING STATE
         LAST-ACK STATE
         TIME-WAIT

           If the RST bit is set then, enter the CLOSED state,
           delete the TCB, and return.

       third check security and precedence

         SYN-RECEIVED

           If the security/compartment and precedence in the segment
           do not exactly match the security/compartment and
           precedence in the TCB then send a reset, and return.

         ESTABLISHED STATE

           If the security/compartment and precedence in the segment
           do not exactly match the security/compartment and
           precedence in the TCB then send a reset, any outstanding
           RECEIVEs and SEND should receive "reset" responses.  All
           segment queues should be flushed.  Users should also
           receive an unsolicited general "connection reset" signal.
           Enter the CLOSED state, delete the TCB, and return.

         Note this check is placed following the sequence check to
         prevent a segment from an old connection between these ports
         with a different security or precedence from causing an
         abort of the current connection.
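The four-case acceptability test used in the sequence number check (the first step above) can be sketched as follows.  This is an illustration only: it assumes 32-bit sequence numbers compared modulo 2**32, and the function names in_window and segment_acceptable are hypothetical.

```python
MOD = 2**32  # size of the sequence space (assumption: 32-bit numbers)


def in_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if RCV.NXT =< seq < RCV.NXT+RCV.WND, modulo 2**32."""
    return (seq - rcv_nxt) % MOD < rcv_wnd


def segment_acceptable(seg_seq: int, seg_len: int,
                       rcv_nxt: int, rcv_wnd: int) -> bool:
    """Apply the four-case test for an incoming segment."""
    if seg_len == 0:
        if rcv_wnd == 0:
            # zero length, zero window: must sit exactly at RCV.NXT
            return seg_seq == rcv_nxt
        # zero length, open window: the sequence number must be in it
        return in_window(seg_seq, rcv_nxt, rcv_wnd)
    if rcv_wnd == 0:
        # data arriving against a closed window is not acceptable
        return False
    # data, open window: the first or the last octet must be in it
    last = (seg_seq + seg_len - 1) % MOD
    return (in_window(seg_seq, rcv_nxt, rcv_wnd)
            or in_window(last, rcv_nxt, rcv_wnd))
```

For example, a 20-octet segment starting at 990 is acceptable when RCV.NXT is 1000 and RCV.WND is 100, because its last octet (1009) falls inside the window even though its first does not.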
       fourth, check the SYN bit,

         SYN-RECEIVED
         ESTABLISHED STATE
         FIN-WAIT-1 STATE
         FIN-WAIT-2 STATE
         CLOSE-WAIT STATE
         CLOSING STATE
         LAST-ACK STATE
         TIME-WAIT STATE

           If the SYN is in the window it is an error, send a reset,
           any outstanding RECEIVEs and SEND should receive "reset"
           responses, all segment queues should be flushed, the user
           should also receive an unsolicited general "connection
           reset" signal, enter the CLOSED state, delete the TCB,
           and return.

           If the SYN is not in the window this step would not be
           reached and an ack would have been sent in the first step
           (sequence number check).

       fifth check the ACK field,

         if the ACK bit is off drop the segment and return

         if the ACK bit is on

           SYN-RECEIVED STATE

             If SND.UNA =< SEG.ACK =< SND.NXT then enter
             ESTABLISHED state and continue processing.

             If the segment acknowledgment is not acceptable,
             form a reset segment,

               <SEQ=SEG.ACK><CTL=RST>

             and send it.

           ESTABLISHED STATE

             If SND.UNA < SEG.ACK =< SND.NXT then, set SND.UNA <-
             SEG.ACK.  Any segments on the retransmission queue
             which are thereby entirely acknowledged are removed.
             Users should receive positive acknowledgments for
             buffers which have been SENT and fully acknowledged
             (i.e., SEND buffer should be returned with "ok"
             response).  If the ACK is a duplicate (SEG.ACK <
             SND.UNA), it can be ignored.  If the ACK acks
             something not yet sent (SEG.ACK > SND.NXT) then send
             an ACK, drop the segment, and return.

             If SND.UNA < SEG.ACK =< SND.NXT, the send window
             should be updated.  If (SND.WL1 < SEG.SEQ or (SND.WL1
             = SEG.SEQ and SND.WL2 =< SEG.ACK)), set SND.WND <-
             SEG.WND, set SND.WL1 <- SEG.SEQ, and set SND.WL2 <-
             SEG.ACK.
             Note that SND.WND is an offset from SND.UNA, that
             SND.WL1 records the sequence number of the last
             segment used to update SND.WND, and that SND.WL2
             records the acknowledgment number of the last segment
             used to update SND.WND.  The check here prevents using
             old segments to update the window.

           FIN-WAIT-1 STATE

             In addition to the processing for the ESTABLISHED
             state, if our FIN is now acknowledged then enter
             FIN-WAIT-2 and continue processing in that state.

           FIN-WAIT-2 STATE

             In addition to the processing for the ESTABLISHED
             state, if the retransmission queue is empty, the
             user's CLOSE can be acknowledged ("ok") but do not
             delete the TCB.

           CLOSE-WAIT STATE

             Do the same processing as for the ESTABLISHED state.

           CLOSING STATE

             In addition to the processing for the ESTABLISHED
             state, if the ACK acknowledges our FIN then enter the
             TIME-WAIT state, otherwise ignore the segment.

           LAST-ACK STATE

             The only thing that can arrive in this state is an
             acknowledgment of our FIN.  If our FIN is now
             acknowledged, delete the TCB, enter the CLOSED state,
             and return.

           TIME-WAIT STATE

             The only thing that can arrive in this state is a
             retransmission of the remote FIN.  Acknowledge it, and
             restart the 2 MSL timeout.

       sixth, check the URG bit,

         ESTABLISHED STATE
         FIN-WAIT-1 STATE
         FIN-WAIT-2 STATE

           If the URG bit is set, RCV.UP <- max(RCV.UP,SEG.UP), and
           signal the user that the remote side has urgent data if
           the urgent pointer (RCV.UP) is in advance of the data
           consumed.  If the user has already been signaled (or is
           still in the "urgent mode") for this continuous sequence
           of urgent data, do not signal the user again.

         CLOSE-WAIT STATE
         CLOSING STATE
         LAST-ACK STATE
         TIME-WAIT

           This should not occur, since a FIN has been received from
           the remote side.  Ignore the URG.
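The ESTABLISHED-state ACK handling and send window update from the fifth step can be sketched as follows.  This is a minimal illustration under stated assumptions: sequence numbers are compared modulo 2**32, retransmission queue handling and user notification are left out, and the names SendState, seq_lt, seq_le and process_ack are hypothetical.  The window is updated only when SND.UNA < SEG.ACK =< SND.NXT, exactly as the text above is worded.

```python
from dataclasses import dataclass

MOD = 2**32    # size of the sequence space (assumption: 32-bit numbers)
HALF = 2**31


def seq_lt(a: int, b: int) -> bool:
    """a < b in modulo-2**32 sequence space."""
    return 0 < (b - a) % MOD < HALF


def seq_le(a: int, b: int) -> bool:
    """a =< b in modulo-2**32 sequence space."""
    return (b - a) % MOD < HALF


@dataclass
class SendState:                # hypothetical subset of the TCB
    snd_una: int
    snd_nxt: int
    snd_wnd: int
    snd_wl1: int
    snd_wl2: int


def process_ack(s: SendState, seg_seq: int, seg_ack: int, seg_wnd: int) -> str:
    """Apply the ESTABLISHED-state ACK rules; return what happened."""
    if seq_lt(s.snd_nxt, seg_ack):
        return "ack-and-drop"          # acks something not yet sent
    if seq_lt(seg_ack, s.snd_una):
        return "ignore-duplicate"      # old ACK, nothing to do
    if not seq_lt(s.snd_una, seg_ack):
        return "no-change"             # SEG.ACK = SND.UNA
    # SND.UNA < SEG.ACK =< SND.NXT: new data is acknowledged
    s.snd_una = seg_ack
    # only a segment at least as recent as the last update may move the window
    if seq_lt(s.snd_wl1, seg_seq) or (s.snd_wl1 == seg_seq
                                      and seq_le(s.snd_wl2, seg_ack)):
        s.snd_wnd = seg_wnd
        s.snd_wl1 = seg_seq
        s.snd_wl2 = seg_ack
    return "advanced"
```

The SND.WL1 and SND.WL2 comparison is what keeps a reordered, older segment from overwriting a newer window advertisement even when that segment still advances SND.UNA.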
       seventh, process the segment text,

         ESTABLISHED STATE
         FIN-WAIT-1 STATE
         FIN-WAIT-2 STATE

           Once in the ESTABLISHED state, it is possible to deliver
           segment text to user RECEIVE buffers.  Text from segments
           can be moved into buffers until either the buffer is full
           or the segment is empty.  If the segment empties and
           carries a PUSH flag, then the user is informed, when the
           buffer is returned, that a PUSH has been received.

           When the TCP takes responsibility for delivering the data
           to the user it must also acknowledge the receipt of the
           data.

           Once the TCP takes responsibility for the data it
           advances RCV.NXT over the data accepted, and adjusts
           RCV.WND as appropriate to the current buffer
           availability.  The total of RCV.NXT and RCV.WND should
           not be reduced.

           Please note the window management suggestions in section
           3.7.

           Send an acknowledgment of the form:

             <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>

           This acknowledgment should be piggybacked on a segment
           being transmitted if possible without incurring undue
           delay.

         CLOSE-WAIT STATE
         CLOSING STATE
         LAST-ACK STATE
         TIME-WAIT STATE

           This should not occur, since a FIN has been received from
           the remote side.  Ignore the segment text.

       eighth, check the FIN bit,

         Do not process the FIN if the state is CLOSED, LISTEN or
         SYN-SENT since the SEG.SEQ cannot be validated; drop the
         segment and return.

         If the FIN bit is set, signal the user "connection closing"
         and return any pending RECEIVEs with same message, advance
         RCV.NXT over the FIN, and send an acknowledgment for the
         FIN.  Note that FIN implies PUSH for any segment text not
         yet delivered to the user.

           SYN-RECEIVED STATE
           ESTABLISHED STATE

             Enter the CLOSE-WAIT state.
           FIN-WAIT-1 STATE

             If our FIN has been ACKed (perhaps in this segment),
             then enter TIME-WAIT, start the time-wait timer, turn
             off the other timers; otherwise enter the CLOSING
             state.

           FIN-WAIT-2 STATE

             Enter the TIME-WAIT state.  Start the time-wait timer,
             turn off the other timers.

           CLOSE-WAIT STATE

             Remain in the CLOSE-WAIT state.

           CLOSING STATE

             Remain in the CLOSING state.

           LAST-ACK STATE

             Remain in the LAST-ACK state.

           TIME-WAIT STATE

             Remain in the TIME-WAIT state.  Restart the 2 MSL
             time-wait timeout.

       and return.

   USER TIMEOUT

     USER TIMEOUT

       For any state if the user timeout expires, flush all queues,
       signal the user "error: connection aborted due to user timeout"
       in general and for any outstanding calls, delete the TCB, enter
       the CLOSED state and return.

     RETRANSMISSION TIMEOUT

       For any state if the retransmission timeout expires on a
       segment in the retransmission queue, send the segment at the
       front of the retransmission queue again, reinitialize the
       retransmission timer, and return.

     TIME-WAIT TIMEOUT

       If the time-wait timeout expires on a connection delete the
       TCB, enter the CLOSED state and return.

3.10.  Glossary

   1822    BBN Report 1822, "The Specification of the Interconnection of
           a Host and an IMP".  The specification of interface between a
           host and the ARPANET.

   ACK
           A control bit (acknowledge) occupying no sequence space,
           which indicates that the acknowledgment field of this segment
           specifies the next sequence number the sender of this segment
           is expecting to receive, hence acknowledging receipt of all
           previous sequence numbers.

   ARPANET message
           The unit of transmission between a host and an IMP in the
           ARPANET.  The maximum size is about 1012 octets (8096 bits).

   ARPANET packet
           A unit of transmission used internally in the ARPANET between
           IMPs.  The maximum size is about 126 octets (1008 bits).

   connection
           A logical communication path identified by a pair of sockets.

   datagram
           A message sent in a packet switched computer communications
           network.

   Destination Address
           The destination address, usually the network and host
           identifiers.

   FIN
           A control bit (finis) occupying one sequence number, which
           indicates that the sender will send no more data or control
           occupying sequence space.

   fragment
           A portion of a logical unit of data, in particular an
           internet fragment is a portion of an internet datagram.

   FTP
           A file transfer protocol.

   header
           Control information at the beginning of a message, segment,
           fragment, packet or block of data.

   host
           A computer.  In particular a source or destination of
           messages from the point of view of the communication network.

   Identification
           An Internet Protocol field.  This identifying value assigned
           by the sender aids in assembling the fragments of a datagram.

   IMP
           The Interface Message Processor, the packet switch of the
           ARPANET.

   internet address
           A source or destination address specific to the host level.

   internet datagram
           The unit of data exchanged between an internet module and the
           higher level protocol together with the internet header.

   internet fragment
           A portion of the data of an internet datagram with an
           internet header.

   IP
           Internet Protocol.

   IRS
           The Initial Receive Sequence number.  The first sequence
           number used by the sender on a connection.

   ISN
           The Initial Sequence Number.  The first sequence number used
           on a connection, (either ISS or IRS).  Selected by a
           clock-based procedure.

   ISS
           The Initial Send Sequence number.
           The first sequence number used by the sender on a
           connection.

   leader
           Control information at the beginning of a message or block of
           data.  In particular, in the ARPANET, the control information
           on an ARPANET message at the host-IMP interface.

   left sequence
           This is the next sequence number to be acknowledged by the
           data receiving TCP (or the lowest currently unacknowledged
           sequence number) and is sometimes referred to as the left
           edge of the send window.

   local packet
           The unit of transmission within a local network.

   module
           An implementation, usually in software, of a protocol or
           other procedure.

   MSL
           Maximum Segment Lifetime, the time a TCP segment can exist in
           the internetwork system.  Arbitrarily defined to be 2
           minutes.

   octet
           An eight bit byte.

   Options
           An Option field may contain several options, and each option
           may be several octets in length.  The options are used
           primarily in testing situations; for example, to carry
           timestamps.  Both the Internet Protocol and TCP provide for
           options fields.

   packet
           A package of data with a header which may or may not be
           logically complete.  More often a physical packaging than a
           logical packaging of data.

   port
           The portion of a socket that specifies which logical input or
           output channel of a process is associated with the data.

   process
           A program in execution.  A source or destination of data from
           the point of view of the TCP or other host-to-host protocol.

   PUSH
           A control bit occupying no sequence space, indicating that
           this segment contains data that must be pushed through to the
           receiving user.

   RCV.NXT
           receive next sequence number

   RCV.UP
           receive urgent pointer

   RCV.WND
           receive window

   receive next sequence number
           This is the next sequence number the local TCP is expecting
           to receive.

   receive window
           This represents the sequence numbers the local (receiving)
           TCP is willing to receive.  Thus, the local TCP considers
           that segments overlapping the range RCV.NXT to RCV.NXT +
           RCV.WND - 1 carry acceptable data or control.  Segments
           containing sequence numbers entirely outside of this range
           are considered duplicates and discarded.

   RST
           A control bit (reset), occupying no sequence space,
           indicating that the receiver should delete the connection
           without further interaction.  The receiver can determine,
           based on the sequence number and acknowledgment fields of the
           incoming segment, whether it should honor the reset command
           or ignore it.  In no case does receipt of a segment
           containing RST give rise to a RST in response.

   RTP
           Real Time Protocol: A host-to-host protocol for communication
           of time critical information.

   SEG.ACK
           segment acknowledgment

   SEG.LEN
           segment length

   SEG.PRC
           segment precedence value

   SEG.SEQ
           segment sequence

   SEG.UP
           segment urgent pointer field

   SEG.WND
           segment window field

   segment
           A logical unit of data, in particular a TCP segment is the
           unit of data transferred between a pair of TCP modules.

   segment acknowledgment
           The sequence number in the acknowledgment field of the
           arriving segment.

   segment length
           The amount of sequence number space occupied by a segment,
           including any controls which occupy sequence space.

   segment sequence
           The number in the sequence field of the arriving segment.

   send sequence
           This is the next sequence number the local (sending) TCP will
           use on the connection.  It is initially selected from an
           initial sequence number curve (ISN) and is incremented for
           each octet of data or sequenced control transmitted.

   send window
           This represents the sequence numbers which the remote
           (receiving) TCP is willing to receive.  It is the value of
           the window field specified in segments from the remote (data
           receiving) TCP.  The range of new sequence numbers which may
           be emitted by a TCP lies between SND.NXT and SND.UNA +
           SND.WND - 1.  (Retransmissions of sequence numbers between
           SND.UNA and SND.NXT are expected, of course.)

   SND.NXT
           send sequence

   SND.UNA
           left sequence

   SND.UP
           send urgent pointer

   SND.WL1
           segment sequence number at last window update

   SND.WL2
           segment acknowledgment number at last window update

   SND.WND
           send window

   socket
           An address which specifically includes a port identifier,
           that is, the concatenation of an Internet Address with a TCP
           port.

   Source Address
           The source address, usually the network and host identifiers.

   SYN
           A control bit in the incoming segment, occupying one sequence
           number, used at the initiation of a connection, to indicate
           where the sequence numbering will start.

   TCB
           Transmission control block, the data structure that records
           the state of a connection.

   TCB.PRC
           The precedence of the connection.

   TCP
           Transmission Control Protocol: A host-to-host protocol for
           reliable communication in internetwork environments.

   TOS
           Type of Service, an Internet Protocol field.

   Type of Service
           An Internet Protocol field which indicates the type of
           service for this internet fragment.

   URG
           A control bit (urgent), occupying no sequence space, used to
           indicate that the receiving user should be notified to do
           urgent processing as long as there is data to be consumed
           with sequence numbers less than the value indicated in the
           urgent pointer.

   urgent pointer
           A control field meaningful only when the URG bit is on.
           This field communicates the value of the urgent pointer
           which indicates the data octet associated with the sending
           user's urgent call.

4.  Changes from RFC 793

   TODO: this entire section will need to be edited and condensed before
   the document is finalized.  It currently represents a plan for future
   updates.

   The -00 version of this document was merely a proposal and rough plan
   for updating RFC 793.

   The -01 revision of this document incorporates the content of RFC 793
   Section 3 titled "FUNCTIONAL SPECIFICATION".  Other content from RFC
   793 has not been incorporated.  The -01 revision of this document
   makes some minor formatting changes to the RFC 793 content in order
   to convert the content into XML2RFC format and account for left-out
   parts of RFC 793.  For instance, figure numbering differs and some
   indentation is not exactly the same.

   TODO: Incomplete list of changes - these need to be added to and made
   more specific, as the document proceeds:

   1.   incorporate the accepted errata

   2.   incorporate 1122 additions

   3.   point to major additional docs like 1323bis and 5681

   4.   incorporate relevant parts of 3168 (ECN)

   5.   incorporate 6093 (urgent pointer)

   6.   incorporate 6528 (sequence number)

   7.   incorporate Fernando's new number-checking fixes (if past the
        IESG in time)

   8.   point to PMTUD?

   9.   point to 5461 (soft errors)

   10.  mention 5961 state machine option

   11.  mention 6161 (reducing TIME-WAIT)

   12.  incorporate 6429 (ZWP/persist)

   13.  incorporate 6691 (MSS)

5.  IANA Considerations

   This memo includes no request to IANA.  Existing IANA registries for
   TCP parameters are sufficient.

   TODO: check whether entries pointing to 793 and other documents
   obsoleted by this one should be updated to point to this one instead.

6.  Security Considerations

   TODO

7.
Acknowledgements

   This document is largely a revision of RFC 793, of which Jon Postel
   was the editor.  Due to his excellent work, it was able to last for
   three decades before we felt the need to revise it.

   Andre Oppermann was a contributor and helped to edit the first
   revision of this document.

   We are thankful for the assistance of the IETF TCPM working group
   chairs:

      Michael Scharf
      Yoshifumi Nishida
      Pasi Sarolahti

   On the TCPM mailing list, and at the IETF 88 meeting in Vancouver,
   helpful comments, critiques, and reviews were received from (listed
   alphabetically): David Borman, Yuchung Cheng, Martin Duke, Kevin
   Lahey, Kevin Mason, Matt Mathis, Hagen Paul Pfeifer, Anthony
   Sabatini, Joe Touch, Reji Varghese, Lloyd Wood, and Alex Zimmermann.

8.  References

8.1.  Normative References

   [1]  Bradner, S., "Key words for use in RFCs to Indicate
        Requirement Levels", BCP 14, RFC 2119, March 1997.

8.2.  Informative References

   [2]  Postel, J., "Transmission Control Protocol", STD 7, RFC
        793, September 1981.

   [3]  Duke, M., Braden, R., Eddy, W., Blanton, E., and A.
        Zimmermann, "A Roadmap for Transmission Control Protocol
        (TCP) Specification Documents",
        draft-ietf-tcpm-tcp-rfc4614bis-00 (work in progress),
        August 2013.

Author's Address

   Wesley M. Eddy
   MTI Systems
   US

   Email: wes@mti-systems.com