Buffer Size Option Data: 16 bits

  If this option is present, then it communicates the receive buffer size at the TCP which sends this segment. This field should only be sent in the initial connection request (i.e., in segments with the SYN control bit set). If this option is not used, the default buffer size of one octet is assumed.

Padding: variable

  The TCP header padding is used to ensure that the TCP header ends and data begins on a 32 bit boundary. The padding is composed of zeros.

3.2. Terminology

Before we can discuss very much about the operation of the TCP, we need to introduce some detailed terminology. Maintaining a TCP connection requires remembering several variables. We conceive of these variables as being stored in a connection record called a Transmission Control Block, or TCB. Among the variables stored in the TCB are the local and remote socket numbers, the security and precedence of the connection, pointers to the user's send and receive buffers, and pointers to the retransmit queue and to the current segment. In addition, several variables relating to the send and receive sequence numbers are stored in the TCB.

  Send Sequence Variables

    SND.UNA - send unacknowledged
    SND.NXT - send sequence
    SND.WND - send window
    SND.BS  - send buffer size
    SND.UP  - send urgent pointer
    SND.WL  - send sequence number used for last window update
    SND.LBB - send last buffer beginning
    ISS     - initial send sequence number

  Receive Sequence Variables

    RCV.NXT - receive sequence
    RCV.WND - receive window
    RCV.BS  - receive buffer size
    RCV.UP  - receive urgent pointer
    RCV.LBB - receive last buffer beginning
    IRS     - initial receive sequence number
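As an illustration only, the sequence-number part of a TCB can be pictured as a plain record. The Python sketch below is hypothetical: the field names are lowercase forms of the variables listed above, sockets, security, precedence, buffers and queues are left out, and the buffer sizes default to the one octet assumed when no buffer size option is exchanged. The `outstanding` helper is an added convenience, not something the text defines.

```python
from dataclasses import dataclass


@dataclass
class TCB:
    """Hypothetical sketch of the sequence-number part of a Transmission Control Block."""

    # Send sequence variables
    snd_una: int = 0  # SND.UNA: oldest unacknowledged sequence number
    snd_nxt: int = 0  # SND.NXT: next sequence number to send
    snd_wnd: int = 0  # SND.WND: send window
    snd_bs: int = 1   # SND.BS: send buffer size (one octet if no option was exchanged)
    snd_up: int = 0   # SND.UP: send urgent pointer
    snd_wl: int = 0   # SND.WL: sequence number used for last window update
    snd_lbb: int = 0  # SND.LBB: send last buffer beginning
    iss: int = 0      # ISS: initial send sequence number

    # Receive sequence variables
    rcv_nxt: int = 0  # RCV.NXT: next sequence number expected
    rcv_wnd: int = 0  # RCV.WND: receive window
    rcv_bs: int = 1   # RCV.BS: receive buffer size (one octet by default)
    rcv_up: int = 0   # RCV.UP: receive urgent pointer
    rcv_lbb: int = 0  # RCV.LBB: receive last buffer beginning
    irs: int = 0      # IRS: initial receive sequence number

    def outstanding(self) -> int:
        """Sequence space sent but not yet acknowledged: SND.NXT - SND.UNA mod 2**32."""
        return (self.snd_nxt - self.snd_una) % 2**32
```

Measuring `outstanding` modulo 2**32 keeps it correct when SND.NXT has wrapped past zero while SND.UNA has not.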
The following diagrams may help to relate some of these variables to the sequence space.

  Send Sequence Space

                       1          2          3          4
                  ----------|----------|----------|----------
                         SND.UNA    SND.NXT    SND.UNA
                                              +SND.WND

        1 - old sequence numbers which have been acknowledged
        2 - sequence numbers of unacknowledged data
        3 - sequence numbers allowed for new data transmission
        4 - future sequence numbers which are not yet allowed

                          Send Sequence Space

                               Figure 4.

  Receive Sequence Space

                       1          2          3
                  ----------|----------|----------
                         RCV.NXT    RCV.NXT
                                   +RCV.WND

        1 - old sequence numbers which have been acknowledged
        2 - sequence numbers allowed for new reception
        3 - future sequence numbers which are not yet allowed

                         Receive Sequence Space

                               Figure 5.

There are also some variables used frequently in the discussion that take their values from the fields of the current segment.
  Current Segment Variables

    SEG.SEQ - segment sequence number
    SEG.ACK - segment acknowledgment number
    SEG.LEN - segment length
    SEG.WND - segment window
    SEG.UP  - segment urgent pointer
    SEG.PRC - segment precedence value

A connection progresses through a series of states during its lifetime. The states are: LISTEN, SYN-SENT, SYN-RECEIVED, ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, TIME-WAIT, CLOSE-WAIT, CLOSING, and the fictional state CLOSED. CLOSED is fictional because it represents the state when there is no TCB, and therefore no connection. Briefly, the meanings of the states are:

  LISTEN - represents waiting for a connection request from any remote TCP and port.

  SYN-SENT - represents waiting for a matching connection request after having sent a connection request.

  SYN-RECEIVED - represents waiting for a confirming connection request acknowledgment after having both received and sent a connection request.

  ESTABLISHED - represents an open connection, ready to transmit and receive data segments.

  FIN-WAIT-1 - represents waiting for a connection termination request from the remote TCP, or an acknowledgment of the connection termination request previously sent.

  FIN-WAIT-2 - represents waiting for a connection termination request from the remote TCP.

  TIME-WAIT - represents waiting for enough time to pass to be sure the remote TCP received the acknowledgment of its connection termination request.

  CLOSE-WAIT - represents waiting for a connection termination request from the local user.

  CLOSING - represents waiting for a connection termination request acknowledgment from the remote TCP.

  CLOSED - represents no connection state at all.
A TCP connection progresses from one state to another in response to events. The events are: the user calls OPEN, SEND, RECEIVE, CLOSE, ABORT, and STATUS; the incoming segments, particularly those containing the SYN and FIN flags; and timeouts.

The Glossary contains a more complete list of terms and their definitions.

The state diagram in figure 6 illustrates only state changes, together with the causing events and resulting actions; it addresses neither error conditions nor actions which are not connected with state changes. In a later section, more detail is offered with respect to the reaction of the TCP to events.
                                  +---------+ ---------\      active OPEN
                                  |  CLOSED |            \    -----------
                                  +---------+<---------\   \   create TCB
                                    |     ^              \   \  snd SYN
                       passive OPEN |     |   CLOSE        \   \
                       ------------ |     | ----------       \   \
                        create TCB  |     | delete TCB         \   \
                                    V     |                      \   \
                                  +---------+            CLOSE    |    \
                                  |  LISTEN |          ---------- |     |
                                  +---------+          delete TCB |     |
                       rcv SYN      |     |     SEND              |     |
                      -----------   |     |    -------            |     V
     +---------+      snd SYN,ACK  /       \   snd SYN          +---------+
     |         |<-----------------           ------------------>|         |
     |   SYN   |                    rcv SYN                     |   SYN   |
     |   RCVD  |<-----------------------------------------------|   SENT  |
     |         |                    snd ACK                     |         |
     |         |------------------           -------------------|         |
     +---------+   rcv ACK of SYN  \       /  rcv SYN,ACK       +---------+
       |           --------------   |     |   -----------
       |                  x         |     |     snd ACK
       |                            V     V
       |  CLOSE                   +---------+
       | -------                  |  ESTAB  |
       | snd FIN                  +---------+
       |                   CLOSE    |     |    rcv FIN
       V                  -------   |     |    -------
     +---------+          snd FIN  /       \   snd ACK          +---------+
     |  FIN    |<-----------------           ------------------>|  CLOSE  |
     | WAIT-1  |------------------           -------------------|   WAIT  |
     +---------+          rcv FIN  \       /   CLOSE            +---------+
       | rcv ACK of FIN   -------   |     |    -------
       | --------------   snd ACK   |     |    snd FIN
       V        x                   V     V
     +---------+                  +---------+
     |FINWAIT-2|                  | CLOSING |
     +---------+                  +---------+
       | rcv FIN                       | rcv ACK of FIN
       | -------     Timeout=2MSL      | --------------
       V snd ACK     ------------      V   delete TCB
     +---------+      delete TCB  +---------+
     |TIME WAIT|----------------->|  CLOSED |
     +---------+                  +---------+

                          TCP Connection State Diagram

                                   Figure 6.
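Figure 6 can also be read as a lookup table from (state, event) to (next state, action). The Python sketch below transcribes only the arrows drawn in the figure; like the figure, it ignores error conditions and actions that are not tied to state changes. The event and action strings are informal labels chosen for this example, and `TRANSITIONS` and `step` are hypothetical names.

```python
# (current state, event) -> (next state, action), transcribed from Figure 6.
TRANSITIONS = {
    ("CLOSED", "passive OPEN"): ("LISTEN", "create TCB"),
    ("CLOSED", "active OPEN"): ("SYN-SENT", "create TCB; snd SYN"),
    ("LISTEN", "CLOSE"): ("CLOSED", "delete TCB"),
    ("LISTEN", "rcv SYN"): ("SYN-RECEIVED", "snd SYN,ACK"),
    ("LISTEN", "SEND"): ("SYN-SENT", "snd SYN"),
    ("SYN-SENT", "CLOSE"): ("CLOSED", "delete TCB"),
    ("SYN-SENT", "rcv SYN"): ("SYN-RECEIVED", "snd ACK"),
    ("SYN-SENT", "rcv SYN,ACK"): ("ESTABLISHED", "snd ACK"),
    ("SYN-RECEIVED", "rcv ACK of SYN"): ("ESTABLISHED", ""),
    ("SYN-RECEIVED", "CLOSE"): ("FIN-WAIT-1", "snd FIN"),
    ("ESTABLISHED", "CLOSE"): ("FIN-WAIT-1", "snd FIN"),
    ("ESTABLISHED", "rcv FIN"): ("CLOSE-WAIT", "snd ACK"),
    ("FIN-WAIT-1", "rcv ACK of FIN"): ("FIN-WAIT-2", ""),
    ("FIN-WAIT-1", "rcv FIN"): ("CLOSING", "snd ACK"),
    ("CLOSE-WAIT", "CLOSE"): ("CLOSING", "snd FIN"),
    ("FIN-WAIT-2", "rcv FIN"): ("TIME-WAIT", "snd ACK"),
    ("CLOSING", "rcv ACK of FIN"): ("CLOSED", "delete TCB"),
    ("TIME-WAIT", "timeout 2MSL"): ("CLOSED", "delete TCB"),
}


def step(state: str, event: str) -> tuple:
    """Return (next state, action) for an event, or raise if Figure 6 has no such arrow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"Figure 6 shows no transition for {event!r} in {state}") from None
```

Feeding `step` the events CLOSE, rcv ACK of FIN, rcv FIN and the 2MSL timeout walks a TCP from ESTABLISHED back to CLOSED along the left-hand path of the diagram.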
3.3. Sequence Numbers

A fundamental notion in the design is that every octet of data sent over a TCP connection has a sequence number. Since every octet is sequenced, each of them can be acknowledged. The acknowledgment mechanism employed is cumulative, so that an acknowledgment of sequence number X indicates that all octets up to but not including X have been received. This mechanism allows for straightforward duplicate detection in the presence of retransmission. Octets within a segment are numbered so that the first data octet immediately following the header is the lowest numbered, and the following octets are numbered consecutively.

It is essential to remember that the actual sequence number space is finite, though very large. This space ranges from 0 to 2**32 - 1. Since the space is finite, all arithmetic dealing with sequence numbers must be performed modulo 2**32. This unsigned arithmetic preserves the relationship of sequence numbers as they cycle from 2**32 - 1 to 0 again. There are some subtleties to computer modulo arithmetic, so great care should be taken in programming the comparison of such values.

The typical kinds of sequence number comparisons which the TCP must perform include:

  (a) Determining that an acknowledgment refers to some sequence number sent but not yet acknowledged.

  (b) Determining that all sequence numbers occupied by a segment have been acknowledged (e.g., to remove the segment from a retransmission queue).

  (c) Determining that an incoming segment contains sequence numbers which are expected (i.e., that the segment "overlaps" the receive window).
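Comparisons (a) through (c) all come down to asking whether one sequence number precedes another in the circular space. The text does not prescribe a method; one common technique, shown here as a Python sketch, takes the difference of the two numbers modulo 2**32 and compares it against half the space (the same idea as serial number arithmetic). `seq_lt` and `seq_leq` are illustrative names, and the sketch assumes the two values are less than 2**31 apart.

```python
MOD = 2**32   # size of the sequence number space
HALF = 2**31  # largest distance at which "before" is still unambiguous


def seq_lt(a: int, b: int) -> bool:
    """True if sequence number a comes before b in modulo-2**32 order.

    Assumes a and b are less than 2**31 apart; at exactly 2**31 the order
    is ambiguous and this sketch returns False in both directions.
    """
    return 0 < (b - a) % MOD < HALF


def seq_leq(a: int, b: int) -> bool:
    """True if a equals b or comes before it."""
    return (a - b) % MOD == 0 or seq_lt(a, b)


# 2**32 - 1 precedes 0 once the space wraps, although a plain integer
# comparison says the opposite.
print(seq_lt(2**32 - 1, 0), (2**32 - 1) < 0)  # True False
```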
On send connections the following comparisons are needed:

        older sequence numbers                  newer sequence numbers

         SND.UNA       SEG.ACK                          SND.NXT
            |             |                                |
        ----|----XXXXXXX------XXXXXXXXXX---------XXXXXX----|----
                 |     |      |        |         |    |
                Segment 1     Segment 2        Segment 3

                     <----- sequence space ----->

                   Sending Sequence Space Information

                               Figure 7.

  SND.UNA = oldest unacknowledged sequence number
  SND.NXT = next sequence number to be sent
  SEG.ACK = acknowledgment (next sequence number expected by the acknowledging TCP)
  SEG.SEQ = first sequence number of a segment
  SEG.SEQ+SEG.LEN-1 = last sequence number of a segment

A new acknowledgment (called an "acceptable ack") is one for which the inequality below holds:

  SND.UNA < SEG.ACK =< SND.NXT

All arithmetic is modulo 2**32, and comparisons are unsigned. "=<" means "less than or equal".

A segment on the retransmission queue is fully acknowledged if the sum of its sequence number and length is less than or equal to the acknowledgment value in the incoming segment, that is, if SEG.SEQ+SEG.LEN =< SEG.ACK. SEG.LEN is the number of octets occupied by the data in the segment. It is important to note that SEG.LEN must be non-zero; segments which do not occupy any sequence space (e.g., empty acknowledgment segments) are never placed on the retransmission queue, so they would not go through this particular test.
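Both send-side tests can be evaluated safely across the wrap from 2**32 - 1 to 0 by measuring every value as a forward distance from SND.UNA. The Python sketch below assumes that representation (the text only says arithmetic is modulo 2**32), assumes the queued segment starts at or after SND.UNA, and uses hypothetical function names.

```python
MOD = 2**32


def acceptable_ack(snd_una: int, seg_ack: int, snd_nxt: int) -> bool:
    """SND.UNA < SEG.ACK =< SND.NXT, evaluated modulo 2**32.

    SEG.ACK and SND.NXT are both measured as forward distances from SND.UNA,
    so the test keeps working after the sequence space wraps around.
    """
    return 0 < (seg_ack - snd_una) % MOD <= (snd_nxt - snd_una) % MOD


def fully_acked(seg_seq: int, seg_len: int, seg_ack: int, snd_una: int) -> bool:
    """True once a queued segment satisfies SEG.SEQ+SEG.LEN =< SEG.ACK.

    snd_una serves only as the reference point for the modular comparison.
    """
    end = (seg_seq + seg_len - snd_una) % MOD
    return end <= (seg_ack - snd_una) % MOD
```

For a queued segment occupying 100 through 109 (SEG.LEN = 10) with SND.UNA = 100, an ACK of 110 removes it from the queue, while an ACK of 109 does not.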
On receive connections the following comparisons are needed:

        older sequence numbers                  newer sequence numbers

                 RCV.NXT                     RCV.NXT+RCV.WND
                    |                               |
        ---------XXX|XXX------XXXXXXXXXX---------XXX|XX---------
                 |     |      |        |         |    |
                Segment 1     Segment 2        Segment 3

                     <----- sequence space ----->

                  Receiving Sequence Space Information

                               Figure 8.

  RCV.NXT = next sequence number expected on incoming segments
  RCV.NXT+RCV.WND = last sequence number expected on incoming segments, plus one
  SEG.SEQ = first sequence number occupied by the incoming segment
  SEG.SEQ+SEG.LEN-1 = last sequence number occupied by the incoming segment

A segment is judged to occupy a portion of valid receive sequence space if

  0 =< (SEG.SEQ+SEG.LEN-1 - RCV.NXT) < (RCV.NXT+RCV.WND - RCV.NXT)

SEG.SEQ+SEG.LEN-1 is the last sequence number occupied by the segment; RCV.NXT is the next sequence number expected on an incoming segment; and RCV.NXT+RCV.WND is the right edge of the receive window.

Actually, it is a little more complicated than this. Due to zero windows and zero length segments, we have four cases for the acceptability of an incoming segment:
      Segment  Receive  Test
      Length   Window
      -------  -------  -------------------------------------------

         0        0     SEG.SEQ = RCV.NXT

         0       >0     RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND

        >0        0     not acceptable

        >0       >0     RCV.NXT < SEG.SEQ+SEG.LEN =< RCV.NXT+RCV.WND

Note that the acceptance test for a segment, since it requires the end of a segment to lie in the window, is somewhat more restrictive than is absolutely necessary. If at least the first sequence number of the segment lies in the receive window, or if some part of the segment lies in the receive window, then the segment might be judged acceptable. Thus, in figure 8, at least segments 1 and 2 are acceptable by the strict rule, and segment 3 may or may not be, depending on the strictness of interpretation of the rule.

Note that when the receive window is zero, no segments should be acceptable except ACK segments. Thus, it should be possible for a TCP to maintain a zero receive window while transmitting data and receiving ACKs.

We have taken advantage of the numbering scheme to protect certain control information as well. This is achieved by implicitly including some control flags in the sequence space so they can be retransmitted and acknowledged without confusion (i.e., one and only one copy of the control will be acted upon). Control information is not physically carried in the segment data space. Consequently, we must adopt rules for implicitly assigning sequence numbers to control. The SYN and FIN are the only controls requiring this protection, and these controls are used only at connection opening and closing. For sequence number purposes, the SYN is considered to occur before the first actual data octet of the segment in which it occurs, while the FIN is considered to occur after the last actual data octet in a segment in which it occurs. The segment length includes both data and sequence-space-occupying controls. When a SYN is present, SEG.SEQ is the sequence number of the SYN.
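As a sketch only, the four cases can be written in Python by reading every inequality modulo 2**32 as a distance measured forward from RCV.NXT; that representation is a choice of this example, not something the text mandates. `segment_length` applies the rule that SYN and FIN each occupy one sequence number. All function names are hypothetical.

```python
MOD = 2**32


def segment_length(data_octets: int, syn: bool = False, fin: bool = False) -> int:
    """SEG.LEN: data octets plus one for each sequence-space-occupying control."""
    return data_octets + int(syn) + int(fin)


def acceptable(seg_seq: int, seg_len: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """Four-case acceptability test (strict rule), evaluated modulo 2**32.

    Each bound is rewritten as a forward distance from RCV.NXT, so
    'RCV.NXT =< x < RCV.NXT+RCV.WND' becomes '(x - RCV.NXT) mod 2**32 < RCV.WND'.
    """
    start = (seg_seq - rcv_nxt) % MOD
    if seg_len == 0:
        if rcv_wnd == 0:
            return start == 0          # SEG.SEQ = RCV.NXT
        return start < rcv_wnd         # RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND
    if rcv_wnd == 0:
        return False                   # not acceptable
    end = (seg_seq + seg_len - rcv_nxt) % MOD
    return 0 < end <= rcv_wnd          # RCV.NXT < SEG.SEQ+SEG.LEN =< RCV.NXT+RCV.WND
```

With RCV.NXT = 100 and RCV.WND = 100, a 10-octet segment starting at 95 is accepted (SEG.SEQ+SEG.LEN = 105 lies in the window), while one starting at 195 is rejected under the strict rule because SEG.SEQ+SEG.LEN = 205 lies beyond 200.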
Initial Sequence Number Selection

The protocol places no restriction on a particular connection being used over and over again. A connection is defined by a pair of sockets. New instances of a connection will be referred to as incarnations of the connection. The problem that arises from this is: "How does the TCP identify duplicate segments from previous incarnations of the connection?" This problem becomes apparent if the connection is being opened and closed in quick succession, or if the connection breaks with loss of memory and is then reestablished.

To avoid confusion we must prevent segments from one incarnation of a connection from being used while the same sequence numbers may still be present in the network from an earlier incarnation. We want to assure this even if a TCP crashes and loses all knowledge of the sequence numbers it has been using. When new connections are created, an initial sequence number (ISN) generator is employed which selects a new 32 bit ISN. The generator is bound to a (possibly fictitious) 32 bit clock whose low order bit is incremented roughly every 4 microseconds. Thus, the ISN cycles approximately every 4.77 hours (2**32 ticks of 4 microseconds each). Since we assume that segments will stay in the network no more than tens of seconds or minutes, at worst, we can reasonably assume that ISNs will be unique.

For each connection there is a send sequence number and a receive sequence number. The initial send sequence number (ISS) is chosen by the data sending TCP, and the initial receive sequence number (IRS) is learned during the connection establishing procedure.

For a connection to be established or initialized, the two TCPs must synchronize on each other's initial sequence numbers. This is done in an exchange of connection establishing messages carrying a control bit called "SYN" (for synchronize) and the initial sequence numbers. As a shorthand, messages carrying the SYN bit are also called "SYNs". Hence, the solution requires a suitable mechanism for picking an initial sequence number and a slightly involved handshake to exchange the ISNs. A "three way handshake" is necessary because sequence numbers are not tied to a global clock in the network, and TCPs may have different mechanisms for picking the ISNs.
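The clock-driven ISN generator can be sketched in a few lines of Python, together with the arithmetic behind its cycle time. `isn_from_clock` and its `elapsed_us` input (microseconds since some arbitrary epoch) are hypothetical; the only facts taken from the text are the 32 bit width and the roughly 4 microsecond tick.

```python
MOD = 2**32
TICK_US = 4  # the clock's low-order bit advances roughly every 4 microseconds


def isn_from_clock(elapsed_us: int) -> int:
    """ISN read from a (possibly fictitious) 32 bit clock that ticks every 4 us.

    elapsed_us is a hypothetical input: microseconds since an arbitrary epoch.
    The clock value wraps modulo 2**32.
    """
    return (elapsed_us // TICK_US) % MOD


# Time for the clock to run through all 2**32 values, in hours.
CYCLE_HOURS = MOD * TICK_US / 1_000_000 / 3600
print(f"{CYCLE_HOURS:.2f}")  # 4.77
```

A caller could feed it `time.monotonic_ns() // 1000`; that is a real standard-library function, but using it as the ISN clock source is only an assumption of this example.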
The receiver of the first SYN has no way of knowing whether the segment was an old delayed one or not, unless it remembers the last sequence number used on the connection (which is not always possible), and so it must ask the sender to verify this SYN. The "three way handshake" and the advantages of a "clock-driven" scheme are discussed in .

Knowing When to Keep Quiet

To be sure that a TCP does not create a segment that carries a sequence number which may be duplicated by an old segment remaining in the network, the TCP must keep quiet for a maximum segment lifetime (MSL) before assigning any sequence numbers upon starting up or recovering from a crash in which memory of sequence numbers in use was lost. For this specification the MSL is taken to be 2 minutes. This is an engineering choice, and may be changed if experience indicates it is desirable to do so. Note that if a TCP is reinitialized in some sense, yet retains its memory of sequence numbers in use, then it need not wait at all; it must only be sure to use sequence numbers larger than those recently used.

It should be noted that this strategy does not protect against spoofing or other replay type duplicate message problems.

3.4. Establishing a connection

The "three-way handshake" is the procedure used to establish a connection. This procedure normally is initiated by one TCP and responded to by another TCP. The procedure also works if two TCPs simultaneously initiate the procedure. When a simultaneous attempt occurs, each TCP receives a "SYN" segment which carries no acknowledgment after it has sent a "SYN". Of course, the arrival of an old duplicate "SYN" segment can potentially make it appear, to the recipient, that a simultaneous connection initiation is in progress. Proper use of "reset" segments can disambiguate these cases.

Several examples of connection initiation follow. Although these examples do not show connection synchronization using data-carrying segments, this is perfectly legitimate, so long as the receiving TCP doesn't deliver the data to the user until it is clear the data is valid (i.e., the data must be buffered at the receiver until the connection reaches the ESTABLISHED state). The three-way handshake reduces the possibility of false connections. It is the implementation of a trade-off between memory and messages to provide information for this checking.

The simplest three-way handshake is shown in figure 9 below. The figures should be interpreted in the following way. Each line is numbered for reference purposes. Right arrows (-->) indicate departure of a TCP segment from TCP A to TCP B, or arrival of a segment at B from A. Left arrows (<--) indicate the reverse. Ellipsis (...) indicates a segment which is still in the network (delayed). An "XXX" indicates a segment which is lost or rejected. Comments appear in parentheses. TCP states represent the state AFTER the departure or arrival of the segment (whose contents are shown in the center of each line). Segment contents are shown in abbreviated form, with sequence number, control flags, and ACK field. Other fields such as window, addresses, lengths, and text have been left out in the interest of clarity.
        TCP A                                                     TCP B

    1.  CLOSED                                                    LISTEN

    2.  SYN-SENT     --> <SEQ=100><CTL=SYN>                   --> SYN-RECEIVED

    3.  ESTABLISHED  <-- <SEQ=300><ACK=101><CTL=SYN,ACK>      <-- SYN-RECEIVED

    4.  ESTABLISHED  --> <SEQ=101><ACK=301><CTL=ACK>          --> ESTABLISHED

    5.  ESTABLISHED  --> <SEQ=101><ACK=301><CTL=ACK><DATA>    --> ESTABLISHED

            Basic 3-Way Handshake for Connection Synchronization

                                Figure 9.

In line 2 of figure 9, TCP A begins by sending a SYN segment indicating that it will use sequence numbers starting with sequence number 100. In line 3, TCP B sends a SYN and acknowledges the SYN it received from TCP A. Note that the acknowledgment field indicates TCP B is now expecting to hear sequence 101, acknowledging the SYN which occupied sequence 100.

At line 4, TCP A responds with an empty segment containing an ACK for TCP B's SYN; and in line 5, TCP A sends some data. Note that the sequence number of the segment in line 5 is the same as in line 4 because the ACK does not occupy sequence number space (if it did, we would wind up ACKing ACKs!).

Simultaneous initiation is only slightly more complex, as is shown in figure 10. Each TCP cycles from CLOSED to SYN-SENT to SYN-RECEIVED to ESTABLISHED.

The principal reason for the three-way handshake is to prevent old duplicate connection initiations from causing confusion. To deal with this, a special control message, reset, has been devised. If the receiving TCP is in a non-synchronized state (i.e., SYN-SENT, SYN-RECEIVED), it returns to LISTEN on receiving an acceptable reset. If the TCP is in one of the synchronized states (ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, TIME-WAIT, CLOSE-WAIT, CLOSING), it aborts the connection and informs its user. We discuss this latter case under "half-open" connections below.
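The numbering in figure 9 follows mechanically from the two initial sequence numbers: each SYN occupies one sequence number, and the final ACK occupies none. A minimal Python sketch, with a hypothetical `Seg` tuple standing in for the abbreviated segment notation (window, data and other fields omitted):

```python
from typing import List, NamedTuple, Optional

MOD = 2**32


class Seg(NamedTuple):
    seq: int
    ack: Optional[int]  # None when the ACK control bit is off
    ctl: str


def three_way_handshake(iss_a: int, iss_b: int) -> List[Seg]:
    """Segments exchanged in a basic three-way handshake, as in figure 9."""
    syn = Seg(iss_a, None, "SYN")                           # A -> B: the SYN occupies ISS_A
    syn_ack = Seg(iss_b, (iss_a + 1) % MOD, "SYN,ACK")      # B -> A: acknowledges A's SYN
    ack = Seg((iss_a + 1) % MOD, (iss_b + 1) % MOD, "ACK")  # A -> B: acknowledges B's SYN
    return [syn, syn_ack, ack]
```

With ISNs of 100 and 300 this reproduces lines 2 through 4 of figure 9; A's first data segment (line 5) then reuses sequence number 101 because the final ACK consumed no sequence space.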
        TCP A                                                     TCP B

    1.  CLOSED                                                    CLOSED

    2.  SYN-SENT     --> <SEQ=100><CTL=SYN>                   ...

    3.  SYN-RECEIVED <-- <SEQ=300><CTL=SYN>                   <-- SYN-SENT

    4.               ... <SEQ=100><CTL=SYN>                   --> SYN-RECEIVED

    5.  SYN-RECEIVED --> <SEQ=101><ACK=301><CTL=ACK>          ...

    6.  ESTABLISHED  <-- <SEQ=301><ACK=101><CTL=ACK>          <-- SYN-RECEIVED

    7.               ... <SEQ=101><ACK=301><CTL=ACK>          --> ESTABLISHED

                  Simultaneous Connection Synchronization

                               Figure 10.

        TCP A                                                     TCP B

    1.  CLOSED                                                    LISTEN

    2.  SYN-SENT     --> <SEQ=100><CTL=SYN>                   ...

    3.  (duplicate)  ... <SEQ=1000><CTL=SYN>                  --> SYN-RECEIVED

    4.  SYN-SENT     <-- <SEQ=300><ACK=1001><CTL=SYN,ACK>     <-- SYN-RECEIVED

    5.  SYN-SENT     --> <SEQ=1001><CTL=RST>                  --> LISTEN

    6.               ... <SEQ=100><CTL=SYN>                   --> SYN-RECEIVED

    7.  SYN-SENT     <-- <SEQ=400><ACK=101><CTL=SYN,ACK>      <-- SYN-RECEIVED

    8.  ESTABLISHED  --> <SEQ=101><ACK=401><CTL=ACK>          --> ESTABLISHED

                      Recovery from Old Duplicate SYN

                               Figure 11.

As a simple example of recovery from old duplicates, consider figure 11. At line 3, an old duplicate SYN arrives at TCP B. TCP B cannot tell that this is an old duplicate, so it responds normally (line 4). TCP A detects that the ACK field is incorrect and returns a RST (reset) with its SEQ field selected to make the segment believable. TCP B, on receiving the RST, returns to the LISTEN state. When the original SYN (pun intended) finally arrives at line 6, the synchronization proceeds normally. If the SYN at line 6 had arrived before the RST, a more complex exchange might have occurred with RSTs sent in both directions.

Half-Open Connections and Other Anomalies

An established connection is said to be "half-open" if one of the TCPs has closed or aborted the connection at its end without the knowledge of the other, or if the two ends of the connection have become desynchronized owing to a crash that resulted in loss of memory. Such connections will automatically become reset if an attempt is made to send data in either direction. However, half-open connections are expected to be unusual, and the recovery procedure is mildly involved.

If at site A the connection no longer exists, then an attempt by the user at site B to send any data on it will result in the site B TCP receiving a reset control message. Such a message should indicate to the site B TCP that something is wrong, and it is expected to abort the connection.

Assume that two user processes A and B are communicating with one another when a crash occurs causing loss of memory to A's TCP. Depending on the operating system supporting A's TCP, it is likely that some error recovery mechanism exists. When the TCP is up again, A is likely to start again from the beginning or from a recovery point. As a result, A will probably try to OPEN the connection again or try to SEND on the connection it believes open. In the latter case, it receives the error message "connection not open" from the local (A's) TCP.
In an attempt to establish the connection, A's TCP will send a segment containing SYN. This scenario leads to the example shown in figure 12. After TCP A crashes, the user attempts to re-open the connection. TCP B, in the meantime, thinks the connection is open.
        TCP A                                                     TCP B

    1.  (CRASH)                                                   (send 300,receive 100)

    2.  CLOSED                                                    ESTABLISHED

    3.  SYN-SENT     --> <SEQ=400><CTL=SYN>                   --> (??)

    4.  (!!)         <-- <SEQ=300><ACK=100><CTL=ACK>          <-- ESTABLISHED

    5.  SYN-SENT     --> <SEQ=100><CTL=RST>                   --> (Abort!!)

    6.                                                            CLOSED

    7.  SYN-SENT     --> <SEQ=400><CTL=SYN>                   -->

                      Half-Open Connection Discovery

                               Figure 12.

When the SYN arrives at line 3, TCP B, being in a synchronized state, responds with an acknowledgment indicating what sequence it next expects to hear (ACK 100). TCP A sees that this segment does not acknowledge anything it sent and, being unsynchronized, sends a reset (RST) because it has detected a half-open connection. TCP B aborts at line 5. TCP A will continue to try to establish the connection; the problem is now reduced to the basic 3-way handshake of figure 9.

An interesting alternative case occurs when TCP A crashes and TCP B tries to send data on what it thinks is a synchronized connection. This is illustrated in figure 13. In this case, the data arriving at TCP A from TCP B (line 2) is unacceptable because no such connection exists, so TCP A sends a RST. The RST is acceptable, so TCP B processes it and aborts the connection.
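In figures 12 and 13 the RST's sequence number is copied from the ACK field of the segment that provoked it, which is what makes the reset acceptable to TCP B. A Python sketch of forming such a reset follows. The `Seg` tuple and `make_reset` are hypothetical, and the branch for an offending segment that carries no ACK follows the later TCP specification (RFC 793) rather than anything stated in this section.

```python
from typing import NamedTuple, Optional

MOD = 2**32


class Seg(NamedTuple):
    seq: int
    ack: Optional[int]  # None means the ACK control bit is off
    length: int = 0     # SEG.LEN, counting SYN and FIN
    rst: bool = False


def make_reset(offending: Seg) -> Seg:
    """Build the RST a TCP sends in reply to an unacceptable segment (sketch)."""
    if offending.ack is not None:
        # As in figures 12 and 13: SEQ comes from the offending ACK field,
        # and the RST itself carries no ACK.
        return Seg(seq=offending.ack, ack=None, rst=True)
    # Assumption from RFC 793: with no ACK to copy (e.g. refusing an initial
    # SYN), send SEQ=0 and acknowledge everything the segment occupied.
    return Seg(seq=0, ack=(offending.seq + offending.length) % MOD, rst=True)
```

Applied to TCP B's segment <SEQ=300><ACK=100><CTL=ACK> from line 4 of figure 12, the sketch yields the <SEQ=100><CTL=RST> shown on line 5.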
        TCP A                                                     TCP B

    1.  (CRASH)                                                   (send 300,receive 100)

    2.  (??)         <-- <SEQ=300><ACK=100><DATA=10><CTL=ACK> <-- ESTABLISHED

    3.               --> <SEQ=100><CTL=RST>                   --> (ABORT!!)

             Active Side Causes Half-Open Connection Discovery

                               Figure 13.

In figure 14, we find the two TCPs A and B with passive connections waiting for SYN. An old duplicate arriving at TCP B (line 2) stirs B into action. A SYN-ACK is returned (line 3) and causes TCP A to generate a RST (the ACK in line 3 is not acceptable). TCP B accepts the reset and returns to its passive LISTEN state.

        TCP A                                                     TCP B

    1.  LISTEN                                                    LISTEN

    2.               ... <SEQ=Z><CTL=SYN>                     --> SYN-RECEIVED

    3.  (??)         <-- <SEQ=X><ACK=Z+1><CTL=SYN,ACK>        <-- SYN-RECEIVED

    4.               --> <SEQ=Z+1><CTL=RST>                   --> (return to LISTEN!)

    5.  LISTEN                                                    LISTEN

          Old Duplicate SYN Initiates a Reset on Two Passive Sockets

                               Figure 14.

A variety of other cases are possible, all of which are accounted for by the following rules for RST generation and processing.

Reset Generation

As a general rule, reset (RST) should be sent whenever a segment arrives which apparently is not intended for the current or a future incarnation of the connection. A reset should not be sent if it is not clear that this is the case. Thus, if any segment arrives for a nonexistent connection, a reset should be sent. If a segment ACKs something which has never been sent on the current connection, then one of the following two cases applies.

  1. If the connection is in any non-synchronized state (LISTEN, SYN-SENT, SYN-RECEIVED) or if the connection does not exist, a reset (RST) should be formed and sent for any segment that acknowledges something not yet sent. The RST should take its SEQ field from the ACK field of the offending segment (if the ACK control bit was set), and its ACK bit should be reset (zero), except to refuse an initial SYN.

     A reset is also sent if an incoming segment has a security level or compartment which does not exactly match the level and compartment requested for the connection. If the precedence of the incoming segment is less than the precedence level requested, a reset is sent.

  2. If the connection is in a synchronized state (ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, TIME-WAIT, CLOSE-WAIT, CLOSING), any unacceptable segment should elicit only an empty acknowledgment segment containing the current send-sequence number and an acknowledgment indicating the next sequence number expected to be received.

Reset Processing

All reset (RST) segments are validated by checking their SEQ fields. A reset is valid if its sequence number is in the window. In the case of a RST received in response to an initial SYN, any sequence number is acceptable if the ACK field acknowledges the SYN.

The receiver of a RST first validates it, then changes state. If the receiver was in the LISTEN state, it ignores it. If the receiver was in SYN-RECEIVED state and had previously been in the LISTEN state, then the receiver returns to the LISTEN state; otherwise the receiver aborts the connection and goes to the CLOSED state. If the receiver was in any other state, it aborts the connection, advises the user, and goes to the CLOSED state.

3.5. Closing a Connection

CLOSE is an operation meaning "I have no more data to send."
The notion of closing a full-duplex connection is subject to ambiguous interpretation, of course, since it may not be obvious how to treat the receiving side of the connection. We have chosen to treat CLOSE in a simplex fashion. The user who CLOSEs may continue to RECEIVE until he is told that the other side has CLOSED also. Thus, a program could initiate several SENDs followed by a CLOSE, and then continue to RECEIVE until signaled that a RECEIVE failed because the other side has CLOSED. We assume that the TCP will signal a user, even if no RECEIVEs are outstanding, that the other side has closed, so the user can terminate his side gracefully. A TCP will reliably deliver all buffers SENT before the connection was CLOSED, so a user who expects no data in return need only wait to hear the connection was CLOSED successfully to know that all his data was received at the destination TCP.

There are essentially three cases:

  1) The user initiates by telling the TCP to CLOSE the connection
  2) The remote TCP initiates by sending a FIN control signal
  3) Both users CLOSE simultaneously

Case 1: Local user initiates the close

  In this case, a FIN segment can be constructed and placed on the outgoing segment queue. No further SENDs from the user will be accepted by the TCP, and it enters the FIN-WAIT-1 state. RECEIVEs are allowed in this state. All segments preceding and including FIN will be retransmitted until acknowledged. When the other TCP has both acknowledged the FIN and sent a FIN of its own, the first TCP can ACK this FIN. Note that a TCP receiving a FIN will ACK but not send its own FIN until its user has CLOSED the connection also.

Case 2: TCP receives a FIN from the network

  If an unsolicited FIN arrives from the network, the receiving TCP can ACK it and tell the user that the connection is closing. The user should respond with a CLOSE, upon which the TCP can send a FIN to the other TCP. The TCP then waits until its own FIN is acknowledged, whereupon it deletes the connection. If an ACK is not forthcoming, after a timeout the connection is aborted and the user is told.

Case 3: Both users close simultaneously

  A simultaneous CLOSE by users at both ends of a connection causes FIN segments to be exchanged. When all segments preceding the FINs have been processed and acknowledged, each TCP can ACK the FIN it has received. Both will, upon receiving these ACKs, delete the connection.
        TCP A                                                     TCP B

    1.  ESTABLISHED                                               ESTABLISHED

    2.  (Close)
        FIN-WAIT-1   --> <SEQ=100><CTL=FIN>                   --> CLOSE-WAIT

    3.  FIN-WAIT-2   <-- <SEQ=300><ACK=101><CTL=ACK>          <-- CLOSE-WAIT

    4.                                                            (Close)
        TIME-WAIT    <-- <SEQ=300><CTL=FIN>                   <-- CLOSING

    5.  TIME-WAIT    --> <SEQ=101><ACK=301><CTL=ACK>          --> CLOSED

    6.  (2 MSL)
        CLOSED

                           Normal Close Sequence

                               Figure 15.

        TCP A                                                     TCP B

    1.  ESTABLISHED                                               ESTABLISHED

    2.  (Close)                                                   (Close)
        FIN-WAIT-1   --> <SEQ=100><CTL=FIN>                   ... FIN-WAIT-1
                     <-- <SEQ=300><CTL=FIN>                   <--
                     ... <SEQ=100><CTL=FIN>                   -->

    3.  CLOSING      --> <SEQ=101><ACK=301><CTL=ACK>          ... CLOSING
                     <-- <SEQ=301><ACK=101><CTL=ACK>          <--
                     ... <SEQ=101><ACK=301><CTL=ACK>          -->

    4.  CLOSED                                                    CLOSED

                        Simultaneous Close Sequence

                               Figure 16.
3.6. Precedence and Security

The intent is that connections be allowed only between ports operating with exactly the same security and compartment values and at the higher of the precedence levels requested by the two ports.

The precedence levels are:

  flash override  -  111
  flash           -  110
  immediate       -  10X
  priority        -  01X
  routine         -  00X

The security levels are:

  top secret      -  11
  secret          -  10
  confidential    -  01
  unclassified    -  00

The compartments are assigned by the Defense Communications Agency.

The defaults are precedence: routine, security: unclassified, compartment: zero. A host which does not implement the precedence or security features should clear these fields to zero for segments it sends.

A connection attempt with mismatched security/compartment values or a lower precedence value should be rejected by sending a reset. Note that TCP modules which operate only at the default value of precedence will still have to check the precedence of incoming segments and possibly raise the precedence level they use on the connection.

3.7. Data Communication

Once the connection is established, data is communicated by the exchange of segments. Because segments may be lost due to errors (checksum test failure) or network congestion, TCP uses retransmission (after a timeout) to ensure delivery of every segment. Duplicate segments may arrive due to network or TCP retransmission. As discussed in the section on sequence numbers, the TCP performs certain tests on the sequence and acknowledgment numbers in the segments to verify their acceptability.

The sender of data keeps track of the next sequence number to use in the variable SND.NXT. The receiver of data keeps track of the next sequence number to expect in the variable RCV.NXT. The sender of data keeps track of the oldest unacknowledged sequence number in the variable SND.UNA. If the data flow is momentarily idle and all data sent has been acknowledged, then the three variables will be equal.

When the sender creates a segment and transmits it, the sender advances SND.NXT. When the receiver accepts a segment, it advances RCV.NXT and sends an acknowledgment. When the data sender receives an acknowledgment, it advances SND.UNA. The extent to which the values of these variables differ is a measure of the delay in the communication. Normally the amount by which the variables are advanced is the length of the data in the segment. However, when letters are used, there are special provisions for coordinating the sequence numbers, the letter boundaries, and the receive buffer boundaries.

End of Letter Sequence Number Adjustments

TCP provides a way for the receiver of data to tell the sender, at the time of connection synchronization, the size of the receiver's buffers. If this is done, the receiver must use this fixed buffer size for the lifetime of the connection. If a buffer size is communicated, then there is a coordination between receive buffers, letters, and sequence numbers. Each time a buffer is completed, either because it has been filled or because of an end of letter, the sequence number is incremented through the end of that buffer.

That is, whenever an EOL is transmitted, the sender advances its send sequence number, SND.NXT, by an amount sufficient to consume all the unused space in the receiver's buffer. The amount of space consumed in this fashion is subtracted from the send window just as is the space consumed by actual data. And, whenever an EOL is received, the receiver advances its receive sequence number, RCV.NXT, by an amount sufficient to consume all the unused space in the receiver's buffer.
The amount of space consumed in this fashion is subtracted from the receive window just as is the space consumed by actual data.
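The sequence bookkeeping described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation. The `SendSide` class, the `eol_advance` helper and their method names are hypothetical. Sequence numbers are plain integers, so wraparound is ignored. The send window follows Figure 4, where the usable window is SND.UNA + SND.WND - SND.NXT, so phantom octets consume it simply by advancing SND.NXT. `eol_advance` follows the LBB loop given after Figure 17. The acceptance test in `on_ack` is an assumed form of the acknowledgment checks mentioned above.

```python
from dataclasses import dataclass


def eol_advance(lbb: int, seg_seq: int, seg_len: int, bs: int) -> int:
    """Move the last buffer beginning (LBB) to the first buffer boundary at
    or beyond SEG.SEQ + SEG.LEN; the result is the new SND.NXT or RCV.NXT."""
    if bs < 1:
        raise ValueError("buffer size must be at least one octet")
    end = seg_seq + seg_len
    while lbb < end:
        lbb += bs
    return lbb


@dataclass
class SendSide:
    """Hypothetical send-side subset of a TCB (no sequence wraparound)."""
    snd_una: int      # SND.UNA: oldest unacknowledged sequence number
    snd_nxt: int      # SND.NXT: next sequence number to send
    snd_wnd: int      # SND.WND: window offered by the receiver
    snd_lbb: int      # SND.LBB: beginning of the receiver's current buffer
    snd_bs: int = 1   # SND.BS: receiver buffer size (one octet if not stated)

    def usable_window(self) -> int:
        # Figure 4: new data may be sent up to SND.UNA + SND.WND.
        return self.snd_una + self.snd_wnd - self.snd_nxt

    def transmit(self, seg_len: int, eol: bool = False) -> int:
        """Send seg_len octets starting at SND.NXT; return the SEG.SEQ used.
        (Window checks are left to the caller in this sketch.)"""
        seg_seq = self.snd_nxt
        if eol:
            # Consume the rest of the receiver's buffer as phantom data.
            self.snd_lbb = eol_advance(self.snd_lbb, seg_seq, seg_len, self.snd_bs)
            self.snd_nxt = self.snd_lbb
        else:
            self.snd_nxt = seg_seq + seg_len
        return seg_seq

    def on_ack(self, seg_ack: int) -> None:
        # Only an acknowledgment of data actually sent advances SND.UNA.
        if self.snd_una < seg_ack <= self.snd_nxt:
            self.snd_una = seg_ack
```

Suppose the receive buffer holds 100 octets. Calling `SendSide(snd_una=0, snd_nxt=0, snd_wnd=300, snd_lbb=0, snd_bs=100).transmit(30, eol=True)` moves SND.NXT from 0 to 100, which leaves a usable window of 200. The receiver applies `eol_advance` in the same way to RCV.NXT and RCV.LBB. With the default one-octet buffer, an EOL never adds phantom octets.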
      older sequence numbers               newer sequence numbers

          |            Buffer 1            |            Buffer 2
      ----+--------------------------------+------------------------
                XXXXXXXXXXXXXXXXXXXXX++++++
                |                    |     |
                |<-----SEG.LEN------>|     |
                |                    |     |
             SEG.SEQ                 A     B

             XXX - data octets from segment
             +++ - phantom data

                <----- sequence space ----->

                  End of Letter Adjustment

                         Figure 17.

In the case illustrated above, if the segment does not carry an EOL flag, the next value of SND.NXT or RCV.NXT will be A. If it does carry an EOL flag, the next value will be B.

The exchange of buffer size and sequencing information is done in units of octets. If no buffer size is stated, then the buffer size is assumed to be 1 octet. The receiver tells the sender the size of the buffer in a SYN segment that carries the 16-bit buffer size in an option field of the TCP header.

Each EOL advances the sequence number (SN) to the next buffer boundary:

    While LBB < SEG.SEQ+SEG.LEN Do
        LBB <- LBB + BS
    End
    SN <- LBB

where LBB is the Last Buffer Beginning, and BS is the buffer size.

The CLOSE user call implies an end of letter, as does the FIN control flag in an incoming segment.

The Communication of Urgent Information

The objective of the TCP urgent mechanism is to allow the sending user to stimulate the receiving user to accept some urgent data and to permit the receiving TCP to indicate to the receiving user when all the currently known urgent data has been received by the user.
This mechanism permits a point in the data stream to be designated as the end of "urgent" information. Whenever this point is in advance of the receive sequence number (RCV.NXT) at the receiving TCP, that TCP should tell the user to go into "urgent mode"; when the receive sequence number catches up to the urgent pointer, the TCP should tell the user to go into "normal mode". If the urgent pointer is updated while the user is in "urgent mode", the update will be invisible to the user.

The method employs an urgent field which is carried in all segments transmitted. The URG control flag indicates that the urgent field is meaningful and should be added to the segment sequence number to yield the urgent pointer. The absence of this flag indicates that the urgent pointer has not changed.

To send an urgent indication the user must also send at least one data octet. If the sending user also indicates end of letter, timely delivery of the urgent information to the destination process is enhanced.

Managing the Window

The window sent in each segment indicates the range of sequence numbers the sender of the window (the data receiver) is currently prepared to accept. It is assumed that this range is related to the data buffer space currently available for this connection.

The window information is a guideline to be aimed at. Indicating a large window encourages transmissions, but if more data arrives than can be accepted, it will be discarded. This will result in excessive retransmissions, adding unnecessarily to the load on the network and the TCPs. Indicating a small window may restrict the transmission of data to the point of introducing a round trip delay between each new segment transmitted.

The mechanisms provided allow a TCP to advertise a large window and to subsequently advertise a much smaller window without having accepted that much data. This so-called "shrinking the window" is strongly discouraged.
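The urgent-pointer and window rules above reduce to a few sequence-number comparisons. The sketch below is illustrative only, and the function names are hypothetical. It assumes 32-bit sequence numbers compared modulo 2**32; the sequence-number discussion elsewhere in this document remains the authority on comparisons. `window_shrunk` flags the behavior just described. A window has shrunk when a new advertisement's right edge, SEG.ACK + SEG.WND, falls before the right edge offered previously, SND.UNA + SND.WND.

```python
SEQ_MOD = 2 ** 32  # assumption: 32-bit sequence space with wraparound


def seq_lt(a: int, b: int) -> bool:
    """True if sequence number a precedes b in the modular sequence space."""
    return a != b and (b - a) % SEQ_MOD < SEQ_MOD // 2


def urgent_pointer(seg_seq: int, urgent_field: int) -> int:
    """With URG set, the urgent field is added to SEG.SEQ to give the pointer."""
    return (seg_seq + urgent_field) % SEQ_MOD


def in_urgent_mode(rcv_up: int, rcv_nxt: int) -> bool:
    """The user stays in urgent mode while RCV.UP is ahead of RCV.NXT."""
    return seq_lt(rcv_nxt, rcv_up)


def window_shrunk(old_una: int, old_wnd: int, seg_ack: int, seg_wnd: int) -> bool:
    """True if a new window advertisement moves the right edge backwards."""
    old_right = (old_una + old_wnd) % SEQ_MOD
    new_right = (seg_ack + seg_wnd) % SEQ_MOD
    return seq_lt(new_right, old_right)
```

Consider a receiver that first advertised ACK 1000 with window 500, giving a right edge of 1500. It then advertises ACK 1100 with window 200, which pulls the right edge back to 1300, so `window_shrunk(1000, 500, 1100, 200)` is true. Under the robustness principle, a sender should tolerate this rather than treat it as an error.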
The robustness principle dictates that TCPs will not shrink the window themselves, but will be prepared for such behavior on the part of other TCPs.

The sending TCP must be prepared to accept from the user and send at least one octet of new data even if the send window is zero. The sending TCP should regularly retransmit to the receiving TCP even when the window is zero. Two minutes is recommended for the retransmission interval when the window is zero. This retransmission is essential to guarantee
that when either TCP has a zero window the re-opening of the window will be reliably reported to the other.

The sending TCP packages the data to be transmitted into segments which fit the current window, and may repackage segments on the retransmission queue. Such repackaging is not required, but may be helpful.

A user who closes a connection for sending must continue to read from it until the TCP reports that there is no more data.

In a connection with a one-way data flow, the window information will be carried in acknowledgment segments that all have the same sequence number, so there will be no way to reorder them if they arrive out of order. This is not a serious problem, but it will occasionally allow the window information to be based on old reports from the data receiver.

3.8. Interfaces

There are of course two interfaces of concern: the user/TCP interface and the TCP/IP interface. We have a fairly elaborate model of the user/TCP interface, but only a sketch of the interface to the lower level protocol module.

User/TCP Interface

The functional description of user commands to the TCP is, at best, fictional, since every operating system will have different facilities. Consequently, we must warn readers that different TCP implementations may have different user interfaces. However, all TCPs must provide a certain minimum set of services to guarantee that all TCP implementations can support the same protocol hierarchy. This section specifies the functional interfaces required of all TCP implementations.

TCP User Commands

The following sections functionally characterize a USER/TCP interface. The notation used is similar to most procedure or function calls in high level languages, but this usage is not meant to rule out trap type service calls (e.g., SVCs, UUOs, EMTs).

The user commands described below specify the basic functions the TCP must perform to support interprocess communication.
Individual implementations should define their own exact format, and may provide combinations or subsets of the basic functions in
single calls. In particular, some implementations may wish to automatically OPEN a connection on the first SEND or RECEIVE issued by the user for a given connection.

In providing interprocess communication facilities, the TCP must not only accept commands, but must also return information to the processes it serves. The latter consists of:

    (a) general information about a connection (e.g., interrupts, remote close, binding of unspecified foreign socket).

    (b) replies to specific user commands indicating success or various types of failure.

Open

    Format: OPEN (local port, foreign socket, active/passive [, buffer size] [, timeout] [, precedence] [, security/compartment]) -> local connection name

We assume that the local TCP is aware of the identity of the processes it serves and will check the authority of the process to use the connection specified. Depending upon the implementation of the TCP, the local network and TCP identifiers for the source address will be supplied either by the TCP or by the processes that serve it (e.g., the program which interfaces the TCP to the network). These considerations result from concern about security, to the extent that no TCP be able to masquerade as another one, and so on. Similarly, no process can masquerade as another without the collusion of the TCP.

If the active/passive flag is set to passive, then this is a call to LISTEN for an incoming connection. A passive open may have either a fully specified foreign socket to wait for a particular connection or an unspecified foreign socket to wait for any call. A fully specified passive call can be made active by the subsequent execution of a SEND.

A full-duplex transmission control block (TCB) is created and partially filled in with data from the OPEN command parameters.

On an active OPEN command, the TCP will begin the procedure to synchronize (i.e., establish) the connection at once.
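A TCP holding a passive OPEN must decide whether an incoming connection request matches it. The sketch below combines the foreign-socket rule just described with the precedence and security rules of section 3.6. It is a hypothetical illustration only. `PassiveOpen`, `ConnParams` and the (host, port) tuple used for sockets are assumptions. An unspecified foreign socket is represented as `None`. Precedence levels are ranked as integers, with the X bits in the precedence table read as don't-care.

```python
from typing import NamedTuple, Optional, Tuple

Socket = Tuple[str, int]  # hypothetical (host, port) representation


def precedence_rank(bits: int) -> int:
    """Rank a 3-bit precedence field: 0 = routine up to 4 = flash override."""
    if not 0 <= bits <= 0b111:
        raise ValueError("precedence is a 3-bit field")
    if bits == 0b111:
        return 4          # flash override
    if bits == 0b110:
        return 3          # flash
    return bits >> 1      # 10X -> immediate (2), 01X -> priority (1), 00X -> routine (0)


class ConnParams(NamedTuple):
    precedence: int = 0   # default: routine
    security: int = 0     # default: unclassified
    compartment: int = 0  # default: zero


class PassiveOpen(NamedTuple):
    local_port: int
    foreign_socket: Optional[Socket]  # None: wait for any call
    requested: ConnParams


def socket_matches(listen: PassiveOpen, local_port: int, foreign: Socket) -> bool:
    """An unspecified foreign socket accepts any caller on the local port."""
    if local_port != listen.local_port:
        return False
    return listen.foreign_socket is None or listen.foreign_socket == foreign


def negotiate_precedence(requested: ConnParams, incoming: ConnParams) -> Optional[int]:
    """Return the connection's precedence rank, or None if a reset should be sent."""
    if (requested.security, requested.compartment) != (incoming.security, incoming.compartment):
        return None  # security/compartment must match exactly
    want = precedence_rank(requested.precedence)
    got = precedence_rank(incoming.precedence)
    if got < want:
        return None  # lower precedence than requested
    return max(want, got)  # the connection runs at the higher level
```

Consider a listener at default precedence that receives a request at priority (01X). It accepts the request and raises the connection to priority, as section 3.6 requires. A request below the level named in the OPEN call, or one with different security/compartment values, yields `None`, meaning a reset.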
The buffer size, if present, indicates that the caller will always receive data from the connection in buffers of that size. This buffer size is a measure of the buffer between the user and
the local TCP. The buffer size between the two TCPs may be different.

The timeout, if present, permits the caller to set up a timeout for all buffers transmitted on the connection. If a buffer is not successfully delivered to the destination within the timeout period, the TCP will abort the connection. The present global default is 30 seconds. The buffer retransmission rate may vary; most likely, it will be related to the measured time for responses from the remote TCP.

The TCP or some component of the operating system will verify the user's authority to open a connection with the specified precedence or security/compartment. The absence of a precedence or security/compartment specification in the OPEN call indicates that the default values should be used.

TCP will accept incoming requests as matching only if the security/compartment information is exactly the same and only if the precedence is equal to or higher than the precedence requested in the OPEN call.

The precedence for the connection is the higher of the values requested in the OPEN call and received from the incoming request, and is fixed at that value for the life of the connection.

Depending on the TCP implementation, either a local connection name will be returned to the user by the TCP, or the user will specify this local connection name (in which case another parameter is needed in the call). The local connection name can then be used as a shorthand term for the connection defined by the <local socket, foreign socket> pair.

Send

    Format: SEND (local connection name, buffer address, byte count, EOL flag, URGENT flag [, timeout])

This call causes the data contained in the indicated user buffer to be sent on the indicated connection. If the connection has not been opened, the SEND is considered an error. Some implementations may allow users to SEND first, in which case an automatic OPEN would be done. If the calling process is not authorized to use this connection, an error is returned.
If the EOL flag is set, the data is the End Of a Letter, and the EOL bit will be set in the last TCP segment created from the