RFC 0675

Specification of Internet Transmission Control Program

Pages: 70
Obsoleted by: 7805

Part 2 of 3 – Pages 20 to 43

noToC RFC0675 - Page 20 prevText

   16 bits: Source TCP address

   24 bits: Destination port address

   24 bits: Source port address

   16 bits: Checksum (if EOS bit is set)

4.2.2  TRANSMISSION CONTROL BLOCK

   It is highly likely that any implementation will include shared data
   structures among parts of the TCP and some asynchronous means of
   signaling users when letters have been delivered.

   One typical data structure is the Transmission Control Block (TCB)
   which is created and maintained during the lifetime of a given
   connection. The TCB contains the following information (field sizes
   are notional only and may vary from one implementation to another):

      16 bits: Local connection name

      48 bits: Local socket

      48 bits: Foreign socket

      16 bits: Receive window size in octets

      32 bits: Receive left window edge (next sequence number expected)

      16 bits: Receive packet buffer size of TCB (may be less than
      window)

      16 bits: Send window size in octets

      32 bits: Send left window edge (earliest unacknowledged octet)

      32 bits: Next packet sequence number

      16 bits: Send packet buffer size of TCB (may be less than window)

      8 bits: Connection state

         E/C - 1 if TCP has been synchronized at least once (i.e. has
         been established), else 0, meaning it is closed; this bit is
         reset after FINs are exchanged and the user has done a CLOSE.
         The bit is not reset if the connection is only desynchronized
         on send or receive or both directions.

         SS - SYNCed on send side (if set, else desynchronized)

         SR - SYNCed on receive side (if set, else desynchronized)

   16 bits: Special flags

      S1 - SYN sent if set

      S2 - SYN verified if set

      R - SYN received if set

      F - FIN sent if set

      C - CLOSE from local user received if set

      U - Foreign socket unspecified if set

      SDS - Send side DSN sent if set

      SDV - Send side DSN verified if set

      RDR - Receive side DSN received if set

   Initially, all bits are off [no pun intended] (i.e. SS, SR, E/C, S1,
   S2, R, F, C, SDS, SDV, RDR =0). When R is set, so is SR. When S1 and
   S2 are both set, so is SS. SR is reset when RDR is set. SS is reset
   when both SDS and SDV are set. These bits are used to keep track of
   connection state and to aid in arriving packet processing (e.g. Can
   sequence number be validated? Only if SR is set.).

   16 bits: Retransmission timeout (in eighths of a second)

   16 bits: Head of Send buffer queue [buffers SENT from user to TCP,
   but not packetized]

   16 bits: Tail of Send buffer queue

   16 bits: Pointer to last octet packetized in partially packetized
   buffer (refers to the buffer at the head of the queue)

   16 bits: Head of Send packet queue

   16 bits: Tail of Send packet queue

   16 bits: Head of Packetized buffer Queue

   16 bits: Tail of Packetized buffer queue

   16 bits: Head of Retransmit packet queue

   16 bits: Tail of Retransmit packet queue

   16 bits: Head of Receive buffer queue [queue of buffers given by user
   to RECEIVE letters, but unfilled]

   16 bits: Tail of Receive buffer queue

   16 bits: Head of Receive packet queue

   16 bits: Tail of receive packet queue

   16 bits: Pointer to last contiguous receive packet

   16 bits: Pointer to last octet filled in partly filled buffer

   16 bits: Pointer to next octet to read from partly emptied packet

      [Note: The above two pointers refer to the head of the receive
      buffer and receive packet queues respectively]

   16 bits: Forward TCB pointer

   16 bits: Backward TCB pointer
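
   The rules given above for the SS and SR bits can be sketched as
   follows. This is an illustration only, not part of the
   specification: the class name ConnectionFlags and its methods are
   hypothetical, and the E/C, F, C and U bits are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class ConnectionFlags:
    """Hypothetical holder for a subset of the TCB state and special flags."""
    S1: bool = False   # SYN sent
    S2: bool = False   # SYN verified
    R: bool = False    # SYN received
    SDS: bool = False  # send side DSN sent
    SDV: bool = False  # send side DSN verified
    RDR: bool = False  # receive side DSN received
    SS: bool = False   # SYNCed on send side
    SR: bool = False   # SYNCed on receive side

    def set_flag(self, name: str) -> None:
        """Set one special flag, then apply the derived-bit rules."""
        if name not in ("S1", "S2", "R", "SDS", "SDV", "RDR"):
            raise ValueError(f"not a settable special flag: {name}")
        setattr(self, name, True)
        if name == "R":
            self.SR = True    # when R is set, so is SR
        if name in ("S1", "S2") and self.S1 and self.S2:
            self.SS = True    # when S1 and S2 are both set, so is SS
        if name == "RDR":
            self.SR = False   # SR is reset when RDR is set
        if name in ("SDS", "SDV") and self.SDS and self.SDV:
            self.SS = False   # SS is reset when both SDS and SDV are set

    def can_validate_sequence(self) -> bool:
        """Sequence numbers can be validated only if SR is set."""
        return self.SR
```

   All flags start off, matching the initial condition stated above.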

4.3  CONNECTION MANAGEMENT

4.3.1  INITIAL SEQUENCE NUMBER SELECTION

   The protocol places no restriction on a particular connection being
   used over and over again. New instances of a connection will be
   referred to as incarnations of the connection. The problem that
   arises owing to this is, "how does the TCP identify duplicate packets
   from previous incarnations of the connection?". This problem becomes
   harmfully apparent if the connection is being opened and closed in
   quick succession, or if the connection breaks with loss of memory and
   is then reestablished.

   The essence of the solution [TOML74] is that the initial sequence
   number [ISN] must be chosen so that a particular sequence number can
   never refer to an "old" octet. Once the connection is established the
   sequencing mechanism provided by the TCP filters out duplicates.

   For an association to be established or initialized, the two TCP's
   must synchronize on each other's initial sequence numbers. Hence the
   solution requires a suitable mechanism for picking an initial
   sequence number [ISN], and a slightly involved handshake to exchange
   the ISN's. A "three way handshake" is necessary because sequence
   numbers are not tied to a global clock in the network, and TCP's may
   have different mechanisms for picking the ISN's. The receiver of the
   first SYN has no way of knowing whether the packet was an old delayed
   one or not, unless it remembers the last sequence number used on the
   connection which is not always possible, and so it must ask the
   sender to verify this SYN.

   The "three way handshake" and the advantages of a "clock-driven"
   scheme are discussed in [TOML74]. More on the subject, and algorithms
   for implementing the clock-driven scheme can be found in [DALA74].

4.3.2 ESTABLISHING A CONNECTION

   The "three way handshake" is essentially a unidirectional attempt to
   establish the connection, i.e. there is an initiator and a responder.
   The TCP's should however be able to establish the connection even if
   a simultaneous attempt is made by both TCP's to establish the
   connection. Simultaneous attempts are treated like "collisions" in
   "Aloha" systems and these conflicts are resolved into unidirectional
   attempts to establish the connection. This scheme was adopted because

      (i) Connections will normally have a passive and an active end,
      and so the mechanism should in most cases be as simple as
      possible.

      (ii) It is easy to implement as special cases do not have to be
      accounted for.

   The example below indicates what a three way handshake between TCP's
   A and B looks like:

         A                                                 B

         --> <SEQ x><SYN>                                  -->

         <-- <SEQ y><SYN, ACK x+1>                         <--

         --> <SEQ x+1><ACK y+1><DATA BYTES>                -->

   The receiver of a "SYN" is able to determine whether the "SYN" was
   real (and not an old duplicate) when a positive "ACK" is returned for
   the receiver's "SYN,ACK" in response to the "SYN". The sender of a
   "SYN" gets verification on receipt of a "SYN,ACK" whose "ACK" part
   references the sequence number proposed in the original "SYN" [pun
   intended]. If the TCP is in the state where it is waiting for a
   response to its SYN, but gets a SYN instead, then it always thinks
   this is a collision and goes into the state prior to having sent the
   SYN, i.e. it forgets that it had sent a SYN. The TCP will try to
   establish the connection again after some time, unless it has to
   respond to an arriving SYN. Even if the wait times in the two TCPs
   are the same, the varying delays in network transmission will usually
   be adequate to avoid a collision on the next cycle of attempts to
   send SYN.

   When establishing a connection, the state of the TCP is represented
   by 3 bits --

      S1 S2 R

      S1 = 1 -- SYN sent

      S2 = 1 -- My SYN verified

      R = 1 -- SYN received

   Some examples of attempts to establish the connection are now shown.
   The state of the connection is indicated when a change occurs. We
   specifically do not show the cases in which connection
   synchronization is carried out with packets containing both SYN and
   data. We do this to simplify the explanation, but we do not rule out
   an implementation which is capable of dealing with data arriving in
   the first packet (it has to be stored temporarily without
   acknowledgment or delivery to the user until the arriving SYN has
   been verified).

   The "three way handshake" now looks like --

              A                                            B
      ------------                                      ------------
      S1 S2 R                                                S1 S2 R

      0  0 0                                                 0  0 0

             --> <SEQ x><SYN>                           -->

      1  0 0                                                 0  0 1

             <-- <SEQ y><SYN, ACK x+1>                  <--

      1  1 1                                                 1  0 1

             --> <SEQ x+1><ACK y+1>(DATA OCTETS)        -->

      1  1 1                                                 1  1 1
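
   The bit settings in the diagram above can be replayed step by step.
   The following is a minimal sketch under the assumption that no
   packet is lost, delayed or duplicated; the function name
   three_way_handshake is hypothetical.

```python
def three_way_handshake():
    """Replay SYN / SYN,ACK / ACK and record (S1, S2, R) for A and B."""
    a = {"S1": 0, "S2": 0, "R": 0}
    b = {"S1": 0, "S2": 0, "R": 0}

    def snapshot():
        return (tuple(a.values()), tuple(b.values()))

    trace = [snapshot()]

    # A sends <SEQ x><SYN>; B receives it.
    a["S1"] = 1
    b["R"] = 1
    trace.append(snapshot())

    # B sends <SEQ y><SYN, ACK x+1>; on arrival A has both received a SYN
    # and had its own SYN verified.
    b["S1"] = 1
    a["S2"] = 1
    a["R"] = 1
    trace.append(snapshot())

    # A sends <SEQ x+1><ACK y+1>; on arrival B's SYN is verified.
    b["S2"] = 1
    trace.append(snapshot())
    return trace
```

   Each entry of the returned list corresponds to one row of state
   bits in the diagram, A first and B second.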

   The scenario for a simultaneous attempt to establish the connection
   without the arrival of any delayed duplicates is --

                    A                                     B
            ------------                               ------------
            S1 S2 R                                         S1 S2 R

             0  0 0                                          0  0 0

      (M1)   1  0 0 --> <SEQ x><SYN>                    ...

      (M2)   0  0 0 <-- <SEQ y><SYN>                    <--  1  0 0

      (M1)              B returns to no SYN sent        -->  0  0 0

      (M1)   1  0 0 --> <SEQ z><SYN>      *             -->  0  0 1

      (M3)   1  1 1 <-- <SEQ y+1><SYN,ACK z+1>          <--  1  0 1

      (M4)   1  1 1 --> <SEQ z+1><ACK y+1><DATA>        -->  1  1 1

      Note: "..." means that a message does not arrive, but is delayed
      in the network. State changes are upon arrival or upon departure
      of a given message, as the case may be. Packets containing the SYN
      or INT or DSN bits implicitly contain a "dummy" data octet which
      is never delivered to the user, but which causes the packet
      sequence numbers to be incremented by 1 even if no real data is
      sent. This permits the acknowledgment of these controls without
      acknowledging receipt of any data which might also have been
      carried in the packet. A packet containing a FIN bit has a dummy
      octet following the last octet of data (if any) in the packet.

      * Once in state 000 sender selects new ISN z when attempting to
      establish the connection again.
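
   The effect of the "dummy" octet on sequence numbering can be
   sketched as below. The sketch assumes 32-bit sequence numbers (the
   width of the window-edge fields in the TCB) and at most one control
   per packet; how several controls in a single packet are counted is
   not settled by the note above and is not modelled here. The names
   next_sequence_number and DUMMY_OCTET_CONTROLS are hypothetical.

```python
SEQ_MODULUS = 2 ** 32  # assumption: 32-bit sequence number space

# Controls that carry an implicit dummy octet according to the note above.
DUMMY_OCTET_CONTROLS = {"SYN", "INT", "DSN", "FIN"}


def next_sequence_number(seq, data_len, control=None):
    """Return the sequence number that follows a packet starting at seq.

    data_len is the number of real data octets; control is None or the
    name of the single control bit carried by the packet.
    """
    consumed = data_len + (1 if control in DUMMY_OCTET_CONTROLS else 0)
    return (seq + consumed) % SEQ_MODULUS
```

   For example, a bare SYN with sequence number x is followed by x+1,
   which is the value acknowledged in the handshake diagrams.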

4.3.3 HALF-OPEN CONNECTIONS

   An established connection is said to be a "half-open" connection if
   one of the TCP's has closed the connection at its end without the
   knowledge of the other, or if the two ends of the connection have
   become desynchronized owing to a crash that resulted in loss of
   memory. Such connections will automatically become reset if an
   attempt is made to send data in either direction. However, half-open
   connections are expected to be unusual, and the recovery procedure is
   somewhat involved.

   If one end of the connection no longer exists, then any attempt by
   the other user to send any data on it will result in the sender
   receiving the event code "Connection does not exist at foreign TCP".
   Such an error message should indicate to the user process that
   something is wrong and it is expected to CLOSE the connection.

   Assume that two user processes A and B are communicating with one
   another when a crash occurs causing loss of memory to B's TCP.
   Depending on the operating system supporting B's TCP, it is likely
   that some error recovery mechanism exists. When the TCP is up again B
   is likely to start again from the beginning or from a recovery point.
   As a result B will probably try to OPEN the connection again or try
   to SEND on the connection it believes open. In the latter case it
   receives the error message "connection not open" from the local TCP.
   In an attempt to establish the connection B's TCP will send a packet
   containing SYN. A's TCP thinks that the connection is already
   established and so will respond with the error "unacceptable SYN (or
   SYN/ACK) arrived at foreign TCP". B's TCP knows that this refers to
   the SYN it just sent out, and so should reset the connection and
   inform the user process of this fact.

   It may happen that B is passive and only wants to receive data. In
   this case A's data will not reach B because the TCP at B thinks the
   connection is not established. As a result A's TCP will timeout and
   send a QRY to B's TCP. B's TCP will send STATUS saying the connection
   is not synched. A's TCP will treat this as if an implicit CLOSE had
   occurred and tell the user process, A, that the connection is
   closing. A is expected to respond with a CLOSE command to his TCP.
   However, A's TCP does not send a FIN to B's TCP, since it would not
   be accepted anyway on the unsynced connection. Eventually A will try
   to reopen the connection or B will give up and CLOSE. If B CLOSES,
   B's TCP will simply delete the connection since it was not
   established as far as B's TCP is concerned. No message will be sent
   to A's TCP as a result.

4.3.4  RESYNCHRONIZING A CONNECTION

   Details of resynchronization have not yet been specified since the
   need for this should be infrequent in the initial testing stages.

4.3.5 CLOSING A CONNECTION

   There are essentially three cases:

      a) The user initiates by telling the TCP to CLOSE the connection

      b) The remote TCP initiates by sending a FIN control signal

      c) Both users CLOSE simultaneously

   Two bits are used to maintain control over the closing of a
   connection: these are called the "FIN sent" bit [F] and the "USER
   Closed" bit [C], respectively. The control procedure uses these two
   bits to assure that the connection is properly closed.

   Case 1: Local user initiates the close

      In this case, both the F and C bits are initially zero, but the C
      bit is set immediately upon receipt of the user call "CLOSE." When
      the FIN is sent out by the TCP, the F bit is set. All pending
      RECEIVES are terminated and the user is told that they have been
      prematurely terminated ("connection closing") without data.
      Similarly, any pending SENDS are terminated with the same
      response, "connection closing."

      Several responses may arrive as the result of sending a FIN. The
      one which is generally expected is a matching FIN. When this is
      received, the TCB CAN BE ELIMINATED. If a "connection does not
      exist at foreign TCP" message comes in response to the FIN, then
      the TCB can likewise be eliminated. If no response is forthcoming,
      or if "Foreign TCP inaccessible" arrives then the resolution is
      moot. One might simply timeout and discard the TCB. Since the
      local user wants to CLOSE anyway, this is probably satisfactory,
      although it will leave a potential "half-open" connection at the
      other side. We deal with half open connections in section 4.3.3.

      When the acknowledging FIN arrives after the connection state bits
      are set (F=1, C=1), then the TCB can be deleted.

   Case 2: TCP receives a FIN from the network

      First of all, a FIN must have a sequence number which lies in the
      valid receive window. If not, it is discarded and the left window
      edge is sent as acknowledgment. If the FIN can be processed, it is
      handled (possibly out of order, since it is taken as an imperative
      to shut down the connection). All pending RECEIVES and SENDS are
      responded to by showing that they were terminated by the other
      side's close request (i.e. "connection closing"). The user is also
      told by an unsolicited event or signal that the connection has
      been closed (in some systems, the user might have to request
      STATUS to get this information). Finally, the TCP sends FIN in
      response.

      Thus, because a FIN arrived, a FIN is sent back, so the F bit is
      set. However, the TCB stays around until the local user does a
      CLOSE in acknowledgment of the unsolicited signal that the
      connection has been closed by the other side. Thus, the C bit
      remains unset until this happens. If the C and F bits go from (F=1,
      C=0) to (F=1, C=1), then the connection is closed and the TCB can
      be removed.

   Case 3: both users close simultaneously

      If this happens, both connections will be in the (F=1, C=1) state.
      When the FINs arrive, the connections will be shut down. If one
      FIN fails to arrive, we have two choices. One is to insist on
      acknowledgments for FINs, in which case the missing one will be
      retransmitted. Another is merely to permit the half-open
      connection to remain (we prefer this solution). It can timeout
      independently and go away after a while. If an attempt is made to
      reestablish the connection, the initiator will discover the
      existence of the open connection since an "inappropriate SYN
      received" message will be sent by the TCP which holds the "half-
      open" connection. The receiver of this message can tell the other
      TCP to reset the connection. We cannot permit the holder of the
      half-open connection to reset automatically on receipt of the SYN
      since its receipt is not necessarily prima facie evidence of a
      half open connection. (The SYN could be a delayed duplicate.)

4.3.6.  CONNECTION STATE and its relation to USER and INCOMING CONTROL
   REQUESTS

   In order to formalize the action taken by the TCP when it receives
   commands from the User, or Control information from the network, we
   define a connection to be in one of 7 states at any instant. These
   are known as the TCB Major States. Each Major State is simply a
   convenient name for a particular setting or group of settings of the
   state bits, as follows:

      S1 S2  R  U  F  C   #   name

       -  -  -  -  -  -   0   no TCB

       0  0  0 0/1 0  0   1   unsync

       1  0  0  0  0  0   2   SYN sent

       1  0  1 0/1 0  0   3   SYN received

       1  1  1  0  0  0   4   established

       1 0/1 1 0/1 1  1   5   FIN wait

       1  1  1  0  1  0   6   FIN received
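
   The table above can be read as a set of bit patterns in which
   "0/1" matches either value. A minimal sketch of that lookup
   follows; the names MAJOR_STATES and major_state are hypothetical,
   and bit combinations not listed in the table simply yield None.

```python
# Each row: (S1, S2, R, U, F, C) pattern, state number, state name.
# None stands for a "0/1" (either value) entry in the table.
MAJOR_STATES = [
    ((0, 0, 0, None, 0, 0), 1, "unsync"),
    ((1, 0, 0, 0, 0, 0), 2, "SYN sent"),
    ((1, 0, 1, None, 0, 0), 3, "SYN received"),
    ((1, 1, 1, 0, 0, 0), 4, "established"),
    ((1, None, 1, None, 1, 1), 5, "FIN wait"),
    ((1, 1, 1, 0, 1, 0), 6, "FIN received"),
]


def major_state(bits):
    """Return (number, name) for a (S1, S2, R, U, F, C) tuple, or None.

    State 0 ("no TCB") has no bit pattern, so it is never returned here.
    """
    for pattern, number, name in MAJOR_STATES:
        if all(p is None or p == b for p, b in zip(pattern, bits)):
            return number, name
    return None
```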

   The connection moves from state to state as shown below. The
   transition from one state to another will be represented as

      [X, Y]<cause><action>

   which means that there is a transition from state X to state Y owing
   to <cause>. The action taken by the TCP is specified as <action>. We
   use this notation to give the important state transitions, often
   simplifying the cause and action fields to take into account a number
   of situations. Figure 1 illustrates these transitions in traditional
   state diagram form. Section 4.4.6 and section 4.4.7 fully specify the
   effect of all User commands and Control information arriving from the
   network.

      [0,1] <OPEN> <create TCB>

      [1,2] <SEND, INTERRUPT, or collision timeout> <send SYN>

      [1,3] <SYN arrives> <send SYN,ACK>

      [1,0] <CLOSE> <remove TCB>

      [2,1] <SYN arrives (collision)> <set timeout, forget SYNs>

      [2,0] <CLOSE> <remove TCB>

      [2,4] <appropriate SYN,ACK arrives> <send ACK>

      [3,4] <appropriate ACK arrives> <none>

      [3,1] <error arrives or timeout> <(forget SYN)>

      [3,5] <CLOSE> <send FIN>

      [4,5] <CLOSE> <send FIN>

      [4,6] <appropriate FIN arrives> <send FIN, inform user>

      [5,0] <FIN or error arrives, or timeout> <remove TCB>

      [6,0] <CLOSE> <remove TCB>
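
   The transitions listed above can be held in a simple lookup table.
   The sketch below is illustrative only: the event strings are
   shorthand chosen for this example, compound causes are split into
   one entry per cause, and the check that an arriving SYN,ACK, ACK or
   FIN is "appropriate" is assumed to have been made before the lookup.
   The names TRANSITIONS and run are hypothetical.

```python
# (state, event) -> (next state, action), following the list above.
TRANSITIONS = {
    (0, "OPEN"): (1, "create TCB"),
    (1, "SEND"): (2, "send SYN"),
    (1, "INTERRUPT"): (2, "send SYN"),
    (1, "collision timeout"): (2, "send SYN"),
    (1, "SYN arrives"): (3, "send SYN,ACK"),
    (1, "CLOSE"): (0, "remove TCB"),
    (2, "SYN arrives"): (1, "set timeout, forget SYNs"),
    (2, "CLOSE"): (0, "remove TCB"),
    (2, "SYN,ACK arrives"): (4, "send ACK"),
    (3, "ACK arrives"): (4, "none"),
    (3, "error arrives"): (1, "forget SYN"),
    (3, "timeout"): (1, "forget SYN"),
    (3, "CLOSE"): (5, "send FIN"),
    (4, "CLOSE"): (5, "send FIN"),
    (4, "FIN arrives"): (6, "send FIN, inform user"),
    (5, "FIN arrives"): (0, "remove TCB"),
    (5, "error arrives"): (0, "remove TCB"),
    (5, "timeout"): (0, "remove TCB"),
    (6, "CLOSE"): (0, "remove TCB"),
}


def run(events, state=0):
    """Apply events in order; return the final state and the actions taken.

    A (state, event) pair missing from the table raises KeyError.
    """
    actions = []
    for event in events:
        state, action = TRANSITIONS[(state, event)]
        actions.append(action)
    return state, actions
```

   An active open followed by a local CLOSE passes through states
   1, 2, 4 and 5 before the TCB is removed.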

4.4  STRUCTURE OF THE TCP

4.4.1  INTRODUCTION [See figure 2.1]

   There are many possible implementations of the TCP. We offer one
   conceptual framework in which to view the various algorithms that
   make up the TCP design. In our concept, the TCP is written in two
   parts, an interrupt or signal driven part (consisting of four
   processes), and a reentrant library of subroutines or system calls
   which interface the user process to the TCP. The subroutines
   communicate with the interrupt part through shared data structures
   (TCB's, shared buffer queues etc.). The four processes are the Output
   Packet Handler which sends packets to the packet switch; the
   Packetizer which formats letters into internet packets; the Input
   Packet Handler which processes incoming packets; and the Reassembler
   which builds letters for users.

   The ultimate bottleneck is the pipe through which arriving and
   departing packets must travel. This is the Host/Packet Switch
   interface. The interrupt driven TCP shares among all TCB's its
   limited packet buffer resources for sending and receiving packets.
   From the standpoint of controlling buffer congestion, it appears
   better to TREAT INCOMING PACKETS WITH HIGHER PRIORITY THAN OUTGOING
   PACKETS. That is, packet buffers which can be released by copying
   their contents into user buffers clearly help to reduce congestion.
   Neither the packetizer nor the input packet handler should be allowed
   to take up all available packet buffer space; an analogous problem
   arises in the IMP in the allocation of store and forward, and
   reassembly buffer space. One policy is to permit neither contender
   more than, say, two-thirds of the space. The buffer allocation
   routines can enforce these limits and reject buffer requests as
   needed. Conceptually, the scheduler can monitor the amounts of
   storage dedicated to the input and output routines, and can force
   either to sleep if its buffer allocation exceeds the limit.
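
   One way to picture the two-thirds policy is a shared pool that
   rejects a request once the requester holds its maximum share. The
   sketch below is an assumption-laden illustration: the class name
   PacketBufferPool, the contender labels "input" and "output", and the
   rounding of the limit are all choices made for this example.

```python
class PacketBufferPool:
    """Hypothetical shared pool of packet buffers with a per-contender cap."""

    def __init__(self, total, share_num=2, share_den=3):
        self.total = total
        # Largest number of buffers either contender may hold at once.
        self.limit = total * share_num // share_den
        self.held = {"input": 0, "output": 0}

    def request(self, who):
        """Grant one buffer to `who`; return False if the request is rejected."""
        if self.held[who] >= self.limit:
            return False  # contender has reached its share of the pool
        if sum(self.held.values()) >= self.total:
            return False  # no free buffers remain
        self.held[who] += 1
        return True

    def release(self, who):
        """Return one buffer previously granted to `who`."""
        if self.held[who] == 0:
            raise ValueError(f"{who} holds no buffers")
        self.held[who] -= 1
```

   A scheduler could treat a False return as the cue to put the
   requesting process to sleep until buffers are released.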

   As an example, we can consider what happens when a user executes a
   SEND call to the TCP service routines. The buffer containing the
   letter is placed on a SEND buffer queue associated with the user's
   TCB. A 'packetizer' process is awakened to look through all the TCB's
   for 'packetizing' work. The packetizer will keep a roving pointer
   through the TCB list which enables it to pick up new buffers from the
   TCB queue and packetize them into output buffers. The packetizer
   takes no more than one letter at a time from any single TCB. The
   packetizer attempts to maintain a non-empty queue of output packets
   so that the output handler will not fall idle waiting for the
   packetizing operation. However, since arriving packets compete with
   departing packets, care must be taken to prevent either class from
   occupying all of the shared packet buffer space. Similarly since the
   TCB's all compete for space in service to their connections, neither
   input nor output packet space should be dominated by any one TCB.

   When a packet is created, it is placed on a FIFO SEND packet queue
   associated with its origin TCB. The packetizer wakes the output
   handler and then continues to packetize a few more buffers, perhaps,
   before going to sleep. The output handler is awakened either by a
   'hungry' packet switch or by the packetizer; in either case, it uses
   a roving TCB pointer to select the next TCB for service. The send
   packet queue can be used as a 'work queue' for the output handler.
   After a packet has been sent, but usually before an ACK is returned,
   the output handler moves the packet to a retransmission queue
   associated with each TCB.

   Retransmission timeouts can refer to specific packets and the
   retransmission list can be searched for the specific packet. If an
   ACK is received, the retransmission entry can be removed from the
   retransmit queue. The send packet queue contains only packets waiting
   to be sent for the first time. INTERRUPT requests can remove entries
   in both the send packet queue and the retransmit packet queue.

   Since packets are never in more than one queue at a time, it appears
   possible for INT, FIN or RESET commands to remove packets from the
   receive, send, or retransmit packet queues with the assurance that an
   already issued signal to enter the reassembler, the packetizer or the
   output handler will not be confusing.

   Handling the INTERRUPT and CLOSE functions can however require some
   care to avoid confusing the scheduler, and the various processes. The
   scheduler must maintain status information for the processes. This
   information includes the current TCB being serviced. When an
   INTERRUPT is issued by a local process, the output queue of letters
   associated with the local port reference is to be deleted. The
   packetizer, for example, may however be working at that time on the
   same queue. As usual, simultaneous reading and writing of the TCB
   queue pointers must be inhibited through some sort of semaphore or
   lockout mechanism. When the packetizer wants to serve the next send
   buffer queue, it must lock out all other access to the queue, remove
   the head of the queue (assuming of course that there are enough
   buffers for packetization), advance the head of the queue, and then
   unlock access to the queue.
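
   The lockout discipline described above might look as follows. This
   is a sketch only; it uses a mutual-exclusion lock from the Python
   standard library in place of whatever semaphore the host system
   provides, and the class name SendBufferQueue and its methods are
   hypothetical.

```python
import threading
from collections import deque


class SendBufferQueue:
    """Hypothetical per-TCB send buffer queue guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._buffers = deque()

    def enqueue(self, letter):
        """SEND call: place a user buffer at the tail of the queue."""
        with self._lock:
            self._buffers.append(letter)

    def take_head(self, packet_buffers_available):
        """Packetizer: lock, remove the head, advance, unlock.

        Returns None if the queue is empty or there are not enough
        packet buffers for packetization.
        """
        with self._lock:
            if not self._buffers or not packet_buffers_available:
                return None
            return self._buffers.popleft()

    def delete_all(self):
        """INTERRUPT: delete the queued letters; return how many were dropped."""
        with self._lock:
            dropped = len(self._buffers)
            self._buffers.clear()
            return dropped
```

   Because both the packetizer and the INTERRUPT path take the same
   lock, neither can observe the queue pointers half updated.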

   If the packetizer keeps only a TCB pointer in a global place called
   CPTCB (current packetizer TCB address), and always uses the address
   in CPTCB to find the TCB in which to examine the send buffer queue,
   then removal of the output buffer queue does not require changes to
   any working storage belonging to the packetizer. Even more important,
   the arrival and processing of a RESET or CLOSE, which clears the
   system of a given TCB, can update the CPTCB pointer, as long as the
   removal does not occur while the packetizer is still working on the
   TCB.

   Incoming packets are examined by the input packet handler. Here they
   are checked for valid connection sockets, and acknowledgments are
   processed, causing packets to be removed, possibly, from the SEND or
   RETRANSMIT packet queues as needed. As an example, consider the
   receipt of a valid FIN request on a particular TCB. If a FIN had not
   been sent before (i.e. F bit not set), then a FIN packet is
   constructed and sent after having cleared out the SEND buffer and
   SEND packet queues as well as the RETRANSMIT queue. Otherwise, if the
   F and C bits are both set, all queues are emptied and the TCB is
   returned to free storage.

   Packets which should be reassembled into letters and sent to users
   are queued by the input packet handler, on the receive packet queue,
   for processing by the reassembly process. The reassembler looks at
   its FIFO work queue and tries to move packets into user buffers which
   are queued up in an input buffer queue on each TCB. If a packet has
   arrived out of order, it can be queued for processing in the correct
   sequence. Each time a packet is moved into a user buffer, the left
   window edge of the receiving TCB is moved to the right so that
   outgoing packets can carry the correct ACK information. If the SEND
   buffer queue is empty, then the reassembler creates a packet to carry
   the ACK.

   As packets are moved into buffers and they are filled, the buffers
   are dequeued from the RECEIVE buffer queue and passed to the user.
   The reassembler can also be awakened by the RECEIVE user call should
   it have a non-empty receive packet queue with an empty RECEIVE buffer
   queue. The awakened reassembler goes to work on each TCB, keeping a
   roving pointer, and sleeping if a cycle is made of all TCB's without
   finding any work.

4.4.2  INPUT PACKET HANDLER [See figure 2.2]

   The Input Packet Handler is awakened when a packet arrives from the
   network. It first verifies that the packet is for an existing TCB
   (i.e. the local and foreign socket numbers are matched with those of
   existing TCB's). If this fails, an error message is constructed and
   queued on the send packet queue of a dummy TCB. A signal is also sent
   to the output packet handler. Generally, things to be transmitted
   from the dummy TCB have a default retransmission timeout of zero, and
   will not be retransmitted. (We use the idea of a dummy TCB so that
   all packets containing errors, or RESET can be sent by the output
   packet handler, instead of having the originator of them interface to
   the net. These packets, it will be noticed, do not belong to any
   TCB).

   The input packet handler looks out for control or error information
   and acts appropriately. Section 4.4.7 discusses this in greater
   detail, but as an example, if the incoming packet is a RESET request
   of any kind (i.e. all connections from designated TCP or given
   connection), and is believable, then the input packet handler clears
   out the related TCB(s), empties the send and receive packet queues,
   and prepares error returns for outstanding user SEND(s) and
   RECEIVE(s) on each reset TCB. The TCB's are marked unused and
   returned to storage. If the RESET refers to an unknown connection, it
   is ignored.

   Any ACK's contained in incoming packets are used to update the send
   left window edge, and to remove the ACK'ed packets from the TCB
   retransmit packet queue. If the packet being removed was the end of a
   user buffer, then the buffer must be dequeued from the packetized
   buffer queue, and the User informed. The packetizer is also signaled.
   Only one signal, or one for each packet, will have to be sent,
   depending on the scheduling scheme for the processes. See section
   4.4.7 for a detailed discussion.

   The packet sequence number, the current receive window size, and the
   receive left window edge determine whether the packet lies within the
   window or outside of it.

      Let W = window size

         S = size of sequence number space

         L = left window edge

         R = L+W-1 = right window edge

         x = sequence number to be tested

      For any sequence number, x, if

         (R-x) mod S < W

      then x is within the window.

   A packet should be rejected only if all of it lies outside the
   window. This is easily tested by letting x be, first the packet
   sequence number, and then the sum of packet sequence number and
   packet text length, less one. If the packet lies outside the window,
   and there are no packets waiting to be sent, then the input packet
   handler should construct a dummy ACK and queue it for output on the
   send packet queue, and signal the output packet handler. Successfully
   received packets are placed on the receive packet queue in the
   appropriate sequence order, and the reassembler signaled.
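
   The window test and the two-endpoint rejection rule can be sketched
   as below. With R = L+W-1 the window holds exactly W sequence
   numbers, so this sketch uses the strict comparison (R-x) mod S < W;
   that choice, the default of a 32-bit sequence space, the treatment
   of a packet with no text as a single point, and the function names
   are assumptions of the example.

```python
def in_window(x, left, window, modulus=2 ** 32):
    """True if x lies in [left, left + window - 1], taken modulo modulus."""
    right = (left + window - 1) % modulus
    return (right - x) % modulus < window


def packet_acceptable(seq, text_len, left, window, modulus=2 ** 32):
    """Reject only if both the first and the last octet lie outside the window."""
    # A packet with no text is tested at its sequence number alone.
    last = (seq + max(text_len, 1) - 1) % modulus
    return (in_window(seq, left, window, modulus)
            or in_window(last, left, window, modulus))
```

   Python's % operator returns a non-negative result for a positive
   modulus, so the subtraction needs no special handling when the
   window wraps around the end of the sequence space.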

   The packet window check cannot be made if the associated TCB is not
   in the 'established' state, so care must be taken to check for
   control and TCB state before doing the window check.

4.4.3  REASSEMBLER [See figure 2.3]

   The Reassembler process is activated by both the Input Packet Handler
   and the RECEIVE user call. While the reassembler is asleep, if
   multiple signals arrive, all but one can be discarded. This is
   important as the reassembler does not know the source of the signal.
   This is so in order that "dangling" signals from work in TCB's that
   have subsequently been removed don't confuse it. Each signal simply
   means that there may be work to be done. If the reassembler is awake
   when a signal arrives, it may be necessary to put it in a
   "hyperawake" state so that even if the reassembler tries to quit, the
   scheduler will run it one more time.
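
   The signal handling just described amounts to a small three-state
   machine. The following sketch shows one possible reading; the class
   name ReassemblerSignals, its method names and the use of a single
   pending-wakeup flag are assumptions made for illustration.

```python
class ReassemblerSignals:
    """Hypothetical bookkeeping for signals sent to the reassembler."""

    ASLEEP, AWAKE, HYPERAWAKE = "asleep", "awake", "hyperawake"

    def __init__(self):
        self.state = self.ASLEEP
        self.wakeup_pending = False

    def signal(self):
        """Record a signal from the input packet handler or a RECEIVE call."""
        if self.state == self.ASLEEP:
            # Multiple signals while asleep collapse into one wakeup.
            self.wakeup_pending = True
        else:
            # Signal arrived while running: force one more pass later.
            self.state = self.HYPERAWAKE

    def start(self):
        """Scheduler: run the reassembler if a wakeup is pending."""
        if self.state == self.ASLEEP and self.wakeup_pending:
            self.wakeup_pending = False
            self.state = self.AWAKE
            return True
        return False

    def try_quit(self):
        """Reassembler found no work; return True if it may go to sleep."""
        if self.state == self.HYPERAWAKE:
            self.state = self.AWAKE  # scheduler runs it one more time
            return False
        self.state = self.ASLEEP
        return True
```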

   When the reassembler is awakened it looks at the receive packet queue
   for each TCB. If there are some packets there then it sees whether
   the RECEIVE buffer queue is empty. If it is then the reassembler
   gives up on this TCB and goes on to the next one, otherwise if the
   first packet matches the left window edge, then the packet can be
   moved into the User's buffer. The reassembler keeps transferring
   packets into the User's buffer until the letter is completely
   transferred, or something causes it to stop. Note that a buffer may
   be partly filled and then a sequence 'hole' is encountered in the
   receive packet queue. The reassembler must mark progress so that the
   buffer can be filled up starting at the right place when the 'hole'
   is filled. Similarly a packet might be only partially emptied when a
   buffer is filled, so progress in the packet must be marked.
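
   The progress marking described above might look like the following
   sketch. It makes several simplifying assumptions: sequence number
   wrap-around, letter boundaries, and the extra octet used by control
   functions are all ignored, and the names Packet, UserBuffer, and
   fill_buffer are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    seq: int        # sequence number of the first octet of text
    data: bytes
    used: int = 0   # progress mark: octets already moved to a buffer


@dataclass
class UserBuffer:
    capacity: int
    data: bytearray = field(default_factory=bytearray)  # progress = len(data)


def fill_buffer(queue: list, left_edge: int, buf: UserBuffer) -> int:
    """Move in-sequence octets from queue (sorted by seq) into buf.

    Returns the new receive left window edge.
    """
    while queue and len(buf.data) < buf.capacity:
        pkt = queue[0]
        if pkt.seq + pkt.used != left_edge:
            break  # sequence hole: stop, keeping the partly filled buffer
        room = buf.capacity - len(buf.data)
        chunk = pkt.data[pkt.used:pkt.used + room]
        buf.data += chunk
        pkt.used += len(chunk)      # packet may be only partially emptied
        left_edge += len(chunk)
        if pkt.used == len(pkt.data):
            queue.pop(0)            # packet fully delivered
    return left_edge
```

   Calling fill_buffer again after the missing packet arrives resumes
   at the octet where the buffer and the packet were left.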

   If a letter was successfully transferred to a User buffer then the
   reassembler signals the User that a letter has arrived and dequeues
   the buffer associated with it from the TCB RECEIVE buffer queue. If
   the buffer is filled then the User is signaled and the buffer
   dequeued as before. The event code indicates whether the buffer
   contains all or part of a letter, as described in section 2.4.

   In every case when a packet is delivered to a buffer, the receive
   left window edge is updated, and the packetizer is signaled. This
   updating must take account of the extra octet included in the
   sequencing for certain control functions [SYN, INT, FIN, DSN]. If the
   send packet queue is empty then the reassembler must create a packet
   to carry the ACK, and place it on the send packet queue.

   Note that the reassembler never works on a TCB for more than one User
   buffer's worth of time, in order to give all TCB's equal service.

   Scheduling of the reassembler is a big issue, but perhaps running to
   completion will be satisfactory, or else it can be time sliced. In
   the latter case it will continue from where it left off, but a new
   signal may have arrived producing some possible work. This work will
   be processed as part of the old incomplete signal, and so some
   wasteful processing may occur when the reassembler wakes up again.
   This is the general problem of trying to implement a protocol that is
   fundamentally asynchronous, but at least it is immune to harmful
   race-conditions. E.g. if we were to have the reassembler 'remove' the
   signal that caused it to wake up, just before it went to sleep (in
   order that new arriving ones were discarded) then a new signal may
   arrive at a critical time causing it not to be recognized; thus
   leaving some work pending, and this may result in a deadlock [see
   previous comments on "hyperawake" state].

4.4.4  PACKETIZER [See figure 2.4]

   The Packetizer process gets work from both the Input Packet Handler
   and the SEND user call. The signal from the SEND user call indicates
   that there is something new to send, while the one from the input
   packet handler indicates that more TCP buffers may be available from
   delivered packets. This latter signal is to prevent deadlocks in
   certain kinds of scheduling schemes. We assume the same treatment of
   signals as discussed in section 4.4.3.

   When the packetizer is awakened it looks at the SEND buffer queue for
   each TCB. If there is a new or partial letter awaiting packetization,
   it tries to packetize the letter, TCB buffer and window permitting.
   It packetizes no more than one letter for a TCB before servicing
   another TCB. For every packet produced it signals the output packet
   handler (to prevent deadlock in a time sliced scheduling scheme). If
   a 'run till completion' scheme is used then one signal only need be
   produced, the first time a packet is produced since awakening. If
   packetization is not possible the packetizer goes on to the next TCB.
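
   The phrase "TCB buffer and window permitting" can be illustrated
   with a short sketch. It is not a definitive packetizer: sequence
   number wrap-around and control bits are ignored, and the function
   name and all of its parameters are hypothetical.

```python
def packetize(letter: bytes, offset: int, next_seq: int, send_left: int,
              window: int, max_packet: int, free_buffers: int):
    """Cut packets from letter[offset:] while window and buffers allow.

    Returns (packets, new_offset, new_next_seq), where each packet is a
    (sequence number, text) pair. new_offset marks progress in the letter.
    """
    packets = []
    while offset < len(letter) and free_buffers > 0:
        room = window - (next_seq - send_left)  # unused part of send window
        if room <= 0:
            break  # never transmit beyond the right window edge
        size = min(max_packet, room, len(letter) - offset)
        packets.append((next_seq, letter[offset:offset + size]))
        offset += size
        next_seq += size
        free_buffers -= 1
    return packets, offset, next_seq
```

   With a 10-octet letter, a window of 7, and packets of at most 4
   octets, the first call yields packets of 4 and 3 octets and leaves
   the offset at 7; the remaining 3 octets go out once the window
   advances.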

   If a partial buffer was transferred then the packetizer must mark
   progress in the SEND buffer queue. Completely packetized buffers are
   dequeued from the SEND buffer queue, and placed on a Packetized
   buffer queue, so that the buffer can be returned to the user when an
   ACK for the last bit is received.

   When the packetizer packetizes a letter it must see whether it is the
   first piece of data being sent on the connection, in which case it
   must include the SYN bit. Some implementations may not permit data to
   be sent with SYN and others may discard any data received with SYN.

   The Packetizer goes to sleep if it finds no more work at any TCB.

4.4.5  OUTPUT PACKET HANDLER [see figure 2.5]

   When activated by the packetizer, or the input packet handler, or
   some of the user call routines, the Output Packet Handler attempts to
   transmit packets on the net (may involve going through some other
   network interface program). It looks at the TCB's in turn,
   transmitting some packets from the send packet queue. These are
   dequeued and put on the retransmit queue along with the time when
   they should be retransmitted.

   All data packets that are transmitted have the latest receive left
   window edge in the ACK field. Error and control messages may have no
   ACK [ACK bit off], or set the ACK field to refer to a received
   packet's sequence number.

   The RETRANSMIT PROCESS:

   This process can either be viewed as a separate process, or as part
   of the output packet handler. Its implementation can vary; it could
   either perform its function, by being woken up at regular intervals,
   or when the retransmission time occurs for every packet put on the
   retransmit queue. In the first case the retransmit queue for each TCB
   is examined to see if there is anything to retransmit. If there is, a
   packet is placed on the send packet queue of the corresponding TCB.
   The output packet handler is also signaled.
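
   The first variant (waking at regular intervals) might be sketched
   as below. The dictionary keys and the function name are
   hypothetical, and the sketch assumes that a packet moved back to the
   send packet queue is removed from the retransmit queue, since the
   output packet handler re-queues it with a new time when it is sent.

```python
def retransmit_scan(tcbs: list, now: float) -> bool:
    """Move packets whose retransmission time has come to the send queue.

    Each TCB is a dict with a "retransmit_queue" of (due_time, packet)
    pairs and a "send_queue" list. Returns True if the output packet
    handler should be signaled.
    """
    signal_output = False
    for tcb in tcbs:
        still_waiting = []
        for due_time, packet in tcb["retransmit_queue"]:
            if due_time <= now:
                tcb["send_queue"].append(packet)
                signal_output = True
            else:
                still_waiting.append((due_time, packet))
        tcb["retransmit_queue"] = still_waiting
    return signal_output
```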

   Another "demon" process monitors all user Send buffers and
   retransmittable control messages sent on each connection, but not yet
   acknowledged. If the global retransmission timeout is exceeded for
   any of these, the User is notified and he may choose to continue or
   close the connection. A QUERY packet may also be sent to ascertain
   the state of the connection [this facilitates recovery from half open
   connections as described in section 4.3.3].

4.4.6  USER CALL PROCESSING

   OPEN [See figure 3.1]

      1. If the process calling does not own the specified local socket,
      return with <type 1><ELP 1 "connection illegal for this process">.

      2. If no foreign socket is specified, construct a new TCB and add
      it to the list of existing TCB's. Select a new local connection
      name and return it along with <type 1><OLP 0 "success">. If there
      is no room for the TCB, respond with <type 1><ELT 4 "No room for
      TCB">.

      3. If a foreign socket is specified, verify that there is no
      existing TCB with the same <local socket, foreign socket> pair
      (i.e. same connection), otherwise return <type 1><ELP 6
      "connection already open">. If there is no TCB space, return as in
      (2), otherwise, create the TCB and link it with the others,
      returning a local connection name with the success event code.

      Note: if a TCB is created, be sure to copy the timeout parameter
      into it, and set the "U" bit to 0 if a foreign socket is
      specified, else set U to 1 (to show unspecified foreign socket).
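
      The OPEN steps above can be summarized in a short sketch. It is
      illustrative only: the TCB is reduced to a dictionary, event
      codes are written as (type, flags, code, message) tuples copied
      from the text, and the function name, its parameters, and the
      way a new local connection name is chosen are assumptions.

```python
def open_call(tcbs: dict, max_tcbs: int, owns_socket: bool,
              local, foreign=None, timeout: int = 0):
    """Process an OPEN. Returns (local connection name or None, event)."""
    if not owns_socket:
        return None, (1, "ELP", 1, "connection illegal for this process")
    if foreign is not None and any(
            t["local"] == local and t["foreign"] == foreign
            for t in tcbs.values()):
        return None, (1, "ELP", 6, "connection already open")
    if len(tcbs) >= max_tcbs:
        return None, (1, "ELT", 4, "No room for TCB")
    name = max(tcbs, default=0) + 1  # hypothetical naming scheme
    tcbs[name] = {
        "local": local,
        "foreign": foreign,
        "timeout": timeout,                   # copied from the OPEN call
        "U": 0 if foreign is not None else 1  # 1 = foreign socket unspecified
    }
    return name, (1, "OLP", 0, "success")
```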

   SEND [see figure 3.2]

      1. Search for TCB with local connection name specified. If none
      found, return <type 10><ELP 3 "connection not open">.

      2. If TCB is found, check foreign socket specification. If not set
      (i.e. U = 1 in TCB), return <type 10><ELT 5 "foreign socket
      unspecified">. If the connection is in the "closing" state (i.e.
      state 5 or 6), return <type 3><ELP 12 "connection closing"> and do
      not process the buffer.

      3. Put the buffer on the Send buffer queue and signal the
      packetizer that there is work to do.

   INTERRUPT [see figure 3.3]

      1. Validate existence of the referenced connection, sending out
      error messages of the form <type 3><ELP 3 "connection not open">
      or <type 3><ELT 5 "foreign socket unspecified"> as appropriate. If
      the local connection refers to a connection not accessible to the
      process interrupting, send <type 3><ELP 1 "connection illegal for
      this process">.

      2. If the connection is in the "closing" state (i.e. states 5 or
      6), return <type 3><ELT 12 "connection closing"> and do not send
      an INT packet to the destination.

      3. Any pending SEND buffers should be returned with <type 10><ELP
      10 "buffer flushed due to interrupt">. An INT packet should be
      created and placed on the output packet queue, and the output
      packet handler should be signaled.

   RECEIVE [See figure 3.4]

      1. If the caller does not have access to the referenced local
      connection name, return <type 20><ELP 1 "connection illegal for
      this process">. If the connection is not open, return <type
      20><ELP 3 "connection not open">. If the connection is in the
      closing state (e.g. a FIN has been received or a user CLOSE is
      being processed), return <type 20><ELP 12 "connection closing">.

      2. Otherwise, put the buffer on the receive buffer queue and
      signal the reassembler that buffer space is available.

   CLOSE [See figure 3.5]

      1. If the connection is not accessible to the caller, return <type
      2><ELP 1 "connection illegal for this process">. If there is no
      such connection respond with <type 2><ELP 3 "connection not
      open">.

      2. If the R bit is 0 (i.e. connection is in state 1 or 2), simply
      remove the TCB.

      3. If the R bit is set and the F bit is set, then remove the TCB.

      4. Otherwise, if the R bit is set, but F is 0 (i.e. states 3 or
      4), return all buffers to the User with <type x><ELP 12
      "connection closing">, clear all output and input packet queues
      for this connection, create a FIN packet, and signal the output
      packet handler. Set the C and F bits to show this action.

   STATUS [See figure 3.6]

      1. If the connection is illegal for the caller to access, send
      <type 30><ELP 1 "connection illegal for this process">.

      2. If the connection does not exist, return <type 30><ELP 3
      "connection not open">.

      3. Otherwise set status information from the TCB and return it via
      <type 30><O-T 0 "status data...">.

4.4.7  NETWORK CONTROL PROCESSING

   The Input Packet Handler examines the header to see if there is any
   control information or error codes present. We do not discuss the
   action taken for various special function codes, as it is often
   implementation dependent, but we describe those that affect the state
   of the connection. After initial screening by the Input Packet
   Handler [see section 4.4.2 and figure 2.2], control and error packets
   are processed as shown in figures 4.1-4.7. [ACK and data processing
   is done within the Input Packet Handler.]

4.4.8  TCP ERROR HANDLING

   Error messages have CD=001 and do not carry user data. Depending on
   the error, zero or more octets of error information will be carried
   in the packet text field. We explicitly assume that this data is
   restricted in length so as to fall below the GATEWAY fragmentation
   threshold (probably 512 bits of data and header). Errors generally
   refer to specific connections, so the source and destination socket
   identifiers are relevant here. The ACK field of an error packet
   contains the sequence number of the packet that caused the error, and
   the ACK bit is off. [RESET and STATUS special functions may use the
   ACK field in the same way.] This allows the receiver of an error
   message to determine which packet caused the error. Error packets are
   not ACK'ed or retransmitted.


4.5.  BUFFER AND WINDOW ALLOCATION

4.5.1  INTRODUCTION

   The TCP manages buffer and window allocation on connections for two
   main purposes: equitably sharing limited TCP buffer space among all
   connections (multiplexing function), and limiting attempts to send
   packets, so that the receiver is not swamped (flow control function).
   For further details on the operation and advantages of the window
   mechanism see CEKA74.

   Good allocation schemes are one of the hardest problems of TCP
   design, and much experimentation must be done to develop efficient
   and effective algorithms. Hence the following suggestions are merely
   initial thoughts. Different implementations are encouraged with the
   hope that results can be compared and better schemes developed.

   Several of the measurements discussed in a later section are aimed at
   providing information on the performance of allocation mechanisms.
   This should aid in determining significant parameters and evaluating
   alternate schemes.

4.5.2 The SEND Side

   The window is determined by the receiver. Currently the sender has no
   control over the SEND window size, and never transmits beyond the
   right window edge. There exists the possibility of specifying two
   more special function codes so that the sender can request the
   receiver to INCREASE or DECREASE the window size, without specifying
   by how much. The receiver, of course, needn't satisfy this request.

   Buffers must be allocated for outgoing packets from a TCP buffer
   pool. The TCP may not be willing to allocate a full window's worth of
   buffers, so buffer space for a connection may be less than what the
   window would permit. No deadlocks are possible even if there is
   insufficient buffer or window space for one letter, since the
   receiver will ACK parts of letters as they are put into the user's
   buffer, thus advancing the window and freeing buffers for the
   remainder of the letter.

   It is not mandatory that the TCP buffer outgoing packets until
   acknowledgments for them are received, since it is possible to
   reconstruct them from the actual letters sent by the user.

   However, for purposes of retransmission and processing efficiency it
   is very convenient to do so.

4.5.3  The RECEIVE Side

   At the receiving side there are two requirements for buffering:

   (1) Rate Discrepancy:

      If the sender produces data much faster or much slower than the
      receiver consumes it, little buffering is needed to maintain the
      receiver at near maximum rate of operation. Simple queuing
      analysis indicates that when the production and consumption
      (arrival and service) rates are similar in magnitude, more
      buffering is needed to reduce the effect of stochastic or bursty
      arrivals and to keep the receiver busy.

   (2) Disorderly Arrivals:

      When packets arrive out of order, they must be buffered until the
      missing packets arrive so that packets (or letters) are delivered
      in sequence. We do not advocate the philosophy that they be
      discarded, unless they have to be, otherwise a poor effective
      bandwidth may be observed. Path length, packet size, traffic
      level, routing, timeouts, window size, and other factors affect
      the amount by which packets come out of order. This is expected to
      be a major area of investigation.

   The considerations for choosing an appropriate window are as follows:

   Suppose that the receiver knows the sender's retransmission timeout,
   'K' seconds, that the receiver's acceptance rate is 'U' bits/sec, and
   that the window size is 'W' bits. Ignoring line errors and other
   traffic, the sender transmits at a rate between W/K and the maximum
   line rate (the sender can send a window's worth of data each timeout
   period).

   If W/K is greater than U, the difference must be retransmissions
   which is undesirable, so the window should be reduced to W', such
   that W'/K is approximately equal to U. This may mean that the entire
   bandwidth of the transmission channel is not being used, but it is
   the fastest rate at which the receiver is accepting data, and the
   line capacity is free for other users. This is exactly the same case
   where the rates of the sender and receiver were almost equal, and so
   more buffering is needed. Thus we see that line utilization and
   retransmissions can be traded off against buffering.
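
   As a worked illustration of this rule, the sketch below takes K to
   be the sender's retransmission timeout in seconds (an interpretation
   of the text) and reduces the window so that W'/K does not exceed U.
   The function name and the sample figures are hypothetical.

```python
def throttled_window(window_bits: int, accept_rate_bps: int,
                     timeout_s: int) -> int:
    """Return W' such that W'/K is about U when W/K exceeds U."""
    if window_bits / timeout_s > accept_rate_bps:
        return accept_rate_bps * timeout_s  # W' = U * K
    return window_bits  # sender cannot outrun the receiver; keep W
```

   With W = 50,000 bits, K = 2 seconds, and U = 10,000 bits/sec, W/K is
   25,000 bits/sec, so the window would be reduced to 20,000 bits.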

   If the receiver does not accept data fast enough (by not performing
   sufficient RECEIVES) the sender may continue retransmitting since
   unaccepted data will not be ACK'ed. In this case the receiver should
   reduce the window size to "throttle" the sender and inhibit useless
   retransmissions.

   Receiver window control:

      If the user at the receiving side is not accepting data, the
      window should be reduced to zero. In particular, if all TCP
      incoming packet buffers for a connection are filled with received
      packets, the window must go to zero to prevent retransmissions
      until the user accepts some packets.

      Short term flow control:

      Let F = the number of user receive buffers filled

         B = the total user receive buffers

         W = the long-term or nominal window size

         W' = the window size returned to the sender

      then a possible value for W' is

         W' = W*[1-F/B]**a

      The value of 'a' should be greater than one, in order to shut the
      window faster as buffers run out. The values of W' and F actually
      used could be averages of recent values, in order to get smooth
      control. Note that W' is constantly being recomputed, while the
      value of W, which sets the upper limit of W', only changes slowly
      in response to other factors.
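
      The formula can be tried out with a few lines of code. This is a
      sketch only: the function name is hypothetical, a = 2 is an
      arbitrary choice satisfying a > 1, and the result is truncated
      to a whole number of octets or bits.

```python
def advertised_window(w: int, filled: int, total: int, a: float = 2.0) -> int:
    """Compute W' = W * (1 - F/B) ** a, truncated to an integer."""
    if total <= 0:
        raise ValueError("total user receive buffers must be positive")
    free_fraction = 1.0 - filled / total
    return int(w * free_fraction ** a)
```

      With W = 1000 and B = 4, filling 0, 2, and 4 buffers gives W' of
      1000, 250, and 0, so the window shuts faster than the buffers
      run out.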

      The value of W can be large (up to half the sequence number space)
      to allow for good throughput on high delay channels. The sender
      needn't allocate W worth of buffer space anyway. The long-term
      variation of W to match flow requirements may be a separate
      question.

   This short-term mechanism for flow control allows some buffering in
   the two TCP's at either end, (as much as they are willing), and the
   rest in the user process at the send side where the data is being
   created. Hence the cost of buffering to smooth out bursty traffic is
   borne partly by the TCP's, and partly by the user at the send side.
   None of it is borne by the communication subnet.

5.  NETWORK MEASUREMENT PLANS FOR TCP

5.1  USER LEVEL DIAGNOSTICS

   We have in mind a program which will exercise a given TCP, causing it
   to cycle through a number of states; opening, closing, and
   transmitting on a variety of connections. This program will collect
   statistics and will generally try to detect deviation from TCP
   functional specifications. Clearly there will have to be a copy of
   this program both at the local site being tested and some site which
   has a certified TCP. So we will have to produce a specification for
   this user level diagnostic program also.

   There needs to be a master and a slave side to all this so the master
   can tell the slave what's going wrong with the test.

5.2  SINGLE CONNECTION MEASUREMENTS

   Round trip delay times

      Time from moment the packet is sent by the TCP to the time that
      the ACK is received by the TCP.

      Time from the moment the USER issues the SEND to the time that the
      USER gets the successful return code.

         Note: packet size should be used to distinguish one set of
         round trip times from another.

         Network destination, and current configuration and traffic load
         may also be issues of importance that must be taken into
         account.

         What if the destination TCP decides to queue up ACKs and send a
         single ACK after a while? How does this affect round trip
         statistics?

         What about out of order arrivals and the bunched ACK for all of
         them?

         The histogram of round trip times includes retransmission times
         and these must be taken into account in the analysis and
         evaluation of the collected data.

   Packet size statistics

      Histogram of packet length in both directions on the full duplex
      connection.

      Histogram of letter size in both directions.

   Measure of disorderly arrival

      Distance from the first octet of arriving packet to the left
      window edge. A histogram of this measure gives an idea of the out
      of order nature of packet arrivals. It will be 0 for packets
      arriving in order.
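
      A histogram of this measure could be collected as sketched
      below. The function name and the sample format are hypothetical;
      the sketch assumes a 32-bit sequence number space and that each
      sample pairs an arriving packet's sequence number with the left
      window edge at the moment of arrival. Packets lying to the left
      of the edge (duplicates) would appear as very large distances
      and are assumed to be filtered out beforehand.

```python
from collections import Counter


def disorder_histogram(samples, space: int = 2**32) -> Counter:
    """Count distances from each packet's first octet to the left edge.

    samples is an iterable of (packet sequence number, left window edge).
    """
    return Counter((seq - left) % space for seq, left in samples)
```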

   Retransmission Histogram

   Effective throughput

      This is the effective rate at which the left edge of the window
      advances. The time interval over which the measure is made is a
      parameter of the measurement experiment. The shorter the interval,
      the more bursty we would expect the measure to be.
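
      The measure amounts to dividing the advance of the left window
      edge by the length of the interval, as in the sketch below. The
      function name is hypothetical, a 32-bit sequence number space is
      assumed, and the edge is assumed to advance by less than the
      whole sequence space during one interval.

```python
def effective_throughput(left_start: int, left_end: int,
                         interval_s: float, space: int = 2**32) -> float:
    """Octets per second by which the left window edge advanced."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return ((left_end - left_start) % space) / interval_s
```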

      It is possible to measure effective data throughput in both
      directions from one TCP by observing the rate at which the left
      window edge is moving on ACK sent and received for the two
      windows.

      Since throughput is largely dependent upon buffer allocation and
      window size, we must record these values also. Varying window for
      a fixed file transmission might be a good way to discover the
      sensitivity of throughput to window size.

   Output measurement

      The throughput measurement is for data only, but includes
      retransmission. The output rate should include all octets
      transmitted and will give a measure of retransmission overhead.
      Output rate also includes packet format overhead octets as well as
      data.
