Tech-invite3GPPspaceIETFspace
959493929190898887868584838281807978777675747372717069686766656463626160595857565554535251504948474645444342414039383736353433323130292827262524232221201918171615141312111009080706050403020100
in Index   Prev   Next

RFC 1045

VMTP: Versatile Message Transaction Protocol: Protocol specification

Pages: 128
Experimental
Part 2 of 4 – Pages 29 to 53
First   Prev   Next

Top   ToC   RFC1045 - Page 29   prevText
2.9. Forwarded Message Transactions

A Server may invoke another Server to handle a Request.  It is fairly
common for the invocation of the second Server to be the last action
performed by the first Server as part of handling the Request.  For
example, the original Server may function primarily to select a process
to handle the Request.  Also, the Server may simply check the
authorization on the Request.  Describing this situation in the context
of RPC, a nested remote procedure call may be the last action in the
remote procedure and the return parameters are exactly those of the
nested call.  (This situation is analogous to tail recursion.)

As an optimization to support this case, VMTP provides a Forward
operation that allows the server to send the nested Request to the other
server and have this other server respond directly to the Client.

If the message transaction being forwarded was not multicast, not secure
or the two Servers are the same principal and the ForwardCount of the
Request is less than the maximum forward count of 15, the Forward
operation is implemented by the Server sending a Request onto the next
Server with the forwarded Request identified by the same Client and
Transaction as the original Request and a ForwardCount one greater than
the Request received from the Client.  In this case, the new Server
responds directly to the Client.  A forwarded Request is illustrated in
the following figure.

 +---------+   Request       +----------+
 | Client  +---------------->| Server 1 |
 +---------+                 +----------+
      ^                        |
      |                        | forwarded Request
      |                        V
      |   Response           +----------+
      +----------------------| Server 2 |
                             +----------+

If the message transaction does not meet the above requirements, the
Server's VMTP module issues a nested call and simply maps the returned
Response to a Response to original Request without further Server-level
processing.  In this case, the only optimization over a user-level
nested call is one fewer VMTP service operation; the VMTP module handles
the return to the invoking call directly.  The Server may also use this
form of forwarding when the Request is part of a stream of message
transactions.  Otherwise, it must wait until the forwarded message
transaction completes before proceeding with the subsequent message
transactions in the stream.
Top   ToC   RFC1045 - Page 30
Implementation of the user-level Forward operation is optional,
depending on whether the server modules require this facility.  Handling
an incoming forwarded Request is a minor modification of handling a
normal incoming Request.  In particular, it is only necessary to examine
the ForwardCount field when the Transaction of the Request matches that
of the last message transaction received from the Client.  Thus, the
additional complexity in the VMTP module for the required forwarding
support is minimal; the complexity is concentrated in providing a highly
optimized user-level Forward primitive, and that is optional.


2.10. VMTP Management

VMTP management includes operations for creating, deleting, modifying
and querying VMTP entities and entity groups.  VMTP management is
logically implemented by a VMTP management server module that is invoked
using a message transaction addressed to the Server, VMTP_MANAGER_GROUP,
a well-known group entity identifier, in conjunction with Coresident
Entity mechanism introduced in Section 2.7.  A particular Request may
address the local module, the module managing a particular entity, the
set of modules managing those entities contained in a specific group or
all management modules, as appropriate.

The VMTP management procedures are specified in Appendix III.


2.11. Streamed Message Transactions

Streamed message transactions refer to two or more message transactions
initiated by a Client before it receives the response to the first
message transaction, with each transaction being processed and responded
to in order but asynchronous relative to the initiation of the
transactions.  A Client streams messages transactions, and thereby has
multiple message transactions outstanding, by sending them as part of a
single run of message transactions.  A run  of message transactions is a
sequence of message transactions with the same Client and Server and
consecutive Transaction identifiers, with all but the first and last
Requests and Responses flagged with the NSR (Not Start Run)  and NER
(Not End Run)  control bits.  (Conversely, the first Request and
Response does not have the NSR set and the last Request and Response
does not have the NER bit set.)  The message transactions in a run use
Top   ToC   RFC1045 - Page 31
consecutive transaction identifiers (except if the STI bit <4> is used
in one, in which case the transaction identifier for the next message
transaction is 256 greater, rather than 1).

The Client retains a record for each outstanding transaction until it
gets a Response or is timed out in error.  The record provides the
information required to retransmit the Request.  On retransmission
timeout, the client retransmits the last Request for which it has not
received a Response the same as is done with non-streamed communication.
(I.e. there need be only one timeout for all the outstanding message
transactions associated with a single client.)

The consecutive transaction identifiers within a run of message
transactions are used as sequence numbers for error control.  The Server
handles each message transaction in the sequence specified by its
transaction identifier.  When it receives a message transaction that is
not marked as the beginning of a run, it checks that it previously
received a message transaction with the predecessor transaction
identifier, either 1 less than the current one or 256 less if the
previous one had the STI bit set.  If not, the Server sends a
NotifyVmtpClient operation to the Client's manager indicating either:
(1) the first message transaction was not fully received, or else (2) it
has no record of the last one received.  If the NRT control flag is set,
it does not await nor expect retransmission but proceeds with handling
this Request.  This flag is used primarily when datagram Requests are
used as part of a stream of message transactions.  If NRT was not
specified, the Client must retransmit from the first message transaction
not fully received (either at all or in part) before the Server can
proceed with handling this run of Requests or else restart the run of
message transactions.

The Client expects to receive the Responses in a consecutive sequence,
using the Transaction identifier to detect missing Responses.  Thus, the
Server must return Responses in sequence except possibly for some gaps,
as follows.  The Server can specify in the PGcount field in a Response,
the number of consecutively previous Responses that this Response




_______________

<4>   The STI bit is used by the Client to effectively allocate 255
transaction identifiers for use by the Server in returning a large
Response or stream of Responses.
Top   ToC   RFC1045 - Page 32
corresponds to, up to a maximum of 255 previous Responses <5>.  Thus,
for example, a Response with Transaction identifier 46 and PGcount 3
represents Responses 43, 44, 45 and 46.  This facility allows the Server
to eliminate sending Responses to Requests that require no Response,
effectively batching the Responses into one.  It also allows the Server
to effectively maintain strictly consecutive sequencing when the Client
has skipped 256 Transaction identifiers using the STI bit and the Server
does not have that many Responses to return.

If the Client receives a Response that is not consecutive, it
retransmits the Request(s) for which the Response(s) is/are missing
(unless, of course, the corresponding Requests were sent as datagrams).
The Client should wait at the end of a run of message transactions for
the last one to complete.

When a Server receives a Request with the NSR bit clear and a higher
transaction identifier than it currently has for the Client, it
terminates all processing and discards Responses associated with the
previous Requests.  Thus, a stream of message transactions is
effectively aborted by starting a new run, even if the Server was in the
middle of handling the previous run.

Using a mixture of datagram and normal Requests as part of a stream of
message transactions, particularly with the use of the NRT bit, can lead
to complex behavior under packet loss.  It is recommended that a run of
message transactions be all of one type to avoid problems, i.e. all
normal or all datagrams.  Finally, when a Server forwards a Request that
is part of a run, it must suspend further processing of the subsequent
Requests until the forwarded Request has been handled, to preserve order
of processing.  The simplest handling of this situation is to use a real
nested call when forwarding with streamed message transactions.

Flow control of streamed message transactions relies on rate control at
the Client plus receipt (or non-receipt) of management notify operations
indicating the presence of overrunning.  A Client must reduce the number
of outstanding message transactions at the Server when it receives a
NotifyVmtpServer operation with the MSGTRANS_OVERFLOW ResponseCode.  The
transact parameter indicates the last packet group that was accepted.


_______________

<5>  PGcount actually corresponds to packet groups which are described
in Section 2.13.  This (simplified) description is accurate when there
is one Request or Response per packet group.
Top   ToC   RFC1045 - Page 33
The implementation of multiple outstanding message transactions requires
the ability to record, timeout and buffer multiple outstanding message
transactions at the Client end as well as the Server end.  However, this
facility is optional for both the Client and the Server.  Client systems
with heavy-weight processes and high network access cost are most likely
to benefit from this facility.  Servers that serve a wide variety of
client machines should implement streaming to accommodate these types of
clients.


2.12. Fault-Tolerant Applications

One approach to fault-tolerant systems is to maintain a log of all
messages sent at each node and replay the messages at a node when the
node fails, after restarting it from the last checkpoint <6>.  As an
experimental facility, VMTP provides a Receive Sequence Number field in
the NotifyVmtpClient and NotifyVmtpServer operations as well as the Next
Receive Sequence (NRS) flag in the Response packet to allow a sender to
log a receive sequence number with each message sent, allowing the
packets to be replayed at a recovering node in the same sequence as they
were originally received, thereby recovering to the same state as
before.

Basically, each sending node maintains a receive sequence number for
each receiving node.  On sending a Request to a node, it presume that
the receive sequence number is one greater than the one it has recorded
for that node.  If not, the receiving node sends a notify operation
indicating the receive sequence number assigned the Request.  The NRS in
the Response confirms that the Request message was the next receive
sequence number, so the sender can detect if it failed to receive the
notify operation in the previous case.  With Responses, the packets are
ordered by the Transaction identifier except for multicast message
transactions, in which there may be multiple Responses with the same
identification.  In this case, NotifyVmtpServer operations are used to
provide receive sequence numbers.

This experimental extension of the protocol is focused on support for
fault-tolerant real-time distributed systems required in various
critical applications.  It may be removed or extended, depending on
further investigations.

_______________

<6>  The sender-based logging is being investigated by Willy Zwaenepoel
of Rice University.
Top   ToC   RFC1045 - Page 34
2.13. Packet Groups

A message (whether Request or Response) is sent as one or more packet
groups.  A packet group is one or more packets, each containing the same
transaction identification and message control block.  Each packet is
formatted as below with the message control block logically embedded in
the VMTP header.

 +------------------------------------++---------------------+
 |            VMTP Header             ||                     |
 +------------+-----------------------||   segment data      |
 |VMTP Control| Message Control Block ||                     |
 +------------+-----------------------++---------------------+

The some fields of the VMTP control portion of the packet and data
segment portion can differ between packets within the same packet group.

The segment data portion of a packet group represents up to 16
kilooctets of the segment specified in the message control block.  The
portion contained in each packet is indicated by the PacketDelivery
field contained in the VMTP header.  The PacketDelivery field as a bit
mask has a similar interpretation to the MsgDelivery field in that each
bit corresponds to a segment data block of 512 octets.  The
PacketDelivery field limits a packet group to 16 kilooctets and a
maximum of 32 VMTP packets (with a minimum of 1 packet).  Data can be
sent in fewer packets by sending multiple data blocks per packet.  We
require that the underlying datagram service support delivery of (at
minimum) the basic 580 octet VMTP packet <7>.  To illustrate the use of
the PacketDelivery field, consider for example the Ethernet which has a
MTU of 1536 octets.  so one would send 2 512-octet segment data blocks
per packet.  (In fact, if a third block is last in the segment and less
than 512 octets and fits in the packet without making it too big, an
Ethernet packet could contain three data blocks.  Thus, an Ethernet
packet group for a segment of size 0x1D00 octets (14.5 blocks) and
MsgDelivery 0x000074FF consists of 6 packets indicated as follows <8>.

_______________

<7>  Note that with a 20 octet IP header, a VMTP packet is 600
octets.  We propose the convention that any host implementing VMTP
implicitly agrees to accept IP/VMTP packets of at least 600 octets.

<8>  We use the C notation 0xHHHH to represent a hexadecimal number.
Top   ToC   RFC1045 - Page 35
 Packet
 Delivery  1 1  1 1  1 1  1 1  0 0  1 0  1 0  1 0  0 0 0 0 0 . . .
           0000 0400 0800 0C00 1000 1400 1800 1C00
          +----+----+----+----+----+----+----+-+
 Segment  |....|....|....|....|....|....|....|.|
          +----+----+----+----+----+----+----+-+
          :    :    :    :    :    :  : /  /   :
          v    v    v    v    v    v  v   /|   v
          +----+----+----+----+    +----+  +---+
 Packets  |  1 |  2 |  3 |  4 |    |  5 |  | 6 |
          +----+----+----+----+    +----+  +---+

Each '.' is 256 octets of data.  The PacketDelivery masks for the 6
packets are: 0x00000003, 0x0000000C, 0x00000030, 0x000000C0, 0x00001400
and 0x00006000, indicating the segment blocks contained in each of the
packets.  (Note that the delivery bits are in little endian order.)

A packet group is sent as a single "blast" of packets with no explicit
flow control.  However, the sender should estimate and transmit at a
rate of packet transmission to avoid congesting the network or
overwhelming the receiver, as described in Section 2.5.6.  Packets in a
packet group can be sent in any order with no change in semantics.

When the first packet of a packet group is received (assuming the Server
does not decide to discard the packet group), the Server saves a copy of
the VMTP packet header, indicates it is currently receiving a packet
group, initializes a "current delivery mask" (indicating the data in the
segment received so far) to 0, accepts this packet (updating the current
delivery mask) and sets the timer for the packet group.  Subsequent
packets in the packet group update the current delivery mask.

Reception of a packet group is terminated when either the current
delivery mask indicates that all the packets in the packet group have
been received or the packet group reception timer expires (set to TC3 or
TS1).  If the packet group reception timer expires, if the NRT bit is
set in the Control flags then the packet group is discarded if not
complete unless MDM is set.  In this case, the MsgDelivery field in the
message control block is set to indicate the segment data blocks
actually received and the message control block and segment data
received is delivered to application level.

If NRT is not set and not all data blocks have been received, a
NotifyVmtpClient (if a Request) or NotifyVmtpServer (if a Response) is
sent back with a PacketDelivery field indicating the blocks received.
The source of the packet group is then expected to retransmit the
missing blocks.  If not all blocks of a Request are received after
RequestAckRetries(Client) retransmissions, the Request is discarded and
Top   ToC   RFC1045 - Page 36
a NotifyVmtpClient operation with an error response code is sent to the
client's manager unless MDM is set.  With a Response, there are
ResponseAckRetries(Server) retransmissions and then, if MDM is not set,
the requesting entity is returned the message control block with an
indication of the amount of segment data received extending contiguously
from the start of the segment.  E.g. if the sender sent 6 512-octet
blocks and only the first two and the last two arrived, the receiver
would be told that 1024 octets were received.  The ResponseCode field is
set to BAD_REPLY_SEGMENT.  (Note that VMTP is only able to indicate the
specific segment blocks received if MDM is set.)

The parameters RequestAckRetries(Client) and ResponseAckRetries(Server)
could be set on a per-client and per-server basis in a sophisticated
implementation based on knowledge of packet loss.

If the APG flag is set, a NotifyVmtpClient or NotifyVmtpServer
operation is sent back at the end of the packet group reception,
depending on whether it is a Request or a Response.

At minimum, a Server should check that each packet in the packet group
contains the same Client, Server, Transaction identifier and SegmentSize
fields.  It is a protocol error for any field other than the Checksum,
packet group control flags, Length and PacketDelivery in the VMTP header
to differ between any two packets in one packet group.  A packet group
containing a protocol error of this nature should be discarded.

Notify operations should be sent (or invoked) in the manager whenever
there is a problem with a unicast packet.  i.e. negative acknowledgments
are always sent in this case.  In the case of problems with multicast
packets, the default is to send nothing in response to an error
condition unless there is some clear reason why no other node can
respond positively.  For example, the packet might be a Probe for an
entity that is known to have been recently existing on the receiving
host but now invalid and could not have migrated.  In this case, the
receiving host responds to the Probe indicating the entity is
nonexistent, knowing that no other host can respond to the Probe.  For
packets and packet groups that are received and processed without
problems, a Notify operation is invoked only if the APG bit is set.


2.14. Runs of Packet Groups

A run of packet groups is a sequence of packet groups, all Request
packets or all Response packets, with the same Client and consecutive
transaction identifiers, all but the first and last packets flagged with
the NSR (Not Start Run) and NER (Not End Run) control bits.  When each
packet group in the run corresponds to a single Request or Response, it
Top   ToC   RFC1045 - Page 37
is identical to a run of message transactions. (See Section 2.11)
However, a Request message or a Response message may consists of up to
256 packet groups within a run, for a maximum of 4 megaoctets of segment
data.  A message that is continued in the next packet group in the run
is flagged in the current packet group by the CMG flag.  Otherwise, the
next packet group in the run (if any) is treated as a separate Request
or Response.

Normally, each Request and Response message is sent as a single packet
group and each run consists of a single packet group.  In this case
neither NSR or NER are set.  For multi-packet group messages, the
PacketDelivery mask in the i-th packet group of a message corresponds to
the portion of the segment offset by i-1 times 16 kilooctets,
designating the the first packet group to have i = 1.


2.15. Byte Order

For purposes of transmission and reception, the MCB is treated as
consisting of 8 32-bit fields and the segment is a sequence of bytes.
VMTP transmits the MCB in big-endian order, performing byte-swapping, if
necessary, before transmission.  A little-endian host must byte-swap the
MCB on reception.  (The data segment is transmitted as a sequence of
bytes with no reordering.)  The byte order of the sender of a message is
indicated by the LEE  bit in the entity identifier for the sender, the
Client field if a Request and the Server field if a Response.  The
sender and receiver of a message are required to agree in some higher
level protocol (such as an RPC presentation protocol) on who does
further swapping of the MCB and data segment if required by the types of
the data actually being transmitted.  For example, the segment data may
contain a record with 8-bit, 16-bit and 32-bit fields, so additional
transformation is required to move the segment from a host of one byte
order to another.

VMTP to date has used a higher-level presentation protocol in which
segment data is sent in the native order of the sending host and
byte-swapped as necessary by the receiving host.  This approach
minimizes the byte-swapping overhead between machines of common byte
order (including when the communication is transparently local to one
host), avoids a strong bias in the protocol to one byte-order, and
allows for the sending entity to be sending to a group of hosts with
different byte orders.  (Note that the byte-swap overhead for the MCB is
minimal.)  The presentation-level overhead is minimal because most
common operations, such as file access operations, have parameters that
fit the MCB and data segment data types exactly.
Top   ToC   RFC1045 - Page 38
2.16. Minimal VMTP Implementation

A minimal VMTP client needs to be able to send a Request packet group
and receive a Response packet group as well as accept and respond to
Requests sent to its management module, including Probe and NotifyClient
operations.  It may also require the ability to invoke Probe and Notify
operations to locate a Server and acknowledge responses.  (the latter
only if it is involved in transactions that are not idempotent or
datagram message transactions.  However, a simple sensor, for example,
can transmit VMTP datagram Requests indicating its current state with
even less mechanism.)  The minimal client thus requires very little code
and is suitable as a basis for (e.g.) a network boot loader.

A minimal VMTP server implements idempotent, non-encrypted message
transactions, possibly with no segment data support.  It should use an
entity state record for each Request but need only retain it while
processing the Request.  Without segment data larger than a packet,
there is no need for any timers, buffering (outside of immediate request
processing) or queuing.  In particular, it needs only as many records as
message transactions it handles simultaneously (e.g. 1).  The entity
state record is required to recognize and respond to Request
retransmissions during request processing.

The minimal server need only receive Requests and and be able to send
Response packets.  It need have only a minimal management module
supporting Probe operations.  (Support for the NotifyVmtpClient
operation is only required if it does not respond immediately to a
Request.)  Thus the VMTP support for say a time server, sensor, or
actuator can be extremely simple.  Note that the server need never issue
a Probe operation if it uses the host address of the Request for the
Response and does not require the Client information returned by the
Probe operation.  The minimal server should also support reception of
forwarded Requests.


2.17. Message vs. Procedural Request Handling

A request-response protocol can be used to implement two forms of
semantics on reception.  With procedural handling of a Request, a
Request is handled by a process associated with the Server that
effectively takes on the identity of the calling process, treating the
Request message as invoking a procedure, and relinquishing its
association to the calling process on return.  VMTP supports multiple
nested calls spanning multiple machines.  In this case, the distributed
call stack that results is associated with a single process from the
standpoint of authentication and resource management, using the
ProcessId field supported by VMTP.  The entity identifiers effectively
Top   ToC   RFC1045 - Page 39
link these call frames together.  That is, the Client field in a Request
is effectively the return link to the previous call frame.

With message handling of a Request, a Request message is queued for a
server process.  The server process dequeues, reads, processes and
responds to the Request message, executing as a separate process.
Subsequent Requests to the same server are queued until the server asks
to receive the next Request.

Procedural semantics have the advantage of allowing each Request (up to
the resource limits of the Server) to execute concurrently at the
Server, with Request-specific synchronization.  Message semantics have
the advantage that Requests are serialized at the Server and that the
request processing logically executes with the priority, protection and
independent execution of a separate process.  Note that procedural and
message handling of a request appear no differently to the client
invoking the message transaction, except possibly for differences in
performance.

We view the two Request handling approaches as appropriate under
different circumstances.  VMTP supports both models.


2.18. Bibliography

The basic protocol is similar to that used in the original form of the V
kernel [3, 4] as well as the transport protocol of Birrell and
Nelson's [2] remote procedure call mechanism.  An earlier version of the
protocol was described in SIGCOMM'86 [6].  The rate-based flow control
is similar to the techniques of Netblt [9].  The support for idempotency
draws, in part, on the favorable experience with idempotency in the V
distributed system.  Its use was originally inspired by the Woodstock
File Server [11].  The multicast support draws on the multicast
facilities in V [5] and is designed to work with, and is now implemented
using, the multicast extensions to the Internet [8] described in RFC 966
and 988.  The secure version of the protocol is similar to that
described by Birrell [1] for secure RPC.  The use of runs of packet
groups is similar to Fletcher and Watson's delta-T protocol [10].  The
use of "management" operations implemented using VMTP in place of
specialized packet types is viewed as part of a general strategy of
using recursion to simplify protocol architectures [7].

Finally, this protocol was designed, in part, to respond to the
requirements identified by Braden in RFC 955.  We believe that VMTP
satisfies the requirements stated in RFC 955.
Top   ToC   RFC1045 - Page 40
[1]   A.D. Birrell, "Secure Communication using Remote Procedure
      Calls", ACM. Trans. on Computer Systems 3(1), February, 1985.


[2]   A. Birrell and B. Nelson, "Implementing Remote Procedure Calls",
      ACM Trans. on Computer Systems 2(1), February, 1984.


[3]   D.R. Cheriton and W. Zwaenepoel, "The Distributed V Kernel and its
      Performance for Diskless Workstations", In Proceedings of the 9th
      Symposium on Operating System Principles,  ACM, 1983.


[4]   D.R. Cheriton, "The V Kernel: A Software Base for Distributed
      Systems", IEEE Software 1(2), April, 1984.


[5]   D.R. Cheriton and W. Zwaenepoel, "Distributed Process Groups in
      the V Kernel", ACM Trans. on Computer Systems 3(2), May, 1985.


[6]   D.R. Cheriton, "VMTP: A Transport Protocol for the Next
      Generation of Communication Systems", In Proceedings of
      SIGCOMM'86, ACM, Aug 5-7, 1986.


[7]   D.R. Cheriton, "Exploiting Recursion to Simplify an RPC 
      Communication Architecture", in preparation, 1988.


[8]   D.R. Cheriton and S.E. Deering, "Host Groups: A Multicast 
      Extension for Datagram Internetworks", In 9th Data Communication 
      Symposium, IEEE Computer Society and ACM SIGCOMM, September, 1985.


[9]   D.D. Clark and M. Lambert and L. Zhang, "NETBLT: A Bulk Data 
      Transfer Protocol", Technical Report RFC 969, Defense Advanced 
      Research Projects Agency, 1985.


[10]  J.G. Fletcher and R.W. Watson, "Mechanism for a Reliable Timer-
      based Protocol", Computer Networks 2:271-290, 1978.
Top   ToC   RFC1045 - Page 41
[11]  D. Swinehart and G. McDaniel and D. Boggs, "WFS: A Simple File 
      System for a Distributed Environment", In Proc. 7th Symp. 
      Operating Systems Principles, 1979.
Top   ToC   RFC1045 - Page 42
3. VMTP Packet Formats

VMTP uses 2 basic packet formats corresponding to Request packets and
Response packets.  These packet formats are identical in most of the
fields to simplify the implementation.

We first describe the entity identifier format and the packet fields
that are used in general, followed by a detailed description of each of
the packet formats.  These fields are described below in detail.  The
individual packet formats are described in the following subsections.
The reader and VMTP implementor may wish to refer to Chapters 4 and 5
for a description of VMTP event handling and only refer to this detailed
description as needed.


3.1. Entity Identifier Format

The 64-bit non-group entity identifiers have the following substructure.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |R| |L|R|
 |A|0|E|E|      Domain-specific structure
 |E| |E|S|
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                Domain-specific structure                        |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The field meanings are as follows:

RAE             Remote Alias Entity - the entity identifier identifies
                an entity that is acting as an alias for some entity
                outside this entity domain.  This bit is used by
                higher-level protocols.  For instance, servers may take
                extra security and protection measures with aliases.

GRP             Group - 0, for non-group entity identifiers.

LEE             Little-Endian Entity - the entity transmits data in
                little-endian (VAX) order.

RES              Reserved - must be 0.

The 64-bit entity group identifiers have the following substructure.
Top   ToC   RFC1045 - Page 43
  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |R| |U|R|
 |A|1|G|E|      Domain-specific structure
 |E| |P|S|
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                Domain-specific structure                        |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The field meanings are as follows:

RAE             Remote Alias Entity - same as for non-group entity
                identifier.

GRP             Group - 1, for entity group identifiers.

UGP             Unrestricted Group - no restrictions are placed on
                joining this group.  I.e. any entity can join limited
                only by implementation resources.

RES              Reserved - must be 0.

The all-zero entity identifier is reserved and guaranteed to be
unallocated in all domains.  In addition, a domain may reserve part of
the entity identifier space for statically allocated identifiers.
However, this is domain-specific.

Description of currently defined entity identifier domains is provided
in Appendix IV.


3.2. Packet Fields

Client          64-bit identifier for the client entity associated with
                this packet.  The structure, allocation and binding of
                this identifier is specific to the specified Domain.  An
                entity identifier always includes 4 types bits as
                specified in Section 3.1.

Version         The 3-bit identifier specifying the version of the
                protocol.  Current version is version 0.

Domain          The 13-bit identifier specifying the naming and
                administration domain for the client and server named in
                the packet.
Top   ToC   RFC1045 - Page 44
Packet Flags: 3 bits. (The normal case has none of the flags set.)

  HCO           Header checksum only - checksum has only been calculated
                on the header.  This is used in some real-time
                applications where the strict correctness of the data is
                not needed.

  EPG           Encrypted packet group - part of a secure message
                transaction.

  MPG           Multicast packet group - packet was multicast on
                transmission.

Length          A 13-bit field that specifies the number of 32-bit words
                in the segment data portion of the packet (if any),
                excluding the checksum field.  (Every VMTP packet is
                required to be a multiple of 64 bits, possibly by
                padding out the segment data.)  The minimum legal Length
                is 0, the maximum length is 4096 and it must be an even
                number.

Control Flags: 9 bits. (The normal case has none of the flags set.)

  NRS           Next Receive Sequence - the associated Request message
                (in a Response) or previous Response (if a Request) was
                received consecutive with the last Request from this
                entity.  That is, there was no interfering messages
                received.

  APG           Acknowledge Packet Group - Acknowledge packet group on
                receipt.  If a Request, send back a Request to the
                client's manager providing an update on the state of the
                transaction as soon as the request packet group is
                received, independent of the response being available.
                If a Response, send an update to the server's manager as
                soon as possible after response packet group is received
                providing an update on the state of the transaction at
                the client

  NSR           Not Start Run - 1 if this packet is not part of the
                first packet group of a run of packet groups.

  NER           Not End Run - 1 if this packet is not part of the last
                packet group of a run of packet groups.

  NRT           No Retransmission - do not ask for retransmissions of
                this packet group if not all received within timeout
Top   ToC   RFC1045 - Page 45
                period, just deliver or discard.

  MDG           Member of Destination Group - this packet is sent to a
                group and the client is a member of this group.

  CMG           Continued Message - the message (Request or Response) is
                continued in the next packet group.  The next packet
                group has to be part of the same run of packet groups.

  STI           Skip Transaction Identifiers - the next transaction
                identifier that the Client plans to use is the current
                transaction plus 256, if part of the same run and at
                least this big if not.  In a Request, this authorizes
                the Server to send back up to 256 packet groups
                containing the Response.

  DRT           Delay Response Transmission - set by request sender if
                multiple responses are expected (as indicated by the MRD
                flag in the RequestCode) and it may be overrun by
                multiple responses.  The responder(s) should then
                introduce a short random delay in sending the Response
                to minimize the danger of overrunning the Client.  This
                is normally only used for responding to multicast
                Requests where the Client may be receiving a large
                number of Responses, as indicated by the MRD flag in the
                Request flags.  Otherwise, the Response is sent
                immediately.

RetransmitCount:
                3 bits - the ordinal number of transmissions of this
                packet group prior to this one, modulo 8.  This field is
                used in estimation of roundtrip times.  This count may
                wrap around during a message transaction.  However, it
                should be sufficient to match acknowledgments and
                responses with a particular transmission.

ForwardCount:   4 bits indicating the number of times this Request has
                been forwarded.  The original Request is always sent
                with a ForwardCount of 0.

Interpacket Gap: 8 bits.  
                Indicates the recommended time to use between subsequent
                packet transmissions within a multi-packet packet group
                transmission.  The Interpacket Gap time is in 1/32nd of
                a network packet transmission time for a packet of size
                MTU for the node.  (Thus, the maximum gap time is 8
                packet times.)
Top   ToC   RFC1045 - Page 46
PGcount: 8 bits 
                The number of packet groups that this packet group
                represents in addition to that specified by the
                Transaction field.  This is used in acknowledging
                multiple packet groups in streamed communication.

Priority        4-bit identifier for priority for the processing of this
                request both on transmission and reception.  The
                interpretation is:

                1100            urgent/emergency

                1000            important

                0000            normal

                0100            background

                Viewing the higher-order bit as a sign bit (with 1
                meaning negative), low values are high priority and high
                values are low priority.  The low-order 2 bits indicate
                additional (lower) gradations for each level.

Function Code: 1 bit - types of VMTP packets.  If the low-order bit of
                the function code is 0, the packet is sent to the
                Server, else it is sent to the Client.

                0               Request

                1               Response

Transaction: 32 bits:  
                Identifier for this message transaction.

PacketDelivery: 32 bits:  
                Delivery indicates the segment blocks contained in this
                packet.  Each bit corresponds to one 512-octet block of
                segment data.  A 1 bit in the i-th bit position
                (counting the LSB as 0) indicates the presence of the
                i-th segment block.

Server: 64 bits 
                Entity identifier for the server or server group
                associated with this transaction.  This is the receiver
                when a Request packet and the sender when a Response
                packet.
Top   ToC   RFC1045 - Page 47
Code: 32 bits   The Request Code and Response Code, set either at the
                user level or VMTP level depending on use and packet
                type.  Both the Request and Response codes include 8
                high-order bits from the following set of control bits:

  CMD           Conditional Message Delivery -  only deliver the request
                or response if the receiving entity is waiting for it at
                the time of delivery, otherwise drop the message.

  DGM           DataGram Message - indicates that the message is being
                sent as a datagram.  If a Request message, do not wait
                for reply, or retransmit.  If a Response message, treat
                this message transaction as idempotent.

  MDM           Message Delivery Mask - indicates that the MsgDelivery
                field is being used.  Otherwise, the MsgDelivery field
                is available for general use.

  SDA           Segment Data Appended - segment data is appended to the
                message control block, with the total size of the
                segment specified by the SegmentSize field.  Otherwise,
                the segment data is null and the SegmentSize field is
                not used by VMTP and available for user- or RPC-level
                uses.

  CRE           CoResident Entity - indicates that the CoResidentEntity
                field in the message should be interpreted by VMTP.
                Otherwise, this field is available for additional user
                data.

  MRD           Multiple Responses Desired - multiple Responses are
                desired to to this Request if it is multicast.
                Otherwise, the VMTP module can discard subsequent
                Responses after the first Response.

  PIC           Public Interface Code - Values for Code with this bit
                set are reserved for definition by the VMTP
                specification and other standard protocols defined on
                top of VMTP.

  RES           Reserved for future use. Must be 0.

CoResidentEntity
                64-bit Identifier for an entity or group of entities
                with which the Server entity or entities must be
                co-resident, i.e. route only to entities (identified by
                Server) on the same host(s) as that specified by
Top   ToC   RFC1045 - Page 48
                CoResidentEntity, Only meaningful if CRE is set in the
                Code field.

User Data       12 octets Space in the header for the VMTP user to
                specify user-specific control and data.

MsgDelivery: 32 bits 
                The segment blocks being transmitted (in total) in this
                packet group following the conventions for the
                PacketDelivery field.  This field is ignored by the
                protocol and treated as an additional user data field if
                MDM is 0.  On transmission, the user level sets the
                MsgDelivery to indicate those portions of the segment to
                be transmitted.  On receipt, the MsgDelivery field is
                modified by the VMTP module to indicate the segment data
                blocks that were actually received before the message
                control block is passed to the user or RPC level.  In
                particular, the kernel does not discard the packet group
                if segment data blocks are missing.  A Server or Client
                entity receiving a message with a MsgDelivery in use
                must check the field to ensure adequate delivery and
                retry the operation if necessary.

SegmentSize: 32 bits 
                Size of segment in octets, up to a maximum of 16
                kilooctets without streaming and 4 megaoctets with
                streaming, if SDA is set.  Otherwise, this field is
                ignored by the protocol and treated as an additional
                user data field.

Segment Data: 0-16 kilooctets 
                0 octets if SDA is 0, else the portion of the segment
                corresponding to the Delivery Mask, limited by the
                SegmentSize and the MTU, padded out to a multiple of 64
                bits.

Checksum: 32 bits.  
                The 32-bit checksum for the header and segment data.


The VMTP checksum algorithm <9> develops a 32-bit checksum by computing

_______________

<9>  This algorithm and description are largely due to Steve Deering of
Stanford University.
Top   ToC   RFC1045 - Page 49
two 16-bit, ones-complement sums (like IP), each covering different
parts of the packet.  The packet is divided into clusters of 16 16-bit
words.  The first, third, fifth,... clusters are added to the first sum,
and the second, fourth, sixth,... clusters are added to the second sum.
Addition stops at the end of the packet; there is no need to pad out to
a cluster boundary (although it is necessary that the packet be an
integral multiple of 64 bits; padding octets may have any value and are
included in the checksum and in the transmitted packet).  If either of
the resulting sums is zero, it is changed to 0xFFFF.  The two sums are
appended to the transmitted packet, with the first sum being transmitted
first.  Four bytes of zero in place of the checksum may be used to
indicate that no checksum was computed.

The 16-bit, ones-complement addition in this algorithm is the same as
used in IP and, therefore, subject to the same optimizations.  In
particular, the words may be added up 32-bits at a time as long as the
carry-out of each addition is added to the sum on the following
addition, using an "add-with-carry" type of instruction.  (64-bit or
128-bit additions would also work on machines that have registers that
big.)

A particular weakness of this algorithm (shared by IP) is that it does
not detect the erroneous swapping of 16-bit words, which may easily
occur due to software errors.  A future version of VMTP is expected to
include a more secure algorithm, but such an algorithm appears to
require hardware support for efficient execution.

Not all of these fields are used in every packet.  The specific packet
formats are described below.  If a field is not mentioned in the
description of a packet type, its use is assumed to be clear from the
above description.
Top   ToC   RFC1045 - Page 50
3.3. Request Packet

The Request packet (or packet group) is sent from the client to the
server or group of servers to solicit processing plus the return of zero
or more responses.  A Request packet is identified by a 0 in the LSB of
the fourth 32-bit word in the packet.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 +                       Client (8 octets)                       +
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |Ver  |                         |H|E|M|                         |
 |sion |          Domain         |C|P|P|      Length             |
 |     |                         |O|G|G|                         |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |N|A|N|N|N|M|C|S|D|Retra|Forward|    Inter-     |       |R|R|R| |
 |R|P|S|E|R|D|M|T|R|nsmit| Count |    Packet     | Prior |E|E|E|0|
 |S|G|R|R|T|G|G|I|T|Count|       |     Gap       | -ity  |S|S|S| |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                      Transaction                              |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                     PacketDelivery                            |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 +                    Server (8 octets)                          +
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |C|D|M|S|R|C|M|P|                                               |
 |M|G|D|D|E|R|R|I|        RequestCode                            |
 |D|M|M|A|S|E|D|C|                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 +                 CoResidentEntity (8 octets)                   +
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 >                   User Data (12 octets)                       <
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                      MsgDelivery                              |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                       SegmentSize                             |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 >                  segment data, if any                         <
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                        Checksum                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 3-1:   Request Packet Format

The fields of the Request packet are set according to the semantics
described in Section 3.2 with the following qualifications.
Top   ToC   RFC1045 - Page 51
InterPacketGap  The estimated interpacket gap time the client would like
                for the Response packet group to be sent by the Server
                in responding to this Request.

Transaction     Identifier for transaction, at least one greater than
                the previously issued Request from this Client.

Server          Server to which this Request is destined.

RequestCode     Request code for this request, indicating the operation
                to perform.
Top   ToC   RFC1045 - Page 52
3.4. Response Packet

The Response packet is sent from the Server to the Client in response to
a Request, identified by a 1 in the LSB of the fourth 32-bit word in the
packet.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 +                       Client (8 octets)                       +
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |Ver  |                         |H|E|M|                         |
 |sion |          Domain         |C|P|P|      Length             |
 |     |                         |O|G|G|                         |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |N|A|N|N|N|R|C|S|R|Retra|Forward|               |       |R|R|R| |
 |R|P|S|E|R|E|M|T|E|nsmit| Count |    PGcount    | Prior |E|E|E|1|
 |S|G|R|R|T|S|G|I|S|Count|       |               | -ity  |S|S|S| |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                      Transaction                              |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                      PacketDelivery                           |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 +                        Server (8 octets)                      +
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |C|D|M|S|R|R|R|R|                                               |
 |M|G|D|D|E|E|E|E|        ResponseCode                           |
 |D|M|M|A|S|S|S|S|                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 >                   UserData (20 octets)                        <
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                     MsgDelivery                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                    Segment Size                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 >                  segment data, if any                         <
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                       Checksum                                |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 3-2:   Response Packet Format

The fields of the Response packet are set according to the semantics
described in Section 3.2 with the following qualifications.

Client, Version, Domain, Transaction
                Match those in the Request packet group to which this is
Top   ToC   RFC1045 - Page 53
                a response.

STI             1 if this Response is using one or more of the
                transaction identifiers skipped by the Client after the
                Request to which this is a Response.  STI in the Request
                essentially allocates up to 256 transaction identifiers
                for the Server to use in a run of Response packet
                groups.

RetransmitCount The retransmit count from the last Request packet
                received to which this is a response.

ForwardCount    The number of times the corresponding Request was
                forwarded before this Response was generated.

PGcount         The number of consecutively previous packet groups that
                this response is acknowledging in addition to the one
                identified by the Transaction identifier.

Server          Server sending this response.  This may differ from that
                originally specified in the Request packet if the
                original Server was a server group, or the request was
                forwarded.

The next two chapters describes the protocol operation using these
packet formats, with the the Client and the Server portions described
separately.


(next page on part 3)

Next Section