3.3 SPECIFIC ISSUES
3.3.1 Routing Outbound Datagrams
The IP layer chooses the correct next hop for each datagram it
sends. If the destination is on a connected network, the
datagram is sent directly to the destination host; otherwise,
it has to be routed to a gateway on a connected network.
3.3.1.1 Local/Remote Decision
To decide if the destination is on a connected network, the
following algorithm MUST be used [see IP:3]:
(a) The address mask (particular to a local IP address for
a multihomed host) is a 32-bit mask that selects the
network number and subnet number fields of the
corresponding IP address.
(b) If the IP destination address bits extracted by the
address mask match the IP source address bits
extracted by the same mask, then the destination is on
the corresponding connected network, and the datagram
is to be transmitted directly to the destination host.
(c) If not, then the destination is accessible only
through a gateway. Selection of a gateway is
described below (3.3.1.2).
A special-case destination address is handled as follows:
* For a limited broadcast or a multicast address, simply
pass the datagram to the link layer for the appropriate
interface.
* For a (network or subnet) directed broadcast, the
datagram can use the standard routing algorithms.
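Steps (a) through (c) amount to one masked comparison. The following Python sketch illustrates that comparison under stated assumptions: the helper names `is_directly_connected` and `first_hop` are hypothetical, the special-case broadcast and multicast destinations are not handled, and gateway choice is reduced to a single caller-supplied gateway (Section 3.3.1.2 covers the real selection).

```python
import ipaddress


def is_directly_connected(src: str, dst: str, mask: str) -> bool:
    """Steps (a)-(c): compare the masked network/subnet bits of src and dst."""
    m = int(ipaddress.IPv4Address(mask))
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    return (s & m) == (d & m)


def first_hop(src: str, dst: str, mask: str, gateway: str) -> str:
    """Send directly to dst if it is local; otherwise hand it to the gateway."""
    return dst if is_directly_connected(src, dst, mask) else gateway
```

For example, with source 10.1.2.3 and mask 255.255.0.0, destination 10.1.9.9 is reached directly, while 10.2.0.1 goes to the gateway.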
The host IP layer MUST operate correctly in a minimal
network environment, and in particular, when there are no
gateways. For example, if the IP layer of a host insists on
finding at least one gateway to initialize, the host will be
unable to operate on a single isolated broadcast net.
3.3.1.2 Gateway Selection
To efficiently route a series of datagrams to the same
destination, the source host MUST keep a "route cache" of
mappings to next-hop gateways. A host uses the following
basic algorithm on this cache to route a datagram; this
algorithm is designed to put the primary routing burden on
the gateways [IP:11].
(a) If the route cache contains no information for a
particular destination, the host chooses a "default"
gateway and sends the datagram to it. It also builds a
corresponding Route Cache entry.
(b) If that gateway is not the best next hop to the
destination, the gateway will forward the datagram to
the best next-hop gateway and return an ICMP Redirect
message to the source host.
(c) When it receives a Redirect, the host updates the
next-hop gateway in the appropriate route cache entry,
so later datagrams to the same destination will go
directly to the best gateway.
Since the subnet mask appropriate to the destination address
is generally not known, a Network Redirect message SHOULD be
treated identically to a Host Redirect message; i.e., the
cache entry for the destination host (only) would be updated
(or created, if an entry for that host did not exist) for
the new gateway.
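The basic cache algorithm and the Redirect rule can be sketched as follows. This is a minimal illustration, not a reference implementation: the class and method names are hypothetical, the cache is keyed on destination host addresses only, and "choose a default gateway" is simplified to taking the first entry in the default list.

```python
class RouteCache:
    """Route cache keyed on destination host address (hypothetical sketch)."""

    def __init__(self, default_gateways):
        self.defaults = list(default_gateways)  # may be empty on an isolated net
        self.entries = {}                        # destination host -> next-hop gateway

    def next_hop(self, dst):
        """Return the next-hop gateway for an off-net destination, or None."""
        if dst not in self.entries:
            if not self.defaults:
                return None                      # no gateway known at all
            # (a) No information: choose a default gateway (here simply the
            # first in the list) and build a cache entry for this destination.
            self.entries[dst] = self.defaults[0]
        return self.entries[dst]

    def on_redirect(self, dst, new_gateway, is_network_redirect=False):
        """(c) Point later datagrams for dst at the gateway named in a Redirect.

        is_network_redirect is deliberately ignored: a Network Redirect updates
        (or creates) the entry for this destination host only, exactly like a
        Host Redirect, because the destination's subnet mask is generally unknown.
        """
        self.entries[dst] = new_gateway
```

Returning None when no default gateway is configured keeps the cache usable on an isolated network, where only directly connected destinations are reachable.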
DISCUSSION:
This recommendation is to protect against gateways that
erroneously send Network Redirects for a subnetted
network, in violation of the gateway requirements
[INTRO:2].
When there is no route cache entry for the destination host
address (and the destination is not on the connected
network), the IP layer MUST pick a gateway from its list of
"default" gateways. The IP layer MUST support multiple
default gateways.
As an extra feature, a host IP layer MAY implement a table
of "static routes". Each such static route MAY include a
flag specifying whether it may be overridden by ICMP
Redirects.
DISCUSSION:
A host generally needs to know at least one default
gateway to get started. This information can be
obtained from a configuration file or else from the
host startup sequence, e.g., the BOOTP protocol (see
[INTRO:1]).
It has been suggested that a host can augment its list
of default gateways by recording any new gateways it
learns about. For example, it can record every gateway
to which it is ever redirected. Such a feature, while
possibly useful in some circumstances, may cause
problems in other cases (e.g., gateways are not all
equal), and it is not recommended.
A static route is typically a particular preset mapping
from destination host or network into a particular
next-hop gateway; it might also depend on the Type-of-
Service (see next section). Static routes would be set
up by system administrators to override the normal
automatic routing mechanism, to handle exceptional
situations. However, any static routing information is
a potential source of failure as configurations change
or equipment fails.
3.3.1.3 Route Cache
Each route cache entry needs to include the following
fields:
(1) Local IP address (for a multihomed host)
(2) Destination IP address
(3) Type(s)-of-Service
(4) Next-hop gateway IP address
Field (2) MAY be the full IP address of the destination
host, or only the destination network number. Field (3),
the TOS, SHOULD be included.
See Section 3.3.4.2 for a discussion of the implications of
multihoming for the lookup procedure in this cache.
DISCUSSION:
Including the Type-of-Service field in the route cache
and considering it in the host route algorithm will
provide the necessary mechanism for the future when
Type-of-Service routing is commonly used in the
Internet. See Section 3.2.1.6.
Each route cache entry defines the endpoints of an
Internet path. Although the connecting path may change
dynamically in an arbitrary way, the transmission
characteristics of the path tend to remain
approximately constant over a time period longer than a
single typical host-host transport connection.
Therefore, a route cache entry is a natural place to
cache data on the properties of the path. Examples of
such properties might be the maximum unfragmented
datagram size (see Section 3.3.3), or the average
round-trip delay measured by a transport protocol.
This data will generally be both gathered and used by a
higher layer protocol, e.g., by TCP, or by an
application using UDP. Experiments are currently in
progress on caching path properties in this manner.
There is no consensus on whether the route cache should
be keyed on destination host addresses alone, or allow
both host and network addresses. Those who favor the
use of only host addresses argue that:
(1) As required in Section 3.3.1.2, Redirect messages
will generally result in entries keyed on
destination host addresses; the simplest and most
general scheme would be to use host addresses
always.
(2) The IP layer may not always know the address mask
for a network address in a complex subnetted
environment.
(3) The use of only host addresses allows the
destination address to be used as a pure 32-bit
number, which may allow the Internet architecture
to be more easily extended in the future without
any change to the hosts.
The opposing view is that allowing a mixture of
destination hosts and networks in the route cache:
(1) Saves memory space.
(2) Leads to a simpler data structure, easily
combining the cache with the tables of default and
static routes (see below).
(3) Provides a more useful place to cache path
properties, as discussed earlier.
IMPLEMENTATION:
The cache needs to be large enough to include entries
for the maximum number of destination hosts that may be
in use at one time.
A route cache entry may also include control
information used to choose an entry for replacement.
This might take the form of a "recently used" bit, a
use count, or a last-used timestamp, for example. It
is recommended that it include the time of last
modification of the entry, for diagnostic purposes.
An implementation may wish to reduce the overhead of
scanning the route cache for every datagram to be
transmitted. This may be accomplished with a hash
table to speed the lookup, or by giving a connection-
oriented transport protocol a "hint" or temporary
handle on the appropriate cache entry, to be passed to
the IP layer with each subsequent datagram.
Although we have described the route cache, the lists
of default gateways, and a table of static routes as
conceptually distinct, in practice they may be combined
into a single "routing table" data structure.
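The entry fields (1) through (4), a hashed lookup, and replacement control information can be combined as in the sketch below. The assumptions are that least-recently-used order serves as the "recently used" information, that a Python dict stands in for the hash table, and that the names `RouteEntry` and `BoundedRouteCache` are illustrative only.

```python
import time
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class RouteEntry:
    local_addr: str   # (1) local IP address (multihomed host)
    dest: str         # (2) destination host (or network) address
    tos: int          # (3) Type-of-Service
    next_hop: str     # (4) next-hop gateway IP address
    modified: float = field(default_factory=time.time)  # last change, for diagnostics


class BoundedRouteCache:
    def __init__(self, capacity):
        self.capacity = capacity
        # (local, dest, tos) -> RouteEntry; dict hashing avoids a linear scan,
        # and OrderedDict order tracks recency of use for replacement.
        self._table = OrderedDict()

    def lookup(self, local_addr, dest, tos):
        key = (local_addr, dest, tos)
        entry = self._table.get(key)
        if entry is not None:
            self._table.move_to_end(key)      # mark as most recently used
        return entry

    def update(self, local_addr, dest, tos, next_hop):
        key = (local_addr, dest, tos)
        self._table[key] = RouteEntry(local_addr, dest, tos, next_hop)
        self._table.move_to_end(key)
        while len(self._table) > self.capacity:
            self._table.popitem(last=False)   # evict the least recently used entry
```

Evicting by recency is one reasonable policy. A use count would fit the same structure just as well.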
3.3.1.4 Dead Gateway Detection
The IP layer MUST be able to detect the failure of a "next-
hop" gateway that is listed in its route cache and to choose
an alternate gateway (see Section 3.3.1.5).
Dead gateway detection is covered in some detail in RFC-816
[IP:11]. Experience to date has not produced a complete
algorithm which is totally satisfactory, though it has
identified several forbidden paths and promising techniques.
* A particular gateway SHOULD NOT be used indefinitely in
the absence of positive indications that it is
functioning.
* Active probes such as "pinging" (i.e., using an ICMP
Echo Request/Reply exchange) are expensive and scale
poorly. In particular, hosts MUST NOT actively check
the status of a first-hop gateway by simply pinging the
gateway continuously.
* Even when it is the only effective way to verify a
gateway's status, pinging MUST be used only when
traffic is being sent to the gateway and when there is
no other positive indication to suggest that the
gateway is functioning.
* To avoid pinging, the layers above and/or below the
Internet layer SHOULD be able to give "advice" on the
status of route cache entries when either positive
(gateway OK) or negative (gateway dead) information is
available.
DISCUSSION:
If an implementation does not include an adequate
mechanism for detecting a dead gateway and re-routing,
a gateway failure may cause datagrams to apparently
vanish into a "black hole". This failure can be
extremely confusing for users and difficult for network
personnel to debug.
The dead-gateway detection mechanism must not cause
unacceptable load on the host, on connected networks,
or on first-hop gateway(s). The exact constraints on
the timeliness of dead gateway detection and on
acceptable load may vary somewhat depending on the
nature of the host's mission, but a host generally
needs to detect a failed first-hop gateway quickly
enough that transport-layer connections will not break
before an alternate gateway can be selected.
Passing advice from other layers of the protocol stack
complicates the interfaces between the layers, but it
is the preferred approach to dead gateway detection.
Advice can come from almost any part of the IP/TCP
architecture, but it is expected to come primarily from
the transport and link layers. Here are some possible
sources for gateway advice:
o TCP or any connection-oriented transport protocol
should be able to give negative advice, e.g.,
triggered by excessive retransmissions.
o TCP may give positive advice when (new) data is
acknowledged. Even though the route may be
asymmetric, an ACK for new data proves that the
acknowledged data must have been transmitted
successfully.
o An ICMP Redirect message from a particular gateway
should be used as positive advice about that
gateway.
o Link-layer information that reliably detects and
reports host failures (e.g., ARPANET Destination
Dead messages) should be used as negative advice.
o Failure to ARP or to re-validate ARP mappings may
be used as negative advice for the corresponding
IP address.
o Packets arriving from a particular link-layer
address are evidence that the system at this
address is alive. However, turning this
information into advice about gateways requires
mapping the link-layer address into an IP address,
and then checking that IP address against the
gateways pointed to by the route cache. This is
probably prohibitively inefficient.
Note that positive advice that is given for every
datagram received may cause unacceptable overhead in
the implementation.
While advice might be passed using required arguments
in all interfaces to the IP layer, some transport and
application layer protocols cannot deduce the correct
advice. These interfaces must therefore allow a
neutral value for advice, since either always-positive
or always-negative advice leads to incorrect behavior.
There is another technique for dead gateway detection
that has been commonly used but is not recommended.
This technique depends upon the host passively
receiving ("wiretapping") the Interior Gateway Protocol
(IGP) datagrams that the gateways are broadcasting to
each other. This approach has the drawback that a host
needs to recognize all the interior gateway protocols
that gateways may use (see [INTRO:2]). In addition, it
only works on a broadcast network.
At present, pinging (i.e., using ICMP Echo messages) is
the mechanism for gateway probing when absolutely
required. A successful ping guarantees that the
addressed interface and its associated machine are up,
but it does not guarantee that the machine is a gateway
as opposed to a host. The normal inference is that if
a Redirect or other evidence indicates that a machine
was a gateway, successful pings will indicate that the
machine is still up and hence still a gateway.
However, since a host silently discards packets that a
gateway would forward or redirect, this assumption
could sometimes fail. To avoid this problem, a new
ICMP message under development will ask "are you a
gateway?"
IMPLEMENTATION:
The following specific algorithm has been suggested:
o Associate a "reroute timer" with each gateway
pointed to by the route cache. Initialize the
timer to a value Tr, which must be small enough to
allow detection of a dead gateway before transport
connections time out.
o Positive advice would reset the reroute timer to
Tr. Negative advice would reduce or zero the
reroute timer.
o Whenever the IP layer used a particular gateway to
route a datagram, it would check the corresponding
reroute timer. If the timer had expired (reached
zero), the IP layer would send a ping to the
gateway, followed immediately by the datagram.
o The ping (ICMP Echo) would be sent again if
necessary, up to N times. If no ping reply was
received in N tries, the gateway would be assumed
to have failed, and a new first-hop gateway would
be chosen for all cache entries pointing to the
failed gateway.
Note that the size of Tr is inversely related to the
amount of advice available. Tr should be large enough
to ensure that:
* Any pinging will be at a low level (e.g., <10%) of
all packets sent to a gateway from the host, AND
* pinging is infrequent (e.g., every 3 minutes)
Since the recommended algorithm is concerned with the
gateways pointed to by route cache entries, rather than
the cache entries themselves, a two level data
structure (perhaps coordinated with ARP or similar
caches) may be desirable for implementing a route
cache.
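The reroute-timer algorithm above can be sketched as follows. Several assumptions apply. Timers are kept per gateway rather than per cache entry, as the two-level structure suggests. Time is passed in explicitly. `ping` is a hypothetical caller-supplied function that returns True when an Echo Reply is seen. The pings are issued synchronously, unlike the suggested algorithm, which sends the datagram immediately after the first ping. The `Advice.NEUTRAL` value models the neutral advice that interfaces must allow.

```python
from enum import Enum


class Advice(Enum):
    POSITIVE = 1
    NEGATIVE = -1
    NEUTRAL = 0


class GatewayMonitor:
    """Reroute-timer sketch; Tr and N are tunables, ping is caller-supplied."""

    def __init__(self, tr, n_tries, ping):
        self.tr = tr              # Tr: seconds of trust granted by positive advice
        self.n_tries = n_tries    # N: pings before declaring the gateway dead
        self.ping = ping          # hypothetical callable: gateway -> bool
        self.expires = {}         # gateway -> time at which its timer reaches zero

    def advise(self, gateway, advice, now):
        if advice is Advice.POSITIVE:
            self.expires[gateway] = now + self.tr   # reset the timer to Tr
        elif advice is Advice.NEGATIVE:
            self.expires[gateway] = now             # zero the timer
        # Advice.NEUTRAL leaves the timer untouched.

    def gateway_alive(self, gateway, now):
        """Call whenever a datagram is about to be routed via gateway."""
        self.expires.setdefault(gateway, now + self.tr)
        if now < self.expires[gateway]:
            return True                             # timer running: no ping needed
        for _ in range(self.n_tries):
            if self.ping(gateway):
                self.expires[gateway] = now + self.tr
                return True
        return False   # assumed dead: caller picks a new first-hop gateway
```

A False result is where the logic of Section 3.3.1.5 takes over, repointing every cache entry that used the failed gateway.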
3.3.1.5 New Gateway Selection
If the failed gateway is not the current default, the IP
layer can immediately switch to a default gateway. If it is
the current default that failed, the IP layer MUST select a
different default gateway (assuming more than one default is
known) for the failed route and for establishing new routes.
DISCUSSION:
When a gateway does fail, the other gateways on the
connected network will learn of the failure through
some inter-gateway routing protocol. However, this
will not happen instantaneously, since gateway routing
protocols typically have a settling time of 30-60
seconds. If the host switches to an alternative
gateway before the gateways have agreed on the failure,
the new target gateway will probably forward the
datagram to the failed gateway and send a Redirect back
to the host pointing to the failed gateway (!). The
result is likely to be a rapid oscillation in the
contents of the host's route cache during the gateway
settling period. It has been proposed that the dead-
gateway logic should include some hysteresis mechanism
to prevent such oscillations. However, experience has
not shown any harm from such oscillations, since
service cannot be restored to the host until the
gateways' routing information does settle down.
IMPLEMENTATION:
One implementation technique for choosing a new default
gateway is to simply round-robin among the default
gateways in the host's list. Another is to rank the
gateways in priority order, and when the current
default gateway is not the highest priority one, to
"ping" the higher-priority gateways slowly to detect
when they return to service. This pinging can be at a
very low rate, e.g., 0.005 per second.
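Both techniques can share one ordered list, as in the sketch below. This assumes a hypothetical `DefaultGatewayList` class in which index 0 is the highest priority. Failure advances round-robin. The gateways ahead of the current one are the candidates for slow probing (0.005 per second is one probe every 200 seconds), and a recovered higher-priority gateway becomes the default again.

```python
class DefaultGatewayList:
    """Defaults ordered by priority, index 0 highest (hypothetical sketch)."""

    def __init__(self, gateways):
        self.gateways = list(gateways)
        self.current = 0

    @property
    def default(self):
        return self.gateways[self.current]

    def gateway_failed(self):
        # Round-robin: move to the next default, wrapping around the list.
        self.current = (self.current + 1) % len(self.gateways)
        return self.default

    def higher_priority_to_probe(self):
        # Gateways that outrank the current default; these would be pinged
        # at a very low rate to detect their return to service.
        return self.gateways[:self.current]

    def gateway_recovered(self, gateway):
        idx = self.gateways.index(gateway)
        if idx < self.current:
            self.current = idx   # switch back to the higher-priority gateway
```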
3.3.1.6 Initialization
The following information MUST be configurable:
(1) IP address(es).
(2) Address mask(s).
(3) A list of default gateways, with a preference level.
A manual method of entering this configuration data MUST be
provided. In addition, a variety of methods can be used to
determine this information dynamically; see the section on
"Host Initialization" in [INTRO:1].
DISCUSSION:
Some host implementations use "wiretapping" of gateway
protocols on a broadcast network to learn what gateways
exist. A standard method for default gateway discovery
is under development.
3.3.2 Reassembly
The IP layer MUST implement reassembly of IP datagrams.
We designate the largest datagram size that can be reassembled
by EMTU_R ("Effective MTU to receive"); this is sometimes
called the "reassembly buffer size". EMTU_R MUST be greater
than or equal to 576, SHOULD be either configurable or
indefinite, and SHOULD be greater than or equal to the MTU of
the connected network(s).
DISCUSSION:
A fixed EMTU_R limit should not be built into the code
because some application layer protocols require EMTU_R
values larger than 576.
IMPLEMENTATION:
An implementation may use a contiguous reassembly buffer
for each datagram, or it may use a more complex data
structure that places no definite limit on the reassembled
datagram size; in the latter case, EMTU_R is said to be
"indefinite".
Logically, reassembly is performed by simply copying each
fragment into the packet buffer at the proper offset.
Note that fragments may overlap if successive
retransmissions use different packetizing but the same
reassembly Id.
The tricky part of reassembly is the bookkeeping to
determine when all bytes of the datagram have been
reassembled. We recommend Clark's algorithm [IP:10] that
requires no additional data space for the bookkeeping.
However, note that, contrary to [IP:10], the first
fragment header needs to be saved for inclusion in a
possible ICMP Time Exceeded (Reassembly Timeout) message.
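The bookkeeping can be illustrated with a hole-descriptor list in the spirit of Clark's algorithm. This is a simplified sketch, not the algorithm as published. The holes are kept in a separate Python list instead of inside the reassembly buffer, the buffer is a growable `bytearray`, and the class name `Reassembler` is hypothetical. Overlapping fragments simply overwrite the same offsets, and the first fragment's header is retained for a possible Time Exceeded message.

```python
class Reassembler:
    """Hole-list bookkeeping for one datagram (simplified sketch)."""

    INFINITY = float("inf")

    def __init__(self):
        self.buf = bytearray()
        self.holes = [(0, self.INFINITY)]   # (first, last) byte offsets, inclusive
        self.first_header = None            # saved for ICMP Time Exceeded

    def add(self, offset, data, more_fragments, header=None):
        """Copy one fragment in; return True once no holes remain."""
        first, last = offset, offset + len(data) - 1
        if offset == 0 and header is not None:
            self.first_header = header
        if len(self.buf) < last + 1:
            self.buf.extend(bytes(last + 1 - len(self.buf)))
        self.buf[first:last + 1] = data     # overlaps simply overwrite
        new_holes = []
        for h_first, h_last in self.holes:
            if first > h_last or last < h_first:
                new_holes.append((h_first, h_last))   # hole untouched
                continue
            if first > h_first:
                new_holes.append((h_first, first - 1))
            if last < h_last and more_fragments:
                new_holes.append((last + 1, h_last))
        self.holes = new_holes
        return not self.holes
```

Fragments may arrive in any order. The datagram is complete exactly when the last hole has been filled.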
There MUST be a mechanism by which the transport layer can
learn MMS_R, the maximum message size that can be received and
reassembled in an IP datagram (see GET_MAXSIZES calls in
Section 3.4). If EMTU_R is not indefinite, then the value of
MMS_R is given by:
MMS_R = EMTU_R - 20
since 20 is the minimum size of an IP header.
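As a worked instance of the formula, the minimum EMTU_R of 576 gives MMS_R = 576 - 20 = 556, the same 556-byte limit that appears in the discussion of Section 3.3.3. The helper below is a hypothetical sketch that represents an indefinite EMTU_R as `None`.

```python
IP_MIN_HEADER = 20  # bytes: minimum IP header, no options


def mms_r(emtu_r):
    """MMS_R = EMTU_R - 20; None stands for an indefinite EMTU_R."""
    if emtu_r is None:
        return None                     # no fixed limit to report
    if emtu_r < 576:
        raise ValueError("EMTU_R must be at least 576")
    return emtu_r - IP_MIN_HEADER
```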
There MUST be a reassembly timeout. The reassembly timeout
value SHOULD be a fixed value, not set from the remaining TTL.
It is recommended that the value lie between 60 seconds and 120
seconds. If this timeout expires, the partially-reassembled
datagram MUST be discarded and an ICMP Time Exceeded message
sent to the source host (if fragment zero has been received).
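The timeout rule can be sketched as a periodic sweep over pending reassemblies. Several details here are assumptions. Each reassembly is timed from the arrival of its first fragment. `pending` maps a reassembly identifier to `(start_time, first_fragment_header_or_None)`. `send_time_exceeded` is a hypothetical callback. The fixed value of 60 seconds is the low end of the recommended range.

```python
REASSEMBLY_TIMEOUT = 60.0   # seconds; fixed, not derived from the remaining TTL


def expire_reassemblies(pending, now, send_time_exceeded):
    """Discard timed-out partial datagrams, reporting those with fragment zero."""
    for ident in list(pending):
        started, first_header = pending[ident]
        if now - started >= REASSEMBLY_TIMEOUT:
            del pending[ident]                  # MUST discard the partial datagram
            if first_header is not None:        # only if fragment zero was received
                send_time_exceeded(first_header)
```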
DISCUSSION:
The IP specification says that the reassembly timeout
should be the remaining TTL from the IP header, but this
does not work well because gateways generally treat TTL as
a simple hop count rather than an elapsed time. If the
reassembly timeout is too small, datagrams will be
discarded unnecessarily, and communication may fail. The
timeout needs to be at least as large as the typical
maximum delay across the Internet. A realistic minimum
reassembly timeout would be 60 seconds.
It has been suggested that a cache might be kept of
round-trip times measured by transport protocols for
various destinations, and that these values might be used
to dynamically determine a reasonable reassembly timeout
value. Further investigation of this approach is
required.
If the reassembly timeout is set too high, buffer
resources in the receiving host will be tied up too long,
and the MSL (Maximum Segment Lifetime) [TCP:1] will be
larger than necessary. The MSL controls the maximum rate
at which fragmented datagrams can be sent using distinct
values of the 16-bit Ident field; a larger MSL lowers the
maximum rate. The TCP specification [TCP:1] arbitrarily
assumes a value of 2 minutes for MSL. This sets an upper
limit on a reasonable reassembly timeout value.
3.3.3 Fragmentation
Optionally, the IP layer MAY implement a mechanism to fragment
outgoing datagrams intentionally.
We designate by EMTU_S ("Effective MTU for sending") the
maximum IP datagram size that may be sent, for a particular
combination of IP source and destination addresses and perhaps
TOS.
A host MUST implement a mechanism to allow the transport layer
to learn MMS_S, the maximum transport-layer message size that
may be sent for a given {source, destination, TOS} triplet (see
GET_MAXSIZES call in Section 3.4). If no local fragmentation
is performed, the value of MMS_S will be:
MMS_S = EMTU_S - <IP header size>
and EMTU_S must be less than or equal to the MTU of the network
interface corresponding to the source address of the datagram.
Note that <IP header size> in this equation will be 20, unless
the IP reserves space to insert IP options for its own purposes
in addition to any options inserted by the transport layer.
A host that does not implement local fragmentation MUST ensure
that the transport layer (for TCP) or the application layer
(for UDP) obtains MMS_S from the IP layer and does not send a
datagram exceeding MMS_S in size.
It is generally desirable to avoid local fragmentation and to
choose EMTU_S low enough to avoid fragmentation in any gateway
along the path. In the absence of actual knowledge of the
minimum MTU along the path, the IP layer SHOULD use
EMTU_S <= 576 whenever the destination address is not on a
connected network, and otherwise use the connected network's
MTU.
The MTU of each physical interface MUST be configurable.
A host IP layer implementation MAY have a configuration flag
"All-Subnets-MTU", indicating that the MTU of the connected
network is to be used for destinations on different subnets
within the same network, but not for other networks. Thus,
this flag causes the network class mask, rather than the subnet
address mask, to be used to choose an EMTU_S. For a multihomed
host, an "All-Subnets-MTU" flag is needed for each network
interface.
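The EMTU_S rules above, including the optional "All-Subnets-MTU" flag, can be sketched as follows. The function names are hypothetical, and the classful mask covers only class A, B, and C addresses. A destination counts as "local" when it matches under the subnet mask, or under the class mask when the flag is set. Any other destination gets at most 576. MMS_S then subtracts the IP header size.

```python
import ipaddress


def classful_mask(addr):
    """Network class mask for a class A, B, or C address."""
    top = int(ipaddress.IPv4Address(addr)) >> 24
    if top < 128:
        return 0xFF000000   # class A
    if top < 192:
        return 0xFFFF0000   # class B
    return 0xFFFFFF00       # class C (D and E not handled in this sketch)


def choose_emtu_s(src, dst, subnet_mask, if_mtu, all_subnets_mtu=False):
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    mask = classful_mask(src) if all_subnets_mtu else int(ipaddress.IPv4Address(subnet_mask))
    if (s & mask) == (d & mask):
        return if_mtu            # connected network (or same network, with the flag)
    return min(576, if_mtu)      # no path-MTU knowledge: stay at or below 576


def mms_s(emtu_s, ip_header_size=20):
    return emtu_s - ip_header_size
```

On a class B network such as 128.10.0.0 with a 255.255.255.0 subnet mask, a sender at 128.10.1.5 uses 576 toward 128.10.2.7 by default, but uses the full interface MTU when All-Subnets-MTU is set.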
DISCUSSION:
Picking the correct datagram size to use when sending data
is a complex topic [IP:9].
(a) In general, no host is required to accept an IP
datagram larger than 576 bytes (including header and
data), so a host must not send a larger datagram
without explicit knowledge or prior arrangement with
the destination host. Thus, MMS_S is only an upper
bound on the datagram size that a transport protocol
may send; even when MMS_S exceeds 556, the transport
layer must limit its messages to 556 bytes in the
absence of other knowledge about the destination
host.
(b) Some transport protocols (e.g., TCP) provide a way to
explicitly inform the sender about the largest
datagram the other end can receive and reassemble
[IP:7]. There is no corresponding mechanism in the
IP layer.
A transport protocol that assumes an EMTU_R larger
than 576 (see Section 3.3.2) can send a datagram of
this larger size to another host that implements the
same protocol.
(c) Hosts should ideally limit their EMTU_S for a given
destination to the minimum MTU of all the networks
along the path, to avoid any fragmentation. IP
fragmentation, while formally correct, can create a
serious transport protocol performance problem,
because loss of a single fragment means all the
fragments in the segment must be retransmitted
[IP:9].
Since nearly all networks in the Internet currently
support an MTU of 576 or greater, we strongly recommend
the use of 576 for datagrams sent to non-local networks.
It has been suggested that a host could determine the MTU
over a given path by sending a zero-offset datagram
fragment and waiting for the receiver to time out the
reassembly (which cannot complete!) and return an ICMP
Time Exceeded message. This message would include the
largest remaining fragment header in its body. More
direct mechanisms are being experimented with, but have
not yet been adopted (see, e.g., RFC-1063).
3.3.4 Local Multihoming
3.3.4.1 Introduction
A multihomed host has multiple IP addresses, which we may
think of as "logical interfaces". These logical interfaces
may be associated with one or more physical interfaces, and
these physical interfaces may be connected to the same or
different networks.
Here are some important cases of multihoming:
(a) Multiple Logical Networks
The Internet architects envisioned that each physical
network would have a single unique IP network (or
subnet) number. However, LAN administrators have
sometimes found it useful to violate this assumption,
operating a LAN with multiple logical networks per
physical connected network.
If a host connected to such a physical network is
configured to handle traffic for each of N different
logical networks, then the host will have N logical
interfaces. These could share a single physical
interface, or might use N physical interfaces to the
same network.
(b) Multiple Logical Hosts
When a host has multiple IP addresses that all have the
same <Network-number> part (and the same <Subnet-
number> part, if any), the logical interfaces are known
as "logical hosts". These logical interfaces might
share a single physical interface or might use separate
physical interfaces to the same physical network.
(c) Simple Multihoming
In this case, each logical interface is mapped into a
separate physical interface and each physical interface
is connected to a different physical network. The term
"multihoming" was originally applied only to this case,
but it is now applied more generally.
A host with embedded gateway functionality will
typically fall into the simple multihoming case. Note,
however, that a host may be simply multihomed without
containing an embedded gateway, i.e., without
forwarding datagrams from one connected network to
another.
This case presents the most difficult routing problems.
The choice of interface (i.e., the choice of first-hop
network) may significantly affect performance or even
reachability of remote parts of the Internet.
Finally, we note another possibility that is NOT
multihoming: one logical interface may be bound to multiple
physical interfaces, in order to increase the reliability or
throughput between directly connected machines by providing
alternative physical paths between them. For instance, two
systems might be connected by multiple point-to-point links.
We call this "link-layer multiplexing". With link-layer
multiplexing, the protocols above the link layer are unaware
that multiple physical interfaces are present; the link-
layer device driver is responsible for multiplexing and
routing packets across the physical interfaces.
In the Internet protocol architecture, a transport protocol
instance ("entity") has no address of its own, but instead
uses a single Internet Protocol (IP) address. This has
implications for the IP, transport, and application layers,
and for the interfaces between them. In particular, the
application software may have to be aware of the multiple IP
addresses of a multihomed host; in other cases, the choice
can be made within the network software.
3.3.4.2 Multihoming Requirements
The following general rules apply to the selection of an IP
source address for sending a datagram from a multihomed
host.
(1) If the datagram is sent in response to a received
datagram, the source address for the response SHOULD be
the specific-destination address of the request. See
Sections 4.1.3.5 and 4.2.3.7 and the "General Issues"
section of [INTRO:1] for more specific requirements on
higher layers.
Otherwise, a source address must be selected.
(2) An application MUST be able to explicitly specify the
source address for initiating a connection or a
request.
(3) In the absence of such a specification, the networking
software MUST choose a source address. Rules for this
choice are described below.
There are two key requirement issues related to multihoming:
(A) A host MAY silently discard an incoming datagram whose
destination address does not correspond to the physical
interface through which it is received.
(B) A host MAY restrict itself to sending (non-source-
routed) IP datagrams only through the physical
interface that corresponds to the IP source address of
the datagrams.
DISCUSSION:
Internet host implementors have used two different
conceptual models for multihoming, briefly summarized
in the following discussion. This document takes no
stand on which model is preferred; each seems to have a
place. This ambivalence is reflected in the issues (A)
and (B) being optional.
o Strong ES Model
The Strong ES (End System, i.e., host) model
emphasizes the host/gateway (ES/IS) distinction,
and would therefore substitute MUST for MAY in
issues (A) and (B) above. It tends to model a
multihomed host as a set of logical hosts within
the same physical host.
With respect to (A), proponents of the Strong ES
model note that automatic Internet routing
mechanisms could not route a datagram to a
physical interface that did not correspond to the
destination address.
Under the Strong ES model, the route computation
for an outgoing datagram is the mapping:
route(src IP addr, dest IP addr, TOS)
-> gateway
Here the source address is included as a parameter
in order to select a gateway that is directly
reachable on the corresponding physical interface.
Note that this model logically requires that in
general there be at least one default gateway, and
preferably multiple defaults, for each IP source
address.
o Weak ES Model
This view de-emphasizes the ES/IS distinction, and
would therefore substitute MUST NOT for MAY in
issues (A) and (B). This model may be the more
natural one for hosts that wiretap gateway routing
protocols, and is necessary for hosts that have
embedded gateway functionality.
The Weak ES Model may cause the Redirect mechanism
to fail. If a datagram is sent out a physical
interface that does not correspond to the
destination address, the first-hop gateway will
not realize when it needs to send a Redirect. On
the other hand, if the host has embedded gateway
functionality, then it has routing information
without listening to Redirects.
In the Weak ES model, the route computation for an
outgoing datagram is the mapping:
route(dest IP addr, TOS) -> gateway, interface
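Issues (A) and (B) differ between the two models as shown in the sketch below. The names are hypothetical. `iface_addrs` maps an interface name to the set of local IP addresses bound to it, and `weak_route` stands in for the Weak ES mapping route(dest IP addr, TOS) -> gateway, interface, with TOS omitted. A host that treats both MAY clauses as MUST behaves like `strong_es=True`.

```python
def accept(dst, rx_iface, iface_addrs, strong_es):
    """Issue (A): accept an incoming datagram addressed to dst?"""
    if strong_es:
        return dst in iface_addrs[rx_iface]   # must match the receiving interface
    return any(dst in addrs for addrs in iface_addrs.values())


def send_interface(src, dst, iface_addrs, weak_route, strong_es):
    """Issue (B): which physical interface carries the outgoing datagram?"""
    if strong_es:
        # Only the interface that owns the source address may be used.
        return next((i for i, addrs in iface_addrs.items() if src in addrs), None)
    return weak_route(dst)[1]                 # interface chosen by the route
```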
3.3.4.3 Choosing a Source Address
DISCUSSION:
When it sends an initial connection request (e.g., a
TCP "SYN" segment) or a datagram service request (e.g.,
a UDP-based query), the transport layer on a multihomed
host needs to know which source address to use. If the
application does not specify it, the transport layer
must ask the IP layer to perform the conceptual
mapping:
GET_SRCADDR(remote IP addr, TOS)
-> local IP address
Here TOS is the Type-of-Service value (see Section
3.2.1.6), and the result is the desired source address.
The following rules are suggested for implementing this
mapping:
(a) If the remote Internet address lies on one of the
(sub)nets to which the host is directly connected,
a corresponding source address may be chosen,
unless the corresponding interface is known to be
down.
(b) The route cache may be consulted, to see if there
is an active route to the specified destination
network through any network interface; if so, a
local IP address corresponding to that interface
may be chosen.
(c) The table of static routes, if any (see Section
3.3.1.2), may be similarly consulted.
(d) The default gateways may be consulted. If these
gateways are assigned to different interfaces, the
interface corresponding to the gateway with the
highest preference may be chosen.
In the future, there may be a defined way for a
multihomed host to ask the gateways on all connected
networks for advice about the best network to use for a
given destination.
IMPLEMENTATION:
It will be noted that this process is essentially the
same as datagram routing (see Section 3.3.1), and
therefore hosts may be able to combine the
implementation of the two functions.
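Rules (a) through (d) can be applied in order, as in the sketch below. This is a hypothetical illustration, not a defined interface, and it rests on several assumptions. `interfaces` is a list of `(local_addr, mask, up)` tuples. The route cache and static routes are dicts keyed by the exact remote address, where the text allows network keys. `defaults` holds `(preference, local_addr)` pairs, with a larger number meaning higher preference. The TOS argument is accepted but not used.

```python
import ipaddress


def get_srcaddr(remote, tos, interfaces, route_cache, static_routes, defaults):
    """Sketch of GET_SRCADDR(remote IP addr, TOS) -> local IP address."""
    r = int(ipaddress.IPv4Address(remote))
    # (a) remote is on a directly connected (sub)net whose interface is up.
    for local, mask, up in interfaces:
        m = int(ipaddress.IPv4Address(mask))
        if up and (int(ipaddress.IPv4Address(local)) & m) == (r & m):
            return local
    # (b) an active route cache entry, then (c) a static route.
    for table in (route_cache, static_routes):
        if remote in table:
            return table[remote]
    # (d) the interface of the most preferred default gateway.
    if defaults:
        return max(defaults, key=lambda d: d[0])[1]
    return None   # no basis for a choice
```

As the IMPLEMENTATION note says, this walk mirrors the routing lookup of Section 3.3.1, so a host could answer both questions from one table.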
3.3.5 Source Route Forwarding
Subject to restrictions given below, a host MAY be able to act
as an intermediate hop in a source route, forwarding a source-
routed datagram to the next specified hop.
However, in performing this gateway-like function, the host
MUST obey all the relevant rules for a gateway forwarding
source-routed datagrams [INTRO:2]. This includes the following
specific provisions, which override the corresponding host
provisions given earlier in this document:
(A) TTL (ref. Section 3.2.1.7)
The TTL field MUST be decremented and the datagram perhaps
discarded as specified for a gateway in [INTRO:2].
(B) ICMP Destination Unreachable (ref. Section 3.2.2.1)
A host MUST be able to generate Destination Unreachable
messages with the following codes:
4 (Fragmentation Required but DF Set) when a source-
routed datagram cannot be fragmented to fit into the
target network;
5 (Source Route Failed) when a source-routed datagram
cannot be forwarded, e.g., because of a routing
problem or because the next hop of a strict source
route is not on a connected network.
(C) IP Source Address (ref. Section 3.2.1.3)
A source-routed datagram being forwarded MAY (and normally
will) have a source address that is not one of the IP
addresses of the forwarding host.
(D) Record Route Option (ref. Section 3.2.1.8d)
A host that is forwarding a source-routed datagram
containing a Record Route option MUST update that option,
if it has room.
(E) Timestamp Option (ref. Section 3.2.1.8e)
A host that is forwarding a source-routed datagram
containing a Timestamp Option MUST add the current
timestamp to that option, according to the rules for this
option.
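Provision (D) can be illustrated using the Record Route layout from RFC 791: a type octet (7), a length octet, a pointer octet whose smallest legal value is 4, then the route data. The sketch below assumes the whole option is held in a bytearray and that the caller supplies the address of the outgoing interface; the function name is hypothetical.

```python
import ipaddress

RECORD_ROUTE = 7  # Record Route option type (RFC 791)


def update_record_route(option: bytearray, outgoing_addr: str) -> bool:
    """Record outgoing_addr in a Record Route option, in place.

    Returns True if the address was recorded and False if the option is
    already full. Raises ValueError if some room remains but not enough
    for a whole address, which RFC 791 treats as an erroneous datagram.
    """
    if option[0] != RECORD_ROUTE:
        raise ValueError("not a Record Route option")
    length, pointer = option[1], option[2]
    if pointer > length:
        return False  # full: forward without recording
    if pointer + 3 > length:
        raise ValueError("partial room left in Record Route option")
    # The pointer counts octets from 1 at the option's type octet.
    start = pointer - 1
    option[start:start + 4] = ipaddress.IPv4Address(outgoing_addr).packed
    option[2] = pointer + 4
    return True
```

The "if it has room" condition corresponds to the pointer > length test: a full option is left unchanged and the datagram is still forwarded.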
To define the rules restricting host forwarding of source-
routed datagrams, we use the term "local source-routing" if the
next hop will be through the same physical interface through
which the datagram arrived; otherwise, it is "non-local
source-routing".
o A host is permitted to perform local source-routing
without restriction.
o A host that supports non-local source-routing MUST have a
configurable switch to disable forwarding, and this switch
MUST default to disabled.
o The host MUST satisfy all gateway requirements for
configurable policy filters [INTRO:2] restricting non-
local forwarding.
If a host receives a datagram with an incomplete source route
but does not forward it for some reason, the host SHOULD return
an ICMP Destination Unreachable (code 5, Source Route Failed)
message, unless the datagram was itself an ICMP error message.
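The forwarding restrictions and the error rule can be combined into one decision, sketched below. The interface names, the SourceRouteConfig switch, and the policy_permits flag (standing in for the gateway policy filters of [INTRO:2], which are not modeled here) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

DEST_UNREACHABLE = 3      # ICMP type
SOURCE_ROUTE_FAILED = 5   # Destination Unreachable code


@dataclass
class SourceRouteConfig:
    # Hypothetical switch for non-local source-routing; defaults to disabled.
    nonlocal_forwarding: bool = False


def source_route_action(arrival_iface: str, next_hop_iface: Optional[str],
                        config: SourceRouteConfig, policy_permits: bool,
                        is_icmp_error: bool):
    """Decide what to do with a datagram whose source route is incomplete.

    next_hop_iface is None when the next hop cannot be reached.
    Returns "forward", "drop", or an (ICMP type, code) pair to send back.
    """
    if next_hop_iface is not None:
        if next_hop_iface == arrival_iface:
            return "forward"  # local source-routing: no restriction
        if config.nonlocal_forwarding and policy_permits:
            return "forward"  # non-local: switch enabled and filters pass
    if is_icmp_error:
        return "drop"  # never answer an ICMP error with an ICMP error
    return (DEST_UNREACHABLE, SOURCE_ROUTE_FAILED)
```

Because the switch defaults to False, a freshly configured host only performs local source-routing.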
3.3.6 Broadcasts
Section 3.2.1.3 defined the four standard IP broadcast address
forms:
Limited Broadcast: {-1, -1}
Directed Broadcast: {<Network-number>,-1}
Subnet Directed Broadcast:
{<Network-number>,<Subnet-number>,-1}
All-Subnets Directed Broadcast: {<Network-number>,-1,-1}
A host MUST recognize any of these forms in the destination
address of an incoming datagram.
There is a class of hosts* that use non-standard broadcast
address forms, substituting 0 for -1. All hosts SHOULD
recognize and accept any of these non-standard broadcast
addresses as the destination address of an incoming datagram.
_________________________
*4.2BSD Unix and its derivatives, but not 4.3BSD.
A host MAY optionally have a configuration option to choose the
0 or the -1 form of broadcast address, for each physical
interface, but this option SHOULD default to the standard (-1)
form.
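Recognizing these forms for one interface can be sketched as below. The sketch assumes classful network numbers (Class A, B, or C derived from the interface's own address) and a single subnet mask per interface; the accept_zero_form flag, defaulting to accepting the non-standard 0 forms, is a hypothetical knob.

```python
import ipaddress

ALL_ONES = 0xFFFFFFFF


def classful_mask(addr: int) -> int:
    """Network-number mask implied by the address class (A, B or C)."""
    first = addr >> 24
    if first < 128:
        return 0xFF000000   # Class A
    if first < 192:
        return 0xFFFF0000   # Class B
    return 0xFFFFFF00       # Class C (Class D/E are not interface addresses)


def is_broadcast(dest, local_addr, subnet_mask, accept_zero_form=True) -> bool:
    """True if dest is a broadcast address for the interface's network or subnet."""
    d = int(ipaddress.IPv4Address(dest))
    local = int(ipaddress.IPv4Address(local_addr))
    masks = (classful_mask(local), int(ipaddress.IPv4Address(subnet_mask)))
    # Standard forms fill the "-1" fields with ones; 4.2BSD-style forms use zeros.
    fills = (ALL_ONES, 0) if accept_zero_form else (ALL_ONES,)
    for fill in fills:
        if d == fill:  # Limited Broadcast {-1, -1} (or {0, 0})
            return True
        for mask in masks:
            host_bits = ~mask & ALL_ONES
            # Network mask: Directed and All-Subnets Directed Broadcast (same bits).
            # Subnet mask: Subnet Directed Broadcast.
            if d == (local & mask) | (fill & host_bits):
                return True
    return False
```

For an interface 10.1.2.3 with subnet mask 255.255.0.0, this accepts 255.255.255.255, 10.255.255.255, and 10.1.255.255, plus the zero forms 0.0.0.0, 10.0.0.0, and 10.1.0.0, but not 10.1.2.255.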
When a host sends a datagram to a link-layer broadcast address,
the IP destination address MUST be a legal IP broadcast or IP
multicast address.
A host SHOULD silently discard a datagram that is received via
a link-layer broadcast (see Section 2.4) but does not specify
an IP multicast or broadcast destination address.
Hosts SHOULD use the Limited Broadcast address to broadcast to
a connected network.
DISCUSSION:
Using the Limited Broadcast address instead of a Directed
Broadcast address may improve system robustness. Problems
are often caused by machines that do not understand the
plethora of broadcast addresses (see Section 3.2.1.3), or
that may have different ideas about which broadcast
addresses are in use. The prime example of the latter is
machines that do not understand subnetting but are
attached to a subnetted net. Sending a Subnet Broadcast
for the connected network will confuse those machines,
which will see it as a message to some other host.
There has been discussion on whether a datagram addressed
to the Limited Broadcast address ought to be sent from all
the interfaces of a multihomed host. This specification
takes no stand on the issue.
3.3.7 IP Multicasting
A host SHOULD support local IP multicasting on all connected
networks for which a mapping from Class D IP addresses to
link-layer addresses has been specified (see below). Support
for local IP multicasting includes sending multicast datagrams,
joining multicast groups and receiving multicast datagrams, and
leaving multicast groups. This implies support for all of
[IP:4] except the IGMP protocol itself, which is OPTIONAL.
DISCUSSION:
IGMP provides gateways that are capable of multicast
routing with the information required to support IP
multicasting across multiple networks. At this time,
multicast-routing gateways are in the experimental stage
and are not widely available. For hosts that are not
connected to networks with multicast-routing gateways or
that do not need to receive multicast datagrams
originating on other networks, IGMP serves no purpose and
is therefore optional for now. However, the rest of
[IP:4] is currently recommended for the purpose of
providing IP-layer access to local network multicast
addressing, as a preferable alternative to local broadcast
addressing. It is expected that IGMP will become
recommended at some future date, when multicast-routing
gateways have become more widely available.
If IGMP is not implemented, a host SHOULD still join the "all-
hosts" group (224.0.0.1) when the IP layer is initialized and
remain a member for as long as the IP layer is active.
DISCUSSION:
Joining the "all-hosts" group will support strictly local
uses of multicasting, e.g., a gateway discovery protocol,
even if IGMP is not implemented.
The mapping of IP Class D addresses to local addresses is
currently specified for the following types of networks:
o Ethernet/IEEE 802.3, as defined in [IP:4].
o Any network that supports broadcast but not multicast
addressing: all IP Class D addresses map to the local
broadcast address.
o Any type of point-to-point link (e.g., SLIP or HDLC
links): no mapping required. All IP multicast datagrams
are sent as-is, inside the local framing.
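The three mappings listed above can be sketched as a single function. For Ethernet, [IP:4] places the low-order 23 bits of the Class D address into the Ethernet multicast address 01-00-5E-00-00-00, so 32 groups share each link-layer address. The link_type strings and the link_broadcast parameter are hypothetical names for this example.

```python
import ipaddress
from typing import Optional


def class_d_to_link(group: str, link_type: str,
                    link_broadcast: Optional[str] = None) -> Optional[str]:
    """Map a Class D address to a link-layer destination for one network type.

    Returns None for point-to-point links, where no mapping is needed.
    """
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:  # Class D is 224.0.0.0 through 239.255.255.255
        raise ValueError(f"{group} is not a Class D address")
    if link_type == "ethernet":
        # Low-order 23 bits of the group go into 01-00-5E-00-00-00.
        mac = (0x01005E << 24) | (int(addr) & 0x7FFFFF)
        return "-".join(f"{(mac >> shift) & 0xFF:02X}" for shift in range(40, -1, -8))
    if link_type == "broadcast-only":
        if link_broadcast is None:
            raise ValueError("link_broadcast is required for this link type")
        return link_broadcast  # every group maps to the local broadcast address
    if link_type == "point-to-point":
        return None  # sent as-is inside the local framing
    raise ValueError(f"no Class D mapping specified for {link_type!r}")
```

Because only 23 of the 28 group bits survive, 224.0.0.1 and 224.128.0.1 both map to 01-00-5E-00-00-01, so the IP layer must still check the destination address of each received multicast datagram.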
Mappings for other types of networks will be specified in the
future.
A host SHOULD provide a way for higher-layer protocols or
applications to determine which of the host's connected
network(s) support IP multicast addressing.
3.3.8 Error Reporting
Wherever practical, hosts MUST return ICMP error datagrams on
detection of an error, except in those cases where returning an
ICMP error message is specifically prohibited.
DISCUSSION:
A common phenomenon in datagram networks is the "black hole
disease": datagrams are sent out, but nothing comes back.
Without any error datagrams, it is difficult for the user
to figure out what the problem is.
3.4 INTERNET/TRANSPORT LAYER INTERFACE
The interface between the IP layer and the transport layer MUST
provide full access to all the mechanisms of the IP layer,
including options, Type-of-Service, and Time-to-Live. The
transport layer MUST either have mechanisms to set these
interface parameters, or provide a path to pass them through
from an application, or both.
DISCUSSION:
Applications are urged to make use of these mechanisms
where applicable, even when the mechanisms are not
currently effective in the Internet (e.g., TOS). This will
allow these mechanisms to be immediately useful when they
do become effective, without a large amount of
retrofitting of host software.
We now describe a conceptual interface between the transport
layer and the IP layer, as a set of procedure calls. This is
an extension of the information in Section 3.3 of RFC-791
[IP:1].
* Send Datagram
SEND(src, dst, prot, TOS, TTL, BufPTR, len, Id, DF, opt
=> result)
where the parameters are defined in RFC-791. Passing an Id
parameter is optional; see Section 3.2.1.5.
* Receive Datagram
RECV(BufPTR, prot
=> result, src, dst, SpecDest, TOS, len, opt)
All the parameters are defined in RFC-791, except for:
SpecDest = specific-destination address of datagram
(defined in Section 3.2.1.3)
The result parameter dst contains the datagram's destination
address. Since this may be a broadcast or multicast address,
the SpecDest parameter (not shown in RFC-791) MUST be passed.
The parameter opt contains all the IP options received in the
datagram; these MUST also be passed to the transport layer.
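Section 3.2.1.3 defines the specific-destination address as the header's destination address, unless that address is a broadcast or multicast address, in which case it is an IP address assigned to the physical interface on which the datagram arrived. A minimal sketch of computing SpecDest alongside dst follows; the RecvResult container and the iface_broadcasts set (the broadcast addresses the arrival interface recognizes) are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class RecvResult:
    # Hypothetical container for the RECV() result parameters.
    src: ipaddress.IPv4Address
    dst: ipaddress.IPv4Address        # as carried in the IP header
    spec_dest: ipaddress.IPv4Address  # SpecDest
    tos: int
    opt: bytes                        # all IP options received, passed up intact


def specific_destination(dst, iface_addr, iface_broadcasts):
    """SpecDest: dst itself, unless dst is broadcast or multicast, in which
    case an address of the interface on which the datagram arrived."""
    dst = ipaddress.IPv4Address(dst)
    if dst.is_multicast or dst in iface_broadcasts:
        return ipaddress.IPv4Address(iface_addr)
    return dst
```

Passing both values lets a transport protocol reply from a real host address (SpecDest) while still knowing that the request arrived as a broadcast or multicast (dst).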
* Select Source Address
GET_SRCADDR(remote, TOS) -> local
remote = remote IP address
TOS = Type-of-Service
local = local IP address
See Section 3.3.4.3.
* Find Maximum Datagram Sizes
GET_MAXSIZES(local, remote, TOS) -> MMS_R, MMS_S
MMS_R = maximum receive transport-message size.
MMS_S = maximum send transport-message size.
(local, remote, TOS defined above)
See Sections 3.3.2 and 3.3.3.
* Advice on Delivery Success
ADVISE_DELIVPROB(sense, local, remote, TOS)
Here the parameter sense is a 1-bit flag indicating whether
positive or negative advice is being given; see the
discussion in Section 3.3.1.4. The other parameters were
defined earlier.
* Send ICMP Message
SEND_ICMP(src, dst, TOS, TTL, BufPTR, len, Id, DF, opt)
-> result
(Parameters defined in RFC-791).
Passing an Id parameter is optional; see Section 3.2.1.5.
The transport layer MUST be able to send certain ICMP
messages: Port Unreachable or any of the query-type
messages. This function could be considered to be a special
case of the SEND() call, of course; we describe it separately
for clarity.
* Receive ICMP Message
RECV_ICMP(BufPTR) -> result, src, dst, len, opt
(Parameters defined in RFC-791).
The IP layer MUST pass certain ICMP messages up to the
appropriate transport-layer routine. This function could be
considered to be a special case of the RECV() call, of
course; we describe it separately for clarity.
For an ICMP error message, the data that is passed up MUST
include the original Internet header plus all the octets of
the original message that are included in the ICMP message.
This data will be used by the transport layer to locate the
connection state information, if any.
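As an illustration of how a transport layer might locate connection state from this data, the sketch below parses the quoted Internet header (using IHL so that any options are skipped) and, for TCP and UDP, reads the two 16-bit port fields that begin both protocols' headers. The ConnKey tuple is a hypothetical lookup key; since the quoted datagram was sent by this host, its source is the local end.

```python
import ipaddress
import struct
from typing import NamedTuple, Optional

TCP, UDP = 6, 17  # IP protocol numbers


class ConnKey(NamedTuple):
    protocol: int
    local: ipaddress.IPv4Address       # source of the original datagram (this host)
    local_port: Optional[int]
    remote: ipaddress.IPv4Address      # destination of the original datagram
    remote_port: Optional[int]


def connection_key(quoted: bytes) -> ConnKey:
    """Build a lookup key from the data carried in an ICMP error message:
    the original Internet header plus the leading octets of its payload."""
    if len(quoted) < 20:
        raise ValueError("truncated Internet header")
    ihl = (quoted[0] & 0x0F) * 4  # header length, including any options
    if ihl < 20 or len(quoted) < ihl:
        raise ValueError("bad IHL")
    protocol = quoted[9]
    src = ipaddress.IPv4Address(quoted[12:16])
    dst = ipaddress.IPv4Address(quoted[16:20])
    sport = dport = None
    # TCP and UDP both start with 16-bit source and destination ports.
    if protocol in (TCP, UDP) and len(quoted) >= ihl + 4:
        sport, dport = struct.unpack("!HH", quoted[ihl:ihl + 4])
    return ConnKey(protocol, src, sport, dst, dport)
```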
In particular, the following ICMP messages are to be passed
up:
o Destination Unreachable
o Source Quench
o Echo Reply (to ICMP user interface, unless the Echo
Request originated in the IP layer)
o Timestamp Reply (to ICMP user interface)
o Time Exceeded
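The list above amounts to a small dispatch table keyed on the ICMP type (numbers from RFC 792). In the sketch below, the receiver names "transport" and "icmp-user" and the echo_from_ip_layer flag are hypothetical; types not in the list are left to the IP layer.

```python
from typing import Optional

# ICMP message types (RFC 792) named in the list above.
ECHO_REPLY = 0
DEST_UNREACHABLE = 3
SOURCE_QUENCH = 4
TIME_EXCEEDED = 11
TIMESTAMP_REPLY = 14


def icmp_upcall(icmp_type: int, echo_from_ip_layer: bool = False) -> Optional[str]:
    """Name the (hypothetical) upper-layer receiver, or None to keep it in IP."""
    if icmp_type in (DEST_UNREACHABLE, SOURCE_QUENCH, TIME_EXCEEDED):
        return "transport"  # chosen via the protocol field of the quoted header
    if icmp_type == ECHO_REPLY:
        # Replies to Echo Requests that IP itself sent stay in the IP layer.
        return None if echo_from_ip_layer else "icmp-user"
    if icmp_type == TIMESTAMP_REPLY:
        return "icmp-user"
    return None  # not in this section's list
```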
DISCUSSION:
In the future, there may be additions to this interface to
pass path data (see Section 3.3.1.3) between the IP and
transport layers.