RFC 3819

Advice for Internet Subnetwork Designers

Pages: 60
Best Current Practice: 89

Network Working Group                                       P. Karn, Ed.
Request for Comments: 3819                                      Qualcomm
BCP: 89                                                       C. Bormann
Category: Best Current Practice                  Universitaet Bremen TZI
                                                            G. Fairhurst
                                                  University of Aberdeen
                                                             D. Grossman
                                                          Motorola, Inc.
                                                               R. Ludwig
                                                       Ericsson Research
                                                              J. Mahdavi
                                                                  Novell
                                                           G. Montenegro
                                   Sun Microsystems Laboratories, Europe
                                                                J. Touch
                                                                 USC/ISI
                                                                 L. Wood
                                                           Cisco Systems
                                                               July 2004


                Advice for Internet Subnetwork Designers

Status of this Memo

   This document specifies an Internet Best Current Practices for the
   Internet Community, and requests discussion and suggestions for
   improvements.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2004).

Abstract

   This document provides advice to the designers of digital
   communication equipment, link-layer protocols, and packet-switched
   local networks (collectively referred to as subnetworks), who wish to
   support the Internet protocols but may be unfamiliar with the
   Internet architecture and the implications of their design choices on
   the performance and efficiency of the Internet.

Table of Contents

   1.  Introduction and Overview
   2.  Maximum Transmission Units (MTUs) and IP Fragmentation
       2.1.  Choosing the MTU in Slow Networks
   3.  Framing on Non-Framed Subnetworks
   4.  Connection-Oriented Subnetworks
   5.  Broadcasting and Discovery
   6.  Multicasting
   7.  Bandwidth on Demand (BoD) Subnets
   8.  Reliability and Error Control
       8.1.  TCP vs Link-Layer Retransmission
       8.2.  Recovery from Subnetwork Outages
       8.3.  CRCs, Checksums and Error Detection
       8.4.  How TCP Works
       8.5.  TCP Performance Characteristics
             8.5.1.  The Formulae
             8.5.2.  Assumptions
             8.5.3.  Analysis of Link-Layer Effects on TCP
                     Performance
   9.  Quality-of-Service (QoS) Considerations
   10. Fairness vs Performance
   11. Delay Characteristics
   12. Bandwidth Asymmetries
   13. Buffering, Flow and Congestion Control
   14. Compression
   15. Packet Reordering
   16. Mobility
   17. Routing
   18. Security Considerations
   19. Contributors
   20. Informative References
   21. Contributors' Addresses
   22. Authors' Addresses
   23. Full Copyright Statement

1.  Introduction and Overview

   IP, the Internet Protocol [RFC791] [RFC2460], is the core protocol of
   the Internet.  IP defines a simple "connectionless" packet-switched
   network.  The success of the Internet is largely attributed to IP's
   simplicity, the "end-to-end principle" [SRC81] on which the Internet
   is based, and the resulting ease of carrying IP on a wide variety of
   subnetworks, not necessarily designed with IP in mind.  A subnetwork
   refers to any network operating immediately below the IP layer to
   connect two or more systems using IP (i.e., end hosts or routers).
   In its simplest form, this may be a direct connection between the IP
   systems (e.g., using a length of cable or a wireless medium).

   This document defines a subnetwork as a layer 2 network, which is a
   network that does not rely upon the services of IP routers to forward
   packets between parts of the subnetwork.  However, IP routers may
   bridge frames at Layer 2 between parts of a subnetwork.  Sometimes,
   it is convenient to aggregate a group of such subnetworks into a
   single logical subnetwork.  IP routing protocols (e.g., OSPF, IS-IS,
   and PIM) can be configured to support this aggregation, but typically
   present a layer-3 subnetwork rather than a layer-2 subnetwork.  This
   may also result in a specific packet passing several times over the
   same layer-2 subnetwork via an intermediate layer-3 gateway (router).
   Because that aggregation requires layer-3 components, issues thereof
   are beyond the scope of this document.

   However, while many subnetworks carry IP, they do not necessarily do
   so with maximum efficiency, minimum complexity, or cost, nor do they
   implement certain features to efficiently support newer Internet
   features of increasing importance, such as multicasting or quality of
   service.

   With the explosive growth of the Internet, IP packets comprise an
   increasingly large fraction of the traffic carried by the world's
   telecommunications networks.  It therefore makes sense to optimize
   both existing and new subnetwork technologies for IP as much as
   possible.

   Optimizing a subnetwork for IP involves three complementary
   considerations:

   1.  Providing functionality sufficient to carry IP.

   2.  Eliminating unnecessary functions that increase cost or
       complexity.

   3.  Choosing subnetwork parameters that maximize the performance of
       the Internet protocols.

   Because IP is so simple, consideration 2 is more of an issue than
   consideration 1.  That is to say, subnetwork designers make many more
   errors of commission than errors of omission.  However, certain
   enhancements to Internet features, such as multicasting and quality-
   of-service, benefit significantly from support given by the
   underlying subnetworks beyond that necessary to carry "traditional"
   unicast, best-effort IP.

   A major consideration in the efficient design of any layered
   communication network is the appropriate layer(s) in which to
   implement a given function.  This issue was first addressed in the
   seminal paper, "End-to-End Arguments in System Design" [SRC81].  That
   paper argued that many functions can be implemented properly *only*
   on an end-to-end basis, i.e., at the highest protocol layers, outside
   the subnetwork.  These functions include ensuring the reliable
   delivery of data and the use of cryptography to provide
   confidentiality and message integrity.

   Such functions cannot be provided solely by the concatenation of
   hop-by-hop services; duplicating these functions at the lower
   protocol layers (i.e., within the subnetwork) can be needlessly
   redundant or even harmful to cost and performance.

   However, partial duplication of functionality in a lower layer can
   *sometimes* be justified by performance, security, or availability
   considerations.  Examples include link-layer retransmission to
   improve the performance of an unusually lossy channel, e.g., mobile
   radio, link-level encryption intended to thwart traffic analysis, and
   redundant transmission links to improve availability, increase
   throughput, or to guarantee performance for certain classes of
   traffic.  Duplication of protocol functions should be done only with
   an understanding of system-level implications, including possible
   interactions with higher-layer mechanisms.

   The original architecture of the Internet was influenced by the
   end-to-end principle [SRC81], and this influence has been, in our
   view, part of the reason for the Internet's success.

   The remainder of this document discusses the various subnetwork
   design issues that the authors consider relevant to efficient IP
   support.

2.  Maximum Transmission Units (MTUs) and IP Fragmentation

   IPv4 packets (datagrams) vary in size, from 20 bytes (the size of the
   IPv4 header alone) to a maximum of 65535 bytes.  Subnetworks need not
   support maximum-sized (64KB) IP packets, as IP provides a scheme that
   breaks packets that are too large for a given subnetwork into
   fragments that travel as independent IP packets and are reassembled
   at the destination.  The maximum packet size supported by a
   subnetwork is known as its Maximum Transmission Unit (MTU).

   Subnetworks may, but are not required to, indicate the length of each
   packet they carry.  One example is Ethernet with the widely used DIX
   [DIX82] (not IEEE 802.3 [IEEE8023]) header, which lacks a length
   field to indicate the true data length when the packet is padded to a
   minimum of 60 bytes.  This is not a problem for uncompressed IP
   because each IP packet carries its own length field.

   If optional header compression [RFC1144] [RFC2507] [RFC2508]
   [RFC3095] is used, however, it is required that the link framing
   indicate frame length because that is needed for the reconstruction
   of the original header.

   In IP version 4 (the version now in widespread use), fragmentation
   can occur at either the sending host or in an intermediate router,
   and fragments can be further fragmented at subsequent routers if
   necessary.
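
   As an illustration of the bookkeeping involved, the following Python
   sketch splits an IPv4 payload into fragments for a given MTU.  It is
   a simplified model rather than the behaviour of any particular
   stack: the name fragment_payload and the tuple layout are
   assumptions made for this example, and a fixed 20-byte header
   without options is assumed.

```python
IPV4_HEADER_LEN = 20  # bytes; assumption: no IP options


def fragment_payload(payload_len, mtu):
    """Return (offset_units, length, more_fragments) per fragment.

    offset_units is the fragment offset in 8-byte units, as carried in
    the IPv4 header, so every fragment except the last must carry a
    multiple of 8 payload bytes.
    """
    max_data = (mtu - IPV4_HEADER_LEN) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any payload")
    fragments = []
    offset = 0
    while offset < payload_len:
        length = min(max_data, payload_len - offset)
        more = offset + length < payload_len
        fragments.append((offset // 8, length, more))
        offset += length
    return fragments


if __name__ == "__main__":
    # A 4000-byte payload crossing a subnetwork with a 1500-byte MTU.
    for fragment in fragment_payload(4000, 1500):
        print(fragment)
```

   With a 1500-byte MTU, each fragment except the last carries 1480
   payload bytes, because fragment offsets are expressed in units of 8
   bytes.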

   In IP version 6 [RFC2460], fragmentation can occur only at the
   sending host; it cannot occur in a router (called "router
   fragmentation" in this document).

   Both IPv4 and IPv6 provide a "path MTU discovery" procedure [RFC1191]
   [RFC1435] [RFC1981] that allows the sending host to avoid
   fragmentation by discovering the minimum MTU along a given path and
   reducing its packet sizes accordingly.  This procedure is optional in
   both IPv4 and IPv6.

   Path MTU discovery is widely deployed, but it sometimes encounters
   problems.  Some routers fail to generate the ICMP messages that
   convey path MTU information to the sender, and sometimes the ICMP
   messages are blocked by overly restrictive firewalls.  The result can
   be a "Path MTU Black Hole" [RFC2923] [RFC1435].
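
   The discovery loop and its failure mode can be sketched as follows.
   This is a toy simulation under stated assumptions, not a protocol
   implementation: the function discover_path_mtu and its parameters
   are hypothetical, each hop is reduced to a single MTU value, and
   routers are assumed to report their next-hop MTU in the ICMP
   message, as described in [RFC1191].

```python
def discover_path_mtu(link_mtus, first_hop_mtu, icmp_delivered=True,
                      max_probes=16):
    """Simulate a sender probing a path with Don't Fragment set.

    link_mtus lists the MTU of each hop after the first.  Returns the
    discovered path MTU, or None if packets vanish without feedback
    (a "black hole").
    """
    size = first_hop_mtu
    for _ in range(max_probes):
        # Find the first hop that cannot carry a packet of this size.
        bottleneck = next((m for m in link_mtus if m < size), None)
        if bottleneck is None:
            return size        # the packet reached the destination
        if not icmp_delivered:
            return None        # dropped silently; sender cannot adapt
        size = bottleneck      # ICMP reported the next-hop MTU
    return None


if __name__ == "__main__":
    path = [1500, 1400, 1280, 1500]
    print(discover_path_mtu(path, 1500))
    print(discover_path_mtu(path, 1500, icmp_delivered=False))
```

   In the second call the ICMP feedback is suppressed, so the sender
   never learns that a smaller packet size is required.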

   The Path MTU Discovery procedure, the persistence of path MTU black
   holes, and the deletion of router fragmentation in IPv6 reflect a
   consensus of the Internet technical community that router
   fragmentation is best avoided.  This requires that subnetworks
   support MTUs that are "reasonably" large.  All IPv4 end hosts are
   required to accept and reassemble IP packets of size 576 bytes
   [RFC791], but such a small value would clearly be inefficient.
   Because IPv6 omits fragmentation by routers, [RFC2460] specifies a
   larger minimum MTU of 1280 bytes.  Any subnetwork with an internal
   packet payload smaller than 1280 bytes must implement a mechanism
   that performs fragmentation/reassembly of IP packets to/from
   subnetwork frames if it is to support IPv6.

   If a subnetwork cannot directly support a "reasonable" MTU with
   native framing mechanisms, it should internally fragment.  That is,
   it should transparently break IP packets into internal data elements
   and reassemble them at the other end of the subnetwork.

   This leaves the question of what is a "reasonable" MTU.  Ethernet (10
   and 100 Mb/s) has an MTU of 1500 bytes, and because of the ubiquity
   of Ethernet few Internet paths currently have MTUs larger than this
   value.  This severely limits the utility of larger MTUs provided by
   other subnetworks.  Meanwhile, larger MTUs are increasingly desirable
   on high-speed subnetworks to reduce the per-packet processing
   overhead in host computers, and implementers are encouraged to
   provide them even though they may not be usable when Ethernet is also
   in the path.

   Various "tunneling" schemes, such as GRE [RFC2784] or IP Security in
   tunnel mode [RFC2406], treat IP as a subnetwork for IP.  Since
   tunneling adds header overhead, it can trigger fragmentation, even
   when the same physical subnetworks (e.g., Ethernet) are used on both
   sides of the host performing IPsec encapsulation.  Tunneling has made
   it more difficult to avoid router fragmentation and has increased the
   incidence of path MTU black holes [RFC2401] [RFC2923].  Larger
   subnetwork MTUs may help to alleviate this problem.
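
   The arithmetic behind this effect is small enough to sketch.  The
   overhead figures below are assumptions chosen for illustration: a
   20-byte outer IPv4 header without options and a 4-byte GRE header
   without optional fields.  Other tunneling schemes, including IPsec,
   add different and often variable amounts.

```python
OUTER_IPV4_HEADER = 20  # bytes; assumption: no IP options
GRE_BASE_HEADER = 4     # bytes; assumption: no optional GRE fields


def inner_mtu(link_mtu, encap_overhead):
    """Largest inner packet that fits in one outer packet."""
    return link_mtu - encap_overhead


def needs_fragmentation(inner_packet_len, link_mtu, encap_overhead):
    """True if the encapsulated packet exceeds the link MTU."""
    return inner_packet_len + encap_overhead > link_mtu


if __name__ == "__main__":
    overhead = OUTER_IPV4_HEADER + GRE_BASE_HEADER
    print(inner_mtu(1500, overhead))
    print(needs_fragmentation(1500, 1500, overhead))
```

   Under these assumptions, a full-sized 1500-byte packet entering the
   tunnel no longer fits on a 1500-byte Ethernet, while a packet of
   1476 bytes or less still does.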

2.1.  Choosing the MTU in Slow Networks

   In slow networks, the largest possible packet may take a considerable
   amount of time to send.  This is known as channelisation or
   serialisation delay.  Total end-to-end interactive response time
   should not exceed the well-known human factors limit of 100 to 200
   ms.  This includes all sources of delay: electromagnetic propagation
   delay, queuing delay, and the store-and-forward (serialisation) time,
   i.e., the time to transmit a packet at link speed.

   At low link speeds, store-and-forward delays can dominate total
   end-to-end delay; these are in turn directly influenced by the
   maximum transmission unit (MTU) size.  Even when an interactive
   packet is given a higher queuing priority, it may have to wait for a
   large bulk transfer packet to finish transmission.  This worst-case
   wait can be set by an appropriate choice of MTU.

   For example, if the MTU is set to 1500 bytes, then an MTU-sized
   packet will take about 8 milliseconds to send on a T1 (1.536 Mb/s)
   link.  But if the link speed is 19.2kb/s, then the transmission time
   becomes 625 ms -- well above our 100-200ms limit.  A 256-byte MTU
   would lower this delay to a little over 100 ms.  However, care should
   be taken not to lower the MTU excessively, as this will increase
   header overhead and trigger frequent router fragmentation (if Path
   MTU discovery is not in use).  This is likely to be the case with
   multicast, where Path MTU discovery is ineffective.
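
   The figures in this example follow directly from the packet size
   and the link rate.  The sketch below reproduces them; the function
   names are assumptions made for this illustration, and link-layer
   framing overhead is ignored.

```python
def serialization_delay_ms(packet_bytes, link_bps):
    """Time to clock one packet onto the link, in milliseconds."""
    return packet_bytes * 8 / link_bps * 1000.0


def largest_mtu_for_delay(link_bps, max_delay_ms):
    """Largest packet (bytes) that can be sent within max_delay_ms."""
    return int(link_bps * max_delay_ms) // 8000


if __name__ == "__main__":
    print(serialization_delay_ms(1500, 1_536_000))  # T1
    print(serialization_delay_ms(1500, 19_200))
    print(serialization_delay_ms(256, 19_200))
    print(largest_mtu_for_delay(19_200, 100))
```

   For the link speeds used above, this gives about 7.8 ms, 625 ms,
   and about 107 ms respectively, and indicates that an MTU of 240
   bytes would keep the serialisation delay within 100 ms at 19.2
   kb/s.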

   One way to limit delay for interactive traffic without imposing a
   small MTU is to give priority to this traffic and to preempt (abort)
   the transmission of a lower-priority packet when a higher-priority
   packet arrives in the queue.  However, the link resources used to
   send the aborted packet are lost, and overall throughput will
   decrease.

   Another way to limit delay is to implement a link-level multiplexing
   scheme that allows several packets to be in progress simultaneously,
   with transmission priority given to segments of higher-priority IP
   packets.  For links using the Point-To-Point Protocol (PPP)
   [RFC1661], multi-class multilink [RFC2686] [RFC2687] [RFC2689]
   provides such a facility.
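
   The benefit of such interleaving can be estimated with a simple
   model.  The sketch below is not an implementation of multi-class
   multilink PPP; it assumes a single bulk packet that starts
   transmission at time 0, an urgent packet that may be sent at the
   next fragment boundary, and no per-fragment header overhead.  Time
   is counted in byte transmission times, and the function name is
   hypothetical.

```python
def urgent_finish_time(bulk_len, urgent_len, urgent_arrival,
                       fragment_size):
    """Byte time at which the urgent packet has been fully sent."""
    now = 0
    remaining = bulk_len
    # Send bulk fragments until the urgent packet is waiting.
    while remaining > 0 and now < urgent_arrival:
        sent = min(fragment_size, remaining)
        now += sent
        remaining -= sent
    # The link is now at a fragment boundary (or idle).
    start = max(now, urgent_arrival)
    return start + urgent_len


if __name__ == "__main__":
    print(urgent_finish_time(1500, 64, 10, 1500))  # no fragmentation
    print(urgent_finish_time(1500, 64, 10, 256))   # 256-byte fragments
```

   With a 1500-byte bulk packet already in transmission, a 64-byte
   urgent packet arriving 10 byte times later completes at byte time
   1564 without fragmentation, and at byte time 320 with 256-byte
   fragments.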

   ATM (asynchronous transfer mode), where subnetwork data units (SNDUs)
   are fragmented and interleaved across small 53-byte ATM cells, is
   another example of this technique.  However, ATM is generally used on
   high-speed links where the store-and-forward delays are already
   minimal, and it introduces a significant (~9%) increase in overhead
   due to the 5-byte header carried by each cell alongside its 48-byte
   payload.

   A third example is the Data-Over-Cable Service Interface
   Specification (DOCSIS) with typical upstream bandwidths of 2.56 Mb/s
   or 5.12 Mb/s.  To reduce the impact of a 1500-byte MTU in DOCSIS 1.0
   [DOCSIS1], a data link layer fragmentation mechanism is specified in
   DOCSIS 1.1 [DOCSIS2].  To accommodate the installed base, DOCSIS 1.1
   must be backward compatible with DOCSIS 1.0 cable modems, which
   generally do not support fragmentation.  When DOCSIS 1.0 and DOCSIS
   1.1 cable modems coexist, the unfragmented large data packets from
   DOCSIS 1.0 cable modems may affect the quality of service for voice
   packets from DOCSIS 1.1 cable modems.  In this case, it has been
   shown in [DOCSIS3] that the use of bandwidth allocation algorithms
   can mitigate this effect.

   To summarize, there is a fundamental tradeoff between efficiency and
   latency in the design of a subnetwork, and the designer should keep
   this tradeoff in mind.

3.  Framing on Non-Framed Subnetworks

   IP requires that subnetworks mark the beginning and end of each
   variable-length, asynchronous IP packet.  Some examples of links and
   subnetworks that do not provide this as an intrinsic feature include:

   1.  leased lines carrying a synchronous bit stream;

   2.  ISDN B-channels carrying a synchronous octet stream;

   3.  dialup telephone modems carrying an asynchronous octet stream;
       and

   4.  Asynchronous Transfer Mode (ATM) networks carrying an
       asynchronous stream of fixed-sized "cells".

   The Internet community has defined packet framing methods for all
   these subnetworks.  The Point-To-Point Protocol (PPP) [RFC1661],
   which uses a variant of HDLC, is applicable to bit-synchronous,
   octet-synchronous, and octet-asynchronous links (i.e., examples 1-3
   above).  PPP is one preferred framing method for IP, since a large
   number of systems interoperate with PPP.  ATM has its own framing
   methods, described in [RFC2684] [RFC2364].

   At high speeds, a subnetwork should provide a framed interface
   capable of carrying asynchronous, variable-length IP datagrams.  The
   maximum packet size supported by this interface is discussed above in
   the MTU/Fragmentation section.  The subnetwork may implement this
   facility in any convenient manner.

   IP packet boundaries need not coincide with any framing or
   synchronization mechanisms internal to the subnetwork.  When the
   subnetwork implements variable sized data units, the most
   straightforward approach is to place exactly one IP packet into each
   subnetwork data unit (SNDU), and to rely on the subnetwork's existing
   ability to delimit SNDUs to also delimit IP packets.  A good example
   is Ethernet.  However, some subnetworks have SNDUs of one or more
   fixed sizes, as dictated by switching, forward error correction
   and/or interleaving considerations.  Examples of such subnetworks
   include ATM, with a single cell payload size of 48 octets plus a 5-
   octet header, and IS-95 digital cellular, with two "rate sets" of
   four fixed frame sizes each that may be selected on 20 millisecond
   boundaries.

   Because IP packets are of variable length, they may not necessarily
   fit into an integer multiple of fixed-sized SNDUs.  An "adaptation
   layer" is needed to convert IP packets into SNDUs while marking the
   boundary between each IP packet in some manner.

   There are several approaches to this problem.  The first is to encode
   each IP packet into one or more SNDUs with no SNDU containing pieces
   of more than one IP packet, and to pad out the last SNDU of the
   packet as needed.  Bits in a control header added to each SNDU
   indicate where the data segment belongs in the IP packet.  If the
   subnetwork provides in-order, at-most-once delivery, the header can
   be as simple as a pair of bits indicating whether the SNDU is the
   first and/or the last in the IP packet.  Alternatively, for
   subnetworks that do not reorder the fragments of an SNDU, only the
   last SNDU of the packet could be marked, as this would implicitly
   indicate the next SNDU as the first in a new IP packet.  The AAL5
   (ATM Adaptation Layer 5) scheme used with ATM is an example of this
   approach, though it adds other features, including a payload length
   field and a payload CRC.
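
   A minimal sketch of the first approach, using a first/last bit pair
   per SNDU, is shown below.  It is illustrative only: the tuple
   representation, the zero-byte padding, and the function names are
   assumptions, and in-order, at-most-once delivery is assumed.
   Padding is not stripped here; for IP it can be removed using the
   length field in the IP header.

```python
def segment(packet, sndu_payload):
    """Split one packet into (first, last, data) SNDUs, padding the
    final one with zero bytes up to the fixed payload size."""
    chunks = [packet[i:i + sndu_payload]
              for i in range(0, len(packet), sndu_payload)] or [b""]
    sndus = []
    for index, chunk in enumerate(chunks):
        first = index == 0
        last = index == len(chunks) - 1
        sndus.append((first, last, chunk.ljust(sndu_payload, b"\x00")))
    return sndus


def reassemble(stream):
    """Yield padded packets from an in-order stream of SNDUs."""
    buffer = None
    for first, last, data in stream:
        if first:
            buffer = bytearray()
        if buffer is None:
            continue  # start of this packet was missed; skip it
        buffer += data
        if last:
            yield bytes(buffer)
            buffer = None


if __name__ == "__main__":
    stream = segment(b"A" * 100, 48) + segment(b"B" * 10, 48)
    for padded in reassemble(stream):
        print(len(padded))
```

   A receiver that joins the stream in the middle of a packet discards
   SNDUs until it sees one marked as first.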

   In AAL5, the ATM User-User Indication, which is encoded in the
   Payload Type field of an ATM cell, indicates the last cell of a
   packet.  The packet trailer is located at the end of the SNDU and
   contains the packet length and a CRC.
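
   The padding and trailer determine how many cells a packet of a
   given size occupies.  The following sketch estimates this under
   stated assumptions: an 8-byte AAL5 trailer placed at the end of the
   last cell, and no additional encapsulation header in front of the
   IP packet (such headers, where used, add to the total).  The
   function names are hypothetical.

```python
ATM_CELL = 53      # bytes per cell on the wire
ATM_PAYLOAD = 48   # payload bytes per cell
AAL5_TRAILER = 8   # bytes; assumption: no extra encapsulation header


def aal5_cells(packet_len):
    """Cells needed for one packet, including padding and trailer."""
    return -(-(packet_len + AAL5_TRAILER) // ATM_PAYLOAD)  # ceiling


def aal5_overhead_fraction(packet_len):
    """Fraction of transmitted bytes that are not packet data."""
    wire_bytes = aal5_cells(packet_len) * ATM_CELL
    return (wire_bytes - packet_len) / wire_bytes


if __name__ == "__main__":
    for size in (40, 41, 1500):
        print(size, aal5_cells(size), aal5_overhead_fraction(size))
```

   Under these assumptions, a 1500-byte packet occupies 32 cells (1696
   bytes on the wire), while a 41-byte packet needs two cells even
   though a 40-byte packet fits in one.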

   Another framing technique is to insert per-segment overhead to
   indicate the presence of a segment option.  When present, the option
   carries a pointer to the end of the packet.  This differs from AAL5
   in that it permits another packet to follow within the same segment.
   MPEG-2 Transport Streams [EN301192] [ISO13818] support this style of
   fragmentation, and may either use padding (limiting each MPEG
   transport stream packet to carry only part of one IP packet), or
   allow a second IP packet to start in the same Transport Stream packet
   (no padding).

   A third approach is to insert a special flag sequence into the data
   stream between each IP packet, and to pack the resulting data stream
   into SNDUs without regard to SNDU boundaries.  This may have
   implications when frames are lost.  The flag sequence can also pad
   unused space at the end of an SNDU.  If the special flag appears in
   the user data, it is escaped to an alternate sequence (usually larger
   than a flag) to avoid being misinterpreted as a flag.  The HDLC-based
   framing schemes used in PPP are all examples of this approach.
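
   The following sketch shows flag-based framing with octet stuffing.
   The constants follow the octet-stuffed HDLC-like framing used with
   PPP (flag 0x7E, escape 0x7D, escaped octets XORed with 0x20).  It
   is deliberately incomplete: address and control fields, the frame
   check sequence, and the escaping of control characters are omitted,
   and the function names are assumptions made for this example.

```python
FLAG = 0x7E
ESCAPE = 0x7D
XOR_MASK = 0x20


def stuff(packet):
    """Frame one packet between flags, escaping FLAG and ESCAPE."""
    out = bytearray([FLAG])
    for byte in packet:
        if byte in (FLAG, ESCAPE):
            out += bytes([ESCAPE, byte ^ XOR_MASK])
        else:
            out.append(byte)
    out.append(FLAG)
    return bytes(out)


def unstuff(stream):
    """Return the list of packets found in a stuffed octet stream."""
    packets, current, escaped = [], bytearray(), False
    for byte in stream:
        if byte == FLAG:
            if current:
                packets.append(bytes(current))
            current, escaped = bytearray(), False
        elif byte == ESCAPE:
            escaped = True
        elif escaped:
            current.append(byte ^ XOR_MASK)
            escaped = False
        else:
            current.append(byte)
    return packets


if __name__ == "__main__":
    data = bytes([0x01, 0x7E, 0x7D, 0x02])
    framed = stuff(data)
    print(framed.hex(), unstuff(framed) == [data])
```

   Each escaped octet doubles in size, so a packet consisting entirely
   of flag-like octets roughly doubles on the wire.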

   All three adaptation schemes introduce overhead; how much depends on
   the distribution of IP packet sizes, the size(s) of the SNDUs, and in
   the HDLC-like approaches, the content of the IP packet (since flag-
   like sequences occurring in the packet must be escaped, which expands
   them).  The designer must also weigh implementation complexity and
   performance in the choice and design of an adaptation layer.

4.  Connection-Oriented Subnetworks

   IP has no notion of a "connection"; it is a purely connectionless
   protocol.  When a connection is required by an application, it is
   usually provided by TCP [RFC793], the Transmission Control Protocol,
   running atop IP on an end-to-end basis.

   Connection-oriented subnetworks can be (and are widely) used to carry
   IP, but often with considerable complexity.  Subnetworks consisting
   of few nodes can simply open a permanent connection between each pair
   of nodes.  This is frequently done with ATM.  However, the number of
   connections increases as the square of the number of nodes, so this
   is clearly impractical for large subnetworks.  A "shim" layer between
   IP and the subnetwork is therefore required to manage connections.
   This is one of the most common functions of a Subnetwork Dependent
   Convergence Function (SNDCF) sublayer between IP and a subnetwork.
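
   The growth of a full mesh is easy to quantify.  The short sketch
   below counts the bidirectional connections needed to link every
   pair of nodes; the function name is an assumption made for this
   example.

```python
def full_mesh_connections(nodes):
    """Connections needed to link every pair of nodes once."""
    return nodes * (nodes - 1) // 2


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, full_mesh_connections(n))
```

   Ten nodes need 45 connections, 100 nodes need 4950, and 1000 nodes
   need 499500.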

   SNDCFs typically open subnetwork connections as needed when an IP
   packet is queued for transmission and close them after an idle
   timeout.  There is no relation between subnetwork connections and any
   connections that may exist at higher layers (e.g., TCP).

   Because Internet traffic is typically bursty and transaction-
   oriented, it is often difficult to pick an optimal idle timeout.  If
   the timeout is too short, subnetwork connections are opened and
   closed rapidly, possibly over-stressing the subnetwork connection
   management system (especially if it was designed for voice traffic
   call holding times).  If the timeout is too long, subnetwork
   connections are idle much of the time, wasting any resources
   dedicated to them by the subnetwork.
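
   The effect of the idle timeout on connection churn can be explored
   with a small model.  The class below is a hypothetical sketch of an
   SNDCF connection manager, not taken from any implementation: it
   only counts how many connection setups a given packet arrival
   pattern would cause, with times expressed in seconds.

```python
class OnDemandConnections:
    """Open a subnetwork connection when a packet is queued and close
    it after idle_timeout seconds without traffic."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self.last_used = {}  # destination -> time of last packet
        self.setups = 0      # connection setups performed so far

    def send(self, destination, now):
        self.expire(now)
        if destination not in self.last_used:
            self.setups += 1  # connection setup would be signaled here
        self.last_used[destination] = now

    def expire(self, now):
        for destination, last in list(self.last_used.items()):
            if now - last >= self.idle_timeout:
                del self.last_used[destination]  # teardown happens here


def count_setups(arrival_times, idle_timeout):
    """Connection setups caused by packets to a single destination."""
    manager = OnDemandConnections(idle_timeout)
    for t in arrival_times:
        manager.send("peer", t)
    return manager.setups


if __name__ == "__main__":
    arrivals = [0, 1, 2, 30, 31, 90]
    print(count_setups(arrivals, 5), count_setups(arrivals, 60))
```

   For this arrival pattern, a 5-second timeout causes three setups,
   while a 60-second timeout causes one but holds the connection idle
   for most of the interval.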

   Purely connectionless subnets (such as Ethernet), which have no state
   and dynamically share resources, are optimal for supporting best-
   effort IP, which is stateless and dynamically shares resources.
   Connection-oriented packet networks (such as ATM and Frame Relay),
   which have state and dynamically share resources, are less optimal,
   since best-effort IP does not benefit from the overhead of creating
   and maintaining state.  Connection-oriented circuit-switched networks
   (including the PSTN and ISDN) have state and statically allocate
   resources for a call, and thus require state creation and maintenance
   overhead, but do not benefit from the efficiencies of the statistical
   multiplexing of capacity inherent in IP.

   In any event, if an SNDCF that opens and closes subnet connections is
   used to support IP, care should be taken to make sure that connection
   processing in the subnet can keep up with relatively short holding
   times.

5.  Broadcasting and Discovery

   Subnetworks fall into two categories: point-to-point and shared.  A
   point-to-point subnet has exactly two endpoint components (hosts or
   routers); a shared link has more than two endpoint components, using
   either an inherently broadcast medium (e.g., Ethernet, radio) or a
   switching layer hidden from the network layer (e.g., switched
   Ethernet, Myrinet [MYR95], ATM).  Switched subnetworks handle
   broadcast by copying broadcast packets, providing each interface that
   supports one, or more, systems (hosts or routers) with a copy of each
   packet.

   Several Internet protocols for IPv4 make use of broadcast
   capabilities, including link-layer address lookup (ARP), auto-
   configuration (RARP, BOOTP, DHCP), and routing (RIP).

   A lack of broadcast capability can impede the performance of these
   protocols, or render them inoperable (e.g., DHCP).  ARP-like link
   address lookup can be provided by a centralized database, but at the
   expense of potentially higher response latency and the need for nodes
   to have explicit knowledge of the ARP server address.  Shared links
   should support native, link-layer subnet broadcast.

   A corresponding set of IPv6 protocols uses multicasting (see next
   section) instead of broadcasting to provide similar functions with
   improved scaling in large networks.

6.  Multicasting

   The Internet model includes "multicasting", where IP packets are sent
   to all the members of a multicast group [RFC1112] [RFC3376]
   [RFC2710].  Multicast is an option in IPv4, but a standard feature of
   IPv6.  IPv4 multicast is currently used by multimedia,
   teleconferencing, gaming, and file distribution (web, peer-to-peer
   sharing) applications, as well as by some key network and host
   protocols (e.g., RIPv2, OSPF, NTP).  IPv6 additionally relies on
   multicast for network configuration (DHCP-like autoconfiguration) and
   link-layer address discovery [RFC2461] (replacing ARP).  In the case
   of IPv6, this can allow autoconfiguration and address discovery to
   span across routers, whereas the IPv4 broadcast-based services cannot
   without ad-hoc router support [RFC1812].

   Multicast-enabled IP routers organize each multicast group into a
   spanning tree, and route multicast packets by making copies of each
   multicast packet and forwarding the copies to each output interface
   that includes at least one downstream member of the multicast group.

   Multicasting is considerably more efficient when a subnetwork
   explicitly supports it.  For example, a router relaying a multicast
   packet onto an Ethernet segment need send only one copy of the
   packet, no matter how many members of the multicast group are
   connected to the segment.  Without native multicast support, routers
   and switches on shared links would need to use broadcast with
   software filters, such that every multicast packet sent incurs
   software overhead for every node on the subnetwork, even if a node is
   not a member of the multicast group.  Alternatively, the router would
   transmit a separate copy to every member of the multicast group on
   the segment, as is done on multicast-incapable switched subnets.

   Subnetworks using shared channels (e.g., radio LANs, Ethernets) are
   especially suitable for native multicasting, and their designers
   should make every effort to support it.  This involves designating a
   section of the subnetwork's own address space for multicasting.  On
   these networks, multicast is basically broadcast on the medium, with
   Layer-2 receiver filters.

   Subnet interfaces also need to be designed to accept packets
   addressed to some number of multicast addresses, in addition to the
   unicast packets specifically addressed to them.  The number of
   multicast addresses that an interface needs to support depends on
   the requirements of the associated host; at least several dozen will
   meet most current needs.

   On low-speed networks, the multicast address recognition function may
   be readily implemented in host software, but on high-speed networks,
   it should be implemented in subnetwork hardware.  This hardware need
   not be complete; for example, many Ethernet interfaces implement a
   "hashing" function where the IP layer receives all of the multicast
   (and unicast) traffic to which the associated host subscribes, plus
   some small fraction of multicast traffic to which the host does not
   subscribe.  Host/router software then has to discard the unwanted
   packets that pass the Layer-2 multicast address filter [RFC1112].
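
   The division of labour between an imperfect hardware filter and
   exact software filtering can be sketched as follows.  The details
   are assumptions made for illustration: a 64-bucket table indexed by
   a CRC-32 of the address modulo the table size.  Real interfaces
   differ in table size and in which hash bits they use, and removal
   of addresses (which needs per-bucket reference counts) is omitted.

```python
import zlib


class HashMulticastFilter:
    """Imperfect hardware filter backed by an exact software check."""

    def __init__(self, buckets=64):
        self.buckets = buckets
        self.table = 0           # one bit per hash bucket ("hardware")
        self.subscribed = set()  # exact list kept by host software

    def _bucket(self, mac):
        return zlib.crc32(mac) % self.buckets

    def join(self, mac):
        self.subscribed.add(mac)
        self.table |= 1 << self._bucket(mac)

    def hardware_accepts(self, mac):
        # May also accept unsubscribed addresses sharing a bucket.
        return bool((self.table >> self._bucket(mac)) & 1)

    def software_accepts(self, mac):
        return self.hardware_accepts(mac) and mac in self.subscribed


if __name__ == "__main__":
    nic = HashMulticastFilter()
    nic.join(bytes.fromhex("01005e010101"))
    others = [bytes([0x01, 0x00, 0x5E, 0, 0, i]) for i in range(200)]
    leaked = sum(nic.hardware_accepts(mac) for mac in others)
    print("unsubscribed addresses passed to software:", leaked)
```

   Any unsubscribed address that happens to hash to an occupied bucket
   is delivered to the host and must be discarded by the software
   check.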

   There does not need to be a one-to-one mapping between a Layer-2
   multicast address and an IP multicast address.  An address overlap
   may significantly degrade the filtering capability of a receiver's
   hardware multicast address filter.  A subnetwork supporting only
   broadcast should use this service for multicast and must rely on
   software filtering.
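
   Ethernet provides a concrete example of a mapping that is not
   one-to-one.  The sketch below applies the IPv4-to-Ethernet mapping
   of [RFC1112], in which the low-order 23 bits of the group address
   are placed into the address block beginning 01:00:5e.  The function
   name is an assumption made for this example.

```python
import ipaddress


def ipv4_multicast_to_mac(group):
    """Map an IPv4 multicast group to its Ethernet multicast address."""
    address = ipaddress.IPv4Address(group)
    if not address.is_multicast:
        raise ValueError("not an IPv4 multicast address")
    low23 = int(address) & 0x7FFFFF
    mac = (0x01005E << 24) | low23
    return ":".join(f"{(mac >> shift) & 0xFF:02x}"
                    for shift in range(40, -8, -8))


if __name__ == "__main__":
    for group in ("224.1.1.1", "225.1.1.1", "224.129.1.1"):
        print(group, ipv4_multicast_to_mac(group))
```

   All three groups map to 01:00:5e:01:01:01.  Because only 23 of the
   28 significant group address bits are carried, 32 IPv4 groups share
   each Ethernet multicast address, and a hardware filter cannot tell
   them apart.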

   Switched subnetworks must also provide a mechanism for copying
   multicast packets to ensure the packets reach at least all members of
   a multicast group.  One option is to "flood" multicast packets in the
   same manner as broadcast.  This can lead to unnecessary transmissions
   on some subnetwork links (notably non-multicast-aware Ethernet
   switches).  Some subnetworks therefore allow multicast filter tables
   to control which links receive packets belonging to a specific group.
   To configure this automatically requires access to Layer-3 group
   membership information (e.g., IGMP [RFC3376], or MLD [RFC2710]).
   Various implementation options currently exist to provide a subnet
   node with a list of mappings of multicast addresses to
   ports/interfaces.  These employ a range of approaches, including
   signaling from end hosts (e.g., IEEE 802 GARP/GMRP [802.1p]),
   signaling from switches (e.g., CGMP [CGMP] and RGMP [RFC3488]),
   interception and proxy of IP group membership packets (e.g., IGMP/MLD
   Proxy [MAGMA-PROXY]), and enabling Layer-2 devices to
   snoop/inspect/peek into forwarded Layer-3 protocol headers (e.g.,
   IGMP, MLD, PIM) so that they may infer Layer-3 multicast group
   membership [MAGMA-SNOOP].  These approaches differ in their
   complexity, flexibility, and ability to support new protocols.
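
   Whichever mechanism populates it, the resulting multicast filter
   table behaves roughly as in the sketch below.  This is a
   hypothetical model rather than the behaviour of any particular
   switch: groups with no recorded state are flooded, groups with
   recorded state are copied only to member ports, and the handling of
   ports that lead to multicast routers is omitted.

```python
class MulticastFilterTable:
    """Per-group port lists for a Layer-2 switch (simplified)."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.members = {}  # group address -> set of member ports

    def report(self, group, port):
        """Record a membership report observed on a port."""
        self.members.setdefault(group, set()).add(port)

    def leave(self, group, port):
        """Record that a port no longer has members of the group."""
        self.members.get(group, set()).discard(port)

    def output_ports(self, group, ingress_port):
        """Ports that should receive a copy of a multicast frame."""
        targets = self.members.get(group, self.ports)  # unknown: flood
        return targets - {ingress_port}


if __name__ == "__main__":
    table = MulticastFilterTable([1, 2, 3, 4])
    print(table.output_ports("group-a", 1))  # flooded
    table.report("group-a", 3)
    print(table.output_ports("group-a", 1))  # member port only
```

   Before any membership is learned, the frame is flooded to ports 2,
   3, and 4; after a report is seen on port 3, only that port receives
   a copy.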

7.  Bandwidth on Demand (BoD) Subnets

   Some subnets allow a number of subnet nodes to share a channel
   efficiently by assigning transmission opportunities dynamically.
   Transmission opportunities are requested by a subnet node when it has
   packets to send.  The subnet schedules and grants transmission
   opportunities sufficient to allow the transmitting subnet node to
   send one or more packets (or packet fragments).  We call these
   subnets Bandwidth on Demand (BoD) subnets.  Examples of BoD subnets
   include Demand Assignment Multiple Access (DAMA) satellite and
   terrestrial wireless networks, IEEE 802.11 point coordination
   function (PCF) mode, and DOCSIS.  A connection-oriented network (such
   as the PSTN, ATM or Frame Relay) reserves resources on a much longer
   timescale, and is therefore not a BoD subnet in our taxonomy.

   The design parameters for BoD are similar to those in connection-
   oriented subnetworks, although the implementations may vary
   significantly.  In BoD, the user typically requests access to the
   shared channel for some duration.  Access may be allocated for a
   period of time at a specific rate, for a certain number of packets,
   or until the user releases the channel.  Access may be coordinated
   through a central management entity or with a distributed algorithm
   amongst the users.  Examples of the resource that may be shared
   include a terrestrial wireless hop, an upstream channel in a cable
   television system, a satellite uplink, and an end-to-end satellite
   channel.

   Long-delay BoD subnets pose problems similar to connection-oriented
   subnets in anticipating traffic.  While connection-oriented subnets
   hold idle channels open expecting new data to arrive, BoD subnets
   request channel access based on buffer occupancy (or expected buffer
   occupancy) on the sending port.  Poor performance will likely result
   if the sender does not anticipate additional traffic arriving at that
   port during the time it takes to grant a transmission request.  It is
   recommended that the algorithm have the capability to extend a hold
   on the channel for data that has arrived after the original request
   was generated (this may be done by piggybacking new requests on user
   data).
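
   One simple way to anticipate traffic is to size each request for
   the current backlog plus the data expected to arrive while the
   request is being granted.  The sketch below illustrates only this
   arithmetic; the function and its parameters are assumptions, and
   real BoD schedulers use a variety of more elaborate algorithms.

```python
def request_bytes(backlog_bytes, arrival_rate_bps, grant_delay_s,
                  anticipate=True):
    """Bytes of channel capacity to request from the scheduler."""
    expected = 0.0
    if anticipate:
        # Data expected to arrive before the grant takes effect.
        expected = arrival_rate_bps / 8 * grant_delay_s
    return backlog_bytes + int(expected)


if __name__ == "__main__":
    # 3000 bytes queued, 64 kb/s arriving, 0.5 s to obtain a grant.
    print(request_bytes(3000, 64_000, 0.5, anticipate=False))
    print(request_bytes(3000, 64_000, 0.5))
```

   In this example, a request based on buffer occupancy alone asks for
   3000 bytes, while an anticipatory request asks for 7000 bytes and
   so avoids a second request cycle for data that arrives in the
   meantime.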

   There is a wide variety of BoD protocols available.  However, there
   has been relatively little comprehensive research on the interactions
   between BoD mechanisms and Internet protocol performance.  Research
   on some specific mechanisms is available (e.g., [AR02]).  One item
   that has been studied is TCP's retransmission timer [KY02].  BoD
   systems can cause spurious timeouts when adjusting from a relatively
   high data rate to a relatively low data rate.  In this case, TCP's
   transmitted data takes longer to get through the network than
   predicted by the TCP sender's computed retransmission timeout.
   Therefore, the TCP sender is prone to resending a segment
   prematurely.
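
   The effect can be reproduced with the standard smoothed round-trip
   time estimator (gains of 1/8 and 1/4, a variance multiplier of 4,
   and a 1-second minimum timeout).  The sketch below is a simplified
   model under stated assumptions: clock granularity is ignored, the
   round-trip time is taken as a fixed base delay plus the time to
   serialize one segment, and the class and function names are
   hypothetical.

```python
class RtoEstimator:
    """Smoothed RTT estimator; rto is valid after the first sample."""

    ALPHA, BETA, K = 1 / 8, 1 / 4, 4
    MIN_RTO = 1.0  # seconds

    def __init__(self):
        self.srtt = None
        self.rttvar = None

    def sample(self, rtt):
        if self.srtt is None:
            self.srtt, self.rttvar = rtt, rtt / 2
        else:
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - rtt))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        return self.rto

    @property
    def rto(self):
        return max(self.MIN_RTO, self.srtt + self.K * self.rttvar)


def segment_rtt(segment_bytes, rate_bps, base_rtt_s):
    """Round-trip time: fixed delay plus serialization of one segment."""
    return segment_bytes * 8 / rate_bps + base_rtt_s


if __name__ == "__main__":
    estimator = RtoEstimator()
    for _ in range(20):
        estimator.sample(segment_rtt(1500, 1_000_000, 0.5))
    slow_rtt = segment_rtt(1500, 16_000, 0.5)
    print(estimator.rto, slow_rtt, slow_rtt > estimator.rto)
```

   After a steady period at 1 Mb/s with a 0.5-second base delay, the
   measured round-trip time is 0.512 seconds and the timeout settles
   at the 1-second minimum.  If the granted rate then drops to 16
   kb/s, the next segment takes 1.25 seconds, so the timer expires
   before the acknowledgment can arrive.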
