
RFC 3542

 
 
 

Advanced Sockets Application Program Interface (API) for IPv6

Part 2 of 3, p. 17 to 48

3.  IPv6 Raw Sockets

   Raw sockets bypass the transport layer (TCP or UDP).  With IPv4, raw
   sockets are used to access ICMPv4, IGMPv4, and to read and write IPv4
   datagrams containing a protocol field that the kernel does not
   process.  An example of the latter is a routing daemon for OSPF,
   since it uses IPv4 protocol field 89.  With IPv6, raw sockets will be
   used for ICMPv6 and to read and write IPv6 datagrams containing a
   Next Header field that the kernel does not process.  Examples of the
   latter are a routing daemon for OSPF for IPv6 and RSVP (protocol
   field 46).

   All data sent via raw sockets must be in network byte order and all
   data received via raw sockets will be in network byte order.  This
   differs from the IPv4 raw sockets, which did not specify a byte
   ordering and used the host's byte order for certain IP header fields.

   Another difference from IPv4 raw sockets is that complete packets
   (that is, IPv6 packets with extension headers) cannot be sent or
   received using the IPv6 raw sockets API.  Instead, ancillary data
   objects are used to transfer the extension headers and hoplimit
   information, as described in Section 6.  Should an application need
   access to the complete IPv6 packet, some other technique, such as the
   datalink interfaces BPF or DLPI, must be used.

   All fields except the flow label in the IPv6 header that an
   application might want to change (i.e., everything other than the
   version number) can be modified using ancillary data and/or socket
   options by the application for output.  All fields except the flow
   label in a received IPv6 header (other than the version number and
   Next Header fields) and all extension headers that an application
   might want to know are also made available to the application as
   ancillary data on input.  Hence there is no need for a socket option
   similar to the IPv4 IP_HDRINCL socket option, and on receipt the
   application will only receive the payload, i.e., the data after the
   IPv6 header and all the extension headers.

   This API does not define access to the flow label field, because
   today there is no standard usage of the field.

   When writing to a raw socket the kernel will automatically fragment
   the packet if its size exceeds the path MTU, inserting the required
   fragment headers.  On input the kernel reassembles received
   fragments, so the reader of a raw socket never sees any fragment
   headers.

   When we say "an ICMPv6 raw socket" we mean a socket created by
   calling the socket function with the three arguments AF_INET6,
   SOCK_RAW, and IPPROTO_ICMPV6.

   Most IPv4 implementations give special treatment to a raw socket
   created with a third argument to socket() of IPPROTO_RAW, whose value
   is normally 255, to have it mean that the application will send down
   complete packets including the IPv4 header.  (Note: This feature was
   added to IPv4 in 1988 by Van Jacobson to support traceroute, allowing
   a complete IP header to be passed by the application, before the
   IP_HDRINCL socket option was added.)  We note that IPPROTO_RAW has no
   special meaning to an IPv6 raw socket (and the IANA currently
   reserves the value of 255 when used as a next-header field).

3.1.  Checksums

   The kernel will calculate and insert the ICMPv6 checksum for ICMPv6
   raw sockets, since this checksum is mandatory.

   For other raw IPv6 sockets (that is, for raw IPv6 sockets created
   with a third argument other than IPPROTO_ICMPV6), the application
   must set the new IPV6_CHECKSUM socket option to have the kernel (1)
   compute and store a checksum for output, and (2) verify the received
   checksum on input, discarding the packet if the checksum is in error.
   This option prevents applications from having to perform source
   address selection on the packets they send.  The checksum will
   incorporate the IPv6 pseudo-header, defined in Section 8.1 of [RFC-
   2460].  This new socket option also specifies an integer offset into
   the user data of where the checksum is located.

      int  offset = 2;
      setsockopt(fd, IPPROTO_IPV6, IPV6_CHECKSUM, &offset,
                 sizeof(offset));

   By default, this socket option is disabled.  Setting the offset to -1
   also disables the option.  By disabled we mean (1) the kernel will
   not calculate and store a checksum for outgoing packets, and (2) the
   kernel will not verify a checksum for received packets.

   This option assumes the use of the 16-bit one's complement of the
   one's complement sum as the checksum algorithm and that the checksum
   field is aligned on a 16-bit boundary.  Thus, specifying a positive
   odd value as offset is invalid, and setsockopt() will fail for such
   offset values.
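
   The checksum algorithm that this option assumes can be sketched in a
   few lines.  The helper below is illustrative only: its name is not
   part of this API, and it sums a plain byte buffer, whereas the
   kernel also covers the IPv6 pseudo-header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of this API: 16-bit one's complement
 * of the one's complement sum of a buffer, in host byte order. */
uint16_t ones_complement_checksum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    /* Add up the buffer as big-endian 16-bit words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];

    /* A trailing odd byte is padded on the right with a zero byte. */
    if (i < len)
        sum += (uint32_t)data[i] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

   Because the sum is taken over 16-bit words, the checksum field
   itself has to sit on a 16-bit boundary, which is why odd offsets
   are rejected.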

   An attempt to set IPV6_CHECKSUM for an ICMPv6 socket will fail.
   Also, an attempt to set or get IPV6_CHECKSUM for a non-raw IPv6
   socket will fail.
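
   The rules above can be folded into a small wrapper.  This is a
   sketch under stated assumptions: the function name is hypothetical,
   and rejecting negative offsets other than -1 is a choice made here,
   not something the text specifies.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical wrapper around the IPV6_CHECKSUM socket option.
 * Positive odd offsets are rejected up front, since setsockopt() is
 * specified to fail for them; -1 disables the option.  Treating other
 * negative values as invalid is an assumption of this sketch. */
int set_checksum_offset(int fd, int offset)
{
    if (offset < -1 || (offset > 0 && (offset % 2) != 0)) {
        errno = EINVAL;
        return -1;
    }
    return setsockopt(fd, IPPROTO_IPV6, IPV6_CHECKSUM,
                      &offset, sizeof(offset));
}
```

   Calling set_checksum_offset(fd, -1) returns the socket to the
   default, disabled state.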

   (Note: Since the checksum is always calculated by the kernel for an
   ICMPv6 socket, applications are not able to generate ICMPv6 packets
   with incorrect checksums (presumably for testing purposes) using this
   API.)

3.2.  ICMPv6 Type Filtering

   ICMPv4 raw sockets receive most ICMPv4 messages received by the
   kernel.  (We say "most" and not "all" because Berkeley-derived
   kernels never pass echo requests, timestamp requests, or address mask
   requests to a raw socket.  Instead these three messages are processed
   entirely by the kernel.)  But ICMPv6 is a superset of ICMPv4, also
   including the functionality of IGMPv4 and ARPv4.  This means that an
   ICMPv6 raw socket can potentially receive many more messages than
   would be received with an ICMPv4 raw socket: ICMP messages similar to
   ICMPv4, along with neighbor solicitations, neighbor advertisements,
   and the three multicast listener discovery messages.

   Most applications using an ICMPv6 raw socket care about only a small
   subset of the ICMPv6 message types.  Transferring extraneous ICMPv6
   messages from the kernel to user space can incur significant
   overhead.  Therefore this API includes a method of filtering ICMPv6
   messages by the ICMPv6 type field.

   Each ICMPv6 raw socket has an associated filter whose datatype is
   defined as

      struct icmp6_filter;

   This structure, along with the macros and constants defined later in
   this section, is defined as a result of including
   <netinet/icmp6.h>.

   The current filter is fetched and stored using getsockopt() and
   setsockopt() with a level of IPPROTO_ICMPV6 and an option name of
   ICMP6_FILTER.

   Six macros operate on an icmp6_filter structure:

      void ICMP6_FILTER_SETPASSALL (struct icmp6_filter *);
      void ICMP6_FILTER_SETBLOCKALL(struct icmp6_filter *);

      void ICMP6_FILTER_SETPASS ( int, struct icmp6_filter *);
      void ICMP6_FILTER_SETBLOCK( int, struct icmp6_filter *);

      int  ICMP6_FILTER_WILLPASS (int,
                                  const struct icmp6_filter *);
      int  ICMP6_FILTER_WILLBLOCK(int,
                                  const struct icmp6_filter *);

   The first argument to the last four macros (an integer) is an ICMPv6
   message type, between 0 and 255.  The pointer argument to all six
   macros is a pointer to a filter that is modified by the first four
   macros and is examined by the last two macros.

   The first two macros, SETPASSALL and SETBLOCKALL, let us specify that
   all ICMPv6 messages are passed to the application or that all ICMPv6
   messages are blocked from being passed to the application.

   The next two macros, SETPASS and SETBLOCK, let us specify that
   messages of a given ICMPv6 type should be passed to the application
   or not passed to the application (blocked).

   The final two macros, WILLPASS and WILLBLOCK, return true or false
   depending on whether the specified message type is passed to the
   application or blocked from being passed to the application by the
   filter pointed to by the second argument.
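
   As a rough illustration of how the six macros combine, the sketch
   below builds a filter and then inspects it without looking inside
   the structure.  Both function names are hypothetical.

```c
#include <netinet/icmp6.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: count how many of the 256 ICMPv6 types a
 * filter lets through, using only the macros of this API. */
int icmp6_filter_count_passed(const struct icmp6_filter *filt)
{
    int type, passed = 0;

    for (type = 0; type <= 255; type++)
        if (ICMP6_FILTER_WILLPASS(type, filt))
            passed++;
    return passed;
}

/* Hypothetical helper: pass only neighbor solicitations and neighbor
 * advertisements, and block every other type. */
void icmp6_filter_neighbor_only(struct icmp6_filter *filt)
{
    ICMP6_FILTER_SETBLOCKALL(filt);
    ICMP6_FILTER_SETPASS(ND_NEIGHBOR_SOLICIT, filt);
    ICMP6_FILTER_SETPASS(ND_NEIGHBOR_ADVERT, filt);
}
```

   Going through WILLPASS rather than testing bits directly keeps the
   code independent of how a given system encodes the filter.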

   When an ICMPv6 raw socket is created, it will by default pass all
   ICMPv6 message types to the application.

   As an example, a program that wants to receive only router
   advertisements could execute the following:

      struct icmp6_filter  myfilt;

      fd = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);

      ICMP6_FILTER_SETBLOCKALL(&myfilt);
      ICMP6_FILTER_SETPASS(ND_ROUTER_ADVERT, &myfilt);
      setsockopt(fd, IPPROTO_ICMPV6, ICMP6_FILTER, &myfilt,
                 sizeof(myfilt));

   The filter structure is declared and then initialized to block all
   message types.  The filter structure is then changed to allow router
   advertisement messages to be passed to the application and the filter
   is installed using setsockopt().

   In order to clear an installed filter the application can issue a
   setsockopt for ICMP6_FILTER with a zero length.  When no such filter
   has been installed, getsockopt() will return the kernel default
   filter.

   The icmp6_filter structure is similar to the fd_set datatype used
   with the select() function in the sockets API.  The icmp6_filter
   structure is an opaque datatype and the application should not care
   how it is implemented.  All the application does with this datatype
   is allocate a variable of this type, pass a pointer to a variable of
   this type to getsockopt() and setsockopt(), and operate on a variable
   of this type using the six macros that we just defined.

   Nevertheless, it is worth showing a simple implementation of this
   datatype and the six macros.

      struct icmp6_filter {
        uint32_t  icmp6_filt[8];  /* 8*32 = 256 bits */
      };

      /* 1U keeps the shift unsigned when (type & 31) is 31 */
      #define ICMP6_FILTER_WILLPASS(type, filterp) \
        ((((filterp)->icmp6_filt[(type) >> 5]) & \
          (1U << ((type) & 31))) != 0)
      #define ICMP6_FILTER_WILLBLOCK(type, filterp) \
        ((((filterp)->icmp6_filt[(type) >> 5]) & \
          (1U << ((type) & 31))) == 0)
      #define ICMP6_FILTER_SETPASS(type, filterp) \
        ((((filterp)->icmp6_filt[(type) >> 5]) |= \
          (1U << ((type) & 31))))
      #define ICMP6_FILTER_SETBLOCK(type, filterp) \
        ((((filterp)->icmp6_filt[(type) >> 5]) &= \
          ~(1U << ((type) & 31))))
      #define ICMP6_FILTER_SETPASSALL(filterp) \
        memset((filterp), 0xFF, sizeof(struct icmp6_filter))
      #define ICMP6_FILTER_SETBLOCKALL(filterp) \
        memset((filterp), 0, sizeof(struct icmp6_filter))

   (Note: These sample definitions have two limitations that an
   implementation may want to change.  The first four macros evaluate
   their first argument two times.  The last two macros require the
   inclusion of the <string.h> header for the memset() function.)
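
   One way an implementation might avoid the double evaluation is to
   use inline functions instead of macros.  The sketch below uses a
   stand-in structure and hypothetical sketch_ names so that it does
   not clash with the real definitions; it is not how any particular
   system implements the filter.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for the opaque type; a real program would use
 * struct icmp6_filter from <netinet/icmp6.h>. */
struct sketch_icmp6_filter {
    uint32_t filt[8];            /* 8*32 = 256 bits, one per type */
};

/* Each function evaluates its `type` argument exactly once. */
static inline void sketch_setpass(unsigned int type,
                                  struct sketch_icmp6_filter *f)
{
    type &= 255;                 /* keep the array index in range */
    f->filt[type >> 5] |= UINT32_C(1) << (type & 31);
}

static inline void sketch_setblock(unsigned int type,
                                   struct sketch_icmp6_filter *f)
{
    type &= 255;
    f->filt[type >> 5] &= ~(UINT32_C(1) << (type & 31));
}

static inline int sketch_willpass(unsigned int type,
                                  const struct sketch_icmp6_filter *f)
{
    type &= 255;
    return (f->filt[type >> 5] & (UINT32_C(1) << (type & 31))) != 0;
}

static inline void sketch_setblockall(struct sketch_icmp6_filter *f)
{
    memset(f, 0, sizeof(*f));
}
```

   With these definitions an argument such as t++ is incremented once
   per call, which is not true of the macro versions.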

3.3.  ICMPv6 Verification of Received Packets

   The protocol stack will verify the ICMPv6 checksum and discard any
   packets with invalid checksums.

   An implementation might perform additional validity checks on the
   ICMPv6 message content and discard malformed packets.  However, a
   portable application must not assume that such validity checks have
   been performed.

   The protocol stack should not automatically discard packets if the
   ICMP type is unknown to the stack.  For extensibility reasons
   received ICMP packets with any type (informational or error) must be
   passed to the applications (subject to ICMP6_FILTER filtering on the
   type value and the checksum verification).

4.  Access to IPv6 and Extension Headers

   Applications need to be able to control IPv6 header and extension
   header content when sending as well as being able to receive the
   content of these headers.  This is done by defining socket option
   types which can be used both with setsockopt and with ancillary data.
   Ancillary data is discussed in Appendix A.  The following optional
   information can be exchanged between the application and the kernel:

   1. The send/receive interface and source/destination address,
   2. The hop limit,
   3. Next hop address,
   4. The traffic class,
   5. Routing header,
   6. Hop-by-Hop options header, and
   7. Destination options header.

   First, to receive any of this optional information (other than the
   next hop address, which can only be set) on a UDP or raw socket, the
   application must call setsockopt() to turn on the corresponding flag:

      int  on = 1;

      setsockopt(fd, IPPROTO_IPV6, IPV6_RECVPKTINFO,  &on, sizeof(on));
      setsockopt(fd, IPPROTO_IPV6, IPV6_RECVHOPLIMIT, &on, sizeof(on));
      setsockopt(fd, IPPROTO_IPV6, IPV6_RECVRTHDR,    &on, sizeof(on));
      setsockopt(fd, IPPROTO_IPV6, IPV6_RECVHOPOPTS,  &on, sizeof(on));
      setsockopt(fd, IPPROTO_IPV6, IPV6_RECVDSTOPTS,  &on, sizeof(on));
      setsockopt(fd, IPPROTO_IPV6, IPV6_RECVTCLASS,   &on, sizeof(on));

   When any of these options are enabled, the corresponding data is
   returned as control information by recvmsg(), as one or more
   ancillary data objects.
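
   An application that wants all of this information might enable the
   flags in a loop.  The helper name below is hypothetical, and
   stopping at the first failure is a design choice of this sketch
   rather than a requirement of the API.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on every IPV6_RECVxxx flag listed above.
 * Returns 0 on success, or -1 at the first setsockopt() failure, with
 * errno left as set by setsockopt(). */
int enable_all_recv_info(int fd)
{
    static const int opts[] = {
        IPV6_RECVPKTINFO, IPV6_RECVHOPLIMIT, IPV6_RECVRTHDR,
        IPV6_RECVHOPOPTS, IPV6_RECVDSTOPTS,  IPV6_RECVTCLASS
    };
    const int on = 1;
    size_t i;

    for (i = 0; i < sizeof(opts) / sizeof(opts[0]); i++)
        if (setsockopt(fd, IPPROTO_IPV6, opts[i], &on, sizeof(on)) < 0)
            return -1;
    return 0;
}
```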

   This document does not define how to receive the optional information
   on a TCP socket.  See Section 4.1 for more details.

   Two different mechanisms exist for sending this optional information:

   1. Using setsockopt to specify the option content for a socket.
      These are known as "sticky" options since they affect all
      transmitted packets on the socket until either a new setsockopt
      is done or the options are overridden using ancillary data.

   2. Using ancillary data to specify the option content for a single
      datagram.  This only applies to datagram and raw sockets; not to
      TCP sockets.

   The three socket option parameters and the three cmsghdr fields that
   describe the options/ancillary data objects are summarized as:

      opt level/    optname/          optval/
      cmsg_level    cmsg_type         cmsg_data[]
      ------------  ------------      ------------------------
      IPPROTO_IPV6  IPV6_PKTINFO      in6_pktinfo structure
      IPPROTO_IPV6  IPV6_HOPLIMIT     int
      IPPROTO_IPV6  IPV6_NEXTHOP      socket address structure
      IPPROTO_IPV6  IPV6_RTHDR        ip6_rthdr structure
      IPPROTO_IPV6  IPV6_HOPOPTS      ip6_hbh structure
      IPPROTO_IPV6  IPV6_DSTOPTS      ip6_dest structure
      IPPROTO_IPV6  IPV6_RTHDRDSTOPTS ip6_dest structure
      IPPROTO_IPV6  IPV6_TCLASS       int

      (Note: IPV6_HOPLIMIT can be used as ancillary data items only)

   All these options are described in detail in Sections 6, 7, 8, and 9.
   All the constants beginning with IPV6_ are defined as a result of
   including <netinet/in.h>.

   Note: We intentionally use the same constant for the cmsg_level
   member as is used as the second argument to getsockopt() and
   setsockopt() (what is called the "level"), and the same constant for
   the cmsg_type member as is used as the third argument to getsockopt()
   and setsockopt() (what is called the "option name").

   Issuing getsockopt() for the above options will return the sticky
   option value, i.e., the value set with setsockopt().  If no sticky
   option value has been set, getsockopt() will return the following
   values:

   -  For the IPV6_PKTINFO option, it will return an in6_pktinfo
      structure with ipi6_addr being in6addr_any and ipi6_ifindex being
      zero.

   -  For the IPV6_TCLASS option, it will return the kernel default
      value.

   -  For other options, it will indicate the lack of the option value
      with optlen being zero.

   The application does not explicitly need to access the data
   structures for the Routing header, Hop-by-Hop options header, and
   Destination options header, since the API to these features is
   through a set of inet6_rth_XXX() and inet6_opt_XXX() functions that
   we define in Section 7 and Section 10.  Those functions simplify the
   interface to these features instead of requiring the application to
   know the intimate details of the extension header formats.

   When specifying extension headers, this API assumes the header
   ordering and the number of occurrences of each header as described in
   [RFC-2460].  More details about the ordering issue will be discussed
   in Section 12.

4.1.  TCP Implications

   It is not possible to use ancillary data to transmit the above
   options for TCP since there is not a one-to-one mapping between send
   operations and the TCP segments being transmitted.  Instead an
   application can use setsockopt to specify them as sticky options.
   When the application uses setsockopt to specify the above options it
   is expected that TCP will start using the new information when
   sending segments.  However, TCP may or may not use the new
   information when retransmitting segments that were originally sent
   when the old sticky options were in effect.

   It is unclear how a TCP application can use received information
   (such as extension headers) due to the lack of mapping between
   received TCP segments and receive operations.  In particular, the
   received information could not be used for access control purposes
   as it can be on UDP and raw sockets.

   This specification therefore does not define how to get the received
   information on TCP sockets.  The result of the IPV6_RECVxxx options
   on a TCP socket is undefined as well.

4.2.  UDP and Raw Socket Implications

   The receive behavior for UDP and raw sockets is quite
   straightforward.  After the application has enabled an IPV6_RECVxxx
   socket option it will receive ancillary data items for every
   recvmsg() call containing the requested information.  However, if the
   information is not present in the packet the ancillary data item will
   not be included.  For example, if the application enables
   IPV6_RECVRTHDR and a received datagram does not contain a Routing
   header there will not be an IPV6_RTHDR ancillary data item.  Note
   that due to buffering in the socket implementation there might be
   some packets queued when an IPV6_RECVxxx option is enabled and they
   might not have the ancillary data information.

   For sending, the application has the choice between using sticky
   options and ancillary data.  The application can also use both, with
   the sticky options specifying the "default" and ancillary data
   overriding the default options.

   When an ancillary data item is specified in a call to sendmsg(), the
   item will override an existing sticky option of the same name (if
   previously specified).  For example, if the application has set
   IPV6_RTHDR using a sticky option and later passes IPV6_RTHDR as
   ancillary data this will override the IPV6_RTHDR sticky option and
   the routing header of the outgoing packet will be from the ancillary
   data item, not from the sticky option.  Note, however, that other
   sticky options than IPV6_RTHDR will not be affected by the IPV6_RTHDR
   ancillary data item; the overriding mechanism only works for the same
   type of sticky options and ancillary data items.
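
   The per-type overriding rule can be pictured with a toy model.  This
   is not kernel code: the enum, the structure, and the function name
   are all invented for illustration, and the model ignores the
   zero-length "disable" case.

```c
#include <stddef.h>

/* Toy model: each option type has its own independent slot. */
enum sketch_opt { SK_RTHDR, SK_HOPOPTS, SK_DSTOPTS, SK_TCLASS, SK_NOPTS };

struct sketch_optset {
    const void *val[SK_NOPTS];   /* NULL means "not specified" */
};

/* Pick the value used for one outgoing packet.  An ancillary data
 * item wins for its own type only; every other type falls back to the
 * sticky option, if there is one. */
const void *sketch_effective(const struct sketch_optset *sticky,
                             const struct sketch_optset *ancillary,
                             enum sketch_opt which)
{
    if (ancillary != NULL && ancillary->val[which] != NULL)
        return ancillary->val[which];
    return sticky->val[which];
}
```

   In this model, passing only a routing header as ancillary data
   leaves a sticky Hop-by-Hop options header in effect for the same
   packet.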

   (Note: the overriding rule is different from the one in RFC 2292.  In
   RFC 2292, an ancillary data item overrode all sticky options
   previously defined.  This was reasonable, because sticky options
   could only be specified as a set by a single socket option.  However,
   in this API, each option is separated so that it can be specified as
   a single sticky option.  Additionally, there are many more ancillary
   data items and sticky options than in RFC 2292, including
   ancillary-only ones.  Thus, it should be natural for application
   programmers to separate the overriding rule as well.)

   An application can also temporarily disable a particular sticky
   option by passing the corresponding ancillary data item with the
   value that would clear the sticky option if it were given to
   setsockopt().  For example, if the application has set IPV6_HOPOPTS
   as a sticky option and later passes IPV6_HOPOPTS with a zero length
   as an ancillary data item, the packet will not have a Hop-by-Hop
   options header.
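
   A zero-length ancillary data item is simply a cmsghdr with no data
   after it.  The following sketch shows one way to build it; the
   function name and the caller-supplied buffer convention are
   assumptions made for the example.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill `buf` with an IPV6_HOPOPTS item that
 * carries no data and attach it to `msg`.  `buf` must be suitably
 * aligned for struct cmsghdr.  Returns 0, or -1 if `buf` is smaller
 * than CMSG_SPACE(0) bytes. */
int attach_empty_hopopts(struct msghdr *msg, void *buf, size_t buflen)
{
    struct cmsghdr *cmsg;

    if (buflen < CMSG_SPACE(0))
        return -1;

    memset(buf, 0, CMSG_SPACE(0));
    msg->msg_control = buf;
    msg->msg_controllen = CMSG_SPACE(0);

    cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = IPPROTO_IPV6;
    cmsg->cmsg_type = IPV6_HOPOPTS;
    cmsg->cmsg_len = CMSG_LEN(0);    /* header only, no option data */
    return 0;
}
```

   The caller would then fill in the destination and data fields of
   the same msghdr and pass it to sendmsg().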

5.  Extensions to Socket Ancillary Data

   This specification uses ancillary data as defined in POSIX with some
   compatible extensions, which are described in the following
   subsections.  Section 20 will provide a detailed overview of
   ancillary data and related structures and macros, including the
   extensions.

5.1.  CMSG_NXTHDR

      struct cmsghdr *CMSG_NXTHDR(const struct msghdr *mhdr,
                                  const struct cmsghdr *cmsg);

   CMSG_NXTHDR() returns a pointer to the cmsghdr structure describing
   the next ancillary data object.  Mhdr is a pointer to a msghdr
   structure and cmsg is a pointer to a cmsghdr structure.  If there is
   not another ancillary data object, the return value is NULL.

   The following behavior of this macro is new to this API: if the value
   of the cmsg pointer is NULL, a pointer to the cmsghdr structure
   describing the first ancillary data object is returned.  That is,
   CMSG_NXTHDR(mhdr, NULL) is equivalent to CMSG_FIRSTHDR(mhdr).  If
   there are no ancillary data objects, the return value is NULL.
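
   A typical use of these macros is a loop over all ancillary data
   objects in a received message.  The helper below is a sketch with a
   hypothetical name.  It starts from CMSG_FIRSTHDR() instead of
   relying on the NULL shortcut, on the assumption that not every
   system's CMSG_NXTHDR() accepts a NULL cmsg argument.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: return the first ancillary data object with
 * the given level and type, or NULL if there is none. */
struct cmsghdr *find_cmsg(struct msghdr *msg, int level, int type)
{
    struct cmsghdr *cmsg;

    for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == level && cmsg->cmsg_type == type)
            return cmsg;
    }
    return NULL;
}
```

   For example, find_cmsg(&msg, IPPROTO_IPV6, IPV6_HOPLIMIT) would
   locate the received hop limit after a recvmsg() call, provided the
   matching receive option was enabled.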

5.2.  CMSG_SPACE

   socklen_t CMSG_SPACE(socklen_t length);

   This macro is new with this API.  Given the length of an ancillary
   data object, CMSG_SPACE() returns an upper bound on the space
   required by the object and its cmsghdr structure, including any
   padding needed to satisfy alignment requirements.  This macro can be
   used, for example, when allocating space dynamically for the
   ancillary data.  This macro should not be used to initialize the
   cmsg_len member of a cmsghdr structure; instead use the CMSG_LEN()
   macro.

5.3.  CMSG_LEN

   socklen_t CMSG_LEN(socklen_t length);

   This macro is new with this API.  Given the length of an ancillary
   data object, CMSG_LEN() returns the value to store in the cmsg_len
   member of the cmsghdr structure, taking into account any padding
   needed to satisfy alignment requirements.

   Note the difference between CMSG_SPACE() and CMSG_LEN(), shown also
   in the figure in Section 20.2: the former accounts for any required
   padding at the end of the ancillary data object and the latter is the
   actual length to store in the cmsg_len member of the ancillary data
   object.
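
   The division of labor between the two macros can be sketched as
   follows: CMSG_SPACE() sizes the buffer, and CMSG_LEN() fills in each
   header.  Both helper names are hypothetical.

```c
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: bytes of control buffer needed to hold `n`
 * ancillary data objects whose data lengths are given in `lens`. */
size_t control_buffer_size(const size_t *lens, size_t n)
{
    size_t i, total = 0;

    for (i = 0; i < n; i++)
        total += CMSG_SPACE(lens[i]);   /* header + data + padding */
    return total;
}

/* Hypothetical helper: the trailing padding that CMSG_SPACE() counts
 * but CMSG_LEN() does not, for one object of the given data length. */
size_t control_tail_padding(size_t len)
{
    return CMSG_SPACE(len) - CMSG_LEN(len);
}
```

   Storing a CMSG_SPACE() value in cmsg_len would claim the trailing
   padding as data, which is why the text directs applications to
   CMSG_LEN() for that member.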

6.  Packet Information

   There are five pieces of information that an application can specify
   for an outgoing packet using ancillary data:

      1.  the source IPv6 address,
      2.  the outgoing interface index,
      3.  the outgoing hop limit,
      4.  the next hop address, and
      5.  the outgoing traffic class value.

   Four similar pieces of information can be returned for a received
   packet as ancillary data:

      1.  the destination IPv6 address,
      2.  the arriving interface index,
      3.  the arriving hop limit, and
      4.  the arriving traffic class value.

   The first two pieces of information are contained in an in6_pktinfo
   structure that is set with setsockopt() or sent as ancillary data
   with sendmsg() and received as ancillary data with recvmsg().  This
   structure is defined as a result of including <netinet/in.h>.

      struct in6_pktinfo {
        struct in6_addr ipi6_addr;    /* src/dst IPv6 address */
        unsigned int    ipi6_ifindex; /* send/recv interface index */
      };

   In both the socket option and the cmsghdr, the level will be
   IPPROTO_IPV6, the type will be IPV6_PKTINFO, and the first byte of
   the option value and cmsg_data[] will be the first byte of the
   in6_pktinfo structure.  An application can clear any sticky
   IPV6_PKTINFO option by doing a "regular" setsockopt with ipi6_addr
   being in6addr_any and ipi6_ifindex being zero.

   This information is returned as ancillary data by recvmsg() only if
   the application has enabled the IPV6_RECVPKTINFO socket option:

      int  on = 1;
      setsockopt(fd, IPPROTO_IPV6, IPV6_RECVPKTINFO, &on, sizeof(on));

   (Note: The hop limit is not contained in the in6_pktinfo structure
   for the following reason.  Some UDP servers want to respond to client
   requests by sending their reply out the same interface on which the
   request was received and with the source IPv6 address of the reply
   equal to the destination IPv6 address of the request.  To do this the
   application can enable just the IPV6_RECVPKTINFO socket option and
   then use the received control information from recvmsg() as the
   outgoing control information for sendmsg().  The application need not
   examine or modify the in6_pktinfo structure at all.  But if the hop
   limit were contained in this structure, the application would have to
   parse the received control information and change the hop limit
   member, since the received hop limit is not the desired value for an
   outgoing packet.)
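
   The reply pattern described in the note might look like the sketch
   below.  The function name, the 1500-byte data buffer, and the
   256-byte control buffer are assumptions of the example; it also
   assumes that IPV6_RECVPKTINFO is the only IPV6_RECVxxx option
   enabled on the socket.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: receive one datagram on `fd` and send it back,
 * handing the received control information to sendmsg() unmodified.
 * Returns the number of bytes echoed, or -1 on error. */
ssize_t echo_with_pktinfo(int fd)
{
    char data[1500];              /* longer datagrams are truncated */
    union {                       /* union forces cmsghdr alignment */
        char buf[256];
        struct cmsghdr align;
    } control;
    struct sockaddr_in6 peer;
    struct iovec iov;
    struct msghdr msg;
    ssize_t n;

    memset(&msg, 0, sizeof(msg));
    iov.iov_base = data;
    iov.iov_len = sizeof(data);
    msg.msg_name = &peer;
    msg.msg_namelen = sizeof(peer);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return -1;
    if (msg.msg_flags & MSG_CTRUNC)   /* control data did not fit */
        return -1;

    /* recvmsg() updated msg_namelen and msg_controllen, so the same
     * msghdr can be reused; only the data length needs adjusting. */
    iov.iov_len = (size_t)n;
    msg.msg_flags = 0;
    return sendmsg(fd, &msg, 0);
}
```

   The in6_pktinfo structure is never examined here, which is the
   point of keeping the hop limit out of it.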

6.1.  Specifying/Receiving the Interface

   Interfaces on an IPv6 node are identified by a small positive
   integer, as described in Section 4 of [RFC-3493].  That document also
   describes a function to map an interface name to its interface index,
   a function to map an interface index to its interface name, and a
   function to return all the interface names and indexes.  Notice from
   this document that no interface is ever assigned an index of 0.

   When specifying the outgoing interface, if the ipi6_ifindex value is
   0, the kernel will choose the outgoing interface.

   The ordering among various options that can specify the outgoing
   interface, including IPV6_PKTINFO, is defined in Section 6.7.

   When the IPV6_RECVPKTINFO socket option is enabled, the received
   interface index is always returned as the ipi6_ifindex member of the
   in6_pktinfo structure.

6.2.  Specifying/Receiving Source/Destination Address

   The source IPv6 address can be specified by calling bind() before
   each output operation, but supplying the source address together with
   the data requires less overhead (i.e., fewer system calls) and
   requires less state to be stored and protected in a multithreaded
   application.

   When specifying the source IPv6 address as ancillary data, if the
   ipi6_addr member of the in6_pktinfo structure is the unspecified
   address (IN6ADDR_ANY_INIT or in6addr_any), then (a) if an address is
   currently bound to the socket, it is used as the source address, or
   (b) if no address is currently bound to the socket, the kernel will
   choose the source address.  If the ipi6_addr member is not the
   unspecified address, but the socket has already bound a source
   address, then the ipi6_addr value overrides the already-bound source
   address for this output operation only.
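
   The selection rule in the preceding paragraph can be written down
   as a small decision function.  This is a model for illustration,
   not kernel code, and the function name is hypothetical.

```c
#include <netinet/in.h>
#include <stddef.h>

/* Model of the rule above.  `requested` is the ipi6_addr value from
 * the ancillary data; `bound` is the address the socket is bound to,
 * or NULL if it is not bound.  A NULL result means the kernel chooses
 * the source address. */
const struct in6_addr *sketch_source_for_packet(
    const struct in6_addr *requested, const struct in6_addr *bound)
{
    if (!IN6_IS_ADDR_UNSPECIFIED(requested))
        return requested;        /* overrides, this packet only */
    if (bound != NULL && !IN6_IS_ADDR_UNSPECIFIED(bound))
        return bound;            /* case (a): bound address is used */
    return NULL;                 /* case (b): kernel picks */
}
```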

   The kernel must verify that the requested source address is indeed a
   unicast address assigned to the node.  When the address is a scoped
   one, there may be ambiguity about its scope zone.  This is
   particularly the case for link-local addresses.  In such a case, the
   kernel must first determine the appropriate scope zone based on the
   zone of the destination address or the outgoing interface (if known),
   then qualify the address.  This also means that it is not feasible to
   specify the source address for a non-binding socket by the
   IPV6_PKTINFO sticky option, unless the outgoing interface is also
   specified.  The application should simply use bind() for such
   purposes.

   IPV6_PKTINFO can also be used as a sticky option for specifying the
   socket's default source address.  However, the ipi6_addr member must
   be the unspecified address for TCP sockets, because it is not
   possible to dynamically change the source address of a TCP
   connection.  When the IPV6_PKTINFO option is specified for a TCP
   socket with a non-unspecified address, the call will fail.  This
   restriction should be applied even before the socket binds a specific
   address.

   When the in6_pktinfo structure is returned as ancillary data by
   recvmsg(), the ipi6_addr member contains the destination IPv6 address
   from the received packet.

6.3.  Specifying/Receiving the Hop Limit

   The outgoing hop limit is normally specified with either the
   IPV6_UNICAST_HOPS socket option or the IPV6_MULTICAST_HOPS socket
   option, both of which are described in [RFC-3493].  Specifying the
   hop limit as ancillary data lets the application override either the
   kernel's default or a previously specified value, for either a
   unicast destination or a multicast destination, for a single output
   operation.  Returning the received hop limit is useful for IPv6
   applications that need to verify that the received hop limit is 255
   (i.e., that the packet has not been forwarded).

   The received hop limit is returned as ancillary data by recvmsg()
   only if the application has enabled the IPV6_RECVHOPLIMIT socket
   option:

      int  on = 1;
      setsockopt(fd, IPPROTO_IPV6, IPV6_RECVHOPLIMIT, &on, sizeof(on));

   In the cmsghdr structure containing this ancillary data, the
   cmsg_level member will be IPPROTO_IPV6, the cmsg_type member will be
   IPV6_HOPLIMIT, and the first byte of cmsg_data[] will be the first
   byte of the integer hop limit.

   Nothing special need be done to specify the outgoing hop limit: just
   specify the control information as ancillary data for sendmsg().  As
   specified in [RFC-3493], the interpretation of the integer hop limit
   value is

      x < -1:        return an error of EINVAL
      x == -1:       use kernel default
      0 <= x <= 255: use x
      x >= 256:      return an error of EINVAL

   This API defines IPV6_HOPLIMIT as an ancillary-only option, that is,
   the option name cannot be used as a socket option.  This is because
   [RFC-3493] has more fine-grained socket options: IPV6_UNICAST_HOPS
   and IPV6_MULTICAST_HOPS.
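
   Putting the pieces of this section together, an application might
   build the hop limit ancillary data item as sketched below.  The
   function name and the caller-supplied buffer are assumptions; the
   range check simply mirrors the interpretation table above so that
   bad values are caught before sendmsg() is called.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: attach an IPV6_HOPLIMIT item to `msg`.  `buf`
 * must be suitably aligned and at least CMSG_SPACE(sizeof(int)) bytes
 * long.  A value of -1 asks for the kernel default.  Returns 0 on
 * success, or -1 with errno set to EINVAL. */
int attach_hoplimit(struct msghdr *msg, void *buf, size_t buflen,
                    int hoplimit)
{
    struct cmsghdr *cmsg;

    if (hoplimit < -1 || hoplimit > 255 ||
        buflen < CMSG_SPACE(sizeof(int))) {
        errno = EINVAL;
        return -1;
    }

    memset(buf, 0, CMSG_SPACE(sizeof(int)));
    msg->msg_control = buf;
    msg->msg_controllen = CMSG_SPACE(sizeof(int));

    cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = IPPROTO_IPV6;
    cmsg->cmsg_type = IPV6_HOPLIMIT;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &hoplimit, sizeof(hoplimit));
    return 0;
}
```

   The value is copied with memcpy() rather than stored through a cast
   pointer, so the code does not depend on the alignment of
   cmsg_data[].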

6.4.  Specifying the Next Hop Address

   The IPV6_NEXTHOP ancillary data object specifies the next hop for the
   datagram as a socket address structure.  In the cmsghdr structure
   containing this ancillary data, the cmsg_level member will be
   IPPROTO_IPV6, the cmsg_type member will be IPV6_NEXTHOP, and the
   first byte of cmsg_data[] will be the first byte of the socket
   address structure.

   This is a privileged option.  (Note: It is implementation defined and
   beyond the scope of this document to define what "privileged" means.
   Unix systems use this term to mean the process must have an effective
   user ID of 0.)

   This API only defines the case where the socket address contains an
   IPv6 address (i.e., the sa_family member is AF_INET6).  And, in this
   case, the node identified by that address must be a neighbor of the
   sending host.  If that address equals the destination IPv6 address of
   the datagram, then this is equivalent to the existing SO_DONTROUTE
   socket option.

   This option does not have any meaning for multicast destinations.  In
   such a case, the specified next hop will be ignored.

   When the outgoing interface is specified by IPV6_PKTINFO as well, the
   next hop specified by this option must be reachable via the specified
   interface.

   In order to clear a sticky IPV6_NEXTHOP option the application must
   issue a setsockopt for IPV6_NEXTHOP with a zero length.
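
   Setting and clearing the sticky next hop might be wrapped as in the
   sketch below.  The function name is hypothetical, the call is
   privileged, and whether a given kernel implements IPV6_NEXTHOP at
   all is platform dependent.  A link-local next hop would also need
   sin6_scope_id, which this sketch leaves at zero.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: set a sticky IPV6_NEXTHOP from a textual IPv6
 * address, or clear it when `addr` is NULL by passing a zero-length
 * option value.  Returns 0 on success, or -1 on a bad address or a
 * setsockopt() failure. */
int set_sticky_nexthop(int fd, const char *addr)
{
    struct sockaddr_in6 nh;

    if (addr == NULL)
        return setsockopt(fd, IPPROTO_IPV6, IPV6_NEXTHOP, NULL, 0);

    memset(&nh, 0, sizeof(nh));
    nh.sin6_family = AF_INET6;
    if (inet_pton(AF_INET6, addr, &nh.sin6_addr) != 1)
        return -1;
    return setsockopt(fd, IPPROTO_IPV6, IPV6_NEXTHOP, &nh, sizeof(nh));
}
```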

6.5.  Specifying/Receiving the Traffic Class value

   The outgoing traffic class is normally set to 0.  Specifying the
   traffic class as ancillary data lets the application override either
   the kernel's default or a previously specified value, for either a
   unicast destination or a multicast destination, for a single output
   operation.  Returning the received traffic class is useful for
   programs such as a diffserv debugging tool and for user level ECN
   (explicit congestion notification) implementation.

   The received traffic class is returned as ancillary data by recvmsg()
   only if the application has enabled the IPV6_RECVTCLASS socket
   option:

      int  on = 1;
      setsockopt(fd, IPPROTO_IPV6, IPV6_RECVTCLASS, &on, sizeof(on));

   In the cmsghdr structure containing this ancillary data, the
   cmsg_level member will be IPPROTO_IPV6, the cmsg_type member will be
   IPV6_TCLASS, and the first byte of cmsg_data[] will be the first byte
   of the integer traffic class.
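
   The following sketch shows one way an application might look for
   this object in the control data filled in by recvmsg().  The helper
   names are hypothetical.  The split of the traffic class into a
   6-bit DSCP and a 2-bit ECN field comes from the diffserv and ECN
   specifications, not from this document.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: scan the ancillary data attached to msg for an
 * IPV6_TCLASS object.  Returns 0 and stores the traffic class in
 * *tclass if one is present, or -1 otherwise.
 */
int find_tclass(struct msghdr *msg, int *tclass)
{
    struct cmsghdr *cm;

    for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
        if (cm->cmsg_level == IPPROTO_IPV6 &&
            cm->cmsg_type == IPV6_TCLASS &&
            cm->cmsg_len >= CMSG_LEN(sizeof(int))) {
            /* Copy rather than cast: the data may not be int-aligned. */
            memcpy(tclass, CMSG_DATA(cm), sizeof(int));
            return 0;
        }
    }
    return -1;
}

/* Upper 6 bits of the traffic class (DSCP). */
int tclass_dscp(int tclass)
{
    return (tclass >> 2) & 0x3f;
}

/* Lower 2 bits of the traffic class (ECN). */
int tclass_ecn(int tclass)
{
    return tclass & 0x03;
}
```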

   To specify the outgoing traffic class value, just specify the control
   information as ancillary data for sendmsg() or using setsockopt().
   Just like the hop limit value, the interpretation of the integer
   traffic class value is

      x < -1:        return an error of EINVAL
      x == -1:       use kernel default
      0 <= x <= 255: use x
      x >= 256:      return an error of EINVAL

   In order to clear a sticky IPV6_TCLASS option the application can
   specify -1 as the value.

   There are cases where the kernel needs to control the traffic class
   value itself, which can conflict with the user-specified value for
   outgoing traffic.  An example is an implementation of ECN in the
   kernel, which sets 2 bits of the traffic class value.  In such cases,
   the kernel should override the user-specified value.  On incoming
   traffic, the kernel may mask some of the bits in the traffic class
   field.

6.6.  Additional Errors with sendmsg() and setsockopt()

   With the IPV6_PKTINFO socket option there are no additional errors
   possible with the call to recvmsg().  But when specifying the
   outgoing interface or the source address, additional errors are
   possible from sendmsg() or setsockopt().  Note that some
   implementations might only be able to return this type of error for
   setsockopt().  The following are examples, but some of these may not
   be provided by some implementations, and some implementations may
   define additional errors:

   ENXIO         The interface specified by ipi6_ifindex does not exist.

   ENETDOWN      The interface specified by ipi6_ifindex is not enabled
                 for IPv6 use.

   EADDRNOTAVAIL ipi6_ifindex specifies an interface but the address
                 ipi6_addr is not available for use on that interface.

   EHOSTUNREACH  No route to the destination exists over the interface
                 specified by ipi6_ifindex.

6.7.  Summary of Outgoing Interface Selection

   This document and [RFC-3493] specify various methods that affect the
   selection of the packet's outgoing interface.  This subsection
   summarizes the ordering among those in order to ensure deterministic
   behavior.

   For a given outgoing packet on a given socket, the outgoing interface
   is determined in the following order:

   1. if an interface is specified in an IPV6_PKTINFO ancillary data
      item, the interface is used.

   2. otherwise, if an interface is specified in an IPV6_PKTINFO sticky
      option, the interface is used.

   3. otherwise, if the destination address is a multicast address and
      the IPV6_MULTICAST_IF socket option is specified for the socket,
      the interface is used.

   4. otherwise, if an IPV6_NEXTHOP ancillary data item is specified,
      the interface to the next hop is used.

   5. otherwise, if an IPV6_NEXTHOP sticky option is specified, the
      interface to the next hop is used.

   6. otherwise, the outgoing interface should be determined in an
      implementation dependent manner.

   In particular, the ordering above means that if the application
   specifies an interface by the IPV6_MULTICAST_IF socket option
   (described in [RFC-3493]) as well as a different interface by the
   IPV6_PKTINFO sticky option, the latter will override the former for
   every multicast packet on the corresponding socket.  The reason for
   this ordering is the expectation that the source address is specified
   as well and that the pair of the address and the outgoing interface
   should be preferred.

   In any case, the kernel must also verify that the source and
   destination addresses do not break their scope zones with regard to
   the outgoing interface.
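
   The ordering in this subsection can be sketched as a simple
   precedence chain.  The structure and function below are
   hypothetical and only model the decision; an interface index of 0
   is assumed to mean "not specified", and the scope zone check
   mentioned above is not modeled.

```c
#include <stdbool.h>

/* Hypothetical summary of the state that applies to one outgoing packet. */
struct out_if_inputs {
    unsigned int pktinfo_ancillary;    /* from IPV6_PKTINFO ancillary data */
    unsigned int pktinfo_sticky;       /* from IPV6_PKTINFO sticky option */
    bool         dst_is_multicast;     /* destination is a multicast address */
    unsigned int multicast_if;         /* from IPV6_MULTICAST_IF */
    unsigned int nexthop_ancillary_if; /* interface to ancillary IPV6_NEXTHOP */
    unsigned int nexthop_sticky_if;    /* interface to sticky IPV6_NEXTHOP */
};

/*
 * Returns the selected interface index, or 0 when the choice is left
 * to the implementation (rule 6).
 */
unsigned int select_out_if(const struct out_if_inputs *in)
{
    if (in->pktinfo_ancillary != 0)
        return in->pktinfo_ancillary;               /* rule 1 */
    if (in->pktinfo_sticky != 0)
        return in->pktinfo_sticky;                  /* rule 2 */
    if (in->dst_is_multicast && in->multicast_if != 0)
        return in->multicast_if;                    /* rule 3 */
    if (in->nexthop_ancillary_if != 0)
        return in->nexthop_ancillary_if;            /* rule 4 */
    if (in->nexthop_sticky_if != 0)
        return in->nexthop_sticky_if;               /* rule 5 */
    return 0;                                       /* rule 6 */
}
```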

7.  Routing Header Option

   Source routing in IPv6 is accomplished by specifying a Routing header
   as an extension header.  There can be different types of Routing
   headers, but IPv6 currently defines only the Type 0 Routing header
   [RFC-2460].  This type supports up to 127 intermediate nodes (limited
   by the length field in the extension header).  With this maximum
   number of intermediate nodes, a source, and a destination, there are
   128 hops.

   Source routing with the IPv4 sockets API (the IP_OPTIONS socket
   option) requires the application to build the source route in the
   format that appears as the IPv4 header option, requiring intimate
   knowledge of the IPv4 options format.  This IPv6 API, however,
   defines six functions that the application calls to build and examine
   a Routing header, and the ability to use sticky options or ancillary
   data to communicate this information between the application and the
   kernel using the IPV6_RTHDR option.

   Three functions build a Routing header:

      inet6_rth_space()    - return #bytes required for Routing header
      inet6_rth_init()     - initialize buffer data for Routing header
      inet6_rth_add()      - add one IPv6 address to the Routing header

   Three functions deal with a returned Routing header:

      inet6_rth_reverse()  - reverse a Routing header
      inet6_rth_segments() - return #segments in a Routing header
      inet6_rth_getaddr()  - fetch one address from a Routing header

   The function prototypes for these functions are defined as a result
   of including <netinet/in.h>.

   To receive a Routing header the application must enable the
   IPV6_RECVRTHDR socket option:

      int  on = 1;
      setsockopt(fd, IPPROTO_IPV6, IPV6_RECVRTHDR, &on, sizeof(on));

   Each received Routing header is returned as one ancillary data object
   described by a cmsghdr structure with cmsg_type set to IPV6_RTHDR.
   When multiple Routing headers are received, multiple ancillary data
   objects (with cmsg_type set to IPV6_RTHDR) will be returned to the
   application.

   To send a Routing header the application specifies it either as
   ancillary data in a call to sendmsg() or using setsockopt().  For the
   sending side, this API assumes the number of occurrences of the
   Routing header as described in [RFC-2460].  That is, applications can
   only specify at most one outgoing Routing header.

   The application can remove any sticky Routing header by calling
   setsockopt() for IPV6_RTHDR with a zero option length.

   When using ancillary data a Routing header is passed between the
   application and the kernel as follows: The cmsg_level member has a
   value of IPPROTO_IPV6 and the cmsg_type member has a value of
   IPV6_RTHDR.  The contents of the cmsg_data[] member is implementation
   dependent and should not be accessed directly by the application, but
   should be accessed using the six functions that we are about to
   describe.

   The following constant is defined as a result of including the
   <netinet/in.h>:

      #define IPV6_RTHDR_TYPE_0    0 /* IPv6 Routing header type 0 */

   When a Routing header is specified, the destination address specified
   for connect(), sendto(), or sendmsg() is the final destination
   address of the datagram.  The Routing header then contains the
   addresses of all the intermediate nodes.

7.1.  inet6_rth_space

      socklen_t inet6_rth_space(int type, int segments);

   This function returns the number of bytes required to hold a Routing
   header of the specified type containing the specified number of
   segments (addresses).  For an IPv6 Type 0 Routing header, the number
   of segments must be between 0 and 127, inclusive.  The return value
   is just the space for the Routing header.  When the application uses
   ancillary data it must pass the returned length to CMSG_SPACE() to
   determine how much memory is needed for the ancillary data object
   (including the cmsghdr structure).

   If the return value is 0, then either the type of the Routing header
   is not supported by this implementation or the number of segments is
   invalid for this type of Routing header.
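
   As an illustration of the size involved, the sketch below computes
   the length of a Type 0 Routing header from its layout in [RFC-2460]
   (an 8-byte fixed part followed by one 16-byte address per segment).
   The name rth0_space() is hypothetical, and this is only what an
   implementation might plausibly return; applications should call
   inet6_rth_space() rather than assume this value.

```c
#include <stddef.h>

#define RTH0_FIXED_LEN    8    /* fixed part preceding the addresses */
#define RTH0_ADDR_LEN     16   /* one IPv6 address per segment */
#define RTH0_MAX_SEGMENTS 127  /* limit stated for the Type 0 header */

/*
 * Hypothetical helper: bytes needed for a Type 0 Routing header with
 * the given number of segments, or 0 if the count is invalid.
 */
size_t rth0_space(int segments)
{
    if (segments < 0 || segments > RTH0_MAX_SEGMENTS)
        return 0;
    return RTH0_FIXED_LEN + (size_t)segments * RTH0_ADDR_LEN;
}
```

   With two intermediate nodes this gives 40 bytes, which the
   application would then pass to CMSG_SPACE() when sizing a control
   buffer.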

   (Note: This function returns the size but does not allocate the space
   required for the ancillary data.  This allows an application to
   allocate a larger buffer, if other ancillary data objects are
   desired, since all the ancillary data objects must be specified to
   sendmsg() as a single msg_control buffer.)

7.2.  inet6_rth_init

      void *inet6_rth_init(void *bp, socklen_t bp_len, int type,
                           int segments);

   This function initializes the buffer pointed to by bp to contain a
   Routing header of the specified type and sets ip6r_len based on the
   segments parameter.  bp_len is only used to verify that the buffer is
   large enough.  The ip6r_segleft field is set to zero; inet6_rth_add()
   will increment it.

   When the application uses ancillary data the application must
   initialize any cmsghdr fields.

   The caller must allocate the buffer and its size can be determined by
   calling inet6_rth_space().

   Upon success the return value is the pointer to the buffer (bp), and
   this is then used as the first argument to the inet6_rth_add()
   function.  Upon an error the return value is NULL.

7.3.  inet6_rth_add

      int inet6_rth_add(void *bp, const struct in6_addr *addr);

   This function adds the IPv6 address pointed to by addr to the end of
   the Routing header being constructed.

   If successful, the segleft member of the Routing Header is updated to
   account for the new address in the Routing header and the return
   value of the function is 0.  Upon an error the return value of the
   function is -1.

7.4.  inet6_rth_reverse

      int inet6_rth_reverse(const void *in, void *out);

   This function takes a Routing header extension header (pointed to by
   the first argument) and writes a new Routing header that sends
   datagrams along the reverse of that route.  The function reverses the
   order of the addresses and sets the segleft member in the new Routing
   header to the number of segments.  Both arguments are allowed to
   point to the same buffer (that is, the reversal can occur in place).

   The return value of the function is 0 on success, or -1 upon an
   error.
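
   The transformation can be sketched on the wire format of a Type 0
   Routing header from [RFC-2460].  This is not the library
   implementation: rth0_reverse() is a hypothetical name, the byte
   offsets are taken from the [RFC-2460] layout, and since the
   contents of cmsg_data[] are implementation dependent, applications
   should call inet6_rth_reverse() instead of code like this.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch: reverse a Type 0 Routing header laid out as in
 * RFC 2460 (byte 1 = Hdr Ext Len, byte 2 = Routing Type, byte 3 =
 * Segments Left, addresses from byte 8).  in and out may be the same
 * buffer.  Returns 0 on success, -1 on error.
 */
int rth0_reverse(const void *in, void *out)
{
    const uint8_t *src = in;
    uint8_t *dst = out;
    uint8_t a[16], b[16];
    int segments, i, j;

    if (src[2] != 0 || (src[1] % 2) != 0)
        return -1;              /* not Type 0, or not whole addresses */
    segments = src[1] / 2;      /* Hdr Ext Len is in 8-byte units */

    memmove(dst, src, 8);       /* copy the fixed part */
    for (i = 0; i < (segments + 1) / 2; i++) {
        j = segments - 1 - i;
        /* Read both addresses before writing so in-place use is safe. */
        memcpy(a, src + 8 + 16 * i, 16);
        memcpy(b, src + 8 + 16 * j, 16);
        memcpy(dst + 8 + 16 * i, b, 16);
        memcpy(dst + 8 + 16 * j, a, 16);
    }
    dst[3] = (uint8_t)segments; /* all segments remain to be visited */
    return 0;
}
```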

7.5.  inet6_rth_segments

      int inet6_rth_segments(const void *bp);

   This function returns the number of segments (addresses) contained in
   the Routing header described by bp.  On success the return value is
   zero or greater.  The return value of the function is -1 upon an
   error.

7.6.  inet6_rth_getaddr

      struct in6_addr *inet6_rth_getaddr(const void *bp, int index);

   This function returns a pointer to the IPv6 address specified by
   index (which must have a value between 0 and one less than the value
   returned by inet6_rth_segments()) in the Routing header described by
   bp.  An application should first call inet6_rth_segments() to obtain
   the number of segments in the Routing header.

   Upon an error the return value of the function is NULL.

8.  Hop-By-Hop Options

   A variable number of Hop-by-Hop options can appear in a single Hop-
   by-Hop options header.  Each option in the header is TLV-encoded with
   a type, length, and value.  This IPv6 API defines seven functions
   that the application calls to build and examine a Hop-by-Hop options
   header, and the ability to use sticky options or ancillary data to
   communicate this information between the application and the kernel.
   This uses the IPV6_HOPOPTS option for a Hop-by-Hop options header.

   Today several Hop-by-Hop options are defined for IPv6.  Two pad
   options, Pad1 and PadN, are for alignment purposes and are
   automatically inserted by the inet6_opt_XXX() routines and ignored by
   the inet6_opt_XXX() routines on the receive side.  This section of
   the API is therefore defined for other (and future) Hop-by-Hop
   options that an application may need to specify and receive.

   Four functions build an options header:

      inet6_opt_init()     - initialize buffer data for options header
      inet6_opt_append()   - add one TLV option to the options header
      inet6_opt_finish()   - finish adding TLV options to the options
                             header
      inet6_opt_set_val()  - add one component of the option content to
                             the option

   Three functions deal with a returned options header:

      inet6_opt_next()     - extract the next option from the options
                             header
      inet6_opt_find()     - extract an option of a specified type from
                             the header
      inet6_opt_get_val()  - retrieve one component of the option
                             content

   Individual Hop-by-Hop options (and Destination options, which are
   described in Section 9 and are very similar to the Hop-by-Hop
   options) may have specific alignment requirements.  For example, the
   4-byte Jumbo Payload length should appear on a 4-byte boundary, and
   IPv6 addresses are normally aligned on an 8-byte boundary.  These
   requirements and the terminology used with these options are
   discussed in Section 4.2 and Appendix B of [RFC-2460].  The alignment
   of the first byte of each option is specified by two values, called x
   and y, written as "xn + y".  This states that the option must appear
   at an integer multiple of x bytes from the beginning of the options
   header (x can have the values 1, 2, 4, or 8), plus y bytes (y can
   have a value between 0 and 7, inclusive).  The Pad1 and PadN options
   are inserted as needed to maintain the required alignment.  The
   functions below need to know the alignment of the end of the option
   (which is always in the form "xn," where x can have the values 1, 2,
   4, or 8) and the total size of the data portion of the option.  These
   are passed as the "align" and "len" arguments to inet6_opt_append().
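
   As a worked illustration of the "xn + y" notation, the sketch below
   computes how many pad bytes are needed before an option so that it
   starts at a position of the form x*n + y.  The helper name is
   hypothetical; the inet6_opt_XXX() routines perform the padding
   themselves, so applications do not need this calculation.

```c
/*
 * Hypothetical helper: number of pad bytes to insert at offset (bytes
 * from the start of the options header) so that the next option begins
 * at a position of the form x*n + y.
 */
int opt_start_padding(int offset, int x, int y)
{
    int rem = (offset - y) % x;

    if (rem < 0)
        rem += x;           /* C's % may yield a negative remainder */
    return (x - rem) % x;
}
```

   For example, the Jumbo Payload option has alignment 4n + 2, so it
   needs no padding when it directly follows the 2-byte start of the
   options header.  An option with alignment 4n + 3 placed at offset 16
   needs 3 pad bytes and starts at offset 19.  In [RFC-2460], a 1-byte
   gap is filled with Pad1 and a longer gap with a single PadN option.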

   Multiple Hop-by-Hop options must be specified by the application by
   placing them in a single extension header.

   Finally, we note that use of some Hop-by-Hop options or some
   Destination options, might require special privilege.  That is,
   normal applications (without special privilege) might be forbidden
   from setting certain options in outgoing packets, and might never see
   certain options in received packets.

8.1.  Receiving Hop-by-Hop Options

   To receive a Hop-by-Hop options header the application must enable
   the IPV6_RECVHOPOPTS socket option:

      int  on = 1;
      setsockopt(fd, IPPROTO_IPV6, IPV6_RECVHOPOPTS, &on, sizeof(on));

   When using ancillary data a Hop-by-Hop options header is passed
   between the application and the kernel as follows: The cmsg_level
   member will be IPPROTO_IPV6 and the cmsg_type member will be
   IPV6_HOPOPTS.  These options are then processed by calling the
   inet6_opt_next(), inet6_opt_find(), and inet6_opt_get_val()
   functions, described in Section 10.

8.2.  Sending Hop-by-Hop Options

   To send a Hop-by-Hop options header, the application specifies the
   header either as ancillary data in a call to sendmsg() or using
   setsockopt().

   The application can remove any sticky Hop-by-Hop options header by
   calling setsockopt() for IPV6_HOPOPTS with a zero option length.

   All the Hop-by-Hop options must be specified by a single ancillary
   data object.  The cmsg_level member is set to IPPROTO_IPV6 and the
   cmsg_type member is set to IPV6_HOPOPTS.  The option is normally
   constructed using the inet6_opt_init(), inet6_opt_append(),
   inet6_opt_finish(), and inet6_opt_set_val() functions, described in
   Section 10.

   Additional errors may be possible from sendmsg() and setsockopt() if
   the specified option is in error.

9.  Destination Options

   A variable number of Destination options can appear in one or more
   Destination options headers.  As defined in [RFC-2460], a Destination
   options header appearing before a Routing header is processed by the
   first destination plus any subsequent destinations specified in the
   Routing header, while a Destination options header that is not
   followed by a Routing header is processed only by the final
   destination.  As with the Hop-by-Hop options, each option in a
   Destination options header is TLV-encoded with a type, length, and
   value.

9.1.  Receiving Destination Options

   To receive a Destination options header the application must enable
   the IPV6_RECVDSTOPTS socket option:

      int  on = 1;
      setsockopt(fd, IPPROTO_IPV6, IPV6_RECVDSTOPTS, &on, sizeof(on));

   Each Destination options header is returned as one ancillary data
   object described by a cmsghdr structure with cmsg_level set to
   IPPROTO_IPV6 and cmsg_type set to IPV6_DSTOPTS.

   These options are then processed by calling the inet6_opt_next(),
   inet6_opt_find(), and inet6_opt_get_val() functions.

9.2.  Sending Destination Options

   To send a Destination options header, the application specifies it
   either as ancillary data in a call to sendmsg() or using
   setsockopt().

   The application can remove any sticky Destination options header by
   calling setsockopt() for IPV6_RTHDRDSTOPTS/IPV6_DSTOPTS with a zero
   option length.

   This API assumes the ordering of extension headers as described in
   [RFC-2460].  Thus, one set of Destination options can only appear
   before a Routing header, and one set can only appear after a Routing
   header (or in a packet with no Routing header).  Each set can consist
   of one or more options but each set is a single extension header.

   Today all destination options that an application may want to specify
   can be put after (or without) a Routing header.  Thus, applications
   should usually need IPV6_DSTOPTS only and should avoid using
   IPV6_RTHDRDSTOPTS whenever possible.

   When using ancillary data a Destination options header is passed
   between the application and the kernel as follows: The set preceding
   a Routing header is specified with the cmsg_level member set to
   IPPROTO_IPV6 and the cmsg_type member set to IPV6_RTHDRDSTOPTS.  Any
   setsockopt or ancillary data for IPV6_RTHDRDSTOPTS is silently
   ignored when sending packets unless a Routing header is also
   specified.  Note that the "Routing header" here means the one
   specified by this API.  Even when the kernel inserts a routing header
   in its internal routine (e.g., in a mobile IPv6 stack), the
   Destination options header specified by IPV6_RTHDRDSTOPTS will still
   be ignored unless the application explicitly specifies its own
   Routing header.

   The set of Destination options after a Routing header, which is also
   used when no Routing header is present, is specified with the
   cmsg_level member set to IPPROTO_IPV6 and the cmsg_type member set to
   IPV6_DSTOPTS.

   The Destination options are normally constructed using the
   inet6_opt_init(), inet6_opt_append(), inet6_opt_finish(), and
   inet6_opt_set_val() functions, described in Section 10.

   Additional errors may be possible from sendmsg() and setsockopt() if
   the specified option is in error.

10.  Hop-by-Hop and Destination Options Processing

   Building and parsing the Hop-by-Hop and Destination options is
   complicated for the reasons given earlier.  We therefore define a set
   of functions to help the application.  These functions assume the
   formatting rules specified in Appendix B of [RFC-2460], i.e., that
   the largest field is placed last in the option.

   The function prototypes for these functions are defined as a result
   of including <netinet/in.h>.

   The first 3 functions (init, append, and finish) are used both to
   calculate the needed buffer size for the options, and to actually
   encode the options once the application has allocated a buffer for
   the header.  In order to only calculate the size the application must
   pass a NULL extbuf and a zero extlen to those functions.

10.1.  inet6_opt_init

      int inet6_opt_init(void *extbuf, socklen_t extlen);

   This function returns the number of bytes needed for the empty
   extension header, i.e., without any options.  If extbuf is not NULL
   it also initializes the extension header to have the correct length
   field.  In that case if the extlen value is not a positive (i.e.,
   non-zero) multiple of 8 the function fails and returns -1.

   (Note: since the return value on success is based on a "constant"
   parameter, i.e., the empty extension header, an implementation may
   return a constant value.  However, this specification does not
   require the value be constant, and leaves it as implementation
   dependent.  The application should not assume a particular constant
   value as a successful return value of this function.)

10.2.  inet6_opt_append

      int inet6_opt_append(void *extbuf, socklen_t extlen, int offset,
                           uint8_t type, socklen_t len, uint_t align,
                           void **databufp);

   Offset should be the length returned by inet6_opt_init() or a
   previous inet6_opt_append().  This function returns the updated total
   length taking into account adding an option with length 'len' and
   alignment 'align'.  If extbuf is not NULL then, in addition to
   returning the length, the function inserts any needed pad option,
   initializes the option (setting the type and length fields) and
   returns a pointer to the location for the option content in databufp.
   If the option does not fit in the extension header buffer the
   function returns -1.

   Type is the 8-bit option type.  Len is the length of the option data
   (i.e., excluding the option type and option length fields).

   Once inet6_opt_append() has been called the application can use the
   databuf directly, or use inet6_opt_set_val() to specify the content
   of the option.

   The option type must have a value from 2 to 255, inclusive.  (0 and 1
   are reserved for the Pad1 and PadN options, respectively.)

   The option data length must have a value between 0 and 255,
   inclusive, and is the length of the option data that follows.

   The align parameter must have a value of 1, 2, 4, or 8.  The align
   value cannot exceed the value of len.

10.3.  inet6_opt_finish

      int inet6_opt_finish(void *extbuf, socklen_t extlen, int offset);

   Offset should be the length returned by inet6_opt_init() or
   inet6_opt_append().  This function returns the updated total length
   taking into account the final padding of the extension header to make
   it a multiple of 8 bytes.  If extbuf is not NULL the function also
   initializes the option by inserting a Pad1 or PadN option of the
   proper length.

   If the necessary pad does not fit in the extension header buffer the
   function returns -1.
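
   The size-calculation pass (NULL extbuf, zero extlen) can be modeled
   as follows.  This sketch follows the description in this document,
   where align is the alignment of the end of the option, and assumes
   an empty header of 2 bytes as in the [RFC-2460] layout.  The
   function names are hypothetical and real implementations may
   compute padding differently, so applications should rely only on
   the lengths returned by the inet6_opt_XXX() functions.

```c
/* Assumed size of the empty header: Next Header and Hdr Ext Len. */
int opt_init_len(void)
{
    return 2;
}

/*
 * Hypothetical model of the length returned by inet6_opt_append():
 * pad so that the option (2 header bytes plus len data bytes) ends on
 * a multiple of align.  Returns -1 for arguments this document
 * disallows.
 */
int opt_append_len(int offset, int len, int align)
{
    int end, pad;

    if (len < 0 || len > 255)
        return -1;
    if (align != 1 && align != 2 && align != 4 && align != 8)
        return -1;
    if (align > len)
        return -1;              /* align cannot exceed len */

    end = offset + 2 + len;     /* end position if no pad were added */
    pad = (align - end % align) % align;
    return end + pad;
}

/* Hypothetical model of inet6_opt_finish(): round up to 8 bytes. */
int opt_finish_len(int offset)
{
    return (offset + 7) / 8 * 8;
}
```

   Under these assumptions, appending a 12-byte option with align 8
   and then a 7-byte option with align 4 gives lengths 16 and 28, and
   the finished header occupies 32 bytes.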

10.4.  inet6_opt_set_val

      int inet6_opt_set_val(void *databuf, int offset, void *val,
                            socklen_t vallen);

   Databuf should be a pointer returned by inet6_opt_append().  This
   function inserts data items of various sizes in the data portion of
   the option.  Val should point to the data to be inserted.  Offset
   specifies where in the data portion of the option the value should be
   inserted; the first byte after the option type and length is accessed
   by specifying an offset of zero.

   The caller should ensure that each field is aligned on its natural
   boundaries as described in Appendix B of [RFC-2460], but the function
   must not rely on the caller's behavior.  Even when the alignment
   requirement is not satisfied, inet6_opt_set_val should just copy the
   data as required.

   The function returns the offset for the next field (i.e., offset +
   vallen) which can be used when composing option content with multiple
   fields.
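
   The chaining of offsets can be sketched as below.  opt_set_val() is
   a hypothetical stand-in that only mirrors the copy-and-advance
   behavior described above; it is not the library function, and the
   three-field option content is invented for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/*
 * Hypothetical stand-in for inet6_opt_set_val(): copy vallen bytes to
 * databuf + offset without assuming any alignment, and return the
 * offset of the next field.
 */
int opt_set_val(void *databuf, int offset, const void *val, size_t vallen)
{
    memcpy((uint8_t *)databuf + offset, val, vallen);
    return offset + (int)vallen;
}

/*
 * Compose an invented option body with a 1-byte, a 2-byte, and a
 * 4-byte field, each in network byte order.  databuf must have room
 * for 7 bytes.  Returns the total number of bytes written.
 */
int fill_example_option(void *databuf)
{
    uint8_t  f1 = 0x11;
    uint16_t f2 = htons(0x2233);
    uint32_t f3 = htonl(0x44556677);
    int off = 0;

    off = opt_set_val(databuf, off, &f1, sizeof(f1));
    off = opt_set_val(databuf, off, &f2, sizeof(f2));
    off = opt_set_val(databuf, off, &f3, sizeof(f3));
    return off;
}
```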

10.5.  inet6_opt_next

      int inet6_opt_next(void *extbuf, socklen_t extlen, int offset,
                         uint8_t *typep, socklen_t *lenp,
                         void **databufp);

   This function parses received option extension headers returning the
   next option.  Extbuf and extlen specify the extension header.
   Offset should either be zero (for the first option) or the length
   returned by a previous call to inet6_opt_next() or inet6_opt_find().
   It specifies the position where to continue scanning the extension
   buffer.  The next option is returned by updating typep, lenp, and
   databufp.  Typep stores the option type, lenp stores the length of
   the option data (i.e., excluding the option type and option length
   fields), and databufp points to the data field of the option.  This
   function returns the updated "previous" length computed by advancing
   past the option that was returned.  This returned "previous" length
   can then be passed to subsequent calls to inet6_opt_next().  This
   function does not return any Pad1 or PadN options.  When there are no
   more options or if the option extension header is malformed the
   return value is -1.
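
   The scanning behavior can be sketched on the TLV format of
   [RFC-2460].  opt_next() below is a hypothetical model, not the
   library function: its argument types differ slightly from the
   prototype above, and the consistency check against the Hdr Ext Len
   byte is an assumption about what "malformed" covers.

```c
#include <stddef.h>
#include <stdint.h>

#define OPT_PAD1 0
#define OPT_PADN 1

/*
 * Hypothetical model of inet6_opt_next(): return the next option that
 * is not Pad1 or PadN, starting at offset (0 means the first option).
 * Returns the offset just past the option, or -1 when there are no
 * more options or the header is malformed.
 */
int opt_next(const uint8_t *extbuf, size_t extlen, int offset,
             uint8_t *typep, size_t *lenp, const uint8_t **databufp)
{
    size_t pos, len;
    uint8_t type;

    if (extlen < 8 || offset < 0)
        return -1;
    /* Hdr Ext Len counts 8-byte units beyond the first 8 bytes. */
    if (((size_t)extbuf[1] + 1) * 8 != extlen)
        return -1;

    /* Skip the Next Header and Hdr Ext Len bytes on the first call. */
    pos = (offset < 2) ? 2 : (size_t)offset;

    while (pos < extlen) {
        type = extbuf[pos];
        if (type == OPT_PAD1) {         /* Pad1 is a single byte */
            pos += 1;
            continue;
        }
        if (pos + 2 > extlen)
            return -1;                  /* length byte is missing */
        len = extbuf[pos + 1];
        if (pos + 2 + len > extlen)
            return -1;                  /* data runs past the header */
        if (type == OPT_PADN) {         /* skip padding silently */
            pos += 2 + len;
            continue;
        }
        *typep = type;
        *lenp = len;
        *databufp = extbuf + pos + 2;
        return (int)(pos + 2 + len);
    }
    return -1;                          /* no more options */
}
```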

10.6.  inet6_opt_find

      int inet6_opt_find(void *extbuf, socklen_t extlen, int offset,
                         uint8_t type, socklen_t *lenp,
                         void **databufp);

   This function is similar to the previously described inet6_opt_next()
   function, except this function lets the caller specify the option
   type to be searched for, instead of always returning the next option
   in the extension header.

   If an option of the specified type is located, the function returns
   the updated "previous" total length computed by advancing past the
   option that was returned and past any options that didn't match the
   type.  This returned "previous" length can then be passed to
   subsequent calls to inet6_opt_find() for finding the next occurrence
   of the same option type.

   If an option of the specified type is not located, the return value
   is -1.  If the option extension header is malformed, the return value
   is -1.

10.7.  inet6_opt_get_val

      int inet6_opt_get_val(void *databuf, int offset, void *val,
                            socklen_t vallen);

   Databuf should be a pointer returned by inet6_opt_next() or
   inet6_opt_find().  This function extracts data items of various sizes
   in the data portion of the option.  Val should point to the
   destination for the extracted data.  Offset specifies from where in
   the data portion of the option the value should be extracted; the
   first byte after the option type and length is accessed by specifying
   an offset of zero.

   It is expected that each field is aligned on its natural boundaries
   as described in Appendix B of [RFC-2460], but the function must not
   rely on the alignment.

   The function returns the offset for the next field (i.e., offset +
   vallen) which can be used when extracting option content with
   multiple fields.

11.  Additional Advanced API Functions

11.1.  Sending with the Minimum MTU

   Unicast applications should usually let the kernel perform path MTU
   discovery [RFC-1981], as long as the kernel supports it, and should
   not care about the path MTU.  Some applications, however, might not
   want to incur the overhead of path MTU discovery, especially if the
   applications only send a single datagram to a destination.  A
   potential example is a DNS server.

   [RFC-1981] describes how path MTU discovery works for multicast
   destinations.  From practice in using IPv4 multicast, however, many
   careless applications that send large multicast packets on the wire
   have caused implosion of ICMPv4 error messages.  The situation can be
   worse when there is a filtering node that blocks the ICMPv4 messages.
   Though the filtering issue applies to unicast as well, the impact is
   much larger in the multicast cases.

   Thus, applications sending multicast traffic should explicitly enable
   path MTU discovery only when they understand that the benefit of
   possibly larger MTU usage outweighs the possible impact of MTU
   discovery for active sources across the delivery tree(s).  This
   default behavior is based on today's practice with IPv4 multicast
   and path MTU discovery.  The behavior may change in the future once
   it is found that path MTU discovery effectively works with actual
   multicast applications and network configurations.

   This specification defines a mechanism to avoid path MTU discovery by
   sending at the minimum IPv6 MTU [RFC-2460].  If the packet is larger
   than the minimum MTU and this feature has been enabled the IP layer
   will fragment to the minimum MTU.  To control the policy about path
   MTU discovery, applications can use the IPV6_USE_MIN_MTU socket
   option.

   As described above, the default policy should depend on whether the
   destination is unicast or multicast.  For unicast destinations path
   MTU discovery should be performed by default.  For multicast
   destinations path MTU discovery should be disabled by default.  This
   option thus takes the following three types of integer arguments:

   -1: perform path MTU discovery for unicast destinations but do not
       perform it for multicast destinations.  Packets to multicast
       destinations are therefore sent with the minimum MTU.

    0: always perform path MTU discovery.

    1: always disable path MTU discovery and send packets at the minimum
       MTU.

   The default value of this option is -1.  Values other than -1, 0, and
   1 are invalid, and an error EINVAL will be returned for those values.
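
   The three values can be summarized in a small decision function.
   This is only a model of the policy described above; the function
   name and the do_pmtud output parameter are assumptions made for
   the example.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether path MTU discovery applies to a
 * packet, given the IPV6_USE_MIN_MTU value and the destination type.
 * Returns 0 and sets *do_pmtud, or EINVAL for an invalid option value.
 */
int use_min_mtu_policy(int optval, bool dst_is_multicast, bool *do_pmtud)
{
    switch (optval) {
    case -1:                    /* default: unicast only */
        *do_pmtud = !dst_is_multicast;
        return 0;
    case 0:                     /* always perform discovery */
        *do_pmtud = true;
        return 0;
    case 1:                     /* always send at the minimum MTU */
        *do_pmtud = false;
        return 0;
    default:
        return EINVAL;
    }
}
```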

   As an example, if a unicast application intentionally wants to
   disable path MTU discovery, it will add the following lines:

      int  on = 1;
      setsockopt(fd, IPPROTO_IPV6, IPV6_USE_MIN_MTU, &on, sizeof(on));

   Note that this API intentionally excludes the case where the
   application wants to perform path MTU discovery for multicast but to
   disable it for unicast.  This is because such usage is not feasible
   considering a scale of performance issues around whether to do path
   MTU discovery or not.  When path MTU discovery makes sense to a
   destination but not to a different destination, regardless of whether
   the destination is unicast or multicast, applications either need to
   toggle the option between sending such packets on the same socket, or
   use different sockets for the two classes of destinations.

   This option can also be sent as ancillary data.  In the cmsghdr
   structure containing this ancillary data, the cmsg_level member will
   be IPPROTO_IPV6, the cmsg_type member will be IPV6_USE_MIN_MTU, and
   the first byte of cmsg_data[] will be the first byte of the integer.

11.2.  Sending without Fragmentation

   In order to provide for easy porting of existing UDP and raw socket
   applications, IPv6 implementations will, when originating packets,
   automatically insert a fragment header in the packet if the packet is
   too big for the path MTU.

   Some applications might not want this behavior.  An example is
   traceroute, which might want to discover the actual path MTU.

   This specification defines a mechanism to turn off the automatic
   insertion of a fragment header for UDP and raw sockets.  The
   mechanism is enabled using the IPV6_DONTFRAG socket option.

      int on = 1;
      setsockopt(fd, IPPROTO_IPV6, IPV6_DONTFRAG, &on, sizeof(on));

   By default, this socket option is disabled.  Setting the value to 0
   also disables the option, i.e., reverts to the default behavior of
   automatic insertion.  This option can also be sent as ancillary data.
   In the cmsghdr structure containing this ancillary data, the
   cmsg_level member will be IPPROTO_IPV6, the cmsg_type member will be
   IPV6_DONTFRAG, and the first byte of cmsg_data[] will be the first
   byte of the integer.  This API only specifies the use of this option
   for UDP and raw sockets, and does not define the usage for TCP
   sockets.

   When the data size is larger than the MTU of the outgoing interface,
   the packet will be discarded.  Applications can learn of this by
   enabling the IPV6_RECVPATHMTU option described below and receiving
   the corresponding ancillary data items.  An additional error EMSGSIZE
   may also be returned in some implementations.  Note, however, that
   some other implementations might not be able to return this
   additional error when sending a message.
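
   The following sketch shows one way an application might combine the
   option with the optional EMSGSIZE error.  The names set_dontfrag(),
   send_unfragmented(), and enum send_result are hypothetical and exist
   only for this illustration.  Because some implementations do not
   return EMSGSIZE, a successful return here is not evidence that the
   packet fit within the MTU.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical result codes for this sketch. */
enum send_result { SEND_OK, SEND_TOO_BIG, SEND_ERROR };

/*
 * Turn automatic fragment header insertion off (on != 0) or back on
 * (on == 0) for a UDP or raw socket.  Returns 0 or -1 as setsockopt().
 */
static int set_dontfrag(int fd, int on)
{
    return setsockopt(fd, IPPROTO_IPV6, IPV6_DONTFRAG, &on, sizeof(on));
}

/*
 * Send one datagram and classify the outcome.  SEND_TOO_BIG is reported
 * only when the implementation returns EMSGSIZE; other implementations
 * may discard the packet and still report SEND_OK, so IPV6_RECVPATHMTU
 * remains the portable way to learn of the discard.
 */
static enum send_result send_unfragmented(int fd, const void *buf,
                                          size_t len,
                                          const struct sockaddr_in6 *dst)
{
    ssize_t n = sendto(fd, buf, len, 0,
                       (const struct sockaddr *)dst, sizeof(*dst));

    if (n >= 0)
        return SEND_OK;
    return (errno == EMSGSIZE) ? SEND_TOO_BIG : SEND_ERROR;
}
```

   An application such as traceroute might call set_dontfrag(fd, 1)
   once after creating the socket and then reduce its probe size
   whenever send_unfragmented() reports SEND_TOO_BIG.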

11.3.  Path MTU Discovery and UDP

   UDP and raw socket applications need to be able to determine the
   "maximum send transport-message size" (Section 5.1 of [RFC-1981]) to
   a given destination so that those applications can participate in
   path MTU discovery.  This lets those applications send smaller
   datagrams to the destination, avoiding fragmentation.

   This is accomplished using a new ancillary data item (IPV6_PATHMTU)
   which is delivered to recvmsg() without any actual data.  The
   application can enable the receipt of IPV6_PATHMTU ancillary data
   items by setting the IPV6_RECVPATHMTU socket option.

      int  on = 1;
      setsockopt(fd, IPPROTO_IPV6, IPV6_RECVPATHMTU, &on, sizeof(on));

   By default, this socket option is disabled.  Setting the value to 0
   also disables the option.  This API only specifies the use of this
   option for UDP and raw sockets, and does not define the usage for TCP
   sockets.

   When the application is sending packets too big for the path MTU,
   recvmsg() will return zero (indicating no data) but there will be a
   cmsghdr with cmsg_type set to IPV6_PATHMTU, and cmsg_len will
   indicate that cmsg_data is sizeof(struct ip6_mtuinfo) bytes long.
   This can happen when the sending node receives a corresponding ICMPv6
   packet too big error, or when the packet is sent from a socket with
   the IPV6_DONTFRAG option being on and the packet size is larger than
   the MTU of the outgoing interface.  This indication is considered as
   an ancillary data item for a separate (empty) message.  Thus, when
   there are buffered messages (i.e., messages that the application has
   not received yet) on the socket the application will first receive
   the buffered messages and then receive the indication.

   cmsg_data[] will contain a struct ip6_mtuinfo carrying the path MTU
   to use together with the IPv6 destination address.

      struct ip6_mtuinfo {
        struct sockaddr_in6 ip6m_addr; /* dst address including
                                          zone ID */
        uint32_t            ip6m_mtu;  /* path MTU in host byte order */
      };

   This cmsghdr will be passed to every socket that sets the
   IPV6_RECVPATHMTU socket option, even if the socket is non-connected.
   Note that this also means an application that sets the option may
   receive an IPV6_PATHMTU ancillary data item for each ICMPv6 packet
   too big error the node receives, including such errors caused by
   other applications on the node.  Thus, an application that wants to
   perform path MTU discovery by itself needs to keep a history of the
   destinations that it has actually sent to and to compare the address
   returned in the ip6_mtuinfo structure against that history.  An
   implementation may choose not to deliver data to a connected socket
   whose foreign address differs from the address specified in the
   ip6m_addr member.

   When an application sends a packet with a routing header, the final
   destination stored in the ip6m_addr member does not necessarily
   contain complete information of the entire path.
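
   A receiver might process this indication along the following lines.
   This is a minimal sketch under stated assumptions: the helper names
   find_pathmtu() and same_destination() are hypothetical, the msghdr
   is assumed to have been filled in by a preceding recvmsg() call, and
   the _GNU_SOURCE definition is an assumption about what some C
   libraries require before they expose struct ip6_mtuinfo.

```c
#define _GNU_SOURCE   /* assumption: needed by some C libraries for
                         struct ip6_mtuinfo */
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: scan the ancillary data returned by recvmsg()
 * for an IPV6_PATHMTU item.  Returns 1 and copies it to *out if one is
 * present, 0 otherwise.
 */
static int find_pathmtu(struct msghdr *msg, struct ip6_mtuinfo *out)
{
    struct cmsghdr *cm;

    for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
        if (cm->cmsg_level == IPPROTO_IPV6 &&
            cm->cmsg_type == IPV6_PATHMTU &&
            cm->cmsg_len >= CMSG_LEN(sizeof(*out))) {
            /* Copy out: cmsg_data[] need not be aligned for the struct. */
            memcpy(out, CMSG_DATA(cm), sizeof(*out));
            return 1;
        }
    }
    return 0;
}

/*
 * Hypothetical helper: check the reported destination against one that
 * the application actually sent to, since the indication may have been
 * caused by another application on the node.
 */
static int same_destination(const struct ip6_mtuinfo *mi,
                            const struct sockaddr_in6 *sent_to)
{
    return memcmp(&mi->ip6m_addr.sin6_addr, &sent_to->sin6_addr,
                  sizeof(struct in6_addr)) == 0 &&
           mi->ip6m_addr.sin6_scope_id == sent_to->sin6_scope_id;
}
```

   When recvmsg() returns zero and find_pathmtu() returns 1, the
   application would compare the result against each destination in its
   history and, on a match, limit later datagrams to that destination
   according to the ip6m_mtu value.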

11.4.  Determining the Current Path MTU

   Some applications might need to determine the current path MTU; for
   example, applications using IPV6_RECVPATHMTU might want to pick a
   good starting value.

   This specification defines a get-only socket option to retrieve the
   current path MTU value for the destination of a given connected
   socket.  If the IP layer does not have a cached path MTU value it
   will return the interface MTU for the interface that will be used
   when sending to the destination address.

   This information is retrieved using the IPV6_PATHMTU socket option.
   This option takes a pointer to the ip6_mtuinfo structure as the
   fourth argument, and the size of the structure should be passed as a
   value-result parameter in the fifth argument.

      struct ip6_mtuinfo mtuinfo;
      socklen_t infolen = sizeof(mtuinfo);

      getsockopt(fd, IPPROTO_IPV6, IPV6_PATHMTU, &mtuinfo, &infolen);

   When the call succeeds, the path MTU value is stored in the ip6m_mtu
   member of the ip6_mtuinfo structure.  Since the socket is connected,
   the ip6m_addr member is meaningless and should not be referred to by
   the application.

   This option can only be used for a connected socket, because a
   non-connected socket has no destination information and there is no
   way to pass the destination via getsockopt().  When getsockopt() for
   this option is issued on a non-connected socket, the call will fail.
   Despite this limitation, this option is still useful
   from a practical point of view, because applications that care about
   the path MTU tend to send a lot of packets to a single destination
   and to connect the socket to the destination for performance reasons.
   If the application needs to get the MTU value in a more generic way,
   it should use a more generic interface, such as routing sockets
   [TCPIPILLUST].
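
   The call and its failure case can be wrapped as follows.  This is a
   sketch only: the helper name current_path_mtu() is hypothetical, and
   the _GNU_SOURCE definition is an assumption about what some C
   libraries require before they expose struct ip6_mtuinfo.

```c
#define _GNU_SOURCE   /* assumption: needed by some C libraries for
                         struct ip6_mtuinfo */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: query the current path MTU of a connected
 * socket.  Returns 0 and stores the value in *mtu on success.  Returns
 * -1 with errno set and leaves *mtu untouched on failure, for example
 * when the socket is not connected.
 */
static int current_path_mtu(int fd, uint32_t *mtu)
{
    struct ip6_mtuinfo mtuinfo;
    socklen_t infolen = sizeof(mtuinfo);

    memset(&mtuinfo, 0, sizeof(mtuinfo));
    if (getsockopt(fd, IPPROTO_IPV6, IPV6_PATHMTU, &mtuinfo, &infolen) < 0)
        return -1;

    /* ip6m_addr is meaningless for a connected socket and is ignored. */
    *mtu = mtuinfo.ip6m_mtu;
    return 0;
}
```

   An application might call current_path_mtu() immediately after
   connect() to choose an initial datagram size, and then rely on
   IPV6_RECVPATHMTU indications to track later changes.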


