The Shim6 layer first needs to identify which incoming packets need to be translated and then perform the mapping between locators and ULIDs using the associated context. This operation is called "demultiplexing". It should be noted that, because any address can be used both as a locator and as a ULID, additional information, other than the addresses carried in packets, needs to be taken into account for this operation. For example, if a host has addresses A1 and A2 and starts communicating with a peer with addresses B1 and B2, then some communications (connections) might use the pair <A1, B1> as ULIDs and others might use, for example, <A2, B2>. Initially there are no failures, so these address pairs are used as locators, i.e., in the IP address fields in the packets on the wire. But when there is a failure, the Shim6 layer on A might decide to send packets that used <A1, B1> as ULIDs using <A2, B2> as the locators. In this case, B needs to be able to rewrite the IP address fields for some packets and not others, but the packets all have the same locator pair. In order to accomplish the demultiplexing operation successfully, data packets carry a Context Tag that allows the receiver of the packet to determine the shim context to be used to perform the operation. Two mechanisms for carrying the Context Tag information were considered in depth during the shim protocol design: carrying the Context Tag in the Flow Label field of the IPv6 header, and using a new Extension header to carry the Context Tag. In this appendix, we describe the pros and cons of each mechanism and justify the selected option.
Suppose that two different contexts are established between Host A and Host B. Context #1 uses IPA1 and IPB1 as ULIDs. The locator set associated with IPA1 is IPA1 and IPA2, while the locator set associated with IPB1 is just IPB1. Context #2 uses IPA3 and IPB2 as ULIDs. The locator set associated with IPA3 is IPA3 and IPA4, while the locator set associated with IPB2 is just IPB2. Because the locator sets of Context #1 and Context #2 are disjoint, hosts could think that the same Context Tag value can be assigned to both of them. The problem arises when, later on, IPA3 is added as a valid locator for IPA1 and IPB2 is added as a valid locator for IPB1 in Context #1. In this case, the triple <Flow Label, Source Locator, Destination Locator> would no longer identify a unique context, and correct demultiplexing is no longer possible. A possible approach to overcome this limitation is simply not to repeat Flow Label values for any communication established in a host. This basically means that each time a new communication that uses different ULIDs is established, a new Flow Label value is assigned to it. By these means, each communication that uses different ULIDs can be differentiated because each has a different Flow Label value. The problem with such an approach is that it requires the receiver of the communication to allocate the Flow Label value used for incoming packets, in order to assign them uniquely. For this, a shim negotiation of the Flow Label value to use in the communication is needed before exchanging data packets. This poses problems with non-Shim6-capable hosts, since they would not be able to negotiate an acceptable value for the Flow Label. This limitation can be lifted by marking the packets that belong to shim sessions to distinguish them from those that do not.
These markings would require at least a bit in the IPv6 header that is not currently available, so more creative options would be required, for instance, using new Next Header values to indicate that the packet belongs to a Shim6-enabled communication and that the Flow Label carries context information as proposed in . However, even if new Next Header values are used in this way, such an approach is incompatible with the deferred-establishment capability of the shim protocol, which is a preferred function since it avoids the delay of establishing a shim context before communication starts. Such capability also allows nodes to define at which stage of the communication they decide, based on their own policies, that a given communication requires protection by the shim. In order to cope with the identified limitations, an alternative approach that does not constrain the Flow Label values used by communications whose ULIDs equal the locators (i.e., no shim translation) is to require only that different Flow Label values are assigned to different shim contexts. In such an approach, communications start with unmodified Flow Label usage (could be zero or as suggested in ). The packets sent after a failure, when a different locator pair is used, would use a completely different Flow Label, and this Flow Label could be allocated by the receiver as part of the shim context establishment. Since it is allocated during the context establishment, the receiver of the "failed over" packets can pick a Flow Label of its choosing (that is unique in the sense that no other context is using it as a Context Tag), without any performance impact, while respecting the requirement that the Flow Label value used for a given locator pair does not change due to the operation of the multihoming shim. In this approach, the constraint is that Flow Label values being used as context identifiers cannot be used by other communications that use non-disjoint locator sets. This means that once a given Flow Label value has been assigned to a shim context that has certain locator sets associated with it, the same value cannot be used for other communications that use an address pair that is contained in the locator sets of the context. This is a constraint on the potential Flow Label allocation strategies. A possible workaround to this constraint is to mark shim packets that require translation, in order to differentiate them from regular IPv6 packets, using the artificial Next Header values described above.
In this case, the only Flow Label values constrained are those of the packets that are being translated by the shim. This last approach would be preferred if the Context Tag is to be carried in the Flow Label field. This is the case not only because it imposes the minimum constraints on the Flow Label allocation strategies, limiting the restrictions only to those packets that need to be translated by the shim, but also because context-loss detection mechanisms greatly benefit from the fact that shim data packets are identified as such, allowing the receiving end to identify whether a shim context associated with a received packet is supposed to exist, as will be discussed in the context-loss detection appendix below.
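The two-context scenario described above can be reproduced with a small sketch. It is only an illustration under stated assumptions: the `ShimContext` class, the `matching_contexts` helper, and the Flow Label values 7 and 8 are hypothetical names and values chosen for this example, and locators are represented as plain strings. The sketch shows Host B's view: the triple <Flow Label, Source Locator, Destination Locator> identifies a single context while the locator sets are disjoint, becomes ambiguous once the locator sets overlap, and is unique again when each context uses a different Flow Label value.

```python
from dataclasses import dataclass


@dataclass
class ShimContext:
    """Hypothetical receiver-side view of a shim context (illustration only)."""
    name: str
    flow_label: int        # Flow Label value used as the Context Tag
    local_locators: set    # locators of the receiving host (Host B)
    peer_locators: set     # locators of the sending host (Host A)


def matching_contexts(contexts, flow_label, src, dst):
    """Return every context that the triple <flow_label, src, dst> could belong to."""
    return [
        c for c in contexts
        if c.flow_label == flow_label
        and src in c.peer_locators
        and dst in c.local_locators
    ]


# Both contexts reuse Flow Label 7 because their locator sets start out disjoint.
ctx1 = ShimContext("ctx1", 7, {"IPB1"}, {"IPA1", "IPA2"})
ctx2 = ShimContext("ctx2", 7, {"IPB2"}, {"IPA3", "IPA4"})
contexts = [ctx1, ctx2]

before = matching_contexts(contexts, 7, "IPA3", "IPB2")

# Later, IPA3 and IPB2 are added as locators in Context #1.
ctx1.peer_locators.add("IPA3")
ctx1.local_locators.add("IPB2")
after = matching_contexts(contexts, 7, "IPA3", "IPB2")

# Assigning a different Flow Label value per context restores uniqueness.
ctx2.flow_label = 8
distinct = matching_contexts(contexts, 8, "IPA3", "IPB2")

print([c.name for c in before])    # one candidate
print([c.name for c in after])     # two candidates: demultiplexing is ambiguous
print([c.name for c in distinct])  # one candidate again
```

The third lookup corresponds to the approach preferred in the text: only the Flow Label values used as Context Tags need to be kept distinct across contexts with overlapping locator sets.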
These mechanisms basically consist of each end of the context periodically sending a packet containing context-specific information to the other end. Upon reception of such packets, the receiver verifies that the required context exists. In the case that the context does not exist, it sends a packet notifying the sender of the problem. An obvious alternative for this would be to create a specific context keepalive exchange, which consists of periodically sending packets for this purpose. This option was considered and discarded because it seemed overkill to define a new packet exchange to deal with this issue. Another alternative is to piggyback the context-loss detection function on other existing packet exchanges. In particular, both shim control and data packets can be used for this. Shim control packets can be trivially used for this because they carry context-specific information. This way, when a node receives one such packet, it will verify whether the context exists. However, the frequency of shim control packets may not be adequate for context-loss detection, since control packet exchanges can be very limited for a session in certain scenarios. Data packets, on the other hand, are expected to be exchanged with a higher frequency but do not necessarily carry context-specific information. In particular, packets flowing before a locator change (i.e., packets carrying the ULIDs in the address fields) do not need context information since they do not need any shim processing. Packets that carry locators that differ from the ULIDs carry context information. However, we need to make a distinction here between the different approaches considered to carry the Context Tag -- in particular, between those approaches where packets are explicitly marked as shim packets and those approaches where packets are not marked as such.
For instance, in the case where the Context Tag is carried in the Flow Label and packets are not marked as shim packets (i.e., no new Next Header values are defined for shim), a receiver that has lost the associated context is not able to detect that the packet is associated with a missing context. The result is that the packet will be passed unchanged to the upper-layer protocol, which in turn will probably silently discard it due to a checksum error. The resulting behavior is that the context loss is undetected. This is one additional reason to discard an approach that carries the Context Tag in the Flow Label field and does not explicitly mark the shim packets as such. On the other hand, approaches that mark shim data packets (like those that use the Extension header or the Flow Label with new Next Header values) allow the receiver to detect whether the context associated with the received packet is missing. In this case, data packets also perform the function of a context-loss detection exchange. However, it must be noted that only those packets that carry a locator that differs from the ULID are marked. This basically means that context loss will be detected only after an outage has occurred, i.e., when alternative locators are being used. Summarizing, the proposed context-loss detection mechanisms use shim control packets and Shim6 Payload Extension headers to detect context loss. Shim control packets detect context loss during the whole lifetime of the context, but the expected frequency in some cases is very low. On the other hand, Shim6 Payload Extension headers have a higher expected frequency in general, but they only detect context loss after an outage. This behavior implies that it will be common that context loss is detected after a failure, i.e., once the context is actually needed. Because of that, a mechanism for recovering from context loss is required if this approach is used. Overall, the mechanism for detecting lost context would work as follows: the end that still has the context available sends a message referring to the context. Upon the reception of such a message, the end that has lost the context identifies the situation and notifies the other end of the context-loss event by sending a packet containing the lost context information extracted from the received packet. One option is to simply send an error message containing the received packet (or at least as much of the received packet as the MTU allows). One of the goals of this notification is to allow the other end that still retains context state to re-establish the lost context. The mechanism to re-establish the lost context consists of performing the 4-way initial handshake.
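The difference between marked and unmarked shim data packets at a receiver that has lost its context can be sketched as follows. This is a simplified illustration, not the protocol's processing rules: the `Action` enum, the `handle_data_packet` function, the `marked_as_shim` flag, and the tag value are hypothetical names and values introduced for this example.

```python
from enum import Enum


class Action(Enum):
    PASS_UNCHANGED = "pass to the upper layer unchanged"
    TRANSLATE = "rewrite locators to ULIDs using the context"
    REPORT_CONTEXT_LOSS = "notify the sender that the context is missing"


def handle_data_packet(contexts, marked_as_shim, context_tag):
    """Decide what a receiver does with an incoming data packet.

    contexts maps a Context Tag to the ULID pair of an established context.
    marked_as_shim is True when the packet is explicitly identified as a
    shim packet (Extension header, or Flow Label plus new Next Header value).
    """
    if not marked_as_shim:
        # Nothing tells the receiver that a context is supposed to exist.
        return Action.PASS_UNCHANGED
    if context_tag in contexts:
        return Action.TRANSLATE
    # A marked packet without a matching context reveals the loss.
    return Action.REPORT_CONTEXT_LOSS


contexts = {0x2A: ("IPA1", "IPB1")}
normal = handle_data_packet(contexts, marked_as_shim=True, context_tag=0x2A)

contexts.clear()  # the receiver loses its context state
marked_after_loss = handle_data_packet(contexts, marked_as_shim=True, context_tag=0x2A)
unmarked_after_loss = handle_data_packet(contexts, marked_as_shim=False, context_tag=0x2A)

print(normal.name, marked_after_loss.name, unmarked_after_loss.name)
```

In the unmarked case the packet reaches the upper layer with untranslated addresses, which matches the undetected context loss described above.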
This is a time-consuming exchange and, at this point, time may be critical since we are re-establishing a context that is currently needed (because context-loss detection may occur after a failure). So another option, which is the one used in this protocol, is to replace the error message with a modified R1 message so that the time required to perform the context-establishment exchange can be reduced. Upon the reception of this modified R1 message, the end that still has the context state can finish the context-establishment exchange and restore the lost context. 15]. The goal, in terms of
messages and making the parties think that the context has been lost); thus, the resulting situation may not differ that much from the cookie-based approach. Another option that was discussed during the design of this protocol was the possibility of using IPsec for protecting the shim protocol. Now, the problem under consideration in this scenario is how to securely bind an address that is being used as ULID with a locator set that can be used to exchange packets. The mechanism provided by IPsec to securely bind the address that is used with cryptographic keys is the usage of digital certificates. This implies that an IPsec-based solution would require a common and mutually trusted third party to generate digital certificates that bind the key and the ULID. Considering that the scope of application of the shim protocol is global, this would imply a global public key infrastructure (PKI). The major issues with this approach are the deployment difficulties associated with a global PKI. The other possibility would be to use some form of opportunistic IPsec, like Better-Than-Nothing-Security (BTNS). However, this would still present some issues. In particular, this approach requires a leap-of-faith in order to bind a given address to the public key that is being used, which would actually prevent the most critical security feature that a Shim6 security solution needs to achieve from being provided: proving identifier ownership. On top of that, using IPsec would require turning on per-packet AH/ESP just for multihoming to occur. In general, Shim6 was expected to work between pairs of hosts that have no prior arrangement, security association, or common, trusted third party. It was also seen as undesirable to have to turn on per-packet AH/ESP just for the multihoming to occur. However, Shim6 should work and have an additional level of security where two hosts choose to use IPsec.
Another design alternative would have employed some form of opportunistic or Better-Than-Nothing Security (BTNS) IPsec to perform these tasks instead. Essentially, HIP in opportunistic mode is very similar to Shim6, except that HIP uses IPsec, employs per-packet ESP, and introduces another set of identifiers. Finally, two different technologies were selected to protect the shim protocol: HBA and CGA. These two techniques provide a similar level of protection but also provide different functionality with different computational costs. The HBA mechanism relies on the capability of generating all the addresses of a multihomed host as an unalterable set of intrinsically bound IPv6 addresses, known as an HBA set. In this approach, addresses incorporate a cryptographic one-way hash of the available prefix set into the interface identifier part. The result is that the binding between all the available addresses is encoded within the addresses themselves, providing hijacking protection. Any peer using the shim protocol can efficiently verify that the alternative addresses proposed for continuing the communication are bound to the initial address through a simple hash calculation. A limitation of the HBA technique is that, once generated, the address set is fixed and cannot be changed without also changing all the addresses of the HBA set. In other words, the HBA technique does not support dynamic addition of addresses to a previously generated HBA set. An advantage of this approach is that it requires only hash operations to verify a locator set, imposing very low computational cost on the protocol. In a CGA-based approach, the address used as ULID is a CGA that contains a hash of a public key in its interface identifier. The result is a secure binding between the ULID and the associated key pair. This allows each peer to use the corresponding private key to sign the shim messages that convey locator set information. The trust chain in this case is the following: the ULID used for the communication is securely bound to the key pair because it contains the hash of the public key, and the locator set is bound to the public key through the signature. The CGA approach therefore supports dynamic addition of new locators to the locator set, since in order to do that the node only needs to sign the new locator with the private key associated with the CGA used as ULID. A limitation of this approach is that it imposes systematic usage of public key cryptography with its associated computational cost. Either of these two mechanisms, HBA and CGA, provides time-shifted attack protection, since the ULID is securely bound to a locator set that can only be defined by the owner of the ULID.
So the design decision adopted was that both mechanisms, HBA and CGA, are supported. This way, when only stable address sets are required, the nodes can benefit from the low computational cost offered by HBA, while when dynamic locator sets are required, this can be achieved through CGAs with an additional cost. Moreover, because HBAs are defined as a CGA extension, the addresses available in a node can simultaneously be CGAs and HBAs, allowing the usage of the HBA and CGA functionality when needed, without requiring a change in the addresses used.
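The idea of binding an address set through a hash of the prefix set can be illustrated with a toy sketch. This is not the actual HBA generation procedure: the hash function (SHA-256), the input layout, the 64-bit truncation, and the names `toy_hba_set`, `toy_hba_verify`, and `MODIFIER` are all assumptions made for this example. The sketch shows only the two properties discussed above: verification needs nothing but a hash computation, and changing the prefix set changes every address in the set.

```python
import hashlib
import ipaddress


def toy_hba_set(prefixes, modifier):
    """Derive one address per /64 prefix (toy construction, illustration only).

    Each interface identifier is a truncated hash over a random modifier,
    the whole prefix set, and the prefix of the address being generated.
    """
    nets = [ipaddress.IPv6Network(p) for p in prefixes]
    prefix_blob = b"".join(n.network_address.packed[:8] for n in nets)
    addresses = []
    for n in nets:
        own_prefix = n.network_address.packed[:8]
        digest = hashlib.sha256(modifier + prefix_blob + own_prefix).digest()
        interface_id = int.from_bytes(digest[:8], "big")
        addresses.append(ipaddress.IPv6Address(int(n.network_address) | interface_id))
    return addresses


def toy_hba_verify(address, prefixes, modifier):
    """Check that an address belongs to the set bound to (prefixes, modifier)."""
    return ipaddress.IPv6Address(str(address)) in toy_hba_set(prefixes, modifier)


PREFIXES = ["2001:db8:1::/64", "2001:db8:2::/64"]
MODIFIER = bytes(16)  # fixed value for reproducibility; would be random in practice

ulid, alternative = toy_hba_set(PREFIXES, MODIFIER)

# A peer verifies the alternative locator with a hash computation only.
bound = toy_hba_verify(alternative, PREFIXES, MODIFIER)
# An attacker's address in a listed prefix does not carry the right hash.
forged = toy_hba_verify("2001:db8:2::bad", PREFIXES, MODIFIER)
# Adding a prefix changes the hash input, so the old addresses no longer verify.
after_adding_prefix = toy_hba_verify(alternative, PREFIXES + ["2001:db8:3::/64"], MODIFIER)

print(bound, forged, after_adding_prefix)
```

The last check mirrors the stated limitation that an HBA set cannot grow without regenerating all of its addresses, which is the case the CGA-based signature approach is meant to cover.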
A key goal for the design of this exchange was protection against DoS attacks. The attack under consideration was basically a situation where an attacker launches a large number of ULID-pair establishment-request packets, exhausting the victim's resources similarly to TCP SYN flooding attacks. A 4-way handshake exchange protects against these attacks because the receiver does not create any state associated with a given context until the reception of the second packet, which contains prior-contact proof in the form of a token. At this point, the receiver can verify that at least the address used by the initiator is valid to some extent, since the initiator is able to receive packets at this address. In the worst case, the responder can track down the attacker using this address. The drawback of this approach is that it imposes a 4-packet exchange for any context establishment. This would be a significant cost if the shim context needed to be established up front, before the communication can proceed. However, thanks to the deferred context-establishment capability of the shim protocol, this limitation has a reduced impact on the performance of the protocol. (It may have a greater impact in the situation of context recovery, as discussed earlier, but in this case it is possible to perform optimizations to reduce the number of packets, as described above.) The other option considered was a 2-way handshake with the possibility to fall back to a 4-way handshake in case of attack. In this approach, the ULID-pair establishment exchange normally consists of a 2-packet exchange and does not verify that the initiator has performed a prior contact before creating context state. In case a DoS attack is detected, the responder falls back to a 4-way handshake similar to the one described previously, in order to prevent the detected attack from proceeding. The main difficulty with this approach is how to detect that a responder is currently under attack.
It should be noted that, because this is a 2-way exchange, it is not possible to use the number of half-open sessions (as in TCP) to detect an ongoing attack; different heuristics need to be considered. The design decision taken was that, considering the current impact of DoS attacks and the low impact of the 4-way exchange in the shim protocol (thanks to the deferred context-establishment capability), a 4-way exchange would be adopted for the base protocol.
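The stateless behavior that makes the 4-way exchange resistant to flooding can be sketched as follows. This is a minimal illustration under stated assumptions, not the message processing defined by the protocol: the `Responder` class, its method names, the use of HMAC-SHA256 as the token, and the message fields are hypothetical. The point is only that the responder stores nothing when the first packet arrives and creates context state only after the initiator returns a token it could have obtained solely by receiving the responder's reply.

```python
import hashlib
import hmac
import os


class Responder:
    """Toy responder that creates no state until prior contact is proven."""

    def __init__(self):
        self._secret = os.urandom(32)  # known only to the responder
        self.contexts = {}             # state is created only in on_second_request

    def _token(self, initiator_addr, initiator_nonce):
        message = initiator_addr.encode() + initiator_nonce
        return hmac.new(self._secret, message, hashlib.sha256).digest()

    def on_first_request(self, initiator_addr, initiator_nonce):
        # First packet of the exchange: reply with a token, store nothing.
        return self._token(initiator_addr, initiator_nonce)

    def on_second_request(self, initiator_addr, initiator_nonce, token, ulid_pair):
        # Third packet of the exchange: recompute the token and compare.
        expected = self._token(initiator_addr, initiator_nonce)
        if not hmac.compare_digest(expected, token):
            return False
        self.contexts[ulid_pair] = {"peer_locator": initiator_addr}
        return True


responder = Responder()

# A flood of first requests from spoofed addresses leaves no state behind.
for i in range(1000):
    responder.on_first_request(f"2001:db8::{i:x}", os.urandom(8))
state_after_flood = len(responder.contexts)

# A legitimate initiator receives the token at its address and echoes it.
nonce = os.urandom(8)
token = responder.on_first_request("2001:db8:a::1", nonce)
accepted = responder.on_second_request("2001:db8:a::1", nonce, token, ("IPA1", "IPB1"))

# A sender that never received the token cannot complete the exchange.
rejected = responder.on_second_request("2001:db8:a::2", nonce, bytes(32), ("IPA9", "IPB1"))

print(state_after_flood, accepted, rejected, len(responder.contexts))
```

Because the token is recomputed from a local secret rather than stored, the cost of an unanswered first request is one HMAC computation and one reply packet.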
differences between the existing locator set and the new one. The atomic approach imposes additional overhead since all of the locator set has to be exchanged each time, while the differential approach requires re-synchronization of both ends through changes (i.e., requires that both ends have the same idea about what the current locator set is). Because of the difficulties imposed by the synchronization requirement, the atomic approach was selected.

If an explicit CLOSE handshake and associated timer is used, then there would no longer be a need for the No Context Error message due to a peer having garbage collected its end of the context. However, there is still potentially a need to have a No Context Error message in the case of a complete state loss of the peer (also known as a crash followed by a reboot). Only if we assume that the reboot takes at least the time of the CLOSE timer, or that it is okay to not provide complete service until CLOSE-timer minutes after the crash, can we completely do away with the No Context Error message. In addition, another aspect that is relevant for this design choice is the context confusion issue. In particular, using a unilateral approach to discard context state clearly opens up the possibility of context confusion, where one of the ends unilaterally discards the context state, while the other does not. In this case, the end that has discarded the state can re-use the Context Tag value used for the discarded state for another context, creating potential context confusion. In order to illustrate the cases where problems would arise, consider the following scenario:

o  Hosts A and B establish context 1 using CTA and CTB as Context Tags.

o  Later on, A discards context 1 and the Context Tag value CTA becomes available for reuse.

o  However, B still keeps context 1.

This would create context confusion in the following two cases:

o  A new context 2 is established between A and B with a different ULID pair (or Forked Instance Identifier), and A uses CTA as the Context Tag. If the locator sets used for both contexts are not disjoint, we have context confusion.

o  A new context is established between A and C, and A uses CTA as the Context Tag value for this new context. Later on, B sends Payload Extension header and/or control messages containing CTA, which could be interpreted by A as belonging to this new context (if no proper care is taken). Again we have context confusion.

One could think that using a coordinated approach would eliminate such context confusion, making the protocol much simpler. However, this is not the case, because even in the case of a coordinated approach using a CLOSE/CLOSE ACK exchange, there is still the possibility of a host rebooting without having the time to perform the CLOSE exchange. So, it is true that the coordinated approach eliminates the possibility of context confusion due to premature garbage collection, but it does not prevent the same situations due to a crash and reboot of one of the involved hosts. The result is that, even if we went for a coordinated approach, we would still need to deal with context confusion and provide the means to detect and recover from these situations.
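The two confusion cases can be made concrete with a small sketch of Host A's lookup after it has reused CTA. It is an illustration only and does not reproduce the detection procedure of the protocol: the `Context` class, the `classify` function, the result strings, and the tag value are hypothetical. The check shown compares the source locator of an incoming packet against the peer locator set of the context that currently owns the tag.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical view of a context held by Host A (illustration only)."""
    label: str
    peer_locators: frozenset


def classify(contexts_by_tag, context_tag, source_locator):
    """Classify an incoming packet by its Context Tag and source locator."""
    ctx = contexts_by_tag.get(context_tag)
    if ctx is None:
        return "no-context"
    if source_locator not in ctx.peer_locators:
        return "possible-confusion"
    return "accepted"


CTA = 0x1234  # tag that A reuses after discarding context 1

# Case with host C: A reuses CTA for a context with C, then B sends a
# packet for the stale context 1 carrying CTA.
table_with_c = {CTA: Context("context with C", frozenset({"IPC1"}))}
from_b_vs_c = classify(table_with_c, CTA, "IPB1")

# Case with host B again: context 2 has a different ULID pair, but its
# locator set overlaps with the one B still associates with context 1.
table_with_b = {CTA: Context("context 2 with B", frozenset({"IPB1", "IPB2"}))}
from_b_vs_b = classify(table_with_b, CTA, "IPB1")

print(from_b_vs_c)  # the locator check flags the stale packet
print(from_b_vs_b)  # the locator check alone cannot tell the contexts apart
```

The second result shows why overlapping locator sets are the problematic case: a check based on locators alone accepts the stale packet, so additional means of detection and recovery are needed.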