RFC 5559

Pre-Congestion Notification (PCN) Architecture

Pages: 54
Informational
Part 3 of 3 – Pages 40 to 54

7.  Security Considerations

   Security considerations essentially come from the Trust Assumption
   (Section 6.3.1), ie, that all PCN-nodes are PCN-enabled and are
   trusted for truthful PCN-metering and PCN-marking.  PCN splits
   functionality between PCN-interior-nodes and PCN-boundary-nodes, and
   the security considerations are somewhat different for each, mainly
   because PCN-boundary-nodes are flow-aware and PCN-interior-nodes are
   not.

   o  Because PCN-boundary-nodes are flow-aware, they are trusted to use
      that awareness correctly.  The degree of trust required depends on
      the kinds of decisions they have to make and the kinds of
      information they need to make them.  There is nothing specific to
      PCN.

   o  The PCN-ingress-nodes police packets to ensure a PCN-flow stays
      within its agreed limit, and to ensure that only PCN-flows that
      have been admitted contribute PCN-traffic into the PCN-domain.
      The policer must drop (or perhaps downgrade to a different DSCP)
      any PCN-packets received that are outside this remit.  This is
      similar to the existing IntServ behaviour.  Between them, the PCN-
      boundary-nodes must encircle the PCN-domain; otherwise, PCN-
      packets could enter the PCN-domain without being subject to
      admission control, which would potentially destroy the QoS of
      existing flows.

   o  PCN-interior-nodes are not flow-aware.  This prevents some
      security attacks where an attacker targets specific flows in the
      data plane -- for instance, for DoS or eavesdropping.

   o  The PCN-boundary-nodes rely on correct PCN-marking by the PCN-
      interior-nodes.  For instance, a rogue PCN-interior-node could
      PCN-mark all packets so that no flows were admitted.  Another
      possibility is that it doesn't PCN-mark any packets, even when it
      is pre-congested.  More subtly, the rogue PCN-interior-node could
      perform these attacks selectively on particular flows, or it could
      PCN-mark the correct fraction overall but carefully choose which
      flows it marked.

   o  The PCN-boundary-nodes should be able to deal with DoS attacks and
      state exhaustion attacks based on fast changes in per-flow
      signalling.

   o  The signalling between the PCN-boundary-nodes must be protected
      from attacks.  For example, the recipient needs to validate that
      the message is indeed from the node that claims to have sent it.
      Possible measures include digest authentication and protection
      against replay and man-in-the-middle attacks.  For the RSVP
      protocol specifically, hop-by-hop authentication is in [RFC2747],
      and [Behringer09] may also be useful.

   Operational security advice is given in Section 5.5.
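
   As a minimal sketch of the kind of protection mentioned above for
   signalling between PCN-boundary-nodes, the following Python fragment
   combines a keyed digest with a sequence number so that the recipient
   can reject forged and replayed messages.  The message layout, the
   function and class names, and the choice of HMAC-SHA256 are
   assumptions made for illustration only; this is not the [RFC2747]
   wire format, and key distribution is out of scope.

```python
import hashlib
import hmac
import struct

# Hypothetical layout: 8-byte big-endian sequence number, payload,
# then a 32-byte HMAC-SHA256 digest. Not a PCN or RSVP wire format.
DIGEST_LEN = hashlib.sha256().digest_size


def sign_message(key: bytes, seq: int, payload: bytes) -> bytes:
    """Prepend a sequence number and append a keyed digest."""
    body = struct.pack("!Q", seq) + payload
    return body + hmac.new(key, body, hashlib.sha256).digest()


class Receiver:
    """Checks the digest and rejects stale sequence numbers."""

    def __init__(self, key: bytes) -> None:
        self.key = key
        self.last_seq = -1  # highest sequence number accepted so far

    def accept(self, message: bytes):
        """Return the payload if authentic and fresh, else None."""
        if len(message) < 8 + DIGEST_LEN:
            return None
        body, digest = message[:-DIGEST_LEN], message[-DIGEST_LEN:]
        expected = hmac.new(self.key, body, hashlib.sha256).digest()
        if not hmac.compare_digest(digest, expected):
            return None  # forged or corrupted message
        (seq,) = struct.unpack("!Q", body[:8])
        if seq <= self.last_seq:
            return None  # replayed or out-of-date message
        self.last_seq = seq
        return body[8:]
```

   A strictly increasing sequence number is the simplest replay check;
   a deployment that tolerates reordering would presumably need a
   window instead.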

8.  Conclusions

   This document describes a general architecture for flow admission and
   termination based on pre-congestion information, in order to protect
   the quality of service of established, inelastic flows within a
   single Diffserv domain.  The main topic is the functional
   architecture.  This document also mentions other topics like the
   assumptions and open issues associated with the PCN architecture.

9.  Acknowledgements

   This document is a revised version of an earlier individual working
   draft authored by: P. Eardley, J. Babiarz, K. Chan, A. Charny, R.
   Geib, G. Karagiannis, M. Menth, and T. Tsou.  They are therefore
   contributors to this document.

   Thanks to those who have made comments on this document: Lachlan
   Andrew, Joe Babiarz, Fred Baker, David Black, Steven Blake, Ron
   Bonica, Scott Bradner, Bob Briscoe, Ross Callon, Jason Canon, Ken
   Carlberg, Anna Charny, Joachim Charzinski, Andras Csaszar, Francis
   Dupont, Lars Eggert, Pasi Eronen, Adrian Farrel, Ruediger Geib, Wei
   Gengyu, Robert Hancock, Fortune Huang, Christian Hublet, Cullen
   Jennings, Ingemar Johansson, Georgios Karagiannis, Hein Mekkes,
   Michael Menth, Toby Moncaster, Dimitri Papadimitriou, Dan Romascanu,
   Daisuke Satoh, Ben Strulo, Tom Taylor, Hannes Tschofenig, Tina Tsou,
   David Ward, Lars Westberg, Magnus Westerlund, and Delei Yu.  Thanks
   to Bob Briscoe who extensively revised the Operations and Management
   section.

   This document is the result of discussions in the PCN WG and
   forerunner activity in the TSVWG.  A number of previous drafts were
   presented to TSVWG; their authors were: B. Briscoe, P. Eardley, D.
   Songhurst, F. Le Faucheur, A. Charny, J. Babiarz, K. Chan, S. Dudley,
   G. Karagiannis, A. Bader, L. Westberg, J. Zhang, V. Liatsos, X-G.
   Liu, and A. Bhargava.

   The admission control mechanism evolved from the work led by Martin
   Karsten on the Guaranteed Stream Provider developed in the M3I
   project [Karsten02] [M3I], which in turn was based on the theoretical
   work of Gibbens and Kelly [Gibbens99].

10.  References

10.1.  Normative References

   [RFC2474]        Nichols, K., Blake, S., Baker, F., and D. Black,
                    "Definition of the Differentiated Services Field (DS
                    Field) in the IPv4 and IPv6 Headers", RFC 2474,
                    December 1998.

   [RFC3246]        Davie, B., Charny, A., Bennet, J., Benson, K., Le
                    Boudec, J., Courtney, W., Davari, S., Firoiu, V.,
                    and D. Stiliadis, "An Expedited Forwarding PHB (Per-
                    Hop Behavior)", RFC 3246, March 2002.

10.2.  Informative References

   [RFC1633]        Braden, B., Clark, D., and S. Shenker, "Integrated
                    Services in the Internet Architecture: an Overview",
                    RFC 1633, June 1994.

   [RFC2205]        Braden, B., Zhang, L., Berson, S., Herzog, S., and
                    S. Jamin, "Resource ReSerVation Protocol (RSVP) --
                    Version 1 Functional Specification", RFC 2205,
                    September 1997.

   [RFC2211]        Wroclawski, J., "Specification of the Controlled-
                    Load Network Element Service", RFC 2211,
                    September 1997.

   [RFC2475]        Blake, S., Black, D., Carlson, M., Davies, E., Wang,
                    Z., and W. Weiss, "An Architecture for
                    Differentiated Services", RFC 2475, December 1998.

   [RFC2747]        Baker, F., Lindell, B., and M. Talwar, "RSVP
                    Cryptographic Authentication", RFC 2747,
                    January 2000.

   [RFC2753]        Yavatkar, R., Pendarakis, D., and R. Guerin, "A
                    Framework for Policy-based Admission Control",
                    RFC 2753, January 2000.

   [RFC2983]        Black, D., "Differentiated Services and Tunnels",
                    RFC 2983, October 2000.

   [RFC2998]        Bernet, Y., Ford, P., Yavatkar, R., Baker, F.,
                    Zhang, L., Speer, M., Braden, R., Davie, B.,
                    Wroclawski, J., and E. Felstaine, "A Framework for
                    Integrated Services Operation over Diffserv
                    Networks", RFC 2998, November 2000.

   [RFC3168]        Ramakrishnan, K., Floyd, S., and D. Black, "The
                    Addition of Explicit Congestion Notification (ECN)
                    to IP", RFC 3168, September 2001.

   [RFC3270]        Le Faucheur, F., Wu, L., Davie, B., Davari, S.,
                    Vaananen, P., Krishnan, R., Cheval, P., and J.
                    Heinanen, "Multi-Protocol Label Switching (MPLS)
                    Support of Differentiated Services", RFC 3270,
                    May 2002.

   [RFC3393]        Demichelis, C. and P. Chimento, "IP Packet Delay
                    Variation Metric for IP Performance Metrics (IPPM)",
                    RFC 3393, November 2002.

   [RFC3411]        Harrington, D., Presuhn, R., and B. Wijnen, "An
                    Architecture for Describing Simple Network
                    Management Protocol (SNMP) Management Frameworks",
                    STD 62, RFC 3411, December 2002.

   [RFC3726]        Brunner, M., "Requirements for Signaling Protocols",
                    RFC 3726, April 2004.

   [RFC4216]        Zhang, R. and J. Vasseur, "MPLS Inter-Autonomous
                    System (AS) Traffic Engineering (TE) Requirements",
                    RFC 4216, November 2005.

   [RFC4301]        Kent, S. and K. Seo, "Security Architecture for the
                    Internet Protocol", RFC 4301, December 2005.

   [RFC4303]        Kent, S., "IP Encapsulating Security Payload (ESP)",
                    RFC 4303, December 2005.

   [RFC4594]        Babiarz, J., Chan, K., and F. Baker, "Configuration
                    Guidelines for DiffServ Service Classes", RFC 4594,
                    August 2006.

   [RFC4656]        Shalunov, S., Teitelbaum, B., Karp, A., Boote, J.,
                    and M. Zekauskas, "A One-way Active Measurement
                    Protocol (OWAMP)", RFC 4656, September 2006.

   [RFC4774]        Floyd, S., "Specifying Alternate Semantics for the
                    Explicit Congestion Notification (ECN) Field",
                    BCP 124, RFC 4774, November 2006.

   [RFC4778]        Kaeo, M., "Operational Security Current Practices in
                    Internet Service Provider Environments", RFC 4778,
                    January 2007.

   [RFC5129]        Davie, B., Briscoe, B., and J. Tay, "Explicit
                    Congestion Marking in MPLS", RFC 5129, January 2008.

   [RFC5462]        Andersson, L. and R. Asati, "Multiprotocol Label
                    Switching (MPLS) Label Stack Entry: "EXP" Field
                    Renamed to "Traffic Class" Field", RFC 5462,
                    February 2009.

   [P.800]          "Methods for subjective determination of
                    transmission quality", ITU-T Recommendation P.800,
                    August 1996.

   [Y.1541]         "Network Performance Objectives for IP-based
                    Services", ITU-T Recommendation Y.1541,
                    February 2006.

   [Babiarz06]      Babiarz, J., Chan, K., Karagiannis, G., and P.
                    Eardley, "SIP Controlled Admission and Preemption",
                    Work in Progress, October 2006.

   [Behringer09]    Behringer, M. and F. Le Faucheur, "Applicability of
                    Keying Methods for RSVP Security", Work in Progress,
                    March 2009.

   [Briscoe06]      Briscoe, B., Eardley, P., Songhurst, D., Le
                    Faucheur, F., Charny, A., Babiarz, J., Chan, K.,
                    Dudley, S., Karagiannis, G., Bader, A., and L.
                    Westberg, "An edge-to-edge Deployment Model for Pre-
                    Congestion Notification: Admission Control over a
                    Diffserv Region", Work in Progress, October 2006.

   [Briscoe08]      Briscoe, B., "Emulating Border Flow Policing using
                    Re-PCN on Bulk Data", Work in Progress,
                    September 2008.

   [Briscoe09]      Briscoe, B., "Tunnelling of Explicit Congestion
                    Notification", Work in Progress, March 2009.

   [Bryant08]       Bryant, S., Davie, B., Martini, L., and E. Rosen,
                    "Pseudowire Congestion Control Framework", Work
                    in Progress, May 2008.

   [Charny07-1]     Charny, A., Babiarz, J., Menth, M., and X. Zhang,
                    "Comparison of Proposed PCN Approaches", Work
                    in Progress, November 2007.

   [Charny07-2]     Charny, A., Zhang, X., Le Faucheur, F., and V.
                    Liatsos, "Pre-Congestion Notification Using Single
                    Marking for Admission and Termination", Work
                    in Progress, November 2007.

   [Charny07-3]     Charny, A., "Email to PCN WG mailing list",
                    November 2007, <http://www1.ietf.org/mail-archive/
                    web/pcn/current/msg00871.html>.

   [Charny08]       Charny, A., "Email to PCN WG mailing list",
                    March 2008, <http://www1.ietf.org/mail-archive/web/
                    pcn/current/msg01359.html>.

   [Eardley07]      Eardley, P., "Email to PCN WG mailing list",
                    October 2007, <http://www1.ietf.org/mail-archive/
                    web/pcn/current/msg00831.html>.

   [Eardley09]      Eardley, P., "Metering and marking behaviour of PCN-
                    nodes", Work in Progress, May 2009.

   [Gibbens99]      Gibbens, R. and F. Kelly, "Distributed connection
                    acceptance control for a connectionless network",
                    Proceedings International Teletraffic Congress
                    (ITC16), Edinburgh, pp. 941-952, 1999.

   [Hancock02]      Hancock, R. and E. Hepworth, "Slide 14 of 'NSIS: An
                    Outline Framework for QoS Signalling'", May 2002,
                    <http://www-nrc.nokia.com/sua/nsis/interim/
                    nsis-framework-outline.ppt>.

   [Iyer03]         Iyer, S., Bhattacharyya, S., Taft, N., and C. Diot,
                    "An approach to alleviate link overload as observed
                    on an IP backbone", IEEE INFOCOM, 2003,
                    <http://www.ieee-infocom.org/2003/papers/10_04.pdf>.

   [Karsten02]      Karsten, M. and J. Schmitt, "Admission Control Based
                    on Packet Marking and Feedback Signalling --
                    Mechanisms, Implementation and Experiments", TU-
                    Darmstadt Technical Report TR-KOM-2002-03, May 2002,
                    <http://www.kom.e-technik.tu-darmstadt.de/
                    publications/abstracts/KS02-5.html>.

   [Kumar01]        Kumar, A., Rastogi, R., Silberschatz, A., and B.
                    Yener, "Algorithms for Provisioning Virtual Private
                    Networks in the Hose Model", Proceedings ACM
                    SIGCOMM, 2001.

   [Lefaucheur06]   Le Faucheur, F., Charny, A., Briscoe, B., Eardley,
                    P., Babiarz, J., and K. Chan, "RSVP Extensions for
                    Admission Control over Diffserv using Pre-congestion
                    Notification (PCN)", Work in Progress, June 2006.

   [M3I]            "M3I - Market Managed Multiservice Internet",
                    <http://www.m3iproject.org/>.

   [Menth08-1]      Menth, M., Lehrieder, F., Eardley, P., Charny, A.,
                    and J. Babiarz, "Edge-Assisted Marked Flow
                    Termination", Work in Progress, February 2008.

   [Menth08-2]      Menth, M., Babiarz, J., Moncaster, T., and B.
                    Briscoe, "PCN Encoding for Packet-Specific Dual
                    Marking (PSDM)", Work in Progress, July 2008.

   [Menth09-1]      Menth, M. and M. Hartmann, "Threshold Configuration
                    and Routing Optimization for PCN-Based Resilient
                    Admission Control", Computer Networks, 2009,
                    <http://dx.doi.org/10.1016/j.comnet.2009.01.013>.

   [Menth09-2]      Menth, M., Lehrieder, F., Briscoe, B., Eardley, P.,
                    Moncaster, T., Babiarz, J., Chan, K., Charny, A.,
                    Karagiannis, G., Zhang, X., Taylor, T., Satoh, D.,
                    and R. Geib, "A Survey of PCN-Based Admission
                    Control and Flow Termination", IEEE
                    Communications Surveys and Tutorials, <http://
                    www3.informatik.uni-wuerzburg.de/staff/menth/
                    Publications/papers/Menth08-PCN-Overview.pdf>.

   [Moncaster09-1]  Moncaster, T., Briscoe, B., and M. Menth, "Baseline
                    Encoding and Transport of Pre-Congestion
                    Information", Work in Progress, May 2009.

   [Moncaster09-2]  Moncaster, T., Briscoe, B., and M. Menth, "A PCN
                    encoding using 2 DSCPs to provide 3 or more states",
                    Work in Progress, April 2009.

   [Sarker08]       Sarker, Z. and I. Johansson, "Usecases and Benefits
                    of end to end ECN support in PCN Domains", Work
                    in Progress, November 2008.

   [Songhurst06]    Songhurst, DJ., Eardley, P., Briscoe, B., Di Cairano
                    Gilfedder, C., and J. Tay, "Guaranteed QoS Synthesis
                    for Admission Control with Shared Capacity", BT
                    Technical Report TR-CXR9-2006-001, February 2006,
                    <http://www.cs.ucl.ac.uk/staff/
                    B.Briscoe/projects/ipe2eqos/gqs/papers/
                    GQS_shared_tr.pdf>.

   [Taylor09]       Charny, A., Huang, F., Menth, M., and T. Taylor,
                    "PCN Boundary Node Behaviour for the Controlled Load
                    (CL) Mode of Operation", Work in Progress,
                    March 2009.

   [Tsou08]         Tsou, T., Huang, F., and T. Taylor, "Applicability
                    Statement for the Use of Pre-Congestion Notification
                    in a Resource-Controlled Network", Work in Progress,
                    November 2008.

   [Westberg08]     Westberg, L., Bhargava, A., Bader, A., Karagiannis,
                    G., and H. Mekkes, "LC-PCN: The Load Control PCN
                    Solution", Work in Progress, November 2008.

Appendix A.  Possible Future Work Items

   This section mentions some topics that are outside the PCN WG's
   current charter but that have been mentioned as areas of interest.
   They might be work items for the PCN WG after a future re-chartering,
   some other IETF WG, another standards body, or an operator-specific
   usage that is not standardised.

   Note: It should be crystal clear that this section discusses
   possibilities only.

   The first set of possibilities relates to the restrictions described
   in Section 6.3:

   o  A single PCN-domain encompasses several autonomous systems that do
      not trust each other.  A possible solution is a mechanism like re-
      PCN [Briscoe08].

   o  Not all the nodes run PCN.  For example, the PCN-domain is a
      multi-site enterprise network.  The sites are connected by a VPN
      tunnel; although PCN doesn't operate inside the tunnel, the PCN
      mechanisms still work properly because of the good QoS on the
      virtual link (the tunnel).  Another example is that PCN is
      deployed on the general Internet (ie, widely but not universally
      deployed).

   o  Applying the PCN mechanisms to other types of traffic, ie, beyond
      inelastic traffic -- for instance, applying the PCN mechanisms to
      traffic scheduled with the Assured Forwarding per-hop behaviour.
      One example could be flow-rate adaptation by elastic applications
      that adapt according to the pre-congestion information.

   o  The aggregation assumption doesn't hold, because the link capacity
      is too low.  Measurement-based admission control is less accurate,
      with a greater risk of over-admission for instance.

   o  The applicability of PCN mechanisms for emergency use (911, GETS,
      WPS, MLPP, etc.).

   Other possibilities include:

   o  Probing.  This is discussed in Appendix A.1 below.

   o  The PCN-domain extends to the end users.  This scenario is
      described in [Babiarz06].  The end users need to be trusted to do
      their own policing.  If there is sufficient traffic, then the
      aggregation assumption may hold.  A variant is that the PCN-domain
      extends out as far as the LAN edge switch.

   o  Indicating pre-congestion through signalling messages rather than
      in-band (in the form of PCN-marked packets).

   o  The decision-making functionality is at a centralised node rather
      than at the PCN-boundary-nodes.  This requires that the PCN-
      egress-node signals PCN-feedback-information to the centralised
      node, and that the centralised node signals to the PCN-ingress-
      node the decision about admission (or termination).  Such a
      possibility may need the centralised node and the PCN-boundary-
      nodes to be configured with each other's addresses.  The
      centralised case is described further in [Tsou08].

   o  Signalling extensions for specific protocols (eg, RSVP and NSIS)
      -- for example, the details of how the signalling protocol
      installs the flowspec at the PCN-ingress-node for an admitted PCN-
      flow, and how the signalling protocol carries the PCN-feedback-
      information.  Perhaps also for other functions such as for coping
      with failure of a PCN-boundary-node ([Briscoe06] considers what
      happens if RSVP is the QoS signalling protocol) and for
      establishing a tunnel across the PCN-domain if it is necessary to
      carry ECN marks transparently.

   o  Policing by the PCN-ingress-node may not be needed if the PCN-
      domain can trust that the upstream network has already policed the
      traffic on its behalf.

   o  PCN for Pseudowire.  PCN may be used as a congestion avoidance
      mechanism for edge-to-edge pseudowire emulations [Bryant08].

   o  PCN for MPLS.  [RFC3270] defines how to support the Diffserv
      architecture in MPLS (Multiprotocol Label Switching) networks.
      [RFC5129] describes how to add PCN for admission control of
      microflows into a set of MPLS aggregates.  PCN-marking is done in
      MPLS's EXP field (which [RFC5462] re-names the Traffic Class (TC)
      field).

   o  PCN for Ethernet.  Similarly, it may be possible to extend PCN
      into Ethernet networks, where PCN-marking is done in the Ethernet
      header.  Note: Specific consideration of this extension is outside
      of the IETF's remit.

A.1.  Probing

A.1.1.  Introduction

   Probing is a potential mechanism to assist admission control.

   PCN's admission control, as described so far, is essentially a
   reactive mechanism where the PCN-egress-node monitors the pre-
   congestion level for traffic from each PCN-ingress-node; if the level
   rises, then it blocks new flows on that ingress-egress-aggregate.
   However, it's possible that an ingress-egress-aggregate carries no
   traffic, and so the PCN-egress-node can't make an admission decision
   using the usual method described earlier.

   One approach is to be "optimistic" and simply admit the new flow.
   However, it's possible to envisage a scenario where the traffic
   levels on other ingress-egress-aggregates are already so high that
   they're blocking new PCN-flows, and admitting a new flow onto this
   "empty" ingress-egress-aggregate adds extra traffic onto a link that
   is already pre-congested.  This may 'tip the balance' so that PCN's
   flow termination mechanism is activated or some packets are dropped.
   This risk could be lessened by configuring, on each link, a
   sufficient 'safety margin' above the PCN-threshold-rate.
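
   As a rough illustration of the 'safety margin' idea, the Python
   sketch below classifies the state of a link after a flow has been
   admitted "optimistically".  The function name, the 'margin'
   parameter, and the three outcome labels are assumptions made for
   this example and are not PCN terminology.  Note also that a real
   PCN-boundary-node does not know the per-link load; the sketch takes
   an omniscient view purely to show the arithmetic.

```python
def optimistic_admission_outcome(link_load, flow_rate,
                                 threshold_rate, margin):
    """Classify a link after admitting a flow without PCN feedback.

    All rates use one unit (eg, Mbit/s). 'margin' is the assumed
    headroom configured above the PCN-threshold-rate.
    """
    new_load = link_load + flow_rate
    if new_load <= threshold_rate:
        return "not pre-congested"
    if new_load <= threshold_rate + margin:
        return "pre-congested, within safety margin"
    return "margin exceeded: termination or drops possible"
```

   With a PCN-threshold-rate of 100 and a margin of 10 (hypothetical
   figures), a link already carrying 100 absorbs one extra flow of rate
   5, whereas a link carrying 108 does not.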

   An alternative approach is to make PCN a more proactive mechanism.
   The PCN-ingress-node explicitly determines, before admitting the
   prospective new flow, whether the ingress-egress-aggregate can
   support it.  This can be seen as a "pessimistic" approach, in
   contrast to the "optimism" of the approach above.  It involves
   probing: a PCN-ingress-node generates and sends probe packets in
   order to test the pre-congestion level that the flow would
   experience.

   One possibility is that a probe packet is just a dummy data packet,
   generated by the PCN-ingress-node and addressed to the PCN-egress-
   node.

A.1.2.  Probing Functions

   The probing functions are:

   o  Make the decision that probing is needed.  As described above,
      this is when the ingress-egress-aggregate (or the ECMP path -- see
      Section 6.4) carries no PCN-traffic.  An alternative is to always
      probe, ie, probe before admitting any PCN-flow.

   o  (if required) Communicate the request that probing is needed; the
      PCN-egress-node signals to the PCN-ingress-node that probing is
      needed.

   o  (if required) Generate probe traffic; the PCN-ingress-node
      generates the probe traffic.  The appropriate number (or rate) of
      probe packets will depend on the PCN-metering algorithm; for
      example, an excess-traffic-metering algorithm triggers fewer PCN-
      marks than a threshold-metering algorithm, and so will need more
      probe packets.

   o  Forward probe packets; as far as PCN-interior-nodes are concerned,
      probe packets are handled the same as (ordinary data) PCN-packets
      in terms of routing, scheduling, and PCN-marking.

   o  Consume probe packets; the PCN-egress-node consumes probe packets
      to ensure that they don't travel beyond the PCN-domain.
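
   The generate, forward, and consume functions above can be sketched
   as a toy simulation in Python.  Everything here is an assumption
   made for illustration: the Probe and Link classes, the "mark every
   k-th packet" model standing in for a PCN-metering algorithm, and the
   admit() rule based on the fraction of marked probes.  It is not a
   specification of PCN-marking behaviour.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Probe:
    seq: int
    marked: bool = False  # set when a PCN-interior-node PCN-marks it


@dataclass
class Link:
    """Toy link on a PCN-interior-node (assumed marking model)."""
    mark_every: int = 0  # 0: never mark; 1: mark all; k: every k-th
    seen: int = 0        # packets forwarded so far

    def forward(self, pkt: Probe) -> Probe:
        # Probes are treated exactly like ordinary PCN-packets here.
        self.seen += 1
        if self.mark_every and self.seen % self.mark_every == 0:
            pkt.marked = True
        return pkt


def probe_path(path: List[Link], count: int) -> float:
    """Ingress sends 'count' probes; egress consumes them and
    returns the fraction that arrived PCN-marked."""
    if count <= 0:
        raise ValueError("count must be positive")
    marked = 0
    for seq in range(count):
        pkt = Probe(seq)
        for link in path:
            pkt = link.forward(pkt)
        marked += pkt.marked  # consumed here, never forwarded on
    return marked / count


def admit(marked_fraction: float, limit: float = 0.0) -> bool:
    """Hypothetical rule: admit only if few enough probes are marked."""
    return marked_fraction <= limit
```

   In this model, a link that marks every 20th packet goes undetected
   by 10 probes but not by 40, which is one way to see why a metering
   algorithm that triggers fewer PCN-marks needs more probe packets.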

A.1.3.  Discussion of Rationale for Probing, Its Downsides and Open
        Issues

   It is an unresolved question whether probing is really needed, but
   two viewpoints have been put forward as to why it is useful.  The
   first is perhaps the most obvious: there is no PCN-traffic on the
   ingress-egress-aggregate.  The second assumes that multipath routing
   (eg, ECMP) is running in the PCN-domain.  We now consider each in
   turn.

   The first viewpoint assumes the following:

   o  There is no PCN-traffic on the ingress-egress-aggregate (so a
      normal admission decision cannot be made).

   o  Simply admitting the new flow has a significant risk of leading to
      overload: packets dropped or flows terminated.

   On the former bullet, [Eardley07] suggests that, during the future
   busy hour of a national network with about 100 PCN-boundary-nodes,
   there are likely to be significant numbers of aggregates with very
   few flows under nearly all circumstances.

   The latter bullet could occur if new flows start on many of the empty
   ingress-egress-aggregates, which together overload a link in the PCN-
   domain.  To be a problem, this would probably have to happen in a
   short time period (flash crowd) because, after the reaction time of
   the system, other (non-empty) ingress-egress-aggregates that pass
   through the link will measure pre-congestion and so block new flows.
   Also, flows naturally end anyway.
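
   The flash crowd argument can be made concrete with a back-of-the-
   envelope calculation, sketched below in Python.  The model is an
   assumption made for this example: new flows arrive on empty
   ingress-egress-aggregates at a constant rate, each flow has the same
   rate, and no new flow is blocked until the system's reaction time
   has elapsed.  The function names and the figures in the usage note
   are hypothetical.

```python
def flash_crowd_extra_load(arrivals_per_second, reaction_time_s,
                           flow_rate):
    """Load admitted onto a link before feedback blocks new flows."""
    return arrivals_per_second * reaction_time_s * flow_rate


def margin_survives(arrivals_per_second, reaction_time_s,
                    flow_rate, margin):
    """True if the load admitted in the reaction time fits the margin."""
    extra = flash_crowd_extra_load(arrivals_per_second,
                                   reaction_time_s, flow_rate)
    return extra <= margin
```

   For instance, 50 new flows per second of 100 kbit/s each, with a
   reaction time of 2 seconds, add 10000 kbit/s before blocking takes
   effect; whether that matters depends on the headroom on the link.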

   The downsides of probing for this viewpoint are:

   o  Probing adds delay to the admission control process.

   o  Sufficient probing traffic has to be generated to test the pre-
      congestion level of the ingress-egress-aggregate.  But the probing
      traffic itself may cause pre-congestion, causing other PCN-flows
      to be blocked or even terminated -- and, in the flash crowd
      scenario, there will be probing on many ingress-egress-aggregates.

   The second viewpoint applies in the case where there is multipath
   routing (eg, ECMP) in the PCN-domain.  Note that ECMP is often used
   on core networks.  There are two possibilities:

   (1)  If admission control is based on measurements of the ingress-
        egress-aggregate, then the viewpoint that probing is useful
        assumes:

        *  There's a significant chance that the traffic is unevenly
           balanced across the ECMP paths and, hence, there's a
           significant risk of admitting a flow that should be blocked
           (because it follows an ECMP path that is pre-congested) or of
           blocking a flow that should be admitted.

        Note: [Charny07-3] suggests unbalanced traffic is quite
        possible, even with quite a large number of flows on a PCN-link
        (eg, 1000), when Assumption 3 (aggregation) is likely to be
        satisfied.

   (2)  If admission control is based on measurements of pre-congestion
        on specific ECMP paths, then the viewpoint that probing is
        useful assumes:

        *  There is no PCN-traffic on the ECMP path on which to base an
           admission decision.

        *  Simply admitting the new flow has a significant risk of
           leading to overload.

        *  The PCN-egress-node can match a packet to an ECMP path.

        Note: This is similar to the first viewpoint and so, similarly,
        could occur in a flash crowd if a new flow starts more or less
        simultaneously on many of the empty ECMP paths.  Because there
        are several ECMP paths between each pair of PCN-boundary-nodes,
        it's presumably more likely that an ECMP path is "empty" than an
        ingress-egress-aggregate is.  To constrain the number of ECMP
        paths, a few tunnels could be set up between each pair of PCN-
        boundary-nodes.  Tunnelling also solves the issue in the point
        immediately above (which is otherwise hard to solve because an
        ECMP routing decision is made independently on each node).
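
   To explore how unevenly traffic might spread across ECMP paths, a
   small experiment along the following lines could be run.  This
   Python sketch assumes a simple ECMP model in which each flow is
   mapped to a path by hashing its identifier; the hash function, the
   flow tuple layout, and the imbalance metric are all choices made for
   this example, and real routers may behave differently.

```python
import hashlib


def ecmp_path(flow, n_paths):
    """Pick a path by hashing the flow identifier (assumed model)."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


def path_loads(flows, n_paths):
    """Add each flow's rate to the path selected by its hash.

    'flows' is a list of (flow_identifier, rate) pairs.
    """
    loads = [0.0] * n_paths
    for flow, rate in flows:
        loads[ecmp_path(flow, n_paths)] += rate
    return loads


def imbalance(loads):
    """Busiest path's load divided by the mean load (1.0 = even).

    Assumes the total load is non-zero.
    """
    mean = sum(loads) / len(loads)
    return max(loads) / mean
```

   A possible use is to build a list of 1000 flows with differing
   source ports and rates, call path_loads() with the number of ECMP
   paths, and inspect imbalance(); flows with unequal rates can be
   expected to make the spread worse than flows of equal rate.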

   The downsides of probing for this viewpoint are:

   o  Probing adds delay to the admission control process.

   o  Sufficient probing traffic has to be generated to test the pre-
      congestion level of the ECMP path.  But there's the risk that the
      probing traffic itself may cause pre-congestion, causing other
      PCN-flows to be blocked or even terminated.

   o  The PCN-egress-node needs to consume the probe packets to ensure
      they don't travel beyond the PCN-domain, since they might confuse
      the destination end node.  This is non-trivial, since probe
      packets are addressed to the destination end node in order to test
      the relevant ECMP path (ie, they are not addressed to the PCN-
      egress-node, unlike the first viewpoint above).

   The open issues associated with these viewpoints include:

   o  What rate and pattern of probe packets does the PCN-ingress-node
      need to generate so that there's enough traffic to make the
      admission decision?

   o  What difficulty does the delay (whilst probing is done), and
      possible packet drops, cause applications?

   o  Can the delay be alleviated by automatically and periodically
      probing on the ingress-egress-aggregate?  Or does this add too
      much overhead?

   o  Are there other ways of dealing with the flash crowd scenario?
      For instance, by limiting the rate at which new flows are
      admitted, or perhaps by a PCN-egress-node blocking new flows on
      its empty ingress-egress-aggregates when its non-empty ones are
      pre-congested.

   o  (Second viewpoint only) How does the PCN-egress-node disambiguate
      probe packets from data packets (so it can consume the former)?
      The PCN-egress-node must match the characteristic setting of
      particular bits in the probe packet's header or body, but these
      bits must not be used by any PCN-interior-node's ECMP algorithm.
      In the general case, this isn't possible, but it should be
      possible for a typical ECMP algorithm (which examines the source
      and destination IP addresses and port numbers, the protocol ID,
      and the DSCP).
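
   The last open issue can be illustrated with a short Python sketch.
   It assumes the "typical" ECMP algorithm described above, which
   hashes only the addresses, port numbers, protocol ID, and DSCP, and
   it assumes a hypothetical 'probe_flag' carried somewhere that the
   hash does not examine.  The Packet class, the flag, and the function
   names are inventions for this example; how a real probe would be
   distinguished is exactly the open question.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import Iterable, List


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    sport: int
    dport: int
    proto: int
    dscp: int
    probe_flag: bool = False  # hypothetical marker, not hashed by ECMP


def ecmp_path(pkt: Packet, n_paths: int) -> int:
    """Assumed ECMP: hash addresses, ports, protocol ID, and DSCP."""
    key = (f"{pkt.src}|{pkt.dst}|{pkt.sport}|{pkt.dport}"
           f"|{pkt.proto}|{pkt.dscp}")
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


def make_probe(data_pkt: Packet) -> Packet:
    """Copy every hashed field so the probe follows the same path."""
    return replace(data_pkt, probe_flag=True)


def egress_filter(packets: Iterable[Packet]) -> List[Packet]:
    """Consume probes at the PCN-egress-node; pass data packets on."""
    return [p for p in packets if not p.probe_flag]
```

   Because the flag is not an input to ecmp_path(), a probe built with
   make_probe() takes the same path as the data flow it imitates, and
   egress_filter() still removes it before it reaches the destination
   end node.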

Author's Address

   Philip Eardley (editor)
   BT
   B54/77, Sirius House Adastral Park Martlesham Heath
   Ipswich, Suffolk  IP5 3RE
   United Kingdom

   EMail: philip.eardley@bt.com