
RFC 8568

Network Virtualization Research Challenges

Pages: 42
Informational


4. Network Virtualization Challenges

4.1. Overview

Network virtualization is changing the way the telecommunications sector will deploy, extend, and operate its networks. These new technologies aim to reduce overall costs by moving communication services from dedicated hardware in the operators' cores to server farms scattered in data centers (i.e., compute and storage virtualization). In addition, the networks interconnecting the functions that compose a network service are fundamentally affected in the way they route, process, and control traffic (i.e., network virtualization).

4.2. Guaranteeing Quality of Service

Achieving a given QoS in an NFV environment with virtualized and distributed computing, storage, and networking functions is more challenging than providing the equivalent in discrete non-virtualized components. For example, ensuring a guaranteed and stable forwarding data rate has proven not to be straightforward when the forwarding function is virtualized and runs on top of COTS server hardware [openmano_dataplane] [NFV-COTS] [etsi_nfv_whitepaper_3]. Again, the comparison point is against a router or forwarder built on optimized hardware. We next identify some of the challenges that this poses.

4.2.1. Virtualization Technologies

Guaranteeing network QoS is less of an issue in "traditional" cloud computing because the workloads that are treated there are servers or clients in the networking sense and hardly ever process packets. Cloud computing provides hosting for applications on shared servers in a highly separated way. Its main advantage is that the infrastructure costs are shared among tenants and that the cloud infrastructure provides levels of reliability that cannot be achieved on individual premises in a cost-efficient way [intel_10_differences_nfv_cloud]. NFV, in contrast, poses very strict requirements in terms of performance, stability, and consistency. Although there are some tools and mechanisms to improve this, such as Enhanced Performance Awareness (EPA), Single Root I/O Virtualization (SR-IOV), Non-Uniform Memory Access (NUMA), the Data Plane Development Kit (DPDK), etc., these are still unsolved challenges. One open research issue is identifying technologies that are different from Virtual Machines (VMs) and more suitable for dealing with network functionalities. Lately, a number of lightweight virtualization technologies, including containers, unikernels (specialized VMs), and minimalistic distributions of general-purpose OSes, have appeared as virtualization approaches that can be used when constructing an NFV platform. [LIGHT-NFV] describes the challenges in building such a platform and discusses to what extent these technologies, as well as traditional VMs, are able to address them.
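As a rough illustration of the EPA-style placement hints mentioned above, the following sketch shows how a VNF's compute requirements might be expressed. The key names and values are illustrative assumptions, loosely inspired by OpenStack-style flavor extra specs; they are not a normative NFV interface.

   # Illustrative only: key names are assumptions, not a normative NFV API.
   vnf_compute_requirements = {
       "vcpus": 4,
       "ram_mb": 8192,
       "epa": {
           "cpu_policy": "dedicated",   # request CPU pinning
           "numa_nodes": 1,             # keep vCPUs and memory on one NUMA node
           "mem_page_size": "1GB",      # hugepages, e.g., for DPDK-style data planes
           "sriov_interfaces": 1,       # request an SR-IOV virtual function
       },
   }

An orchestrator would have to honor such hints when scheduling the VNF; how to do so without sacrificing portability is revisited in Section 4.2.4.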

4.2.2. Metrics for NFV Characterization

Another relevant aspect is the need for tools for diagnostics and measurements suited for NFV. There is a pressing need to define metrics and associated protocols to measure the performance of NFV. Specifically, since NFV is based on the concept of taking centralized functions and evolving them to highly distributed software (SW) functions, there is a commensurate need to fully understand and measure the baseline performance of such systems.

The IP Performance Metrics (IPPM) WG defines metrics that can be used to measure the quality and performance of Internet services and applications running over transport-layer protocols (e.g., TCP and UDP) over IP. It also develops and maintains protocols for the measurement of these metrics. While the IPPM WG is a long-running WG that started in 1997, at the time of writing, it does not have a charter item or active Internet-Drafts related to the topic of network virtualization. In addition to using IPPM to evaluate QoS, there is a need for specific metrics for assessing the performance of network-virtualization techniques.

The Benchmarking Methodology Working Group (BMWG) is also performing work related to NFV metrics. For example, [RFC8172] investigates additional methodological considerations necessary when benchmarking VNFs that are instantiated and hosted in general-purpose hardware, using bare-metal hypervisors or other isolation environments (such as Linux containers). An essential consideration is benchmarking physical and virtual network functions in the same way when possible, thereby allowing direct comparison.

There is a clear motivation for the work on performance metrics for NFV [etsi_gs_nfv_per_001], as stated in [RFC8172] (and replicated here):

      I'm designing and building my NFV Infrastructure platform.  The
      first steps were easy because I had a small number of categories
      of VNFs to support and the VNF vendor gave HW recommendations
      that I followed.  Now I need to deploy more VNFs from new
      vendors, and there are different hardware recommendations.  How
      well will the new VNFs perform on my existing hardware?  Which
      among several new VNFs in a given category are most efficient in
      terms of capacity they deliver?  And, when I operate multiple
      categories of VNFs (and PNFs) *concurrently* on a hardware
      platform such that they share resources, what are the new
      performance limits, and what are the software design choices I
      can make to optimize my chosen hardware platform?  Conversely,
      what hardware platform upgrades should I pursue to increase the
      capacity of these concurrently operating VNFs?

   Lately, there are also some efforts looking into VNF benchmarking.
   The selection of an NFV Infrastructure Point of Presence to host a
   VNF or allocation of resources (e.g., virtual CPUs, memory) needs to
   be done over virtualized (abstracted and simplified) resource views
   [vnf_benchmarking] [VNF-VBAAS].
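As a minimal, self-contained illustration of the kind of baseline measurement such metrics imply, the sketch below sends UDP probes through a local echo reflector and reports round-trip latency and loss. It is not an IPPM or BMWG method; port numbers, probe counts, and timeouts are arbitrary assumptions.

   # Minimal latency/loss probe: a local UDP echo reflector plus a prober.
   # Illustrative only; real NFV benchmarking follows IPPM/BMWG methods.
   import socket, threading, time

   def echo_reflector(port=19000):
       s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
       s.bind(("127.0.0.1", port))
       while True:
           data, addr = s.recvfrom(2048)
           s.sendto(data, addr)          # reflect the probe back to the sender

   def probe(port=19000, count=100, timeout=0.2):
       threading.Thread(target=echo_reflector, args=(port,), daemon=True).start()
       time.sleep(0.1)                   # give the reflector time to bind
       s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
       s.settimeout(timeout)
       rtts, lost = [], 0
       for seq in range(count):
           t0 = time.time()
           s.sendto(str(seq).encode(), ("127.0.0.1", port))
           try:
               s.recvfrom(2048)
               rtts.append(time.time() - t0)
           except socket.timeout:
               lost += 1
       print("loss: %d/%d, mean RTT: %.3f ms"
             % (lost, count, 1000 * sum(rtts) / max(len(rtts), 1)))

   probe()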

4.2.3. Predictive Analysis

On top of diagnostic tools that enable an assessment of the QoS, predictive analyses are required to react before anomalies occur. Given the SW nature of VNFs, a reliable diagnosis framework could enable issues to be prevented by diagnosing them early and then reacting on the potentially impacted service (e.g., migrating it to a different compute node, scaling in/out or up/down, etc.).
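A minimal sketch of such a predictive reaction, assuming an exponentially weighted moving average (EWMA) over a monitored metric; the thresholds and action names are illustrative assumptions.

   # Illustrative EWMA-based predictor; thresholds and actions are assumptions.
   class LatencyPredictor:
       def __init__(self, alpha=0.3, warn_ms=15.0):
           self.alpha, self.warn_ms, self.ewma = alpha, warn_ms, None

       def observe(self, latency_ms):
           # Smooth the metric to anticipate a sustained trend, not a single spike.
           self.ewma = latency_ms if self.ewma is None else (
               self.alpha * latency_ms + (1 - self.alpha) * self.ewma)
           return self.ewma

       def action(self):
           # React before a hard SLA violation: scale out or migrate the VNF.
           if self.ewma is not None and self.ewma > self.warn_ms:
               return "scale-out-or-migrate"
           return "no-op"

   p = LatencyPredictor()
   for sample in [5, 7, 9, 14, 22, 30]:   # rising latency trend (ms)
       p.observe(sample)
   print(p.action())                       # -> "scale-out-or-migrate"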

4.2.4. Portability

Portability in NFV refers to the ability to run a given VNF on multiple NFVIs, that is, guaranteeing that the VNF will be able to perform its functions with high and predictable performance, provided that a set of requirements on the NFVI resources is met. Therefore, portability is a key feature that, if fully enabled, would contribute to making the NFV environment achieve better reliability than a traditional system. Implementing functionality in SW over "commodity" infrastructure should make it much easier to port/move functions from one place to another. However, this is not yet as ideal as it sounds, and there are aspects that are not fully tackled. The existence of different hypervisors, specific hardware dependencies (e.g., EPA related), or state-synchronization aspects are just some examples of obstacles to portability.

The ETSI NFV ISG is doing work in relation to portability. [etsi_gs_nfv_per_001] provides a list of minimal features that the VM Descriptor and Compute Host Descriptor should contain for the appropriate deployment of VM images over an NFVI (i.e., a "telco data center"), in order to guarantee high and predictable performance of data-plane workloads while assuring their portability. In addition, [etsi_gs_nfv_per_001] provides a set of recommendations on the minimum requirements that hardware (HW) and hypervisor should have for a "telco data center" suitable for different workloads (data plane, control plane, etc.) present in VNFs. The purpose of [etsi_gs_nfv_per_001] is to provide the list of VM requirements that should be included in the VM Descriptor template, and the list of HW capabilities that should be included in the Compute Host Descriptor (CHD) to assure predictable high performance. ETSI NFV assumes that the MANO functions will perform this mix and match. Therefore, there are still several research challenges to be addressed here.
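A minimal sketch of the kind of mix and match that MANO is assumed to perform, matching VM Descriptor requirements against Compute Host Descriptor capabilities; the field names are illustrative assumptions, not the ETSI templates.

   # Field names are illustrative; they do not reproduce the ETSI VM
   # Descriptor / Compute Host Descriptor (CHD) templates.
   vm_descriptor = {"vcpus": 8, "hugepages": True, "sriov": True, "cpu_pinning": True}

   compute_host_descriptors = [
       {"host": "host-a", "free_vcpus": 16, "hugepages": True, "sriov": False, "cpu_pinning": True},
       {"host": "host-b", "free_vcpus": 12, "hugepages": True, "sriov": True,  "cpu_pinning": True},
   ]

   def eligible_hosts(vmd, chds):
       """Return hosts whose advertised capabilities cover the VM requirements."""
       return [chd["host"] for chd in chds
               if chd["free_vcpus"] >= vmd["vcpus"]
               and all(chd.get(feature) for feature in
                       ("hugepages", "sriov", "cpu_pinning") if vmd.get(feature))]

   print(eligible_hosts(vm_descriptor, compute_host_descriptors))  # ['host-b']

Real descriptors are far richer; the point is only that portability hinges on both sides describing features in a comparable way.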

4.3. Performance Improvement

4.3.1. Energy Efficiency

Virtualization is typically seen as a direct enabler of energy savings. Some of the enablers for this that are often mentioned [nfv_sota_research_challenges] are (i) the multiplexing gains achieved by centralizing functions in data centers, which reduce the overall energy consumed, and (ii) the flexibility brought by network programmability, which makes it much easier to switch off infrastructure as needed. However, there is still a lot of room for improvement in terms of virtualization techniques to reduce power consumption, such as enhanced-hypervisor technologies. Some additional examples of research topics that could enable energy savings are [nfv_sota_research_challenges]:

   o  Energy-aware scaling (e.g., reductions in CPU speeds and
      partially turning off some hardware components to meet a given
      energy consumption target).

   o  Energy-aware function placement (a simple sketch is given
      below).

   o  Scheduling and chaining algorithms, for example, adapting the
      network topology and operating parameters to minimize the
      operation cost (e.g., tracking energy costs to identify the
      cheapest prices).

Note that it is also important to analyze the trade-off between energy efficiency and network performance.
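A minimal sketch of energy-aware function placement, under stated assumptions: each candidate host advertises an idle and a per-core power figure (illustrative numbers), and the placement picks the host with the lowest incremental power draw that still has capacity. It ignores network cost and many other real constraints.

   # Illustrative energy-aware placement; power figures and fields are assumptions.
   hosts = [
       {"name": "h1", "free_cores": 8,  "active": False, "idle_w": 120, "per_core_w": 9},
       {"name": "h2", "free_cores": 16, "active": True,  "idle_w": 150, "per_core_w": 8},
   ]

   def incremental_power(host, cores):
       # Waking an idle host pays its idle power; an active host only adds per-core power.
       wake_cost = 0 if host["active"] else host["idle_w"]
       return wake_cost + cores * host["per_core_w"]

   def place(cores):
       candidates = [h for h in hosts if h["free_cores"] >= cores]
       return min(candidates, key=lambda h: incremental_power(h, cores))["name"] if candidates else None

   print(place(4))   # -> 'h2': consolidating on the active host avoids waking h1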

4.3.2. Improved Link Usage

The use of NFV and SDN technologies can help improve link usage. SDN has already shown that it can greatly increase average link utilization (e.g., Google example [google_sdn_wan]). NFV adds more complexity (e.g., due to service-function chaining / VNF forwarding graphs), which needs to be considered. Aspects like the ones described in [NFVRG-TOPO] (on NFV data center topology design) have to be looked at carefully as well.

4.4. Multiple Domains

Market fragmentation has resulted in a multitude of network operators, each focused on different countries and regions. This makes it difficult to create infrastructure services spanning multiple countries, such as virtual connectivity or compute resources, as no single operator has a footprint everywhere. Cross-domain orchestration of services over multiple administrations, or over multi-domain single administrations, will allow end-to-end network and service elements to mix in multi-vendor, heterogeneous technology, and resource environments [multi-domain_5GEx]. For the specific use case of 'Network as a Service', it becomes even more important to ensure that cross-domain orchestration also takes care of the hierarchy of networks and their associations, with respect to provisioning tunnels and overlays.

Multi-domain orchestration is currently an active research topic, which is being tackled, among others, by the ETSI NFV ISG and the 5GEx project <https://www.5gex.eu/> [MULTI-NMRG] [multi-domain_5GEx]. Another side of the multi-domain problem is the integration/harmonization of different management domains. A key example comes from Multi-access Edge Computing, which, according to ETSI, comes with its own MANO system and would require integration if interconnected to a generic NFV system.

4.5. 5G and Network Slicing

From the beginning of all 5G discussions in the research and industry fora, it has been agreed that 5G will have to address many more use cases than the preceding wireless generations, which first focused on voice services and then on voice and high-speed packet data services. In this case, 5G should be able to handle not only the same (or enhanced) voice and packet data services, but also emerging services like the tactile Internet and the Internet of Things (IoT). These use cases take the requirements to opposite extremes, as some of them require ultra-low latency and high speed, whereas others require ultra-low power consumption and high delay tolerance.

Because of these very extreme 5G use cases, it is envisioned that selective combinations of radio access networks and core network components will have to be combined into a given network slice to address the specific requirements of each use case. For example, within the major IoT category, which is perhaps the most disruptive one, some autonomous IoT devices will have very low throughput, much longer sleep cycles (and therefore high latency), and a battery lifetime exceeding that of smartphones by a factor of thousands, while some other devices will have almost continuous control and data communications. Hence, it is envisioned that a customized network slice will have to be stitched together from virtual resources or sub-slices to meet these requirements.

   The actual definition of a "network slice" from an IP infrastructure
   viewpoint is currently undergoing intense debate; see [COMS-PS],
   [NETSLICES], [SLICE-3GPP], and [ngmn_5G_whitepaper].  Network slicing
   is a key for introducing new actors in existing markets at a low cost
   -- by letting new players rent "blocks" of capacity, if the new
   business model enables performance that meets the application needs
   (e.g., broadcasting updates to many sensors with satellite
   broadcasting capabilities).  However, more work needs to be done to
   define the basic architectural approach of how network slices will be
   defined and formed.  For example, is it mostly a matter of defining
   the appropriate network models (e.g., YANG) to stitch the network
   slice from existing components?  Or do end-to-end timing,
   synchronization, and other low-level requirements mean that more
   fundamental research has to be done?
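As one way to make the "stitching" question concrete, the sketch below models a slice as a composition of sub-slices with a per-slice requirement. The field names and values are illustrative assumptions, not a proposed YANG model or any standardized slice definition.

   # Illustrative slice composition; not a standardized information model.
   from dataclasses import dataclass, field
   from typing import List

   @dataclass
   class SubSlice:
       domain: str          # e.g., "ran", "transport", "core"
       resources: dict      # abstracted resources committed by that domain

   @dataclass
   class NetworkSlice:
       name: str
       latency_ms: float    # end-to-end requirement the composition must meet
       sub_slices: List[SubSlice] = field(default_factory=list)

   iot_slice = NetworkSlice(
       name="massive-iot",
       latency_ms=500.0,
       sub_slices=[
           SubSlice("ran", {"prbs": 10}),
           SubSlice("core", {"vcpus": 4, "sessions": 1_000_000}),
       ],
   )

Whether such a data-model view is sufficient, or whether end-to-end timing and synchronization require more fundamental work, is exactly the open question raised above.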

4.5.1. Virtual Network Operators

The widespread use/discussion/practice of system and network virtualization technologies has led to new business opportunities, enlarging the offer of IT resources with virtual network and computing resources, among others. As a consequence, the network ecosystem now differentiates between the owner of physical resources, the Infrastructure Provider (InP), and the intermediary that composes and delivers network services to the final customers, the Virtual Network Operator (VNO).

VNOs aim to exploit the virtualized infrastructures to deliver new and improved services to their customers. However, current network virtualization techniques offer poor support for VNOs to control their resources. It has been considered that the InP is responsible for the reliability of the virtual resources, but there are several situations in which a VNO requires finer control over its resources. For instance, dynamic events, such as the identification of new requirements or the detection of incidents within the virtual system, might urge a VNO to quickly reform its virtual infrastructure and resource allocation. However, the interfaces offered by current virtualization platforms do not offer the necessary functions for VNOs to perform the elastic adaptations they need to conduct in dynamic environments.
   Beyond their heterogeneity, which can be resolved by software
   adapters, current virtualization platforms do not have common methods
   and functions, so it is difficult for the virtual network controllers
   used by the VNOs to actually manage and control virtual resources
   instantiated on different platforms, not even considering different
   InPs.  Therefore, it is necessary to reach a common definition of the
   functions that should be offered by underlying platforms to give such
   overlay controllers the possibility to allocate and deallocate
   resources dynamically and get monitoring data about them.

   Such common methods should be offered by all underlying controllers,
   regardless of being network-oriented (e.g., ODL, ONOS, and Ryu) or
   computing-oriented (e.g., OpenStack, OpenNebula, and Eucalyptus).
   Furthermore, it is important for those platforms to offer some "PUSH"
   function to report resource state, avoiding the need for the VNO's
   controller to "POLL" for such data.  A starting point to get proper
   notifications within current REST APIs could be to consider the
   protocol proposed by the WEBPUSH WG [RFC8030].
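A minimal sketch of the difference in interaction style, assuming a generic callback-based notification registry; this is not an implementation of the WEBPUSH protocol [RFC8030], and the class and method names are illustrative.

   # Illustrative push-style monitoring; not an implementation of RFC 8030.
   class ResourceMonitor:
       def __init__(self):
           self.state = {}          # resource id -> last reported state
           self.subscribers = []    # callbacks registered by VNO controllers

       def subscribe(self, callback):
           # "PUSH" model: the VNO controller registers once and is notified
           # on every change, instead of polling the platform periodically.
           self.subscribers.append(callback)

       def update(self, resource_id, new_state):
           if self.state.get(resource_id) != new_state:
               self.state[resource_id] = new_state
               for cb in self.subscribers:
                   cb(resource_id, new_state)

   monitor = ResourceMonitor()
   monitor.subscribe(lambda rid, st: print("notify VNO controller:", rid, st))
   monitor.update("vnf-fw-1", {"cpu": 0.92})   # triggers a single push notification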

   Finally, in order to establish a proper order and allow the
   coexistence and collaboration of different systems, a common ontology
   regarding network and system virtualization should be defined and
   agreed upon, so different and heterogeneous systems can understand
   each other without requiring reliance on specific adaptation
   mechanisms that might break with any update on any side of the
   relation.

4.5.2. Extending Virtual Networks and Systems to the Internet of Things

The Internet of Things (IoT) refers to the vision of connecting a multitude of automated devices (e.g., lights, environmental sensors, traffic lights, parking meters, health and security systems, etc.) to the Internet for purposes of reporting and remote command and control of the device. This vision is being realized by a multi-pronged approach of standardization in various forums and complementary open-source activities. For example, in the IETF, support for IoT web services has been defined by an HTTP-like protocol adapted for IoT called "CoAP" [RFC7252]; and, lately, a group has been studying the need to develop a new network layer to support IP applications over Low-Power Wide Area Networks (LPWAN).

Elsewhere, for 5G cellular evolution, there is much discussion on the need for supporting virtual network slices for the expected massive numbers of IoT devices. A separate virtual network slice is considered necessary for different 5G IoT use cases because these devices will have very different characteristics than typical cellular devices like smartphones [ngmn_5G_whitepaper], and the number of IoT devices is expected to be at least one or two orders of magnitude higher than that of other 5G devices (see Section 4.5).

   The specific nature of the IoT ecosystem, particularly reflected in
   the Machine-to-Machine (M2M) communications, leads to the creation of
   new and highly distributed systems which demand location-based
   network and computing services.  A specific example can be
   represented by a set of "things" that suddenly require the setup of a
   firewall to allow external entities to access their data while
   outsourcing some computation requirements to more powerful systems
   relying on cloud-based services.  This representative use case
   exposes important requirements for both NFV and the underlying cloud
   infrastructures.

   In order to provide the aforementioned location-based functions
   integrated with highly distributed systems, the so-called fog
   infrastructures should be able to instantiate VNFs, placing them in
   the required place, e.g., close to their consumers.  This requirement
   implies that the interfaces offered by virtualization platforms must
   support the specification of location-based resources, which is a key
   function in those scenarios.  Moreover, those platforms must also be
   able to interpret and understand the references used by IoT systems
   to their location (e.g., "My-AP" or "5BLDG+2F") and also the
   specification of identifiers linked to other resources, such as the
   case of requiring the infrastructure to establish a link between a
   specific Access Point (AP) and a specific virtual computing node.  In
   summary, the research gap is the exact localization of VNFs at the
   far network edge infrastructure, which is highly distributed and
   dynamic.
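A minimal sketch of what interpreting such location references could look like, assuming an illustrative mapping from IoT-style location labels (e.g., "My-AP", "5BLDG+2F") to candidate fog nodes; the labels, node names, and interface are assumptions, not taken from any standard.

   # Illustrative location-aware VNF placement for fog/edge infrastructure.
   # Location labels, node names, and fields are assumptions, not a standard.
   location_index = {
       "My-AP":    ["fog-node-ap1"],                   # node attached to a specific AP
       "5BLDG+2F": ["fog-node-b5f2", "fog-node-b5f3"], # nodes serving building 5, floor 2
   }

   fog_nodes = {
       "fog-node-ap1":  {"free_vcpus": 1},
       "fog-node-b5f2": {"free_vcpus": 4},
       "fog-node-b5f3": {"free_vcpus": 0},
   }

   def place_vnf(location_ref, vcpus_needed):
       """Resolve an IoT-style location reference and pick a node with capacity."""
       for node in location_index.get(location_ref, []):
           if fog_nodes[node]["free_vcpus"] >= vcpus_needed:
               return node
       return None   # no suitable node close enough to the consumer

   print(place_vnf("5BLDG+2F", 2))   # -> 'fog-node-b5f2'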

4.6. Service Composition

Current network services deployed by operators often involve the composition of several individual functions (such as packet filtering, deep-packet inspection, and load balancing). These services are typically implemented by the ordered combination of a number of service functions that are deployed at different points within a network, not necessarily on the direct data path. This requires traffic to be steered through the required service functions, wherever they are deployed [RFC7498].

For a given service, the abstracted view of the required service functions and the order in which they are to be applied is called "Service Function Chaining" (SFC) [sfc_challenges]; in ETSI, this is called a "Network Function Forwarding Graph" (NF-FG). An SFC is instantiated through the selection of specific service function instances on specific network nodes to form a service graph: this is called a "Service Function Path" (SFP). The service functions may be applied at any layer within the network protocol stack (network layer, transport layer, application layer, etc.).

   Service composition is a powerful means that can provide significant
   benefits when applied in a softwarized network environment.  However,
   there are many research challenges in this area; for example, the
   ones related to composition mechanisms and algorithms to enable load-
   balancing and improve reliability.  Service composition should also
   act as an enabler for gathering information across all hierarchies
   (underlays and overlays) of network deployments, which may span
   multiple operators, for faster serviceability, thus facilitating the
   aforementioned goals of load balancing and improved reliability.

   As described in [dynamic_chaining], different algorithms can be used
   to enable dynamic service composition that optimizes a QoS-based
   utility function (e.g., minimizing the latency of per-application
   traffic flows) for a given composition plan.  Such algorithms can
   consider the computation capabilities and load status of resources
   executing the VNF instances, either deduced through estimations from
   historical usage data or collected through real-time monitoring
   (i.e., context-aware selection).  For this reason, selections should
   include references to dynamic information on the status of the
   service instance and its constituent elements, i.e., monitoring
   information related to individual VNF instances and links connecting
   them as well as derived monitoring information at the chain level
   (e.g., end-to-end delay).  At runtime, if one or more VNF instances
   are no longer available or QoS degrades below a given threshold, the
   service selection task can be rerun to perform service substitution.
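A minimal sketch of such context-aware selection, assuming per-instance latency figures obtained from monitoring (the instance names and values are made up) and a brute-force search over the candidate instances for each function in the chain.

   # Illustrative context-aware chain selection minimizing end-to-end latency.
   # Candidate instances and their monitored latencies (ms) are made-up values.
   from itertools import product

   candidates = {
       "firewall": {"fw-1": 3.0, "fw-2": 1.5},
       "dpi":      {"dpi-1": 6.0, "dpi-2": 8.5},
       "loadbal":  {"lb-1": 1.0},
   }

   def best_chain(chain):
       """Brute-force search: pick one instance per function, minimize total latency."""
       options = [list(candidates[f].items()) for f in chain]
       best = min(product(*options), key=lambda combo: sum(lat for _, lat in combo))
       return [name for name, _ in best], sum(lat for _, lat in best)

   path, latency = best_chain(["firewall", "dpi", "loadbal"])
   print(path, latency)   # -> ['fw-2', 'dpi-1', 'lb-1'] 8.5

At runtime, the same search can be rerun with refreshed monitoring data whenever an instance disappears or its latency degrades, which corresponds to the service substitution described above.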

   There are different research directions that relate to the previous
   point.  For example, the use of Integer Linear Programming (ILP)
   techniques can be explored to optimize the management of diverse
   traffic flows.  Deep machine learning can also be applied to optimize
   service chains using information parameters, such as some of the ones
   mentioned above.  Newer scheduling paradigms, like co-flows, can also
   be used.

   The SFC working group is working on an architecture for SFC [RFC7665]
   that includes the necessary protocols or protocol extensions to
   convey the SFC and SFP information to nodes that are involved in the
   implementation of service functions and SFCs as well as mechanisms
   for steering traffic through service functions.

   In terms of actual work items, the SFC WG has not yet considered
   working on the management and configuration of SFC components related
   to the support of SFC.  This part is of special interest for
   operators and would be required in order to actually put SFC
   mechanisms into operation.  Similarly, redundancy and reliability
   mechanisms for SFC are currently not dealt with by any WG in the
   IETF.  While this was the main goal of the VNFpool BoF efforts, it
   still remains unaddressed.

4.7. Device Virtualization for End Users

So far, most of the network softwarization efforts have focused on virtualizing functions of network elements. While virtualization of network elements started with the core, mobile-network architectures are now heavily switching to also virtualize Radio Access Network (RAN) functions. The next natural step is to bring virtualization down to the level of the end-user device (e.g., virtualizing a smartphone) [virtualization_mobile_device]. Cloning a device in the cloud (central or local) bears attractive benefits to the device and network operations alike, e.g., power savings at the device by offloading computationally heavy functions to the cloud, and optimized networking -- both device-to-device and device-to-infrastructure -- for service delivery through tighter integration of the device, via its clone, with the networking infrastructure. This is, for example, being explored by the European H2020 ICIRRUS project <https://www.icirrus-5gnet.eu>.

4.8. Security and Privacy

Similar to any other situation where resources are shared, security and privacy are two important aspects that need to be taken into account.

In the case of security, there are situations where multiple service providers will need to coexist in a virtual or hybrid physical/virtual environment. This requires attestation procedures amongst different virtual/physical functions and resources as well as ongoing external monitoring. Similarly, different network slices operating on the same infrastructure can present security problems, for instance, if one slice running critical applications (e.g., support for a safety system) is affected by another slice running a less critical application. In general, the minimum common denominator for security measures on a shared system should be equal to or higher than the one required by the most critical application.

Multiple and continuous threat model analyses as well as a DevOps model are required to maintain a certain level of security in an NFV system. Simplistically, DevOps is a process that combines multiple functions into single cohesive teams in order to quickly produce quality software. Typically, it relies on also applying the Agile development process, which focuses on (among many things) dividing large features into multiple, smaller deliveries. One part of this is to immediately test the new, smaller features in order to get immediate feedback on errors so that, if present, they can be immediately fixed and redeployed.

   On the other hand, privacy refers to concerns about the control of
   personal data and the decision of what to reveal to whom.  In this
   case, the storage, transmission, collection, and potential
   correlation of information in the NFV system, for purposes not
   originally intended or not known by the user, should be avoided.
   This is particularly challenging, as future intentions and threats
   cannot be easily predicted, yet they may still be applied to data
   collected in the past.  Therefore, well-known techniques, such as
   data minimization, using privacy features by default, and allowing
   users to opt in/out, should be used to prevent potential privacy
   issues.

   Compared to traditional networks, NFV will result in networks that
   are much more dynamic (in function distribution and topology) and
   elastic (in size and boundaries).  Thus, NFV will require network
   operators to evolve their operational and administrative security
   solutions to work in this new environment.  For example, in NFV, the
   network orchestrator will become a key node to provide security
   policy orchestration across the different physical and virtual
   components of the virtualized network.  For highly confidential data,
   for example, the network orchestrator should take into account if
   certain physical HW of the network is considered to be more secure
   (e.g., because it is located in secure premises) than other HW.

   Traditional telecom networks typically run under a single
   administrative domain controlled by (exactly) one operator.  With
   NFV, it is expected that in many cases, the telecom operator will now
   become a tenant (running the VNFs), and the infrastructure (NFVI) may
   be run by a different operator and/or cloud service provider (see
   also Section 4.4).  Thus, there will be multiple administrative
   domains involved, making security policy coordination more complex.
   For example, who will be in charge of provisioning and maintaining
   security credentials such as public and private keys?  Also, should
   private keys be allowed to be replicated across the NFV for
   redundancy reasons?  Alternatively, it can be investigated how to
   develop a mechanism that avoids such a security policy coordination,
   thus making the system more robust.

   On a positive note, NFV may better defend against denial-of-service
   (DoS) attacks because of the distributed nature of the network (i.e.,
   no single point of failure) and the ability to steer (undesirable)
   traffic quickly [etsi_gs_nfv_sec_001].  In addition, NFV deployments
   whose physical HW is distributed across multiple data centers will
   provide better fault isolation environments.  This holds true
   particularly if each data center is protected separately via
   firewalls, Demilitarized Zones (DMZs), and other network-protection
   techniques.

   SDN can also be used to help improve security by facilitating the
   operation of existing protocols, such as Authentication,
   Authorization and Accounting (AAA).  The management of AAA
   infrastructures, namely the management of AAA routing and the
   establishment of security associations between AAA entities, can be
   performed using SDN, as analyzed in [SDN-AAA].

4.9. Separation of Control Concerns

NFV environments offer two possible levels of SDN control. One level is the need for controlling the NFVI to provide end-to-end connectivity among VNFs or among VNFs and Physical Network Functions (PNFs). A second level is the control and configuration of the VNFs themselves (in other words, the configuration of the network service implemented by those VNFs), taking advantage of the programmability brought by SDN. The two control concerns are separate in nature. However, interaction between them can be expected in order to optimize, scale, or influence each other. Clear mechanisms for such interactions are needed in order to avoid malfunctioning or interference concerns. These ideas are considered in [etsi_gs_nfv_eve005] and [LAYERED-SDN].

4.10. Network Function Placement

Network function placement is a problem in any kind of telecommunications network infrastructure. Moreover, the increased degree of freedom added by network virtualization makes this problem even more important, and also harder to tackle. Deciding where to place VNFs is a resource-allocation problem that needs to (or may) take into consideration quite a few aspects: resiliency, (anti-)affinity, security, privacy, energy efficiency, etc. When several functions are chained (a typical scenario), placement algorithms become more complex and more important (as described in Section 4.6). While there has been research on the topic ([nfv_piecing], [dynamic_placement], and [vnf-p]), this still remains an open challenge that requires more attention. The use of multiple domains adds another component of complexity to this problem that has to be considered.
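A minimal sketch of the flavor of constraint involved, assuming a greedy first-fit placement that respects host capacity and a pairwise anti-affinity rule; all names and numbers are illustrative.

   # Illustrative greedy placement with capacity and anti-affinity constraints.
   hosts = {"h1": {"free_vcpus": 8, "vnfs": []},
            "h2": {"free_vcpus": 8, "vnfs": []}}

   vnfs = [("fw-primary", 4), ("fw-backup", 4), ("dpi", 2)]
   anti_affinity = {frozenset({"fw-primary", "fw-backup"})}   # keep replicas apart

   def violates_anti_affinity(vnf, host):
       return any(frozenset({vnf, placed}) in anti_affinity for placed in host["vnfs"])

   def place_all():
       placement = {}
       for vnf, vcpus in vnfs:
           for name, host in hosts.items():
               if host["free_vcpus"] >= vcpus and not violates_anti_affinity(vnf, host):
                   host["free_vcpus"] -= vcpus
                   host["vnfs"].append(vnf)
                   placement[vnf] = name
                   break
       return placement

   print(place_all())   # -> {'fw-primary': 'h1', 'fw-backup': 'h2', 'dpi': 'h1'}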

4.11. Testing

The impacts of network virtualization on testing can be divided into three groups:

   1.  Changes in methodology

   2.  New functionality

   3.  Opportunities

4.11.1. Changes in Methodology

The largest impact of NFV is the ability to isolate the System Under Test (SUT). When testing PNFs, isolating the SUT means that all the other devices that the SUT communicates with are replaced with simulations (or controlled executions) in order to place the SUT under test by itself. The SUT may consist of one or more devices. The simulations use the appropriate traffic types and protocols in order to execute the test cases.

As shown in Figure 2, NFV provides a common architecture for all functions to use. A VNF is executed using resources offered by the NFVI, which have been allocated using the MANO function. It is not possible to test a VNF by itself, without the entire supporting environment present. This fundamentally changes how to consider the SUT. In the case of a VNF (or multiple VNFs), the SUT is part of a larger architecture that is necessary in order to run the SUTs. Therefore, isolation of the SUT becomes a matter of controlling the environment in a disciplined manner. The components of the environment that are necessary to run the SUTs but are not part of the SUT itself become the test environment. In the case of VNFs that are part of the SUT, the NFVI and MANO become the test environment.

The configurations and policies that guide the test environment should remain constant during the execution of the tests, and also from test to test. Configurations such as CPU pinning, NUMA configuration, and the SW versions and configurations of the hypervisor, vSwitch, and NICs should remain constant. The only variables in the testing should be those controlling the SUT itself. If any configuration in the test environment is changed from test to test, the results become very difficult, if not impossible, to compare, since the test environment behavior may change the results as a consequence of the configuration change.
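A minimal sketch of enforcing that constancy, assuming the relevant test-environment settings can be collected into a dictionary and fingerprinted between runs; the setting names and version strings are illustrative placeholders.

   # Illustrative check that the test environment is identical between test runs.
   # Setting names are assumptions; a real harness would query the NFVI/MANO.
   import hashlib, json

   def environment_fingerprint(env: dict) -> str:
       """Stable hash of the test-environment configuration."""
       return hashlib.sha256(json.dumps(env, sort_keys=True).encode()).hexdigest()

   baseline_env = {
       "cpu_pinning": True,
       "numa_nodes": 2,
       "hypervisor": "kvm-x.y",        # placeholder version string
       "vswitch": "ovs-dpdk-x.y",
       "nic_firmware": "fw-x.y",
   }

   baseline = environment_fingerprint(baseline_env)

   def assert_unchanged(current_env: dict):
       # Results from two runs are only comparable if the fingerprints match.
       if environment_fingerprint(current_env) != baseline:
           raise RuntimeError("test environment drifted; results are not comparable")

   assert_unchanged(dict(baseline_env))   # passes: nothing changed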
Testing the NFVI itself also presents new considerations. With a PNF, the dedicated hardware supporting it is optimized for the particular workload of the function: routing hardware is specially built to support packet-forwarding functions, while the hardware to support a purely control-plane application (say, a DNS server or a Diameter function) will not have this specialized capability. In NFV, the NFVI is required to support all potentially different workload types.

   Therefore, testing the NFVI requires careful consideration about what
   types of metrics are sought.  This, in turn, depends on the workload
   type the expected VNF will be.  Examples of different workload types
   are data forwarding, control plane, encryption, and authentication.
   All these types of expected workloads will determine the types of
   metrics that should be sought.  For example, if the workload is
   control plane, then a metric such as jitter is not useful, but
   dropped packets are critical.  In a multi-tenant environment, the
   NFVI could support various types of workloads.  In this case, testing
   with a variety of traffic types while measuring the corresponding
   metrics simultaneously becomes necessary.
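A minimal sketch of how the expected workload type could drive metric selection in a test harness; the workload categories follow the examples above, while the metric lists themselves are illustrative assumptions.

   # Illustrative mapping from expected workload type to the metrics worth measuring.
   metrics_by_workload = {
       "data_forwarding": ["throughput", "latency", "jitter", "packet_loss"],
       "control_plane":   ["transaction_rate", "packet_loss", "failed_requests"],
       "encryption":      ["throughput", "cpu_utilization", "handshakes_per_s"],
       "authentication":  ["transaction_rate", "latency", "failure_rate"],
   }

   def metrics_for(workloads):
       """For a multi-tenant NFVI, measure the union of metrics for all hosted workloads."""
       selected = set()
       for w in workloads:
           selected.update(metrics_by_workload.get(w, []))
       return sorted(selected)

   print(metrics_for(["data_forwarding", "control_plane"]))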

   Test beds for any type of testing for an NFV-based system will be
   largely similar to previously used test architectures.  The methods
   are impacted by virtualization, as described above, but the design of
   test beds is similar to that used in the past.  There are two main
   new considerations:

   o  Since networking is based on software, which has led to greater
      automation in deployment, the test system should also be
      deployable with the rest of the system in order to fully automate
      the system.  This is especially relevant in a DevOps environment
      supported by a Continuous Integration and Continuous Deployment
      (CI/CD) tool chain (see Section 4.11.3 below).

   o  In any performance test bed, the test system should not share the
      same resources as the SUT.  While multi-tenancy is a reality in
      virtualization, having the test system share resources with the
      SUT will impact the measured results in a performance test bed.
      The test system should be deployed on a separate platform in order
      not to impact the resources available to the SUT.

4.11.2. New Functionality

NFV presents a collection of new functionality in order to support the goal of software networking. Each component of the architecture shown in Figure 2 has an associated set of functionality that allows VNFs to run: onboarding, life-cycle management for VNFs and Network Services (NS), resource allocation, hypervisor functions, etc.
   One of the new capabilities enabled by NFV is VNF Forwarding Graphs
   (VNFFG).  This refers to the graph that represents a network service
   by chaining together VNFs into a forwarding path.  In practice, the
   forwarding path can be implemented in a variety of ways using
   different networking capabilities: vSwitch, SDN, and SDN with a
   northbound application.  Additionally, the VNFFG might use tunneling
   protocols like Virtual eXtensible Local Area Network (VXLAN).  The
   dynamic allocation and implementation of these networking paths will
   have different performance characteristics depending on the methods
   used.  The path implementation mechanism becomes a variable in the
   network testing of the NSs.  The methodology used to test the various
   mechanisms should largely remain the same; as usual, the test
   environment should remain constant for each of the tests, focusing on
   varying the path establishment method.

   "Scaling" refers to the change in allocation of resources to a VNF or
   NS.  It happens dynamically at run-time, based on defined policies
   and triggers.  The triggers can be network, compute, or storage
   based.  Scaling can allocate more resources in times of need, or
   reduce the amount of resources allocated when the demand is reduced.
   The SUT in this case becomes much larger than the VNF itself: MANO
   controls how scaling is done based on policies, and then allocates
   the resources accordingly in the NFVI.  Essentially, the testing of
   scaling includes the entire NFV architecture components into the SUT.
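A minimal sketch of policy-driven scaling, assuming illustrative trigger thresholds on monitored network, compute, and storage metrics; none of these names or values come from the MANO specifications.

   # Illustrative scaling policy; thresholds, metric names, and actions are assumptions.
   policy = {
       "scale_out": {"cpu_util": 0.80, "session_rate": 9000, "disk_util": 0.90},
       "scale_in":  {"cpu_util": 0.20, "session_rate": 1000, "disk_util": 0.30},
   }

   def scaling_decision(metrics: dict) -> str:
       """Compare monitored metrics against the policy and return an action for MANO."""
       if any(metrics[m] >= t for m, t in policy["scale_out"].items()):
           return "scale_out"        # allocate more resources to the VNF/NS
       if all(metrics[m] <= t for m, t in policy["scale_in"].items()):
           return "scale_in"         # release resources when demand is low
       return "no_action"

   print(scaling_decision({"cpu_util": 0.85, "session_rate": 4000, "disk_util": 0.50}))
   # -> 'scale_out' (the CPU trigger fired)

Testing such behavior effectively places the whole NFV architecture, not just the VNF, inside the SUT, as noted above.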

4.11.3. Opportunities

Softwarization of networking functionality leads to softwarization of testing as well. As PNFs are being transformed into VNFs, so are the test tools, which means that test tools are also controlled and executed in the same environment as the VNFs. This presents an opportunity to deploy VNF-based test tools, along with the VNFs supporting the service provider's services, into the host data centers. Therefore, tests can be automatically executed upon deployment in the target environment, for each deployment and each service. With PNFs, this was very difficult to achieve.

This new concept helps enable modern practices like DevOps and Continuous Integration and Continuous Deployment (CI/CD) in the NFV environment. The CI/CD pipeline supports this concept: it consists of a series of tools, among which immediate testing is an integral part, to deliver software from source to deployment. The ability to deploy the test tools themselves into the production environment stretches the CI/CD pipeline all the way to production deployment, allowing a range of tests to be executed. The tests can be simple, with a goal of verifying correct deployment and networking establishment, but they can also be more complex, like testing VNF functionality.


