RFC 8632

A YANG Data Model for Alarm Management

Pages: 82
Proposed Standard

Internet Engineering Task Force (IETF)                         S. Vallin
Request for Comments: 8632                              Stefan Vallin AB
Category: Standards Track                                   M. Bjorklund
ISSN: 2070-1721                                                    Cisco
                                                          September 2019


                 A YANG Data Model for Alarm Management

Abstract

   This document defines a YANG module for alarm management.  It
   includes functions for alarm-list management, alarm shelving, and
   notifications to inform management systems.  There are also
   operations to manage the operator state of an alarm and
   administrative alarm procedures.  The module carefully maps to
   relevant alarm standards.

Status of This Memo

   This is an Internet Standards Track document.

   This document is a product of the Internet Engineering Task Force
   (IETF).  It represents the consensus of the IETF community.  It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).  Further information on
   Internet Standards is available in Section 2 of RFC 7841.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   https://www.rfc-editor.org/info/rfc8632.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology and Notation
   2.  Objectives
   3.  Alarm Data Model Concepts
     3.1.  Alarm Definition
     3.2.  Alarm Type
     3.3.  Identifying the Alarming Resource
     3.4.  Identifying Alarm Instances
     3.5.  Alarm Lifecycle
       3.5.1.  Resource Alarm Lifecycle
       3.5.2.  Operator Alarm Lifecycle
       3.5.3.  Administrative Alarm Lifecycle
     3.6.  Root Cause, Impacted Resources, and Related Alarms
     3.7.  Alarm Shelving
     3.8.  Alarm Profiles
   4.  Alarm Data Model
     4.1.  Alarm Control
       4.1.1.  Alarm Shelving
     4.2.  Alarm Inventory
     4.3.  Alarm Summary
     4.4.  The Alarm List
     4.5.  The Shelved-Alarm List
     4.6.  Alarm Profiles
     4.7.  Operations
     4.8.  Notifications
   5.  Relationship to the ietf-hardware YANG Module
   6.  Alarm YANG Module
   7.  The X.733 Mapping Module
   8.  IANA Considerations
   9.  Security Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Vendor-Specific Alarm Types Example
   Appendix B.  Alarm Inventory Example
   Appendix C.  Alarm List Example
   Appendix D.  Alarm Shelving Example
   Appendix E.  X.733 Mapping Example
   Appendix F.  Relationship to Other Alarm Standards
     F.1.  Definition of "Alarm"
     F.2.  Data Model
       F.2.1.  X.733
       F.2.2.  The Alarm MIB (RFC 3877)
       F.2.3.  3GPP Alarm IRP
       F.2.4.  G.7710
   Appendix G.  Alarm-Usability Requirements
   Acknowledgements
   Authors' Addresses

1.  Introduction

   This document defines a YANG module [RFC7950] for alarm management.
   The purpose is to define a standardized alarm interface for network
   devices that can be easily integrated into management applications.
   The model is also applicable as a northbound alarm interface in the
   management applications.

   Alarm monitoring is a fundamental part of monitoring the network.
   Raw alarms from devices do not always tell the status of the network
   services or necessarily point to the root cause.  However, being able
   to feed alarms to the alarm-management application in a standardized
   format is a starting point for performing higher-level network
   assurance tasks.

   The design of the module is based on experience from using and
   implementing available alarm standards from ITU [X.733], 3GPP
   [ALARMIRP], and ANSI [ISA182].

1.1.  Terminology and Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   The following terms are defined in [RFC7950]:

   o  action

   o  client

   o  data tree

   o  server

   The following terms are used within this document:

   Alarm (the general concept):  An alarm signifies an undesirable state
      in a resource that requires corrective action.

   Fault:  A fault is the underlying cause of an undesired behavior.
      There is no trivial one-to-one mapping between faults and alarms.
      One fault may result in several alarms if the system lacks
      root-cause and correlation capabilities.  An alarm might not have
      an underlying fault as a cause.  For example, a Voice over IP
      (VoIP) probe might raise a bad Mean Opinion Score (MOS) alarm
      whose cause is a non-optimal QoS configuration.

   Alarm Type:  An alarm type identifies a possible unique alarm state
      for a resource.  Alarm types are names to identify the state like
      "link-alarm", "jitter-violation", and "high-disk-utilization".

   Resource:  A fine-grained identification of the alarming resource,
      for example, an interface or a process.

   Alarm Instance:  The alarm state for a specific resource and alarm
      type, for example, ("GigabitEthernet0/15", "link-alarm").  An
      entry in the alarm list.

   Cleared Alarm:  A cleared alarm is an alarm where the system
      considers the undesired state to be cleared.  Operators cannot
      clear alarms; clearance is managed by the system.  For example, a
      "linkUp" notification can be considered a clear condition for a
      "linkDown" state.

   Closed Alarm:  Operators can close alarms irrespective of the alarm
      being cleared or not.  A closed alarm indicates that the alarm
      does not need attention because either the corrective action has
      been taken or it can be ignored for other reasons.

   Alarm Inventory:  A list of all possible alarm types on a system.

   Alarm Shelving:  Blocking alarms according to specific criteria.

   Corrective Action:  An action taken by an operator or automation
      routine in order to minimize the impact of the alarm or resolve
      the root cause.

   Management System:  The alarm-management application that consumes
      the alarms, i.e., acts as a client.

   System:  The system that implements this YANG module, i.e., acts as a
      server.  This corresponds to a network device or a management
      application that provides a northbound alarm interface.

   Tree diagrams used in this document follow the notation defined in
   [RFC8340].

2.  Objectives

   The objectives for the design of the alarm data model are:

   o  Users find it simple to use.  If a system supports this module, it
      shall be straightforward to integrate it into a YANG-based alarm
      manager.

   o  Alarms are viewed as states on resources and not as discrete
      notifications.

   o  A precise definition of "alarm" is provided in order to exclude
      general events that should not be forwarded as alarm
      notifications.

   o  Precise identification of alarm types and alarm instances is
      provided.

   o  A management system should be able to pull all available alarm
      types from a system, i.e., read the alarm inventory from a system.
      This makes it possible to prepare alarm operators with
      corresponding alarm instructions.

   o  Alarm-usability requirements are addressed; see Appendix G.  While
      IETF and telecom standards have addressed alarms mostly from a
      protocol perspective, the process industry has published several
      relevant standards addressing requirements for a useful alarm
      interface; see [EEMUA] and [ISA182].  This document defines
      usability requirements as well as a YANG data model.

   o  Mapping to [X.733], which is a requirement for some alarm systems,
      is achievable.  Still, some of the X.733 concepts are kept out of
      the core model in order to make the model small and easy to
      understand.

3.  Alarm Data Model Concepts

   This section defines the fundamental concepts behind the data model.
   This section is rooted in the works of Vallin et al. [ALARMSEM].

3.1.  Alarm Definition

   An alarm signifies an undesirable state in a resource that requires
   corrective action.

   There are two main things to remember from this definition:

   1.  It focuses on leaving out events and logging information in
       general.  Alarms should only be used for undesired states that
       require action.

   2.  It also focuses on alarms as a state on a resource, not the
       notifications that report the state changes.

   See Appendix F for information on how this definition relates to
   other alarm standards.

3.2.  Alarm Type

   This document defines an alarm type with an alarm-type id and an
   alarm-type qualifier.

   The alarm-type id is modeled as a YANG identity.  With YANG
   identities, new alarm types can be defined in a distributed fashion.
   YANG identities are hierarchical, which means that a hierarchy of
   alarm types can be defined.

   Standards and vendors should define their own alarm-type identities
   based on this definition.

   The use of YANG identities means that all possible alarms are
   identified at design time.  This explicit declaration of alarm types
   makes it easier to allow for alarm qualification reviews and
   preparation of alarm actions and documentation.

   There are occasions where the alarm types are not known at design
   time.  An example is a system with digital inputs that allows users
   to connect detectors, such as smoke detectors, to the inputs.  In
   this case, it is a configuration action that says certain connectors
   are fire alarms, for example.

   In order to allow for dynamic addition of alarm types, the alarm data
   model permits further qualification of the identity-based alarm type
   using a string.  A potential drawback of this is that there is a
   significant risk that alarm operators will receive alarm types as a
   surprise.  They do not know how to resolve the problem since a
   defined alarm procedure does not necessarily exist.  To avoid this
   risk, the system MUST publish all possible alarm types in the alarm
   inventory; see Section 4.2.

   A vendor or standards organization can define their own alarm-type
   hierarchy.  The example below shows a hierarchy based on X.733 event
   types:

     import ietf-alarms {
       prefix al;
     }
     identity vendor-alarms {
       base al:alarm-type;
     }
     identity communications-alarm {
       base vendor-alarms;
     }
     identity link-alarm {
       base communications-alarm;
     }

   Alarm types can be abstract.  An abstract alarm type is used as a
   base for defining hierarchical alarm types.  Concrete alarm types are
   used for alarm states and appear in the alarm inventory.  There are
   two kinds of concrete alarm types:

   1.  The last subordinate identity in the "alarm-type-id" hierarchy is
       concrete, for example, "alarm-identity.environmental-
       alarm.smoke".  In this example, "alarm-identity" and
       "environmental-alarm" are abstract YANG identities, whereas
       "smoke" is a concrete YANG identity.

   2.  The YANG identity hierarchy is abstract, and the concrete alarm
       type is defined by the dynamic alarm-qualifier string, for
       example, "alarm-identity.environmental-alarm.external-detector"
       with alarm-type-qualifier "smoke".

   For example:

     // Alternative 1: concrete alarm type identity
     import ietf-alarms {
       prefix al;
     }
     identity environmental-alarm {
       base al:alarm-type;
       description "Abstract alarm type";
     }
     identity smoke {
       base environmental-alarm;
       description "Concrete alarm type";
     }

     // Alternative 2: concrete alarm type qualifier
     import ietf-alarms {
       prefix al;
     }
     identity environmental-alarm {
       base al:alarm-type;
       description "Abstract alarm type";
     }
     identity external-detector {
       base environmental-alarm;
       description
         "Abstract alarm type; a runtime configuration
          procedure sets the type of alarm detected.  This will
          be reported in the alarm-type-qualifier.";
     }

   A server SHOULD strive to minimize the number of dynamically defined
   alarm types.
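
   The distinction between abstract and concrete alarm types can be
   sketched outside YANG.  The Python fragment below is an illustration
   only and is not part of the module.  The "BASES" table mirrors the
   example identities above (both alternatives are merged into one table
   for brevity), and the names "INVENTORY", "derived_from_or_self", and
   "is_concrete" are hypothetical.

```python
# Hypothetical mirror of the example identities; each maps to its base.
BASES = {
    "alarm-type": None,                      # al:alarm-type (root)
    "environmental-alarm": "alarm-type",     # abstract
    "smoke": "environmental-alarm",          # Alternative 1
    "external-detector": "environmental-alarm",  # Alternative 2
}

# Concrete alarm types as (alarm-type-id, alarm-type-qualifier) pairs.
INVENTORY = {
    ("smoke", ""),                   # kind 1: concrete identity
    ("external-detector", "smoke"),  # kind 2: concrete via qualifier
}


def derived_from_or_self(identity, base):
    """Return True if identity equals base or derives from it."""
    while identity is not None:
        if identity == base:
            return True
        identity = BASES[identity]
    return False


def is_concrete(alarm_type_id, qualifier=""):
    """In this sketch, a type is concrete only if it is in the inventory."""
    return (alarm_type_id, qualifier) in INVENTORY
```

   In this sketch, "environmental-alarm" is a valid base for matching
   but is never reported as an alarm type on its own, because it does
   not appear in the inventory.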

3.3.  Identifying the Alarming Resource

   It is of vital importance to be able to refer to the alarming
   resource.  This reference must be as fine-grained as possible.  If
   the alarming resource exists in the data tree, an instance-identifier
   MUST be used with the full path to the object.

   When the module is used in a controller/orchestrator/manager, the
   original device resource identification can be modified to include
   the device in the path.  The details depend on how devices are
   identified and are out of scope for this specification.

   Example:

      The original device alarm might identify the resource as
      "/dev:interfaces/dev:interface[dev:name='FastEthernet1/0']".

      The resource identification in the manager could look something
      like: "/mgr:devices/mgr:device[mgr:name='xyz123']/dev:interfaces/
      dev:interface[dev:name='FastEthernet1/0']"
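
   The rewriting in this example can be sketched as a simple prefix
   operation.  The following Python fragment is a minimal illustration;
   the function name "to_manager_path" is hypothetical, and the "mgr"
   prefix is taken from the example above rather than from any defined
   module.  It does not escape quote characters in device names.

```python
def to_manager_path(device_name, device_path):
    """Prefix a device-local instance-identifier with the manager's
    entry for that device (the layout follows the example only)."""
    if not device_path.startswith("/"):
        raise ValueError("expected an absolute instance-identifier")
    return f"/mgr:devices/mgr:device[mgr:name='{device_name}']{device_path}"


device_path = "/dev:interfaces/dev:interface[dev:name='FastEthernet1/0']"
manager_path = to_manager_path("xyz123", device_path)
```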

   This module also allows for alternate naming of the alarming resource
   if it is not available in the data tree.

3.4.  Identifying Alarm Instances

   A primary goal of the alarm data model is to remove any ambiguity in
   how alarm notifications are mapped to an update of an alarm instance.
   The X.733 [X.733] and 3GPP [ALARMIRP] documents were not clear on
   this point.  This alarm data model states that the tuple (resource,
   alarm-type identifier, and alarm-type qualifier) corresponds to a
   single alarm instance.  This means that alarm notifications for the
   same resource and same alarm type are matched to update the same
   alarm instance.  These three leafs are therefore used as the key in
   the alarm list:

     list alarm {
       key "resource alarm-type-id alarm-type-qualifier";
       ...
     }
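
   The effect of this key can be sketched with a dictionary keyed by the
   same three values.  The Python fragment below is an illustration
   only; "apply_notification" is a hypothetical helper, and the alarm
   texts are invented for the example.

```python
# key: (resource, alarm-type-id, alarm-type-qualifier)
alarm_list = {}


def apply_notification(resource, alarm_type_id, qualifier, severity, text):
    """Create the alarm instance on first sight, otherwise update it."""
    key = (resource, alarm_type_id, qualifier)
    alarm = alarm_list.setdefault(key, {})
    alarm["perceived-severity"] = severity
    alarm["alarm-text"] = text
    return alarm


# Two notifications for the same key update one instance.
apply_notification("GigabitEthernet0/15", "link-alarm", "",
                   "major", "link down")
apply_notification("GigabitEthernet0/15", "link-alarm", "",
                   "critical", "link down, no backup path")
# A different resource yields a second instance.
apply_notification("GigabitEthernet0/16", "link-alarm", "",
                   "major", "link down")
```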

3.5.  Alarm Lifecycle

   The alarm model clearly separates the resource alarm lifecycle from
   the operator and administrative lifecycles of an alarm.

   o  resource alarm lifecycle: the alarm instrumentation that controls
      alarm raise, clearance, and severity changes.

   o  operator alarm lifecycle: operators acting upon alarms with
      actions like acknowledging and closing.  Closing an alarm implies
      that the operator considers the corrective action performed.
      Operators can also shelve (block/filter) alarms in order to avoid
      nuisance alarms.

   o  administrative alarm lifecycle: purging (deleting) unwanted alarms
      and compressing the alarm status-change list.  This module exposes
      operations to manage the administrative lifecycle.  The server may
      also perform these operations based on other policies, but how
      that is done is out of scope for this document.

   A server SHOULD describe how long it retains cleared/closed alarms:
   until they are manually purged or according to an automatic removal
   policy.  How this is done is outside the scope of this document.

3.5.1.  Resource Alarm Lifecycle

   From a resource perspective, an alarm can, for example, have the
   following lifecycle: raise, change severity, change severity, clear,
   being raised again, etc.  All of these status changes can have
   different alarm texts generated by the instrumentation.  Two
   important things to note:

   1.  Alarms are not deleted when they are cleared.  Deleting alarms is
       an administrative process.  The "ietf-alarms" YANG module defines
       an action "purge-alarms" that deletes alarms.

   2.  Alarms are not cleared by operators; only the underlying
       instrumentation can clear an alarm.  Operators can close alarms.

   The YANG tree representation below illustrates the resource-oriented
   lifecycle:

     +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
        ...
        +--ro is-cleared                 boolean
        +--ro last-raised                yang:date-and-time
        +--ro last-changed               yang:date-and-time
        +--ro perceived-severity         severity
        +--ro alarm-text                 alarm-text
        +--ro status-change* [time] {alarm-history}?
           +--ro time                    yang:date-and-time
           +--ro perceived-severity      severity-with-clear
           +--ro alarm-text              alarm-text

   For every status change from the resource perspective, a row is added
   to the "status-change" list, if the server implements the feature
   "alarm-history".  The feature "alarm-history" is optional to
   implement, since keeping the alarm history may have an impact on the
   server's memory resources.

   The last status values are also represented as leafs for the alarm.
   Note well that the alarm severity does not include "cleared"; alarm
   clearance is a boolean flag.

   Therefore, an alarm can look like this: (("GigabitEthernet0/25",
   "link-alarm",""), false, 2018-04-08T08:20:10.00Z,
   2018-04-08T08:20:10.00Z, major, "Interface GigabitEthernet0/25
   down").
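
   The resource lifecycle can be sketched as a small state holder.  The
   Python fragment below is an illustration, not a definitive
   implementation.  The class and method names are hypothetical, and the
   handling of "last-raised" (updated only on a raise or re-raise, not
   on a severity change) is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Alarm:
    is_cleared: bool = False
    last_raised: str = ""
    last_changed: str = ""
    perceived_severity: str = ""
    alarm_text: str = ""
    # Only kept if the "alarm-history" feature is implemented.
    status_change: list = field(default_factory=list)

    def report(self, time, severity, text):
        """Record one status change; "cleared" is valid only here."""
        self.status_change.append((time, severity, text))
        self.last_changed = time
        self.alarm_text = text
        if severity == "cleared":
            # Clearance is a flag; the last severity is retained.
            self.is_cleared = True
        else:
            if self.is_cleared or not self.last_raised:
                self.last_raised = time  # raised or raised again
            self.is_cleared = False
            self.perceived_severity = severity


alarm = Alarm()
alarm.report("2018-04-08T08:20:10.00Z", "major",
             "Interface GigabitEthernet0/25 down")
alarm.report("2018-04-08T08:25:00.00Z", "cleared",
             "Interface GigabitEthernet0/25 up")
```

   After the second call, the alarm is still present with "is_cleared"
   set and "perceived_severity" unchanged, which reflects the rule that
   clearing does not delete an alarm.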

3.5.2.  Operator Alarm Lifecycle

   Operators can act upon alarms using the set-operator-state action:

     +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
        ...
        +--ro operator-state-change* [time] {operator-actions}?
        |  +--ro time        yang:date-and-time
        |  +--ro operator    string
        |  +--ro state       operator-state
        |  +--ro text?       string
        +---x set-operator-state {operator-actions}?
           +---w input
              +---w state    writable-operator-state
              +---w text?    string

   The operator state for an alarm can be "none", "ack", "shelved", or
   "closed".  Alarm deletion (using the action "purge-alarms") can use
   this state as a criterion.  For example, a closed alarm is an alarm
   where the operator has performed any required corrective actions.
   Closed alarms are good candidates for being purged.
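
   The operator lifecycle can be sketched as appending entries to the
   "operator-state-change" list.  The Python fragment below is an
   illustration only.  The set of client-writable states is an
   assumption of this sketch, based on the statement in Section 4.1.1
   that shelving is performed only by editing the shelf configuration;
   the operator name and texts are invented.

```python
# Assumption: "shelved" is set by the system, never by a client.
WRITABLE_STATES = {"none", "ack", "closed"}


def set_operator_state(alarm, time, operator, state, text=""):
    """Append an operator-state-change entry to an alarm (a dict)."""
    if state not in WRITABLE_STATES:
        raise ValueError(f"state {state!r} cannot be set by a client")
    entry = {"time": time, "operator": operator,
             "state": state, "text": text}
    alarm.setdefault("operator-state-change", []).append(entry)
    return entry


alarm = {"is-cleared": True}
set_operator_state(alarm, "2018-04-08T09:00:00.00Z", "joe", "ack")
set_operator_state(alarm, "2018-04-08T09:30:00.00Z", "joe", "closed",
                   "cable replaced")
```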

3.5.3.  Administrative Alarm Lifecycle

   Deleting alarms from the alarm list is considered an administrative
   action.  This is supported by the "purge-alarms" action.  The "purge-
   alarms" action takes a filter as input.  The filter selects alarms
   based on the operator and resource alarm lifecycle such as "all
   closed cleared alarms older than a time specification".  The server
   may also perform these operations based on other policies, but how
   that is done is out of scope for this document.

   Purged alarms are removed from the alarm list.  Note well that if the
   alarm resource state changes after a purge, the alarm will reappear
   in the alarm list.

   Alarms can be compressed.  Compressing an alarm deletes all entries
   in the alarm's "status-change" list except for the last status
   change.  A client can perform this using the "compress-alarms"
   action.  The server may also perform these operations based on other
   policies, but how that is done is out of scope for this document.
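
   The two administrative operations can be sketched as follows.  The
   Python fragment is an illustration under simplifying assumptions:
   alarms are plain dictionaries, the purge filter is fixed to "closed,
   cleared, and older than a given time", and the function names are
   hypothetical rather than the module's action signatures.

```python
from datetime import datetime, timezone


def at(hour):
    """Helper for the example data: a UTC time on 2018-04-08."""
    return datetime(2018, 4, 8, hour, tzinfo=timezone.utc)


def last_operator_state(alarm):
    changes = alarm["operator-state-change"]
    return changes[-1]["state"] if changes else "none"


def purge_alarms(alarm_list, older_than):
    """Delete closed, cleared alarms last changed before older_than."""
    doomed = [key for key, alarm in alarm_list.items()
              if alarm["is-cleared"]
              and last_operator_state(alarm) == "closed"
              and alarm["last-changed"] < older_than]
    for key in doomed:
        del alarm_list[key]
    return len(doomed)


def compress_alarm(alarm):
    """Keep only the last status-change entry; return entries removed."""
    removed = max(len(alarm["status-change"]) - 1, 0)
    del alarm["status-change"][:-1]
    return removed


alarms = {
    ("GigabitEthernet0/25", "link-alarm", ""): {
        "is-cleared": True, "last-changed": at(9),
        "operator-state-change": [{"state": "closed"}],
        "status-change": [],
    },
    ("GigabitEthernet0/26", "link-alarm", ""): {
        "is-cleared": False, "last-changed": at(8),
        "operator-state-change": [],
        "status-change": [(at(6), "minor"), (at(7), "major"),
                          (at(8), "critical")],
    },
}
compressed = compress_alarm(alarms[("GigabitEthernet0/26", "link-alarm", "")])
purged = purge_alarms(alarms, older_than=at(12))
```

   Only the first alarm is purged; the second is neither cleared nor
   closed, so it stays in the list with its history reduced to the last
   entry.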

3.6.  Root Cause, Impacted Resources, and Related Alarms

   The alarm data model does not mandate any requirements for the system
   to support alarm correlation or root-cause and service-impact
   analysis.  However, if such features are supported, this section
   describes how the results of such analysis are represented in the
   data model.  These parts of the model are optional.  The module
   supports three scenarios:

   Root-cause analysis:  An alarm can indicate candidate root-cause
      resources, for example, a database issue alarm referring to a
      full-disk partition.

   Service-impact analysis:  An alarm can refer to potential impacted
      resources, for example, an interface alarm referring to impacted
      network services.

   Alarm correlation:  Dependencies between alarms; several alarms can
      be grouped as relating to each other, for example, a streaming
      media alarm relating to a high-jitter alarm.

   Different systems have varying degrees of alarm correlation and
   analysis capabilities, and the intent of the alarm data model is to
   enable any capability, including none.

   The general principle of this alarm data model is to limit the number
   of alarms.  In many cases, several resources are affected for a given
   underlying problem.  A full disk will of course impact databases and
   applications as well.  The recommendation is to have a single alarm
   for the underlying problem and list the affected resources in the
   alarm rather than having separate alarms for each resource.

   The alarm has one leaf-list to identify a possible "impacted-
   resource" and a leaf-list to identify a possible "root-cause-
   resource".  These serve as hints only.  It is up to the client
   application to use this information to present the overall status.
   Using the disk-full example, a good alarm would use the hard-disk
   partition as the alarming resource and list the database and
   applications in the "impacted-resource" leaf-list.

   A system should always strive to identify the resource that can be
   acted upon as the "resource" leaf.  The "impacted-resource" leaf-list
   shall be used to identify any side effects of the alarm.  The
   impacted resources cannot be acted upon to fix the problem.  The
   disk-full example above illustrates the principle; you cannot fix the
   underlying issue by database operations.  However, you need to pay
   attention to the database to perform any operations that limit the
   impact of the problem.

   On some occasions, the system might not be capable of detecting the
   root cause, the resource that can be acted upon.  The instrumentation
   in this case only monitors the side effect and raises an alarm to
   indicate a situation requiring attention.  The instrumentation still
   might identify possible candidates for the root-cause resource.  In
   this case, the "root-cause-resource" leaf-list can be used to
   indicate the candidate root-cause resources.  An example of this kind
   of alarm might be an active test tool that detects a Service Level
   Agreement (SLA) violation on a VPN connection and identifies the
   devices along the chain as candidate root causes.

   The alarm data model also supports a way to associate different
   alarms with each other using the "related-alarm" list.  This list
   enables the server to inform the client that certain alarms are
   related to other alarms.

   Note well that this module does not prescribe any dependencies or
   preference between the above alarm correlation mechanisms.  Different
   systems have different capabilities, and the above described
   mechanisms are available to support the instrumentation features.

3.7.  Alarm Shelving

   Alarm shelving is an important function in order for alarm-management
   applications and operators to stop superfluous alarms.  Shelving
   means that any alarms fulfilling the configured shelf criteria are
   ignored (blocked/filtered).  Shelved alarms appear in a dedicated
   shelved-alarm list; thus, they can be filtered out so that the main
   alarm list only contains entries of interest.  Shelved alarms do not
   generate notifications, but the shelved-alarm list is updated with
   any alarm-state changes.

   Alarm shelving is optional to implement, since matching alarms
   against shelf criteria may have an impact on the server's processing
   resources.

3.8.  Alarm Profiles

   Alarm profiles are used to configure further information for an alarm
   type.  This module supports configuring severity levels overriding
   the system-default levels.  This corresponds to the Alarm Severity
   Assignment Profile (ASAP) functionality in M.3100 [M.3100] and M.3160
   [M.3160].  Other standard or enterprise modules can augment this list
   with further alarm-type information.

4.  Alarm Data Model

   The fundamental parts of the data model are the "alarm-list" with
   associated notifications and the "alarm-inventory" list of all
   possible alarm types.  These MUST be implemented by a system.  The
   rest of the data model is made conditional with these YANG features:
   "operator-actions", "alarm-shelving", "alarm-history", "alarm-
   summary", "alarm-profile", and "severity-assignment".

   The data model has the following overall structure:

     +--rw control
     |  +--rw max-alarm-status-changes?   union
     |  +--rw notify-status-changes?      enumeration
     |  +--rw notify-severity-level?      severity
     |  +--rw alarm-shelving {alarm-shelving}?
     |        ...
     +--ro alarm-inventory
     |  +--ro alarm-type* [alarm-type-id alarm-type-qualifier]
     |        ...
     +--ro summary {alarm-summary}?
     |  +--ro alarm-summary* [severity]
     |  |     ...
     |  +--ro shelves-active?   empty {alarm-shelving}?
     +--ro alarm-list
     |  +--ro number-of-alarms?   yang:gauge32
     |  +--ro last-changed?       yang:date-and-time
     |  +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
     |  |     ...
     |  +---x purge-alarms
     |  |     ...
     |  +---x compress-alarms {alarm-history}?
     |        ...
     +--ro shelved-alarms {alarm-shelving}?
     |  +--ro number-of-shelved-alarms?      yang:gauge32
     |  +--ro shelved-alarms-last-changed?   yang:date-and-time
     |  +--ro shelved-alarm*
     |  |       [resource alarm-type-id alarm-type-qualifier]
     |  |     ...
     |  +---x purge-shelved-alarms
     |  |     ...
     |  +---x compress-shelved-alarms {alarm-history}?
     |        ...
     +--rw alarm-profile*
             [alarm-type-id alarm-type-qualifier-match resource]
             {alarm-profile}?
        +--rw alarm-type-id                        alarm-type-id
        +--rw alarm-type-qualifier-match           string
        +--rw resource                             resource-match
        +--rw description                          string
        +--rw alarm-severity-assignment-profile
                {severity-assignment}?
              ...

4.1.  Alarm Control

   The "/alarms/control/notify-status-changes" leaf controls whether
   notifications are sent for all state changes, only raise and clear,
   or only notifications more severe than a configured level.  This
   feature, in combination with alarm shelving, corresponds to the ITU
   Alarm Report Control functionality; see Appendix F.2.4.

   Every alarm has a list of status changes.  The length of this list is
   controlled by "/alarms/control/max-alarm-status-changes".  When the
   list is full and a new entry is created, the oldest entry is removed.
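
   This bounded-list behavior can be sketched with a fixed-length queue.
   The Python fragment below is an illustration only; the function name
   is hypothetical, and using "None" to model an unbounded list is an
   assumption of this sketch.

```python
from collections import deque


def new_status_change_list(max_alarm_status_changes=None):
    """Return a list that drops its oldest entry once it is full.
    None models an unbounded list (an assumption of this sketch)."""
    return deque(maxlen=max_alarm_status_changes)


history = new_status_change_list(max_alarm_status_changes=3)
for severity in ["minor", "major", "critical", "cleared"]:
    history.append(severity)  # the fourth append evicts "minor"
```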

4.1.1.  Alarm Shelving

   The shelving control tree is shown below:

     +--rw control
        +--rw alarm-shelving {alarm-shelving}?
           +--rw shelf* [name]
              +--rw name           string
              +--rw resource*      resource-match
              +--rw alarm-type*
              |       [alarm-type-id alarm-type-qualifier-match]
              |  +--rw alarm-type-id                 alarm-type-id
              |  +--rw alarm-type-qualifier-match    string
              +--rw description?   string


   Shelved alarms are shown in a dedicated shelved-alarm list.  Matching
   alarms MUST appear in the "/alarms/shelved-alarms/shelved-alarm"
   list, and non-matching alarms MUST appear in the "/alarms/alarm-list/
   alarm" list.  The server does not send any notifications for shelved
   alarms.
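
   The routing rule above can be sketched as a match against the
   configured shelves.  The Python fragment below is a simplified
   illustration: it assumes exact matching on the resource and the
   alarm-type id, a regular-expression match on the qualifier, and that
   an empty criterion list matches everything.  The shelf name and
   helper names are hypothetical.

```python
import re


def shelf_matches(shelf, resource, alarm_type_id, qualifier):
    """Simplified matching; an empty criterion list matches all."""
    resource_ok = not shelf["resource"] or resource in shelf["resource"]
    type_ok = not shelf["alarm-type"] or any(
        type_id == alarm_type_id
        and re.fullmatch(qualifier_match, qualifier) is not None
        for type_id, qualifier_match in shelf["alarm-type"]
    )
    return resource_ok and type_ok


def target_list(shelves, resource, alarm_type_id, qualifier):
    """Return the list in which the alarm must appear."""
    if any(shelf_matches(shelf, resource, alarm_type_id, qualifier)
           for shelf in shelves):
        return "/alarms/shelved-alarms/shelved-alarm"
    return "/alarms/alarm-list/alarm"


shelves = [{
    "name": "lab-links",  # hypothetical shelf
    "resource": ["GigabitEthernet0/15"],
    "alarm-type": [("link-alarm", ".*")],
}]
```

   With this configuration, a "link-alarm" on "GigabitEthernet0/15" is
   placed in the shelved-alarm list, while the same alarm type on any
   other interface stays in the main alarm list.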

   Shelving and unshelving can only be performed by editing the shelf
   configuration.  It cannot be performed on individual alarms.  The
   server will add an operator state indicating that the alarm was
   shelved/unshelved.

   A leaf, "/alarms/summary/shelves-active", in the alarm summary
   indicates if there are shelved alarms.

   A system can choose not to support the shelving feature.

4.2.  Alarm Inventory

   The alarm inventory represents all possible alarm types that may
   occur in the system.  A management system may use this to build alarm
   procedures.  The alarm inventory is relevant for the following
   reasons:

      The system might not implement all defined alarm type identities,
      and some alarm identities are abstract.

      The system has configured dynamic alarm types using the alarm
      qualifier.  The inventory makes it possible for the management
      system to discover these.

   Note that the mechanism whereby dynamic alarm types are added using
   the alarm-type qualifier MUST populate this list.

   The optional leaf-list "resource" in the alarm inventory enables the
   system to publish for which resources a given alarm type may appear.

   A server MUST implement the alarm inventory in order to enable
   controlled alarm procedures in the client.

   A server implementer may want to document the alarm inventory for
   offline processing by clients.  The file format defined in
   [YANG-INSTANCE] can be used for this purpose.

   The alarm inventory tree is shown below:

     +--ro alarm-inventory
        +--ro alarm-type* [alarm-type-id alarm-type-qualifier]
           +--ro alarm-type-id           alarm-type-id
           +--ro alarm-type-qualifier    alarm-type-qualifier
           +--ro resource*               resource-match
           +--ro will-clear              boolean
           +--ro severity-level*         severity
           +--ro description             string


4.3.  Alarm Summary

   The alarm summary list summarizes alarms per severity: the total
   number of alarms and how many of them are cleared or not cleared,
   each optionally broken down by whether they are closed.  It also
   indicates whether there are shelved alarms.

   The alarm summary tree is shown below:

     +--ro summary {alarm-summary}?
        +--ro alarm-summary* [severity]
        |  +--ro severity                  severity
        |  +--ro total?                    yang:gauge32
        |  +--ro not-cleared?              yang:gauge32
        |  +--ro cleared?                  yang:gauge32
        |  +--ro cleared-not-closed?       yang:gauge32
        |  |       {operator-actions}?
        |  +--ro cleared-closed?           yang:gauge32
        |  |       {operator-actions}?
        |  +--ro not-cleared-closed?       yang:gauge32
        |  |       {operator-actions}?
        |  +--ro not-cleared-not-closed?   yang:gauge32
        |          {operator-actions}?
        +--ro shelves-active?   empty {alarm-shelving}?
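
   As an illustration of how the per-severity counters relate to each
   other, the following Python sketch derives them from a list of
   alarms.  The "Alarm" class and the "is_closed" flag are
   hypothetical; how "closed" is derived from the operator state is an
   assumption and is not specified in this section.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Alarm:
    perceived_severity: str
    is_cleared: bool
    is_closed: bool  # assumed to be derived from the latest operator state


def summarize(alarms):
    """Return {severity: Counter keyed by the summary leaf names}."""
    summary = {}
    for alarm in alarms:
        row = summary.setdefault(alarm.perceived_severity, Counter())
        cleared = "cleared" if alarm.is_cleared else "not-cleared"
        closed = "closed" if alarm.is_closed else "not-closed"
        row["total"] += 1
        row[cleared] += 1
        # One of: cleared-closed, cleared-not-closed,
        # not-cleared-closed, not-cleared-not-closed.
        row[f"{cleared}-{closed}"] += 1
    return summary
```

   In this sketch, "cleared" plus "not-cleared" always equals "total"
   for a given severity, and the four combined counters partition the
   same set of alarms.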

4.4.  The Alarm List

   The alarm list, "/alarms/alarm-list", is a function from the tuple
   (resource, alarm type, alarm-type qualifier) to the current composite
   alarm state.  The composite state includes states for the resource
   alarm lifecycle such as severity, clearance flag, and operator states
   such as acknowledged.  This means that for a given resource and alarm
   type, the alarm list shows the current states of the alarm such as
   acknowledged and cleared.

   +--ro alarm-list
      +--ro number-of-alarms?   yang:gauge32
      +--ro last-changed?       yang:date-and-time
      +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
      |  +--ro resource                 resource
      |  +--ro alarm-type-id            alarm-type-id
      |  +--ro alarm-type-qualifier     alarm-type-qualifier
      |  +--ro alt-resource*            resource
      |  +--ro related-alarm*
      |  |       [resource alarm-type-id alarm-type-qualifier]
      |  |       {alarm-correlation}?
      |  |  +--ro resource
      |  |  |       -> /alarms/alarm-list/alarm/resource
      |  |  +--ro alarm-type-id           leafref
      |  |  +--ro alarm-type-qualifier    leafref
      |  +--ro impacted-resource*       resource
      |  |       {service-impact-analysis}?
      |  +--ro root-cause-resource*     resource
      |  |       {root-cause-analysis}?
      |  +--ro time-created             yang:date-and-time
      |  +--ro is-cleared               boolean
      |  +--ro last-raised              yang:date-and-time
      |  +--ro last-changed             yang:date-and-time
      |  +--ro perceived-severity       severity
      |  +--ro alarm-text               alarm-text
      |  +--ro status-change* [time] {alarm-history}?
      |  |  +--ro time                  yang:date-and-time
      |  |  +--ro perceived-severity    severity-with-clear
      |  |  +--ro alarm-text            alarm-text
      |  +--ro operator-state-change* [time] {operator-actions}?
      |  |  +--ro time        yang:date-and-time
      |  |  +--ro operator    string
      |  |  +--ro state       operator-state
      |  |  +--ro text?       string
      |  +---x set-operator-state {operator-actions}?
      |  |  +---w input
      |  |     +---w state    writable-operator-state
      |  |     +---w text?    string
      |  +---n operator-action {operator-actions}?
      |     +-- time        yang:date-and-time
      |     +-- operator    string
      |     +-- state       operator-state
      |     +-- text?       string
      +---x purge-alarms
      |  +---w input
      |  |  +---w alarm-clearance-status    enumeration
      |  |  +---w older-than!
      |  |  |  +---w (age-spec)?
      |  |  |     +--:(seconds)
      |  |  |     |  +---w seconds?   uint16
      |  |  |     +--:(minutes)
      |  |  |     |  +---w minutes?   uint16
      |  |  |     +--:(hours)
      |  |  |     |  +---w hours?     uint16
      |  |  |     +--:(days)
      |  |  |     |  +---w days?      uint16
      |  |  |     +--:(weeks)
      |  |  |        +---w weeks?     uint16
      |  |  +---w severity!
      |  |  |  +---w (sev-spec)?
      |  |  |     +--:(below)
      |  |  |     |  +---w below?   severity
      |  |  |     +--:(is)
      |  |  |     |  +---w is?      severity
      |  |  |     +--:(above)
      |  |  |        +---w above?   severity
      |  |  +---w operator-state-filter! {operator-actions}?
      |  |     +---w state?   operator-state
      |  |     +---w user?    string
      |  +--ro output
      |     +--ro purged-alarms?   uint32
      +---x compress-alarms {alarm-history}?
         +---w input
         |  +---w resource?               resource-match
         |  +---w alarm-type-id?
         |  |       -> /alarms/alarm-list/alarm/alarm-type-id
         |  +---w alarm-type-qualifier?   leafref
         +--ro output
            +--ro compressed-alarms?   uint32

   Every alarm has three important states: the resource clearance state
   "is-cleared", the severity "perceived-severity", and the operator
   state available in the "operator-state-change" list.

   In order to see the alarm history, the resource state changes are
   available in the "status-change" list, and the operator history is
   available in the "operator-state-change" list.
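
   The idea of the alarm list as a function from a key tuple to a
   composite state can be sketched in Python as a dictionary.  This is
   only an illustration under stated assumptions: the names
   "AlarmEntry" and "report_state" are hypothetical, a status change
   with severity "cleared" is assumed to set "is-cleared" while
   leaving "perceived-severity" at its last non-cleared value, and
   timestamps are plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class AlarmEntry:
    is_cleared: bool
    perceived_severity: str
    alarm_text: str
    status_change: list = field(default_factory=list)
    operator_state_change: list = field(default_factory=list)


def report_state(alarm_list, key, time, severity, text):
    """Record a resource state change for one alarm.

    alarm_list maps (resource, alarm-type-id, alarm-type-qualifier)
    to an AlarmEntry, so repeated reports update the same entry.
    """
    entry = alarm_list.get(key)
    if entry is None:
        if severity == "cleared":
            raise ValueError("cannot clear an alarm that was never raised")
        entry = alarm_list[key] = AlarmEntry(False, severity, text)
    entry.is_cleared = severity == "cleared"
    if not entry.is_cleared:
        entry.perceived_severity = severity  # keep last non-cleared value
    entry.alarm_text = text
    entry.status_change.append((time, severity, text))
    return entry
```

   Raising and then clearing the same alarm leaves a single entry in
   the dictionary, with two elements in its "status_change" history.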

4.5.  The Shelved-Alarm List

   The shelved-alarm list has the same structure as the alarm list
   above.  It shows all the alarms that match the shelving criteria
   "/alarms/control/alarm-shelving".

4.6.  Alarm Profiles

   The alarm-profile list, "/alarms/alarm-profile", is a list of
   configurable alarm types.  The list supports configurable alarm
   severity levels in the container "alarm-severity-assignment-
   profile".  If an alarm matches the configured alarm type, it MUST
   use the configured severity level(s) instead of the system default.
   This configuration MUST also be represented in the alarm inventory.

     +--rw alarm-profile*
             [alarm-type-id alarm-type-qualifier-match resource]
             {alarm-profile}?
        +--rw alarm-type-id                        alarm-type-id
        +--rw alarm-type-qualifier-match           string
        +--rw resource                             resource-match
        +--rw description                          string
        +--rw alarm-severity-assignment-profile
                {severity-assignment}?
           +--rw severity-level*    severity
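
   The override rule can be illustrated with a small Python sketch.
   The names "AlarmProfile" and "effective_severity_levels" are
   hypothetical.  The sketch assumes that the resource is compared
   literally, that the qualifier match is a regular expression applied
   to the whole qualifier, and that the first matching profile wins;
   none of these details are specified in this section.

```python
import re
from dataclasses import dataclass


@dataclass
class AlarmProfile:
    alarm_type_id: str
    alarm_type_qualifier_match: str  # regular expression
    resource: str                    # compared literally in this sketch
    severity_levels: list


def effective_severity_levels(resource, type_id, qualifier,
                              profiles, default_levels):
    """Return the levels of the first matching profile, else the defaults."""
    for profile in profiles:
        if (profile.alarm_type_id == type_id
                and re.fullmatch(profile.alarm_type_qualifier_match, qualifier)
                and profile.resource == resource):
            return profile.severity_levels
    return default_levels
```

   An alarm that matches no profile keeps the system default levels
   passed in as "default_levels".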

4.7.  Operations

   The alarm data model supports the following actions to manage the
   alarms:

   "/alarms/alarm-list/purge-alarms":  Delete alarms from the "alarm-
      list" according to specific criteria, for example, all cleared
      alarms older than a specific date.

   "/alarms/alarm-list/compress-alarms":  Compress the "status-change"
      list for the alarms.

   "/alarms/alarm-list/alarm/set-operator-state":  Change the operator
      state for an alarm.  For example, an alarm can be acknowledged by
      setting the operator state to "ack".

   "/alarms/shelved-alarms/purge-shelved-alarms":  Delete alarms from
      the "/alarms/shelved-alarms/shelved-alarm" list according to
      specific criteria, for example, all alarms older than a specific
      date.

   "/alarms/shelved-alarms/compress-shelved-alarms":  Compress the
      "status-change" list for the shelved alarms.
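
   The filtering performed by a purge action can be sketched in Python
   as follows.  This is an illustration only.  The function and class
   names are hypothetical; the clearance values "any", "cleared", and
   "not-cleared", the severity ordering, and the choice to compare the
   age against "last-changed" are assumptions made for the sketch.
   All given criteria are combined with a logical AND.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed ordering, lowest to highest.
SEVERITY_RANK = {"indeterminate": 2, "warning": 3, "minor": 4,
                 "major": 5, "critical": 6}


@dataclass
class Alarm:
    is_cleared: bool
    perceived_severity: str
    last_changed: datetime


def purge_alarms(alarms, clearance="any", older_than=None,
                 severity=None, now=None):
    """Return (kept_alarms, purged_count).

    older_than: a timedelta, or None for no age filter
    severity:   a ("below" | "is" | "above", level) pair, or None
    """
    now = now or datetime.now(timezone.utc)

    def selected(alarm):
        if clearance == "cleared" and not alarm.is_cleared:
            return False
        if clearance == "not-cleared" and alarm.is_cleared:
            return False
        if older_than is not None and now - alarm.last_changed <= older_than:
            return False
        if severity is not None:
            op, level = severity
            rank = SEVERITY_RANK[alarm.perceived_severity]
            ref = SEVERITY_RANK[level]
            checks = {"below": rank < ref, "is": rank == ref,
                      "above": rank > ref}
            if not checks[op]:
                return False
        return True

    kept = [alarm for alarm in alarms if not selected(alarm)]
    return kept, len(alarms) - len(kept)
```

   The returned count plays the role of the "purged-alarms" output
   leaf in the action.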

4.8.  Notifications

   The alarm data model supports a general notification to report alarm-
   state changes.  It carries all relevant parameters for the alarm-
   management application.

   There is also a notification to report that an operator changed the
   operator state on an alarm, for example, by acknowledging it.

   If the alarm inventory is changed, for example, because a new card
   type is inserted, a notification will tell the management
   application that new alarm types are available.

5.  Relationship to the ietf-hardware YANG Module

   RFC 8348 [RFC8348] defines the "ietf-hardware" YANG data model for
   the management of hardware.  The "alarm-state" in RFC 8348 is a
   summary of the alarm severity levels that may be active on the
   specific hardware component.  It does not say anything about how
   alarms are reported, and it doesn't provide any details of the
   alarms.

   The mapping between the alarm YANG data model, prefix "al", and the
   "alarm-state" in RFC 8348, prefix "hw", is as follows:

   "al:resource":  Corresponds to an entry in the list
      "/hw:hardware/hw:component/".

   "al:is-cleared":  No bit set in "/hw:hardware/hw:component/hw:state/
      hw:alarm-state".

   "al:perceived-severity":  Corresponding bit set in
      "/hw:hardware/hw:component/hw:state/hw:alarm-state".

   "al:operator-state-change/al:state":  If the alarm is acknowledged by
      the operator, the bit "hw:under-repair" is set in
      "/hw:hardware/hw:component/hw:state/hw:alarm-state".
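
   The mapping above can be illustrated with a short Python sketch
   that derives the set of "hw:alarm-state" bits for one component
   from its alarms.  The names are hypothetical.  The sketch assumes
   that the severity names coincide with the bit names defined in RFC
   8348, that cleared alarms contribute no bits, and that
   "under-repair" is set only for acknowledged alarms that are not
   cleared.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    resource: str
    is_cleared: bool
    perceived_severity: str
    acknowledged: bool = False  # assumed: latest operator state is "ack"


def hw_alarm_state(component, alarms):
    """Return the set of hw:alarm-state bit names for one component."""
    bits = set()
    for alarm in alarms:
        if alarm.resource != component or alarm.is_cleared:
            continue  # other components and cleared alarms set no bits
        bits.add(alarm.perceived_severity)
        if alarm.acknowledged:
            bits.add("under-repair")
    return bits
```

   A component whose alarms are all cleared, or that has no alarms,
   yields an empty set, which corresponds to no bit being set.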
