A (managed) system may experience faults such as malfunctions, defects in system design or software, or external interference. These faults may (or may not) lead to a system state that differs from the correct or desired system state. An incorrect system state is called an error; errors are hence caused by faults. Faults and errors are not always externally observable and may remain undetected.
Errors, in turn, may (or may not) cause failures. A failure is the inability to deliver the correct service as defined by the service specification. A failure is hence always externally observable.
In summary, a fault may cause one or more errors, and an error may cause one or more failures.
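The one-to-many causal chain described above can be pictured as a small data structure. The following is a minimal, hypothetical sketch: the classes `Fault`, `Error` and `Failure` and the helper `observable_failures` are illustrative only and are not part of the alarm model. It shows that a fault may cause several errors, that an error may cause zero or more failures, and that only the failures are guaranteed to be externally observable.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical classes illustrating the fault -> error -> failure chain.
@dataclass
class Failure:
    description: str
    # By definition a failure is always externally observable.
    observable: bool = True

@dataclass
class Error:
    description: str
    # An error may cause zero, one or more failures.
    failures: List[Failure] = field(default_factory=list)

@dataclass
class Fault:
    description: str
    # A fault may cause zero, one or more errors.
    errors: List[Error] = field(default_factory=list)

def observable_failures(fault: Fault) -> List[Failure]:
    """Collect every failure that a fault ultimately causes via its errors."""
    return [f for e in fault.errors for f in e.failures]

# Illustrative example: one fault, two errors, only one of which leads to failures.
fault = Fault("cooling fan defect", [
    Error("board temperature above limit", [
        Failure("cell out of service"),
        Failure("throughput degraded"),
    ]),
    Error("fan speed counter stuck"),  # error that causes no failure
])
print(len(observable_failures(fault)))  # 2
```

A fault whose errors cause no failures yields an empty list, matching the statement that faults and errors may remain undetected.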
An alarm is a management representation of a fault, an error or a failure that requires attention or reaction by an operator or some machine.
Fault Management is concerned with representing, managing, and reporting alarms. Fault Management is often also referred to as Alarm Management. The alarm model is independent of the underlying managed system. The same model can be used to represent alarms from networks of any 3GPP generation, from other networks, and from any resource. Specifics of the managed system manifest themselves only in the values of the information elements of the alarm model.
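To illustrate what "specifics manifest themselves only in the values" means, here is a minimal sketch of a single generic alarm record type used for two unrelated resources. The class `Alarm`, its field names and the distinguished-name strings are hypothetical; the severity values and event-type categories are modelled on those of ITU-T X.733, but the enum member spellings here are illustrative and not normative encodings.

```python
from dataclasses import dataclass
from enum import Enum

# Severity values modelled on ITU-T X.733.
class PerceivedSeverity(Enum):
    CLEARED = "cleared"
    INDETERMINATE = "indeterminate"
    CRITICAL = "critical"
    MAJOR = "major"
    MINOR = "minor"
    WARNING = "warning"

# Event-type categories modelled on ITU-T X.733 (names are illustrative).
class EventType(Enum):
    COMMUNICATIONS_ALARM = "communications"
    QUALITY_OF_SERVICE_ALARM = "qualityOfService"
    PROCESSING_ERROR_ALARM = "processingError"
    EQUIPMENT_ALARM = "equipment"
    ENVIRONMENTAL_ALARM = "environmental"

@dataclass(frozen=True)
class Alarm:
    """Hypothetical generic alarm record; field names are assumptions."""
    object_instance: str  # distinguished name of the alarmed resource
    event_type: EventType
    probable_cause: str
    perceived_severity: PerceivedSeverity
    additional_text: str = ""

# The same type represents alarms from unrelated resources; only values differ.
alarms = [
    Alarm("SubNetwork=1,ManagedElement=ME1,ENBFunction=1",
          EventType.EQUIPMENT_ALARM, "powerProblem", PerceivedSeverity.MAJOR),
    Alarm("SubNetwork=1,NetworkSlice=S1",
          EventType.QUALITY_OF_SERVICE_ALARM, "thresholdCrossed",
          PerceivedSeverity.MINOR),
]
for a in alarms:
    print(a.object_instance, a.perceived_severity.value)
```

Keeping the record type free of resource-specific fields is the design choice the text describes: a new kind of managed resource needs no new alarm type, only new values.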
Alarms allow any kind of issue to be reported, from minor faults without service impact to large-scale failures of telecommunication services affecting many users.
A prerequisite for Fault Management as defined in the present document is that the managed system is represented in the management system by managed objects organized in hierarchical object trees.
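The hierarchical object tree is what makes each managed object addressable by a distinguished name (DN). The following is a minimal, hypothetical sketch: the `ManagedObject` class and `resolve` helper are illustrative, and the DN is built naively as comma-separated `Class=id` pairs from the root down, assuming identifiers never contain commas or other characters that would need escaping.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class ManagedObject:
    """Hypothetical node in a managed object tree."""
    class_name: str
    obj_id: str
    # Excluded from repr/compare to avoid infinite recursion through parent links.
    parent: Optional[ManagedObject] = field(default=None, repr=False, compare=False)
    children: Dict[str, ManagedObject] = field(default_factory=dict, repr=False, compare=False)

    def rdn(self) -> str:
        """Relative distinguished name of this node, e.g. 'ManagedElement=ME1'."""
        return f"{self.class_name}={self.obj_id}"

    def add_child(self, class_name: str, obj_id: str) -> ManagedObject:
        child = ManagedObject(class_name, obj_id, parent=self)
        self.children[child.rdn()] = child
        return child

    def dn(self) -> str:
        """Build the DN by walking from this node up to the root."""
        parts = []
        node: Optional[ManagedObject] = self
        while node is not None:
            parts.append(node.rdn())
            node = node.parent
        return ",".join(reversed(parts))

def resolve(root: ManagedObject, dn: str) -> Optional[ManagedObject]:
    """Find the object a DN refers to, or None (naive split, no escaping)."""
    parts = dn.split(",")
    if parts[0] != root.rdn():
        return None
    node: Optional[ManagedObject] = root
    for rdn in parts[1:]:
        node = node.children.get(rdn)
        if node is None:
            return None
    return node

root = ManagedObject("SubNetwork", "1")
me = root.add_child("ManagedElement", "ME1")
enb = me.add_child("ENBFunction", "1")
print(enb.dn())  # SubNetwork=1,ManagedElement=ME1,ENBFunction=1
```

An alarm then only needs to carry such a DN string to identify its resource, regardless of the resource's type.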
The solution specified in the present document is based on ITU-T X.733 [8].
Fault Management is considered a generic management service. It shall be able to support fault indications about networks of any 3GPP generation, other networks, and any resource that can be addressed by a distinguished name, for example ManagedElements, ENBs, NetworkSlices, or non-3GPP managed resources.
Fault Management can handle alarms about any kind of fault in a 3GPP system, from small hardware errors to service failures affecting many users.