The WebRTC protocols are developed and maintained by the rtcweb working group in the IETF. The WebRTC API is developed by the W3C.
The WebRTC API is decomposed into three layers:
API for web developers, which consists mainly of the MediaStream, RTCPeerConnection, and RTCDataChannel objects.
API for browser and user agent implementers and providers.
Overridable API for audio/video capture and rendering and for network input/output, into which browser implementers may hook their own implementations.
The main WebRTC stack components are the voice engine, the video engine, and the transport component.
The transport component ensures a secure transport channel over which both parties of the call communicate. It relies on an RTP protocol stack in which media is protected by the SRTP profile, with the SRTP keys established through a DTLS handshake (DTLS-SRTP).
The following diagram depicts the WebRTC protocol stack:
WebRTC delegates the signalling exchange to the application, which is free to choose the signalling protocol and format. However, the offer and answer are generated in the SDP format. The ICE candidates may be provided as strings or in JSON format.
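Since the signalling format is left to the application, the following is only a minimal sketch of one possible JSON envelope for carrying SDP descriptions and ICE candidates. The `SignalMessage` type, its `kind` field, and the `encodeSignal`/`decodeSignal` helpers are hypothetical names introduced for illustration; they are not defined by WebRTC. The `SessionDescription` and `IceCandidate` shapes are assumed to mirror the fields a browser exposes for descriptions and candidates.

```typescript
// Hypothetical application-level signalling envelope (an assumption, not part of WebRTC).
type SessionDescription = { type: "offer" | "answer"; sdp: string };
type IceCandidate = {
  candidate: string;
  sdpMid: string | null;
  sdpMLineIndex: number | null;
};

type SignalMessage =
  | { kind: "description"; description: SessionDescription }
  | { kind: "candidate"; candidate: IceCandidate };

// Serialise a message for transport over whatever channel the application uses.
function encodeSignal(message: SignalMessage): string {
  return JSON.stringify(message);
}

// Parse and validate a received message; throws on anything malformed.
function decodeSignal(raw: string): SignalMessage {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("signal must be a JSON object");
  }
  const msg = parsed as Record<string, unknown>;

  if (msg.kind === "description") {
    const d = (msg.description ?? {}) as Record<string, unknown>;
    const type = d.type;
    const sdp = d.sdp;
    if ((type !== "offer" && type !== "answer") || typeof sdp !== "string") {
      throw new Error("malformed session description");
    }
    return { kind: "description", description: { type, sdp } };
  }

  if (msg.kind === "candidate") {
    const c = (msg.candidate ?? {}) as Record<string, unknown>;
    const candidate = c.candidate;
    if (typeof candidate !== "string") {
      throw new Error("malformed ICE candidate");
    }
    return {
      kind: "candidate",
      candidate: {
        candidate,
        sdpMid: typeof c.sdpMid === "string" ? c.sdpMid : null,
        sdpMLineIndex: typeof c.sdpMLineIndex === "number" ? c.sdpMLineIndex : null,
      },
    };
  }

  throw new Error("unknown signal kind");
}
```

In a browser, the decoded description would typically be handed to `RTCPeerConnection.setRemoteDescription` and the decoded candidate to `RTCPeerConnection.addIceCandidate`; the sketch deliberately stops short of that so it stays independent of any particular signalling transport.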
WebRTC needs negotiation for the following purposes:
Negotiation of the media streams and formats: this relies on the SDP offer/answer mechanism to generate and validate media streams and parameters.
Negotiation of the transport parameters: this relies on ICE to identify and test ICE candidates. Whenever a higher priority ICE candidate is validated, the connection will switch to it.
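To make the notion of candidate priority concrete, the sketch below computes a candidate priority using the formula and recommended type preferences from RFC 8445, and then picks the highest-priority validated candidate. It is a simplification under stated assumptions: real ICE agents rank candidate pairs rather than single candidates, and the `Candidate` interface, its `validated` flag, and the `selectCandidate` helper are hypothetical names used only for illustration.

```typescript
type CandidateType = "host" | "prflx" | "srflx" | "relay";

// Recommended type preferences from RFC 8445 (higher means more preferred).
const TYPE_PREFERENCE: Record<CandidateType, number> = {
  host: 126,
  prflx: 110,
  srflx: 100,
  relay: 0,
};

// Hypothetical, simplified view of a candidate for this example.
interface Candidate {
  address: string;
  type: CandidateType;
  localPreference: number; // 0..65535
  componentId: number;     // 1..256 (1 = RTP)
  validated: boolean;      // assumed to be set once connectivity checks succeed
}

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
function candidatePriority(c: Candidate): number {
  return (
    2 ** 24 * TYPE_PREFERENCE[c.type] +
    2 ** 8 * c.localPreference +
    (256 - c.componentId)
  );
}

// Return the validated candidate with the highest priority, if any.
function selectCandidate(candidates: Candidate[]): Candidate | undefined {
  let best: Candidate | undefined;
  for (const c of candidates) {
    if (!c.validated) continue;
    if (best === undefined || candidatePriority(c) > candidatePriority(best)) {
      best = c;
    }
  }
  return best;
}
```

With a local preference of 65535 and component ID 1, a host candidate gets priority 2130706431 and a relay candidate 16777215, so a relayed path is used only until a more direct candidate has been validated.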
The following call flow shows an example of the ICE negotiation process:
Because the transport parameters are negotiated separately from the media parameters, appropriate QoS negotiation needs to account for consecutive and asynchronous changes to the connection parameters. If a relay server, such as a TURN server, is deployed, the QoS negotiation must be extended to cover the outbound streams as well.
A subset of WebRTC, limited to a protocol stack and implementation excluding codecs and other media processing functions defined in W3C and/or IETF, is considered in clauses 6.5 and 8.3 to define an instantiation of AR conversational services.