WebRTC: Unleashing Real-Time Communication in just 7 steps

What is WebRTC?

WebRTC (Web Real-Time Communication) is a technology that allows web applications and sites to capture and optionally stream audio and/or video media, as well as to exchange arbitrary data between browsers without requiring an intermediary for the media or data itself (a signaling server is still needed to set up the connection, as described below). The standards and protocols used by WebRTC make it possible to share data and perform teleconferencing peer-to-peer, without requiring the user to install plugins or any other third-party software.

When to use WebRTC?

WebRTC is used in various scenarios where real-time communication or data exchange between web browsers is required. Some common use cases include:

  1. Video Conferencing: WebRTC is frequently used in video conferencing applications to enable users to have face-to-face meetings directly in their web browsers without the need for additional software or plugins.
  2. Voice Calls: It powers web-based voice calling applications, allowing users to make phone calls directly from their web browsers.
  3. Live Streaming: WebRTC can be used to stream live audio and video content over the web in real-time, enabling applications such as live gaming, webinars, and online broadcasting.
  4. File Sharing: WebRTC’s data channel feature allows for real-time file sharing between browsers, enabling users to transfer files directly to each other instead of uploading them to an intermediate server.
  5. Collaborative Applications: It is used in collaborative applications such as online document editing, collaborative whiteboards, and real-time code editing platforms, where multiple users need to interact with each other in real-time.
  6. Remote Assistance: WebRTC can be utilized in remote assistance and support applications, allowing support agents to communicate with and assist users in real-time directly through their web browsers.
  7. Online Gaming: WebRTC’s low-latency communication capabilities make it suitable for online gaming applications, enabling real-time multiplayer gaming experiences directly in the browser.
  8. IoT (Internet of Things): WebRTC can be used in IoT applications for real-time communication between IoT devices and web browsers, enabling scenarios such as remote monitoring and control.

Overall, WebRTC is used in a wide range of applications where real-time communication, collaboration, or data exchange over the web is required, offering developers a powerful and standardized solution for building such functionalities directly into web applications.

When to avoid WebRTC?

Here are some situations when you might want to consider alternatives:

  1. Large-scale Broadcasts: WebRTC is designed for peer-to-peer and small group communications. For broadcasting to large audiences, traditional streaming protocols like HLS (HTTP Live Streaming) or DASH (Dynamic Adaptive Streaming over HTTP) are more efficient and scalable. These protocols use a one-to-many distribution model that is better suited for delivering content to a large number of viewers simultaneously.
  2. Highly Reliable and Controlled Environments: While WebRTC is designed to be robust over various network conditions, it might not be suitable for environments where you have strict quality of service (QoS) requirements or need guaranteed delivery of data. In cases where network control and predictability are crucial, such as in some enterprise or mission-critical applications, dedicated media servers or other controlled setups might be more appropriate.
  3. Low Latency Not Required: If the application does not require real-time interaction (e.g., video on demand), using WebRTC could be overkill. In such cases, other technologies that are optimized for less than real-time delivery, but provide better bandwidth efficiency and caching opportunities, may be a better fit.
  4. Complex Server-Side Processing of Media: WebRTC is primarily peer-to-peer. If your application requires complex processing of media (such as mixing, transcoding, or adding overlays), you may need to route streams through a server, which can negate some of the benefits of WebRTC’s peer-to-peer nature. In these scenarios, using media server technologies that handle these operations might be more suitable.
  5. Limited Device or Browser Support: Although WebRTC is widely supported across modern browsers, there are still environments and older devices where WebRTC is not supported. If your target audience includes users on these browsers or devices, relying solely on WebRTC might limit your application’s accessibility.
  6. Regulatory and Compliance Requirements: In situations where data residency and regulatory compliance are critical (such as certain healthcare or financial services applications), the decentralized nature of WebRTC might pose challenges. Specific configurations and additional measures might be necessary to ensure compliance, or alternative solutions might be needed.
  7. Cost Considerations for TURN Services: Although WebRTC itself is free, running TURN servers (necessary for relaying traffic when peer-to-peer connections cannot be established) can be costly, especially at scale. If your application frequently requires TURN services, it might increase operational costs.

Choosing whether to use WebRTC involves considering the specific needs of your application, the expected user experience, and the technical requirements of your infrastructure. In cases where WebRTC isn’t suitable, alternative technologies or architectures should be considered to better meet the application’s needs.

How does WebRTC work?

Here is a step-by-step guide:

Step 1. Signaling and Connection Setup

  • Initialization: WebRTC uses JavaScript APIs to access device capabilities (camera, microphone, and screen capture) and create a communication channel. WebRTC does not define a signaling protocol, so the developer must implement one, typically with a signaling server (using technologies such as WebSocket or XHR) that exchanges messages between peers before a direct connection can be established.
  • Signaling Process: The signaling process involves exchanging session control messages to negotiate the parameters of the call, including:
    • Offer/Answer Model: A peer (caller) creates an offer describing its streaming capabilities (codecs, resolutions, etc.) using SDP (Session Description Protocol). This offer is sent to the other peer (callee) via the signaling server.
    • Answer: The callee responds with an answer, also formulated in SDP, describing its own capabilities and agreeing to the proposed parameters or suggesting modifications.
    • ICE Candidate Gathering: Each peer gathers ICE (Interactive Connectivity Establishment) candidates, which are possible ways to connect based on the network environment. These include direct local network paths, STUN (Session Traversal Utilities for NAT) server reflexive paths, and TURN (Traversal Using Relays around NAT) server relay paths.
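
The messages described above can be carried over any transport. Here is a minimal sketch of a signaling envelope in TypeScript, assuming JSON messages. The `SignalMessage` type, the `kind` and `from` fields, and both helper names are hypothetical. The description and candidate shapes mirror what `RTCPeerConnection` produces in browsers, without importing the DOM types.

```typescript
// Hypothetical signaling envelope; WebRTC itself does not define a wire format.
type SessionDescription = { type: "offer" | "answer"; sdp: string };
type IceCandidate = { candidate: string; sdpMid: string | null; sdpMLineIndex: number | null };

type SignalMessage =
  | { kind: "description"; from: string; description: SessionDescription }
  | { kind: "candidate"; from: string; candidate: IceCandidate };

// Serialize a message for the signaling transport (for example WebSocket.send).
function encodeSignal(message: SignalMessage): string {
  return JSON.stringify(message);
}

// Parse and validate an incoming message; returns null for anything unexpected.
function decodeSignal(raw: string): SignalMessage | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const msg = parsed as { kind?: unknown; from?: unknown; description?: any; candidate?: any };
  if (typeof msg.from !== "string") return null;

  if (msg.kind === "description") {
    const d = msg.description;
    const validType = d && (d.type === "offer" || d.type === "answer");
    if (validType && typeof d.sdp === "string") {
      return { kind: "description", from: msg.from, description: { type: d.type, sdp: d.sdp } };
    }
    return null;
  }

  if (msg.kind === "candidate") {
    const c = msg.candidate;
    if (c && typeof c.candidate === "string") {
      const sdpMid = typeof c.sdpMid === "string" ? c.sdpMid : null;
      const sdpMLineIndex = typeof c.sdpMLineIndex === "number" ? c.sdpMLineIndex : null;
      return { kind: "candidate", from: msg.from, candidate: { candidate: c.candidate, sdpMid, sdpMLineIndex } };
    }
    return null;
  }

  return null;
}
```

A caller would typically wrap the description obtained after `createOffer()` and `setLocalDescription()` in such a message and pass `encodeSignal(...)` to the transport. The callee would run `decodeSignal` on each incoming message before calling `setRemoteDescription()` or `addIceCandidate()`.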

Step 2. Network Address Discovery (ICE)

  • Candidate Collection: Peers use ICE to discover and propose candidate IP addresses and ports for the connection, and to find the best path for the media streams to travel between peers.
  • STUN/TURN Servers: STUN servers help clients behind a NAT learn their public-facing IP address and port. TURN servers relay traffic if direct (peer-to-peer) connection establishment fails.

    A STUN (Session Traversal Utilities for NAT) server allows clients to discover their public IP and port. This is important for creating server reflexive candidates when the device is behind a NAT.

    A TURN (Traversal Using Relays around NAT) server acts as a relay between the peers when no direct connection is possible (e.g., in restrictive NAT scenarios). Because all traffic must pass through the TURN server, relaying consumes server bandwidth and adds latency, so it is the least preferred option and is used only when host or STUN-derived candidates fail.
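
In browser code, the STUN and TURN servers described above are passed to `RTCPeerConnection` through its `iceServers` configuration. Here is a minimal sketch. The hostnames and credentials are placeholders, and `buildIceConfig` is a hypothetical helper, not part of the WebRTC API.

```typescript
// Shape of the entries accepted by the `iceServers` option of RTCPeerConnection.
type IceServer = { urls: string | string[]; username?: string; credential?: string };
type IceConfig = { iceServers: IceServer[]; iceTransportPolicy: "all" | "relay" };

type TurnSettings = { url: string; username: string; credential: string };

// Hypothetical helper: STUN is always listed; TURN is added only when configured.
function buildIceConfig(stunUrl: string, turn?: TurnSettings, relayOnly = false): IceConfig {
  const iceServers: IceServer[] = [{ urls: stunUrl }];
  if (turn) {
    iceServers.push({ urls: turn.url, username: turn.username, credential: turn.credential });
  }
  // "relay" restricts ICE to TURN candidates; "all" lets ICE prefer direct paths.
  return { iceServers, iceTransportPolicy: relayOnly ? "relay" : "all" };
}

const iceConfig = buildIceConfig("stun:stun.example.org:3478", {
  url: "turn:turn.example.org:3478",
  username: "demo-user",       // placeholder
  credential: "demo-password", // placeholder
});
// In a browser: const pc = new RTCPeerConnection(iceConfig);
```

Setting `relayOnly` is mainly useful for testing a TURN deployment, since it prevents ICE from using any direct path.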

Step 3. Connection Establishment

  • ICE Negotiation: Peers exchange ICE candidates over the signaling channel and try to find a match that works for both ends. This process involves testing various candidate pairs to establish the best connection route.
  • DTLS Handshake: Once a viable network path is chosen, a secure connection is established using DTLS (Datagram Transport Layer Security). The handshake also produces the keys later used to encrypt media with SRTP, so it is essential for ensuring that the data exchanged via WebRTC is encrypted and secure.
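
Browsers run the ICE checks themselves, but the order in which candidates are tried comes from a priority formula defined in RFC 8445. Here is a minimal sketch of that formula, using the type preferences the RFC recommends (host 126, peer-reflexive 110, server-reflexive 100, relayed 0). The function names and the fixed local preference of 65535 are assumptions for illustration. ICE additionally combines the priorities of both ends into a pair priority, which is omitted here.

```typescript
type CandidateType = "host" | "prflx" | "srflx" | "relay";

// Type preferences recommended by RFC 8445.
const TYPE_PREFERENCE: Record<CandidateType, number> = { host: 126, prflx: 110, srflx: 100, relay: 0 };

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
function candidatePriority(type: CandidateType, componentId = 1, localPreference = 65535): number {
  return 16777216 * TYPE_PREFERENCE[type] + 256 * localPreference + (256 - componentId);
}

// Extract the candidate type from an SDP candidate line ("... typ srflx ...").
function candidateTypeOf(candidateLine: string): CandidateType | null {
  const tokens = candidateLine.trim().split(/\s+/);
  const typ = tokens[tokens.indexOf("typ") + 1];
  return typ === "host" || typ === "prflx" || typ === "srflx" || typ === "relay" ? typ : null;
}

// Sort candidate types from most to least preferred.
const preferenceOrder = (["relay", "host", "srflx"] as CandidateType[]).sort(
  (a, b) => candidatePriority(b) - candidatePriority(a),
);
// preferenceOrder is ["host", "srflx", "relay"]
```

This is why a direct host path is tried before a STUN-derived one, and a TURN relay is tried last.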

Step 4. Media Capture and Stream Establishment

  • Media Access: Using the getUserMedia API, WebRTC requests access to the local devices’ media inputs, like microphones and cameras.
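  • Stream Handling: The local media streams are captured, and can either be displayed locally or sent across the network. The RTCPeerConnection interface manages the transmission of these streams, handling encoding and decoding, packetization, and network adaptation logic. In practice, tracks are usually added to the connection before the offer is created so that the SDP describes them; adding them later requires renegotiation.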
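
The capture step can be sketched as follows. This is a minimal example, assuming audio is always requested and video is optional. The `buildConstraints` and `attachLocalMedia` helpers and the `*Like` interfaces are hypothetical stand-ins that keep the sketch independent of DOM typings. In a browser, `navigator.mediaDevices` and an `RTCPeerConnection` would be passed in.

```typescript
// Minimal structural stand-ins for the browser objects used below.
type CaptureConstraints = {
  audio: boolean;
  video: boolean | { width: { ideal: number }; height: { ideal: number } };
};
interface TrackLike { kind: string }
interface StreamLike { getTracks(): TrackLike[] }
interface DevicesLike { getUserMedia(constraints: CaptureConstraints): Promise<StreamLike> }
interface SendingPeerLike { addTrack(track: TrackLike, stream: StreamLike): unknown }

// Hypothetical helper: audio always on, video optional with a preferred resolution.
function buildConstraints(withVideo: boolean, width = 1280, height = 720): CaptureConstraints {
  return {
    audio: true,
    video: withVideo ? { width: { ideal: width }, height: { ideal: height } } : false,
  };
}

// Ask for the devices (in a browser this triggers the permission prompt), then
// hand every captured track to the peer connection for transmission.
async function attachLocalMedia(
  devices: DevicesLike,
  pc: SendingPeerLike,
  constraints: CaptureConstraints,
): Promise<StreamLike> {
  const stream = await devices.getUserMedia(constraints);
  for (const track of stream.getTracks()) {
    pc.addTrack(track, stream);
  }
  return stream;
}
```

The returned stream can also be assigned to a video element's `srcObject` to show a local preview.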

Step 5. Media Transmission

  • SRTP for Media Encryption: The actual media is sent over the network using SRTP (Secure Real-time Transport Protocol), which ensures the encryption, integrity checking, and authentication of media data packets.
  • Bandwidth Management: WebRTC continuously monitors the connection and dynamically adjusts the quality of the stream based on the available network bandwidth.
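
The browser's congestion controller performs this adaptation on its own, but an application can also cap what it sends through `RTCRtpSender.setParameters()`. Here is a minimal sketch, assuming the bandwidth estimate comes from the `availableOutgoingBitrate` field of a `candidate-pair` entry in `getStats()`. The helper names, the 80% headroom, and the floor and ceiling values are arbitrary illustrative choices.

```typescript
// Structural stand-in for RTCRtpSender, limited to what this sketch uses.
type SendParameters = { encodings: { maxBitrate?: number }[] };
interface SenderLike {
  getParameters(): SendParameters;
  setParameters(parameters: SendParameters): Promise<unknown>;
}

// Pick a send cap: a percentage of the estimate, clamped to [floorBps, ceilingBps].
function targetBitrate(
  availableBps: number,
  floorBps = 150000,
  ceilingBps = 2500000,
  headroomPercent = 80,
): number {
  const wanted = Math.floor((availableBps * headroomPercent) / 100);
  return Math.min(ceilingBps, Math.max(floorBps, wanted));
}

// Apply the cap (in bits per second) to every encoding of a sender.
// The object returned by getParameters() is modified and passed back unchanged
// otherwise, which is what setParameters() expects.
function capSenderBitrate(sender: SenderLike, maxBitrate: number): Promise<unknown> {
  const parameters = sender.getParameters();
  for (const encoding of parameters.encodings) {
    encoding.maxBitrate = maxBitrate;
  }
  return sender.setParameters(parameters);
}
```

For example, an estimate of 1,000,000 bit/s yields a cap of 800,000 bit/s, while very low or very high estimates are clamped to the floor or ceiling.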

Step 6. Ongoing Communication and Adaptation

  • Feedback Loops: RTCP (RTP Control Protocol) provides out-of-band statistics and control information for an RTP (Real-time Transport Protocol) flow. It helps with synchronization and with adjusting overall media stream quality.
  • Data Channel: WebRTC also supports bidirectional data channels, which run over SCTP on top of the DTLS connection, enabling arbitrary data to be sent peer-to-peer alongside the media streams.
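
As an illustration of what the RTCP statistics contain, receiver reports carry a "fraction lost" field computed over each reporting interval. Here is a minimal sketch following the calculation described in RFC 3550 (Appendix A.3). Browsers compute this internally; the type and function names here are hypothetical, and the clamp to 255 is an added safeguard so the value fits the 8-bit field.

```typescript
// Cumulative counters a receiver keeps for one RTP source.
type ReceiverCounters = { expected: number; received: number };

// Fraction of packets lost since the previous report, as the 8-bit fixed-point
// value carried in an RTCP receiver report (loss ratio multiplied by 256).
function fractionLost(previous: ReceiverCounters, current: ReceiverCounters): number {
  const expectedInterval = current.expected - previous.expected;
  const receivedInterval = current.received - previous.received;
  const lostInterval = expectedInterval - receivedInterval;
  // Duplicate packets can make the loss negative; zero is reported in that case.
  if (expectedInterval <= 0 || lostInterval <= 0) return 0;
  return Math.min(255, Math.floor((lostInterval * 256) / expectedInterval));
}

// Convert the fixed-point field back to a percentage for display.
function lossPercent(fraction: number): number {
  return (fraction / 256) * 100;
}
```

With 100 packets expected and 90 received in an interval, the field is 25, which corresponds to a loss of roughly 9.8%. A sender seeing this value rise would typically lower its bitrate.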

Step 7. Session Termination

  • Closing the Connection: Either peer can end the session by calling close() on its RTCPeerConnection, which closes all associated media and data channels. The signaling server can also be notified so that it cleans up the session data.
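
A teardown routine might look like the sketch below. The `hangUp` helper, the `notifyPeer` callback, and the "bye" message are hypothetical application-level conventions. The interfaces are structural stand-ins for `RTCPeerConnection` and `MediaStreamTrack`, limited to the methods used here.

```typescript
// Structural stand-ins for the browser objects used during teardown.
interface StoppableTrack { stop(): void }
interface ClosablePeer {
  getSenders(): { track: StoppableTrack | null }[];
  close(): void;
}

// Hypothetical helper: release devices, close the connection, tell the other side.
function hangUp(pc: ClosablePeer, notifyPeer: (message: string) => void): void {
  // Closing the connection does not stop local capture, so stop the tracks
  // explicitly to release the camera and microphone.
  for (const sender of pc.getSenders()) {
    if (sender.track) sender.track.stop();
  }
  // close() ends ICE, DTLS, and every media and data channel on this connection.
  pc.close();
  // The "bye" message is an application-level convention, not part of WebRTC.
  notifyPeer(JSON.stringify({ kind: "bye" }));
}
```

In a browser, `notifyPeer` would usually forward the string to the signaling transport so the remote peer and the server can clean up as well.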

WebRTC’s design allows for high-performance, adaptable, and secure communication in web applications, providing the tools necessary for real-time, peer-to-peer communication entirely within the browser.
