Signaling
The process of coordinating communication between WebRTC peers before establishing a connection
What is Signaling?
Signaling is the process that allows two WebRTC peers to discover each other and exchange the information necessary to establish a direct peer-to-peer connection. Think of signaling as the initial phone call where you arrange to meet someone—you're not at the meeting yet, you're just coordinating where and when it will happen.
When you want to start a video call with someone, your browser doesn't magically know their IP address, what video codecs they support, or how to reach them through their firewall. Signaling is the messenger service that exchanges this critical setup information between the two peers before WebRTC can establish the actual media connection.
Interestingly, WebRTC deliberately does NOT specify how signaling should work. You can use WebSockets, HTTP long polling, Server-Sent Events, or even carrier pigeons—WebRTC doesn't care. This flexibility is intentional, allowing developers to integrate WebRTC into existing architectures and communication systems.
Why Signaling is Necessary
WebRTC enables peer-to-peer connections, meaning your browser talks directly to the other person's browser without media going through a server (in most cases). But there's a catch-22: to establish a direct connection, the two peers need to know about each other first.
Your WebRTC client knows nothing initially. It doesn't know:
- Who you want to connect to
- Where they are (IP address)
- What media formats they support (codecs)
- How to reach them through NAT/firewalls
- Which certificate fingerprints to trust when setting up encrypted communication
Signaling provides a neutral meeting place—a signaling server—where both peers can exchange this bootstrapping information. Once they have enough data about each other, they can attempt to establish a direct connection and the signaling server's job is done (though it often remains available for coordinating changes during the call).
How Signaling Works: The Offer/Answer Dance
WebRTC signaling follows a pattern called the Offer/Answer Model (RFC 3264), which originated in SIP (Session Initiation Protocol) telephony. Here's the step-by-step process:
1. Peer A Creates an Offer
The peer initiating the call (Peer A) creates an SDP offer by calling createOffer() on their RTCPeerConnection object. This offer contains:
- Supported audio and video codecs (H.264, VP8, VP9, Opus, etc.)
- Media capabilities (resolution, frame rate)
- Security parameters (DTLS certificate fingerprints; the encryption keys themselves are negotiated later over DTLS and never appear in the SDP)
- Network transport details
- Session metadata
The offer is formatted as SDP (Session Description Protocol), a text-based format that looks like this:
v=0
o=- 123456789 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE 0 1
m=audio 9 UDP/TLS/RTP/SAVPF 111 103
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:abcd
a=ice-pwd:efgh1234
a=rtpmap:111 opus/48000/2
m=video 9 UDP/TLS/RTP/SAVPF 96 97
a=rtpmap:96 VP8/90000
...
2. Offer is Sent via Signaling Channel
Peer A sets this offer as their local description (setLocalDescription()) and sends it to Peer B through the signaling channel. This could be a WebSocket message, an HTTP POST request, a message queue—whatever the application uses for signaling.
Critically, the signaling server doesn't need to understand the SDP content. It's just passing a blob of text from Peer A to Peer B. The SDP is a "black box" to the signaling infrastructure.
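A minimal sketch of that "black box" relay, assuming an in-memory registry of connected peers. The names OpaqueRelay, register, and relay are hypothetical, and the send callback stands in for whatever transport the application uses (for example, a wrapper around a WebSocket connection).

```typescript
// Hypothetical in-memory relay: forwards payloads between peers without
// ever parsing them, so the SDP stays opaque to the server.
type SendFn = (fromPeerId: string, payload: string) => void;

class OpaqueRelay {
  private readonly peers = new Map<string, SendFn>();

  // Associate a peer ID with the function that delivers data to that peer.
  register(peerId: string, send: SendFn): void {
    this.peers.set(peerId, send);
  }

  unregister(peerId: string): void {
    this.peers.delete(peerId);
  }

  // Returns false when the target is unknown, so the caller can report
  // the failure back to the sender.
  relay(fromPeerId: string, toPeerId: string, payload: string): boolean {
    const send = this.peers.get(toPeerId);
    if (send === undefined) return false;
    send(fromPeerId, payload);
    return true;
  }
}
```

The payload is a plain string on purpose: the relay would work the same way whether it carried SDP, an ICE candidate, or any other application message.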
3. Peer B Receives and Processes the Offer
Peer B receives the offer via the signaling channel and sets it as their remote description (setRemoteDescription()). The browser parses the SDP to understand what Peer A is capable of and what they're proposing for the call.
4. Peer B Creates an Answer
Peer B creates an SDP answer by calling createAnswer(). This answer responds to the offer by:
- Selecting compatible codecs from those offered (rejecting unsupported ones)
- Confirming which media streams to accept
- Providing their own DTLS certificate fingerprint
- Adding their network transport details
5. Answer is Sent Back
Peer B sets this answer as their local description and sends it back to Peer A through the signaling channel. Peer A receives it and sets it as their remote description.
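The order of calls in steps 1 through 5 can be modelled as a small state machine. The sketch below is a simplification: the state names match the values of RTCPeerConnection.signalingState, but provisional answers and rollback are left out, and the action names and the nextState function are hypothetical.

```typescript
// Simplified model of the offer/answer state transitions.
type NegotiationState = "stable" | "have-local-offer" | "have-remote-offer";
type DescriptionAction =
  | "setLocalOffer"
  | "setRemoteOffer"
  | "setLocalAnswer"
  | "setRemoteAnswer";

// For each state, the actions that are allowed and the state they lead to.
const negotiationTransitions: Record<
  NegotiationState,
  Partial<Record<DescriptionAction, NegotiationState>>
> = {
  "stable": {
    setLocalOffer: "have-local-offer",
    setRemoteOffer: "have-remote-offer",
  },
  "have-local-offer": { setRemoteAnswer: "stable" },
  "have-remote-offer": { setLocalAnswer: "stable" },
};

// Throws when the action is not valid in the current state.
function nextState(
  state: NegotiationState,
  action: DescriptionAction,
): NegotiationState {
  const next = negotiationTransitions[state][action];
  if (next === undefined) {
    throw new Error(`Cannot apply ${action} while in state ${state}`);
  }
  return next;
}
```

Peer A moves stable → have-local-offer → stable, while Peer B moves stable → have-remote-offer → stable. A message applied out of order (an answer with no pending offer, for instance) is rejected rather than silently accepted.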
6. ICE Candidates Exchange
While the offer/answer exchange happens, both peers are gathering ICE candidates—potential network paths for the connection. Each time a peer discovers a candidate (local IP, public IP from STUN, relay address from TURN), they send it to the other peer via signaling.
This is called "Trickle ICE" (RFC 8838): candidates are sent as they're discovered rather than after gathering completes. Connectivity checks can therefore begin while gathering is still in progress, which can cut setup time substantially, in some cases from several seconds to under a second.
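One practical consequence is that a candidate can reach a peer before that peer has applied the remote description, and addIceCandidate() fails in that situation. A common workaround is to buffer early candidates. The sketch below assumes a hypothetical PendingCandidates helper; the applyCandidate callback would typically wrap RTCPeerConnection.addIceCandidate().

```typescript
// Hypothetical helper: holds remote ICE candidates that arrive before the
// remote description is set, then flushes them in arrival order.
class PendingCandidates<C> {
  private queue: C[] = [];
  private ready = false;
  private readonly applyCandidate: (candidate: C) => void;

  constructor(applyCandidate: (candidate: C) => void) {
    this.applyCandidate = applyCandidate;
  }

  // Call for every candidate received over the signaling channel.
  add(candidate: C): void {
    if (this.ready) {
      this.applyCandidate(candidate);
    } else {
      this.queue.push(candidate);
    }
  }

  // Call once setRemoteDescription() has succeeded.
  remoteDescriptionSet(): void {
    this.ready = true;
    for (const candidate of this.queue) {
      this.applyCandidate(candidate);
    }
    this.queue = [];
  }
}
```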
7. Connection Established
Once both peers have each other's SDP and sufficient ICE candidates, ICE performs connectivity checks and establishes the best peer-to-peer path. Media starts flowing directly between peers, and the signaling channel is no longer needed for the media itself (though it often remains available for call control like adding participants or ending the call).
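Since WebRTC leaves the wire format to the application, the messages exchanged in the steps above need some envelope. The sketch below shows one possible JSON envelope. The SignalMessage shape, its field names, and the "bye" message are assumptions made for illustration, not part of any standard.

```typescript
// Hypothetical wire format for signaling messages, encoded as JSON.
type SignalMessage =
  | { type: "offer"; from: string; to: string; sdp: string }
  | { type: "answer"; from: string; to: string; sdp: string }
  | { type: "candidate"; from: string; to: string; candidate: string }
  | { type: "bye"; from: string; to: string };

function encodeSignal(message: SignalMessage): string {
  return JSON.stringify(message);
}

// Returns null for anything that is not a well-formed message, so
// callers never act on partially valid input.
function decodeSignal(raw: string): SignalMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { type: kind, from, to, sdp, candidate } = data as Record<string, unknown>;
  if (typeof from !== "string" || typeof to !== "string") return null;
  if ((kind === "offer" || kind === "answer") && typeof sdp === "string") {
    return { type: kind, from, to, sdp };
  }
  if (kind === "candidate" && typeof candidate === "string") {
    return { type: kind, from, to, candidate };
  }
  if (kind === "bye") {
    return { type: kind, from, to };
  }
  return null;
}
```

Decoding into a discriminated union means the rest of the application can switch on message.type and let the compiler check that every case is handled.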
What is SDP?
Session Description Protocol (SDP) is the format used for WebRTC offers and answers. Defined in RFC 8866, SDP is a text-based format originally designed for describing multimedia sessions. While it looks cryptic, it's human-readable: each line has the form type=value, where the type is a single character (v, o, s, m, a, and so on).
An SDP message contains:
- Session-level information: Session name, timing, connection data
- Media descriptions: One per media type (audio, video, data channel)
- Attributes: Codec details, encryption, ICE parameters, transport protocols
Each media description (m= line) lists RTP payload types in preference order. For example, m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 describes a video stream offering three payload types, 96, 97, and 98. Subsequent a=rtpmap attributes map each payload type to a codec such as VP8, VP9, or H.264.
While you rarely need to manually parse SDP (the browser handles it), understanding its structure helps debug connection issues and optimize codec selection.
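For debugging, a few lines of code are enough to pull the media sections and codec mappings out of an SDP string. The sketch below is deliberately minimal: parseMediaSections and the MediaSection shape are hypothetical names, and it reads only m= lines and a=rtpmap attributes while ignoring everything else.

```typescript
interface MediaSection {
  kind: string;                   // "audio", "video", "application", ...
  port: number;
  protocol: string;
  payloadTypes: string[];         // in the order listed on the m= line
  codecs: Record<string, string>; // payload type -> "name/clockrate[/channels]"
}

function parseMediaSections(sdp: string): MediaSection[] {
  const sections: MediaSection[] = [];
  for (const rawLine of sdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("m=")) {
      // m=<media> <port> <proto> <fmt> ...
      const [kind = "", port = "0", protocol = "", ...payloadTypes] =
        line.slice(2).split(" ");
      sections.push({ kind, port: Number(port), protocol, payloadTypes, codecs: {} });
      continue;
    }
    // Attributes belong to the most recent m= line, if there is one.
    const current = sections[sections.length - 1];
    const match = /^a=rtpmap:(\d+) (.+)$/.exec(line);
    if (current !== undefined && match !== null) {
      const [, payloadType = "", encoding = ""] = match;
      current.codecs[payloadType] = encoding;
    }
  }
  return sections;
}
```

Run against an offer, this makes it easy to see which codecs a peer proposed and in what order, which is usually the first thing to check when negotiation fails.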
Common Signaling Protocols
Since WebRTC doesn't standardize signaling, developers have freedom to choose what works best for their application:
WebSocket (Most Common)
WebSocket provides full-duplex, real-time, bidirectional communication over a single TCP connection. It's the de facto standard for WebRTC signaling because:
- Low latency (persistent connection, no reconnection overhead)
- Efficient for real-time signaling (push from server to client instantly)
- Well-supported by libraries like Socket.IO, ws, uWebSockets
- Natural fit for Trickle ICE (stream candidates as they're discovered)
Most production WebRTC applications use WebSocket for signaling.
HTTP (REST + Polling or SSE)
HTTP-based signaling uses REST APIs for sending messages and either polling or Server-Sent Events (SSE) for receiving. While less efficient than WebSocket, it works through restrictive corporate firewalls that might block WebSocket.
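WHIP (WebRTC-HTTP Ingestion Protocol) and WHEP (WebRTC-HTTP Egress Protocol) are recent IETF specifications that reduce signaling to simple HTTP POST requests, which is particularly useful for broadcasting scenarios. WHIP is published as RFC 9725; WHEP was still an Internet-Draft at the time of writing.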
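To show how small WHIP signaling is, the sketch below builds the HTTP request that carries the offer. It only describes the request and does not send it. The HttpRequestSpec shape and the buildWhipOffer function are hypothetical, and the endpoint URL and bearer token are whatever the media server hands out.

```typescript
// Hypothetical description of an HTTP request, ready to pass to any client.
interface HttpRequestSpec {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// WHIP sends the SDP offer as the body of a POST with the
// application/sdp content type.
function buildWhipOffer(
  endpointUrl: string,
  offerSdp: string,
  bearerToken?: string,
): HttpRequestSpec {
  const headers: Record<string, string> = { "Content-Type": "application/sdp" };
  if (bearerToken !== undefined) {
    headers["Authorization"] = `Bearer ${bearerToken}`;
  }
  return { url: endpointUrl, method: "POST", headers, body: offerSdp };
}
```

A WHIP server is expected to reply with 201 Created, the SDP answer in the response body, and a Location header identifying the session resource. Sending an HTTP DELETE to that resource ends the session.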
Custom/Proprietary
Some applications use existing messaging infrastructure:
- XMPP/Jingle (messaging apps extending existing chat protocols)
- SIP (integrating with traditional VoIP infrastructure)
- MQTT (IoT applications)
- Custom message queues (RabbitMQ, Redis Pub/Sub)
The Signaling Server's Role
A signaling server typically handles:
- User presence: Tracking which users are online and available for calls
- Room/session management: Creating and managing call rooms, handling who can join
- Message routing: Relaying SDP offers, answers, and ICE candidates between peers
- Authentication: Verifying users have permission to initiate or join calls
- Call control: Coordinating events like mute/unmute, adding participants, screen sharing, ending calls
The signaling server is also where you implement business logic: user permissions, recording triggers, billing events, analytics, etc. While it doesn't handle media, it's the control plane for your entire video calling system.
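Room management is usually the first piece of that control plane to be written. The sketch below is a minimal in-memory version; RoomRegistry and its methods are hypothetical names, and a production server would also need persistence, authentication, and cleanup on disconnect.

```typescript
// Hypothetical in-memory room registry with a per-room capacity limit.
class RoomRegistry {
  private readonly rooms = new Map<string, Set<string>>();
  private readonly maxPeersPerRoom: number;

  constructor(maxPeersPerRoom: number) {
    this.maxPeersPerRoom = maxPeersPerRoom;
  }

  // Adds the peer and returns the peers already present (the ones the
  // newcomer should negotiate with), or null when the room is full.
  join(roomId: string, peerId: string): string[] | null {
    const members = this.rooms.get(roomId) ?? new Set<string>();
    if (!members.has(peerId) && members.size >= this.maxPeersPerRoom) {
      return null;
    }
    const others = Array.from(members).filter((id) => id !== peerId);
    members.add(peerId);
    this.rooms.set(roomId, members);
    return others;
  }

  // Removes the peer and deletes the room once it is empty.
  leave(roomId: string, peerId: string): void {
    const members = this.rooms.get(roomId);
    if (members === undefined) return;
    members.delete(peerId);
    if (members.size === 0) this.rooms.delete(roomId);
  }
}
```

Returning the existing members from join gives the newcomer the list of peers to send offers to, which keeps the "who calls whom" decision in one place.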
Signaling in Different Architectures
Peer-to-Peer
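In pure P2P, the signaling server only helps peers find each other and exchange initial connection data. Once connected, peers communicate directly. The signaling server can go offline mid-call without affecting media that is already flowing, although anything that needs renegotiation (adding a track, restarting ICE after a network change) has to wait until it is back.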
SFU (Selective Forwarding Unit)
With an SFU, signaling gets more complex. The server must coordinate:
- Each client's connection to the SFU (not peer-to-peer)
- Which streams each participant subscribes to
- Simulcast layer selection
- Dynamic stream routing as participants change their view
SFU signaling is more sophisticated than P2P, often using custom protocols beyond basic SDP exchange.
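As one example of that extra coordination, the sketch below picks which simulcast layer to forward to a viewer. Everything here is an assumption made for illustration: the SimulcastLayer shape, the selectLayer function, and the policy of choosing the largest layer that fits the viewer's rendered height and estimated bandwidth. Real SFUs use their own, usually more elaborate, policies.

```typescript
// Hypothetical description of one simulcast layer published by a sender.
interface SimulcastLayer {
  rid: string;         // layer identifier
  height: number;      // encoded frame height in pixels
  bitrateKbps: number; // approximate bitrate of the layer
}

// Picks the largest layer that fits both limits. Falls back to the
// smallest layer when none fits, and returns null for an empty list.
function selectLayer(
  layers: SimulcastLayer[],
  maxHeight: number,
  availableKbps: number,
): SimulcastLayer | null {
  const ascending = layers.slice().sort((a, b) => a.height - b.height);
  let chosen: SimulcastLayer | null = null;
  for (const layer of ascending) {
    const fits = layer.height <= maxHeight && layer.bitrateKbps <= availableKbps;
    if (chosen === null || fits) {
      chosen = layer;
    }
  }
  return chosen;
}
```

The client would report its viewport size over the signaling channel whenever the layout changes, and the server would rerun the selection for each subscribed stream.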
MCU (Multipoint Control Unit)
MCU signaling is simpler in some ways—each client connects to the MCU server, receives one composite stream, and the signaling server manages who's in the conference. The MCU handles all mixing server-side, so signaling focuses on room management rather than complex stream routing.
Security Considerations
Use Secure Transports
Always use WSS (WebSocket Secure) or HTTPS, not plain WS or HTTP. The SDP carries the DTLS certificate fingerprints that each peer uses to authenticate the other, so an attacker who can modify SDP on an unencrypted signaling channel can substitute their own fingerprint and mount a man-in-the-middle attack on the media.
Authenticate Signaling Messages
Verify that users have permission to send offers, join rooms, or access certain calls. Implement token-based authentication (JWT is common) and validate tokens on every signaling message, not just initial connection.
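A per-message check can be as small as the sketch below. It assumes the token's signature has already been verified by a JWT library (not shown) and that the claims are available as an object. The sub and exp fields are standard JWT claims, while the rooms claim, the RoomMessage shape, and authorizeMessage are hypothetical.

```typescript
// Claims taken from an already-verified token.
interface CallClaims {
  sub: string;     // user ID
  rooms: string[]; // hypothetical custom claim: rooms the user may access
  exp: number;     // expiry, in seconds since the Unix epoch
}

interface RoomMessage {
  from: string;
  roomId: string;
}

type AuthResult = "ok" | "expired" | "spoofed-sender" | "room-not-allowed";

// Run on every incoming message, not only when the connection opens.
function authorizeMessage(
  claims: CallClaims,
  message: RoomMessage,
  nowSeconds: number,
): AuthResult {
  if (nowSeconds >= claims.exp) return "expired";
  if (message.from !== claims.sub) return "spoofed-sender";
  if (claims.rooms.indexOf(message.roomId) === -1) return "room-not-allowed";
  return "ok";
}
```

Checking the sender field against the token matters because a client can put any value it likes in the message body. Only the verified claims are trustworthy.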
Rate Limiting
Signaling servers are vulnerable to DoS attacks. A malicious client could spam the server with offers or join requests. Implement rate limiting per user/IP to prevent abuse.
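One common way to implement such a limit is a token bucket. The sketch below is one possible version, not the only approach: the TokenBucket class is hypothetical, and the clock is passed in explicitly so the behaviour is easy to test.

```typescript
// Token bucket: allows bursts up to `capacity` messages, then a steady
// rate of `refillPerSecond` messages.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  // Returns true if the message may proceed, false if it should be dropped.
  tryConsume(nowMs: number): boolean {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefillMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

To limit per user or per IP, keep one bucket per key in a Map and call tryConsume(Date.now()) for each incoming signaling message.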
Validate SDP
Even while treating SDP as a black box, the server can still run basic sanity checks: size limits (typical SDP is 2-10KB, so anything massive is suspicious), proper formatting, and a reasonable number of media lines. This helps block attacks that try to exploit SDP parsers.
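Those checks might look like the sketch below. The checkSdp function is hypothetical, and both limits are assumed values chosen for illustration; tune them to what your clients actually send.

```typescript
const MAX_SDP_CHARS = 64 * 1024; // assumed ceiling, well above typical sizes
const MAX_MEDIA_SECTIONS = 32;   // assumed ceiling

// Returns a reason string when the SDP looks suspicious, or null when it
// passes. This is a shallow check, not a full SDP parser.
function checkSdp(sdp: string): string | null {
  if (sdp.length === 0) return "empty SDP";
  if (sdp.length > MAX_SDP_CHARS) return "SDP too large";
  const lines = sdp.split(/\r?\n/).filter((line) => line.length > 0);
  if (lines[0] !== "v=0") return "SDP must start with v=0";
  // Every line must have the form <single lowercase letter>=<value>.
  if (!lines.every((line) => /^[a-z]=/.test(line))) return "malformed line";
  const mediaCount = lines.filter((line) => line.startsWith("m=")).length;
  if (mediaCount > MAX_MEDIA_SECTIONS) return "too many media sections";
  return null;
}
```

Rejecting a message at this stage costs a few string operations, whereas forwarding it costs the receiving browser a full parse.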
Debugging Signaling Issues
When WebRTC connections fail, signaling is often the culprit:
- Connection never starts: Check if SDP offer/answer are exchanged. Use browser DevTools Network tab to verify signaling messages arrive
- "Failed to set remote description" error: SDP is malformed or incompatible, or the description arrived in the wrong signaling state (for example, an answer with no pending offer). Check codec negotiation, and ensure both peers support at least one common codec
- ICE candidates not arriving: Signaling channel might not support Trickle ICE messages, or there's a bug in candidate relay logic
- Connection works sometimes but not always: Race conditions in signaling. Ensure proper ordering: createOffer → setLocalDescription → send offer → receive answer → setRemoteDescription, and add remote ICE candidates only after the remote description has been set
Chrome's chrome://webrtc-internals shows each peer connection's API calls, the SDP passed to them, and ICE events with timestamps, which makes it invaluable for debugging. It does not show the signaling channel itself, so inspect that traffic separately in DevTools.
Modern Developments (2025)
WHIP/WHEP
WebRTC-HTTP Ingestion Protocol (WHIP) and WebRTC-HTTP Egress Protocol (WHEP) standardize HTTP-based signaling for broadcasting use cases. They reduce the offer/answer exchange to a single HTTP POST and its response: WHIP for publishing a stream to a media server or CDN, WHEP for playing one back.
Adoption is growing in 2025, particularly for live streaming and broadcast scenarios where traditional WebSocket signaling was overkill.
Decentralized Signaling
Some projects explore decentralized signaling using DHT (Distributed Hash Tables), blockchain, or peer-discovery protocols. While theoretically interesting, centralized signaling servers remain far more practical and reliable for production applications in 2025.
The Bottom Line
Signaling is the unsung choreographer of WebRTC. While it doesn't transport media, it makes the entire peer-to-peer connection possible by coordinating the initial handshake. Understanding signaling is essential for debugging WebRTC issues, implementing secure video calling, and scaling your infrastructure.
WebRTC's deliberate lack of signaling standardization is a strength, not a weakness. It allows you to integrate WebRTC into any existing system—whether you're adding video to a chat app, building a telehealth platform, or creating a gaming voice chat feature. You can reuse your existing authentication, messaging infrastructure, and business logic.
The signaling server is where your application's unique identity lives. The WebRTC connection might be standardized, but signaling is where you differentiate your product.