End-to-End Encryption (E2EE)
Technology: a security system in which only the communicating parties can read messages, excluding servers
What is End-to-End Encryption?
End-to-end encryption (E2EE) is a security system where only the communicating parties can read the messages. No one else—not the service provider, not the server infrastructure, not even the platform operator—can decrypt the communication. Think of it like sending a letter in a locked box where only the recipient has the key, and the postal service can't open it.
In video calling, E2EE means your audio and video streams are encrypted on your device and remain encrypted until they reach the recipient's device. Even if someone intercepts the data in transit or compromises the servers routing your call, they see only encrypted gibberish.
E2EE is the gold standard for privacy. It's used by Signal, WhatsApp (for messaging and calls), FaceTime, and increasingly by enterprise video conferencing platforms. As of 2025, E2EE in WebRTC has matured significantly with new standards and browser APIs making implementation practical for production applications.
WebRTC's Built-in Encryption (DTLS-SRTP)
WebRTC mandates encryption—all WebRTC connections are encrypted by default using DTLS-SRTP:
- DTLS (Datagram Transport Layer Security): Establishes encrypted channels and exchanges encryption keys
- SRTP (Secure Real-time Transport Protocol): Encrypts the actual media packets (audio and video)
This encryption is mandatory and automatic—you can't disable it. Every WebRTC connection is encrypted, protecting against passive eavesdropping on the network.
However, DTLS-SRTP alone is NOT end-to-end encryption. Here's why.
The SFU Problem: Why Standard WebRTC Isn't E2EE
In peer-to-peer WebRTC (two people connecting directly), DTLS-SRTP provides true E2EE, provided the DTLS certificate fingerprints exchanged over signaling are authentic. Only the two participants have the decryption keys.
But most video calling uses SFU (Selective Forwarding Unit) servers for group calls. Here's what happens:
- Your device encrypts media with DTLS-SRTP and sends it to the SFU server
- The SFU server decrypts the media to inspect and route it
- The SFU re-encrypts the media with new DTLS-SRTP keys
- The re-encrypted media is sent to recipients
At the second step, the SFU has access to unencrypted media, even if only briefly in memory. This breaks end-to-end encryption: the media is encrypted in transit on each hop, but the server can access it.
Why do SFUs decrypt? Each DTLS-SRTP session terminates at the SFU, so the SFU holds the keys for every leg of the call. It uses that access to decide which streams to forward, detect keyframes, handle simulcast layers, and perform quality-based switching. Traditional SFU architecture assumes plaintext access to media packets.
True E2EE with Insertable Streams
The solution: add a second layer of encryption that the SFU never decrypts. Only the endpoints (participants) have the keys for this layer.
How It Works
- Your device captures and encodes video/audio
- Before sending, your device adds application-layer encryption (E2EE layer)
- This encrypted payload is then wrapped in DTLS-SRTP (transport encryption)
- The SFU receives the packet, decrypts DTLS-SRTP, but sees only the application-layer ciphertext
- The SFU forwards the (still encrypted) media to recipients
- Recipients decrypt the application-layer encryption to recover the original media
The SFU never sees plaintext media. It only sees encrypted frames that it blindly forwards.
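The layering above can be sketched as a toy model. This is a minimal sketch under loud assumptions: AES-GCM (via the WebCrypto API) stands in for both layers, whereas in real WebRTC the outer hop layer is DTLS-SRTP and is handled by the browser. All function names (`sealBytes`, `openBytes`, `sfuForward`, `demoLayers`) are hypothetical.

```typescript
// Toy model of the two layers. AES-GCM stands in for BOTH layers here;
// in real WebRTC the outer (hop) layer is DTLS-SRTP, not application code.
const IV_BYTES = 12;

function newAesKey(): Promise<CryptoKey> {
  return crypto.subtle.generateKey({ name: "AES-GCM", length: 128 }, false, ["encrypt", "decrypt"]);
}

// Encrypt and prepend the random IV so the result is one self-contained buffer.
async function sealBytes(key: CryptoKey, plaintext: ArrayBuffer): Promise<ArrayBuffer> {
  const iv = crypto.getRandomValues(new Uint8Array(IV_BYTES));
  const body = await crypto.subtle.encrypt({ name: "AES-GCM", iv }, key, plaintext);
  const out = new Uint8Array(IV_BYTES + body.byteLength);
  out.set(iv, 0);
  out.set(new Uint8Array(body), IV_BYTES);
  return out.buffer;
}

// Split off the IV, then decrypt and authenticate the rest.
async function openBytes(key: CryptoKey, sealed: ArrayBuffer): Promise<ArrayBuffer> {
  return crypto.subtle.decrypt(
    { name: "AES-GCM", iv: sealed.slice(0, IV_BYTES) },
    key,
    sealed.slice(IV_BYTES),
  );
}

// The SFU holds only hop keys. It removes the sender's hop layer and re-wraps
// for the receiver's hop without ever touching the inner E2EE layer.
async function sfuForward(senderHop: CryptoKey, receiverHop: CryptoKey, packet: ArrayBuffer) {
  const seenBySfu = await openBytes(senderHop, packet); // still E2EE ciphertext
  const forwarded = await sealBytes(receiverHop, seenBySfu);
  return { seenBySfu, forwarded };
}

async function demoLayers(frame: ArrayBuffer) {
  const e2eeKey = await newAesKey();     // known to participants only
  const senderHop = await newAesKey();   // sender <-> SFU
  const receiverHop = await newAesKey(); // SFU <-> receiver

  const inner = await sealBytes(e2eeKey, frame);    // application-layer encryption
  const onWire = await sealBytes(senderHop, inner); // transport encryption
  const { seenBySfu, forwarded } = await sfuForward(senderHop, receiverHop, onWire);
  const recovered = await openBytes(e2eeKey, await openBytes(receiverHop, forwarded));
  return { seenBySfu, recovered };
}
```

In this model, `seenBySfu` is the inner ciphertext (IV, encrypted frame, and authentication tag), while `recovered` equals the original frame because only the receiver holds `e2eeKey`.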
Insertable Streams API
Introduced in 2020 and since specified by the W3C as WebRTC Encoded Transform (the RTCRtpScriptTransform interface), this browser API allows JavaScript code to intercept encoded media frames after they are encoded but before they're packetized and sent, or after they're received but before decoding.
Your code can:
- Read each encoded frame
- Apply additional encryption using cryptographic libraries (like WebCrypto API)
- Return the encrypted frame for sending
- On reception, decrypt frames before passing them to the decoder
Browser support: Chromium-based browsers (Chrome, Edge, Opera) fully support it. Safari has partial support. Firefox support is in progress as of 2025.
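A rough sketch of the worker-side half of that flow follows. It assumes only that encoded frames expose a writable `data: ArrayBuffer`, as `RTCEncodedVideoFrame` and `RTCEncodedAudioFrame` do. The names `EncodedFrameLike`, `FrameCipher`, `frameTransform`, and `encryptFrame` are hypothetical, and the browser wiring is shown in comments because it cannot run outside a browser.

```typescript
// Minimal shape shared by RTCEncodedVideoFrame and RTCEncodedAudioFrame.
interface EncodedFrameLike {
  data: ArrayBuffer;
}

// Any per-frame function: an encryptor on the send side, a decryptor on receive.
type FrameCipher = (data: ArrayBuffer) => Promise<ArrayBuffer>;

// Builds a TransformStream that rewrites each frame's payload in place
// and passes the frame on to the next stage.
function frameTransform<F extends EncodedFrameLike>(cipher: FrameCipher): TransformStream<F, F> {
  return new TransformStream<F, F>({
    async transform(frame, controller) {
      frame.data = await cipher(frame.data);
      controller.enqueue(frame);
    },
  });
}

// Browser wiring (not executed here). In the worker:
//
//   self.onrtctransform = (event) => {
//     const { readable, writable } = event.transformer;
//     readable.pipeThrough(frameTransform(encryptFrame)).pipeTo(writable);
//   };
//
// On the main thread, the options object is application-defined:
//
//   sender.transform = new RTCRtpScriptTransform(worker, { side: "send" });
```

Keeping the cipher as a plain function makes the same transform usable for both directions and lets the cryptographic code be tested without a peer connection.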
SFrame: The Emerging Standard
SFrame (Secure Frame) is an IETF standard protocol specifically designed for encrypting media frames in WebRTC group calls. It's optimized for the constraints of real-time media:
Key Features
- Partial encryption: Only encrypts the media payload, leaving RTP headers and some metadata unencrypted so SFUs can still route packets
- Per-frame encryption: Each frame is independently encrypted, allowing out-of-order delivery and packet loss without affecting other frames
- Minimal overhead: ~10-40 bytes per frame for encryption metadata
- Fast symmetric encryption: Uses AES-GCM for speed (critical for real-time encoding)
- Group key management: Supports multiple participants with shared group keys
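To put the per-frame overhead in perspective, the arithmetic can be sketched as below. The 10 and 40 byte figures come from the range above; the stream parameters (30 fps video at 1.5 Mbit/s, Opus at 50 frames per second and 32 kbit/s) are illustrative assumptions, not measurements.

```typescript
// Extra bitrate added by per-frame encryption metadata.
function overheadBitsPerSecond(overheadBytesPerFrame: number, framesPerSecond: number): number {
  return overheadBytesPerFrame * 8 * framesPerSecond;
}

// Overhead as a fraction of the media bitrate (0.01 means 1%).
function overheadFraction(
  overheadBytesPerFrame: number,
  framesPerSecond: number,
  mediaBitsPerSecond: number,
): number {
  return overheadBitsPerSecond(overheadBytesPerFrame, framesPerSecond) / mediaBitsPerSecond;
}

// Assumed video stream: 40 bytes x 8 x 30 fps = 9600 bit/s on top of 1.5 Mbit/s.
const videoOverhead = overheadFraction(40, 30, 1_500_000);

// Assumed audio stream: 10 bytes x 8 x 50 frames/s = 4000 bit/s on top of 32 kbit/s.
const audioOverhead = overheadFraction(10, 50, 32_000);
```

Under these assumptions the overhead is well under 1% for video but about 12.5% for audio, because audio frames are small and frequent.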
How SFrame Works
Each participant has an encryption key. Before sending a frame:
- Frame is encoded (VP8, H.264, Opus, etc.)
- SFrame adds a small header indicating which key was used and a counter
- The encoded payload is encrypted with AES-GCM
- The encrypted frame is sent via WebRTC
Recipients identify the sender from the SFrame header, use the corresponding key to decrypt, and decode the media.
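The flow above can be sketched with the WebCrypto API. This is a simplified illustration, not the SFrame wire format: the fixed 5-byte header (1-byte key ID plus 4-byte counter), the nonce construction, and the names `SenderKey`, `protectFrame`, and `unprotectFrame` are assumptions made for the example.

```typescript
const HEADER_BYTES = 5; // 1-byte key id + 4-byte counter (simplified, NOT the real SFrame layout)

interface SenderKey {
  keyId: number;    // 0..255 in this sketch
  key: CryptoKey;   // AES-GCM key shared with the receivers
  salt: Uint8Array; // 12 random bytes mixed into every nonce
}

async function newSenderKey(keyId: number): Promise<SenderKey> {
  const key = await crypto.subtle.generateKey({ name: "AES-GCM", length: 128 }, false, ["encrypt", "decrypt"]);
  return { keyId, key, salt: crypto.getRandomValues(new Uint8Array(12)) };
}

// Nonce = salt XOR counter. A counter must never repeat for the same key.
function buildNonce(salt: Uint8Array, counter: number) {
  const nonce = new Uint8Array(12);
  new DataView(nonce.buffer).setUint32(8, counter); // counter in the last 4 bytes
  for (let i = 0; i < 12; i++) nonce[i] ^= salt[i];
  return nonce;
}

// Header stays readable but is authenticated as additional data.
async function protectFrame(k: SenderKey, counter: number, payload: ArrayBuffer): Promise<ArrayBuffer> {
  const header = new Uint8Array(HEADER_BYTES);
  header[0] = k.keyId;
  new DataView(header.buffer).setUint32(1, counter);
  const body = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv: buildNonce(k.salt, counter), additionalData: header },
    k.key,
    payload,
  );
  const out = new Uint8Array(HEADER_BYTES + body.byteLength);
  out.set(header, 0);
  out.set(new Uint8Array(body), HEADER_BYTES);
  return out.buffer;
}

// Look up the key named in the header, then decrypt and verify the payload.
async function unprotectFrame(keys: Map<number, SenderKey>, frame: ArrayBuffer): Promise<ArrayBuffer> {
  if (frame.byteLength < HEADER_BYTES) throw new Error("frame too short");
  const header = new Uint8Array(frame.slice(0, HEADER_BYTES));
  const k = keys.get(header[0]);
  if (!k) throw new Error(`unknown key id ${header[0]}`);
  const counter = new DataView(frame).getUint32(1);
  return crypto.subtle.decrypt(
    { name: "AES-GCM", iv: buildNonce(k.salt, counter), additionalData: header },
    k.key,
    frame.slice(HEADER_BYTES),
  );
}
```

Because the header is passed as additional authenticated data, a forwarder can read the key ID and counter but cannot alter them without decryption failing at the receiver.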
MLS: Group Key Management
The hardest part of E2EE isn't encrypting frames—it's managing encryption keys across multiple participants, especially as people join and leave calls.
MLS (Messaging Layer Security), standardized by the IETF in 2023, is a group key exchange protocol designed for exactly this problem.
MLS Features
- Efficient key updates: Adding a participant doesn't require re-exchanging keys with everyone
- Forward secrecy: Compromising today's keys doesn't decrypt past communications
- Post-compromise security: Recovering from key compromise is possible
- Scalability: Works efficiently with thousands of participants
Key Rotation
When someone leaves a call, keys must be rotated so departed participants can't decrypt future conversation. MLS handles this efficiently:
- Generate new group key
- Distribute it to the remaining participants over channels the departed participant cannot read (never encrypted only under the old group key)
- All participants switch to the new key
For joins, hash ratcheting can derive new keys without full key exchange, reducing overhead.
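The difference between the two cases can be sketched as follows. This is a minimal illustration, not the MLS key schedule: it assumes a 32-byte group secret, uses HKDF-SHA-256 from the WebCrypto API as the one-way ratchet, and the label string and function names are hypothetical.

```typescript
type MembershipChange = "join" | "leave";

// One-way step: the next key can be computed from the current one, but the
// current key cannot be recovered from the next. A newcomer who is handed
// the ratcheted key therefore cannot decrypt earlier media.
async function ratchetKey(current: ArrayBuffer): Promise<ArrayBuffer> {
  const base = await crypto.subtle.importKey("raw", current, "HKDF", false, ["deriveBits"]);
  return crypto.subtle.deriveBits(
    {
      name: "HKDF",
      hash: "SHA-256",
      salt: new Uint8Array(0),
      info: new TextEncoder().encode("example-e2ee-ratchet"), // hypothetical label
    },
    base,
    256,
  );
}

async function nextGroupKey(current: ArrayBuffer, change: MembershipChange): Promise<ArrayBuffer> {
  if (change === "join") {
    // Existing members derive the same key locally; no new exchange is needed.
    return ratchetKey(current);
  }
  // On leave, a ratchet is not enough: the departed member could compute it too.
  // A fresh random key is generated and must then be delivered to the remaining
  // members individually (delivery is not shown here).
  return crypto.getRandomValues(new Uint8Array(32)).buffer;
}
```

The asymmetry is the point of the sketch: joins can reuse deterministic derivation, while leaves need new randomness that the departed participant never sees.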
Implementation Challenges
1. Performance Overhead
Additional encryption/decryption consumes CPU. On mobile devices or low-end hardware, this can reduce video quality or drain batteries faster. Hardware acceleration for AES helps, but the overhead is real (typically 10-30% CPU increase).
2. Partial Encryption Complexity
SFUs need some unencrypted metadata to function:
- Keyframe indicators (to prioritize important frames)
- Simulcast layer information (for quality switching)
- Codec-specific metadata
Determining exactly what must remain unencrypted varies by codec (VP8, VP9, H.264, Opus) and frame type. Getting this wrong breaks SFU functionality or leaks information.
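One way to picture this is a helper that splits each encoded frame into a clear prefix and a payload to encrypt. The byte counts below (10 for VP8 keyframes, 3 for VP8 delta frames, 1 for Opus) follow values commonly seen in public insertable-streams demos and should be treated as assumptions to verify against your codec. The names `CLEAR_BYTES`, `splitFrame`, and `joinFrame` are hypothetical.

```typescript
type FrameKind = "key" | "delta" | "audio";

// Assumed number of leading bytes left unencrypted per frame kind.
const CLEAR_BYTES: Record<FrameKind, number> = { key: 10, delta: 3, audio: 1 };

// Splits a frame into the prefix the SFU may read and the part to encrypt.
function splitFrame(data: ArrayBuffer, kind: FrameKind): { clear: ArrayBuffer; secret: ArrayBuffer } {
  const n = Math.min(CLEAR_BYTES[kind], data.byteLength);
  return { clear: data.slice(0, n), secret: data.slice(n) };
}

// Reassembles a frame from the clear prefix and the (encrypted) payload.
function joinFrame(clear: ArrayBuffer, protectedPayload: ArrayBuffer): ArrayBuffer {
  const out = new Uint8Array(clear.byteLength + protectedPayload.byteLength);
  out.set(new Uint8Array(clear), 0);
  out.set(new Uint8Array(protectedPayload), clear.byteLength);
  return out.buffer;
}
```

A sender would encrypt only `secret` and send `joinFrame(clear, encryptedSecret)`; the receiver applies the same split before decrypting. Every clear byte is also a byte an observer can read, so the table should be kept as small as the SFU allows.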
3. Key Management Complexity
Securely distributing, rotating, and managing keys across dynamic groups is hard. You need:
- Secure initial key exchange (often using public key cryptography)
- Key derivation functions
- Synchronization mechanisms (all participants must use the same key version)
- Handling race conditions (simultaneous joins/leaves)
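The synchronization requirement can be illustrated with a small versioned key store. This is a sketch under assumptions: frames are assumed to carry the key version they were encrypted with, and the class name `KeyRing` and its retention policy (keep the current and the previous version) are hypothetical.

```typescript
// Holds keys by version so that frames encrypted just before a rotation
// can still be decrypted while newer frames use the latest key.
class KeyRing<K> {
  private readonly keys = new Map<number, K>();
  private currentVersion = -1;
  private readonly keep: number;

  constructor(keep = 2) {
    this.keep = keep; // how many most-recent versions to retain
  }

  // Installs a key; tolerates out-of-order arrival and drops stale versions.
  install(version: number, key: K): void {
    this.keys.set(version, key);
    if (version > this.currentVersion) this.currentVersion = version;
    for (const v of Array.from(this.keys.keys())) {
      if (v <= this.currentVersion - this.keep) this.keys.delete(v);
    }
  }

  // Always send with the newest key known locally.
  forSending(): { version: number; key: K } {
    const key = this.keys.get(this.currentVersion);
    if (key === undefined) throw new Error("no key installed");
    return { version: this.currentVersion, key };
  }

  // Receive with whatever version the frame names, if it is still retained.
  forReceiving(version: number): K | undefined {
    return this.keys.get(version);
  }
}
```

Retaining one previous version smooths over the window in which participants switch keys at slightly different times. A shorter retention window limits how long an old key stays usable.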
4. Browser Support
As of 2025, Insertable Streams support is incomplete. Chromium browsers are fully compatible, but Safari and Firefox support is partial or in development. Cross-browser E2EE requires fallback strategies or limits supported platforms.
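A fallback strategy usually starts with feature detection. The sketch below checks for the standard `RTCRtpScriptTransform` interface and for the older, non-standard `createEncodedStreams()` method on `RTCRtpSender`; the function name and return labels are hypothetical, and the `scope` parameter exists only so the check can be exercised outside a browser.

```typescript
type EncodedTransformApi = "script-transform" | "legacy-encoded-streams" | "none";

// Reports which encoded-frame API, if any, the given global scope exposes.
function detectEncodedTransformApi(scope: any = globalThis): EncodedTransformApi {
  if (typeof scope.RTCRtpScriptTransform === "function") {
    return "script-transform";
  }
  const senderProto = scope.RTCRtpSender?.prototype;
  if (senderProto && typeof senderProto.createEncodedStreams === "function") {
    return "legacy-encoded-streams";
  }
  return "none";
}
```

An application could use the result to decide whether to offer E2EE, fall back to transport encryption with a visible warning, or refuse to join an E2EE-only call.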
5. Debugging and Monitoring
With E2EE, you can't inspect media server-side. Debugging call quality issues becomes harder—you can't look at the video frames the server sees because it sees only encrypted data. Telemetry and diagnostics must happen client-side.
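Client-side diagnostics can be built on the statistics the browser already collects. The sketch below summarizes `inbound-rtp` entries of the kind returned by `RTCPeerConnection.getStats()`; the `InboundRtpLike` and `QualitySummary` types and the `summarizeInbound` function are hypothetical, and only a few well-known stat fields are used.

```typescript
interface InboundRtpLike {
  type: string;             // "inbound-rtp" for received streams
  kind?: string;            // "audio" or "video"
  packetsReceived?: number;
  packetsLost?: number;
  jitter?: number;          // seconds
}

interface QualitySummary {
  kind: string;
  lossRatio: number; // 0..1
  jitterMs: number;
}

// Reduces raw stats entries to a few numbers that are safe to upload:
// they describe transport quality, not media content.
function summarizeInbound(stats: InboundRtpLike[]): QualitySummary[] {
  return stats
    .filter((s) => s.type === "inbound-rtp")
    .map((s) => {
      const received = s.packetsReceived ?? 0;
      const lost = Math.max(s.packetsLost ?? 0, 0);
      const total = received + lost;
      return {
        kind: s.kind ?? "unknown",
        lossRatio: total === 0 ? 0 : lost / total,
        jitterMs: (s.jitter ?? 0) * 1000,
      };
    });
}

// Browser usage (not executed here):
//   const report = await peerConnection.getStats();
//   const summary = summarizeInbound(Array.from(report.values()));
```

Each client can report such summaries to a telemetry endpoint, which gives operators a view of call quality without any access to decrypted media.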
P2P vs. SFU E2EE
Peer-to-Peer
E2EE is trivial with P2P. WebRTC's built-in DTLS-SRTP already provides E2EE—only the two peers have the keys. No additional encryption layer needed.
One-on-one calls in FaceTime, WhatsApp, and Signal follow the same broad model, though each uses its own key exchange. The simplicity is beautiful: connect directly, agree on keys between the two devices, encrypt the media. Done.
SFU (Group Calls)
E2EE with an SFU requires the additional application-layer encryption (SFrame or similar) plus group key management (MLS or similar). This is much more complex, but necessary once calls grow beyond about 4-5 participants, where a full P2P mesh becomes impractical.
Real-World Implementations (2025)
Apps With E2EE
- WhatsApp: E2EE for all calls (1-on-1 and group), uses Signal Protocol for key exchange
- Signal: E2EE for everything (messaging and calls), pioneer of modern E2EE
- FaceTime: E2EE for all calls, Apple's proprietary implementation
- Zoom: Optional E2EE for meetings (disabled by default, requires host enablement)
- Jitsi Meet: Offers E2EE using Insertable Streams for Chromium browsers
- Cloudflare Calls: Demonstrated E2EE implementation with MLS (Orange Meets project)
Apps Without E2EE by Default
- Google Meet: Encrypted in transit (DTLS-SRTP) but not E2EE by default, so Google can decrypt standard meetings
- Microsoft Teams: Encrypted in transit but not E2EE by default, so Microsoft can decrypt standard meetings
- Most enterprise video platforms: Not E2EE by default to enable features like recording, transcription, compliance monitoring
Why Not Always Use E2EE?
If E2EE is more secure, why don't all platforms use it?
- Feature limitations: Server-side features like cloud recording, live transcription, content moderation, and AI features require access to media—incompatible with E2EE
- Compliance: Some industries require the ability to audit communications, which E2EE prevents
- Performance: Additional encryption overhead reduces quality on low-end devices
- Complexity: E2EE is harder to implement and maintain, increasing development cost
- Browser support: Cross-browser E2EE still has compatibility challenges in 2025
For many business applications, the trade-offs favor transport encryption (DTLS-SRTP) without E2EE. For privacy-focused consumer apps, E2EE is increasingly the standard.
Verifying E2EE
How can you tell if a video call truly uses E2EE?
- Security codes/fingerprints: Apps like Signal and WhatsApp show security codes you can verify with the other party out-of-band (in person, separate channel)
- Platform documentation: Check if the platform explicitly states E2EE and explains how it works
- Open source: Open source implementations (like Jitsi) can be audited
- Third-party audits: Security audits by reputable firms provide confidence
If an app offers cloud recording or AI transcription, it's probably not E2EE (those features require server access to unencrypted media).
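The security-code check mentioned above can be illustrated with a short sketch. It is not the algorithm used by Signal or WhatsApp: it simply hashes both participants' public keys (assumed here to be fixed-length byte strings) with SHA-256 and renders part of the digest as digits. The function names are hypothetical.

```typescript
// Lexicographic comparison so both sides order the two keys identically.
function compareBytes(a: ArrayBuffer, b: ArrayBuffer): number {
  const x = new Uint8Array(a);
  const y = new Uint8Array(b);
  const n = Math.min(x.length, y.length);
  for (let i = 0; i < n; i++) {
    if (x[i] !== y[i]) return x[i] - y[i];
  }
  return x.length - y.length;
}

// Derives a short, human-comparable code from two public keys.
// Assumes fixed-length keys so that plain concatenation is unambiguous.
async function securityCode(keyA: ArrayBuffer, keyB: ArrayBuffer): Promise<string> {
  const swap = compareBytes(keyA, keyB) > 0;
  const first = swap ? keyB : keyA;
  const second = swap ? keyA : keyB;

  const joined = new Uint8Array(first.byteLength + second.byteLength);
  joined.set(new Uint8Array(first), 0);
  joined.set(new Uint8Array(second), first.byteLength);
  const digest = new Uint8Array(await crypto.subtle.digest("SHA-256", joined));

  // Four groups of five digits, each taken from three digest bytes.
  const groups: string[] = [];
  for (let i = 0; i < 4; i++) {
    const n = (digest[3 * i] << 16) | (digest[3 * i + 1] << 8) | digest[3 * i + 2];
    groups.push(String(n % 100000).padStart(5, "0"));
  }
  return groups.join(" ");
}
```

Both participants compute the code locally and compare it over a separate channel. If a server had substituted its own key for one participant's, the two codes would not match.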
The Future of E2EE in WebRTC
As of 2025, E2EE in WebRTC is transitioning from "difficult and niche" to "practical and increasingly standard":
- SFrame has been published as an IETF standard (RFC 9605) and is being implemented in production systems
- MLS provides robust group key management
- Browser support for Insertable Streams is improving
- Major platforms (Zoom, Jitsi) now offer E2EE options
- Libraries like libsframe make implementation easier
Expect E2EE to become the default for consumer video calling by 2026-2027, similar to how HTTPS became the default for websites.
The Bottom Line
End-to-end encryption is the gold standard for communication privacy. While WebRTC has always been encrypted in transit (DTLS-SRTP), true E2EE—where even the servers can't decrypt your calls—requires additional application-layer encryption using technologies like SFrame and key management protocols like MLS.
The trade-off is complexity and reduced functionality (no server-side recording, transcription, or AI features). But for privacy-critical applications—personal calls, sensitive business discussions, healthcare consultations—E2EE is essential.
As browser support matures and standards solidify, implementing E2EE in WebRTC applications is becoming practical for mainstream developers. Understanding E2EE helps you make informed decisions about when to use it and how to implement it effectively.