
Latency


The time delay between sending and receiving data in a network connection

What is Latency?

Latency is the time delay between when data is sent and when it's received. In video calling, it's the lag between when you speak and when the other person hears you, or when you move and when they see that movement. Think of latency like the postal delivery time—bandwidth is the size of the package you can send, but latency is how long the package takes to arrive.

Low latency is critical for natural conversation flow. When latency exceeds 150-200 milliseconds, conversations become noticeably awkward—people start talking over each other, pauses feel unnatural, and the entire interaction feels less "live" and more like a delayed broadcast.

WebRTC is designed specifically for low-latency real-time communication. While traditional streaming protocols like HLS might have 5-30 seconds of latency, WebRTC typically achieves sub-500 millisecond latency, often as low as 100-250ms in optimal conditions.

Latency vs. Bandwidth

These are often confused, but they measure different things:

  • Bandwidth: How much data can travel per second (capacity)
  • Latency: How long data takes to travel (speed/delay)

You can have high bandwidth with high latency (satellite internet: 25 Mbps but 600ms latency) or low bandwidth with low latency (basic DSL: 5 Mbps but 20ms latency). Video calling requires both adequate bandwidth AND low latency.

Low bandwidth causes pixelation and freezing. High latency causes delayed, awkward conversations. Both degrade the experience, but in different ways.
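
The interplay is easy to see with a back-of-the-envelope calculation. The TypeScript sketch below estimates one-way delivery time as propagation delay plus transmission time; all figures are illustrative assumptions, not measurements:

```typescript
// Rough one-way delivery time: propagation delay plus transmission time.
// All figures are illustrative assumptions, not measurements.
function deliveryTimeMs(sizeKb: number, bandwidthMbps: number, latencyMs: number): number {
  const transmissionMs = (sizeKb / (bandwidthMbps * 1000)) * 1000; // kilobits over kbps, in ms
  return latencyMs + transmissionMs;
}

// A ~40-kilobit video frame:
console.log(deliveryTimeMs(40, 25, 600)); // satellite: ~601.6 ms, dominated by latency
console.log(deliveryTimeMs(40, 5, 20));   // basic DSL: ~28 ms
```

Note that adding bandwidth barely helps the satellite case: a video frame is small, so propagation delay dominates.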

Components of Latency

Total end-to-end latency in a WebRTC video call consists of multiple components:

1. Capture Latency (5-20ms)

Time to capture audio/video from your microphone and camera. Modern devices typically achieve 10-15ms capture latency.

2. Encoding Latency (10-50ms)

Time to compress raw audio/video into an encoded format. Hardware encoders are faster (~10-20ms) than software encoders (~20-50ms). More complex codecs (VP9, AV1) add more encoding latency than simpler ones (VP8, H.264).

3. Packetization Latency (1-5ms)

Time to break encoded data into RTP packets. Minimal but measurable.

4. Network Latency (10-300ms)

Time for packets to travel across the internet from sender to receiver. This is typically the largest and most variable component:

  • Local network (same city): 5-20ms
  • Regional (same country): 20-50ms
  • Cross-country: 50-100ms
  • Intercontinental: 100-300ms
  • Satellite: 500-700ms (geostationary orbit)

Network latency is limited by physics—the speed of light through fiber optic cables. You cannot improve this beyond the physical distance limit, though poor routing or congestion can make it worse than the theoretical minimum.

5. Jitter Buffer Latency (0-100ms)

Receivers buffer incoming packets briefly to smooth out jitter (variation in packet arrival times). Adaptive jitter buffers dynamically adjust size based on network conditions—typically 20-60ms for stable networks, expanding to 100ms+ for jittery networks.

6. Decoding Latency (5-30ms)

Time to decompress received packets into raw audio/video. Like encoding, hardware decoders are faster than software.

7. Rendering Latency (5-20ms)

Time to display video frames and play audio through speakers. Includes synchronization delays to keep audio and video aligned.

Total End-to-End Latency

Summing all components:

  • Best case (local network, hardware acceleration): 50-100ms
  • Typical case (same region, good connection): 100-250ms
  • Acceptable case (cross-country): 250-400ms
  • Poor case (intercontinental, congestion): 400-600ms+

WebRTC aims to stay below 500ms for all calls, and below 250ms when possible.
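
As a sanity check, here is a toy latency budget in TypeScript built from midpoints of the component ranges above; every value is an assumption, and real numbers vary per device and network:

```typescript
// Illustrative one-way latency budget; each value is an assumption
// taken from the midpoint of the component ranges above.
const budget = {
  captureMs: 10,
  encodeMs: 20,
  packetizeMs: 2,
  networkMs: 60,      // same-region path
  jitterBufferMs: 40,
  decodeMs: 10,
  renderMs: 10,
};

const totalMs = Object.values(budget).reduce((sum, ms) => sum + ms, 0);
console.log(`estimated one-way latency: ${totalMs} ms`); // 152 ms, within the "typical" band
```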

Round Trip Time (RTT)

Round Trip Time (RTT) is the time for a data packet to travel from sender to receiver and back—a full round trip. RTT is double the one-way latency (assuming symmetric paths).

Why RTT matters: WebRTC uses RTCP feedback mechanisms where the receiver reports packet reception back to the sender. High RTT delays this feedback, slowing down congestion control reactions and bandwidth estimation.

Typical RTT measurements:

  • Excellent: <50ms RTT (local/regional connections)
  • Good: 50-100ms RTT (national connections)
  • Acceptable: 100-200ms RTT (intercontinental)
  • Poor: >200ms RTT (satellite, severely congested networks)

You can measure RTT using the ping command: ping google.com shows RTT to Google's servers.
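
Inside an application, you can read the same RTT that WebRTC's congestion control sees. A minimal sketch, assuming `pc` is an established RTCPeerConnection:

```typescript
// Read the live RTT from the selected ICE candidate pair via getStats().
async function measureRttMs(pc: RTCPeerConnection): Promise<number | undefined> {
  const report = await pc.getStats();
  for (const stats of report.values()) {
    // The nominated candidate pair carries the RTCP-measured round trip time.
    if (stats.type === 'candidate-pair' && stats.nominated &&
        stats.currentRoundTripTime !== undefined) {
      return stats.currentRoundTripTime * 1000; // reported in seconds
    }
  }
  return undefined; // no nominated pair yet
}
```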

Acceptable Latency Thresholds

Different applications have different latency tolerance:

Video Conferencing (Most Common)

  • Excellent: <100ms (feels instant, natural conversation)
  • Good: 100-150ms (barely noticeable delay)
  • Acceptable: 150-250ms (slight but tolerable delay)
  • Poor: 250-400ms (awkward pauses, frequent overlap)
  • Unusable: >400ms (conversation becomes very difficult)

Studies show that latencies above 150ms lead to significantly poorer user engagement, with dropout rates increasing by up to 20% in applications requiring real-time interaction.

Live Interactive Gaming/Auctions

  • Required: <100ms (competitive play demands near-instant response)
  • Acceptable: 100-150ms (casual gaming)
  • Poor: >150ms (noticeable input lag affecting gameplay)

Live Broadcasting/Webinars

  • Acceptable: <1000ms (one-way communication, less critical)
  • WebRTC advantage: 200-500ms (enables chat interaction with broadcaster)
  • Traditional streaming: 5000-30000ms (HLS, DASH—too slow for interactivity)

Telehealth/Remote Consultation

  • Required: <200ms (medical professionals need natural conversation for effective diagnosis)

Causes of High Latency

Geographic Distance

Light travels through fiber at roughly 200,000 km/s, about two-thirds of its speed in a vacuum. New York to London (5,500 km) therefore has a theoretical minimum latency of 27.5ms one-way. Add routing overhead, and the practical minimum is 50-70ms one-way, or 100-140ms RTT. This is physics: you cannot improve it beyond choosing servers closer to users.
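
The floor is simple to compute. A small sketch; the 2x routing-overhead multiplier is a rough assumption, not a measured figure:

```typescript
// Theoretical minimum one-way latency through fiber for a given distance.
const FIBER_KM_PER_MS = 200; // ~200,000 km/s, about two-thirds of c

function minOneWayLatencyMs(distanceKm: number): number {
  return distanceKm / FIBER_KM_PER_MS;
}

console.log(minOneWayLatencyMs(5500));     // New York to London: 27.5 ms floor
console.log(minOneWayLatencyMs(5500) * 2); // crude estimate with routing overhead: 55 ms
```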

Network Congestion

When routers and switches are overloaded, packets wait in queues, adding delay. This is especially common during peak hours (evenings in residential areas) or on shared networks (public WiFi, office networks).

Poor Routing

Internet routing isn't always optimal. Your packets might take a circuitous route through multiple ISPs and peering points. Using a CDN or edge network can improve routing efficiency.

WiFi vs. Wired

WiFi adds 5-30ms compared to wired Ethernet, plus additional jitter. For lowest latency, use a wired connection when possible.

Mobile Networks

Cellular networks have higher and more variable latency than fixed broadband:

  • 5G: 10-30ms (best case)
  • 4G/LTE: 30-70ms
  • 3G: 100-200ms

Codec Complexity

More advanced codecs (VP9, AV1) compress better but take longer to encode/decode. For lowest latency, use simpler codecs (VP8, H.264) with hardware acceleration.
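
In the browser, you can bias SDP negotiation toward the simpler codecs. A hedged sketch using the standard setCodecPreferences API; codec availability varies by browser, and the preference only takes effect on the next offer/answer exchange:

```typescript
// Reorder negotiated video codecs so lower-complexity ones come first.
function preferLowLatencyCodecs(transceiver: RTCRtpTransceiver): void {
  const capabilities = RTCRtpReceiver.getCapabilities('video');
  if (!capabilities) return;
  const preferred = ['video/H264', 'video/VP8']; // simpler codecs first
  const rank = (c: RTCRtpCodecCapability): number => {
    const i = preferred.indexOf(c.mimeType);
    return i === -1 ? preferred.length : i;
  };
  const sorted = [...capabilities.codecs].sort((a, b) => rank(a) - rank(b));
  transceiver.setCodecPreferences(sorted); // applies at the next negotiation
}
```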

Large Jitter Buffers

When networks are jittery, receivers increase buffer size to smooth out packet arrival variations. This trades latency for smoothness—larger buffers mean higher latency but fewer glitches.

Reducing Latency

1. Use Edge Servers/CDN

Deploy WebRTC SFU servers at edge locations close to users. Instead of routing all traffic through a central data center, use geographically distributed servers. Cloudflare, AWS CloudFront, and specialized WebRTC CDNs offer edge deployment.

2. Optimize Network Path

  • Use direct peering between networks when possible
  • Avoid unnecessary hops through multiple ISPs
  • Consider SD-WAN or private network backbones for enterprise applications

3. Hardware Acceleration

Enable hardware encoding/decoding for codecs like H.264 and H.265. This reduces encoding latency from 40-50ms to 10-20ms and lowers power consumption.
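
You can probe for likely hardware support before choosing a codec. A sketch using the Media Capabilities API; the powerEfficient flag is a strong (though not guaranteed) signal of hardware decoding, and the exact codec strings accepted vary by browser:

```typescript
// Ask whether 720p30 H.264 decoding is likely hardware-accelerated here.
async function isH264DecodePowerEfficient(): Promise<boolean> {
  const info = await navigator.mediaCapabilities.decodingInfo({
    type: 'webrtc',
    video: {
      contentType: 'video/H264', // accepted strings vary by browser
      width: 1280,
      height: 720,
      bitrate: 1_500_000,
      framerate: 30,
    },
  });
  return info.supported && info.powerEfficient;
}
```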

4. Minimize Jitter Buffer Size

Adaptive jitter buffers dynamically adjust based on network conditions. On stable networks with low jitter, buffers can shrink to 20-30ms. Ensure your WebRTC implementation uses adaptive buffering rather than fixed large buffers.
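
Browsers manage the jitter buffer automatically, but some Chromium-based browsers expose a jitterBufferTarget hint on receivers. A hedged sketch; the browser may clamp or ignore the hint, and it cannot force the buffer below its adaptive minimum:

```typescript
// Suggest a jitter buffer target (in milliseconds) to each receiver.
function hintJitterBufferTarget(pc: RTCPeerConnection, targetMs: number): void {
  for (const receiver of pc.getReceivers()) {
    const r = receiver as RTCRtpReceiver & { jitterBufferTarget?: number | null };
    if ('jitterBufferTarget' in r) {
      r.jitterBufferTarget = targetMs; // a hint, not a guarantee
    }
  }
}
```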

5. Prefer UDP over TCP

WebRTC defaults to UDP, which has lower latency than TCP because it doesn't wait for retransmissions. Only fall back to TCP when firewalls block UDP.
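
You can verify which transport a live call actually negotiated. A minimal sketch, assuming `pc` is a connected RTCPeerConnection:

```typescript
// Report whether the selected ICE candidate pair runs over UDP or TCP.
async function selectedTransportProtocol(pc: RTCPeerConnection): Promise<string | undefined> {
  const report = await pc.getStats();
  for (const stats of report.values()) {
    if (stats.type === 'candidate-pair' && stats.nominated) {
      const local = report.get(stats.localCandidateId);
      return local?.protocol; // 'udp' or 'tcp'
    }
  }
  return undefined; // no pair selected yet
}
```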

6. Wired Connections

Whenever possible, use wired Ethernet instead of WiFi. WiFi adds latency and jitter, especially on congested channels or with weak signal strength.

7. Reduce Encoding Complexity

Configure encoders for lower complexity/faster presets. You'll sacrifice some compression efficiency, but gain lower latency. For real-time calls, speed matters more than maximizing compression.
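
Browsers don't expose encoder presets directly, but they do accept hints. A hedged sketch using contentHint and degradationPreference; both are advisory, and support varies by browser:

```typescript
// Hint the encoder toward speed and smooth motion rather than detail.
async function tuneSenderForLatency(sender: RTCRtpSender): Promise<void> {
  if (sender.track?.kind === 'video') {
    sender.track.contentHint = 'motion'; // favor motion smoothness over detail
  }
  const params = sender.getParameters();
  params.degradationPreference = 'maintain-framerate'; // shed resolution first under load
  await sender.setParameters(params);
}
```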

Measuring Latency

Network RTT (Ping)

Simple tool: ping [server-address] shows round-trip time to the server. This measures only network latency, not end-to-end WebRTC latency.

WebRTC Stats API

Use getStats() on RTCPeerConnection to access:

  • currentRoundTripTime: RTT measured via RTCP
  • jitterBufferDelay: Time packets spend in jitter buffer
  • totalEncodeTime: Cumulative encoding time
  • totalDecodeTime: Cumulative decoding time
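
These counters are cumulative, so divide by the corresponding frame counts to get per-frame averages. A minimal sketch, assuming `pc` is an active RTCPeerConnection:

```typescript
// Derive average per-frame encode/decode times and jitter buffer delay.
async function latencyBreakdown(pc: RTCPeerConnection): Promise<void> {
  const report = await pc.getStats();
  for (const stats of report.values()) {
    if (stats.type === 'outbound-rtp' && stats.kind === 'video' && stats.framesEncoded > 0) {
      console.log('avg encode ms:', (stats.totalEncodeTime / stats.framesEncoded) * 1000);
    }
    if (stats.type === 'inbound-rtp' && stats.kind === 'video') {
      if (stats.framesDecoded > 0) {
        console.log('avg decode ms:', (stats.totalDecodeTime / stats.framesDecoded) * 1000);
      }
      if (stats.jitterBufferEmittedCount > 0) {
        console.log('avg jitter buffer ms:',
          (stats.jitterBufferDelay / stats.jitterBufferEmittedCount) * 1000);
      }
    }
  }
}
```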

End-to-End Latency Testing

For true end-to-end measurement, use loopback testing: play a known audio signal, record it through WebRTC, and measure the time difference. This captures all latency components including capture, encoding, network, decoding, and rendering.

Browser Tools

  • Chrome: chrome://webrtc-internals shows RTT graphs and timing statistics
  • Firefox: about:webrtc displays connection statistics including RTT

Latency vs. Quality Trade-offs

Lower latency sometimes conflicts with other goals:

  • Jitter buffer size: Small buffers = low latency but more glitches. Large buffers = higher latency but smoother playback
  • Forward Error Correction (FEC): Adds redundancy to recover from packet loss without retransmission, but increases latency by 20-50ms
  • Codec choice: Simple codecs (VP8) = lower latency. Complex codecs (AV1) = higher latency but better quality per bitrate

For interactive video calling, prioritize low latency. For broadcasting or recording, you can accept higher latency in exchange for better quality or error correction.

The Bottom Line

Latency is the invisible enemy of natural conversation. While bandwidth gets most of the attention, latency often determines whether a video call feels "real" or frustratingly delayed. WebRTC's low-latency design—sub-500ms, often 100-250ms—is what enables natural, interactive communication that older streaming protocols simply cannot match.

Understanding latency helps you diagnose connection issues, optimize infrastructure, and set realistic expectations. Geographic distance imposes hard physical limits you cannot overcome. But reducing unnecessary latency from inefficient routing, large buffers, or slow encoding can dramatically improve the user experience.

In 2025, as user expectations for real-time responsiveness continue to rise, optimizing WebRTC latency isn't optional—it's essential for delivering competitive, engaging products.
