
Jitter

Technical

Variation in packet arrival times causing irregular delivery of audio and video data

What is Jitter?

Jitter is the variation in the time delay between when packets are sent and when they arrive. While latency measures the average delay, jitter measures the inconsistency in that delay. Think of it like a train schedule: latency is the average travel time, but jitter is the unpredictability—sometimes the train is on time, sometimes it's early, sometimes it's late.

In video calling, packets should ideally arrive at regular intervals. Your encoder might send a packet every 20 milliseconds, and ideally, the receiver gets a packet every 20ms. But in reality, network conditions vary—packet 1 arrives after 50ms, packet 2 after 35ms, packet 3 after 60ms, packet 4 after 40ms. This variation is jitter.

High jitter causes audio dropouts, video freezing, robotic or choppy sound, and overall degraded call quality. It's particularly damaging to real-time communication because it disrupts the smooth, continuous playback that human perception expects.

Jitter vs. Latency

These terms are often confused, but they measure different aspects of network performance:

  • Latency: Average time for packets to travel from sender to receiver. Measures overall delay
  • Jitter: Variation in packet arrival times. Measures inconsistency in delay

You can have high latency with low jitter (consistent but slow delivery, like satellite internet) or low latency with high jitter (fast but unpredictable, like congested WiFi). For video calling, you need both low latency AND low jitter.

Jitter reflects short-term conditions or inconsistencies in packet flow, while latency (RTT) is the average time over a longer period. A network might have 50ms average latency but 30ms jitter, meaning packets arrive anywhere from 20ms to 80ms after being sent.
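As a toy illustration of the distinction (not WebRTC's actual estimator), the sketch below computes average latency and jitter from a set of per-packet one-way delays, using mean absolute deviation as the jitter measure:

```javascript
// Illustrative only: mean delay = latency, mean absolute deviation = jitter.
// Real implementations (RFC 3550) use a smoothed interarrival estimate instead.
function delayStats(delaysMs) {
  const mean = delaysMs.reduce((a, b) => a + b, 0) / delaysMs.length;
  const jitter =
    delaysMs.reduce((a, d) => a + Math.abs(d - mean), 0) / delaysMs.length;
  return { meanLatencyMs: mean, jitterMs: jitter };
}

// The packet delays from the example above: 50ms, 35ms, 60ms, 40ms.
const { meanLatencyMs, jitterMs } = delayStats([50, 35, 60, 40]);
console.log(meanLatencyMs); // 46.25 — the "latency" number
console.log(jitterMs);      // 8.75  — the "jitter" number
```

Two networks can share the same first number while differing wildly on the second, which is exactly why both metrics matter for calls.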

How Jitter Occurs

Network Congestion

The most common cause. When routers and switches are busy, packets wait in queues. Queue lengths fluctuate constantly—sometimes a packet gets through immediately, sometimes it waits 50ms. This variability creates jitter.

Congestion is worse during peak hours (evenings for residential networks, business hours for corporate networks) and on shared connections (public WiFi, cellular networks).

Route Changes

Internet routing is dynamic. If the network path changes mid-call (common on mobile networks as you move between towers, or when ISPs adjust routing), packets suddenly experience different delays, causing jitter spikes.

WiFi Interference

Wireless networks are inherently more jittery than wired connections. Radio interference, competing devices, signal strength variations, and retransmissions all introduce timing variability. WiFi typically adds 10-30ms of jitter compared to wired Ethernet's <5ms.

Packet Prioritization (QoS)

Ironically, Quality of Service (QoS) mechanisms can sometimes increase jitter. When routers prioritize certain traffic, other packets get delayed variably depending on current priority queue loads.

Packet Routing Variability

Not all packets take the same path. Some might take a direct route, others might bounce through additional hops. This path diversity creates arrival time variation.

Impact on Audio and Video Quality

Audio Impact

Audio is extremely sensitive to jitter because our ears detect timing inconsistencies easily:

  • Low jitter (0-20ms): Imperceptible, audio sounds natural
  • Moderate jitter (20-50ms): Occasionally choppy or slightly robotic sound
  • High jitter (50-100ms+): Frequent dropouts, severe distortion, unintelligible speech

When jitter exceeds the jitter buffer's capacity, either the buffer runs out of packets to play, causing audio gaps (silence), or late packets are played out of order (garbled sound).

Video Impact

Video is slightly more tolerant of jitter than audio, but still suffers:

  • Low jitter: Smooth playback at consistent frame rate
  • Moderate jitter: Occasional frame drops or stuttering
  • High jitter: Frequent freezing, jerky motion, frames displayed out of order

Because video frames can be decoded and displayed with some flexibility (unlike audio which must play continuously), jitter buffers for video can be larger without as much perceived quality impact.

Jitter Buffers: The Solution

A jitter buffer is a small queue on the receiving side that temporarily stores incoming packets before playing them. Instead of playing packets immediately as they arrive (which would sound choppy due to jitter), the buffer collects packets and plays them at regular intervals.

How Jitter Buffers Work

  1. Packets arrive at irregular intervals due to jitter
  2. The jitter buffer stores these packets temporarily
  3. The buffer waits until it has enough packets to ensure continuous playback
  4. Packets are then played out at regular intervals (e.g., every 20ms for audio)
  5. This smooths out the timing variations, providing consistent playback

The trade-off: larger buffers smooth out more jitter but add latency. Smaller buffers reduce latency but risk running empty if jitter is high.
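This trade-off can be made concrete with a small simulation (illustrative, not a real buffer implementation): packets are sent every 20ms with variable network delay, and a packet causes a dropout if it arrives after its scheduled playout time.

```javascript
// Sketch: count packets that miss their playout deadline for a given
// fixed buffer size. Playout is anchored to the fastest observed delay.
function countDropouts(delaysMs, bufferMs, intervalMs = 20) {
  const minDelay = Math.min(...delaysMs);
  let dropouts = 0;
  delaysMs.forEach((delay, i) => {
    const arrival = i * intervalMs + delay;                 // when the packet shows up
    const playout = i * intervalMs + minDelay + bufferMs;   // its playout deadline
    if (arrival > playout) dropouts++;
  });
  return dropouts;
}

const delays = [50, 35, 60, 40, 90, 45, 38, 70]; // ms, jittery network
console.log(countDropouts(delays, 10)); // 4 — small buffer, four packets arrive too late
console.log(countDropouts(delays, 60)); // 0 — jitter absorbed, at the cost of +60ms latency
```

The same jittery delays produce dropouts or smooth playback depending purely on how much latency the buffer is willing to spend.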

Fixed vs. Adaptive Jitter Buffers

Fixed jitter buffers use a constant size (e.g., always 60ms). Simple but inefficient—wastes latency on good networks, insufficient on bad networks.

Adaptive jitter buffers dynamically adjust size based on current network conditions. When jitter is low, the buffer shrinks to minimize latency. When jitter increases, the buffer grows to prevent dropouts.

Modern WebRTC implementations universally use adaptive jitter buffers. They continuously monitor packet arrival patterns and adjust buffer size in real-time, typically ranging from 15-120ms for audio.
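A hypothetical sketch of the adaptive idea (the constants and clamping are illustrative, not NetEQ's actual logic): keep a smoothed jitter estimate and target a buffer that is a multiple of it, clamped to a sane range.

```javascript
// Illustrative adaptive sizing: smooth the jitter estimate, then set the
// target buffer to a multiple of it, clamped to [minMs, maxMs].
function makeAdaptiveBuffer({ minMs = 15, maxMs = 120, headroom = 3 } = {}) {
  let jitterEstimateMs = 0;
  return function onPacket(deviationMs) {
    // Move 1/16 of the way toward each new deviation (RFC 3550-style smoothing).
    jitterEstimateMs += (Math.abs(deviationMs) - jitterEstimateMs) / 16;
    return Math.min(maxMs, Math.max(minMs, jitterEstimateMs * headroom));
  };
}

const targetBuffer = makeAdaptiveBuffer();
let target = 0;
for (const d of [2, 3, 1, 2]) target = targetBuffer(d); // stable network
console.log(target); // 15 — clamped to the minimum, minimizing latency
for (const d of [80, 80, 80]) target = targetBuffer(d); // jitter burst
console.log(target > 15); // true — the buffer grows to absorb the burst
```

The design question every adaptive buffer answers continuously is the one in the text: shrink when the network is calm, grow when it is not.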

NetEQ: WebRTC's Audio Jitter Buffer

NetEQ is the audio jitter buffer in Google's WebRTC library (libwebrtc), shipped in all Chromium-based browsers (Chrome, Edge, Opera). It's one of the most advanced jitter buffer implementations in production.

NetEQ Features

  • Adaptive buffering: Continuously optimizes delay based on network jitter
  • Packet loss concealment: Synthesizes missing audio when packets are lost
  • Time stretching/compression: Subtly speeds up or slows down audio to maintain buffer levels without pitch changes
  • Comfort noise generation: Adds gentle background noise during silence to avoid jarring gaps
  • Dynamic range adaptation: Adjusts to varying jitter patterns throughout a call

Buffer Size Behavior

NetEQ typically starts with a ~40ms buffer. On stable networks with minimal jitter, it can shrink to 15-20ms, minimizing latency. On poor networks with high jitter, it expands to 100-120ms to prevent dropouts.

The algorithm constantly evaluates: "Can I reduce the buffer without risking dropouts?" and "Do I need to increase the buffer to handle current jitter levels?"

Video Jitter Buffers

Video jitter buffers work similarly to audio but with different constraints:

  • Can be larger (50-200ms) because video frames are displayed at discrete intervals (e.g., 30 fps = every 33ms)
  • Must handle dependencies between frames (I-frames, P-frames, B-frames in codecs like H.264)
  • Can skip frames when buffer runs low, unlike audio which must play continuously
  • Often prioritize keyframes (I-frames) over delta frames to ensure decodability

Acceptable Jitter Levels

ITU-T recommendations and industry standards suggest:

  • Excellent: <10ms jitter (LAN, quality wired connections)
  • Good: 10-30ms jitter (typical broadband, good WiFi)
  • Acceptable: 30-50ms jitter (marginal connections, busy networks)
  • Poor: 50-100ms jitter (severely congested or unstable networks)
  • Unusable: >100ms jitter (call quality severely degraded)

These thresholds assume adequate jitter buffering. Without jitter buffers, even 20-30ms jitter would cause noticeable quality issues.
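A trivial sketch of mapping a measured jitter value onto the bands above, useful for surfacing a quality indicator in a call UI (the band boundaries come straight from the list):

```javascript
// Map a jitter measurement (ms) to the rating bands listed above.
function rateJitter(jitterMs) {
  if (jitterMs < 10) return "excellent";
  if (jitterMs < 30) return "good";
  if (jitterMs < 50) return "acceptable";
  if (jitterMs < 100) return "poor";
  return "unusable";
}

console.log(rateJitter(8));   // "excellent"
console.log(rateJitter(42));  // "acceptable"
console.log(rateJitter(120)); // "unusable"
```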

Measuring Jitter

WebRTC Stats API

Use getStats() on RTCPeerConnection to access jitter metrics:

  • jitter: Packet arrival time variation, reported in seconds (typically 0.001-0.100)
  • jitterBufferDelay: Cumulative time audio samples have spent in the jitter buffer, in seconds; divide by jitterBufferEmittedCount to get the average buffer delay
  • jitterBufferEmittedCount: Total number of audio samples emitted from the buffer
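The jitter value WebRTC reports is the RFC 3550 interarrival jitter: a running estimate nudged by 1/16 of each new transit-time deviation. A sketch of the estimator, plus reading the stat from a connection (the getStats part is browser-only; `pc` is an assumed RTCPeerConnection):

```javascript
// RFC 3550 interarrival jitter update: move 1/16 toward each new deviation.
function updateJitter(jitter, transitDeltaMs) {
  return jitter + (Math.abs(transitDeltaMs) - jitter) / 16;
}

// Browser-only sketch: pull the jitter stat from a live RTCPeerConnection.
// Note getStats() reports jitter in SECONDS, so multiply by 1000 for ms.
async function currentJitterMs(pc) {
  const stats = await pc.getStats();
  let jitterMs = 0;
  stats.forEach((report) => {
    if (report.type === "inbound-rtp" && report.jitter !== undefined) {
      jitterMs = report.jitter * 1000;
    }
  });
  return jitterMs;
}

let j = 0;
for (const delta of [30, 30, 30]) j = updateJitter(j, delta);
console.log(j.toFixed(1)); // "5.3" — the estimate climbs slowly toward the 30ms deviation
```

The 1/16 smoothing is why the reported jitter reacts gradually rather than spiking on a single late packet.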

Browser Tools

  • Chrome: chrome://webrtc-internals displays real-time jitter graphs
  • Firefox: about:webrtc shows jitter statistics per connection

Network Testing Tools

Tools like iperf, mtr, or online jitter tests can measure network-level jitter independently of WebRTC.

Reducing Jitter

1. Use Wired Connections

Ethernet has much lower jitter than WiFi (typically <5ms vs. 10-30ms). For critical calls, use a wired connection when possible.

2. Reduce Network Congestion

  • Close bandwidth-heavy applications (streaming, downloads, cloud sync)
  • Limit other users on your network during important calls
  • Upgrade to higher bandwidth if consistently congested

3. Quality of Service (QoS)

Configure your router to prioritize WebRTC traffic (UDP ports used by STUN/TURN, or use DSCP marking). This ensures video call packets get priority over less time-sensitive traffic like file downloads.

4. Optimize WiFi

  • Use 5GHz band instead of 2.4GHz (less congestion, lower jitter)
  • Position close to the access point for strong signal
  • Reduce interference by changing WiFi channels
  • Use WiFi 6 (802.11ax) if available—better jitter characteristics under load

5. Choose Better ISPs/Networks

Some ISPs have more stable routing and better peering agreements, resulting in lower jitter. Fiber connections typically have lower jitter than cable or DSL.

6. Increase Packet Size (Packetization Time)

Sending larger packets less frequently can reduce the impact of jitter. For audio, 20ms packetization works well on good networks; increasing to 60ms or even 120ms on poor networks means fewer packets per second, so fewer timing variations can disrupt playback. The trade-off: longer packetization adds latency, and each lost packet takes more audio with it.
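The arithmetic is simple (the ~40-byte figure below assumes typical IPv4 + UDP + RTP headers): fewer packets per second also means less header overhead on the wire.

```javascript
// Audio packets per second for a given packetization time (ptime).
function packetRate(ptimeMs) {
  return 1000 / ptimeMs;
}

const HEADER_BYTES = 40; // rough IPv4 + UDP + RTP header cost per packet

// Header overhead in kbps for a given ptime.
function overheadKbps(ptimeMs) {
  return (packetRate(ptimeMs) * HEADER_BYTES * 8) / 1000;
}

console.log(packetRate(20));   // 50 packets/s
console.log(packetRate(60));   // ~16.7 packets/s
console.log(overheadKbps(20)); // 16 — kbps of headers alone
console.log(overheadKbps(60)); // ~5.3 kbps
```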

7. Edge Servers

Deploy WebRTC SFU servers closer to users. Shorter network paths have fewer hops, reducing opportunities for jitter to accumulate.

Jitter vs. Packet Loss

These often occur together but are different problems:

  • Jitter: Packets arrive at irregular times but (usually) all arrive eventually
  • Packet loss: Some packets never arrive at all

High jitter can effectively become packet loss: packets that arrive after their playback deadline are discarded by the jitter buffer as useless. Conversely, packet loss doesn't necessarily cause jitter—packets can be lost steadily while the ones that do arrive keep consistent timing.

The Bottom Line

Jitter is the silent saboteur of video call quality. While latency determines overall delay and bandwidth determines maximum quality, jitter determines consistency. A network with perfect bandwidth and acceptable latency can still deliver terrible call quality if jitter is high.

Fortunately, WebRTC's adaptive jitter buffers (like NetEQ for audio) are remarkably effective at masking jitter, automatically adjusting to network conditions. But there's a limit—extreme jitter (>100ms) cannot be fully compensated without adding unacceptable latency.

Understanding jitter helps you diagnose "the call sounds choppy" complaints, optimize network infrastructure, and set realistic quality expectations. Wired connections, QoS prioritization, and edge deployment are your best defenses against jitter.
