Virtual Background

AI-powered feature that replaces or blurs the background behind a user during video calls

What is Virtual Background?

Virtual background is a feature that uses machine learning to identify and separate a person from their background in real-time video, then either replaces the background with a custom image/video or applies blur effects. This technology enables users to maintain privacy, hide messy environments, or add professional or creative backgrounds to their video calls.

In WebRTC applications, virtual backgrounds are implemented using computer vision algorithms that perform semantic segmentation on each video frame, classifying pixels as either "person" or "background" and then processing them differently.

How Virtual Background Works

Semantic Segmentation Process

The core technology behind virtual backgrounds involves these steps:

  1. Frame Capture: Capture video frames from the user's camera using getUserMedia()
  2. ML Inference: Run each frame through a machine learning model that classifies every pixel as person or background
  3. Mask Generation: The model outputs a segmentation mask, a grayscale image in which white marks the person and black marks the background
  4. Background Processing: Apply blur effect or replace background pixels based on the mask
  5. Frame Composition: Combine the segmented person with the new background
  6. Stream Output: Send the processed frames to WebRTC peer connection

Implementation with Insertable Streams

Modern WebRTC implementations use the Insertable Streams API for raw media (MediaStreamTrackProcessor and MediaStreamTrackGenerator, currently supported in Chromium-based browsers) for efficient frame processing. This is distinct from the Encoded Transform API, which operates on encoded chunks rather than raw frames:

// Get the camera stream
const stream = await navigator.mediaDevices.getUserMedia({ video: true });

// Break the video track out into a stream of raw VideoFrames and
// create a generator track that will carry the processed output
const videoTrack = stream.getVideoTracks()[0];
const processor = new MediaStreamTrackProcessor({ track: videoTrack });
const generator = new MediaStreamTrackGenerator({ kind: 'video' });

// Process frames
const transformer = new TransformStream({
  async transform(frame, controller) {
    // Run ML segmentation (segmentationModel is app-provided, e.g. MediaPipe)
    const mask = await segmentationModel.predict(frame);
    // Composite the person over the new background (see the sketch below)
    const processedFrame = applyVirtualBackground(frame, mask, backgroundImage);
    controller.enqueue(processedFrame);
    frame.close(); // release the original frame's memory promptly
  }
});

processor.readable.pipeThrough(transformer).pipeTo(generator.writable);

// The generator is itself a MediaStreamTrack; wrap it for the peer connection
const processedStream = new MediaStream([generator]);
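
The applyVirtualBackground() helper above is app-defined. A minimal compositing sketch using OffscreenCanvas, assuming the mask is a CanvasImageSource whose alpha channel marks the person (as MediaPipe's masks do):

const canvas = new OffscreenCanvas(1280, 720);
const ctx = canvas.getContext('2d');

function applyVirtualBackground(frame, mask, backgroundImage) {
  canvas.width = frame.displayWidth;
  canvas.height = frame.displayHeight;
  // 1. Draw the mask: person pixels opaque, background transparent
  ctx.globalCompositeOperation = 'copy';
  ctx.drawImage(mask, 0, 0, canvas.width, canvas.height);
  // 2. Keep camera-frame pixels only where the mask is opaque
  ctx.globalCompositeOperation = 'source-in';
  ctx.drawImage(frame, 0, 0, canvas.width, canvas.height);
  // 3. Fill the remaining transparent pixels with the new background
  ctx.globalCompositeOperation = 'destination-over';
  ctx.drawImage(backgroundImage, 0, 0, canvas.width, canvas.height);
  // Wrap the composited result in a new VideoFrame for the generator
  return new VideoFrame(canvas, { timestamp: frame.timestamp });
}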

Machine Learning Technologies

MediaPipe Selfie Segmentation

MediaPipe is Google's open-source framework for building ML pipelines. The Selfie Segmentation model is specifically optimized for person segmentation in video conferencing scenarios:

  • Performance: Fastest option, typically achieving 30-60 FPS on modern hardware
  • Accuracy: High-quality segmentation for people within 2 meters of the camera
  • Technology: Uses WebAssembly (WASM) for near-native performance in browsers
  • Model Size: Compact models (~1-3 MB) optimized for real-time use
  • Acceleration: Uses the XNNPACK library for SIMD-optimized CPU inference

MediaPipe is the same technology used in Google Meet and is the recommended solution for production WebRTC applications as of 2025.
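
A minimal usage sketch with the @mediapipe/selfie_segmentation package; the CDN URL, option values, and drawVirtualBackground() callback are illustrative:

import { SelfieSegmentation } from '@mediapipe/selfie_segmentation';

const selfieSegmentation = new SelfieSegmentation({
  // Resolve the model and WASM assets from a CDN
  locateFile: (file) =>
    `https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/${file}`
});
// modelSelection: 0 = general model, 1 = landscape model (tuned for video calls)
selfieSegmentation.setOptions({ modelSelection: 1 });

selfieSegmentation.onResults((results) => {
  // results.segmentationMask is the person mask for results.image
  drawVirtualBackground(results.image, results.segmentationMask);
});

// Feed camera frames (e.g. from a <video> element), once per rendered frame
await selfieSegmentation.send({ image: videoElement });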

TensorFlow.js BodyPix

BodyPix is a TensorFlow.js model for person and body part segmentation:

  • Performance: Moderate, 15-40 FPS depending on browser and hardware (Chrome significantly faster than Firefox)
  • Flexibility: Can segment individual body parts, not just person vs background
  • License: Apache License, suitable for commercial use
  • Browser Support: Good cross-browser compatibility
  • Model Variants: Multiple model architectures available (MobileNet, ResNet) with quality/performance tradeoffs

BodyPix is easier to integrate but generally slower than MediaPipe for simple background replacement use cases.
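
A minimal BodyPix sketch using the @tensorflow-models/body-pix package and its built-in bokeh helper; the option values are illustrative starting points:

import '@tensorflow/tfjs';
import * as bodyPix from '@tensorflow-models/body-pix';

// Load a MobileNet variant: smaller and faster, at some accuracy cost
const net = await bodyPix.load({
  architecture: 'MobileNetV1',
  outputStride: 16,
  multiplier: 0.75,
  quantBytes: 2
});

// Classify each pixel of the current frame as person or background
const segmentation = await net.segmentPerson(videoElement, {
  internalResolution: 'medium',
  segmentationThreshold: 0.7
});

// Built-in helper: redraw the frame with a blurred background
bodyPix.drawBokehEffect(
  canvas, videoElement, segmentation,
  9,    // backgroundBlurAmount
  3,    // edgeBlurAmount
  false // flipHorizontal
);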

TensorFlow DeepLab v3+

DeepLab v3+ is a high-quality semantic segmentation model:

  • Accuracy: Excellent segmentation quality, better edge detection
  • Performance: Slower than MediaPipe/BodyPix, typically 5-15 FPS
  • Use Case: Better suited for pre-recorded video processing than real-time conferencing
  • Resource Usage: Higher CPU/GPU requirements

Virtual Background Effects

Background Replacement

Replace the background with a custom image or video:

  • Static Images: Office settings, scenic locations, branded backgrounds
  • Animated Backgrounds: Looping videos for dynamic effects
  • Green Screen Alternative: Achieve green screen effects without physical setup

Background Blur

Apply Gaussian blur to background pixels while keeping the person sharp:

  • Light Blur: Slight defocus (5-10px radius) for subtle background reduction
  • Medium Blur: Moderate blur (15-25px radius) for privacy without complete replacement
  • Heavy Blur: Strong blur (30-50px radius) making background unrecognizable

Background blur is computationally lighter than full replacement and often preferred for professional settings.
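
A minimal canvas blur sketch, reusing the alpha-encoded mask convention from the earlier compositing example; note that canvas filter support (ctx.filter) varies across browsers:

function applyBackgroundBlur(ctx, frame, mask, blurRadius) {
  const { width, height } = ctx.canvas;
  // Sharp person: draw the mask, then keep frame pixels inside it
  ctx.globalCompositeOperation = 'copy';
  ctx.drawImage(mask, 0, 0, width, height);
  ctx.globalCompositeOperation = 'source-in';
  ctx.drawImage(frame, 0, 0, width, height);
  // Blurred background: draw a blurred copy of the frame behind the person
  ctx.globalCompositeOperation = 'destination-over';
  ctx.filter = `blur(${blurRadius}px)`;
  ctx.drawImage(frame, 0, 0, width, height);
  ctx.filter = 'none';
}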

Performance Considerations

CPU Usage

Virtual background processing is CPU-intensive:

  • MediaPipe: ~10-30% CPU usage on modern processors at 720p@30fps
  • BodyPix: ~20-50% CPU usage depending on model architecture
  • Optimization: Run segmentation at a lower resolution (480p or 360p) and upscale the resulting mask for better performance
  • Frame Skipping: Run inference only every 2-3 frames and reuse the most recent mask to reduce CPU load (see the sketch after this list)
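
A frame-skipping sketch for the transform() callback shown earlier; segmentationModel and applyVirtualBackground() remain app-provided placeholders:

let lastMask = null;
let frameCount = 0;
const INFERENCE_INTERVAL = 3; // run the model on every 3rd frame

async function transform(frame, controller) {
  if (frameCount++ % INFERENCE_INTERVAL === 0) {
    lastMask = await segmentationModel.predict(frame);
  }
  // Reuse the most recent mask on skipped frames
  const output = lastMask
    ? applyVirtualBackground(frame, lastMask, backgroundImage)
    : frame.clone(); // pass through until the first mask is ready
  controller.enqueue(output);
  frame.close();
}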

Resolution and Frame Rate

Virtual background performance scales with video resolution:

  • 360p (640x360): Very fast, suitable for low-end devices
  • 480p (854x480): Good balance of quality and performance
  • 720p (1280x720): Standard for modern devices, acceptable performance on mid-range hardware
  • 1080p (1920x1080): Requires high-end CPU/GPU, may need frame rate reduction

Most implementations target 720p at 30 FPS or 480p at 30 FPS depending on device capabilities.

Browser Differences

Performance varies significantly across browsers:

  • Chrome/Edge: Best performance, 40+ FPS with BodyPix, 60 FPS with MediaPipe
  • Firefox: Moderate performance, 15-30 FPS with same models
  • Safari: Good performance on Apple Silicon Macs, moderate on Intel Macs

Quality Enhancement Techniques

Edge Refinement

Improve segmentation quality around hair and edges:

  • Edge Feathering: Apply subtle blur to mask edges to reduce harsh cutouts
  • Temporal Smoothing: Average masks across multiple frames to reduce jitter (see the sketch after this list)
  • Color Spill Correction: Adjust edge pixels to prevent background color bleeding
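
A temporal-smoothing sketch using an exponential moving average over per-pixel mask confidences (assumed here to be a Float32Array of 0..1 values):

const SMOOTHING = 0.7; // weight given to the previous frames' mask
let smoothedMask = null;

function smoothMask(rawMask) {
  if (!smoothedMask) {
    smoothedMask = Float32Array.from(rawMask);
    return smoothedMask;
  }
  for (let i = 0; i < rawMask.length; i++) {
    smoothedMask[i] = SMOOTHING * smoothedMask[i] + (1 - SMOOTHING) * rawMask[i];
  }
  return smoothedMask;
}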

Lighting Considerations

Virtual backgrounds work best with proper lighting:

  • Good Lighting: Well-lit subjects produce cleaner segmentation masks
  • Backlighting Issues: Strong backlighting (window behind user) reduces segmentation accuracy
  • Consistent Lighting: Uniform lighting helps ML models maintain stable masks

Privacy and Security

Client-Side Processing

Virtual background processing should happen client-side:

  • Privacy: Video frames are processed locally, not sent to servers
  • Bandwidth: No additional server resources required
  • Latency: No round-trip delay from server processing

Model Loading

ML models are typically loaded from CDNs:

  • Model Size: 1-10 MB depending on architecture
  • Caching: Models should be cached for subsequent sessions
  • Lazy Loading: Only load models when users enable virtual background (see the sketch after this list)
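
A lazy-loading sketch: create the model only on first use and reuse the same promise afterwards. loadSegmentationModel() is a hypothetical loader (for example, the MediaPipe setup shown earlier); the browser HTTP cache or a service worker handles repeat downloads of the model files:

let modelPromise = null;

function getSegmentationModel() {
  // First call kicks off the download; later calls reuse the same promise
  modelPromise ??= loadSegmentationModel();
  return modelPromise;
}

// Called only when the user toggles virtual background on:
// const model = await getSegmentationModel();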

Common Use Cases

  • Privacy: Hide home environment, family members, or personal items during professional calls
  • Professionalism: Maintain consistent, professional appearance regardless of physical location
  • Branding: Display company logos, branded backgrounds for corporate meetings
  • Education: Teachers use themed backgrounds to create engaging virtual classrooms
  • Entertainment: Creators use fun, creative backgrounds for streaming and content creation
  • Remote Work: Enable working from anywhere without exposing location

Best Practices

  1. Provide Multiple Options: Offer both blur and replacement, with quality presets (Low/Medium/High)
  2. Device Detection: Automatically select an appropriate model and settings based on device capabilities (see the heuristic sketch after this list)
  3. Preview Before Call: Allow users to test and adjust virtual background before joining
  4. Graceful Degradation: Disable on low-end devices or offer reduced quality options
  5. User Control: Make it easy to toggle on/off during calls
  6. Resource Monitoring: Monitor CPU usage and reduce quality if system is struggling
  7. Lighting Guidance: Provide tips for optimal lighting conditions
  8. Custom Backgrounds: Allow users to upload their own background images
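
A rough device-detection heuristic; the thresholds and preset values are illustrative, not benchmarks:

function pickQualityPreset() {
  const cores = navigator.hardwareConcurrency || 4;
  if (cores >= 8) return { resolution: 720, fps: 30, effect: 'replacement' };
  if (cores >= 4) return { resolution: 480, fps: 30, effect: 'blur' };
  return { resolution: 360, fps: 24, effect: 'none' }; // disable on low-end devices
}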

Platform Examples

  • Zoom: Uses proprietary ML models, supports both blur and replacement
  • Google Meet: Uses MediaPipe Selfie Segmentation for background effects
  • Microsoft Teams: Offers background blur and custom backgrounds with edge refinement
  • Slack Huddles: Integrated background blur using modern segmentation models
