
MCU (Multipoint Control Unit)

Architecture

Server that mixes all streams into a single composite stream

What is an MCU?

A Multipoint Control Unit (MCU) is a centralized video conferencing architecture that receives individual streams from all participants, decodes them, mixes them into a single composite stream, and sends that unified stream back to each participant. Think of it as a video production studio in a server—it takes multiple camera feeds and creates one professionally mixed output.

Unlike an SFU, which simply forwards streams, or P2P, which connects participants directly, an MCU actively processes video and audio. It decodes every incoming stream, combines them into a single layout (like a gallery view with all participants visible), re-encodes the composite, and distributes it to everyone.

This is the traditional enterprise video conferencing approach, used historically by systems like Cisco TelePresence and Polycom before SFU became dominant. As of 2025, MCU is less common for general video calling but still used in specific scenarios where its unique characteristics provide advantages.

How MCU Works: Step by Step

1. Stream Upload

Each participant connects to the MCU and uploads their video and audio stream once. Like SFU, participants only need to send their stream to one destination, minimizing upload bandwidth requirements.

2. Stream Decoding

The MCU receives all incoming streams and decodes them completely. Each video is decompressed from its encoded format (H.264, VP8, etc.) into raw video frames. Each audio stream is decoded to PCM (uncompressed audio). This is computationally expensive—the server must decode every participant's stream simultaneously.
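To get a feel for how much raw data decoding produces, here is a rough back-of-the-envelope sketch. It assumes decoded frames are stored as 8-bit I420 (YUV 4:2:0), which is a common choice but not something this article specifies; the function names are made up for illustration.

```python
def raw_i420_frame_bytes(width: int, height: int) -> int:
    """Bytes in one uncompressed 8-bit I420 frame (assumed format).

    I420 stores a full-resolution luma plane plus two quarter-resolution
    chroma planes, which works out to 1.5 bytes per pixel.
    """
    return width * height * 3 // 2


def raw_bytes_per_second(width: int, height: int, fps: int, streams: int) -> int:
    """Raw video data the server handles per second after decoding all streams."""
    return raw_i420_frame_bytes(width, height) * fps * streams


if __name__ == "__main__":
    one_frame = raw_i420_frame_bytes(1280, 720)
    total = raw_bytes_per_second(1280, 720, fps=30, streams=25)
    print(f"One 720p frame: {one_frame} bytes")
    print(f"25 streams at 30 fps: {total / 1e9:.2f} GB/s of raw frames")
```

Under these assumptions, 25 participants at 720p and 30 fps already amount to roughly 1 GB of raw frames per second, before any mixing or re-encoding takes place.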

3. Mixing and Compositing

Here's where the magic happens. The MCU combines all decoded streams into a single output:

  • Video mixing: Creates a layout with all participants arranged in tiles, grid, or active speaker view. Resizes and positions each video feed within the composite frame
  • Audio mixing: Combines all audio streams, applying gain control to prevent clipping, removing echo, and balancing volume levels
  • Layout customization: Can create different layouts for different participants based on their preferences or bandwidth
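The video-mixing step can be illustrated with a small layout calculation. This is a minimal sketch of one possible equal-tile grid, not how any particular MCU does it; `Tile` and `grid_layout` are hypothetical names, and the 1280x720 composite size is an assumption.

```python
import math
from typing import List, NamedTuple


class Tile(NamedTuple):
    x: int       # left edge inside the composite frame, in pixels
    y: int       # top edge inside the composite frame, in pixels
    width: int
    height: int


def grid_layout(n: int, frame_w: int = 1280, frame_h: int = 720) -> List[Tile]:
    """Return one tile per participant, filled row by row in a near-square grid."""
    if n <= 0:
        return []
    cols = math.ceil(math.sqrt(n))   # e.g. 5 participants -> 3 columns
    rows = math.ceil(n / cols)       # ... and 2 rows
    tile_w = frame_w // cols
    tile_h = frame_h // rows
    return [
        Tile((i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h)
        for i in range(n)
    ]


if __name__ == "__main__":
    for tile in grid_layout(4):
        print(tile)
```

A real compositor would then scale each decoded frame to its tile size and copy it into the composite at the tile's position. An active speaker view would use unequal tiles instead of this uniform grid.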

4. Re-encoding

The MCU takes the mixed composite and encodes it into a video stream. Because the server controls the encoder, it can produce versions of the same composite at different quality levels and bitrates, matched to each participant's available bandwidth: someone on slow internet gets a lower-bitrate version of the same composite.
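Matching a participant to a bitrate can be sketched as a lookup against a small set of pre-defined encodings. The ladder values, the 80% headroom factor, and the function name below are all illustrative assumptions rather than figures from any real MCU.

```python
from typing import Sequence

# Hypothetical bitrate ladder for the composite, in kbps.
LADDER_KBPS = (2500, 1200, 600, 300)


def pick_bitrate(
    available_kbps: float,
    ladder: Sequence[int] = LADDER_KBPS,
    headroom: float = 0.8,
) -> int:
    """Pick the highest ladder bitrate that fits within a share of the bandwidth.

    `headroom` leaves room for audio and network jitter (assumed value).
    If nothing fits, fall back to the lowest bitrate in the ladder.
    """
    budget = available_kbps * headroom
    for rate in sorted(ladder, reverse=True):
        if rate <= budget:
            return rate
    return min(ladder)


if __name__ == "__main__":
    for bandwidth in (5000, 1000, 100):
        print(bandwidth, "kbps available ->", pick_bitrate(bandwidth), "kbps composite")
```

Each extra ladder entry costs the server another encode of the composite, so the number of variants is itself a trade-off between client fit and CPU.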

5. Distribution

Finally, the MCU sends the single composite stream to each participant. Every participant receives one video stream containing everyone in the meeting, rather than multiple individual streams like in SFU.

Key Advantages

Guaranteed Bandwidth Savings for Participants

Each participant downloads exactly one stream, regardless of how many people are in the meeting. In a 50-person conference, you download the same amount of data as in a 5-person meeting. This is invaluable for participants on limited or metered connections.
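The difference is easy to see with some rough arithmetic. The sketch below compares a deliberately naive SFU, which forwards every other participant's stream at full bitrate, with an MCU composite. Real SFUs usually reduce this with simulcast or by forwarding only visible participants, and the bitrates used here are assumed values.

```python
def sfu_download_kbps(participants: int, per_stream_kbps: int) -> int:
    """Naive SFU: each client receives every other participant's stream."""
    return max(participants - 1, 0) * per_stream_kbps


def mcu_download_kbps(participants: int, composite_kbps: int) -> int:
    """MCU: each client receives one composite, whatever the meeting size."""
    return composite_kbps if participants > 1 else 0


if __name__ == "__main__":
    for n in (5, 50):
        sfu = sfu_download_kbps(n, per_stream_kbps=500)
        mcu = mcu_download_kbps(n, composite_kbps=1500)
        print(f"{n} participants: naive SFU {sfu} kbps, MCU {mcu} kbps")
```

With these assumed numbers, the MCU download stays at 1500 kbps for both meeting sizes, while the naive SFU figure grows from 2000 kbps to 24500 kbps.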

Minimal Client Requirements

Because the server handles all mixing and the client receives only one stream, even very low-powered devices can participate in large meetings. Old computers, basic smartphones, or hardware video conferencing endpoints that can only decode a single stream work perfectly with MCU.

Consistent Experience

Everyone sees the exact same layout (unless customized). There's no variance in who sees what—the server creates a unified view. This is particularly valuable for recorded meetings or legal proceedings where consistency matters.

Easy Recording

Since the MCU already creates a complete composite, recording is straightforward: just save the output stream. There is no need for the separate server-side composition step that recording from an SFU requires.

Superior Audio Mixing

MCUs can apply professional audio processing: automatic gain control, echo cancellation across all participants, noise reduction, and intelligent mixing that prevents audio distortion even when many people speak simultaneously.
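As a simplified illustration of server-side audio mixing, the sketch below sums 16-bit PCM samples and gives each participant a mix of everyone except themselves. This "mix-minus" approach is a common technique brought in here as an assumption; the article does not say which method a given MCU uses. The hard clamp stands in for the gain control a production mixer would apply.

```python
from typing import Dict, List

INT16_MIN, INT16_MAX = -32768, 32767


def mix_minus(frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Return, per participant, the sum of all *other* participants' samples.

    `frames` maps a participant name to one frame of 16-bit PCM samples.
    Sums are clamped to the 16-bit range to avoid integer wrap-around.
    """
    if not frames:
        return {}
    length = min(len(samples) for samples in frames.values())
    # Sum everyone once, then subtract each participant's own contribution.
    total = [sum(samples[i] for samples in frames.values()) for i in range(length)]
    mixed: Dict[str, List[int]] = {}
    for name, own in frames.items():
        mixed[name] = [
            max(INT16_MIN, min(INT16_MAX, total[i] - own[i])) for i in range(length)
        ]
    return mixed


if __name__ == "__main__":
    frame = {"a": [1000, 30000], "b": [2000, 30000], "c": [-500, 30000]}
    for name, samples in mix_minus(frame).items():
        print(name, samples)
```

Hard clamping distorts loud passages, which is why real mixers rely on automatic gain control or a limiter rather than clipping like this.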

Significant Disadvantages

Extreme Server Cost

This is MCU's biggest problem. Decoding and encoding video is CPU-intensive. A single MCU server might handle 20-30 HD participants before maxing out CPU, whereas an SFU can handle hundreds. For the same number of participants, server infrastructure costs can be roughly 10x higher than with an SFU.
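A quick capacity estimate shows how the per-server limits translate into fleet size. The sketch picks 25 participants per MCU server from the 20-30 range above and assumes 300 per SFU server as one reading of "hundreds". Both are illustrative values, and real capacity depends heavily on resolution, codec, and hardware.

```python
def servers_needed(participants: int, per_server_capacity: int) -> int:
    """Number of servers required, rounding up to whole machines."""
    if per_server_capacity <= 0:
        raise ValueError("per_server_capacity must be positive")
    return -(-participants // per_server_capacity)  # ceiling division


if __name__ == "__main__":
    total = 1000
    mcu = servers_needed(total, per_server_capacity=25)   # assumed MCU capacity
    sfu = servers_needed(total, per_server_capacity=300)  # assumed SFU capacity
    print(f"{total} participants: {mcu} MCU servers vs {sfu} SFU servers")
```

With these assumptions, 1000 concurrent participants need 40 MCU servers but only 4 SFU servers.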

Added Latency

The decode-mix-encode process introduces 100-300ms of additional latency compared to SFU's ~50ms. For natural conversation flow, this delay is noticeable. Video conferencing feels less real-time, more like a broadcast.

Quality Loss

Every decode/encode cycle degrades quality slightly. The composite output is never quite as sharp as receiving the original stream. Fast motion or detailed content suffers from compression artifacts introduced during re-encoding.

Limited Flexibility

Everyone receives the same composite (or a small number of variants). Participants can't individually choose which speakers to focus on or rearrange their layout freely like they can with SFU.

Scalability Challenges

Adding participants increases server load at least linearly, and often faster. Doubling the number of participants can require more than double the CPU, because the server must decode more streams and also create larger, more complex composites.

When to Use MCU

Despite being less common in 2025, MCU still makes sense for:

  • Large webinars or broadcasts: When hundreds or thousands of viewers join, downloading one stream beats downloading multiple streams
  • Legacy hardware compatibility: Older video conferencing endpoints that can't handle multiple streams
  • Severely limited client bandwidth: Remote areas with very slow internet where downloading even 2-3 streams is impossible
  • Professional recording requirements: When you need perfectly synchronized, professionally mixed recordings
  • Guaranteed equal experience: Legal depositions, official proceedings where everyone must see identical content
  • Very low-powered devices: IoT devices, embedded systems, or anything that can barely decode one stream

MCU vs SFU vs P2P

Understanding the trade-offs:

  • P2P: Best for 1-4 participants. Zero server cost, maximum privacy, lowest latency. Doesn't scale
  • SFU: Best for 5-100+ participants. Moderate server cost, good quality, industry standard. Clients need more bandwidth
  • MCU: Best for legacy systems or extreme client bandwidth constraints. High server cost, added latency, guaranteed client bandwidth savings
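These rules of thumb can be written down as a small decision helper. This is only a sketch of the guidance above: the function and parameter names are hypothetical, and the cutoff of 4 participants comes from the P2P range in this list rather than from any hard technical limit.

```python
def choose_architecture(
    participants: int,
    legacy_endpoints: bool = False,
    constrained_clients: bool = False,
) -> str:
    """Return "P2P", "SFU", or "MCU" following the rules of thumb in this article."""
    # MCU is reserved for clients that can only handle a single stream.
    if legacy_endpoints or constrained_clients:
        return "MCU"
    # Small calls can connect participants directly.
    if participants <= 4:
        return "P2P"
    # Everything else defaults to the industry-standard SFU.
    return "SFU"


if __name__ == "__main__":
    print(choose_architecture(3))                          # small call
    print(choose_architecture(25))                         # typical meeting
    print(choose_architecture(25, legacy_endpoints=True))  # room hardware
```

A real decision would also weigh cost, privacy, and recording needs, which this helper leaves out.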

Hybrid Approaches

Some modern platforms use hybrid MCU/SFU architectures:

  • Active participants receive SFU streams for low latency and high quality
  • Passive viewers (like in a webinar) receive an MCU-mixed composite to save bandwidth
  • Mobile clients might receive MCU composites while desktop users get SFU streams

This gives the best of both worlds: interactive participants get SFU's quality and low latency, while large audiences benefit from MCU's bandwidth efficiency.
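A hybrid platform needs some rule for deciding which delivery path each client gets. The sketch below encodes the three bullet points above as a simple function. The role and device labels are hypothetical, and a real system would more likely decide from measured bandwidth and device capability than from fixed labels.

```python
from enum import Enum


class Delivery(Enum):
    SFU = "sfu"  # individual forwarded streams
    MCU = "mcu"  # single server-mixed composite


def delivery_for(role: str, device: str) -> Delivery:
    """Choose a delivery path from a client's role and device type.

    Assumed labels: role is "participant" or "viewer",
    device is "desktop" or "mobile".
    """
    if role == "viewer":
        return Delivery.MCU  # passive audience: save bandwidth
    if device == "mobile":
        return Delivery.MCU  # constrained client: one stream to decode
    return Delivery.SFU      # interactive desktop user: low latency


if __name__ == "__main__":
    print(delivery_for("participant", "desktop"))
    print(delivery_for("participant", "mobile"))
    print(delivery_for("viewer", "desktop"))
```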

Why SFU Won

MCU was the dominant architecture for enterprise video conferencing from the 1990s through the early 2010s. But three factors shifted the industry to SFU:

  1. Internet speeds improved: Client download bandwidth increased enough that receiving multiple streams became feasible
  2. Mobile devices got powerful: Smartphones can now decode 4-6 streams simultaneously, eliminating MCU's client-side advantage
  3. Cloud economics: CPU costs remained high while bandwidth costs dropped, making SFU's bandwidth-for-CPU trade-off economically superior

In 2025, MCU persists mainly in niche use cases or as part of hybrid architectures. For most video calling applications, SFU provides better economics and user experience.

The Bottom Line

MCU represents the traditional, centralized approach to video conferencing: maximum server processing in exchange for minimal client requirements. While its high costs and added latency have made it less common in modern applications, it still excels in scenarios where client bandwidth is severely constrained or when legacy hardware compatibility is essential.

Understanding MCU helps you appreciate why SFU became the industry standard—and recognize the specific situations where MCU's trade-offs still make sense.
