
Overview

How it works

WebRTC voice chat applications receive remote streams as MediaStream objects, which consist of audio and/or video tracks represented by MediaStreamTrack instances.

In order to bring an audio track into a Web Audio API graph, it has to be converted into a MediaStreamAudioSourceNode. Luckily, the Source class handles this automatically for you.

source.setInput(mediaStreamTrack);

Internally, it uses AudioContext.createMediaStreamSource to create a source node that can then be used as the input for a renderer's source.
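
For reference, this is roughly what the conversion looks like when done by hand with plain Web Audio API calls (a minimal sketch; the actual internals of Source may differ):

const audioContext = new AudioContext();

function createSourceNode(track: MediaStreamTrack): MediaStreamAudioSourceNode {
  // createMediaStreamSource expects a whole MediaStream,
  // so the single audio track is wrapped in a new stream first.
  const stream = new MediaStream([track]);
  return audioContext.createMediaStreamSource(stream);
}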

Requirements for the communication provider

In order to spatialize each participant independently in 3D space, you need access to each participant's individual audio stream. This is possible if your voice chat provider or implementation is based on the Mesh or Selective Forwarding Unit (SFU) architecture. 👍🥳

In the case of a Multipoint Control Unit (MCU), each client receives only a single stream containing a mix of all participants' audio signals, so no individual spatialization is possible. 👎😢
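
To illustrate what the per-participant case looks like in practice, here is a minimal sketch of routing each incoming remote track into its own source. It assumes one RTCPeerConnection per remote peer (as in a mesh setup) and a hypothetical createSourceForParticipant helper that returns a Source instance positioned for that participant:

function handlePeerConnection(peerConnection: RTCPeerConnection, participantId: string) {
  // Fires once per incoming remote track; with mesh or SFU there is one
  // audio track per remote participant, so each one can get its own Source.
  peerConnection.ontrack = (event: RTCTrackEvent) => {
    if (event.track.kind === 'audio') {
      const source = createSourceForParticipant(participantId); // hypothetical helper
      source.setInput(event.track);
    }
  };
}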

Quick primer on voice call architectures

Assuming N participants are in a call.

Mesh

Each participant sends a copy of its audio stream to each of the other N-1 participants. As a consequence, each participant also receives N-1 streams.

  • N-1 outgoing streams
  • N-1 incoming streams

Selective Forwarding Unit

Each participant sends one audio stream to the server. The server receives all N streams and forwards the other participants' N-1 streams back to each participant.

  • 1 outgoing stream
  • N-1 incoming streams

Multipoint Control Unit

Each participant sends one audio stream to the server. The server creates N individual mixes, one for each participant, and sends each participant its mix back.

  • 1 outgoing stream
  • 1 incoming stream
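
As a concrete example, in a call with N = 4 participants each client handles 3 outgoing and 3 incoming streams in a mesh, 1 outgoing and 3 incoming streams with an SFU, and 1 outgoing and 1 incoming stream with an MCU.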

Next up

On the following pages you'll find the shared, provider-independent basic setup code as well as the provider-specific implementations.