Overview
How it works
WebRTC voice chat applications receive remote streams in the form of MediaStream
objects, which consist of audio and/or video tracks in the form of MediaStreamTrack
instances.
To bring an audio track into a Web Audio API graph, it has to be converted into a MediaStreamAudioSourceNode. Luckily, the Source
class handles this automatically for you.
source.setInput(mediaStreamTrack);
Internally, it uses AudioContext.createMediaStreamSource
to create a source node which can be used as an input for a renderer's source.
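As a rough sketch of what happens under the hood, the track can be wrapped in a MediaStream and handed to AudioContext.createMediaStreamSource. The function name and structure below are illustrative, not part of the library:

```javascript
// Sketch: wiring a remote WebRTC audio track into a Web Audio graph.
// The helper name is hypothetical; the library's Source class does the
// equivalent of this for you.
function attachRemoteTrack(audioContext, mediaStreamTrack) {
  // createMediaStreamSource expects a MediaStream, not a bare track,
  // so the track is wrapped first.
  const stream = new MediaStream([mediaStreamTrack]);
  const sourceNode = audioContext.createMediaStreamSource(stream);
  // From here the node can be connected into the rest of the graph.
  return sourceNode;
}

// Typical browser usage (assuming a standard RTCPeerConnection setup):
// const audioContext = new AudioContext();
// peerConnection.ontrack = (event) => {
//   if (event.track.kind === 'audio') {
//     const node = attachRemoteTrack(audioContext, event.track);
//     node.connect(audioContext.destination);
//   }
// };
```
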
Requirements for the communication provider
In order to spatialize each participant independently in 3D space, you need to get hold of the individual audio streams. This is possible if your voice chat provider/implementation is based on the Mesh or Selective Forwarding Unit (SFU) architecture. 👍🥳
In the case of a Multipoint Control Unit (MCU), each client receives only a single stream containing a mix of all participants' audio signals, so no individual spatialization is possible. 👎😢
Quick primer on voice call architectures
Assuming N participants are in a call.
Mesh
Each participant sends a copy of their audio stream to each of the other N-1 participants. As a consequence, each participant also receives N-1 streams.
N-1 outgoing streams
N-1 incoming streams
Selective Forwarding Unit
Each participant sends one audio stream to the server. The server receives all streams and forwards the other N-1 streams to each participant.
1 outgoing stream
N-1 incoming streams
Multipoint Control Unit
Each participant sends one audio stream to the server. The server creates N individual mixes, one for each participant, and sends each participant their mix back.
1 outgoing stream
1 incoming stream
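The stream counts above can be condensed into a small helper; the function and its architecture labels are purely illustrative:

```javascript
// Returns the number of audio streams each participant sends and receives,
// given the call architecture and the number of participants n.
// The architecture names are illustrative labels, not a library API.
function streamCounts(architecture, n) {
  switch (architecture) {
    case 'mesh':
      // One copy per other participant, in both directions.
      return { outgoing: n - 1, incoming: n - 1 };
    case 'sfu':
      // One stream up to the server, all other participants' streams back.
      return { outgoing: 1, incoming: n - 1 };
    case 'mcu':
      // One stream up, one personalized mix back.
      return { outgoing: 1, incoming: 1 };
    default:
      throw new Error(`Unknown architecture: ${architecture}`);
  }
}

// For a call with 5 participants:
// streamCounts('mesh', 5) → { outgoing: 4, incoming: 4 }
// streamCounts('sfu', 5)  → { outgoing: 1, incoming: 4 }
// streamCounts('mcu', 5)  → { outgoing: 1, incoming: 1 }
```

Note that only Mesh and SFU deliver the N-1 individual incoming streams required for per-participant spatialization.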
Next up
On the next pages you'll find the shared, provider-independent basic setup code as well as the provider-specific implementations.