The EBU 3326 Standard or N/ACIP
To allow interoperability between IP-enabled audio hardware and software, the EBU developed a standard
originally known as N/ACIP (Audio Contribution over IP) and now published as Tech 3326.
In a nutshell, the EBU took the existing VoIP protocols and narrowed their parameters, adapting them for AoIP.
Colloquially known as SIP, the technology actually employs several different protocols and is a fairly open system. Best
practices developed for VoIP do not suffice, as audio quality and fidelity are a low priority in telephony.
Here we summarise the standard, with our own added comments in italics:
IPv4 must be supported; IPv6 is optional.
IPv6 has still not been widely adopted, but as the world runs out of IPv4 address space, it will gain
significance in the near future.
For signaling and codec negotiation, SIP must be used. The standard does not prescribe a transport-layer
protocol for it.
We recommend TCP because it interacts more reliably with NAT and avoids a potential problem with large SIP messages
exceeding the packet size limit of UDP. Most VoIP software and devices refer only to this signaling layer when offering
a choice between TCP and UDP.
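The packet size concern can be made concrete. RFC 3261 (section 18.1.1) requires a congestion-controlled transport such as TCP when a SIP request comes within 200 bytes of the path MTU, or exceeds 1300 bytes when the path MTU is unknown. The following is a minimal sketch of that rule only; the function name `pick_sip_transport` and its parameters are hypothetical and not taken from any SIP library.

```python
from typing import Optional


def pick_sip_transport(message: bytes, path_mtu: Optional[int] = None) -> str:
    """Return "TCP" or "UDP" for a SIP request, based on its size.

    Sketch of the size rule in RFC 3261, section 18.1.1. The function
    name and signature are illustrative assumptions.
    """
    size = len(message)
    if path_mtu is None:
        # Path MTU unknown: anything above 1300 bytes should go over TCP.
        return "TCP" if size > 1300 else "UDP"
    # Path MTU known: switch to TCP within 200 bytes of it.
    return "TCP" if size >= path_mtu - 200 else "UDP"
```

An INVITE offering many codecs can easily grow past 1300 bytes, which is one reason a TCP default for signaling is the safer choice.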
The audio transmission protocol of choice is RTP. Here, UDP is recommended, but TCP is allowed.
Because TCP guarantees delivery, it is unsuitable for low-latency transmission on anything but fully reliable networks.
TCP recovers from loss by requesting retransmission of the lost packet, which introduces significant delay.
Resilience is better handled at the codec (Opus, AAC) or RTP (RFC 5109) level.
Forward error correction on the RTP level is optional.
When using Opus with sip.audio in 'Smart' Relay mode, we always include redundant error correction data on the codec level.
Whether this is included in the return stream depends on the manufacturer's implementation.
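To illustrate why forward error correction avoids the retransmission delay of TCP, here is a simplified XOR parity sketch: one extra parity packet is sent per group, and any single lost packet in that group can be rebuilt from the survivors without asking the sender again. This shows the principle only. It is not the RFC 5109 wire format and not what any particular codec does internally; the function names are hypothetical, and the length of the lost packet is passed in by hand, whereas a real scheme carries it in the FEC header.

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a = a.ljust(n, b"\x00")
    b = b.ljust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """Build one parity packet protecting a group of media packets."""
    return reduce(xor_bytes, packets, b"")


def recover_lost(survivors: List[bytes], parity: bytes, lost_len: int) -> bytes:
    """Rebuild the single missing packet of a group.

    XOR-ing the parity with every surviving packet leaves the lost one,
    padded with zeros; lost_len trims that padding (assumed known here).
    """
    return reduce(xor_bytes, survivors, parity)[:lost_len]
```

For a group `[p0, p1, p2]`, losing `p1` in transit is repaired with `recover_lost([p0, p2], make_parity([p0, p1, p2]), len(p1))`. The cost is extra bandwidth and a wait of at most one group, rather than a full round trip.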
EBU 3326 recommends a small range of ports for audio transport, starting at 5004 and 5005 for RTP and RTCP respectively, with
increments of ten for additional streams (5014/15, 5024/25, and so on). It is recommended that streams are sent and received
on the same port, a feature commonly known as "rport".
Since our servers process many concurrent streams,
sip.audio employs a much larger range of ports when using 'Smart' or 'Pass-Thru' mode. These ports
are determined during the negotiation phase and do not present a problem to proper SIP implementations.
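The port scheme recommended by EBU 3326 is simple arithmetic. A minimal sketch follows; the function name `ebu_port_pair` and its parameters are our own illustration, not part of the standard.

```python
from typing import Tuple


def ebu_port_pair(stream_index: int, base: int = 5004, step: int = 10) -> Tuple[int, int]:
    """Return the (RTP, RTCP) port pair for the n-th stream, counting from 0.

    Follows the pattern described above: 5004/5005 for the first stream,
    then increments of ten for each additional one.
    """
    if stream_index < 0:
        raise ValueError("stream_index must be non-negative")
    rtp_port = base + step * stream_index
    return rtp_port, rtp_port + 1  # RTCP uses the next (odd) port
```

`ebu_port_pair(0)` gives 5004 and 5005, `ebu_port_pair(2)` gives 5024 and 5025. A relay that handles many streams cannot stay inside such a narrow range, so the ports it actually uses are the ones announced in the SDP during negotiation.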
Employing RTCP, a method of exchanging information about stream statistics, is recommended. It can help adjust various
aspects of the media stream if problems are encountered.
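One of the statistics carried in RTCP receiver reports is interarrival jitter. RFC 3550 (section 6.4.1) defines it as a running estimate: for each packet, the change in transit time D relative to the previous packet is computed, and the estimate is updated as J = J + (|D| - J) / 16. The sketch below applies that formula to lists of timestamps; the function name and the list-based interface are assumptions made for illustration, and both lists are expected in the same units (RTP timestamp units in a real receiver).

```python
from typing import Sequence


def interarrival_jitter(send_ts: Sequence[float], recv_ts: Sequence[float]) -> float:
    """Running jitter estimate as defined in RFC 3550, section 6.4.1.

    send_ts holds the RTP timestamps of the packets, recv_ts their
    arrival times, both in the same units and in arrival order.
    """
    if len(send_ts) != len(recv_ts):
        raise ValueError("send_ts and recv_ts must have the same length")
    jitter = 0.0
    for i in range(1, len(send_ts)):
        # D: difference in spacing at the receiver versus at the sender.
        d = (recv_ts[i] - recv_ts[i - 1]) - (send_ts[i] - send_ts[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter
```

A perfectly paced stream yields 0. A receiver that sees this value climb can, for example, enlarge its jitter buffer before dropouts become audible.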
EBU 3326 defines a set of required, recommended and optional codecs. G.711, G.722, MPEG L2 and 16-bit PCM are required. AAC, AAC-LD
and APT-X are recommended. Opus, MP3, AACv2, and AMR are optional.
EBU 3326 was last revised in 2014. In any future revision, we would hope to see the inclusion of newer, more efficient codecs,
such as Opus and the lossless FLAC, which is more efficient than raw PCM data.
The "payload" types identifying codecs during negotiation are not defined.
Some AoIP codecs are defined in other standards, some are not. This presents a challenge to interoperability, as real-world
implementations vary. Where practical, we apply fixes between otherwise incompatible clients.
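As background to the payload type problem: RFC 3551 assigns fixed ("static") payload type numbers to a handful of older codecs, while everything else, Opus and AAC included, has to use a dynamic number in the range 96 to 127 that each side announces in an `a=rtpmap` line of its SDP. The sketch below collects both kinds into one lookup table. The function name, the choice of static entries and the example SDP are illustrative assumptions.

```python
import re
from typing import Dict, Tuple

# Static assignments from RFC 3551 (subset). Note that G722 is registered
# with an 8000 Hz RTP clock although it samples audio at 16 kHz.
STATIC_PAYLOAD_TYPES: Dict[int, Tuple[str, int]] = {
    0: ("PCMU", 8000),    # G.711 mu-law
    8: ("PCMA", 8000),    # G.711 A-law
    9: ("G722", 8000),
    10: ("L16", 44100),   # 16-bit PCM, stereo
    11: ("L16", 44100),   # 16-bit PCM, mono
    14: ("MPA", 90000),   # MPEG audio, including Layer II
}

RTPMAP = re.compile(r"^a=rtpmap:(\d+) ([^/\s]+)/(\d+)(?:/(\d+))?\s*$")


def payload_map(sdp: str) -> Dict[int, Tuple[str, int]]:
    """Map payload type numbers to (encoding name, clock rate).

    Starts from the static table and overlays any a=rtpmap lines found.
    """
    mapping = dict(STATIC_PAYLOAD_TYPES)
    for line in sdp.splitlines():
        match = RTPMAP.match(line.strip())
        if match:
            mapping[int(match.group(1))] = (match.group(2), int(match.group(3)))
    return mapping


# Hypothetical offer: Opus on dynamic type 96, plus G.722 and G.711.
EXAMPLE_SDP = """m=audio 5004 RTP/AVP 96 9 0
a=rtpmap:96 opus/48000/2
a=rtpmap:9 G722/8000
"""
```

Because the number 96 means Opus only within this one session, two devices that hard-code different dynamic numbers for the same codec, or ignore the `a=rtpmap` lines, will fail to understand each other.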
The standard does not mention STUN or ICE, protocols designed to aid in NAT traversal.
Our 'Smart' and 'Pass-Thru' modes are designed to allow implementations without these technologies to work behind a NAT router.
Direct connections of devices behind NAT without STUN or ICE require specific firewall settings and can only work for a single machine
on a network.
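For readers unfamiliar with STUN, the idea is that a client behind NAT asks a server on the public internet which address and port its packets appear to come from, and then advertises that address instead of its private one. The sketch below builds the 20-byte Binding Request header defined in RFC 5389 and decodes an IPv4 XOR-MAPPED-ADDRESS attribute value from a response. It is a minimal illustration under stated assumptions: it performs no network I/O, skips attribute parsing and IPv6, and the function names are our own.

```python
import os
import socket
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442   # fixed value from RFC 5389
BINDING_REQUEST = 0x0001    # STUN message type


def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """Return a STUN Binding Request with no attributes (20 bytes)."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    # Header: type (2 bytes), attribute length (2), magic cookie (4), ID (12).
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def decode_xor_mapped_ipv4(value: bytes) -> Tuple[str, int]:
    """Decode the value of an IPv4 XOR-MAPPED-ADDRESS attribute."""
    _reserved, family, xport, xaddr = struct.unpack("!BBHI", value[:8])
    if family != 0x01:
        raise ValueError("only IPv4 is handled in this sketch")
    # Port and address are XOR-ed with the magic cookie on the wire.
    port = xport ^ (MAGIC_COOKIE >> 16)
    address = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
    return address, port
```

ICE builds on this by collecting several candidate addresses (local, STUN-derived, relayed) and testing which pair actually connects.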
RTP streams can be encrypted. EBU 3326 does not touch on the subject; however, WebRTC, the flavor of RTC built into
modern web browsers, demands it.
Our 'Smart' relay mode can translate some encryption methods for clients that do not support them.
The beauty of the concept is that, apart from a different mix of codecs, any old VoIP system already supports it, allowing
developers to borrow from tried and tested technology. With the advent of 'HD Voice' telephony, which usually employs G.722, even
consumer-grade hardware can be used with tolerable quality. Similarly, many 'softphones' made for VoIP that support the Opus codec
can be employed in the field. Built-in microphones are often the weak link here, but combining, say, the Luci app with a digitally
connected microphone can offer impressive audio fidelity.