The Real-time Transport Protocol (RTP) is responsible for the real-time transport of data such as audio and video. It is standardized in RFC 3550 and usually runs over UDP. Before it can be transported, the audio or video has to be encoded by a codec and split into packets. Basically, the protocol lets endpoints express the timing and content requirements of the media carried in incoming and outgoing packets, using the following:
Sequence numbers
Timestamps
Packet forwarding without retransmission
Source identification
Content identification
Synchronization
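Most of these features map directly onto the fixed 12-byte RTP header defined in RFC 3550. The following Python sketch packs and parses that header with the standard `struct` module. It ignores padding, header extensions, and CSRC lists. The helper `g711_headers` is a hypothetical illustration that assumes G.711 μ-law (static payload type 0, 8 kHz clock) with 20 ms of audio per packet. There is no acknowledgment field because RTP never retransmits.

```python
import struct
from dataclasses import dataclass

RTP_VERSION = 2


@dataclass
class RtpHeader:
    payload_type: int    # content identification, e.g. 0 = PCMU (G.711 mu-law)
    sequence: int        # 16 bits, +1 per packet, lets the receiver reorder and detect loss
    timestamp: int       # 32 bits, in units of the media clock rate
    ssrc: int            # 32-bit synchronization source identifier
    marker: bool = False

    def pack(self) -> bytes:
        # Byte 0: version (2 bits), padding (1), extension (1), CSRC count (4); only version is set
        byte0 = RTP_VERSION << 6
        # Byte 1: marker (1 bit) followed by payload type (7 bits)
        byte1 = (int(self.marker) << 7) | (self.payload_type & 0x7F)
        return struct.pack("!BBHII", byte0, byte1,
                           self.sequence & 0xFFFF,
                           self.timestamp & 0xFFFFFFFF,
                           self.ssrc & 0xFFFFFFFF)

    @classmethod
    def unpack(cls, data: bytes) -> "RtpHeader":
        byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
        if byte0 >> 6 != RTP_VERSION:
            raise ValueError("not an RTP version 2 packet")
        return cls(payload_type=byte1 & 0x7F, sequence=seq, timestamp=ts,
                   ssrc=ssrc, marker=bool(byte1 >> 7))


def g711_headers(count, ssrc, start_seq=0, start_ts=0, ptime_ms=20, clock_rate=8000):
    """Hypothetical helper: headers for consecutive G.711 packets."""
    samples_per_packet = clock_rate * ptime_ms // 1000  # 160 samples for 20 ms at 8 kHz
    for i in range(count):
        yield RtpHeader(payload_type=0,
                        sequence=(start_seq + i) & 0xFFFF,               # wraps at 65535
                        timestamp=(start_ts + i * samples_per_packet) & 0xFFFFFFFF,
                        ssrc=ssrc)
```

The sequence number increases by one per packet, and the timestamp increases by the number of samples the packet carries. This split lets a receiver detect loss and reordering independently of playout timing.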
A codec is an algorithm capable of encoding or decoding a digital stream. The media carried by RTP is usually encoded by a codec. Each codec has a specific use: some compress the audio, while others do not. G.711 is still the most popular codec and does not use compression. At 64 kbps per channel, it needs a high-speed network, such as those commonly found in Local Area Networks (LANs). In Wide Area Networks (WANs), however, paying for 64 kbps per channel can be too expensive. Codecs such as G.729 (8 kbps) and GSM (13 kbps) compress the voice considerably, saving a lot of bandwidth. There are also video codecs, the most relevant being the H.264 family and VP8 from Google. To simplify choosing a voice codec, the following table summarizes the most relevant ones. The bandwidths shown do not include the headers added by the lower protocol layers (RTP, UDP, IP, and the link layer).
| Codec | Bandwidth | MOS | Environment | When to use |
|---|---|---|---|---|
| G.711 | 64 kbps | 4.45 | LAN/WAN | Use it for toll quality and broad support from gateways. |
| G.729 | 8 kbps | 4.04 | WAN | Use it to save bandwidth and keep toll quality. |
| G.722 | 64 kbps | 4.5 | LAN | Use it for high-definition voice. |
| Opus | 6-510 kbps | N/A | Internet | Use it over the Internet. It is highly flexible and scales from narrowband speech to high-definition music. |
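The table's bandwidths exclude lower-layer headers, so the real cost of a call is higher. The following is a minimal sketch of the arithmetic. It assumes IPv4 without options (20 bytes), UDP (8 bytes), and RTP without CSRCs or extensions (12 bytes), and it assumes 20 ms of audio per packet, a common default that the table itself does not specify. Link-layer overhead such as Ethernet framing is deliberately left out.

```python
# Header sizes in bytes: IPv4 without options, UDP, RTP without CSRCs or extensions
IP_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12


def ip_bandwidth_kbps(codec_kbps: float, ptime_ms: int = 20) -> float:
    """IP-layer bandwidth of one voice stream in one direction, in kbps."""
    packets_per_second = 1000 / ptime_ms
    payload_bytes = codec_kbps * 1000 / 8 / packets_per_second
    packet_bytes = payload_bytes + IP_HEADER + UDP_HEADER + RTP_HEADER
    return packet_bytes * 8 * packets_per_second / 1000


for name, rate in [("G.711", 64), ("G.729", 8)]:
    print(f"{name}: {ip_bandwidth_kbps(rate)} kbps")
# G.711: 80.0 kbps
# G.729: 24.0 kbps
```

The 40 bytes of headers add a fixed 16 kbps at 20 ms packetization. For G.729, that triples the table's 8 kbps figure. Shorter packets make the overhead worse: at 10 ms, G.729 needs 40 kbps.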
There are other codecs, such as G.723, GSM, iLBC, and SILK, that are slowly losing ground to Opus, the codec adopted for the WebRTC standard. You can, of course, dig deeper into codec details, as there are dozens available, but I truly believe that the four described previously are the relevant choices at the time of writing. MOS (Mean Opinion Score) rates perceived audio quality on a scale from 1 (bad) to 5 (excellent):
| MOS | Quality | Impairment |
|---|---|---|
| 5 | Excellent | Imperceptible |
| 4 | Good | Perceptible but not annoying |
| 3 | Fair | Slightly annoying |
| 2 | Poor | Annoying |
| 1 | Bad | Very annoying |
Source: ITU-T P.800 recommendation
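A MOS is an average of listener ratings, so codecs get fractional scores such as 4.04 rather than whole categories. The following sketch reads a score against the scale in the table above. Rounding to the nearest category is a hypothetical convention chosen for illustration, not something P.800 prescribes.

```python
# P.800 listening-quality scale, as listed in the table above
MOS_SCALE = {
    5: ("Excellent", "Imperceptible"),
    4: ("Good", "Perceptible but not annoying"),
    3: ("Fair", "Slightly annoying"),
    2: ("Poor", "Annoying"),
    1: ("Bad", "Very annoying"),
}


def describe_mos(score: float) -> tuple[str, str]:
    """Return (quality, impairment) for the category nearest to score."""
    if not 1.0 <= score <= 5.0:
        raise ValueError("MOS must be between 1 and 5")
    # Round half up to the nearest whole category (illustrative convention)
    return MOS_SCALE[int(score + 0.5)]


print(describe_mos(4.04))  # ('Good', 'Perceptible but not annoying')
```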
There are three ways to carry DTMF (Dual-Tone Multi-Frequency) digits in VoIP networks: in-band as audio tones, as named events in RTP, and in signaling using SIP INFO messages. RFC 2833 (since obsoleted by RFC 4733) describes how to transmit DTMF as named events in RTP. It is very important that the user agent clients and servers on both sides of a call use the same method. Otherwise, the far end may never detect the digits.
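With the RTP named-events method, each digit travels as a small payload instead of as audio. The following sketch assumes the telephone-event format of RFC 4733, which keeps the 4-byte layout of RFC 2833: event code (8 bits), end bit, reserved bit, volume (6 bits), and duration (16 bits). The dynamic payload type is negotiated in SDP and is not shown. The default volume of 10 is an arbitrary illustrative value. A real sender also repeats these packets while the key is held.

```python
import struct

# DTMF event codes: digits 0-9, then *, #, and A-D
DTMF_EVENTS = {**{str(d): d for d in range(10)},
               "*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15}


def pack_dtmf_event(digit: str, duration: int, end: bool = False, volume: int = 10) -> bytes:
    """Build the 4-byte telephone-event payload carried inside an RTP packet.

    duration is in RTP timestamp units (8000 per second for 8 kHz audio);
    volume is the tone power level in -dBm0, from 0 to 63.
    """
    event = DTMF_EVENTS[digit.upper()]
    # End bit (0x80), reserved bit left at 0, 6-bit volume
    flags = (int(end) << 7) | (volume & 0x3F)
    return struct.pack("!BBH", event, flags, duration & 0xFFFF)


def unpack_dtmf_event(payload: bytes) -> tuple[int, bool, int, int]:
    """Return (event, end, volume, duration) from a telephone-event payload."""
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return event, bool(flags & 0x80), flags & 0x3F, duration


# Final packet for digit 5 after 100 ms (800 samples at 8 kHz)
print(pack_dtmf_event("5", 800, end=True).hex())  # 058a0320
```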