Scalable Wideband Speech and Audio Codec

ITU-T Recommendation G.711.1 is a scalable, low-delay, low-complexity, wideband speech and audio codec standard designed to interoperate seamlessly with existing G.711-based VoIP systems and terminals.

As service providers introduce higher-quality wideband speech services, interoperability with existing equipment becomes an important feature for the transition period, when wideband phones and other equipment must coexist with their legacy narrowband counterparts.

With the G.711.1 scalable codec, whenever both parties in a call support wideband, they will have the full experience of wideband speech. However, when one party has only legacy narrowband equipment, the codec’s seamless interoperability ensures that the call will still provide toll quality. There will be no additional delay or impairments caused by a need for transcoding. In fact, the audio quality in a call from a G.711.1 terminal to a G.711 terminal will be better than between two G.711 terminals because of noise filtering at the G.711.1 encoder (which will improve the sound heard at the G.711 terminal) and decoder (which will improve the sound heard at the G.711.1 terminal).

G.711.1 achieves this interoperability with legacy narrowband terminals and equipment through an embedded, layered architecture. Operating at a 16-kHz sampling rate, G.711.1 produces three bitstreams in three layers. The core layer, operating at 64 kbps, is bitstream interoperable with G.711. Two other layers enhance the fidelity of the output to the original signal. The first one enhances the lower-band part of the signal in a 16-kbps bitstream, and the second encodes the higher-band, that is, wideband, part (4000–7000 Hz) in a second 16-kbps bitstream. The core layer is always delivered, and either one or both of the upper layers can also be delivered, resulting in four possible modes of the G.711.1 codec, as shown in this figure.

G.711.1 can also operate at an 8-kHz sampling rate and output only the core layer or two narrowband layers.
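The layer arithmetic can be sketched in a few lines of Python. This is only an illustration: the layer names and mode labels are placeholders chosen here, not identifiers from the recommendation. The bit rates and the 5-ms frame length are the figures quoted for the codec.

```python
FRAME_MS = 5  # frame length quoted for G.711.1

# Bit rate of each layer in kbps (names are illustrative placeholders).
LAYER_KBPS = {"core": 64, "lower_enh": 16, "higher_band": 16}

# The four layer combinations: the core layer is always present.
MODES = {
    "core only": ("core",),
    "core + lower-band enhancement": ("core", "lower_enh"),
    "core + higher band": ("core", "higher_band"),
    "all three layers": ("core", "lower_enh", "higher_band"),
}


def mode_rate_kbps(layers):
    """Total bit rate of a layer combination in kbps."""
    return sum(LAYER_KBPS[name] for name in layers)


def frame_bytes(layers):
    """Payload size of one frame: kbps * ms gives bits, 8 bits per byte."""
    return mode_rate_kbps(layers) * FRAME_MS // 8


def is_wideband(layers):
    """A mode carries wideband audio only if the higher-band layer is present."""
    return "higher_band" in layers


for label, layers in MODES.items():
    print(label, mode_rate_kbps(layers), "kbps,",
          frame_bytes(layers), "bytes per frame,",
          "wideband" if is_wideband(layers) else "narrowband")
```

Under these assumptions a core-only frame occupies 40 bytes, and each enhancement layer adds 10 bytes per frame.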

G.711.1 features a short frame length, low algorithmic delay, and low complexity, all of which contribute to speech delivery without perceptible delay. The codec also includes noise feedback (through perceptual filtering) for improved sound quality and frame erasure concealment in both the lower and the upper bands for robustness to packet losses.
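To put the delay figures in concrete terms, the short sketch below converts the 5-ms frame length and 11.875-ms algorithmic delay quoted for the codec into sample counts at the 16-kHz sampling rate. The helper name is hypothetical and the code is plain unit conversion, not part of the codec.

```python
SAMPLE_RATE_HZ = 16_000        # wideband sampling rate
FRAME_MS = 5                   # frame length quoted for G.711.1
ALGORITHMIC_DELAY_MS = 11.875  # algorithmic delay quoted for G.711.1


def ms_to_samples(ms, rate_hz=SAMPLE_RATE_HZ):
    """Convert a duration in milliseconds to a number of samples."""
    return ms * rate_hz / 1000


frame_samples = ms_to_samples(FRAME_MS)
delay_samples = ms_to_samples(ALGORITHMIC_DELAY_MS)

print("samples per frame:", frame_samples)
print("algorithmic delay in samples:", delay_samples)
print("delay beyond one frame (ms):", ALGORITHMIC_DELAY_MS - FRAME_MS)
```

At 16 kHz this works out to 80 samples per frame and 190 samples of algorithmic delay.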

G.711.1 also offers an optional postfiltering mechanism that reduces PCM quantization noise in the lower band when only the G.711-compatible core layer is decoded. This feature, described in Appendix I to the recommendation, is intended for end-user terminals.

Applications

High-quality speech services over broadband networks, especially IP telephony and multi-point speech conferencing.

Benefits

  • Interoperability: The embedded 64-kbps bitstream is fully interoperable with G.711 infrastructure and terminals, so no transcoding delay or cost is introduced when a G.711.1 wideband terminal communicates with legacy equipment.

  • Scalability: Bandwidth and bit-rate scalability let G.711.1 deliver the best quality that the terminals and network conditions allow. The bitstream can be truncated on the fly when conditions such as network congestion require it.

  • Low complexity: Low computational complexity and memory requirements allow G.711.1 to run on existing hardware.

  • Low delay: Its 5-ms frame length and low (11.875 ms) algorithmic delay ensure its viability for real-time speech applications.

  • Robustness to packet losses: Frame erasure concealment algorithms applied separately to the lower-band and higher-band signals and deliberate omission of inter-frame prediction result in clean speech quality even in sub-optimal conditions.
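The on-the-fly truncation mentioned under Scalability can be pictured with a minimal sketch. It assumes each frame is held as a dictionary of per-layer byte strings and that the higher-band layer is kept in preference to the lower-band enhancement when only one fits. Both the container and the priority order are assumptions made for illustration; the actual packetization and any dropping policy are defined outside this text.

```python
def truncate_frame(layers, budget_bytes):
    """Return the layers of one frame that fit within budget_bytes.

    layers: dict mapping a layer name to its bytes for one frame
    (hypothetical container). The core layer is always kept, since
    it is always delivered.
    """
    kept = {"core": layers["core"]}
    used = len(layers["core"])
    # Assumed priority: higher band first, then lower-band enhancement.
    for name in ("higher_band", "lower_enh"):
        data = layers.get(name)
        if data is not None and used + len(data) <= budget_bytes:
            kept[name] = data
            used += len(data)
    return kept


# One 5-ms frame at 96 kbps: 40 + 10 + 10 bytes (dummy payloads).
frame = {
    "core": bytes(40),
    "lower_enh": bytes(10),
    "higher_band": bytes(10),
}

print(sorted(truncate_frame(frame, 60)))  # all three layers fit
print(sorted(truncate_frame(frame, 50)))  # one enhancement layer fits
print(sorted(truncate_frame(frame, 40)))  # core layer only
```

Because the core layer is G.711-compatible, the most aggressive truncation in this sketch still leaves a frame that legacy equipment can decode.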

The G.711.1 Standard

ITU-T Recommendation G.711.1 was approved in March 2008.

G.711.1 at a Glance

Standardization

Recommended March 2008 by ITU-T

Technology

Three-stage coding structure includes:

  • Log-companded pulse code modulation (PCM) of the lower band, including noise feedback

  • Embedded PCM extension with adaptive bit allocation for enhancing the quality of the base layer in the lower band

  • Weighted vector quantization coding of the higher band, based on the modified discrete cosine transform (MDCT)
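As a rough illustration of what log companding means in the first stage, the sketch below uses the textbook continuous μ-law formula with μ = 255 (μ-law is one of the two companding laws in G.711). It is not bit-exact G.711, which uses a segmented approximation, and it omits the noise feedback that G.711.1 adds. The function names are hypothetical.

```python
import math

MU = 255  # constant of the continuous mu-law formula


def mu_law_compress(x):
    """Compress x in [-1, 1] logarithmically; small amplitudes are expanded."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize(y, bits=8):
    """Uniformly quantize y in [-1, 1] to a signed grid of the given width."""
    levels = 2 ** (bits - 1) - 1
    return round(y * levels) / levels


for x in (0.001, 0.01, 0.1, 1.0):
    coded = quantize(mu_law_compress(x))
    decoded = mu_law_expand(coded)
    print(x, "->", coded, "->", decoded)
```

The companding step spends more of the 8-bit range on quiet samples, which is why the relative quantization error stays roughly even across loud and soft speech.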

Bit rates

64, 80, 96 kbps

Encoded bandwidth

At 64 kbps and at 80 kbps (core plus lower-band enhancement layer): 50–4000 Hz

At 80 kbps (core plus higher-band layer) and at 96 kbps: 50–7000 Hz

Delay

Frame size: 5 ms

Algorithmic delay: 11.875 ms

Quality

Scales from toll quality at 64 kbps to full wideband quality at 96 kbps

Complexity

Encoder: 5.396 WMOPS
Decoder: 3.304 WMOPS

RAM/ROM requirements (in 16-bit words):

                                     Encoder   Decoder
Static RAM (kWords)                  0.18      1.50
Scratch RAM (kWords)                 0.66      0.70
Data ROM (kWords)                    2.21 (one value for both)
Program ROM (number of basic ops)    1943 (one value for both)

Fixed-point C code available