
Scalable Wideband Speech and Audio Codec
ITU-T Recommendation G.711.1 is a scalable, low-delay, low-complexity, wideband speech and audio codec standard designed to interoperate seamlessly with existing G.711-based VoIP systems and terminals.
As service providers introduce higher-quality wideband speech services, interoperability with existing equipment is an important feature during the transition period, when wideband phones and other equipment must coexist with their legacy narrowband counterparts.
With the G.711.1 scalable codec, whenever both parties in a call support wideband, they will have the full experience of wideband speech. However, when one party has only legacy narrowband equipment, the codec’s seamless interoperability ensures that the call will still provide toll quality. There will be no additional delay or impairments caused by a need for transcoding. In fact, the audio quality in a call from a G.711.1 terminal to a G.711 terminal will be better than between two G.711 terminals because of noise filtering at the G.711.1 encoder (which will improve the sound heard at the G.711 terminal) and decoder (which will improve the sound heard at the G.711.1 terminal).
G.711.1 achieves this interoperability with legacy narrowband terminals and equipment through an embedded, layered architecture. Operating at a 16-kHz sampling rate, G.711.1 produces three bitstreams in three layers. The core layer, operating at 64 kbps, is bitstream interoperable with G.711. Two other layers enhance the fidelity of the output to the original signal. The first one enhances the lower-band part of the signal in a 16-kbps bitstream, and the second encodes the higher-band, that is, wideband, part (4000–7000 Hz) in a second 16-kbps bitstream. The core layer is always delivered, and either one or both of the upper layers can also be delivered, resulting in four possible modes of the G.711.1 codec, as shown in this figure.

G.711.1 can also operate at an 8-kHz sampling rate and output only the core layer or two narrowband layers.
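The layer arithmetic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the layer names and mode labels are hypothetical (they are not taken from the Recommendation), and the 5-ms frame length is the one listed in the delay figures later in this document.

```python
# Illustrative sketch: bit rate and per-frame size of each layer combination.
# Layer names and mode labels below are hypothetical, chosen for readability.
FRAME_MS = 5  # frame length stated elsewhere in this document

LAYER_KBPS = {
    "core": 64,        # G.711-interoperable core layer
    "lower_enh": 16,   # lower-band enhancement layer
    "higher": 16,      # higher-band (4000-7000 Hz) layer
}

# The core layer is always present; each upper layer is optional.
MODES = {
    "core only": ("core",),
    "core + lower-band enhancement": ("core", "lower_enh"),
    "core + higher band": ("core", "higher"),
    "core + both upper layers": ("core", "lower_enh", "higher"),
}


def mode_rate_kbps(layers):
    """Total bit rate of a mode: the sum of the rates of its layers."""
    return sum(LAYER_KBPS[name] for name in layers)


def frame_bytes(rate_kbps, frame_ms=FRAME_MS):
    """Bytes per frame: kbit/s times ms gives bits, divided by 8."""
    return rate_kbps * frame_ms // 8


for label, layers in MODES.items():
    rate = mode_rate_kbps(layers)
    wideband = "higher" in layers
    print(f"{label}: {rate} kbps, {frame_bytes(rate)} bytes/frame, wideband={wideband}")
```

Under these assumptions, the core layer alone occupies 40 bytes per 5-ms frame, each upper layer adds 10 bytes, and the full 96-kbps mode occupies 60 bytes.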
G.711.1 features a short frame length, low algorithmic delay, and low complexity, all of which contribute to speech delivery without perceptible delay. The codec also includes noise feedback (through perceptual filtering) for improved sound quality and frame erasure concealment in both the lower and the upper bands for robustness to packet losses.
G.711.1 also offers an optional postfiltering mechanism that reduces the PCM quantization noise in the lower band when only the G.711-compatible core layer is decoded. This feature, described in Appendix I to the recommendation, is intended for end-user terminals.
Applications
High-quality speech services over broadband networks, especially IP telephony and multi-point speech conferencing.
Benefits
- Interoperability: The embedded 64-kbps bitstream is fully interoperable with G.711 infrastructure and terminals, so no transcoding delay or cost is introduced when a G.711.1 wideband terminal communicates with legacy equipment.
- Scalability: Both bandwidth and bit-rate scalability enable G.711.1 to deliver the best possible quality in any circumstances. The bitstream can be truncated on the fly if conditions such as network congestion require.
- Low computational complexity and memory requirements ensure support of G.711.1 by existing hardware.
- Low delay: Its 5-ms frame length and low (11.875 ms) algorithmic delay ensure its viability for real-time speech applications.
- Robustness to packet losses: Frame erasure concealment algorithms applied separately to the lower-band and higher-band signals and deliberate omission of inter-frame prediction result in clean speech quality even in sub-optimal conditions.
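The on-the-fly truncation mentioned under Scalability can be illustrated with a minimal sketch. It assumes, purely for illustration, that each 5-ms frame stores its layers back to back in the order core, lower-band enhancement, higher band; the actual payload layout is defined by the relevant transport specification and is not described in this document. The function name `truncate_frame` is hypothetical.

```python
# Minimal sketch of bitstream truncation under an ASSUMED layer order:
# core (40 bytes), lower-band enhancement (10 bytes), higher band (10 bytes).
FRAME_MS = 5
LAYER_BYTES = (40, 10, 10)  # per-frame sizes at 64, 16 and 16 kbps


def truncate_frame(frame: bytes, max_kbps: int) -> bytes:
    """Keep as many whole leading layers as fit within max_kbps.

    Raises ValueError if even the core layer does not fit, because the
    core layer must always be delivered.
    """
    budget = max_kbps * FRAME_MS // 8      # byte budget for one frame
    limit = min(budget, len(frame))
    kept = 0
    for size in LAYER_BYTES:
        if kept + size > limit:
            break                          # this layer would not fit whole
        kept += size
    if kept < LAYER_BYTES[0]:
        raise ValueError("budget or frame too small for the core layer")
    return frame[:kept]


# Usage: a dummy 60-byte frame (all three layers) reduced to the core layer.
full_frame = bytes(range(60))
core_only = truncate_frame(full_frame, max_kbps=64)
print(len(full_frame), "->", len(core_only))
```

Because the core layer is kept untouched, a network element that applies this kind of truncation does not need to decode or re-encode the audio, which is consistent with the no-transcoding property described above.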
The G.711.1 Standard
ITU-T Recommendation G.711.1 was approved in March 2008.
G.711.1 at a Glance
| Standardization | Recommended March 2008 by ITU-T |
| Technology | Three-stage coding structure: (1) log-companded pulse code modulation (PCM) of the lower band, including noise feedback; (2) embedded PCM extension with adaptive bit allocation for enhancing the quality of the base layer in the lower band; (3) weighted vector quantization coding of the higher band based on the modified discrete cosine transform (MDCT) |
| Bit rates | 64, 80, 96 kbps |
| Encoded bandwidth | At 64 and 80 kbps (narrowband modes): 50–4000 Hz; at 80 and 96 kbps (wideband modes): 50–7000 Hz |
| Delay | Frame size: 5 ms; algorithmic delay: 11.875 ms |
| Quality | Scales from toll quality at 64 kbps to full wideband quality at 96 kbps |
| Complexity | Encoder: 5.396 WMOPS; decoder: 3.304 WMOPS |
| Fixed-point | C code available |

RAM/ROM requirements (in 16-bit words):

| | Encoder | Decoder |
| Static RAM (kWords) | 0.18 | 1.50 |
| Scratch RAM (kWords) | 0.66 | 0.70 |
| Data ROM (kWords) | 2.21 |
| Program ROM (number of basic ops) | 1943 |
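
The delay figures above are easier to compare with buffer sizes when expressed in samples. The following sketch converts them at the 16-kHz sampling rate; the helper name `ms_to_samples` is hypothetical, and exact fractions are used so that 11.875 ms does not pick up floating-point rounding.

```python
from fractions import Fraction

SAMPLE_RATE_HZ = 16_000                    # wideband sampling rate
FRAME_MS = Fraction(5)                     # frame size from the table
ALGORITHMIC_DELAY_MS = Fraction("11.875")  # algorithmic delay from the table


def ms_to_samples(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert a duration in milliseconds to a whole number of samples."""
    samples = Fraction(duration_ms) * sample_rate_hz / 1000
    if samples.denominator != 1:
        raise ValueError("duration is not a whole number of samples")
    return int(samples)


frame_samples = ms_to_samples(FRAME_MS)
delay_samples = ms_to_samples(ALGORITHMIC_DELAY_MS)
print(f"frame: {frame_samples} samples, algorithmic delay: {delay_samples} samples")
print(f"delay in frames: {float(ALGORITHMIC_DELAY_MS / FRAME_MS)}")
```

At 16 kHz, a 5-ms frame is 80 samples and the 11.875-ms algorithmic delay is 190 samples, or 2.375 frames.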