G.729.1 Scalable Wideband Speech and Audio Codec

ITU-T Recommendation G.729.1 is a scalable wideband speech and audio coding standard specifically designed to facilitate a graceful and cost-effective evolution to high-quality wideband (50 Hz–7 kHz) speech communications in packet-switched networks.

Wideband speech is attractive both to providers, who want to differentiate their services from the competition’s by offering wideband quality, and to users, who benefit from the better comprehension, greater comfort and reduced fatigue that the crisper highs and richer lows of wideband speech deliver. Digital networks easily support the bit rates that modern codecs require for wideband speech. However, wideband speech has yet to become widely available because of a lack of interoperability with existing standards and with already installed infrastructure and terminals.

G.729.1 achieves this interoperability through an embedded layered architecture. The core layer, based on the widely deployed G.729 narrowband speech codec, produces an 8 kbps bitstream that legacy G.729 decoders can decode directly. Successive layers gradually improve the audio fidelity at the receiving terminal: wideband speech coding starts at 14 kbps and scales up to the best quality at 32 kbps.

The G.729.1 encoder produces an embedded bitstream of 12 layers. The bitstream can be truncated at the decoder, or by any other component of the communication system, to adjust the bit rate on the fly to the desired value without out-of-band signaling. This bit rate scalability has the advantages of

  • introducing only negligible complexity and
  • requiring no feedback channel to the encoder
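The truncation described above can be illustrated with a minimal Python sketch. It assumes that the 12 layers end at 8 and 12 kbps and then at every 2 kbps step from 14 to 32 kbps, that one frame covers 20 ms, and that a frame is a raw codec payload with the core layer first and no transport header. The names `LAYER_RATES_KBPS`, `frame_bytes` and `truncate_frame` are hypothetical and not part of any G.729.1 library.

```python
FRAME_MS = 20  # frame duration in milliseconds

# Assumed layer boundaries: 8 and 12 kbps, then 2 kbps steps from 14 to 32 kbps.
LAYER_RATES_KBPS = [8, 12] + list(range(14, 33, 2))


def frame_bytes(rate_kbps: int) -> int:
    """Return the number of bytes that one 20 ms frame occupies at rate_kbps."""
    bits = rate_kbps * FRAME_MS  # kbit/s * ms = bits
    if bits % 8 != 0:
        raise ValueError("frame does not fill a whole number of bytes")
    return bits // 8


def truncate_frame(frame: bytes, target_kbps: int) -> bytes:
    """Drop trailing enhancement layers so that the frame matches target_kbps.

    Assumes the core layer comes first in the frame, so truncation only
    removes enhancement bits from the end.
    """
    if target_kbps not in LAYER_RATES_KBPS:
        raise ValueError(f"{target_kbps} kbps is not a layer boundary")
    size = frame_bytes(target_kbps)
    if size > len(frame):
        raise ValueError("truncation cannot raise the bit rate")
    return frame[:size]


if __name__ == "__main__":
    full = bytes(frame_bytes(32))      # placeholder 32 kbps frame
    core = truncate_frame(full, 8)     # what a legacy gateway would keep
    print(len(LAYER_RATES_KBPS), len(full), len(core))
```

Under these assumptions a 32 kbps frame occupies 80 bytes and the 8 kbps core occupies the first 20 bytes, so a network element only needs to cut the payload and never has to re-encode it.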

A gateway to the legacy part of the network can simply discard the additional “high-quality” bits and retain the bits related to the legacy codec. Thus,

  • bitstream interoperability is ensured
  • no transcoding is required
  • no additional algorithmic delay is introduced

Both G.729.1’s bandwidth scalability and its bit rate scalability are desirable features for the VoIP infrastructure, especially in highly heterogeneous networks.

Applications

Typical applications include VoIP (IP telephony) equipment such as IP phones, other VoIP handsets, softphones and IP PBXs; media servers and gateways; call center equipment; voice recording equipment; test equipment; audio/video conferencing for enterprise networks or for the mass market (such as PSTN emulation over xDSL or wireless access); and voice messaging servers.

Benefits

  • Interoperability: Since G.729.1 is bitstream interoperable with the widely deployed G.729 narrowband codec, it can deliver encoded narrowband speech to existing G.729 terminals and equipment without transcoding or other additional overhead. At the same time, it will deliver wideband speech to capable devices. It thus facilitates a smooth transition to wideband by allowing phased investment in infrastructure while supporting existing terminals. Moreover, media files can be encoded just once by G.729.1 for playback on both wideband- and narrowband-capable devices.

  • Scalability: The G.729.1 bitstream is scalable from 8 kbps to 32 kbps.
     In a telephone or video conference, different types of connections and/or terminal equipment can be served without major transcoding overhead. Some user devices will only understand the core bitstream of the hierarchical codec, but others will decode a signal of higher quality.
     For storage applications, users could, for example, listen to their voice mail box from different kinds of terminals and always hear the best possible quality.
     At the network level, G.729.1’s bit rate adaptation can be used to reduce the transmitted bit rate in order to avoid congestion and prevent packet loss, which severely impairs overall quality.

  • Robustness to frame errors: Frame erasure concealment (FEC) information in layers 2 and 3 of the G.729.1 bitstream contributes to its good performance under frame erasures, which is crucial for the coder’s targeted application in VoIP networks.

  • Flexibility: G.729.1’s highly flexible structure enables it to process input and output signals sampled at 16000 Hz or 8000 Hz at both the encoder and decoder.

  • Low-delay version available: Besides its normal mode of operation, G.729.1 offers the possibility of low-delay operation for its narrowband modes, i.e., at bit rates of 8 and 12 kbps.
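The network-level bit rate adaptation mentioned under Scalability can be sketched as a simple selection rule: pick the highest layer boundary that fits the available bit rate and the capabilities of the receiver. This is only an illustration. The layer boundaries (8, 12, then 14 to 32 kbps in 2 kbps steps) are an assumption, the budget is taken to be the codec payload rate without packet header overhead, and `select_rate` and `encoded_band_hz` are hypothetical helper names.

```python
# Assumed layer boundaries in kbps: 8, 12, then 14 to 32 in 2 kbps steps.
RATES_KBPS = [8, 12] + list(range(14, 33, 2))


def select_rate(budget_kbps: float, wideband_capable: bool = True) -> int:
    """Return the highest layer rate that does not exceed the budget.

    Narrowband-only receivers are capped at 12 kbps, the highest
    narrowband rate.
    """
    cap = 32 if wideband_capable else 12
    limit = min(budget_kbps, cap)
    candidates = [rate for rate in RATES_KBPS if rate <= limit]
    if not candidates:
        raise ValueError("budget is below the 8 kbps core layer")
    return max(candidates)


def encoded_band_hz(rate_kbps: int) -> tuple:
    """Return the encoded audio band (low, high) in Hz for a given rate."""
    return (50, 4000) if rate_kbps <= 12 else (50, 7000)


if __name__ == "__main__":
    for budget in (9.5, 15, 40):
        rate = select_rate(budget)
        print(budget, rate, encoded_band_hz(rate))
```

A congested link with 15 kbps available for the payload would thus still carry wideband speech at 14 kbps, while a 9.5 kbps budget falls back to the 8 kbps core.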

The G.729.1 Standard

ITU-T Recommendation G.729.1 was approved in May 2006. The name G.729.1 was chosen to highlight the compatibility of this new wideband speech and audio codec with the G.729 narrowband speech codec.

Amendment 1, which defines the RTP payload format, capability identifiers and parameters for signaling G.729.1 capabilities when G.729.1 is used by terminals implementing H.245 protocols, was approved on 13 January 2007. Amendment 2, which defines a floating-point arithmetic implementation for use on DSP hardware optimized for floating-point operations, was approved on 13 February 2007.

G.729.1 at a Glance

Standardization: Recommended May 2006 by ITU-T

Technology: Three-stage coding structure that includes:

  • Embedded Code-Excited Linear Prediction (CELP) coding of the lower band (50–4000 Hz)
  • Parametric coding of the higher band (4000–7000 Hz) by Time-Domain Bandwidth Extension (TDBWE)
  • Enhancement of the full band (50–7000 Hz) by a predictive transform coding technique referred to as Time-Domain Aliasing Cancellation (TDAC)

Bit rates: 8–32 kbps

Encoded bandwidth:

  • At 8 and 12 kbps: 50–4000 Hz
  • At 14–32 kbps: 50–7000 Hz

Delay:

  • Frame size: 20 ms
  • Algorithmic delay: 48.9375 ms. The contributions to this delay are:
     40 ms for the MDCT window (current superframe + lookahead),
     5 ms for LPC lookahead, and
     3.9375 ms for the analysis-synthesis QMF filterbank.
  • Note that for an encoder in NB INPUT mode and a decoder in NB OUTPUT and LOW DELAY mode, the algorithmic delay is reduced to 25 ms.
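As a quick check of the delay budget above, the following sketch adds up the three contributions and converts the total into samples. It assumes the 16000 Hz wideband sampling rate and uses exact fractions to avoid rounding. The variable names are illustrative only.

```python
from fractions import Fraction

SAMPLE_RATE_HZ = 16000  # wideband sampling rate (assumed for the conversion)

# Delay contributions in milliseconds, as listed in the summary.
contributions_ms = {
    "MDCT window (current superframe + lookahead)": Fraction(40),
    "LPC lookahead": Fraction(5),
    "analysis-synthesis QMF filterbank": Fraction("3.9375"),
}

total_ms = sum(contributions_ms.values())
total_samples = total_ms * SAMPLE_RATE_HZ / 1000
qmf_samples = Fraction("3.9375") * SAMPLE_RATE_HZ / 1000

if __name__ == "__main__":
    print(float(total_ms), int(total_samples), int(qmf_samples))
```

The three contributions add up to 48.9375 ms, which corresponds to 783 samples at 16000 Hz. The fractional part comes from the filterbank, whose 3.9375 ms equal 63 samples.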

Quality: Scales from toll quality at 8 kbps to rich wideband at 32 kbps
Complexity:

Observed worst-case complexity of G.729.1 in DEFAULT mode, in WMOPS, measured with the ITU-T Software Tool Library STL2005 v2.1 (from G.191). Values in parentheses give the increase over the previous row.

Rate (kbps)   Encoder          Decoder          Coder
8             11.65            7.21             18.86
12            14.46 (+ 2.81)   7.24 (+ 0.03)    21.70 (+ 2.86)
14            15.87 (+ 1.41)   9.78 (+ 2.54)    25.65 (+ 3.95)
32            21.44 (+ 5.57)   14.35 (+ 4.57)   35.79 (+ 10.14)

This table shows that G.729.1 is scalable in complexity: the complete coder requires 18.86 WMOPS at 8 kbps and 35.79 WMOPS at 32 kbps.
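One way to use the complexity figures above is to choose the highest tabulated bit rate that fits a given processing budget. The sketch below copies the four tabulated rows, checks that each coder figure equals encoder plus decoder, and performs the lookup. Only the tabulated rates are considered because the table gives no figures for the intermediate rates. The names `COMPLEXITY_WMOPS` and `highest_rate_within` are hypothetical.

```python
from decimal import Decimal as D

# rate (kbps) -> worst-case WMOPS per component, copied from the table.
COMPLEXITY_WMOPS = {
    8: {"encoder": D("11.65"), "decoder": D("7.21"), "coder": D("18.86")},
    12: {"encoder": D("14.46"), "decoder": D("7.24"), "coder": D("21.70")},
    14: {"encoder": D("15.87"), "decoder": D("9.78"), "coder": D("25.65")},
    32: {"encoder": D("21.44"), "decoder": D("14.35"), "coder": D("35.79")},
}


def table_is_consistent() -> bool:
    """Check that coder complexity equals encoder plus decoder in every row."""
    return all(
        row["encoder"] + row["decoder"] == row["coder"]
        for row in COMPLEXITY_WMOPS.values()
    )


def highest_rate_within(budget_wmops: D, component: str = "coder") -> int:
    """Return the highest tabulated rate whose complexity fits the budget."""
    fitting = [
        rate
        for rate, row in COMPLEXITY_WMOPS.items()
        if row[component] <= budget_wmops
    ]
    if not fitting:
        raise ValueError("budget is below the 8 kbps complexity")
    return max(fitting)


if __name__ == "__main__":
    print(table_is_consistent())
    print(highest_rate_within(D("26")))
    print(highest_rate_within(D("8"), "decoder"))
```

For example, a device that only decodes and can spare 8 WMOPS would be limited to the 12 kbps rate among the tabulated ones, because decoding at 14 kbps already needs 9.78 WMOPS.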

RAM/ROM requirements (in 16-bit words):

  • 5 kword for static RAM,
  • 3.7 kword for dynamic RAM,
  • 8.5 kword for data ROM, and
  • ~ 32 kword for program ROM.
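For sizing a target platform, the word counts above can be converted to bytes. The sketch below assumes two bytes per 16-bit word and keeps the same "k" multiplier as the figures, so it does not matter whether a kword means 1000 or 1024 words. The program ROM value is approximate in the source, which makes the totals approximate as well.

```python
from decimal import Decimal

BYTES_PER_WORD = 2  # the requirements are given in 16-bit words

# Memory requirements in kwords; program ROM is an approximate figure.
memory_kwords = {
    "static RAM": Decimal("5"),
    "dynamic RAM": Decimal("3.7"),
    "data ROM": Decimal("8.5"),
    "program ROM": Decimal("32"),
}

ram_kwords = memory_kwords["static RAM"] + memory_kwords["dynamic RAM"]
total_kwords = sum(memory_kwords.values())

ram_kbytes = ram_kwords * BYTES_PER_WORD
total_kbytes = total_kwords * BYTES_PER_WORD

if __name__ == "__main__":
    print(ram_kbytes, total_kbytes)
```

Under these assumptions the codec needs about 17.4 kbyte of RAM and roughly 98.4 kbyte of memory in total.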

Fixed-point C code available
Floating-point C code available