US20080154588A1 - Speech Coding System to Improve Packet Loss Concealment - Google Patents


Info

Publication number: US20080154588A1 (application US11/942,118); granted as US8010351B2
Authority: US (United States)
Prior art keywords: pitch, speech, gain, cycle, subframes
Legal status: Granted; Active
Inventor: Yang Gao
Original assignee: Individual
Current assignee: Huawei Technologies Co., Ltd. (assignment recorded from GAO, YANG)
Related applications: US13/194,982 (US8688437B2); US14/175,195 (US9336790B2); US15/136,968 (US9767810B2); US15/677,027 (US10083698B2)

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 — Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/005 — Correction of errors induced by the transmission channel, if related to the coding algorithm
    • G10L19/04 — Coding or decoding of speech or audio signals using predictive techniques
    • G10L19/08 — Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/083 — Determination or coding of the excitation function, the excitation function being an excitation gain
    • G10L19/09 — Long term prediction, i.e. removing periodical redundancies, e.g. by using adaptive codebook or pitch predictor


Abstract

A method of significantly reducing error propagation due to voice packet loss, while still profiting greatly from long-term pitch prediction, is achieved by adaptively limiting the maximum value of the pitch gain for the first pitch cycle within one frame. In a speech coding system for encoding a speech signal, a plurality of speech frames are classified into a plurality of classes depending on whether the first pitch cycle is contained in one subframe or spans several subframes. The pitch gain is set to a value significantly smaller than 1 for the subframes covering the first pitch cycle, and the pitch gain reduction is compensated by increasing the coded excitation codebook size or adding one more stage of excitation for those subframes.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
    Provisional Application No. 60/877,172; Provisional Application No. 60/877,173.
    BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present invention is generally in the field of signal coding. In particular, the present invention is in the field of speech coding, specifically in applications where packet loss is an important issue during voice packet transmission.
  • 2. Background Art
  • Traditionally, parametric speech coding methods exploit the redundancy inherent in the speech signal to reduce the amount of information that must be sent and to estimate the parameters of speech samples at short intervals. This redundancy primarily arises from the repetition of speech wave shapes at a quasi-periodic rate and from the slowly changing spectral envelope of the speech signal.
  • The redundancy of speech waveforms may be considered with respect to several different types of speech signal, such as voiced and unvoiced. For voiced speech, the speech signal is essentially periodic; however, this periodicity may vary over the duration of a speech segment, and the shape of the periodic wave usually changes gradually from segment to segment. Low bit rate speech coding can greatly benefit from exploiting such periodicity. The voiced speech period is also called the pitch, and pitch prediction is often named Long-Term Prediction. As for unvoiced speech, the signal is more like random noise and has a smaller amount of predictability.
  • In either case, parametric coding may be used to reduce the redundancy of the speech segments by separating the excitation component of the speech from the spectral envelope component. The slowly changing spectral envelope can be represented by Linear Prediction (also called Short-Term Prediction). Low bit rate speech coding can also benefit greatly from exploiting such Short-Term Prediction. The coding advantage arises from the slow rate at which the parameters change; it is rare for the parameters to differ significantly from the values held a few milliseconds earlier. Accordingly, at a sampling rate of 8 kHz or 16 kHz, the nominal frame duration of a speech coding algorithm is in the range of ten to thirty milliseconds, and a frame duration of twenty milliseconds is the most common choice. In more recent well-known standards such as G.723, G.729, EFR, and AMR, the Code Excited Linear Prediction technique ("CELP") has been adopted; CELP is commonly understood as a combination of Coded Excitation, Long-Term Prediction, and Short-Term Prediction. CELP speech coding is a very popular algorithmic principle in the speech compression area.
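  • As a quick arithmetic check on the frame sizes above, the durations translate into sample counts as follows (the helper name is illustrative, not from the patent):

```python
# Hypothetical helper: convert a window duration in milliseconds into a
# sample count at a given sampling rate.
def samples_per_window(sample_rate_hz: int, duration_ms: float) -> int:
    return int(sample_rate_hz * duration_ms / 1000)

# A 20 ms frame holds 160 samples at 8 kHz and 320 samples at 16 kHz;
# a 5 ms subframe holds 40 samples at 8 kHz.
frame_8k = samples_per_window(8000, 20)     # 160
frame_16k = samples_per_window(16000, 20)   # 320
subframe_8k = samples_per_window(8000, 5)   # 40
```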
  • FIG. 1 shows the initial CELP encoder, where the weighted error 109 between the synthesized speech 102 and the original speech 101 is minimized by using a so-called analysis-by-synthesis approach. W(z) is the weighting filter 110, 1/B(z) is a long-term linear prediction filter 105, and 1/A(z) is a short-term linear prediction filter 103. The code-excitation 108, which is also called the fixed codebook excitation, is scaled by a gain Gc 107 before going through the linear filters.
  • FIG. 2 shows the initial decoder which adds the post-processing block 207 after the synthesized speech.
  • FIG. 3 shows the basic CELP encoder, which realizes the long-term linear prediction by using an adaptive codebook 307 containing the past synthesized excitation 304. The periodic information of the pitch is employed to generate the adaptive component of the excitation. This excitation component is then scaled by a gain Gp 305 (also called the pitch gain). The two scaled excitation components are added together before going through the short-term linear prediction filter 303. The two gains (Gp and Gc) need to be quantized and then sent to the decoder.
  • FIG. 4 shows the basic decoder, corresponding to the encoder in FIG. 3, which adds the post-processing block 408 after the synthesized speech.
  • Long-Term Prediction plays a very important role in voiced speech coding because voiced speech has strong periodicity. Adjacent pitch cycles of voiced speech are similar to each other, which means, mathematically, that the pitch gain Gp in the following excitation expression is very high:

  • e(n) = Gp · ep(n) + Gc · ec(n)  (1)
  • where ep(n) is one subframe of a sample series indexed by n, coming from the adaptive codebook 307, which consists of the past excitation 304; ec(n) comes from the coded excitation codebook 308 (also called the fixed codebook), which is the current excitation contribution. For voiced speech, the contribution of ep(n) from the adaptive codebook can be dominant, and the pitch gain Gp 305 is around a value of 1. The excitation is usually updated for each subframe; a typical frame size is 20 milliseconds and a typical subframe size is 5 milliseconds. If the previous bit-stream packet is lost and the pitch gain Gp is high, the incorrect estimate of the previous synthesized excitation can cause error propagation for quite a long time after the decoder has already received the correct bit-stream packet. Part of the reason for this error propagation is that the phase relationship between ep(n) and ec(n) has been changed by the previous bit-stream packet loss. One simple solution is to completely cut (remove) the pitch contribution between frames, that is, to set the pitch gain Gp to zero in the encoder. Although this kind of solution solves the error propagation problem, it sacrifices too much quality when there is no bit-stream packet loss, or it requires a much higher bit rate to achieve the same quality. The invention explained in the following provides a compromise solution.
  • SUMMARY OF THE INVENTION
  • In accordance with the purpose of the present invention as broadly described herein, there are provided a method and system for speech coding.
  • For most voiced speech, one frame contains more than two pitch cycles. If the speech is strongly voiced, a compromise solution that avoids the error propagation while still profiting from the significant long-term prediction is to limit the maximum pitch gain for the first pitch cycle of each frame. The speech signal can be classified into different cases and treated differently. For example, Class 1 is defined as (strong voiced) and (pitch<=subframe size); Class 2 is defined as (strong voiced) and (pitch>subframe & pitch<=half frame); Class 3 is defined as (strong voiced) and (pitch>half frame); Class 4 represents all other cases. In the case of Class 1, Class 2, or Class 3, for the subframes that cover the first pitch cycle within the frame, the pitch gain is limited to a maximum value (depending on the Class) much smaller than 1, and the coded excitation codebook size should be larger than for the other subframes within the same frame, or one more stage of code-excitation is added to compensate for the lower pitch gain. For subframes other than those covering the first pitch cycle, or for Class 4, a regular CELP algorithm is used. The Class index (class number) assigned above to each defined class can be changed without changing the result.
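  • The four-way classification above can be sketched as a simple decision function. This is an illustrative sketch only: the (strong voiced) decision is assumed to come from a separate voicing detector, and pitch and window sizes are in samples.

```python
# Illustrative sketch of the four-class decision described in the text.
# The voicing decision and the pitch estimate come from elsewhere in the coder.
def classify_frame(strong_voiced: bool, pitch: int,
                   subframe_size: int, frame_size: int) -> int:
    half_frame = frame_size // 2
    if not strong_voiced:
        return 4                  # Class 4: all other cases
    if pitch <= subframe_size:
        return 1                  # Class 1: pitch fits in one subframe
    if pitch <= half_frame:
        return 2                  # Class 2: pitch fits in half a frame
    return 3                      # Class 3: pitch longer than half a frame
```

For a 160-sample frame with 40-sample subframes, a strongly voiced frame with pitch 30 falls in Class 1, pitch 60 in Class 2, and pitch 100 in Class 3.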
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The features and advantages of the present invention will become more readily apparent to those ordinarily skilled in the art after reviewing the following detailed description and accompanying drawings, wherein:
  • FIG. 1 shows the initial CELP encoder.
  • FIG. 2 shows the initial decoder which adds the post-processing block.
  • FIG. 3 shows the basic CELP encoder, which realizes the long-term linear prediction by using an adaptive codebook.
  • FIG. 4 shows the basic decoder corresponding to the encoder in FIG. 3.
  • FIG. 5 shows an example in which the pitch period is smaller than the subframe size.
  • FIG. 6 shows an example in which the pitch period is larger than the subframe size and smaller than the half frame size.
  • DETAILED DESCRIPTION OF THE INVENTION
  • The present invention discloses a switched long-term pitch prediction approach which improves packet loss concealment. The following description contains specific information pertaining to the Code Excited Linear Prediction Technique (CELP). However, one skilled in the art will recognize that the present invention may be practiced in conjunction with various speech coding algorithms different from those specifically discussed in the present application. Moreover, some of the specific details, which are within the knowledge of a person of ordinary skill in the art, are not discussed to avoid obscuring the present invention.
  • The drawings in the present application and their accompanying detailed description are directed to merely example embodiments of the invention. To maintain brevity, other embodiments of the invention which use the principles of the present invention are not specifically described in the present application and are not specifically illustrated by the present drawings.
  • FIG. 1 shows the initial CELP encoder, where the weighted error 109 between the synthesized speech 102 and the original speech 101 is minimized, often by using a so-called analysis-by-synthesis approach. W(z) is an error weighting filter 110, 1/B(z) is a long-term linear prediction filter 105, and 1/A(z) is a short-term linear prediction filter 103. The coded excitation 108, which is also called the fixed codebook excitation, is scaled by a gain Gc 107 before going through the linear filters. The short-term linear filter 103 is obtained by analyzing the original signal 101 and is represented by a set of coefficients:
  • A(z) = 1 + Σ_{i=1}^{P} a_i · z^(−i),  i = 1, 2, . . . , P  (1)
  • The weighting filter 110 is derived from the above short-term prediction filter. A typical form of the weighting filter is
  • W(z) = A(z/α) / A(z/β),  (2)
  • where β<α, 0<β<1, 0<α≦1. The long-term prediction 105 depends on the pitch and the pitch gain; the pitch can be estimated from the original signal, the residual signal, or the weighted original signal. The long-term prediction function in principle can be expressed as

  • B(z) = 1 − β · z^(−Pitch)  (3)
  • The coded excitation 108 normally consists of pulse-like or noise-like signals, which are mathematically constructed or stored in a codebook. Finally, the coded excitation index, the quantized gain index, the quantized long-term prediction parameter index, and the quantized short-term prediction parameter index are transmitted to the decoder.
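  • To make Eq. (3) concrete, the corresponding synthesis filter 1/B(z) simply adds a scaled copy of the signal one pitch period back. The following is a minimal sketch under stated assumptions: function and variable names are illustrative, and the signal history before the first sample is taken to be zero.

```python
# Sketch of the long-term (pitch) synthesis filter 1/B(z) from Eq. (3):
# e(n) = r(n) + beta * e(n - Pitch), with zero history before n = 0.
def pitch_synthesis(residual, pitch_lag, beta):
    out = list(residual)
    for n in range(len(out)):
        if n >= pitch_lag:
            out[n] += beta * out[n - pitch_lag]
    return out

# A single pulse is echoed every pitch_lag samples with decaying amplitude:
echoes = pitch_synthesis([1.0, 0.0, 0.0, 0.0, 0.0], pitch_lag=2, beta=0.5)
# echoes == [1.0, 0.0, 0.5, 0.0, 0.25]
```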
  • FIG. 2 shows the initial decoder, which adds the post-processing block 207 after the synthesized speech 206. The decoder is a combination of several blocks: coded excitation 201, long-term prediction 203, short-term prediction 205, and post-processing 207. Every block except post-processing has the same definition as described for the encoder of FIG. 1. The post-processing may further consist of short-term post-processing and long-term post-processing.
  • FIG. 3 shows the basic CELP encoder, which realizes the long-term linear prediction by using an adaptive codebook 307 containing the past synthesized excitation 304. The periodic pitch information is employed to generate the adaptive component of the excitation. This excitation component is then scaled by a gain 305 (Gp, also called the pitch gain). The two scaled excitation components are added together before going through the short-term linear prediction filter 303. The two gains (Gp and Gc) need to be quantized and then sent to the decoder.
  • FIG. 4 shows the basic decoder corresponding to the encoder in FIG. 3, which adds the post-processing block 408 after the synthesized speech 407. This decoder is similar to that of FIG. 2 except for the adaptive codebook 401. The decoder is a combination of several blocks: coded excitation 402, adaptive codebook 401, short-term prediction 406, and post-processing 408. Every block except post-processing has the same definition as described for the encoder of FIG. 3. The post-processing may further consist of short-term post-processing and long-term post-processing.
  • FIG. 3 illustrates a block diagram of an example encoder capable of embodying the present invention. With reference to FIG. 3 and FIG. 4, the long-term prediction plays a very important role in voiced speech coding because voiced speech has strong periodicity. Adjacent pitch cycles of voiced speech are similar to each other, which means, mathematically, that the pitch gain Gp 305 in the following excitation expression is very high:

  • e(n) = Gp · ep(n) + Gc · ec(n)  (4)
  • where ep(n) is one subframe of a sample series indexed by n, coming from the adaptive codebook 307, which consists of the past excitation 304; ec(n) comes from the coded excitation codebook 308 (also called the fixed codebook), which is the current excitation contribution. For voiced speech, the contribution of ep(n) from the adaptive codebook 307 can be dominant, and the pitch gain Gp 305 is around a value of 1. The excitation is usually updated for each subframe; a typical frame size is 20 milliseconds and a typical subframe size is 5 milliseconds. If the previous bit-stream packet is lost and the pitch gain Gp is high, the incorrect estimate of the previous synthesized excitation can cause error propagation for quite a long time after the decoder has already received the correct bit-stream packet. Part of the reason for this error propagation is that the phase relationship between ep(n) and ec(n) has been changed by the previous bit-stream packet loss. One simple solution is to completely cut (remove) the pitch contribution between frames, that is, to set the pitch gain Gp 305 to zero in the encoder. Although this kind of solution solves the error propagation problem, it sacrifices too much quality when there is no bit-stream packet loss, or it requires a much higher bit rate to achieve the same quality. The invention explained in the following provides a compromise solution.
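  • Eq. (4) amounts to a per-sample weighted sum of the two codebook contributions; the following is a minimal sketch with illustrative names.

```python
# Minimal sketch of Eq. (4): combine the adaptive-codebook contribution
# ep(n), scaled by the pitch gain Gp, with the fixed-codebook contribution
# ec(n), scaled by the codebook gain Gc.
def combine_excitation(gp, ep, gc, ec):
    return [gp * p + gc * c for p, c in zip(ep, ec)]

# For strongly voiced speech Gp is near 1, so the adaptive part dominates:
e = combine_excitation(0.9, [1.0, 2.0], 0.1, [10.0, 20.0])
# e == [1.9, 3.8]
```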
  • For most voiced speech, one frame contains more than two pitch cycles. FIG. 5 shows an example in which the pitch period 503 is smaller than the subframe size 502. FIG. 6 shows an example in which the pitch period 603 is larger than the subframe size 602 and smaller than the half frame size. If the speech is strongly voiced, a compromise solution that avoids the error propagation due to transmission packet loss while still profiting from the significant long-term prediction gain is to limit the maximum pitch gain for the first pitch cycle of each frame. The speech signal can be classified into different cases and treated differently, as in the following example in which valid speech is classified into 4 classes:
  • Class 1: (strong voiced) and (pitch<=subframe size). For this frame, the pitch gain of the first subframe is limited to a value (let's say 0.5) much smaller than 1. For the first subframe, the coded excitation codebook size should be larger than other subframes within the same frame, or one more stage of coded excitation is added only for the first subframe, in order to compensate for the lower pitch gain. For other subframes rather than the first subframe, a regular CELP algorithm is used. As this is a strong voiced frame, the pitch track and pitch gain are stable within the frame so that pitch and pitch gain can be encoded more efficiently with less number of bits.
    Class 2: (strong voiced) and (pitch>subframe & pitch<=half frame). For this frame, the pitch gains of the first two subframes (half frame) are limited to a value (say, 0.5) much smaller than 1. For the first two subframes, the coded excitation codebook size should be larger than for the other subframes within the same frame, or one more stage of coded excitation is added only for the first half frame, in order to compensate for the lower pitch gains. For the subframes other than the first two subframes, a regular CELP algorithm is used. As this is a strongly voiced frame, the pitch track and pitch gain are stable within the frame, so they can be coded more efficiently with fewer bits.
    Class 3: (strong voiced) and (pitch>half frame). When the pitch lag is long, the error propagation effect due to the long-term prediction is less significant than in the short pitch lag case. For this frame, the pitch gains of the subframes covering the first pitch cycle are limited to a value smaller than 1; the coded excitation codebook size could be larger than the regular size, or one more stage of coded excitation is added, in order to compensate for the lower pitch gains. Since a long pitch lag causes less error propagation, and the probability of having a long pitch lag is relatively small, a regular CELP algorithm can also simply be used for the entire frame. As this is a strongly voiced frame, the pitch track and pitch gain are stable within the frame, so they can be coded more efficiently with fewer bits.
  • Class 4: all cases other than Class 1, Class 2, and Class 3. For all these other cases (excluding Class 1, Class 2, and Class 3), a regular CELP algorithm can be used.
  • The class index (class number) assigned above to each defined class can be changed without changing the result. For example, the condition (strong voiced) and (pitch<=subframe size) could be defined as Class 2 rather than Class 1; the condition (strong voiced) and (pitch>subframe & pitch<=half frame) could be defined as Class 3 rather than Class 2; and so on.
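The four-way classification above can be sketched as a simple decision function. The function name, the boolean voicing flag, and the 8 kHz sample counts in the usage example are assumptions for illustration; in a real encoder the voicing decision and pitch lag would come from the analysis stage.

```python
def classify_frame(strong_voiced: bool, pitch: int, subframe: int, frame: int) -> int:
    """Return the class (1-4) of a frame, following the classification above.

    strong_voiced: voicing decision from the encoder's analysis (assumed given)
    pitch:         pitch lag in samples
    subframe:      subframe size in samples; frame: frame size in samples
    """
    half_frame = frame // 2
    if strong_voiced and pitch <= subframe:
        return 1    # first pitch cycle fits inside one subframe
    if strong_voiced and pitch <= half_frame:
        return 2    # first pitch cycle spans the first two subframes
    if strong_voiced:
        return 3    # pitch lag longer than half a frame
    return 4        # all other cases: regular CELP

# 20 ms frame / 5 ms subframes at 8 kHz -> 160 / 40 samples
print(classify_frame(True, 35, 40, 160))   # -> 1
print(classify_frame(True, 70, 40, 160))   # -> 2
```

As the text notes, the class numbering itself is arbitrary; only the per-class treatment (gain cap plus compensating excitation bits versus regular CELP) matters.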
  • In general, the error propagation effect due to speech packet loss is reduced by adaptively diminishing pitch correlations at the boundary of speech frames while still keeping significant contributions from the long-term pitch prediction.
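In encoder terms, that adaptive diminishing amounts to capping the pitch gains of only those subframes that cover the first pitch cycle of a frame. The helper below is a hypothetical sketch: the cap of 0.5 is the example value from the text, and the name and calling convention are assumptions.

```python
import math

def cap_first_cycle_gains(pitch_gains, pitch, subframe, cap=0.5):
    """Limit Gp for the subframes covering the first pitch cycle of a frame.

    pitch_gains: per-subframe pitch gains for one frame
    pitch:       pitch lag in samples
    subframe:    subframe size in samples
    cap:         maximum allowed pitch gain for the capped subframes
    """
    # Number of subframes the first pitch cycle spans (at least one).
    n_capped = min(len(pitch_gains), max(1, math.ceil(pitch / subframe)))
    return [min(g, cap) if i < n_capped else g
            for i, g in enumerate(pitch_gains)]

# Pitch lag of 70 samples with 40-sample subframes: cap the first two subframes.
print(cap_first_cycle_gains([0.9, 0.95, 0.9, 0.85], pitch=70, subframe=40))
# -> [0.5, 0.5, 0.9, 0.85]
```

The lost long-term prediction gain in the capped subframes is what the larger (or extra-stage) coded excitation codebook of Classes 1-3 compensates for.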
  • The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims (10)

1. A method of significantly reducing error propagation due to voice packet loss, while still greatly profiting from long-term pitch prediction, achieved by adaptively limiting the maximum value of the pitch gain for the first pitch cycle within one speech frame, said one speech frame having a plurality of pitch cycles.
2. The method of claim 1 further comprising the steps of: making the pitch gain of the first pitch cycle significantly smaller than 1; compensating the pitch gain reduction by increasing the coded excitation codebook size or adding one more stage of excitation for the first pitch cycle.
3. The method of claim 2 further setting the pitch gain to a value about 0.5 for the first pitch cycle.
4. The method of claim 2 further keeping the regular pitch gain and the regular coded excitation codebook size for pitch cycles other than the first pitch cycle.
5. The method of claim 2, wherein the adaptive pitch gain limitation for the first pitch cycle within one frame is employed for said strong voiced speech.
6. A speech coding system for encoding a speech signal, said speech signal having a plurality of frames, each of said plurality of frames having a plurality of pitch cycles, wherein said plurality of frames are classified into a plurality of classes depending on whether the first pitch cycle is included in one subframe or in several subframes.
7. The method of claim 6 further comprising the steps of: making the pitch gain significantly smaller than 1 for the subframes covering first pitch cycle; compensating the pitch gain reduction by increasing the coded excitation codebook size or adding one more stage of excitation for the subframes covering the first pitch cycle.
8. The method of claim 7 further setting the pitch gain to a value about 0.5 for the subframes covering the first pitch cycle.
9. The method of claim 7 further keeping the regular pitch gain and the regular coded excitation codebook size for subframes other than the subframes covering the first pitch cycle.
10. The method of claim 7, wherein the adaptive pitch gain limitation for the subframes covering the first pitch cycle within one frame is employed for said strong voiced speech.
US11/942,118 2006-12-26 2007-11-19 Speech coding system to improve packet loss concealment Active 2030-06-29 US8010351B2 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US11/942,118 US8010351B2 (en) 2006-12-26 2007-11-19 Speech coding system to improve packet loss concealment
US13/194,982 US8688437B2 (en) 2006-12-26 2011-07-31 Packet loss concealment for speech coding
US14/175,195 US9336790B2 (en) 2006-12-26 2014-02-07 Packet loss concealment for speech coding
US15/136,968 US9767810B2 (en) 2006-12-26 2016-04-24 Packet loss concealment for speech coding
US15/677,027 US10083698B2 (en) 2006-12-26 2017-08-15 Packet loss concealment for speech coding

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US87717306P 2006-12-26 2006-12-26
US87717106P 2006-12-26 2006-12-26
US87717206P 2006-12-26 2006-12-26
US11/942,118 US8010351B2 (en) 2006-12-26 2007-11-19 Speech coding system to improve packet loss concealment

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/194,982 Continuation-In-Part US8688437B2 (en) 2006-12-26 2011-07-31 Packet loss concealment for speech coding

Publications (2)

Publication Number Publication Date
US20080154588A1 true US20080154588A1 (en) 2008-06-26
US8010351B2 US8010351B2 (en) 2011-08-30

Family

ID=39544159

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/942,118 Active 2030-06-29 US8010351B2 (en) 2006-12-26 2007-11-19 Speech coding system to improve packet loss concealment

Country Status (1)

Country Link
US (1) US8010351B2 (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100063803A1 (en) * 2008-09-06 2010-03-11 GH Innovation, Inc. Spectrum Harmonic/Noise Sharpness Control
US20100063810A1 (en) * 2008-09-06 2010-03-11 Huawei Technologies Co., Ltd. Noise-Feedback for Spectral Envelope Quantization
US20100063802A1 (en) * 2008-09-06 2010-03-11 Huawei Technologies Co., Ltd. Adaptive Frequency Prediction
US20100070269A1 (en) * 2008-09-15 2010-03-18 Huawei Technologies Co., Ltd. Adding Second Enhancement Layer to CELP Based Core Layer
US20100070270A1 (en) * 2008-09-15 2010-03-18 GH Innovation, Inc. CELP Post-processing for Music Signals
US20100174538A1 (en) * 2009-01-06 2010-07-08 Koen Bernard Vos Speech encoding
US20100174537A1 (en) * 2009-01-06 2010-07-08 Skype Limited Speech coding
US20100174534A1 (en) * 2009-01-06 2010-07-08 Koen Bernard Vos Speech coding
US20100174541A1 (en) * 2009-01-06 2010-07-08 Skype Limited Quantization
US20100174542A1 (en) * 2009-01-06 2010-07-08 Skype Limited Speech coding
US20100174532A1 (en) * 2009-01-06 2010-07-08 Koen Bernard Vos Speech encoding
US20110077940A1 (en) * 2009-09-29 2011-03-31 Koen Bernard Vos Speech encoding
EP2352146A1 (en) * 2008-12-31 2011-08-03 Huawei Technologies Co., Ltd. Method and device for obtaining pitch gain, encoder and decoder
US20120209599A1 (en) * 2011-02-15 2012-08-16 Vladimir Malenovsky Device and method for quantizing the gains of the adaptive and fixed contributions of the excitation in a celp codec
WO2013016986A1 (en) * 2011-07-31 2013-02-07 中兴通讯股份有限公司 Compensation method and device for frame loss after voiced initial frame
US8396706B2 (en) 2009-01-06 2013-03-12 Skype Speech coding
GB2499505A (en) * 2013-01-15 2013-08-21 Skype Speech signal decoding
US8532998B2 (en) 2008-09-06 2013-09-10 Huawei Technologies Co., Ltd. Selective bandwidth extension for encoding/decoding audio/speech signal
US9626982B2 (en) 2011-02-15 2017-04-18 Voiceage Corporation Device and method for quantizing the gains of the adaptive and fixed contributions of the excitation in a CELP codec
KR101761629B1 (en) 2009-11-24 2017-07-26 엘지전자 주식회사 Audio signal processing method and device
WO2020146869A1 (en) 2019-01-13 2020-07-16 Huawei Technologies Co., Ltd. High resolution audio coding

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8688437B2 (en) 2006-12-26 2014-04-01 Huawei Technologies Co., Ltd. Packet loss concealment for speech coding
CN108364657B (en) 2013-07-16 2020-10-30 超清编解码有限公司 Method and decoder for processing lost frame
US9418671B2 (en) 2013-08-15 2016-08-16 Huawei Technologies Co., Ltd. Adaptive high-pass post-filter
CN105225666B (en) 2014-06-25 2016-12-28 华为技术有限公司 The method and apparatus processing lost frames

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6556966B1 (en) * 1998-08-24 2003-04-29 Conexant Systems, Inc. Codebook structure for changeable pulse multimode speech coding
US6714907B2 (en) * 1998-08-24 2004-03-30 Mindspeed Technologies, Inc. Codebook structure and search for speech coding
US7117146B2 (en) * 1998-08-24 2006-10-03 Mindspeed Technologies, Inc. System for improved use of pitch enhancement with subcodebooks


Cited By (54)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100063803A1 (en) * 2008-09-06 2010-03-11 GH Innovation, Inc. Spectrum Harmonic/Noise Sharpness Control
US8532983B2 (en) 2008-09-06 2013-09-10 Huawei Technologies Co., Ltd. Adaptive frequency prediction for encoding or decoding an audio signal
US20100063802A1 (en) * 2008-09-06 2010-03-11 Huawei Technologies Co., Ltd. Adaptive Frequency Prediction
US8407046B2 (en) 2008-09-06 2013-03-26 Huawei Technologies Co., Ltd. Noise-feedback for spectral envelope quantization
US8532998B2 (en) 2008-09-06 2013-09-10 Huawei Technologies Co., Ltd. Selective bandwidth extension for encoding/decoding audio/speech signal
US8515747B2 (en) 2008-09-06 2013-08-20 Huawei Technologies Co., Ltd. Spectrum harmonic/noise sharpness control
US20100063810A1 (en) * 2008-09-06 2010-03-11 Huawei Technologies Co., Ltd. Noise-Feedback for Spectral Envelope Quantization
US8775169B2 (en) 2008-09-15 2014-07-08 Huawei Technologies Co., Ltd. Adding second enhancement layer to CELP based core layer
US20100070270A1 (en) * 2008-09-15 2010-03-18 GH Innovation, Inc. CELP Post-processing for Music Signals
US8577673B2 (en) 2008-09-15 2013-11-05 Huawei Technologies Co., Ltd. CELP post-processing for music signals
US8515742B2 (en) 2008-09-15 2013-08-20 Huawei Technologies Co., Ltd. Adding second enhancement layer to CELP based core layer
WO2010031049A1 (en) * 2008-09-15 2010-03-18 GH Innovation, Inc. Improving celp post-processing for music signals
US20100070269A1 (en) * 2008-09-15 2010-03-18 Huawei Technologies Co., Ltd. Adding Second Enhancement Layer to CELP Based Core Layer
EP2352146A4 (en) * 2008-12-31 2012-05-09 Huawei Tech Co Ltd Method and device for obtaining pitch gain, encoder and decoder
EP2352146A1 (en) * 2008-12-31 2011-08-03 Huawei Technologies Co., Ltd. Method and device for obtaining pitch gain, encoder and decoder
US20110218800A1 (en) * 2008-12-31 2011-09-08 Huawei Technologies Co., Ltd. Method and apparatus for obtaining pitch gain, and coder and decoder
WO2010079167A1 (en) * 2009-01-06 2010-07-15 Skype Limited Speech coding
US8655653B2 (en) 2009-01-06 2014-02-18 Skype Speech coding by quantizing with random-noise signal
US10026411B2 (en) 2009-01-06 2018-07-17 Skype Speech encoding utilizing independent manipulation of signal and noise spectrum
US8392178B2 (en) 2009-01-06 2013-03-05 Skype Pitch lag vectors for speech encoding
US8396706B2 (en) 2009-01-06 2013-03-12 Skype Speech coding
GB2466672B (en) * 2009-01-06 2013-03-13 Skype Speech coding
US9530423B2 (en) 2009-01-06 2016-12-27 Skype Speech encoding by determining a quantization gain based on inverse of a pitch correlation
US8433563B2 (en) 2009-01-06 2013-04-30 Skype Predictive speech signal coding
US9263051B2 (en) 2009-01-06 2016-02-16 Skype Speech coding by quantizing with random-noise signal
US8463604B2 (en) 2009-01-06 2013-06-11 Skype Speech encoding utilizing independent manipulation of signal and noise spectrum
US20100174532A1 (en) * 2009-01-06 2010-07-08 Koen Bernard Vos Speech encoding
US20100174542A1 (en) * 2009-01-06 2010-07-08 Skype Limited Speech coding
US8849658B2 (en) 2009-01-06 2014-09-30 Skype Speech encoding utilizing independent manipulation of signal and noise spectrum
US20100174541A1 (en) * 2009-01-06 2010-07-08 Skype Limited Quantization
US20100174534A1 (en) * 2009-01-06 2010-07-08 Koen Bernard Vos Speech coding
US20100174537A1 (en) * 2009-01-06 2010-07-08 Skype Limited Speech coding
US20100174538A1 (en) * 2009-01-06 2010-07-08 Koen Bernard Vos Speech encoding
US8670981B2 (en) 2009-01-06 2014-03-11 Skype Speech encoding and decoding utilizing line spectral frequency interpolation
US8639504B2 (en) 2009-01-06 2014-01-28 Skype Speech encoding utilizing independent manipulation of signal and noise spectrum
US8452606B2 (en) 2009-09-29 2013-05-28 Skype Speech encoding using multiple bit rates
US20110077940A1 (en) * 2009-09-29 2011-03-31 Koen Bernard Vos Speech encoding
KR101761629B1 (en) 2009-11-24 2017-07-26 엘지전자 주식회사 Audio signal processing method and device
US9076443B2 (en) * 2011-02-15 2015-07-07 Voiceage Corporation Device and method for quantizing the gains of the adaptive and fixed contributions of the excitation in a CELP codec
US9911425B2 (en) 2011-02-15 2018-03-06 Voiceage Corporation Device and method for quantizing the gains of the adaptive and fixed contributions of the excitation in a CELP codec
US10115408B2 (en) 2011-02-15 2018-10-30 Voiceage Corporation Device and method for quantizing the gains of the adaptive and fixed contributions of the excitation in a CELP codec
CN103392203A (en) * 2011-02-15 2013-11-13 沃伊斯亚吉公司 Device and method for quantizing the gains of the adaptive and fixed contributions of the excitation in a celp codec
US9626982B2 (en) 2011-02-15 2017-04-18 Voiceage Corporation Device and method for quantizing the gains of the adaptive and fixed contributions of the excitation in a CELP codec
US20120209599A1 (en) * 2011-02-15 2012-08-16 Vladimir Malenovsky Device and method for quantizing the gains of the adaptive and fixed contributions of the excitation in a celp codec
WO2013016986A1 (en) * 2011-07-31 2013-02-07 中兴通讯股份有限公司 Compensation method and device for frame loss after voiced initial frame
GB2499505B (en) * 2013-01-15 2014-01-08 Skype Speech coding
GB2499505A (en) * 2013-01-15 2013-08-21 Skype Speech signal decoding
WO2020146869A1 (en) 2019-01-13 2020-07-16 Huawei Technologies Co., Ltd. High resolution audio coding
CN113302684A (en) * 2019-01-13 2021-08-24 华为技术有限公司 High resolution audio coding and decoding
US20210343303A1 (en) * 2019-01-13 2021-11-04 Huawei Technologies Co., Ltd. High resolution audio coding
EP3903308A4 (en) * 2019-01-13 2022-02-23 Huawei Technologies Co., Ltd. High resolution audio coding
JP2022517234A (en) * 2019-01-13 2022-03-07 華為技術有限公司 High resolution audio coding
JP7266689B2 (en) 2019-01-13 2023-04-28 華為技術有限公司 High resolution audio encoding
US11749290B2 (en) * 2019-01-13 2023-09-05 Huawei Technologies Co., Ltd. High resolution audio coding for improving package loss concealment

Also Published As

Publication number Publication date
US8010351B2 (en) 2011-08-30

Similar Documents

Publication Publication Date Title
US8010351B2 (en) Speech coding system to improve packet loss concealment
US10083698B2 (en) Packet loss concealment for speech coding
CN101180676B (en) Methods and apparatus for quantization of spectral envelope representation
US9153237B2 (en) Audio signal processing method and device
US6510407B1 (en) Method and apparatus for variable rate coding of speech
US10482892B2 (en) Very short pitch detection and coding
US7324937B2 (en) Method for packet loss and/or frame erasure concealment in a voice communication system
KR20090073253A (en) Method and device for coding transition frames in speech signals
EP2798631B1 (en) Adaptively encoding pitch lag for voiced speech
US20100332232A1 (en) Method and device for updating status of synthesis filters
McCree et al. A 1.7 kb/s MELP coder with improved analysis and quantization
US8175870B2 (en) Dual-pulse excited linear prediction for speech coding
US8000961B2 (en) Gain quantization system for speech coding to improve packet loss concealment
JP2001051699A (en) Device and method for coding/decoding voice containing silence voice coding and storage medium recording program
US8160890B2 (en) Audio signal coding method and decoding method
September Packet loss concealment for speech coding
KR20000014008A (en) Method for diminishing a fixed code book gain when a continuous frame error is generated at a codec

Legal Events

Date Code Title Description
STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: PAT HOLDER NO LONGER CLAIMS SMALL ENTITY STATUS, ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: STOL); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: HUAWEI TECHNOLOGIES CO., LTD., CHINA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:GAO, YANG;REEL/FRAME:027519/0082

Effective date: 20111130

FPAY Fee payment

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 12