IEICE TRANS. FUNDAMENTALS, VOL. E00–A, NO. 1 JANUARY 1999

PAPER  Special Section on Digital Signal Processing

Speech Enhancement Using Nonlinear Microphone Array Based on Complementary Beamforming

Hiroshi SARUWATARI†, Shoji KAJITA††, Kazuya TAKEDA†, and Fumitada ITAKURA††, Members

† The authors are with the Department of Information Electronics, Graduate School of Engineering, Nagoya University, Nagoya-shi, 464–8603 Japan.
†† The authors are with the Center for Information Media Studies, Nagoya University, Nagoya-shi, 464–8603 Japan.

Manuscript received January 1, 1999. Manuscript revised January 1, 1999.
SUMMARY  This paper describes a spatial spectral subtraction method using a complementary beamforming microphone array to enhance noisy speech signals for speech recognition. The complementary beamforming is based on two types of beamformers designed to obtain directivity patterns complementary to each other. In this paper, it is shown that nonlinear subtraction processing with complementary beamforming can result in a kind of spectral subtraction that needs no speech pause detection. In addition, an optimization algorithm for the directivity patterns is described. To evaluate the effectiveness of the method, speech enhancement and speech recognition experiments are performed by computer simulation under both stationary and nonstationary noise conditions. In comparison with an optimized conventional delay-and-sum (DS) array, it is shown that: (1) the proposed array improves the signal-to-noise ratio (SNR) of degraded speech by about 2 dB and achieves word recognition rates more than 20% higher when white Gaussian noise with an input SNR of −5 or −10 dB is used, and (2) the proposed array achieves word recognition rates more than 5% higher under nonstationary noise conditions. It is also shown that these improvements of the proposed array are the same as or superior to those of the conventional spectral subtraction method cascaded with the DS array.
key words: speech enhancement, microphone array, complementary beamforming, spectral subtraction, speech recognition

1. Introduction

Noises in real acoustic environments, such as air-conditioner noise and computer room noise, significantly degrade the performance of speech recognition. One approach to establishing a noise-robust man-machine interface using speech recognition in the real world is to enhance the speech signals with noise reduction techniques. Among the various noise reduction methods, the microphone array is one of the most effective [1], [2]. The delay-and-sum (DS) array [3]–[6] and the adaptive array [7], [8] are the conventional and popular microphone arrays used for noise reduction. However, the DS array requires a large number of microphones, and thus a high computational cost, to achieve high performance, especially in low frequency regions. Also, the adaptive array is weak at dealing with moving or nonstationary noise.

To achieve further improvement, several microphone arrays combined with nonlinear speech processing, such as the spectral subtraction (SS) method [9], have been proposed in recent works [10]–[12]. In these methods, however, other problems arise: the speech quality degrades owing to speech pause detection errors or misestimation of the noise directions. Another microphone array combined with nonlinear processing based on neural networks has also been proposed [13]. In that method, however, the analysis is performed under the condition that only narrow-band signals, e.g., simple sinusoidal signals, are assumed as the arriving signals, and the effectiveness of the method for wide-band signals such as speech is not reported.

This paper describes a spatial SS method using a complementary beamforming microphone array to enhance noisy speech signals for speech recognition. The complementary beamforming, which was proposed in the development of an ultrasonic imaging array by one of the authors [14], is based on two types of beamformers designed to obtain directivity patterns complementary to each other. In this paper, it is shown that nonlinear subtraction processing with complementary beamforming can result in a kind of SS that needs neither speech pause detection nor estimation of the noise directions. In addition, an optimization algorithm for the directivity patterns is described. Using these techniques, lower sidelobes can be achieved even in low frequency regions compared with those of the conventional DS array.

This paper is organized as follows. In the following section, the conventional SS method and the DS array are described. In Section 3, the nonlinear microphone array and its optimization algorithm for directivity patterns are described. In Section 4, experiments based on computer simulations are performed. After discussing the results of the experiments, we conclude this paper in Section 5.


2. Conventional Noise Reduction Methods and Their Problems

2.1 Spectral Subtraction Method

For a single-channel input, the most common speech enhancement algorithm is spectral subtraction (SS) [9], which estimates the short-time spectral amplitude of the noise and subtracts it from that of the noisy input signal. A generalized form of SS is given by [15]

    S^{(SS)}(f) = [ |O(f)|^p − E[|N(f)|^p] ]^{1/p} · e^{jψ(f)} ,    (1)

where S^{(SS)}(f), O(f) and N(f) are the spectra of the recovered speech, the noisy speech and the noise signal, respectively. ψ(f) is the phase function of the observed noisy signal O(f). E[·] denotes the expectation operation and is usually replaced by time averaging under the assumption that the noise is stationary. The power exponent parameter p is typically set to 1 or 2. In most implementations, half-wave rectification or simple flooring is adopted to avoid negative amplitudes in the resultant spectrum. Since it is necessary to estimate the noise spectrum, SS inherently has the following problems: (1) accurate estimation of the noise spectrum, and therefore the total performance, is greatly affected by the pause detection results; (2) it cannot deal with nonstationary noise contamination.
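To make Eq. (1) concrete, a minimal per-frame implementation might look as follows. This is a sketch rather than the method evaluated later in this paper; the noise statistic is assumed to be estimated beforehand from frames detected as speech pauses, and half-wave rectification is used as the flooring.

import numpy as np

def spectral_subtraction(noisy_spec, noise_pow_est, p=2):
    """Generalized SS of Eq. (1) for one short-time frame.

    noisy_spec    : complex spectrum O(f) of the noisy input frame
    noise_pow_est : estimate of E[|N(f)|^p], e.g. averaged over speech pauses
    p             : power exponent (1: magnitude SS, 2: power SS)
    """
    mag = np.abs(noisy_spec) ** p - noise_pow_est     # |O(f)|^p - E[|N(f)|^p]
    mag = np.maximum(mag, 0.0)                        # half-wave rectification
    return mag ** (1.0 / p) * np.exp(1j * np.angle(noisy_spec))  # reattach phase of O(f)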

2.2 Delay-and-Sum Array

In multi-channel speech enhancement, the delay-and-sum (DS) array is one of the most common techniques. This section describes the principle of the DS array and its problems. In this study, a straight-line array is assumed. The coordinates of the elements are designated as x_k (k = 1, ..., K), and the arriving signal spectrum for each direction of arrival θ_d (d = 1, ..., D) is designated as R_d(f) (see Fig. 1). The look direction is set to be normal to the array (θ = 0). In the following development, it is assumed that each sound source is located at a sufficient distance so that the plane-wave approximation holds for the arriving signals.

Fig. 1  Configuration example of a microphone array and acoustic signals. x_k stands for the coordinate of each element, R_d(f) stands for the arriving signal from the direction of arrival θ_d, and the look direction is set to be normal to the array (θ = 0).

In the conventional DS array processing, the array output is obtained by adding the weighted output values of the elements [16]. Thus, the output signal of the DS array is described in the frequency domain as

    S^{(DS)}(f) = g · o(f) ,    (2)
    g ≡ [g_1, ..., g_k, ..., g_K] ,    (3)
    o(f) ≡ [O_1(f), ..., O_k(f), ..., O_K(f)]^T ,    (4)

where S^{(DS)}(f) is the array output signal, o(f) is the observation signal vector specified by the coordinates of the elements x_k, and g is the weight vector of the elements. The superscript T denotes transposition. The observation signal O_k(f) at x_k includes the arriving signals R_d(f) (d = 1, ..., D), each shifted by its time difference from the origin, x_k sin(θ_d)/c (where c is the velocity of sound). Thus, the observation signal vector o(f) can be expressed as

    o(f) = [a_1(f), ..., a_d(f), ..., a_D(f)] · r(f) ,    (5)
    a_d(f) ≡ [a_{1,d}(f), ..., a_{k,d}(f), ..., a_{K,d}(f)]^T ,    (6)
    a_{k,d}(f) ≡ exp[ j2πf · x_k sin(θ_d)/c ] ,    (7)
    r(f) ≡ [R_1(f), ..., R_d(f), ..., R_D(f)]^T ,    (8)

where a_d(f) expresses the phase differences, at the coordinates of each element x_k, of the signal coming from the direction of arrival θ_d; a_d(f) is generally called the steering vector. r(f) is the arriving signal vector. By combining Eqs. (2) and (5), the relationship between the array output signal S^{(DS)}(f) and the arriving signal vector r(f) is obtained as follows:

    S^{(DS)}(f) = [g·a_1(f), ..., g·a_d(f), ..., g·a_D(f)] · r(f) .    (9)

Equation (9) shows that the arriving signal in each direction, R_d(f), is observed as a signal weighted by g·a_d(f), which is called the directivity pattern of the array. To obtain only the target signal arriving from the look direction, the directivity pattern must be designed to have a narrow mainlobe and low sidelobes. However, it is extremely difficult to realize both, especially in low frequency regions.
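As a concrete illustration of Eqs. (7) and (9), the following sketch computes the steering vectors and the resulting directivity pattern |g·a_d(f)| of a uniformly weighted DS array. The geometry matches the eight-element, 5-cm-spacing configuration used later in Sect. 3.4, while the scan grid, example frequency, and sound velocity are assumed values for illustration.

import numpy as np

C = 340.0                                   # assumed velocity of sound [m/s]

def steering_vector(freq, xk, theta):
    """a_d(f) of Eq. (7): inter-element phase shifts for a plane wave
    arriving from direction theta [rad] at element coordinates xk [m]."""
    return np.exp(1j * 2 * np.pi * freq * xk * np.sin(theta) / C)

K = 8                                       # number of elements
xk = (np.arange(K) - (K - 1) / 2) * 0.05    # straight-line array, 5-cm spacing
g = np.ones(K) / K                          # uniform DS weights (unity gain at theta = 0)

freq = 2000.0                               # example frequency [Hz]
thetas = np.deg2rad(np.arange(-90, 91))     # candidate directions of arrival
pattern = np.array([g @ steering_vector(freq, xk, th) for th in thetas])  # g a_d(f), Eq. (9)
gain_db = 20 * np.log10(np.abs(pattern) + 1e-12)  # directivity pattern in dB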


3. Proposed Algorithm

In this section, a new nonlinear microphone array based on the complementary beamforming technique and its optimization algorithm are proposed.

3.1 Nonlinear Microphone Array with Complementary Beamforming

First, the complementary beamforming proposed in Ref. [14] is described. In this method, using two complementary weight vectors g = [g_1, ..., g_K] and h = [h_1, ..., h_K], the signals S^{(g)}(f) and S^{(h)}(f) are constructed. The term "complementary" represents one of the following conditions: |g·a_d(f)| ≫ |h·a_d(f)| or |g·a_d(f)| ≪ |h·a_d(f)| for an arbitrary direction d (see Fig. 2). The exception is the look direction, where the gain of both directivity patterns is unity. Hence, from Eq. (9), the signals S^{(g)}(f) and S^{(h)}(f) can be expressed as

    S^{(g)}(f) = S_0(f) + Σ_{d∈Ω} g·a_d(f) · N_d(f) ,    (10)
    S^{(h)}(f) = S_0(f) + Σ_{d∈Ω} h·a_d(f) · N_d(f) ,    (11)

where S_0(f) is the target speech signal arriving from the look direction and N_d(f) is the noise signal arriving from another direction. Ω is defined as the set of direction numbers except for the direction number d_0 corresponding to the look direction θ_{d_0} = 0, i.e.,

    Ω ≡ { d | d = 1, ..., D; d ≠ d_0; θ_{d_0} = 0 } .    (12)

Fig. 2  Example of directivity patterns using the complementary beamforming. The solid line shows the directivity pattern formed by the weight vector g, and the dotted line shows that formed by the weight vector h.

Next, using the complementary beamforming, a new spectral subtraction microphone array that requires no speech pause detection is constructed. In this array processing, the sum of Eqs. (10) and (11) is designated as the primary signal S^{(p)}(f), and the difference as the reference signal S^{(r)}(f):

    S^{(p)}(f) = 2S_0(f) + Σ_{d∈Ω} {g·a_d(f) + h·a_d(f)} · N_d(f) ,    (13)
    S^{(r)}(f) = Σ_{d∈Ω} {g·a_d(f) − h·a_d(f)} · N_d(f) .    (14)

If the directivity patterns |g·a_d(f)| and |h·a_d(f)| are designed to be complementary, and if there is no correlation among the arriving signals, the following approximation holds:

    E[ |Σ_{d∈Ω} {g·a_d(f) + h·a_d(f)} · N_d(f)|^2 ]
      ≈ Σ_{d∈Ω_g} |g·a_d(f)|^2 · E[|N_d(f)|^2] + Σ_{d∈Ω_h} |h·a_d(f)|^2 · E[|N_d(f)|^2]
      ≈ E[ |Σ_{d∈Ω} {g·a_d(f) − h·a_d(f)} · N_d(f)|^2 ]
      = E[ |S^{(r)}(f)|^2 ] ,    (15)

    Ω_g ≡ { d | |g·a_d(f)| ≫ |h·a_d(f)| } ,  Ω_h ≡ { d | |g·a_d(f)| ≪ |h·a_d(f)| } .

Therefore, the expectation value of the power spectrum of the noise component in the primary signal (the second term on the right-hand side of Eq. (13)) can be approximated by that of the reference signal. Using the primary and reference signals, and without any speech pause detection, a spatial SS processing can be constructed as

    X(f) ≡ (1/2) · [ |S^{(p)}(f)|^2 − E[|S^{(r)}(f)|^2] ]^{1/2} · e^{jφ(f)} ,    (16)

where X(f) represents the complex spectrum of the speech signal recovered by the proposed method, and φ(f) is an appropriate phase function; for example, it can be obtained from a conventional DS beamformer. Figure 3 shows the block diagram of this array system. In this algorithm, noise reduction is conducted frame by frame, and the expectation value of |S^{(r)}(f)|^2 in Eq. (16) is approximated by averaging the power spectra of the reference signal over several frames. This interframe-averaged power spectrum is designated as ⟨|S^{(r)}(f)|^2⟩ hereafter.
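The processing of Eqs. (13)–(16) can be sketched as follows, assuming the multichannel short-time spectra and the two designed weight vectors are given. The 10-frame moving average used in the experiments (Sect. 4.1) stands in for the expectation, and the flooring at zero is an added safeguard in the spirit of Sect. 2.1, not an explicit step of the equations.

import numpy as np

def spatial_ss(stft_frames, g, h, phase, n_avg=10):
    """Spatial spectral subtraction of Eq. (16).

    stft_frames : complex array (n_frames, K, n_bins), multichannel spectra o(f)
    g, h        : complementary weight vectors of length K
    phase       : phase function phi(f) per frame, e.g. from a DS beamformer
    """
    s_g = np.einsum('k,tkf->tf', g, stft_frames)   # S^(g)(f), Eq. (10)
    s_h = np.einsum('k,tkf->tf', h, stft_frames)   # S^(h)(f), Eq. (11)
    primary = s_g + s_h                            # S^(p)(f), Eq. (13)
    reference = s_g - s_h                          # S^(r)(f), Eq. (14)

    ref_pow = np.abs(reference) ** 2
    out = np.empty_like(primary)
    for t in range(len(primary)):
        ref_avg = ref_pow[max(0, t - n_avg + 1):t + 1].mean(axis=0)  # interframe average
        diff = np.maximum(np.abs(primary[t]) ** 2 - ref_avg, 0.0)    # floor at zero
        out[t] = 0.5 * np.sqrt(diff) * np.exp(1j * phase[t])         # Eq. (16)
    return out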

Fig. 3  Block diagram of the nonlinear microphone array with complementary beamforming given by Eq. (16): each channel is analyzed by a short-time DFT, weighted by g and h, and summed to form the primary and reference signals, whose power spectra are combined by the spatial SS; the phase φ(f) is supplied by a conventional beamformer.

3.2 Restriction Algorithm for Over-subtraction

In the proposed array processing, when confronted with nonstationary noises, the instantaneous power spectrum of the reference signal in a frame, |S^{(r)}(f)|^2, can frequently be smaller than the interframe-averaged power spectrum ⟨|S^{(r)}(f)|^2⟩. In such a case, over-estimation of E[|S^{(r)}(f)|^2] and over-subtraction from the primary signal in Eq. (16) arise. This over-subtraction degrades the quality of the recovered speech. To avoid this degradation, the following algorithm is introduced into Eq. (16) and conducted frame by frame:

    |X(f)| ≡ (1/2) · [ |S^{(p)}(f)|^2 − |S^{(r)}(f)|^2 ]^{1/2} ,
                 if β · ⟨|S^{(r)}(f)|^2⟩ > |S^{(r)}(f)|^2 ;
             (1/2) · [ |S^{(p)}(f)|^2 − ⟨|S^{(r)}(f)|^2⟩ ]^{1/2} ,
                 otherwise,    (17)

where β is a threshold parameter that decides whether ⟨|S^{(r)}(f)|^2⟩ in the current frame is appropriate for use in the subtraction. In the experiments, β is set to 0.1. In this algorithm, the instantaneous power spectrum of the reference signal itself is used in the subtraction when it is judged to be sufficiently smaller than its averaged value. Thus, the over-subtraction is restricted.
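Expressed in code, the frame-wise branching of Eq. (17) might read as follows (a sketch; β = 0.1 as in the experiments).

import numpy as np

def restricted_magnitude(primary_pow, ref_pow, ref_pow_avg, beta=0.1):
    """|X(f)| of Eq. (17) for one frame.

    primary_pow : instantaneous |S^(p)(f)|^2
    ref_pow     : instantaneous |S^(r)(f)|^2
    ref_pow_avg : interframe-averaged reference power spectrum
    """
    # If the instantaneous reference power is well below its average,
    # subtract the instantaneous spectrum instead of the average,
    # which restricts the over-subtraction.
    subtrahend = np.where(beta * ref_pow_avg > ref_pow, ref_pow, ref_pow_avg)
    return 0.5 * np.sqrt(np.maximum(primary_pow - subtrahend, 0.0))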

3.3 Optimization of Directivity Patterns

To make Eq. (16) proper as an estimate of the recovered signal, it is necessary to design the complementary directivity patterns |g·a_d(f)| and |h·a_d(f)|. To achieve this, we design the directivity patterns of g and h so that the noise component in the expectation value of the power spectrum of the primary signal is decreased. Using E[|S^{(p)}(f)|^2] instead of |S^{(p)}(f)|^2 in Eq. (16), the estimated power spectrum of Eq. (16) is given as

    |X̂(f)|^2 = (1/4) · ( E[|S^{(p)}(f)|^2] − E[|S^{(r)}(f)|^2] )
             = E[|S_0(f)|^2] + Σ_{d∈Ω} Re[ g·a_d(f) · (h·a_d(f))* ] · E[|N_d(f)|^2]
             ≤ E[|S_0(f)|^2] + Σ_{d∈Ω} |g·a_d(f) · h·a_d(f)| · E[|N_d(f)|^2] ,    (18)

where the superscript * denotes the complex conjugate. Equation (18) shows that the gain for the target signal is one and the gain for the noise is |g·a_d(f) · h·a_d(f)|. Accordingly, to reduce the noise component in Eq. (18), it is not necessary to make |g·a_d(f)| and |h·a_d(f)| individually small for d ∈ Ω, but only to design them so that the product |g·a_d(f) · h·a_d(f)| is small for d ∈ Ω. Taking advantage of this complementary characteristic, the degrees of freedom in the design of the weight vectors can be increased [14].

In this paper, the optimization method for directivity patterns proposed in Ref. [17] is used. First, let us define the following vector-valued function, which represents the product of the directivity pattern values:

    f(g, h) ≡ [ (g·a_1(f))(h·a_1(f)), ..., (g·a_d(f))(h·a_d(f)), ..., (g·a_D(f))(h·a_D(f)) ]^T .    (19)

We also define a vector with the desired directivity pattern, q ≡ [q_1, ..., q_d, ..., q_D]^T. Using these vectors, the second term on the right-hand side of Eq. (18) is optimized under the criterion of the minimum weighted square norm. More practically, the constrained least-squares problem shown in Eqs. (20) and (21) is solved:

    min_{g,h} ‖ W^{(d)} · q − W^{(d)} · f(g, h) ‖^2    (20)
    subject to  g·a_{d_0}(f) = h·a_{d_0}(f) = (q_{d_0})^{1/2} ,  a_{d_0}(f) = [1, ..., 1]^T .    (21)

Here, W^{(d)} represents the following weighting matrix over the directions:

    W^{(d)} ≡ diag(w_1, ..., w_d, ..., w_D) .    (22)

Since Eq. (20) is a nonlinear minimization problem, it is minimized using an iterative method. To express the value at the i-th iteration step explicitly, the weight vectors g and h are rewritten as g_i and h_i, respectively. As the iterative method, the Gauss-Newton method [18] is used. The weight vectors at the (i+1)-th step are given as

    [g_{i+1}, h_{i+1}]^T = [g_i, h_i]^T + α (W J_i)^+ W e_i ,    (23)

where the superscript + denotes the pseudo-inverse matrix and α (0 < α ≤ 1) is the step-size parameter of the iteration. J_i is a Jacobian matrix and e_i is an error vector, given as

    J_i ≡ [ J_i^{(g)}  J_i^{(h)}
            J_i^{(c)}  0
            0          J_i^{(c)} ] ,    (24)

    e_i ≡ [ q − f(g_i, h_i)
            (q_{d_0})^{1/2} − g_i · a_{d_0}(f)
            (q_{d_0})^{1/2} − h_i · a_{d_0}(f) ] ,    (25)


where J_i^{(g)} is the Jacobian matrix of the vector function f(g_i, h_i) with respect to g_i, J_i^{(h)} is the Jacobian matrix of f(g_i, h_i) with respect to h_i, and J_i^{(c)} is the Jacobian matrix (a vector) with respect to the constraint given in Eq. (21). Using Eqs. (19) and (21), the elements of these matrices are expressed as follows:

    [J_i^{(g)}]_{d,k} = ∂[f(g_i, h_i)]_d / ∂[g_i]_k = h_i·a_d(f) · a_{k,d}(f) ,    (26)
    [J_i^{(h)}]_{d,k} = ∂[f(g_i, h_i)]_d / ∂[h_i]_k = g_i·a_d(f) · a_{k,d}(f) ,    (27)
    [J_i^{(c)}]_k = ∂(g_i·a_{d_0}(f)) / ∂[g_i]_k = ∂(h_i·a_{d_0}(f)) / ∂[h_i]_k = a_{k,d_0}(f) ,    (28)

where [·]_{d,k} represents the element in the d-th row and k-th column of the argument matrix, and [·]_k represents the k-th element of the argument vector. In addition, the matrix W in Eq. (23) is defined as

    W ≡ block diag(W^{(d)}, W^{(c)}) ,    (29)
    W^{(c)} ≡ diag(w^{(c)}, w^{(c)}) ,    (30)

where W^{(c)} is the weighting matrix for the constraint, and its element w^{(c)} is set sufficiently larger than the w_d.
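The design procedure of Eqs. (19)–(30) can be sketched as below. The initialization (two randomly perturbed patterns) and the treatment of the weights as complex values are assumptions made for illustration; the paper starts from two pre-designed weight vectors whose directivity patterns differ in every direction.

import numpy as np

def design_complementary_weights(freq, xk, thetas, d0, q, w_dir,
                                 w_c=300.0, alpha=0.1, n_iter=40, c=340.0):
    """Gauss-Newton design of the complementary weight vectors g, h.

    thetas : candidate directions of arrival [rad]; d0 indexes the look direction
    q      : desired directivity pattern (1 at d0, 0 elsewhere in Sect. 3.4.1)
    w_dir  : diagonal of W^(d), Eq. (22); w_c is the constraint weight w^(c), Eq. (30)
    """
    D, K = len(thetas), len(xk)
    A = np.exp(1j * 2 * np.pi * freq * np.outer(np.sin(thetas), xk) / c)  # A[d, k] = a_{k,d}(f)
    a0 = A[d0]                                         # steering vector of the look direction

    rng = np.random.default_rng(0)                     # assumed initialization
    g = (np.ones(K) + 0.3 * rng.standard_normal(K)) / K
    h = (np.ones(K) - 0.3 * rng.standard_normal(K)) / K

    W = np.diag(np.concatenate([w_dir, [w_c, w_c]]))   # W = block diag(W^(d), W^(c)), Eqs. (29)-(30)
    for _ in range(n_iter):
        f_gh = (A @ g) * (A @ h)                       # f(g, h), Eq. (19)
        J = np.vstack([
            np.hstack([A * (A @ h)[:, None],           # [J^(g)]_{d,k}, Eq. (26)
                       A * (A @ g)[:, None]]),         # [J^(h)]_{d,k}, Eq. (27)
            np.concatenate([a0, np.zeros(K)])[None, :],   # constraint rows, Eq. (28)
            np.concatenate([np.zeros(K), a0])[None, :],
        ])
        e = np.concatenate([q - f_gh,
                            [np.sqrt(q[d0]) - g @ a0,      # error vector, Eq. (25)
                             np.sqrt(q[d0]) - h @ a0]])
        step = alpha * np.linalg.pinv(W @ J) @ (W @ e)     # update of Eq. (23)
        g = g + step[:K]
        h = h + step[K:]
    return g, h

# Usage under the conditions of Sect. 3.4.1 (8 elements, 5-cm spacing, f = 2 kHz):
# thetas = np.deg2rad(np.arange(-90, 91)); d0 = 90
# q = np.zeros(len(thetas)); q[d0] = 1.0
# w_dir = np.where(np.abs(np.arange(-90, 91)) >= 14, 100.0, 1.0)
# xk = (np.arange(8) - 3.5) * 0.05
# g, h = design_complementary_weights(2000.0, xk, thetas, d0, q, w_dir)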

3.4 Directivity Pattern Design

3.4.1 Design Condition

An eight-element array with an interelement spacing of 5 cm is assumed in the design, and the weight vectors are calculated using Eq. (23) for each frequency independently. As a common design condition for each frequency, the desired directivity pattern q_d is obtained by setting the value to 1 for the look direction (q_{d_0} = 1) and to 0 for the other directions. As for the weighting matrix W^{(d)}, the weights w_d for the directions from −90° to −14° and from 14° to 90° are set to 100, and those for the other directions are set to 1. As for the constraint weighting matrix W^{(c)}, w^{(c)} is set to 300. For stabilization, the step-size parameter of the Gauss-Newton iteration is set to 0.1.

3.4.2 Initial Condition and Design Example

In the directivity pattern design, appropriate initial values must be selected before the iterative design. In this study, we first design two weight vectors which have different directivity patterns in each direction, and the iteration is started from these initial values. As examples at the initial condition and at the 20th iteration, the directivity patterns formed by the weight vector g (broken line), those formed by the weight vector h (dash-dotted line), and their product characteristics |g·a_d(f) · h·a_d(f)| (solid line) are shown in Fig. 4 (the design frequency f is set to 2 kHz). As shown in Fig. 4, the iterative improvement reduces the magnitude of the sidelobes in |g·a_d(f) · h·a_d(f)| as the number of iterations increases. The directivity patterns obtained at 40 iterations were designated as the resultant directivity patterns because the squared error converged at 40 iterations for each frequency. The solid lines in Fig. 5 show the resultant directivity patterns |g·a_d(f) · h·a_d(f)| for 1, 3 and 5 kHz as typical frequencies. For comparison, we also plot the optimized directivity patterns of a conventional DS array based on a single weight vector. It is evident from Fig. 5 that the sidelobe reduction of the proposed array is improved by about 5 dB in each frequency region.

4. Experiments and Results

In this section, computer simulations were performed to examine the applicability of the proposed method. The performance of the proposed array shown in Fig. 5 is compared with those of the optimized conventional DS array shown in Fig. 5 and the conventional SS method cascaded with the DS array (DS-SS method) from two standpoints: (1) an objective evaluation of the recovered speech quality, and (2) a word recognition test. The results of the experiments using white Gaussian noise as the interference are described in Sect. 4.2, and the results using nonstationary noise are described in Sect. 4.3.

4.1 Conditions for Experiments

All sound data prepared in these experiments were sampled at 12 kHz with 16-bit resolution. To remove the noise components in the lower frequency regions, which cannot be reduced by either the conventional DS array or the proposed array, all sound data received by the microphones are filtered by a highpass filter with a gradual transition envelope: the cutoff frequency is 500 Hz and the transition slope is 14 dB/oct. In the proposed array, noise reduction processing is conducted frame by frame under the following conditions: the frame length is 21.3 msec, the frame shift is half of the frame length, and the window function is rectangular. The interframe-averaged power spectrum ⟨|S^{(r)}(f)|^2⟩ is calculated by averaging the power spectra of the reference signal over 10 frames. As for the SS procedure in the DS-SS method, the frame length, frame shift, and window function are the same as those of the proposed array, and the power exponent parameter p in Eq. (1) is set to 2. As described in Sect. 2.1, the conventional SS method generally requires a speech pause detection technique to estimate the averaged power spectrum of the noise. However, especially when confronted with nonstationary noises, it is difficult to detect the speech pause automatically. To avoid this problem, in the experiments, the speech pause of the leading 200 ms in the target speech materials is heuristically used in the conventional SS procedure. Thus, an ideal SS method which is free from speech pause detection errors is compared with the proposed method.
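For reference, the front-end conditions of Sect. 4.1 could be set up as follows. The filter realization is an assumption: the paper specifies only a 500-Hz cutoff and a 14 dB/oct slope, which a second-order Butterworth (12 dB/oct) only approximates.

import numpy as np
from scipy.signal import butter, lfilter

FS = 12000                        # sampling frequency [Hz], 16-bit data
FRAME_LEN = 256                   # 21.3 msec at 12 kHz; rectangular window
FRAME_SHIFT = FRAME_LEN // 2      # half-frame shift
N_AVG = 10                        # frames averaged for the reference power spectrum

def highpass_500hz(x, order=2):
    """Highpass applied to all microphone signals before array processing."""
    b, a = butter(order, 500.0 / (FS / 2), btype='high')
    return lfilter(b, a, x)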


Fig. 4  Directivity patterns designed using Eq. (23) at the initial condition (top) and at the 20th iteration (bottom). An eight-element array with an interelement spacing of 5 cm is assumed, and the frequency f is set to 2 kHz. In each panel, the broken line shows the directivity pattern formed by the weight vector g, the dash-dotted line shows that formed by the weight vector h, and the solid line shows the product characteristic |g·a_d(f) · h·a_d(f)| of the proposed array.

Fig. 6  Waveform examples: (a) original speech, (b) noisy speech at a microphone (white Gaussian noise added at an input SNR of 0 dB), (c) speech recovered by the proposed array.

Fig. 5  Resultant directivity patterns at 1 kHz (top), 3 kHz (middle), and 5 kHz (bottom). In each panel, the solid line shows the directivity pattern |g·a_d(f) · h·a_d(f)| of the proposed array, and the broken line shows the directivity pattern of the optimized conventional DS array.

4.2 Experiment Using White Gaussian Noise

4.2.1 Objective Evaluation

Noisy signals were generated by artificially adding white Gaussian noise to a clean speech signal at various signal-to-noise ratios (SNRs) ranging from −10 to 10 dB. The noise is assumed to arrive from a single direction between 20° and 80°. As the speech material, a Japanese sentence (/arayuru geNjitsu o subete jibuN no ho:e nejimagetanoda/) uttered by a female speaker in the ASJ continuous speech corpus for research [19] is used. As examples of the waveforms used in the experiments, the original speech, the noisy speech at a single microphone (input SNR of 0 dB, noise direction of 50°), and the output of the proposed array are shown in Fig. 6. The output SNRs for different noise directions are shown in Fig. 7, where the input SNR is 0 dB. Also, to illustrate the behavior of the proposed array at different input SNRs, the noise reduction rate, defined as the output SNR in dB minus the input SNR in dB, is shown in Fig. 8 for the noise direction of 50°. In both Figs. 7 and 8, the solid lines show the results of the proposed array, the broken lines those of the optimized conventional DS array, and the dash-dotted lines those of the DS-SS method. From these figures, it is evident that the noise reduction ability of the proposed array is comparable with that of the conventional DS array when the noise arrives from a direction within the mainlobe. However, when the noise lies in a sidelobe direction, i.e., 40–80°, an improvement of about 2 dB in output SNR is obtained with the proposed array over the conventional DS array. The amount of improvement also increases as the input SNR decreases; although this is a common phenomenon in conventional SS processing [20], no systematic explanation of it has been given. In addition, the improvements of the proposed array are the same as or superior to those of the DS-SS method in its best condition.
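The noisy test material can be produced by scaling the interfering noise to a target input SNR before adding it, e.g. as follows (a sketch; the interval used for the power computation is an assumption, since the paper does not specify it).

import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` after scaling it to the target input SNR [dB]."""
    p_s = np.mean(speech.astype(float) ** 2)
    p_n = np.mean(noise.astype(float) ** 2)
    scale = np.sqrt(p_s / (p_n * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise

# The noise reduction rate of Fig. 8 is then simply the output SNR [dB]
# of the processed signal minus the input SNR [dB] set here.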

Fig. 7  Output SNR for different noise directions (white Gaussian noise, input SNR of 0 dB). The broken line shows the results of the optimized conventional DS array, the dash-dotted line the DS-SS method, and the solid line the proposed array.

Fig. 8  Noise reduction rate for different input SNRs (white Gaussian noise, noise direction of 50°). The broken line shows the results of the optimized conventional DS array, the dash-dotted line the DS-SS method, and the solid line the proposed array.

4.2.2 Word Recognition Test

The HMM continuous speech recognition (CSR) experiment is performed in a speaker-dependent manner. For the CSR experiment, 10 sentences of one female speaker are used as test data, and a monophone HMM model is trained on 140 phonetically balanced sentences. Both the test and training sets are selected from the ASJ continuous speech corpus for research. The remaining conditions are summarized in Table 1. In this experiment, the noise direction is set to 50°.

Table 1  Analysis conditions for CSR experiments.
    Frame Length    25 msec
    Frame Shift     10 msec
    Window          Hamming window
    Feature Vector  12 MFCC [21] + ΔMFCC + ΔΔMFCC + ΔPOWER + ΔΔPOWER
    Vocabulary      68
    Grammar         no grammar

Figure 9 shows the word recognition rates for different input SNRs. As shown in this figure, the recognition rate using a single microphone alone is quite low, while the conventional DS array, the DS-SS method, and the proposed array all improve the word recognition rate effectively. Compared with the conventional DS array, the proposed method improves the recognition rate by more than 20% under the −5 and −10 dB conditions. This indicates that the proposed array is applicable to speech recognition systems under noisy conditions, especially when the speech quality is low. Also, the improvements of the proposed array are the same as or superior to those of the DS-SS method in its best condition.

Fig. 9  Word recognition rate for different input SNRs (white Gaussian noise, noise direction of 50°). The dotted line shows the results using a single microphone, the broken line the optimized conventional DS array, the dash-dotted line the DS-SS method, and the solid line the proposed array.

4.3 Experiment Using Nonstationary Noise

4.3.1 Human Speech-like Noise

To evaluate the noise reduction ability of the proposed array for a nonstationary noise, speech enhancement and speech recognition experiments are performed using human speech-like noise (HSLN) [22] as the interfering noise. HSLN is a kind of babble noise [23] generated by superimposing independent speech signals. By changing the number of superpositions, various noise conditions can be simulated. For example, HSLN with one or a few superpositions can be regarded as a nonstationary signal which sounds like a single speaker or the overlap of a few speakers. When the number of superpositions is set to a few dozen, HSLN becomes a nonstationary signal which sounds like babble noise. When the number of superpositions exceeds several hundred, HSLN approaches colored stationary noise that preserves the long-term spectrum of human speech (see Fig. 10).

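The generation of HSLN by superposition can be sketched as follows; this illustrates the idea described above rather than the exact procedure of Ref. [22], and the random excerpting and power normalization are assumptions.

import numpy as np

def make_hsln(utterances, n_super, length, seed=0):
    """Superimpose n_super independent speech excerpts to form HSLN.

    utterances : list of 1-D speech arrays from independent speakers
    n_super    : number of superpositions (1: single talker; dozens: babble-like;
                 hundreds: nearly stationary, speech-shaped noise)
    """
    rng = np.random.default_rng(seed)
    out = np.zeros(length)
    for _ in range(n_super):
        u = utterances[rng.integers(len(utterances))]
        start = rng.integers(0, max(1, len(u) - length))   # random excerpt position
        seg = u[start:start + length]
        out[:len(seg)] += seg
    return out / np.sqrt(n_super)      # keep the total power roughly constant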

Fig. 10  Waveform examples of HSLN with different numbers of superpositions n: (a) nonstationary signal that sounds like a single speaker (n = 1), (b) nonstationary noise like babble noise (n = 10), (c) colored stationary noise (n = 128).

4.3.2 Word Recognition Test

The CSR experiment is performed using HSLN in the same manner as in Sect. 4.2.2. In this experiment, the noise direction is set to 50° and the input SNR is fixed at 0 dB. Figure 11 shows the word recognition rates for different numbers of superpositions in HSLN. Compared with the results of the optimized conventional DS array, applying the proposed method shows that: (1) an improvement in recognition rate of about 5% is obtained when HSLN with one or a few superpositions is used as the interfering noise, and (2) an improvement of more than 10% is obtained when HSLN with several dozen or more superpositions is used. The improvements of the proposed array are the same as or superior to those of the DS-SS method when the number of superpositions is lower than 32. However, when HSLN with more than 64 superpositions is used, the improvements of the DS-SS method exceed those of the proposed method. These results show that the proposed array is applicable to speech recognition systems under nonstationary noise conditions.

Fig. 11  Word recognition rate for different numbers of superpositions in HSLN (noise direction of 50°, input SNR of 0 dB). The dotted line shows the results using a single microphone, the broken line the optimized conventional DS array, the dash-dotted line the DS-SS method, and the solid line the proposed array.

5. Conclusion

In this paper, a new nonlinear microphone array based on the complementary beamforming technique, together with an optimization algorithm for its directivity patterns, was proposed. Speech enhancement and speech recognition experiments were performed by computer simulation using both stationary and nonstationary noises. From the experiments using white Gaussian noise, compared with an optimized conventional DS array, it was shown that: (1) the proposed array can improve the output SNR by about 2 dB, and (2) the proposed array can improve the word recognition rate by more than 20% when the input SNR is −5 or −10 dB. Also, compared with the conventional SS method cascaded with the DS array (DS-SS method), in which the speech pause is provided, it was shown that the improvements of the proposed array are the same as or superior to those of the DS-SS method, especially for higher SNR conditions.

From the experiments using human speech-like noise as a nonstationary noise, compared with an optimized conventional DS array, it was also shown that: (1) the proposed array can improve the word recognition rate by about 5% when the interfering noise is a single speaker or the overlap of a few speakers, and (2) the proposed array can improve the word recognition rate by more than 10% when the noise is nonstationary babble noise. Also, compared with the DS-SS method, it was shown that the improvements of the proposed array are the same as or superior to those of the DS-SS method, especially when the noise is nonstationary.

Acknowledgement

The authors are grateful to Dr. Mitsuo Komura of SECOM CO., LTD., a co-proposer of the complementary beamforming technique, for his suggestions and discussions on this work.
References
[1] G.W. Elko, "Microphone array systems for hands-free telecommunication," Speech Communication, vol.20, pp.229–240, 1996.
[2] Y. Kaneda, "Microphone array technologies for speech recognition under noisy environment," J. Acoust. Soc. Jpn., vol.53, no.11, pp.872–876, 1997 (in Japanese).
[3] J.L. Flanagan, J.D. Johnston, R. Zahn, and G.W. Elko, "Computer-steered microphone arrays for sound transduction in large rooms," J. Acoust. Soc. Am., vol.78, no.5, pp.1508–1518, 1985.
[4] K. Kiyohara, Y. Kaneda, S. Takahashi, H. Nomura, and J. Kojima, "A microphone array system for speech recognition," Proc. ICASSP 97, vol.1, pp.215–218, 1997.
[5] M. Omologo, M. Matassoni, P. Svaizer, and D. Giuliani, "Microphone array based speech recognition with different talker-array positions," Proc. ICASSP 97, vol.1, pp.227–230, 1997.
[6] T. Yamada, S. Nakamura, and K. Shikano, "Hands-free speech recognition with talker localization by a microphone array," Trans. of Information Process. Soc. Jpn., vol.39, no.5, pp.1275–1283, 1998 (in Japanese).
[7] L.J. Griffiths and C.W. Jim, "An alternative approach to linearly constrained adaptive beamforming," IEEE Trans. Antennas & Propag., vol.AP-30, no.1, pp.27–34, 1982.
[8] Y. Kaneda and J. Ohga, "Adaptive microphone-array system for noise reduction," IEEE Trans. Acoust., Speech & Signal Process., vol.ASSP-34, no.6, pp.1391–1400, 1986.
[9] S.F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech & Signal Process., vol.ASSP-27, no.2, pp.113–120, 1979.
[10] H.Y. Kim, F. Asano, Y. Suzuki, and T. Sone, "Speech enhancement based on short-time spectral amplitude estimation with two-channel beamformer," IEICE Trans. Fundamentals, vol.E79-A, no.12, pp.2151–2158, 1996.
[11] J. Meyer and U. Simmer, "Multi-channel speech enhancement in a car environment using Wiener filtering and spectral subtraction," Proc. ICASSP 97, vol.2, pp.1167–1170, 1997.
[12] M. Mizumachi and M. Akagi, "Noise reduction by paired-microphones using spectral subtraction," Proc. ICASSP 98, vol.2, pp.1001–1004, 1998.
[13] H. Kobatake, W. Mori, and Y. Yano, "Super directive sensor array with neural network structure," Proc. ICASSP 92, vol.2, pp.321–324, 1992.
[14] H. Saruwatari and M. Komura, "Synthetic aperture sonar in air medium using a nonlinear sidelobe canceller," IEICE Trans. A, vol.J81-A, no.5, pp.815–826, 1998 (in Japanese).
[15] J.R. Deller, Jr., J.G. Proakis, and J.H. Hansen, Discrete-Time Processing of Speech Signals, Macmillan, New York, 1993.
[16] D.H. Johnson and D.E. Dudgeon, Array Signal Processing: Concepts and Techniques, Prentice-Hall, New Jersey, 1993.
[17] H. Saruwatari and M. Komura, "Complementary beamforming technique for artificial image reduction of sonar in air medium," Proc. Spring Meet. Acoust. Soc. Jpn., 3-3-24, pp.573–574, 1997 (in Japanese).
[18] A.L. Peressini, F.E. Sullivan, and J.J. Uhl, Jr., The Mathematics of Nonlinear Programming, Springer-Verlag, New York, 1988.
[19] T. Kobayashi, S. Itabashi, S. Hayashi, and T. Takezawa, "ASJ continuous speech corpus for research," J. Acoust. Soc. Jpn., vol.48, no.12, pp.888–893, 1992 (in Japanese).
[20] M. Ikeda, K. Takeda, and F. Itakura, "Speech enhancement by quadrature comb-filtering," Technical Report of IEICE, vol.DSP96, no.70, pp.23–30, 1996 (in Japanese).
[21] S.B. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. Acoust., Speech & Signal Process., vol.ASSP-28, no.4, pp.357–366, 1980.
[22] S. Kajita, D. Kobayashi, K. Takeda, and F. Itakura, "Analysis of speech features included in human speech-like noise," J. Acoust. Soc. Jpn., vol.53, no.5, pp.337–345, 1997 (in Japanese).
[23] IEICE, ed., Handbook for Electronics, Information and Communication Engineers, p.2220, Ohmsha, Tokyo, 1988 (in Japanese).

Hiroshi Saruwatari was born in Nagoya, Japan, on July 27, 1967. He received the B.E. and M.E. degrees in electrical engineering from Nagoya University, Nagoya, Japan, in 1991 and 1993, respectively. He joined the Intelligent Systems Laboratory, SECOM CO., LTD., Mitaka, Tokyo, Japan, in 1993, where he engaged in research and development on ultrasonic array systems for acoustic imaging. He is currently a Ph.D. student in the Department of Information Electronics, Graduate School of Engineering, Nagoya University. He is a member of the IEICE and the Acoustical Society of Japan.

Shoji Kajita was born in Japan on April 20, 1967. He received the Bachelor, Master, and Doctor of Information Engineering degrees from Nagoya University in 1990, 1992 and 1998, respectively. He is an Assistant Professor in the Center for Information Media Studies of Nagoya University. His research interests include speech processing based on the human auditory system, especially its applications to speech representation and speech recognition.

Kazuya Takeda was born in Sendai, Japan, on September 1, 1960. He received the B.E.E., M.E.E., and Doctor of Engineering degrees, all from Nagoya University, in 1983, 1985 and 1994, respectively. In 1986, he joined ATR (Advanced Telecommunication Research Laboratories), where he was involved in two major projects: speech database construction and speech synthesis system development. In 1989, he moved to KDD R&D Laboratories and participated in a project for constructing a voice-activated telephone extension system. Since 1995, he has been working at Nagoya University as an Associate Professor.

Fumitada Itakura was born in Toyokawa, Japan, on August 6, 1940. He received the B.E.E., M.E.E., and Doctor of Engineering degrees, all from Nagoya University, in 1963, 1965 and 1972, respectively. In 1968, he joined the Electrical Communication Laboratory of NTT, Musashino, Tokyo, and participated in speech processing research, including maximum likelihood spectrum estimation, the PARCOR method, the line spectrum pair method, the composite sinusoidal method, and APC-AB speech coding. In 1981, he was appointed Head of the Speech and Acoustics Research Section of the ECL, NTT. In 1984, he left NTT to become a professor at Nagoya University, where he teaches courses on communication theory and signal processing. In 1975, he received the IEEE ASSP Senior Award for his paper on speech recognition based on the minimum prediction residual principle. He is a co-recipient, with B.S. Atal, of the 1986 Morris N. Liebmann Award for contributions to linear predictive coding for speech processing. In 1997, he received the IEEE Signal Processing Society Award.

