Patent Translate Powered by EPO and Google

Notice: This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate, complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or financial decisions, should not be based on machine-translation output.

DESCRIPTION JP2008054071

PROBLEM TO BE SOLVED: Conventional sound source separation methods could not remove the noise that is problematic in a video conference, such as paper rubbing sound and other sudden noises whose elevation-angle difference from the voice is small.

SOLUTION: In the present invention, by using a plurality of microphone intervals and a plurality of sub microphone arrays, the direction of arrival can be estimated with high accuracy even if the direction difference between the sound sources is small; to this end the apparatus holds a phase difference histogram calculation unit. In addition, since the phase difference histogram calculation unit creates a histogram using only one frame of data, localization can be performed even for noise that occurs suddenly. [Selected figure] Figure 2

[0001] The present invention belongs to high-speed, high-resolution sound source localization techniques aimed at application to voice communication apparatus such as video conference systems.

[0002] Sound source localization technology, which estimates the direction of arrival of a sound source, is an important technology applicable to the learning of sound source separation filters and to speaker-direction identification processing for robots, and has been actively studied since the 1980s.

04-05-2019 1

The simplest sound source localization method is the delay-and-sum array (see, for example, Non-Patent Document 1). The delay-and-sum array method is a very lightweight, high-speed method, because it consists only of multiplying the input signals by weighting factors and adding them.
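As an illustrative sketch of the delay-and-sum array just mentioned (the function name, the equal 1/M weighting, and the frequency-domain implementation are assumptions, not part of the patent text):

```python
import numpy as np

def delay_and_sum(signals, delays, fs):
    """Delay-and-sum beamformer sketch: delay each channel, then average.

    signals: (M, T) array of microphone samples.
    delays:  per-microphone steering delays in seconds (chosen by the caller).
    fs:      sampling rate in Hz.
    """
    M, T = signals.shape
    spectra = np.fft.rfft(signals, axis=1)            # per-channel spectra
    freqs = np.fft.rfftfreq(T, d=1.0 / fs)            # bin frequencies in Hz
    # A time delay is a linear phase shift in the frequency domain.
    phases = np.exp(-2j * np.pi * freqs[None, :] * np.asarray(delays)[:, None])
    # Equal weights 1/M; the text only says "multiply by weights and add".
    return np.fft.irfft((spectra * phases).mean(axis=0), n=T)
```

For a source arriving broadside at a linear array the steering delays are all zero and the channels add coherently, which illustrates why the method is so lightweight: one transform, one multiplication, and one sum per channel.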
However, because its localization performance is low, the directions of multiple simultaneous sound sources cannot be localized accurately. High-accuracy sound source localization technologies such as the MUSIC (MUltiple SIgnal Classification) method (see, for example, Non-Patent Document 2) have therefore been proposed, but MUSIC requires high-load processing such as eigenvalue calculation, and because it is difficult for it to localize from only one frame of data, it cannot localize the direction of suddenly generated noise. A sound source localization method is therefore required that is configured from lightweight processing able to run even on an embedded CPU and that can localize from only one frame of data. Moreover, in the MUSIC method the amount of processing increases in proportion to the search resolution of the sound source direction. The DUET method (see, for example, Non-Patent Document 3) has been proposed as a sound source localization method whose processing amount is not proportional to the resolution and which does not require heavy processing such as eigenvalue calculation. With the conventional DUET method, however, high-accuracy source localization becomes difficult when multiple sound sources are physically close to each other.

[0003] Jiro Oga, Yoshio Yamazaki, Yutaka Kanada, "Sound System and Digital Processing," IEICE, 1995. Nobuyoshi Kikuma, "Adaptive Signal Processing with Array Antennas," Science and Technology Publishing, 1998. O. Yilmaz and S. Rickard, "Blind Separation of Speech Mixtures via Time-Frequency Masking," IEEE Trans. Signal Processing, Vol. 52, No. 7, 2004. Akiko Araki, Hiroshi Sawada, Ryo Mukai, Shoji Makino, "Direction Estimation of Sparse Signals Based on Clustering of Observed Signal Vectors," Proceedings of the 2006 National Meeting of the Acoustical Society of Japan, pp. 615-616, 2006.
[0004] The voice band of teleconferencing systems currently on the market has shifted from the conventional telephone band (4 kHz) to the wide band (7 kHz), and a further shift toward a voice band comparable to CD quality is expected in the future. Wide-band voice lets the listener clearly hear the high-frequency components of unvoiced consonants and is easy to converse with, but since the noise is also widened in bandwidth, there is the problem that the voice immediately becomes hard to hear when noise is generated.

[0005] Accordingly, the demand for noise suppression technology is increasing with the spread of wide-band voice communication devices such as video conferencing systems. In particular, it is required to suppress the sound of paper rubbing and the sound of hitting a desk at the other party of the conference. A noise canceller is often used for the purpose of suppressing stationary noise such as the sound of air conditioners and projector fans. However, the conventional noise canceller has almost no suppression effect on sudden, high-power non-stationary noise such as paper rubbing noise and the sound of tapping on a desk. For the purpose of suppressing sudden noise, we have been developing a sound source separation technology that captures only the target sound by discriminating the difference between the arrival directions of the noise and the target sound when their directions of arrival differ. The separation performance of such a sound source separation method depends on the accuracy of estimating the arrival directions of the noise and the target sound. In other words, if the directions of arrival of the noise and the target sound can be accurately identified, the sound source separation performance is good; conversely, when it is difficult to distinguish the directions of arrival, the sound source separation performance is poor.
The location of sound sources such as paper rubbing noise and the sound of tapping the desk is usually on a desk, and the directions of arrival of the user's speech and of these noises are typically only about 20° apart, an extremely small difference. Also, in a video conference, since the conversational delay must be minimized, the input speech must be processed quickly to generate the output speech. It is therefore necessary to estimate the direction of suddenly generated noise within a small number of frames.

[0006] The outline of a representative invention disclosed in the present application is as follows: an acoustic signal processing apparatus having a phase difference histogram calculation unit that, by using a plurality of microphone pairs with different microphone intervals, sequentially improves localization accuracy and localizes the directions of a plurality of sound sources from the data of one frame.

[0007] In a wide-band video conference, noise generated on the desk, such as paper rubbing noise, no longer impairs the ease of listening to the voice, and the conference can be held with easy-to-hear voice.

[0008] The hardware configuration of this embodiment is shown in FIG. 1. The central processing unit 1 carries out all the calculations included in this embodiment. The storage device 2 is a work memory configured of, for example, a RAM, and all variables used when performing calculations are secured on the storage device 2. All data and programs used at the time of calculation are assumed to be stored in the storage device 3, configured of, for example, a ROM. The microphone array 4 is composed of at least two microphone elements. Each microphone element measures an analog sound pressure value. The number of microphone elements is M. The A/D conversion device converts (samples) analog signals into digital signals, and can synchronously sample signals of M channels or more.
The analog sound pressure value for each microphone element captured by the microphone array 4 is sent to the A/D conversion device 5. The A/D conversion device 5 converts the sound pressure value of each channel into digital data and outputs the quantized sound pressure values.

[0009] The sound pressure value of each channel converted into digital data is processed by the central processing unit 1 through the storage device 2. Using the information on the existence ranges of the target sound and the noise stored in the storage device 3 and the like, the central processing unit 1 suppresses noise components such as paper rubbing noise from the per-channel sound pressure values and generates a signal in which the target voice is emphasized.

[0010] A block diagram of the software of this embodiment is shown in FIG. 2. The microphone array 4 is arranged in a straight line. The analog sound pressure value detected by the microphone array 4 is sent to the A/D conversion unit 6 and converted to digital data (Equation 1) for each channel. i is an index representing a channel. A vector having the digital data of each channel as its elements is written as (Equation 2). (Equation 3) is sent to the Fourier transform unit 7. The Fourier transform unit 7 applies a Fourier transform to the digital data of each microphone channel and outputs a signal (Equation 4) in the frequency domain. (Equation 4) is a vector having the band division signal of each channel as its elements, and is defined by (Equation 5). The Fourier transform is a short-time Fourier transform. Let τ be the frame index of the Fourier transform. The frame size L and the frame shift ST of the Fourier transform are set in advance. Hereinafter, the index τ representing the frame and the frequency f will be omitted unless the frame and frequency need to be made explicit, and the band division signal will be written as (Equation 6).
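The short-time Fourier transform with frame size L and frame shift ST described above can be sketched as follows (the Hann window and all names are assumptions; the patent does not specify a window):

```python
import numpy as np

def stft_frames(x, frame_size, frame_shift):
    """Split x into overlapping frames of length frame_size, advanced by
    frame_shift samples, window each frame, and return its spectrum."""
    window = np.hanning(frame_size)                   # assumed window
    n_frames = 1 + (len(x) - frame_size) // frame_shift
    frames = np.stack([x[t * frame_shift : t * frame_shift + frame_size]
                       for t in range(n_frames)])
    # One row per frame index tau, one column per frequency bin f.
    return np.fft.rfft(frames * window, axis=1)
```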
The frequency band signal output from the Fourier transform unit 7 is sent to the phase difference histogram calculation unit 8. The phase difference histogram calculation unit 8 calculates the inter-microphone phase difference of the frequency band signal by (Equation 7) and, working from the phase differences of the microphone pairs with short microphone spacings toward those with long spacings, sequentially uses the phase differences of the plurality of microphone pairs to successively improve the accuracy of the phase difference, and generates a histogram of the phase difference after the accuracy improvement. The method of calculating the estimated value of the phase difference and the method of generating the histogram will be described later.

[0011] [0012] [0013] [0014] [0015] [0016] [0017] [0018]

The obtained histogram of the phase difference is sent to the paper rubbing sound power calculation unit 11. A physical space that is highly likely to generate paper rubbing noise is set in advance. Letting the azimuth angle of the sound source be θ, the set physical space is expressed as (Equation 8). Although the range is specified here only for the azimuth angle, the elevation angle and the distance may also be restricted in range. For this physical space, the possible values of the phase difference between the microphones are calculated by (Equation 9). di is the microphone interval of the i-th microphone pair. The paper rubbing sound power calculation unit 11 adds up P(δ) over the range of (Equation 9) and outputs the result as the paper rubbing sound power. Further, from the estimated value of the phase difference for each frequency, the paper rubbing sound power calculation unit 11 identifies the frequency bands satisfying (Equation 9) as bands in which the paper rubbing sound is dominant, and outputs the indices of those frequency bands.
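A minimal sketch of the two operations described for the paper rubbing sound power calculation unit — computing the inter-microphone phase difference per frequency bin, and accumulating the power of the bins whose phase difference falls in the range derived from the assumed source region — might look like this (Equations 7 and 9 are not reproduced in this translation, so the exact forms used here are assumptions):

```python
import numpy as np

def phase_difference(X0, X1):
    """Per-bin phase difference between two microphone spectra, in (-pi, pi]."""
    return np.angle(X1 * np.conj(X0))

def power_in_phase_range(power, delta, delta_min, delta_max):
    """Sum the power of bins whose phase difference lies in the given range,
    and also return those bin indices (the 'dominant band' indices)."""
    mask = (delta >= delta_min) & (delta <= delta_max)
    return power[mask].sum(), np.nonzero(mask)[0]
```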
In the target sound power calculation unit 12, as in the paper rubbing sound power calculation unit 11, a physical space in which the target sound is highly likely to be generated is set in advance as (Equation 10). Although the range is specified here only for the azimuth angle, the elevation angle and the distance may also be restricted in range. For this physical space, the possible values of the phase difference between the microphones are calculated by (Equation 11). In addition to calculating the paper rubbing sound power from the frequencies of the entire frequency band, the band may be divided into a plurality of band groups, for example every 1000 Hz, and the paper rubbing sound power may be calculated for each divided band group. By dividing into a plurality of band groups in this manner, it becomes possible to estimate the paper rubbing sound power of each band group more accurately when the paper rubbing noise is biased toward some of the band groups. The target sound power calculation unit 12 adds up P(δ) over the range of (Equation 11) and outputs the result as the target sound power. Further, from the estimated value of the phase difference for each frequency, the target sound power calculation unit 12 identifies the frequency bands satisfying (Equation 11) and outputs the indices of those frequency bands. As with the paper rubbing sound power, in addition to calculating the target sound power from the frequencies of the entire frequency band, the band may be divided into a plurality of band groups, for example every 1000 Hz, and the target sound power may be calculated for each band group.
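Accumulating per-bin power into band groups of, say, 1000 Hz as suggested above can be sketched as follows (the group width and names are assumptions):

```python
import numpy as np

def power_per_band_group(power, freqs, group_width=1000.0):
    """Sum per-bin power into contiguous frequency groups of group_width Hz."""
    groups = (np.asarray(freqs) // group_width).astype(int)
    out = np.zeros(groups.max() + 1)
    np.add.at(out, groups, power)   # accumulate each bin into its group
    return out
```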
[0019] [0020] [0021] [0022] [0023]

The paper rubbing sound presence determination unit 10 calculates the value of (Equation 12) from the target sound power Psubject calculated by the target sound power calculation unit 12 and the paper rubbing sound power Pnoise calculated by the paper rubbing sound power calculation unit 11. If the calculated measure exceeds a predetermined threshold, it is determined that paper rubbing noise is present. The paper rubbing sound presence determination unit 10 outputs the determination result of whether the paper rubbing sound exists, and the determination result is sent to the sound source separation unit 9. When the band is divided into a plurality of band groups and the paper rubbing sound power and the target sound power are calculated for each divided band group, whether the paper rubbing sound exists is determined for each band group, and a determination result is output for each band group.

[0024] The sound source separation unit 9 performs the process of removing the paper rubbing sound using the band division signal, which is the output signal of the Fourier transform unit, and the presence determination result of the paper rubbing sound. Details of the paper rubbing sound removal process will be described later. The signal after the paper rubbing sound removal process and the determination result of whether the paper rubbing sound is present are sent to the dereverberation unit. Based on the determination result of the paper rubbing sound presence determination unit 10, the dereverberation unit removes the reverberation component of the paper rubbing sound from the signal S^(f, τ) after the paper rubbing sound removal process. Dereverberation is performed by a spectral-subtraction-based method such as (Equation 13). Pecho is the power of the reverberation component of the paper rubbing sound.
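The threshold-based presence decision and a spectral-subtraction-style removal in the spirit of (Equation 13) might be sketched as follows; since (Equations 12-13) are omitted from this translation, the ratio form Pnoise/Psubject, the threshold value, and the flooring to zero are all assumptions:

```python
import numpy as np

def paper_rub_present(p_noise, p_subject, threshold=2.0):
    """Decide that paper rubbing noise is present when the power ratio
    exceeds a preset threshold (the threshold value is an assumption)."""
    return p_noise / max(p_subject, 1e-12) > threshold

def spectral_subtract(power, p_echo):
    """Subtract the reverberation power estimate and floor at zero."""
    return np.maximum(np.asarray(power) - p_echo, 0.0)
```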
Floor is a function that returns 0 if its argument is 0 or less, and returns the value of the argument otherwise. Pecho is updated according to (Equation 14). |N| is the amplitude spectrum of the paper rubbing sound for each frequency. When the paper rubbing sound power calculation unit 11 identifies the corresponding frequency as a band in which the paper rubbing sound is dominant, |N| = |X| is assumed; in other cases, |N| = 0.

[0025] [0026] [0027] [0028]

The speech (Equation 15) after reverberation component removal is sent to the inverse Fourier transform unit 14. The inverse Fourier transform unit 14 performs an inverse Fourier transform on the speech after reverberation component removal and outputs a signal y(t) in the time domain. The frame size of the inverse Fourier transform is equal to the frame size in the Fourier transform unit. The time domain signal output from the inverse Fourier transform unit is sent to the superimposing and adding unit, where the frames are superimposed and added according to the size of the frame shift, and the superimposed time domain signal y^(t) is output.

[0029] [0030]

FIG. 3 is a block diagram of the phase difference histogram calculation unit 8. The frequency domain signal output from the Fourier transform unit 7 is sent to the phase difference calculation unit 8-1. The phase difference calculation unit 8-1 first calculates the phase differences of a plurality of microphone pairs. Letting the index of a microphone pair be i, the microphone interval of the microphone pair with index i is di. Further, the phase difference of the microphone pair with index i is written as δi. The azimuth of arrival of the sound source is assumed to be θ. If there is no echo, reverberation, or background noise, and there is only one sound source, then θ and δi are in the relation of (Equation 16).
The phase difference calculation unit 8-1 calculates an estimate of the phase difference for each microphone pair according to (Equation 17). arctan is the inverse function of tan and takes values from -π to +π. Therefore, δ^i also takes values from -π to +π. On the other hand, the true phase difference takes values in the range of (Equation 18). Therefore, in the case of (Equation 19), δ^i cannot cover the possible range of δi, and θ cannot be determined. When δi takes a value in a range that δ^i cannot cover, an ambiguity of an integral multiple of 2π arises between δi and δ^i, so that δi and δ^i are in the relation of (Equation 20). The phase difference calculation unit 8-1 first obtains n using a short microphone interval, and then obtains δ^i using a long microphone interval. By doing so, n can be obtained at short microphone intervals, so the problem of the ambiguity of integral multiples of 2π is eliminated; and since the inter-microphone phase difference of omnidirectional noise does not depend on the microphone interval, the variation of δ^i does not depend on the microphone interval either. Therefore, the longer the microphone interval, the smaller the variation from the true value of sin θ obtained by (Equation 16) is considered to be.

[0031] [0032] [0033] [0034] [0035] [0036]

Therefore, a more accurate phase difference can be obtained than the δi obtained using a short microphone interval. Here, a linear microphone arrangement as shown in FIG. 4 is assumed. L microphone pairs are selected from the M microphone elements, and the L microphone pairs are arranged in ascending order of microphone interval. (Equation 23) is executed recursively from i = 0 to L−1 to obtain the estimated value δ^L−1 of the phase difference. The initial value of the microphone interval is (Equation 21), and the initial value of the phase difference is (Equation 22).
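The recursion of (Equations 20-23) — resolve the integer ambiguity n with the short pair first, then reuse it to unwrap the measurement of the next longer pair — can be sketched as follows. The equations themselves are omitted from this translation, so the scaling-prediction form used here is an assumption, consistent with δi being proportional to the microphone interval in (Equation 16):

```python
import numpy as np

def refine_phase_difference(measured, spacings):
    """Sequentially unwrap pair phase differences, shortest spacing first.

    measured[i]: wrapped phase difference of pair i, in (-pi, pi].
    spacings:    pair intervals in ascending order; the shortest pair is
                 assumed unambiguous (spacing small enough that n = 0).
    """
    delta, d = measured[0], spacings[0]       # initial values (shortest pair)
    for i in range(1, len(spacings)):
        # delta is proportional to the spacing, so predict the unwrapped value
        predicted = delta * spacings[i] / d
        # and pick the integer n bringing the measurement closest to it.
        n = round((predicted - measured[i]) / (2 * np.pi))
        delta, d = measured[i] + 2 * np.pi * n, spacings[i]
    return delta
```

The returned value comes from the longest pair, so it has the small variation described above, while the integer ambiguity was fixed by the unambiguous short pairs.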
[0037] The phase difference obtained by the above processing is sent to the histogram calculation unit 8-2, and the histogram represented by (Equation 24) is calculated.

[0038] [0039] [0040] [0041] [0042]

The paper rubbing noise that is a problem during video conferencing is usually noise generated on the desk. Human voice, on the other hand, is produced at a position whose elevation angle is higher than the desk surface. When a microphone array arranged in a straight line in the vertical direction, as shown in FIG. 5, is placed on a desk, a sound source whose elevation angle (with the vertically upward direction taken as 0°) is 90° or more can be estimated to be a paper rubbing sound, and a sound source whose elevation angle is less than 90° can be estimated to be human speech. Therefore, when a peak of the histogram calculated by the phase difference histogram calculation unit 8 using the estimated phase difference δ^L−1 stands in the range of phase differences corresponding to an elevation angle of 90° or more, the peak can be considered to indicate the power of the paper rubbing noise. By setting θnoise_min = 90 and θnoise_max = 180, the paper rubbing sound power calculation unit 11 can calculate the paper rubbing sound power.

[0043] FIG. 6 is a diagram showing the data structure of the noise existence range and the voice existence range set by the user through the user interface. "No." indicates the index of the registered data. "Type" designates noise or speech, and in the case of noise, specifies whether it is sudden noise such as paper rubbing noise or stationary noise such as the operating noise of an air conditioner. "Range" is a column that specifies the range in which the sound source exists, giving the ranges of the azimuth θ and the elevation φ.
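The one-frame histogram of refined phase differences and the desk/voice distinction by elevation angle described in [0042] can be sketched as follows (the bin count is an assumption; the 90° boundary follows the text's setting of θnoise_min = 90 and θnoise_max = 180):

```python
import numpy as np

def phase_histogram(deltas, n_bins=64):
    """One-frame histogram of refined phase differences over (-pi, pi]."""
    return np.histogram(deltas, bins=n_bins, range=(-np.pi, np.pi))

def classify_by_elevation(phi_deg):
    """With a vertical linear array on the desk (0 deg = straight up),
    sources at an elevation of 90 deg or more are taken as paper rubbing
    noise, those above the desk plane as speech."""
    return "paper_rub" if phi_deg >= 90.0 else "speech"
```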
[0044] In the paper rubbing sound power calculation unit 11, a range designated in advance as in (Equation 8) may be set as the existence range of the paper rubbing sound, or the data designated by the user through the user interface in the structure of FIG. 6 may be used. When the user registers two or more sudden noises, the paper rubbing sound power calculation unit 11 calculates Pnoise for each noise. The frequency indices within the sound source range of each noise are also identified and output.

[0045] Similarly, the paper rubbing sound presence determination unit 10 calculates the ratio for each noise and performs a separate presence determination for each sudden noise.

[0046] The sound source separation unit 9 calculates how the sound of each sound source propagates (the steering vector) according to (Equation 25) from the frequency components included in the respective sound source ranges of the target sound and the noise. i is the index of a sound source and corresponds one-to-one with "No." in the data structure of FIG. 6. Whether a frequency component is included in the sound source range of a sudden noise or of the voice can be known from the frequency indices output by the paper rubbing sound power calculation unit 11 and the target sound power calculation unit 12; for stationary noise, it is determined for each frequency whether the condition of (Equation 11) is satisfied, and the frequency components determined to satisfy it are regarded as frequency components included in the sound source range of the stationary noise.

[0047] [0048]

If the sound source direction of X is within the range of the i-th sound source, the steering vector of the i-th sound source is updated by (Equation 25). The steering vectors of sound sources other than the i-th are not updated. Further, the magnitude of the steering vector is normalized to 1 by (Equation 26). A matrix having the steering vectors normalized to magnitude 1 as its elements is defined as A(f, τ) by (Equation 27).
The generalized inverse matrix of A(f, τ) is calculated by (Equation 28). The sound source separation unit 9 generates three types of separated sound using A(f, τ) and its generalized inverse matrix.

[0049] [0050] [0051] [0052]

From these three types of separated sound, an appropriate separated sound is selected and output for each time-frequency.

[0053] The first separated sound is calculated by (Equation 29). In (Equation 30), each time-frequency is classified as to which sound source it belongs. From the classification result, a steering vector for separation is selected by (Equation 31), and the second separated sound is obtained by (Equation 32). This separated sound assigns the output signal to only one sound source at each time-frequency, so when there are two or more sound sources the output sound may be distorted, but its noise suppression performance is higher than that of S1. In (Equation 33), the power is assigned to one sound source for each time-frequency, and the third separated sound is obtained by subtracting the power of that sound source from the input signal. This is less likely to distort the output sound than S2, but its noise suppression performance is lower. This separated sound is used only when the sound source to which the component is assigned at each time-frequency is noise. Alternatively, an output signal obtained using a null beamformer, which forms a null in the noise direction and a beam in the target sound direction, may be used as the separated sound. Further, noise removal processing by spectral subtraction may be added to the separated sound. In that case, the subtraction coefficient of the spectral subtraction may be linked to the ratio defined by (Equation 12), the subtraction coefficient being set larger as the ratio becomes larger.
With such a configuration, the paper rubbing noise can be strongly suppressed only when the paper rubbing noise is present.

[0054] [0055] [0056] [0057] [0058] [0059]

FIG. 7 is a processing flow diagram covering the paper rubbing sound presence determination unit 10, the sound source separation unit 9, and the dereverberation unit 13. In S1, using the paper rubbing sound power and the target sound power, if the ratio defined in (Equation 12) exceeds the predetermined threshold it is determined that the paper rubbing sound is present, and if it falls below the threshold it is determined that the paper rubbing sound is absent. When it is determined that the paper rubbing sound is present, paper rubbing sound removal is performed. In the paper rubbing sound removal, the three separated sounds calculated by the sound source separation unit 9 are switched according to the result of the paper rubbing sound presence determination. For frequency components assigned to the paper rubbing sound direction in (Equation 30), (Equation 32) is taken as the separated sound when the paper rubbing sound is present. For frequency components not assigned to the paper rubbing sound direction in (Equation 30), (Equation 29) is taken as the separated sound when the paper rubbing sound is present. That is, when the paper rubbing noise is present, it must be removed as much as possible, so strong suppression processing is performed. When there is no paper rubbing noise, the interference noise suppression processing is not performed and the input signal is output without processing. By doing this, the target sound is less likely to be distorted when there is no paper rubbing noise.
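The first separated sound of (Equation 29) — applying the generalized (Moore-Penrose) inverse of the steering matrix A(f, τ) to the observation vector — can be sketched for one time-frequency point as follows (the function name is an assumption, and the patent's own equations are not reproduced in this translation):

```python
import numpy as np

def separate_with_pinv(A, x):
    """Recover per-source components at one time-frequency point.

    A: (mics x sources) steering matrix with unit-norm columns.
    x: (mics,) observed band division signal.
    """
    return np.linalg.pinv(A) @ x   # generalized inverse of A applied to x
```

With as many microphones as sources and a well-conditioned A this inverts the mixing exactly; with more microphones it gives the least-squares solution.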
In addition, even when it is determined that the paper rubbing noise is absent, weak suppression processing based on (Equation 33) may be performed when the ratio exceeds a certain value. Further, in the case where the presence of stationary noise can be assumed, the configuration may be such that the stationary noise is always suppressed using the separated sound of (Equation 29), even when there is no paper rubbing noise.

[0060] In the paper rubbing sound reverberation determination, if the paper rubbing sound was present and the predetermined number of frames has not yet elapsed since then, it is determined that dereverberation is to be performed. If the predetermined number of frames has elapsed since the paper rubbing sound was present, it is determined that dereverberation is not to be performed. If it is determined that dereverberation is to be performed, dereverberation processing is performed based on (Equation 13), and the signal after dereverberation is output. FIG. 8 shows a typical example of the time change of the amplitude value of the paper rubbing sound. Since the paper rubbing noise produces small paper rubbing sounds, echo, and reverberation after the direct sound, the amplitude does not attenuate for a while. Therefore, it is effective to detect the direct sound of the paper rubbing noise and then perform the dereverberation processing for a while afterward, strongly suppressing the noise.

[0061] FIG. 9 shows a comparison of the power spectra of human voice and paper rubbing sound.

[0062] While the paper rubbing noise has almost uniform power at all frequencies, the power of the voice is biased toward a relatively low band, such as 1000 Hz or less. Therefore, even if the paper rubbing sound power calculated from the signals of the entire band exceeds the target sound power calculated from the signals of the entire band, the target sound power of the voice may exceed the paper rubbing sound power at frequencies below 1000 Hz.
In such a case, if strong interference noise suppression processing is performed below 1000 Hz, the voice may be distorted and become hard to hear. By having the paper rubbing sound power calculation unit 11, the target sound power calculation unit 12, and the paper rubbing sound presence determination unit 10 calculate the paper rubbing sound power and the target sound power for each of a plurality of band groups, perform the presence determination of the paper rubbing sound for each band group, and switch the separation method for each band group according to the determination results, a weakly suppressed separated sound is selected for the band groups in which the voice is dominant, and a voice with less distortion can be output.

[0063] Next, regarding processing when an arrangement other than the linear arrangement is used as the microphone arrangement, the changes in the processing of the phase difference calculation unit 8-1 will be described. As an arrangement other than the linear arrangement, a method using a plurality of concentric equilateral triangle arrangements of different sizes, shown in FIG. 10, will be described. The equilateral triangular microphone array 16 is used instead of the microphone array 4. The equilateral triangular microphone array 16 has a plurality of concentric equilateral triangular sub microphone arrays 16-1 to 16-U of different sizes. With the linear arrangement, localization is possible only in the range of -90 degrees to 90 degrees, but with the equilateral triangle arrangement, localization in all directions from -180 degrees to 180 degrees is possible.

[0064] Indices are assigned to the U sub microphone arrays arranged at the vertices of the regular triangles, sequentially from the smallest size. L microphone pairs are selected for each sub microphone array. The physical position vector of a microphone element is written as P.
[0065] For the i-th microphone pair of the l-th sub microphone array, let the two microphone elements be i0 and i1. The difference between the position vectors of the microphone pair is then calculated by (Equation 34). Further, a matrix having the position-vector differences of these microphone pairs as its elements is defined by (Equation 35). The pseudo-inverse matrix of Dl is obtained by (Equation 36) and (Equation 37). A vector having as its elements the phase differences of the L microphone pairs of the l-th sub microphone array is obtained from the input signal for each time-frequency by (Equation 38). If the microphone spacings of all the microphone pairs are c/2f or less, the position vector of the sound source, normalized to magnitude 1, can be obtained by (Equation 39). The wider the distance between the microphones, the more accurate the estimation of the position vector of the sound source.

[0066] [0067] [0068] [0069] [0070] [0071] [0072]

However, if even one microphone spacing exceeds c/2f, the 2π phase ambiguity arises as in the linear arrangement, and the relation between the sound source direction and r becomes (Equation 40), with an indeterminate term n. Therefore, as in the linear arrangement, the indeterminate term n is obtained with a sub microphone array of short microphone interval, and the phase difference is then calculated more accurately with a sub microphone array of long microphone interval. The initial value of the phase indeterminate term is set to (Equation 41). The initial value of the vector r consisting of the phase difference of each microphone pair is set to (Equation 42). nl is a vector having as its elements the integer-valued indeterminate terms shown in (Equation 43). For each sub microphone array, nl satisfying (Equation 44) is calculated. 1 is a vector whose elements all have the value 1, as shown in (Equation 45).
The phase vector after the indeterminate term n has been found is defined by (Equation 46). [0073] The phase vector after finding the indeterminate term n is calculated for all the sub microphone arrays, and using the phase vector of the largest sub microphone array, an estimated value of the sound source direction is obtained by (Equation 47). The histogram calculation unit 8-2 calculates a histogram of the determined sound source directions. When a determined sound source direction satisfies (Equation 48), the frequency component can be judged to belong to the i-th sound source. [0074] [0075] [0076] [0077] [0078] [0079] [0080] [0081] [0082] [0083] Next, the processing in the case of using a plurality of sub microphone arrays arranged concentrically is described. [0084] Consider placing the microphones on the same circumference as shown in FIG. 11. [0085] In the same-circumference microphone array 17, the spacing between microphone elements 1 and 2, the spacing between microphone elements 4 and 5, and the spacing between microphone elements 7 and 8 are all equal to d0; these three microphone pairs are taken as the microphone pairs of the 0th sub microphone array. Similarly, the spacing between microphone elements 2 and 3, between microphone elements 5 and 6, and between microphone elements 8 and 9 are all equal to d1, and these three microphone pairs are taken as the microphone pairs of the first sub microphone array. Likewise, the spacing between microphone elements 1 and 3, between microphone elements 4 and 6, and between microphone elements 7 and 9 are all equal to d2, and these three microphone pairs are taken as the microphone pairs of the second sub microphone array. It is assumed that d0 < d1 < d2.
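The coarse-to-fine resolution of the 2π ambiguity described around (Equation 40) through (Equation 46) can be sketched in a simplified scalar form, one wrapped phase per sub microphone array. This is an assumption-laden sketch, not the patent's vector formulation: it assumes the smallest spacing is below c/2f (so its phase is unambiguous) and that the spacings are processed in increasing order.

```python
import numpy as np

def unwrap_across_subarrays(wrapped_phases, spacings):
    """Resolve the 2*pi phase ambiguity from short to long spacings.

    wrapped_phases : wrapped phase difference of each sub array, in (-pi, pi]
    spacings       : microphone spacing of each sub array, increasing;
                     the smallest is assumed unambiguous (<= c/2f)
    """
    resolved = [wrapped_phases[0]]  # smallest spacing: no ambiguity
    for l in range(1, len(wrapped_phases)):
        # phase predicted from the previous (shorter) spacing, scaled by
        # the spacing ratio (phase difference is proportional to spacing)
        predicted = resolved[-1] * spacings[l] / spacings[l - 1]
        # pick the integer n minimizing |wrapped + 2*pi*n - predicted|
        # (the role played by Equation 44)
        n = np.round((predicted - wrapped_phases[l]) / (2.0 * np.pi))
        resolved.append(wrapped_phases[l] + 2.0 * np.pi * n)
    return resolved
```

The longest spacing then yields the most accurate phase, which is why the direction estimate of (Equation 47) uses the largest sub microphone array.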
[0086] For these three sub microphone arrays, as in the regular triangle arrangement, a phase vector with the ambiguity resolved is obtained based on (Equation 44), and the sound source direction is obtained from that phase vector based on (Equation 47), so that sound source localization becomes possible. [0087] A diagram showing the hardware configuration of the present invention. A block diagram of the software of the present invention. A block diagram of the phase difference histogram calculation unit of the present invention. The layout of a linear microphone array. An example of arranging a microphone array on a desk. The structure of the data set by the user regarding the type of noise in the present invention. A processing flow diagram of the paper rubbing noise removal of the present invention. A diagram showing the change over time of the amplitude value of the paper rubbing sound. A comparison of the power spectrum of a voice and the power spectrum of a paper rubbing sound. A diagram showing one example of the equilateral triangle arrangement that can be used as the microphone array of the present invention. A diagram showing one example of the same-circumference arrangement that can be used as the microphone array of the present invention. Explanation of reference signs [0088] 1: central processing unit; 2: storage device composed of RAM etc.; 3: storage device composed of ROM etc.; 4: microphone array composed of at least two microphone elements; 5: A/D converter for converting analog sound pressure values to digital data; 6: A/D conversion means for converting analog sound pressure values to digital data; 7: band division means for converting digital data in the time domain into digital data in the frequency domain; 8: signal processing means for calculating the phase difference of the band-divided signals for each band and generating a histogram of the phase differences; 9:
sound source separation means for separating and extracting the target sound component from the band-divided signal; 10: paper rubbing sound presence determination means for determining, for each frame, whether paper rubbing sound is present; 11: means for estimating the power in the range where the paper rubbing sound exists; 12: means for estimating the power in the range where the predetermined target sound exists; 13: dereverberation means for suppressing the reverberation component of noise from the signal after sound source separation; 14: inverse Fourier transform means for applying the inverse Fourier transform to the dereverberated signal and converting it to a time-domain signal; 15: superposition and addition means for overlap-adding the inverse-Fourier-transformed signal at each frame shift; 16: an equilateral triangle microphone array having a plurality of equilateral-triangle sub microphone arrays; 17: a microphone array having a plurality of sub microphone arrays on the same circumference; S1: determination processing as to whether or not paper rubbing sound is present; S2: processing to determine whether or not reverberation exists based on whether the current frame is within several frames after the paper rubbing sound was present.
