Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2004128707
Automatic sound tracking for a video camera and directional audio reception are provided. The algorithm of the present invention has three major stages. In the first stage, an array of three microphones M1 to M3 receives audio signals from two main directions, and the signal processing unit 10 extracts information on the azimuth and elevation angles of the incoming speech signal. The second stage is mainly mechanical: data on these angles are applied to the servo mechanism 20, 30, which drives the video camera toward the audio signal source. In the third stage, a directional audio beam is formed in the space in front of the video camera to suppress interference from unintended directions. The signal processing of this part is performed using the signals received by five microphones. Processing in the frequency domain is the main tool of this algorithm. [Selected figure] Figure 1
Directional voice receiving apparatus and method thereof
TECHNICAL FIELD [0001] The present invention relates to a directional voice reception method and apparatus, and in particular to an automatic directional voice reception method and apparatus for realizing directional voice reception. [0002] Known techniques related to automatic measurement of the position of a speaker (talker) include, for example: (1) P. S. Chang and A. N. Wilson, Jr., "Performance of 3D speaker localization using a small array of microphones," Conference Record of the Thirty-First Conference on Signals, Systems and Computers, Vol. 1, 1997, pp. 328-332; and (2) T. Yamada, S. Nakamura and K. Shikano, "Robust speech recognition with speaker localization by a microphone array," Proceedings of ICSLP 96, Fourth International Conference on Spoken Language, Vol. 3, 1996, pp. 1317-1320.
04-05-2019
1
Thus, various methods exist for detecting the arrival angle of an audio signal. Most of these methods are based on calculating the time difference of arrival at two or more microphones. The audio signal is a wideband signal with frequency components ranging from 20 Hz to 20 kHz, which makes it difficult to estimate the direction of arrival (DOA) using conventional techniques. Recently, several methods for detecting the position of a speaker have been proposed; in prior art (2) above, a delay-and-sum beamformer is used for the signal processing of a microphone array composed of 14 microphones. As another method, classified as time-delay estimation for speaker position detection, there is, for example, (3) N. Strobel and R. Rabenstein, "Classification of time delay estimates for robust speaker localization," IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 6, 1999, pp. 3081-3084.
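Most of the surveyed methods rest on time-difference-of-arrival (TDOA) estimation between microphone pairs. As an illustrative sketch only (not the method of any cited reference), the delay between two microphone signals can be estimated by time-domain cross-correlation and converted to an arrival angle via τ = (d/c)·cos φ; the sampling rate, microphone spacing, and test signal below are hypothetical values.

```python
import math

def estimate_delay_samples(ref, delayed, max_lag):
    # Find the lag that best aligns `delayed` with `ref`:
    # maximize sum over n of delayed[n] * ref[n - lag].
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for n in range(len(delayed)):
            m = n - lag
            if 0 <= m < len(ref):
                acc += delayed[n] * ref[m]
        if acc > best_val:
            best_val, best_lag = acc, lag
    return best_lag

# Synthetic test: x2 is x1 delayed by 3 samples (illustrative values).
fs = 8000.0            # sampling frequency (Hz)
d, c = 0.17, 340.0     # microphone spacing (m) and speed of sound (m/s)
true_delay = 3
x1 = [math.sin(2 * math.pi * 500.0 * n / fs) * math.exp(-n / 200.0) for n in range(256)]
x2 = [0.0] * true_delay + x1[:-true_delay]

lag = estimate_delay_samples(x1, x2, max_lag=10)
tau = lag / fs
# The arrival angle relative to the microphone axis follows from tau = (d / c) * cos(phi).
phi = math.degrees(math.acos(max(-1.0, min(1.0, tau * c / d))))
print(lag, round(phi, 1))
```

With the numbers above, the estimated lag is 3 samples, from which the angle follows directly; in practice the cross-correlation is usually computed via the FFT for efficiency.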
Furthermore, several methods have been proposed for joint audio-video object position detection and tracking. See, for example, N. Strobel, S. Spors and R. Rabenstein, "Joint audio-video object localization and tracking," IEEE, Vol. 18, Issue 1, Jan. 2001, pp. 22-31.
Furthermore, Japanese Patent Application Laid-Open No. 2002-62348 discloses an apparatus and a method for decomposing an incoming signal into various components within a predetermined bandwidth, determining the correlation between these components, and estimating the position and direction of a sound source. However, the above-described conventional methods not only require a large number of microphones, but also carry an inherent complexity. For this reason, they are difficult to realize economically, are not highly reliable, and have limited robustness against failure and the like. What is needed is automatic detection of a speaker's position in three-dimensional (3D) space using a small number of microphones, so that applications such as video conferencing or long-distance learning become economically feasible and operational failures are minimized. The present invention has been made in view of the problems of the above-described conventional technology, and it is desirable to provide a method and an apparatus for directional audio reception, in particular an automatic directional voice reception method and apparatus applicable to an imaging apparatus such as a video camera. SUMMARY OF THE INVENTION According to a preferred embodiment of the present invention, there is provided an apparatus including a voice reception unit that suppresses interference coming from a direction different from the direction of the main sound source and detects an audio signal from the main sound source. The voice reception unit comprises a plurality of audio sensors, a direction detection unit that detects the direction toward the main sound source, and a signal processing unit that suppresses the interference using the detected direction toward the main sound source and detects and outputs a voice signal with less interference. The plurality of audio sensors detect an audio signal including a target component from the main sound source and an interference component, superimposed on the target component, arriving from a direction different from that of the main sound source. The direction detection unit uses the outputs of a subset of the plurality of audio sensors to detect the direction from which the audio signal of the target component arrives, and the signal processing unit uses the outputs of the plurality of audio sensors to output an audio signal containing only the target component.
The method proposed in another preferred embodiment of the invention has three main stages.
In the first stage, an array of at least three microphones receives audio signals from two
directions in space. A signal processor extracts information on the incoming speech signal with
respect to azimuth and elevation. The second stage is mainly mechanical and applies the
extracted angle data to drive a servo mechanism which drives an imaging device, for example a
video camera, to direct it in the direction that the sound or audio signal arrives. In the third stage,
an audio beam is formed in the space in front of the video camera to reduce interference from
undesired directions. Signal processing in this portion is preferably performed using signals
received by five microphones. According to the invention, in addition to the above, an algorithm
is proposed for performing frequency domain processing. In another preferred embodiment
according to the invention, a combination of speaker position detection and interference suppression is proposed. Frequency-domain processing is realized in the following three steps.
(1) Three microphones arranged in a triangular array configuration receive an audio signal. The
received speech signal is transformed into the frequency domain, and processing of the very
noisy speech signal to obtain the incoming elevation and azimuth of the speech signal is
performed. (2) The servomechanism system tracks the positional change of an arbitrarily moving
speaker by orienting the imaging device or the related sensor or device in the direction indicated
by the servo signal. (3) The three microphones and the two additional microphones described
above constitute a five-element receiver array system to attenuate interference received from
other directions. According to another preferred embodiment of the present invention, there is provided an apparatus having a detection unit that detects the position of a sound source. Here, the detection unit includes three audio sensors, disposed at mutually different positions not on one straight line, that detect audio signals; a first signal processing unit that receives the outputs of two of the three audio sensors forming a first combination and calculates a first angle related to the direction of arrival of the audio signal detected by the detection unit; and a second signal processing unit that receives the outputs of two of the three audio sensors forming a second combination and calculates a second angle related to the direction of arrival of the audio signal detected by the detection unit. The first combination is different from the second combination. Further, the first angle and the second angle are independent of each other and define the direction of arrival (DOA) of the detected audio signal with respect to a plane containing the detection points at which the three audio sensors detect the audio signal.
Furthermore, the device according to the above embodiment includes a unit having directional
characteristics, a mechanism for movably supporting the unit, and a control unit that controls the
operation of the mechanism. The control unit receives an output indicating the first and second
angles of the detected audio signal, and controls the operation of the mechanism using the
output. Here, the unit having the directional characteristic may include a sensor with directivity.
Further, the control unit controls the mechanism to direct a unit having the directivity to a sound
source from which a detected sound signal is emitted. The apparatus according to the above
embodiment may include an imaging device such as a camera as a unit having directivity. In the
device according to the above-described embodiment, each of the first and second signal
processing units may convert the time-series signal detected by the audio sensor into a frequency
domain. Further, each of the first and second signal processing units converts the outputs from
the corresponding two audio sensors into a frequency domain, and detects the phase difference
between the two outputs using the value after the frequency conversion. It may be configured as
follows. Further, in the above embodiment, one of the three audio sensors may be disposed at the intersection of two orthogonal directions, another may be disposed along one of the two directions, and the last may be arranged along a direction different from that one direction. According to still
another preferred embodiment of the present invention, there is provided an apparatus having
an audio receiver capable of suppressing interference coming from a direction different from the
direction of the main sound source. The apparatus includes five audio sensors, disposed at mutually different positions, that detect five audio signals each including a target component from the main sound source and an interference component superimposed on the target component, and a signal processing unit that receives the outputs from the audio sensors and separates the interference component from the target component. The five audio sensors are divided into a first group and a second group; each group includes three of the five audio sensors, and the audio sensors of the first and second groups are disposed along mutually orthogonal first and second directions, respectively. In the device of the above embodiment, the signal processing unit may include a first conversion unit that converts the detected audio signals into the frequency domain, an operation unit that acquires the target component in the converted frequency domain, and a second conversion unit that converts the acquired frequency-domain target component into the time domain and outputs an audio signal of the target component from which the interference component has been separated.
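The first-conversion/operation/second-conversion pipeline described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the DFT is written out directly rather than calling an FFT library, and the per-bin "operation" is a simple mask that keeps the bins of the target component, standing in for the embodiment's interference separation.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (first conversion unit)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Naive inverse DFT returning the real part (second conversion unit)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)).real / N
            for n in range(N)]

def extract_target(x, weights):
    """DFT, apply per-bin weights (the operation unit), then inverse DFT."""
    X = dft(x)
    Y = [w * Xk for w, Xk in zip(weights, X)]
    return idft(Y)

N = 64
# Target: a 4-cycle cosine; "interference": a 12-cycle cosine superimposed on it.
x = [math.cos(2 * math.pi * 4 * n / N) + 0.5 * math.cos(2 * math.pi * 12 * n / N)
     for n in range(N)]
# Keep only bins 4 and N-4 (the target component); zero everything else.
weights = [1.0 if k in (4, N - 4) else 0.0 for k in range(N)]
y = extract_target(x, weights)
print(round(y[0], 3))  # the 12-cycle interference is removed; y[0] is cos(0) = 1.0
```

A real implementation would use an FFT and derive the per-bin weights from the detected arrival direction, as described in the embodiments below.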
According to another preferred embodiment of the present invention, there is provided a voice
detector with directivity. This directional voice detection apparatus includes at least two audio sensors that detect voice signals from a sound source, and a signal processing unit that receives the outputs from two of the at least two audio sensors, converts each received output into the frequency domain, and estimates the arrival direction of the detected voice signal using the converted outputs. The arrival direction is defined by the angle between the incident direction of the detected audio signal and the straight line passing through the two detection points of the two audio sensors. According to yet another preferred embodiment
of the present invention, an apparatus for attenuating interference is provided. The apparatus for
attenuating interference comprises at least three audio sensors that detect an audio signal including a target component from a sound source and an interference component corresponding to an audio signal arriving from a direction different from the direction toward the sound source, and a signal processing unit that receives the outputs from three of the audio sensors, converts each received output into the frequency domain, acquires the target component in the converted frequency domain, converts the acquired target component into the time domain, and outputs an audio signal of the target component from which the interference
component has been separated. Here, the three audio sensors are arranged along one straight-line direction. The voice detection apparatus having directivity according to the above embodiment may further comprise a unit having directivity, a mechanism for movably supporting the unit having directivity, and a control unit for controlling the operation of the mechanism. Here, the control unit receives an output, from the voice detection apparatus having the directivity, indicating the arrival direction of the detected voice signal, and uses the received output for operation control of the mechanism. DESCRIPTION OF THE PREFERRED
EMBODIMENTS FIG. 1 is a schematic view of a system (apparatus 1) for detecting the direction of arrival (DOA) of speech. The apparatus 1 receives as inputs the signals from the three microphones M1, M2 and M3, and includes a signal processing unit 10 that calculates and outputs the azimuth and elevation angles θ and φ, which determine the incident angle of the incoming voice signal. This process is described in detail below. Note that, as described in more detail below, the arrival direction determined by the calculated or estimated angles θ and φ provides an input for determining the operation of an actuator, such as a servomotor, that moves or positions a device such as an imaging device toward the sound source determined by the arrival direction.
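The servomotor input mentioned above can be sketched as a mapping from estimated angles to actuator commands. This is a hypothetical interface, not part of the disclosure; the 1000-2000 µs pulse-width range is a common hobby-servo convention used here purely for illustration.

```python
def angle_to_pulse_us(angle_deg, lo_deg=-90.0, hi_deg=90.0, lo_us=1000.0, hi_us=2000.0):
    """Map an angle to a servo pulse width in microseconds, clamping to the
    mechanical range. Range values are illustrative, not from the patent."""
    a = max(lo_deg, min(hi_deg, angle_deg))
    return lo_us + (a - lo_deg) * (hi_us - lo_us) / (hi_deg - lo_deg)

def point_camera(azimuth_deg, elevation_deg):
    """Return (pan, tilt) pulse widths for a two-axis mount (hypothetical API)."""
    return angle_to_pulse_us(azimuth_deg), angle_to_pulse_us(elevation_deg)

pan, tilt = point_camera(30.0, -45.0)
print(round(pan, 1), tilt)
```

In a real system these commands would be rate-limited so the camera does not jerk toward each new estimate, a point the first example below returns to.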
The operation of the signal processing unit 10, which may be illustrated as a sequence of signal processing steps or processes, will now be described. FIG. 2 shows the geometrical locations of the three microphones M1, M2 and M3. Two of them define the x-axis, at arbitrary coordinates (0, 0, 0) and (dx, 0, 0). The third microphone is disposed on the negative z-axis at (0, 0, −dz). The microphones are shown as M1, M2 and M3 in FIG. 3(A) and FIG. 3(B), respectively. The arrival direction of the voice signal can be characterized by the azimuth and elevation angles θ and φ, respectively. The goal of the first stage is the estimation of θ and φ. This is done by converting the somewhat complex configuration of FIG. 2 into two simpler configurations, FIGS. 3(A) and 3(B), which are in practice quite similar. Now the relationship between φx, θz, θ and φ is calculated. Comparing FIG. 2 with FIG. 3(A) and FIG. 3(B) gives the following relations [equation images omitted; Equations (1) to (3)]. The estimation algorithms for φx and θz are very similar; therefore, only the first is described here. The signals arriving at the microphone elements M1 and M2 are denoted x1(t) and x2(t), respectively, and with τφ denoting the propagation delay between the signals at the two microphone elements along the arrival direction of the voice signal, they can be expressed as follows [equation images omitted; Equation (4) defines τφ]. In the
frequency domain, these signals become X(f) and X(f)exp(−j2πfτφ), and the phase difference between them is ψ(f) = 2πfτφ.
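Since ψ(f) = 2πfτφ is linear in frequency, the delay can be recovered from phase measurements at a few frequencies by a least-squares fit of the slope. The sketch below does this for synthetic complex tones; the frequencies and delay are illustrative, and phase unwrapping is avoided by keeping 2πfτφ below π.

```python
import cmath
import math

fs = 10000.0                       # sampling frequency (Hz), as in the numerical example
true_tau = 1.0 / fs                # delay of one sample, illustrative
freqs = [1500.0, 2500.0, 3500.0]   # probe frequencies within [fl, fh]

# Phase difference psi(f) = 2*pi*f*tau, measured from the pair
# X(f) and X(f)*exp(-j*2*pi*f*tau).
psis = []
for f in freqs:
    x1 = cmath.exp(1j * 0.3)                       # arbitrary common phase
    x2 = x1 * cmath.exp(-2j * math.pi * f * true_tau)
    psis.append(cmath.phase(x1 * x2.conjugate()))  # equals 2*pi*f*tau while below pi

# Least-squares slope through the origin:
# psi = 2*pi*tau*f  =>  tau = sum(f*psi) / (2*pi*sum(f^2))
tau_hat = sum(f * p for f, p in zip(freqs, psis)) / (2 * math.pi * sum(f * f for f in freqs))
print(round(tau_hat * fs, 6))  # recovered delay, in samples
```

Averaging the phase over a band, as the document does between fl and fh, serves the same purpose as this fit: it suppresses per-frequency noise in the phase estimate.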
This indicates that there is a linear relationship between phase and frequency: if phase information is obtained at one or more frequencies, the delay τφ, and from it the angle φx, can be determined. A similar procedure can be applied to θz. Once φx and θz are known, the angles φ and θ are calculated from Equations (1) and (3). FIG. 4 shows a schematic diagram of a system, corresponding to the signal processing unit 10 of FIG. 1, for finding the direction of an incoming signal. In order to estimate accurate values of φ and θ, an averaging process is performed on the detected phase output between the two frequencies fl and fh. The constants used in FIG. 1 are kx = c/(π(fl + fh)dx) and kz = c/(π(fl + fh)dz), and the function is f(φ, θz) = tan⁻¹(sin φ / tan θz). As a numerical example for the audio signal, fl = 1.5 kHz and fh = 3.5 kHz, i.e., a center frequency of (fl + fh)/2 = 2.5 kHz; the average phase difference is referred to this center frequency. By multiplying by the coefficients shown in FIG. 1 and applying an inverse sine function, φx and θz are calculated. In the final stage, Equations (1) and (3) are used to obtain the azimuth and elevation angles φ and θ, respectively. As a numerical example, an input signal in the time domain is considered [equation images omitted]. The above is the signal at the common microphone element M1, where n1(t) is random noise of variance σn². The average power of the pure signal x(t) is 7.327 dB, and σn² is assumed to be, for example, −2.673 dB, i.e., a signal-to-noise ratio (SNR) of 10 dB. The signals at microphone elements M2 and M3 can be written similarly [equation images omitted], where τφ is defined by Equation (4), τθ = (dz/c)·sin θz, and n2(t) and n3(t) are independent random noise signals with the same variance σn². FIG. 5 shows the amplitude of the frequency response of the signal received at each microphone element, with φ = 30° and θ = 110°, corresponding (for c = 340 m/s) to φx = −60° and θz = −36°, a sample frequency fs = 10000 Hz, and dx = dz = c/fs. As the noise power decreases, the curves of FIG. 5 converge to a single shape. The phase differences of the signal pairs x2−x1 and x3−x1 are shown in FIG. 6. In the absence of any noise, these curves would be two straight lines passing through the origin. As a result of this simulation, 29.4° and 109.8° were obtained as the estimates of φ and θ, respectively. An interference suppression method and
apparatus using five signal receiving elements in another embodiment to which the present
invention is applied will be described below. According to the present embodiment, by knowing
the DOA of the incoming speech signal, it is possible to operate the servo mechanism to direct
the camera to the speech source. Here, five microphones arranged in two dimensions are used
for interference suppression and directional beamforming (hereinafter abbreviated as
beamforming). FIG. 8 shows the proposed structure for the two-dimensional beamforming process. Here, it is assumed that at least one interference signal arrives from the direction (φi, θi). The
problem is divided into two simpler ones as described above, and only the three microphone elements M1, M2 and M4 are considered in order to illustrate the basic theory of interference suppression in the frequency domain. Furthermore, in the present embodiment interference is considered explicitly: instead of the random-noise component of the previous embodiment, a single interference signal is assumed. Thus, the signal received by the first microphone element M1 is again denoted x1(t) and can be written as [equation image omitted], where so1(t) and si1(t) are the target signal and the signal interfering with it, respectively. The time delays τo2 and τo4 of the target signal received at M2 and M4, relative to microphone element M1, are expressed as [equation images omitted]. Thus, the time functions of the target signal at M2 and M4 can be written as [equation images omitted]. Similar equations apply to the interference at M2 and M4 [equation images omitted], where φxi is the angle of arrival (AOA) of the interference signal, and τi2 and τi4 are the interference delays at M2 and M4 relative to microphone element M1. Thus, the signals at M2 and M4 can be expressed as [equation images omitted]. Taking the Fourier transforms of Equations (9), (16) and (17), and substituting Equations (12) to (15), the following equations are obtained [equation images omitted], where So1(f), So2(f), So4(f), Si1(f), Si2(f) and Si4(f) are the Fourier transforms of so1(t), so2(t), so4(t), si1(t), si2(t) and si4(t), respectively.
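In the embodiment, the interference direction is unknown and the resulting per-bin system is nonlinear. As a simplified illustration only, if both the target and interference steering phases are assumed known, each frequency bin reduces to a 2×2 linear system in So(f) and Si(f) that can be solved in closed form; all values below are synthetic.

```python
import cmath

def solve_bin(X1, X2, a2, b2):
    """Solve X1 = So + Si and X2 = a2*So + b2*Si for (So, Si) in one frequency bin,
    given known target and interference steering factors a2, b2 at microphone M2."""
    det = b2 - a2                 # requires distinct steering phases
    So = (b2 * X1 - X2) / det
    Si = (X2 - a2 * X1) / det
    return So, Si

# Synthetic bin: target So = 2+1j, interference Si = -1+0.5j.
So_true, Si_true = 2 + 1j, -1 + 0.5j
a2 = cmath.exp(-1j * 0.4)   # e^{-j*2*pi*f*tau_o2}, illustrative
b2 = cmath.exp(-1j * 1.9)   # e^{-j*2*pi*f*tau_i2}, illustrative
X1 = So_true + Si_true
X2 = a2 * So_true + b2 * Si_true

So, Si = solve_bin(X1, X2, a2, b2)
print(abs(So - So_true) < 1e-9, abs(Si - Si_true) < 1e-9)
```

The document's three-equation formulation additionally treats the interference phase φxi(f) as an unknown, which is why it needs the third microphone's equation; the closed-form 2×2 solve above shows only the per-bin linear core of the idea.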
The complex phase functions φx(f) and φxi(f) are given by the following equations [equation images omitted]. With the three equations (8), (19) and (20), there are three unknown complex quantities that must be obtained by solving them: So1(f), Si1(f) and φxi(f). The desired direction φx, or φx(f), is the main parameter of the beamforming and is a known value and function. Solving yields the following result [equation image omitted; Equation (23)]. If So1(f) is calculated from Equation (23), its inverse Fourier transform so1(t), that is, the signal of interest, is separated from the interference si1(t). With some knowledge of the frequency-domain features of the incoming signal, the amount of computation required to obtain So1(f) in Equation (23) can be reduced. In general, since there is no limitation on bandwidth, broadband interference suppression can be realized with the method according to the present embodiment. An exemplary configuration for realizing the interference canceller according to the present embodiment is shown in FIG. 9. At the outputs of the sensors (microphones) M1 to M5, the
analog-to-digital converters (ADCs) 920-1 to 920-5 respectively sample the corresponding received signals and send them to input buffers 940-1 to 940-5. Assuming that K samples are acquired during each interval of T seconds at each sensor (microphone) M1 to M5, a K-point FFT (fast Fourier transform) is applied to these time samples. The inputs and outputs of the FFT blocks 960-1 to 960-5 are denoted xik and Xik, respectively, where i = 1, 2, 3, 4, 5 and k = 1, 2, ..., K. As the main calculation, Equation (23) is evaluated for each of the K frequency components in the blocks labeled Bkx and Bkz, k = 1, 2, ..., K, shown in the figure.
For example, the angles φx and θz of the target signal, calculated in the process described with reference to FIG. 4 above, are converted into the phases φxk and φzk (not shown) of the target signal according to the following equations [equation images omitted]. The sampling frequency is given by [equation image omitted]. Here, when the input signal is real-valued, the symmetry of the FFT output can of course be used to halve the amount of calculation. With Bkx and Bkz, k = 1, 2, ..., K, the outputs of the blocks shown in FIG. 9 are as follows [equation images omitted].
Specific examples of preferred embodiments of the invention are described below. First Example
FIG. 10 shows an application to which the proposed method is applicable for automatic tracking
and directional voice reception. In this example, a directional voice reception function for an
imaging device (for example, a digital camera) mounted on a personal computer is provided. As
shown in the figure, the imaging device of this example is mounted on a servo mechanism that
drives the imaging device at an azimuth angle and an elevation angle. Five microphones are
mounted at regular intervals on the frame of the display. It is pointed out here that equidistant
mounting is for the purpose of simplification. The processing of the first stage of the algorithm of this embodiment can be repeated, for example, once every second, under the assumption that no large movement of the speaker occurs within this period. In practice, the repetition rate is limited by the mechanical system activated in the second stage and by the discomfort that sudden movements of the imaging device would cause.
Speech enhancement is a third stage function that includes all five microphones. Second Example
FIG. 7 is a block diagram of a video conferencing apparatus as a second example in the
embodiment of the present invention. The video conference apparatus includes microphones M1
to M5, a DOA calculation unit 10, an interference canceller 90, a camera 710, an actuator 30,
and an actuator control unit 20. The DOA calculation unit 10 receives the outputs from the microphones and calculates and outputs the arrival directions φ and θ of the audio signal using the above-described first embodiment (three microphones). The interference canceller 90 receives the outputs from the microphones M1 to M5, and φ and θ from the DOA calculation unit 10, in order
to perform interference suppression using the other embodiment (five microphones) described
above. The actuator control unit 20 receives φ and θ from the DOA calculation unit 10 and outputs control signals to the actuator 30 so that the actuator 30 directs the camera 710 toward the source of the sound signal captured by the microphones M1 to M5. The video conferencing apparatus further comprises a display 720, speakers 730, an MPEG processing unit 700, a bus 790, and a network interface 792 for interfacing with the Internet 794. The MPEG processing unit 700 includes an MPEG audio encoding unit 701 that receives and encodes the interference-suppressed audio signal output from the interference canceller 90, an MPEG video encoding unit 702 that receives and encodes the video signal from the camera 710, an MPEG multiplexer 703 that multiplexes the encoded audio and video signals, a packet assembler 704 that assembles packets from the multiplexed data output by the MPEG multiplexer 703, and a bus interface 709 for interfacing with the bus 790. The MPEG processing unit 700 further includes a packet disassembler 708 that decomposes packets received from the outside via the bus interface 709, an MPEG demultiplexer 707 that demultiplexes the data carried by the received packets, an MPEG video decoding unit 705 that outputs decoded video data to the display 720, and an MPEG audio decoding unit 706 that outputs decoded audio data to the speaker 730. The video conferencing apparatus further includes
a mouse 740, a keyboard 750, an input / output interface 760, a memory 770, and a CPU 780
that executes a program for realizing the video conferencing function.
According to the video conference apparatus of FIG. 7 described above, video conferencing can be performed with appropriate imaging of the talkers and with clear speech signals containing little interference. Although the present invention has been described with reference to preferred embodiments with a certain degree of particularity, the present invention can be modified, changed, combined, or sub-combined. Thus, it is to be understood that changes may be made in a manner different from the specific description provided above without departing from the scope of the invention. For example, in the preferred embodiment shown in FIG. 1 above, the directional voice receiving system or method of the present invention has been described as providing the actuator control unit 20 with the angles φ and θ as inputs. However, the actuator control unit 20 and the actuator 30 may be omitted, simplifying an alternative system, method, device or means that performs similar functions. Further, in the second example of the present embodiment described above, the DOA estimation block (signal processing unit 10) and the interference suppression block 90 are shown as separate devices in the configuration of FIG. 7; these functional blocks may instead be integrated in the same block or physical structure, such as a single encapsulation or package. As a practical application example, the automatic voice tracking technology according to the present invention can be adopted in a personal computer such as a notebook computer. Furthermore, the estimate of the direction of arrival of the voice can be input to the actuator control unit and finally to the actuator to change the position of a device connected to the actuator, for example an imaging device such as a camera. According to the preferred
embodiment of the invention described above, a simple method based on speech processing is
provided which provides automatic tracking of the sound source. Three receive elements are
used for DOA estimation, with five elements emphasizing the sound from the direction to the
speech source. In this method, the number of microphones required for signal processing in the
three-dimensional frequency domain is minimized. BRIEF DESCRIPTION OF THE DRAWINGS FIG.
1 is a schematic diagram illustrating a system or apparatus 1 for detecting a direction of arrival
(DOA) of an audio signal according to a preferred embodiment of the present invention. FIG. 2
shows a three-dimensional configuration with the geometrical configuration of three
microphones M1, M2 and M3 in relation to the direction of arrival (DOA) of the audio signal
according to a preferred embodiment of the present invention. FIG. 3(A) shows one of the two simpler two-dimensional configurations into which the direction of arrival (DOA) of the audio signal is resolved, in relation to the geometry of the three microphones M1, M2 and M3, according to a preferred embodiment of the present invention. FIG. 3(B) shows the other of the two simpler two-dimensional configurations into which the direction of arrival (DOA) of the speech signal is resolved, in relation to the geometry of the three microphones M1, M2 and M3, according to a preferred embodiment of the present invention. FIG. 4 is a schematic
diagram of a system for locating an incoming signal corresponding to the signal processor 10 of
FIG. 1 according to a preferred embodiment of the present invention. FIG. 5 shows the amplitude
of the frequency response of the input signal, with c = 340 m/s, sampling frequency fs = 10000 Hz, dx = dz = c/fs, and θ = 110°, according to a preferred embodiment of the present
invention. FIG. 6 shows the phase difference between the signals received by the three
microphones M1, M2 and M3 according to a preferred embodiment of the present invention. FIG.
7 shows a block diagram of a videoconferencing device as a second example of a preferred
embodiment of the present invention. FIG. 8 shows a proposed structure for two-dimensional
beamforming according to a preferred embodiment of the present invention. FIG. 9 is a diagram
showing a configuration when an interference canceller according to a preferred embodiment of
the present invention is realized. FIG. 10 shows an application to which the proposed method of
automatic tracking and directional voice reception according to a preferred embodiment of the
present invention is applicable. DESCRIPTION OF CODES: 1 ... system or apparatus; 10 ... signal processing unit; 20 ... actuator control unit; 30 ... actuator; 90 ... interference canceller; 202-1, 202-2, 202-3 ... analog-to-digital conversion units; 204-1, 204-2 ... input buffer units; 206-1, 206-2, 206-3 ... FFT conversion units; 701 ... MPEG audio encoding unit; 702 ... MPEG video encoding unit; 703 ... MPEG multiplexer; 704 ... packet assembler; 705 ... MPEG video decoding unit; 706 ... MPEG audio decoding unit; 707 ... MPEG demultiplexer; 708 ... packet disassembler; 709 ... bus interface; 710 ... camera; 720 ... display device; 730 ... speaker; 740 ... mouse; 750 ... keyboard; 760 ... input/output interface; M1, M2, M3, M4, M5 ... microphones.