JP2012165189

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2012165189
An object of the present invention is to provide a zoom microphone device capable of realizing
narrow directional speech enhancement technology having directivity that is sharper than that of
the prior art. SOLUTION: The device includes one support structure, 2N fixed reflectors (N is an
integer of 1 or more), and 2N microphone arrays in each of which a plurality of microphones are
linearly arranged. One microphone array is attached to the surface of each fixed reflector so
that the arrangement direction of the microphone array is parallel to that surface. In each set
of two fixed reflectors, the surfaces to which the microphone arrays are attached form an angle
of 90 degrees with each other, and the two fixed reflectors are fixed facing each other so that
the arrangement direction of the microphone array attached to one fixed reflector forms 90
degrees with the arrangement direction of the microphone array attached to the other fixed
reflector. N such sets of fixed reflectors are attached to the support structure.
[Selected figure] Fig. 18
Zoom microphone device
[0001]
The present invention relates to a zoom microphone device that implements technology for
enhancing speech in a narrow range including a desired direction (narrow directional speech
enhancement technology).
[0002]
For example, in the case of taking a zoom-in image of a subject with a video camera (video
04-05-2019
1
camera or camcorder) equipped with a microphone, it is preferable for moving image shooting
that voices from only the vicinity of the subject are enhanced in conjunction with the zoom-in
shooting.
Technology for emphasizing speech in a narrow range including such a desired direction (target
direction), namely narrow directional speech enhancement technology, has long been researched
and developed. The relationship between the direction around a microphone and the sensitivity
of the microphone is called directivity. The sharper the directivity in a certain direction, the
narrower the range including that direction in which voice is emphasized, and the more voice
outside that range is suppressed. Here, as prior art related to narrow directional speech
enhancement technology, narrow directional speech enhancement by selectively collecting
reflected sound is described. Note that, in this specification, "voice" is not limited to human
voices but refers to sounds in general, including musical tones and environmental noise as well
as human and animal voices.
[0003]
<Narrow-Directed Speech Enhancement Technique by Selectively Collecting Reflected Sound> A
representative example of this technique is the multi-beam forming method (see Non-Patent
Document 1). The multi-beam forming method is a narrow directional speech enhancement
technology that picks up voice from the target direction at a high SN ratio by collecting the
direct sound and the reflected sounds individually. It has been studied more extensively in the
wireless field than in the speech field.
[0004]
The processing of the multi-beam forming method in the frequency domain is described below.
Before the explanation, the symbols are defined. Let ω be the index of the frequency and k the
index of the frame number. Let the frequency domain representation of the analog signals
received by M microphones be X <→> (ω, k) = [X 1 (ω, k),..., X M (ω, k)] <T>. For the sound
source in the direction θs to be emphasized, let the arrival direction of the direct sound be
θs1 and the arrival directions of the reflected sounds be θs2,..., θsR. T represents
transposition, and R-1 is the total number of reflected sounds. Let W <→> (ω, θsr) be a filter
that emphasizes the voice in the direction θsr. Here, r is an integer satisfying 1 ≦ r ≦ R.
[0005]
In the multi-beam forming method, it is premised that the arrival directions and arrival times of
direct sound and reflected sound are known. In other words, the number of objects such as walls,
floors and reflectors for which the reflection of sound can be clearly predicted is equal to R-1.
Also, the number of reflected sounds R-1 is often set to a relatively small value of 3 or 4. This is
based on the high correlation between the direct sound and the low-order reflected sound. The
multi-beam forming method emphasizes each sound individually and adds the results
synchronously, so the output signal Y (ω, k, θs) is given by equation (1). H represents the
Hermitian transpose.
[0006]
The delay-and-sum method is described here as a design method for the filter W <→> (ω, θsr).
Assuming that the direct sound or the reflected sound is a plane wave, the filter W <→> (ω, θsr)
is given by equation (2). h <→> (ω, θsr) = [h1 (ω, θsr),..., hM (ω, θsr)] <T> is a propagation
vector of the voice coming from the direction θsr.
[0007]
Assuming that a plane wave arrives at a linear microphone array (a microphone array in which M
microphones are linearly arranged), the element hm (ω, θsr) constituting h <→> (ω, θsr) is
given by equation (3). m is an integer that satisfies 1 ≦ m ≦ M. c represents the speed of
sound, and u represents the distance between adjacent microphones. j is the imaginary unit. τ
(θsr) represents the time delay, relative to the direct sound, of the reflected sound coming
from the direction θsr.
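Equations (2) and (3) are not reproduced in this translation. The following is a minimal sketch of a plane-wave propagation vector and a delay-and-sum filter under assumed forms: h_m = exp(-jω((m-1)u cos θ / c + τ)) with the first microphone as phase reference, and W = h / (h^H h). The function names, the reference point, and the value of c are assumptions for illustration, not the patent's definitions.

```python
import numpy as np

C_SOUND = 340.0  # assumed speed of sound c [m/s]

def propagation_vector(omega, theta, M, u, tau=0.0, c=C_SOUND):
    """Plane-wave propagation vector h(omega, theta) of a linear array.

    Assumed form: h_m = exp(-1j*omega*((m-1)*u*cos(theta)/c + tau)), m = 1..M,
    where tau is the delay of this path relative to the direct sound.
    """
    m = np.arange(M)  # m - 1 for m = 1..M
    return np.exp(-1j * omega * (m * u * np.cos(theta) / c + tau))

def delay_and_sum_filter(h):
    """Delay-and-sum filter W = h / (h^H h), so that W^H h = 1."""
    return h / np.vdot(h, h).real

omega = 2 * np.pi * 1000.0
h = propagation_vector(omega, np.pi / 4, M=8, u=0.04, tau=1e-3)
W = delay_and_sum_filter(h)
response = np.vdot(W, h)  # W^H h: gain toward the look direction
```

With this normalization the filter passes a plane wave from its own look direction with unit gain, which is what allows the R individually emphasized signals to be added coherently.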
[0008]
Finally, by converting the output signal Y (ω, k, θs) into the time domain, a signal emphasizing
the voice of the sound source in the target direction θs can be obtained.
[0009]
The functional configuration of narrow directional speech enhancement by the multi-beam
forming method is shown in FIG. 1.
[0010]
Step 1 The AD conversion unit 110 converts the analog signals output by the M microphones
100-1, ..., 100-M into a digital signal x <→> (t) = [x1 (t), ..., xM (t)] <T>.
Here, t represents an index of discrete time.
[0011]
Step 2 The frequency domain transform unit 120 transforms the digital signal of each channel
into a frequency domain signal by a method such as the fast discrete Fourier transform.
For example, for the m-th (1 ≦ m ≦ M) microphone, N samples x m ((k−1) N + 1),..., x m (k N)
are stored in a buffer. N is about 512 in the case of 16 kHz sampling. By applying fast
discrete Fourier transform processing to the M-channel digital signals stored in the buffer,
the frequency domain signals X <→> (ω, k) = [X 1 (ω, k),..., X M (ω, k)] <T> are obtained.
[0012]
Step 3 Each emphasis filtering unit 130-r (1 ≦ r ≦ R) applies the filter W <→> <H> (ω, θsr)
for the direction θsr to the frequency domain signal X <→> (ω, k) = [X 1 (ω, k),..., X M (ω, k)]
<T> and outputs a signal Zr (ω, k) in which the voice in the direction θsr is emphasized. That
is, each emphasis filtering unit 130-r (1 ≦ r ≦ R) performs the process represented by
equation (4).
[0013]
Step 4 The addition unit 140 receives the signals Z1 (ω, k),..., ZR (ω, k) and outputs an addition
signal Y (ω, k). The addition process is expressed by equation (5).
[0014]
Step 5 The time domain transformation unit 150 transforms the addition signal Y (ω, k) into the
time domain and outputs the time domain signal y (t) in which the voice in the direction θs is
emphasized.
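Steps 2 to 5 can be sketched as follows. This is an illustrative simplification, not the patent's implementation: it assumes the AD conversion of Step 1 has already been done, uses non-overlapping rectangular frames of N samples, and takes the filters as a precomputed array. The name multibeam_enhance and the array layout are hypothetical.

```python
import numpy as np

def multibeam_enhance(x, filters, N=512):
    """Frame-wise multi-beam enhancement.

    x       : (M, T) real array, digital signals of M microphones (Step 1 output)
    filters : (R, N//2 + 1, M) complex array, W(omega, theta_sr) per path r and bin
    returns : (K*N,) real array y(t), where K = T // N whole frames are processed
    """
    M, T = x.shape
    K = T // N
    y = np.zeros(K * N)
    for k in range(K):
        frame = x[:, k * N:(k + 1) * N]           # Step 2: buffer N samples per channel
        X = np.fft.rfft(frame, axis=1).T          # X(omega, k), shape (N//2 + 1, M)
        # Step 3: Z_r(omega, k) = W^H(omega, theta_sr) X(omega, k) for each path r
        Z = np.einsum('rfm,fm->rf', filters.conj(), X)
        Y = Z.sum(axis=0)                         # Step 4: add the R emphasized signals
        y[k * N:(k + 1) * N] = np.fft.irfft(Y, n=N)  # Step 5: back to the time domain
    return y

# Toy check: two identical channels, two "paths" that each pass half of one channel.
rng = np.random.default_rng(0)
s = rng.standard_normal(1024)
x = np.stack([s, s])
filters = np.zeros((2, 257, 2), dtype=complex)
filters[0, :, 0] = 0.5
filters[1, :, 1] = 0.5
y = multibeam_enhance(x, filters, N=512)
```

A practical implementation would normally use overlapping windowed frames; that refinement is omitted here to keep the correspondence with Steps 2 to 5 visible.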
[0015]
(Non-Patent Document 1) J. L. Flanagan, A. C. Surendran, E. E. Jan, "Spatially selective sound
capture for speech and audio processing," Speech Communication, Volume 13, Issue 1-2, pp.
207-222, October 1993.
[0016]
According to the narrow directional speech enhancement technology described above, the voice
in the target direction can be picked up at a high SN ratio so as not to be buried in voice from
other directions, and voice in any direction can be emphasized without physically moving the
microphone, but it is difficult to achieve narrow directivity.
In particular, the human voice contains many frequency components from about 100 Hz to about
2 kHz, but with the above-mentioned prior art it is difficult to realize, in such a low
frequency band, sharp directivity of about ±5° to ±10° with respect to the target direction.
Under these circumstances, there has been no device suitable for realizing narrow directional
speech enhancement technology that picks up sound at a sufficient SN ratio and follows voice
in any direction without requiring physical movement of the microphone, while having
directivity toward the desired direction that is sharper than before.
[0017]
Therefore, it is an object of the present invention to provide a zoom microphone device capable
of realizing narrow directional speech enhancement technology having directivity that is sharper
than that of the prior art.
[0018]
The zoom microphone device of the present invention includes one support structure, 2N fixed
reflectors (N is an integer of 1 or more), and 2N microphone arrays in which a plurality of
microphones are linearly arranged.
In the zoom microphone device of the present invention, one microphone array is attached to the
surface of each fixed reflector so that the surface of the fixed reflector is parallel to the
arrangement direction of the microphone array. Two fixed reflectors are fixed facing each other
so that the surfaces on which their microphone arrays are mounted form 90 degrees with each
other and the arrangement direction of the microphone array mounted on one fixed reflector
forms 90 degrees with the arrangement direction of the microphone array mounted on the other
fixed reflector. N sets of fixed reflectors fixed in this way are attached to the support
structure.
[0019]
According to the zoom microphone device of the present invention, it is possible to realize
narrow directional speech enhancement technology having directivity that is sharper than
before.
[0020]
FIG. 1 is a diagram showing the functional configuration of narrow directional speech
enhancement by the multi-beam forming method, as an example of the prior art.
FIG. 2(a) is a diagram schematically showing that sufficiently narrow directivity cannot be
realized when only direct sound is considered; FIG. 2(b) is a diagram schematically showing
that sufficiently narrow directivity can be realized when direct sound and reflected sound are
considered.
FIG. 3 is a diagram showing the direction dependency of coherence according to the prior art
and according to the principle of the present invention.
FIG. 4 is a diagram showing the functional configuration of the narrow directional speech
enhancement apparatus according to the embodiment.
FIG. 5 is a diagram showing the processing procedure of the narrow directional speech
enhancement method according to the embodiment.
FIG. 6 is a diagram showing the configuration of the first example.
FIG. 7 is a diagram showing an experimental result of the first example.
FIG. 8 is a diagram showing an experimental result of the first example.
FIG. 9 is a diagram showing the directivity of the filter W <→> (ω, θ) in the first example.
FIG. 10 is a diagram showing the configuration of the second example.
FIG. 11 is a diagram showing an experimental result of the second example.
FIG. 12 is a diagram showing an experimental result of the second example.
FIG. 13 is a diagram showing an implementation configuration example of the present
invention: (a) top view, (b) front view, (c) side view.
FIG. 14(a) is a side view showing another implementation configuration example of the present
invention; FIG. 14(b) is a side view showing another implementation configuration example of
the present invention.
FIG. 15 is a diagram showing a form of use of the implementation configuration example shown
in FIG. 14(b).
FIG. 16 is a diagram showing an implementation configuration example of the present
invention: (a) top view, (b) front view, (c) side view.
FIG. 17 is a side view showing an implementation configuration example of the present
invention.
FIG. 18 is a front view showing the configuration of the third example.
FIG. 19 is a side view showing the configuration of the third example.
FIG. 20 is a front view showing the configuration of the first modification of the third example.
FIG. 21 is a front view showing the configuration of the second modification of the third example.
FIG. 22 is a front view showing the configuration of the third modification of the third example.
FIG. 23 is a front view showing the configuration of the fourth modification of the third example.
FIG. 24 is a front view showing the configuration of the fifth modification of the third example.
[0021]
<< Principle >> The principle of the present invention will be described. The present invention is
based on microphone array technology, whose essential advantage is that it can follow voice in
any direction through signal processing, and it picks up sound at a high SN ratio by actively
utilizing reflected sound. One of its features is the combination of these with signal
processing technology that enables sharp directivity.
[0022]
Before the explanation, the symbols are defined again. Let ω be the index of discrete frequency
and k the index of the frame number. (Because the frequency f and the angular frequency ω are
related by ω = 2πf, the discrete frequency index ω may be identified with the angular
frequency ω; hereinafter, the "index of discrete frequency" ω is simply called "frequency".)
Let the frequency domain representation of the k-th frame of the analog signals received by M
microphones be X <→> (ω, k) = [X 1 (ω, k),..., X M (ω, k)] <T>, and let W <→> (ω, θs) be a
filter that emphasizes, at the frequency ω, the frequency domain representation of the voice
from the target direction θs as viewed from the center of the microphone array. M is an integer
of 2 or more. T represents transposition. Then the frequency domain signal (hereinafter
referred to as the output signal) Y (ω, k, θs), in which the frequency domain representation of
the voice from the target direction θs is emphasized at the frequency ω, is given by equation
(6). H represents the Hermitian transpose.
[0023]
Although "the center of the microphone array" can be determined arbitrarily, in general the
geometric center of the arrangement of the M microphones is taken as "the center of the
microphone array". For example, in the case of a linear microphone array, the midpoint between
the microphones at both ends is the center of the microphone array. In the case of a planar
microphone array arranged in an m × m square matrix (m <2> = M), the point where the diagonals
joining the microphones at the four corners intersect is the center of the microphone array.
[0024]
There are various design methods for the filter W <→> (ω, θs); here, the case based on the
minimum variance distortionless response (MVDR) method is described.
In the MVDR method, the filter W <→> (ω, θs) is designed, using the spatial correlation matrix
Q (ω), so that under the constraint of equation (8) the power at the frequency ω of voice from
directions other than the target direction θs (hereinafter also referred to as "noise") is
minimized (see equation (7)). a <→> (ω, θs) = [a1 (ω, θs),..., aM (ω, θs)] <T> is the transfer
characteristic at the frequency ω of the voice from the direction θs to each microphone
included in the microphone array.
[0025]
It is known that the filter W <→> (ω, θs) which is the optimum solution of the equation (7) is
given by the equation (9). (Reference 1) Simon Haykin, Suzuki Hiro et al., "Adaptive Filter
Theory", First Edition, Science and Technology Publishing Co., Ltd., 2001. pp. 66-73, 248-255
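Equations (7) to (9) are not reproduced in this translation. In the standard MVDR formulation, which the text appears to follow, the filter minimizes W^H Q W subject to W^H a = 1 and has the closed form W = Q^{-1} a / (a^H Q^{-1} a). The sketch below implements that standard form; the matrix Q and the vector a are random stand-ins chosen only to exercise the formula.

```python
import numpy as np

def mvdr_filter(Q, a):
    """Standard MVDR solution W = Q^{-1} a / (a^H Q^{-1} a).

    Minimizes the noise power W^H Q W subject to the distortionless
    constraint W^H a = 1 (Q Hermitian positive definite).
    """
    Qinv_a = np.linalg.solve(Q, a)       # Q^{-1} a without forming the inverse
    return Qinv_a / np.vdot(a, Qinv_a)   # divide by a^H Q^{-1} a

rng = np.random.default_rng(0)
M = 4
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Q = B @ B.conj().T + np.eye(M)           # stand-in Hermitian positive-definite Q(omega)
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # stand-in a(omega, theta_s)

W = mvdr_filter(Q, a)
constraint = np.vdot(W, a)               # W^H a, expected to equal 1
noise_power = np.vdot(W, Q @ W).real     # W^H Q W

# For comparison: another filter that also satisfies W^H a = 1.
W_ref = a / np.vdot(a, a).real
noise_power_ref = np.vdot(W_ref, Q @ W_ref).real
```

Any other filter meeting the same constraint, such as W_ref above, yields a noise power at least as large as that of the MVDR filter.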
[0026]
As the inverse of the spatial correlation matrix Q (ω) contained in equation (9) suggests, the
structure of the spatial correlation matrix Q (ω) is important in realizing sharp directivity.
Equation (7) also shows that the power of the noise depends on the structure of the spatial
correlation matrix Q (ω).
[0027]
Let a set to which the index p of the direction of arrival of noise belongs be {1, 2,..., P−1}. It is
assumed that the index s of the target direction θs does not belong to the set {1, 2,..., P−1}.
Assuming that P−1 noises come from any direction, the spatial correlation matrix Q (ω) is given
by equation (10a). From the viewpoint of making a sufficiently functional filter even in the
presence of a large amount of noise, P is preferably a somewhat large value, and is an integer of
about M. Here, from the viewpoint of clearly explaining the principle of the invention, the target
direction θs is described as if it is a specific direction (therefore, a direction other than the target
direction θs is the direction of “noise”). As will be apparent from the embodiments of the
present invention, in practice, the target direction θs is an arbitrary direction that can be the
target of speech enhancement, and a plurality of directions can generally be assumed as the
direction that can be the target direction θs. From this point of view, the distinction between
the target direction θs and the direction of the noise is essentially subjective. It is more
accurate to understand that P different directions are predetermined as the assumed arrival
directions of voice, regardless of whether the voice is target sound or noise, and that one
direction selected from these P directions is the target direction while the other directions
are noise directions. Then, letting Φ be the union of the set {1, 2,..., P−1} and the set {s},
the spatial correlation matrix Q (ω) is expressed by equation (10b) using the transfer
characteristics a <→> (ω, θφ) = [a1 (ω, θφ),..., aM (ω, θφ)] <T> of the voice from each
direction θφ (φ ∈ Φ) to each microphone. Note that | Φ | = P, where | Φ | represents the
number of elements of the set Φ.
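Equation (10b) is not reproduced in this translation. A minimal sketch, assuming it has the form Q(ω) = Σ_{φ∈Φ} a(ω, θφ) a^H(ω, θφ) and using direct-sound steering vectors of a linear array as stand-in transfer characteristics. The array parameters, the directions, and the choice P = M are illustrative assumptions.

```python
import numpy as np

C_SOUND = 340.0  # assumed speed of sound c [m/s]

def steering(omega, theta, M, u, c=C_SOUND):
    """Direct-sound steering vector, phase reference at the array center (assumed form)."""
    m = np.arange(1, M + 1)
    return np.exp(-1j * omega * u / c * (m - (M + 1) / 2) * np.cos(theta))

def spatial_correlation(vectors):
    """Assumed form of (10b): sum of outer products a a^H over all directions in Phi."""
    return sum(np.outer(a, a.conj()) for a in vectors)

M, u = 4, 0.04
omega = 2 * np.pi * 2000.0
target = np.pi / 4
noise_dirs = [np.pi / 2, 3 * np.pi / 4, np.pi]   # P - 1 = 3 noise directions, so P = M

a_s = steering(omega, target, M, u)
a_noise = [steering(omega, th, M, u) for th in noise_dirs]
Q = spatial_correlation([a_s] + a_noise)

# MVDR filter built from this Q (standard closed form).
Qinv_a = np.linalg.solve(Q, a_s)
W = Qinv_a / np.vdot(a_s, Qinv_a)

target_gain = abs(np.vdot(W, a_s))                  # |W^H a(theta_s)|
noise_gains = [abs(np.vdot(W, a)) for a in a_noise]  # |W^H a(theta_p)|
```

With P = M linearly independent transfer characteristics, Q is invertible and the resulting filter keeps unit gain toward the target while placing nulls on the assumed noise directions.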
[0028]
Here, the transfer characteristic a <→> (ω, θs) of the voice in the target direction θs and the
transfer characteristics a <→> (ω, θp) = [a1 (ω, θp),..., aM (ω, θp)] <T> of the voice in the
directions p ∈ {1, 2,..., P−1} are assumed to be orthogonal to each other. In other words, it is
assumed that there exist P orthogonal bases satisfying the condition expressed by equation
(11). The symbol ⊥ represents orthogonality: when A <→> ⊥ B <→>, the inner product of the
vector A <→> and the vector B <→> is zero. Here, P ≦ M is assumed. If the condition of
equation (11) is relaxed and it is assumed that there exist P bases that can be regarded as
approximately orthogonal, P is preferably about M or somewhat larger than M.
[0029]
At this time, the spatial correlation matrix Q (ω) can be expanded as shown in equation (12).
Equation (12) means that the spatial correlation matrix Q (ω) can be decomposed using the
matrix V (ω) = [a <→> (ω, θs), a <→> (ω, θ1),..., a <→> (ω, θP−1)] <T> and the unit matrix
Λ (ω). ρ is an eigenvalue, with respect to the spatial correlation matrix Q (ω), of the
transfer characteristic a <→> (ω, θφ) satisfying equation (11), and is a real number.
[0030]
At this time, the inverse matrix of the spatial correlation matrix Q (ω) is given by equation (13).
[0031]
Substituting equation (13) into equation (7), it can be seen that the noise power is minimized.
When the noise power is minimized, directivity for the target direction θs is realized. Therefore,
the establishment of orthogonality between transfer characteristics in different directions is an
important condition for achieving directivity with respect to the target direction θs.
[0032]
Hereinafter, the reason why it is difficult to achieve sharp directivity with respect to the target
direction θs in the prior art will be discussed.
[0033]
In the prior art, filters were designed on the assumption that the transfer characteristics
consisted of only direct sound.
In reality, sound emitted from the same source is also reflected by walls, the ceiling, and so
on before reaching the microphones, but such reflected sound was regarded as a factor that
degrades directivity, and its presence was ignored. Letting the steering vector of only the
direct sound arriving from the direction θ be h <→> d (ω, θ) = [hd1 (ω, θ),..., hdM (ω, θ)]
<T>, the conventional transfer characteristic a <→> conv (ω, θ) = [a1 (ω, θ),..., aM (ω, θ)]
<T> is a <→> conv (ω, θ) = h <→> d (ω, θ). The steering vector is a complex vector whose
elements are the phase response characteristics, at the frequency ω, of each microphone
relative to the reference point for a sound wave from the direction θ as viewed from the
center of the microphone array.
[0034]
Assuming that the voice arrives as a plane wave in the linear microphone array, the m-th element
hdm (ω, θ) that constitutes the steering vector h <→> d (ω, θ) of the direct sound is given by,
for example, equation (14a) . m is an integer that satisfies 1 ≦ m ≦ M. c represents the speed of
sound, and u represents the distance between adjacent microphones. j is an imaginary unit. The
reference point is the midpoint of the full length of the linear microphone array (the center
of the linear microphone array). The direction θ is defined as the angle formed by the arrival
direction of the direct sound and the arrangement direction of the microphones included in the
linear microphone array when viewed from the center of the linear microphone array (see FIG.
6). Note that there are various ways of expressing a steering vector. For example, if the
reference point is the position of the microphone at one end of the linear microphone array,
the m-th element hdm (ω, θ) of the steering vector h <→> d (ω, θ) of the direct sound is
given by, for example, equation (14b). Hereinafter, the m-th element hdm (ω, θ) constituting
the steering vector h <→> d (ω, θ) of the direct sound is assumed to be given by equation
(14a).
[0035]
The inner product value γconv (ω, θ) of the transfer characteristic in the direction θ and the
transfer characteristic in the target direction θs is expressed by equation (15). Note that θ ≠
θs.
[0036]
Hereinafter, γconv (ω, θ) is referred to as coherence. The direction θ in which the coherence
γconv (ω, θ) becomes 0 is given by equation (16). q is any integer except 0. Further, since 0
<θ <π / 2, the range of q is limited for each frequency band.
[0037]
In equation (16), only the parameters related to the size of the microphone array (M and u)
can be changed. Therefore, when the difference in direction (angle difference) | θ−θs | is
small, it is difficult to reduce the coherence γconv (ω, θ) without changing those
size-related parameters. In this case, the power of noise does not become sufficiently small
and, as schematically shown in FIG. 2(a), the directivity toward the target direction θs has a
wide beam width.
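Equations (14a), (15) and (16) are not reproduced in this translation. The sketch below assumes hdm(ω, θ) = exp(-jω(u/c)(m - (M+1)/2) cos θ) for (14a), the inner product h_d^H(ω, θ) h_d(ω, θs) for (15), and, derived from those two assumptions, zero directions θ = arccos(2qπc/(Mωu) + cos θs) for (16). Array parameters and the value of c are illustrative.

```python
import numpy as np

C_SOUND = 340.0  # assumed speed of sound c [m/s]

def steering(omega, theta, M, u, c=C_SOUND):
    """Assumed (14a): phase reference at the center of the linear array."""
    m = np.arange(1, M + 1)
    return np.exp(-1j * omega * u / c * (m - (M + 1) / 2) * np.cos(theta))

def coherence_conv(omega, theta, theta_s, M, u, c=C_SOUND):
    """Assumed (15): h_d^H(omega, theta) h_d(omega, theta_s)."""
    return np.vdot(steering(omega, theta, M, u, c), steering(omega, theta_s, M, u, c))

def zero_direction(omega, theta_s, M, u, q, c=C_SOUND):
    """Assumed (16): direction of zero coherence for a nonzero integer q, or None."""
    arg = 2 * q * np.pi * c / (M * omega * u) + np.cos(theta_s)
    return float(np.arccos(arg)) if abs(arg) <= 1 else None

M, u, theta_s = 8, 0.04, np.pi / 4
for f in (1000.0, 4000.0):
    omega = 2 * np.pi * f
    zeros = [zero_direction(omega, theta_s, M, u, q) for q in (-2, -1, 1, 2)]
    print(f, [None if z is None else round(float(np.degrees(z)), 1) for z in zeros])
```

For this hypothetical 8-microphone array with 4 cm spacing, no zero falls inside 0 < θ < π/2 at 1 kHz, whereas several do at 4 kHz. This illustrates why, with only M and u adjustable, directions close to θs cannot be suppressed in the low band.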
[0038]
The present invention, by contrast, is based on the finding that, in designing a filter with
sharp directivity toward the target direction θs, it is important to be able to reduce the
coherence sufficiently even when the difference in direction (angle difference) | θ−θs | is
small. Unlike the prior art, it is characterized in that the reflected sound is actively taken
into account.
[0039]
At each microphone of the microphone array, two types of plane waves are mixed: the direct
sound from the sound source and the reflected sound produced when the sound from the sound
source is reflected by the reflector 300.
Let Ξ be the number of reflected sounds considered; Ξ is a predetermined integer of 1 or more.
The transfer characteristic a <→> (ω, θ) = [a1 (ω, θ),..., aM (ω, θ)] <T> is then the sum of
the transfer characteristic of the direct sound, in which the voice from a direction that can
be the target of speech enhancement reaches the microphone array directly, and the transfer
characteristics of one or more reflected sounds, in which that voice reaches the microphone
array after being reflected by a reflecting object. Specifically, letting τξ (θ) be the
arrival time difference between the direct sound and the ξ-th (1 ≦ ξ ≦ Ξ) reflected sound,
and αξ (1 ≦ ξ ≦ Ξ) a coefficient that accounts for the attenuation of sound due to
reflection, the transfer characteristic can be expressed, as in equation (17a), by the sum of
the steering vector of the direct sound and the steering vectors of the Ξ reflected sounds,
each corrected for the attenuation due to reflection and for the arrival time difference
relative to the direct sound. h <→> rξ (ω, θ) = [hr1ξ (ω, θ),..., hrMξ (ω, θ)] <T>
represents the steering vector of the ξ-th reflected sound corresponding to the direct sound
in the direction θ. Usually, αξ ≦ 1 (1 ≦ ξ ≦ Ξ). For each reflected sound, if it is
reflected only once on the path from the sound source to the microphone, αξ (1 ≦ ξ ≦ Ξ) may
be regarded as the sound reflectance of the object that reflects the ξ-th reflected sound.
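Equation (17a) is not reproduced in this translation. A minimal sketch, assuming the form a(ω, θ) = h_d(ω, θ) + Σ_ξ αξ exp(-jω τξ(θ)) h_rξ(ω, θ), which is what the description of attenuation and arrival-time correction suggests. The function name and argument layout are hypothetical.

```python
import numpy as np

def transfer_characteristic(h_d, h_r, alpha, tau, omega):
    """Assumed (17a): direct-sound steering vector plus corrected reflections.

    h_d   : (M,) steering vector of the direct sound
    h_r   : sequence of Xi steering vectors, one per reflected sound
    alpha : sequence of Xi attenuation coefficients (usually <= 1)
    tau   : sequence of Xi arrival time differences relative to the direct sound [s]
    """
    a = np.asarray(h_d, dtype=complex).copy()
    for h_rx, al, ta in zip(h_r, alpha, tau):
        a = a + al * np.exp(-1j * omega * ta) * np.asarray(h_rx, dtype=complex)
    return a

omega = 2 * np.pi * 1000.0
h_d = np.ones(3, dtype=complex)

a_direct_only = transfer_characteristic(h_d, [], [], [], omega)            # Xi = 0
a_one_reflection = transfer_characteristic(h_d, [h_d], [0.5], [0.0], omega)  # Xi = 1
```

With no reflected sound the result reduces to the conventional transfer characteristic a_conv = h_d, which is the prior-art case described in paragraph [0033].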
[0040]
Because it is desirable for one or more reflected sounds to reach the microphone array
composed of M microphones, one or more reflectors are preferably present. From this point of
view, assuming that the sound source exists in the target direction, each reflector is
preferably arranged, in terms of the positional relationship between the sound source, the
microphone array, and the one or more reflectors, so that the sound from the sound source is
reflected by at least one reflector and reaches the microphone array. The shape of each
reflector is two-dimensional (for example, a flat plate) or three-dimensional (for example, a
parabolic shape). The size of each reflector is preferably equal to or greater than that of
the microphone array (about 1 to 2 times). To make effective use of the reflected sound, the
reflectance αξ (1 ≦ ξ ≦ Ξ) of each reflector is at least greater than 0, and the amplitude
of the reflected sound reaching the microphone array is desirably, for example, 0.2 times or
more the amplitude of the direct sound; for example, each reflector is a rigid solid. The
reflector may be a movable object (e.g., a reflecting plate) or an immovable object (a floor,
wall, or ceiling). Note that if an immovable object is used as a reflector, the steering
vector of the reflected sound must be changed whenever the installation position of the
microphone array is changed (see the function Ψ (θ) or Ψξ (θ) described later), and
consequently the filter must be recalculated (reset). Therefore, for robustness against
environmental changes, each reflector is preferably an accessory of the microphone array (in
this case, the assumed Ξ reflected sounds are regarded as being produced by these
reflectors). Here, "an accessory of the microphone array" is a "physical object that can
follow changes in the position, orientation, etc. of the microphone array while maintaining
its arrangement relationship (geometrical relationship) with the microphone array". A simple
example is a configuration in which each reflector is fixed to the microphone array.
[0041]
Hereinafter, in order to explain the advantages of the present invention concretely, it is
assumed that Ξ = 1, that the reflected sound is reflected once, and that one reflector is
present at a distance of L meters from the center of the microphone array. The reflector is a
thick rigid body. In this case, since Ξ = 1, the subscript ξ is omitted and equation (17a)
can be written as equation (17b).
[0042]
The m-th element of the steering vector h <→> r (ω, θ) = [hr1 (ω, θ),..., hrM (ω, θ)] <T> of
the reflected sound is expressed by equation (18a), in the same manner as the steering vector
of the direct sound (see equation (14a)). The function Ψ (θ) outputs the arrival direction of
the reflected sound. When the steering vector of the direct sound is expressed by equation
(14b), the m-th element of the steering vector h <→> r (ω, θ) = [hr1 (ω, θ),..., hrM (ω, θ)]
<T> of the reflected sound is expressed by equation (18b). In general, the m-th element of the
steering vector h <→> rξ (ω, θ) = [hr1ξ (ω, θ),..., hrMξ (ω, θ)] <T> of the ξ-th (1 ≦ ξ ≦
Ξ) reflected sound is expressed by equation (18c) or equation (18d). The function Ψξ (θ)
outputs the arrival direction of the ξ-th (1 ≦ ξ ≦ Ξ) reflected sound.
[0043]
Since the position of the reflector can be set appropriately, the arrival direction of the reflected
sound can be treated as a changeable parameter.
[0044]
Assuming that the flat reflector is in the vicinity of the microphone array (the distance L is not
extremely large compared to the size of the microphone array), the coherence γ (ω, θ) is
expressed by equation (19).
Note that θ ≠ θs.
[0045]
Equation (19) shows that the coherence γ (ω, θ) of equation (19) may be smaller than the
conventional coherence γconv (ω, θ) of equation (15). Because the second to fourth terms of
equation (19) contain parameters (Ψ (θ) and L) that can be changed by how the reflector is
placed, the first term h <→> d <H> (ω, θ) h <→> d (ω, θ) may be cancelled.
[0046]
For example, if a flat reflector is arranged so that the arrangement direction of the
microphones of a linear microphone array is normal to the reflector, then Ψ (θ) = π − θ holds
for the function Ψ (θ), and equation (20) holds for the arrival time difference τ (θ) between
the direct sound and the reflected sound, so that equations (21) and (22) hold for the
elements constituting equation (19). The symbol * is an operator representing the complex
conjugate.
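Equations (17b) to (22) are not reproduced in this translation. The sketch below models the single flat reflector case under assumed forms: the center-referenced steering vector for (14a) and (18a), Ψ(θ) = π − θ as stated in the text, τ(θ) = 2L cos θ / c for (20), and a four-term expansion of a^H(ω, θ) a(ω, θs) for (19). All names and numerical values are illustrative.

```python
import numpy as np

C_SOUND = 340.0  # assumed speed of sound c [m/s]

def steering(omega, theta, M, u, c=C_SOUND):
    """Assumed (14a)/(18a): phase reference at the center of the linear array."""
    m = np.arange(1, M + 1)
    return np.exp(-1j * omega * u / c * (m - (M + 1) / 2) * np.cos(theta))

def reflection_gain(omega, theta, L, alpha, c=C_SOUND):
    """alpha * exp(-j*omega*tau(theta)) with the assumed tau(theta) = 2*L*cos(theta)/c."""
    return alpha * np.exp(-1j * omega * 2 * L * np.cos(theta) / c)

def transfer(omega, theta, M, u, L, alpha, c=C_SOUND):
    """Assumed (17b): direct sound plus one reflection arriving from Psi(theta) = pi - theta."""
    h_d = steering(omega, theta, M, u, c)
    h_r = steering(omega, np.pi - theta, M, u, c)
    return h_d + reflection_gain(omega, theta, L, alpha, c) * h_r

def coherence(omega, theta, theta_s, M, u, L, alpha, c=C_SOUND):
    """gamma(omega, theta) = a^H(omega, theta) a(omega, theta_s)."""
    return np.vdot(transfer(omega, theta, M, u, L, alpha, c),
                   transfer(omega, theta_s, M, u, L, alpha, c))

def coherence_terms(omega, theta, theta_s, M, u, L, alpha, c=C_SOUND):
    """The four terms obtained by expanding a^H(theta) a(theta_s) (assumed form of (19))."""
    hd, hds = steering(omega, theta, M, u, c), steering(omega, theta_s, M, u, c)
    hr, hrs = steering(omega, np.pi - theta, M, u, c), steering(omega, np.pi - theta_s, M, u, c)
    b, bs = reflection_gain(omega, theta, L, alpha, c), reflection_gain(omega, theta_s, L, alpha, c)
    return [np.vdot(hd, hds),                      # direct x direct
            bs * np.vdot(hd, hrs),                 # direct x reflected
            np.conj(b) * np.vdot(hr, hds),         # reflected x direct
            np.conj(b) * bs * np.vdot(hr, hrs)]    # reflected x reflected

omega, M, u, L, alpha = 2 * np.pi * 1000.0, 8, 0.04, 0.70, 0.8
theta, theta_s = np.pi / 3, np.pi / 4
gamma = coherence(omega, theta, theta_s, M, u, L, alpha)
terms = coherence_terms(omega, theta, theta_s, M, u, L, alpha)
```

Under these assumptions the reflected-sound steering vector is the complex conjugate of the direct-sound one, so the reflected-by-reflected inner product is the conjugate of the direct-by-direct inner product. This may be the kind of relation that equations (21) and (22) express.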
[0047]
Because the absolute value of h <→> d <H> (ω, θ) h <→> r (ω, θ) is sufficiently smaller than
h <→> d <H> (ω, θ) h <→> d (ω, θ), the second and third terms of equation (19) can be
ignored and the coherence γ (ω, θ) approximated as in equation (23).
[0048]
Even if h <→> d <H> (ω, θ) h <→> d (ω, θ) ≠ 0, the approximate coherence γ ~ (ω, θ) has a
minimal solution at the θ given by equation (24).
q is any positive integer. Also, the range of q is limited for each frequency band.
[0049]
That is, the coherence can be suppressed not only in the direction given by equation (16) but also
in the direction given by equation (24). If the coherence can be suppressed, the power of noise
can be further reduced, so that sharp directivity can be realized as schematically shown in FIG. 2
(b).
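Equations (23) and (24) are not reproduced in this translation. The sketch below keeps only the first and fourth terms of the expansion, under the same assumptions as before (center-referenced steering vectors, Ψ(θ) = π − θ, τ(θ) = 2L cos θ / c). With those assumptions the approximate coherence becomes γconv · (1 + α² exp(jD)) with D = 2ωL(cos θ − cos θs)/c, which is smallest in magnitude where D is an odd multiple of π. The indexing by q used here is an assumption and may differ from that of equation (24).

```python
import numpy as np

C_SOUND = 340.0  # assumed speed of sound c [m/s]

def steering(omega, theta, M, u, c=C_SOUND):
    m = np.arange(1, M + 1)
    return np.exp(-1j * omega * u / c * (m - (M + 1) / 2) * np.cos(theta))

def coherence_conv(omega, theta, theta_s, M, u, c=C_SOUND):
    return np.vdot(steering(omega, theta, M, u, c), steering(omega, theta_s, M, u, c))

def approx_coherence(omega, theta, theta_s, M, u, L, alpha, c=C_SOUND):
    """First plus fourth term of the expansion (one possible reading of (23))."""
    g = coherence_conv(omega, theta, theta_s, M, u, c)
    D = 2 * omega * L * (np.cos(theta) - np.cos(theta_s)) / c
    return g + alpha ** 2 * np.exp(1j * D) * np.conj(g)

def minimum_direction(omega, theta_s, L, q, c=C_SOUND):
    """Direction where D = (2q + 1) * pi, or None if it does not exist."""
    arg = (2 * q + 1) * np.pi * c / (2 * omega * L) + np.cos(theta_s)
    return float(np.arccos(arg)) if abs(arg) <= 1 else None

# Values quoted in the text for FIG. 3; M, u and alpha are illustrative assumptions.
omega, L, theta_s = 2 * np.pi * 1000.0, 0.70, np.pi / 4
M, u, alpha = 8, 0.04, 0.8

theta_lo = minimum_direction(omega, theta_s, L, q=0)
theta_hi = minimum_direction(omega, theta_s, L, q=-1)
g_lo = coherence_conv(omega, theta_lo, theta_s, M, u)
ga_lo = approx_coherence(omega, theta_lo, theta_s, M, u, L, alpha)
```

At these directions the approximate coherence is reduced to (1 − α²) times the conventional value, and for the FIG. 3 values of ω, L and θs they lie roughly 10 degrees on either side of the target. That is consistent with the sharpening the text describes, although the exact directions depend on the assumed forms.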
[0050]
Whereas FIG. 2 schematically shows the difference in directivity between the principle of the
present invention and the prior art, FIG. 3 shows concretely the difference between the θ
given by equation (16) and the θ given by equation (24), calculated for ω = 2π × 1000 [rad /
s], L = 0.70 [m], and θs = π / 4 [rad]. For comparison, FIG. 3 shows the direction dependency
of the normalized coherence for both cases; the directions indicated by the symbol ○ are the
θ given by equation (16), and the directions indicated by the symbol + are the θ given by
equation (24). As is apparent from FIG. 3, according to the prior art, the θ at which the
coherence becomes zero for θs = π / 4 [rad] are only the directions indicated by the symbol
○, whereas according to the principle of the present invention, θ at which the coherence
becomes zero for θs = π / 4 [rad] exist in the many directions indicated by the symbol +. In
particular, there are directions indicated by the symbol + much closer to θs = π / 4 [rad]
than the directions indicated by the symbol ○, which shows that sharper directivity is
achieved than in the prior art.
[0051]
As is apparent from the above description, the gist of the present invention is that the
transfer characteristic a <→> (ω, θ) = [a1 (ω, θ),..., aM (ω, θ)] <T> is expressed, for example
as in equation (17a), by the sum of the steering vector of the direct sound and the steering
vectors of the Ξ reflected sounds. Because this does not affect the filter design concept
itself, the filter W <→> (ω, θs) can also be designed by a method other than the minimum
variance distortionless response (MVDR) method.
[0052]
As methods other than the minimum variance distortionless response (MVDR) method, a filter design
method based on the SN ratio maximization criterion and a filter design method based on power
inversion are described below. For both methods, see Reference 2.
(Reference 2) Nobuyoshi Kikuma, "Adaptive Antenna Technology", 1st Edition, Ohm Co., Ltd., 2003, pp. 35-90
[0053]
<1> Filter Design Method Based on the SN Ratio Maximization Criterion
In the filter design method based on the SN ratio maximization criterion, the filter W <→> (ω,
θs) is determined on the criterion of maximizing the SN ratio (SNR) in the target direction θs.
Let Rss (ω) be the spatial correlation matrix of voice from the target direction θs, and let Rnn
(ω) be the spatial correlation matrix of voice from directions other than the target direction
θs. The SNR is then expressed by equation (25), Rss (ω) by equation (26), and Rnn (ω) by
equation (27). The transfer characteristic a <→> (ω, θs) = [a1 (ω, θs),..., aM (ω, θs)] <T> is
given by equation (17a) (precisely, with θ in equation (17a) replaced by θs).
[0054]
The filter W <→> (ω, θs) that maximizes the SNR in equation (25) is obtained by setting the
gradient of the SNR with respect to the filter W <→> (ω, θs) to zero, that is, from equation (28).
[0055]
Thus, the filter W <→> (ω, θs) that maximizes the SNR in equation (25) is given by equation
(29).
[0056]
Equation (29) contains the inverse of the spatial correlation matrix Rnn (ω) of voice from
directions other than the target direction θs. It is known, however, that the inverse of Rnn (ω)
may be replaced by the inverse of the spatial correlation matrix Rxx (ω) of the entire input,
which includes voice from the target direction as well as from the other directions.
Note that Rxx (ω) = Rss (ω) + Rnn (ω) = Q (ω) (see equations (10a), (26), and (27)).
That is, the filter W <→> (ω, θs) that maximizes the SNR in equation (25) may be determined by
equation (30).
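The substitution of Rxx (ω) for Rnn (ω) can be checked numerically. Equations (25) to (30) are not reproduced in this text, so the following is only a minimal sketch under stated assumptions: Rss (ω) is taken to be the rank-one matrix formed from the transfer characteristic, the filter is taken in the textbook form "inverse correlation matrix times transfer characteristic", and the function names and toy matrices are hypothetical.

```python
import numpy as np

def snr(w, r_ss, r_nn):
    """Ratio of target power to non-target power at the filter output."""
    target = np.real(np.conj(w) @ r_ss @ w)
    noise = np.real(np.conj(w) @ r_nn @ w)
    return target / noise

def max_snr_filter(r, a):
    """Solve R w = a, i.e. w = R^{-1} a (defined up to a scale factor)."""
    return np.linalg.solve(r, a)

# Toy data (hypothetical): M = 4 microphones, random transfer characteristic.
rng = np.random.default_rng(0)
M = 4
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
r_nn = B @ B.conj().T + np.eye(M)   # Hermitian, positive definite
r_ss = np.outer(a, a.conj())        # assumed rank-one target matrix
r_xx = r_ss + r_nn

w_nn = max_snr_filter(r_nn, a)      # uses the inverse of Rnn
w_xx = max_snr_filter(r_xx, a)      # uses the inverse of Rxx instead
```

Under the rank-one assumption the two filters differ only by a scale factor, so they give the same SNR.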
[0057]
<2> Filter Design Method Based on Power Inversion
In the filter design method based on power inversion, the filter W <→> (ω, θs) is determined on
the criterion of minimizing the output power while the filter coefficient for one microphone is
fixed at a constant value. Here, as an example, the case where the filter coefficient for the
first of the M microphones is fixed is described. In this design method, the filter W <→> (ω,
θs) is designed, using the spatial correlation matrix Rxx (ω), so that the power of voice from
all directions (all directions assumed as directions of voice arrival) is minimized under this
constraint condition (see equation (31)). The transfer characteristic a <→> (ω, θs) = [a1 (ω,
θs),..., aM (ω, θs)] <T> is given by equation (17a) (precisely, with θ in equation (17a) replaced
by θs). Note that Rxx (ω) = Q (ω) (see equations (10a), (26), and (27)).
[0058]
It is known that the filter W <→> (ω, θs) which is the optimum solution of the equation (31) is
given by the equation (33) (see reference 2).
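The power inversion criterion can be sketched as follows. Equations (31) and (33) are not reproduced in this text, so the code uses the standard closed form for "minimize output power with one coefficient fixed" as an assumption; the function names and the default fixed value of 1 are hypothetical.

```python
import numpy as np

def power_inversion_filter(r_xx, fixed_index=0, fixed_value=1.0):
    """Minimize w^H Rxx w subject to w[fixed_index] == fixed_value."""
    m = r_xx.shape[0]
    g = np.zeros(m, dtype=complex)
    g[fixed_index] = 1.0                      # selects the fixed coefficient
    r_inv_g = np.linalg.solve(r_xx, g)        # Rxx^{-1} g
    return fixed_value * r_inv_g / r_inv_g[fixed_index]

def output_power(w, r_xx):
    """Output power w^H Rxx w of the filter."""
    return np.real(np.conj(w) @ r_xx @ w)
```

Any other filter with the same fixed coefficient has an output power at least as large, which is what the constraint-plus-minimization criterion asks for.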
[0059]
<< Embodiment >>
The functional configuration and the processing flow of the embodiment of the present invention
are shown in FIG. 4 and FIG.
The narrow directional speech enhancement apparatus 1 of this embodiment includes an AD
conversion unit 210, a frame generation unit 220, a frequency domain conversion unit 230, a
filter application unit 240, a time domain conversion unit 250, a filter design unit 260, and a
storage unit 290.
[0060]
Step S1: In advance, the filter design unit 260 calculates the filter W <→> (ω, θi) for each
frequency for each discrete direction that can be a target of speech enhancement. Assuming that
the total number of discrete directions that can be a target of speech enhancement is I (I is a
predetermined integer greater than or equal to 1 and satisfies I ≦ P), the filters W <→> (ω,
θ1),..., W <→> (ω, θi),..., W <→> (ω, θI) (1 ≦ i ≦ I, ω ∈ Ω; i is an integer and Ω is a set of
frequencies ω) are calculated in advance. For this purpose, the transfer characteristics a <→>
(ω, θi) = [a1 (ω, θi),..., aM (ω, θi)] <T> (1 ≦ i ≦ I, ω∈Ω) must be determined. These can be
calculated specifically by equation (17a) (precisely, with θ in equation (17a) replaced by θi) on
the basis of environmental information such as the arrangement of the microphones in the
microphone array, the positions of reflecting objects such as reflectors, the floor, walls, and
the ceiling relative to the microphone array, the arrival time difference between the direct sound
and the ξ-th (1 ≦ ξ ≦ Ξ) reflected sound, and the sound reflectance of the reflecting objects.
The number Ξ of reflected sounds is set to an integer satisfying 1 ≦ Ξ; there is no particular
limitation on the value of Ξ, and it may be set appropriately according to the calculation
capability and the like. When one reflector is placed in the vicinity of the microphone array,
the transfer characteristics a <→> (ω, θi) can be calculated specifically by equation (17b)
(precisely, with θ in equation (17b) replaced by θi). The steering vectors can be calculated
using, for example, equations (14a), (14b), (18a), (18b), (18c), and (18d). Alternatively,
transfer characteristics obtained by measurement in a real environment may be used instead of
those based on equation (17a) or equation (17b). Then, using the transfer characteristics a <→>
(ω, θi), the filters W <→> (ω, θi) (1 ≦ i ≦ I) are determined according to, for example, any of
equations (9), (29), (30), and (33). When equation (9), equation (30), or equation (33) is used,
the spatial correlation matrix Q (ω) (or Rxx (ω)) can be calculated by equation (10b). When
equation (29) is used, the spatial correlation matrix Rnn (ω) can be calculated by equation (27).
The I × | Ω | filters W <→> (ω, θi) (1 ≦ i ≦ I, ω∈Ω) are stored in the storage unit 290.
| Ω | represents the number of elements of the set Ω.
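The precomputation in step S1 can be sketched as building a table with one filter per direction and frequency. The sketch below is illustrative only: it assumes that equation (9) has the usual minimum variance distortionless response form, replaces the transfer characteristic of equation (17a) with a free-field plane-wave steering vector, and builds the spatial correlation matrix from the candidate directions plus a small diagonal loading term. The speed of sound, the array geometry, and all names are hypothetical.

```python
import numpy as np

C = 340.0  # assumed speed of sound [m/s]

def steering_vector(omega, theta, mic_pos):
    """Free-field plane-wave stand-in for the transfer characteristic a(omega, theta)."""
    return np.exp(-1j * omega * mic_pos * np.cos(theta) / C)

def mvdr_filter(q, a):
    """w = Q^{-1} a / (a^H Q^{-1} a), so that w^H a = 1."""
    q_inv_a = np.linalg.solve(q, a)
    return q_inv_a / (np.conj(a) @ q_inv_a)

def design_filter_bank(omegas, thetas, mic_pos, loading=1e-3):
    """Return {(direction index, frequency index): filter} for all pairs."""
    bank = {}
    for f, omega in enumerate(omegas):
        vecs = [steering_vector(omega, th, mic_pos) for th in thetas]
        # Assumed correlation matrix: sum over candidate directions + loading.
        q = sum(np.outer(a, np.conj(a)) for a in vecs)
        q = q + loading * np.eye(len(mic_pos))
        for i, a in enumerate(vecs):
            bank[(i, f)] = mvdr_filter(q, a)
    return bank

# Hypothetical setup: 8 microphones at 4 cm spacing, 7 directions, 3 frequencies.
mic_pos = 0.04 * np.arange(8)
omegas = 2 * np.pi * np.array([500.0, 1000.0, 2000.0])
thetas = np.linspace(0.0, np.pi, 7)
bank = design_filter_bank(omegas, thetas, mic_pos)
```

The table holds one filter for every pair of direction and frequency, which corresponds to the contents of the storage unit 290 in this step.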
[0061]
Step S2: Sound is collected using M microphones 200-1,..., 200-M constituting the microphone
array. M is an integer of 2 or more.
[0062]
There is no limitation on the arrangement of the M microphones. However, arranging the M
microphones two-dimensionally or three-dimensionally has the advantage of eliminating ambiguity in
the direction of speech enhancement. For example, when the M microphones are arranged in a
straight line in the horizontal direction, voice arriving from the front cannot be distinguished
from voice arriving from directly above; a two-dimensional or three-dimensional arrangement
prevents this problem. Also, in order to widen the range of directions that can be set as the
sound collection direction, each microphone should have a directivity that allows sound to be
collected with a certain sound pressure from any direction that can be the target direction θs,
that is, the sound collection direction. Therefore, microphones with relatively gentle
directivity, such as omnidirectional or unidirectional microphones, are preferable.
[0063]
Step S3: The AD conversion unit 210 converts the analog signals (picked-up signals) collected by
the M microphones 200-1,..., 200-M into a digital signal x <→> (t) = [x1 (t),..., xM (t)] <T>.
t represents the index of discrete time.
[0064]
Step S4: The frame generation unit 220 receives the digital signal x <→> (t) = [x1 (t),..., xM
(t)] <T> output from the AD conversion unit 210, stores N samples for each channel in a buffer,
and outputs the digital signal x <→> (k) = [x <→> 1 (k),..., x <→> M (k)] <T> in frame units.
k is the index of the frame number. x <→> m (k) = [xm ((k−1) N + 1),..., xm (kN)] (1 ≦ m ≦ M).
The appropriate value of N depends on the sampling frequency; for 16 kHz sampling, around 512
points is appropriate.
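The framing of step S4 can be sketched as a reshape of the multichannel signal. This is a minimal sketch assuming non-overlapping frames, as the frame definition above suggests; the function name and the choice to drop an incomplete trailing frame are assumptions.

```python
import numpy as np

def make_frames(x, n=512):
    """Split x of shape (M, T) into frames of shape (K, M, n).

    Frame k (0-based) of channel m holds x[m, k*n:(k+1)*n]; samples that do
    not fill a complete frame at the end are dropped.
    """
    m, t = x.shape
    k = t // n
    return x[:, :k * n].reshape(m, k, n).transpose(1, 0, 2)
```

With 0-based indexing, frame k of channel m covers samples k·N to (k+1)·N − 1, which matches the 1-based range (k−1)N + 1 to kN in the text.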
[0065]
Step S5: The frequency domain conversion unit 230 converts the digital signal x <→> (k) of each
frame into a frequency-domain signal X <→> (ω, k) = [X1 (ω, k),..., XM (ω, k)] <T> and outputs
it. ω is the index of the discrete frequency. One method for converting a time-domain signal into
a frequency-domain signal is the fast discrete Fourier transform, but the present invention is
not limited to this, and other methods for converting into a frequency-domain signal may be used.
The frequency-domain signal X <→> (ω, k) is output for each frequency ω and frame k.
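Step S5 can be sketched with a real-input fast Fourier transform applied to each frame and channel. The array layout (frames, channels, samples) and the function name are assumptions, and no window function is applied because none is mentioned in the text.

```python
import numpy as np

def to_frequency_domain(frames):
    """Transform frames of shape (K, M, N) into spectra of shape (K, M, N//2 + 1)."""
    return np.fft.rfft(frames, axis=-1)
```

For real input, `np.fft.rfft` returns only the non-negative frequency bins, which is enough to reconstruct the signal later with the matching inverse transform.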
[0066]
Step S6: For each frame k and each frequency ω∈Ω, the filter application unit 240 applies the
filter W <→> (ω, θs) corresponding to the target direction θs to be emphasized to the
frequency-domain signal X <→> (ω, k) = [X1 (ω, k),..., XM (ω, k)] <T>, and outputs an output
signal Y (ω, k, θs) (see equation (34)). Since the index s of the target direction θs satisfies
s ∈ {1,..., I} and the filters W <→> (ω, θs) are stored in the storage unit 290, the filter
application unit 240 may, for example, obtain the filter W <→> (ω, θs) corresponding to the
target direction θs to be emphasized from the storage unit 290 each time the process of step S6
is performed. When the index s of the target direction θs does not belong to the set {1,..., I},
that is, when the filter W <→> (ω, θs) corresponding to the target direction θs was not
calculated in the process of step S1, the filter design unit 260 may calculate the filter W <→>
(ω, θs) corresponding to the target direction θs on the spot, or the filter W <→> (ω, θs ′)
corresponding to a direction θs ′ close to the target direction θs may be used.
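The filtering in step S6, including the fallback to a stored filter for a nearby direction, can be sketched as follows. Equation (34) is not reproduced in this text, so the form "conjugate-transposed filter times input" is an assumption, as are the array layouts and the function names.

```python
import numpy as np

def select_filter(bank, thetas, theta_s):
    """Pick the stored filter whose direction is closest to theta_s.

    bank has shape (I, F, M); thetas lists the I stored directions.
    """
    i = int(np.argmin(np.abs(np.asarray(thetas) - theta_s)))
    return bank[i]

def apply_filter(X, w):
    """Compute Y[k, f] = sum_m conj(w[f, m]) * X[k, m, f].

    X has shape (K, M, F) (frames, channels, frequencies); w has shape (F, M).
    """
    return np.einsum('fm,kmf->kf', np.conj(w), X)
```

Choosing the nearest stored direction avoids a filter design at run time, at the cost of a small pointing error when the requested direction lies between stored ones.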
[0067]
Step S7: The time domain transform unit 250 transforms the output signal Y (ω, k, θs) of each
frequency ω∈Ω of the k-th frame into the time domain to obtain the frame-unit time-domain signal
y (k) of the k-th frame. The obtained frame-unit time-domain signals y (k) are then concatenated
in the order of the frame number index, and the time-domain signal y (t) in which the voice from
the target direction θs is emphasized is output. The method of converting the frequency-domain
signal into the time-domain signal is the inverse transform corresponding to the conversion
method used in the process of step S5, for example, the fast discrete inverse Fourier transform.
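Step S7 can be sketched as an inverse real-input fast Fourier transform per frame followed by concatenation. The sketch assumes that the forward transform was a real-input FFT over non-overlapping frames of N samples; the function name and array layout are assumptions.

```python
import numpy as np

def to_time_domain(Y, n=512):
    """Invert spectra Y of shape (K, n//2 + 1) and join the K frames in order."""
    frames = np.fft.irfft(Y, n=n, axis=-1)   # shape (K, n)
    return frames.reshape(-1)                # frames concatenated by frame index
```

Passing `n` explicitly to `np.fft.irfft` keeps the frame length unambiguous when the spectrum length alone would allow two possible lengths.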
[0068]
Here, an embodiment in which the filters W <→> (ω, θi) are calculated in advance in the process
of step S1 has been described. Depending on the calculation processing capability of the narrow
directional speech enhancement device 1 or the like, however, it is also possible to adopt an
embodiment in which the filter design unit 260 calculates the filter W <→> (ω, θs) for each
frequency after the target direction θs is determined.
[0069]
Experimental results for an embodiment of the present invention (using the minimum variance
distortionless response (MVDR) method) are described below.
As shown in FIG. 6, 24 microphones were arranged in a straight line, and the reflector 300 was
placed so that the arrangement direction of the microphones in this linear microphone array was
normal to the reflector 300. Although the shape of the reflector 300 is not limited, a flat-plate
reflector with a flat reflecting surface, a size of 1.0 m × 1.0 m, an appropriate thickness, and
rigidity was used. The distance between adjacent microphones was 4 cm, and the reflectance α of
the reflector 300 was 0.8. The target direction θs was set to 45 degrees. Assuming that voice
arrives at the linear microphone array as plane waves, the transfer characteristics were
calculated by equation (17b) (see equation (14a) and equation (18a)), and the directivity of the
generated filter was examined. Two conventional methods (the MVDR method without a reflector and
the delay-and-sum method with a reflector) were used for comparison.
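A setup of this kind can be imitated numerically. The sketch below is not the computation used for the reported results: equation (17b) is not reproduced in this text, so the reflected sound is modelled with a simple image-source assumption (the direct plane wave mirrored at the reflector plane), the distance L0 between the reflector and the first microphone is a hypothetical value, and the spatial correlation matrix is built from a grid of candidate directions with diagonal loading. Only the values M = 24, 4 cm spacing, α = 0.8, and θs = 45 degrees come from the text.

```python
import numpy as np

C = 340.0      # assumed speed of sound [m/s]
M = 24         # number of microphones (from the experiment)
D = 0.04       # microphone spacing [m] (from the experiment)
ALPHA = 0.8    # reflectance of the reflector (from the experiment)
L0 = 0.10      # hypothetical distance from reflector to first microphone [m]

x = D * np.arange(M)   # microphone positions along the array axis

def direct(omega, theta):
    """Plane-wave steering vector of the direct sound."""
    return np.exp(1j * omega * x * np.cos(theta) / C)

def transfer(omega, theta):
    """Direct sound plus one reflected sound (image-source assumption)."""
    reflected = np.exp(-1j * omega * (2 * L0 + x) * np.cos(theta) / C)
    return direct(omega, theta) + ALPHA * reflected

def mvdr(omega, theta_s, grid, loading=1e-3):
    """MVDR filter with an assumed correlation matrix built from the grid."""
    vecs = [transfer(omega, t) for t in grid]
    q = sum(np.outer(a, np.conj(a)) for a in vecs) + loading * np.eye(M)
    a = transfer(omega, theta_s)
    v = np.linalg.solve(q, a)
    return v / (np.conj(a) @ v)

omega = 2 * np.pi * 1000.0
grid = np.deg2rad(np.arange(1, 90))      # candidate directions, 1 to 89 degrees
theta_s = np.deg2rad(45.0)
w = mvdr(omega, theta_s, grid)
# Magnitude response of the filter over the candidate directions.
pattern = np.array([abs(np.conj(w) @ transfer(omega, t)) for t in grid])
```

Plotting `pattern` against the grid angles gives a directivity curve for this toy model; its shape depends on the assumed L0 and loading and should not be read as a reproduction of the figures.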
[0070]
The experimental results are shown in FIG. 7 and FIG. It can be seen that the embodiment of the
present invention achieves sharper directivity in the target direction than the two conventional
methods in every frequency band. The usefulness of the present invention is particularly apparent
in the lower frequency bands. Further, FIG. 9 shows the directivity of the filter W <→> (ω, θ)
generated according to the embodiment of the present invention. It can be seen from FIG. 9 that
not only the direct sound but also the reflected sound is emphasized.
[0071]
Further, as shown in FIG. 10, an experiment similar to the one described above was conducted with
the reflector 300 placed so that the angle between the arrangement direction of the microphones
in the linear microphone array and the plane of the reflector 300 was 45 degrees. The target
direction θs was set to 22.5 degrees, and the other experimental conditions were the same as in
the experiment in which the reflector 300 was placed so that the arrangement direction of the
microphones in the linear microphone array was normal to the reflector 300.
[0072]
The experimental results are shown in FIG. 11 and FIG. It can be seen that the embodiment of the
present invention achieves sharper directivity in the target direction than the two conventional
methods in every frequency band. The usefulness of the present invention is particularly apparent
in the lower frequency bands.
[0073]
Next, implementation examples of the present invention are described with reference to FIGS. 13
to 17. In these examples the microphone array is illustrated as a linear microphone array, but
the configuration is not limited to a linear microphone array.
[0074]
In the implementation example shown in FIG. 13, the M microphones 200-1,..., 200-M constituting
the linear microphone array are fixed to a rectangular flat-plate support member 400, and in this
state the sound pickup hole of each microphone is located in one plane (hereinafter referred to
as the opening surface) of the support member 400 (M = 13 in the illustrated example). The wiring
connected to the microphones 200-1,..., 200-M is not shown. The reflector 300 is fixed to an end
of the support member 400 so that the arrangement direction of the microphones 200-1,..., 200-M
is normal to the rectangular flat-plate reflector 300. The opening surface of the support member
400 forms an angle of 90 degrees with the reflector 300. In the implementation example shown in
FIG. 13, the preferred properties of the reflector 300 are the same as those of the reflector
described above. The properties of the support member 400 are not particularly limited; it is
sufficient that it be rigid enough to hold the microphones 200-1,..., 200-M firmly.
[0075]
In the implementation example shown in FIG. 14 (a), a shaft 410 is fixed to an end of the support
member 400, and the reflector 300 is rotatably attached to the shaft 410. With this
configuration, the geometric arrangement of the reflector 300 relative to the microphone array
can be changed.
[0076]
In the implementation example shown in FIG. 14 (b), two more reflectors 310 and 320 are added to
the implementation example shown in FIG. The properties of the two added reflectors 310 and 320
may be the same as or different from those of the reflector 300, and the properties of the
reflector 310 may be the same as or different from those of the reflector 320. Hereinafter, the
reflector 300 is referred to as the fixed reflector 300. A shaft 510 is fixed to the end of the
fixed reflector 300 opposite to the end fixed to the support member 400, and the reflector 310 is
rotatably attached to the shaft 510. A shaft 520 is fixed to the end of the support member 400
opposite to the end to which the fixed reflector 300 is fixed, and the reflector 320 is rotatably
attached to the shaft 520. The reflectors 310 and 320 are hereinafter referred to as the movable
reflectors 310 and 320, respectively. In the implementation example shown in FIG. 14 (b), when,
for example, the movable reflector 310 is positioned so that its reflecting surface lies in the
same plane as the reflecting surface of the fixed reflector 300, the combination of the fixed
reflector 300 and the movable reflector 310 functions as a reflector with a larger reflecting
surface than the fixed reflector 300 alone. Further, by setting the movable reflectors 310 and
320 at appropriate positions, for example as shown in FIG. 15, sound can be reflected many times
within the space enclosed by the support member 400, the fixed reflector 300, and the movable
reflectors 310 and 320, so the number of reflected sounds can be controlled. In the
implementation example shown in FIG. 14 (b), the support member 400 also serves as a reflector,
so it preferably has the same properties as the reflector described above.
[0077]
The implementation example shown in FIG. 16 differs from that shown in FIG. 13 in that the
reflector 300 is also provided with a microphone array (a linear microphone array in the
illustrated example). In the implementation example shown in FIG. 16, the arrangement direction
of the M microphones fixed to the support member 400 and the arrangement direction of the M ′
microphones fixed to the reflector 300 lie in the same plane, but the configuration is not
limited to this arrangement (M ′ = 13 in the illustrated example). For example, the M ′
microphones may be fixed to the reflector 300 so that their arrangement direction is orthogonal
to the arrangement direction of the M microphones fixed to the support member 400. With the
implementation example shown in FIG. 16, the present invention can be implemented either by
combining the microphone array provided on the support member 400 with the reflector 300 (the
reflector 300 being used purely as a reflector, without using the microphone array provided on
it) or by combining the support member 400 (used purely as a reflector, without using the
microphone array provided on it) with the microphone array provided on the reflector 300.
[0078]
Further, as an extension of the implementation example shown in FIG. 16, a configuration in which
two more reflectors 310 and 320 are added, as in the implementation example shown in FIG. 14 (b),
may be adopted (see FIG. 17). Although not shown, at least one of the movable reflectors 310 and
320 may be provided with a microphone array. The sound pickup holes of the microphones
constituting the microphone array provided on the movable reflector 310 are located, for example,
in the plane (opening surface) of the movable reflector 310 that can face the opening surface of
the support member 400. The sound pickup holes of the microphones constituting the microphone
array provided on the movable reflector 320 are located, for example, in the plane (opening
surface) of the movable reflector 320 that can lie in the same plane as the opening surface of
the support member 400. Such an implementation example can be used in the same way as the
implementation example shown in FIG. 14 (b). Further, when, for example, the movable reflector
320 is positioned so that its opening surface lies in the same plane as the opening surface of
the support member 400, the combination of the support member 400 and the movable reflector 320
functions as a microphone array larger than the microphone array provided on the support member
400 alone. The implementation example shown in FIG. 17, including the variant in which a
microphone array is provided on at least one of the movable reflectors 310 and 320, can also be
used in the same way as shown in FIG. 15. Furthermore, in that variant, the movable reflectors
310 and 320 may, for example, be used as ordinary reflectors while the microphone array provided
on the support member 400 and the microphone array provided on the fixed reflector 300 are used
as a single microphone array. This is equivalent to an implementation example that uses a
microphone array composed of (M + M ′) microphones and two reflectors.
[0079]
When a microphone array is provided on the movable reflector 310, it may be provided so that the
sound pickup holes of its microphones are located in the plane (opening surface) opposite to the
plane of the movable reflector 310 that can face the opening surface of the support member 400.
Similarly, when a microphone array is provided on the movable reflector 320, it may be provided
so that the sound pickup holes of its microphones are located in the plane (opening surface)
opposite to the plane of the movable reflector 320 that can lie in the same plane as the opening
surface of the support member 400. Of course, at least one of the movable reflectors 310 and 320
may be provided with a microphone array that has opening surfaces on both sides of that movable
reflector.
[0080]
[A] Suppose that a microphone array is provided on at least one of the movable reflectors 310 and
320, that the opening surface of the movable reflector 310 is the plane that can face the opening
surface of the support member 400, and that the opening surface of the movable reflector 320 is
the plane that can lie in the same plane as the opening surface of the support member 400. In the
usage configuration shown in FIG., arranging the movable reflector 310 and / or the movable
reflector 320 so that its opening surface is not visible from the line-of-sight direction reduces
the apparent array size in the line-of-sight direction, while using the microphone array provided
on the movable reflector 310 and / or the movable reflector 320 gives the same effect as
increasing the array size.
[0081]
[B] Suppose that a microphone array is provided on at least one of the movable reflectors 310 and
320, that the opening surface of the movable reflector 310 is the plane opposite to the plane
that can face the opening surface of the support member 400, and that the opening surface of the
movable reflector 320 is the plane opposite to the plane that can lie in the same plane as the
opening surface of the support member 400. In the usage configuration shown in FIG., the same
effect as increasing the array size can be obtained while the size is maintained.
[0082]
When at least one of the movable reflectors 310 and 320 is provided with a microphone array that
has opening surfaces on both sides of that movable reflector, the effects of both [A] and [B] can
be obtained.
[0083]
As described above, installing a flat reflector in the direction perpendicular to the arrangement
direction of the linear microphone array is one of the conditions under which a narrow
directional beam can be generated.
In the implementation examples shown in FIG. 16 and FIG. 17, the microphones are arranged
one-dimensionally. Therefore, when the arrangement direction of these microphones is horizontal,
for example, the direction can be controlled in the horizontal angle but not in the elevation
angle.
An implementation example that extends the example shown in FIG. 17 three-dimensionally, so that
an arbitrary direction in three-dimensional space can be emphasized, is therefore described
below.
With reference to FIGS. 18 and 19, a zoom microphone device 600 according to the third embodiment
is described, in which the fixed reflector 300 and the support member 400 of the implementation
example of FIG. 17 described above are combined so as to form opposing pyramidal faces of a
regular octagonal pyramid. FIG. 18 is a front view of the zoom microphone device 600 according to
the present embodiment. FIG. 19 is a side view of the zoom microphone device 600 according to the
present embodiment. In the present embodiment, the fixed reflector 300 and the support member
400, which are distinguished in FIG. 17, are both referred to as fixed reflectors. The components
shown as the movable reflectors 310 and 320 in FIG. 17 are also called movable reflectors in this
embodiment. The zoom microphone device 600 of this embodiment includes one support structure 601,
eight fixed reflectors 611 to 618, eight fixed microphone arrays 621 to 628, eight movable
reflectors 631 to 638, eight movable microphone arrays 641 to 648, one central reflector 651, one
central microphone array 661, eight hinges 671 to 678, eight supporting metal plates (large) 681
to 688, eight supporting metal plates (small) 691 to 698, and joining parts such as bolts and
screws. The support structure 601 fixes the fixed reflectors 611 to 618 and the central reflector
651 at a predetermined position and in a predetermined orientation. The support structure 601 can
be assembled from, for example, section steel or square steel pipe. The fixed reflectors 611 to
618 are flat plates with a trapezoidal shape and are made of a material with high reflectance;
for example, wood about 1 cm thick or an ABS resin material can be used. Each of the fixed
microphone arrays 621 to 628 consists of a plurality of microphones aligned in a straight line.
In this embodiment, in each of the fixed microphone arrays 621 to 628, the microphones are
aligned in the longitudinal direction of the corresponding long plate-shaped supporting metal
plate (large) 681 to 688 and fixed to it with joining members such as bolts and screws. One of
the fixed microphone arrays 621 to 628 is attached to the surface of each of the fixed reflectors
611 to 618 so that the microphone arrangement direction of the array is parallel to the surface
of the fixed reflector. In the present embodiment, a plurality of small holes is provided in a
straight line in each fixed reflector.
The small holes are arranged on the perpendicular bisector of the upper and lower bases of the
trapezoidal fixed reflector. With the sound receiving parts of the microphones of the fixed
microphone arrays 621 to 628 inserted into the small holes, the supporting metal plates (large)
681 to 688 are attached to the fixed reflectors 611 to 618 with long bolts or the like, whereby
the fixed microphone arrays 621 to 628 are fixed to the fixed reflectors 611 to 618. The fixed
reflectors 611 to 618 are combined so as to form the eight pyramidal faces of a regular octagonal
pyramid whose opposing pyramidal faces are perpendicular to each other, and their oblique sides
are joined. At the upper base of the regular octagonal pyramid, the central reflector 651, which
is a regular octagonal flat plate, is placed, and the upper base of one of the trapezoidal fixed
reflectors 611 to 618 is joined to each side of the central reflector 651. Taking the concave
side of the regular octagonal pyramid formed by the fixed reflectors 611 to 618 and the central
reflector 651 as the front surface and the convex side as the back surface, the supporting metal
plates (large) 681 to 688 are attached to the back surface (convex side) with long bolts or the
like, and the microphones of the fixed microphone arrays 621 to 628 are exposed through the small
holes toward the front surface (concave side). The regular octagonal frustum formed by the fixed
reflectors 611 to 618 and the central reflector 651 is attached at a predetermined position of
the support structure 601 described above in a predetermined orientation. The hinges 671 to 678
are attached, one per side, to the sides at the open end of the regular octagonal pyramid formed
by the fixed reflectors 611 to 618 and the central reflector 651. In this embodiment, hinges that
can be stopped at any angle (for example, free-stop hinges) are used as the hinges 671 to 678.
The movable reflectors 631 to 638 are attached via the hinges 671 to 678, one per side, so as to
be rotatable about the sides at the open end of the regular octagonal pyramid. In the present
embodiment, the movable reflectors 631 to 638 are flat plates with a trapezoidal shape and are
made of the same material as the fixed reflectors 611 to 618. The lower bases of the fixed
reflectors 611 to 618 and the lower bases of the movable reflectors 631 to 638 are equal in
length and are connected via the hinges 671 to 678. Because the hinges 671 to 678 can be stopped
at any angle, the angle of each of the movable reflectors 631 to 638 can be changed by hand, and
the angle set by hand is maintained.
The movable reflecting plates 631 to 638 are provided with small holes in a straight line on the
vertical bisector of the upper or lower bottom as in the case of the fixed reflecting plates 611 to
618 described above. Movable microphone arrays 641 to 648 are attached to the movable
reflective plates 631 to 638, respectively. Like the fixed microphone arrays 621 to 628
described above, the movable microphone arrays 641 to 648 are arranged such that a plurality
of microphones are aligned in the longitudinal direction on the long plate-shaped support metal
plates (small) 691 to 698. And is fixed to supporting metal plates (small) 691 to 698. The
movable microphone arrays 641 to 648, like the fixed microphone arrays 621 to 628, have long
supporting metal plates (small) 691 to 698 so that the sound receiving portion is exposed from
the small hole toward the front surface which is the concave side. It is attached to the movable
04-05-2019
28
reflecting plates 631 to 638 using bolts. The positions of the small holes in the respective
reflecting plates are determined so that, when the movable reflecting plates 631 to 638 are
rotated about the movable axis until their concave-side plate surfaces lie in the same plane as
the concave-side plate surfaces of the fixed reflecting plates 611 to 618, the small holes in
each movable reflection plate and the small holes in the fixed reflection plate connected to it
lie on the same straight line. A central microphone array 661 is attached to the
central reflector 651 described above. The central microphone array 661 comprises a support
rod and a plurality of microphones. The support rod is a round bar in which a plurality of round
holes, each penetrating the bar in the radial direction, are arranged along the axial direction.
A microphone is placed and fixed in each of these round holes. The central microphone array 661
configured in this way is attached perpendicularly to the front surface (concave side) of the
central reflector 651 so that its support rod lies on the straight line that passes through the
apex of the regular octagonal pyramid whose pyramidal faces are formed by the surfaces of the
fixed reflecting plates 611 to 618 and that is perpendicular to the base of that pyramid.
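The geometry of the regular pyramid formed by the fixed reflecting plates can be checked numerically. The following is a minimal sketch, not taken from the source: it assumes that the pyramid axis is the z axis and that every face is tilted from the base plane by the same angle. The names face_normals, opposing_face_angle_deg, and tilt_deg are hypothetical. Under these assumptions, opposing faces form 90 degrees with each other when the tilt is 45 degrees, whatever the (even) number of faces.

```python
import math


def face_normals(num_faces, tilt_deg=45.0):
    """Unit normals of the pyramidal faces of a regular pyramid.

    tilt_deg is the angle between each face and the base plane. It is an
    illustrative parameter, not a value quoted from the source.
    The pyramid axis is taken to be the z axis.
    """
    tilt = math.radians(tilt_deg)
    normals = []
    for k in range(num_faces):
        azimuth = 2.0 * math.pi * k / num_faces
        # A plane tilted by `tilt` from the base has a normal tilted by
        # `tilt` from the z axis, pointing along the face's azimuth.
        normals.append((
            math.cos(azimuth) * math.sin(tilt),
            math.sin(azimuth) * math.sin(tilt),
            math.cos(tilt),
        ))
    return normals


def opposing_face_angle_deg(num_faces, tilt_deg=45.0):
    """Angle between the normals of face 0 and the face opposite to it."""
    if num_faces % 2 != 0:
        raise ValueError("opposing faces exist only for an even face count")
    normals = face_normals(num_faces, tilt_deg)
    a, b = normals[0], normals[num_faces // 2]
    dot = sum(x * y for x, y in zip(a, b))
    # Clamp against rounding before taking the arc cosine.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


if __name__ == "__main__":
    for faces in (4, 6, 8):
        print(faces, opposing_face_angle_deg(faces))
```

The dot product of two opposing normals reduces to cos(2 x tilt), so the angle between opposing faces depends only on the tilt and not on the number of faces.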
[0084]
As described above, when the movable reflection plates 631 to 638 are positioned so that the
reflection surfaces (front surfaces) of the fixed reflection plates 611 to 618 and the reflection
surfaces (front surfaces) of the movable reflection plates 631 to 638 lie in the same plane, the
fixed reflection plates 611 to 618 and the movable reflection plates 631 to 638 together
function as a reflection plate with a larger reflection surface than the fixed reflection plates
611 to 618 alone. Furthermore, the sound reflected from the back can be blocked by
the movable reflecting plates 631 to 638. Both of these help create an environment that is more
likely to produce narrow directional beams. In addition, by rotating the movable reflecting
plates 631 to 638 and stopping them at an appropriate position, sound can be reflected as
described with reference to FIG. 15, so the number of reflected sounds can be controlled and
sound can be picked up with high energy. This makes it easier to create a difference in transfer
characteristics between the target sound and noise arriving from an angle with almost no
difference in direction of arrival, which also helps to create an environment in which a beam of
narrow directivity is likely to be generated. If, with the configuration described above, it is
desired to pick up sounds arriving from the same specific direction separately according to
distance, the height and angle of the above-mentioned octagonal pyramid may be made adjustable.
Further, in the present
embodiment, the central microphone array 661 is attached on the straight line that passes
through the apex of the regular octagonal pyramid whose pyramidal faces are formed by the
surfaces of the fixed reflecting plates 611 to 618 and that is perpendicular to the base of that
pyramid. The purpose of this is to sharpen the directivity characteristic of the zoom
microphone device 600 by using the strength of
the directivity in the vertical direction, which is a feature of the linear microphone array.
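The directivity of a linear microphone array mentioned above can be illustrated with a generic delay-and-sum beamformer. This is only a sketch under stated assumptions: delay-and-sum is a textbook beamformer swapped in for illustration and is not the filter design of the source, the array is assumed to be uniform and in free field (no reflectors), and the speed of sound, microphone spacing, and frequency are hypothetical values. The function name delay_and_sum_response is likewise hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def delay_and_sum_response(num_mics, spacing_m, freq_hz, steer_deg, arrival_deg):
    """Normalized magnitude response of a uniform linear array.

    Angles are measured from the array axis. The array is steered to
    steer_deg with plain delay-and-sum weights, and the returned value is
    the gain (0 to 1) for a plane wave arriving from arrival_deg.
    """
    wavenumber = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    # Phase difference between adjacent microphones after steering.
    phase_step = wavenumber * spacing_m * (
        math.cos(math.radians(arrival_deg)) - math.cos(math.radians(steer_deg))
    )
    total = sum(cmath.exp(1j * m * phase_step) for m in range(num_mics))
    return abs(total) / num_mics


if __name__ == "__main__":
    # Gain toward a source 30 degrees away from the steering direction,
    # for a short and a long array with the same (assumed) 4 cm spacing.
    for mics in (4, 12):
        print(mics, delay_and_sum_response(mics, 0.04, 2000.0, 90.0, 60.0))
```

With the same spacing, the longer array attenuates the off-axis source more strongly, which is the sense in which a linear array offers strong directivity with respect to the angle measured from its axis.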
[0085]
[Modifications 1 to 5] The third embodiment showed a zoom microphone apparatus in which the
opposing pyramidal faces of the octagonal pyramid are perpendicular to each other, but the
present invention is not limited to this. As described above, as long as flat reflectors are
installed perpendicular to the array direction of the linear microphone arrays and directivity
can be controlled in both the horizontal angle direction and the elevation angle direction, the
fixed reflectors need not be arranged as the pyramidal faces of a regular octagonal pyramid.
Moreover, although the movable reflecting plates 631 to 638 can be used to control the number
of reflected sounds, they are not essential. The central reflection plate 651 and
the central microphone array 661 can also be omitted as appropriate. The reflectors are
preferably in pairs, so it is desirable that the reflectors form a regular pyramidal frustum or
a regular pyramid whose number of pyramidal faces is a multiple of two. For example, the shape
may be a regular square pyramid (frustum), a regular hexagonal pyramid (frustum), a regular
octagonal pyramid (frustum), or the like. If the number of reflectors is too large, as with a
regular dodecagonal pyramid (frustum) or a regular triangular pyramid (frustum), the energy of
the reflected sound decreases. In addition, the number of microphones that can be installed on a
single reflector decreases and the distance between microphones increases, so that the
controllable frequency band becomes narrow. If the number of microphones that can be used is
about 16 to 96, a regular square pyramid (frustum), a regular hexagonal pyramid (frustum), or a
regular octagonal pyramid (frustum) is appropriate. First to fifth modified examples of the
third embodiment are described below with reference to the drawings. FIG. 20 is a front view showing
the configuration of a first modification of this embodiment. FIG. 21 is a front view showing the
configuration of a second modification of this embodiment. FIG. 22 is a front view showing the
configuration of a third modification of this embodiment. FIG. 23 is a front view showing the
configuration of the fourth modification of this embodiment. FIG. 24 is a front view showing the
configuration of the fifth modification of this embodiment. The zoom microphone device 600a in
the first modification has four of the fixed reflection plates and four of the movable
reflection plates of the zoom microphone device 600 described above, combined so that the fixed
reflection plates form the pyramidal faces of a square pyramid. As in the zoom microphone
device 600, the opposing pyramidal faces are attached so as to be perpendicular to each other.
With this configuration too, as with the zoom microphone device 600 described above, sound from
any direction in three-dimensional space can be emphasized. The zoom microphone device 600b
according to the second modification has six of the fixed reflection plates and six of the
movable reflection plates of the zoom microphone device 600 described above, combined so that
the fixed reflection plates form the pyramidal faces of a regular hexagonal pyramid.
As in the zoom microphone device 600, the opposing pyramidal faces are attached so as to be
perpendicular to each other. With this configuration too, as with the zoom microphone device
600 described above, sound from any direction in three-dimensional space can be emphasized.
Also, as described above, the movable reflector, the central reflector, and the central
microphone array are not essential components. Therefore, the configuration obtained by
omitting the movable reflection plates, the central reflection plate, and the central
microphone array from the zoom microphone device 600a of the first modification is shown as the
zoom microphone device 600c, the corresponding configuration obtained from the zoom microphone
device 600b of the second modification is shown as the zoom microphone device 600d, and a
configuration in which the movable reflection plates, the central reflection plate, and the
central microphone array are likewise omitted is shown as the zoom microphone device 600e. These
zoom microphone devices 600c, 600d, and 600e can also realize narrow directional speech
enhancement technology having directivity that is sharper than that of the prior art.
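The trade-off noted above between the number of reflectors and the microphone spacing can be sketched numerically. The following is an illustration under assumptions that are not in the source: the available microphones are divided evenly among the reflectors (the central microphone array is ignored), each array spans the same hypothetical length array_length_m, and the upper end of the controllable band is approximated by the usual half-wavelength spatial-aliasing limit f = c / (2d) for spacing d. The source may define the controllable frequency band differently, and all function names are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def mics_per_reflector(total_mics, num_reflectors):
    """Microphones per reflector when they are divided evenly."""
    return total_mics // num_reflectors


def spacing_m(array_length_m, mics_on_array):
    """Spacing of a uniform linear array that spans array_length_m."""
    if mics_on_array < 2:
        raise ValueError("a linear array needs at least two microphones")
    return array_length_m / (mics_on_array - 1)


def max_unaliased_freq_hz(spacing):
    """Half-wavelength criterion: highest frequency free of spatial aliasing."""
    return SPEED_OF_SOUND / (2.0 * spacing)


if __name__ == "__main__":
    total_mics = 48        # hypothetical, within the 16 to 96 range in the text
    array_length_m = 0.3   # hypothetical length of each linear array
    for reflectors in (4, 6, 8, 12):
        count = mics_per_reflector(total_mics, reflectors)
        d = spacing_m(array_length_m, count)
        print(reflectors, count, round(d, 4), round(max_unaliased_freq_hz(d)))
```

Under these assumptions, going from four to twelve reflectors reduces the microphones per array and widens the spacing, so the aliasing-free upper frequency falls, which matches the qualitative statement in the text.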
[0086]
<Example of application> Expressed in terms of images, the narrow directional speech
enhancement technology corresponds to generating a sharp image from a blurred one, and it helps
to obtain more detailed sound field information. The following describes service examples in
which the present invention is useful.
[0087]
The first example is content production combined with video. By using the embodiment of the
present invention, a distant target voice can be clearly emphasized even in a noisy environment
containing many noise sources (non-target voices etc.). For example, a zoomed-in video, shot
from outside the field, of a soccer player dribbling can be given the corresponding sound.
[0088]
A second example is application to a video conference system (which may be a voice conference
system). For a meeting in a small room, the prior art could already emphasize the voice of the
speaker using several microphones, but in a large conference room (for example, at a
distance of 5 m or more from the microphone), where speakers are spread over a wide space, it is
difficult to clearly emphasize the voice of a distant speaker, and it has therefore been
necessary to place a microphone in front of each speaker. However, by using the embodiment of
the present invention, distant voices can be clearly emphasized, so a video conference system
suitable for a large conference room can be built without installing a microphone in front of
each speaker.