Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JPH10243494
[0001]
BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a
method and apparatus for efficiently identifying whether a user's attention is directed to an
apparatus or an arbitrary object, for use in an interactive speech understanding apparatus or the like.
[0002]
2. Description of the Related Art In communication between human beings, a call, a cue, or an
utterance accompanied by a gaze is directed toward the person or object to be acted upon.
Accordingly, if a speech understanding system can identify whether or not a speaker is addressing
the system, it can reject unnecessary speech and control the movement of focus in speech
understanding. From this point of view, recognition of the gaze direction of a speaker from a face
image has been proposed as a method of identifying whether or not an utterance is directed to the
system.
[0003]
A pattern-matching method using images will be described as an example of a conventional face
direction recognition method. To recognize the face direction from an image, there is a method of
registering in advance, as reference patterns, images of a face turned in multiple directions, and
collating them with the input image to recognize the face direction [1]. An example of
03-05-2019
1
pattern matching of image information is shown in FIG. 4. Whole or partial images of a face turned
in multiple directions are registered in advance as a reference pattern group 101 whose face
directions are known. When an evaluation pattern 110 whose face direction is unknown is given,
the most similar of the individual reference patterns 102 to 106 is determined, and the face
direction of that matching pattern is recognized as the face direction of the input pattern.
[0004]
To collate image patterns, the image is first divided into small sections, and a feature value is
calculated for each section from the distribution of the color and brightness of the image within it.
The feature values of all the sections constituting the image are then collected. This procedure is
performed for both the reference pattern and the evaluation pattern, and the similarity is
calculated from the distance between the feature distributions. As can be seen from FIG. 4, this
method requires a large number of reference patterns to be registered in advance, and feature
values and similarities to be computed for every pattern. In addition, because the image is divided
into many small sections, many calculations are required to obtain the feature values.
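The collation procedure above can be sketched as follows. This is a minimal illustration, not the
patent's implementation: it assumes grayscale images as 2-D arrays and uses mean brightness as
the per-section feature value (the text also mentions color distribution), with Euclidean distance
as the similarity measure; the function names are ours.

```python
import numpy as np

def block_features(img, block=4):
    # Divide the image into small sections and compute one feature
    # value per section (here: mean brightness, a simple stand-in).
    h, w = img.shape
    feats = [img[y:y + block, x:x + block].mean()
             for y in range(0, h - block + 1, block)
             for x in range(0, w - block + 1, block)]
    return np.array(feats)

def match_direction(eval_img, references):
    # references: {known face direction: registered image}.
    # The direction of the nearest reference pattern (smallest
    # Euclidean distance between feature vectors) is the answer.
    ev = block_features(eval_img)
    return min(references,
               key=lambda d: np.linalg.norm(block_features(references[d]) - ev))
```

Note that every reference pattern must be stored and featurized at match time, which is exactly
the cost the paragraph above points out.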
[0005]
The conventional face direction recognition method has the problem that its computational cost is
high because it performs pattern matching on image information. An object of the present
invention is to provide a method and apparatus capable of performing pattern matching for face
direction recognition more efficiently.
[0006]
According to the present invention, a pair of microphones is arranged so that, when projected onto
a vertical plane facing the speaker, the microphones are offset from each other. The difference in
their output power is detected, and from this difference it is determined toward which position on
the straight line connecting the projected microphone positions on the vertical plane the speaker's
face is directed. The power difference is detected for a specific frequency band of the microphone
outputs. By providing one pair of microphones in the left-right direction and another pair in the
up-down direction, it is detected which way the face is directed in both the left-right and the
up-down direction.
[0007]
DESCRIPTION OF THE PREFERRED EMBODIMENTS FIG. 1 shows an embodiment of a face
direction recognition apparatus according to the present invention. As shown in FIG. 1A, in front
of the speaker 200, microphones 202 and 203 are arranged in line in the vertical direction to
determine the vertical direction of the speaker's face 201, and microphones 204 and 205 are
arranged in line in the horizontal direction to determine the direction of the face 201 in the
horizontal plane. In this example, the microphones 202, 203, 204 and 205 are located in one
substantially vertical plane 206 facing the front of the speaker; the midpoint of microphones 202
and 203 coincides with the midpoint 207 of microphones 204 and 205; and a straight line 208
perpendicular to the vertical plane 206 and passing through the midpoint 207 passes through the
lips 209 when the face 201 is turned toward the vertical plane 206. That is, the microphones 202
and 203 are disposed on the vertical plane passing through the center of the head of the speaker
200 (the vertical plane containing the straight line 208 in FIG. 1A), and the microphones 204 and
205 are disposed to the left and right of that center. Further, each of the microphones 202 to 205
is unidirectional, with its pointing direction directed at the lips 209, and the paired microphones
202 and 203, and 204 and 205, have the same characteristics.
[0008]
The sound from one pair of microphones 202 and 203 is input to feature amount analyzers 210
and 211 as shown in FIG. 1B. The feature amount analyzers 210 and 211 detect the acoustic
change of the voice caused by the difference in direction from the lips 209 to the microphones
202 and 203, under the influence of the shapes of the vocal organs and the head. The difference
extractor 212 calculates the difference between the two feature values detected by the feature
amount analyzers 210 and 211. The discriminant analyzer 213 determines the direction of the
face 201 in the vertical plane (the up-down direction) from the output of the difference
extractor 212.
[0009]
Although not shown in the figure, the feature amounts of the voices captured by the other pair of
microphones 204 and 205 are detected in the same way, and the direction of the face 201 in the
horizontal plane (the left-right direction) is determined from the difference between those
feature amounts. FIG. 1C shows a specific implementation example of the device of the present
invention. Voice power is extracted by each of the feature amount analyzers 210 and 211, and the
face direction is determined from the difference between these voice powers. That is, the sound
pressure around the head attenuates by several to several tens of dB toward the side and the back
compared with the front of the face, both in the horizontal plane and in the vertical plane through
the center of the head. Therefore, the face direction can be determined from the difference in
voice power.
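The power-difference decision above can be sketched as follows. This is a minimal illustration
assuming digitized microphone signals; the 3 dB decision threshold is a hypothetical value for
illustration only, as the text specifies no threshold.

```python
import numpy as np

def voice_power_db(signal):
    # Mean power of a microphone signal, in dB.
    s = np.asarray(signal, dtype=float)
    return 10.0 * np.log10(np.mean(s ** 2))

def face_side(left_sig, right_sig, threshold_db=3.0):
    # Compare the output powers of a left/right microphone pair.
    # threshold_db is a hypothetical illustrative value.
    diff = voice_power_db(left_sig) - voice_power_db(right_sig)
    if diff > threshold_db:
        return "left"
    if diff < -threshold_db:
        return "right"
    return "front"
```

The same comparison applied to the vertically arrayed pair yields the up-down decision.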
[0010]
In this case, if the frequency characteristics of the microphones whose voice powers are compared
differ, a change in the spectral envelope by itself causes a power difference. To eliminate this, the
frequency range over which the powers are compared is limited, which suppresses power
fluctuations due to changes in the spectral envelope. Frequency spectrum analyzers 301 and 302
are used as the feature amount analyzers 210 and 211, respectively; they extract the strength of
the audio signal at each frequency at predetermined intervals. For the outputs of the pair of
frequency spectrum analyzers 301 and 302, a transfer function is calculated by the transfer
function calculator 303 instead of calculating the difference of the inputs with the difference
extractor 212. With Px(f) and Py(f) as the analysis frequency spectra of the frequency spectrum
analyzers 301 and 302, respectively, the transfer function is obtained by the following equation.
[0011]
Txy(f) = Pxy(f) / Pxx(f), where Pxx(f) is the square of Px(f) and Pxy(f) is the product of the
frequency spectra Px(f) and Py(f). The output of the difference extractor, that is, the output of the
transfer function calculator 303, is converted into a face direction by the discriminant analyzer
304 and output. The transfer function here treats the output of one of the microphones, for
example 202, as the input of the transmission path, and the output of the other microphone 203
as the output of the transmission path.
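Under the definitions above, the transfer function can be estimated as sketched below. This is a
sketch under assumptions: a single FFT frame per microphone (a practical analyzer would average
over frames), and the usual cross-spectral convention of conjugating the input spectrum in the
product Pxy, which the text does not spell out.

```python
import numpy as np

def transfer_function(x, y):
    # Txy(f) = Pxy(f) / Pxx(f): microphone x is treated as the input
    # of the transmission path and microphone y as its output.
    X = np.fft.rfft(x)
    Y = np.fft.rfft(y)
    Pxx = np.abs(X) ** 2          # square of Px(f)
    Pxy = np.conj(X) * Y          # product of the two spectra
    return Pxy / np.maximum(Pxx, 1e-12)   # guard against empty bins
```

For a pure gain the magnitude of Txy is that gain at every frequency, and for a pure delay the
magnitude stays 1 while only the phase changes, which is why the magnitude reveals the
direction-dependent attenuation.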
[0012]
Next, an experimental example will be described. Here the present invention was applied to
detection of the face direction of a speaker 200 sitting at a desk 401 in a soundproof room, as
shown in FIG. 2. An omnidirectional electret condenser microphone 403 was attached to the chest
of the speaker 200, on the vertical plane 402 passing through the center of the head, and
unidirectional dynamic microphones 204 and 205 were placed on the desk 401 at equal distances
to the left and right of the vertical plane 402, with their pointing directions directed at the face of
the speaker 200. The face of the speaker 200 was directed in a defined direction, and a prepared
text was read aloud. For one male and one female speaker 200, the outputs of the microphones
204, 205 and 403 were recorded for four directions: the front direction 404, the left microphone
direction 405, the right microphone direction 406, and downward in the plane of symmetry 407
of the left and right microphones. Digital data was captured at a sampling frequency of 48 kHz,
and each recording was subjected to frequency spectrum analysis to determine the power
spectrum ratio of the paired microphones.
[0013]
The power spectrum ratio of the left and right microphones 204 and 205 when speaking in the
front direction 404 is substantially flat up to 20 kHz, as shown in FIG. 3A. When the speaker 200
turns to the left, the power spectrum ratio of the left and right microphones 204 and 205, as
shown in FIG. 3B, shows attenuation near 13 kHz and 17 kHz in addition to the attenuation above
20 kHz. This attenuation does not occur when the speaker faces the front and makes the same
utterance. Therefore, it is possible to determine whether the speaker 200 faces the front or the
left from the ratio of the output voice powers of the microphones 204 and 205. If only the band
that changes markedly between FIGS. 3A and 3B, in this example 10 kHz to 20 kHz, is extracted,
the difference in the voice power ratio depending on the orientation of the face becomes even
larger, and the orientation can be determined more reliably. Although not shown in the figure, it
is likewise possible to discriminate between facing the front and facing the right from the
magnitude of the output voice power ratio of the microphones 204 and 205.
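The band extraction described above can be sketched as follows. This is a minimal illustration
assuming single-frame FFT power spectra at the experiment's 48 kHz sampling rate; the band
edges (10-20 kHz) come from the text, while the function and parameter names are ours.

```python
import numpy as np

def band_power_ratio(sig_a, sig_b, fs=48000, band=(10000, 20000)):
    # Power ratio of two microphone signals, restricted to the band in
    # which the face-direction attenuation appears (10-20 kHz here).
    freqs = np.fft.rfftfreq(len(sig_a), d=1.0 / fs)
    sel = (freqs >= band[0]) & (freqs <= band[1])
    pa = np.sum(np.abs(np.fft.rfft(sig_a))[sel] ** 2)
    pb = np.sum(np.abs(np.fft.rfft(sig_b))[sel] ** 2)
    return pa / pb
```

Restricting the sum to the selected band discards the low-frequency region where the ratio stays
flat regardless of face direction, so the direction-dependent difference dominates the result.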
[0014]
The ratio of the power spectrum of the left microphone 204 to that of the center microphone 403
when the speaker 200 faces the front is as shown in FIG. 3C. In this case, since the frequency
characteristics of the microphones 204 and 403 differ, a large continuous drop occurs in the high
frequency band above 15 kHz. When the speaker 200 turns downward, the ratio of the power
spectrum of the microphone 204 to that of the microphone 403 drops at about 13 kHz, as in the
comparison of the left and right microphone outputs, as shown in FIG. 3D. In addition, a slight
difference appears in the power spectrum ratio in the portion where the frequency characteristics
are flat. Therefore, by detecting this difference, it can be determined whether the face is directed
to the front or downward. In this case, to avoid the influence of the difference in the frequency
characteristics of the microphones 204 and 403, only the components of the 10-15 kHz band are
extracted in this example, so that the power spectrum ratio can be detected without error.
[0015]
As understood from the above description, it suffices to provide, as the microphones, at least two
microphones arrayed in the horizontal plane and at least two microphones arrayed vertically in
the vertical plane. In this case, one microphone can be shared between the horizontal array and
the vertical array.
[0016]
As described above, by using two pairs of microphones, the direction of the face can be
determined in both the left-right and the up-down direction.
Also, as described above, a single pair of microphones suffices to detect toward which position the
face is directed on the straight line connecting the microphone positions projected onto the
vertical plane facing the speaker. That is, the present invention can also be applied, for example,
only to detecting which side the face is turned to in the left-right direction.
[0017]
The microphones used are preferably directed at the face of the speaker in order to avoid the
influence of noise and to increase the reception level of the signal sound, but this is not essential.
The microphones used as a pair, that is, those whose voice power ratio or difference is taken,
preferably have the same frequency characteristics, but they need not necessarily be the same.
[0018]
Further, although in the above the required components are extracted after frequency spectrum
analysis when the band is limited, the microphone output may instead be passed through a filter
whose pass band is the required band, and the voice power of the filter output determined. Also,
although it was determined above whether the face is directed to the front, left, right, or
downward, that is, left, right, up, or down with respect to the front, the further the direction
deviates from the front within the same direction, the larger the difference of the output ratio
from that when facing the front becomes. Therefore, by storing in advance the magnitude of the
output ratio for each angle range and determining into which range a measured power ratio falls,
the magnitude of the deviation of the orientation from the front can also be determined.
[0019]
According to the conventional face direction recognition method, a large number of calculations
and reference patterns must be prepared in order to perform image pattern recognition. By
contrast, according to the present invention, by extracting feature amounts from the audio signal
and performing discriminant analysis, both the reference patterns required for recognition and
the amount of calculation itself can be reduced, and the recognition process can be performed
efficiently.
[0020]
Brief description of the drawings
[0021]
FIG. 1A is a diagram showing an example of the arrangement relationship between a target face
201 and the microphones in one embodiment of the present invention; FIG. 1B is a block diagram
showing an example of an apparatus for processing the outputs of a pair of microphones; and FIG.
1C is a block diagram showing a specific implementation example thereof.
[0022]
FIG. 2 is a perspective view showing the relationship between the arrangement of the speaker and
the microphones and the direction of the face in the experimental example of the present invention.
[0023]
FIG. 3 is a diagram showing various examples of the power spectrum ratio in the experimental
results of the present invention.
[0024]
FIG. 4 is a conceptual diagram showing a conventional face direction recognition method.
[0025]
References [1] James L. Flanagan. Speech Analysis, Synthesis and Perception. Springer-Verlag, 1972.