JP2007074317

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2007074317
PROBLEM TO BE SOLVED: To provide an information processing apparatus capable of detecting
a speaker position as a position in a coordinate system in a space. SOLUTION: An information
processing apparatus 100 includes angle calculation units 1012 and 1022 and a position
calculation unit 1052. The angle calculation units 1012 and 1022 calculate the relative angles
between microphone arrays 103 and 104 and a sound source based on audio signals from the at
least two microphone arrays 103 and 104, whose positions and orientations in the space are
predetermined. The position calculation unit 1052 uses these relative angles, together with
information on the size of the space and on the positions and orientations of the microphone
arrays 103 and 104 in the space, to obtain a straight line from each of the microphone arrays
103 and 104 toward the sound source, and takes the position at which the straight lines
intersect as the position of the sound source. [Selected figure] Figure 1
INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING METHOD
[0001]
The present invention relates to detection of a sound source, and more particularly to an
information processing apparatus and an information processing method for detecting a sound
source from outputs of a plurality of microphone arrays.
[0002]
The microphone array is a device capable of estimating the direction of a sound source by
arranging a plurality of microphone elements at appropriate intervals and calculating the time
04-05-2019
1
delay between outputs of individual microphone elements.
By using such a microphone array, the direction of the sound source can be calculated as an
angle relative to the microphone array. Patent Document 1 proposes a method that uses this
property to estimate the direction of a speaker with a microphone array, points a camera in
that direction, and then applies face image detection to specify the speaker position.
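The relation between inter-element time delay and direction can be illustrated with a minimal sketch. It assumes a far-field source, a single pair of elements, and a speed of sound of 343 m/s; the function name `angle_from_delay` and its parameters are hypothetical and are not taken from the patent, which does not specify the estimation method.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def angle_from_delay(delay_s: float, spacing_m: float,
                     c: float = SPEED_OF_SOUND) -> float:
    """Far-field arrival angle in radians for one pair of microphone elements.

    The angle is measured from the broadside (front) direction of the pair.
    delay_s is the arrival-time difference between the two elements and
    spacing_m is the distance between them.
    """
    ratio = c * delay_s / spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1]; clamp it.
    ratio = max(-1.0, min(1.0, ratio))
    return math.asin(ratio)


if __name__ == "__main__":
    # A delay equal to half the maximum possible delay for 10 cm spacing.
    delay = 0.5 * 0.10 / SPEED_OF_SOUND
    print(math.degrees(angle_from_delay(delay, 0.10)))
```

A real array would typically estimate the delay by cross-correlating the element signals and would combine several element pairs; this sketch only shows the geometric step.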
[0003]
In addition, Patent Document 2 discloses a method capable of moving the position of a camera
and a microphone array, in which an ultrasonic oscillator or the like is further attached to a
sensor device in which a camera and a microphone array are integrated. To detect the
JP 20018191 A
JP 2000-41228 A
[0004]
However, in the method described in Patent Document 1, the microphone array and the camera
cannot be interlocked unless the center of the microphone array is aligned with the center of
the camera's swing, so the installation position of the camera is limited. Further, the method
described in Patent Document 2 cannot separate the arrangement of the cameras from the
arrangement of the microphone arrays.
[0005]
The purpose of both of the above methods is to detect the direction of the speaker relative to
the position where a sensor element such as the microphone array is installed, not to detect
the position of the speaker in the room. Therefore, they cannot be used for purposes such as
switching the screen to be used or controlling lighting and air conditioning according to the
position of the speaker in a conference room.
[0006]
This problem also applies when a camera is used as an image recording apparatus: such methods
cannot cope with flexible control such as, when a speaker change occurs, shooting from a camera
angle that puts the previous speaker and the new speaker in the same frame.
[0007]
Such problems are caused by the fact that the combined sensor with the microphone array unit
and the camera only detects the relative speaker position from the sensor position.
[0008]
The present invention has been made in view of the above problems, and an object of the present
invention is to provide an information processing apparatus and an information processing
method capable of detecting a sound source as a position in a coordinate system in a space.
[0009]
In order to solve the above problems, an information processing apparatus according to the
present invention includes: angle calculation means for calculating the relative angle between
each microphone array and a sound source based on audio signals from at least two microphone
arrays whose positions and orientations in a space are predetermined; and position calculation
means for obtaining a straight line from each microphone array toward the sound source, based
on the relative angle between the microphone array and the sound source and on information on
the size of the space and the position and orientation of the microphone array in the space,
and for obtaining the position where the straight lines intersect as the sound source.
[0010]
According to the present invention, the speaker position, which is detected as a position
relative to the microphone array serving as the sensor element, can be obtained as a position
in the indoor coordinate system.
This makes it possible to adaptively control various facilities in the room, arranged
independently of the sensor element, in response to dynamic changes in the speaker position.
[0011]
The information processing apparatus according to the present invention further includes
holding means for holding information on the size of the predetermined space and the position
and orientation of the microphone array in the space.
The position calculation means may detect that a speaker change has occurred when the sound
source position changes discontinuously.
According to the present invention, when a speaker change occurs, the detected speaker
position changes discontinuously, so that the speaker change can be detected.
[0012]
The information processing apparatus according to the present invention further includes
switching means for switching at least one of a screen, an illumination means, and an imaging
means based on the sound source determined by the position calculation means.
According to the present invention, it is possible to automate operations such as switching the
camera angle and controlling the lighting and the screen according to the speaker.
[0013]
The information processing method according to the present invention includes: an angle
calculating step of calculating the relative angle between each microphone array and a sound
source based on audio signals from at least two microphone arrays whose positions and
orientations in a space are predetermined; and a step of obtaining a straight line from each of
the microphone arrays toward the sound source, based on information on the size of the space
and the position and orientation of the microphone array in the space and on the relative angle
between the microphone array and the sound source, and calculating the position where the
straight lines intersect as the sound source.
[0014]
According to the present invention, the speaker position, which is detected as a position
relative to the microphone array serving as the sensor element, can be obtained as a position
in the indoor coordinate system.
This makes it possible to adaptively control various facilities in the room, arranged
independently of the sensor element, in response to dynamic changes in the speaker position.
[0015]
The information processing method according to the present invention further includes the step
of detecting that a speaker change has occurred when the sound source changes discontinuously.
According to the present invention, when the speaker change occurs, the detected speaker
position changes discontinuously, so that the speaker change can be detected.
[0016]
According to the present invention, it is possible to provide an information processing apparatus
and an information processing method capable of detecting a sound source as a position in a
coordinate system in space.
[0017]
The best mode for carrying out the present invention will be described below.
[0018]
FIG. 1 is a diagram showing an outline of a configuration of a speech information processing
apparatus (information processing apparatus) 100 according to an embodiment of the present
invention.
The speech information processing apparatus 100 detects the position of the speaker in the
room using a plurality of microphone arrays.
As shown in FIG. 1, the speech information processing apparatus 100 includes computers 101
and 102, microphone arrays 103 and 104, and a computer 105 for position calculation. The
computers 101, 102, and 105 are each configured using a central processing unit (CPU), a
read-only memory (ROM), a random access memory (RAM), and the like, and the functions of each
computer are realized by executing a predetermined program.
[0019]
Microphone arrays 103 and 104 are connected to the computers 101 and 102, respectively. At
least two microphone arrays are provided, and their positions and orientations in the
room are predetermined. The computer 101 includes an input unit 1011 for inputting an audio
signal from the microphone array 103, an angle calculation unit 1012 and an output unit 1013.
The angle calculation unit 1012 calculates the relative angle between the microphone array 103
and the sound source based on the audio signal from the microphone array 103. Similarly, the
computer 102 includes an input unit 1021 that inputs an audio signal from the microphone
array 104, an angle calculation unit 1022, and an output unit 1023. The angle calculation unit
1022 calculates the relative angle between the microphone array 104 and the sound source
based on the audio signal from the microphone array 104.
[0020]
The computers 101 and 102 are connected to a computer 105 for position calculation via a
network 106. The computer 105 includes a holding unit 1051 and a position calculation unit
1052. The holding unit 1051 holds information on the size of a predetermined space (for
example, a room) and the position and orientation of the microphone arrays 103 and 104 in the
room as room plan information. The position calculation unit 1052 obtains straight lines from
the microphone arrays 103 and 104 toward the sound source based on the relative angles
between the microphone arrays 103 and 104 and the sound source and on the room plan
information held by the holding unit 1051, and determines the position where these straight
lines intersect as the speaker position.
[0021]
FIG. 2 is a diagram showing an example in which the speech information processing apparatus
100 according to the present invention is disposed in a room 200. This figure is a view of the
room 200 from above, and the microphone arrays 103 and 104 are arranged at the corners of the
room 200. Although not illustrated, the computers 101, 102, and 105 described above are placed
at suitable positions indoors or outdoors and connected via signal lines. Information
regarding the size of the room 200 and the positions and orientations of the microphone arrays
103 and 104 is stored in the above-described holding unit 1051 of the computer 105 as the room
plan information of the room 200.
[0022]
FIG. 3 is a diagram showing an overview of the speaker position detection method used by the
speech information processing apparatus 100 of the present invention. When the speaker 303
speaks, the angle calculation units 1012 and 1022 of the computers 101 and 102 calculate the
relative angles θ1 and θ2 between the microphone arrays 103 and 104 and the sound source based
on the respective audio signals from the microphone arrays 103 and 104. The output units 1013
and 1023 then transmit these values to the computer 105. The position calculation unit 1052
obtains straight lines 301 and 302 representing the direction of the sound source from the
angles θ1 and θ2 and the information on the positions and orientations of the microphone
arrays 103 and 104, and detects the intersection of these straight lines as the speaker
position (sound source).
[0023]
FIG. 4 is a diagram showing the flow of the speaker position detection process described
above. In step S401, information on the relative angles between the microphone arrays 103 and
104 and the sound source is input from the computers 101 and 102. In step S402, the position
calculation unit 1052 of the computer 105 uses this angle information and the room plan
information to determine the straight lines in the sound source direction in the indoor
coordinate system. In step S403, it calculates the intersection of these straight lines to
obtain the speaker position (sound source) in the indoor coordinate system.
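Steps S401 to S403 can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the names `ArrayPose`, `bearing_line`, `intersect`, and `locate_source` are hypothetical, and the sketch assumes that the relative angle is measured counterclockwise from the array's front direction, so that the line direction in room coordinates is the array orientation plus the relative angle.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayPose:
    """Stored pose of one microphone array (hypothetical layout record)."""
    x: float      # array centre, room coordinates
    y: float
    alpha: float  # angle of the array's front direction from the x axis, radians


def bearing_line(pose, theta):
    """S402: point and unit direction of the line toward the source.

    Assumes theta is measured counterclockwise from the array's front direction.
    """
    phi = pose.alpha + theta
    return (pose.x, pose.y), (math.cos(phi), math.sin(phi))


def intersect(p1, d1, p2, d2, eps=1e-9):
    """S403: intersection of lines p1 + t*d1 and p2 + s*d2, or None if parallel."""
    det = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(det) < eps:
        return None  # nearly parallel bearings give no usable intersection
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t = (dx * d2[1] - dy * d2[0]) / det
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])


def locate_source(pose1, theta1, pose2, theta2):
    """S401 inputs (theta1, theta2) plus stored poses -> source position."""
    p1, d1 = bearing_line(pose1, theta1)
    p2, d2 = bearing_line(pose2, theta2)
    return intersect(p1, d1, p2, d2)


if __name__ == "__main__":
    # Two arrays on the same wall, 4 m apart, facing each other along the wall.
    a = ArrayPose(0.0, 0.0, 0.0)
    b = ArrayPose(4.0, 0.0, math.pi)
    print(locate_source(a, math.radians(45), b, math.radians(-45)))
```

Returning `None` for nearly parallel bearings is a design choice of this sketch; the patent does not say how that case is handled.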
[0024]
FIG. 5 is a diagram for explaining the calculation that generates a straight line in the sound
source direction in the indoor coordinate system. In this example, the indoor coordinate
system is set with the lower left corner of the rectangular room 502 as the origin, the
longitudinal direction as the x axis, and the short direction as the y axis. In this indoor
coordinate system, the center of the microphone array 103 is assumed to be located at
(x1, y1). In addition, the microphone array 103 is arranged such that its front face forms an
angle α1 with the x axis. These parameters are determined when the microphone array 103 is
installed in the room 502, and are stored in advance in the holding unit 1051 as part of the
room plan information. The size of the predetermined space and the position (x1, y1) and
orientation (α1) of the microphone array 103 in the space are thereby determined. Here,
assuming that the microphone array 103 detects the direction of the sound source 503 as θ1,
the equation, in the indoor coordinate system, of the straight line 504 connecting the
position of the sound source 503 and the position of the microphone array 103 is given by
Expression (1).
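The body of Expression (1) does not appear in this text. A plausible form, assuming that θ1 is measured counterclockwise from the array's front direction and that the front direction makes the angle α1 with the x axis (both are assumptions, since the text does not fix the sign conventions), is the point-slope equation of a line through (x1, y1):

```latex
\[
  y - y_1 = \tan(\alpha_1 + \theta_1)\,(x - x_1)
\]
```

This form is undefined when α1 + θ1 = ±90° (a line parallel to the y axis). Under the same assumptions, the equivalent form sin(α1 + θ1)(x − x1) − cos(α1 + θ1)(y − y1) = 0 covers that case as well.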
[0025]
The position calculation unit 1052 similarly obtains the equation of the straight line
connecting the microphone array 104 (not shown), which is disposed at a different position,
and the sound source 503, and calculates the intersection of the two lines. This makes it
possible to detect the position of the sound source 503, which is the speaker position. It
also makes it possible to supply the information needed to automate operations such as
switching the camera angle according to the speaker and controlling the lighting and the
screen.
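The intersection can be written in closed form. As a worked sketch, let array i be at (x_i, y_i) and let φ_i be the direction of its line toward the source in room coordinates, with slope m_i = tan φ_i. The symbols φ_i and m_i are introduced here for illustration and do not appear in the patent. Equating the two point-slope equations gives:

```latex
\[
  x = \frac{y_2 - y_1 + m_1 x_1 - m_2 x_2}{m_1 - m_2},
  \qquad
  y = y_1 + m_1\,(x - x_1),
  \qquad
  m_i = \tan\varphi_i .
\]
% Example: (x_1, y_1) = (0, 0), \varphi_1 = 45^\circ  (m_1 = 1),
%          (x_2, y_2) = (4, 0), \varphi_2 = 135^\circ (m_2 = -1):
\[
  x = \frac{0 - 0 + 1\cdot 0 - (-1)\cdot 4}{1 - (-1)} = 2,
  \qquad
  y = 0 + 1\cdot(2 - 0) = 2 .
\]
```

The formula has no solution when m_1 = m_2, that is, when the two lines are parallel, and it cannot represent a line parallel to the y axis; an implementation would need to treat both cases separately.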
[0026]
Next, a second embodiment of the present invention will be described. FIG. 6 is a diagram for
explaining the second embodiment. In a room 610 in FIG. 6, microphone arrays 103 and 104
similar to those of the first embodiment and computers 101, 102, and 105 (not shown) are
installed. In addition, swing cameras 603, 604, 605, and 606 are installed. Consider the
case where speakers 607 and 608 converse in this room. When the speaker position detection
process described in the first embodiment is applied continuously, the detected speaker
position changes discontinuously when the speaker changes between the speaker 607 and the
speaker 608. This discontinuity is therefore used to detect a speaker change.
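One simple way to turn "changes discontinuously" into a test is to compare consecutive position estimates against a distance threshold. The sketch below is only an illustration: the function name `detect_speaker_changes` and the 1 m threshold are assumptions, and the patent does not define how a discontinuity is measured.

```python
import math


def detect_speaker_changes(positions, jump_threshold_m=1.0):
    """Return the indices at which the estimated position jumps.

    positions is a sequence of (x, y) estimates in room coordinates, in time
    order. An index i is reported when the distance from estimate i-1 to
    estimate i exceeds jump_threshold_m (an assumed, tunable value).
    """
    changes = []
    for i in range(1, len(positions)):
        if math.dist(positions[i - 1], positions[i]) > jump_threshold_m:
            changes.append(i)
    return changes


if __name__ == "__main__":
    track = [(1.0, 1.0), (1.05, 1.0), (1.0, 0.95), (4.0, 3.0), (4.1, 3.0)]
    print(detect_speaker_changes(track))  # prints [3]
```

In practice the estimates would be noisy, so some smoothing or a minimum dwell time before reporting a change would likely be needed; that is outside what the text describes.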
[0027]
When the cameras 603, 604, 605, and 606 are switched to capture this dialogue, and assuming
that the camera 604 is used to capture the speaker 607, switching to one of the remaining
cameras 603, 605, and 606 at the timing when the speaker change is detected makes it possible
to shoot an image focusing on the new speaker. In this case, it is preferable to use the
camera 605, which is at a position farther from the speaker 608 as the subject, in
consideration of the degree of freedom of composition.
[0028]
Thus, the computer 105 can detect that a speaker change has occurred when the speaker
position changes discontinuously. In addition, by switching the screen, the illumination unit,
or the imaging unit (not shown) based on the determined speaker position, the computer 105 can
automate operations such as switching the camera angle according to the speaker and
controlling the lighting and the screen.
[0029]
On the other hand, in view of the imaginary line 609 discussed in fields such as video
grammar, camera switching should not cross this line, and in this case the camera 603 should
be selected. Although several possibilities for camera selection have been described in the
present embodiment, the present invention is not limited to these selection methods.
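The imaginary-line constraint can be sketched as a side-of-line test, taking the line to pass through the two speaker positions. Everything specific here is an assumption for illustration: the function names are hypothetical, and the coordinates in the example are invented and are not read from FIG. 6.

```python
def side_of_line(a, b, p):
    """Signed cross product (b - a) x (p - a): the sign tells which side p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def cameras_respecting_line(speaker_a, speaker_b, current_cam, candidates):
    """Names of candidate cameras on the same side of the line as current_cam.

    The line is assumed to pass through the two speaker positions.
    candidates maps a camera name to its (x, y) position in room coordinates.
    """
    ref = side_of_line(speaker_a, speaker_b, current_cam)
    return [name for name, pos in candidates.items()
            if side_of_line(speaker_a, speaker_b, pos) * ref > 0]


if __name__ == "__main__":
    # Invented layout: speakers on the line y = 2, current camera below it.
    speakers = ((2.0, 2.0), (6.0, 2.0))
    current = (1.0, 0.0)
    others = {"603": (7.0, 0.0), "605": (7.0, 4.0), "606": (1.0, 4.0)}
    print(cameras_respecting_line(*speakers, current, others))  # prints ['603']
```

A selection policy could apply this filter first and then rank the remaining cameras by another criterion, such as distance to the new speaker.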
[0030]
Although the preferred embodiments of the present invention have been described in detail, the
present invention is not limited to these specific embodiments, and various modifications and
changes are possible within the scope of the subject matter of the present invention described
in the claims. For example, although the embodiments above describe obtaining the sound source
as a speaker position, the sound source is not limited to speech by a speaker.
[0031]
FIG. 1 is a diagram showing an outline of the configuration of the speech information
processing apparatus 100 according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example in which the speech information processing apparatus
100 according to the embodiment of the present invention is disposed in the room 200.
FIG. 3 is a diagram showing an overview of the speaker position detection method used by the
speech information processing apparatus according to the embodiment of the present invention.
FIG. 4 is a diagram showing the flow of the speaker position detection process described above.
FIG. 5 is a diagram explaining the calculation that generates a straight line in the sound
source direction in the indoor coordinate system.
FIG. 6 is a diagram for explaining the second embodiment of the present invention.
Explanation of reference signs
[0032]
100 Speech information processing apparatus
101, 102, 105 Computer
1011, 1021 Input unit
1012, 1022 Angle calculation unit
1013, 1023 Output unit
103, 104 Microphone array
106 Network
603, 604, 605, 606 Swing camera