Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2013150260
Abstract: The present invention provides a voice analysis device and the like in which the
analysis result of the voice acquired by microphones is less dependent on the front and back of
the device.
SOLUTION: The device includes a plate-shaped device body 30; a plurality of microphones 11c
and 11d which are disposed on both faces of the plate-shaped device body 30 and acquire the
voice of a speaker; a sound pressure comparison unit that compares the sound pressure of the
voice acquired by the third microphone 11c, disposed on one face of the device body 30, with the
sound pressure of the voice acquired by the fourth microphone 11d, disposed on the other face;
and a voice signal selection unit that selects information related to the voice signal of the voice
acquired by whichever microphone the sound pressure comparison unit determines to have the
larger sound pressure. [Selected figure] Figure 2
Speech analysis device, speech analysis system and program
[0001]
The present invention relates to a voice analysis device, a voice analysis system, and a program.
[0002]
Patent Document 1 discloses a multi-channel acoustic signal collecting apparatus in which sound
collecting means comprising at least two microphones or sensors arranged in a plane is
integrated, and acoustic signals from at least two of the microphones or sensors are selected and
output.
[0003]
03-05-2019
1
Japanese Patent Application Laid-Open No. 2002-165292
[0004]
An object of the present invention is to provide a voice analysis device or the like in which the
analysis result of the voice acquired by the voice acquisition means is less dependent on the front
and back of the device.
[0005]
The invention according to claim 1 comprises: a plate-shaped device body; a plurality of voice
acquisition means disposed on both faces of the plate-shaped device body for acquiring the voice
of a speaker; a sound pressure comparison unit that compares the sound pressure of the voice
acquired by the voice acquisition means disposed on one face of the device body with the sound
pressure of the voice acquired by the voice acquisition means disposed on the other face; and a
voice signal selection unit that selects information related to the voice signal of the voice
acquired by the voice acquisition means determined by the sound pressure comparison unit to
have the larger sound pressure.
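As an illustration only (not code from the patent), the comparison and selection recited in claim 1 might be sketched as follows; the function names and the use of RMS amplitude as the sound-pressure measure are assumptions:

```python
# Hypothetical sketch of the sound pressure comparison unit and voice signal
# selection unit of claim 1: keep the signal from whichever face-mounted
# microphone acquired the larger sound pressure.

def rms(samples):
    """Root-mean-square amplitude, used here as a simple sound-pressure measure."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def select_voice_signal(one_face, other_face):
    """Compare the sound pressures at the two face-mounted microphones and
    return the signal with the larger one."""
    return one_face if rms(one_face) >= rms(other_face) else other_face

# The microphone facing the sound source acquires the larger sound pressure,
# so its signal is the one selected for analysis.
selected = select_voice_signal([0.8, -0.7, 0.75], [0.2, -0.15, 0.18])
```

In this way the analysis can proceed from whichever face happens to be turned toward the speaker.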
[0006]
The invention according to claim 2 is the voice analysis device according to claim 1, further
comprising a string member connected to the device body and used to hang the device body
from the neck of a wearer, wherein voice acquisition means are also provided on the string
member, and the horizontal positional relationship of the voice acquisition means provided on
the string member is associated with the front-back positional relationship of the voice
acquisition means disposed on both faces of the device body.
The invention according to claim 3 is the voice analysis device according to claim 2, wherein
both ends of the string member are connected to the device body at positions separated by a
predetermined distance in the horizontal direction of the device body.
The invention according to claim 4 is the voice analysis device according to any one of claims 1
to 3, further comprising a self/other identification unit that identifies whether the speaker is the
wearer or another person, based on the difference in sound pressure between the voices
acquired by two of the voice acquisition means located at different distances from the mouth of
the wearer.
[0007]
The invention according to claim 5 is a voice analysis system comprising: a voice analysis unit
including a plate-shaped device body, a plurality of voice acquisition means disposed on both
faces of the plate-shaped device body for acquiring the voice of a speaker, a sound pressure
comparison unit that compares the sound pressure of the voice acquired by the voice acquisition
means disposed on one face of the device body with the sound pressure of the voice acquired by
the voice acquisition means disposed on the other face, and a voice signal selection unit that
selects information related to the voice signal of the voice acquired by the voice acquisition
means determined by the sound pressure comparison unit to have the larger sound pressure;
and a receiving unit that receives the information related to the voice signal.
[0008]
The invention according to claim 6 is a program causing a computer to realize: a function of
acquiring the voice of a speaker from a plurality of voice acquisition means disposed on both
faces of a plate-shaped device body; a function of comparing the sound pressure of the voice
acquired by the voice acquisition means disposed on one face of the device body with the sound
pressure of the voice acquired by the voice acquisition means disposed on the other face; and a
function of selecting information related to the voice signal of the voice acquired by the voice
acquisition means determined to have the larger sound pressure.
[0009]
According to the first aspect of the present invention, it is possible to provide a voice analysis
device in which the analysis result of the voice acquired by the voice acquisition means is less
influenced by the front and back of the device than when the present invention is not adopted.
According to the invention of claim 2, compared with the case where the present invention is not
adopted, it becomes easier to keep the front-back orientation of the device body consistent with
the strap.
According to the invention of claim 3, compared with the case where the present invention is not
adopted, it becomes easier to keep the front-back orientation of the device body consistent with
the strap.
According to the invention of claim 4, compared with the case where the present invention is not
adopted, it is possible to identify whether the voice acquired by the voice acquisition means is the
voice of the wearer or the voice of another person.
According to the invention of claim 5, it is possible to construct a system capable of grasping the
communication relationship of the wearer based on the voices acquired by the voice acquiring
means of the plurality of wearers.
According to the invention of claim 6, as compared with the case where the present invention is
not adopted, it is possible to realize the function by which the analysis result of the voice
acquired by the voice acquiring means is less influenced by the front and back of the device.
[0010]
FIG. 1 is a diagram showing a configuration example of the voice analysis system according to
the present embodiment. FIG. 2 is a diagram showing a configuration example of the terminal
device. FIG. 3 is a diagram explaining the facing angle in the present embodiment. FIG. 4 is a
diagram explaining a method of obtaining the facing angle using the first microphone and the
second microphone. FIGS. 5(a) to 5(c) are diagrams explaining a method of obtaining the time
difference in the present embodiment. FIGS. 6(a) and 6(b) are diagrams explaining the normal
wearing state of the terminal device and the case where it is worn with the front and back
reversed. FIG. 7 is a diagram explaining the voice analysis unit. FIG. 8 is a flowchart explaining
the operation of the terminal device.
[0011]
Hereinafter, embodiments of the present invention will be described in detail with reference to
the accompanying drawings. <System Configuration Example> FIG. 1 is a view showing a
configuration example of a speech analysis system according to the present embodiment. As
shown in FIG. 1, the voice analysis system 1 of the present embodiment is configured to include a
terminal device 10 which is an example of a voice analysis device (voice analysis means) and a
host device 20. The terminal device 10 and the host device 20 are connected via a wireless
communication line. As the type of wireless communication line, a line conforming to an existing
standard such as Wi-Fi (Wireless Fidelity), Bluetooth (registered trademark), ZigBee, or UWB
(Ultra Wideband) may be used. Further, although only one terminal device 10 is shown in the
illustrated example, the terminal device 10 is worn and used by each user, as will be described in
detail later, so that in practice as many terminal devices 10 as there are users are prepared.
[0012]
The terminal device 10 includes, as an example of a plurality of voice acquisition means for
acquiring the voice of a speaker, a plurality of microphones (a first microphone 11a, a second
microphone 11b, a third microphone 11c, and a fourth microphone 11d) and amplifiers (a first
amplifier 13a, a second amplifier 13b, a third amplifier 13c, and a fourth amplifier 13d).
(Hereinafter, when the first microphone 11a, the second microphone 11b, the third microphone
11c, and the fourth microphone 11d are not distinguished from one another, they are simply
described as the microphones 11a, 11b, 11c, and 11d.) The terminal device 10 further includes a
voice analysis unit 15 that analyzes the acquired voice, a data transmission unit 16 that
transmits the analysis result to the host device 20, and a power supply unit 17.
[0013]
In the present embodiment, the first microphone 11a and the second microphone 11b are spaced
apart by a predetermined distance in the horizontal direction. Here, the first microphone 11a and
the second microphone 11b are arranged in the horizontal direction at a position close to the
mouth of the wearer, and the distance between them is, for example, 10 cm to 20 cm. The third
microphone 11c and the fourth microphone 11d are respectively disposed on both faces of a
plate-shaped device body 30 described later. The third microphone 11c and the fourth
microphone 11d are disposed at positions farther from the mouth (the vocalization site) of the
wearer than the first microphone 11a and the second microphone 11b. Here, the third
microphone 11c and the fourth microphone 11d are disposed below the first microphone 11a
and the second microphone 11b, at a distance of, for example, about 35 cm. That is, from the
microphones disposed in the terminal device 10 of the present embodiment, one can select both
a pair at two different distances from the wearer's mouth and a pair separated in the horizontal
direction. As the former, the pair of the first microphone 11a and the third microphone 11c (or
the fourth microphone 11d), or the pair of the second microphone 11b and the third microphone
11c (or the fourth microphone 11d), can be selected. As the latter, the pair of the first
microphone 11a and the second microphone 11b can be selected. As the microphones 11a, 11b,
11c, and 11d of the present embodiment, various existing types of microphones, such as
dynamic or condenser microphones, may be used. In particular, an omnidirectional MEMS
(Micro Electro Mechanical Systems) microphone is preferable.
[0014]
The first amplifier 13a, the second amplifier 13b, the third amplifier 13c, and the fourth
amplifier 13d amplify the electric signals output by the first microphone 11a, the second
microphone 11b, the third microphone 11c, and the fourth microphone 11d, respectively,
according to the acquired voice. As the first amplifier 13a, the second amplifier 13b, the third
amplifier 13c, and the fourth amplifier 13d of the present embodiment, existing operational
amplifiers or the like may be used.
[0015]
The voice analysis unit 15 analyzes the electric signals output from the first amplifier 13a, the
second amplifier 13b, the third amplifier 13c, and the fourth amplifier 13d. Although the details
will be described later, it judges the front and back of the device body 30, identifies whether the
speaker is the wearer or another person, and, when the speaker is identified as another person,
outputs the facing angle, which is the angle at which the wearer faces the speaker.
[0016]
The data transmission unit 16 transmits the acquired data, including the analysis result by the
voice analysis unit 15 and the ID of the terminal, to the host device 20 via the above-described
wireless communication line. Depending on the contents of the processing performed in the host
device 20, the information transmitted to the host device 20 may include, in addition to the
above analysis result, information such as the acquisition times and sound pressures of the
voices acquired by the microphones 11a, 11b, 11c, and 11d. The terminal device 10 may also be
provided with a data storage unit for storing the analysis results of the voice analysis unit 15,
and the data stored over a fixed period may be transmitted in a batch. The data may also be
transmitted over a wired line. In the present embodiment, the data transmission unit 16
functions as a voice signal transmission unit that transmits information on the voice signal of the
voice.
[0017]
The power supply unit 17 supplies power to the microphones 11a, 11b, 11c, and 11d, the first
amplifier 13a, the second amplifier 13b, the third amplifier 13c, the fourth amplifier 13d, the
voice analysis unit 15, and the data transmission unit 16. As the power supply, an existing power
supply such as a dry battery or a rechargeable battery is used, for example. The power supply
unit 17 also includes known circuits such as a voltage conversion circuit and a charge control
circuit, as necessary.
[0018]
The host device 20 includes a data reception unit 21 that receives data transmitted from the
terminal device 10, a data storage unit 22 that stores the received data, a data analysis unit 23
that analyzes the stored data, and an output unit 24 that outputs the analysis result. The host
device 20 is realized by, for example, an information processing device such as a personal
computer. Further, as described above, a plurality of terminal devices 10 are used in the present
embodiment, and the host device 20 receives data from each of the plurality of terminal devices
10.
[0019]
The data reception unit 21 corresponds to the above-described wireless channel; it receives data
from each terminal device 10 and sends the data to the data storage unit 22. In the present
embodiment, the data reception unit 21 functions as a receiving unit that receives the
information on the voice signal transmitted by the data transmission unit 16. The data storage
unit 22 stores the received data acquired from the data reception unit 21 for each speaker. Here,
the speaker is identified by collating the terminal ID transmitted from the terminal device 10
with the speaker names and terminal IDs registered in advance in the host device 20. The
wearer's name may also be transmitted from the terminal device 10 instead of the terminal ID.
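The collation of the transmitted terminal ID with the pre-registered names can be sketched as a simple lookup; the IDs and names below are hypothetical, not from the patent:

```python
# Hypothetical registry of speaker names and terminal IDs registered in
# advance in the host device 20 (the entries are illustrative only).
registered = {"ID-001": "Wearer A", "ID-002": "Wearer B"}

def identify_speaker(terminal_id):
    """Collate a received terminal ID against the registry and return the
    corresponding speaker name, or None when the ID is not registered."""
    return registered.get(terminal_id)
```

The data storage unit 22 can then file each received record under the name returned by such a lookup.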
[0020]
The data analysis unit 23 analyzes the data stored in the data storage unit 22. The specific
analysis contents and methods can vary according to the purpose and mode of use of the system
of the present embodiment. For example, the frequency of interaction between wearers of
terminal devices 10 and each wearer's tendencies regarding conversation partners may be
analyzed, or the relationship between interlocutors may be inferred from information on the
length and sound pressure of each utterance in a dialogue.
[0021]
The output unit 24 outputs the analysis result of the data analysis unit 23, or performs output
based on the analysis result. The means for outputting the analysis result can take various forms,
such as screen display, print output by a printer, or voice output, depending on the purpose and
mode of use of the system and on the contents and format of the analysis result.
[0022]
<Example of Configuration of Terminal Device> FIG. 2 is a diagram showing a configuration
example of the terminal device 10. As described above, the terminal device 10 is worn and used
by each user. As shown in FIG. 2, the terminal device 10 of the present embodiment comprises
the device body 30 and a strap 40, which is an example of a string member and forms a ring by
having both of its ends connected to the device body 30 so that the user can wear the device. In
the present embodiment, both ends of the strap 40 are connected to the device body 30 at
positions separated by a predetermined distance in the horizontal direction of the device body
30. In the illustrated configuration, the user hangs the device body 30 from his or her neck by
passing the neck through the strap 40. In the present embodiment, the user wearing the terminal
device 10 is referred to as the wearer.
[0023]
The device body 30 has a plate shape. A thin rectangular case 31 made of, for example, metal or
resin accommodates at least the circuits realizing the first amplifier 13a, the second amplifier
13b, the third amplifier 13c, the fourth amplifier 13d, the voice analysis unit 15, the data
transmission unit 16, and the power supply unit 17, as well as the power supply (battery) of the
power supply unit 17. Further, in the present embodiment, the third microphone 11c and the
fourth microphone 11d are provided on both faces of the case 31. The case 31 may also be
provided with a pocket into which an ID card or the like displaying ID information such as the
name or affiliation of the wearer is inserted. Such ID information may also be displayed on the
surface of the case 31 itself. The device body 30 does not have to be rigid or rectangular; it may
be made of a non-rigid material such as cloth, or have a non-rectangular shape. Thus, for
example, the device body 30 may be a bib, an apron, or the like to which the necessary members
(the microphones 11a, 11b, 11c, 11d, etc.) are attached.
[0024]
The strap 40 is provided with the first microphone 11a and the second microphone 11b. As the
material of the strap 40, various existing materials may be used, such as leather, synthetic
leather, natural fibers such as cotton, synthetic fibers such as resin, or metal. A coating treatment
using a silicone resin, a fluorine resin, or the like may also be applied.
[0025]
The strap 40 has a tubular structure, and the microphones 11a and 11b are housed inside it.
Providing the microphones 11a and 11b inside the strap 40 prevents them from being damaged
or soiled, and keeps conversation partners from being conscious of their presence.
[0026]
<Description of Method of Identifying Whether a Speaker is the Wearer or Another Person> With
the above configuration, it is identified whether the speaker is the wearer or a person other than
the wearer (self/other identification). The method will now be described. The system of the
present embodiment uses, for example, information on the voices acquired by the first
microphone 11a and the third microphone 11c, among the microphones provided in the
terminal device 10, to discriminate between the speech voice of the wearer of the terminal
device 10 and the voices of others. In other words, the present embodiment identifies self and
other with respect to the speaker of the acquired voice. Further, in the present embodiment, the
speaker is identified based not on linguistic information obtained by morphological analysis or
dictionary information, but on non-linguistic information such as sound pressure (the input
volume to the first microphone 11a and the third microphone 11c). In other words, the speaker
of the voice is identified from the speaking situation specified by the non-linguistic information,
not from the speech content specified by the linguistic information.
[0027]
As described with reference to FIGS. 1 and 2, in the present embodiment, the third microphone
11c of the terminal device 10 is disposed at a position far from the mouth (vocalization site) of
the wearer, while the first microphone 11a is disposed at a position close to the wearer's mouth.
That is, when the wearer's mouth (vocalization site) is regarded as a sound source, the distance
between the first microphone 11a and the sound source differs greatly from the distance
between the third microphone 11c and the sound source. For example, the distance between the
third microphone 11c and the sound source can be set to about 1.5 to 4 times the distance
between the first microphone 11a and the sound source. Here, the sound pressure of the voice
acquired at each of the microphones 11a and 11c attenuates with distance as the distance
between the microphone and the sound source increases. Therefore, for the speech voice of the
wearer, the sound pressure of the voice acquired at the first microphone 11a differs greatly from
the sound pressure of the voice acquired at the third microphone 11c.
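The distance attenuation argument can be made concrete with a rough inverse-distance sound pressure model; the 1/r law and all distances below are illustrative assumptions, not measurements from the patent:

```python
def pressure_ratio(d_near, d_far):
    """Ratio of the sound pressures at a near and a far microphone, assuming
    sound pressure falls off inversely with distance (a simplification):
    p_near / p_far = (1/d_near) / (1/d_far) = d_far / d_near."""
    return d_far / d_near

# Wearer's mouth as the source: e.g. 10 cm to the first microphone 11a and
# 35 cm to the third microphone 11c gives pressures differing by 3.5 times.
wearer_ratio = pressure_ratio(0.10, 0.35)

# Another speaker about 1.5 m away: both microphones are at nearly the same
# distance from the source, so the pressures are nearly equal.
other_ratio = pressure_ratio(1.50, 1.75)
```

This gap between the two ratios is what the self/other identification below exploits.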
[0028]
On the other hand, consider the case where the mouth (vocalization site) of a person other than
the wearer is the sound source. Since the other person is away from the wearer, the distance
between the first microphone 11a and the sound source does not differ greatly from the distance
between the third microphone 11c and the sound source. Although some difference may arise
depending on the position of the other person with respect to the wearer, the distance between
the third microphone 11c and the sound source will not be several times the distance between
the first microphone 11a and the sound source, as it is when the wearer's mouth (vocalization
site) is the sound source. Therefore, for the speech voice of another person, the sound pressure
of the voice acquired at the first microphone 11a does not differ greatly from the sound pressure
of the voice acquired at the third microphone 11c, unlike the case of the wearer's own speech.
[0029]
Thus, in the present embodiment, the sound pressure ratio, which is the ratio between the sound
pressure of the voice acquired at the first microphone 11a and the sound pressure of the voice
acquired at the third microphone 11c, is calculated, and the difference in this sound pressure
ratio is used to discriminate between the wearer's voice and the voices of others in the acquired
voice. More specifically, in the present embodiment, a threshold is set for the ratio of the sound
pressure at the first microphone 11a to the sound pressure at the third microphone 11c. A voice
whose sound pressure ratio is larger than the threshold is judged to be the wearer's own speech,
and a voice whose sound pressure ratio is smaller than the threshold is judged to be another
person's speech.
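A minimal sketch of this thresholding follows; the RMS pressure measure and the threshold value of 2.0 are hypothetical choices, not values from the patent:

```python
# Sketch (not the patent's actual code): classify a speaker as the wearer
# ("self") or another person ("other") from the ratio of sound pressures at
# a mouth-near microphone (11a) and a mouth-far microphone (11c).

def rms(samples):
    """Root-mean-square amplitude as a sound-pressure proxy."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def identify_self_other(mic_near, mic_far, threshold=2.0):
    """Return 'self' when the near/far sound pressure ratio exceeds the
    threshold, 'other' otherwise (threshold value is an assumption)."""
    ratio = rms(mic_near) / rms(mic_far)
    return "self" if ratio > threshold else "other"

# Wearer speech: strong at the near microphone, attenuated at the far one.
wearer = identify_self_other([0.9, -0.8, 0.85], [0.3, -0.25, 0.28])
# Other speaker: both microphones receive similar sound pressure.
other = identify_self_other([0.4, -0.38, 0.41], [0.39, -0.4, 0.37])
```

In practice the threshold would be tuned to the actual microphone placement and gains.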
[0030]
In the example described above, self/other identification is performed using the first microphone
11a and the third microphone 11c, but the present invention is not limited to this; the second
microphone 11b and the third microphone 11c may of course be used in the same way. Also, in
the example described above, self/other identification is determined from the sound pressures of
the voices acquired by the first microphone 11a and the third microphone 11c, but information
on the phase difference of the acquired voices may also be added. That is, when the wearer's
mouth (vocalization site) is the sound source, the distance between the first microphone 11a and
the sound source differs greatly from the distance between the third microphone 11c and the
sound source, as described above; therefore, the phase difference between the voice acquired by
the first microphone 11a and the voice acquired by the third microphone 11c is large. On the
other hand, when the mouth (vocalization site) of a person other than the wearer is the sound
source, the other person is away from the wearer, so, as described above, the distance between
the first microphone 11a and the sound source does not differ greatly from the distance between
the third microphone 11c and the sound source; accordingly, the phase difference between the
voices acquired by the two microphones is small. Taking this phase difference between the
voices acquired by the first microphone 11a and the third microphone 11c into consideration
therefore improves the accuracy of self/other identification.
[0031]
<Description of the Facing Angle> FIG. 3 is a diagram explaining the facing angle in the present
embodiment. In the present embodiment, the facing angle is the angle at which the wearer of the
terminal device 10 faces a speaker. As an example of the facing angle defined in the present
embodiment, FIG. 3 shows the facing angle in the horizontal direction; that is, FIG. 3 views the
wearer and the speaker from above. In the present embodiment, the facing angle α is defined as
the angle between the line segment connecting the first microphone 11a and the second
microphone 11b, which are two voice acquisition means, and the line connecting the midpoint of
this segment to the speaker. This makes mathematical handling of the facing angle easier. With
this definition, for example, when the wearer and the speaker directly face each other, the facing
angle α between the two is 90°.
[0032]
<Description of Method for Determining the Facing Angle> FIG. 4 is a diagram explaining a
method of determining the facing angle α using the first microphone 11a and the second
microphone 11b. Here, the point S is the position of the speaker, or more precisely, the position
of the vocalization point that is the sound source of the speaker's voice. The voice emitted from
the vocalization point spreads concentrically from the point S. Since the voice propagates at the
finite speed of sound, the time at which the voice reaches the first microphone 11a differs from
the time at which it reaches the second microphone 11b, and a time difference Δt corresponding
to the path difference δ of the voice occurs. When the distance between the first microphone
11a and the second microphone 11b is D and the distance between the midpoint C and the point
S is L, the following equation (1) holds.
[0033]
δ = (L^2 + LD cos α + D^2/4)^(1/2) − (L^2 − LD cos α + D^2/4)^(1/2)   (1)
[0034]
Since the influence of L is small when L > D, equation (1) can be approximated by the following
equation (2).
[0035]
δ ≈ D cos α   (2)
[0036]
Further, using the sound velocity c and the time difference Δt, the following equation (3) is
established.
[0037]
δ = cΔt (3)
[0038]
That is, the facing angle α can be obtained using equations (2) and (3).
In other words, based on the time difference Δt with which the speaker's voice reaches the first
microphone 11a and the second microphone 11b, which are two voice acquisition means, and
the distance D between the first microphone 11a and the second microphone 11b, it is possible
to calculate the facing angle α, which is the angle at which the wearer faces the speaker.
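Combining equations (2) and (3) gives cos α = cΔt/D, i.e. α = arccos(cΔt/D). A minimal sketch follows; the speed of sound value and the microphone spacing used below are assumed, not taken from the patent:

```python
import math

def facing_angle(delta_t, mic_distance, speed_of_sound=340.0):
    """Facing angle α in degrees, from equations (2) and (3):
    δ = c·Δt and δ ≈ D·cos α, hence α = arccos(c·Δt / D)."""
    cos_alpha = speed_of_sound * delta_t / mic_distance
    cos_alpha = max(-1.0, min(1.0, cos_alpha))  # guard against rounding overshoot
    return math.degrees(math.acos(cos_alpha))

# Δt = 0 means the voice arrives at both microphones simultaneously, so the
# speaker is directly in front of the wearer (α of about 90°).
alpha_front = facing_angle(0.0, 0.15)

# The 227 µs delay from the example later in the text, with D = 15 cm assumed.
alpha_example = facing_angle(227e-6, 0.15)
```

Note that the approximation of equation (2) assumes the speaker is much farther away than the microphone spacing (L > D).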
[0039]
The time difference Δt with which the speaker's voice reaches the first microphone 11a and the
second microphone 11b can be obtained as follows.
FIGS. 5(a) to 5(c) are diagrams explaining the method of obtaining the time difference Δt in the
present embodiment.
FIG. 5(a) shows the speaker's voice reaching the first microphone 11a and the second
microphone 11b, sampled at a sampling frequency of 1 MHz, with 5000 consecutive points
extracted from the data.
The horizontal axis represents the data numbers assigned to the 5000 data points, and the
vertical axis represents the amplitude of the speaker's voice.
The solid line is the waveform signal of the speaker's voice that reached the first microphone
11a, and the dotted line is the waveform signal of the speaker's voice that reached the second
microphone 11b.
[0040]
In the present embodiment, the cross-correlation function of these two waveform signals is
determined. That is, one waveform signal is held fixed while the other is shifted, and the sum of
products is calculated at each shift. FIGS. 5(b) and 5(c) are diagrams showing the
cross-correlation function of these two waveform signals. FIG. 5(b) is the cross-correlation
function over the entire sampled 5000-point data, and FIG. 5(c) is an enlarged view of the
vicinity of its peak. Both show the case where the waveform signal of the voice arriving at the
first microphone 11a is held fixed and the waveform signal of the voice arriving at the second
microphone 11b is shifted. As shown in FIG. 5(c), the peak position is shifted by −227 points
with respect to data number 0. This means that the speaker's voice reaches the second
microphone 11b delayed by this amount relative to the first microphone 11a. Since the sampling
frequency in the present embodiment is 1 MHz as described above, the time between sampled
data points is 1 × 10^−6 s. The delay time is therefore 227 × 1 × 10^−6 s = 227 μs. That is, in
this case, the time difference Δt is 227 μs.
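The procedure above, fixing one waveform, shifting the other, and locating the cross-correlation peak, can be sketched as follows; the synthetic sine burst stands in for the measured waveforms:

```python
import math

def estimate_delay(fixed, shifted, max_lag):
    """Delay (in samples) of `shifted` relative to `fixed`, found as the lag
    that maximizes the cross-correlation (sum of products at each shift)."""
    best_lag, best_corr = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        corr = sum(fixed[i] * shifted[i + lag]
                   for i in range(len(fixed)) if 0 <= i + lag < len(shifted))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag

fs = 1_000_000  # 1 MHz sampling, as in the embodiment
burst = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(400)]
mic1 = burst + [0.0] * 300                # voice reaching microphone 11a
mic2 = [0.0] * 227 + burst + [0.0] * 73   # same voice, delayed 227 samples
delta_t = estimate_delay(mic1, mic2, max_lag=300) / fs  # 227 µs, as in the text
```

A production implementation would use an FFT-based correlation for the 5000-point records rather than this direct sum.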
[0041]
Further, in the present embodiment, the signal is divided into predetermined frequency bands,
and the cross-correlation function is obtained with a large weight given to the frequency band
having the largest amplitude. The time difference Δt determined in this way is more accurate. To
obtain the time difference Δt more accurately, it is also preferable that the distance between the
first microphone 11a and the second microphone 11b be in the range of 1 cm to 100 cm. If the
distance between the first microphone 11a and the second microphone 11b is less than 1 cm,
the time difference Δt becomes too small and the error of the facing angle derived from it tends
to be large. On the other hand, if the distance is larger than 100 cm, reflected sound is likely to
affect the derivation of the time difference Δt. In addition, since the cross-correlation function
must then be calculated over a longer time width, the computational load becomes large.
[0042]
<Description of Wearing State of Terminal Device> When the wearer puts on the terminal device
10, it may be worn with the front and back reversed compared with the case shown in FIG. 2.
FIGS. 6(a) and 6(b) are diagrams illustrating the normal wearing state of the terminal device 10
and the case where it is worn with the front and back reversed. FIG. 6(a) shows the normal
wearing state of the terminal device 10, which is the same as the wearing state shown in FIG. 2.
In this state, the positional relationship of the microphones 11a, 11b, 11c, and 11d of the
terminal device 10 is such that, as viewed from the wearer, the first microphone 11a is located
on the left side and the second microphone 11b on the right side, while the third microphone
11c faces outward with respect to the wearer and the fourth microphone 11d faces inward, that
is, toward the wearer. On the other hand, FIG. 6(b) shows the wearing state with the front and
back reversed. In this case, as viewed from the wearer, the first microphone 11a is located on the
right side and the second microphone 11b on the left side, while the third microphone 11c faces
inward toward the wearer and the fourth microphone 11d faces outward.
[0043]
When the terminal device 10 is worn in the state of FIG. 6(b), the positional relationship among the microphones 11a, 11b, 11c, and 11d differs from that in the normal state. In the present embodiment, as described above, both ends of the strap 40 are connected to the device body 30 at positions separated by a predetermined distance in the horizontal direction of the device body 30. Therefore, the horizontal positional relationship between the microphones 11a and 11b provided on the strap 40 is linked to the front-and-back positional relationship of the microphones 11c and 11d disposed on both surfaces of the device body 30. That is, since it is difficult for the device body 30 alone to rotate, when the third microphone 11c faces outward with respect to the wearer, the first microphone 11a is located on the left side and the second microphone 11b on the right side as viewed from the wearer (the case of FIG. 6(a)). When the fourth microphone 11d faces outward with respect to the wearer, the second microphone 11b is located on the left side and the first microphone 11a on the right side as viewed from the wearer (the case of FIG. 6(b)). When the wearer puts on the terminal device 10, one of these two states results.
[0044]
Therefore, in the present embodiment, the positional relationship among the microphones 11a, 11b, 11c, and 11d is grasped, and it is determined which of the two wearing states described above applies.
[0045]
<Description of Speech Analysis Unit> FIG. 7 is a diagram for explaining the speech analysis unit 15. The speech analysis unit 15 shown in FIG. 7 includes: a sound pressure comparison unit 151 that compares the sound pressure of the voice acquired by the third microphone 11c disposed on one surface of the device body 30 with the sound pressure of the voice acquired by the fourth microphone 11d disposed on the other surface; a voice signal selection unit 152 that selects information related to the voice signal of the voice acquired by the microphone whose sound pressure is determined by the sound pressure comparison unit 151 to be larger; a positional relationship determination unit 153 that determines the positional relationship among the microphones 11a, 11b, 11c, and 11d from the comparison result of the sound pressure comparison unit 151; a self/other identification unit 154 that identifies whether the speaker is the wearer or a person other than the wearer (self/other identification) on the basis of the difference in sound pressure of the voice acquired by two microphones whose distances from the wearer's mouth differ from each other; and a facing angle output unit 155 that outputs the facing angle α.
[0046]
FIG. 8 is a flowchart for explaining the operation of the terminal device 10. Hereinafter, the operation of the terminal device 10 will be described with reference to FIG. 2, FIG. 7, and FIG. 8. First, the microphones 11a, 11b, 11c, and 11d acquire the voice of the speaker (step 101). Then, the first amplifier 13a, the second amplifier 13b, the third amplifier 13c, and the fourth amplifier 13d respectively amplify the audio signals from the microphones 11a, 11b, 11c, and 11d (step 102).
[0047]
Next, the amplified audio signals are sent to the speech analysis unit 15, and the sound pressure comparison unit 151 compares the sound pressures of the voices acquired by the microphones 11c and 11d (step 103). Then, from the comparison result of the sound pressure comparison unit 151, the voice signal selection unit 152 determines that, of the third microphone 11c and the fourth microphone 11d, the one whose acquired sound pressure is larger faces outward with respect to the wearer. It then selects information related to the voice signal of the voice acquired by the microphone determined to have the larger sound pressure (step 104).
[0048]
That is, given the positional relationship between the third microphone 11c and the fourth microphone 11d, the microphone facing outward is in a better position to acquire the voice than the one facing inward, so its sound pressure is larger. Therefore, when the sound pressure of the voice acquired by the third microphone 11c is larger than that acquired by the fourth microphone 11d, it can be determined that the third microphone 11c faces outward. In this case, it can be determined that the wearing state of the terminal device 10 is the case shown in FIG. 6(a). Conversely, when the sound pressure of the voice acquired by the fourth microphone 11d is larger than that acquired by the third microphone 11c, it can be determined that the fourth microphone 11d faces outward. In this case, it can be determined that the wearing state of the terminal device 10 is the case shown in FIG. 6(b).
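The determination of steps 103 and 104 can be sketched minimally as follows: whichever body-mounted microphone picks up the larger sound pressure is judged to face outward, which in turn fixes the wearing state. The RMS measure of sound pressure and all identifiers are illustrative assumptions, not taken from the patent.

```python
import math

def rms(samples):
    """Root-mean-square amplitude, used here as a proxy for sound pressure."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def determine_wearing_state(mic3_samples, mic4_samples):
    """Return (outward_mic, wearing_state) per FIG. 6(a)/6(b)."""
    if rms(mic3_samples) >= rms(mic4_samples):
        return "third_mic_11c", "normal"      # FIG. 6(a): 11c faces outward
    return "fourth_mic_11d", "reversed"       # FIG. 6(b): 11d faces outward
```

The signal of the microphone returned as facing outward would then be the one whose voice-signal information is selected and forwarded in step 104.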
[0049]
Next, based on the determination result of the sound pressure comparison unit 151, the positional relationship determination unit 153 determines the positional relationship between the microphones 11a and 11b (step 105). That is, the positional relationship between the microphones 11a and 11b is either the case described in FIG. 6(a) or the case described in FIG. 6(b). Since the comparison result of the sound pressure comparison unit 151 reveals which of the two applies, the positional relationship between the microphones 11a and 11b can be determined from it. In other words, the positional relationship determination unit 153 determines, from the determination result of the sound pressure comparison unit 151, the horizontal positional relationship between the microphones 11a and 11b provided on the strap 40.
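Step 105 reduces to a fixed mapping, because the strap prevents the device body from rotating on its own: once the outward-facing body microphone is known, the horizontal positions of the strap microphones follow. The identifiers below are illustrative.

```python
def strap_mic_positions(outward_mic):
    """Map the outward-facing body mic to the strap mics' left/right layout."""
    if outward_mic == "third_mic_11c":        # FIG. 6(a): normal wearing
        return {"left": "first_mic_11a", "right": "second_mic_11b"}
    # FIG. 6(b): worn front-to-back reversed, so left and right are swapped
    return {"left": "second_mic_11b", "right": "first_mic_11a"}
```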
[0050]
Then, the self/other identification unit 154 identifies whether the speaker is the wearer or a person other than the wearer (self/other identification) (step 106). At this time, of the microphones 11c and 11d disposed in the device body 30, the one selected by the voice signal selection unit 152 as facing outward with respect to the wearer is used. If the microphone facing inward were used, its poorer voice acquisition could prevent an accurate self/other identification result. Therefore, when the wearing state of the terminal device 10 is that of FIG. 6(a), the sound pressure of the voice acquired by the third microphone 11c is used, and when it is that of FIG. 6(b), the sound pressure of the voice acquired by the fourth microphone 11d is used.
[0051]
When the self/other identification unit 154 identifies the speaker as the wearer (that is, identifies the speaker as not being another person) (No in step 107), the process returns to step 101. On the other hand, when the speaker is identified as another person (Yes in step 107), the facing angle output unit 155 first obtains, by the method described in FIG., the time difference Δt with which the voice arrives at the first microphone 11a and the second microphone 11b (step 108). Furthermore, the facing angle output unit 155 obtains, by the method described in FIG., the facing angle α at which the wearer and the speaker face each other, based on the time difference Δt and the distance D separating the first microphone 11a and the second microphone 11b (step 109). At this time, the positional relationship between the first microphone 11a and the second microphone 11b determined by the positional relationship determination unit 153 is used. That is, the facing angle α is calculated taking into account whether, as shown in FIG. 6(a), the first microphone 11a is on the left side and the second microphone 11b on the right side as viewed from the wearer, or conversely, as shown in FIG. 6(b), the first microphone 11a is on the right side and the second microphone 11b on the left side as viewed from the wearer.
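Step 109 can be sketched under a standard far-field assumption, since the geometry of the patent's own figure is not reproduced in this excerpt: the incidence angle θ of the incoming voice satisfies sin θ = c·Δt / D, and the reversed wearing state of FIG. 6(b) simply flips the sign of Δt before the angle is computed. The speed-of-sound constant and all names are illustrative assumptions.

```python
import math

SPEED_OF_SOUND_M_S = 340.0   # approximate speed of sound at room temperature

def facing_angle_deg(delta_t_s, mic_distance_m, reversed_wearing=False):
    """Angle of arrival (degrees) from the inter-mic delay, far-field model."""
    if reversed_wearing:                      # FIG. 6(b): 11a/11b are swapped
        delta_t_s = -delta_t_s
    s = SPEED_OF_SOUND_M_S * delta_t_s / mic_distance_m
    s = max(-1.0, min(1.0, s))                # clamp against measurement noise
    return math.degrees(math.asin(s))
```

For example, with D = 20 cm, a delay of Δt = 200 μs corresponds to sin θ = 0.34, i.e. an angle of roughly 20 degrees off the perpendicular; which sign that angle carries depends on the left/right assignment determined in step 105.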
[0052]
Then, the information related to the voice signal, including the self/other identification result and the information on the facing angle α, is output to the host device 20 by the data transmission unit 16 (step 110). At this time, the data transmission unit 16 selects and transmits the information related to the voice signal of the voice acquired by the microphone disposed on the surface facing outward with respect to the wearer. Further, for the microphones 11a and 11b, the information related to the voice signal is transmitted in correspondence with the horizontal positional relationship determined by the positional relationship determination unit 153.
[0053]
The voice analysis system 1 described above can use the self/other identification result and the information on the facing angle as information for determining the relationship between the wearer and the speaker. One matter to be determined as the relationship between the wearer and the speaker is, for example, the communication relationship between them. For example, if it is known that the wearer and the speaker are located a short distance apart and, from the information on the facing angle, that they face each other, there is a high possibility that the wearer and the speaker are having a conversation. Conversely, if the wearer and the speaker face in opposite directions, there is a high possibility that they are not talking. In practice, the determination is made using other information as well, such as the timing at which the voices of the speaker and the wearer are acquired and the gaps between them. It is also possible to use, for example, the vertical facing angle as the relationship between the wearer and the speaker, to determine that one is looking down at the other. Further, based on information from a plurality of terminal devices 10, processing such as determining the positional relationship among a plurality of people in conversation may be performed.
[0054]
Further, in the above-described example, the self/other identification and the output of the facing angle are performed by the terminal device 10. However, the present invention is not limited to this. In the voice analysis system 1 of this embodiment, for example, the data analysis unit 23 of the host device 20 may perform the functions of the self/other identification unit 154 and the facing angle output unit 155 that the speech analysis unit 15 performs in FIG. 7. In this case, the data analysis unit 23 functions as a self/other identification unit that identifies whether the speaker is the wearer or another person.
[0055]
<Description of Program> The processing performed by the terminal device 10 of the present embodiment described with reference to FIG. 8 is realized by cooperation of software and hardware resources. That is, a CPU (not shown) in a control computer provided in the terminal device 10 executes a program that realizes each function of the terminal device 10, thereby realizing each of these functions.
[0056]
Therefore, the processing performed by the terminal device 10 described with reference to FIG. 8 can be regarded as a program that causes a computer to realize: a function of acquiring the voice of a speaker from the plurality of microphones 11c and 11d disposed on both surfaces of the plate-shaped device body 30; a function of comparing the sound pressure of the voice acquired by the third microphone 11c disposed on one surface of the device body 30 with the sound pressure of the voice acquired by the fourth microphone 11d disposed on the other surface; and a function of selecting information related to the voice signal of the voice acquired by the microphone determined to have the larger sound pressure.
[0057]
DESCRIPTION OF SYMBOLS 1 ... Voice analysis system, 10 ... Terminal device, 11a ... First microphone, 11b ... Second microphone, 11c ... Third microphone, 11d ... Fourth microphone, 15 ... Speech analysis unit, 16 ... Data transmission unit, 20 ... Host device, 21 ... Data reception unit, 30 ... Device body, 40 ... Strap, 151 ... Sound pressure comparison unit, 152 ... Voice signal selection unit, 154 ... Self/other identification unit