JPWO2013069229

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JPWO2013069229
The voice dividing means 82 divides the input voice, whose volume has been adjusted by the input
volume adjusting means 81, into voice for speech recognition and monitor voice. The monitor volume
adjustment means 83 adjusts the volume of the monitor voice. The output volume adjustment means 84
adjusts the volume of an output voice, which combines a synthesized voice (voice synthesized from
information created as a result of speech recognition of the voice for speech recognition) with the
monitor voice whose volume has been adjusted by the monitor volume adjustment means 83, and
outputs it to the output device. The control means 85 instructs the monitor volume adjusting means
83 to adjust the volume of the monitor voice so that the amplification factor of the volume of the
output voice relative to the volume of the input voice does not exceed 1.
Voice input / output device, method for preventing howling, and program for preventing howling
[0001]
The present invention relates to a voice input / output device for preventing howling when
outputting input voice and a result of voice recognition of the voice, a method for preventing
howling and a program for preventing howling.
[0002]
BACKGROUND There is known an audio input / output device that includes an audio input device
such as a microphone and an audio output device such as a headphone as a headset microphone.
09-05-2019
In addition, a voice data input device is known which recognizes voice input from a voice input
device and converts it into text, converts the text of the recognition result into voice and outputs
it from the voice output device. By listening to the speech converted from the text of the
recognition result (hereinafter referred to as synthesized speech), the user can determine whether
or not the uttered voice was properly recognized.
[0003]
In addition, the data input device described above may be used to check the input voice itself
(hereinafter referred to as monitoring). In that case, the data input device outputs not only
the synthesized voice but also the input voice to the voice output device.
[0004]
FIG. 10 is an explanatory view showing an example of the data input device. In the example
shown in FIG. 10, when the voice uttered by the user is input to the microphone 71, the voice is
output from the speaker 72. At this time, the voice uttered by the user is simultaneously input to
the voice recognition / synthesis device 73, and the synthesized voice created by performing the
voice recognition and the voice synthesis process is similarly output from the speaker 72.
[0005]
One reason for monitoring the input voice from the voice input device with the voice output
device is to confirm that voice can be input from the voice input device. Another reason is to
prevent the decrease in the speech recognition rate due to the so-called Lombard effect when
emitting speech in an environment where the surrounding sound is loud. Also, when headphones
are used as the audio output device, the ears may be blocked and the surrounding sound may not
be heard. Even in such a case, it is possible to hear surrounding sounds by outputting the input
sound from the sound input device to the sound output device (headphones).
[0006]
Generally, there is a gap between the timing at which the voice input to the voice input device is
output and the timing at which the synthesized voice is output. This is because it takes a certain
processing time for speech recognition when creating synthetic speech. Therefore, the user
listens to the synthesized voice after a predetermined time has elapsed since the voice was
emitted.
[0007]
In a voice input / output device including a voice input device and a voice output device, it is
necessary to adjust the balance between the voice input level and the output level in order to
prevent howling. Therefore, various methods of adjusting these levels are known.
[0008]
Patent Document 1 describes a karaoke apparatus having a function of adjusting a microphone
used to input a singing voice. In the karaoke apparatus described in Patent Document 1, when
adjusting the microphone volume and effects, the voice of the singer is converted by Pulse Code
Modulation (PCM), and the converted data is recorded as voice. Because the voice recorded in this
way is played back repeatedly while the singer adjusts the microphone volume, the singer does not
need to speak again and again.
[0009]
Patent Document 2 describes a karaoke apparatus that automatically adjusts each sound output
from a plurality of speakers to make it difficult for howling to occur. According to the
relationship between a predetermined speaker position and a designated microphone position,
the karaoke apparatus described in Patent Document 2 lowers the microphone input audio signal
level or lowers the mixing level at the time of output from each speaker, thereby preventing the
occurrence of howling.
[0010]
Patent Document 1: Patent No. 4360212
Patent Document 2: Patent No. 2958930
[0011]
In the data input device described above, the input sound may be output from the sound output
device in order to monitor the input sound.
However, as in the case of the karaoke apparatus, howling may occur when sound leaks from the
voice output device to the voice input device. Specifically, when the sound leaks from the voice
output device to the voice input device and the leaked sound is further amplified and output from
the voice output device, howling may occur.
[0012]
The easiest way to prevent howling is to lower the volume of the audio input device and the
audio output device. However, if the volume of the voice input device is lowered, the accuracy of
voice recognition may be reduced. On the other hand, if the volume of the voice output device is
lowered, the synthesized voice may be difficult to hear.
[0013]
In the case of the karaoke apparatus described in Patent Document 1, it is necessary for the user
to detect that howling has occurred, and for the user to adjust the volume each time. That is,
when using the karaoke apparatus described in Patent Document 1, there is a problem that
howling can not be easily prevented since the user must adjust the volume each time so that
howling does not occur.
[0014]
Further, it is possible to prevent the howling by lowering the volume level as in the karaoke
apparatus described in Patent Document 2. However, as described above, there is a problem that if
the input level is lowered, the accuracy of speech recognition may be lowered, and if the output
level is lowered, the synthesized speech becomes difficult to hear.
[0015]
Therefore, an object of the present invention is to provide a voice input / output device, a
howling prevention method, and a howling prevention program that, when the input speech is
monitored together with the result of speech recognition of that speech, can easily prevent
howling while suppressing both a decrease in the speech recognition accuracy of the input speech
and a loss of audibility of the synthesized speech output as the result of speech recognition.
[0016]
The voice input / output device according to the present invention comprises: input volume
adjustment means for adjusting the volume of input voice input to the input device; voice division
means for dividing the input voice whose volume has been adjusted by the input volume adjustment
means into voice for voice recognition, which is voice used for voice recognition, and monitor
voice, which is voice used for monitoring the input voice; monitor volume adjustment means for
adjusting the volume of the monitor voice; output volume adjustment means for adjusting the volume
of output voice, which is voice obtained by combining synthesized voice (voice synthesized from
information created as a result of voice recognition of the voice for voice recognition) with the
monitor voice adjusted by the monitor volume adjustment means, and outputting it to the output
device; and control means for instructing the monitor volume adjustment means to adjust the volume
of the monitor voice so that the amplification factor of the volume of the output voice relative
to the volume of the input voice does not exceed 1.
[0017]
The howling prevention method according to the present invention comprises: adjusting the volume
of the input voice input to the input device; dividing the volume-adjusted input voice into voice
for voice recognition, which is voice used for voice recognition, and monitor voice, which is
voice used for monitoring the input voice; adjusting the volume of the monitor voice; adjusting
the volume of output voice, which is voice obtained by combining synthesized voice (voice
synthesized from information created as a result of voice recognition of the voice for voice
recognition) with the volume-adjusted monitor voice, and outputting it to the output device; and
adjusting the volume of the monitor voice so that the amplification factor of the volume of the
output voice relative to the volume of the input voice does not exceed 1.
[0018]
The howling prevention program according to the present invention causes a computer to execute:
input volume adjustment processing for adjusting the volume of input voice input to the input
device; voice division processing for dividing the input voice whose volume has been adjusted in
the input volume adjustment processing into voice for voice recognition, which is voice used for
voice recognition, and monitor voice, which is voice used for monitoring the input voice; monitor
volume adjustment processing for adjusting the volume of the monitor voice; output volume
adjustment processing for adjusting the volume of output voice, which is voice obtained by
combining synthesized voice (voice synthesized from information created as a result of voice
recognition of the voice for voice recognition) with the monitor voice whose volume has been
adjusted in the monitor volume adjustment processing; and control processing for adjusting the
volume of the monitor voice so that the amplification factor of the volume of the output voice
relative to the volume of the input voice does not exceed 1.
[0019]
According to the present invention, when the input speech is monitored together with the result
of speech recognition of that speech, howling can be easily prevented while suppressing both a
decrease in the speech recognition accuracy of the input speech and a loss of audibility of the
synthesized speech output as the result of speech recognition.
[0020]
FIG. 1 is a block diagram showing a configuration example of a first embodiment of a voice input
/ output device according to the present invention.
FIG. 2 is an explanatory view showing the relationship of the amplification factors of the volume.
FIG. 3 is a flowchart showing an operation example of the voice input / output device of the first
embodiment.
FIG. 4 is a block diagram showing a configuration example of a second embodiment of a voice input
/ output device according to the present invention.
FIG. 5 is a block diagram showing a configuration example of a third embodiment of a voice input
/ output device according to the present invention.
FIG. 6 is a block diagram showing a configuration example of a fourth embodiment of a voice input
/ output device according to the present invention.
FIG. 7 is an explanatory view showing an example of a voice input / output device.
FIG. 8 is an explanatory view showing an example of a voice recognition system including the voice
input / output device of the example.
FIG. 9 is a block diagram showing an example of the minimum configuration of the voice input /
output device according to the present invention.
FIG. 10 is an explanatory view showing an example of a data input device.
[0021]
Hereinafter, embodiments of the present invention will be described with reference to the
drawings.
[0022]
Embodiment 1
FIG. 1 is a block diagram showing a configuration example of a first embodiment of a voice input
/ output device according to the present invention. The audio input / output device 10 of the
present embodiment includes an input volume adjustment unit 11, a monitor volume adjustment
unit 12, an output volume adjustment unit 13, a control unit 14, an input sound division unit 15,
an input unit 16, and an output unit 17.
[0023]
Further, the voice input / output device 10 communicates with the voice recognition unit 18 and
the voice synthesis unit 19. Communication between the speech input / output device 10 and the
speech recognition unit 18 and the speech synthesis unit 19 may be wireless communication or
wired communication. Further, the voice input / output device 10 may include the voice
recognition unit 18 and the voice synthesis unit 19. In the present embodiment, the speech
recognition unit 18 and the speech synthesis unit 19 are provided in a device different from the
speech input / output device 10.
[0024]
The input unit 16 is an input device for inputting a user's voice and surrounding sounds. The
input unit 16 is realized by, for example, a microphone. The input unit 16 inputs the input voice
to the input volume adjustment unit 11. The input unit 16 may directly input an analog signal
representing the input voice to the input volume adjustment unit 11. In addition, the input unit
16 may A / D (Analog / Digital) convert voice represented by an analog signal, and may input the
converted digital signal to the input volume adjuster 11.
[0025]
The input volume adjustment unit 11 adjusts the volume of the sound input to the input unit 16.
The input sound volume adjustment unit 11 includes a sound volume designation unit (not
shown) such as an operation panel used to designate a sound volume, and adjusts the input
sound volume according to the user's operation on the sound volume designation unit.
[0026]
For example, when the input voice is converted into a digital signal, the input volume adjuster 11
may adjust the volume by increasing or decreasing the value indicated by the digital signal.
When the sound input from the input unit 16 is an analog signal, the input sound volume
adjusting unit 11 may adjust the sound volume when A / D-converting the input sound. Since methods
of adjusting sound volume are widely known, a detailed description is omitted. The input sound
volume adjustment unit 11 inputs the volume-adjusted input sound to the input sound division unit
15.
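As a minimal sketch of adjusting volume by increasing or decreasing the values of a digital
signal, the following assumes signed 16-bit PCM samples held in a Python list. The sample format,
the clipping behavior, and the function name `adjust_volume` are illustrative assumptions and are
not specified in this description.

```python
def adjust_volume(samples, gain):
    """Scale each sample by `gain` and clip to the signed 16-bit range.

    `samples` is a list of integers; `gain` is the amplification factor
    (values below 1 attenuate, values above 1 amplify).
    """
    lo, hi = -32768, 32767  # assumed 16-bit PCM range
    return [max(lo, min(hi, round(s * gain))) for s in samples]
```

For example, `adjust_volume([1000, -2000], 0.5)` halves each sample value, while a gain large
enough to overflow the range is clipped rather than wrapped.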
[0027]
The input voice dividing unit 15 divides the input voice whose volume has been adjusted by the
input volume adjusting unit 11 into voice used by the voice recognition unit 18 for voice
recognition processing (hereinafter referred to as voice for voice recognition) and voice used for
monitoring the input voice (hereinafter referred to as monitor voice). Specifically, the input
voice dividing unit 15 duplicates digital data representing the input voice input from the input
volume adjusting unit 11, and inputs the duplicated digital data to the voice recognition unit 18
and to the monitor volume adjustment unit 12.
[0028]
Note that the input voice dividing unit 15 may receive an instruction from the user indicating the
presence or absence of the monitor function. For example, when an instruction indicating
“monitor function present” is received from the user, the input sound dividing unit 15 may
input the input sound to the monitor volume adjustment unit 12. On the other hand, when an
instruction indicating “no monitor function” is received from the user, the input sound dividing
unit 15 may not input the input sound to the monitor volume adjustment unit 12.
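The division with an optional monitor function can be sketched as follows. This is only an
illustration: the function name `divide_input`, the `monitor_enabled` flag, and the use of `None`
to mean "nothing is passed to the monitor volume adjustment unit" are assumptions made for the
example.

```python
def divide_input(samples, monitor_enabled=True):
    """Duplicate the volume-adjusted input into two independent streams.

    Returns (recognition_voice, monitor_voice). When the user has
    selected "no monitor function", monitor_voice is None.
    """
    recognition_voice = list(samples)  # copy sent to speech recognition
    monitor_voice = list(samples) if monitor_enabled else None
    return recognition_voice, monitor_voice
```

Because each stream is a separate copy, later volume changes to the monitor voice do not alter
the data used for speech recognition.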
[0029]
The present embodiment describes the case where the input sound volume adjustment unit 11 inputs
the volume-adjusted input sound to the input sound division unit 15, and the input sound division
unit 15 inputs that sound to the voice recognition unit 18 and the monitor volume adjustment unit
12. However, the input sound volume adjusting unit 11 may itself have the function of the input
sound dividing unit 15. That is, the input sound volume adjusting unit 11 may input the input
sound to the voice recognition unit 18 and the monitor sound volume adjusting unit 12,
respectively.
[0030]
The monitor volume adjustment unit 12 adjusts the volume of the monitor sound input from the
input sound division unit 15 in the same manner as the input sound volume adjustment unit 11.
That is, the monitor volume adjuster 12 may adjust the volume of the monitor sound in
accordance with an instruction from the user. Further, the monitor sound volume adjustment unit
12 adjusts the sound volume of the monitor sound in accordance with an instruction of the
control unit 14 described later. When both the volume adjustment instruction from the user and
the volume adjustment instruction from the control unit 14 are received, the monitor volume
adjustment unit 12 gives priority to the instruction from the control unit 14. The monitor volume
adjuster 12 inputs the adjusted monitor sound to the output volume adjuster 13.
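The priority rule above can be illustrated with a short sketch. The function name
`effective_monitor_gain` and the convention that `None` means "no instruction from the control
unit" are assumptions for illustration only.

```python
def effective_monitor_gain(user_gain, control_gain=None):
    """Return the gain the monitor volume adjuster should apply.

    An instruction from the control unit, when present, takes priority
    over the gain requested by the user.
    """
    return control_gain if control_gain is not None else user_gain
```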
[0031]
The voice recognition unit 18 performs voice recognition processing based on the voice input
from the input voice division unit 15. Then, the speech recognition unit 18 inputs the speech
recognition result to the speech synthesis unit 19. The speech recognition unit 18 performs
speech recognition processing using a general method. For example, the speech recognition unit
18 may convert the speech recognition result into text and input the created text to the speech
synthesis unit 19. Here, the detailed description of the speech recognition process is omitted.
[0032]
The speech synthesis unit 19 generates synthesized speech from the speech recognition result
input from the speech recognition unit 18. Then, the voice synthesis unit 19 inputs the generated
synthesized voice to the output volume adjustment unit 13. The speech synthesis unit 19
performs speech synthesis processing using a general method. Here, the detailed description of
the speech synthesis process is omitted.
[0033]
Similar to the input volume adjustment unit 11, the output volume adjustment unit 13 adjusts the
volume of the voice obtained by combining the synthesized voice input from the speech synthesis
unit 19 and the monitor voice input from the monitor volume adjustment unit 12 (hereinafter
referred to as output voice). That is, the output sound volume adjustment unit 13 includes a sound volume
designation unit (not shown) such as an operation panel used to designate the sound volume, and
adjusts the output sound volume according to the user's operation on the sound volume
designation unit.
[0034]
The output sound volume adjustment unit 13 inputs the output sound after the sound volume
adjustment to the output unit 17. The output sound volume adjusting unit 13 may D / A convert
the output sound and input the converted analog signal to the output unit 17. In addition, the
output sound volume adjusting unit 13 may input the digital signal indicating the output sound
after the sound volume adjustment to the output unit 17 as it is. However, in this case, the output
unit 17 includes a D / A converter.
[0035]
The output unit 17 outputs the output sound input from the output volume adjustment unit 13.
The output unit 17 is realized by, for example, a speaker.
[0036]
The control unit 14 instructs the monitor volume adjustment unit 12 to adjust the volume of the
monitor sound. Specifically, the control unit 14 instructs the monitor volume adjustment unit 12
to adjust the volume of the monitor sound so that the amplification factor of the volume of the
output sound output by the output unit 17, relative to the volume of the input sound input to the
input unit 16, does not exceed 1.
[0037]
Howling occurs when the output voice that leaks back into the input is amplified again. That is,
howling can be prevented as long as the amplification factor of the output sound volume with
respect to the input sound volume does not exceed 1. Therefore, howling can be prevented by
controlling the volume so that this amplification factor does not exceed 1.
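To see why the amplification factor matters, the following is a deliberately simplified sketch
of a feedback loop: each time the sound travels from output back to input and through the device
again, its level is multiplied by a single loop gain. Delay, frequency dependence, and room
acoustics are ignored, and the name `leaked_levels` is hypothetical.

```python
def leaked_levels(loop_gain, initial_level=1.0, passes=5):
    """Return the sound level after each pass around the feedback loop.

    Each pass multiplies the level by `loop_gain`, the overall
    amplification factor from input back to input.
    """
    levels = []
    level = initial_level
    for _ in range(passes):
        level *= loop_gain
        levels.append(level)
    return levels
```

With a loop gain below 1 the leaked sound shrinks on every pass and dies away; with a loop gain
above 1 it grows on every pass, which is the runaway behavior heard as howling.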
[0038]
Specifically, the control unit 14 receives, from the input sound volume adjustment unit 11, the
monitor sound volume adjustment unit 12, and the output sound volume adjustment unit 13,
information indicating the rate (amplification factor) by which each adjustment unit increases or
decreases the sound volume (hereinafter sometimes referred to as sound volume information). Then,
based on the received amplification factor of each adjustment unit, the control unit 14 adjusts
the amplification factor of the monitor volume adjustment unit 12 so that the amplification factor
of the output sound volume with respect to the input sound volume does not exceed 1.
[0039]
FIG. 2 is an explanatory view showing the relationship of the amplification factors of the volume.
Here, the amplification factor adjusted by the input volume adjustment unit 11 is $C_1$, the
amplification factor adjusted by the monitor volume adjustment unit 12 is $C_2$, and the
amplification factor adjusted by the output volume adjustment unit 13 is $C_3$. Further, the volume
of the sound input to the input volume adjustment unit 11 is $i_0$, the volume of the sound output
from the input volume adjustment unit 11 and input to the monitor volume adjustment unit 12 is
$i_1$, the volume of the sound output from the monitor volume adjustment unit 12 and input to the
output volume adjustment unit 13 is $i_2$, and the volume of the sound output from the output
volume adjustment unit 13 is $i_3$.
[0040]
Further, the amplification factor of the sound input to the input unit 16 relative to the sound
output from the output unit 17 is $C_4$. The amplification factor $C_4$ is determined by the
characteristics of the output unit 17 (speaker), the transfer characteristics from the output unit
17 (speaker) to the input unit 16 (microphone), the characteristics of the input unit 16
(microphone), and the like. Although an actually measured value may be used for $C_4$, as long as
there is no amplification circuit on the path along which the sound output from the output unit 17
leaks to the input unit 16, the energy only attenuates, so the maximum value of $C_4$ can be
assumed to be 1.
[0041]
In this case, the following relations hold, where $i_4$ is the volume of the output sound that
returns to the input unit 16:
$i_1 = C_1 i_0$
$i_2 = C_2 i_1 = C_1 C_2 i_0$
$i_3 = C_3 i_2 = C_1 C_2 C_3 i_0$
$i_4 = C_4 i_3 < i_3$
Here, since $i_0 > i_4$ must be satisfied, it is sufficient to satisfy
$i_0 > i_3 = C_1 C_2 C_3 i_0$, that is, $C_1 C_2 C_3 < 1$. Therefore, the control unit 14 may
adjust the amplification factor of the monitor volume adjustment unit 12 so as to satisfy the
condition $C_2 < 1 / (C_1 C_3)$.
[0042]
Specifically, while $C_2 < 1 / (C_1 C_3)$ is satisfied, the monitor volume adjuster 12 may adjust
the amplification factor in accordance with the user's instruction on volume adjustment. On the
other hand, when an amplification factor $C_2$ that does not satisfy $C_2 < 1 / (C_1 C_3)$ is
instructed, the control unit 14 instructs the monitor sound volume adjustment unit 12 to set an
amplification factor that satisfies $C_2 < 1 / (C_1 C_3)$.
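The control rule in the two preceding paragraphs can be sketched as follows. The description
only requires that the monitor gain C2 stay strictly below 1 / (C1 C3). How far below the limit
the control unit sets it is not stated, so the `margin` parameter (and both function names) are
assumptions made for this example.

```python
def monitor_gain_limit(c1, c3):
    """Exclusive upper bound for C2 so that C1 * C2 * C3 < 1."""
    if c1 <= 0 or c3 <= 0:
        raise ValueError("amplification factors must be positive")
    return 1.0 / (c1 * c3)


def resolve_monitor_gain(requested_c2, c1, c3, margin=0.99):
    """Return the monitor gain to apply.

    The user's requested gain is kept while it satisfies the
    condition; otherwise it is replaced by a value just under the
    limit (limit * margin, where margin < 1 is an assumed choice).
    """
    limit = monitor_gain_limit(c1, c3)
    if requested_c2 < limit:
        return requested_c2
    return limit * margin
```

For instance, with an input gain of 2 and an output gain of 1, the limit is 0.5: a requested
monitor gain of 0.2 is accepted unchanged, while a requested gain of 3 is reduced to a value
below 0.5.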
[0043]
The input volume adjustment unit 11, the monitor volume adjustment unit 12, the output volume
adjustment unit 13, and the control unit 14 are realized by the CPU of a computer that operates
according to a program (voice input / output program). For example, the program is stored in a
storage unit (not shown) of the audio input / output device 10, and the CPU reads the program,
and according to the program, the input volume adjustment unit 11, the monitor volume
adjustment unit 12, the output volume adjustment unit 13 and the control unit 14 may operate.
[0044]
Further, the input volume adjusting unit 11, the monitor volume adjusting unit 12, the output
volume adjusting unit 13, and the control unit 14 may be respectively realized by dedicated
hardware. Specifically, each of the input volume adjustment unit 11, the monitor volume
adjustment unit 12, and the output volume adjustment unit 13 may include a volume designation
unit (not shown) such as an operation panel used to designate a volume.
[0045]
Next, the operation of the voice input / output device of this embodiment will be described. FIG. 3
is a flowchart showing an operation example of the voice input / output device of this
embodiment.
[0046]
When the user inputs a voice to the input unit 16 (step S1), the input unit 16 inputs the input
voice to the input volume adjustment unit 11 (step S2). The input sound volume adjustment unit
11 adjusts the input sound to the sound volume specified by the user (step S3). The input voice
dividing unit 15 divides the input voice of the volume adjusted by the input volume adjusting
unit 11 into voice for voice recognition and monitor voice (step S4). Then, the input voice
dividing unit 15 transmits the voice for voice recognition to the voice recognition unit 18 and
inputs the monitor voice to the monitor volume adjustment unit 12. At this time, the input
speech dividing unit 15 may wirelessly transmit the speech for speech recognition to the speech
recognition unit 18.
[0047]
The voice recognition unit 18 performs voice recognition of the received input voice (step S21).
Then, the speech synthesis unit 19 generates a synthesized speech from the speech recognition
result by the speech recognition unit 18 (step S22), and inputs the generated synthesized speech
to the output volume adjustment unit 13 (step S23).
[0048]
On the other hand, when the volume of the monitor sound is designated by the user, the monitor
volume adjuster 12 adjusts the monitor sound to the designated volume (step S5).
[0049]
Further, the control unit 14 determines whether the amplification factor of the volume of the
output sound output from the output unit 17 exceeds 1 with respect to the volume of the input
sound input to the input unit 16 (step S6).
If the amplification factor exceeds 1 (YES in step S6), the control unit 14 instructs the monitor
volume adjustment unit 12 to adjust the volume of the monitor sound so that the amplification
factor does not exceed 1 (step S7). In this case, the monitor volume adjustment unit 12 adjusts
the volume of the monitor sound according to the instruction from the control unit 14 (step S8),
and inputs the monitor sound after the volume adjustment to the output volume adjustment unit
13 (step S9).
[0050]
On the other hand, when the amplification factor does not exceed 1 (NO in step S6), the control
unit 14 does not issue an instruction to the monitor volume adjustment unit 12. That is, the
monitor sound volume adjustment unit 12 inputs the monitor sound of the sound volume
specified by the user to the output sound volume adjustment unit 13 (step S9).
[0051]
The output volume adjuster 13 adjusts the volume of the output voice obtained by combining the
synthesized voice and the monitor voice to the volume specified by the user (step S10). The
output sound volume adjustment unit 13 inputs the output sound after the sound volume
adjustment to the output unit 17. The output unit 17 outputs the output sound after the volume
adjustment (step S11).
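The flow of steps S3 to S10 can be sketched end to end as below. This is an illustration under
several assumptions: samples are plain floating-point lists, the synthesized voice is passed in
as an argument (recognition and synthesis, steps S21 to S23, happen elsewhere), the two voices
are combined by sample-wise addition with zero padding, the gain check uses the strict condition
C1 C2 C3 < 1, and the `margin` value and function names are hypothetical. For brevity the sketch
resolves the monitor gain before applying it, rather than applying the user's gain and then
re-adjusting.

```python
def scale(samples, gain):
    """Multiply every sample by a gain (amplification factor)."""
    return [s * gain for s in samples]


def run_pipeline(input_samples, synthesized, c1, user_c2, c3, margin=0.99):
    """Return (recognition_voice, output_voice, applied_c2)."""
    # S3: adjust the input volume by C1 (c1 and c3 assumed positive).
    adjusted = scale(input_samples, c1)
    # S4: divide (duplicate) into recognition voice and monitor voice.
    recognition_voice = list(adjusted)
    monitor_voice = list(adjusted)
    # S5: start from the monitor gain requested by the user.
    c2 = user_c2
    # S6 to S8: if the overall gain would reach 1, lower C2 instead.
    if c1 * c2 * c3 >= 1.0:
        c2 = margin / (c1 * c3)
    # S9: apply the resolved monitor gain.
    monitor_voice = scale(monitor_voice, c2)
    # S10: combine with the synthesized voice (zero-padded to equal
    # length, an assumed mixing rule) and apply the output gain C3.
    n = max(len(monitor_voice), len(synthesized))
    padded_monitor = monitor_voice + [0.0] * (n - len(monitor_voice))
    padded_synth = list(synthesized) + [0.0] * (n - len(synthesized))
    mixed = [m + s for m, s in zip(padded_monitor, padded_synth)]
    return recognition_voice, scale(mixed, c3), c2
```

Note that only the monitor gain is reduced: the recognition voice keeps the full input gain C1
and the synthesized voice keeps the full output gain C3, which is how the approach avoids
degrading recognition accuracy or the audibility of the synthesized voice.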
[0052]
As described above, according to the present embodiment, the input sound volume adjusting unit
11 adjusts the sound volume of the input sound input to the input unit 16, and the input sound
dividing unit 15 divides the volume-adjusted input sound into voice for speech recognition and
monitor voice. Also, the monitor volume adjustment unit 12 adjusts the volume of the monitor
voice, and the output volume adjustment unit 13 adjusts the volume of the output voice obtained
by combining the synthesized voice and the volume-adjusted monitor voice, and causes the output
unit 17 to output it. Then, the control unit 14 adjusts the volume of the monitor sound so that
the amplification factor of the volume of the output sound with respect to the volume of the
input sound does not exceed 1.
[0053]
Therefore, when the input speech is monitored together with the result of speech recognition of
that speech, howling can be easily prevented while suppressing both a decrease in the speech
recognition accuracy of the input speech and a loss of audibility of the synthesized speech
output as the result of speech recognition.
[0054]
Embodiment 2
FIG. 4 is a block diagram showing an example of the configuration of a second embodiment of
the voice input / output device according to the present invention. Components similar to those
of the first embodiment are given the same reference numerals as in FIG. 1, and their description
is omitted.
[0055]
The audio input / output device 20 according to the present embodiment differs from the audio
input / output device 10 according to the first embodiment in that it includes two or more input
units 16 (input units 16a and 16b), an input volume adjustment unit 11 (input volume adjustment
units 11a and 11b) corresponding to each input unit 16, and a monitor volume adjustment unit 12
(monitor volume adjustment units 12a and 12b) corresponding to each input volume adjustment unit
11. The other respects are the same as in the first embodiment.
[0056]
Although FIG. 4 illustrates two each of the input unit 16, the input volume adjustment unit 11,
and the monitor volume adjustment unit 12, the number of each is not limited to two and may be
three or more.
[0057]
Further, FIG. 4 exemplifies a case where the monitor volume adjuster 12 is provided for each
input unit 16. However, as long as the volume of the monitor voice divided from each input voice
can be adjusted, a single monitor volume adjuster 12 may be used.
[0058]
Also in the present embodiment, howling can be prevented if the amplification factor of the
volume of the output voice with respect to the volume of the input voice does not exceed 1.
Therefore, the volume of the input voice may be considered for each input unit 16.
That is, the control unit 14 instructs the monitor volume adjustment unit 12 to adjust the volume
of the monitor sound so that the amplification factor of the volume of the output sound with
respect to the volume of each input sound does not exceed 1.
[0059]
Here, let the amplification factors adjusted by the input volume adjusters 11a and 11b be
$C_{1a}$ and $C_{1b}$, respectively, the amplification factors adjusted by the monitor volume
adjusters 12a and 12b be $C_{2a}$ and $C_{2b}$, respectively, and the amplification factor
adjusted by the output volume adjuster 13 be $C_3$. Further, let the volumes of the sound input to
the input volume adjusters 11a and 11b be $i_{0a}$ and $i_{0b}$, respectively, the volumes of the
sound output from the input volume adjusters 11a and 11b and input to the monitor volume adjusters
12a and 12b be $i_{1a}$ and $i_{1b}$, respectively, the volumes of the sound output from the
monitor volume adjusters 12a and 12b and input to the output volume adjuster 13 be $i_{2a}$ and
$i_{2b}$, respectively, and the volume of the sound output from the output volume adjuster 13 be
$i_3$.
[0060]
Further, it is assumed that the sound output from the output unit 17 is input to each of the
input units 16a and 16b at volume $i_3$. That is, the amplification factor of the sound input to
the input unit 16 relative to the sound output from the output unit 17 is assumed to be 1. In this
case, $i_{0a} > i_3$ and $i_{0b} > i_3$ need to be satisfied. Rearranging in the same manner as in
the first embodiment yields the following expression.
[0061]
(1 − C 1a C 2a C 3)(1 − C 1b C 2b C 3) > (C 1a C 2a C 3)(C 1b C 2b C 3),
that is, (C 1a C 2a + C 1b C 2b) C 3 < 1
[0062]
Therefore, the control unit 14 may adjust the amplification factor of the monitor volume
adjusters 12a and 12b so as to satisfy the above equation.
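As an illustration of how the control unit 14 might enforce this condition, the following Python
sketch evaluates (C 1a C 2a + C 1b C 2b) C 3 and, when the value is 1 or more, scales both
monitor gains by a common factor. The function names, the common-scaling strategy, and the
`margin` parameter are assumptions made for this illustration; the description above only
requires that the inequality hold.

```python
def loop_gain_multi_input(c1, c2, c3):
    """Return (C1a*C2a + C1b*C2b + ...) * C3 for per-input gain lists."""
    if len(c1) != len(c2):
        raise ValueError("c1 and c2 must have one entry per input unit")
    return sum(a * b for a, b in zip(c1, c2)) * c3


def limit_monitor_gains(c1, c2, c3, margin=0.99):
    """Return monitor gains C2 for which the loop gain stays below 1.

    Scaling every C2 by the same factor is an assumption of this sketch;
    any adjustment that satisfies the inequality would serve.
    """
    gain = loop_gain_multi_input(c1, c2, c3)
    if gain < 1.0:
        return list(c2)        # condition already satisfied, leave gains alone
    scale = margin / gain      # bring the loop gain down to `margin`
    return [g * scale for g in c2]
```

For example, with C 1a = 2, C 1b = 1, C 2a = C 2b = 0.5 and C 3 = 1 the loop gain is 1.5, so
both monitor gains are reduced until the loop gain falls just below 1.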
[0063]
In addition, also in the present embodiment, the input sound dividing unit 15 may receive an
instruction indicating the presence or absence of the monitor function from the user.
For example, when the input voice dividing unit 15 corresponding to an input unit 16 receives an
instruction indicating “with monitor function” from the user, that input voice dividing unit 15
may pass the voice input to the corresponding input unit 16 on to the monitor volume adjustment
unit 12.
On the other hand, when the input voice dividing unit 15 corresponding to an input unit 16
receives an instruction indicating “no monitor function” from the user, that input voice
dividing unit 15 need not pass the voice input to the corresponding input unit 16 on to the
monitor volume adjustment unit 12.
[0064]
Moreover, although this embodiment describes the case where an input voice dividing unit 15 is
provided for each input unit 16, a single input voice dividing unit 15 may be used. In this case,
the input voice dividing unit 15 may include a switch for specifying the input unit 16 whose
voice is to be monitored, so that only the voice input to the input unit 16 specified by the
switch is input to the monitor volume adjustment unit 12.
[0065]
That is, in the present embodiment, when there are a plurality of input units 16 (microphones),
the input unit 16 whose voice is output as monitor sound may be selected. When one input unit 16
is selected, the operation is similar to that of the first embodiment.
[0066]
As described above, in the present embodiment, the plurality of input sound volume adjusting
units 11 adjust the volume of the input sound input to each input unit 16. Further, the monitor
volume adjustment unit 12 adjusts the volume of the monitor sound divided for each input
sound. Then, the control unit 14 instructs the monitor volume adjustment unit 12 to adjust the
volume of the monitor sound so that the amplification factor of the volume of the output sound
with respect to the volume of each input sound does not exceed 1. Therefore, in addition to the
effects of the first embodiment, howling can be prevented even when processing is performed
using a plurality of input voices input from a plurality of input devices.
[0067]
Embodiment 3 FIG. 5 is a block diagram showing a configuration example of the third
embodiment of the voice input / output device according to the present invention. Components
similar to those of the first embodiment are given the same reference signs as in FIG. 1, and
their description is omitted.
[0068]
The audio input / output device 30 according to the present embodiment differs from the audio
input / output device 10 in the first embodiment in that it includes two or more output units 17
(output units 17c and 17d), an output volume adjustment unit 13 (output volume adjustment units
13c and 13d) corresponding to each output unit 17, and a monitor volume adjustment unit 12
(monitor volume adjustment units 12c and 12d) corresponding to each output volume adjustment
unit 13. In other respects it is the same as the first embodiment.
[0069]
Although FIG. 5 illustrates two each of the output unit 17, the output volume adjustment unit 13,
and the monitor volume adjustment unit 12, the number of each is not limited to two and may be
three or more.
[0070]
Further, FIG. 5 illustrates a case where a monitor volume adjustment unit 12 is provided for each
output unit 17. However, as long as the volume of the monitor sound can be adjusted for each
output unit 17, a single monitor volume adjustment unit 12 may be used.
[0071]
In the present embodiment, howling can be prevented if the amplification factor of the total
volume of the output sound output from each output unit 17 does not exceed 1 with respect to
the volume of the input sound.
Therefore, the volume of the input sound may be considered in relation to the total volume of
the sounds output from the output units 17.
That is, the control unit 14 instructs the monitor volume adjustment unit 12 to adjust the volume
of the monitor sound so that the amplification factor of the total volume of the output sounds
output from the output units 17 with respect to the volume of the input sound does not exceed 1.
[0072]
Here, let the amplification factor applied by the input volume adjustment unit 11 be C 1, the
amplification factors applied by the monitor volume adjustment units 12c and 12d be C 2c and
C 2d, respectively, and the amplification factors applied by the output volume adjustment units
13c and 13d be C 3c and C 3d, respectively. Further, let the volume of the sound input to the
input volume adjustment unit 11 be i 0; the volume of the sound output from the input volume
adjustment unit 11 and input to the monitor volume adjustment units 12c and 12d be i 1; the
volumes of the sounds output from the monitor volume adjustment units 12c and 12d and input to
the output volume adjustment units 13c and 13d be i 2c and i 2d, respectively; and the volumes
of the sounds output from the output volume adjustment units 13c and 13d be i 3c and i 3d,
respectively.
[0073]
In addition, it is assumed that the sound output from the output units 17c and 17d is input to
the input unit 16 at the volume i 3c + i 3d. That is, it is assumed that the amplification factor
of the sound input to the input unit 16 is 1 relative to the sounds output from the output units
17c and 17d. In this case, i 0 > i 3c + i 3d must be satisfied. Rearranging in the same manner
as in the first embodiment yields the following inequality.
[0074]
C 1 (C 2c C 3c + C 2d C 3d) < 1
[0075]
Therefore, the control unit 14 may adjust the amplification factor of the monitor volume
adjusters 12 c and 12 d so as to satisfy the above equation.
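To make the arithmetic concrete, the Python sketch below evaluates the left-hand side of the
inequality and solves it for the monitor gain of one output path while the other paths are held
fixed. The function names and the idea of adjusting one path at a time are assumptions made for
this illustration and are not taken from the description.

```python
def loop_gain_multi_output(c1, c2, c3):
    """Return C1 * (C2c*C3c + C2d*C3d + ...) for per-output gain lists."""
    if len(c2) != len(c3):
        raise ValueError("c2 and c3 must have one entry per output unit")
    return c1 * sum(m * o for m, o in zip(c2, c3))


def monitor_gain_bound(c1, c3_target, other_paths):
    """Exclusive upper bound on the monitor gain C2 of one output path.

    other_paths is a list of (C2, C3) pairs for the remaining output paths.
    Obtained by solving C1 * (C2 * C3_target + sum(C2k * C3k)) < 1 for C2.
    A result of 0.0 means the other paths already use up the whole budget.
    """
    if c1 <= 0 or c3_target <= 0:
        raise ValueError("c1 and c3_target must be positive")
    used = sum(c2 * c3 for c2, c3 in other_paths)
    return max(0.0, (1.0 / c1 - used) / c3_target)
```

With C 1 = 2, C 3c = C 3d = 1 and C 2d = 0.2, the bound on C 2c is (1/2 − 0.2) / 1 = 0.3, so
any C 2c below 0.3 keeps the total loop gain under 1.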
[0076]
Further, in the present embodiment, each output sound volume adjustment unit 13 may receive
an instruction indicating the presence or absence of an output of sound to each output unit 17.
For example, when the output volume adjustment unit 13 corresponding to an output unit 17
receives an instruction indicating “audio output is present” from the user, that output volume
adjustment unit 13 may output the synthesized voice to the corresponding output unit 17.
On the other hand, when the output volume adjustment unit 13 corresponding to an output unit 17
receives an instruction indicating “no sound output” from the user, that output volume
adjustment unit 13 need not output the synthesized voice to the corresponding output unit 17.
[0077]
As described above, according to the present embodiment, the plurality of output volume
adjustment units 13 adjust the volume of the output sound output from each output unit 17.
Further, the monitor volume adjustment unit 12 adjusts the volume of the monitor sound for each
output unit 17. Then, the control unit 14 instructs the monitor volume adjustment unit 12 to
adjust the volume of the monitor sound so that the amplification factor of the total volume of
the output sound output from each output unit 17 does not exceed 1 with respect to the volume of
the input sound. Therefore, in addition to the effects of the first embodiment, howling can be
prevented even when audio is output from a plurality of output units.
[0078]
Embodiment 4 FIG. 6 is a block diagram showing a configuration example of the fourth
embodiment of the voice input / output device according to the present invention. Components
similar to those of the first to third embodiments are given the same reference signs as in
FIG. 1, FIG. 4, or FIG. 5, and their description is omitted.
[0079]
The audio input / output device 40 of the present embodiment includes a control unit 14, two or
more input units 16 (input units 16a and 16b), an input volume adjustment unit 11 (input volume
adjustment units 11a and 11b) corresponding to each input unit 16, a monitor volume adjustment
unit 12 (monitor volume adjustment units 12a and 12b) corresponding to each input volume
adjustment unit 11, two or more output units 17 (output units 17c and 17d), an output volume
adjustment unit 13 (output volume adjustment units 13c and 13d) corresponding to each output
unit 17, and a monitor volume adjustment unit 12 (monitor volume adjustment units 12c and 12d)
corresponding to each output volume adjustment unit 13.
[0080]
The process in the case where voice is input to the plurality of input units 16 is the same as that
in the second embodiment.
In addition, processing when voices are output from the plurality of output units 17 is the same
as that in the third embodiment.
[0081]
Further, in the present embodiment, a monitor voice may be output by selecting a combination of
the input unit 16 for inputting voice and the output unit 17 for outputting synthetic voice. For
example, each input sound dividing unit 15 receives an instruction indicating the presence or
absence of a monitor function from the user, and each output volume adjustment unit 13
receives an instruction indicating the presence or absence of an output of sound to each output
unit 17. In this way, a combination of the input unit 16 for inputting voice and the output
unit 17 for outputting the synthetic speech may be selected.
[0082]
At this time, the monitor volume adjustment unit 12 may adjust the volume of the monitor sound
divided for each input sound input to the selected input unit 16 and the volume of the monitor
sound for each selected output unit 17. Then, the control unit 14 may instruct the monitor
volume adjustment unit 12 to adjust the volume of the monitor sound so that the amplification
factor of the total volume of the output sound output from the selected output units 17 does not
exceed 1 with respect to the volume of the input sound input to each of the selected input units
16. In this case, howling can be prevented even when processing is performed using a plurality
of input voices and voices are output from a plurality of output units.
[0083]
Hereinafter, the present invention will be described by way of specific examples, but the scope of
the present invention is not limited to the contents described below.
[0084]
FIG. 7 is an explanatory view showing an example of the voice input / output device of this
embodiment.
In the voice input / output device 50 of the present embodiment, the input unit and the output
unit are accommodated in one case. Specifically, the audio input / output device 50 includes two
microphones 56a and 56b as an input unit, and one speaker 57 as an output unit. Of the two
microphones 56a and 56b, one microphone 56a is disposed at the mouth of the user, and the
other microphone 56b is disposed at the ear of the user. In addition, the speaker 57 is also
disposed at the ear of the user.
[0085]
Further, there is a voice recognition device 60 that performs voice recognition and voice
synthesis, and the voice input / output device 50 transmits the sound input to the microphones
56a and 56b to the voice recognition device 60 by wireless communication. Further, the voice
input / output device 50 receives the synthesized voice from the voice recognition device 60 by
wireless communication.
[0086]
The microphone 56a is used particularly for the user's voice input, and the microphone 56b is
used for the ambient noise input. The voice recognition device 60 has a function of extracting the
user's voice by removing the ambient noise input to the microphone 56b from the sound
input to the microphone 56a. Further, the speech recognition device 60 has a function of
recognizing a user's speech and generating a synthetic speech. As described above, a method of
extracting a user's voice from two sound sources and recognizing the extracted voice to generate
a synthesized voice is widely known, and thus the description thereof is omitted here.
[0087]
FIG. 8 is an explanatory view showing an example of a voice recognition system including the
voice input / output device of this embodiment. The input sound volume adjustment unit 51a is
connected to the microphone 56a, and the input sound division unit 55a is connected to the
input sound volume adjustment unit 51a. The input voice division unit 55a divides the voice
input to the microphone 56a, and transmits the input voice to the voice recognition device 60
and the monitor volume adjustment unit 52a. The voice recognition device 60 wirelessly
transmits the synthesized voice as a result of voice recognition to the output volume adjustment
unit 53. Further, the monitor sound volume adjustment unit 52 a transmits monitor sound to the
output sound volume adjustment unit 53.
[0088]
Similarly, the input sound volume adjustment unit 51b is connected to the microphone 56b, and
the input sound division unit 55b is connected to the input sound volume adjustment unit 51b.
The input voice dividing unit 55b divides the voice input to the microphone 56b, and transmits
the input voice to the voice recognition device 60 and the monitor volume adjustment unit 52b.
The voice recognition device 60 wirelessly transmits the synthesized voice as a result of voice
recognition to the output volume adjustment unit 53. Further, the monitor sound volume
adjustment unit 52 b transmits the monitor sound to the output sound volume adjustment unit
53.
[0089]
The output sound volume adjustment unit 53 inputs the adjusted output sound to the speaker
57. Then, the speaker 57 outputs an output sound. At this time, the control unit 54 controls the
monitor volume adjusters 52a and 52b.
[0090]
Specifically, when the volume of the output voice output from the speaker 57 is larger than the
volume of the input voice input to the microphone 56a, the control unit 54 instructs the monitor
volume adjustment unit 52a to adjust the volume of the monitor sound so that the volume of the
output voice becomes less than or equal to the volume of the input voice.
[0091]
Similarly, when the amplification factor of the volume of the output sound output from the
speaker 57 with respect to the volume of the input sound input to the microphone 56b exceeds 1,
the control unit 54 instructs the monitor volume adjustment unit 52b to adjust the volume of the
monitor sound so that the amplification factor does not exceed 1.
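The behaviour described for the control unit 54 can be pictured as a simple level-comparison
step. The Python sketch below is only an illustration under stated assumptions: the use of RMS
as the volume measure, the proportional reduction of the monitor gain, and the choice to mute
the monitor path when no input level is measured are all hypothetical and are not specified in
the description. Because the output also contains the synthesized voice, a single step may not
bring the ratio down to 1, so such a step would be repeated on successive frames.

```python
import math


def rms(samples):
    """Root-mean-square level of a frame, used here as the volume measure."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def control_step(monitor_gain, input_level, output_level):
    """One control step: lower the monitor gain if output/input exceeds 1."""
    if input_level <= 0.0:
        return 0.0                   # assumption: mute the monitor path without input
    ratio = output_level / input_level
    if ratio <= 1.0:
        return monitor_gain          # amplification factor does not exceed 1
    return monitor_gain / ratio      # reduce in proportion to the excess
```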
[0092]
In the present embodiment, the microphone 56b for acquiring ambient noise and the speaker 57 are
both disposed close to the user's ear.
In such a case, the sound output from the speaker 57 is easily input to the microphone 56b as it
is, which tends to cause howling.
However, in the present embodiment, when the amplification factor of the volume of the output
sound output from the speaker with respect to the volume of the input sound input to the
microphone exceeds 1, the volume of the monitor sound is adjusted so that the amplification
factor does not exceed 1. Therefore, the occurrence of howling can be suppressed.
[0093]
Next, a minimum configuration example of the present invention will be described. FIG. 9 is a
block diagram showing an example of the minimum configuration of the voice input / output
device according to the present invention. The audio input / output device according to the
present invention comprises: an input volume adjustment means 81 (for example, the input volume
adjustment unit 11) for adjusting the volume of input voice input to an input device (for
example, the input unit 16 or a microphone); a voice dividing means 82 (for example, the input
voice dividing unit 15) that divides the input voice whose volume has been adjusted by the input
volume adjustment means 81 into a voice for speech recognition, which is a voice used for speech
recognition, and a monitor voice, which is a voice used for monitoring the input voice; a monitor
volume adjustment means 83 (for example, the monitor volume adjustment unit 12) for adjusting
the volume of the monitor voice; an output volume adjustment means 84 (for example, the output
volume adjustment unit 13) that adjusts the volume of an output voice, which is a voice obtained
by combining a synthetic voice, which is a voice synthesized from information created as a
result of speech recognition of the voice for speech recognition, with the monitor voice whose
volume has been adjusted by the monitor volume adjustment means 83, and outputs the output voice
to an output device (for example, the output unit 17 or a speaker); and a control means 85 (for
example, the control unit 14) that instructs the monitor volume adjustment means 83 to adjust
the volume of the monitor voice so that the amplification factor of the volume of the output
voice with respect to the volume of the input voice does not exceed 1.
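The signal path through means 81 to 84 can be sketched in a few lines of Python. This is a
minimal illustration under stated assumptions: audio is modelled as lists of float samples,
"dividing" the voice is modelled as passing the same adjusted signal to both branches, the
synthetic voice is supplied by the caller, and the function and parameter names are
hypothetical. The control means 85 is not modelled here.

```python
def process_frame(mic, synth, c1, c2, c3):
    """Run one frame through the chain 81 -> 82 -> 83 -> 84.

    mic and synth are equal-length lists of float samples; c1, c2 and c3 are
    the input, monitor and output gains. Returns (recognition_feed, output).
    """
    if len(mic) != len(synth):
        raise ValueError("mic and synth frames must have the same length")
    adjusted = [c1 * s for s in mic]              # input volume adjustment (81)
    recognition_feed = list(adjusted)             # voice division (82): branch for recognition
    monitor = [c2 * s for s in adjusted]          # monitor volume adjustment (83)
    mixed = [m + t for m, t in zip(monitor, synth)]  # combine monitor and synthetic voice
    output = [c3 * s for s in mixed]              # output volume adjustment (84)
    return recognition_feed, output
```

Note that the recognition branch is taken before the monitor gain is applied, so lowering the
monitor gain to prevent howling leaves the level of the voice sent for recognition unchanged.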
[0094]
With such a configuration, when the result of speech recognition of the input voice is monitored
together with the input voice, howling can be easily prevented while suppressing both a decrease
in the speech recognition accuracy of the input voice and difficulty in hearing the synthetic
voice output as a result of the speech recognition.
[0095]
The audio input / output device may also include two or more input volume adjustment means (for
example, the input volume adjustment units 11a and 11b), one provided for each of two or more
input devices.
Then, the monitor volume adjustment means 83 may adjust the volume of the monitor voice divided
for each input voice, and the control means 85 may instruct the monitor volume adjustment means
83 to adjust the volume of the monitor voice so that the amplification factor of the volume of
the output voice with respect to the volume of each input voice does not exceed 1.
[0096]
With such a configuration, howling can be prevented even when processing is performed using a
plurality of input voices input from a plurality of input devices.
[0097]
Also, the audio input / output device may include two or more output volume adjustment means
(for example, the output volume adjustment units 13c and 13d), one provided for each of two or
more output devices, each of which adjusts the volume of the output sound to be output to its
output device.
Then, the monitor volume adjustment means 83 may adjust the volume of the monitor sound for
each output device, and the control means 85 may instruct the monitor volume adjustment means
83 to adjust the volume of the monitor sound so that the amplification factor of the total
volume of the output sound output from each output device with respect to the volume of the
input sound does not exceed 1.
[0098]
With such a configuration, howling can be prevented even when audio is output from a plurality
of output units.
[0099]
Further, the voice input / output device may include selection means (for example, the input
voice dividing unit 15 and the output volume adjustment unit 13) for selecting a combination of
an input device that receives input voice and an output device that outputs the synthetic voice.
Then, the monitor volume adjustment means 83 may adjust the volume of the monitor voice divided
for each input voice input to the selected input device and the volume of the monitor voice for
each selected output device. Further, the control means 85 may instruct the monitor volume
adjustment means 83 to adjust the volume of the monitor voice so that the amplification factor
of the total volume of the output sound output from the selected output devices does not exceed
1 with respect to the volume of the input sound input to each selected input device.
[0100]
With such a configuration, howling can be prevented even when processing is performed using a
plurality of input voices and voices are output from a plurality of output units.
[0101]
Further, the voice dividing means 82 may transmit the voice for voice recognition to the voice
recognition device by radio, and the output volume adjusting means 84 may receive the
synthesized voice transmitted via radio.
[0102]
Also, the voice input / output device may include voice recognition means (for example, a voice
recognition unit 18) that performs voice recognition based on the voice for voice recognition,
and voice synthesis means (for example, a voice synthesis unit 19) that creates a synthesized
voice from the result of the voice recognition by the voice recognition means and inputs the
synthesized voice to the output volume adjustment means 84.
In this case, the voice input / output device plays a role as a voice recognition device.
[0103]
Also, a microphone as an input device and a speaker as an output device may be housed in one
case.
[0104]
Although the present invention has been described above with reference to the embodiments and an
example, the present invention is not limited to the above embodiments and example.
Various modifications that can be understood by those skilled in the art may be made to the
configuration and details of the present invention within the scope of the present invention.
[0105]
This application claims priority based on Japanese Patent Application No. 2011-245615 filed on
Nov. 9, 2011, the entire disclosure of which is incorporated herein.
[0106]
The present invention is suitably applied to a voice input / output device that prevents howling
when outputting input voice and the result of voice recognition of the voice.
[0107]
10, 20, 30, 40, 50 Audio input / output device
11, 11a, 11b Input volume adjustment unit
12, 12a, 12b, 12c, 12d Monitor volume adjustment unit
13, 13c, 13d Output volume adjustment unit
14 Control unit
15, 15a, 15b Input voice dividing unit
16, 16a, 16b Input unit
17, 17c, 17d Output unit
18 Speech recognition unit
19 Speech synthesis unit