JPH07336790

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JPH07336790
[0001]
BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a
microphone system for collecting a speaker's voice signal using a plurality of microphones in a
conference room or studio.
[0002]
2. Description of the Related Art When a conference (including a video conference) is held using a conventional microphone system to pick up a speaker's voice signal, a single microphone can only pick up the voice signal arriving from the direction determined by its directivity. As a countermeasure, the microphone had to be turned toward the speaker by hand or rotated by a motor under remote control. When omnidirectional microphones were used instead, the individual voice signals could not be separated. Furthermore, even when an attempt is made to fix on one speaker by mixing the outputs of a plurality of spatially arranged microphones, manual intervention is required. The same is true for studio equipment.
[0003]
As described above, a single-microphone system can pick up only the voice signal arriving from the direction determined by the directivity of the microphone, and the microphone had to be rotated by hand or by a motor under remote control. With an omnidirectional microphone it is difficult to separate a specific voice signal: even if some priority is given to picking up the voice of a certain speaker, the overlapping voice signals of several speakers cannot be separated. Furthermore, even when an attempt is made to fix on one speaker by mixing the outputs of a plurality of spatially arranged microphones, manual intervention is required. Thus, in the prior art, when trying to pick up the voice signal of only one speaker, there is the problem that the voice signals of multiple speakers cannot be separated without manual operation.
[0004]
An object of the present invention is to provide a microphone system that solves the problem that the voice signals of multiple speakers cannot be separated.
[0005]
SUMMARY OF THE INVENTION The present invention is a microphone system for picking up a specific voice signal in a sound field containing a plurality of speakers, comprising: a microphone array for picking up the voice signals; an amplifier group for amplifying the outputs of the microphone array; a noise level detection unit that detects a noise level from the outputs of the amplifier group; a maximum level detection unit that detects the maximum level among the outputs of the amplifier group, using the outputs of the amplifier group and the output of the noise level detection unit; a time lag detection unit that detects time lags from the outputs of the amplifier group, the output of the noise level detection unit and the output of the maximum level detection unit, and outputs a selection signal; and a switching control unit that selects one audio signal from among the outputs of the amplifier group based on the selection signal from the time lag detection unit.
[0006]
The voice signals coming in from the microphones are amplified by the amplifiers, pass through the delay elements, and one of them is selected by the switching control unit.
The parameter that controls the switching control unit is the output of the time lag detection unit.
[0007]
First, the conference starts from silence.
The system then determines the noise level PN in the room. This is input to the time lag detection
unit and the maximum level detection unit. The system then proceeds to detect silence intervals.
Then, when someone starts to speak and the power level P1 exceeds the noise level PN, that channel is selected by the switching control unit and the speaker is fixed. If the utterance start points of several speakers are close together and there is competition for the microphone selection, the microphone cannot be fixed to one speaker; in that case the time lag is used, and the selected voice is fixed to the speaker whose utterance started earliest. Furthermore, if the time lags fall within the error range, the selected voice is fixed to the speaker for which the maximum level has been detected.
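The selection rule summarized above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the patent: the function names, the frame-based formulation and the numerical values are assumptions made here.

def estimate_noise_level(initial_frames):
    # Noise level PN taken as the maximum (an average could equally be used)
    # of the per-microphone powers observed while the conference is silent.
    return max(max(frame) for frame in initial_frames)

def select_channel(start_times, max_levels, error_range):
    # start_times[i]: time at which channel i first exceeded PN (None if it
    # has not yet); max_levels[i]: its observed maximum power level.
    active = [i for i, t in enumerate(start_times) if t is not None]
    if not active:
        return None                      # still silent: keep the previous selection
    earliest = min(active, key=lambda i: start_times[i])
    # Channels whose start lies within the error range of the earliest start
    contenders = [i for i in active
                  if start_times[i] - start_times[earliest] <= error_range]
    if len(contenders) == 1:
        return earliest                  # earliest utterance wins
    return max(contenders, key=lambda i: max_levels[i])   # tie-break by level

PN = estimate_noise_level([[0.01, 0.02, 0.01], [0.02, 0.01, 0.02]])   # PN = 0.02
# Speakers 1 and 2 start 20 ms apart (within the error range), speaker 3 later;
# the louder of speakers 1 and 2 is chosen.
print(select_channel([0.50, 0.52, 1.30], [0.8, 1.1, 0.9], error_range=0.05))   # 1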
[0008]
Next, an embodiment of the present invention will be described with reference to the drawings.
[0009]
FIG. 1 is a block diagram showing an embodiment of the present invention.
In this embodiment, the system comprises a microphone array 11 for inputting voices; an amplifier group 12 for amplifying the outputs of the microphone array 11; a delay element group 13 for delaying the voice signals output from the amplifier group 12 so that the beginning of an utterance is not lost; a time lag detection unit 14 that detects the time lags of the plurality of audio signals output from the amplifier group 12; a maximum level detection unit 15 that detects the audio signal having the highest level among the plurality of audio signals output from the amplifier group 12 and outputs that maximum level to the time lag detection unit 14; a noise level detection unit 16 that constantly detects the noise level in the room with all the microphones and outputs the average or maximum value of those outputs to the time lag detection unit 14 and the maximum level detection unit 15; and a switching control unit 17 that selects one of the plurality of audio signals based on the output of the time lag detection unit 14.
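As a rough illustration of how the blocks of FIG. 1 connect, the sketch below wires an amplifier gain, the delay element group and the switching control into one per-frame function. The class name, the gain value, the delay length and the frame-based interface are assumptions for illustration; the patent defines the blocks, not code, and the detectors 14 to 16 are assumed to supply the selection index.

from collections import deque

class SwitchedMicArray:
    # Blocks 12, 13 and 17 of FIG. 1; the detectors 14-16 are assumed to
    # supply the `selection` argument.
    def __init__(self, n_mics, gain=10.0, delay_frames=8):
        self.gain = gain                                     # amplifier group 12
        self.delay = [deque([0.0] * delay_frames)            # delay element group 13
                      for _ in range(n_mics)]
        self.selected = 0                                    # switching control unit 17

    def process(self, mic_frame, selection):
        # mic_frame: one sample per microphone of array 11;
        # selection: channel index chosen by the time lag detection unit 14.
        amplified = [self.gain * x for x in mic_frame]
        delayed = []
        for line, x in zip(self.delay, amplified):
            line.append(x)                  # push the new amplified sample
            delayed.append(line.popleft())  # pop the sample delayed by delay_frames
        self.selected = selection
        return delayed[self.selected]       # the single selected, delayed voice signal

mics = SwitchedMicArray(n_mics=3)
print(mics.process([0.0, 0.2, 0.0], selection=1))   # 0.0 until the delay line fills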
[0010]
FIG. 2 is a view showing the arrangement of microphones in the present embodiment. As an assumption about the sound field, there is a round desk and a plurality of conference participants sit around it. Reference numeral 18 denotes a directional microphone, 19 a circle, and 20 the directivity of a single microphone.
[0011]
FIG. 3 is a timing chart for explaining this embodiment. P1 is the observed power level of the microphone directed at speaker 1, P2 is the observed power level of the microphone directed at speaker 2, P3 is the observed power level of the microphone directed at speaker 3, and PN is the noise level of the microphone system including the sound field. Further, t1 is the utterance start time of speaker 1, t2 is the utterance start time of speaker 2, t3 is the utterance start time of speaker 3, and a represents a silent period of speaker 1. Here, the utterance start time is the time at which the power level of the observed speech exceeds the noise level PN. The time lag mentioned above is the time difference, |t2 - t1| or |t3 - t2|, between the utterance start times t1, t2 and t3 of speakers 1, 2 and 3.
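Under this definition, the utterance start time can be obtained as the first frame whose power exceeds PN. The sketch below assumes sampled power values and a frame period dt; both are illustrative assumptions, not taken from the patent.

def utterance_start(powers, PN, dt=0.01):
    # Return the time (in seconds) at which the power level first exceeds PN,
    # or None if it never does; dt is the assumed frame period.
    for n, p in enumerate(powers):
        if p > PN:
            return n * dt
    return None

PN = 0.05
t1 = utterance_start([0.01, 0.02, 0.20, 0.30], PN)   # 0.02 s
t2 = utterance_start([0.01, 0.01, 0.02, 0.25], PN)   # 0.03 s
print(round(abs(t2 - t1), 3))                        # time lag |t2 - t1| = 0.01 s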
[0012]
FIG. 4 is a diagram showing the relationship between the audio signal and the power level. p is the sound pressure of the voice signal (the audio signal itself), shown here for the vowel /a/. P1 is the power level of speaker 1 and satisfies the relationship P1 = p * p. P1' is the discrete-time moving average of the power level of speaker 1, and c is the time window used for it. P1'', on the other hand, is the continuous-time moving average of the power level of speaker 1.
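The relationship P1 = p * p and the moving average P1' can be illustrated with a short Python sketch. The discrete formulation, the window length and the test tone below are assumptions for illustration; only the formula P = p * p comes from the text.

import math

def power(p):
    # Instantaneous power level of a sound-pressure sample: P = p * p.
    return p * p

def moving_average_power(samples, c):
    # Discrete-time moving average P1' of the power level over a window of c
    # samples; it smooths the fine oscillation of P1 visible in FIG. 4.
    P = [power(p) for p in samples]
    return [sum(P[max(0, n - c + 1):n + 1]) / min(n + 1, c)
            for n in range(len(P))]

# A short /a/-like oscillation: the raw power P1 vibrates sample by sample,
# while the windowed average prints 0.5, half the peak power, once the window
# spans a full period of the tone.
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(40)]
print(round(moving_average_power(tone, c=20)[-1], 3))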
[0013]
Next, the operation of this embodiment will be described.
[0014]
The audio signals from the microphone array 11 are amplified by the amplifier group 12, pass through the delay element group 13, and are output via the switching control unit 17.
Since the microphone array 11 comprises a plurality of microphones, one of them must be selected by the switching control unit 17. The maximum level is detected by the maximum level detection unit 15 and input to the time lag detection unit 14. Further, the noise level detection unit 16 constantly detects the noise level in the room using all the microphones and supplies its average or maximum value to the time lag detection unit 14 and the maximum level detection unit 15.
[0015]
First, the conference starts from silence. At this time, the system must obtain the noise level PN shown in FIG. 3(a). This is done by the noise level detection unit 16. Here, a is a silent period, namely the time from when the audio output P1 of the amplifier falls below the noise level PN until it exceeds the noise level PN again for the first time. The time lag detection unit 14 detects this silent section. As long as the silent period continues, the switching control unit 17 keeps the selected audio signal fixed to the input signal of the microphone selected immediately before the silent period. Then, when the voice output P1 exceeds the noise level PN as shown in FIG. 3(a), the time lag detection unit 14 detects this and outputs an instruction to select speaker 1 to the switching control unit 17 as a selection signal. The switching control unit 17 then selects and fixes the microphone of this speaker in accordance with the selection signal.
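The behaviour described in this paragraph, holding the previous selection through a silent period and fixing on a channel once its power rises above PN, can be shown with a toy run. The numbers and names below are invented for illustration, and the tie-breaking of the next paragraph is deliberately omitted.

def track_selection(power_frames, PN, initial=0):
    # power_frames: per-frame power levels, one value per channel.
    selected = initial
    history = []
    for frame in power_frames:
        above = [i for i, p in enumerate(frame) if p > PN]
        if above and selected not in above:
            selected = above[0]          # fix on the channel that exceeded PN
        history.append(selected)         # during silence the selection is held
    return history

frames = [[0.01, 0.01],    # silence: noise only
          [0.02, 0.01],    # still silence, selection stays on channel 0
          [0.30, 0.02],    # speaker 1 exceeds PN and is (already) selected
          [0.01, 0.01],    # silent period: channel 0 is held
          [0.02, 0.40]]    # speaker 2 exceeds PN and takes over
print(track_selection(frames, PN=0.05))  # [0, 0, 0, 0, 1]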
[0016]
If the utterance start points of speakers 1, 2 and 3 are close together, as shown by t1, t2 and t3 in FIGS. 3(b) and 3(c), the microphone cannot be fixed to one speaker. By detecting the time lag, that is, the order t1 < t2 < t3 of the speech start times, the selected speech is fixed to the microphone directed at the speaker whose utterance started earliest. This is performed by the time lag detection unit 14 and the switching control unit 17: the time lag detection unit 14 measures the time differences between the speech start times, and the switching control unit 17 selects from the outputs of the amplifier group 12 the channel of the earliest speaker. Furthermore, when the time lag detection unit 14 finds that t1, t2 and t3 fall within the timing error range, the maximum levels of the respective voice outputs P1, P2 and P3 are input from the maximum level detection unit 15 to the time lag detection unit 14, which then outputs a selection signal to the switching control unit 17 so as to select the microphone of the loudest speaker. Here, the error range is a fixed value held by the time lag detection unit 14.
[0017]
Next, the process when speaker 1 finishes talking will be described. In FIG. 3, let t4 be the end point of the utterance of speaker 1. Here, the speech end point is the point at which the speech output P1 falls below the noise level PN. Also, let b be the silent section generated at this time. If this silence interval b is detected, the system leaves the microphone fixed to this speaker. If, at time t4, speaker 2 and speaker 3 are speaking, the time lag detection unit 14 selects the speaker with the earlier timing by referring to t2 and t3 stored in the time lag detection unit 14. Furthermore, if t2 and t3 fall within the error range, the maximum level detection unit 15 outputs the maximum level of each speaker at time t4, and the time lag detection unit 14 outputs a selection signal for selecting the speaker having the maximum level.
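In other words, when the selected speaker's power drops below PN at t4, the same rule is re-applied among the channels still above PN, using the stored start times and, within the error range, the levels at t4. The sketch below is an assumed formulation of that step; names and values are illustrative only.

def reselect_at_end(start_times, levels_at_t4, PN, error_range):
    # Channels still speaking at t4 (power above the noise level PN).
    speaking = [i for i, lvl in enumerate(levels_at_t4) if lvl > PN]
    if not speaking:
        return None                      # silent section b: keep the current speaker
    earliest = min(speaking, key=lambda i: start_times[i])
    close = [i for i in speaking
             if start_times[i] - start_times[earliest] <= error_range]
    if len(close) == 1:
        return earliest                  # earlier of t2 and t3 wins
    return max(close, key=lambda i: levels_at_t4[i])   # otherwise the louder speaker

# Speaker 1 has stopped; speakers 2 and 3 (channels 1 and 2) started 10 ms
# apart, inside the error range, so the one with the higher level at t4 wins.
print(reselect_at_end([0.5, 2.00, 2.01], [0.01, 0.30, 0.45],
                      PN=0.05, error_range=0.05))      # 2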
[0018]
Finally, a supplementary explanation of the maximum level detection unit 15 is given. As shown in FIG. 4, the actual voice signal of the vowel /a/ has a time waveform like p and oscillates finely, so the power level P1 also oscillates finely. In this case, even if the maximum level is sought directly, the system is not stable, and it is not guaranteed which microphone is selected. Therefore, a time moving average is required; P1' and P1'' correspond to it.
[0019]
The reason for using the delay element group 13 is to prevent the beginning of an utterance from being cut off at the switching control unit 17 because of the processing time of the time lag detection unit 14 and the maximum level detection unit 15. This delay amount is the maximum of the delays of the time lag detection unit 14 and the maximum level detection unit 15.
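The delay amount can therefore be sized directly from the two processing delays. The millisecond figures and the sample rate below are assumptions for illustration; the patent specifies only the rule, not numbers.

# Illustrative sizing only; the 30 ms / 50 ms figures and 8 kHz rate are assumptions.
time_lag_detector_delay_ms = 30      # processing time of the time lag detection unit 14
max_level_detector_delay_ms = 50     # processing time of the maximum level detection unit 15
sample_rate_hz = 8000

delay_ms = max(time_lag_detector_delay_ms, max_level_detector_delay_ms)
delay_samples = delay_ms * sample_rate_hz // 1000
print(delay_ms, delay_samples)       # 50 ms -> 400 samples per delay element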
[0020]
As described above, according to the present invention, the selected microphone is kept fixed while a silent interval is being detected; when one of the speakers starts speaking first, that speaker is selected; when the utterance timings of multiple speakers are close, the speaker whose utterance started earliest is selected according to the time lags of the speech signals; and when even this timing difference lies within the error range, the speaker with the maximum level is selected. In this manner, according to the present invention, the manual operation of microphones, the interference of audio signals, and the manual mixing operations that were conventionally required can be automated and improved.