Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2007235864
PROBLEM: To protect the privacy of a talking party by effectively fusing a masking sound, output over the talking party's speech, with that speech. SOLUTION: A sound processing device 10 localizes the sound image of the masking sound at a sound source position 21, the position where the talking party is conversing, so that a third party other than the talking party perceives the masking sound, which is output over the talking party's speech, as being emitted from the talking party's position, and outputs the masking sound with the localized sound image from each of speakers 20a to 20d. [Selected figure] Figure 1
Speech processing apparatus and speech processing method
[0001]
The present invention relates to a speech processing apparatus and speech processing method for outputting a second sound over a first sound uttered by a talking party, and more particularly to a speech processing apparatus and speech processing method capable of protecting the privacy of a talking party (including the confidentiality of the speech content) by effectively fusing a masking sound, output over the talking party's speech, with that speech.
[0002]
Conventionally, conversations involving private matters frequently take place in open spaces such as banks, hospitals, and securities firms.
Therefore, to protect the privacy of the talking party, masking apparatuses have been developed that output a masking sound over the talking party's speech (the voice of the conversation) (see, for example, Patent Document 1).
[0003]
Specifically, such a "masking sound" is an interfering sound, such as white noise or background music (BGM), that is output over the talking party's speech to obscure it; by making the content of the speech hard to hear, the privacy of the talking party is protected.
[0004]
However, simply outputting such an interfering sound from a loudspeaker cannot sufficiently protect the talking party's privacy. Specifically, because the third party hears the interfering sound as coming from the loudspeaker's position, the talking party's speech and the interfering sound can be distinguished from each other.
[0005]
Further, in recent years, sound pressure control techniques that can accurately reproduce an arbitrary sound pressure distribution in three-dimensional space have been actively studied (see, for example, Patent Document 2). In these techniques, a filter to be applied to an audio signal is determined from a preset desired signal transfer characteristic (impulse response) and the actual signal transfer characteristic, and the desired sound pressure distribution is reproduced by applying the filter to the audio signal.
[0006]
Patent Document 1: Japanese Patent Application Laid-Open No. 6-175666
Patent Document 2: Japanese Patent No. 2558445
[0007]
However, the prior art described in Patent Document 2 was not devised to solve the problem of protecting the talking party's privacy, and it gives no indication of how reproducing an arbitrary sound pressure distribution could be used to protect that privacy effectively.
[0008]
Therefore, how to prevent a third party from making out the talking party's speech through the masking sound, and thereby effectively protect the talking party's privacy, remains an important problem.
[0009]
The present invention has been made to solve the above problems of the prior art, and an object of the invention is to provide a speech processing apparatus and speech processing method capable of protecting the privacy of a talking party by effectively fusing a masking sound, output over the talking party's speech, with that speech.
[0010]
In order to solve the problems described above and achieve the object, the speech processing device according to the invention of claim 1 is a speech processing device that outputs a second sound over a first sound uttered by a talking party, and comprises: sound image localization means for localizing the sound image of the second sound so that a third party other than the talking party perceives the second sound as being output from the talking party's position; and sound output means for outputting the second sound whose sound image has been localized by the sound image localization means.
[0011]
Further, the speech processing device according to the invention of claim 2 is characterized in that, in the invention of claim 1, the sound output means further adjusts the intensity of the second sound so that, at the third party's position, it is equal to or higher than the intensity of the first sound, and outputs the second sound with the adjusted intensity.
[0012]
Further, the speech processing device according to the invention of claim 3 is characterized in that, in the invention of claim 1 or 2, the sound output means further adjusts the intensity of the second sound so that, at the talking party's position, it is lower than the intensity of the first sound, and outputs the second sound with the adjusted intensity.
[0013]
Also, the speech processing device according to the invention of claim 4 is characterized in that, in the invention of claim 1, 2 or 3, the sound output means outputs a second sound that includes a sound having no correlation with the first sound.
[0014]
Further, the speech processing method according to the invention of claim 5 is a speech processing method for outputting a second sound over a first sound uttered by a talking party, and comprises: a sound image localization step of localizing the sound image of the second sound so that a third party other than the talking party perceives the second sound as being output from the talking party's position; and a sound output step of outputting the second sound whose sound image has been localized in the sound image localization step.
[0015]
According to the inventions of claims 1 and 5, the sound image of the second sound is localized so that a third party other than the talking party perceives the second sound, output over the talking party's first sound, as coming from the talking party's position, and the second sound with the localized sound image is then output. The masking sound output over the talking party's speech is thus effectively fused with that speech, with the effect that the talking party's privacy can be protected.
[0016]
Further, according to the invention of claim 2, the intensity of the second sound is additionally adjusted to be equal to or higher than that of the first sound at the third party's position before the second sound is output, so the third party finds it harder to overhear the conversation and the talking party's privacy can be protected effectively.
[0017]
Further, according to the invention of claim 3, the intensity of the second sound is additionally adjusted to be lower than that of the first sound at the talking party's position before the second sound is output, so the second sound, which is output to protect the talking party's privacy, can be prevented from interfering with the conversation itself.
[0018]
Further, according to the invention of claim 4, a second sound including a sound having no correlation with the first sound is output, with the effect that it becomes difficult to notice that the second sound is being emitted from the loudspeakers.
[0019]
Hereinafter, preferred embodiments of a speech processing apparatus and speech processing
method according to the present invention will be described in detail with reference to the
accompanying drawings.
[0020]
First, the concept of the speech processing according to the present invention will be described. FIG. 1 is a diagram for explaining this concept. As shown in FIG. 1, in this speech processing, the speech processing device 10 generates a sound that obscures the content of the talking party's conversation (a sound that prevents the conversation from being overheard by a third party; hereinafter referred to as the masking sound) and outputs it from each of the speakers 20a to 20d (here, N = 4 speakers).
[0021]
In doing so, the speech processing device 10 does not merely output the masking sound: it localizes the sound image of the masking sound at the talking party's position (sound source position 21 in FIG. 1) so that, to a third party, the masking sound emitted from the speakers 20a to 20d seems to be emitted from the talking party's position. A sound input unit such as a microphone is installed at the sound source position 21 and, as described later, the masking sound is generated based on the conversational speech captured by this sound input unit.
[0022]
Further, to make it even harder to notice that the masking sound is being emitted from the speakers 20a to 20d, the speech processing device 10 outputs from them a masking sound that also includes an audio signal, such as BGM, having no correlation with the conversational speech.
[0023]
Furthermore, the speech processing device 10 adjusts the sound pressure level of the masking sound output from each of the speakers 20a to 20d so that, at the position where the third party is present, it is equal to or higher than the sound pressure level of the talking party's speech.
[0024]
The speech processing device 10 also adjusts the sound pressure level of the masking sound emitted from the speakers 20a to 20d so that, at the position where the talking party is present, it is lower than the sound pressure level of the talking party's speech.
[0025]
FIG. 2 is a diagram for explaining the sound pressure level adjustment processing according to the present invention. FIG. 2 shows the sound pressure level distributions when the speakers 30a to 30e that output the masking sound are installed at various positions.
[0026]
As shown in FIG. 2, the sound pressure level of the conversational speech gradually decreases with distance from the position of the talking party who uttered it. Likewise, the sound pressure level of the masking sound output from each of the speakers 30a to 30e gradually decreases with distance from that speaker's position.
[0027]
In this sound pressure level adjustment process, the sound pressure level of the masking sound is adjusted so that, at the third party's position, it is equal to or higher than that of the talking party's speech. FIG. 2 shows the case where the two levels coincide at that position.
[0028]
Furthermore, in this process the sound pressure level of the masking sound is adjusted so that, at the talking party's position, it is lower than that of the talking party's speech. Specifically, a sound pressure level difference at which the masking sound does not interfere with the conversation is set in advance, and the masking sound is adjusted to be lower than the conversational speech by at least this difference.
[0029]
The sound image localization process and the sound pressure level adjustment process described with reference to FIGS. 1 and 2 are executed each time the arrangement of the speakers changes. FIG. 3 shows two example arrangements of the speakers.
[0030]
In arrangement example 1, two speakers 40a and 40b are placed so as to output the masking sound toward the talking party. In arrangement example 2, two speakers 42a and 42b are placed so as to output the masking sound toward the position where the third party is present.
[0031]
In either case, sound image localization processing is performed to localize the sound image of the masking sound at the talking party's position, indicated as sound source positions 41 and 43. Sound pressure level adjustment processing is also performed so that, at the third party's position, the masking sound is equal to or higher in level than the talking party's speech, while at the talking party's position it is lower than the talking party's speech.
[0032]
By performing this sound image localization and sound pressure level adjustment, the third party perceives the masking sound emitted from each speaker as coming from the talking party's position, so the talking party's speech and the masking sound can be fused effectively.
[0033]
Also, because the masking sound is equal to or higher in level than the conversational speech at the third party's position, the content of the conversation can be obscured, and because it is lower than the conversational speech at the talking party's position, the masking sound can be prevented from interfering with the conversation.
[0034]
Next, the functional configuration of the speech processing apparatus according to the present embodiment will be described. FIG. 4 is a diagram showing the functional configuration of the speech processing apparatus 10 according to the present embodiment. As shown in FIG. 4, the speech processing apparatus 10 has an input unit 11, a display unit 12, a speech input reception unit 13, a spectrum envelope database 14, a control unit 15, a speech generation unit 16, sound image localization units 17a to 17d, and sound output units 18a to 18d.
[0035]
The input unit 11 is an input device, such as a keyboard or mouse, used to enter various information. The display unit 12 is a display device, such as a monitor, that presents various information. The speech input reception unit 13 receives audio signals from microphones, performs A/D conversion and amplification, and outputs the result to the control unit 15.
[0036]
Specifically, the speech input reception unit 13 receives the conversational speech signal from a microphone (the talking-party-side microphone) installed at the talking party's position to capture the talking party's speech, from which the masking sound is generated.
[0037]
Further, the speech input reception unit 13 receives the masking sound signal from a microphone (the third-party-side microphone) installed at the third party's position to capture the masking sound emitted from the speakers 20a to 20d, which is used to set the sound pressure level of that masking sound.
[0038]
The spectrum envelope database 14 stores a plurality of spectral envelopes of representative human speech signals. These representative speech signals are extracted from the speech of various people using statistical methods such as clustering.
[0039]
The control unit 15 has a control program such as an operating system (OS), programs defining the procedures of the various processes, and a memory for storing various data, and executes those processes. The control unit 15 includes a masking sound spectrum generation unit 15a, a filter design processing unit 15b, and a sound pressure level setting unit 15c.
[0040]
The masking sound spectrum generation unit 15a generates the spectrum of the masking sound based on the talking party's conversational speech. FIG. 5 is a diagram for explaining the masking sound spectrum generation process performed by the unit 15a.
[0041]
As shown in FIG. 5, the masking sound spectrum generation unit 15a acquires the talking party's speech signal from the speech input reception unit 13 and performs a spectrum analysis of the signal at predetermined time intervals to extract features of the conversational speech. FIG. 5 shows an example of a spectrogram 51 obtained by applying this spectrum analysis to speech waveforms 50 of the vowels "A", "I", "U", "E" and "O".
[0042]
Then, from a short-time spectrum 52 obtained from the spectrogram 51, the masking sound spectrum generation unit 15a extracts a spectral envelope 53 representing phonological information and a spectral fine structure 54 representing sound source (excitation) information.
[0043]
Specifically, the masking sound spectrum generation unit 15a applies a predetermined window function, such as a Hanning or Hamming window, to the audio signal and performs a short-time spectrum analysis using the fast Fourier transform (FFT).
[0044]
Subsequently, the masking sound spectrum generation unit 15a takes the absolute value of the FFT result and then the logarithm of that absolute value. It then applies the inverse fast Fourier transform (IFFT) to the logarithmic values to calculate the cepstrum coefficients.
[0045]
Thereafter, the masking sound spectrum generation unit 15a lifters the calculated cepstrum coefficients with a cepstrum window to separate the high-quefrency part from the low-quefrency part.
[0046]
Then, the masking sound spectrum generation unit 15a extracts the spectral envelope 53 by applying the fast Fourier transform to the low-quefrency part, and likewise extracts the spectral fine structure 54 by applying the fast Fourier transform to the high-quefrency part.
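The FFT, logarithm, IFFT, and liftering steps above can be sketched in Python as follows; this is a minimal illustration, and the function name, frame length, and lifter cutoff of 30 quefrency bins are assumptions, not values given in the patent:

```python
import numpy as np

def cepstral_split(frame, lifter_cutoff=30):
    """Split one speech frame into a log-spectral envelope (low quefrency)
    and fine structure (high quefrency) by cepstral liftering."""
    n = len(frame)
    spectrum = np.fft.rfft(frame * np.hanning(n))   # windowed short-time FFT
    log_mag = np.log(np.abs(spectrum) + 1e-12)      # log magnitude spectrum
    cepstrum = np.fft.irfft(log_mag)                # real cepstrum (IFFT of log)
    low = np.zeros_like(cepstrum)                   # cepstrum window (lifter)
    low[:lifter_cutoff] = cepstrum[:lifter_cutoff]
    low[-lifter_cutoff + 1:] = cepstrum[-lifter_cutoff + 1:]
    high = cepstrum - low                           # remaining high quefrency
    envelope = np.fft.rfft(low).real                # FFT of low-quefrency part
    fine_structure = np.fft.rfft(high).real         # FFT of high-quefrency part
    return envelope, fine_structure
```

By construction the two parts add back up to the log magnitude spectrum of the windowed frame.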
[0047]
Thereafter, the masking sound spectrum generation unit 15a calculates the spectral distance between the extracted spectral envelope and each spectral envelope registered in the spectral envelope database 14, and selects the registered spectral envelope 55 with the largest distance as a substitute for the extracted spectral envelope 53.
[0048]
Here, the Euclidean distance between vectors consisting of the low-quefrency components is used as the spectral distance. The spectral distance is not limited to this; various previously proposed spectral distances are available, such as distances between FFT spectra or between spectral envelopes obtained by linear predictive coding (LPC) analysis.
[0049]
Then, the masking sound spectrum generation unit 15a combines the selected spectral envelope 55 with the extracted spectral fine structure 54 to generate the masking sound spectrum 56, and outputs it to the speech generation unit 16.
[0050]
Further, the masking sound spectrum generation unit 15a reselects the spectral envelope 55 used for generating the masking sound according to changes over time in the spectrogram 51 of the original speech. Specifically, it stores the spectra of the speech signals previously received from the speech input reception unit 13.
[0051]
Then, the masking sound spectrum generation unit 15a calculates the spectral distance between the spectrum of a newly received speech signal and that of a previously received one, and selects a new spectral envelope when this distance becomes equal to or greater than a predetermined value.
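The reselection trigger amounts to a thresholded spectral distance; a minimal sketch, where the default threshold and all names are assumptions:

```python
import numpy as np

def needs_reselect(new_spectrum, stored_spectrum, threshold=1.0):
    """True when the spectral distance between the newly received and the
    previously stored spectrum reaches the preset value, triggering
    reselection of the replacement envelope."""
    return float(np.linalg.norm(new_spectrum - stored_spectrum)) >= threshold
```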
[0052]
Thus, the masking sound spectrum generation unit 15a generates the masking sound spectrum 56 from the spectral fine structure 54, which retains the talking party's sound source information, and the spectral envelope 55 of a representative human speech signal. Because the masking sound retains the talking party's sound source information, it fuses easily with the talking party's speech, making the content of the conversation hard to make out.
[0053]
In addition, because the masking sound spectrum generation unit 15a generates the masking sound using the spectral envelope 55 of a representative human speech signal, the masking sound does not turn into an unnatural sound, and third parties who hear it are spared an unpleasant impression.
[0054]
Here, the masking sound spectrum generation unit 15a selects the spectral envelope from among those registered in the spectral envelope database 14 based on spectral distance, but the envelope may instead be selected at random from the database, or in some other manner.
[0055]
The filter design processing unit 15b designs the FIR (Finite Impulse Response) filters to be set in the sound image localization units 17a to 17d and sets one in each of them.
[0056]
FIG. 6 is a diagram for explaining the design of the FIR filters. When the sound output units 18a to 18d emit pulse sounds through the speakers 20a to 20d, the filter design processing unit 15b receives, via the speech input reception unit 13, information on the impulse responses detected by the third-party-side microphone installed at the third party's position.
[0057]
Then, the filter design processing unit 15b calculates the transfer characteristics G1(z), G2(z), G3(z), G4(z) of the speakers 20a to 20d from the impulse responses received from the speech input reception unit 13. Here, each transfer characteristic Gi(z) is the z-transform of the discrete sequence obtained by sampling the impulse response of the pulse sound output from the corresponding speaker.
[0058]
Subsequently, the filter design processing unit 15b calculates the inverse characteristics H1(z) = G1(z)^(-1), H2(z) = G2(z)^(-1), H3(z) = G3(z)^(-1), H4(z) = G4(z)^(-1) of the transfer characteristics G1(z) to G4(z), and based on these inverse characteristics designs the FIR filters that cancel G1(z) to G4(z); that is, it sets the tap coefficients of the FIR filters.
[0059]
Thus, the correction filters H1(z) to H4(z) cancel the transfer characteristics G1(z) to G4(z), so when a desired signal is fed to the speakers 20a to 20d through these FIR filters, the signal is received unaltered at the third party's position.
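The patent does not give a formula for inverting Gi(z). A common sketch is frequency-domain division with a small regularizer, shown here under that assumption; the function name and parameters are illustrative, and a practical system would also need modeling delays and a robust inversion method:

```python
import numpy as np

def design_inverse_fir(impulse_response, n_taps=256, eps=1e-3):
    """Approximate H(z) = G(z)^(-1) as an FIR filter: divide by G in the
    frequency domain with a small regularizer eps, then take the inverse
    transform as the tap coefficients."""
    G = np.fft.fft(impulse_response, n_taps)   # speaker transfer characteristic
    H = np.conj(G) / (np.abs(G) ** 2 + eps)    # regularized 1 / G
    return np.fft.ifft(H).real                 # FIR tap coefficients
```

Convolving the speaker's impulse response with the resulting taps should approximate a unit impulse, i.e. the transfer characteristic is cancelled.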
[0060]
That is, the signal detected by the third-party-side microphone while a signal is output from a speaker installed at the talking party's position (the position at which the masking sound image is to be localized; sound source position 21 in FIG. 1) is used as the input signal. When this input signal is output from the speakers 20a to 20d through the FIR filters, the sounds emitted from the speakers 20a to 20d appear to the third party as if they were output from the talking party's position.
[0061]
Returning to FIG. 4, the sound pressure level setting unit 15c sets the sound pressure level of the masking sound output from each of the speakers 20a to 20d and outputs the set values to the sound output units 18a to 18d.
[0062]
Specifically, the sound pressure level setting unit 15c sets the sound pressure level of the masking sound so that it is lower than that of the conversational speech at the talking party's position and equal to or higher than that of the conversational speech at the third party's position.
[0063]
To set these sound pressure levels, the sound pressure level setting unit 15c first calculates the ratio of the sound pressure levels of the masking sound output from the speakers 20a to 20d.
[0064]
Specifically, let T be the sound pressure level of the conversational speech at the talking party's position (a fixed value obtained by experiment or the like), S the sound pressure level of the conversational speech at the third party's position (likewise a fixed value), A the difference between the sound pressure levels of the masking sound and the conversational speech at the third party's position (an input value), and B the corresponding difference at the talking party's position (an input value). For each speaker, the unit detects the sound pressure levels that satisfy

LSi = S + A ... (1)
LTi = T - B ... (2)

Here, i takes the values 1 to 4, corresponding to the speakers 20a to 20d respectively.
[0065]
These sound pressure levels are detected by the sound pressure level setting unit 15c controlling the sound output units 18a to 18d and emitting masking sounds of various sound pressure levels from the speakers 20a to 20d.
[0066]
The sound pressure level setting unit 15c then calculates the ratio L1 : L2 : L3 : L4 of the sound pressure levels of the speakers 20a to 20d from the values Li obtained by

Li = min(LSi, LTi) ... (3)

where min(LSi, LTi) means selecting the smaller of LSi and LTi.
[0067]
Thereafter, the sound pressure level setting unit 15c determines the sound pressure level of the masking sound to be output from each of the speakers 20a to 20d. Specifically, when the levels output by the speakers 20a, 20b, 20c, 20d are SP1, SP2, SP3, SP4, the unit determines SP1 to SP4 so as to satisfy

S + A = SP1 + SP2 + SP3 + SP4 ... (4)
T - B = SP1 + SP2 + SP3 + SP4 ... (5)

and further

SP1 : SP2 : SP3 : SP4 = L1 : L2 : L3 : L4 ... (6)
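The final step of equations (3) to (6) can be combined into a small sketch: the required combined level is shared across the speakers in the ratio L1 : L2 : L3 : L4. All names are illustrative, and note that real sound pressure levels are logarithmic, so a practical implementation would combine them in the power domain rather than by simple addition:

```python
def set_masking_levels(L, total):
    """Split the required combined level 'total' (the common sum in
    eqs. (4)/(5)) across the speakers in the ratio L1 : L2 : L3 : L4 of
    eq. (6), where L holds the per-speaker values Li = min(LSi, LTi)
    of eq. (3)."""
    s = sum(L)
    return [total * li / s for li in L]  # SP1 .. SP4
```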
[0068]
The speech generation unit 16 generates the audio signal of the masking sound from the masking sound spectrum produced by the masking sound spectrum generation unit 15a.
[0069]
The sound image localization units 17a to 17d receive the information on the FIR filters designed by the filter design processing unit 15b and apply those filters to the masking sound signal generated by the speech generation unit 16, thereby localizing the sound image of the masking sound so that a third party hears it as coming from the talking party's position.
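The per-speaker filtering described here reduces to convolving the masking signal with each speaker's correction filter; a minimal sketch with illustrative names:

```python
import numpy as np

def localize(masking_signal, fir_taps_per_speaker):
    """Produce one feed per speaker by convolving the masking signal with
    that speaker's correction FIR filter, so the combined output appears
    to radiate from the talking party's position."""
    return [np.convolve(masking_signal, taps) for taps in fir_taps_per_speaker]
```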
[0070]
The sound output units 18a to 18d perform D/A conversion and amplification of the audio signals output from the sound image localization units 17a to 17d and feed them to the speakers 20a to 20d.
[0071]
In addition, to make it harder to notice that the masking sound is being emitted from the speakers 20a to 20d, the sound output units 18a to 18d output, as part of the masking sound, audio signals such as BGM that have no correlation with the conversational speech, processed by the sound image localization units 17a to 17d.
[0072]
Furthermore, when performing amplification, the sound output units 18a to 18d amplify the masking sound to the sound pressure level that the sound pressure level setting unit 15c has set for each of them.
[0073]
Next, the procedure for setting the FIR filters will be described. FIG. 7 is a flowchart showing this setting process. It is assumed that the position of the third party and the positions of the speakers 20a to 20d have been determined before the process starts.
[0074]
In the FIR filter setting process, the filter design processing unit 15b of the speech processing device 10 first acquires, via the speech input reception unit 13, the measured impulse responses of the pulse signals output from the speakers 20a to 20d, as detected by the microphone placed on the third party's side (step S101).
[0075]
Then, the filter design processing unit 15b calculates the transfer characteristics G1(z) to G4(z) from the measured impulse responses (step S102) and designs the FIR filters based on their inverse characteristics H1(z) to H4(z) (step S103).
[0076]
After that, the filter design processing unit 15b outputs the information on the designed FIR filters to the sound image localization units 17a to 17d and sets a filter in each of them (step S104), and the setting process ends.
[0077]
Next, the procedure for setting the sound pressure level of the masking sound output from each of the speakers 20a to 20d will be described. FIG. 8 is a flowchart showing this setting process. It is assumed that the position of the third party and the position of each speaker have been determined before the process starts.
[0078]
In this sound pressure level setting process, the sound pressure level setting unit 15c first receives, through the input unit 11, the sound pressure levels of the conversational speech at the third party's position and at the talking party's position (S in equation (1) and T in equation (2)) (step S201).
[0079]
Then, the sound pressure level setting unit 15c receives, through the input unit 11, the allowable difference between the sound pressure levels of the masking sound and the conversational speech at the third party's position (A in equation (1)) (step S202).
[0080]
Further, the sound pressure level setting unit 15c receives, through the input unit 11, the allowable difference between the sound pressure levels of the masking sound and the conversation voice at the conversation party's position (B in Eq. (2)) (step S203).
[0081]
Subsequently, the sound pressure level setting unit 15c calculates the ratios of the sound pressure levels of the masking sound output from the speakers 20a to 20d using Eqs. (1), (2) and (3) (step S204).
[0082]
Then, using Eqs. (4), (5) and (6), the sound pressure level setting unit 15c calculates the sound pressure levels SP1 to SP4 of the masking sound output by the speakers 20a to 20d from the calculated ratios and the input sound pressure level differences (step S205).
[0083]
Thereafter, the sound pressure level setting unit 15c outputs the calculated sound pressure levels SP1 to SP4 to the audio output units 18a to 18d and sets them as the sound pressure levels of the masking sound output by the speakers 20a to 20d (step S206), ending the sound pressure level setting process.
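Since Eqs. (1) through (6) are not reproduced in this excerpt, the level split of steps S204 and S205 can only be sketched under stated assumptions: here S and A are the quantities described above, the per-speaker ratio is taken as given power fractions, and the speakers are treated as incoherent sources. The function name and the example numbers are hypothetical.

```python
import math

def speaker_levels(S, A, ratios):
    """Split a target masking level across several speakers.

    S: conversation-voice level at the third party's position [dB SPL]
       (S in Eq. (1) of the text).
    A: required margin of the masking sound over the voice there [dB]
       (A in Eq. (1)).
    ratios: per-speaker power fractions summing to 1, standing in for the
            ratios computed from Eqs. (1) to (3), which are not shown here.

    Returns per-speaker levels SP1..SPn whose incoherent power sum is S + A.
    """
    target = S + A                                    # combined masking level to reach
    return [target + 10 * math.log10(r) for r in ratios]

# Example: voice at 60 dB at the third party, 3 dB masking margin,
# power split 40/30/20/10 % across four speakers.
sp = speaker_levels(60.0, 3.0, [0.4, 0.3, 0.2, 0.1])
```

Summing the per-speaker powers back together recovers the 63 dB target, which is the consistency check such a split must satisfy.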
[0084]
As described above, according to the present embodiment, the sound image localization units 17a to 17d of the voice processing device 10 localize the sound image of the masking sound so that a third party other than the conversation party perceives the masking sound as being output from the conversation party's position, and the audio output units 18a to 18d output the masking sound with the sound image so localized. The masking sound output over the conversation party's voice is therefore effectively fused with that voice, protecting the conversation party's privacy.
[0085]
Further, according to the present embodiment, the audio output units 18a to 18d adjust the sound pressure level of the masking sound so that, at the third party's position, it is equal to or higher than the sound pressure level of the conversation voice, and output the masking sound so adjusted. This makes it difficult for the third party to overhear the conversation, effectively protecting the conversation party's privacy.
[0086]
Further, according to the present embodiment, the audio output units 18a to 18d additionally adjust the sound pressure level of the masking sound so that, at the conversation party's position, it is lower than the sound pressure level of the conversation voice, and output the masking sound so adjusted. The masking sound output to protect the conversation party's privacy is thus prevented from interfering with the conversation itself.
[0087]
Further, according to the present embodiment, since the voice output units 18a to 18d output a masking sound that includes audio, such as BGM, having no correlation with the conversation voice, it can be made difficult to notice that a masking sound is being output from the speakers 20a to 20d.
[0088]
Although embodiments of the present invention have been described above, the present invention may be practiced in various other embodiments within the scope of the technical idea described in the claims.
[0089]
For example, in the above embodiment, as described with reference to FIG. 5, the spectrum of the masking sound is generated using the spectral envelope of a typical human speech signal. Other methods are also possible: the masking-sound spectrum may be generated by deforming the envelope, for example by inverting its peaks and valleys or by shifting it.
[0090]
FIG. 9 is a diagram explaining another method of generating the masking-sound spectrum, and FIG. 10 is a diagram explaining a method of correcting it.
In the method shown in FIG. 9, the spectral envelope and the spectral fine structure are first extracted from the speech signal of the conversation party.
[0091]
Then, the extracted spectral envelope is deformed, either by inverting it about a predetermined inversion axis (in the example of FIG. 9, a cosine function approximating the extracted envelope) or by shifting it. Combining the deformed spectral envelope with the spectral fine structure yields the spectrum of the masking sound.
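One way this envelope inversion could be realized is with a cepstral envelope. This is a hedged sketch, not the patent's method: the cepstral liftering, the use of the envelope's mean level as the inversion axis (the text fits a cosine instead), and all names are assumptions introduced for illustration.

```python
import numpy as np

def masking_spectrum(frame, n_ceps=30):
    """Generate a masking-sound magnitude spectrum by inverting the
    spectral envelope while keeping the spectral fine structure.

    The envelope comes from the low-order cepstrum; it is flipped about
    its own mean level, a stand-in for the fitted-cosine axis of FIG. 9.
    """
    log_spec = np.log(np.abs(np.fft.rfft(frame)) + 1e-12)
    ceps = np.fft.irfft(log_spec)
    ceps_env = ceps.copy()
    ceps_env[n_ceps:-n_ceps] = 0.0             # keep low quefrencies: the envelope
    envelope = np.fft.rfft(ceps_env).real[:len(log_spec)]
    fine = log_spec - envelope                 # spectral fine structure
    axis = envelope.mean()                     # assumed inversion axis
    inverted = 2 * axis - envelope             # peaks become valleys and vice versa
    return np.exp(inverted + fine)             # recombine and leave log domain
```

Because the fine structure is untouched, the harmonic comb of the voice survives while the formant peaks are turned into dips, which is the qualitative effect the deformation aims at.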
[0092]
However, with this method the masking sound may become shrill and cause listener discomfort. Processing is therefore performed to correct the spectral intensity in the frequency bands of the generated masking-sound spectrum that cause this shrillness.
[0093]
The correction amount of the spectral intensity is calculated by comparing the spectral features of the speech signals of various conversation parties with the spectra of the masking sounds generated from those speech signals.
[0094]
Specifically, as shown in FIG. 10, spectra of the speech signals of various conversation parties are collected, and their average (the spectral average of the original sound) is calculated.
Likewise, the spectra of the masking sounds generated from those speech signals are collected, and their average (the spectral average of the masking sound) is calculated.
[0095]
Then, the spectral increase obtained by subtracting the spectral average of the original sound from the spectral average of the masking sound (the spectral increase of the masking sound) is calculated, and the frequency bands in which this increase is positive are detected.
[0096]
Since these frequency bands cause the shrillness, the spectral increase of the masking sound in each such band is used as the correction amount of the masking sound's spectral intensity, and processing is performed to suppress the spectral intensity of the masking sound by that amount.
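The averaging-and-subtraction procedure above can be sketched as follows. Averaging linear power spectra before converting to dB is one plausible reading of the text, and the function name is hypothetical.

```python
import numpy as np

def correction_amount(orig_spectra, mask_spectra):
    """Correction of the masking-sound spectral intensity (cf. FIG. 10).

    orig_spectra: array of linear power spectra of original voices,
                  one spectrum per row.
    mask_spectra: array of linear power spectra of the masking sounds
                  generated from those voices.

    Returns, per frequency bin, the suppression in dB to apply in the
    bands where the masking sound rose above the original voice.
    """
    orig_avg = 10 * np.log10(np.mean(orig_spectra, axis=0))   # original-sound average [dB]
    mask_avg = 10 * np.log10(np.mean(mask_spectra, axis=0))   # masking-sound average [dB]
    increase = mask_avg - orig_avg                            # spectral increase of the masker
    return np.where(increase > 0, increase, 0.0)              # suppress only positive bands
```

Subtracting the returned amount from the masking-sound spectrum (in dB) pulls the offending bands back down to the original voice's average, leaving the other bands untouched.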
[0097]
Of the processes described in the above embodiments, all or part of those described as being performed automatically may be performed manually, and all or part of those described as being performed manually may be performed automatically by known methods.
In addition, the processing procedures, control procedures, specific names, and information including the various data and parameters shown in the above description and drawings may be changed arbitrarily unless otherwise specified.
[0098]
Further, each component of each illustrated device is functionally conceptual and need not be physically configured as illustrated.
That is, the specific form of distribution and integration of each device is not limited to the illustrated one; all or part may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.
Furthermore, all or any part of the processing functions performed by each device may be realized by a CPU and a program analyzed and executed by the CPU, or as wired-logic hardware.
[0099]
The voice processing method described in the above embodiment can be realized by executing a prepared program on a computer such as a personal computer or workstation.
This program can be distributed via a network such as the Internet.
The program may also be recorded on a computer-readable recording medium such as a hard disk, flexible disk (FD), CD-ROM, MO, or DVD, and executed by being read from the medium by a computer.
[0100]
As described above, the voice processing apparatus and voice processing method according to the present invention are useful for voice processing systems that need to protect a conversation party's privacy by effectively fusing the masking sound output over the conversation voice with that voice.
[0101]
A diagram explaining the concept of the voice processing according to the present invention.
A diagram explaining the sound pressure level adjustment processing according to the present invention.
A diagram explaining an example arrangement of the speakers.
A diagram showing the functional configuration of the voice processing device 10 according to the present embodiment.
A diagram explaining the generation process of the masking-sound spectrum by the masking-sound spectrum generation unit 15a.
A diagram explaining the design of the FIR filters.
A flowchart showing the procedure of the setting process that sets the FIR filters.
A flowchart showing the procedure of the setting process that sets the sound pressure level of the masking sound output from each of the speakers 20a to 20d.
A diagram explaining another method of generating the spectrum of the masking sound.
A diagram explaining the method of correcting the spectrum of the masking sound.
Explanation of Reference Numerals
[0102]
DESCRIPTION OF SYMBOLS: 10 voice processing device; 11 input unit; 12 display unit; 13 voice input reception unit; 14 spectral envelope database; 15 control unit; 15a masking-sound spectrum generation unit; 15b filter design processing unit; 15c sound pressure level setting unit; 16 voice generation unit; 17a to 17d sound image localization units; 18a to 18d voice output units; 20a to 20d, 30a to 30e, 40a, 40b, 42a, 42b speakers; 21, 41, 43 sound source positions; 50 voice waveform; 51 spectrogram; 52 short-time spectrum; 53 spectral envelope; 54 spectral fine structure; 55 deformed spectral envelope; 56 masking-sound spectrum