Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2017034519
Abstract: Unlike directivity, which is sensitivity to the direction from which speech arrives, the invention selectively emphasizes speech generated within a certain range in space, or selectively suppresses speech generated outside that range, thereby making it possible to selectively record audio generated within the predetermined range. A voice processing apparatus comprising: two or more voice input units that convert voice into an electrical signal; and a voice synthesis unit that, by weighting and combining the electrical signals output from the two or more voice input units, generates a voice signal in which voice within a specific area of the space in which the voice input units are installed is emphasized or suppressed. [Selected figure] Figure 1
Speech processing apparatus, speech processing system and speech processing method
[0001]
The present invention relates to an audio processing device, an audio processing system, and an
audio processing method.
[0002]
Conventionally, various devices have been devised for controlling the directivity of microphones; Patent Documents 1 to 3 listed below are examples.
[0003]
In Patent Document 1, at least three unidirectional microphone elements are disposed at substantially equal intervals in a direction orthogonal to the main axis of directivity, and the output signals from the microphone elements are added by an adder.
The sound collecting surfaces of the microphone elements all face in the same direction.
In a microphone device in which the microphone elements are arranged in this way, sharp directivity is realized in the mid-range required for the input means of a voice recognition device, high sensitivity to voice input from the front is obtained, and it becomes possible to collect sound with very few noise components input from the sides.
[0004]
Patent Document 2 discloses a microphone device comprising a reference microphone, a first pair of microphones disposed with the reference microphone at their center, a second pair of microphones disposed orthogonally to the first pair with the reference microphone at their center, and a third pair of microphones disposed at an angle of 45 degrees with respect to the first and second pairs with the reference microphone at their center, all provided on a plane, the device variably controlling the main axis of directivity and/or the sharpness of the directivity on the basis of the reference microphone and the first, second and third pairs of microphones. As a result, it becomes possible to separate left and right sound sources centered on the microphone and to perform voice recording or voice recognition in real time.
[0005]
U.S. Pat. No. 5,075,019 describes an inclined directional microphone system constituted by three or fewer microphones, each having substantially the same order and frequency response and each generating an electrical signal responsive to the sound pressure at that microphone, and a processor coupled to receive the electrical signal from each microphone and operative to generate the output signal of an inclined directional microphone system whose order is at least two orders higher than that of each individual microphone. This inclined directional microphone system can substantially reduce system size and complexity compared with the prior art.
[0006]
Japanese Patent Application Publication No. 08-25018
Japanese Patent Application Publication No. 2002-271885
Japanese Patent Application Publication No. 08-505514
[0007]
In conventional microphone technology for recording a target voice, the target voice is recorded with little noise by forming a superdirective microphone with a delay-and-sum array, or by forming a blind spot with an adaptive microphone array.
However, sharpening the directivity requires a large number of microphones, which raises the problem of high cost.
[0008]
Moreover, the directivity of conventional microphones is discussed only in terms of direction, and the distance between the microphone and the sound source is not considered. That is, however the directivity is adjusted, noise arriving from the direction of the directivity, from behind a target sound located near the microphone, is also recorded.
[0009]
An object of the present invention is to realize an audio processing device, an audio processing system, and an audio processing method capable of selectively recording audio generated within a predetermined range by selectively emphasizing voice generated within a certain range in space, or by selectively suppressing voice generated outside that range, unlike directivity, which is sensitivity to the direction from which voice arrives.
[0010]
One aspect of the present invention is a voice processing apparatus comprising: two or more voice input units that convert voice into an electrical signal; and a voice synthesis unit that, by weighting and combining the electrical signals output from the two or more voice input units, generates a voice signal in which voice coming from a sound source in a specific area, formed as a closed space within the space in which the voice input units are installed, is emphasized.
[0011]
In the voice processing apparatus configured in this manner, merely by combining the electrical signals generated by the two or more voice input units converting the input voices, it is possible to generate an audio signal that selectively emphasizes voice coming from a sound source within a certain range in space, the specific area, which is a concept different from directivity.
As a result, the voice generated within the specific area can be made easy to hear, and the voice generated outside the specific area can be made hard to hear.
[0012]
Another aspect of the present invention is a voice processing apparatus comprising: two or more voice input units that convert voice into an electrical signal; and a voice synthesis unit that, by weighting and combining the electrical signals output from the two or more voice input units, generates a voice signal in which voice coming from a sound source outside a specific area, formed as a closed space within the space in which the voice input units are installed, is suppressed.
[0013]
In the voice processing apparatus configured in this manner, merely by combining the electrical signals generated by the two or more voice input units converting the input voices, it is possible to generate an audio signal in which voice coming from a sound source outside a certain range in space, the specific area, is selectively suppressed, again unlike directivity.
This makes it possible to make the voice generated outside the specific area hard to hear and the voice generated within the specific area easy to hear.
[0014]
An optional aspect of the present invention is a voice processing apparatus in which the voice synthesis unit performs the weighted synthesis by digitally processing digital voice signals obtained by analog/digital conversion of the inputs of two or more microphones at a predetermined sampling frequency, and in which the spacing between the two or more microphones is equal to or less than half the wavelength corresponding to the sampling frequency.
[0015]
In the audio processing apparatus configured as described above, when digital processing is performed on the digital audio signals obtained by analog/digital converting the two or more microphone inputs at the predetermined sampling frequency, the difference in the time at which voice from the same sound source reaches each microphone is sufficiently short compared with the wavelength of the input voice, so the phases of the input voice at the two microphones can be regarded as substantially the same and the influence of the phase difference of the input voice can be ignored.
[0016]
Another optional aspect of the present invention is a voice processing apparatus further comprising position information input means for inputting position information, in which the voice synthesis unit weights and combines the electrical signals output from the two or more voice input units so that the specific area includes the position corresponding to the input position information.
[0017]
In the voice processing apparatus configured as described above, the specific area is formed so as to include the position input from the position information input means, so that voice coming from a sound source at that position can be made easy to hear or, conversely, hard to hear.
The position information input by the position information input means is, for example, information indicating the position of a sound source to be recorded, such as a speaker, or information indicating the position of a sound source not to be recorded, such as a noise source.
[0018]
According to another aspect of the present invention, there is provided a configuration comprising two or more voice input units that convert voice into an electrical signal, and a voice synthesis unit that, by weighting and combining the electrical signals output from the two or more voice input units, generates a voice signal in which voice coming from a sound source in a specific area of the space in which the voice input units are installed is emphasized.
[0019]
According to another aspect of the present invention, there is provided a configuration comprising two or more voice input units that convert voice into an electrical signal, and a voice synthesis unit that, by weighting and combining the electrical signals output from the two or more voice input units, generates a voice signal in which voice coming from a sound source outside a specific area of the space in which the voice input units are installed is suppressed.
[0020]
Another aspect of the present invention is a voice processing method performed using two or more voice input units that convert voice into an electrical signal, in which a voice signal in which voice coming from a sound source in a specific area of the space in which the voice input units are installed is emphasized is generated by weighting and combining the electrical signals output from the two or more voice input units.
[0021]
Another aspect of the present invention is a voice processing method performed using two or more voice input units that convert voice into an electrical signal, in which a voice signal in which voice coming from a sound source outside a specific area of the space in which the voice input units are installed is suppressed is generated by weighting and combining the electrical signals output from the two or more voice input units.
[0022]
The above-described voice processing apparatus and voice processing system include various aspects, such as being implemented in a state incorporated into another device or being implemented together with another method.
Likewise, the above-described voice processing method includes various aspects, such as being implemented as part of another method.
The present invention can also be realized as a program that causes a computer to realize functions corresponding to the configuration of the above-described audio processing method, or as a computer-readable recording medium on which the program is recorded.
[0023]
According to the inventions set forth in claims 1, 5 and 7, merely by weighting and combining the electrical signals generated by two or more voice input units converting incoming voice, an audio signal can be generated that selectively emphasizes voice coming from sound sources within a spatial range called the specific area, which is different from directivity.
As a result, the sound generated within the specific area can be made easy to hear, and the sound generated outside the specific area can be made hard to hear.
[0024]
According to the inventions set forth in claims 2, 6 and 8, merely by weighting and combining the electrical signals generated by two or more voice input units converting incoming voice, an audio signal can be generated in which voice coming from sound sources outside the specific area in space is selectively suppressed.
This makes it possible to make the voice generated outside the specific area hard to hear and the voice generated within the specific area easy to hear.
[0025]
According to the invention of claim 3, when digital processing is performed on the digital audio signals obtained by analog/digital conversion of two or more microphone inputs at a predetermined sampling frequency, the influence of the phase difference of the input audio can be ignored.
[0026]
According to the invention of claim 4, since the specific area is formed so as to include the position input from the position information input means, the voice coming from the sound source at that position can be made easy to hear or, conversely, hard to hear.
[0027]
FIG. 1 is a block diagram showing the schematic configuration of the speech processing apparatus of the first embodiment.
FIG. 2 is a diagram showing the directivity of the voice input units.
FIG. 3 is a diagram showing the specific area of a synthesized speech signal obtained by simple addition of two voice signals.
FIG. 4 is a diagram showing the specific area of a synthesized speech signal obtained by weighted addition of two voice signals.
FIG. 5 is a diagram showing the specific area of a synthesized speech signal obtained by simple subtraction of two voice signals.
FIG. 6 is a diagram showing the specific area of a synthesized speech signal obtained by weighted subtraction of two voice signals.
FIG. 7 is a block diagram showing the schematic configuration of the speech processing apparatus of the second embodiment.
FIG. 8 is a diagram showing the specific area of a synthesized speech signal obtained by weighted addition of the two voice signals on the left.
FIG. 9 is a diagram showing the specific area of a synthesized speech signal obtained by weighted addition of the two voice signals on the right.
FIG. 10 is a diagram showing the specific area of a synthesized speech signal obtained by adding three voice signals.
FIG. 11 is a diagram showing the configuration of the speech processing apparatus of the third embodiment.
[0028]
Hereinafter, the present technology will be described in the following order: (1) First Embodiment, (2) Second Embodiment, (3) Third Embodiment.
[0029]
(1) First Embodiment FIG. 1 is a block diagram showing a schematic configuration of a speech
processing apparatus 100 according to this embodiment.
[0030]
The voice processing apparatus 100 includes voice input units 11 and 12 as two or more voice input units that convert voice into an electrical signal, a voice synthesis unit 20 that generates a synthesized voice signal Smix by weighting and combining the voice signals output from the voice input units 11 and 12, a voice output unit 30 that converts the synthesized voice signal Smix output from the voice synthesis unit 20 into voice, and an operation unit 40 for operating the voice processing apparatus 100.
[0031]
The voice processing apparatus 100 may further include a display unit such as a liquid crystal display panel, and the display unit displays the voice input situation at the voice input units 11 and 12, the voice processing situation in the voice synthesis unit 20, the voice output situation of the voice output unit, the operation menu of the operation unit 40, and the like.
[0032]
The voice input units 11 and 12 convert incoming voice into electrical signals.
The voice input unit 11 outputs a voice signal S11 according to the incoming voice, and the voice
input unit 12 outputs a voice signal S12 according to the incoming voice.
In the present embodiment, the case where each voice input unit is configured by a microphone
will be described as an example.
[0033]
The voice input units 11 and 12 may each be configured by a single microphone or by a plurality of microphones.
When a plurality of microphones constitute one voice input unit, that voice input unit outputs, as its voice signal, an electrical signal obtained by combining the outputs of the plurality of microphones constituting it.
[0034]
In the present embodiment, the voice input units 11 and 12 have the same sensitivity, but the sensitivities of the voice input units may differ. In that case, the result is the same as if a weighting similar to the weighting described later had been applied to the voice signals in advance.
[0035]
FIG. 2 is a diagram showing the directivity of the voice input units 11 and 12.
In FIG. 2, and in FIGS. 3 to 5 described later, the directivity of the voice input units 11 and 12 is indicated by a two-dot chain line.
[0036]
In FIG. 2, the voice input units 11 and 12 are represented by the symbol of an ECM (electret condenser microphone) and are illustrated as having a directivity close to unidirectional. However, the type of microphone constituting the voice input units 11 and 12 is not limited to this, and the type of directivity of each voice input unit is not particularly limited either; various directivities can be adopted, such as omnidirectionality (non-directionality), bidirectionality, unidirectionality, narrow directivity, sharp directivity and superdirectivity. Although control of the formation range of the specific area by the speech synthesis described later becomes more complicated, the directivities of the voice input units do not have to be the same.
[0037]
The voice synthesis unit 20 includes an arithmetic unit 21 having arithmetic processing capability, such as a microcomputer, signal input ports 22 and 23 corresponding in number to the voice input units, voice output ports 24 corresponding in number to the voice output units, and control signal input ports 25 corresponding in number to the operation units.
[0038]
Audio signals S11 and S12 are input to the signal input ports 22 and 23 from the audio input
units 11 and 12, respectively.
The signal output port 24 outputs a synthetic speech signal Smix. A control signal Sc for
controlling arithmetic processing to be executed in the speech synthesis unit 20 is input from the
operation unit 40 to the control signal input port 25.
[0039]
When the specific area described later is fixed, rather than being changed according to operation of the operation unit 40 or according to the sound source position described later, the operation unit 40 may be omitted. In this case, the control signal input port 25 of the voice synthesis unit 20 may also be omitted.
[0040]
The voice output unit 30 converts the input synthesized voice signal Smix into voice and outputs it.
The voice output unit 30 can be configured by a so-called speaker. Note that the voice output unit 30 is not an essential component of the voice processing apparatus 100; by providing output ports such as an earphone jack or a line-out terminal, headphones, speakers or the like may be connected externally to serve as the voice output unit.
[0041]
Hereinafter, an example of the speech synthesis performed by the speech synthesis unit 20 will be described. The following description concerns speech synthesis by digital arithmetic processing, using digital speech signals generated by analog/digital conversion of the speech signals.
[0042]
Needless to say, the speech synthesis is not limited to digital arithmetic processing; the same synthesized speech signal can also be obtained by performing synthesis such as addition and subtraction in an analog circuit, using the analog speech signals input from the speech input units as they are.
[0043]
In the arithmetic processing according to the present embodiment, digital arithmetic processing that weights and combines the voice signals S11 and S12 output from the voice input units 11 and 12 is performed, thereby generating a synthesized speech signal Smix in which voice coming from a sound source in a specific area of the space in which the voice input units 11 and 12 are installed is emphasized, or a synthesized speech signal Smix in which voice coming from a sound source in the specific area is suppressed.
[0044]
When digital arithmetic processing is performed, the distance between the voice input units 11 and 12 constituting the plurality of voice input units is set equal to or less than the wavelength corresponding to the sampling frequency.
As a result, the difference in the time at which voice from the same sound source reaches each microphone becomes sufficiently short compared with the wavelength of the input voice, the phases of the input voice at the two microphones can be regarded as substantially the same, and the influence of the phase difference of the input voice can be ignored.
[0045]
The sampling frequency (in fact, the maximum frequency detectable at that sampling frequency) is determined according to the frequency of the target speech.
For example, when the target is the human voice, a frequency about twice the voice frequency (3 kHz to 4 kHz) is used as the sampling frequency (8 kHz or the like), and the distance between the voice input centers 11c and 12c of the voice input units 11 and 12 is set to be equal to or less than the wavelength corresponding to the sampling frequency (about 5 cm).
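As a quick check of these figures, the following minimal sketch computes the wavelength corresponding to the sampling frequency, and hence the maximum spacing of the voice input centers; it assumes a speed of sound of roughly 343 m/s, a value not stated in the text.

SPEED_OF_SOUND_M_S = 343.0   # assumed speed of sound in air (around 20 degrees C)

def wavelength_m(frequency_hz: float) -> float:
    """Wavelength corresponding to a frequency: lambda = c / f."""
    return SPEED_OF_SOUND_M_S / frequency_hz

voice_upper_hz = 4_000                   # upper end of the 3 kHz to 4 kHz voice band
sampling_hz = 2 * voice_upper_hz         # "about twice the voice frequency" -> 8 kHz
max_spacing = wavelength_m(sampling_hz)  # spacing limit used in this embodiment

print(f"sampling frequency            : {sampling_hz} Hz")
print(f"wavelength / maximum spacing  : {max_spacing * 100:.1f} cm")  # about 4.3 cm ("about 5 cm")
print(f"half wavelength (per claim 3) : {max_spacing * 50:.1f} cm")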
[0046]
In the voice processing apparatus 100 configured in this manner, by adding or subtracting the voice signals S11 and S12 of the voice input units 11 and 12, voice coming from a sound source in a specific area can be emphasized, or voice coming from a sound source outside the specific area can be suppressed.
[0047]
That is, when the voice signals S11 and S12 are added, the voice of a sound source near the voice input units 11 and 12 is emphasized, while the degree of emphasis decreases for voice coming from sound sources farther from the voice input units 11 and 12, so that such voice is relatively suppressed.
Here, "near the voice input units 11 and 12" means a range whose radius is about one wavelength of the frequency of the target voice; the center of gravity of this range shifts according to the directivity of the voice input units 11 and 12.
[0048]
When the voice signals S11 and S12 are subtracted, the voices of sound sources that are near the voice input units 11 and 12 but equidistant from both cancel each other out and are suppressed. For the voices of the other sound sources in the vicinity of the voice input units 11 and 12, a significant volume difference arises between the voice signals S11 and S12, so these voices remain without being suppressed. On the other hand, as the distance from the voice input units 11 and 12 increases, the volume difference between the voice signals S11 and S12 gradually decreases and the voices of distant sound sources appear at equal levels in the two signals, so that subtracting the voice signals S11 and S12 causes them to cancel each other and be suppressed.
03-05-2019
13
[0049]
In the following, a specific area in which sound is emphasized, or a specific area in which sound is not suppressed, is referred to as a sensitive area Rp, and the area other than the sensitive area Rp is referred to as an insensitive area Rn. In FIGS. 3 to 5, the directivity of the voice input units 11 and 12 is indicated by a two-dot chain line, and the boundary of the specific area is indicated by a one-dot chain line.
[0050]
FIGS. 3 to 5 are diagrams for explaining the specific areas obtained when the synthesized speech signal Smix is generated by digital arithmetic processing.
[0051]
FIG. 3 shows the sensitive area Rp and the insensitive area Rn of the synthesized speech signal Smix obtained by simply adding the speech signal S11 and the speech signal S12.
In the figure, the sensitive area Rp is a closed space whose center of gravity lies at a point on a line passing approximately midway between the voice input unit 11 and the voice input unit 12 (in the present embodiment, since the voice input units 11 and 12 have directivity, the center of gravity lies in a closed space located in front of the voice input units 11 and 12), and the insensitive area Rn is formed outside this closed space.
[0052]
FIG. 4 shows the sensitive area Rp and the insensitive area Rn of the synthesized speech signal Smix obtained by weighted addition of the speech signal S11 and the speech signal S12. In the example shown in the figure, the weighting of the voice signal S11 is larger than that of the voice signal S12. In this case, the center of gravity of the closed space of the sensitive area Rp is closer to the voice input unit 11 than in the case of simple addition, and the outside of the closed space of the sensitive area Rp becomes the insensitive area Rn. That is, when the voice signals are added, the center of gravity of the closed space of the sensitive area Rp is located closer to the voice input unit whose signal is given the larger weighting.
[0053]
FIG. 5 shows the sensitive area Rp and the insensitive area Rn of the synthesized speech signal Smix obtained by simply subtracting the speech signal S12 from the speech signal S11. In the figure, a sensitive area Rp closer to the voice input unit 11 and a sensitive area Rp closer to the voice input unit 12 are formed. One sensitive area Rp is formed as a closed space whose center of gravity lies at a point on a line passing outside the voice input unit 11, the other sensitive area Rp is formed as a closed space whose center of gravity lies at a point on a line passing outside the voice input unit 12, and the insensitive area Rn is formed outside these closed spaces.
[0054]
FIG. 6 shows the sensitive area Rp and the insensitive area Rn of the synthesized speech signal Smix obtained by weighted subtraction of the speech signal S12 from the speech signal S11. In the example shown in the figure, the weighting of the voice signal S11 is larger than that of the voice signal S12.
[0055]
Also in this case, a sensitive area Rp1 closer to the voice input unit 11 and a sensitive area Rp2 closer to the voice input unit 12 are formed as the sensitive area Rp. However, the sensitive area Rp1 closer to the voice input unit 11 is formed larger than the sensitive area Rp2 closer to the voice input unit 12, and the center of gravity of the closed space of the larger sensitive area Rp1 is farther from the voice input unit 11 than the center of gravity of the closed space of the sensitive area Rp2 is from the voice input unit 12.
[0056]
That is, when the voice signals S11 and S12 are subtracted, the sensitive area Rp formed closer to the voice input unit whose signal has the larger weight is wider than the sensitive area Rp formed closer to the voice input unit whose signal has the smaller weight.
Further, the center of gravity of the sensitive area Rp formed closer to the voice input unit whose signal has the larger weight is farther from that voice input unit than the center of gravity of the sensitive area Rp formed closer to the voice input unit whose signal has the smaller weight is from its voice input unit.
[0057]
Thus, in the speech processing apparatus 100 having two voice input units, it is possible to generate a synthesized speech signal Smix in which voice coming from a sound source in the sensitive area Rp, a closed space formed within a certain range, is emphasized, or in which voice coming from a sound source in the insensitive area Rn outside the sensitive area Rp is suppressed.
Further, by adjusting the weighting of the voice signal S11 and the voice signal S12, the extent of the sensitive area Rp and the insensitive area Rn can be increased or decreased, and the positions at which the sensitive area Rp and the insensitive area Rn are formed can be adjusted.
[0058]
(2) Second Embodiment FIG. 7 is a block diagram showing the schematic configuration of the speech processing apparatus 200 of the present embodiment. The voice processing apparatus 200 shown in the figure has the same configuration as the voice processing apparatus 100 except that the number of voice input units is three; configurations common to the voice processing apparatus 100 are therefore given the same reference numerals as in the first embodiment, and their detailed description is omitted. The configuration of the voice input unit 13 is the same as that of the other voice input units 11 and 12, and the voice input unit 13 inputs a voice signal S13 to the signal input port 26 of the voice synthesis unit 20.
[0059]
The voice input units 11 to 13 are arranged at mutually different vertices of a triangle so that the three voice input units do not lie on a straight line. When digital arithmetic processing is performed, the distance between each pair of the voice input units 11, 12 and 13 constituting the plurality of voice input units is set equal to or less than the wavelength corresponding to the sampling frequency.
[0060]
According to the voice processing apparatus 200 configured in this way, by adding and subtracting the voice signals S11, S12 and S13 from the voice input units 11, 12 and 13, specific areas of various positions and shapes can be formed, voice coming from a sound source in the specific area can be emphasized, and voice coming from a sound source outside the specific area can be suppressed.
[0061]
FIGS. 8 to 10 are diagrams for explaining the specific areas obtained when the synthesized speech signal Smix is generated by digital arithmetic processing using the speech processing apparatus 200 having the three-input configuration.
[0062]
FIG. 8 shows the sensitive area Rp and the insensitive area Rn of the synthesized speech signal Smix obtained by weighted subtraction of the speech signal S11 from the speech signal S13.
In the example shown in the figure, the weighting of the voice signal S13 is larger than that of the voice signal S11.
In this case, as with the sensitive area Rp and the insensitive area Rn shown in FIG. 6, a sensitive area Rp closer to the voice input unit 13 and a sensitive area Rp closer to the voice input unit 11 are formed. The sensitive area Rp closer to the voice input unit 13 is formed larger than the sensitive area Rp closer to the voice input unit 11, and the center of gravity of the closed space of the larger sensitive area Rp is farther from the voice input unit 13 than the center of gravity of the closed space of the smaller sensitive area Rp is from the voice input unit 11.
[0063]
FIG. 9 shows the sensitive area Rp and the insensitive area Rn of the synthesized speech signal Smix obtained by weighted subtraction of the speech signal S12 from the speech signal S13. In the example shown in the figure, the weighting of the voice signal S13 is larger than that of the voice signal S12. Also in this case, as with the sensitive area Rp and the insensitive area Rn shown in FIG. 6, a sensitive area Rp closer to the voice input unit 13 and a sensitive area Rp closer to the voice input unit 12 are formed. The sensitive area Rp closer to the voice input unit 13 is formed larger than the sensitive area Rp closer to the voice input unit 12, and the center of gravity of the closed space of the larger sensitive area Rp is farther from the voice input unit 13 than the center of gravity of the closed space of the smaller sensitive area Rp is from the voice input unit 12.
[0064]
FIG. 10 shows the sensitive area Rp and the insensitive area Rn of the synthesized speech signal Smix obtained by simply adding the result of the weighted subtraction shown in FIG. 8 and the result of the weighted subtraction shown in FIG. 9. In the example shown in the figure, the sensitive areas Rp formed in the vicinity of the voice input unit 13 in FIGS. 8 and 9 are combined, and a wide sensitive area Rp containing both of these sensitive areas Rp is formed in the direction of the directivity of the voice input unit 13.
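As a rough numerical illustration of the combination behind FIG. 10, the sketch below adds the two weighted subtractions, (w3*S13 - S11) + (w3*S13 - S12), so that contributions near the voice input unit 13 reinforce each other. It uses the same hypothetical 1/r, phase-free model as the earlier sketches, with arbitrary positions and an arbitrary weight w3.

import numpy as np

mic11 = np.array([0.00, 0.000])
mic12 = np.array([0.04, 0.000])
mic13 = np.array([0.02, 0.035])   # three voice input units at the vertices of a triangle

def amp(src, mic):
    return 1.0 / np.linalg.norm(src - mic)   # 1/r amplitude model

def fig10_gain(src, w3=2.0):
    """Level of a point source in (w3*S13 - S11) + (w3*S13 - S12)."""
    a11, a12, a13 = amp(src, mic11), amp(src, mic12), amp(src, mic13)
    return abs((w3 * a13 - a11) + (w3 * a13 - a12))

test_points = {
    "in front of unit 13": np.array([0.02, 0.06]),
    "close to unit 11":    np.array([-0.01, 0.00]),
    "distant noise":       np.array([0.02, 3.00]),
}
for name, pos in test_points.items():
    print(f"{name:20s}: {fig10_gain(pos):8.2f}")

# The source in front of unit 13 dominates and the distant source is far weaker,
# consistent with the wide sensitive area formed in the direction of unit 13.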
[0065]
As described above, by arranging three voice input units at the vertices of a triangle, the position and size at which the sensitive area Rp is formed can be adjusted with a higher degree of freedom, in the directions in which the plane containing the three voice input units extends, than when two voice input units are provided.
[0066]
Furthermore, by arranging four voice input units at the vertices of a triangular pyramid, the position and size at which the sensitive area Rp is formed can be adjusted in various ways within three-dimensional space.
[0067]
If the direction and distance of a desired sound source to be recorded are known, arithmetic processing can be performed so as to form a sensitive area Rp having sensitivity in the area where that sound source is present, and a synthesized speech signal Smix in which the sound emitted by that sound source is selectively recorded can be generated.
Conversely, if the direction and distance of a sound source that is not to be recorded are known, arithmetic processing can be performed so as to form an area having no sensitivity where that sound source is located, and a synthesized speech signal Smix in which the sound of that source is not recorded can be generated.
Of course, it is also possible to generate a synthesized speech signal Smix that records the voice of the desired sound source while not recording the unwanted sound source. Note that the direction and distance of the desired sound source may be designated by the user through operation of the operation unit and input to the arithmetic unit, or the position of the desired sound source (for example, a person who is speaking) may be automatically identified by various known methods and input to the arithmetic unit.
[0068]
(3) Third Embodiment: FIG. 11 is a diagram showing an example of a circuit that specifically
realizes the speech processing apparatus.
[0069]
In the voice processing apparatus 300 shown in the figure, condenser microphones 311, 312 and 313 are each connected in series with a 10 kΩ resistor R1 between a constant voltage source Vcc (3.3 V or the like) and the ground Gnd.
The connection points J1, J2 and J3 between the resistors R1 and the condenser microphones 311, 312 and 313 are connected to the input ports 321, 322 and 323 of a microcomputer 320, respectively. It is desirable that the phase relationship be transmitted as faithfully as possible between the condenser microphones 311, 312, 313 and the input ports 321, 322, 323; in FIG. 11 they are directly connected.
[0070]
The voltages input to the input ports 321, 322 and 323 are converted into digital signals by the analog/digital converter in the microcomputer 320, and the arithmetic unit 324 in the microcomputer 320 generates the synthesized speech signal Smix using these digital signals. The digital synthesized speech signal Smix is converted into an analog signal by the digital/analog conversion unit in the microcomputer 320 and output as an analog signal at the output port 325 of the microcomputer 320.
[0071]
In the speech processing apparatus 300 shown in FIG. 11, the condenser microphones 311, 312,
and 313 constitute speech input units, and the microcomputer 320 constitutes a speech
synthesis unit.
[0072]
The output port 325 is connected to the non-inverting input terminal of an operational amplifier Op via a resistor R2, and a voltage obtained by resistively dividing the output terminal voltage of the operational amplifier Op at a constant ratio is fed back to the inverting input terminal of the operational amplifier Op.
The operational amplifier Op thus functions as a non-inverting amplifier circuit. An output terminal 330, to which a speaker or the like is connected, is provided at the output of the operational amplifier Op via a capacitor C, and the synthesized voice signal Smix with its direct-current component removed is supplied to the output terminal 330.
[0073]
The voice processing device 300 configured as described above operates as follows according to
the program in the microcomputer 320.
[0074]
In the microcomputer 320, when the apparatus is activated, start-up processing such as initialization of various variables is first performed, and then the zero level of the input from each condenser microphone is set.
This zero-level setting is performed, for example, by accumulating the output of each condenser microphone a predetermined number of times in a silent state and dividing the sum by that number to detect the direct-current component.
[0075]
Next, at a sampling frequency set higher than the frequency of the target voice (when the target is the human voice, 8 kHz, approximately twice the voice frequency of 3 kHz to 4 kHz), the outputs of the condenser microphones 311, 312 and 313 are repeatedly acquired by A/D conversion (analog/digital conversion), the arithmetic operation for emphasizing or suppressing the sound of the specific area described above is performed, and the resulting synthesized speech signal Smix is output in analog form to the output port 325.
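The start-up and sampling flow just described can be sketched as follows. The functions read_adc() and write_dac() are hypothetical stand-ins for the A/D and D/A facilities of the microcomputer 320 (simulated here so the script runs), and the weights are purely illustrative.

import random
import time

FS_HZ = 8_000                  # sampling frequency, roughly twice the 3 kHz to 4 kHz voice band
INPUT_PORTS = (321, 322, 323)  # fed by the condenser microphones 311, 312, 313
OUTPUT_PORT = 325
WEIGHTS = (1.0, 1.0, 1.0)      # illustrative weighted synthesis (simple addition here)

def read_adc(port: int) -> float:
    """Stand-in for one A/D sample from an input port; real hardware would
    return the digitized voltage at that port."""
    return random.uniform(0.0, 3.3)

def write_dac(port: int, value: float) -> None:
    """Stand-in for the analog output of the synthesized signal Smix."""
    pass

def measure_zero_levels(n: int = 1024) -> list:
    """Zero-level setting of [0074]: accumulate n samples per microphone in a
    silent state and divide by n to obtain the direct-current component."""
    sums = [0.0] * len(INPUT_PORTS)
    for _ in range(n):
        for i, port in enumerate(INPUT_PORTS):
            sums[i] += read_adc(port)
    return [s / n for s in sums]

def run(seconds: float = 0.1) -> None:
    zero = measure_zero_levels()
    period = 1.0 / FS_HZ
    for _ in range(int(seconds * FS_HZ)):   # sampling loop of [0075]
        start = time.monotonic()
        samples = [read_adc(p) - z for p, z in zip(INPUT_PORTS, zero)]
        smix = sum(w * s for w, s in zip(WEIGHTS, samples))  # emphasize/suppress the specific area
        write_dac(OUTPUT_PORT, smix)
        time.sleep(max(0.0, period - (time.monotonic() - start)))

if __name__ == "__main__":
    run()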
[0076]
The present invention is not limited to the embodiments described above; it also includes configurations in which the configurations disclosed in the above embodiments are exchanged with one another or their combinations are changed, and configurations in which known techniques and the configurations disclosed in the above embodiments are exchanged with one another or their combinations are changed.
Further, the technical scope of the present invention is not limited to the embodiments described above, but extends to the matters described in the claims and their equivalents.
[0077]
11: voice input unit, 12: voice input unit, 13: voice input unit, 20: voice synthesis unit, 21: arithmetic unit, 22: signal input port, 23: signal input port, 24: voice output port, 25: control signal input port, 30: voice output unit, 40: operation unit, 100: voice processing apparatus, 200: voice processing apparatus, 300: voice processing apparatus, 311: condenser microphone, 312: condenser microphone, 320: microcomputer, 321: input port, 322: input port, 323: arithmetic unit, 324: analog conversion unit, 325: analog conversion unit, 326: output port, 330: output terminal, Rn: insensitive area, Rp: sensitive area