Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2006162849
PROBLEM TO BE SOLVED: To enable recognition of a user's voice without causing the user a sense of inconvenience or discomfort. SOLUTION: A user speaks toward a microphone unit
100 in which microphones are arranged in an array. When sound is input to the plurality of
microphones provided in the microphone unit 100, the control device 200 detects, from the signals
output by the microphones arranged in the array, the sound pressure level distribution and the
frequency spectrum at the time of utterance, and determines the voice uttered by
the user from that sound pressure level distribution and frequency spectrum.
[Selected figure] Figure 1
Voice processing device
[0001]
The present invention relates to a technology for recognizing human speech.
[0002]
When practicing pronunciation in language learning, for example, a common learning method is to
reproduce model speech recorded on a recording medium such as a CD (Compact Disc) and to
imitate the model speech.
To make this learning more effective, it is necessary to objectively evaluate the difference
between the model voice and one's own voice. However, there is a problem in that it is difficult
to grasp specifically whether one's pronunciation is correct merely by listening to the model
voice recorded on the CD and imitating it.
[0003]
As a technique for enabling a learner to specifically recognize whether his or her own
pronunciation is correct, there is, for example, the technique disclosed in Patent Document 1.
That document discloses a pronunciation practice device in which human nasal sounds, unvoiced
sounds, voiced sounds, fricatives, and the like are each detected by a microphone provided for
that type of sound, and the level of each sound is displayed by the number of lighted LEDs
(Light Emitting Diodes). According to this device, the level of each sound, which is normally
invisible, can be visualized, and it can be specifically grasped whether or not the correct
pronunciation is being produced.
[0004]
Further, as a technique for recognizing sound, there is, for example, the technique
disclosed in Patent Document 2. The device disclosed in Patent Document 2 includes a
microphone array on the focal plane of a parabolic reflector, and detects the sound pressure
distribution in front of the parabolic reflector on the basis of the output signals from the
microphones constituting the microphone array. The measured sound pressure distribution is then
compared with a correct sound pressure distribution stored in advance, and an abnormality of
the sound is judged from the difference. Patent Document 1: JP-A-11-352876. Patent Document 2: JP-A-6-113387.
[0005]
Incidentally, in the pronunciation practice device disclosed in Patent Document 1, in order to
correctly detect the voice uttered by the user, the housing containing the microphones must be
brought into close contact with the face so as to cover the nose and mouth, and the microphone
provided for each sound must be positioned in the immediate vicinity of the nose and lips.
However, when the nose and mouth are covered in this way, some users feel inconvenienced or
uncomfortable, and because the device touches the face, some users of a device shared by an
unspecified number of people feel discomfort from the standpoint of hygiene. If, because of this
discomfort, the user does not perform the positioning correctly, a problem arises in that the
voice cannot be correctly recognized. Further, the device disclosed in Patent Document 2 has the
problem that it becomes large in scale because a parabolic reflector is used, and here too a
problem of inconvenience in use arises.
[0006]
The present invention has been made against the background described above, and it is an object
of the present invention to provide a technique that enables recognition of a user's speech
without giving the user a sense of inconvenience or discomfort.
[0007]
In order to solve the problems described above, the present invention provides an audio
processing apparatus comprising: storage means for storing the sound pressure level distribution
of standard language sounds and the frequency spectrum of standard language sounds; a microphone
unit in which a plurality of microphones are arranged in an array; sound pressure level detection
means for detecting the sound pressure level of the sound represented by each signal output from
each of the microphones of the microphone unit; spectrum detection means for detecting the
frequency spectrum of the signal output from a predetermined microphone of the microphone unit;
sound pressure distribution detection means for obtaining, from the plurality of sound pressure
levels obtained by the sound pressure level detection means, a sound pressure level distribution
over the surface of the microphone unit on which the microphones are disposed; voice
discrimination means for comparing the sound pressure level distribution obtained by the sound
pressure distribution detection means and the frequency spectrum obtained by the spectrum
detection means with the sound pressure level distribution of the standard language sounds and
the frequency spectrum of the standard language sounds stored in the storage means, and
determining the speech sound input to the microphone unit; and output means for outputting the
determination result of the voice discrimination means.
In the present invention, the user speaks toward the microphone unit in which the microphones
are arranged in an array. Sound is input to the plurality of microphones, and the sound pressure
level distribution and the frequency spectrum at the time of utterance are detected from the
signals output from the microphones arranged in the array. Since the voice uttered by the user
is determined from the sound pressure level distribution and the frequency spectrum, the voice
can be accurately determined, and the user does not need to bring his or her face into close
contact with the microphone unit.
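The determination described in this paragraph can be sketched in code: a measured sound pressure level (SPL) grid and frequency spectrum are compared against stored standards, and the nearest standard is taken as the uttered sound. The phoneme labels, the 2x2 grids, the 4-bin spectra, and the Euclidean distance measure are illustrative assumptions, not taken from the patent.

```python
import math

# Hypothetical stored standards: per-sound SPL grids (flattened 2x2 here
# for brevity; the patent's unit is far larger) and frequency spectra.
STANDARDS = {
    "a": {"spl": [60.0, 64.0, 64.0, 60.0], "spectrum": [0.9, 0.7, 0.3, 0.1]},
    "i": {"spl": [52.0, 55.0, 55.0, 52.0], "spectrum": [0.8, 0.2, 0.6, 0.3]},
}

def distance(xs, ys):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))

def discriminate(spl_grid, spectrum):
    """Return the stored sound whose SPL distribution and frequency
    spectrum jointly lie closest to the measured ones."""
    def score(key):
        std = STANDARDS[key]
        return distance(spl_grid, std["spl"]) + distance(spectrum, std["spectrum"])
    return min(STANDARDS, key=score)

measured_spl = [61.0, 63.0, 65.0, 59.0]    # close to the "a" template
measured_spec = [0.88, 0.68, 0.32, 0.12]
print(discriminate(measured_spl, measured_spec))  # → a
```

Joint nearest-neighbour matching is only one way to combine the two cues; the patent leaves the comparison method open ("a method such as pattern matching").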
[0008]
In a preferred aspect of the present invention, the audio processing apparatus has a light
emitter associated with each of the plurality of microphones, and control means for lighting the
light emitter associated with each microphone on the basis of the sound pressure level
corresponding to that microphone detected by the sound pressure level detection means. In
another preferred aspect, the control means may change the illuminance of each light emitter in
accordance with the sound pressure level. In another preferred aspect, the audio processing
apparatus has display means for generating a sound pressure distribution image indicating the
sound pressure level distribution obtained by the sound pressure distribution detection means
and displaying that image. In yet another preferred aspect, the audio processing apparatus
comprises voice request means for requesting the user, by at least one of an image and a voice,
to produce a speech sound, and determination means for determining whether or not the speech
sound determined by the voice discrimination means matches the speech sound requested by the
voice request means, and the output means outputs the determination result of the determination
means.
[0009]
According to the present invention, the user's pronunciation can be recognized without the user
feeling inconvenience or discomfort.
[0010]
[Configuration] FIG. 1 is a diagram showing the configuration of an audio processing device 1
according to an embodiment of the present invention.
As shown in FIG. 1, the audio processing device 1 is roughly divided into a microphone unit 100
and a control device 200.
[0011]
The microphone unit 100 includes a stand 140 and a substrate 130 attached to the stand 140, as
shown in FIG. 2. On the substrate 130, rectangular silicon microphones 110A-1, 110A-2, ...,
110A-16 through 110P-1, 110P-2, ..., 110P-16, connected to the audio input unit 220, are
arranged on a grid in the vertical and horizontal directions, and LEDs 120A-1, 120A-2, ...,
120A-16 through 120P-1, 120P-2, ..., 120P-16, connected to the control unit 210, are disposed
between the silicon microphones in the vertical direction. Each silicon microphone disposed on
the substrate 130 converts the input voice into an electrical signal and outputs it, and each
LED is turned on and off under the control of the control unit 210. Since the silicon
microphones all have the same structure, they are referred to below simply as silicon
microphones 110 when there is no particular need to distinguish individual microphones. For the
same reason, the LEDs are referred to as LEDs 120 unless individual LEDs need to be
distinguished. The numbers of silicon microphones 110 and LEDs 120 provided are not limited to
those described above; it goes without saying that other numbers may be used.
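The reference numerals above follow a row-letter/column-number scheme (110A-1 through 110P-16), which can be captured in a small helper; the 0-based grid coordinates and function names below are illustrative assumptions, not part of the patent.

```python
import string

# Rows are lettered A..P and columns numbered 1..16, mirroring the
# reference numerals 110A-1 .. 110P-16 used for the silicon microphones.
ROWS = string.ascii_uppercase[:16]  # 'A'..'P'

def mic_label(row, col):
    """0-based grid coordinates -> reference numeral like '110H-8'."""
    return f"110{ROWS[row]}-{col + 1}"

def mic_position(label):
    """Reference numeral like '110H-8' -> 0-based (row, col)."""
    letter, num = label.removeprefix("110").split("-")
    return ROWS.index(letter), int(num) - 1

print(mic_label(7, 7))          # → 110H-8
print(mic_position("110P-16"))  # → (15, 15)
```

The same scheme applies to the LEDs 120A-1 through 120P-16, each sitting beneath the like-numbered microphone.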
[0012]
The voice input unit 220 functions as an interface for receiving the electrical signals output
from the silicon microphones 110: it outputs all of the input signals to the sound pressure
level detection unit 230, and also outputs the signal from a predetermined silicon microphone to
the voice detection unit 240. The sound pressure level detection unit 230 calculates, from each
electrical signal output by the voice input unit 220 (that is, for each signal output from each
silicon microphone), the sound pressure level of the sound input to that microphone, and outputs
sound pressure data indicating the calculated level to the CPU 211 of the control unit 210. The
voice detection unit 240 samples the input electrical signal, stores it as digital data,
performs frequency analysis on this data by fast Fourier transform, and obtains the spectrum of
the voice represented by the signal. Next, the voice detection unit 240 obtains the formant
frequencies from this spectrum, compares them with formant frequencies stored in advance for
various voices, and determines what voice was produced by a method such as pattern matching.
The voice detection unit 240 then generates sound generation data indicating the determined
voice and outputs it to the control unit 210.
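The per-microphone level measurement and the frequency analysis described above can be sketched as follows. This is a minimal stand-in only: a naive DFT replaces the fast Fourier transform, a single dominant frequency bin replaces proper formant extraction, and all function names and the reference level are assumptions.

```python
import cmath
import math

def spl_db(samples, ref=1.0):
    """RMS level of a sample block, in dB relative to `ref` - a simple
    stand-in for the sound pressure level computed by unit 230."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / ref)

def dft_magnitudes(samples):
    """Naive O(n^2) DFT magnitude spectrum (the patent uses an FFT)."""
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def dominant_bin(samples):
    """Index of the strongest non-DC frequency bin - a toy stand-in for
    the formant-frequency extraction performed by unit 240."""
    mags = dft_magnitudes(samples)
    return max(range(1, len(mags)), key=mags.__getitem__)

# 64 samples of a pure tone occupying frequency bin 5:
n = 64
tone = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
print(dominant_bin(tone))  # → 5
```

A real implementation would use an FFT (O(n log n)) over windowed frames and locate several spectral-envelope peaks, since vowels are characterized by multiple formants rather than one dominant bin.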
[0013]
The control unit 210 includes a CPU 211, a ROM 212, a RAM 213, and an HDD (Hard Disk Drive)
214. The CPU 211 reads out and executes a program stored in the ROM 212 or in the HDD 214 to
control each part of the audio processing device 1. The HDD 214 is a storage device for storing
various application programs and data: it stores a pronunciation practice program for realizing
an application for practicing pronunciation, data used by this application, audio data
representing sounds such as vowels and consonants, sound pressure distribution data indicating
the sound pressure distribution produced when a sound such as a vowel or a consonant is uttered,
threshold data for determining whether to turn on the LEDs 120, and the like.
[0014]
The display unit 215 includes a display device such as a CRT (Cathode Ray Tube) or an LCD
(Liquid Crystal Display), and displays characters and images under the control of the CPU 211.
The operation unit 216 includes a keyboard and a mouse (both not shown). The user can input
various instructions to the control unit 210 by operating the operation unit 216.
[0015]
[Operation] Next, the operation of the embodiment will be described. When the user operates the
operation unit 216 to instruct execution of the pronunciation practice program, the CPU 211
reads out the pronunciation practice program from the HDD 214 and executes it. When the
pronunciation practice program is executed, a screen prompting selection of the pronunciation to
practice, as illustrated in FIG. 4, is displayed on the display unit 215 (FIG. 3: step SA1).
Here, when an operation to display the next menu screen is performed on the operation unit 216
(step SA2: YES), the CPU 211 displays on the display unit 215, for example, a menu screen
prompting practice of short vowels or a menu screen prompting practice of the pronunciation of
words (step SA3). When an operation to end execution of the pronunciation practice program is
performed (step SA4: NO, step SA13: YES), the CPU 211 ends execution of the program. On the
other hand, if the user performs an operation to select a sound to practice (step SA4: YES), the
CPU 211 displays, for example as illustrated in FIG. 5(a), the shape of the mouth when the
selected sound is pronounced (step SA5). Then, the CPU 211 waits for data output from the sound
pressure level detection unit 230 and the voice detection unit 240 (step SA6).
[0016]
When the user imitates the shape of the displayed mouth and speaks toward the microphone unit
100, each silicon microphone 110 disposed in the microphone unit 100 converts the user's voice
into an electrical signal and outputs it to the voice input unit 220. When the electrical
signals output from the microphone unit 100 are input, the voice input unit 220 outputs all of
the input signals to the sound pressure level detection unit 230, and outputs the signal from a
predetermined silicon microphone (for example, the silicon microphone 110H-8) to the voice
detection unit 240.
[0017]
The sound pressure level detection unit 230 first generates sound pressure data from the
electrical signals corresponding to the silicon microphones 110A-1, 110A-2, ..., 110A-16 in that
order, then proceeds through the silicon microphones 110B-1 to 110B-16, 110C-1 to 110C-16, ...,
110P-1 to 110P-16, and outputs the sound pressure data to the CPU 211 in the order generated.
Meanwhile, the voice detection unit 240 samples the input electrical signal, stores it as
digital data, performs frequency analysis on this data by fast Fourier transform, and obtains
the spectrum of the voice represented by the signal. It then determines the sound (language
sound) uttered by the user and outputs sound generation data indicating the determined sound to
the CPU 211.
[0018]
When the sound pressure data and the sound generation data are input, the CPU 211 stores the
sound pressure data in the RAM 213 in the order of input and also stores the sound generation
data in the RAM 213 (step SA7). Next, the CPU 211 first compares the sound pressure level
represented by the sound pressure data corresponding to the silicon microphone 110A-1 with the
value represented by the threshold data. If the sound pressure level is equal to or higher than
the threshold value, the LED 120A-1 under the silicon microphone 110A-1 is turned on; if it is
less than the threshold value, the LED 120A-1 is turned off. The CPU 211 then compares the sound
pressure data corresponding to each silicon microphone 110 with the threshold data in the order
stored in the RAM 213, and turns the LED 120 under each silicon microphone 110 on or off
accordingly (step SA8).
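The threshold comparison of step SA8 amounts to a simple loop over microphones. In this sketch the microphone labels, the dB threshold value, and the `set_led` callback (standing in for the CPU 211 driving the LEDs 120) are all illustrative assumptions.

```python
# Hypothetical threshold, playing the role of the threshold data
# stored on the HDD 214.
THRESHOLD_DB = 60.0

def update_leds(spl_by_mic, set_led, threshold=THRESHOLD_DB):
    """For each microphone's sound pressure level, switch the LED
    beneath it on when the level reaches the threshold, else off.

    spl_by_mic: {mic_label: level_dB}
    set_led: callback (mic_label, on: bool) driving the hardware.
    """
    for mic, level in spl_by_mic.items():
        set_led(mic, level >= threshold)

# Record the on/off decisions instead of driving real LEDs:
lit = {}
update_leds({"110A-1": 63.2, "110A-2": 41.5},
            lambda mic, on: lit.__setitem__(mic, on))
print(lit)  # → {'110A-1': True, '110A-2': False}
```

The modification in [0023], where LED illuminance varies continuously with level, would replace the boolean with a drive-voltage value derived from the same comparison.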
[0019]
Next, the CPU 211 reads from the HDD 214 the sound pressure distribution data indicating the
distribution of sound pressure levels produced when the sound selected in step SA4 is correctly
pronounced. Then, from the sound pressure data stored in the RAM 213 (the data indicating the
sound pressure level of the sound input to each silicon microphone), it obtains the sound
pressure level distribution over the surface of the microphone unit 100, and checks the
agreement between the obtained distribution and the distribution indicated by the stored sound
pressure distribution data by a method such as pattern matching (step SA9). If the sound
pressure level distributions match (step SA9: YES), the CPU 211 reads from the HDD 214 the audio
data representing the sound when the sound selected in step SA4 is correctly pronounced, and
compares it with the sound generation data stored in the RAM 213 (step SA10). If the voices
match (step SA10: YES), the CPU 211 determines that the user has pronounced correctly, and
displays on the display unit 215 a screen indicating the user's pronunciation and that the
pronunciation was correct, as illustrated in FIG. 5(b) (step SA11).
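One simple way to realize the "pattern matching" of step SA9 is a normalized correlation between the measured and stored distributions, flattened to vectors. The correlation formula is standard, but its use here, the match threshold, and the sample values are assumptions; the patent does not fix a particular matching method.

```python
import math

def normalized_correlation(a, b):
    """Mean-removed cosine similarity between two flattened SPL grids,
    in [-1, 1]; 1 means identical shape up to offset and scale."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [x - mb for x in b]
    num = sum(x * y for x, y in zip(da, db))
    den = (math.sqrt(sum(x * x for x in da)) *
           math.sqrt(sum(y * y for y in db)))
    return num / den if den else 0.0

def distributions_match(measured, reference, threshold=0.9):
    """Step-SA9-style YES/NO decision on distribution agreement."""
    return normalized_correlation(measured, reference) >= threshold

reference = [60, 70, 70, 60, 55, 65, 66, 54]   # stored correct distribution
measured  = [61, 71, 69, 60, 54, 66, 65, 55]   # user's utterance
print(distributions_match(measured, reference))  # → True
```

Mean removal makes the decision insensitive to how loudly the user speaks overall, which matters since only the spatial shape of the distribution distinguishes sounds.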
[0020]
On the other hand, if the determination in step SA9 or step SA10 is NO, that is, if it is
determined that the user's pronunciation is not correct, the CPU 211 controls the display unit
215 to display a screen indicating that the user's pronunciation does not match the correct
pronunciation, for example as shown in FIG. 5(c) (step SA12).
[0021]
As described above, according to the present embodiment, the sound pressure distribution
produced by an utterance can be detected by the microphone unit 100, and, together with the
spectrum analysis, the voice can be accurately determined.
In addition, since the microphone unit 100 that detects the voice uttered by the user can be
used at a distance from the user, the user can use it without concern and can feel safe from the
viewpoint of hygiene.
[0022]
[Modifications] Although an embodiment of the present invention has been described above, the
present invention is not limited to the above-described embodiment and can be practiced in
various other forms. For example, the above-described embodiment may be modified as follows.
[0023]
The drive voltage of each LED may be controlled in accordance with the sound pressure level
represented by the sound pressure data, so that the illuminance of the LED 120 under each
silicon microphone 110 varies with the sound pressure level detected by that microphone.
Alternatively, a microphone unit 100 on which only the silicon microphones 110 are disposed may
be placed in front of the display unit 215, and the display unit 215 may display the sound
pressure distribution. In recent years, small silicon microphones have been developed, and even
when arranged in an array they do not result in a large-scale device such as one using a
parabolic reflector. Therefore, if the microphone unit 100 is arranged in front of the display
unit 215 in this way, the sound pressure distribution can be confirmed on the display unit 215.
[0024]
The sound pressure level distribution for correct pronunciation may be displayed before the user
speaks. Further, if LEDs capable of emitting a plurality of colors are adopted and the sound
pressure level distribution for correct pronunciation and the sound pressure level distribution
obtained from the user's utterance are lit in different colors, the difference between the two
distributions can be made visible.
[0025]
In the embodiment described above, the user is shown whether or not the input voice matches a
predetermined voice. However, instead of judging agreement with a predetermined voice, the
uttered voice may simply be recognized and the recognized voice displayed. For example, the
voice processing apparatus displays "thing" on the display unit 215 when it detects that the
word "thing" has been pronounced, and displays "sing" when the word "sing" is detected. If the
user pronounces "thing" but "sing" is displayed, the user can tell that his or her pronunciation
was not recognized as intended and is therefore incorrect.
[0026]
The frequency spectrum produced when each language sound is correctly pronounced may be stored
in the HDD 214, and the language sound pronounced by the user may be determined by comparing it
with the frequency spectrum obtained by the voice detection unit 240.
[0027]
FIG. 1 is a diagram showing the configuration of the speech processing apparatus according to
the embodiment of the present invention.
FIG. 2 is an external view of the microphone unit 100.
FIG. 3 is a flowchart showing the flow of the processing performed by the CPU 211.
FIG. 4 is a diagram showing an example of a screen displayed on the display unit 215.
FIG. 5 is a diagram showing an example of a screen displayed on the display unit 215.
Explanation of Symbols
[0028]
100 ... microphone unit; 110A-1 to 110P-16 ... silicon microphone; 120A-1 to 120P-16 ... LED;
130 ... substrate; 140 ... stand; 200 ... control device; 210 ... control unit; 211 ... CPU;
212 ... ROM; 213 ... RAM; 214 ... HDD; 215 ... display unit; 216 ... operation unit;
220 ... voice input unit; 230 ... sound pressure level detection unit; 240 ... voice detection unit.