Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2018524620
Abstract An embodiment of the present invention provides a method and terminal device for identifying a sound generation position. The method comprises the steps of: collecting K first speech signals, where K is an integer greater than or equal to 2; extracting M second audio signals from the K first audio signals according to N position parameters corresponding to N different positions, where M is less than or equal to N and N is an integer greater than or equal to 2; and determining the position corresponding to each second audio signal. In an embodiment of the present invention, the M second audio signals are extracted from the K first audio signals according to the position parameters using a beamforming algorithm, and the generation position corresponding to each second audio signal is determined. In this way, voice signals emanating from different locations can be efficiently extracted and voice recognition capability improved, thereby providing the user with a better user experience.
METHOD AND TERMINAL DEVICE FOR IDENTIFYING VOICE GENERATION LOCATION
[0001]
The present invention relates to the field of mobile communications, and in particular to a
method and terminal device for locating the origin of speech.
[0002]
Speech recognition is the core technology of the human-computer interaction interface of current
intelligent information systems.
03-05-2019
1
To improve the success rate of speech recognition, a solution of collecting speech signals with a sound collection sensor is generally used, and the collection and recognition of speech signals are performed according to the location where the speech is generated.
[0003]
At present, solutions for improving the success rate of speech recognition can extract speech signals generated from only one location. Voice signals originating from other locations are treated as noise and filtered out. As a result, those voice signals cannot be accurately extracted, their generation positions cannot be identified, and voice recognition cannot be performed on them. Take an in-vehicle system mounted on a car as an example. At present, sound signals of the surrounding environment can be collected using a sound collection sensor mounted on the in-vehicle system, the sound signal emitted from the driver's seat is extracted, and voice recognition is performed on it. The in-vehicle system can then respond to the voice signal emitted from the driver's seat. However, an audio signal emitted from the front passenger seat or from a rear seat in the car is treated as noise and filtered out by the in-vehicle system. As a result, that voice signal cannot be accurately extracted, its generation position cannot be identified, and voice recognition cannot be performed on it. For example, the in-vehicle system may extract and recognize the voice command "open sunroof" issued from the driver's seat; however, the same voice command "open sunroof" issued from another position, such as the passenger seat or a rear seat, cannot be extracted, and the in-vehicle system cannot identify the generation position of that other voice signal. Therefore, in this application scenario, the in-vehicle system cannot efficiently and accurately identify the generation positions of other audio signals in the vehicle. As a result, the efficiency of identifying the generation position of an audio signal is reduced and the user experience is poor.
[0004]
Embodiments of the present invention provide a method and terminal device for identifying the location of voice generation, which solve the problem that only voice signals emitted from a single location can be identified and extracted while voice signals emitted from other locations cannot.
[0005]
According to a first aspect of the present invention, there is provided a method for identifying the location of voice generation, comprising: collecting K first voice signals, where K is an integer greater than or equal to 2; extracting M second voice signals from the K first voice signals according to N position parameters corresponding to N different positions, where M is less than or equal to N and N is an integer greater than or equal to 2; and determining the position corresponding to each second voice signal.
[0006]
In a first possible implementation manner, the step of extracting the M second audio signals from the K first audio signals according to the N position parameters corresponding to the N different positions comprises extracting the M second audio signals from the K first audio signals according to the N position parameters using a beamforming algorithm.
[0007]
In relation to the first aspect, in a second possible implementation manner, the step of determining the position corresponding to each second audio signal specifically comprises determining a position L corresponding to the L-th second audio signal according to the position parameter corresponding to the L-th second audio signal, where the L-th second audio signal is any one of the M second audio signals.
[0008]
In relation to the first aspect and any one of the aforementioned possible implementations, in a third possible implementation, after the step of extracting the M second audio signals from the K first audio signals, the method further includes performing speech recognition on the M extracted second speech signals and obtaining M speech commands corresponding to the M second speech signals.
[0009]
In relation to the first aspect and the third possible implementation manner, in a fourth possible implementation manner, after the step of obtaining the M speech commands corresponding to the M second speech signals, the method further includes responding to the M voice commands.
[0010]
In relation to the first aspect and the fourth possible implementation manner, in a fifth possible implementation manner, the step of responding to the M voice commands comprises preferentially responding to a higher-priority voice command according to the priorities of the M different locations corresponding to the M voice commands.
[0011]
According to a second aspect of the present invention, there is provided a terminal device, comprising: K sound collection sensors configured to collect K first audio signals, where K is an integer greater than or equal to 2; and a processor configured to extract M second audio signals from the K first audio signals according to N position parameters corresponding to N different positions and to determine the position corresponding to each second audio signal, where M is less than or equal to N and N is an integer greater than or equal to 2.
[0012]
In a first possible implementation manner, that the processor is configured to extract the M second audio signals from the K first audio signals according to the N position parameters corresponding to the N different positions specifically includes that the processor is configured to separately extract the M second audio signals from the K first audio signals according to the N position parameters using a beamforming algorithm.
[0013]
In relation to the second aspect and the first possible implementation manner, in a second possible implementation manner, that the processor is configured to determine the position corresponding to each second audio signal specifically includes that the processor is configured to determine a position L corresponding to the L-th second audio signal according to the position parameter corresponding to the L-th second audio signal, where the L-th second audio signal is any one of the M second audio signals.
[0014]
In relation to the second aspect and any one of the aforementioned possible implementation manners, in a third possible implementation manner, the processor is further configured to perform speech recognition on the M extracted second speech signals after extracting the M second speech signals from the K first speech signals, and to obtain M speech commands corresponding to the M second speech signals.
[0015]
In relation to the second aspect and any one of the aforementioned possible implementations, in a fourth possible implementation, the terminal device further comprises an output device, and the output device is configured to respond to the M voice commands after the processor obtains the M voice commands corresponding to the M second voice signals.
[0016]
In relation to the second aspect and the fourth possible implementation manner, in a fifth possible implementation manner, that the output device is configured to respond to the M voice commands specifically includes that the output device is configured to preferentially respond to a higher-priority voice command according to the priorities of the M different positions corresponding to the M voice commands.
[0017]
In relation to the second aspect and any one of the aforementioned possible implementations, the
coordinates of the K sound collection sensors in three-dimensional space are different.
[0018]
According to a third aspect of the present invention there is provided an apparatus for
identifying the location of sound generation, said apparatus comprising an acquisition module, an
extraction module, and a determination module.
The acquisition module is configured to acquire K first speech signals, where K is an integer greater than or equal to 2; the extraction module is configured to extract M second audio signals from the K first audio signals according to N position parameters corresponding to N different positions, where M is less than or equal to N and N is an integer greater than or equal to 2; and the determination module is configured to determine the position corresponding to each second audio signal.
[0019]
In a first possible implementation manner, that the extraction module is configured to extract the M second audio signals from the K first audio signals according to the N position parameters corresponding to the N different positions specifically comprises separately extracting the M second audio signals from the K first audio signals according to the N position parameters using a beamforming algorithm.
[0020]
In relation to the third aspect and the first possible implementation manner, in a second possible implementation manner, that the determination module is configured to determine the position corresponding to each second audio signal specifically comprises that the determination module is configured to determine a position L corresponding to the L-th second audio signal according to the position parameter corresponding to the L-th second audio signal, where the L-th second audio signal is any one of the M second audio signals.
[0021]
In relation to the third aspect and any one of the aforementioned possible implementation manners, in a third possible implementation manner, the apparatus further comprises a speech recognition module and an acquisition module. The speech recognition module is configured to perform speech recognition on the M extracted second speech signals after the M second speech signals are extracted from the K first speech signals, and the acquisition module is configured to obtain M voice commands corresponding to the M second voice signals.
[0022]
In relation to the third aspect and the third possible implementation manner, in a fourth possible implementation manner, the device further includes a response module, and the response module is configured to respond to the M voice commands after the acquisition module obtains the M voice commands corresponding to the M second voice signals.
[0023]
In relation to the third aspect and the fourth possible implementation manner, in a fifth possible implementation manner, that the response module is configured to respond to the M voice commands specifically comprises preferentially responding to a higher-priority voice command according to the priorities of the M different locations corresponding to the M voice commands.
[0024]
From the above technical solutions, it can be seen that the embodiments of the present invention have the following advantages: the M second speech signals are extracted from the K first speech signals according to the position parameters using a beamforming algorithm, and the generation position corresponding to each second speech signal may be determined. According to this method, voice signals emanating from different locations can be efficiently extracted and voice recognition capability improved, thereby providing the user with a better user experience.
The conflicting commands are processed in a priority manner, thereby reducing errors that occur
when the onboard central control device responds to multiple commands simultaneously.
[0025]
BRIEF DESCRIPTION OF DRAWINGS To describe the technical solutions in the embodiments of
the present invention more clearly, the following briefly introduces the accompanying drawings
required for describing the embodiments.
The accompanying drawings in the following description merely illustrate some embodiments of the present invention, and it is obvious that one skilled in the art can obtain other drawings from these accompanying drawings without creative efforts.
[0026]
FIG. 1 is a flow chart of a method of identifying a sound generation position according to an
embodiment of the present invention.
FIG. 2A is a schematic view of positions in a car that are identified sound generation positions, according to an embodiment of the present invention.
FIG. 2B is a schematic view of positions in a car that are identified sound generation positions, according to another embodiment of the present invention.
FIG. 3 is a flow chart of a method of identifying a sound generation position according to another
embodiment of the present invention.
FIG. 3A is a flow chart of a method of identifying a sound generation position according to
another embodiment of the present invention.
FIG. 3B is a flow chart of a method of identifying a sound generation position according to
another embodiment of the present invention.
FIG. 4 is a schematic block diagram of a terminal device 400 according to one embodiment of the
present invention.
[0027]
The following clearly and completely describes the technical solutions in the embodiments of the
present invention with reference to the accompanying drawings in the embodiments of the
present invention.
It will be appreciated that the described embodiments are merely illustrative of some, but not all
of the embodiments of the present invention.
All other embodiments obtained by those skilled in the art based on the embodiments of the
present invention without creative efforts shall fall within the protection scope of the present
invention.
[0028]
Embodiments of the present invention provide a method for identifying the location of sound
generation.
The terminal device included in the embodiments of the present invention may be an in-vehicle
central control device, a smartphone, a tablet computer, and the like.
[0029]
In the prior art, the beamforming algorithm is combined with a solution of collecting speech signals using a sound collection sensor and is applied to speech collection and speech recognition; using this mode, the success rate of speech recognition is greatly improved.
However, in this mode, only the voice signal emitted from a single voice generation position can be identified.
When voice signals are emitted from multiple voice generation positions, the voice recognition system cannot recognize the multiple voice signals simultaneously.
[0030]
In the embodiments of the present invention, the terms "first audio signal" and "second audio signal" are merely used for distinction and do not indicate any order.
[0031]
FIG. 1 is a flow chart of a method of identifying a sound generation position according to an
embodiment of the present invention.
The application scenario of this embodiment of the invention may be any scenario of speech collection and speech recognition. In this embodiment, voice collection and speech recognition in an in-vehicle system is used as an example. The method comprises the following steps.
[0032]
S101: Collect K first speech signals.
K is an integer of 2 or more.
[0033]
There are K sound collection sensors inside the in-vehicle system, and the processor can collect K first audio signals, where K is an integer greater than or equal to 2.
[0034]
For example, in the on-vehicle system, K can be set to 2, that is, the first sound collection sensor
and the second sound collection sensor may be installed in the driver's seat and the passenger's
seat, respectively.
[0035]
The first sound collection sensor and the second sound collection sensor simultaneously collect
the first audio signal.
Optionally, in the in-vehicle system, another sound collection sensor may be further installed at
the rear seat in the vehicle or at another position in the vehicle.
[0036]
In this embodiment of the present invention, the first audio signal is an environmental sound in
the in-vehicle system and includes audio signals emitted from different positions in the vehicle
and an audio signal outside the vehicle.
The first audio signal may include at least one of: an audio signal emitted from the driver's seat position (for example, position (1) as shown in FIG. 2A; the circled numbers in the figures are referred to in this way hereinafter), an audio signal emitted from the front passenger seat position (for example, position (2) as shown in FIG. 2A), an audio signal emitted from a rear seat position in the in-vehicle system (for example, position (3) or position (4) as shown in FIG. 2A), or noise outside the in-vehicle system.
[0037]
S102: Extract M second speech signals from the K first speech signals according to N position
parameters corresponding to N different positions.
M is N or less, and N is an integer of 2 or more.
[0038]
Similarly, the case of an on-board system is used as an example for the purpose of illustration.
The coordinates of the first and second sound collection sensors do not overlap in spatial
position, and there is a specific distance between the first and second sound collection sensors.
As shown in FIG. 2A, the first sound collection sensor and the second sound collection sensor are respectively disposed on the left side and the right side of the central rearview mirror A of the in-vehicle system: the first sound collection sensor at position C and the second sound collection sensor at position B. Thus, the arrival time of an audio signal at the first sound collection sensor differs from its arrival time at the second sound collection sensor. In this case, a phase difference is formed between the audio signal collected by the first sound collection sensor and the audio signal collected by the second sound collection sensor.
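The arrival-time difference described above can be made concrete with a small sketch. The coordinates, the speed of sound, and the seat position below are illustrative assumptions, not values from this embodiment.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s at roughly 20 °C (assumed)

def arrival_delay(source, mic_a, mic_b):
    """Time-difference of arrival (seconds) of a sound between two sensors.

    A positive result means the sound reaches mic_a first.
    Positions are (x, y, z) coordinates in metres.
    """
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return (dist(source, mic_b) - dist(source, mic_a)) / SPEED_OF_SOUND

# Hypothetical geometry: driver's seat left of and behind the mirror centre,
# sensors 0.2 m apart on either side of the mirror.
driver = (-0.4, -0.5, 0.0)
left_mic = (-0.1, 0.0, 0.0)
right_mic = (0.1, 0.0, 0.0)
delay = arrival_delay(driver, left_mic, right_mic)
# Sound from the driver's seat reaches the left sensor slightly earlier,
# so the delay is a small positive number (well under a millisecond).
```

This sub-millisecond delay is precisely the phase difference that the beamforming algorithm exploits to separate signals by position.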
[0039]
In another embodiment of the present invention, as shown in FIG. 2B, the in-vehicle system
includes four sound collection sensors. In this case, K is four. The four sound collection sensors
are disposed at the central position of the in-vehicle system, as shown in FIG. 2B.
[0040]
Specifically, in the step of extracting the M second audio signals from the K first audio signals, the M second audio signals are extracted from the K first audio signals using a beamforming algorithm.
A second voice signal may be extracted from the K first voice signals by filtering out the other voice signals with the beamforming algorithm.
[0041]
For example, the generation position of the audio signal is the position of the driver's seat, and
the corresponding position parameter is a parameter of the position of the driver's seat.
The in-vehicle central control device extracts a second audio signal emitted from the driver's seat
from the K first audio signals according to the position parameter of the driver's seat
corresponding to the driver's seat.
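The extraction step can be illustrated with a minimal delay-and-sum beamformer. This is a sketch only, under the assumption of integer sample delays and synthetic signals; a real in-vehicle system would use fractional delays, adaptive weights, and post-filtering, and the patent does not specify a particular beamforming variant.

```python
import numpy as np

def delay_and_sum(signals, delays_samples):
    """Steer a simple delay-and-sum beamformer toward one position.

    signals: list of 1-D NumPy arrays, one per sound collection sensor.
    delays_samples: per-sensor integer delays (in samples) with which a
    wavefront from the target position arrives at each sensor.
    """
    n = min(len(s) for s in signals)
    out = np.zeros(n)
    for sig, d in zip(signals, delays_samples):
        out += np.roll(sig[:n], -d)  # undo the arrival delay, then sum
    return out / len(signals)

# Two sensors hear the same source with a 3-sample offset plus noise.
rng = np.random.default_rng(0)
src = rng.standard_normal(1000)
mic1 = src + 0.3 * rng.standard_normal(1000)
mic2 = np.roll(src, 3) + 0.3 * rng.standard_normal(1000)
enhanced = delay_and_sum([mic1, mic2], [0, 3])
# Aligning and averaging halves the noise power, so `enhanced` tracks
# the source more closely than either raw sensor signal.
```

Signals from other positions arrive with different delays, add incoherently, and are thereby suppressed, which is how one second audio signal per position parameter is obtained.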
[0042]
S103: The position corresponding to each second audio signal is determined. The in-vehicle
central control device extracts M second speech signals separately from the K first speech signals
according to the N position parameters using a beamforming algorithm.
[0043]
For example, if the position parameter is the driver's seat position parameter, the second audio signal is extracted according to the driver's seat position parameter using a beamforming algorithm, and according to the position parameter corresponding to that second audio signal, the generation position corresponding to the extracted second audio signal is determined to be the driver's seat.
[0044]
The present invention provides a method for determining the location of speech generation, in which M second speech signals are extracted from the K first speech signals according to the position parameters using a beamforming algorithm, and the generation position corresponding to each second speech signal may be determined.
According to this method, voice signals emitted from different positions can be efficiently extracted and voice recognition capability improved, thereby providing the user with a better user experience.
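Steps S101 to S103 can be summarised as a small driver loop. The function names and the dictionary-based position parameters below are assumptions for illustration; the actual beamforming extraction is abstracted behind a callable.

```python
def identify_positions(first_signals, position_params, beamform):
    """Sketch of S101-S103: for each of the N position parameters, try to
    extract a second signal from the K collected first signals and record
    the generation position it corresponds to.

    `beamform(signals, param)` is assumed to return the extracted second
    signal, or None when no speech matches that position parameter.
    """
    results = []
    for param in position_params:
        extracted = beamform(first_signals, param)
        if extracted is not None:
            # The generation position is read off the parameter used
            # for extraction, as in step S103.
            results.append((param["position"], extracted))
    return results

# Stub beamformer: pretend speech was found only at the driver's seat.
def fake_beamform(signals, param):
    return "signal" if param["position"] == 1 else None

params = [{"position": p} for p in (1, 2, 3, 4)]
found = identify_positions(["s1", "s2"], params, fake_beamform)
# found == [(1, "signal")] : M = 1 second signal out of N = 4 positions.
```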
[0045]
FIG. 3 is a flow chart of a method of identifying a sound generation position according to another
embodiment of the present invention. Similarly, in this embodiment of the invention, an
application to a vehicle-mounted system is used as an example for the purpose of illustration. As
shown in FIG. 3, the method includes the following steps.
[0046]
S301a: Set priorities to respond to voice commands from N different positions.
[0047]
Similarly, the position schematic in FIG. 2A is used as an example.
In FIG. 2A, in the in-vehicle system, position (1) is the driver's seat, position (2) is the front passenger seat, position (3) is the rear left seat, and position (4) is the rear right seat.
[0048]
In this embodiment of the invention, an in-vehicle system is used as an example. Assume that K is 2, N is 4, and M is 2.
[0049]
In the in-vehicle central control device of the in-vehicle system, priorities for responding to voice commands from the four different positions are set according to those four positions.
[0050]
For example, the priority of voice commands set in a normal family sedan is used as an example.
[0051]
[0052]
From Table 1, when commands such as "open sunroof", "close sunroof", "turn on radio", or "play music" are issued from position (1), the commands issued from position (1) have higher priority than commands with the same meaning issued from other positions.
[0053]
In another embodiment of the present invention, when priorities for responding to voice commands from the N different locations are set, a distinction between children's voices and adults' voices is added.
The priority of a voice command in a child's voice is set to low, or, if the voice command is in a child's voice, the command is blocked entirely.
The priority of a voice command in an adult's voice is set to high.
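The priority scheme just described can be sketched as follows. The priority values and the child-voice handling are illustrative assumptions; the patent leaves the concrete table to Table 1 and to configuration.

```python
# Hypothetical priority table (smaller number = higher priority); the
# positions follow FIG. 2A: (1) driver, (2) front passenger, (3)/(4) rear.
POSITION_PRIORITY = {1: 0, 2: 1, 3: 2, 4: 2}

def response_order(commands, child_positions=()):
    """Order recognised commands so higher-priority positions respond first.

    commands: list of (position, command_text) tuples.
    child_positions: positions whose speaker was judged to be a child;
    their commands are blocked, as the variant above describes.
    """
    allowed = [c for c in commands if c[0] not in child_positions]
    return sorted(allowed, key=lambda c: POSITION_PRIORITY[c[0]])

order = response_order([(4, "turn off air conditioner"),
                        (1, "turn on air conditioner")])
# The driver's-seat command is served first.
```

Because `sorted` is stable, commands from equal-priority positions keep their arrival order, which is a reasonable tie-breaking choice here.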
[0054]
In this embodiment of the present invention, the case where the command "turn on air conditioner" is emitted from position (1) and the command "turn off air conditioner" is simultaneously emitted from position (4) is used as an example.
[0055]
S301: Collect K first audio signals.
[0056]
In this embodiment of the invention, the case where K is 2 is used for the purpose of illustration.
[0057]
In the in-vehicle system, the first sound collection sensor and the second sound collection sensor
are respectively installed on the left side and the right side of the central rearview mirror A.
[0058]
The first sound collection sensor and the second sound collection sensor simultaneously collect
the first audio signal.
Optionally, in the in-vehicle system, another sound collection sensor may be further installed at a
rear seat in the vehicle or at another position in the vehicle.
[0059]
For example, when an audio signal of the command "turn on air conditioner" is emitted from position (1) and, at the same time, an audio signal of the command "turn off air conditioner" is emitted from position (4), the first sound collection sensor and the second sound collection sensor simultaneously collect the audio signal of the "turn on air conditioner" command emitted from position (1).
Similarly, the first sound collection sensor and the second sound collection sensor simultaneously collect the audio signal of the "turn off air conditioner" command emitted from position (4).
[0060]
S302: Extract M second speech signals from the K first speech signals according to N position
parameters corresponding to N different positions.
M is N or less, and N is an integer of 2 or more.
[0061]
In this embodiment of the invention, the case where N is 4 and M is 2 is used for the purpose of illustration.
[0062]
The coordinates of the first and second sound collection sensors do not overlap in spatial
position, and there is a specific distance between the first and second sound collection sensors.
Thus, the time of the audio signal collected by the first sound collection sensor is different from
the time of the audio signal collected by the second sound collection sensor.
In this case, a phase difference is formed between the audio signal collected by the first sound
collection sensor and the audio signal collected by the second sound collection sensor.
[0063]
In the present invention, an example in which the first sound collection sensor and the second sound collection sensor are disposed on the left side and the right side of the central rearview mirror, respectively, is used.
The present invention limits neither the number of sound collection sensors nor their positions.
For example, another sound collection sensor may be further disposed around a location where sound can be generated, for example, behind the seat at position (1) or position (2) shown in FIG. 2A.
[0064]
For example, the in-vehicle central control device extracts the second audio signal emitted from position (1) according to the preset position parameter of position (1). Using a beamforming algorithm, the in-vehicle central control device extracts the second audio signal emitted from position (1) from the collected first audio signals according to the preset position parameter of position (1).
[0065]
At the same time, the in-vehicle central control device extracts the second audio signal emitted
from position (4) according to the preset position parameter of position (4). The in-vehicle
central control device extracts a second audio signal emitted from position (4) from the acquired
first audio signal according to the preset position parameters of position (4) using a
beamforming algorithm.
[0066]
For example, the in-vehicle central control device uses a beamforming algorithm to extract an audio signal that matches the preset position parameter of position (1) according to the position parameter of position (1); for example, the "turn on air conditioner" sound signal emitted from position (1) is collected. Likewise, the in-vehicle central control device uses a beamforming algorithm to extract an audio signal that matches the preset position parameter of position (4) according to the position parameter of position (4); for example, the "turn off air conditioner" sound signal emitted from position (4) is collected.
[0067]
S303: The position corresponding to each second audio signal is determined.
[0068]
The in-vehicle central control device extracts two second audio signals separately from the two
first audio signals according to the four position parameters using a beamforming algorithm.
[0069]
For example, if the position parameter is the position parameter of position (1), the second audio signal emitted from position (1) is extracted according to the position parameter of position (1) using a beamforming algorithm, and according to the position parameter corresponding to that second audio signal, the generation position corresponding to the extracted second audio signal is determined to be position (1).
[0070]
S304: Implement speech recognition on the M extracted second speech signals.
[0071]
The on-board central control device performs speech recognition on the extracted speech signal
and recognizes the extracted speech signal.
[0072]
For example, the on-vehicle central control device performs voice recognition on the voice signal
extracted from the position (1), and recognizes that the extracted voice signal is "turn on air
conditioner".
The in-vehicle central control device performs voice recognition on the voice signal extracted
from the position (4), and recognizes that the extracted voice signal is "turn off air conditioner".
[0073]
S305: Acquire voice commands corresponding to the M second voice signals.
[0074]
The in-vehicle central control device obtains a voice command corresponding to the M extracted
second voice signals.
[0075]
For example, the on-vehicle central control device obtains a voice command corresponding to the
voice signal emitted from the extracted position (1), and obtains a voice command "turn on air
conditioner".
The in-vehicle central control device obtains a voice command corresponding to the voice signal
emitted from the extracted position (4), and obtains a voice command "turn off air conditioner".
[0076]
S306: Respond to M voice commands.
[0077]
The in-vehicle central control device responds to the M voice commands in accordance with the
obtained voice commands corresponding to the M extracted second voice signals.
[0078]
For example, after obtaining the voice command "turn on air conditioner" issued from position
(1), the on-board central control device responds to the voice command to turn on the air
conditioner.
[0079]
In another embodiment of the present invention, the in-vehicle central control device performs speech recognition on the speech signal extracted from position (1) and the speech signal extracted from position (4), and recognizes both extracted signals.
A voice command corresponding to the voice signal extracted from position (1) is acquired, and a voice command corresponding to the voice signal extracted from position (4) is acquired.
For example, the voice command "turn on air conditioner" issued from position (1) and the voice command "turn off air conditioner" issued from position (4) are acquired.
According to the acquired voice command "turn on air conditioner" from position (1) and the acquired voice command "turn off air conditioner" from position (4), the in-vehicle central control device responds to the two voice commands.
Optionally, when voice commands from two locations are obtained by voice recognition, the in-vehicle central control device can preferentially respond to the higher-priority voice command according to the priorities of the two different locations corresponding to the two voice commands.
For example, the priority of position (1) is higher than the priority of position (4).
The in-vehicle central control device preferentially responds to the voice command "turn on air conditioner" from position (1) and turns on the air conditioner.
In this case, the voice command from position (1), to which the in-vehicle central control device has responded, is "turn on air conditioner", while the voice command from position (4) is "turn off air conditioner". The voice command from position (1) and the voice command from position (4) therefore conflict with each other, and the in-vehicle central control device cannot respond to both. Accordingly, after performing voice recognition on the voice signal from position (4) and acquiring the corresponding voice command, the in-vehicle central control device does not respond to the voice command from position (4). Command conflicts are handled in a priority fashion; this prevents the in-vehicle central control device from being unable to respond correctly when faced with multiple conflicting commands, and errors caused by such response failures are reduced.
[0080]
Conflicting commands are specifically defined as follows: if at least two commands use the
same resource, and different actions are performed on that same resource while the at least
two commands are executed, then the at least two commands are conflicting commands.
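As an illustrative sketch only (not part of the claimed embodiments), the conflict definition above, together with the priority-based response described earlier, can be expressed as follows. The `VoiceCommand` fields and the resource and action names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class VoiceCommand:
    resource: str   # e.g. "air_conditioner" (hypothetical name)
    action: str     # e.g. "on" or "off"
    priority: int   # higher value means higher priority

def conflicts(a: VoiceCommand, b: VoiceCommand) -> bool:
    # Two commands conflict when they use the same resource
    # but perform different actions on that resource.
    return a.resource == b.resource and a.action != b.action

def respond(commands):
    """Respond to commands in priority order, dropping any command
    that conflicts with an already-executed higher-priority one."""
    executed = []
    for cmd in sorted(commands, key=lambda c: c.priority, reverse=True):
        if any(conflicts(cmd, done) for done in executed):
            continue  # ignore the lower-priority conflicting command
        executed.append(cmd)
    return executed
```

With the example from the text, "turn on air conditioner" from position (1) (higher priority) would be executed, and the conflicting "turn off air conditioner" from position (4) would be ignored.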
[0081]
In another embodiment of the present invention, when two acquired voice commands conflict
with each other, a time-based decision element is added.
If the in-vehicle central control device recognizes a conflicting command within a preset
time T1 after a high-priority command has been recognized, and the recognized conflicting
command has a relatively low priority, the command with the relatively low priority is
ignored. If the in-vehicle central control device recognizes the conflicting command after
the preset time T1 has elapsed, the in-vehicle central control device responds to the
acquired voice commands in the time sequence in which they were recognized.
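A minimal sketch of this time-window rule, under the assumption that within T1 a lower-priority conflicting command is ignored and outside T1 commands are handled in recognition order (the class and parameter names are hypothetical, not from the source):

```python
class ConflictArbiter:
    """Sketch of the preset-time-T1 rule for conflicting voice commands."""

    def __init__(self, t1: float):
        self.t1 = t1     # preset time window T1, in seconds
        self.last = None  # (time, resource, action, priority) of last responded command

    def should_respond(self, now: float, resource: str,
                       action: str, priority: int) -> bool:
        """Return True if the newly recognized command should be responded to."""
        if self.last is not None:
            t0, r0, a0, p0 = self.last
            conflicting = resource == r0 and action != a0
            if conflicting and (now - t0) <= self.t1 and priority < p0:
                # Within T1: ignore the lower-priority conflicting command.
                return False
        # Outside T1, or no conflict: respond in recognition order.
        self.last = (now, resource, action, priority)
        return True
```

For example, with T1 = 5 s, a conflicting lower-priority command recognized 3 s after a high-priority one is ignored, while the same command recognized 10 s later is responded to.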
[0082]
FIG. 3A is a flow chart of a method of identifying a sound generation position according to
another embodiment of the present invention. In another embodiment of the present invention,
the following steps may be performed before step S301 is performed.
[0083]
S401: It is determined whether at least one seat in the in-vehicle system is occupied.
[0084]
Specifically, the onboard system may determine by gravity sensing whether a seat in the onboard
system is occupied.
[0085]
For example, gravity sensing determines whether a seat in the in-vehicle system of FIG. 2A is
occupied.
For example, it is determined whether position (1), position (2), position (3), or position (4) in
FIG. 2A is occupied.
[0086]
If the in-vehicle system determines that none of the seats in the in-vehicle system are occupied,
step S301 is not performed.
[0087]
If the in-vehicle system determines that at least one seat of the in-vehicle system is occupied, step
S301 is performed.
[0088]
Before the audio signal is collected, it is first determined whether at least one seat in
the in-vehicle system is occupied.
The location of voice generation is identified only when a seat in the in-vehicle system is
occupied, which improves the efficiency of both voice collection and determination of the
voice generation location.
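A sketch of this gating step, assuming per-seat weight readings from the gravity sensors; the threshold value and function names are hypothetical illustrations, not taken from the source:

```python
# Assumed occupancy threshold for the gravity (weight) sensor reading.
OCCUPANCY_THRESHOLD_KG = 20.0

def occupied_seats(seat_weights: dict) -> list:
    """Return the seat positions, e.g. (1)-(4) as in FIG. 2A, whose
    measured weight suggests the seat is occupied."""
    return [seat for seat, kg in seat_weights.items()
            if kg >= OCCUPANCY_THRESHOLD_KG]

def maybe_collect_audio(seat_weights: dict, collect):
    """Perform the collection step (S301) only if at least one seat
    in the in-vehicle system is occupied; otherwise skip it."""
    seats = occupied_seats(seat_weights)
    if not seats:
        return None  # no seat occupied: step S301 is not performed
    return collect(seats)
```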
[0089]
In another embodiment of the present invention, as shown in FIG. 3B, after step S305 is
performed, step S305a may be performed: recognize the voiceprints of the M extracted second
voice signals.
[0090]
S305b: Measure the weight of the user on the occupied seat in the in-vehicle system.
[0091]
S305c: Determine the identity of the user based on the measured weight of the user and the
recognized voiceprint of the second speech signal.
[0092]
S305d: Determine the priority of the voice command corresponding to the second voice signal
issued by the user according to the determined identity of the user.
[0093]
S305e: Respond to the voice command corresponding to the second voice signal according to the
priority of the voice command corresponding to the second voice signal issued by the user.
[0094]
Combining gravity sensing with voiceprint recognition determines the user's identity and the
priority of the voice commands corresponding to the voice signals issued by the user.
Based on the priorities of the voice commands corresponding to the voice signals issued by
the users, the order in which the in-vehicle central control device responds to the plurality
of voice commands is determined, which reduces the errors and mistakes that occur in the
in-vehicle central control device when it needs to respond to a plurality of voice commands.
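Steps S305c and S305d can be sketched as follows, assuming enrolled user profiles that pair a known weight with a registered voiceprint; the profile data, tolerance, and names are hypothetical:

```python
# Hypothetical enrolled profiles: known weight (kg), voiceprint id, priority.
PROFILES = {
    "driver":    {"weight": 70.0, "voiceprint": "vp_driver",    "priority": 2},
    "passenger": {"weight": 55.0, "voiceprint": "vp_passenger", "priority": 1},
}

def identify_user(measured_weight: float, recognized_voiceprint: str,
                  tolerance_kg: float = 5.0):
    """S305c sketch: determine the identity by matching both the weight
    measured on the occupied seat and the recognized voiceprint."""
    for name, p in PROFILES.items():
        if (abs(p["weight"] - measured_weight) <= tolerance_kg
                and p["voiceprint"] == recognized_voiceprint):
            return name
    return None

def command_priority(user) -> int:
    """S305d sketch: derive the voice-command priority from the identity."""
    return PROFILES[user]["priority"] if user else 0
```

A user is identified only when both cues agree, which is why the combined style is more robust than either cue alone.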
[0095]
The present invention provides a method for determining the location of speech generation,
in which M second speech signals are extracted from the K first speech signals according to
position parameters using a beamforming algorithm, and the generation position corresponding
to each second speech signal is determined.
In addition, voice commands are prioritized, and conflicting commands are processed by
responding preferentially to higher-priority commands. This prevents the in-vehicle central
control device from responding to multiple conflicting commands, reduces contention caused by
incorrect recognition and errors caused by incorrect responses, and thereby improves the user
experience.
[0096]
FIG. 4 shows a terminal device 400 according to one embodiment of the present invention.
Terminal device 400 is configured to implement the aforementioned method of embodiments of
the present invention.
As shown in FIG. 4, the terminal device 400 may be a terminal device such as a mobile phone,
a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sale) terminal, or an
in-vehicle central control terminal device.
The terminal device 400 includes components such as an RF (Radio Frequency) circuit 410, a
memory 420, an input device 430, a display device 440, a sensor 450, an audio circuit 460, a
Wireless Fidelity (WiFi) module 470, a processor 480, and a power supply 490.
Those skilled in the art will appreciate that the configuration of the terminal device shown
in FIG. 4 is merely an example of an implementation and does not limit the terminal device;
the terminal device may include more or fewer components than those shown in the figure, may
combine some components, or may have a different component arrangement.
[0097]
The RF circuit 410 may be configured to receive and transmit signals during information
reception and transmission or during a call.
Specifically, the RF circuit 410 receives downlink information from a base station, delivers
the downlink information to the processor 480 for processing, and transmits related uplink
data to the base station. In general, the RF circuit 410 includes, but is not limited to, an
antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), and a
duplexer. In addition, the RF circuit 410 may further communicate with a network and other
terminal devices by wireless communication. The wireless communication may use any
communication standard or protocol, including but not limited to GSM (Global System for
Mobile Communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple
Access), WCDMA (registered trademark) (Wideband Code Division Multiple Access), LTE (Long
Term Evolution), e-mail, and SMS (Short Messaging Service).
[0098]
The memory 420 may be configured to store software programs and modules, and the processor
480 executes the software programs and modules stored in the memory 420 to implement the
various functional applications and data processing of the terminal device 400. The memory
420 mainly includes a program storage area and a data storage area, where the program storage
area stores an operating system and an application program required by at least one function
(such as an audio playback function or an image display function), and the data storage area
may store data (such as voice data and a phone book) created according to the use of the
terminal device 400. In addition, the memory 420 may include a high-speed random access
memory, and may further include at least one magnetic disk storage, a non-volatile memory
such as a flash storage, or another non-volatile solid-state storage.
[0099]
The display device 440 may be configured to display information entered by the user or
information provided for the user, and various menus of the terminal device 400. The display
device 440 may include a display panel 441. Optionally, the display panel 441 is configured
in a form such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
Furthermore, the display panel 441 may be covered by the touch panel 431. After detecting a
touch operation on or near the touch panel 431, the touch panel 431 transmits the touch
operation to the processor 480 to determine the type of the touch event. The processor 480
then provides corresponding visual output on the display panel 441 according to the type of
the touch event. In FIG. 4, the touch panel 431 and the display panel 441 are used as two
independent components to implement the input function and the output function of the
terminal device 400. However, in some embodiments, the touch panel 431 and the display panel
441 may be integrated to implement the input and output functions of the terminal device 400.
For example, the touch panel 431 and the display panel 441 may be integrated into a touch
screen that performs the input function and the output function of the terminal device 400.
[0100]
The terminal device 400 may further include at least one sensor 450, such as a light sensor,
a motion sensor, or another sensor. Specifically, the light sensor may include an ambient
light sensor and a proximity sensor. The ambient light sensor may adjust the luminance of the
display panel 441 according to the brightness of the ambient light, and the proximity sensor
may switch off the display panel 441 or the backlight when the terminal device 400 is moved
close to the ear. As one type of motion sensor, an acceleration sensor can detect the
magnitude of acceleration in various directions (usually three axes), and can detect the
magnitude and direction of gravity when the terminal device 400 is at rest; it can therefore
be applied to applications that recognize the posture of the mobile phone (such as switching
between landscape and portrait orientation, related games, and magnetometer posture
calibration) and to functions related to vibration recognition (such as a pedometer and
tapping). Other sensors, such as a gyroscope, a barometer, a hygrometer, a thermometer, and
an infrared sensor, may be further configured on the terminal device 400; details are not
described herein.
[0101]
Audio circuit 460, loudspeaker 461, and microphone 462 may provide an audio interface
between the user and terminal device 400. Audio circuit 460 may convert the received audio
data into an electronic signal and transmit the electronic signal to loudspeaker 461. Loudspeaker
461 converts the electronic signal into an audio signal for output. Meanwhile, the microphone
462 converts the collected audio signal into an electronic signal. Audio circuit 460 receives the
electronic signal, converts the electronic signal to audio data, and outputs the audio data to
processor 480 for processing. The processor 480 then transmits the audio data to another
mobile phone, for example, using the RF circuit 410 or outputs the audio data to the memory
420 for further processing.
[0102]
The terminal device 400 may use the WiFi module 470 to help the user send and receive e-mail,
browse web pages, and access streaming media. The WiFi module 470 provides the user with
wireless broadband Internet access. Although FIG. 4 shows the WiFi module 470, it can be
understood that the WiFi module 470 is not a necessary component of the terminal device 400
and may be omitted as required, provided that this does not depart from the scope of the
essence of the present invention.
[0103]
The processor 480 is the control center of the terminal device 400, and connects all parts of
the entire mobile phone using various interfaces and lines. The processor 480 performs the
various functions of the terminal device 400 and processes data by running or executing the
programs, or modules, or both stored in the memory 420 and by calling data stored in the
memory 420, thereby performing overall monitoring of the terminal device. Optionally, the
processor 480 may include one or more processing units. Preferably, the processor 480 may
integrate an application processor and a modem processor, where the application processor
mainly handles an operating system, a user interface, an application program, and the like,
and the modem processor mainly handles wireless communication. It can be understood that the
aforementioned modem processor need not be integrated into the processor 480. Specifically,
the processor 480 may be a central processing unit (CPU).
[0104]
The terminal device 400 may further include a power supply 490 (for example, a battery) that
supplies power to the various components. Preferably, the power supply may be logically
connected to the processor 480 using a power management system, so that functions such as
charging, discharging, and power consumption management are implemented using the power
management system.
[0105]
In this embodiment of the present invention, the terminal device 400 includes K sound collection
sensors 450 and a processor 480, and has the following functions.
[0106]
The sound collection sensor 450 is configured to collect K first audio signals, where K is an
integer of 2 or more.
[0107]
Specifically, the coordinates of the K sound collection sensors in the three-dimensional space are
different.
[0108]
The processor 480 extracts the M second audio signals from the K first audio signals according
to the N position parameters corresponding to the N different positions, and determines the
position corresponding to each second audio signal. M is less than or equal to N, and N is an
integer greater than or equal to two.
[0109]
In another embodiment of the present invention, the processor 480 configured to determine the
position corresponding to each second audio signal is specifically configured to determine,
according to the position parameter corresponding to the L-th second audio signal, the
position L corresponding to the L-th second audio signal, where the L-th second audio signal
is any one of the M second audio signals.
[0110]
In another embodiment of the present invention, after extracting the M second speech signals
from the K first speech signals, the processor 480 is further configured to perform speech
recognition on the M extracted second speech signals to obtain M voice commands corresponding
to the M second speech signals.
[0111]
In another embodiment of the present invention, the terminal device 400 further includes an
output device 510, and the output device 510 is configured to respond to the M voice commands
after the processor has acquired the M voice commands corresponding to the M second voice
signals.
[0112]
The output device 510 configured to respond to the M voice commands is specifically an output
device configured to respond preferentially to higher-priority commands according to the
priorities of the M different locations corresponding to the M voice commands.
[0113]
In this embodiment of the present invention, the output device 510 may specifically be an audio
circuit 460 or a display device 440.
[0114]
In an embodiment of the present invention, a method and terminal device for identifying a sound
generation position are provided.
M second audio signals may be extracted from the K first audio signals in accordance with the
position parameters using a beamforming algorithm, and the occurrence position corresponding
to each second audio signal may be determined.
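The source does not specify which beamforming algorithm is used; as a minimal illustrative sketch, a delay-and-sum beamformer can steer the K collected first audio signals toward one candidate position given the sensor coordinates (all names and the sampling setup below are assumptions, not from the source):

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed propagation speed

def delay_and_sum(signals, mic_coords, source_pos, fs):
    """Minimal delay-and-sum beamformer sketch.

    signals:    (K, T) array of the K first audio signals
    mic_coords: (K, 3) array of sound-collection-sensor coordinates,
                which differ in three-dimensional space
    source_pos: (x, y, z) position parameter being steered toward
    fs:         sampling rate in Hz
    """
    k, t = signals.shape
    dists = np.linalg.norm(mic_coords - np.asarray(source_pos, float), axis=1)
    # Advance each channel so sound emitted at source_pos lines up in time.
    delays = (dists - dists.min()) / SPEED_OF_SOUND
    shifts = np.round(delays * fs).astype(int)
    out = np.zeros(t)
    for i in range(k):
        out[: t - shifts[i]] += signals[i, shifts[i]:]
    return out / k  # aligned channels add coherently for this position
```

Evaluating this for each of the N position parameters and keeping the outputs with significant energy would yield the M second audio signals, each already associated with its generation position.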
According to this method, voice signals emitted from different locations can be efficiently
extracted, and voice recognition ability can be provided, thereby providing the user with a better
user experience.
[0115]
One of ordinary skill in the art may realize that, in combination with the embodiments
disclosed herein, the units, algorithms, and method steps may be implemented by a combination
of computer software and electronic hardware.
Whether a function is performed by hardware or software depends on the specific application
and the design constraints of the technical solution.
Those skilled in the art may use different methods to implement the described functions for
each particular application, but the implementation should not be considered to be beyond the
scope of the present invention.
[0116]
It can be clearly understood by those skilled in the art that, for the purpose of brief and
simple description, for the specific working processes of the above-mentioned mobile terminal
and access terminal in the method embodiments of the present invention, reference may be made
to the corresponding processes in the above-described method embodiments; details are not
described again here.
[0117]
In some embodiments provided in the present application, the disclosed server and method may
be implemented in other manners.
For example, the described server embodiment is merely an example.
For example, unit division is merely logical function division and may be other division in actual
implementation.
For example, multiple units or components may be combined or integrated into another system,
or some functions may be ignored or not performed.
Further, the displayed or discussed mutual coupling or direct coupling or communication
connection may be implemented using several interfaces.
The indirect coupling or communication connection between the devices or units may be
implemented electronically, mechanically or in other forms.
[0118]
The units described as separate parts may or may not be physically separate, and the parts
displayed as units may or may not be physical units; they may be located in one location or
distributed on multiple network units.
Some or all of the units may be selected according to actual needs to achieve the objectives
of the solutions of the embodiments of the present invention.
[0119]
In addition, the functional units in the embodiments of the present invention may be
integrated into one processing unit, or each of the units may exist alone physically, or two
or more units may be integrated into one unit.
[0120]
Those skilled in the art will understand that all or part of the steps of the method embodiments
may be implemented by a program instructing relevant hardware.
The program may be stored on a computer readable storage medium. When the program is run,
the steps of the method embodiment are performed. The storage medium includes any medium
that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
[0121]
The above description is only a specific embodiment of the present invention, but does not limit
the protection scope of the present invention. Any variation or substitution made by a person
skilled in the art without departing from the technical scope described in the present invention
shall fall within the protection scope of the present invention. Accordingly, the protection scope
of the present invention shall be subject to the protection scope of the claims.
[0122]
410 RF circuit 490 power supply 420 memory 430 input device 431 touch panel 432 other
input device 470 WiFi module 460 audio circuit 461 loudspeaker 462 microphone 450 sensor
510 output device 440 display device 441 display panel