Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2013030946
Abstract: An object of the present invention is to perform data aggregation efficiently in a sensor network system and to significantly reduce network traffic. SOLUTION: In a sensor network system in which a plurality of nodes having known position information are interconnected on a network via propagation paths, are synchronized in time, and collect the data measured at each node at a single base station, the base station calculates the position of the signal source based on the angle estimates of the signals from the nodes and the position information of the nodes, designates the node closest to the signal source as a cluster head node, and transmits the position of the signal source and the cluster head node information to each node. The nodes located within a given hop count of each cluster head node are clustered as nodes belonging to that cluster, and each node belonging to a designated cluster emphasizes the signal received by its sensor array and transmits it to the base station. [Selected figure] Figure 25
Sensor network system and communication method thereof
[0001]
The present invention relates to a sensor network system such as a microphone array network
system for high-quality sound acquisition, and a communication method thereof.
[0002]
Conventionally, in application systems that use voice (for example, voice conference systems connecting a plurality of microphones, robot systems that recognize voice, and systems including various voice interfaces), techniques such as sound source localization, sound source separation, noise removal, and echo cancellation are used to obtain high-quality voice. In particular, microphone arrays, which mainly perform sound source localization and sound source separation, have been widely studied for the purpose of obtaining high-quality sound. Here, sound source localization specifies the direction and position of a sound source from differences in the arrival time of sound and the like. Sound source separation uses the result of sound source localization to remove sound sources that constitute noise and to extract a specific sound source in a specific direction.
[0003]
In audio processing using a microphone array, it is generally known that performance in tasks such as noise removal improves as the number of microphones increases. In such voice processing, many sound source localization methods that estimate the position information of the sound source exist (see, for example, Non-Patent Document 1), and the more accurate the sound source localization result, the more effective the audio processing. That is, increasing the number of microphones is required to simultaneously achieve highly accurate sound source localization and the noise removal needed for high sound quality.
[0004]
In sound source localization using a conventional large-scale microphone array, the position range of the sound source is divided into a mesh, and the sound source position is determined probabilistically for each section. In this calculation, all voice data are collected in a single voice processing server such as a workstation and processed collectively to estimate the position of the sound source (see, for example, Non-Patent Document 2). With such collective processing of all voice data, the signal wiring length between the sound-collecting microphones and the voice processing server, the amount of communication, and the amount of computation in the voice processing server become enormous. The number of microphones cannot be increased because of the wiring length, the amount of communication, the increase in the amount of computation in the voice processing server, and the physical limitation that many A/D converters cannot be arranged at a single location in the voice processing server. There is also the problem of noise caused by the lengthened signal wiring. It has therefore been difficult to increase the number of microphones in pursuit of high sound quality.
[0005]
As a method for solving this problem, a voice processing system is known in which a plurality of microphones are divided into small arrays that are then integrated (see, for example, Non-Patent Document 3). However, even in such a voice processing system, network communication traffic increases because the voice data of all microphones acquired by the small arrays are integrated into a single voice server via the network. In addition, the increase in the amount of communication data and traffic causes delays in the voice processing.
[0006]
Also, in the future, more microphones will be required to meet the sound collection requirements of ubiquitous systems and video conference systems (see, for example, Patent Document 1). However, as described above, in current microphone array network systems, the audio data obtained by the microphone arrays are merely transferred to the server as they are. There is no system in which the nodes of the microphone array mutually exchange sound source position information to reduce the calculation amount of the entire system and the communication amount on the network. Therefore, assuming an increase in the scale of microphone array network systems, it is important to reduce both the amount of calculation of the entire system and the amount of communication on the network.
[0007]
As described above, it is required to improve sound source localization accuracy and to perform voice processing such as noise removal effectively by using a large number of microphone arrays, while suppressing the amount of communication and calculation in the voice processing server. Further, position measurement systems using sound sources have recently been proposed. For example, Patent Document 2 discloses calculating the position of an ultrasound tag using the tag and a microphone array. Further, Patent Document 3 discloses performing sound collection using a microphone array.
[0008]
Patent Document 1: Japanese Patent Application Publication No. 2008-113164
Patent Document 2: International Publication No. 2008/026643
Patent Document 3: Japanese Patent Application Publication No. 2008-058342
[0009]
R. O. Schmidt, "Multiple Emitter Location and Signal Parameter Estimation", In Proceedings of the RADC Spectrum Estimation Workshop, pp. 243-248, October 1979.
E. Weinstein et al., "LOUD: A 1020-node modular microphone array and beamformer for intelligent computing spaces", MIT/LCS Technical Memo MIT-LCS-TM-642, April 2004.
A. Brutti et al., "Classification of Acoustic Maps to Determine Speaker Position and Orientation from a Distributed Microphone Network", In Proceedings of ICASSP, Vol. IV, pp. 493-496, April 2007.
Wendi Rabiner Heinzelman et al., "Energy-Efficient Communication Protocol for Wireless Microsensor Networks", Proceedings of the 33rd Hawaii International Conference on System Sciences, Vol. 8, pp. 1-10, January 2000.
Vivek Katiyar et al., "A Survey on Clustering Algorithms for Heterogeneous Wireless Sensor Networks", International Journal of Advanced Networking and Applications, Vol. 02, Issue 04, pp. 745-754, 2011.
J. Benesty et al., "Handbook of Speech Processing", Springer, 2007.
F. Asano et al., "Sound Source Localization and Signal Separation for Office Robot "Jijo-2"", Proceedings of IEEE MFI, pp. 243-248, 1999.
M. Maroti et al., "The Flooding Time Synchronization Protocol", Proceedings of 2nd ACM SenSys, pp. 39-49, 2004.
T. Takeuchi et al., "Cross-Layer Design for Wireless Sensor Node Using Wave Clock", IEICE Transactions on Communications, Vol. E91-B, No. 11, pp. 3480-3488, November 2008.
Maleq Khan et al., "Distributed Algorithms for Constructing Approximate Minimum Spanning Trees in Wireless Networks", IEEE Transactions on Parallel and Distributed Systems, Vol. 20, No. 1, pp. 124-139, January 2009.
W. Ye et al., "Medium Access Control with Coordinated Adaptive Sleeping for Wireless Sensor Networks", IEEE/ACM Transactions on Networking, Vol. 12, No. 3, pp. 493-506, 2004.
[0010]
However, with the position measurement functions of the GPS or WiFi systems installed in many mobile terminals, a rough position on a map can be acquired, but the positional relationship between terminals at short range, on the order of several tens of centimeters, cannot be acquired.
[0011]
For example, Non-Patent Document 4 discloses a communication protocol for performing wireless communication in a wireless sensor network while using transmission energy efficiently. In addition, Non-Patent Document 5 discloses the use of clustering technology as a method for reducing energy consumption in a wireless sensor network in order to extend the life of the sensor network.
[0012]
However, the clustering methods of the prior art are limited to the network layer and do not take into consideration the sensing target (application layer) or the hardware configuration of the nodes. For this reason, the conventional methods are not suited to applications that require route construction based on the actual physical position of the signal source.
[0013]
An object of the present invention is to solve the above problems and to provide, for sensor network systems such as microphone array network systems, a sensor network system that can aggregate data more efficiently than the prior art and significantly reduce network traffic. Another object of the present invention is to provide a sensor network system capable of reducing the power consumption of the sensor nodes, and a communication method thereof.
[0014]
A sensor network system according to the present invention is a sensor network system in which a plurality of nodes having known position information, each comprising a sensor array, are interconnected on a network via a predetermined propagation path using a predetermined communication protocol and are synchronized in time, and in which the data measured at each node are collected at one base station. Each of the nodes comprises: a sensor array in which a plurality of sensors are arranged in an array; a direction estimation processing unit which, when a signal from a predetermined signal source received by the sensor array is detected, transmits a detection message to the base station, estimates the angle of the arrival direction of the signal, and transmits the estimated angle value to the base station, or which is activated in response to an activation message received from another node within a predetermined number of hops at the time of signal detection, estimates the angle of the arrival direction of the signal, and transmits the estimated angle value to the base station; and a communication processing unit which, at each node belonging to a cluster designated by the base station, performs emphasis processing on the signal from the predetermined signal source received by the sensor array and transmits the emphasized signal to the base station. The base station calculates the position of the signal source based on the estimated angle values of the signal from the nodes and the position information of the nodes, designates the node closest to the signal source as a cluster head node, and, by transmitting the position of the signal source and the information of the designated cluster head node to the nodes, clusters the nodes located within the hop count of each cluster head node as nodes belonging to that cluster. Each node belonging to a cluster designated by the base station in correspondence with the signal source then performs emphasis processing on the signal from the predetermined signal source received by its sensor array and transmits the emphasized signal to the base station.
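As a minimal illustration (not part of the patent text) of the base-station behavior described above, the following Python sketch designates a cluster head and its members once the signal source position has been estimated from the nodes' angle estimates; the function and variable names, the hop-distance table, and the hop budget are hypothetical assumptions.

```python
import math

def designate_cluster(nodes, source, hop_dist, max_hops=2):
    """nodes: {node_id: (x, y)} known node positions; source: estimated (x, y)
    position of the signal source; hop_dist[a][b]: hop count from a to b.
    Returns the cluster head (the node nearest the source) and the members,
    i.e., the nodes within max_hops of the head."""
    head = min(nodes, key=lambda n: math.dist(nodes[n], source))
    members = {n for n in nodes if hop_dist[head][n] <= max_hops}
    return head, members
```

The base station would then broadcast the source position and the cluster head information to the nodes, and the member nodes would perform the emphasis processing and transmit the result toward the base station.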
[0015]
In the sensor network system, before detecting the signal or receiving the activation message, each node is set to a sleep mode in which power supply is stopped to all circuits other than the circuit that detects the signal and receives the activation message.
[0016]
Further, in the above sensor network system, the sensor is a microphone for detecting voice.
[0017]
A communication method of a sensor network system according to the present invention is a communication method for a sensor network system in which a plurality of nodes having known position information, each comprising a sensor array, are interconnected on a network via a predetermined propagation path using a predetermined communication protocol and are synchronized in time, and in which the data measured at each node are collected at one base station. Each of the nodes comprises: a sensor array in which a plurality of sensors are arranged in an array; a direction estimation processing unit which, when a signal from a predetermined signal source received by the sensor array is detected, transmits a detection message to the base station, estimates the angle of the arrival direction of the signal, and transmits the estimated value to the base station, or which is activated in response to an activation message received within a predetermined number of hops at the time of signal detection, estimates the angle of the arrival direction of the signal, and transmits the estimated angle value to the base station; and a communication processing unit which, at each node belonging to a cluster designated by the base station, performs emphasis processing on the signal from the predetermined signal source received by the sensor array and transmits the emphasized signal to the base station. The method comprises the steps of: the base station calculating the position of the signal source based on the estimated angle values of the signal from the nodes and the position information of the nodes, designating the node closest to the signal source as a cluster head node, and transmitting the position of the signal source and the information of the designated cluster head node to the nodes, thereby clustering the nodes located within the hop count of each cluster head node as nodes belonging to that cluster; and each node belonging to a cluster designated by the base station in correspondence with the signal source performing emphasis processing on the signal from the predetermined signal source received by its sensor array and transmitting the emphasized signal to the base station.
[0018]
Further, the communication method of the sensor network system may further comprise a step of, before each node detects the signal or receives the activation message, setting the node to a sleep mode and stopping power supply to circuits other than the circuit that detects the signal and receives the activation message.
[0019]
Furthermore, in the communication method of the sensor network system, the sensor is a
microphone that detects voice.
[0020]
Therefore, according to the sensor network system and the communication method of the present invention, the signals to be sensed are used for clustering, cluster head determination, and routing on the sensor network, and a network route specialized for data aggregation is constructed in correspondence with the physical arrangement of a plurality of signal sources; redundant routes can thus be reduced and, at the same time, the efficiency of data aggregation can be enhanced.
In addition, since the communication overhead for route construction is small, network traffic can be reduced, and the operation time of the communication circuit, which consumes a large amount of power, can be shortened.
Therefore, in the sensor network system, data aggregation can be performed more efficiently
than in the prior art, network traffic can be significantly reduced, and power consumption of the
sensor node can be reduced.
[0021]
FIG. 1 is a block diagram showing the detailed configuration of a node used in the sound source localization system according to the first embodiment of the present invention and in the position measurement system according to the second embodiment.
FIG. 2 is a flowchart showing processing in the microphone array network system used in the system of FIG. 1.
FIG. 3 is a waveform diagram showing detection of voice activity (VAD) by zero crossing as used in the system of FIG. 1.
FIG. 4 is a block diagram showing details of the delay-and-sum circuit unit used in the system of FIG. 1.
FIG. 5 is a plan view showing the basic principle of a plurality of the delay-and-sum circuit units of FIG. 4 distributed and arranged.
FIG. 6 is a graph showing the time delay from a sound source, illustrating operation in the system of FIG. 5.
FIG. 7 is an explanatory view showing the configuration of a sound source localization system according to the first embodiment.
FIG. 8 is an explanatory view of two-dimensional sound source localization in the sound source localization system of FIG. 7.
FIG. 9 is an explanatory view of three-dimensional sound source localization in the sound source localization system of FIG. 7.
FIG. 10 is a block diagram showing the configuration of the microphone array network system according to Example 1 of the present invention.
FIG. 11 is a block diagram showing the configuration of a node provided with the microphone array of FIG. 10.
FIG. 12 is a functional diagram showing the functions of the microphone array network system of FIG. 10.
FIG. 13 is an explanatory view of an experiment on three-dimensional sound source localization accuracy in the microphone array network system of FIG. 10.
FIG. 14 is a graph of measurement results showing the improvement in three-dimensional sound source localization accuracy in the microphone array network system of FIG. 10.
FIG. 15 is a block diagram showing the configuration of the microphone array network system according to Example 2 of the present invention.
FIG. 16 is an explanatory view of the sound source localization system according to Example 2.
FIG. 17 is a block diagram showing the configuration of the network used in the position measurement system according to the second embodiment of the present invention.
FIG. 18(a) is a perspective view showing the Flooding Time Synchronization Protocol (FTSP) method used in the position measurement system of FIG. 17, and FIG. 18(b) is a timing chart showing the data propagation of that method.
FIG. 19 is a graph showing time synchronization with linear interpolation as used in the position measurement system of FIG. 17.
FIG. 20A is the first part of a timing chart showing the signal transmission procedure between the tablets in the position measurement system of FIG. 17 and the processes executed by each tablet.
FIG. 20B is the second part of that timing chart.
FIG. 21 is a plan view showing the method of measuring the distances between the tablets from the angle information measured by each tablet of the position measurement system of FIG. 17.
FIG. 22 is a block diagram showing the configuration of a node of the data aggregation system for the microphone array network system according to the third embodiment of the present invention.
FIG. 23 is a block diagram showing the detailed configuration of the data communication unit 57a of FIG. 22.
FIG. 24 is a table showing the detailed configuration of the table memory in the parameter memory 57b of FIG. 23.
FIG. 25 is a set of schematic plan views showing the processing operation of the data aggregation system of FIG. 22, where (a) shows FTSP processing and routing (T11) from the base station, (b) shows voice activity detection (VAD) and detection message transmission (T12), (c) shows the wakeup message and clustering (T13), and (d) shows cluster selection and delay-and-sum processing (T14).
FIG. 26 is a timing chart showing the first part of the processing operation of the data aggregation system of FIG. 22.
FIG. 27 is a timing chart showing the second part of the processing operation of the data aggregation system of FIG. 22.
FIG. 28 is a plan view showing the configuration of the Example of the data aggregation system of FIG. 22.
[0022]
Hereinafter, embodiments according to the present invention will be described with reference to
the drawings. In the following embodiments, the same components are denoted by the same
reference numerals.
[0023]
As described in the discussion of the prior art, an autonomous distributed routing algorithm is essential in a sensor network composed of a large number of nodes. Multiple sources of the signal to be sensed exist in the sensing area, and routing using clustering is effective for constructing optimal routes for them. In the embodiments according to the present invention, a sensor network system capable of efficient data aggregation using a sound source localization system, and its communication method, are described below for a sensor network system related to a microphone array network system aimed at high-quality sound acquisition.
[0024]
First Embodiment. FIG. 1 is a block diagram showing the detailed configuration of a node used in the sound source localization system according to the first embodiment of the present invention; the same node is also used in the position measurement system according to the second embodiment. The sound source localization system according to the present embodiment is constructed using, for example, a ubiquitous network system (UNS): small-scale microphone arrays (sensor nodes), each having 16 microphones, are connected by a predetermined network to construct a large-scale microphone array sound processing system as a whole. Here, each sensor node is equipped with microphones and a processor, and speech processing is performed in a distributed and coordinated manner.
[0025]
As shown in FIG. 1, each sensor node comprises: (1) an AD conversion circuit 51 connected to a plurality of microphones 1 for picking up sound; (2) a VAD processing unit 52, connected to the AD conversion circuit 51, for detecting an utterance (Voice Activity Detection; hereinafter referred to as the VAD processing unit, where VAD denotes voice activity detection); (3) an SRAM 54 that stores digital data including an audio signal or a sound signal AD-converted by the AD conversion circuit 51 (here, a sound signal means an audio-frequency signal, such as a 500 Hz signal, or an ultrasonic signal); (4) an SSL processing unit 55 that executes Sound Source Localization (SSL) processing to estimate the position of the sound source from the digital data such as audio signals output from the SRAM 54, and outputs the result to the SSS processing unit 56; (5) an SSS processing unit 56 that executes Sound Source Separation (SSS) processing to extract a specific sound source from the digital data such as voice signals output from the SRAM 54 and the SSL processing unit 55, and aggregates the high-SNR voice data obtained as a result of those processes with other nodes via the network interface circuit 57; and (6) a network interface circuit 57 constituting a data communication unit connected to other surrounding sensor nodes Nn (n = 1, 2, ..., N) to transmit and receive audio data.
[0026]
Each of the sensor nodes Nn (n = 0, 1, 2, ..., N) has the same configuration; the sensor node N0 of the base station additionally collects the voice data on the network to obtain voice data with a further enhanced SNR. Although the VAD processing unit 52 and the power management unit 53 are used in the sound source localization of the first embodiment, they are in principle not used in the position estimation of the second embodiment. In addition, the distance estimation described later is performed by, for example, the SSL processing unit 55.
[0027]
In the system configured as described above, the input voice data from the 16 microphones 1 are digitized by the AD conversion circuit 51, and the voice data are stored in the SRAM 54. The information is then used for sound source localization and sound source separation. The voice processing including these steps is managed by the power management unit 53 and the VAD processing unit 52, which save standby power. When no sound is present around the microphone array, the audio processor is turned off; such power management is essential because a system with many microphones 1 would otherwise consume considerable power even when not in use.
[0028]
FIG. 2 is a flowchart showing the processing in the microphone array network system used in the system of FIG. 1.
[0029]
In FIG. 2, voice from one microphone 1 is input (S1), and voice activity (VA) detection processing (S2) is executed.
Here, zero crossing points are counted (S2a) and it is judged whether voice activity (speech) is detected (S2b); if it is detected, the surrounding subarrays are put into wakeup mode (S3) and voice is input (S4). Then, in the sound source localization process (S5), direction estimation within the subarray (S5a), communication of position information (S5b), and sound source localization (S5c) are performed, followed by the sound source separation process (S6). Here, separation within the subarray (S6a), communication of audio data (S6b), and further sound source separation (S6c) are performed, and the audio data is output (S7).
[0030]
The salient features of the system are as follows. (1) Low-power voice activity detection is performed to activate the entire node. (2) Sound source localization is performed to locate the sound source. (3) Sound source separation is performed to reduce the noise level of the sound. In addition, the nodes of the subarrays are interconnected to support intercommunication, so the audio data obtained at each node can be collected to further improve the SNR of the source signal. The system configures multiple microphone arrays through interaction with surrounding nodes, so computation can be distributed among the nodes, and the system is scalable in the number of microphones. Each node also performs preprocessing on the captured voice data.
[0031]
FIG. 3 is a waveform diagram showing the detection of voice activity (VAD, detection of speech) by the zero crossing point used in the system of FIG. 1.
[0032]
The microphone array network according to the present embodiment is composed of a large number of microphones, whose power consumption easily increases.
The intelligent microphone array system according to the present embodiment needs to operate from a limited energy source, so power must be saved as much as possible. Even when the environment is quiet, the audio processing unit and the microphone amplifiers consume a certain amount of power, so power-saving audio processing is effective. Our previous work proposed a low-power VAD hardware implementation to reduce the standby power of the subarrays; this embodiment uses a zero crossing algorithm for VAD. As is apparent from FIG. 3, after the voice signal crosses a trigger line (the high or low trigger value), a zero crossing point occurs at the first crossing of the input signal with the offset line. The rate of such zero crossings differs significantly between voice and non-voice signals. The zero crossing VAD detects this difference and detects speech, outputting the start and end points of the speech segment. The only requirement is to capture crossings of the trigger and offset lines; detailed voice signal acquisition is not necessary, so the sampling frequency and the number of bits can be reduced.
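As a rough software model of this zero-crossing VAD (the thresholds and names below are illustrative assumptions, not values from the patent), a crossing is counted only when the input first returns to the offset line after exceeding the high or low trigger line, and the crossing rate is then compared with a speech threshold:

```python
def zero_crossing_vad(samples, offset=0, trigger=2000, rate_threshold=0.02):
    """Count qualified zero crossings: the first crossing of the offset line
    after the signal has exceeded the high or low trigger line."""
    armed = False       # True once a trigger line has been crossed
    above = False       # side of the offset line when armed
    crossings = 0
    for x in samples:
        if not armed and (x > offset + trigger or x < offset - trigger):
            armed, above = True, x > offset
        elif armed and ((x <= offset) == above):
            crossings += 1   # first return across the offset line
            armed = False
    return crossings / len(samples) >= rate_threshold  # speech if frequent
```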
[0033]
In our VAD, the sampling frequency can be reduced to 2 kHz and the number of bits per sample set to 10 bits. A single microphone is sufficient to detect the signal, so the remaining 15 microphones are turned off as well. These values are sufficient to detect human speech; in this case, only 3.49 μW of power is consumed in a 0.18 μm CMOS process.
[0034]
By separating the low-power VAD processing unit 52 from the voice processing units, the power management unit 53 can be used to turn off the voice processing units (the SSL processing unit 55, the SSS processing unit 56, and so on). Furthermore, only the VAD processing units 52 need to be kept operating. With the VAD processing unit 52 active at only a limited number of nodes in the system, when a VAD processing unit 52 detects an audio signal, the processor for the main signal starts execution, and the sampling frequency and the number of bits are increased to sufficient values. Note that these parameters, which determine the specification of the AD conversion circuit 51, can be changed according to the specific application integrated in the system.
[0035]
Next, the distributed voice acquisition process will be described. FIG. 4 is a block diagram showing details of the delay-and-sum circuit unit used in the system of FIG. 1. To obtain high-SNR voice data, the following two types of methods for enhancing the main sound source have been proposed: (1) methods using geometrical position information, and (2) statistical methods that do not use position information.
[0036]
Since the system according to the present embodiment is based on the premise that the positions of the nodes in the network are known, the delay-and-sum beamformer, an algorithm classified as a geometrical method, was selected (see, for example, Non-Patent Document 6 and FIG. 4). This method produces less distortion than statistical methods, and it requires only a small amount of computation, so it is easily applicable to distributed processing. The key point in collecting audio data from distributed nodes is to align the phases of the audio between adjacent nodes; the phase mismatch (time delay) is caused by the difference in distance from the source to each node.
[0037]
FIG. 5 is a plan view showing the basic principle of a plurality of the delay-and-sum circuit units of FIG. 4 distributed and arranged, and FIG. 6 is a graph of the time delay from a sound source, illustrating operation in the system of FIG. 5. In this embodiment, a two-layer algorithm is introduced to realize the distributed delay-and-sum beamforming shown in FIG. 5. In the local layer, each node collects 16 channels of speech with local delays referenced to the node's origin, and an enhanced single-channel sound is generated within the node using the basic delay-and-sum algorithm. In the global layer, the locally emphasized speech data are then transmitted to neighboring nodes and finally aggregated into speech data with a high SNR. Each voice packet contains a time stamp and 64 samples of voice data. The time stamp is given by TPacket = TREC - DSender, where TREC is the timer value at the transmitting node when the voice data in the packet were recorded, and DSender is the global delay at the origin of the transmitting node. At the receiving node, the received time stamp is adjusted by adding the receiver's global delay DReceiver to TPacket, and the voice data are aggregated in the form of a delay-and-sum (FIG. 6). Each node transmits only single-channel voice data, yet high-SNR voice data can be acquired at the base station as a result.
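A simplified model of this time-stamp handling (hypothetical packet layout and names; the real circuit operates on hardware timers) is sketched below: the sender stamps the packet with TPacket = TREC - DSender, and the receiver adds its own global delay DReceiver before summing the samples into a shared mix buffer.

```python
def make_packet(t_rec, d_sender, samples):
    # Sender side: TPacket = TREC - DSender, plus 64 samples of voice data.
    return {"t_packet": t_rec - d_sender, "samples": samples}

def delay_sum_aggregate(packets, d_receiver, mix):
    """Receiver side: re-align each packet by the local global delay and
    accumulate sample-wise into mix, a {sample_time: amplitude} buffer."""
    for p in packets:
        t0 = p["t_packet"] + d_receiver   # adjusted time stamp
        for i, s in enumerate(p["samples"]):
            mix[t0 + i] = mix.get(t0 + i, 0.0) + s
    return mix
```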
[0038]
FIG. 7 shows an explanatory view of sound source localization according to the present invention. As shown in FIG. 7, six nodes each provided with a microphone array and one voice processing server 20 are connected by a network 10. The six nodes, each including a microphone array in which a plurality of microphones are arranged in an array, are placed on the four walls of a room; a processor for sound collection processing in each node estimates the sound source direction, and the position of the sound source is identified by integrating the results in the speech processing server. Because data are processed at each node, the amount of communication on the network can be reduced and the amount of computation can be distributed among the nodes.
[0039]
Hereinafter, two-dimensional sound source localization and three-dimensional sound source localization will be described in detail separately. First, the two-dimensional sound source localization method of the present invention will be described with reference to FIG. 8, which illustrates the two-dimensional sound source localization method. As shown in FIG. 8, nodes 1 to 3 estimate the sound source direction from the sound signals collected by their microphone arrays. Each node calculates the response strength of the MUSIC method for each direction and estimates the direction of the maximum value as the sound source direction. In FIG. 8, node 1 sets the perpendicular direction (front direction) of the array surface of its microphone array to 0°, calculates the response strength for directions from -90° to 90°, and estimates the direction θ1 = -30° as the sound source direction. Similarly, nodes 2 and 3 also calculate the response strength for each direction and estimate the direction with the maximum value as the sound source direction.
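For reference, a narrowband MUSIC scan over a uniform linear subarray can be sketched as follows (a minimal sketch under assumed geometry and parameters; the patent does not fix these details). The pseudo-spectrum plays the role of the response strength, and its peak gives the estimated direction:

```python
import numpy as np

def music_scan(X, n_sources, mic_spacing, freq, c=343.0,
               angles=np.linspace(-90.0, 90.0, 181)):
    """X: (n_mics, n_snapshots) complex snapshots at one frequency bin.
    Returns (angles, pseudo-spectrum); the argmax gives the DOA estimate."""
    n_mics = X.shape[0]
    R = X @ X.conj().T / X.shape[1]          # spatial covariance matrix
    _, V = np.linalg.eigh(R)                 # eigenvectors, ascending order
    En = V[:, : n_mics - n_sources]          # noise subspace
    k = 2.0 * np.pi * freq / c
    d = np.arange(n_mics) * mic_spacing
    spec = np.empty(len(angles))
    for idx, th in enumerate(np.deg2rad(angles)):
        a = np.exp(-1j * k * d * np.sin(th))           # steering vector
        spec[idx] = 1.0 / (np.linalg.norm(En.conj().T @ a) ** 2)
    return angles, spec
```

The estimated direction is angles[np.argmax(spec)], and the peak value corresponds to the maximum response strength used for weighting in the fusion step described next.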
[0040]
Then, a weight is assigned to the intersection of the sound source direction estimation results of two nodes, such as node 1 and node 2, or node 1 and node 3. Here, the weight is determined based on the maximum response strengths of the MUSIC method at the two nodes (for example, their product). In FIG. 8, the magnitude of the weight is represented by the diameter of the circle at the intersection. The circles (positions and magnitudes) indicating the obtained weights are the sound source position candidates, and the sound source position is estimated by computing the centroid of the obtained candidates. In the case of FIG. 8, finding the centroid of the sound source position candidates means finding the weighted centroid of the circles (positions and magnitudes) indicating the weights.
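Under these conventions, the 2D fusion step amounts to intersecting the bearing lines pairwise and taking a weighted centroid. The following sketch (illustrative names and conventions; positions in a common coordinate frame, bearings in radians) shows one way to compute it:

```python
import math

def fuse_2d(observations):
    """observations: list of ((x, y), bearing, peak_strength) per node.
    Weights each pairwise bearing intersection by the product of the two
    peak strengths and returns the weighted centroid of the candidates."""
    cands = []
    for i in range(len(observations)):
        for j in range(i + 1, len(observations)):
            (p1, t1, s1), (p2, t2, s2) = observations[i], observations[j]
            d1 = (math.cos(t1), math.sin(t1))
            d2 = (math.cos(t2), math.sin(t2))
            det = d1[0] * d2[1] - d1[1] * d2[0]
            if abs(det) < 1e-9:
                continue                      # parallel bearings: skip
            r = (p2[0] - p1[0], p2[1] - p1[1])
            a = (r[0] * d2[1] - r[1] * d2[0]) / det
            cands.append(((p1[0] + a * d1[0], p1[1] + a * d1[1]), s1 * s2))
    w = sum(wt for _, wt in cands)
    return (sum(c[0] * wt for c, wt in cands) / w,
            sum(c[1] * wt for c, wt in cands) / w)
```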
[0041]
Next, the three-dimensional sound source localization method of the present invention will be described with reference to FIG. 9, which illustrates the three-dimensional sound source localization method. As shown in FIG. 9, nodes 1 to 3 estimate the sound source direction from the sound signals collected by their microphone arrays. Each node calculates the response strength of the MUSIC method over three-dimensional directions and estimates the direction with the maximum value as the sound source direction. FIG. 9 shows the case where node 1 calculates the response strength over a rotational coordinate system around the perpendicular direction (front direction) of the array surface of its microphone array and estimates the direction with the highest strength as the sound source direction. Similarly, nodes 2 and 3 also calculate the response strength for each direction and estimate the direction with the maximum value as the sound source direction.
[0042]
Then, weights are obtained for the intersection points of the sound source direction estimates of two nodes, such as node 1 and node 2, or node 1 and node 3. In the three-dimensional case, however, the lines often do not actually intersect. Therefore, a virtual intersection point is taken on the shortest line segment connecting the two straight lines of the sound source direction estimates of the two nodes. The weight is determined based on the maximum MUSIC response strengths of the nodes (for example, the product of the maximum response strengths of the two nodes), as in the two-dimensional case. In FIG. 9, as in FIG. 8, the magnitude of the weight is represented by the diameter of the circle at the intersection.
[0043]
The circles (positions and magnitudes) indicating the obtained weights are the sound source position candidates, and the sound source position is estimated by computing the centroid of the obtained candidates. In the case of FIG. 9, finding the centroid of the sound source position candidates means finding the weighted centroid of the circles (positions and magnitudes) indicating the weights.
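The virtual intersection of two non-crossing 3D bearing lines is the midpoint of the shortest segment connecting them; a sketch (illustrative names, NumPy arrays for points and direction vectors) follows:

```python
import numpy as np

def virtual_intersection(p1, d1, p2, d2):
    """p1, p2: points on the two bearing lines; d1, d2: direction vectors.
    Returns the midpoint of the shortest segment between the lines."""
    r = p2 - p1
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None                      # nearly parallel: no candidate
    t1 = (c * (d1 @ r) - b * (d2 @ r)) / denom
    t2 = (b * (d1 @ r) - a * (d2 @ r)) / denom
    q1 = p1 + t1 * d1                    # closest point on line 1
    q2 = p2 + t2 * d2                    # closest point on line 2
    return (q1 + q2) / 2.0
```

The weighting and weighted-centroid computation then proceed exactly as in the two-dimensional case.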
[0044]
One Example of the present invention will be described. FIG. 10 shows the configuration of the microphone array network system of Example 1: nodes (1a, 1b, ..., 1n), each provided with a microphone array in which 16 microphones are arranged in an array, and one voice processing server 20 are connected by a network 10. In each node, as shown in FIG. 11, the signal lines of the microphones (m11, m12, ..., m43, m44) arranged in a 16-element array are connected to the input/output unit (I/O) of the sound collection processing unit 2, and the signals collected from the microphones are input to the processor 4 of the sound collection processing unit 2. The processor 4 of the sound collection processing unit 2 executes the MUSIC algorithm on the input collected-sound signals to estimate the sound source direction.
[0045]
Then, the processor 4 of the sound collection processing unit 2 transmits the sound source
direction estimation result and the maximum response strength to the speech processing server
20 shown in FIG. 7.
[0046]
As described above, sound source localization is performed in a distributed manner at each node, the results are integrated in the speech processing server, and the two-dimensional and three-dimensional localization processing described above is performed to estimate the position of the sound source.
[0047]
FIG. 12 shows a functional diagram of the microphone array network system of the first
embodiment.
[0048]
The node provided with the microphone array A/D-converts the signals from the microphone array (step S11) and inputs the collected sound signal of each microphone (step S13).
The processor mounted on the node, acting as the sound collection processing unit, estimates the sound source direction using the signals collected from the microphones (step S15).
[0049]
The sound collection processing unit sets the front (perpendicular) direction of the microphone array to 0° and calculates the response intensity of the MUSIC method for directions from -90° to 90° to the left and right, as shown by the graph in FIG. 12.
The direction in which the response intensity is strongest is then estimated as the sound source direction.
The sound collection processing unit is connected to the voice processing server via a network (not shown), and the sound source direction estimation result (A) and the maximum response strength (B) obtained in the node (step S17) are sent to the speech processing server.
[0050]
The voice processing server receives the data sent from each node (step S21), calculates a plurality of sound source position candidates from the maximum response strengths of the nodes (step S23), and estimates the position of the sound source based on the sound source direction estimation results (A) and the maximum response strengths (B) (step S25).
[0051]
In the following, the three-dimensional sound source localization accuracy will be described. FIG. 13 is a schematic view of the experiment on three-dimensional sound source localization accuracy. A room with a 12 m x 12 m floor and a height of 3 m is assumed. In case A (16 subarrays), 16 microphone arrays, each with 16 microphones arranged in an array, are placed at equal intervals in a square on the floor surface. In case B (41 subarrays), 16 microphone arrays are arranged at equal intervals along the four sides of the floor, 16 along the four sides of the ceiling, and nine at equal intervals on the floor. In case C (73 subarrays), 32 microphone arrays are arranged along the four sides of the floor, 32 at equal intervals along the four sides of the ceiling, and nine at equal intervals on the floor.
[0052]
Using these three cases A to C, the number of nodes and the error variance of each node's sound source direction estimate were varied, and the results of three-dimensional position estimation were compared. In the three-dimensional position estimation, each node randomly selects one communication partner to obtain a virtual intersection point.
[0053]
The measured results are shown in FIG. 14. The horizontal axis of FIG. 14 indicates the variation (standard deviation) of the direction estimation error, and the vertical axis indicates the position estimation error. From the results of FIG. 14, it can be seen that the accuracy of three-dimensional position estimation can be improved by increasing the number of nodes, even when the estimation accuracy of the sound source direction is poor.
[0054]
Another Example of the present invention will be described. FIG. 16 shows the configuration of the microphone array network system of Example 2. FIG. 17 shows a system configuration in which nodes (1a, 1b, 1c), each provided with a microphone array in which 16 microphones are arranged in an array, are connected by networks (11, 12). Unlike the system configuration of Example 1, the system of Example 2 has no voice processing server. In each node, as shown in FIG. 11 and as in Example 1, the signal lines of the microphones (m11, m12, ..., m43, m44) arranged in a 16-element array are connected to the sound collection processing unit 2, and the signals collected from the microphones are input to the processor 4 of the sound collection processing unit 2. The processor 4 of the sound collection processing unit 2 executes the MUSIC algorithm on the input collected-sound signals to estimate the sound source direction.
[0055]
Then, the processor 4 of the sound collection processing unit 2 exchanges the sound source direction estimation results with adjacent or other nodes. From the sound source direction estimation results and maximum response strengths of a plurality of nodes, including its own node, the processor 4 performs the two-dimensional and three-dimensional localization processing described above to estimate the position of the sound source.
[0056]
Second Embodiment. FIG. 1 is a block diagram showing the detailed configuration of a node used in the position measurement system according to the second embodiment of the present invention. The position measurement system according to the second embodiment is characterized by using the sound source localization system according to the first embodiment to measure the positions of terminals with higher accuracy than the prior art. The position measurement system according to the present embodiment is constructed using, for example, a ubiquitous network system (UNS): small-scale microphone arrays (sensor nodes), each having 16 microphones, are connected by a predetermined network to construct a large-scale microphone array speech processing system as a whole. Here, each sensor node is equipped with microphones and a processor, and speech processing is performed in a distributed and coordinated manner.
[0057]
The sensor node has the configuration of FIG. 1; an example of the processing at each sensor node is described below. Initially, all sensor nodes are in the sleep state, and some sensor nodes separated by a certain distance, for example one sensor node, transmit a sound signal for a predetermined time (for example, 3 seconds). A sensor node that detects the signal source starts estimating the sound source direction from its multi-channel input. At the same time, it broadcasts a wakeup message to the other sensor nodes in the vicinity, and a sensor node receiving the message also immediately starts sound source direction estimation. After estimating the sound source direction, each sensor node transmits the estimation result to the base station (the sensor node connected to the server apparatus). The base station estimates the sound source position using the collected direction estimation results of the sensor nodes and broadcasts the result to all sensor nodes that performed the direction estimation. Next, each sensor node performs sound source separation using the position estimation result received from the base station. As with sound source localization, sound source separation is performed in two steps, within each sensor node and between sensor nodes. The voice data obtained at each sensor node are aggregated again at the base station via the network. The finally obtained high-SNR voice signal is transferred from the base station to the server apparatus and used for a predetermined application on the server apparatus.
[0058]
FIG. 17 is a block diagram showing the configuration (a specific example) of the network used in the position measurement system of this embodiment. FIG. 18(a) is a perspective view showing the Flooding Time Synchronization Protocol (FTSP) method used in the position measurement system of FIG. 17, and FIG. 18(b) is a timing chart showing the data propagation of that method. Further, FIG. 19 is a graph showing time synchronization with linear interpolation as used in the position measurement system of FIG. 17.
[0059]
In FIG. 17, the sensor nodes N0 to N2, together with the server apparatus SV, are connected by, for example, UTP cables 60, and communication is performed over 10BASE-T Ethernet (registered trademark). In the present embodiment, the sensor nodes N0 to N2 are connected in a linear topology; one of them, the sensor node N0, operates as the base station and is connected to a server apparatus SV such as a personal computer. The data link layer of the communication system uses the known Low Power Listening method for low power consumption, and route construction in the network layer uses the known Tiny Diffusion method.
[0060]
In the present embodiment, in order to collect voice data across the sensor nodes N0 to N2, the time (timer values) of all sensor nodes on the network must be synchronized. In this embodiment, a synchronization method in which linear interpolation is added to the known Flooding Time Synchronization Protocol (FTSP) is used. FTSP achieves high-accuracy synchronization with only one-way, simple communication. Although the synchronization accuracy of FTSP is better than 1 microsecond between adjacent sensor nodes, the crystal oscillators of the sensor nodes vary, and, as shown in FIG. 19, a time lag accumulates after the synchronization processing. This drift amounts to several microseconds to several tens of microseconds per second, which may degrade the performance of sound source separation.
[0061]
FIG. 18(a) is a perspective view showing the Flooding Time Synchronization Protocol (FTSP; see, for example, Non-Patent Document 8) method used in the position measurement system of FIG. 17, and FIG. 18(b) is a timing chart showing the data propagation of that method.
[0062]
In the proposed system of this embodiment, the time shift between sensor nodes is stored at the time of FTSP time synchronization, and the rate at which the timer advances is adjusted by linear interpolation.
Using the reception time stamp at the first synchronization, the time stamp at the second synchronization, and the corresponding timer values on the receiving side, the deviation of the oscillation frequency can be corrected by adjusting how fast the timer advances during that interval. As a result, the time lag after completion of synchronization can be suppressed to within 0.17 microseconds per second. Even if FTSP time synchronization is performed only once per minute, linear interpolation keeps the time gap between sensor nodes within 10 microseconds and maintains the sound source separation performance.
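A minimal model of this correction (illustrative names; the patent gives no code) estimates the oscillator rate from two synchronization points and scales the local timer between synchronizations:

```python
class DriftCorrectedClock:
    """Linear interpolation between two FTSP synchronization points;
    (ref, loc) are paired reference and local timer readings."""
    def __init__(self, ref1, loc1, ref2, loc2):
        self.rate = (ref2 - ref1) / (loc2 - loc1)  # reference ticks per local tick
        self.ref2, self.loc2 = ref2, loc2

    def now(self, local):
        """Corrected network time for a raw local timer value."""
        return self.ref2 + (local - self.loc2) * self.rate

# Example: a local oscillator running 20 ppm fast relative to the reference.
clk = DriftCorrectedClock(0.0, 0.0, 60.0, 60.0012)
print(clk.now(120.0024))   # -> 120.0 in reference time
```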
[0063]
Each sensor node stores a relative time (for example, with the time when the first sensor node is powered on defined as 0, the elapsed time from that point) or an absolute time (for example, calendar date, hour, minute, and second), and time synchronization is performed between the sensor nodes in the manner described above. This time synchronization is used to measure the correct distance between sensor nodes, as described later.
[0064]
FIGS. 20A and 20B are timing charts showing the signal transmission procedure between the tablets T1 to T4 in the position measurement system according to the second embodiment and the processes executed by the tablets T1 to T4. Here, each of the tablets T1 to T4 incorporates a sensor node having the configuration of FIG. 1. In the following description, the tablet T1 is the master and the tablets T2 to T4 are slaves; however, the number of tablets is arbitrary, and any tablet may serve as the master. The sound signal may be an audible sound wave or an ultrasonic wave above the audible frequency range. For generating the sound signal, the AD conversion circuit 51 may, for example, also be provided with a DA conversion circuit and generate an omnidirectional sound signal from one microphone 1 in response to an instruction from the SSL processing unit 55, or an ultrasonic generating element may generate an omnidirectional ultrasonic sound signal in response to an instruction from the SSL processing unit 55. Furthermore, the SSS processing need not be performed in FIGS. 20A and 20B.
[0065]
In FIG. 20A, first, in step S31, the tablet T1 transmits to the tablets T2 to T4 an SSL instruction signal instructing them to prepare to receive a sound signal with the microphone 1 and to execute SSL processing in response to the sound signal; after a predetermined time, it transmits the sound signal for a predetermined duration such as 3 seconds. The SSL instruction signal includes the transmission time of the sound signal. Each of the tablets T2 to T4 calculates the difference between the time at which the sound signal is received and the transmission time, that is, the propagation time of the sound signal, multiplies the known speed of the sound wave or ultrasonic wave by the propagation time to calculate the distance between the tablet T1 and itself, and stores it in its built-in memory. In addition, each of the tablets T2 to T4 performs sound source localization on the received sound signal using the MUSIC method described in detail in the first embodiment (see, for example, Non-Patent Document 7), calculates the direction of arrival of the sound signal, and stores it in its built-in memory. That is, in the SSL processing of each of the tablets T2 to T4, the distance from the tablet T1 to its own tablet and the angle with respect to the tablet T1 are estimated and stored.
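The distance computation in step S31 is simply time of flight multiplied by propagation speed; a short sketch (clocks assumed synchronized as described above, speed of sound about 343 m/s in air):

```python
def tof_distance(t_transmit, t_receive, speed=343.0):
    """Distance from time of flight, with synchronized clocks."""
    return (t_receive - t_transmit) * speed

print(tof_distance(0.0, 0.0146))   # ~5.0 m between two tablets
```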
[0066]
Next, in step S32, the tablet T1 transmits to the tablets T3 and T4 an SSL instruction signal instructing them to prepare for reception with the microphone 1 and to execute SSL processing in response to the sound signal. After a predetermined time, it transmits to the tablet T2 a sound generation signal instructing it to generate a sound signal; the tablet T1 itself also goes into a standby state for the sound signal. In response to the sound generation signal, the tablet T2 generates a sound signal, which reaches the tablets T1, T3, and T4. Each of the tablets T1, T3, and T4 performs sound source localization on the received sound signal using the MUSIC method described in detail in the first embodiment, estimates and calculates the direction of arrival of the sound signal, and stores it in its built-in memory. That is, in the SSL processing of each of the tablets T1, T3, and T4, the angle with respect to the tablet T2 is estimated, calculated, and stored.
[0067]
Furthermore, in step S33, the tablet T1 transmits to the tablets T2 and T4 an SSL instruction signal instructing them to prepare for reception with the microphone 1 and to execute SSL processing in response to the sound signal. After a predetermined time, it transmits a sound generation signal to the tablet T3; the tablet T1 itself also goes into a standby state for the sound signal. In response to the sound generation signal, the tablet T3 generates a sound signal, which reaches the tablets T1, T2, and T4. Each of the tablets T1, T2, and T4 performs sound source localization on the received sound signal using the MUSIC method described in detail in the first embodiment, estimates and calculates the direction of arrival of the sound signal, and stores it in its built-in memory. That is, in the SSL processing of each of the tablets T1, T2, and T4, the angle with respect to the tablet T3 is estimated, calculated, and stored.
[0068]
Furthermore, in step S34, the tablet T1 transmits to the tablets T2 and T3 an SSL instruction signal instructing them to prepare for reception with the microphone 1 and to execute SSL processing in response to the sound signal. After a predetermined time, it transmits to the tablet T4 a sound generation signal instructing it to generate a sound signal; the tablet T1 itself also goes into a standby state for the sound signal. In response to the sound generation signal, the tablet T4 generates a sound signal, which reaches the tablets T1, T2, and T3. Each of the tablets T1, T2, and T3 performs sound source localization on the received sound signal using the MUSIC method described in detail in the first embodiment, estimates and calculates the direction of arrival of the sound signal, and stores it in its built-in memory. That is, in the SSL processing of each of the tablets T1, T2, and T3, the angle with respect to the tablet T4 is estimated, calculated, and stored.
[0069]
Next, in step S35, in which data communication is performed, the tablet T1 transmits an information reply instruction signal to the tablet T2. In response, the tablet T2 transmits to the tablet T1 an information reply signal including the distance between the tablets T1 and T2 calculated in step S31 and the angles at which the tablets T1, T3, and T4 are seen from the tablet T2, calculated in steps S31 to S34. The tablet T1 then transmits an information reply instruction signal to the tablet T3, which responds with an information reply signal including the distance between the tablets T1 and T3 calculated in step S31 and the angles at which the tablets T1, T2, and T4 are seen from the tablet T3, calculated in steps S31 to S34. Finally, the tablet T1 transmits an information reply instruction signal to the tablet T4, which responds with an information reply signal including the distance between the tablets T1 and T4 calculated in step S31 and the angles at which the tablets T1, T2, and T3 are seen from the tablet T4, calculated in steps S31 to S34.
[0070]
In the overall SSL processing of the tablet T1, based on the information collected as
described above, the tablet T1 calculates the distances between the tablets as described
below with reference to FIG. 21. Based on the angle information obtained when each of the
tablets T1 to T4 views the other tablets, the XY coordinates of the other tablets T2 to T4 are
determined with, for example, the tablet T1 (A in FIG. 21) set as the origin of the XY
coordinate system; the coordinate values of all the tablets T1 to T4 can thus be determined by
calculation using known trigonometric relations. The coordinate values may be displayed on a
display, or output and printed on a printer. Further, for example, a predetermined application
described in detail later may be executed using the above-mentioned coordinate values.
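As an illustrative sketch of this coordinate calculation (not part of the embodiment itself),
the following Python fragment places one tablet A at the origin, a second tablet B on the X
axis at a known distance, and locates a third tablet C from the two measured interior angles.
The function name and coordinate convention are assumptions for illustration only.

```python
import math

def locate_from_angles(d_ab, angle_bac, angle_abc):
    """Place A at the origin and B on the X axis at distance d_ab, then
    locate C from the interior angles at A and B (both in radians)."""
    angle_acb = math.pi - angle_bac - angle_abc              # angles of a triangle sum to pi
    d_ac = d_ab * math.sin(angle_abc) / math.sin(angle_acb)  # law of sines
    # C lies at angle angle_bac above the AB axis, at distance d_ac from A
    return (d_ac * math.cos(angle_bac), d_ac * math.sin(angle_bac))

# Example: equilateral arrangement, AB = 1 m, all interior angles 60 degrees
print(locate_from_angles(1.0, math.radians(60), math.radians(60)))  # ~(0.5, 0.866)
```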
[0071]
The overall SSL processing may be performed only by the tablet T1, which acts as the master,
or by all the tablets T1 to T4. That is, it may be executed by at least one tablet or by a
server device (for example, SV in FIG. 17). The SSL processing and the overall SSL processing
are executed by, for example, the SSL processing unit 55, which is a control unit.
[0072]
FIG. 21 is a plan view showing a method of measuring the distances between the tablets T1 to
T4 (A, B, C, and D in FIG. 21) of the position measurement system according to the second
embodiment from the measured angle information. The server device calculates the distance
information for all of the tablets after all the tablets have obtained the angle information.
In the calculation of the distance information, as shown in FIG. 21, the lengths of all the
sides are obtained according to the sine theorem (law of sines), using the values of the 12
angles and the length of any one side. Assuming that the length of AB is d, the length of AC
can be obtained by the following equation.
[0073]
AC = d · sin(∠ABC) / sin(∠ACB), where ∠ACB = 180° − ∠BAC − ∠ABC
[0074]
Similarly, the lengths of the other sides can be determined using the 12 angles and the length
d. If each sensor node can perform the above-described time synchronization, each sensor node
can obtain the distance from the difference between the sound generation start time and the
arrival time, without using the above calculation method. Although the number of nodes in FIG.
21 is four, the present invention is not limited thereto; any number of nodes of two or more
may be used, and the inter-node distances can be obtained regardless of the number of nodes.
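As a minimal illustration of the time-of-flight alternative mentioned above, the distance is
simply the speed of sound multiplied by the synchronized time difference. The constant and
function name below are assumptions; a real system would calibrate the speed of sound.

```python
SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees C; an assumed constant

def distance_from_time_of_flight(t_emit, t_arrive):
    """Inter-node distance from synchronized emission/arrival timestamps (seconds)."""
    return SPEED_OF_SOUND * (t_arrive - t_emit)

print(distance_from_time_of_flight(0.0, 0.01))  # 3.43 m
```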
[0075]
Although a two-dimensional position is estimated in the second embodiment described above,
the present invention is not limited to this, and a three-dimensional position may be
estimated using similar formulas.
[0076]
Further, the implementation of the sensor node on a mobile terminal will be described below.
When the network system is put into practical use, the sensor node may be mounted on a mobile
terminal such as a robot, as well as being used fixed to a wall or a ceiling. If the position
of the person to be recognized can be estimated, the robot can be moved closer to that person,
enabling the collection of higher-resolution images and high-accuracy voice recognition. In
addition, mobile terminals such as smartphones, which have spread rapidly in recent years, can
acquire their own current position using the GPS function, but it is difficult for them to
acquire the positional relationship between terminals at short range. However, if the sensor
node of the network system is mounted on the mobile terminal, the positional relationship
between terminals at short range, which cannot be determined by the GPS function or the like,
can be obtained by having the terminals emit sounds and localize each other. In this
embodiment, two types of applications that use the positional relationship between terminals,
a message exchange system and a multiplayer hockey game system, are implemented using the Java
programming language.
[0077]
In this embodiment, a tablet personal computer that executes the applications is connected to
a prototype sensor node. A general-purpose OS is installed on the tablet personal computer,
which has two USB 2.0 ports and a wireless LAN function compliant with IEEE 802.11b/g/n, with
which a wireless network is configured. The microphones of the prototype sensor node are
arranged at intervals of 5 cm on the four sides of this tablet personal computer; the sound
source localization module implemented on the sensor node (constituted by an FPGA) is in
operation, and the localization result is output to the tablet personal computer. The position
estimation accuracy in the present embodiment is about several centimeters, which is
significantly higher than that of the prior art.
[0078]
Third Embodiment FIG. 22 is a block diagram showing the configuration of a node of a data
aggregation system for a microphone array network system according to a third embodiment of
the present invention, and FIG. 23 is a block diagram showing a detailed configuration of the
data communication unit 57a of FIG. 22. FIG. 24 is a table showing a detailed configuration of
the table memory in the parameter memory 57b of FIG. 22. The data aggregation system according
to the third embodiment is characterized in that it is configured as a data aggregation system
that efficiently collects voice data using the sound source localization system according to
the first embodiment and the sound source position measurement system according to the second
embodiment. Specifically, the communication method of the data aggregation system according to
the present embodiment is used as a route construction method for a microphone array network
corresponding to a plurality of sound sources. The microphone array network is a technology
for obtaining an audio signal with a high SNR using a plurality of microphones; by building a
network with data processing and communication functions, it is possible to collect high-SNR
voice data over a wide area. In the present embodiment, by applying this to the microphone
array network, it is possible to construct optimal paths for a plurality of sound source
positions and to collect the voices from each sound source simultaneously. Thus, for example,
an audio conference system that supports a plurality of speakers can be realized.
[0079]
As shown in FIG. 22, each sensor node comprises: (1) an AD conversion circuit 51 connected to
a plurality of microphones 1 for picking up sound; (2) a VAD processing unit 52 connected to
the AD conversion circuit 51 for detecting an audio signal; (3) an SRAM 54 for temporarily
storing audio data, such as an audio signal or a sound signal AD-converted by the AD
conversion circuit 51; (4) a microprocessor unit (MPU) 50, which performs, on the audio data
stored in and output from the SRAM 54, sound source localization processing for estimating the
position of the sound source, executes sound source separation processing (SSS processing) and
other processing using the result, and collects data by transmitting and receiving the
high-SNR voice data obtained by those processes to and from other nodes via the data
communication unit 57a; (5) a timer and parameter memory 57b connected to the data
communication unit 57a and the MPU 50, comprising a data memory, a timer for time
synchronization processing, and a parameter memory for storing parameters for data
communication; and (6) a data communication unit 57a constituting a network interface circuit,
which is connected to the other surrounding sensor nodes Nn (n = 1, 2, ..., N) and transmits
and receives voice data, control packets, and the like.
[0080]
Each sensor node Nn (n = 0, 1, 2, ..., N) has the same configuration, but the sensor node N0
serving as the base station further obtains voice data whose SNR is enhanced by collecting the
voice data over the network.
[0081]
As shown in FIG. 23, the data communication unit 57a comprises: (1) a physical layer circuit
unit 61, which is connected to the other surrounding sensor nodes Nn (n = 1, 2, ..., N) and
transmits and receives audio data and control packets; (2) a MAC processing unit 62, connected
to the physical layer circuit unit 61 and the time synchronization unit 63, which executes
media access control processing for audio data, control packets, and the like; (3) a time
synchronization unit 63, connected to the MAC processing unit 62 and the timer and parameter
memory 57b, which executes time synchronization processing with other nodes; (4) a reception
buffer 64, which temporarily stores data such as voice data or control packets extracted by
the MAC processing unit 62 and outputs them to the header analyzer 66; (5) a transmission
buffer 65, which temporarily stores voice data or control packets generated by the packet
generation unit 68 and outputs them to the MAC processing unit 62; (6) a header analyzer 66,
which receives the packets stored in the reception buffer 64, analyzes the packet headers, and
outputs the results to the routing processing unit 67, or to the VAD processing unit 52, the
delay-and-sum circuit unit 58, and the MPU 50; (7) a routing processing unit 67, which
determines, based on the analysis result from the header analyzer 66, to which node a packet
should be routed; and (8) a packet generation unit 68, which, based on a routing instruction
from the routing processing unit 67, receives audio data from the delay-and-sum circuit unit
58 or control data from the MPU 50, and outputs packets to the transmission buffer 65 for
delivery to the MAC processing unit 62.
[0082]
Further, as shown in FIG. 24, the table memory in the parameter memory 57b stores: (1)
own-node information determined and stored in advance (the node ID and the XY coordinates of
the own node); (2) path information (part 1) acquired during time period T11 (the
transmission-destination node ID toward the base station); (3) path information (part 2)
acquired during time period T12 (the transmission-destination node ID of cluster CL1, the
transmission-destination node ID of cluster CL2, ..., the transmission-destination node ID of
cluster CLN); and (4) cluster information acquired during time periods T13 and T14 (the
cluster head node ID of cluster CL1 and the XY coordinates of sound source SS1, the cluster
head node ID of cluster CL2 and the XY coordinates of sound source SS2, ..., the cluster head
node ID of cluster CLN and the XY coordinates of sound source SSN).
Each node Nn (n = 1, 2, ..., N) is located on a plane and has known coordinates in a
predetermined XY coordinate system, and the position of each sound source is measured by the
position measurement processing.
[0083]
FIG. 25 is a schematic plan view showing the processing operation of the data aggregation
system of FIG. 22: FIG. 25(a) shows the FTSP processing from the base station and the routing
(T11); FIG. 25(b) shows the voice activity detection (VAD) and the detection message
transmission (T12); FIG. 25(c) shows the wakeup message and the clustering (T13); and FIG.
25(d) shows the delay-and-sum processing by cluster selection (T14).
FIGS. 26A and 26B are timing charts showing the processing operation of the data aggregation
system of FIG. 22.
[0084]
The operation example of FIGS. 25, 26A, and 26B shows a case in which a one-hop cluster is
constructed for each of two sound sources SSA and SSB, and voice data is collected and
enhanced toward the base station N0 at the lower right (among the plurality of nodes, the one
shown by a square symbol with a circle inside). First, the base station N0 of the microphone
array sensor nodes periodically (for example, every 30 minutes) broadcasts a control packet CP
(white arrows) using the predetermined FTSP and NNT (Nearest Neighbor Tree) protocols, for
time synchronization between nodes and for construction of a collection path toward the base
station by a spanning tree (FIG. 25(a), T11 in FIG. 26A). Thereafter, each node (N1 to N8)
other than the base station enters sleep mode until a voice input is detected, in order to
reduce power consumption. In the sleep mode, power is not supplied to the circuits other than
the AD conversion circuit 51 and the VAD processing unit 52 of FIG. 22 and the circuits for
receiving a wakeup message (the physical layer circuit unit 61 and the MAC processing unit 62
of the data communication unit 57a, and the timer and parameter memory 57b), so that power
consumption can be significantly reduced.
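As an illustrative sketch of the spanning-tree construction toward the base station (the
embodiment's exact NNT packet format is not given here; the flooding rule below, in which each
node adopts as parent the neighbor from which it first hears the control packet, is a common
simplification, and the function names are assumptions):

```python
from collections import deque

def build_collection_tree(neighbors, base=0):
    """Breadth-first flood of the control packet CP from the base station.
    neighbors: dict mapping node ID -> list of radio neighbors.
    Returns parent pointers, i.e. each node's next hop toward the base."""
    parent = {base: None}
    queue = deque([base])
    while queue:
        node = queue.popleft()
        for nb in neighbors[node]:
            if nb not in parent:      # first CP heard wins: nb joins the tree here
                parent[nb] = node
                queue.append(nb)
    return parent

# Example: N0 is the base; the lists give each node's radio neighbors
topo = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
print(build_collection_tree(topo))  # {0: None, 1: 0, 2: 0, 3: 1}
```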
[0085]
Next, when an audio signal is generated from each of the two sound sources SSA and SSB, the
nodes whose VAD processing units 52 respond by detecting the audio signal (that is, an
utterance) (nodes N4 and N7, shown by ● in FIGS. 25 and 26A) send a detection message to the
base station N0 using a control packet CP over the spanning-tree path constructed in T11 (FIG.
25(b), T12 in FIG. 26A), and broadcast a wakeup message (startup message) instructing start-up
using a control packet CP (FIG. 25(c), T13 in FIG. 26A). However, the range of this broadcast
is limited to the same number of hops as the cluster radius to be constructed (one hop in the
operation example of FIG. 25). The wakeup message activates the surrounding sleeping nodes (N1
to N3 and N8) and at the same time forms a cluster centered on the node whose VAD processing
unit 52 reacted.
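As an illustrative sketch of this hop-limited wakeup broadcast (the representation is an
assumption; in practice the hop limit would be carried as a TTL field that each forwarder
decrements):

```python
def hop_limited_broadcast(neighbors, origin, ttl):
    """Return the set of nodes woken by a wakeup message flooded from
    `origin` with hop limit `ttl` (the cluster radius)."""
    woken, frontier = {origin}, {origin}
    for _ in range(ttl):  # one flooding round per hop
        frontier = {nb for n in frontier for nb in neighbors[n]} - woken
        woken |= frontier
    return woken - {origin}  # the origin itself was already awake

topo = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3, 5], 5: [4]}
print(hop_limited_broadcast(topo, origin=4, ttl=1))  # {1, 3, 5}
```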
[0086]
Next, the nodes whose VAD processing units 52 reacted and the nodes activated by the wakeup
message (in this operation example, the nodes N1 to N8 other than the base station N0)
estimate the direction of the sound source using their microphone arrays and transmit the
results to the base station N0. The route used at this time is the spanning-tree route
constructed in T11. The base station N0 geometrically estimates the absolute position of each
sound source using the method of the position measurement system according to the second
embodiment described above, based on the sound source direction estimation result of each node
and the known position of each node. Furthermore, the base station N0 designates, as a cluster
head node, the source node of the detection message that is closest to the sound source, and
broadcasts this designation, together with the estimated absolute position of the sound
source, to each node (N1 to N8) of the entire network. If a plurality of sound sources SSA and
SSB are estimated, the same number of cluster head nodes as the number of sound sources are
designated. As a result, clusters corresponding to the physical positions of the sound sources
are formed, and paths from each cluster head node to the base station N0 are constructed (FIG.
25(d), T14 in FIG. 26B). In the operation example of FIG. 25, the node N6 (shown by ◎ in FIG.
25(d)) is designated as the cluster head node of the sound source SSA, and the nodes belonging
to that cluster are N3, N6, and N7, which are within one hop of N6. In addition, the node N4
(shown by ◎ in FIG. 25(d)) is designated as the cluster head node of the sound source SSB, and
the nodes belonging to that cluster are N1, N3, N4, N5, and N7, which are within one hop of
N4. That is, the nodes located within the specified hop count of each of the cluster head
nodes N6 and N4 are clustered as nodes belonging to the respective clusters. Then, enhancement
processing is performed based on the voice data measured by the nodes belonging to each
cluster, and the voice data after the enhancement processing is transmitted to the base
station N0. As a result, the voice data enhanced for each cluster corresponding to the sound
sources SSA and SSB is transmitted to the base station N0 using packets ESA and ESB,
respectively. Here, the packet ESA is a packet for transmitting the voice data obtained by
enhancing the voice data from the sound source SSA, and the packet ESB is a packet for
transmitting the voice data obtained by enhancing the voice data from the sound source SSB.
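As an illustrative sketch of the cluster-head designation and clustering step performed at the
base station (the function names are assumptions; the node coordinates, estimated source
positions, and topology are taken as given inputs):

```python
import math
from collections import deque

def hop_distance(neighbors, src):
    """Hop counts from `src` to every reachable node, by breadth-first search."""
    dist, queue = {src: 0}, deque([src])
    while queue:
        n = queue.popleft()
        for nb in neighbors[n]:
            if nb not in dist:
                dist[nb] = dist[n] + 1
                queue.append(nb)
    return dist

def form_clusters(node_xy, detecting_nodes, source_xy, neighbors, radius=1):
    """For each estimated source position, designate the detection-message
    source node closest to the source as cluster head, then cluster all
    nodes within `radius` hops of that head."""
    clusters = {}
    for src_id, pos in source_xy.items():
        head = min(detecting_nodes, key=lambda n: math.dist(node_xy[n], pos))
        members = {n for n, h in hop_distance(neighbors, head).items() if h <= radius}
        clusters[src_id] = (head, members)
    return clusters
```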
[0087]
FIG. 27 is a plan view showing the configuration of an embodiment of the data aggregation
system of FIG. 22. The inventors created a prototype using an FPGA (field-programmable gate
array) board to evaluate the microphone array network according to the present embodiment. The
prototype device has the functions of a VAD processing unit, sound source localization, sound
source separation, and a wired data communication module. The FPGA board of the prototype
device is configured with 16 channels of microphones 1, arranged in a grid at intervals of 7.5
cm. Since the target of this system is human speech, with a frequency range of 30 Hz to 8 kHz,
the sampling frequency is set to 16 kHz.
[0088]
Here, each sub-array is connected using a UTP cable. The 10BASE-T Ethernet (registered
trademark) protocol is used as the physical layer. In the data link layer, a protocol adopting
LPL (low power listening; see, for example, Non-Patent Document 11) is used to reduce power
consumption.
[0089]
To verify the performance of the proposed system, we conducted experiments using the three
sub-arrays of FIG. 27. As shown in FIG. 27, three sub-arrays are arranged, and the sub-array 1
located at the center is connected to the server PC as the base station. Here, a two-hop
linear topology was used as the network topology in order to evaluate a multi-hop environment.
[0090]
From the measured signal waveforms after the time synchronization processing, the maximum time
lag between the sub-arrays is 1 μs immediately after completion of the FTSP synchronization
processing, and the maximum time lag between the sub-arrays grows, with and without linear
interpolation, at 10 μs per minute and 900 μs per minute, respectively.
[0091]
Next, we evaluated the capture of speech data using the distributed delay-and-sum algorithm.
Here, as shown in FIG. 27, a 500 Hz sine-wave signal source and noise sources (300 Hz, 700 Hz,
and 1300 Hz sine waves) were used. The experimental results show that the speech signal is
enhanced, the noise is reduced, and the SNR improves as the number of microphones increases.
It was also found that, under the 48-channel condition, the 300 Hz and 1300 Hz noise is
suppressed by 20 dB without significantly degrading the signal source (500 Hz). On the other
hand, the 700 Hz noise is only slightly suppressed; this is considered to be due to
interference arising from the positions of the signal source and the noise source. In another
experiment, it was found that the 700 Hz noise is hardly suppressed in the vicinity of the
noise source position even in the 48-channel case; it is thought that this problem can be
avoided by increasing the number of nodes. In addition, we also confirmed that voice
acquisition can operate in real time using the three sub-arrays.
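As a minimal sketch of the delay-and-sum enhancement evaluated here (assuming
sample-synchronized channels and per-microphone steering delays precomputed from the estimated
source position; the names are illustrative, and the sample-wrap behavior of np.roll is a
sketch simplification):

```python
import numpy as np

def delay_and_sum(channels, delays_s, fs=16000):
    """Align each microphone channel by its steering delay and average.
    channels: (n_mics, n_samples) array; delays_s: per-mic delays in seconds."""
    n_mics, n_samples = channels.shape
    out = np.zeros(n_samples)
    for ch, d in zip(channels, delays_s):
        shift = int(round(d * fs))   # steering delay in whole samples
        out += np.roll(ch, -shift)   # advance the channel by its delay
    return out / n_mics              # coherent signal adds up; noise averages out

# Example: 16 channels of 1 s audio at 16 kHz, zero steering delays (broadside)
x = np.random.randn(16, 16000)
y = delay_and_sum(x, np.zeros(16))
```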
[0092]
As described above, in cluster-based routing according to the prior art, clustering is
performed based only on network-layer information. On the other hand, in an environment where
a large-scale sensor network contains multiple signal sources to be sensed, a sensor-node
clustering technique based on sensing information was required in order to construct a path
optimized for each signal source. Therefore, in the method according to the present invention,
path construction more specialized for the application is realized by using the sensed signal
information (application-layer information) when selecting cluster heads and constructing
clusters. Further, by combining this with a wakeup mechanism (hardware) such as the VAD
processing unit 52 in the microphone array network, the low-power-consumption performance can
be further improved.
[0093]
In the above embodiments, a sensor network system related to a microphone array network system
for high-quality sound acquisition has been described, but the present invention is not
limited to this, and it is applicable to sensor network systems using various sensors, such as
temperature, humidity, human detection, animal detection, stress detection, and light
detection.
[0094]
As described above in detail, according to the sensor network system and the communication
method thereof according to the present invention, by using the sensed signal for clustering,
cluster head determination, and routing on the sensor network, and by constructing a network
route dedicated to data aggregation corresponding to the physical arrangement of a plurality
of signal sources, redundant routes can be reduced and, at the same time, the efficiency of
data aggregation can be improved.
In addition, since the communication overhead for route construction is small, network traffic
can be reduced, and the operating time of the power-hungry communication circuits can be
shortened. Therefore, in the sensor network system, data aggregation can be performed more
efficiently than in the prior art, network traffic can be significantly reduced, and the power
consumption of the sensor nodes can be reduced.
[0095]
1, m11, m12, ..., m43, m44: microphone; 1a, 1b, 1c, ..., 1n: microphone array; 2, 2a, 2b, 2c,
..., 2n: sound pickup processing unit; 3: input/output unit (I/O unit); 4: processor; 10, 11,
12: network; 20: voice processing server; 30, 30a, 30b, 30c: node; 50: MPU; 51: AD conversion
circuit; 52: VAD processing unit; 53: power supply management unit; 54: SRAM; 55: SSL
processing unit; 56: SSS processing unit; 57: network interface circuit; 57a: data
communication unit; 57b: timer and parameter memory; 58: delay-and-sum circuit unit; 61:
physical layer circuit unit; 62: MAC processing unit; 63: time synchronization unit; 64:
reception buffer; 65: transmission buffer; 66: header analyzer; 67: routing processing unit;
67m: table memory; 68: packet generation unit; N0 to NN: sensor node (node); SV: server; T1 to
T4: tablet.