Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2018117341
Abstract: [Problem] To adjust and control the position of the audible area of a sound image according to a target person. [Solution] The device comprises a virtual sound source generation unit 200 for generating a virtual sound source, a pair of left and right speakers 201a and 201b for emitting the sound image generated by the virtual sound source generation unit 200, and a control unit 150 that adjusts and controls the position of the audible area of the sound image emitted from the pair of left and right speakers 201a and 201b according to the target person. Since the control unit 150 adjusts and controls the position of the audible area of the sound image according to the target person, the target person can listen to the sound image effectively. [Selected figure] Figure 1
Mobile body and program
[0001]
The present invention relates to a mobile body and the like capable of adjusting and controlling the position of the audible area (also referred to as the "sweet spot") of a three-dimensional sound image, and in particular to a mobile body and the like capable of adjusting and controlling the position of the audible area (sweet spot) of a sound image according to a target person by a rotation operation, a movement operation, or the like.
[0002]
Transaural processing is processing for localizing a three-dimensional sound image based on the acoustic signals from two speakers (stereo speakers).
Transaural processing includes "convolution operation processing", which convolves an input signal with a head-related transfer function (HRTF), and "crosstalk cancellation processing", which removes crosstalk from the result of the convolution operation. Here, "crosstalk" refers to the path by which the output sound from the right speaker (or left speaker) enters the left ear (or right ear) of the listener in stereo reproduction, and the processing for canceling it is the crosstalk cancellation processing.
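To make the two stages concrete, here is a minimal sketch in Python/NumPy (not part of the patent; the HRTF arrays and the per-frequency cancellation matrix C are assumed inputs, and both HRTFs are assumed to have equal length):

    import numpy as np

    def transaural(mono, hrtf_right, hrtf_left, C):
        # Stage 1: convolution with head-related transfer functions (HRTFs)
        x1 = np.convolve(mono, hrtf_right)  # binaural signal for the right ear
        x2 = np.convolve(mono, hrtf_left)   # binaural signal for the left ear
        # Stage 2: crosstalk cancellation; C is a 2x2 filter matrix per frequency
        # bin, assumed shape (2, 2, len(x1)//2 + 1)
        X = np.fft.rfft(np.vstack([x1, x2]), axis=1)
        Y = np.einsum('ijk,jk->ik', C, X)   # apply the cancellation matrix per bin
        return np.fft.irfft(Y, n=len(x1), axis=1)  # rows: right/left speaker feeds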
[0003]
Transaural processing enables localization of a three-dimensional sound image. However, when the listener is outside the "sweet spot", the area in which the listener can grasp the sound image, the listener cannot perceive the three-dimensional sound image, and the sound can even seem louder and more unpleasant. A device has been proposed to overcome this problem.
[0004]
The proposed sound reproduction device delays the binaural sound signal: if the listener's ears are located in the sweet spot, it outputs the crosstalk-cancelled binaural sound signal, while if at least one ear is located outside the sweet spot, it outputs a binaural signal delayed by the time required for the crosstalk cancellation processing (see, for example, Patent Document 1).
[0005]
Further, as described in Patent Document 1, in transaural reproduction technology the sweet spot is generally narrow, and the three-dimensional sound image can no longer be grasped even if the listener moves only slightly.
[0006]
JP 2015-170926 A (pages 5 to 19, FIG. 1)
[0007]
Certainly, the device described in Patent Document 1 may have the effect of suppressing offensive sound when the listener leaves the sweet spot, but it is a so-called passive device. That is, it is a passive device intended for a listener who has entered the imaging area of a fixed camera. For this reason, the listener could not always be positioned on the straight line that passes through the midpoint of the straight line connecting the positions of the left and right pair of speakers emitting the sound image and is perpendicular to that straight line (hereinafter also referred to as the "center direction"), so it was difficult to make the listener hear the sound image effectively.
[0008]
In addition, only a listener located in the limited area (the sweet spot) could listen to the reproduced sound image, and a person not located in that area could not become a listener, so the reproduced sound image could not be used effectively.
[0009]
The present invention has been made to solve the conventional problems described above, and its object is to provide a mobile body and a program capable of adjusting and controlling the position of the audible area of a sound image according to a target person.
[0010]
In order to achieve the above object, the present invention provides a mobile body comprising: a virtual sound source generation unit for generating a virtual sound image; a pair of left and right speakers for emitting the sound image generated by the virtual sound source generation unit; and a control unit that adjusts and controls the position of the audible area of the sound image emitted from the pair of left and right speakers according to the target person.
[0011]
According to this configuration, the control unit adjusts and controls the position of the audible area of the sound image emitted from the pair of left and right speakers according to the target person. Thus, for example, a person who has already become a listener can be made to hear the sound image effectively. Further, the sound image can be used effectively by capturing as a listener a target person who has not yet become one.
[0012]
More specifically, the control unit includes a pivoting unit that holds the pair of left and right speakers pivotably, and a pivot drive control unit that drives and controls the pivoting operation of the pivoting unit. According to this configuration, the pivot drive control unit controls the rotation of the pivotable pivoting unit to which the speakers are fixed; therefore, for example, by positioning the listener in the center direction, the listener can be made to hear the sound image effectively.
[0013]
A configuration is also proposed in which the control unit includes a moving unit on which the pivoting unit and the pivot drive control unit are mounted, and a movement drive control unit that drives and controls the movement of the moving unit. According to this configuration, the movement drive control unit controls the movement of the moving unit on which the pivoting unit is mounted, so that, for example, a person not located in the audible area can be taken in as a listener and the sound image can be used effectively.
[0014]
The movement modes include linear movement in the front-rear and left-right directions, and the rotation modes include clockwise and counterclockwise rotation about the central axis extending in the vertical direction of the mobile body. Since the mobile body can actively move so as to position a person located outside the audible area within that area, attraction effects can be obtained by performing demonstrations in a theme park or the like.
[0015]
A configuration is also proposed that further includes a determination unit that determines whether a listener is positioned within the audible area, the control unit adjusting and controlling the position of the audible area based on the determination result of the determination unit. According to this configuration, the determination unit determines whether or not a listener is located in the audible area, and the control unit adjusts and controls the position of the audible area to match the target person according to the result of the determination.
[0016]
If the determination unit determines that no listener is located in the audible area, for example, an identification unit identifies the person closest to the mobile body, and the movement drive control unit drives and controls the moving unit so that the identified person is positioned in the audible area. Identification of a person may be performed by image processing, distance measurement processing, infrared signal processing, or the like.
[0017]
In addition, the control unit may be configured to adjust the position of the audible area so as to follow the movement of a listener once identified by the identification unit. According to this configuration, the position of the audible area is adjusted and controlled to follow the listener, so that the listener can always hear the sound image effectively. As a result, the attraction effect is enhanced.
[0018]
A mobile body is also proposed that further includes a delay unit in the stage preceding the pair of left and right speakers, the control unit further controlling the delay unit so as to apply a delay amount corresponding to the rotation amount of the pivoting unit. According to this configuration, even if the sweet spot would otherwise move away from the listener, the difference in distance between the left and right speakers caused by the rotation of the pivoting unit is corrected by applying a delay according to that distance difference, so the acoustic effect can be maintained.
[0019]
A gain unit that applies a gain to the signal from the delay unit may be further provided, and the control unit may further control the gain unit so as to apply a gain according to the rotation amount of the pivoting unit. This makes it possible to correct the attenuation of the sound wave corresponding to the difference in distance between the left and right speakers caused by the rotation of the pivoting unit.
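As an illustration of how such a delay and gain could be computed, here is a hedged sketch (Python; the geometry and formulas below are assumptions for illustration, not formulas given in the patent): when the head rotates by an angle θ, one speaker moves nearer to a listener on the original center line and the other farther away, and the path-length difference converts into a sample delay and an amplitude ratio.

    import math

    def delay_and_gain(theta_rad, half_span_m, listener_dist_m, fs=48000, c=343.0):
        # Speakers sit at +/- half_span_m from the head center and rotate by theta;
        # the listener is assumed at distance listener_dist_m on the center line.
        lx = half_span_m * math.cos(theta_rad)
        ly = half_span_m * math.sin(theta_rad)
        d_right = math.hypot(lx, listener_dist_m - ly)   # path to right speaker (m)
        d_left = math.hypot(lx, listener_dist_m + ly)    # path to left speaker (m)
        diff = abs(d_left - d_right)                     # path-length difference (m)
        delay_samples = round(diff / c * fs)             # delay for the nearer speaker
        gain = min(d_left, d_right) / max(d_left, d_right)  # 1/r amplitude correction
        return delay_samples, gain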
[0020]
A mobile body of another aspect includes a reproduction unit that reproduces audio data on which virtual sound source processing has been performed, a pair of left and right speakers for emitting the sound image reproduced by the reproduction unit, and a control unit that adjusts and controls the position of the audible area of the reproduced sound emitted from the pair of left and right speakers according to the target person.
[0021]
The reproduction unit reproduces audio data on which virtual sound source processing has been performed and emits the sound from the pair of left and right speakers. The control unit adjusts and controls the position of the audible area of the reproduced sound emitted from the pair of left and right speakers according to the target person, so that, for example, a person who has already become a listener can be made to hear the sound image effectively, and a target person who has not yet become a listener can also be taken in as a listener so that the sound image is used effectively.
[0022]
Furthermore, a program can be executed to realize the determination unit, the identification unit, the virtual sound source generation unit, the control unit, and the like. The program is recorded, for example, on a recording medium such as a ROM. A processor such as a CPU or DSP executes the program while using a work area formed in a RAM or the like. As a result, each unit (each means) is realized, so that the position of the audible area of the sound image generated by the virtual sound source generation unit can be adjusted and controlled by performing "rotation" and/or "movement" control.
[0023]
According to the present invention, it is possible to adjust and control the position of the audible area of a sound image according to the target person.
[0024]
The figures are as follows.
A front view and an external view of the mobile body 1.
A block diagram of the electronic circuit 100.
A configuration diagram of the distance measuring unit 130.
An explanatory diagram of the distance measuring operation by the distance measuring unit 130.
A block diagram of the virtual sound source generation unit 200.
A block diagram of the image processing unit 120.
A block diagram of the infrared device 400.
An explanatory diagram of the output operation of the infrared device 400.
A block diagram of the virtual sound source generation unit 200.
A schematic plan view of the moving unit 3.
An explanatory view of the movement operation of the moving unit 3.
A flowchart for explaining the operation.
Explanatory drawings of the operation.
A configuration example of the virtual sound source generation unit 201.
An explanatory drawing of the operation of the virtual sound source generation unit.
Schematic explanatory drawings of the principle of other embodiments.
A figure showing the relationship between the rotation angle θ and the arrival time difference (distance difference).
A block diagram of the virtual sound source generation unit 202 of another embodiment.
A configuration diagram of the virtual sound source generation unit 204.
An example of a concrete configuration of a correction unit.
[0025]
Hereinafter, embodiments of the present invention will be described with reference to the drawings. The embodiment shown below is only one embodiment of the present invention, and the invention includes various structural variations. In the following, the "audible area" in which the sound image can be heard is referred to as the "sweet spot" as appropriate.
[0026]
(Structure) (Mobile body 1) Fig. 1(a) is a front view of the mobile body 1, and Fig. 1(b) is a perspective external view of the mobile body 1. The mobile body 1 includes a moving unit 3, a body 4, a rotation mechanism 5, and a head 2. The moving unit 3 is a plate-like member that is square in plan view, and omnidirectional wheels 111a, 111b, 111c, and 111d are rotatably provided at the front, rear, left, and right of the plate-like member. The body 4 has a columnar appearance, is connected to the head 2 via the rotation mechanism 5 (pivoting unit), and is mounted on and fixed to the moving unit 3. The rotation mechanism 5 can be realized by known pivoting means. The head 2 has a cylindrical appearance with a diameter smaller than that of the body 4, and is rotatably held by the rotation mechanism 5 provided on the upper surface of the body 4 fixed to the moving unit 3 (see reference numeral R in Fig. 1(b)).
[0027]
A display 500 (display monitor) for displaying required information is fixed to the front of the body 4 with its display surface facing forward. A pair of speakers 201a and 201b are fixed to the left and right parts of the head 2 and can emit an audio signal corresponding to the generated sound image toward the space in front. The display 500 displays content moving images, still images and characters, messages synthesized by the voice synthesis unit 300, and the like according to the operation control of the control unit 150. By reproducing various content moving images and displaying them on the display 500, the mobile body 1 becomes an attraction that draws target persons located around it to its vicinity.
[0028]
On the front side of the head 2, the TR unit 132 of the distance measuring unit 130 described later and a part of the infrared device 400 are embedded via a pair of round members. A CCD camera 125 is embedded in the lower center of the front of the head 2, providing an imaging function for capturing an image in front of the mobile body 1.
[0029]
Thus, the mobile body 1 is movable by means of the moving unit 3, and the head 2 is rotatable relative to the body 4. In addition, the mobile body 1 can acquire image information of a person or object in front of it, measure the distance to a person or object in front, and so on. The area for image acquisition and distance measurement rotates about the vertical central axis of the mobile body 1 (hereinafter also referred to as the "central axis") with the rotation of the head 2 (its swing in the horizontal plane) by the rotation mechanism 5. The "central axis" is the vertical line passing through the center of the circle of the head 2 and the body 4, which are circular in plan view.
[0030]
Further, the sound images emitted from the pair of speakers 201a and 201b are set so that the audible area is located in front of the mobile body 1. The audible area rotates about the "central axis" in accordance with the rotational movement of the head 2, and its position moves in accordance with the movement of the moving unit 3 of the mobile body 1. Thus, since the position of the audible area can be moved horizontally or rotated about the central axis, it can be adjusted and controlled according to the situation.
[0031]
The infrared device 400 also detects the presence of a person. The detection area of the infrared device 400 likewise moves and rotates in accordance with the movement of the mobile body 1 and the rotation of the head 2. The detection area of the infrared device 400, the area in which the distance measuring unit 130 can measure distance, the area in which the CCD camera 125 can acquire images, and the like each include the sweet spot and are larger than it.
[0032]
(Moving Unit 3) FIG. 10 is a schematic plan view of the moving unit 3. The base 50 is a plate-like member that is square in plan view, and a motor 112a, a motor 112b, a motor 112c, and a motor 112d are fixed at the front, rear, left, and right of its lower part. An omni wheel 111a, an omni wheel 111b, an omni wheel 111c, and an omni wheel 111d are rotatably provided on the rotation shafts of the motor 112a, the motor 112b, the motor 112c, and the motor 112d, respectively.
[0033]
The rotation shafts of the motors 112a, 112b, 112c, and 112d are received by bearings (not shown), and the bearings are supported on the lower part of the base 50 by a support member (not shown) or the like. An electronic circuit 100 and a power supply 101 for supplying power to the electronic circuit 100 are mounted and fixed on the upper surface of the base 50. A regulator (not shown) is connected to the power supply 101; a plurality of voltage levels can be output from the terminals of the regulator, and an appropriate voltage taken from these terminals is supplied to each part of the electronic circuit 100 that requires it.
[0034]
The configuration for controlling the movement of the moving unit 3, including the base 50 and the like, by rotationally driving the four omni wheels 111a, 111b, 111c, and 111d is known and may be configured with reference to, for example, Japanese Patent Application Laid-Open No. 2008-155652. In the omni wheels 111a, 111b, 111c, and 111d, an appropriate number of roller shafts are arranged around the periphery, and free rollers are rotatably provided on each roller shaft. As a result, the omni wheels 111a, 111b, 111c, and 111d can rotate both in the wheel circumferential direction and in the direction orthogonal to it. Drive control for moving the moving unit 3 in all directions by controlling the motors 112a, 112b, 112c, and 112d will be described later.
[0035]
(Electronic Circuit 100) FIG. 2 is a block diagram of the electronic circuit 100. As shown in FIG. 2, the electronic circuit 100 includes an image processing unit 120, a distance measuring unit 130, a virtual sound source generation unit 200, an infrared device 400, a drive control unit 140, a voice synthesis unit 300, and a control unit 150.
[0036]
(Image Processing Unit 120) FIG. 6 is a block diagram of the image processing unit 120. The image processing unit 120 is connected to a CCD camera 125 that captures an image of a predetermined area in front of the mobile body 1, and executes image processing on the imaging signal output from the CCD camera 125.
[0037]
The image processing unit 120 includes an A/D conversion unit 122, a frame buffer 124, and an image processing engine 126. The A/D conversion unit 122 converts the imaging information of the predetermined area in front of the CCD camera 125 from analog to digital. If the CCD camera 125 has a digital output terminal, the A/D conversion unit 122 may be omitted.
[0038]
The frame buffer 124 has a plurality of planes 124a to 124n, each capable of storing one frame of imaging information. The conversion results of the A/D conversion unit 122 are stored sequentially in the frame buffer 124 in frame units, in a first-in first-out (FIFO) manner.
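A FIFO frame buffer of this kind can be sketched in a few lines (Python; the plane count is illustrative, not a value from the patent):

    from collections import deque

    class FrameBuffer:
        """FIFO buffer holding the most recent N frames (planes 124a to 124n)."""
        def __init__(self, planes=8):
            self.planes = deque(maxlen=planes)  # oldest frame dropped automatically

        def push(self, frame):
            self.planes.append(frame)           # store one A/D-converted frame

        def latest(self, k=1):
            return list(self.planes)[-k:]       # most recent k frames for the engine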
[0039]
The image processing engine 126 performs processing to identify a person located in the predetermined area in front while referring to the data of the plurality of planes stored in the frame buffer 124. "Persons" include "listeners" located in the sweet spot, "non-listeners" not located in the sweet spot, and so on. An example of the processing performed by the image processing engine 126 will be described later. The processing results of the image processing unit 120 and the data used for the processing are sent to the determination unit 160 provided in the control unit 150.
[0040]
In order to identify a listener located in the sweet spot, person detection by the image processing unit 120 described later, distance measurement by the distance measuring unit 130, person detection by the infrared device 400, and the like are performed in combination. For example, a "person" is detected by image processing, the distance to the "person" is measured, and if the measured distance is within a predetermined value, that "person" is identified as being located in the sweet spot.
[0041]
On the other hand, even if a "person" is detected in the image, if the distance to the "person" exceeds the predetermined value, it is determined that the "person" is not located in the sweet spot, and the output of the infrared device 400 is checked. Note that this algorithm is only an example; any algorithm that can detect that an identified "person" is located in the sweet spot can be adopted.
[0042]
(Distance Measuring Unit 130) The distance measuring unit 130 includes a TR unit 132 that transmits a wave such as an electromagnetic wave or an ultrasonic wave forward and receives its echo signal, and an A/D conversion unit 131 that converts the echo signal received by the TR unit 132 from analog to digital; it measures the distance to a listener, non-listener, fixed object, or the like in front. The distance measuring unit 130 transmits waves such as ultrasonic waves or millimeter waves forward; at that time, the TR (Transmit and Receive) unit 132 outputs the wave toward the space in front of the mobile body 1.
[0043]
When a wave is transmitted forward, a reflected wave (echo signal) returns from an object or person such as a listener, a non-listener, or a fixed object, and this reflected wave is received by the TR unit 132. Based on the echo signal, the distance measuring unit 130 measures the distance to a listener already in the sweet spot, or to a non-listener, in other words a person not yet in the sweet spot.
[0044]
FIG. 3 is a block diagram of the distance measuring unit 130. The distance measuring unit 130 includes an oscillation unit 133, a transmission processing unit 134, a timer 136, a reception processing unit 138, a calculation unit 139, and an A/D conversion unit 131. The oscillation unit 133 is a wave source that outputs a wave. The transmission processing unit 134 gates the wave continuously oscillated by the oscillation unit 133, opening the gate for a predetermined time, and outputs the wave in pulse form.
[0045]
The reception processing unit 138 receives the echo signal, the reflected wave from the distance measurement direction (the reception echo), in synchronization with the transmission processing unit 134. The calculation unit 139 multiplies the count value (T) of the timer 136 by the wave velocity (C) and then by 1/2 to obtain the distance (R): R = C·T/2 (that is, 2R = C·T). In addition, if the carrier frequency of the echo signal is obtained by FFT calculation and the Doppler shift of an object or person located in the distance measurement direction is computed, it becomes easy to determine whether a person is located in the distance measurement direction. Since the Doppler shift of a human is predictable, whether a human is located in the distance measurement direction can be determined by comparing experimental values prepared in advance with the Doppler shift obtained by the FFT calculation.
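Both computations reduce to a few lines. The following sketch (Python/NumPy; the sampling rate, sound speed default, and detection threshold are illustrative assumptions) takes the first echo sample exceeding a threshold, converts its round-trip time into a distance via R = C·T/2, and estimates the Doppler shift from the FFT peak:

    import numpy as np

    def range_and_doppler(echo, fs, f0, c=343.0, threshold=0.1):
        # First echo after the transmission pulse -> round-trip time T
        idx = np.argmax(np.abs(echo) > threshold)  # first sample over threshold
        T = idx / fs
        R = c * T / 2.0                            # R = C*T/2 (i.e., 2R = C*T)
        # Carrier frequency of the echo from the FFT peak -> Doppler shift
        spec = np.abs(np.fft.rfft(echo))
        f1 = np.fft.rfftfreq(len(echo), 1.0 / fs)[np.argmax(spec)]
        return R, abs(f1 - f0)                     # distance and delta-f = |f1 - f0|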
[0046]
Next, the operation of the distance measuring unit 130 will be described with reference to FIG. 4. Under the gating control of the transmission processing unit 134, a pulse-like transmission signal (transmission pulse) is transmitted, a synchronization signal is sent to the reception processing unit 138, and a reset signal is given to the timer 136. On receiving the reset signal, the timer 136 resets its count area, and then measures time by a count operation that increments the value of the count area.
[0047]
On receiving the synchronization signal, the reception processing unit 138 receives the echo signal (reception echo) and gives a stop signal to the timer 136, whereupon the timer 136 stops measuring time. Usually, a large number of echo signals are received from the distance measurement direction; in this example, three echo signals "ech1", "ech2", and "ech3" are reflected.
[0048]
Therefore, the reception processing unit 138 stops the counting of the timer 136 when the first echo signal after the synchronization signal is received. This makes it possible to measure the distance to the closest person or object in the target direction; that is, only the main echo signal is adopted, and the other sub echo signals are discarded. Whether the closest reflector is a person or not is determined in combination with image processing, infrared signal processing, and the like. In addition, since the distance distribution of the audible area (meaning its two- and three-dimensional distance distribution, the position of the audible area, etc.) is known in advance, it is possible to determine whether a person or object is located within the audible area.
[0049]
In addition, if the area measurable by the distance measuring unit 130 is set to include the sweet spot, the distance to a person or object located in the sweet spot can be measured, as can the distance to a person or object located at a point outside the sweet spot.
[0050]
Further, as described above, the calculation unit 139 can also determine the frequency of the echo signal by FFT calculation or the like to judge whether the reflector is a person. A human who appears to be standing still usually has slight swaying movement, and this movement appears as a Doppler shift even for a "stationary" person. In this example, the frequency "f1" of the echo signal ech1 differs from the transmission pulse frequency "f0", and "Δf = |f1 − f0|" can be used as a criterion for judging between a person and an object. A fixed object is essentially completely stationary, so it is sufficient to use the fact that the Doppler shift of a nominally stationary person, who always has some moving parts, is larger than that of a fixed object.
[0051]
When "ultrasonic waves" are used as the wave, the oscillation unit 133 may be configured with a piezoelectric element such as a ceramic element, and the TR unit 132 with a speaker and a microphone. When "millimeter waves" are used, the oscillation unit 133 may be configured with a Gunn oscillator, and the TR unit 132 with a small parabolic antenna, a slot-type multi-antenna, or the like. When "near-infrared light" is used, the oscillation unit 133 can be configured with a semiconductor laser, and the TR unit 132 with an optical element such as a lens.
[0052]
Alternatively, when measuring the distance, instead of directly measuring time, the phase difference between the irradiated light and the reflected light may be detected, and the distance determined from the detected phase difference by a known method. The distance measurement result of the distance measuring unit 130 is sent to the determination unit 160 included in the control unit 150.
[0053]
(Regarding the "Range Image") The transmission processing unit 134 described above outputs a pulse wave, but it may also be configured to obtain distance information for each pixel by, for example, the TOF (Time Of Flight) method. When light from a light emitting element is emitted into the front space and the reflected light from an object (including a person) is imaged by an optical system onto an imaging element such as a CCD, the phase difference for each pixel, in other words the distance information for each pixel, is obtained, so the object can be grasped three-dimensionally. In this configuration, the light from the LED can also be emitted in pulses to detect the Doppler shift of the object, and the movement of the object can be estimated (see, for example, known documents such as TDK Techno Magazine No. 159).
[0054]
Therefore, by obtaining the range image, the shortest distance to an object can be known, and if a three-dimensional template of the object is prepared, whether the object is a person or not can be detected. For this purpose, the configuration of the distance measuring unit 130 in FIG. 3 can be modified as follows: the transmission processing unit 134 outputs the wave as a CW (Continuous Wave), making the timer 136 unnecessary; the reception processing unit 138 is configured to include a solid-state imaging device and a phase detection unit that obtains the phase for each pixel; and the calculation unit 139 is changed to a configuration that obtains the distance of each pixel based on the phase.
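For reference, the phase-based per-pixel distance in such a CW/TOF configuration reduces to d = c·Δφ / (4π·f_mod), unambiguous up to c/(2·f_mod). A minimal sketch (Python/NumPy; the modulation frequency is an illustrative assumption):

    import numpy as np

    def phase_to_distance(phase_map, f_mod=20e6, c=3.0e8):
        # CW time-of-flight: the round trip adds a phase of 2*pi*f_mod*(2d/c),
        # so d = c * phase / (4 * pi * f_mod); at 20 MHz the range is ~7.5 m.
        return c * phase_map / (4.0 * np.pi * f_mod)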
[0055]
(Infrared Device 400) FIG. 7 is a block diagram of the infrared device 400. IR (Infrared Ray) means "infrared". The infrared device 400 has an infrared optical system 410, an IR filter 420, and an IR sensor 430. The infrared optical system 410 is configured by combining infrared optical devices such as concave and convex lenses capable of transmitting infrared light, and is an optical system that focuses infrared light coming from a predetermined area in front of the mobile body 1 onto the IR sensor 430.
[0056]
The IR filter 420 is a device of the kind called an "interference filter"; in the present embodiment, it is a sharp-cut filter that transmits infrared rays of 9 to 10 (μm) and blocks infrared rays of other wavelengths. The IR sensor 430 outputs a voltage according to the intensity of the infrared light focused on its light receiving surface. The output voltage is sent to the control unit 150 (determination unit 160).
[0057]
Infrared rays are emitted from objects (including living things) whose absolute temperature is above zero. According to Wien's displacement law, if the peak wavelength of the infrared radiation is λp and the temperature is T (K), then λp × T = b ("b" is a constant, about 2.9 × 10^-3 m·K). Accordingly, the peak wavelength of the infrared rays emitted by a person is about 9.0 to 10.0 (μm). Using this, only infrared light with a wavelength of 9.0 to 10.0 (μm) is transmitted by the IR filter 420 and focused on the IR sensor 430, whereby it can be determined whether a person is located in the predetermined area in front.
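The arithmetic behind this wavelength band is simply Wien's law; for example (Python):

    def wien_peak_um(T_kelvin, b=2.898e-3):
        # Wien's displacement law: lambda_p * T = b, with b in m*K
        return b / T_kelvin * 1e6   # peak wavelength in micrometers

    # Human body surface at roughly 305 to 310 K:
    # wien_peak_um(310) -> ~9.35 um, within the 9.0-10.0 um passband of IR filter 420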
[0058]
The infrared optical system 410 focuses infrared rays coming from the predetermined area in front of the mobile body 1 onto the light receiving surface of the IR sensor 430. By devising the lens combination and the like, this predetermined area is set to the same or almost the same area as the sweet spot. Therefore, from the output signal of the IR sensor 430, it can be determined whether or not a person is located in the sweet spot.
[0059]
FIG. 8 is an explanatory diagram of the output of the IR sensor 430. The vertical axis represents the output voltage level of the IR sensor 430. Two threshold values (a first threshold and a second threshold) are set for the output voltage, the first threshold being set to a level lower than the second. The "signal D1" is at a level below the first threshold and is a "person non-detection signal" indicating that no person is detected.
[0060]
On the other hand, the "signal D2" is at a level exceeding the first threshold and is a "person detection signal" indicating that a person is detected. The "signal D3" is at a level exceeding both the first and second thresholds; it indicates that a person is detected and also serves as a "person proximity signal" indicating that the person is extremely close. The output voltage level of the IR sensor 430 increases as the person gets closer, until it saturates. For example, if the longest distance of the sweet spot is Ls, the "signal D3" exceeding both thresholds means that a person is located within half of Ls (Ls/2) or less.
[0061]
Therefore, when the output level of the IR sensor 430 exceeds the first threshold, the determination unit 160 included in the control unit 150 determines that a person is located in the sweet spot; when the output level exceeds both thresholds, it determines that a person is located in the sweet spot and is extremely close to the mobile body 1 (for example, within 2 to 3 (m)). It is thus possible to determine whether a person is located in the sweet spot with the infrared device 400 alone, but combining information from the image processing unit 120, the distance measuring unit 130, and the like improves the detection accuracy.
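The two-threshold decision of the determination unit 160 can be expressed as follows (Python; the threshold voltages are purely illustrative):

    def classify_ir(voltage, th1=0.5, th2=1.5):
        # Two-threshold decision on the IR sensor 430 output (th1 < th2)
        if voltage > th2:
            return "person in sweet spot, extremely close"  # signal D3
        if voltage > th1:
            return "person in sweet spot"                   # signal D2
        return "no person detected"                         # signal D1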
[0062]
In addition, if the head 2 is turned left and right while the position of the moving unit 3 is fixed and the output of the infrared device 400 is monitored, the direction in which a person exists can be detected. For example, when the head 2 is turned left and right, the direction in which the output of the infrared device 400 is at its maximum is the direction in which a "person" is present, and this coincides with the center direction, the direction that passes through the midpoint of the straight line connecting the pair of speakers 201a and 201b and is perpendicular to that straight line. Thus, by rotating the head 2 left and right so that the output of the infrared device 400 is maximized, the "person" can be positioned in the center direction.
[0063]
(Virtual Sound Source Generation Unit 200) The virtual sound source generation unit 200 generates and localizes a three-dimensional sound image. A D/A conversion unit 202 and the pair of left and right speakers 201a and 201b are connected to its output stage. As shown in FIG. 2, the sound image information generated and localized by the virtual sound source generation unit 200 is converted from digital to analog by the D/A conversion unit 202, and the converted analog audio signals are output from the left and right speakers 201a and 201b.
[0064]
FIG. 5 is a block diagram of the virtual sound source generation unit 200. The virtual sound source generation unit 200 includes a memory 205, a reproduction unit 215, a sound image localization unit 220, and a crosstalk cancellation processing unit 230. One or more pieces of reproduction data (audio data) are recorded in advance in the memory 205. The reproduction unit 215 reads and reproduces the reproduction data recorded in the memory 205 and sends it to the sound image localization unit 220. The sound image localization unit 220 performs sound image localization processing to localize the sound image using the sent reproduction data. The crosstalk cancellation processing unit 230 performs crosstalk cancellation processing to remove crosstalk based on the sound image localization information.
[0065]
In the example shown in FIG. 5, reproduction data a to reproduction data n are recorded in advance in the memory 205, but this is merely an example. Also, if two microphones (not shown) are provided to pick up stereo sound, and sound image localization processing and crosstalk cancellation processing are performed on the picked-up stereo sound, real-time sound image generation becomes possible and the attraction effect is further improved.
[0066]
FIG. 9 is a block diagram of the sound image localization unit 220 and the crosstalk cancellation processing unit 230. The virtual sound source generation unit 200 is configured to include the sound image localization unit 220, which localizes the sound source in a desired direction based on the input sound signal and generates a two-channel signal, a signal (x1) for the right ear and a signal (x2) for the left ear; the crosstalk cancellation processing unit 230, which cancels crosstalk; the right speaker 201a; and the left speaker 201b.
[0067]
The sound image localization unit 220 is configured to include the filter 01 (210) and the filter 02 (211), whose outputs become the right-speaker signal and the left-speaker signal. As the transfer functions of the filter 01 (210) and the filter 02 (211), "head-related transfer functions" for performing sound image localization at the desired direction and distance are measured and generated in advance and incorporated. When the filter 01 (210) and the filter 02 (211) are configured as FIR filters, the input audio signal undergoes a convolution operation with the filter coefficients to generate the right-speaker signal and the left-speaker signal. In this way, head-related transfer functions can be used to localize sound images at desired directions and distances.
[0068]
The filter 11 (220) and the filter 12 (225) receive the right-speaker signal; the output of the filter 12 (225) is multiplied by the coefficient value (α) in the multiplier 260, and the multiplication result is input to the adder 245. Similarly, the filter 13 (226) and the filter 14 (235) receive the left-speaker signal; the output of the filter 13 (226) is multiplied by the coefficient value (α) in the multiplier 270, and the multiplication result is input to the adder 240.
[0069]
The adder 240 adds the multiplication result of the multiplier 270 and the output of the filter 11 (220) to generate the right-channel output signal, which is supplied to the right speaker 201a, while the adder 245 adds the multiplication result of the multiplier 260 and the output of the filter 14 (235) to generate the left-channel output signal, which is supplied to the left speaker 201b. The corresponding sounds are thus emitted from both speakers. Note that both multipliers 260 and 270 have the coefficient value α: if α is "0", no crosstalk cancellation is performed, and if it is "1.0", complete crosstalk cancellation is performed, so the multipliers 260 and 270 have the function of adjusting the amount of crosstalk cancellation.
[0070]
Next, the operation of the crosstalk cancellation processing unit 230 will be described. Let the transfer functions of the filter 11 (220), the filter 12 (225), the filter 13 (226), and the filter 14 (235) be "H11", "H12", "H21", and "H22", respectively. When the signals x1 and x2 are input to the crosstalk cancellation processing unit 230, the signal x1 is supplied to the filter 11 (220) and the filter 12 (225) for filtering, and similarly the signal x2 is supplied to the filter 13 (226) and the filter 14 (235) for filtering. The outputs of the filter 11 (220) and the filter 13 (226) are added by the adder 240 to form the signal y1, which is supplied to the right speaker 201a to emit the corresponding sound. Similarly, the outputs of the filter 12 (225) and the filter 14 (235) are added by the adder 245 to form the signal y2, which is supplied to the left speaker 201b to emit the corresponding sound.
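The signal flow just described can be written down directly (Python/NumPy; the coefficient arrays h11, h12, h21, and h22 stand in for FIR realizations of H11, H12, H21, and H22, which are assumptions here, and x1, x2 and the four filters are assumed to have matching lengths so the sums align):

    import numpy as np

    def crosstalk_cancel(x1, x2, h11, h12, h21, h22, alpha=1.0):
        # x1 feeds filter 11 and filter 12; x2 feeds filter 13 and filter 14
        y1 = np.convolve(x1, h11) + alpha * np.convolve(x2, h21)  # adder 240 -> right
        y2 = alpha * np.convolve(x1, h12) + np.convolve(x2, h22)  # adder 245 -> left
        return y1, y2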
[0071]
The sound (signal) output from the right speaker 201a reaches the listener's left and right ears. Let the transfer functions from the right speaker 201a to the listener's right ear and left ear be G11 and G12, respectively, and similarly let the transfer functions from the left speaker 201b to the listener's right ear and left ear be G21 and G22, respectively. In this case, the relationship between x1, x2 and z1, z2 is expressed in matrix form, as shown by Equation (1) in the upper part of FIG. 9; that is, it is expressed by the product of the 2-by-2 matrix of the transfer functions of the four filters 220, 225, 226, and 235 and the 2-by-2 matrix of the transfer functions from the speakers 201a and 201b to the listener's ears.
[0072]
Crosstalk cancellation means that "z1 = x1 (Equation 2)" and "z2 = x2 (Equation 3)". Therefore, the transfer functions of the filter 220, the filter 225, the filter 226, and the filter 235 of the crosstalk cancellation processing unit 230 are as shown by Equation (4) in the lower part of FIG. 9.
[0073]
Here, referring to FIG. 9, assume that only the signal x1 is input and that the coefficient value of the multiplier 260 is α. Substituting x1 = 1 and x2 = 0 into Equation (1), and substituting H11, H12, H21, and H22 from Equation (4) into Equation (1) and expanding, the signals reaching the listener's two ears are as shown in Equations (5) and (6): z1 = (G11·G22 − α·G21·G12) / (G11·G22 − G12·G21) (Equation 5), z2 = (G12·G22 − α·G22·G12) / (G11·G22 − G12·G21) (Equation 6).
[0074]
When the coefficient value α is "1.0", z1 approaches 1 and z2 approaches 0, so the input signal x1 reaches only the right ear: "z1 ≈ x1 (Equation 7)" and "z2 ≈ 0 (Equation 8)". Similarly, for the filter 13 (226) and the filter 14 (235), when the coefficient value α of the multiplier 270 is "1.0", z2 approaches "1" and z1 approaches "0", so the input signal x2 reaches only the left ear: "z2 ≈ x2" and "z1 ≈ 0". As the coefficient value α deviates from "1.0", the amount of crosstalk cancellation decreases and the cancellation effect weakens. Thus, the amount of crosstalk cancellation can be adjusted by adjusting the coefficient values of the multipliers 260 and 270.
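Under these definitions, the cancellation filters of Equation (4) amount to the inverse of the acoustic 2-by-2 matrix, which can be checked numerically. A sketch (Python/NumPy) treating each transfer function as a single complex gain at one frequency, with illustrative values:

    import numpy as np

    # Acoustic paths at one frequency: rows = (right ear, left ear), cols = (y1, y2)
    G = np.array([[0.9 + 0.1j, 0.3 - 0.2j],    # G11, G21
                  [0.2 + 0.3j, 0.8 - 0.1j]])   # G12, G22
    H = np.linalg.inv(G)                       # Equation (4): H = G^-1

    for alpha in (1.0, 0.5, 0.0):
        Ha = H * np.array([[1, alpha], [alpha, 1]])  # multipliers scale cross terms
        z = G @ Ha @ np.array([1.0, 0.0])            # input x1 = 1, x2 = 0
        print(alpha, np.round(z, 3))                 # alpha = 1.0 gives z ~ (1, 0)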
[0075]
In this way, the virtual sound source generation unit 200 can be realized, but the configuration is not limited to this example; any acoustic device that localizes a sound image and removes its crosstalk can be applied to the present invention. It is also possible to omit the sound image localization unit 220 and the crosstalk cancellation processing unit 230 and instead reproduce audio data that has already undergone sound image localization processing and crosstalk cancellation processing.
[0076]
(Voice Synthesis Unit 300) The voice synthesis unit 300 receives control information from the control unit 150 and outputs the instructed voice. As shown in FIG. 2, the output of the voice synthesis unit 300 is converted from digital to analog by the D/A conversion unit 310 and output from the speakers 201a and 201b. What is synthesized is, for example, a required message, or background music that plays in accordance with the rotation of the head 2, the movement of the moving unit 3, and so on.
[0077]
(Drive Control Unit 140) The drive control unit 140 controls the driving of the five stepping motors 112a, 112b, 112c, 112d, and 114. Among them, the motor 114 is for rotating the head 2. The forward/backward movement, leftward/rightward movement, and turning operation of the moving unit 3 will be described with reference to FIG. 11. FIG. 11 is a schematic explanatory view of the moving unit 3, with "front, rear, left, and right" defined as shown in the figure.
[0078]
As shown in FIG. 11A, when the motors 112b and 112d are driven so that the omni wheel 111b and the omni wheel 111d rotate in the direction of arrow A, the moving unit 3 formed of the base 50 and the like moves to the "front side". At this time, the motors 112a and 112c that rotationally drive the omni wheels 111a and 111c are not driven; the omni wheels 111a and 111c follow the forward movement because their free rollers rotate in the direction orthogonal to the wheel circumferential direction.
[0079]
On the other hand, as shown in FIG. 11A, when the stepping motors 112b and 112d are driven to rotate the omni wheel 111b and the omni wheel 111d in the direction of the dotted arrow B, that is, in the direction opposite to the rotational drive direction for arrow A, the moving unit 3 formed of the base 50 and the like moves to the rear side. At this time too, the motors 112a and 112c corresponding to the omni wheels 111a and 111c are not driven; the omni wheels 111a and 111c follow the rearward movement as their free rollers rotate in the direction orthogonal to the wheel circumferential direction. Thus, by driving the omni wheel 111b and the omni wheel 111d, the moving unit 3 can move in the front-rear direction.
[0080]
Similarly, as shown in FIG. 11B, when the motors 112a and 112c are driven to rotate the omni wheel 111a and the omni wheel 111c in the direction of arrow C, the moving unit 3 formed of the base 50 and the like moves to the right. At this time, the motors 112b and 112d corresponding to the omni wheels 111b and 111d are not driven; the omni wheels 111b and 111d follow the rightward movement because their free rollers rotate in the direction orthogonal to the wheel circumferential direction.
[0081]
On the other hand, as shown in FIG. 11B, when the stepping motors 112a and 112c are driven to rotate the omni wheel 111a and the omni wheel 111c in the direction of the dotted arrow D, that is, opposite to the rotational drive direction for arrow C, the moving unit 3 formed of the base 50 and the like moves to the left. At this time too, the motors corresponding to the omni wheels 111b and 111d are not driven; the omni wheels 111b and 111d follow the leftward movement because their free rollers rotate in the direction orthogonal to the wheel circumferential direction. Thus, by driving the omni wheel 111a and the omni wheel 111c, the moving unit 3 can move in the left-right direction.
[0082]
Here, according to "Society of Instrument and Control Engineers, Tohoku Chapter, 268th Meeting (2011.11.26): Development of a small processing robot capable of moving in all directions, Oyama et al.", if the rotation angle of the stepping motor is φ (deg), the wheel radius is r (mm), the moving distance is d (mm), and the X- and Y-direction linear moving distances are dx (mm) and dy (mm), then "dx = 2πr·φ, dy = 2πr·φ, d = (dx² + dy²)^(1/2)", and the moving unit 3 can be moved in an arbitrary direction by combining movement in the X direction and movement in the Y direction.
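A sketch of this decomposition (Python; the quoted formulas are used as given in the cited reference, here interpreting φ as wheel revolutions, an assumption on our part):

    import math

    def wheel_rotations_for(dx_mm, dy_mm, r_mm):
        # Inverse of dx = 2*pi*r*phi_x and dy = 2*pi*r*phi_y: one wheel pair
        # produces the X component, the other pair the Y component.
        phi_x = dx_mm / (2.0 * math.pi * r_mm)
        phi_y = dy_mm / (2.0 * math.pi * r_mm)
        d = math.hypot(dx_mm, dy_mm)   # resulting distance d = (dx^2 + dy^2)^(1/2)
        return phi_x, phi_y, d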
[0083]
Combining front, rear, left, and right movements makes turning movement possible; an example of drive control focused on turning is described below. As shown in FIG. 11C, when the stepping motors 112a, 112b, 112c, and 112d are driven to rotate the omni wheel 111a, the omni wheel 111b, the omni wheel 111c, and the omni wheel 111d in the arrow E direction, the moving unit 3 formed of the base 50 and the like "turns to the right".
[0084]
On the other hand, as shown in FIG. 11C, when the stepping motors 112a, 112b, 112c, and 112d are driven so that the omni wheel 111a, the omni wheel 111b, the omni wheel 111c, and the omni wheel 111d are all driven in reverse with respect to the case of arrow E and rotate in the dotted arrow F direction, the moving unit 3 including the base 50 and the like "turns to the left". In either case, the turning center is the center position of the omni wheels 111a to 111d in plan view. Thus, by driving the omni wheels 111a to 111d, the moving unit 3 can turn left and right.
[0085]
As described above, the drive control of the motors 112a to 112d by the drive control unit 140 enables movement of the moving unit 3 in all directions and rotation in both directions (with the "central axis" as the rotation center). Also, for the turning operation, if the relationship between the number of drive pulses and the amount of turning is determined in advance and the drive pulses for turning by a required amount are set in a table or the like, the control algorithm of the drive system is simplified.
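Such a pulse table might look as follows (Python; the pulses-per-degree calibration value is purely illustrative, not a value from the patent):

    # Hypothetical calibration: drive pulses per degree of turn, measured in advance
    PULSES_PER_DEG = 12.5

    def pulses_for_turn(angle_deg):
        # Positive angle -> clockwise (arrow E), negative -> counterclockwise (arrow F)
        direction = "right" if angle_deg >= 0 else "left"
        return round(abs(angle_deg) * PULSES_PER_DEG), direction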
[0086]
(Rotation Operation of Head 2) The drive control unit 140 rotates the head 2 about the "central axis" of the mobile body 1 by driving the motor 114 (see FIGS. 1 and 2). When the drive control unit 140 drives the motor 114 forward, the head 2 rotates "clockwise" as viewed from above, centered on the central axis; when the drive control unit 140 drives the motor 114 in reverse, the head 2 rotates "counterclockwise" as viewed from above, centered on the central axis. Thus, the control commands sent from the drive control unit 140 to the motor 114 make it possible to control the rotational direction and the amount of rotation of the head 2.
[0087]
Further, as described above, the body 4 can be turned about the "central axis" by the turning operation of the moving unit 3, and as a result the head 2 also turns about the "central axis", so the positions of the speakers 201a and 201b can be rotated about the "central axis". Therefore, instead of rotating the head 2 with the rotation mechanism 5 to adjust the sweet spot to the target person, the position of the audible area of the sound image can also be adjusted to the target person by the left-right turning operation of the moving unit 3.
[0088]
Conversely, even when the mobile body 1 is not equipped with wheels for self-propelled movement or the like, the position of the audible area of the sound image can still be adjusted and controlled to suit the target person by using the rotation mechanism 5 to perform the turning operation of the head 2 (the swinging operation in the horizontal plane).
[0089]
(Control Unit 150) The control unit 150 shown in FIG. 2 comprehensively controls the operations of the image processing unit 120, the distance measuring unit 130, the drive control unit 140, the virtual sound source generation unit 200, the infrared device 400, and the voice synthesis unit 300. In addition, the control unit 150 includes the determination unit 160, which has the function of determining, based on the data from the distance measuring unit 130, the image processing unit 120, and the infrared device 400, whether or not a listener is located in the sweet spot, and so on.
[0090]
The operations of the image processing unit 120, the distance measuring unit 130, the drive control unit 140, the virtual sound source generation unit 200, the voice synthesis unit 300, and the control unit 150 described above can be realized by a processor such as a CPU or DSP, a ROM (recording medium) on which a program is recorded, and a RAM in which a work area is formed. The processor realizes each unit by reading the program recorded on the recording medium and executing it while using the work area and the like.
[0091]
(Operation Example) (Face Detection Processing) The image processing unit 120 executes, for example, the following "face detection processing". The image processing engine 126 performs: (1) capture an image with the CCD camera 125; (2) clip out a region likely to be a face from the captured image; (3) perform template matching on the clipped region; (4) judge the result of the template matching and perform person detection. If templates are prepared for comparison not only for the whole face but also for facial parts such as the ears and eyes, the accuracy of face detection is improved. In this way, a person can be identified.
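Steps (1) to (4) correspond to a standard template-matching pipeline; a sketch using OpenCV follows (this is not the patent's implementation, and the template file name and score threshold are assumptions):

    import cv2

    def detect_person(frame_bgr, template_path="face_template.png", thresh=0.7):
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)      # (1) captured image
        tmpl = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)  # face/ear/eye template
        # (2)+(3) slide the template over the image and score each position
        scores = cv2.matchTemplate(gray, tmpl, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(scores)
        # (4) judge the matching result: a high score counts as a detected person
        return (max_val >= thresh), max_loc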
[0092]
(Distance Measurement Processing) In addition to the distance measurement by the distance measuring unit 130 described above, it is also possible to measure the distance for each pixel. For example: (1) radiate infrared light from the transmission processing unit 134; (2) receive the infrared light reflected by an object (including a person) with the reception processing unit 138; (3) obtain the arrival time from the phase delay of the received data; (4) obtain the distance from the arrival time and form a range image; (5) obtain from the range image the distance of the pixels corresponding to the face detected in the face detection processing; (6) obtain the direction of the face relative to the camera from the pixels of the face detected in the face detection processing; and so on.
[0093]
The configurations of an apparatus that detects a face from an acquired image and of an apparatus that obtains a range image are known, so their detailed configurations are not described here. Even without obtaining a range image, if the planar and three-dimensional sizes of the distance-measurable area, the image processing area, and the infrared detection area are set to be approximately the same, a person can be "provisionally identified" by the face detection processing, the "provisional identification" can be promoted to a "definite identification" by the output of the infrared device 400, and when the person is positioned in the sweet spot, the distance to the person is known from the first echo signal or the Doppler shift obtained by the distance measuring unit 130. The direction is likewise known from the output of the infrared device 400. Thus, the accuracy of person detection, distance detection, and the like is improved by reconfirming a measurement made by one method with a measurement made by another method.
[0094]
As described above, "face detection", "distance measurement to the detected face", and "head rotation angle in the horizontal plane toward the detected face" can be obtained. The detection and identification described above are performed by, for example, the determination unit 160.
[0095]
(Basic Process Flowchart) (Operation Example 1) FIG. 12 is an explanatory diagram of the basic operation. First, "Operation Example 1" will be described with reference to FIG. 12(a). In step S1200, the control unit 150 activates the image processing unit 120, which attempts to detect a person (target person) by the face detection processing described above. When a target person is detected, in step S1205 the control unit 150 drives the voice synthesis unit 300 to output a message prompting the target person to approach the mobile body 1. At the same time, the display 500 displays the message.
[0096]
Next, in step S1210, the control unit 150 activates the distance measuring unit 130, which measures the distance to the target person. Next, in step S1215, the control unit 150 controls the drive control unit 140, which in response rotationally drives the motor 114.
[0097]
The rotational driving of the motor 114 causes the head 2 to rotate relative to the body 4; for example, the head 2 is directed in the direction in which the output of the infrared device 400 is maximized. The target person is thereby positioned in the "center direction", the direction that passes through the midpoint of the straight line connecting the left and right speakers 201a and 201b and is orthogonal to that straight line. Then, in step S1220, the control unit 150 activates the virtual sound source generation unit 200. Since the sound signal from the virtual sound source generation unit 200 is emitted from the pair of left and right speakers 201a and 201b, the target person can be made to listen to the sound image.
[0098]
In this way, the target person can be positioned within the sweet spot. Moreover, since the target person is positioned in the center direction, the sound image can be listened to effectively.
[0099]
FIG. 13(a) is an explanatory view of this operation in plan view. “M” indicates the target person, “R” indicates the moving object 1, “C” indicates the center line, and the hatched area indicates the sweet spot. Initially, the target person is not located in the sweet spot, but the rotational movement of the head 2 (see symbol “P”) positions the target person in the sweet spot, making the person a “listener”. Moreover, the target person is then located on the “center line”. The “center line” is the straight line that passes through the midpoint of the line connecting the left and right speakers 201a and 201b and is orthogonal to that line.
[0100]
Positioning the target person on the center line can also be realized, for example, by rotating the head 2 horizontally so that the center of the detected face is captured on the horizontal center line of the image acquired by the CCD camera 125. In this example, the detection area of the infrared device 400, the distance measurement area of the distance measurement unit 130, the image detection area of the image processing unit 120, and the like are set wider than the sweet spot in both the horizontal and vertical directions.
[0101]
(Operation Example 2) Next, “Operation Example 2” will be described with reference to FIG. 12(b). In step S1250, the control unit 150 activates the image processing unit 120. The image processing unit 120 attempts to detect a person (target person). If a target person is detected, then in step S1255 the control unit 150 activates the distance measuring unit 130. The distance measuring unit 130 measures the distance to the target person.
[0102]
Next, in step S1260, the control unit 150 controls the drive control unit 140 to control the motors 112a to 112d. Specifically, when the distance to the target person is R (m), the control unit 150 causes the moving unit to go straight ahead by “R − 1 (m)”. That is, the mobile unit 1 is moved to a position just in front of the target person.
[0103]
Next, in step S1265, the control unit 150 controls the drive control unit 140 to rotate the head 2 so that the target person is positioned on the center line. Then, in step S1270, the control unit 150 activates the virtual sound source generation unit 200. Since the sound signal from the virtual sound source generation unit 200 is emitted from the pair of left and right speakers 201a and 201b, the target person can be made to listen to the sound image. In this way, the target person can be positioned within the sweet spot.
[0104]
FIGS. 13(b), 13(c), and 13(d) are explanatory views of this operation in plan view. Initially, the target person is not located in the sweet spot (see FIG. 13(b)), but the mobile unit 1 approaches to just in front of the target person by the movement of the moving unit 3 (see symbol “Q”) (see FIG. 13(c)).
[0105]
Then, the target person is positioned in the sweet spot by the rotational movement of the head 2 (see symbol “S”) and becomes a “listener” (see FIG. 13(d)). Moreover, the target person is located on the “center line”. In this case as well, the detection area of the infrared device 400, the distance measurement area of the distance measurement unit 130, the image detection area of the image processing unit 120, and the like are set wider than the sweet spot in both the horizontal and vertical directions.
[0106]
(Operation Example 3) Next, “Operation Example 3” will be described with reference to FIG. 12(c). In Operation Example 3, the virtual sound source generation unit 200 is always active, and it is assumed that a listener who is positioned in the sweet spot of the reproduced sound image moves slightly during sound image reproduction. First, in step S1280, the control unit 150 drives the image processing unit 120. The image processing unit 120 attempts to detect a person (target person). If face detection succeeds, follow-up control for following the target person is performed in step S1285.
[0107]
As a specific example of the follow-up control, the control unit 150 controls the drive control unit 140 to pivot the head 2 in the left-right direction so that the center position of the contour of the target person detected by the image processing unit 120 stays on the horizontal center line of the acquired image, as in the sketch below. Alternatively, the control unit 150 may control the drive control unit 140 to rotate the head 2 in the left-right direction so that the output from the infrared device 400 is always maximized. By performing such follow-up control, even if the listener moves slightly during sound image reproduction, the listener can always be kept in the sweet spot merely by, for example, the pivoting operation of the head 2. In addition, it is preferable to switch to another operation after following for a predetermined time, so that a larger number of people can listen to the sound image.
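A minimal sketch of such follow-up control, assuming a simple proportional controller on the horizontal offset of the detected face center; the gain, dead band, and function interface are illustrative assumptions, not part of the embodiment:

```python
def follow_target(image_width_px: int, face_center_x_px: float,
                  gain_deg_per_px: float = 0.05, dead_band_px: float = 5.0) -> float:
    """Return a head rotation command (degrees, clockwise positive) that
    steers the detected face center toward the image's horizontal center line."""
    error_px = face_center_x_px - image_width_px / 2.0
    if abs(error_px) <= dead_band_px:      # close enough: do not move the head
        return 0.0
    return gain_deg_per_px * error_px      # proportional correction

# Example: face detected 80 px right of center in a 640 px wide image
print(follow_target(640, 400.0))  # -> rotate head ~4 degrees clockwise
```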
[0108]
As described above, by combining “Operation Example 1”, “Operation Example 2”, and “Operation Example 3”, the control unit 150 adjusts and controls the position of the audible area of the sound image emitted from the speakers 201a and 201b in accordance with the target person. Thus, for example, a person who has already become a listener can listen to the sound image effectively, and a person who is not yet a listener can be made into a listener so that the sound image is used effectively.
[0109]
That is, since the drive control unit 140 (rotational drive control unit) controls the driving of the head 2 (rotating portion) to which the speakers 201a and 201b are fixed, the listener can be positioned on the center line of the pair of left and right speakers 201a and 201b and can thus listen to the sound image effectively.
[0110]
Further, since the drive control unit 140 (movement drive control unit) controls the movement operation of the moving unit 3 on which the head 2 and the body unit 4 are mounted, a person who is not located in the audible area can be captured as a listener, making effective use of the sound image.
The control unit 150 further includes a determination unit 160 that determines whether the listener is positioned within the audible area, and the control unit 150 adjusts and controls the position of the audible area based on the determination result of the determination unit 160.
That is, the moving object 1 itself actively moves so as to position a person located outside the audible area within that area, which enables effective use of the reproduced sound image; for example, attraction effects can be obtained by performing demonstrations in a theme park or the like.
[0111]
When Operation Example 1, Operation Example 2, and Operation Example 3 are combined, a wait state of a predetermined time may be inserted when shifting to each operation, and it is also possible, when shifting to each operation, to return the moving body 1 to its initial position and to reverse the rotation of the head 2 so as to return it as well.
[0112]
(Virtual Sound Source Generation Unit 203) FIG. 15 shows a configuration example of the virtual sound source generation unit 203.
The virtual sound source generation unit 203 is a more specific configuration of the virtual sound source generation unit 200. The crosstalk cancellation processing unit 230 of the virtual sound source generation unit 203 has the same configuration as that shown in FIG. The virtual sound source generation unit 203 mainly executes the following processes: (1) the Doppler signal processing unit 280 adds the Doppler effect to the sound signal reproduced by the reproduction unit, and the attenuation processing unit 290 performs attenuation processing according to the distance; (2) the convolution operation unit 221 convolves the head related transfer function according to the position coordinates; and (3) the crosstalk cancellation processing unit 230 removes crosstalk based on the transfer functions between the speaker positions and the listener.
[0113]
FIG. 16 is an explanatory diagram of the operation of the virtual sound source generation unit 203 and the like, in which “R” indicates the moving object 1 and “H” indicates the listener. A polar coordinate system (r, θ, φ) and an orthogonal coordinate system (x, y, z) as shown in FIG. 16 are set with the center of the listener H in plan view as the coordinate origin. Although FIG. 16 does not show “φ”, which is an elevation or depression angle, the virtual sound image can be localized at an arbitrary position in polar coordinates (r, θ, φ). The symbol P indicates the sound image localization position at polar coordinates (r, θ, φ) (“φ” is not shown).
[0114]
(Doppler Signal Processing Unit 280) The Doppler effect occurs when the distance between the virtual sound source and the listener changes (in other words, when there is a relative velocity between the two). With the relative velocity denoted “Vr”, the Doppler frequency “f” can be expressed as “f = f0 · V / (V + Vr)”, where “V” is the speed of sound and “f0” is the frequency of the sound in the audio data. “Vr” takes a “negative” value when the two approach each other and a “positive” value when they move apart. The Doppler signal processing unit 280 applies a pitch shift corresponding to the Doppler frequency to the frequency of the sound reproduced by the reproduction unit 215, in accordance with the relative velocity between the virtual sound source and the target person. As a result, the virtual sound source sounds to the listener as if it were moving with speed, and the sense of realism is improved.
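A minimal worked check of this relationship under the sign convention above (the function and values are illustrative):

```python
def doppler_frequency(f0: float, vr: float, v_sound: float = 340.0) -> float:
    """Perceived frequency for relative velocity vr (m/s):
    vr < 0 when source and listener approach, vr > 0 when they separate."""
    return f0 * v_sound / (v_sound + vr)

print(doppler_frequency(440.0, -10.0))  # approaching: ~453.3 Hz (pitch rises)
print(doppler_frequency(440.0, +10.0))  # receding:   ~427.4 Hz (pitch falls)
```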
[0115]
Instead of applying a pitch shift, the Doppler effect can also be applied to the original sound by giving it a “spatial propagation delay” corresponding to the distance between the virtual sound source and the target person. That is, since the distance between the virtual sound source and the listener changes when a relative velocity arises between the virtual sound source and the target person, the change in the “spatial propagation delay” can be exploited as well. For example, by assigning to the audio data a delay time corresponding to the propagation delay between the virtual sound source and the listener, the sound is heard realistically, as if the virtual sound source were moving with speed.
[0116]
(Attenuation Processing Unit 290) The attenuation processing unit 290 calculates a “distance attenuation coefficient” based on the distance between the virtual sound source and the listener. For example, in the case of a point sound source, the attenuation amount “A” is determined by “A = 20 · log10(r / r0)” (dB). As shown in FIG. 16, “r” is the distance from the sound image localization position to the listener H, and “r0” is the reference distance at which the attenuation amount is 0 (dB). The attenuation processing unit 290 multiplies the output signal from the Doppler signal processing unit 280 by the obtained distance attenuation coefficient.
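A minimal sketch showing how the attenuation amount A in dB can be converted into the linear factor actually multiplied onto the signal; the conversion is standard practice and not specific to the embodiment:

```python
import math

def distance_gain(r: float, r0: float = 1.0) -> float:
    """Linear gain for a point source at distance r, with 0 dB at r = r0.
    A = 20*log10(r/r0) dB of attenuation corresponds to a gain of r0/r."""
    attenuation_db = 20.0 * math.log10(r / r0)
    return 10.0 ** (-attenuation_db / 20.0)   # == r0 / r

print(distance_gain(2.0))  # 0.5: doubling the distance halves the amplitude
```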
[0117]
(Convolution Operation Unit 295) The convolution operation unit 295 reads out a “head related transfer function” set in advance and convolves it with the output signal of the attenuation processing unit 290. A “head related transfer function (HRTF)” is a transfer function between the virtual sound source and each of the listener's ears. The left-ear head related transfer function 296 is convolved with the left channel signal, and the right-ear head related transfer function 297 is convolved with the right channel signal. Both head related transfer functions 296 and 297 change as the sound image localization position changes. For the angle “θ” shown in FIG. 16, for example, head related transfer functions 296 and 297 may be set in advance every “one degree” from “0 degrees” to “359 degrees”, and the necessary ones may be read out and used in the convolution operation.
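A minimal sketch of such a per-degree lookup followed by convolution, with random arrays standing in for measured head related impulse responses; the table layout and names are illustrative assumptions:

```python
import numpy as np

# Hypothetical table: one head related impulse response (HRIR) per ear for
# each integer azimuth 0..359 degrees (random data stands in for real
# measurements here).
rng = np.random.default_rng(0)
HRIR_LEFT = rng.standard_normal((360, 128))
HRIR_RIGHT = rng.standard_normal((360, 128))

def localize(mono: np.ndarray, theta_deg: float):
    """Convolve a mono signal with the nearest per-degree HRIR pair."""
    idx = int(round(theta_deg)) % 360          # pick the nearest 1-degree entry
    left = np.convolve(mono, HRIR_LEFT[idx])   # left-ear channel (cf. 296)
    right = np.convolve(mono, HRIR_RIGHT[idx]) # right-ear channel (cf. 297)
    return left, right

left, right = localize(rng.standard_normal(480), theta_deg=42.0)
```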
[0118]
Further, in the present embodiment, the Doppler signal processing and the attenuation processing are performed separately because the HRTFs are measured at a fixed distance from the head 2, so that distance must be expressed by other means. That is, the present embodiment uses the fact that, for the same “θ”, the head related transfer functions 296 and 297 are treated as the same regardless of the value of “r”.
[0119]
Then, as shown by the dotted lines in FIG. 16, the crosstalk cancellation processing unit 230 cancels the crosstalk along the path from the speaker 201a of the moving object 1 to the right ear of the listener H and the path from the speaker 201b to the left ear of the listener H. As described above, although the elevation or depression angle “φ” is not illustrated in FIG. 16, the virtual sound image can be localized at any position in polar coordinates (r, θ, φ); that is, the point P can be placed at an arbitrary position.
[0120]
As described above, according to the virtual sound source generation unit 203 shown in FIG. 15, the Doppler signal processing unit 280 expresses the movement of the sound image and the attenuation processing unit 290 expresses the attenuation of the sound signal according to distance, so that the listener can hear the sound image more realistically, as if a sound source were actually there.
[0121]
Alternatively, it is also possible to configure the virtual sound source generation unit 200 without incorporating the sound image localization unit 220 and the crosstalk cancellation processing unit 230, and to reproduce a virtual sound image by playing back sound data to which sound image localization processing and crosstalk cancellation processing have already been applied.
[0122]
(Other Embodiments) Consider the case where the listener H is in the front direction of the movable body 1, which has the speakers 201a and 201b on the left and right of the head 2 in front view or plan view. When the head 2 is rotated for dramatic effect or the like, the sweet spot (the area in which the sound image can be heard) moves away from the listener H, and the acoustic effect is reduced.
In principle, whenever the head 2 is rotated, a configuration of the crosstalk cancellation processing unit 230 corresponding to the new spatial transfer functions between the speakers and the listener H, based on the positions of the speakers 201a and 201b and the position of the listener H, would have to be adopted; however, adopting a new configuration each time is burdensome.
[0123]
Therefore, the present embodiment is characterized in that the configuration of the crosstalk cancellation processing unit 230 is left as it is, and the acoustic effect is maintained by controlling the “delay amount” and the “gain” applied to the output signal of the crosstalk cancellation processing unit 230 in accordance with the degree of rotation of the head 2.
It has been confirmed that this is particularly effective for rotation operations of up to about 25 (degrees) to the left or right.
[0124]
This is particularly effective when the moving body 1 constantly rotates the head 2, rotates the body 4, or moves arms provided on it so as to simulate “breathing”.
[0125]
(Schematic Explanatory Drawing) FIG. 17 is a schematic explanatory drawing of the principle of this embodiment.
At the bottom of FIG. 17, the listener H is shown facing the front of the mobile unit 1. The upper side of FIG. 17 shows only the head 2 of the mobile body 1 in plan view, with the speakers 201a and 201b provided on its right and left. The front direction of the mobile unit 1 faces the listener H. In FIG. 17, the speakers 201a and 201b are drawn as black circles for convenience.
[0126]
A two-dimensional coordinate system is set through the rotation center O of the head 2, with the left-right direction in the drawing as the x axis and the front-rear direction as the y axis. Furthermore, the direction in which the head 2 faces the front of the listener H is referred to as the “reference direction”, and the rotation angle from the “reference direction” is referred to as “θ”: clockwise from the reference direction is “positive”, and counterclockwise is “negative”.
[0127]
When the head 2 is in the reference direction, the sweet spot SS is directed toward the front of the listener H, and the listener H is located in the sweet spot SS, so that the above-described acoustic effect is obtained. The sweet spot SS is illustrated schematically; if the listener is too close, no acoustic effect is obtained either.
[0128]
In FIG. 17, it is assumed that the head 2 rotates clockwise around the rotation center O. From the symmetry of the head 2 in plan view, the argument for clockwise rotation applies equally to counterclockwise rotation. Now, assume that the head 2 rotates clockwise by “θ (degrees)” around the rotation center O.
[0129]
At this time, “2 · Δx” shown in FIG. 17 is the distance difference between the two speakers 201a and 201b in the “x” direction, and “2 · Δy” is the distance difference between them in the “y” direction.
[0130]
FIG. 18(a) is an enlarged schematic view of part of the head 2 and the pair of left and right speakers provided on it (speakers 201a and 201b, shown by black circles in FIG. 17).
When the head 2 rotates clockwise by θ degrees, the left speaker (201b in FIG. 17) moves from point al1 to point al2: it moves by Δx from point al1 to point bl in the left-right direction (also referred to as the “lateral direction”), and by Δy from point bl to point al2 in the front-rear direction (also referred to as the “longitudinal direction”).
[0131]
Similarly, the right speaker (201a in FIG. 17) moves from point ar1 to point ar2: it moves by Δx from point ar1 to point br in the left-right direction (lateral direction), and by Δy from point br to point ar2 in the front-rear direction (longitudinal direction).
[0132]
Here, with half the distance between the left and right speakers denoted “r”, Δx = r · (1 − cos(θ(deg) · π/180)) and Δy = r · sin(θ(deg) · π/180).
In both the lateral and the longitudinal direction, the distance difference between the two speakers is twice these values (2 · Δx, 2 · Δy).
[0133]
Now, with the distance between the two speakers denoted SPI (Speaker Interval), “2 · Δx” and “2 · Δy” become 2 · Δx = (SPI/2) · (1 − cos(θ(rad))) · 2 = SPI · (1 − cos(θ(deg) · π/180)) and 2 · Δy = (SPI/2) · sin(θ(rad)) · 2 = SPI · sin(θ(deg) · π/180).
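A minimal sketch of this geometry, reproducing values used elsewhere in the description for SPI = 20 (cm); the function name is illustrative:

```python
import math

def speaker_offsets(theta_deg: float, spi_cm: float = 20.0):
    """Lateral (2*dx) and longitudinal (2*dy) distance differences, in cm,
    between the two speakers when the head rotates by theta degrees."""
    theta = math.radians(theta_deg)
    two_dx = spi_cm * (1.0 - math.cos(theta))
    two_dy = spi_cm * math.sin(theta)
    return two_dx, two_dy

print(speaker_offsets(25.0))  # ~ (1.87, 8.45): 2*dy dominates 2*dx
print(speaker_offsets(15.0))  # ~ (0.68, 5.18): matches the 5.18 cm used later
```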
[0134]
FIG. 18(b) shows, for SPI = 20 (cm), how the distance differences of the two speakers in the left-right direction (lateral direction) and in the front-rear direction (longitudinal direction) change with the rotation angle θ of the head 2.
The “solid line” indicates the change in distance in the lateral direction, and the “dotted line” indicates the change in distance in the longitudinal direction.
[0135]
When the rotation angle θ is varied from −30 (degrees) to +30 (degrees), the distance change in the longitudinal direction varies substantially linearly from −5 (cm) to +5 (cm). The lateral distance change, on the other hand, is at most about +1.2 (cm). The lateral distance change between the two speakers is thus extremely small compared with the longitudinal distance change.
[0136]
FIG. 19(a) is a schematic explanatory view of the definition of the “speaker angle” as viewed from the listener H. As shown in FIG. 19(a), the angle between the straight line connecting the center of the listener H to the speaker position after a change of θ and the straight line connecting the center of the listener H to the speaker position at θ = 0 (degrees) is defined as “the speaker angle as viewed from the listener H”.
[0137]
FIG. 19(b) shows the relationship between the speaker angle and θ when the distance between the speaker and the listener H is 80 (cm). As can be seen from FIG. 19(b), when θ is 25 (degrees) or less, the “speaker angle” is less than 1 (degree) and thus extremely small; the resulting change in the transfer function is negligibly small apart from the change in the arrival time of the sound image.
[0138]
On the other hand, the change in the arrival time of the sound image at the listener H, which arises from the longitudinal distance difference between the left and right speakers 201a and 201b caused by the rotation of the head 2, cannot be ignored. From the above, since Δx is far smaller than Δy, Δx is treated as an error and only the influence of Δy is considered below.
[0139]
FIG. 20 shows the change in the arrival time difference of the sound images from the two speakers at the listener H with respect to the rotation angle θ of the head 2. As θ increases, the arrival time difference increases linearly. For example, at θ = 25 (degrees), Δy is 4.22 (cm), so the longitudinal distance difference between the left and right speakers is 2 · Δy = 8.45 (cm) and the arrival time difference is “0.248 (ms)”.
[0140]
The arrival time difference, which changes with θ, is corrected by a delay unit. For example, when the sampling frequency is “48 (kHz)”, the difference corresponds to 0.000248 (sec) / (1/48000) = 11.904 ≈ 12 samples. Referring to FIGS. 17 and 18 and the like, in this case the head 2 rotates clockwise, so the left speaker 201b moves away from the listener H and the right speaker 201a approaches the listener H; the delay should therefore be 12 samples.
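A minimal sketch of the conversion from rotation angle to delay samples, reusing the geometry above; the speed of sound is taken here as 340 (m/s), which reproduces the 12-sample value, though the description does not state the constant it assumes:

```python
import math

def delay_samples(theta_deg: float, spi_m: float = 0.20,
                  fs_hz: float = 48_000.0, c_m_s: float = 340.0) -> float:
    """Delay (in samples) compensating the longitudinal path difference
    2*dy between the speakers for a head rotation of theta degrees."""
    two_dy = spi_m * math.sin(math.radians(theta_deg))  # path difference (m)
    arrival_time_diff = two_dy / c_m_s                  # seconds
    return arrival_time_diff * fs_hz

print(round(delay_samples(25.0)))  # -> 12 samples, as in the text
```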
[0141]
Therefore, by delaying the sound signal of the nearer right speaker 201a by 12 samples with respect to the sound signal of the left speaker 201b, a three-dimensional acoustic effect can be obtained even though the sweet spot SS has moved away from the listener H. Even if the sweet spot SS deviates from the listener H, the situation can be approximated to θ = 0 (degrees), so an acoustic effect is obtained as if the sweet spot SS had not deviated.
[0142]
Thus, as shown in FIG. 17, even if the sweet spot SS rotates in the direction L from the reference direction and moves away from the listener H, the listener H can still obtain a three-dimensional acoustic effect.
[0143]
The above describes the case where the head 2 rotates clockwise from the reference direction; however, even when the head 2 rotates counterclockwise from the reference direction (when “θ” is negative), the distance difference (arrival time difference) between the two speakers is corrected in the same way.
When “θ” is negative, the sweet spot SS rotates counterclockwise, for example as shown by symbol R in FIG. 17, so that the listener H falls out of the sweet spot SS; the distance difference between the two speakers (the arrival time difference) is then simply taken into account in the same way as when “θ” is positive.
[0144]
Summarizing the above: when the head 2 carrying the pair of left and right speakers is rotated, the distances between the listener H and the two speakers change, so that strictly speaking it would be necessary to obtain a configuration of the crosstalk cancellation processing unit 230 matching the changed spatial transfer functions (between the two speakers and the listener H). However, if the rotation angle of the head 2 from the reference direction (the front direction of the listener H) is within about “±25 (degrees)”, the dominant change is the distance difference between the two speakers (the arrival time difference as seen from the listener H).
[0145]
Therefore, a crosstalk cancellation processing unit 230 corresponding to the reference direction (the direction in which the head 2 faces the front of the listener H) is adopted, and distance correction by delay and volume correction by gain are applied to its output signal in accordance with the rotation angle of the head 2. Even if the sweet spot SS moves away from the listener H due to the rotation of the head 2, the state can thus be approximated to θ = 0 (degrees), as if the head 2 had not rotated, and the acoustic effect is maintained for the listener H.
[0146]
The purpose of the gain control here is to account for the attenuation of the sound wave according to the distance difference.
As an example, similarly to the attenuation processing unit 290 described above, a “distance attenuation coefficient” may be calculated from the distance difference between the two speakers 201a and 201b and multiplied onto the signal. For a point sound source, for example, the attenuation amount “A” is determined by “A = 20 · log10(r / r0)”, where “r” is the distance difference between the two speakers 201a and 201b and “r0” is the “reference distance” at which the attenuation amount is 0 (dB). The distance attenuation coefficient of the sound wave can then be determined by substituting the longitudinal distance difference “2 · Δy” for “r”. In practice, the gain A is obtained by measurement so as to reduce the left-right volume difference in the band of about 1 to 4 (kHz), where the crosstalk cancellation effect is most easily perceived.
[0147]
As an example, when θ is positive, the left speaker moves away, so its signal is given a gain of “20 · log10((r0 + Δy) / r0) (dB)”, while the right speaker approaches, so its signal is given a gain of “20 · log10((r0 − Δy) / r0) (dB)”.
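A minimal sketch of this volume correction, converting the stated dB gains into linear factors; treating r0 as the 80 (cm) listening distance mentioned earlier is an assumption, and the embodiment ultimately obtains the gain by measurement:

```python
import math

def volume_gains(delta_y_m: float, r0_m: float = 0.80):
    """Linear gain factors (far speaker, near speaker) compensating the
    level difference caused by a path difference of +/- delta_y."""
    far_db = 20.0 * math.log10((r0_m + delta_y_m) / r0_m)   # boost far speaker
    near_db = 20.0 * math.log10((r0_m - delta_y_m) / r0_m)  # cut near speaker
    return 10.0 ** (far_db / 20.0), 10.0 ** (near_db / 20.0)

# theta = 25 deg with r = 10 cm gives delta_y ~= 4.22 cm
print(volume_gains(0.0422))  # -> (~1.053, ~0.947)
```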
[0148]
(Basic Configuration: Virtual Sound Source Generating Unit 204) The virtual sound source
generating unit 204 has a function of generating and localizing a three-dimensional sound image.
The D / A conversion unit 202 and the pair of left and right speakers 201 a and 201 b are
connected to the output stage.
[0149]
FIG. 21 is a block diagram of the virtual sound source generation unit 204 of the present
embodiment. The virtual sound source generation unit 204 includes a memory 205, a
reproduction unit 215, a sound image localization unit 220, a crosstalk cancellation processing
unit 230, and a correction unit 250. In the memory 205, one or more pieces of reproduction data are recorded in advance.
[0150]
The reproduction unit 215 reads and reproduces the reproduction data recorded in the memory
205 and sends the reproduction data to the sound image localization unit 220. The sound image
localization unit 220 performs sound image localization processing for localizing the sound
image using the sent reproduction data. The crosstalk cancellation processing unit 230 performs
crosstalk cancellation processing for removing crosstalk based on the sound image localization
information.
[0151]
The correction unit 250 includes delay units 252 and 254 and gain units 256 and 258. The delay unit 252 gives the right speaker signal output from the crosstalk cancellation processing unit 230 a delay according to the rotation of the head 2. The delay unit 254, in turn, gives the left speaker signal output from the crosstalk cancellation processing unit 230 a delay according to the rotation of the head 2. The gain unit 256 performs gain adjustment on the output signal of the delay unit 252, and the gain unit 258 performs gain adjustment on the output signal of the delay unit 254.
[0152]
The operations of the delay unit 252, the delay unit 254, the gain unit 256, and the gain unit 258 are determined according to control commands from the control unit 150. The control unit 150 calculates the arrival time difference for the rotation angle θ with reference to the relationship shown in FIG. 20. When θ is positive, the right speaker 201a is closer to the listener H, so the delay unit 252 for the right signal is instructed to delay by the arrival time difference. The delay unit 252 thereby delays its input signal by the arrival time difference instructed by the control unit 150. As a result, the arrival times of the left and right sound images are aligned.
[0153]
On the other hand, when θ is negative, the left speaker 201b is closer to the listener H, so the delay unit 254 for the left signal is instructed to delay by the arrival time difference. The delay unit 254 thereby delays its input signal by the arrival time difference instructed by the control unit 150. As a result, the arrival times of the left and right sound images are aligned.
[0154]
In addition, when θ is positive, the control unit 150 instructs the gain unit 256, which receives the output signal of the delay unit 252, to multiply by a gain coefficient A (A denotes the attenuation amount described above) according to the distance difference corresponding to the rotation angle. Specifically, when θ is positive the right speaker approaches, so the gain unit 256 applies a gain of “20 · log10((r0 − Δy) / r0) (dB)”; the left speaker moves away, so the gain unit 258 applies a gain of “20 · log10((r0 + Δy) / r0) (dB)”. The volume correction is performed in this way. The gain coefficient corresponding to the distance difference for a given rotation angle is a coefficient determined by the distance difference produced by rotating the head 2 by the rotation angle θ, and may be defined, for example, as a logarithmic function of the distance difference.
[0155]
Similarly, when θ is negative, the gain unit 258, which receives the output signal of the delay unit 254, is instructed to multiply by the gain coefficient corresponding to the distance difference (and hence the arrival time difference). The gain unit 258 thereby multiplies its input signal by the gain coefficient instructed by the control unit 150 to perform the volume correction.
[0156]
As described above, the sound image signal is corrected by the correction unit 250, and the sound image emitted from the two speakers 201a and 201b produces the acoustic effect for the listener H. This is because, even if the head 2 rotates by a rotation angle θ (θ ≠ 0), the state can be approximated to “θ = 0 (degrees)”; even if the sweet spot SS deviates from the listener H, a state in which it has effectively not deviated can be maintained.
[0157]
Note that a dedicated control unit may be provided inside the virtual sound source generation unit 204; in that case, it is sufficient that this control unit is configured to obtain information such as the rotation angle of the head 2 from the reference direction from the control unit 150 and/or the drive control unit 140 or the like.
[0158]
(Specific Configuration) FIG. 22 is a configuration diagram of the virtual sound source generation unit 204.
As shown in FIG. 22, the virtual sound source generation unit 204 comprises a sound image localization unit 220 that localizes the sound source in a desired direction based on the input sound signal and generates a two-channel right speaker signal (X2) and left speaker signal (X1), a crosstalk cancellation processing unit 230 that cancels crosstalk, a correction unit 250, and the left and right speakers 201a and 201b.
[0159]
The sound image localization unit 220 includes the filter 01 (210) and the filter 02 (211), and the outputs of the two filters become the right speaker signal and the left speaker signal, respectively. As the transfer functions of the filter 01 (210) and the filter 02 (211), “head related transfer functions” for performing sound image localization at the desired direction and distance are measured and generated in advance and incorporated.
[0160]
When the filter 01 (210) and the filter 02 (211) are implemented as FIR filters, a convolution operation with the filter coefficients is performed on the input audio signal to generate the right speaker signal and the left speaker signal.
[0161]
In the crosstalk cancellation processing unit 230, the filter 11 (220) and the filter 12 (225) receive the right speaker signal.
The output of the filter 12 (225) is multiplied by the coefficient value (α) in the multiplier 260, and the result is input to the adder 245. Similarly, the filter 13 (226) and the filter 14 (235) receive the left speaker signal. The output of the filter 13 (226) is multiplied by the coefficient value (α) in the multiplier 270, and the result is input to the adder 240.
[0162]
The adder 240 adds the multiplication result of the multiplier 270 and the output of the filter 11
(220) to generate a right channel output signal, and supplies this to the correction unit 250. The
delay unit 252 delays the generated right channel output signal, and the gain unit 256 multiplies
the signal delayed by the delay unit 252 by a coefficient. The gain unit 256 supplies the output
to the right speaker 201a.
[0163]
On the other hand, the adder 245 adds the multiplication result of the multiplier 260 and the
output of the filter 14 (235) to generate a left channel output signal, and supplies this to the
correction unit 250. The delay unit 254 delays the generated left channel output signal, and the
gain unit 258 multiplies the signal delayed by the delay unit 254 by a coefficient. The gain unit
258 supplies the output to the left speaker 201b. Thus, desired sound is emitted from both the
speakers 201a and 201b.
[0164]
As described above, with the coefficient value of both the multiplier 260 and the multiplier 270 set to α, no crosstalk cancellation is performed when α is “0”, and perfect crosstalk cancellation is performed when α is “1.0”; the multipliers 260 and 270 thus function to adjust the amount of crosstalk cancellation.
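A minimal sketch of this 2-by-2 structure with the four filters reduced to plain FIR coefficient arrays; the coefficients are placeholders, whereas real ones would be derived from the speaker-to-ear transfer functions:

```python
import numpy as np

def crosstalk_cancel(x_right, x_left, h11, h12, h13, h14, alpha=1.0):
    """2x2 crosstalk canceller: direct paths through filters 11/14,
    cross paths through filters 12/13 scaled by alpha (0 = off, 1 = full)."""
    right_out = np.convolve(x_right, h11) + alpha * np.convolve(x_left, h13)
    left_out = np.convolve(x_left, h14) + alpha * np.convolve(x_right, h12)
    return right_out, left_out

# Placeholder 3-tap filters; alpha = 0.5 applies half-strength cancellation
h = np.array([1.0, -0.3, 0.1])
r_ch, l_ch = crosstalk_cancel(np.ones(8), np.ones(8), h, h, h, h, alpha=0.5)
```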
[0165]
The virtual sound source generation unit 204 can be realized in this way, but the configuration is not limited to this example; any acoustic device that localizes a sound image and removes its crosstalk is applicable to the present invention.
It is also possible to omit the sound image localization unit 220 and the crosstalk cancellation processing unit 230 and to reproduce audio data on which the sound image localization processing and the crosstalk cancellation processing have already been completed.
[0166]
(Specific Example of the Correction Unit) FIG. 23 is a specific configuration diagram of the correction unit 250, showing a configuration example for the right speaker signal or the left speaker signal output from the crosstalk cancellation processing unit 230.
When the head 2 rotates while sound is being produced, discontinuous noise occurs at the moments when the delay sample count changes unless the delay time is also changed continuously with the rotation angle θ. Since a digitally implemented delay is discrete and therefore discontinuous, interpolation processing must be added to the delay section.
[0167]
Converting the delay time (arrival time difference) into delay samples according to the sampling frequency generally yields a value with a fractional part. By performing interpolation according to this fractional part, discontinuities in the delayed result are avoided. There are various interpolation methods; a simple linear interpolation is expressed by the following equation.
[0168]
In FIG. 23, let “D” be the delay in samples calculated from the delay time, “Di” the integer part of D, “Dd” the fractional part of D, “a()” the delay section output, and “b” the interpolated delay output. Then the following equation holds: b = (a(Di) · (1 − Dd) + a(Di+1) · Dd) / 2.
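A minimal sketch of this fractional-delay interpolation, following the equation above exactly, including the final factor of 1/2 applied by the multiplier in FIG. 23:

```python
import numpy as np

def fractional_delay(signal: np.ndarray, d_samples: float) -> np.ndarray:
    """Delay `signal` by a non-integer number of samples using the
    document's linear interpolation: b = (a(Di)*(1-Dd) + a(Di+1)*Dd) / 2."""
    di = int(d_samples)          # integer part of the delay
    dd = d_samples - di          # fractional part of the delay
    a_di = np.concatenate([np.zeros(di), signal])[: len(signal)]      # delay Di
    a_di1 = np.concatenate([np.zeros(di + 1), signal])[: len(signal)] # delay Di+1
    return (a_di * (1.0 - dd) + a_di1 * dd) / 2.0

# theta = 15 deg gives D = 7.3 samples: Di = 7, Dd = 0.3
out = fractional_delay(np.arange(16, dtype=float), 7.3)
```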
[0169]
According to this, the delay unit 2520 and the delay unit 2540 delay their input signals by “Di” and “Di + 1” samples, respectively; that is, the delay amount of one delay unit is set one sample larger than that of the other. Further, the gain unit 2560 and the gain unit 2580 multiply their inputs by the coefficients “1 − Dd” and “Dd”, respectively.
The two coefficients “Dd” and “1 − Dd” sum to “1” and act as weights on the outputs of the delay unit 2520 and the delay unit 2540; that is, the gain units 2560 and 2580 perform the interpolation by adjusting these coefficients. The adding unit 2570 then adds the outputs of the gain units 2560 and 2580, and the multiplying unit multiplies the sum by “1/2” to complete the interpolation processing.
[0170]
For example, when the rotation angle θ is 15 degrees, each of the speakers 201a and 201b moves “2.59 (cm)” in the longitudinal direction, so the distance difference between the left and right speakers is twice that, “5.18 (cm)”. The arrival time difference is therefore “0.15 (ms)”, and the delay sample count (D) is “7.3 samples”.
[0171]
Since the integer part (Di) of the delay sample count is “7” and the fractional part (Dd) is “0.3”, the delay unit 2520 gives a delay of “7 samples” and the gain unit 2560 multiplies by the coefficient “1 − Dd = 1 − 0.3 = 0.7”.
The other delay unit 2540 gives a delay of 8 samples, from “Di + 1 = 7 + 1 = 8”, and the gain unit 2580 multiplies by the coefficient “Dd = 0.3”.
[0172]
Then, the output signals of the gain units 2560 and 2580 are added, the multiplying unit multiplies the sum by “1/2”, and the interpolated output “b” described above is output from the correction unit 250.
[0173]
For example, the control unit 150 determines the arrival time difference corresponding to the rotation angle θ with reference to FIG. 20 and the like, calculates the delay sample integer part Di from the obtained arrival time difference, and sends this information to the delay units 2520 and 2540, which can then each apply their delay.
Likewise, the control unit 150 calculates the fractional part Dd of the delay sample count and sends this information to the gain units 2560 and 2580, thereby giving each its coefficient.
[0174]
According to the above, the acoustic effect can be maintained even when the sweet spot SS moves away from the listener H, while preventing the generation of discontinuous noise. FIG. 23 shows only one example of a correction unit 250 capable of executing the interpolation processing.
[0175]
The present invention can be used, for example, in an outdoor attraction where a large number of listeners can enjoy reproduced sound images, in settings where a mobile object (robot) and a listener face each other one-on-one, and in performances provided by a mobile object.
[0176]
DESCRIPTION OF SYMBOLS 1 mobile body, 2 head, 3 moving unit, 4 body unit, 5 rotation mechanism, 100 electronic circuit, 111a, 111b, 111c, 111d omni wheel, 112a, 112b, 112c, 112d motor, 114 motor, 120 image processing unit, 125 CCD camera, 126 image processing engine, 130 distance measurement unit, 132 A/D conversion unit, 140 drive control unit, 150 control unit, 160 determination unit, 200 virtual sound source generation unit, 201a, 201b speaker, 202 D/A conversion unit, 203 virtual sound source generation unit, 204 virtual sound source generation unit, 250 correction unit, 252 delay unit, 254 delay unit, 256 gain unit, 258 gain unit, 280 Doppler signal processing unit, 290 attenuation processing unit, 300 speech synthesis unit