Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2011035708
An object is to provide an acoustic signal processing apparatus that, when an image signal and an audio signal are recorded in association with each other, or when an image signal and an audio signal recorded in association with each other are reproduced, appropriately controls and records or reproduces the sound image direction of the speaker sound upon detecting, in the audio signal, the acoustic signal of a sound emitted from a loudspeaker; or an imaging apparatus that appropriately controls and records the sound image direction of the speaker sound. SOLUTION: A speaker sound control unit 10 separates and extracts, for each sound source, the sounds from a plurality of sound sources collected by the microphones 5L and 5R, and determines whether the sound from each separated and extracted sound source is a speaker sound. When a sound is determined to be a speaker sound, the sound image direction is controlled so that the sound image from that sound source substantially matches the shooting direction of the imaging device 1. [Selected figure] Figure 2
Acoustic signal processing apparatus and imaging apparatus
[0001]
The present invention relates to an acoustic signal processing apparatus that processes an
acoustic signal, and more particularly to an acoustic signal processing apparatus or an imaging
apparatus that controls the sound image direction of an acoustic signal emitted from a speaker.
[0002]
In places such as lectures and various events where a large number of people gather, the speaker often speaks using a microphone.
The speaker's voice input to the microphone is amplified by the amplifier to which the microphone is connected, and is output at high volume from the loudspeaker connected to the amplifier. Thus, most of the speaker's voice is heard from the loudspeaker.
10-05-2019
1
[0003]
If a plurality of loudspeakers are connected to the amplifier and arranged asymmetrically around the speaker's position, or if there is only one loudspeaker but it is placed in a position entirely different from the speaker's, the speaker's voice will come from a position different from where the speaker actually is.
[0004]
Consider the case where the image and sound of such a scene are recorded, for example, with a video camera equipped with two stereo microphones so that the speaker is positioned near the middle of the shooting area, and the recorded image signal and sound signal are then reproduced and viewed.
When the playback device performs stereo playback with, for example, two speakers for the L channel and the R channel, a scene in which the talker is speaking is shown near the middle of the monitor of the playback device, yet the talker's voice is heard only from one of the playback speakers, or biased toward one of them. Such reproduction of images and sounds is very unnatural for the viewer and is a problem.
[0005]
In addition, at an athletic meet or the like, background music (BGM) may be played from only one loudspeaker. Even when images and sounds of such a scene are recorded and viewed, the BGM is heard only from one of the playback speakers, or biased toward one side, which is a problem because the reproduction lacks impact.
[0006]
There also exists the following Patent Document 1 as a prior art that controls the overall sound image according to an image.
Patent Document 1 discloses a technology that, when reproducing an image signal and an audio signal acquired at the time of shooting, controls the directivity of the audio signal to be reproduced simultaneously and corrects the reproduction sound field according to the angle of view of the image signal; it does not solve the above problems.
[0007]
Japanese Patent Application Laid-Open No. 2006-287544
[0008]
The present invention has been made in view of the above problems, and aims to provide an acoustic signal processing apparatus that, when an image signal and an audio signal are recorded in association with each other, or when an image signal and an audio signal recorded in association with each other are reproduced, appropriately controls the sound image direction of the speaker sound and records or reproduces it upon detecting the acoustic signal of a sound emitted from a loudspeaker; or an imaging apparatus that appropriately controls and records the sound image direction of the speaker sound.
[0009]
A first acoustic signal processing apparatus according to the present invention includes: sound collection means for acquiring an acoustic signal by collecting sound arriving at the time of shooting; speaker sound detection means for detecting a speaker sound signal from the acoustic signal; sound image direction control means for performing acoustic signal processing on the acoustic signal so that, when the speaker sound signal is detected, the sound image direction of the speaker sound signal matches the shooting direction; and acoustic signal recording means for recording the acoustic signal subjected to the acoustic signal processing.
[0010]
A second acoustic signal processing apparatus according to the present invention includes: sound collection means for acquiring an acoustic signal by collecting sound arriving at the time of shooting; speaker sound detection means for detecting a speaker sound signal from the acoustic signal; sound image direction control means for performing acoustic signal processing on the acoustic signal so that, when the speaker sound signal is detected, the sound image direction of the acoustic signal coincides with the shooting direction; and acoustic signal recording means for recording the acoustic signal subjected to the processing.
[0011]
An imaging apparatus according to the present invention includes: the first or second acoustic signal processing apparatus; imaging means for acquiring an image signal of an imaging target by imaging the target; and image signal recording means for recording the image signal acquired by the imaging means in association with the acoustic signal subjected to acoustic signal processing by the sound image direction control means provided in the acoustic signal processing apparatus.
[0012]
The imaging apparatus according to the present invention further includes face detection means for detecting a face image signal of a person from the image signal, and microphone detection means for detecting an image signal of a microphone from the image signal. When the face detection means detects a face image signal of a person and the microphone detection means detects an image signal of a microphone, the speaker sound detection means of the acoustic signal processing apparatus detects a speaker sound signal from the acoustic signal acquired by the sound collection means.
[0013]
An imaging apparatus according to the present invention includes: imaging means for acquiring an image signal of an imaging target by imaging the target; sound collection means for acquiring a first acoustic signal by collecting sound arriving at the time of the imaging; first recording means for recording the first acoustic signal in association with the image signal acquired by the imaging means; second recording means for generating a second acoustic signal based on the first acoustic signal and recording the second acoustic signal in association with the image signal acquired by the imaging means; and switching means for switching between the first recording means and the second recording means. The second recording means includes speaker sound detection means for detecting a speaker sound signal from the acoustic signal, and sound image direction control means for generating the second acoustic signal by performing acoustic signal processing on the acoustic signal so that, when the speaker sound signal is detected, the sound image direction of the speaker sound signal matches the shooting direction.
[0014]
Furthermore, an imaging apparatus according to the present invention includes: imaging means for acquiring an image signal of an imaging target by imaging the target; sound collection means for acquiring a first acoustic signal by collecting sound arriving at the time of the imaging; first recording means for recording the first acoustic signal in association with the image signal acquired by the imaging means; second recording means for generating a second acoustic signal based on the first acoustic signal and recording the second acoustic signal in association with the image signal acquired by the imaging means; and switching means for switching between the first recording means and the second recording means. The second recording means includes speaker sound detection means for detecting a speaker sound signal from the acoustic signal, and sound image direction control means for generating the second acoustic signal by performing acoustic signal processing on the acoustic signal so that, when the speaker sound signal is detected, the sound image direction of the acoustic signal matches the shooting direction.
[0015]
A third acoustic signal processing apparatus according to the present invention includes: recording means in which an image signal acquired by shooting by imaging means and an acoustic signal of sound arriving at the time of the shooting acquired by sound collection means are recorded in association with each other; acquisition means for acquiring the acoustic signal from the recording means; speaker sound detection means for detecting a speaker sound signal from the acoustic signal acquired by the acquisition means; sound image direction control means for performing acoustic signal processing on the acoustic signal so that, when the speaker sound signal is detected, the sound image direction of the speaker sound signal coincides with the shooting direction of the imaging means; and reproduction means for reproducing the acoustic signal subjected to the acoustic signal processing.
[0016]
A fourth acoustic signal processing apparatus according to the present invention includes: recording means in which an image signal acquired by shooting by imaging means and an acoustic signal of sound arriving at the time of the shooting acquired by sound collection means are recorded in association with each other; acquisition means for acquiring the acoustic signal from the recording means; speaker sound detection means for detecting a speaker sound signal from the acoustic signal acquired by the acquisition means; sound image direction control means for performing acoustic signal processing on the acoustic signal so that, when the speaker sound signal is detected, the sound image direction of the acoustic signal coincides with the shooting direction of the imaging means; and reproduction means for reproducing the acoustic signal subjected to the acoustic signal processing.
[0017]
According to the present invention, there can be provided an acoustic signal processing apparatus that, when recording an image signal and an audio signal in association with each other or reproducing an image signal and an audio signal recorded in association with each other, appropriately controls the sound image direction of the speaker sound and records or reproduces it upon detecting, in the audio signal, the acoustic signal of a sound emitted from a loudspeaker; or an imaging apparatus that appropriately controls the sound image direction of the speaker sound and records or reproduces it.
[0018]
The significance and effects of the present invention will become more apparent from the description of the embodiments given below.
However, the following embodiments are merely examples of the present invention, and the meanings of the terms of the present invention and of its constituent requirements are not limited to those described in the following embodiments.
[0019]
FIG. 1 is an overall configuration diagram of an imaging device according to an embodiment of the present invention.
FIG. 2 is a diagram explaining an outline of the processing content of the speaker sound control unit 100.
FIG. 3 is a block diagram showing an outline of the internal configuration of the speaker sound control unit 100.
FIG. 4 is a diagram for explaining the method by which the direction determination unit 102 calculates the direction from which an acoustic signal arrives.
FIG. 5 is a block diagram showing an outline of the internal configuration of a speaker sound control unit 200 included in the sound processing unit 7.
FIG. 6 is a diagram for explaining a method of determining whether the acoustic signal of the n-th frame has periodicity.
FIG. 7 is a diagram showing the relationship between the autocorrelation value S(P) of the acoustic signal of the n-th frame and the variable P.
[0020]
Hereinafter, an embodiment in which an acoustic signal processing apparatus according to the
present invention is implemented in an imaging device will be described with reference to the
drawings.
[0021]
FIG. 1 is a block diagram showing an outline of the internal configuration of an imaging device according to an embodiment of the present invention.
As shown in FIG. 1, the imaging device 1 includes an image sensor 2, composed of a solid-state imaging element such as a charge coupled device (CCD) or a complementary metal oxide semiconductor (CMOS) sensor, that converts an incident optical image into an electrical signal, and a lens unit 3 that forms an optical image of a subject on the image sensor 2 and adjusts the light amount and the like.
The lens unit 3 and the image sensor 2 constitute an imaging unit, which generates an analog image signal.
The lens unit 3 includes various lenses (not shown) such as a zoom lens and a focus lens, and an aperture (not shown) for adjusting the amount of light incident on the image sensor 2.
[0022]
The imaging device 1 further includes: an AFE (Analog Front End) 4 that converts the analog image signal output from the image sensor 2 into a digital image signal (hereinafter, the digital image signal is simply referred to as the image signal) and adjusts its gain; microphones 5L and 5R that collect sound by converting input sound into electrical signals (hereinafter simply referred to as microphones); ADCs (Analog to Digital Converters) 6L and 6R that convert the analog acoustic signals output from the microphones 5L and 5R into digital acoustic signals (hereinafter, the digital acoustic signals may be simply referred to as acoustic signals); a sound processing unit 7 that performs various acoustic signal processing on the acoustic signals output from the ADCs 6L and 6R; and an image processing unit 8 that performs various image signal processing on the image signal output from the AFE 4 and outputs the processed signal.
[0023]
Here, the sound processing unit 7 includes a speaker sound control unit that detects, in the sound collected by the microphones 5L and 5R, the sound emitted from a loudspeaker (hereinafter, the sound emitted from a loudspeaker is referred to as the speaker sound, and its acoustic signal as the speaker sound signal), and that, when the speaker sound is detected, controls the sound image direction of the speaker sound, or of the entire sound collected by the microphones 5L and 5R. Details of the speaker sound control unit will be described later.
[0024]
The imaging apparatus 1 further includes: a compression processing unit 9 that performs compression coding processing, such as the MPEG (Moving Picture Experts Group) compression method, on the image signal output from the image processing unit 8 and the acoustic signal output from the sound processing unit 7; an external memory 11 for recording the compression-coded signal produced by the compression processing unit 9; a driver unit 10 for writing the compression-coded signal to and reading it from the external memory 11; and a decompression processing unit 12 for decompressing and decoding the compression-coded signal read from the external memory 11 by the driver unit 10.
[0025]
The imaging apparatus 1 further includes an image signal output unit 13 that converts the image signal decoded by the decompression processing unit 12 into a signal in a format displayable by the display unit 21, such as a monitor, and an acoustic signal output unit 14 that converts the acoustic signal decoded by the decompression processing unit 12 into a signal that can be output by the speaker unit 22.
[0026]
The imaging device 1 also includes: a central processing unit (CPU) 15 that controls the overall operation of the imaging device 1; a memory 16 that stores the programs for performing each process and temporarily holds signals during program execution; an operation unit 17 through which instructions from the photographer, such as a button to start shooting and buttons to determine various settings, are input; a timing generator (TG) 18 that outputs a timing control signal for synchronizing the operation timing of each unit; a bus 19 for exchanging signals between the CPU 15 and each unit; and a bus 20 for exchanging signals between the memory 16 and each unit.
[0027]
The external memory 11 may be of any type as long as it can record image signals and sound
signals.
For example, a semiconductor memory such as an SD (Secure Digital) card, an optical disk such
as a DVD, a magnetic disk such as a hard disk, or the like can be used as the external memory 11.
Further, the external memory 11 may be detachable from the imaging device 1.
[0028]
Next, the basic operation of the imaging device 1 will be described with reference to FIG. 1.
First, the imaging device 1 photoelectrically converts the light incident through the lens unit 3 with the image sensor 2 to generate an analog image signal. The image sensor 2 sequentially outputs the generated analog image signal to the AFE 4 at a predetermined frame period (for example, 1/30 second) in synchronization with the timing control signal input from the TG unit 18. The image signal, converted from analog to digital by the AFE 4, is input to the image processing unit 8. The image processing unit 8 converts the image signal into a YUV signal and performs various image signal processing such as gradation correction and edge enhancement. The memory 16 operates as a frame memory and temporarily holds the image signal while the image processing unit 8 performs its processing.
[0029]
Further, the microphones 5L and 5R collect sound, convert it into an analog acoustic signal
which is an electric signal, and output it. The analog audio signals output from the microphones
5L and 5R are input to the ADCs 6L and 6R and converted into digital audio signals. Further,
acoustic signals from the ADCs 6L and 6R are input to the acoustic processing unit 7, and various
acoustic signal processing such as noise removal and speaker sound control by the speaker
sound control unit are performed.
[0030]
Both the image signal output from the image processing unit 8 and the acoustic signal output
from the acoustic processing unit 7 are input to the compression processing unit 9 and
compressed by the compression processing unit 9 according to a predetermined compression
method. At this time, the image signal and the sound signal are temporally associated (paired) so
that the image and the sound do not shift at the time of reproduction. Then, the compressed
image signal and sound signal are recorded in the external memory 11 through the driver unit
10.
[0031]
The compressed image signal and acoustic signal recorded in the external memory 11 are read out to the decompression processing unit 12 based on a reproduction instruction from the photographer input via the operation unit 17. The decompression processing unit 12 decompresses the compressed image signal and acoustic signal read for reproduction, and outputs the reproduction image signal to the image signal output unit 13 and the reproduction acoustic signal to the acoustic signal output unit 14. The image signal output unit 13 then converts the reproduction image signal into a signal in a format displayable by the display unit 21, and the acoustic signal output unit 14 converts the reproduction acoustic signal into a signal in a format that can be output by the speaker unit 22, each outputting the converted signal. The reproduction image is thereby displayed on the display unit 21, and the reproduction sound is output from the speaker unit 22.
[0032]
In addition, the imaging device 1 of the present embodiment displays the captured image on the display unit 21 before recording of the captured image starts, during moving-image recording, and so on. At this time, the image processing unit 8 generates an image signal for display and outputs it to the image signal output unit 13 via the bus 20. The image signal output unit 13 then converts the display image signal into a signal in a format displayable by the display unit 21 and outputs it. By checking the image displayed on the display unit 21, the photographer can recognize the angle of view of the image to be recorded or currently being recorded.
[0033]
The display unit 21 and the speaker unit 22 may be integrated with the imaging apparatus 1 or may be separate units connected via terminals provided on the imaging apparatus 1 and a cable or the like. Also, if the microphones 5L and 5R are digital microphones that output digital acoustic signals, the ADCs 6L and 6R may be omitted.
[0034]
<<First Embodiment>> Hereinafter, a first embodiment of the speaker sound control unit provided in the sound processing unit 7 of the imaging device 1 will be described. In the following description of the first embodiment, the reference number 100 is assigned to the speaker sound control unit.
[0035]
FIG. 2 is a diagram for explaining an outline of the processing content of the speaker sound control unit 100. In FIG. 2, a photographer captures an image of a speaker holding a microphone using the imaging device 1. When the speaker speaks into the microphone, the voice is emitted from a loudspeaker located at a position different from the speaker. That is, in FIG. 2, the loudspeaker that emits the speaker's voice is the sound source P. It is further assumed that a sound source Q emitting sounds other than the speaker sound is also present.
[0036]
The speaker sound control unit 100 separates and extracts, for each sound source, the sounds from a plurality of sound sources (in FIG. 2, the two sound sources P and Q) collected by the microphones 5L and 5R, and determines whether the sound from each separated and extracted sound source is a speaker sound. When a sound is determined to be a speaker sound, the sound image direction is controlled so that the sound image from that sound source substantially matches the shooting direction of the imaging device 1. Here, the shooting direction refers to the direction in which the lens unit 3 of the imaging device 1 faces at the time of shooting. In FIG. 2, the talker holding the microphone is present in the shooting direction, but the sound source P (the loudspeaker) and the sound source Q are not.
[0037]
When the speaker sound control unit 100 determines that, of the sounds from the sound sources P and Q, the sound from the sound source P is a speaker sound, it performs acoustic signal processing on the acoustic signal of the speaker sound so that the speaker sound appears to come from the imaging direction of the imaging device 1. As a result of such processing, when the image signal and the acoustic signal acquired by shooting are reproduced and viewed, the viewer hears the speaker's voice from the shooting direction, that is, from the direction in which the speaker is present, without feeling unnatural.
[0038]
FIG. 3 is a block diagram schematically showing the internal configuration of the speaker sound control unit 100. In FIG. 3, the acoustic signals output from the ADCs 6L and 6R are time-domain signals; when the elapsed time from a certain reference time is t (t is an integer), the acoustic signals can be expressed as functions of t. Hereinafter, the acoustic signals output from the ADCs 6L and 6R are referred to as the original signal Li(t) and the original signal Ri(t), respectively.
[0039]
The FFT (Fast Fourier Transform) units 101L and 101R perform a discrete Fourier transform on the original signals Li(t) and Ri(t), respectively, to generate frequency spectra. The frequency spectra output from the FFT units 101L and 101R are obtained by converting the time-domain acoustic signals output from the ADCs 6L and 6R into frequency-domain signals, so each frequency spectrum can be expressed as a function of the frequency f (f is a positive integer). Hereinafter, the frequency spectra output from the FFT units 101L and 101R are referred to as the frequency spectra L(f) and R(f), respectively.
[0040]
In the imaging device 1 according to the present embodiment, each of the ADCs 6L and 6R converts the analog acoustic signal into a digital acoustic signal at a sampling frequency of, for example, 48 kHz. The imaging apparatus 1 then treats 1024 samples of the generated acoustic signal, that is, about 21.3 ms (1024 × 1/48 kHz), as one frame, and performs acoustic signal processing on the acoustic signal in units of this frame.
[0041]
The FFT units 101L and 101R perform the discrete Fourier transform on the acoustic signal one frame at a time. The frequency band of the acoustic signal is subdivided into M bands (M is an integer of 2 or more) at a sampling interval of Δf, and a frequency spectrum is calculated for each subdivided frequency band (hereinafter, a subdivided band). For example, if the entire frequency band of the acoustic signal is ΔF, the number of subdivided bands is M = ΔF/Δf. Ideally, by narrowing the sampling interval Δf, each subdivided band contains the component of the acoustic signal from only one sound source; that is, the acoustic signal in each subdivided band can be regarded as a component of the sound emitted from one of the plurality of sound sources.
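The framing and subdivided-band FFT described above can be sketched as follows. This is a minimal NumPy illustration only: the text does not specify windowing or the exact band layout, and the function name is hypothetical. With a 1024-sample frame at 48 kHz, the bin spacing is Δf = 48000/1024 = 46.875 Hz, and each FFT bin plays the role of one subdivided band.

```python
import numpy as np

FS = 48_000   # sampling frequency in Hz, as in the text
FRAME = 1024  # samples per frame, about 21.3 ms

def frame_spectra(left: np.ndarray, right: np.ndarray):
    """Compute the subdivided-band spectra L(f), R(f) of one frame.

    Returns the band center frequencies and the complex one-sided
    spectra of the left and right channels (one value per band).
    """
    assert left.shape == right.shape == (FRAME,)
    L = np.fft.rfft(left)                      # spectrum per subdivided band
    R = np.fft.rfft(right)
    freqs = np.fft.rfftfreq(FRAME, d=1.0 / FS)  # band spacing = FS/FRAME Hz
    return freqs, L, R

# example frame: a 1 kHz tone arriving at both microphones
t = np.arange(FRAME) / FS
freqs, L, R = frame_spectra(np.sin(2 * np.pi * 1000 * t),
                            np.sin(2 * np.pi * 1000 * t))
```

The spectral peak of the test tone falls in the band nearest 1 kHz, i.e. around bin 21 (21 × 46.875 ≈ 984 Hz).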
[0042]
When the subdivided bands are denoted f0, f1, f2, ..., fm-1 (m is an integer of 1 or more), the frequency spectra L(f) and R(f) are each divided into the subdivided bands f0, f1, ..., fm-1. Hereinafter, the frequency spectra of the subdivided bands constituting L(f) and R(f) are denoted L(f0), L(f1), L(f2), ..., L(fm-1) and R(f0), R(f1), R(f2), ..., R(fm-1), respectively.
[0043]
The direction determination unit 102 calculates, from the frequency spectrum of each subdivided band output from the FFT units 101L and 101R, the phase difference with which the acoustic signal in each subdivided band reaches the microphones 5L and 5R, and based on this phase difference determines the arrival direction of the acoustic signal in each subdivided band.
[0044]
FIG. 4 is a diagram for explaining the method by which the direction determination unit 102 calculates the direction from which an acoustic signal arrives.
Assume a two-dimensional coordinate plane with mutually orthogonal X and Y axes intersecting at the origin O. With the origin O as reference, the positive X direction is to the right, the negative X direction to the left, the positive Y direction to the front, and the negative Y direction to the rear. The microphones 5L and 5R are placed at different positions on the X axis, symmetric with respect to the Y axis, and the distance between the two microphones is D. The distance D is, for example, about several millimeters.
[0045]
Now suppose, for example, that an acoustic signal in the subdivided band of f0 Hz is emitted from the sound source P and arrives at the origin O at an incident angle of θ (rad) with respect to the Y axis, taking the counterclockwise direction as positive. The incident angles at the microphones 5L and 5R can then also be approximated as θ (rad). If the phase difference with which the acoustic signal reaches the microphones 5L and 5R is Δφ (rad), Δφ can be calculated from the frequency spectra L(f0) and R(f0) output from the FFT units 101L and 101R, respectively.
[0046]
Specifically, assuming that the real part of the frequency spectrum L(f0) calculated by the discrete Fourier transform is L_r(f0) and the imaginary part is L_i(f0), the phase φl of L(f0) can be calculated as
[0047]
φl = arctan( L_i(f0) / L_r(f0) )
[0048]
[0049]
Similarly, assuming that the real part of the frequency spectrum R(f0) is R_r(f0) and the imaginary part is R_i(f0), the phase φr of R(f0) can be calculated as
[0050]
φr = arctan( R_i(f0) / R_r(f0) )
[0051]
[0052]
Here, since the phase difference Δφ can be calculated as Δφ = φr − φl, it can be calculated by the following equation (1).
[0053]
Δφ = φr − φl = arctan( R_i(f0) / R_r(f0) ) − arctan( L_i(f0) / L_r(f0) )   (1)
[0054]
Further, assuming that the sound velocity is C (mm/sec) and the distance between the microphones 5L and 5R is D (mm), Δφ can also be calculated from the following equation (2).
[0055]
Δφ = 2π f0 D sin θ / C   (2)
[0056]
Therefore, the incident angle θ can be computed from the following equation (3), obtained from equations (1) and (2).
[0057]
θ = arcsin( C Δφ / (2π f0 D) )   (3)
[0058]
From the above, the incident angle θ, which is the direction from which the acoustic signal in the subdivided band f0 Hz arrives, can be calculated.
In this way, the direction determination unit 102 calculates the incident angle of the acoustic signal in every subdivided band.
Hereinafter, the incident angle of the acoustic signal in a subdivided band is simply referred to as the incident angle of the subdivided band.
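The incident-angle computation from equations (1) through (3) can be sketched as below. This is an illustration under stated assumptions: the sound velocity C ≈ 343 m/s (343000 mm/s) and a microphone spacing D of 5 mm are assumed values, and `np.angle` stands in for the arctangent of the imaginary over the real part.

```python
import numpy as np

C = 343_000.0  # assumed sound velocity in mm/s
D = 5.0        # assumed microphone spacing in mm ("several mm" in the text)

def incident_angle(Lf0: complex, Rf0: complex, f0: float) -> float:
    """Incident angle θ (rad) of the subdivided band f0 per equations (1)-(3)."""
    dphi = np.angle(Rf0) - np.angle(Lf0)                      # equation (1)
    return float(np.arcsin(C * dphi / (2 * np.pi * f0 * D)))  # equation (3)

# round-trip check: synthesize the phase difference for θ = 30° via
# equation (2), then recover θ from the two spectra
theta = np.deg2rad(30.0)
f0 = 1000.0
dphi = 2 * np.pi * f0 * D * np.sin(theta) / C                 # equation (2)
L0 = np.exp(1j * 0.0)     # L(f0) with phase 0
R0 = np.exp(1j * dphi)    # R(f0) delayed by dphi
```

Note that the phase difference is only unambiguous while |Δφ| < π, which bounds the usable f0 for a given spacing D.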
[0059]
In the present embodiment, the direction determination unit 102 outputs to the speaker sound determination unit 103 all combinations of the frequency spectrum of a subdivided band of the frequency spectrum L(f) and the incident angle of that subdivided band, each as one pair. Alternatively, the frequency spectrum of a subdivided band of the frequency spectrum R(f) and the incident angle of that subdivided band may be output as one pair.
[0060]
For example, if the subdivided bands f0, f1, f2, f3, f4, f5, f6, f7, f8, and f9 have incident angles of θ0, θ1, θ0, θ0, θ1, θ1, θ1, θ2, θ2, and θ2 (rad), respectively, the direction determination unit 102 outputs (L(f0), θ0), (L(f1), θ1), (L(f2), θ0), (L(f3), θ0), (L(f4), θ1), (L(f5), θ1), (L(f6), θ1), (L(f7), θ2), (L(f8), θ2), and (L(f9), θ2) to the speaker sound determination unit 103.
[0061]
The speaker sound determination unit 103 extracts, for each incident angle, the frequency spectra of the subdivided bands from the pairs of subdivided-band frequency spectrum and incident angle output from the direction determination unit 102, and combines them to generate a combined frequency spectrum.
That is, a combined frequency spectrum is generated for each arrival direction of the acoustic signal.
Hereinafter, the combined frequency spectrum generated by combining the frequency spectra of the subdivided bands arriving from the incident angle θ is referred to as L(θ).
[0062]
For example, when (L (f0), θ0), (L (f1), θ1), (L (f2), θ0), (L (f3), θ0), (L (f4), θ1),
(L (f5), θ1), (L (f6), θ1), (L (f7), θ2), (L (f8), θ2), and (L (f9), θ2) are input from the
direction determination unit 102, the speaker sound determination unit 103 extracts the
frequency spectra L (f0), L (f2), and L (f3) of the subdivided bands with incident angle θ0
and combines them to generate the combined frequency spectrum L (θ0).
[0063]
Similarly, L (f1), L (f4), L (f5), and L (f6) are extracted and combined to obtain the
combined frequency spectrum L (θ1) for incident angle θ1.
Further, L (f7), L (f8), and L (f9) are extracted and combined to obtain the combined
frequency spectrum L (θ2) for incident angle θ2.
[0064]
From the characteristics of the combined frequency spectrum calculated in this manner for
each arrival direction, the speaker sound determination unit 103 determines whether the
digital acoustic signal from that direction is a speaker sound signal.
[0065]
Generally, the frequency band of an acoustic signal that can be reproduced by a speaker used
at an event such as a seminar or an athletic meet is often in the range of about 300 Hz to
6 kHz. Therefore, if the frequency spectrum of the acoustic signal in a given direction of
arrival falls within the range of approximately 300 Hz to 6 kHz, it is determined that the
signal is likely to be a speaker sound.
[0066]
In addition, the frequency spectrum of the human voice is concentrated in a range of
approximately 100 Hz to 4 kHz.
Voiced sound has its pitch frequency in a comparatively low frequency band and a harmonic
structure consisting of the harmonics of that pitch.
Here, the pitch frequency is the fundamental frequency of the human voice produced by vocal
cord vibration, and is usually in the range of about 100 Hz to 300 Hz.
Therefore, if the pitch frequency is fp, the frequency spectrum of the human voice exhibits
local maxima at fp, 2fp, 3fp, ..., nfp Hz (n is a positive integer).
[0067]
On the other hand, as described above, the reproducible frequency band of the speakers used
at events such as seminars and sports events is about 300 Hz to 6 kHz. Therefore, when the
speaker sound includes a human voice, its frequency spectrum does not contain the
pitch-frequency component itself but still exhibits the harmonic structure.
[0068]
The speaker sound determination unit 103 applies, for example, autocorrelation to the
combined frequency spectrum, determines whether the pitch-frequency component is included and
whether the combined frequency spectrum has the harmonic structure, and thereby determines
whether it includes a speaker sound signal of a human voice.
[0069]
Specifically, the speaker sound determination unit 103 first applies autocorrelation to the
combined frequency spectrum L (θ) to detect a plurality of local maxima.
Assume that the combined frequency spectrum L (θ) contains no spectrum in the range of about
100 Hz to 300 Hz, and that the frequencies at the local maxima are, for example,
fm1 = 300 Hz, fm2 = 450 Hz, and fm3 = 600 Hz.
[0070]
Here, if the first local-maximum frequency fm1 is assumed to be twice the pitch frequency fp,
that is, 300 Hz = 2fp, then the pitch frequency is fp = 150 Hz.
Furthermore, if the combined frequency spectrum L (θ) includes the frequency spectrum of a
voice signal with pitch frequency fp = 150 Hz, L (θ) should have local maxima at
2fp = 300 Hz, 3fp = 450 Hz, and 4fp = 600 Hz.
[0071]
Since the relationships fm1 = 2fp, fm2 = 3fp, and fm3 = 4fp are indeed satisfied, in this
case the speaker sound determination unit 103 determines that L (θ) is the frequency spectrum
of a speaker sound signal of a voice with a pitch frequency of 150 Hz.
[0072]
In this manner, the speaker sound determination unit 103 determines, for every combined
frequency spectrum L (θ), whether it includes the frequency spectrum of a speaker sound
signal of a voice.
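The pitch and harmonic check of paragraphs [0069] to [0071] can be sketched as follows, assuming the local-maximum frequencies of L(θ) have already been detected. The function name and the tolerance are illustrative assumptions.

```python
# Infer the pitch fp from the first local maximum (assumed to be 2*fp, since
# the pitch itself falls below the speaker's reproducible band) and check that
# the remaining maxima fall on consecutive harmonics of fp.
def is_speaker_voice(max_freqs, tol=1.0):
    """max_freqs: ascending frequencies (Hz) of the local maxima of L(theta)."""
    if len(max_freqs) < 2:
        return False
    fp = max_freqs[0] / 2.0           # assume fm1 = 2 * fp
    for k, fm in enumerate(max_freqs, start=2):
        if abs(fm - k * fp) > tol:    # expect maxima at 2fp, 3fp, 4fp, ...
            return False
    return True
```

With the values of the example above, `is_speaker_voice([300.0, 450.0, 600.0])` holds, since fp = 150 Hz and the maxima sit at 2fp, 3fp, and 4fp.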
[0073]
In the present embodiment, a signal is determined to be a speaker sound signal when the
frequency band of the acoustic signal for a given direction of arrival is included in the
reproducible band of speakers used at events such as seminars and sports events (the range of
about 300 Hz to 6 kHz) and the signal includes a voice; in other cases, it is not determined
to be a speaker sound signal even if the frequency band is within 300 Hz to 6 kHz.
In this way, a speaker sound containing a human voice can be detected with high accuracy and
its sound image direction controlled.
[0074]
The speaker sound determination unit 103 notifies the gain adjustment unit 104 of the
frequencies of the subdivided bands that include the speaker sound signal.
[0075]
The gain adjustment unit 104 adjusts the frequency spectra at the frequencies of the
subdivided bands notified by the speaker sound determination unit 103 as including the
speaker sound signal, so that L (f) and R (f) have the same level.
[0076]
Specifically, suppose for example that the subdivided band f0 Hz is notified as including the
speaker sound signal, and that L (f0) = VL, R (f0) = VR with VL > VR. The gain adjustment
unit 104 then adjusts the gain so that L (f0) = VL and R (f0) = VL.
That is, the levels of the frequency spectra L (f0) and R (f0) are matched to the higher of
the two.
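The level matching of paragraphs [0075] and [0076] can be sketched as follows. The names are illustrative, and for brevity the spectra are represented as dicts mapping a band to its magnitude rather than full FFT arrays.

```python
# For every subdivided band flagged as containing the speaker sound, raise the
# weaker channel's spectral level to the stronger channel's level.
def match_levels(L, R, speaker_bands):
    for f in speaker_bands:
        hi = max(L[f], R[f])
        L[f] = R[f] = hi   # both channels take the higher of the two levels
    return L, R
```

Bands not flagged as speaker sound are left untouched, so the stereo image of the remaining sound is preserved.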
[0077]
IFFT (Inverse Fast Fourier Transform) units 105L and 105R perform inverse Fourier transforms
on the gain-adjusted frequency spectra L (f) and R (f), respectively, converting them into
time-domain signals that are output as Lo (t) and Ro (t).
[0078]
As described above, the speaker sound control unit 100 according to the present embodiment
detects whether or not the speaker sound of the human voice is included in the sounds coming
from the plurality of sound sources.
When a speaker sound of a human voice is detected, control is performed such that the sound
image of the speaker sound substantially matches the shooting direction of the imaging device 1.
[0079]
<< Second Embodiment >> Hereinafter, a second embodiment of the speaker sound control unit
provided in the sound processing unit 7 of the imaging device 1 will be described.
In the description of the second embodiment, the speaker sound control unit is given the
reference number 200.
[0080]
In general, a person's voice that is input to a microphone, amplified by an amplifier, and
then emitted from a speaker is louder than the same voice emitted directly.
The directly emitted voice may occasionally be louder than the voice emitted from the
speaker, but even for the same person, the directly emitted voice and the voice emitted from
the speaker differ in reverberation.
Usually, the voice emitted from the speaker reverberates more than the directly emitted
voice.
This is because, when a person speaks through a microphone, the microphone picks up not only
the voice the person emits directly but also the person's voice emitted from the speaker, so
both of these voices of the same person end up being emitted from the speaker.
[0081]
For these reasons, a loud human voice with strong reverberation can be considered to be a
speaker sound of a human voice.
Here, strong reverberation means that the signal has a certain periodicity.
[0082]
Also, in general, an acoustic signal of music is a wide-band signal with a certain
periodicity.
As described above, the frequency band reproducible by the speakers usually used at events
such as seminars and sports events is often in the range of about 300 Hz to 6 kHz.
Therefore, when a music signal is emitted from such a speaker, its frequency band is narrower
than that of the direct music, but because it has been amplified, its volume is large and it
retains a certain periodicity.
[0083]
From the above, an acoustic signal that satisfies all of the following requirements can be
determined to be a speaker sound signal: (A) its frequency band is included in the range of
about 300 Hz to 6 kHz, (B) its volume is large, and (C) it has a certain periodicity.
Conversely, when any one of requirements (A) to (C) is not satisfied, it can be determined
that the signal is not a speaker sound signal.
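The three-requirement decision of paragraph [0083] can be expressed as a simple predicate; the argument names are illustrative, and each argument is the boolean result of one of the checks described below.

```python
# (A) band within ~300 Hz-6 kHz, (B) volume above threshold, (C) certain
# periodicity: all three must hold for a speaker sound determination.
def is_speaker_sound(band_ok, volume_ok, periodicity_ok):
    return band_ok and volume_ok and periodicity_ok
```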
[0084]
FIG. 5 is a block diagram showing an outline of the internal configuration of the speaker
sound control unit 200 provided in the sound processing unit 7. The speaker sound
determination unit 201 determines whether Ri (t) output from the ADC 6R includes the speaker
sound signal and, when it determines that the speaker sound signal is included, outputs a
switching signal to the switching unit 202 described later.
[0085]
When the switching signal is output from the speaker sound determination unit 201, the
switching unit 202 performs monaural processing on Li (t) and Ri (t) output from the ADCs 6L
and 6R and outputs the result as Lo (t) and Ro (t). Here, the monaural processing is
processing that sets Lo (t) = Ro (t). When the switching signal is not output from the
speaker sound determination unit 201, Li (t) and Ri (t) are output as Lo (t) and Ro (t),
respectively.
Although the speaker sound determination unit 201 performs the speaker sound determination on
Ri (t) here, the determination may instead be performed on Li (t).
[0086]
Hereinafter, specific processing of the speaker sound determination unit 201 will be
described.
<Step 1: Whether the frequency band is included in the range of about 300 Hz to 6 kHz>
First, the speaker sound determination unit 201 performs an FFT on each frame (1024 samples)
of the acoustic signal Ri (t) output from the ADC 6R to calculate its frequency spectrum, and
determines whether the calculated frequency spectrum is included in the frequency band of
about 300 Hz to 6 kHz. <Step 2: Whether the volume is large> Next, when the speaker sound
determination unit 201 determines that the frequency spectrum of the acoustic signal Ri (t)
is included in the range of about 300 Hz to 6 kHz, it determines whether the level (power
value) of the acoustic signal Ri (t) is equal to or greater than a predetermined threshold.
[0087]
Specifically, the speaker sound determination unit 201 determines whether the average power
PRi (n) of the acoustic signal of each frame, calculated for Ri (t) by the following equation
(4), is equal to or greater than a predetermined threshold. Here, successive frames in the
time domain are numbered the first, second, third, ..., nth frames in order from the earliest
time; n is a positive integer indicating the frame number.
[0088]

PRi (n) = (1/1024) · Σt=1..1024 Ri (t)² (4), where Ri (t) are the samples of the nth frame.
[0089]
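The average frame power of equation (4) can be sketched as follows; the function name is illustrative, and the frame may be any sequence of samples (the embodiment uses 1024).

```python
# Mean power of one frame: the average of Ri(t)^2 over the frame's samples.
def frame_power(frame):
    return sum(x * x for x in frame) / len(frame)
```

The speaker sound determination unit would compare this value against its predetermined threshold.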
<Step 3: Whether the signal has a certain periodicity> Next, when the speaker sound
determination unit 201 determines that the average power PRi (n) of the acoustic signal of
the nth frame (1024 samples) is equal to or greater than the predetermined threshold, it
determines whether the acoustic signal of the nth frame has periodicity.
[0090]
FIG. 6 is a diagram for describing a method of determining whether the acoustic signal of the
nth frame has periodicity.
In FIG. 6, within the acoustic signal Ri (t) of the nth frame, Ri (t) for, e.g., t = 1 to t0
is used as a reference block, and an autocorrelation is calculated (t0 is an integer of 2 or
more).
That is, an evaluation block consisting of t0 consecutive samples of Ri (t) is defined over
the samples at and after t0, the position of the evaluation block is shifted sequentially in
the time direction, and the correlation between the reference block and the evaluation block
is computed. In FIG. 6, P is the shift width, in other words a variable representing the
position of the evaluation block, with P > t0. Specifically, the autocorrelation value S (P)
is calculated according to the following equation (5); S (P) is a function of the variable P,
which determines the position of the evaluation block.
[0091]

S (P) = Σt=1..t0 Ri (t) · Ri (t + P) (5)
[0092]
FIG. 7 shows the relationship between the calculated autocorrelation value S (P) and the
variable P.
In FIG. 7, the horizontal axis and the vertical axis represent the variable P and the
autocorrelation value S (P), respectively. According to FIG. 7, as the variable P changes,
S (P) periodically takes local maxima that are equal to or greater than the predetermined
threshold. In this case, the speaker sound determination unit 201 determines that the
acoustic signal Ri (t) of the nth frame has periodicity and includes the speaker sound
signal, and outputs the switching signal to the switching unit 202. The execution order of
steps 1 to 3 may be changed.
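The periodicity test of step 3 can be sketched as follows: the autocorrelation S(P) of equation (5) is computed between a reference block and shifted evaluation blocks, and the frame is judged periodic when S(P) shows repeated local maxima reaching a threshold. The names and the exact peak rule (at least two above-threshold local maxima) are illustrative assumptions.

```python
# S(P) between the reference block Ri(1..t0) and the block shifted by P
# (0-based indexing is used here).
def autocorr(frame, t0, P):
    return sum(frame[t] * frame[t + P] for t in range(t0))

def has_periodicity(frame, t0, threshold):
    # Sweep the evaluation-block position P and collect S(P).
    S = [autocorr(frame, t0, P) for P in range(t0, len(frame) - t0)]
    # Local maxima of S(P) that reach the threshold.
    peaks = [i for i in range(1, len(S) - 1)
             if S[i] >= threshold and S[i] >= S[i - 1] and S[i] >= S[i + 1]]
    return len(peaks) >= 2
```

A sinusoidal frame produces evenly spaced above-threshold peaks of S(P), while silence or an aperiodic signal does not.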
[0093]
When the switching signal is output from the speaker sound determination unit 201, the
switching unit 202 switches from the stereo system to the monaural system. That is, the
switching unit 202 outputs one of the acoustic signals Li (t) and Ri (t) as both Lo (t) and
Ro (t). As a result, when the collected sound includes the speaker sound, it is recorded in
the monaural system.
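The switching of paragraphs [0085] and [0093] can be sketched as follows; when the speaker sound is detected, one input channel is duplicated onto both outputs (monaural), and otherwise the stereo inputs pass through. Names are illustrative.

```python
# Stereo/monaural switching driven by the speaker sound determination.
def switch_output(Li, Ri, speaker_detected):
    if speaker_detected:
        return Ri, Ri      # Lo(t) = Ro(t): monaural output
    return Li, Ri          # stereo pass-through
```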
[0094]
As described above, the speaker sound control unit 200 according to the present embodiment
detects whether the sound collected by the microphones 5L and 5R includes a speaker sound.
When a speaker sound is detected, the sound image of the entire sound collected by the
microphones 5L and 5R is controlled so as to substantially coincide with the imaging
direction of the imaging device 1.
[0095]
<< Modification 1 >> It is also possible to combine the speaker sound control unit 100 of the first
embodiment and the speaker sound control unit 200 of the second embodiment.
[0096]
For example, the two can be combined so that, when the sound collected by the microphones 5L
and 5R includes the speaker sound of a human voice, the sound image direction is controlled
by the speaker sound control unit 100, and when it does not include the speaker sound of a
human voice but includes the speaker sound of music, the sound image is controlled by the
speaker sound control unit 200.
[0097]
<< Modification 2 >> It is also possible to provide the imaging device 1 with a switch for
switching between the normal recording mode and the speaker sound control recording mode.
[0098]
When the user sets the normal recording mode when shooting with the imaging device 1, the
imaging device 1 does not control the direction of the sound image by the speaker sound control
unit 100 or 200 when recording the sound.
On the other hand, when the speaker sound control recording mode is set, the imaging device 1
controls the sound image direction by the speaker sound control unit 100 or 200 on the sound
collected by the microphones 5L and 5R.
[0099]
According to the imaging device 1, the user can freely determine whether the control of the
sound image direction by the speaker sound control unit 100 or 200 is necessary.
[0100]
<< Modification 3 >> In the first and second embodiments described above, the sound image
direction is controlled by the speaker sound control unit 100 or 200 when the imaging device
1 records images and sound. Alternatively, the sound image direction may be controlled by the
speaker sound control unit 100 or 200 when the imaging device 1 or a reproduction device
reproduces image and acoustic signals recorded on an external recording medium (the external
memory 11 in the case of the imaging device 1).
With such an imaging device 1 or reproduction device, so-called raw image and acoustic
signals, which have not undergone acoustic signal processing for sound image direction
control at the time of shooting, can be obtained.
It is therefore possible to avoid recording a sound image direction control not intended by
the photographer.
[0101]
<< Modification 4 >> A known technique for detecting the image signal of a human face from an
image signal acquired at the time of shooting (face image detection) can also be used to
detect the image signal of a microphone from the image signal (microphone image detection).
A well-known face image detection technique is described, for example, in Japanese Laid-Open
Patent Publication No. 2007-257358. By replacing the weight table for human faces referred to
when detecting a face image signal in that technique with a weight table for microphones, the
image signal of a microphone can be detected from the image signal.
[0102]
In the imaging device 1 of FIG. 1, the image processing unit 8 can be provided with a face
image detection unit that detects the image signal of a human face from an image signal, and
a microphone image detection unit that detects the image signal of a microphone. The CPU 15
causes the face image detection unit to perform face detection processing on the image signal
output from the AFE 4, and, when the image signal of a human face is detected, causes the
microphone image detection unit to perform microphone image detection processing. To detect
whether a microphone is present near a human face, the microphone image detection unit
detects the image signal of a microphone within a predetermined area of the image signal
output from the AFE 4 that is larger than, and includes, the area in which the image signal
of the human face was detected.
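The enlarged microphone search region of paragraph [0102] can be sketched as follows: the microphone is searched for within an area that contains, and is larger than, the rectangle in which the face was detected. The margin factor and the function name are assumptions, not values from the original text.

```python
# Expand the detected face rectangle by a margin and clip it to the image,
# yielding the region in which to run microphone image detection.
def mic_search_region(face, img_w, img_h, margin=0.5):
    """face: (x, y, w, h) rectangle of the detected face."""
    x, y, w, h = face
    dx, dy = int(w * margin), int(h * margin)
    x0, y0 = max(0, x - dx), max(0, y - dy)
    x1, y1 = min(img_w, x + w + dx), min(img_h, y + h + dy)
    return x0, y0, x1 - x0, y1 - y0
```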
[0103]
When the image signals of a human face and a microphone are detected in the image signal
output from the AFE 4 by face image detection and microphone image detection, it can be
determined that a person is present in the shooting area and that the scene is one in which
the person is speaking through a microphone.
[0104]
In such a case, the CPU 15 determines that the sound collected by the microphones 5L and 5R
may contain the speaker sound, and can cause the speaker sound control unit 100 or 200 to
control the sound image direction.
On the other hand, when no face image signal is detected in the image signal output from the
AFE 4, or when a face image signal is detected but no microphone image signal is detected,
the CPU 15 does not cause the speaker sound control unit 100 or 200 to control the sound
image direction.
[0105]
Further, the face image detection processing and microphone image detection processing, and
the control of the sound image direction by the speaker sound control unit 100 or 200 when
the image signals of a face and a microphone are detected, may also be performed when
reproducing image and acoustic signals recorded in the external memory 11. This makes it
possible to appropriately determine scenes that require control of the sound image direction
and to control the sound image direction by the speaker sound control unit 100 or 200.
[0106]
<< Modification 5 >> In the first and second embodiments described above, an imaging device 1
with two L/R-channel microphones for stereo recording, and an imaging device 1 or
reproduction device with two L/R-channel speakers for stereo reproduction, have been
described. However, the number of microphones and speakers and the recording and reproduction
system of the acoustic signal are not limited to these. For example, the present invention
can also be realized with a 5.1-channel recording and reproduction system using six
microphones and speakers.
[0107]
5L microphone, 5R microphone, 6L ADC, 6R ADC, 100 speaker sound control unit, 101L FFT unit,
101R FFT unit, 102 direction determination unit, 103 speaker sound determination unit, 104
gain adjustment unit, 105L IFFT unit, 105R IFFT unit, 200 speaker sound control unit, 201
speaker sound determination unit, 202 switching unit