JP2009239346

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2009239346
An object of the present invention is to provide a technique capable of tracking and
photographing a specific subject with a configuration simpler than that of a conventional
imaging apparatus. A plurality of microphones 15 are arranged in a row on a photographing
apparatus 1. The control unit 11 of the imaging device 1 analyzes voice data representing the
sound collected by each of the microphones 15, and estimates a plurality of sound source
directions from the analysis result. The control unit 11 then collates the voice data arriving
from each estimated sound source direction with the collation data stored in the collation data
storage area 121, and identifies the direction of the specific subject according to the degree
of coincidence. In addition, the control unit 11 analyzes the voice data of each microphone 15,
detects a transition of the direction of the subject from the analysis result, and controls the
rotation mechanism 70 to change the orientation of the imaging device 1 so that the detected
direction remains within the imaging range of the imaging device 1.
[Selected figure] Figure 1
Photographing apparatus
[0001]
The present invention relates to a technique for performing photographing.
[0002]
In imaging devices such as digital cameras that capture still images and moving images,
techniques for tracking and capturing a specific subject have been proposed.
04-05-2019
1
For example, Patent Document 1 proposes an apparatus that automatically tracks and
photographs an object to which a transmitter is attached. Patent Document 2 proposes an
apparatus in which a tag is attached to each of a plurality of objects, a tuning signal is
transmitted at a spread angle substantially equal to the angle of view, the direction of each
tag is detected from the response signal that each tag returns to the tuning signal, and
imaging is performed with the imaging direction made to substantially coincide with the
detected tag direction. With this apparatus, even when there are a plurality of subjects, only
the subjects within the angle of view can be reliably captured. [Patent Document 1] JP
2003-69884 A [Patent Document 2] JP 2006-261999 A
[0003]
However, the techniques described in Patent Documents 1 and 2 require the imaging apparatus to
be separately provided with a receiving unit for receiving a signal from the transmitter (tag),
which complicates the apparatus configuration. The present invention has been made against this
background, and it is an object of the present invention to provide a technique capable of
tracking and photographing a specific subject with a configuration simpler than that of the
prior art.
[0004]
In order to solve the above problems, a photographing apparatus according to a preferred
embodiment of the present invention comprises: photographing means in which a photographing
range is set and which outputs video data representing a video within the photographing range;
a plurality of microphones that pick up sound and output it as voice data; estimation means for
analyzing the voice data output from each of the plurality of microphones and estimating one or
more sound source directions according to the analysis result; identification means for
identifying at least one of the sound source directions estimated by the estimation means;
detection means for analyzing the voice data of each of the microphones and detecting,
according to the analysis result, a transition of the direction of the sound source identified
by the identification means; and photographing range changing means for changing the
photographing range of the photographing means to a range including the sound source direction
detected by the detection means.
[0005]
In the above aspect, the apparatus may further comprise video data storage control means for
storing, in predetermined storage means, the video data output from the photographing means
when the photographing range is changed by the photographing range changing means.
[0006]
Further, in the above aspect, the estimation means may calculate the distribution of sound
pressure around the photographing means based on the correlation of the voice data of each of
the microphones, and estimate the direction in which a peak of the sound pressure appears in
the calculated distribution as the direction of the sound source.
[0007]
In the above aspect, the photographing range changing means may change the orientation of the
photographing means such that the sound source direction estimated by the estimation means is
included in the photographing range.
In the above aspect, the detection means may detect a transition of the sound pressure peak in
the sound pressure distribution calculated by the estimation means.
[0008]
Further, the above aspect may comprise collation data storage means for storing collation data
representing voice characteristics, and direction-specific audio data generation means for
generating, from the voice data of each of the microphones, direction-specific audio data
corresponding to each direction estimated by the estimation means; the identification means may
then specify the direction of the sound source based on the degree of coincidence between the
direction-specific audio data generated by the direction-specific audio data generation means
and the collation data stored in the collation data storage means.
[0009]
Further, the above aspect may comprise collation data generation means for generating, from the
voice data of the plurality of microphones, voice data corresponding to the direction specified
by the identification means as collation data, and direction-specific audio data generation
means for generating, from the voice data of each of the microphones, direction-specific audio
data corresponding to each of the directions estimated by the estimation means; the detection
means may then collate each piece of direction-specific audio data generated by the
direction-specific audio data generation means with the collation data generated by the
collation data generation means, and detect a transition of the sound source direction based on
the degree of coincidence.
In the above aspect, the identification means may specify the direction of the sound source
according to a signal output from the operation unit.
[0010]
In the above aspect, the collation data may include feature information of a specific
individual's voice.
Further, in the above aspect, the direction-specific audio data generation means may generate
the direction-specific audio data by mixing the voice data so as to raise the sound pressure of
the sound arriving from each direction estimated by the estimation means.
[0011]
In the above aspect, the direction-specific audio data generation means may generate the
direction-specific audio data by estimating, from the voice data, audio data corresponding to
each sound source using independent component analysis.
In the above aspect, the estimation means may estimate the direction of the sound source using
independent component analysis.
[0012]
According to the present invention, it is possible to track and shoot a specific subject with a
simpler configuration as compared with the conventional case.
[0013]
Hereinafter, embodiments of the present invention will be described with reference to the
drawings.
<A: Configuration> FIG. 1 is a block diagram showing a hardware configuration of an imaging
device 1 according to an embodiment of the present invention, and FIG. 2 is a perspective view
showing an appearance of the imaging device 1. The photographing device 1 is a device having a
function of photographing a still image or a moving image, and is, for example, a digital camera.
In FIG. 1, the control unit 11 includes a central processing unit (CPU), a read only memory
(ROM), and a random access memory (RAM), and reads and executes a computer program stored
in the ROM or the storage unit 12. Thus, each unit of the photographing apparatus 1 is
controlled via the bus BUS. The storage unit 12 is a storage unit for storing a computer program
executed by the control unit 11 and data used at the time of the execution, and is, for example, a
hard disk device. The display unit 13 includes a liquid crystal panel or the like, and displays
various images under the control of the control unit 11. The operation unit 14 outputs, to the
control unit 11, a signal corresponding to the operation of the user of the imaging device 1. The
operation unit 14 includes various buttons such as a cross key (not shown), a recording button
B1 for starting and ending recording, and a shooting button B2 for starting and ending shooting
of a still image and shooting of a moving image. The user of the photographing apparatus 1 can
perform various operations, such as shooting a still image or a moving image, by pressing these
buttons. Switching between still image shooting and moving image shooting is performed by a
switch (not shown) provided on the photographing apparatus 1.
[0014]
The photographing unit 18 includes a photographing lens 18a and the like, performs
photographing, and outputs video data representing the photographed image. The photographing
unit 18 can change the photographing range by moving the photographing lens 18a back and forth.
The user of the photographing apparatus 1 can set the photographing range of the photographing
unit 18 using the cross key of the operation unit 14 or the like; the photographing unit 18
moves the photographing lens 18a according to the signal from the operation unit 14 to set the
photographing range. The video data according to the present embodiment includes data
representing a still image and data representing a moving image.
[0015]
The microphone array MA is configured by arranging a plurality of microphones 151, 152, ...,
15n (n is a natural number of 2 or more) in a row. As shown in FIG. 2, the microphones 151,
152, ..., 15n are arranged in a row on the front surface of the photographing apparatus 1 (the
same surface on which the photographing lens 18a is provided). Preferably, the microphones 151,
152, ..., 15n are directional microphones. In the following description, when it is not
necessary to distinguish the individual microphones 151, 152, ..., 15n, they are simply
referred to as "microphones 15". The microphone 15 is sound collecting means that picks up
sound and outputs an analog signal representing the collected sound. The audio processing unit
16 performs A/D conversion on the analog signal output from the microphone 15 to generate
digital data. The audio processing unit 16 also performs D/A conversion on digital audio data
under the control of the control unit 11 to generate an analog signal, and outputs the
generated analog signal to the speaker 17. The speaker 17 is sound emitting means that emits
sound at an intensity corresponding to the analog signal supplied from the audio processing
unit 16.
[0016]
In this embodiment, the case where the microphone 15 and the speaker 17 are built into the
photographing apparatus 1 is described; however, the audio processing unit 16 may instead be
provided with an input terminal and an output terminal, with an external microphone connected
to the input terminal via an audio cable and, similarly, an external speaker connected to the
output terminal via an audio cable. Further, in this embodiment, the audio signal input from
the microphone 15 to the audio processing unit 16 and the audio signal output from the audio
processing unit 16 to the speaker 17 are described as analog audio signals, but digital audio
data may be input and output instead. In that case, the audio processing unit 16 need not
perform A/D conversion or D/A conversion. The same applies to the display unit 13, the
operation unit 14, and the photographing unit 18: each may be incorporated in the photographing
apparatus 1 or attached externally.
[0017]
As illustrated, the storage unit 12 has a collation data storage area 121 and a moving image
data storage area 122. The collation data storage area 121 stores collation data representing
characteristics (frequency characteristics and the like) of the voice of a specific person
recorded in advance. The collation data is used when the control unit 11 performs the collation
processing described later. In the following description, for convenience of explanation, the
voice represented by the collation data stored in the collation data storage area 121 is
referred to as the "specific voice". The moving image data storage area 122 stores moving image
data comprising the video data output from the photographing unit 18 and audio data
representing the sound collected by the microphone array MA. When photographing is started by
the photographer operating the photographing button B2 of the operation unit 14, the control
unit 11 stores the video data output from the photographing unit 18 and the audio data
representing the sound collected by the microphone array MA in the moving image data storage
area 122.
[0018]
The photographing apparatus 1 is attached to a rotation mechanism 70, as shown in FIG. The
rotation mechanism 70 is installed on a table 61 such as a desk, and the photographing
apparatus 1 can be rotated in the direction of arrow P by the rotation mechanism 70. FIG. 3 is
a view showing an example of the configuration of the rotation mechanism 70. As illustrated,
the rotation mechanism 70 has a rotating portion 71 and a fixed portion 72. The rotating
portion 71 is fixed to the photographing apparatus 1. The fixed portion 72 is provided with a
shaft 721; the shaft 721 is inserted into the bearing 711 of the rotating portion 71, so that
the rotating portion 71 is rotatably supported by the shaft 721. The fixed portion 72 is also
provided with a drive gear 722 and a motor 723 for rotating the drive gear 722. The motor 723
rotates the drive gear 722 under the control of the control unit 11. As the drive gear 722
rotates, the passive gear 712 engaged with it rotates, whereby the rotating portion 71 turns
about the shaft 721 under the drive of the motor 723.
[0019]
<B: Operation> <B-1: Collation Data Registration Operation> Next, the operation of this
embodiment will be described. First, the user of the photographing apparatus 1 operates the
operation unit 14 to register the collation data. When the user presses the recording button B1
to start recording, the operation unit 14 outputs an operation signal according to the operated
content, and the control unit 11 controls the audio processing unit 16 to start recording in
response to the signal supplied from the operation unit 14. The voice of the user is picked up
by the microphone 15, converted into a voice signal, and output to the audio processing unit
16. The audio processing unit 16 converts the voice signal output from the microphone 15 into
digital data (hereinafter referred to as "audio data"). The control unit 11 performs
predetermined filtering processing or the like on the audio data output from the audio
processing unit 16 to generate feature data representing the features of the voice, and stores
the generated feature data in the collation data storage area 121 as collation data. When the
user presses the recording button B1 again to end the recording, the control unit 11 ends the
recording according to the signal supplied from the operation unit 14.
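The registration step described above (record the voice, filter it into feature data, store the feature data as collation data) can be sketched roughly as follows. This is a minimal illustration under assumptions of mine, not the patent's actual implementation: the averaged magnitude-spectrum feature stands in for the unspecified "predetermined filtering processing", and all function names are hypothetical.

```python
import numpy as np

def extract_feature_data(audio_data, n_fft=256):
    """Stand-in for the patent's 'predetermined filtering processing':
    reduce the recorded audio to an averaged, unit-norm magnitude
    spectrum that serves as feature data."""
    usable = audio_data[: len(audio_data) // n_fft * n_fft]
    frames = usable.reshape(-1, n_fft)
    mag = np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)
    return mag / (np.linalg.norm(mag) + 1e-12)

def register_collation_data(audio_data, collation_store):
    """Store the feature data of the recorded voice as collation data
    (the role played by collation data storage area 121)."""
    collation_store["specific_voice"] = extract_feature_data(audio_data)
    return collation_store
```

The stored vector is later compared against features computed the same way from direction-specific audio data.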
[0020]
<B-2: Shooting Operation> Next, the shooting operation performed by the photographing apparatus
1 will be described. The photographing apparatus 1 can switch among three modes: a moving image
shooting mode, a still image shooting mode, and an automatic shooting mode. The user can switch
the shooting mode by operating the operation unit 14 of the photographing apparatus 1, and the
control unit 11 performs the shooting processing of the selected mode according to the signal
output from the operation unit 14. The operation when the automatic shooting mode is selected
is described below. The operations of moving image shooting and still image shooting are the
same as in a conventional imaging apparatus, and detailed description thereof is omitted here.
[0021]
FIG. 4 is a flowchart showing the flow of the shooting processing performed by the
photographing apparatus 1. When the photographer turns on the power of the photographing
apparatus 1 and selects the automatic shooting mode, the processing shown in FIG. 4 starts.
When the automatic shooting mode is selected, the control unit 11 starts shooting a moving
image. The microphone 15 converts the collected sound into a voice signal and outputs it to the
audio processing unit 16, and the audio processing unit 16 converts the voice signal output
from the microphone 15 into audio data. The control unit 11 mixes the audio data corresponding
to each of the plurality of microphones 15 to generate overall audio data representing the
overall sound, and stores the generated overall audio data together with the video data output
from the photographing unit 18 in the moving image data storage area 122 as moving image data.
[0022]
Further, the control unit 11 analyzes the audio data of each of the microphones 15 and
estimates the direction of the sound source (hereinafter, the "sound source direction")
according to the analysis result (step S1). Here, the control unit 11 detects the sound
pressure of the audio signal output from each of the plurality of microphones 15, calculates
the distribution of the sound pressure based on the correlation of the detected sound pressures
of the microphones 15, and estimates the direction in which a peak of the sound pressure
appears in the calculated distribution as the direction of the sound source. An example of the
specific content of this estimation processing will be described with reference to FIG. 5.
[0023]
FIG. 5 is a diagram showing an example of the sound pressure distribution calculated by the
control unit 11. In the figure, the horizontal axis indicates the angle with respect to the
center direction of the microphone array MA, and the vertical axis indicates the sound
pressure. The time taken for the sound wave generated by a sound source to reach each of the
plurality of microphones 15 differs depending on the direction (angle) of the sound source as
seen from the photographing apparatus 1. Using this principle, in this operation example, a
delay time corresponding to each angle of a predetermined unit amount is set in advance for
each microphone 15; the control unit 11 calculates the sound pressure corresponding to each
angle by delaying the audio data of each microphone 15 by its delay time and mixing the delayed
audio data. Next, the control unit 11 detects one or more angles at which peaks appear in the
calculated sound pressure for each angle (that is, in the sound pressure distribution), and
takes the detected angles as the sound source directions. In the example illustrated in FIG. 5,
the control unit 11 estimates the angles θ1, θ2, and θ3, at which peaks of the sound
pressure appear, as the sound source directions.
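The per-angle delay-and-mix estimation described above can be sketched as a simple delay-and-sum computation. This is a hedged illustration under simplifying assumptions of mine (a linear array with known spacing, integer-sample delays, far-field plane waves); the function names and parameters are not from the patent.

```python
import numpy as np

def sound_pressure_distribution(mic_signals, mic_positions, angles_deg,
                                fs=16000, c=343.0):
    """Delay-and-sum estimate of the sound pressure for each candidate
    angle, mirroring the per-angle delay/mix step described above.

    mic_signals:   (n_mics, n_samples) array of audio samples
    mic_positions: x-coordinates of a linear array, in metres
    angles_deg:    candidate source angles relative to the array normal
    Returns one RMS sound pressure value per candidate angle.
    """
    pressures = []
    for theta in np.deg2rad(np.asarray(angles_deg, dtype=float)):
        # Integer-sample delay of each microphone for a far-field plane
        # wave arriving from angle theta.
        delays = np.round(np.asarray(mic_positions) * np.sin(theta)
                          / c * fs).astype(int)
        # Align the channels on the hypothesised wavefront, then mix.
        mixed = np.zeros(mic_signals.shape[1])
        for sig, d in zip(mic_signals, delays):
            mixed += np.roll(sig, -d)
        pressures.append(float(np.sqrt(np.mean(mixed ** 2))))
    return np.array(pressures)

def estimate_source_directions(pressures, angles_deg):
    """Angles at which the distribution has a local peak (the patent's
    angles theta-1, theta-2, theta-3 in FIG. 5)."""
    peaks = []
    for i in range(1, len(pressures) - 1):
        if pressures[i] >= pressures[i - 1] and pressures[i] > pressures[i + 1]:
            peaks.append(angles_deg[i])
    return peaks
```

When the hypothesised angle matches a real source, the aligned channels add coherently and the mixed signal's RMS peaks; at other angles the channels partially cancel.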
[0024]
Next, the control unit 11 specifies at least one of the estimated sound source directions as
the direction in which the specific subject is present (hereinafter, the "specific direction").
In this operation example, the control unit 11 first mixes the audio data of the plurality of
microphones 15, for each angle of a predetermined unit amount, so that the sound pressure of
the sound arriving from that angle is raised, thereby generating audio data for each angle
(hereinafter referred to as "direction-specific audio data") (step S2). Next, the control unit
11 performs predetermined filter processing or the like on the generated direction-specific
audio data to generate feature data representing the features of the voice, collates the
generated feature data with the collation data in the collation data storage area 121, and
specifies the direction with the highest degree of coincidence as the specific direction (step
S3).
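Steps S2 and S3 (collating each direction's audio against the stored collation data) might look like the following sketch. The spectral feature and the cosine-similarity "degree of coincidence" are my stand-ins for the patent's unspecified filter processing and collation; the names are hypothetical.

```python
import numpy as np

def spectral_feature(audio, n_fft=256):
    """Stand-in for the 'predetermined filter processing': an averaged,
    unit-norm magnitude spectrum used as feature data."""
    frames = audio[: len(audio) // n_fft * n_fft].reshape(-1, n_fft)
    mag = np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)
    return mag / (np.linalg.norm(mag) + 1e-12)

def identify_specific_direction(direction_audio, collation_feature):
    """direction_audio maps angle -> direction-specific audio data.
    Returns the angle whose feature data best matches the collation
    data, with cosine similarity as the 'degree of coincidence'."""
    best_angle, best_score = None, -1.0
    for angle, audio in direction_audio.items():
        score = float(spectral_feature(audio) @ collation_feature)
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle, best_score
```

The direction whose audio most resembles the registered "specific voice" is returned as the specific direction.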
[0025]
When the specific direction is specified, the control unit 11 controls the motor 723 of the
rotation mechanism 70 to change the orientation of the photographing apparatus 1 so that the
specific direction is included in the photographing range of the photographing apparatus 1
(step S4). At this time, the control unit 11 may change the orientation of the photographing
apparatus 1 so that the center direction of the photographing range coincides with the specific
direction.
[0026]
The control unit 11 stores moving image data, comprising the video data output from the
photographing unit 18 and the audio data representing the sound collected by the microphones
15, in the moving image data storage area 122 (step S5). Next, the control unit 11 determines
whether to end the shooting (step S6); when the determination result is affirmative (step S6:
YES), the shooting is ended (step S7), and when the determination result is negative (step S6:
NO), the control unit 11 continues shooting.
[0027]
The control unit 11 analyzes the audio data of each of the microphones 15 during shooting, and
detects a transition of the specific direction according to the analysis result (step S8). In
this operation example, the control unit 11 calculates the distribution of the sound pressure
with respect to direction based on the correlation of the sound pressures of the audio data of
the microphones 15, and detects the directions in which peaks of the sound pressure appear in
the calculated distribution at every predetermined unit time (for example, every 10 ms). The
control unit 11 then detects the transition of a sound pressure peak, and detects the
transition of the specific direction according to the detection result. As one mode of
detecting the transition of the specific direction, for example, the control unit 11 detects a
peak of the sound pressure, and if the difference between the detected peak angle and the
previously detected peak angle is equal to or less than a predetermined threshold, determines
that the sound source at the previous peak position has moved. Specifically, in the example
shown in FIG. 5, suppose that the angle θ2 has been specified as the specific direction and
that, after the predetermined unit time, the sound pressure distribution has changed from the
state shown in FIG. 5 to the state shown in FIG. 6. At this time, if the difference between the
angle θ21 shown in FIG. 6 and the angle θ2 shown in FIG. 5 is equal to or less than the
predetermined threshold, the control unit 11 determines that the sound source that was in the
direction of the angle θ2 at the time of FIG. 5 has moved to the direction of the angle θ21,
and detects the angle θ21 as the specific direction.
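The thresholded peak-following logic above can be sketched as follows. This is a minimal illustration with assumed names and an assumed 15-degree threshold; the patent does not specify a threshold value.

```python
def track_specific_direction(peak_history, initial_direction,
                             threshold_deg=15.0):
    """Frame-by-frame transition detection as described above: follow
    the detected peak only while it stays within a threshold of the
    current specific direction (theta-2 -> theta-21 in FIGS. 5 and 6).

    peak_history: per-frame lists of detected peak angles (degrees)
    Returns the specific direction after the last frame.
    """
    current = initial_direction
    for peaks in peak_history:
        if not peaks:
            continue  # no peak detected this frame; hold the direction
        nearest = min(peaks, key=lambda a: abs(a - current))
        if abs(nearest - current) <= threshold_deg:
            current = nearest  # the tracked source has moved; follow it
        # otherwise: jump too large, treated as a different source
    return current
```

A peak jump larger than the threshold is treated as a different sound source, so the specific direction is not updated in that frame.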
[0028]
As described above, the control unit 11 detects the peak of the sound pressure every
predetermined unit time, and detects movement of the specific direction according to the
difference between the detected peak direction and the specific direction. The control unit 11
repeats this detection over the shooting period and controls the rotation mechanism 70
according to the detection result so that the orientation of the photographing apparatus 1
tracks the specific direction. As a result, even when the specific subject moves, the direction
of the specific subject can be tracked while shooting.
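The tracking loop above, in which the rotation mechanism is driven so the camera follows the specific direction, can be sketched as a single control tick. The per-tick step limit and all names here are assumptions of mine, standing in for driving motor 723 of the rotation mechanism 70.

```python
def follow_direction(camera_angle_deg, specific_direction_deg,
                     max_step_deg=5.0):
    """One control tick of the tracking loop: turn the camera toward
    the specific direction, limited by the mechanism's per-tick step."""
    error = specific_direction_deg - camera_angle_deg
    # Clamp the correction to what the mechanism can turn per tick.
    step = max(-max_step_deg, min(max_step_deg, error))
    return camera_angle_deg + step
```

Calling this once per unit time converges the camera orientation onto the tracked direction even as that direction drifts.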
[0029]
As described above, the photographing apparatus 1 changes the photographing range of the
photographing unit 18 by tracking the transition of the specific direction, generates video
data of the video within the photographing range and audio data representing the overall sound,
and stores moving image data including these data in the moving image data storage area 122.
[0030]
<C: Effects of the Embodiment> As described above, according to the present embodiment, the
control unit 11 estimates a plurality of directions of subjects (sound sources), identifies
from the estimated sound source directions the direction of the sound source desired by the
user, and changes the orientation of the photographing apparatus 1 so that the identified
direction is included in the photographing range.
This allows the photographer to shoot while tracking a specific subject (for example, a family
member or a favorite bird).
[0031]
Further, in the present embodiment, since the control unit 11 detects the transition of the
direction of the sound source, the movement of the subject is tracked even when the subject
moves, so that the subject the photographer desires can be kept within the shooting range while
shooting.
[0032]
Further, in this embodiment, since the sound source direction is specified using the microphone
array MA, it is not necessary to attach a transmitter to the subject or to separately provide
the photographing apparatus 1 with a receiver for receiving a signal from the transmitter, so
that a specific subject can be tracked and photographed with a configuration simpler than in
the prior art.
[0033]
Further, according to the present embodiment, the distribution of the sound pressure with
respect to direction is calculated based on the correlation of the sound pressures of the
microphones 15, and the direction in which a peak of the sound pressure appears in the
calculated distribution is estimated as the direction of the sound source.
Since the direction of the sound source is thus estimated from the sound pressure distribution,
it can be estimated without complicated processing, and the processing time required for the
estimation can be shortened.
[0034]
Further, according to the present embodiment, direction-specific audio data is generated for
each angle of a predetermined unit amount, and the direction is identified based on the degree
of coincidence between the generated direction-specific audio data and the collation data
stored in the collation data storage area 121. That is, simply by registering the voice of the
desired subject in the photographing apparatus 1, the photographing apparatus 1 tracks and
records the registered subject, so the photographer does not need to perform complicated
operations such as moving the photographing apparatus 1 or changing its orientation every time
the desired subject moves.
[0035]
Incidentally, there are cases where the photographer cannot easily identify the position of the
subject he or she wants to shoot. For example, when trying to shoot a cicada singing in a tree,
or a specific wild bird in a forest, it is often difficult to identify where the subject is,
and even when the subject is found, it often moves immediately and is lost from sight again.
Also, at children's sports or arts events, for example, it is often difficult for a
photographer to find his or her own child among a large number of children, and an important
photo opportunity may be missed while searching. In the prior art, there is a method of
recognizing a subject by image analysis at the start of shooting, but it cannot be applied to
subjects that are difficult to recognize by image analysis (cicadas, birds, children wearing
identical gym clothes, and so on). In this embodiment, by contrast, the voice of the subject is
collated to specify the direction of the subject, so that even a subject that is difficult to
recognize visually can be tracked and photographed. Note that the frequency characteristic of
the microphone 15 may extend beyond the human audible range, so that the technique is also
applicable to, for example, ultrasonic sounds.
[0036]
<D: Modified Examples> Although an embodiment of the present invention has been described
above, the present invention is not limited to the above-described embodiment and can be
implemented in various other forms. Examples are shown below. The following aspects may also be
combined as appropriate. (1) The estimation of the sound source direction may use independent
component analysis. In independent component analysis, signals from a plurality of signal
sources are mixed in space and arrive at a plurality of sensors, and from the arrival signals
observed by these sensors, the direction of arrival of each source signal is estimated and the
source signals are separated without knowledge of the mixing system; this is described, for
example, in the background art of Japanese Patent No. 3881367 (Patent Document 3). The
technique for obtaining the arrival direction of a signal source described in Patent Document 3
may also be used.
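As a rough illustration of the independent component analysis this modification cites, the following is a toy symmetric FastICA for an instantaneous (non-delayed) mixture. It is a sketch under strong assumptions, not the method of Patent Document 3: real microphone mixtures are convolutive, and the algorithm, iteration count, and names here are all mine.

```python
import numpy as np

def fastica(X, n_iter=300, seed=0):
    """Minimal symmetric FastICA (tanh nonlinearity) for an
    instantaneous mixture X of shape (n_sources, n_samples).
    Returns estimated sources, up to sign/scale/permutation."""
    X = X - X.mean(axis=1, keepdims=True)
    # Whiten the observations so their covariance is the identity.
    d, E = np.linalg.eigh(np.cov(X))
    Z = (E @ np.diag(1.0 / np.sqrt(d)) @ E.T) @ X
    n, T = Z.shape
    W = np.random.default_rng(seed).standard_normal((n, n))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: w <- E[z g(w.z)] - E[g'(w.z)] w
        W = (G @ Z.T) / T - np.diag((1.0 - G ** 2).mean(axis=1)) @ W
        # Symmetric decorrelation: W <- (W W^T)^(-1/2) W
        U, _, Vt = np.linalg.svd(W)
        W = U @ Vt
    return W @ Z
```

Applied to the array signals, such a separation yields per-source audio from which direction-specific audio data could be derived without the delay-and-sum step.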
[0037]
(2) The method of generating the direction-specific audio data or the specific-direction audio
data is not limited to the method described in the above embodiment; the audio data
corresponding to each sound source may be obtained by estimation from the audio data of the
microphones 15 using the aforementioned independent component analysis. The technique described
in Patent Document 3 may also be used.
[0038]
(3) The voice features stored as collation data may be features of an individual's voice (a
voiceprint or the like). The control unit 11 may identify the direction of the sound source by
analyzing the direction-specific audio data and determining whether the voice features of the
specific individual are detected.
[0039]
(4) In the above-described embodiment, the control unit 11 changes the orientation of the
photographing apparatus 1 so that the specific direction is included in the photographing
range; in addition, the photographer may be notified of the specific direction. Specifically,
for example, the photographing apparatus 1 may output a voice message guiding the photographer
toward the specific direction, or a message indicating the specific direction may be displayed
on the display unit 13. Alternatively, for example, a vibrator that rotates in the horizontal
direction may be provided inside the photographing apparatus 1, and the vibrator may be rotated
from the center direction of the photographing range toward the specific direction, so that the
photographer is notified of the specific direction by vibration.
[0040]
(5) In the above-described embodiment, the control unit 11 drives the motor 723 to change the
orientation of the photographing apparatus 1, but the manner of changing the photographing
range is not limited to this. For example, as shown in FIG. 7, a photographing apparatus 1A may
be provided with a rotation mechanism that rotates a photographing unit 18A, including a
photographing lens 18b, a CCD 18c, and the like, in the direction P in the figure, so that the
photographing range is changed by rotating the photographing unit 18A. In the example shown in
FIG. 7, a rotation unit 75 is rotatably supported on the photographing apparatus 1A by rollers
76a, 76b, and 76c, and the photographing unit 18A is fixed to the rotation unit 75 and rotates
together with it. A motor 77 rotates a roller 78 under the control of the control unit 11, and
the rotation unit 75 rotates as the roller 78 rotates. The configuration for rotating the
photographing unit 18A is not limited to this, and another rotation mechanism may be used. As
described above, the photographing range may be changed by rotating the photographing apparatus
main body or by rotating the photographing mechanism; what matters is that the control unit 11
controls the photographing apparatus 1 to change the photographing range so that the direction
in which the relevant sound is determined to occur is included in the photographing range.
[0041]
(6) In the above embodiment, the control unit 11 detects the movement of the subject by detecting the transition of the direction in which the peak of the sound pressure appears in the sound pressure distribution. Instead, the control unit 11 may, for each direction of a predetermined unit amount, mix the voice data so that the sound pressure of the voice from that direction is emphasized, generate direction-specific voice data, collate each piece of generated direction-specific voice data with the collation data, and detect the transition of the direction of the subject (sound source) based on the degree of coincidence. In this case, the same collation data as that stored in the collation data storage area 121 of the above-described embodiment may be used. That is, feature data obtained by applying filtering processing to voice data to extract voice features is used as the collation data, and the control unit 11 may apply the same processing to the direction-specific voice data to extract voice features, collate the feature data representing the extracted features with the collation data, and detect the transition of the direction of the specific subject based on the degree of coincidence.
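The per-direction matching described here can be sketched as follows. This is an illustrative outline only, not the embodiment's actual filtering: the "feature" is a simple normalized magnitude spectrum, and the pure tones merely stand in for direction-specific voice data.

```python
import numpy as np

def spectral_features(samples, n_fft=256):
    """Crude voice feature: normalized magnitude spectrum of the first frame."""
    mag = np.abs(np.fft.rfft(samples[:n_fft], n=n_fft))
    norm = np.linalg.norm(mag)
    return mag / norm if norm > 0 else mag

def best_matching_direction(per_direction_audio, reference_features):
    """Return the direction whose audio best matches the reference voice."""
    scores = {d: float(np.dot(spectral_features(s), reference_features))
              for d, s in per_direction_audio.items()}  # cosine similarity
    return max(scores, key=scores.get), scores

# Toy data: the "target" voice is a 440 Hz tone, present in the 30-degree data.
fs = 8000
t = np.arange(0, 0.1, 1 / fs)
target = np.sin(2 * np.pi * 440 * t)
noise = np.sin(2 * np.pi * 1200 * t)
per_direction = {0: noise, 30: target + 0.1 * noise, 60: 0.5 * noise}
reference = spectral_features(target)
direction, scores = best_matching_direction(per_direction, reference)
print(direction)  # → 30
```

Tracking the transition of the subject's direction then amounts to repeating this match on successive frames and recording how the best-matching direction changes over time.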
[0042]
(7) In the above embodiment, the control unit 11 collates the voice data for each sound source direction with the collation data stored in the collation data storage area 121 and identifies the specific direction based on the degree of coincidence. Instead, the photographer may visually confirm the position of the subject displayed on the display unit 13 and input the direction of the subject to be photographed by operating the operation unit 14. Specifically, for example, after the control unit 11 completes the sound source direction estimation process shown in step S1 of FIG. 4, it may display the sound source directions estimated by that process on the display unit 13, and the photographer may select one of the displayed sound source directions using the operation unit 14. In this case, the control unit 11 specifies, as the specific direction, the sound source direction selected by the photographer from among the estimated sound source directions in accordance with the operation signal from the operation unit 14.
[0043]
As described above, the control unit 11 may specify the specific direction from among the plurality of sound source directions by analyzing the voice data for each sound source direction, or may specify the specific direction in accordance with the operation signal from the operation unit 14. In short, the control unit 11 may specify at least one of the estimated sound source directions. Moreover, although the control unit 11 specifies one specific direction in the above-described embodiment, it may specify a plurality of specific directions.
[0044]
Further, as a method of specifying the specific direction, for example, the photographer may be allowed to select an arbitrary direction. In this case, the photographer designates the direction in which sound is to be collected using the operation unit 14, and the control unit 11, in accordance with the signal from the operation unit 14, takes the designated direction as the specific direction and generates specific-direction voice data representing the voice from that direction. In this way, even when, for example, a peak cannot be detected in the direction of the subject because the ambient noise is loud, the photographing apparatus 1 can still generate specific-direction voice data for the direction in which the photographer wishes to collect sound.
[0045]
Further, the photographer may be allowed to use the operation unit 14 to select between a mode in which the photographer designates the direction in which sound is to be collected and a mode in which the photographing apparatus 1 automatically detects the specific direction as described in the above embodiment. In this case, the control unit 11 performs the specific direction identification process, the specific-direction voice data generation process, and the like according to the selected mode, in accordance with the operation signal from the operation unit 14.
[0046]
(8) In the embodiment described above, the photographing apparatus 1 includes a microphone array in which the plurality of microphones 15 are arranged in a row as illustrated in FIG. 2, but the microphone array may instead be configured with the microphones 15 arranged in a planar (two-dimensional) shape. Also, for example, as shown in FIG. 8, a microphone array arranged three-dimensionally may be provided on the front surface and the side surface of the photographing apparatus 1. In this case, the photographing apparatus 1A can estimate, as the angle of the sound source, not only the angle in the x-axis direction (see FIG. 8) but also the angles in the y-axis direction and the z-axis direction (see FIG. 8); the direction of the sound source can thus be estimated in three dimensions, and more detailed directions can be obtained. Furthermore, in this case, sound sources can be detected over a wider range.
[0047]
In addition, when using a photographing apparatus provided with a microphone array in which the microphones are arranged two- or three-dimensionally, the apparatus may be provided with a rotation mechanism 70A that rotates it in the arrow Q direction in FIG. 8 as well, in addition to the horizontal direction (the arrow P direction in FIG. 8).
[0048]
Further, a small silicon microphone may be used as the microphone 15 in the above-described embodiment. In addition, a small silicon microphone capable of picking up ultrasonic waves as well as ordinary sound waves may be used: an ultrasonic tag that transmits ultrasonic waves is attached to the subject, the silicon microphones receive the ultrasonic waves transmitted from the ultrasonic tag, and the control unit 11 specifies the direction of the subject based on the received ultrasonic waves. In this case, although it is necessary to attach an ultrasonic tag to the subject, the microphone array used for recording can also serve as the receiver of the ultrasonic waves, so that the direction of the subject can be specified with a simple device configuration.
[0049]
(9) In the above embodiment, the storage unit 12 such as a hard disk drive is used as the means for storing moving image data, but the means for storing moving image data is not limited to a hard disk drive; it may be a recording medium such as a CD-R or CD-R/W. In short, the control unit 11 may record the moving image data on any computer-readable recording medium. In addition, the control unit 11 may output the moving image data to a predetermined server apparatus via a communication network.
[0050]
(10) In the above embodiment, the photographing apparatus 1 executes all the processes according to the above embodiment by itself. Alternatively, the processes according to the above embodiment may be shared and executed by two or more devices connected via a communication network, a communication I/F, or the like, and a system including the plurality of devices may realize the photographing apparatus of the embodiment. Specifically, for example, a digital camera and a computer device connected via a communication I/F such as USB may be configured as such a system.
[0051]
(11) In the above embodiment, the control unit 11 of the photographing apparatus 1 calculates the sound pressure distribution and estimates the angle at which a peak value appears as the sound source direction. The method of estimating the sound source direction is not limited to this; for example, the sound pressure may be detected at every angle of a predetermined unit amount, and an angle at which the detected sound pressure is equal to or greater than a predetermined threshold may be taken as the sound source direction. The point is that the control unit 11 may detect the sound pressure of the voice data output from the microphones 15 at each angle of a predetermined unit amount and estimate the sound source direction from the sound pressure detected at each angle.
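The threshold-based variant can be sketched in a few lines. The angle step, threshold, and pressure figures below are arbitrary illustrations, not values from the embodiment.

```python
import numpy as np

def source_directions(pressures_db, angles_deg, threshold_db):
    """Return every scan angle whose detected sound pressure is at or above
    the threshold, treating each such angle as an estimated source direction."""
    pressures = np.asarray(pressures_db)
    angles = np.asarray(angles_deg)
    return angles[pressures >= threshold_db].tolist()

angles = list(range(-90, 91, 10))                         # scan in 10-degree units
pressures = [40] * 9 + [72] + [40] * 4 + [68] + [40] * 4  # loud at 0 and 50 degrees
print(source_directions(pressures, angles, threshold_db=60))  # → [0, 50]
```

Unlike peak picking, this variant naturally reports every direction that exceeds the threshold, which suits the case of multiple simultaneous sound sources.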
[0052]
In the above embodiment, the sound source direction is estimated based on the sound pressure of the voice data, but the present invention is not limited to this; the frequency characteristic of the voice data for each direction may be detected, and the sound source direction may be estimated based on the detected frequency characteristic. As described above, the sound source direction may be detected based on the sound pressure of the voice data or based on the frequency; the point is that the control unit 11 performs voice analysis on the voice data output from the microphones 15 and estimates the sound source direction according to the analysis result.
[0053]
Further, the control unit 11 may perform image analysis on the image data output from the photographing unit 18, perform person extraction (or face extraction) processing, and estimate the direction corresponding to the position of the extracted person (or face) as the sound source direction. The sound source direction may also be estimated by using the above-described voice analysis and this image analysis in combination. By using the image analysis result in addition to the voice analysis result in this way, the accuracy of the sound source estimation process can be increased.
[0054]
(12) In the above embodiment, the control unit 11 collates the feature data representing the features of the voice data representing the voice from each sound source direction with the collation data stored in the collation data storage area 121, and specifies the specific direction based on the degree of coincidence, but the method of specifying the specific direction is not limited to this. For example, data representing an image of the specific subject may be stored in advance in the collation data storage area 121 as collation data; the control unit 11 may then analyze the image data output from the photographing unit 18, perform person extraction (or face extraction) processing according to the analysis result, collate the image data of the extracted person (or face) with the collation data stored in the collation data storage area 121, and specify the specific direction based on the degree of coincidence.
[0055]
Further, in the above-described embodiment, the control unit 11 performs, for example, filtering processing on the voice data representing the voice collected by the microphones 15 to generate feature data representing voice features, and uses the generated feature data as the collation data. However, the present invention is not limited to this, and the voice data representing the voice collected by the microphones 15 may be used as the collation data as it is.
[0056]
(13) Further, in the above-described embodiment, the control unit 11 may control the photographing unit 18 so as to focus on the sound source (subject). In this case, for example, the control unit 11 may analyze the voice data representing the voice collected by a plurality of different microphones 15 (for example, the microphone 151 and the microphone 15n shown in FIG. 2), calculate the time difference with which the voice arrives at each microphone, calculate the distance between the photographing apparatus 1 and the sound source using the calculated time difference, and perform focus control according to the calculation result.
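The time-difference step can be sketched with a simple cross-correlation between two microphone signals. This is an illustrative outline only; estimating the actual distance additionally requires the known microphone spacing and, in general, time differences from more than one microphone pair.

```python
import numpy as np

def tdoa_seconds(sig_a, sig_b, fs):
    """Estimate the time difference of arrival between two mic signals.
    Returns a negative value when the sound reaches sig_a first."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_b) - 1)
    return lag / fs

# Toy check: a click that reaches microphone B 5 samples after microphone A.
fs = 48000
click = np.zeros(200)
click[50] = 1.0
sig_a = click
sig_b = np.roll(click, 5)
dt = tdoa_seconds(sig_a, sig_b, fs)
print(round(dt * fs))  # → -5 (A heard the click 5 samples earlier)
```

With time differences for several microphone pairs and the known array geometry, the control unit could then triangulate the source position and hence the focusing distance; that step is omitted here.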
[0057]
In addition, the photographing apparatus 1 may turn the camera toward the direction of the sound source and automatically zoom in or out as the subject approaches or recedes. In this case, the control unit 11 may calculate the distance between the photographing apparatus 1 and the sound source by the above-described processing and control the zoom automatically according to the calculated distance (for example, zooming in so that a person's face fills the frame).
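As a toy illustration of distance-driven zoom control, the following thin-lens sketch picks a focal length that keeps a face at a fixed apparent size. All the numbers (sensor height, face height, framing fraction) are invented for the example and are not from the embodiment.

```python
def zoom_for_face_close_up(distance_m, face_height_m=0.25,
                           sensor_frac=0.8, sensor_height_mm=5.0):
    """Focal length (mm) that makes a face of face_height_m fill sensor_frac
    of the sensor height at the given subject distance.
    Thin-lens far-field approximation: image_height ≈ focal * object_height / distance."""
    target_image_height_mm = sensor_frac * sensor_height_mm
    return target_image_height_mm * (distance_m * 1000.0) / (face_height_m * 1000.0)

# A subject twice as far away needs twice the focal length for the same framing.
f_near = zoom_for_face_close_up(2.0)  # 32.0 mm
f_far = zoom_for_face_close_up(4.0)   # 64.0 mm
```

The control unit would re-evaluate this as the measured distance changes, driving the zoom motor toward the computed focal length.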
[0058]
(14) In the above embodiment, an example in which the photographing apparatus according to the present invention is applied to a digital camera has been described, but the apparatus to which the present invention is applied is not limited to digital cameras; it may be a portable communication terminal, a computer game machine, or the like. The photographing apparatus according to the present invention is applicable to various devices.
[0059]
(15) The program executed by the control unit 11 of the photographing apparatus 1 in the above embodiment can be provided in a state recorded on a computer-readable recording medium such as a magnetic tape, a magnetic disk, a flexible disk, an optical recording medium, a magneto-optical recording medium, a RAM, or a ROM. It is also possible to have the photographing apparatus 1 download the program via a network such as the Internet.
[0060]
FIG. 1 is a block diagram showing an example of the hardware configuration of the photographing apparatus. FIG. 2 is a perspective view showing an example of the external appearance of the photographing apparatus. FIG. 3 is a diagram showing an example of the structure of the rotation mechanism. FIG. 4 is a flowchart showing the flow of the photographing process performed by the photographing apparatus. FIG. 5 is a diagram showing an example of the sound pressure distribution calculated by the control unit. FIG. 6 is a diagram showing an example of the sound pressure distribution calculated by the control unit. FIG. 7 is a diagram showing an example of the structure of the rotation mechanism. FIG. 8 is a perspective view showing an example of the external appearance of the photographing apparatus.
Explanation of Reference Signs
[0061]
DESCRIPTION OF SYMBOLS: 1... photographing apparatus; 11... control unit; 12... storage unit; 13... display unit; 14... operation unit; 15... microphone; 16... sound processing unit; 17... speaker; 18... photographing unit; 121... collation data storage area; 122... moving image data storage area.