Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2018536365
Abstract: Methods and systems are provided for refocusing an image captured by a plenoptic
camera. In one embodiment, the plenoptic camera operates with an audio capture device. The
method comprises the steps of determining the direction of the dominant audio source
associated with the image; inducing an audio zoom by filtering out all audio signals other than
those associated with the dominant audio source; and performing an automatic refocusing of the
image based on the induced audio zoom. [Selected figure] Figure 1
Method of refocusing an image captured by a plenoptic camera, and audio-based refocusing
imaging system
[0001]
The present disclosure relates generally to digital recording and photography techniques, and
more particularly to digital recording and photography techniques with plenoptic cameras that
make use of audio-based selection of focal plane and depth.
[0002]
Photography technology creates durable images by recording light or other electromagnetic
radiation.
Images are captured electronically using an image sensor, or chemically using a photosensitive
material. Typically, a lens is used to focus the light reflected or emitted from objects into a real
image on the photosensitive surface inside the camera during a timed exposure. With an
electronic image sensor, a charge is generated at each pixel, and that charge is processed and
stored in a digital image file for further use. In classical photographic techniques, the focal
surface is approximately a plane, the focal plane. The focal surface is perpendicular to the
optical axis of the camera, and the depth of field is constant along that plane. Because these
rules constrain the focal surface and the depth of field, the captured image is limited to a basic
configuration. In contrast, light field or plenoptic cameras allow more complex configurations.
[0003]
A plenoptic camera utilizes a microlens array located in the image plane of the main lens and in
front of the photosensor array, onto which each microlens projects one micro-image. Each
micro-image therefore depicts an area of the captured scene, and each pixel associated with that
micro-image shows that area as seen from a particular sub-aperture position in the exit pupil of
the main lens. The raw image of the scene is then obtained as the sum of all the micro-images
acquired from the individual parts of the photosensor array, and this raw image contains the
angular information of the light field. Theoretically and computationally, the plenoptic camera
presents the possibility of superior image capture, taking advantage of complex configurations
not available when using classical cameras. Disadvantageously, however, there are many
practical drawbacks that limit what the prior art can realize with a plenoptic camera, and these
limitations are exacerbated when attempting to capture video content.
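To make the raw-image structure above concrete, the following is a minimal sketch, not taken from the patent, of how sub-aperture views can be extracted from an idealized raw plenoptic image in which each microlens covers exactly an n x n block of pixels; the function name and the block-size assumption are illustrative.

```python
import numpy as np

def extract_subaperture_views(raw, n):
    """Split an idealized plenoptic raw image into n*n sub-aperture views.

    Assumes each microlens covers exactly an n x n pixel block, so the
    pixel at offset (u, v) inside every block samples the same
    sub-aperture position in the exit pupil of the main lens.
    """
    h, w = raw.shape
    views = np.empty((n, n, h // n, w // n), dtype=raw.dtype)
    for u in range(n):
        for v in range(n):
            views[u, v] = raw[u::n, v::n]  # one pixel per micro-image
    return views

# Example: a synthetic 512 x 512 raw image behind 8 x 8 pixel microlenses
# yields 64 sub-aperture views of 64 x 64 pixels each.
views = extract_subaperture_views(np.random.rand(512, 512), 8)
```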
[0004]
Methods and systems are provided for refocusing an image captured by a plenoptic camera. In
one embodiment, the plenoptic camera operates with an audio capture device. The method
comprises the steps of determining the direction of the dominant audio source associated with
the image; generating an audio zoom by filtering out all audio signals other than those
associated with the dominant audio source; and performing an automatic refocusing of the
image based on the induced audio zoom.
[0005]
In another embodiment, an audio-based refocusing imaging system is provided, the system
comprising: a plenoptic video camera for capturing video images; audio capture means for
capturing audio associated with the captured images; means for determining the dominant
audio source; means for performing audio signal analysis to determine the direction of the
dominant audio source; means for triggering an audio zoom, by beamforming on the audio scene
of interest based on the direction of the dominant audio source so as to selectively filter out all
audio signals other than those associated with the dominant audio source; and means for
effecting automatic refocusing of the image based on the triggered audio zoom.
[0006]
Additional features and advantages are realized through the techniques of the present invention.
Other embodiments and aspects of the invention are described in detail herein and are
considered as part of the claimed invention. For a further understanding of the invention,
together with advantages and features, reference is made to the detailed description and
drawings.
[0007]
The invention will be further understood and described using, but not limited to, the following
embodiments and examples, with reference to the accompanying drawings.
[0008]
FIG. 1 is a flowchart illustrating steps for performing an automatic refocusing function according
to one embodiment.
[0009]
FIG. 2A is a block diagram of a system used in performing processing steps in accordance with
the embodiment of FIG. 1.
FIG. 2B is a block diagram of a system used in performing processing steps in accordance with
the embodiment of FIG. 1.
[0010]
FIG. 3 is a block diagram illustrating the dominant audio direction and the estimated width of the
region of interest according to one embodiment.
[0011]
FIG. 4 schematically illustrates a video cone of interest according to one embodiment.
[0012]
FIG. 5 illustrates another system embodiment having a distributed microphone array
configuration.
FIG. 6 illustrates a further system embodiment having a distributed microphone array
configuration.
[0013]
FIG. 7 is a block diagram illustration of the beamforming step of the embodiment of FIG. 1.
[0014]
FIG. 8 depicts a beamforming function as implemented in accordance with the system described
in connection with FIGS. 2-4.
[0015]
FIG. 9 depicts a beamforming function as performed in accordance with the systems described in
connection with FIGS. 5 and 6.
[0016]
FIG. 10 is a block diagram of a system having a distributed microphone array according to
another embodiment.
[0017]
In FIGS. 1-10, the blocks represented are purely functional entities and do not necessarily
correspond to physically separate entities.
That is, the blocks can be implemented in software or hardware, or can be implemented in one or
several integrated circuits with one or more processors.
[0018]
Wherever possible, the same reference numbers will be used throughout the drawings to refer to
the same or like parts.
[0019]
While the drawings and the description of the present invention have been simplified to show the
elements relevant to a clear understanding of the present invention, it should be understood
that, for the sake of simplicity, many other elements found in typical digital multimedia content
delivery methods and systems have been omitted.
However, as such elements are well known in the art, a detailed discussion of them is not
provided herein.
The disclosure herein is directed to all such variations and modifications.
[0020]
FIG. 1 is a flowchart representation of one embodiment illustrating a method for automatic
refocusing of an image utilizing one or more plenoptic cameras.
In one embodiment, audio components can be utilized to calculate the appropriate focal plane
and depth of field of the image.
One such technique is discussed using the method of FIG. 1.
The steps of the embodiment of FIG. 1 are described in connection with the system
configurations of the embodiments provided in FIGS. 2-10 to facilitate understanding.
[0021]
In classical photography, the focal surface is the plane perpendicular to the optical axis of the
camera.
When taking a still image with a plenoptic camera, the user's interaction remains at a basic
level, so similar refocusing properties can be used. This is not the case for video capture and
live image streaming with a plenoptic camera, since more complex operations are then required.
Scenes and images captured by the lens array of a plenoptic camera are captured from different
angles, and because there are different options for selecting the degree of sharpness of the
different images in the scene, focusing the various scene and image properties can be difficult.
While it may be desirable to utilize automatic refocusing techniques, it is difficult to do so while
keeping the focal plane perpendicular to the optical axis. The reason is that in many cases the
focal plane cannot remain perpendicular to the optical axis, especially during constantly
changing video or live stream broadcasts. Other examples can easily be imagined. For example,
consider the case where an "all-in-focus" mode is used. In this case, the captured scene
produces an image that must intentionally remain sharp regardless of distance. This can mean
an infinite depth of field and a focal surface that is not perpendicular to the optical axis. In
another example, an "interactive focus" mode is used, which allows the user to indicate and
select objects of interest. In this case, the focal plane needs to be computationally positioned at
the correct distance for each image, and the focus is perpendicular to the optical axis only for
the objects that must be kept in sharp focus. In a similar case, only close objects may be
selected to produce a sharp image. In such cases, the depth of field is kept to a small constant
value, and all scene elements beyond a certain distance are rendered differently from close
ones; objects that are out of focus thus appear intentionally blurred. In yet another example, the
camera is positioned such that the focal plane is tilted and therefore not perpendicular to the
optical axis.
[0022]
Returning to the embodiment of FIG. 1, the technique used herein can, in one embodiment, be
applied so that an image (hereinafter defined as a still image or a video capture) is optimally
projected. The image or video can be provided as a broadcast stream, a still image, or a
recorded image, or can be selected by the user via a user selection input device, as will be
appreciated by those skilled in the art. In any case, light field segmentation is performed
accordingly to identify the depth of the area of interest. Then, a focal plane perpendicular to the
optical axis is defined, whereby the area or object of interest remains as sharp as intended. This
technique can be extended to video capture (with or without object tracking) to ensure temporal
consistency between frames. Before discussing the individual steps shown in the embodiment of
FIG. 1, it may be beneficial to consider a system that can be used to implement those steps.
FIGS. 2A and 2B show block diagrams of a system according to one embodiment of the present
principles.
[0023]
The systems of FIGS. 2A and 2B can be utilized to apply the techniques provided by the
embodiment of FIG. 1. In FIGS. 2A and 2B, an automatic refocusing system 200 is shown
utilizing one or more plenoptic cameras and associated techniques. In one embodiment, a
display 210 is shown. Display 210 may be used in conjunction with a computer, a television set,
a projection screen, a mobile device (e.g., a smart phone) and others recognized by one of
ordinary skill in the art, and it can be of any size or shape. In the illustrated example, a large
display such as a projection screen or television display is used to facilitate understanding, but
this is merely an example.
[0024]
FIG. 2A can incorporate all of the components of FIG. 2B in one embodiment. Alternatively, FIG.
2B can be configured with various independent components, as described below. To facilitate
understanding, the embodiment of FIG. 2B is described because the individual components are
visually distinguishable and thus easy to refer to.
[0025]
FIG. 2B shows a system 200 having a display 210. In one embodiment, system 200 includes a
light field video capture device 230 (which may be referred to simply as plenoptic camera 230)
based on plenoptic or camera-array technology. An audio capture device 220 communicates with
camera 230 during processing. Audio capture device 220 is configured with one or more
distributed microphone arrays. In one embodiment, the audio system 220 (i.e., the microphones)
is generally calibrated in conjunction with the video capture device or plenoptic camera 230. In
the example shown in FIGS. 2A and 2B, one or more processors indicated by reference numeral
290 are provided. The processor 290 communicates during processing with the display 210, the
plenoptic camera 230 and the audio capture device 220, as shown in dashed lines. If more than
one processor is present, the multiple processors also communicate with one another during
processing. The processors may be incorporated into various areas within the display 210, the
camera 230, or the audio capture device 220, or alternatively may be independent as shown in
the example of FIGS. 2A and 2B. Thus, in an embodiment where one or more processors are
incorporated into the camera 230 and/or the audio capture device 220, the camera 230 and the
audio capture device 220 can transmit and receive digital data to and from each other and
communicate with each other during processing. Further, the processors can communicate with
other computers or computing environments and networks (not shown).
[0026]
Returning now to FIG. 1, in the step indicated by reference numeral 105, the dominant audio
source (or sound source) associated with the image being displayed is determined. In one
embodiment, the dominant audio source can be determined on a continuous basis if the images
appear in quick succession, one immediately after another, in the form of images or video in a
live or recorded program and/or broadcast. In one embodiment, as appreciated by those skilled
in the art, audio processing techniques can be used to derive the dominant audio direction, for
example by an audio source localization algorithm. As indicated by reference numeral 110, the
direction of the dominant audio source is selected such that the direction of interest for the
audio is determined. In another embodiment, steps 105 and 110 may be combined into one step,
whereby the audio source localization algorithm directly outputs the direction of the dominant
audio source, to be considered later as the target source to be refocused.
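The patent leaves the localization algorithm unspecified; as one common concrete choice, the following sketch estimates the dominant direction from a single microphone pair using GCC-PHAT time-difference-of-arrival estimation. All names and parameters here are illustrative assumptions, not the patent's prescribed method.

```python
import numpy as np

def gcc_phat_direction(sig_a, sig_b, fs, mic_distance, c=343.0):
    """Estimate a source direction (radians from broadside) for one
    microphone pair using GCC-PHAT, a standard audio source
    localization building block."""
    n = len(sig_a) + len(sig_b)
    spec = np.fft.rfft(sig_a, n=n) * np.conj(np.fft.rfft(sig_b, n=n))
    spec /= np.abs(spec) + 1e-12             # PHAT weighting: keep phase only
    cc = np.fft.irfft(spec, n=n)
    max_shift = int(fs * mic_distance / c)   # physically possible delay range
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    tau = (np.argmax(np.abs(cc)) - max_shift) / fs   # inter-mic delay, seconds
    return np.arcsin(np.clip(tau * c / mic_distance, -1.0, 1.0))
```

Repeating the estimate over successive frames yields the continuous, per-image dominant direction described above.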
[0027]
Applying steps 105 through 110 of FIG. 1 to the example discussed with the embodiments of
FIGS. 2A and 2B results in a system as shown in FIG.
[0028]
FIG. 3 illustrates an embodiment in which audio processing techniques are used to derive the
audio direction associated with the dominant audio source (i.e., step 110 of FIG. 1).
In the embodiment of FIG. 3, a region of interest 340 associated with at least one dominant
audio direction is set. As indicated by reference numeral 302, a dominant audio direction is
determined, and as indicated at 305, the width 340 of the region of interest is also calculated. In
one example, the region of interest is narrow for a dominant point source, meaning that the
dominant audio comes from a single direction (e.g., a single person singing). In another
example, the width is set larger for more spread-out or moving sources. This is the case when
the dominant audio comes from several coexisting sources, for example when it comes from a
music band with several instruments and lead vocals on a stage, as shown on the screen
provided in the upper right corner of FIG. 3 and indicated by reference numeral 320.
[0029]
In one example where an audio source localization algorithm is used to further establish the
audio direction, audio signal analysis may be performed to provide an approximate estimate of
the angular width of the area of interest. The audio source localization algorithm is used to
determine the angular range of the area of interest, resulting in a "cone of interest" as shown at
410 in FIG. 4.
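As a sketch of how that angular width might be estimated, one simple heuristic (an assumption here, not the patent's specified procedure) is to scan candidate directions with the localization algorithm's response power and keep the contiguous range around the peak that stays above a fraction of the peak power:

```python
import numpy as np

def cone_of_interest(power_per_angle, angles, rel_threshold=0.5):
    """Given a steered-response power estimate per candidate angle,
    return (dominant_angle, width) of the cone of interest: the
    contiguous angular range around the peak whose power stays above
    rel_threshold * peak. A heuristic sketch only."""
    peak = int(np.argmax(power_per_angle))
    cutoff = rel_threshold * power_per_angle[peak]
    lo = peak
    while lo > 0 and power_per_angle[lo - 1] >= cutoff:
        lo -= 1
    hi = peak
    while hi < len(angles) - 1 and power_per_angle[hi + 1] >= cutoff:
        hi += 1
    return angles[peak], angles[hi] - angles[lo]
```

A narrow point source then yields a small width, while a spread-out source such as a band yields a wider cone, matching the behavior described for FIG. 3.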
[0030]
In one embodiment, as shown in FIG. 5, the immersive user experience can be further improved
by utilizing a state-of-the-art distributed microphone array 520. In the embodiment of FIG. 6, a
similar distributed microphone array system 620 can be used to improve automatic audio
localization. In FIG. 6, the dominant audio source as described in step 110 of FIG. 1 is
determined by exploiting both the audio and video signals to determine the direction of interest.
Similar to FIG. 4, a video cone of interest 610 is determined, where the estimate is improved
thanks to the microphone array 620. A localization algorithm is applied to the audio signal, and
the direction of the dominant audio source is considered as the direction of interest. In one
embodiment, an object tracking algorithm can be applied to the video signal; in this case, the
direction of a moving object in the scene can potentially be considered as the direction of
interest associated with the dominant audio source. In one embodiment, the audio and video
candidates, each detected independently from the audio and the video respectively, can be
combined in a late fusion and applied to find an optimized direction of interest.
[0031]
Returning to FIG. 1, the next step, referenced by reference numeral 115, is performed. At step
115, an audio scene of interest is determined, and at step 120 an audio focus analysis is
performed on the audio scene. After determining the audio direction of interest, audio beams are
generated in the audio scene in order to perform an audio focus or audio zoom. In one
embodiment, the audio beam is generated utilizing beamforming, as will be appreciated by those
skilled in the art. Beamforming or spatial filtering is a signal processing technique that uses
directional transmission or reception of signals to achieve spatial selectivity. This is achieved by
combining the elements in a phased-array fashion in such a way that certain signals experience
constructive interference at certain angles while others experience destructive interference. To
change the directivity of the array, the phase and relative amplitude of each signal are
controlled so as to form constructive and/or destructive interference patterns. Adaptive
beamforming is used to detect and estimate the signal of interest at the output of a sensor array
by means of optimal spatial filtering and interference rejection. In this way, audio signals from
the microphone array matching only a predetermined signal pattern can be selected for the
target source. The audio focus can form multiple beams simultaneously and can track active
speakers. The concept of beamforming as provided by step 120 is further illustrated by the block
diagram of FIG. 7.
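As a concrete, deliberately simple illustration, the following sketch implements fixed delay-and-sum beamforming for a linear array; the adaptive beamformers mentioned above would replace the uniform weights with optimized, interference-rejecting ones. The array geometry and parameter names are assumptions for the example.

```python
import numpy as np

def delay_and_sum(mic_signals, mic_positions, angle, fs, c=343.0):
    """Steer a linear microphone array toward `angle` (radians from
    broadside) by delaying each channel and summing.

    mic_signals:   array of shape (num_mics, num_samples)
    mic_positions: mic coordinates along the array axis, in meters
    """
    delays = mic_positions * np.sin(angle) / c    # per-mic delay, seconds
    out = np.zeros(mic_signals.shape[1])
    for sig, d in zip(mic_signals, delays):
        shift = int(round(d * fs))                # nearest-sample delay
        out += np.roll(sig, -shift)               # edge wrap-around ignored here
    return out / len(mic_signals)
```

Signals arriving from the steered angle add coherently; signals from other directions partially cancel, which is the spatial selectivity the text describes.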
[0032]
In one embodiment, as shown in FIG. 7, the width of the audio beam depends on the size and
settings of the microphone array. The target source producing the sound is illustratively given as
shown at 710 in FIG. 7. Target source 750 produces an acoustic beam as shown at 760. The
sound is received by the microphone array 720 and processed using audio beamforming
techniques to provide an improved target source from both a video and an audio standpoint.
Noise and other interference (780) are properly filtered out. The resulting audio beam 760 is
shown in the graphical representation of FIG. 8 for an embodiment similar to that shown in
FIGS. 2-4, and, for an embodiment with a distributed microphone array configuration (similar to
FIGS. 5 and 6), in the graphical representation at reference numeral 960 in FIG. 9.
[0033]
In FIG. 10, another embodiment is shown where the audio system has a distributed microphone
array system 1020. A dominant audio direction 940 is determined. Thereafter, as indicated by
reference numeral 1050, additional information regarding the depth of interest can be obtained
by calculating the intersection between the regions of interest, so that the other steps of FIG. 1,
such as the beamforming of step 120, can then be performed.
[0034]
In an alternative embodiment (not shown), a user interaction system can be provided, wherein
the user chooses (i) a direction from among the possible directions identified based on the
audio, and (ii) a width. Based on this selection, in one embodiment, audio beamforming
techniques such as those described above can be used to focus on the sound coming from the
particular selected direction. The final focal surface and depth of field are then selected and
rendered according to the direction and width information as before.
[0035]
In the embodiments of FIGS. 8-10, the beamforming output signal is x(t). The output signal
includes sounds from both positions A and B, but the targeted zooming plane may only include
sounds from position B. For this example, the audio signal y(t) from the microphone near
position B is exploited, so that the final audio output is given in one embodiment as:

x'(t) = alpha * x(t) + (1 - alpha) * y(t)

where "alpha" is a weighting factor. In this example, as the equation shows, the lower the value
of "alpha", the more the audio signal y(t) recorded by the local microphone near position B
contributes to the final audio focus.
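Expressed as code, the blend of [0035] is a one-line weighted sum; `alpha`, `x`, and `y` follow the equation above, with `y` the signal of the microphone near position B.

```python
import numpy as np

def audio_focus_mix(x, y, alpha):
    """Final audio focus per the equation of [0035]: blend the
    beamformer output x(t) with the signal y(t) of the microphone
    near target position B, weighted by alpha."""
    return alpha * np.asarray(x) + (1.0 - alpha) * np.asarray(y)
```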
[0036]
Referring back to FIG. 1, the final step 130 is performed. Step 130 is a video refocusing step
driven by the audio source of interest. Scene elements in the cone of interest are rendered in
focus, and the remaining focal surfaces (and depths of field) are automatically estimated in a
meaningful way. This provides access to a new, attractive, audio-based way of automatically and
dynamically selecting the focal plane and the depth of field. In this way, the audio zoom feature
is also enhanced with a strongly related video focus.
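The classic way to realize such a synthetic refocus from plenoptic data is shift-and-add over the sub-aperture views; the following is a minimal sketch under that assumption (the `views` array matches the extraction sketch after paragraph [0003], and `slope` is an illustrative parameter selecting the focal plane).

```python
import numpy as np

def refocus(views, slope):
    """Synthetic refocusing by shift-and-add: translate each sub-aperture
    view in proportion to its (u, v) offset from the array center, then
    average. `slope` (pixels per sub-aperture step) selects the focal
    plane; scene elements at the matching depth come out sharp, the
    rest are blurred by the averaging."""
    n_u, n_v, h, w = views.shape
    out = np.zeros((h, w))
    for u in range(n_u):
        for v in range(n_v):
            du = int(round(slope * (u - n_u // 2)))
            dv = int(round(slope * (v - n_v // 2)))
            out += np.roll(views[u, v], (du, dv), axis=(0, 1))
    return out / (n_u * n_v)
```

In the method of FIG. 1, the direction and width of the cone of interest would drive the choice of `slope` (and of which image regions must stay sharp), tying the audio zoom to the video refocus.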