JP2018137677

код для вставкиСкачать
Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2018137677
Abstract: [Problem] To provide a viewer with a clarified desired sound while improving the sense of
presence of the video and sound provided to the viewer. [Solution] An information processing
apparatus including: a video information acquisition unit that acquires video information from an
imaging device; and a control unit that controls mixing of a first area sound emitted from a first
area included in the range imaged by the imaging device and a wide-range sound emitted from a
range wider than the first area, wherein the control unit selects a control pattern from a plurality
of control patterns that differ in how the amplification factor of the first area sound or of the
wide-range sound changes according to the expansion of the first area, and controls the mixing
according to the selected control pattern. [Selected figure] Figure 2
INFORMATION PROCESSING APPARATUS, METHOD, PROGRAM, AND INFORMATION
PROCESSING SYSTEM
[0001]
The present invention relates to an information processing apparatus, method, program, and
information processing system.
[0002]
BACKGROUND ART In recent years, with the development of information communication
technology, technology for transmitting the situation of a remote place has been researched and
developed.
03-05-2019
1
Specifically, there is a system for distributing video and sound of a remote place. In such a
system, it is desirable to give the viewer a sense of realism as if the viewer were at a remote
location.
[0003]
As a method of giving the viewer a sense of presence, it is conceivable to suppress noise other
than the sound the viewer wants to hear. For example, Patent Document 1 discloses an invention
relating to a sound collection device that attenuates background noise using a microphone array.
The sound collection device is assumed to be used in teleconference systems and the like.
[0004]
Moreover, in order to give the viewer a sense of presence, it is desirable that the delivered video
and sound match in position. For example, Patent Document 2 discloses an invention relating to
an image monitoring apparatus that acquires video obtained by shooting a surveillance area
together with audio of the surveillance area, and transmits the audio acquired during
transmission of the acquired video.
[0005]
JP 2007-235358 A JP 2010-233145 A
[0006]
However, in the prior art represented by the invention described above, there is a problem that
the sense of reality may be insufficient.
For example, in the teleconferencing system using the sound collection device disclosed in Patent
Document 1, only local sound is delivered at all times, and the viewer may feel discomfort.
Further, in the image monitoring device disclosed in Patent Document 2, since the sound of the
entire monitoring area is transmitted, it is difficult for the viewer to locally listen to the sound of
a part of the monitoring area.
[0007]
Accordingly, the present invention has been made in view of the above problems, and an object
of the present invention is to provide a new and improved information processing apparatus,
method, program, and information processing system capable of both providing the viewer with
a clarified desired sound and improving the sense of presence of the video and sound provided
to the viewer.
[0008]
To solve the above problems, according to one aspect of the present invention, there is provided
an information processing apparatus including: a video information acquisition unit that acquires
video information from an imaging device; and a control unit that controls mixing of a first area
sound emitted from a first area included in the range imaged by the imaging device and a
wide-range sound emitted from a range wider than the first area, wherein the control unit selects
a control pattern from a plurality of control patterns that differ in how the amplification factor of
the first area sound or the amplification factor of the wide-range sound changes according to the
expansion of the first area, and controls the mixing according to the selected control pattern.
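As a rough, non-authoritative sketch of this aspect, the selection-and-mixing behavior can be illustrated in Python. The pattern names, shapes, and numeric slopes below are assumptions for illustration, not values from the patent:

```python
# Illustrative sketch: each control pattern maps the expansion (normalized
# size) of the first area to a pair of amplification factors, and the
# apparatus mixes the two sounds using the selected pattern. All slope
# values here are hypothetical.

def pattern_gentle(expansion):
    # First area sound's gain rises slowly; wide-range gain falls slowly.
    return 1.0 + 0.5 * expansion, 1.0 - 0.3 * expansion

def pattern_steep(expansion):
    # First area sound's gain rises quickly; wide-range gain falls quickly.
    return 1.0 + 2.0 * expansion, 1.0 - 0.8 * expansion

CONTROL_PATTERNS = {"gentle": pattern_gentle, "steep": pattern_steep}

def mix(first_area_samples, wide_range_samples, pattern_name, expansion):
    """Mix the first area sound and the wide-range sound sample by sample,
    using the amplification factors given by the selected control pattern."""
    g_area, g_wide = CONTROL_PATTERNS[pattern_name](expansion)
    return [g_area * a + g_wide * w
            for a, w in zip(first_area_samples, wide_range_samples)]
```

The two entries of CONTROL_PATTERNS differ only in how the amplification factors change with the expansion of the first area, which is the distinction the apparatus selects between.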
[0009]
Each of the plurality of control patterns may indicate a relationship in which the amplification
factor of the first area sound is increased according to the expansion of the first area in the video
information.
[0010]
Each of the plurality of control patterns may indicate a relationship in which the amplification
factor of the wide-range sound decreases as the first area of the video information is expanded.
[0011]
Among the plurality of control patterns, the rise of the amplification factor of the first area
sound relative to the expansion of the first area may differ.
[0012]
The control unit may measure the degree of user's attention to the first area in the video
information, and select the control pattern from the plurality of control patterns based on the
degree of user's attention.
[0013]
The control unit may select a control pattern in which the rise of the amplification factor of the
first area sound is steeper as the degree of user's attention is higher, and a control pattern in
which the rise of the amplification factor of the first area sound is more gradual as the degree of
user's attention is lower.
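A minimal sketch of this selection rule, under the assumptions (not stated in the patent) that the attention degree is normalized to [0, 1] and that "steepness" is a single slope parameter:

```python
# Hypothetical sketch: map a measured degree of user attention to the slope
# of the first area sound's amplification-factor rise. The slope range
# (0.5 .. 2.0) is an illustrative assumption.

def select_slope(attention_degree):
    """Higher attention -> steeper rise; lower attention -> more gradual."""
    if not 0.0 <= attention_degree <= 1.0:
        raise ValueError("attention_degree must be in [0, 1]")
    # Linear interpolation between a gradual slope (0.5) and a steep one (2.0).
    return 0.5 + 1.5 * attention_degree
```

The same shape of rule applies to the later selection criteria (number of people, sound pressure, conversation, image quality): a monotone map from the measured quantity to the steepness of the gain rise.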
[0014]
The control unit may measure the number of people staying in the first area, and select the
control pattern from the plurality of control patterns based on the number of people staying in
the first area.
[0015]
The control unit may select a control pattern in which the rise of the amplification factor of the
first area sound is steeper as the number of people staying in the first area is larger, and a
control pattern in which the rise of the amplification factor of the first area sound is more
gradual as the number of people staying in the first area is smaller.
[0016]
The control unit may measure the sound pressure of sound information in the first area, and
select the control pattern from the plurality of control patterns based on the sound pressure of
the sound information in the first area.
[0017]
The control unit may select a control pattern in which the rise of the amplification factor of the
first area sound is steeper as the sound pressure of the sound information in the first area is
larger, and a control pattern in which the rise of the amplification factor of the first area sound
is more gradual as the sound pressure of the sound information in the first area is smaller.
[0018]
The control unit may determine the presence or absence of conversation in the first area, and
select the control pattern from the plurality of control patterns based on the presence or absence
of conversation in the first area.
[0019]
When there is a conversation in the first area, the control unit may select a control pattern in
which the rise of the amplification factor of the first area sound is steeper than in the control
pattern selected when there is no conversation in the first area.
[0020]
The control unit may select the control pattern from the plurality of control patterns based on
the image quality of the first area in the video information.
[0021]
The control unit may select a control pattern in which a rise of the amplification factor of the first
area sound is steeper as the image quality of the first area in the image information is higher.
[0022]
The plurality of control patterns may include control patterns that differ in the amount of
change of the amplification factor of the first area sound or the wide-range sound per
predetermined amount of expansion of the first area, before and after the size of the first area
in the video information reaches a threshold.
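One way to realize such a threshold-dependent change is a piecewise-linear gain curve; the sketch below assumes that form, with a hypothetical threshold and hypothetical slopes:

```python
# Hypothetical sketch: the amplification factor changes by a different amount
# per unit of expansion before and after the first area's size reaches a
# threshold. The curve is kept continuous at the threshold.

def first_area_gain(size, threshold=0.5, slope_below=0.4, slope_above=1.6):
    """Amplification factor of the first area sound as a function of its
    normalized size in the video (0 = not visible, 1 = fills the frame)."""
    if size <= threshold:
        return 1.0 + slope_below * size
    # Continue from the value at the threshold with a steeper slope.
    return 1.0 + slope_below * threshold + slope_above * (size - threshold)
```

Two control patterns of this family would differ in `slope_below`, `slope_above`, or `threshold`, which is exactly the "amount of change per predetermined expansion" distinction described above.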
[0023]
The control unit may set the first area based on a selection operation by a user.
[0024]
In order to solve the above problems, according to another aspect of the present invention, there
is provided a method including: acquiring video information from an imaging device; selecting a
control pattern from a plurality of control patterns; and controlling, according to the selected
control pattern, mixing of a first area sound emitted from a first area included in the range
imaged by the imaging device and a wide-range sound emitted from a range wider than the first
area.
[0025]
Further, in order to solve the above problems, according to another aspect of the present
invention, there is provided a program for causing a computer to function as an information
processing apparatus including: a video information acquisition unit that acquires video
information from an imaging device; and a control unit that controls mixing of a first area sound
emitted from a first area included in the range imaged by the imaging device and a wide-range
sound emitted from a range wider than the first area, wherein the control unit selects a control
pattern from a plurality of control patterns and controls the mixing in accordance with the
selected control pattern.
[0026]
According to another aspect of the present invention, there is provided an information
processing system including an imaging device, a plurality of sound collection devices, an area
sound extraction device, and an information processing device. The area sound extraction device
extracts a first area sound emitted from a first area included in a range captured by the imaging
device from the sounds collected by the plurality of sound collection devices. The information
processing apparatus selects a control pattern from a plurality of control patterns, and controls
mixing of the first area sound and a wide range sound emitted from a range wider than the first
area according to the selected control pattern. An information processing system is provided.
[0027]
As described above, according to the present invention, it is possible to provide the viewer with
a clarified desired sound and, at the same time, to improve the sense of presence of the video
and sound provided to the viewer.
[0028]
FIG. 1 is a diagram for explaining an overview of an information processing system according to
one embodiment of the present invention.
FIG. 2 is a block diagram showing an example of a schematic functional configuration of an
information processing system according to one embodiment of the present invention.
FIGS. 3 to 5 are diagrams for explaining examples of video reproduced in the information
processing system according to one embodiment of the present invention.
FIG. 6 is a diagram showing an example of sound control patterns in the information processing
system according to one embodiment of the present invention.
FIG. 7 is a diagram conceptually showing an example of processing in the information processing
system according to one embodiment of the present invention when no target is selected.
FIG. 8 is a diagram conceptually showing an example of processing in the information processing
system according to one embodiment of the present invention when a target is selected.
FIG. 9 is a diagram conceptually showing an example of processing in the information processing
system according to one embodiment of the present invention when the video is switched.
FIG. 10 is a diagram conceptually showing an example of processing in the information
processing system according to one embodiment of the present invention when the video is
further switched.
FIG. 11 is a diagram showing an example of a sound control pattern in the information
processing system according to a first modification of one embodiment of the present invention.
FIG. 12 is a diagram showing another example of a sound control pattern in the information
processing system according to a second modification of one embodiment of the present
invention.
FIG. 13 is a diagram conceptually showing an example of processing of the information
processing system according to a third modification of one embodiment of the present invention.
FIG. 14 is an explanatory diagram showing a sound control pattern according to an application
example.
FIG. 15 is an explanatory diagram showing the relationship between the amplification factors of
the wide-range sound information and the target area sound information under one sound
control pattern and the area of the target area.
FIG. 16 is an explanatory diagram showing the relationship between the amplification factors of
the wide-range sound information and the target area sound information under another sound
control pattern and the area of the target area.
FIG. 17 is an explanatory diagram showing specific values of the amount of change of the
amplification factor for each sound control pattern.
FIG. 18 is an explanatory diagram showing a specific example of a sound control pattern
indicating the relationship between the zoom magnification of the video and the amplification
factors of the target area sound information and the wide-range sound information.
FIG. 19 is an explanatory diagram showing a specific example of a sound control pattern
indicating the relationship between the virtual distance and the amplification factors of the
target area sound information and the wide-range sound information.
FIG. 20 is an explanatory diagram showing a hardware configuration of the information
processing apparatus according to one embodiment of the present invention.
[0029]
Hereinafter, embodiments of the present invention will be described in detail with reference to
the accompanying drawings.
In the present specification and the drawings, components having substantially the same
functional configuration will be assigned the same reference numerals and redundant description
will be omitted.
[0030]
Further, in the present specification and the drawings, a plurality of components having
substantially the same functional configuration may be distinguished by attaching different
alphabets to the same reference numerals.
For example, a plurality of components having substantially the same functional configuration
are distinguished, as necessary, as the photographing device 400A and the photographing device
400B.
However, when it is not necessary to distinguish each of a plurality of components having
substantially the same functional configuration, each of the components is given only the same
reference numeral. For example, when it is not necessary to distinguish between the
photographing device 400A and the photographing device 400B, each photographing device is
simply referred to as a photographing device 400.
[0031]
<1. One Embodiment of the Present Invention> An information processing system 1
according to an embodiment of the present invention and an information processing apparatus
100 which is one of the components of the information processing system will be described.
[0032]
<1.1. Overview of System> First, an overview of the information processing system 1 will
be described with reference to FIG. 1. FIG. 1 is a diagram for explaining an overview of the
information processing system 1 according to an embodiment of the present invention.
[0033]
The information processing system 1 includes an information processing device 100, a sound
collection device 200, an area sound generation device 300, a photographing device 400, and a
media reproduction device 500. In the information processing system 1, the sound collection
devices 200 and the photographing devices 400 are disposed in the same space and perform
sound collection and photographing in that space.
[0034]
Here, in the information processing system 1, the space targeted for sound collection and
photographing (hereinafter also referred to as a target space) is divided into a plurality of areas
(hereinafter also referred to as areas). For example, the target space is divided into areas A to I
as shown in FIG. 1. The sound collection device 200 collects sound in a range wider than an
area.
[0035]
The area sound generation device 300 generates, for each area, information on a sound assumed
to be perceived in the area, or presumed to be emitted from the area (hereinafter also referred
to as an area sound), based on the sound information obtained by the sound collection of the
sound collection device 200 (hereinafter also referred to as wide-range sound information). For
example, the area sound generation device 300 generates area sound information for each of
the areas A to I based on the wide-range sound information obtained from the sound collection
devices 200A to 200D. An existing technique may be used to generate the area sounds.
[0036]
Then, the information processing apparatus 100 controls the video and sound to be reproduced
by the media reproduction device 500. Specifically, the information processing apparatus 100
provides the media reproduction device 500 with video information by controlling the
photographing device 400. Further, the information processing apparatus 100 generates sound
information (hereinafter also referred to as mixed sound information) by mixing the wide-range
sound information obtained from the sound collection device 200 and the area sound
information obtained from the area sound generation device 300, and provides the generated
mixed sound information to the media reproduction device 500. The media reproduction device
500 reproduces video and sound based on the provided video information and mixed sound
information. For example, the information processing apparatus 100 instructs the photographing
device 400A to perform zoom-in control centered on the area E as shown in FIG. 1. The
photographing device 400A provides the media reproduction device 500, as video information,
with video cropped from the video obtained by shooting based on the instruction, or with video
enlarged by controlling the optical system based on the instruction. In addition, the information
processing apparatus 100 provides the media reproduction device 500 with mixed sound
information obtained by mixing the area sound information for the area E, which is displayed
large in the video, with the wide-range sound information.
[0037]
Thus, the information processing system 1 reproduces a mix of the area sound and the
wide-range sound in accordance with the video. Therefore, the sound assumed to be the sound
the viewer wants to hear in the reproduced video (hereinafter also referred to as a desired
sound) can be clarified while sounds other than the desired sound (hereinafter also referred to
as background sounds) are still reproduced. For example, as shown in FIG. 1, while the area
sound (that is, the desired sound) of the area E, which is in focus in the video reproduced by the
media reproduction device 500, is reproduced, the wide-range sound around the area E (that is,
the background sound) is also reproduced. Therefore, not only a voice such as "What about
this?" in the area E but also a voice saying "Hello" in the area F adjacent to the area E reaches
the viewer's ear, giving the viewer a sense of presence as if the viewer were in the vicinity of
the area E.
[0038]
Although FIG. 1 illustrates an example in which a plurality of sound collection devices 200 and
photographing devices 400 are arranged, there may be a single sound collection device 200 and
a single photographing device 400.
[0039]
<1.2. System Configuration> Next, the configuration of the information processing system 1
according to an embodiment of the present invention will be described with reference to FIG. 2.
FIG. 2 is a
block diagram showing an example of a schematic functional configuration of the information
processing system 1 according to an embodiment of the present invention.
[0040]
As shown in FIG. 2, the information processing system 1 includes an information processing
device 100, a sound collection device 200, an area sound generation device 300, a
photographing device 400, and a media reproduction device 500. Each device is connected via
communication. The functions of each device will be described in detail below.
[0041]
[Information Processing Device] The information processing device 100 includes a media control
unit 101 and a sound mixing unit 102. The media control unit 101 and the sound mixing unit
102 receive information from an external device, and transmit information to an external device,
via a communication unit (not shown).
[0042]
The media control unit 101 controls video and sound to be played back by the media playback
apparatus 500. Specifically, the media control unit 101 controls an imaging device 400 to
control an image to be reproduced by the media reproduction device 500. For example, the
media control unit 101 generates video control information indicating the presence or absence
of shooting, control parameters, the presence or absence of distribution, and the like, and
provides the generated video control information to the photographing device 400. The control
parameters include zoom information relating to digital zoom, which crops out a part of the
video, and optical zoom, which optically enlarges or reduces the video, as well as shooting
direction information relating to pan and tilt. Hereinafter, the case where digital zoom is
performed as the video control will be described.
[0043]
The media control unit 101 also controls the sound to be reproduced by the media reproduction
device 500 using the video information acquired from the photographing device 400.
Specifically, the media control unit 101, as a video information acquisition unit, acquires video
information from the photographing device 400, and, as a sound output control unit, generates
control information relating to sound output based on area sound information and wide-range
sound information (hereinafter also referred to as sound control information) according to
information relating to the display mode of an area in the video related to the video information
(hereinafter also referred to as an area close-up index). An example of the area close-up index is
information indicating the size of an area. The areas involved in the generation of the sound
control information include a designated area identified based on designation operation
information (hereinafter also referred to as a target area), an area specified according to the
target area (hereinafter also referred to as a first near area), and an area specified according to
the area close-up index (hereinafter also referred to as a second near area). For example, the
media control unit 101 generates sound control information relating to the mixing of sounds
based on the area sound information and the wide-range sound information for the target area,
the first near area, or the second near area. The generated sound control information is
provided to the sound mixing unit 102. In the following, when the first near area and the second
near area need not be distinguished from each other, they are simply referred to as near areas.
[0044]
Next, examples of the sound control processing will be described with reference to FIGS. 3 to 5
and FIG. 6. FIGS. 3 to 5 are diagrams for explaining examples of the video reproduced in the
information processing system 1 according to an embodiment of the present invention. FIG. 6 is
a diagram showing an example of sound control patterns in the information processing system 1
according to an embodiment of the present invention.
[0045]
(Sound Control Pattern 1) The media control unit 101 first attempts to set a target area.
Specifically, the media control unit 101 sets a target area based on an area selection operation or
an object selection operation in a video related to video information. For example, the media
control unit 101 determines whether or not an area or an object on an image to be a focus target
is selected. In the example of FIG. 3, the target area is not set because the target is not selected.
Note that area or object selection operation information may be provided from a device such as
the media playback device 500.
[0046]
When the target area is not set, the media control unit 101 sets a second nearby area.
Specifically, the media control unit 101 sets an area having an area equal to or larger than the
threshold in the video as a second near area. For example, the area H and the area I as shown in
FIG. 3 are set as the second nearby area. Note that an area having a relatively large area among
the plurality of areas in the video may be set as the second nearby area. As described above,
when no target is selected, the viewer can be given a sense of presence with respect to the
video by mixing in the area sounds of areas relatively close to the virtual viewing position of the
video (for example, the shooting position).
[0047]
Then, the media control unit 101 generates sound control information relating to the mixing of
the area sound and the wide-range sound according to whether a target area is set. Volume
control information is one kind of sound control information relating to the mixing of sounds.
For example, since the target area (that is, the target) is not selected, the media control unit 101
generates sound control information using the volume amplification factors illustrated as pattern
1 in FIG. 6. Here, since no target is selected, the volume amplification factor of the wide-range
sound is set higher than that of the near area sound.
[0048]
In the example of FIG. 6, the volume of some sounds is amplified (a volume amplification factor
of 1 or more) and the volume of the other sounds is attenuated (a volume amplification factor of
less than 1). However, the volumes of all the sounds may be amplified or attenuated as long as
the relative relationship of the volumes is maintained. For example, for the sound control pattern 1, the
volume amplification factor of the wide area sound may be set to 3.6, and the volume
amplification factor of the near area sound may be set to 1.0. Further, with regard to the sound
control pattern 1, the volume amplification factor of the wide area sound may be set to 0.9, and
the volume amplification factor of the near area sound may be set to 0.25.
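The point that only the relative relationship of the volumes matters can be checked with a short sketch. The 3.6 : 1.0 and 0.9 : 0.25 values are the ones in the paragraph above; the `rescale` helper itself is a hypothetical illustration:

```python
# Sketch: scaling all volume amplification factors by a common factor
# preserves their relative relationship (3.6 : 1.0 has the same ratio
# as 0.9 : 0.25).

def rescale(gains, factor):
    """Multiply every amplification factor by the same constant."""
    return {name: g * factor for name, g in gains.items()}

pattern1 = {"wide_range": 3.6, "near_area": 1.0}
attenuated = rescale(pattern1, 0.25)  # wide_range -> 0.9, near_area -> 0.25

# The ratio between the two sounds is unchanged by the common scaling.
assert (pattern1["wide_range"] / pattern1["near_area"]
        == attenuated["wide_range"] / attenuated["near_area"])
```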
[0049]
(Sound Control Pattern 2) When the target area is set, the media control unit 101 sets a first
near area. For example, when the area E, or a person located in the area E, is selected as a
target, the area E is set as the target area. Then, as shown in FIG. 4, the media control unit 101
sets the area F, which is an area other than the target area E and whose area in the video is
equal to or larger than the threshold, as the first near area.
[0050]
Then, the media control unit 101 generates sound control information relating to the mixing of
the area sound and the wide-range sound according to the area close-up index for the target
area. Specifically, the media control unit 101 controls the volume ratio of the area sound and
the wide-range sound according to the area close-up index: the smaller the area close-up index
for the target area (that is, the smaller the degree of close-up), the larger the ratio of the
wide-range sound. For example, since the area E is selected as the target area (that is, the
target), the media control unit 101 compares the area of the area E in the video with a first
threshold. When it determines that the area of the area E is less than the first threshold, the
media control unit 101 generates sound control information for causing the sound mixing unit
102 to mix the wide-range sound, the near area sound, and the target area sound with the
volume amplification factors shown in pattern 2 of FIG. 6. Here, the volume amplification
factors are set so that they become lower in the order of the wide-range sound, the near area
sound, and the target area sound. As a result, the wide-range sound is heard louder than the
target area sound, so the viewer can feel that the target area is far away. Furthermore, by
mixing in the first near area sound, the mixed sound becomes layered, which can improve the
sense of presence of the mixed sound. In addition, since the first near area sound is heard softer
than the wide-range sound and louder than the target area sound, the viewer can be given a
sense of presence with respect to the first near area sound.
[0051]
(Sound Control Pattern 3) The media control unit 101 increases the ratio of the target area
sound as the area close-up index for the target area becomes larger (that is, as the degree of
close-up becomes larger). For example, as shown in FIG. 5, when it determines that the area of
the target area E in the video is equal to or greater than the first threshold, the media control
unit 101 generates sound control information for causing the sound mixing unit 102 to mix the
wide-range sound, the near area sound, and the target area sound with the volume amplification
factors shown in pattern 3 of FIG. 6. Here, the volume amplification factors are set so that they
become higher in the order of the wide-range sound, the near area sound, and the target area
sound. As a result, the target area sound is heard louder than the wide-range sound and the
first near area sound, so the viewer can feel that the target area is closer. In addition, since the
first near area sound is heard softer than the target area sound and louder than the wide-range
sound, the viewer can be given a sense of presence with respect to the first near area sound.
[0052]
Although an example has been described above in which the volume amplification factor is
determined according to the result of comparing the area of a region with the first threshold,
the volume amplification factor may instead be determined according to a numerical value
related to the area of the region. For example, the media control unit 101 may perform control
to increase the volume amplification factor in proportion to the increase of the area of the
region.
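A minimal sketch of this proportional variant, assuming a base gain and a proportionality coefficient that are illustrative rather than from the patent:

```python
# Hypothetical sketch: the volume amplification factor grows in proportion
# to the region's area in the video, instead of switching at a threshold.

def proportional_gain(region_area, base=1.0, coefficient=2.0):
    """Volume amplification factor proportional to the region's normalized
    area (0 = not visible, 1 = fills the frame)."""
    return base + coefficient * region_area
```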
[0053]
(Switching of Sound Control Patterns) The media control unit 101 switches the sound control
pattern according to the situation. Specifically, the media control unit 101 generates sound
control information in accordance with the designation of the target area. For example, the
media control unit 101 generates sound control information corresponding to one of the
patterns shown in FIG. 6 in accordance with a change in the presence or absence of the target
area, and provides the generated sound control information to the sound mixing unit 102.
Thereby, the sound control pattern can be switched.
[0054]
The media control unit 101 may also generate sound control information according to a change in the area close-up index. For example, the media control unit 101 generates sound control information corresponding to one of the patterns shown in FIG. 6 according to a change in the area of the target area or the near area in the video, and provides the generated sound control information to the sound mixing unit 102.
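The pattern switching described in the preceding paragraphs might be sketched as a simple selection function. The threshold value and the exact gain triples below are placeholders patterned after the amplification factors used in the processing examples of this embodiment (patterns 1 to 3 of FIG. 6); they are not a definitive reading of the figure.

```python
def select_pattern(target_area_ratio, threshold=0.25):
    """Return (wide_gain, near_gain, target_gain).

    target_area_ratio is None when no target area is set (pattern 1);
    otherwise it is the fraction of the frame occupied by the target
    area, compared against a hypothetical first threshold."""
    if target_area_ratio is None:
        return (1.8, 0.5, 0.0)   # pattern 1: no target, wide sound dominant
    if target_area_ratio < threshold:
        return (1.8, 0.5, 0.2)   # pattern 2: target area still small/distant
    return (0.2, 0.5, 1.8)       # pattern 3: target area close up
```

Selecting the gains in one place like this mirrors how the media control unit 101 is described as emitting a single sound control message per situation.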
03-05-2019
16
[0055]
The functions of the media control unit 101 have been described above. Next, the function of the sound mixing unit 102 will be described. The sound mixing unit 102 performs mixing processing of the wide-range sound information and the area sound information based on an instruction from the media control unit 101. Specifically, the sound mixing unit 102 mixes the sound information designated as the mixing target by the sound control information from the media control unit 101, at the volume amplification factors specified for that sound information, and generates mixed sound information. For example, the sound mixing unit 102 acquires the area sound information and the wide-range sound information indicated by the sound control information, and generates mixed sound information related to a mixed sound of the sound according to the acquired area sound information and the sound according to the wide-range sound information, mixed at the volume amplification factors indicated by the sound control information. Then, the sound mixing unit 102 provides the generated mixed sound information to the media reproduction device 500. Note that the sound mixing unit 102 may also provide the area sound or the wide-range sound to the media reproduction device 500 as it is.
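A minimal sketch of the mixing processing performed by the sound mixing unit 102 — a per-source weighted sum over sample sequences — might look as follows. This is purely illustrative: the actual unit presumably operates on streamed, timestamped audio rather than plain lists.

```python
def mix(sounds, gains):
    """Mix equal-length sample sequences into one sequence by a
    weighted sum, with one volume amplification factor per source."""
    if len(sounds) != len(gains):
        raise ValueError("one gain per sound source required")
    length = len(sounds[0])
    # For each sample index, sum gain * sample over every source.
    return [sum(g * s[i] for s, g in zip(sounds, gains))
            for i in range(length)]
```

For instance, mixing a wide-range sound at factor 1.8 with a near area sound at factor 0.5 is `mix([wide, near], [1.8, 0.5])`.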
[0056]
[Sound Collection Device] The sound collection device 200 picks up sound in the vicinity of the
sound collection device 200. Specifically, the sound collection device 200 generates sound
information by sound collection. For example, the sound collection device 200 is a microphone
or a microphone array.
[0057]
[Area Sound Generation Device] The area sound generation device 300 generates area sound
information from sound information. Specifically, the area sound generation device 300
generates area sound information based on the wide-range sound obtained from the sound
collection device 200. For example, the area sound generation device 300 generates area sound
information by extracting sounds in a specific area from a wide area sound. The generated area
sound information is provided to the sound mixing unit 102 of the information processing
apparatus 100.
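The embodiment does not fix how sounds of a specific area are extracted from the wide-range sound. One common approach with a microphone array is delay-and-sum beamforming, sketched here in a deliberately simplified form (integer sample delays, list-based samples); this is an assumed technique for illustration, not the method claimed by the document.

```python
def delay_and_sum(channels, delays):
    """Toy delay-and-sum beamformer: shift each microphone channel by
    its per-area delay (in samples) and average. Signals arriving from
    the steered direction align and reinforce; others partially cancel."""
    length = len(channels[0])
    out = []
    for i in range(length):
        acc = 0.0
        for ch, delay in zip(channels, delays):
            j = i - delay
            acc += ch[j] if 0 <= j < length else 0.0  # zero-pad edges
        out.append(acc / len(channels))
    return out
```

Steering toward different areas with different delay sets would yield one extracted area sound per area, analogous to the area sound information (audioA to audioI) generated here.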
[0058]
[Photographing Device] The photographing device 400 photographs the periphery of the photographing device 400. Specifically, the photographing device 400 performs photographing based on an instruction from the media control unit 101, and provides the video information obtained by the photographing. For example, the photographing device 400 photographs the space in which the photographing device 400 is installed based on the video control information from the media control unit 101, and provides video information such as an image (still image or moving image) obtained by the photographing to the media reproduction device 500.
[0059]
[Media Reproduction Device] The media reproduction device 500 reproduces video and sound.
Specifically, the media reproduction device 500 reproduces the video and the sound based on the
video information provided from the imaging device 400 and the mixed sound information
provided from the sound mixing unit 102 of the information processing device 100. For example,
the media playback device 500 is a display device with a sound output function. The media playback device 500 may also be an assembly of a plurality of devices, such as a sound output device and a display device, and each of the plurality of devices may operate independently.
[0060]
Also, the media playback device 500 may operate as an input device. Specifically, the media
playback device 500 receives a user's operation input and generates input information. For
example, the media reproduction device 500 receives an operation of designating a target by the
user, and generates the above-described designation operation information.
[0061]
<1.3. Process of System> Next, processing of the information processing system 1 according to an embodiment of the present invention will be described. Here, the flow of processing in each situation of target selection and video will be described with reference to FIGS. 7 to 10, respectively.
[0062]
(Target Unselected) First, processing in the case where a target is not selected will be described with reference to FIG. 7. FIG. 7 is a diagram conceptually showing an example of processing when a target is not selected in the information processing system 1 according to an embodiment of the present invention.
[0063]
The sound collection device 200 continuously transmits the wide-range sound information to the sound mixing unit 102 and the area sound generation device 300 (step S601). Specifically, the sound collection device 200 generates the wide-range sound information by sound collection. A sound ID (identifier) for identifying sound information is assigned to each piece of sound information. For example, the sound ID audioW is assigned to the wide-range sound information.
[0064]
The area sound generation device 300 generates area sound information based on the wide-range sound information (step S602), and transmits the generated area sound information to the sound mixing unit 102 (step S603). Specifically, the area sound generation device 300 generates area sound information (sound IDs: audioA to audioI) for the areas A to I from the wide-range sound information. Then, the area sound generation device 300 transmits the generated area sound information to the sound mixing unit 102. The area sound generation device 300 may transmit the area sound information in response to a request from the sound mixing unit 102.
[0065]
The media control unit 101 transmits a video control message as video control information to the imaging device 400 (step S604). Specifically, the media control unit 101 generates a video control message instructing the start of shooting, the start of video distribution, and the setting of the zoom ratio to 100% (that is, no zoom). Then, the media control unit 101 transmits the generated video control message to the imaging device 400.
[0066]
The photographing apparatus 400 operates based on the video control message, and transmits
video information to the media control unit 101 (step S605). Specifically, the imaging device 400
starts imaging based on the received video control message, and transmits video information
obtained by imaging to the media control unit 101. Note that the video information may not be
transmitted to the media playback device 500 at this time.
[0067]
Next, the media control unit 101 sets a target area (step S606). Specifically, the media control
unit 101 sets a target area based on the target selected by the viewer. Here, no target area is set
because a target is not selected.
[0068]
Also, the media control unit 101 sets a near area (step S607). Specifically, the media control unit 101 sets the near area based on the video information. Here, since no target area is set, the area H and the area I, whose areas in the video are equal to or larger than the threshold, are set as the second near area.
[0069]
Then, the media control unit 101 notifies the sound mixing unit 102 of a sound control message as sound control information (step S608). Specifically, the media control unit 101 notifies the sound mixing unit 102 of a sound control message for generating mixed sound information from the wide-range sound information and the second near area sound information. Here, since no target is selected, that is, no target area is set, sound control pattern 1 is applied, and a sound control message indicating a volume amplification factor of 1.8 for audioW and a volume amplification factor of 0.5 for audioH and audioI is notified.
[0070]
The sound mixing unit 102 generates mixed sound information based on the sound control message (step S609). Specifically, based on the received sound control message, the sound mixing unit 102 mixes the sound information of audioW at a volume amplification factor of 1.8 and the sound information of audioH and audioI at a volume amplification factor of 0.5 each, thereby generating mixed sound information.
[0071]
The photographing device 400 transmits the video information to the media reproduction device
500 (step S610), and the sound mixing unit 102 transmits the generated mixed sound
information to the media reproduction device 500 (step S611). Then, the media reproduction
device 500 reproduces the video and the sound based on the received video information and the
mixed sound information (step S612).
[0072]
(Target Selection) Subsequently, processing when a target is selected will be described with reference to FIG. 8. FIG. 8 is a diagram conceptually showing an example of processing when a target is selected in the information processing system 1 according to an embodiment of the present invention. Description of processing substantially the same as the processing described above will be omitted.
[0073]
The media control unit 101 receives designated operation information from the media
reproduction device 500 or the like (step S621). Specifically, the media playback device 500
receives a target specification operation by the user, and generates specification operation
information. Then, the media playback device 500 transmits the generated designated operation
information to the media control unit 101.
[0074]
Next, the media control unit 101 sets a target area (step S622). Specifically, the media control
unit 101 identifies a target from the received designated operation information. Next, when the
identified target is an area, the media control unit 101 sets the area as a target area. In addition,
when the identified target is an object, the media control unit 101 sets an area in which the
object is located as a target area. Here, the area E is set as the target area.
[0075]
Also, the media control unit 101 sets a near area (step S623). Specifically, since the target area is set, the media control unit 101 sets, as the first near area, areas other than the target area whose areas in the video are equal to or larger than the threshold. Here, the area H and the area I are set as the first near area.
[0076]
Then, the media control unit 101 notifies the sound mixing unit 102 of the sound control message (step S624). Specifically, the media control unit 101 notifies the sound mixing unit 102 of a sound control message for generating mixed sound information from the wide-range sound information, the first near area sound information, and the target area sound information. Here, since the area of the area E in the video is less than the first threshold, pattern 2 shown in FIG. 6 is applied, and a sound control message indicating a volume amplification factor of 1.8 for audioW, a volume amplification factor of 0.5 for audioH and audioI, and a volume amplification factor of 0.2 for audioE is notified.
[0077]
The sound mixing unit 102 generates mixed sound information based on the sound control message (step S625). Specifically, based on the received sound control message, the sound mixing unit 102 mixes the sound information of audioW at a volume amplification factor of 1.8, the sound information of audioH and audioI at a volume amplification factor of 0.5, and the sound information of audioE at a volume amplification factor of 0.2, thereby generating mixed sound information.
[0078]
Then, the sound mixing unit 102 transmits the generated mixed sound information to the media reproduction device 500 (step S626), and the media reproduction device 500 reproduces the video and the sound based on the received video information and the mixed sound information (step S627).
[0079]
(Enlargement of Video) Subsequently, processing when the video is switched will be described with reference to FIG. 9.
FIG. 9 is a diagram conceptually showing an example of processing in the case where a video is
switched in the information processing system 1 according to an embodiment of the present
invention. The description of the processing substantially the same as the processing described
above will be omitted.
[0080]
The media control unit 101 receives video operation information from the media reproduction
device 500 or the like (step S641). Specifically, the media reproduction device 500 receives a
zoom operation of a video by the user, and generates video operation information related to the
received zoom operation. Then, the media playback device 500 transmits the generated video
operation information to the media control unit 101.
[0081]
Next, the media control unit 101 transmits a video control message to the photographing apparatus 400 based on the received video operation information (step S642). Specifically, the media control unit 101 generates a video control message instructing, in addition to continuation of the video distribution, the setting of the zoom position and a zoom ratio of 150% in accordance with the received video operation information. Then, the media control unit 101 transmits the generated video control message to the imaging device 400.
[0082]
The photographing device 400 transmits the video information to the media control unit 101 based on the received video control message (step S643). Specifically, based on the received video control message, the imaging device 400 cuts out the video obtained by shooting, centered on the instructed zoom position, and enlarges the cut-out video to 150%. Then, the imaging device 400 transmits the video information obtained by this processing to the media control unit 101.
[0083]
Next, the media control unit 101 sets a target area (step S644). Specifically, the media control
unit 101 updates the target area based on the received video information. For example, the
media control unit 101 specifies a target in a video related to video information obtained after
changing the zoom ratio. Next, when the identified target is an area, the media control unit 101
sets the area as a target area. In addition, when the identified target is an object, the media
control unit 101 sets an area in which the object is located as a target area. Here, area E
continues to be set as the target area.
[0084]
Also, the media control unit 101 sets a near area (step S645). Specifically, since the target area is set, the media control unit 101 sets the first near area. Here, the area H and the area I are excluded from the first near area as a result of zooming in on the video, and the area F is newly set as the first near area.
[0085]
Then, the media control unit 101 notifies the sound mixing unit 102 of the sound control message (step S646). Specifically, the media control unit 101 notifies the sound mixing unit 102 of a sound control message for generating mixed sound information from the wide-range sound information, the first near area sound information, and the target area sound information. Here, since the area of the area E in the video exceeds the first threshold due to the zooming in of the video, pattern 3 shown in FIG. 6 is applied, and a sound control message indicating a volume amplification factor of 0.2 for audioW, a volume amplification factor of 0.5 for audioF, and a volume amplification factor of 1.8 for audioE is notified.
[0086]
The sound mixing unit 102 generates mixed sound information based on the sound control message (step S647). Specifically, based on the received sound control message, the sound mixing unit 102 mixes the sound information of audioW at a volume amplification factor of 0.2, the sound information of audioF at a volume amplification factor of 0.5, and the sound information of audioE at a volume amplification factor of 1.8, thereby generating mixed sound information.
[0087]
Then, the photographing device 400 transmits the video information generated after the control based on the video operation information to the media reproduction device 500 (step S648), and the sound mixing unit 102 transmits the generated mixed sound information to the media reproduction device 500 (step S649). Then, the media reproduction device 500 reproduces the video and the sound based on the received video information and the mixed sound information (step S650).
[0088]
(Further Image Enlargement) Next, with reference to FIG. 10, a process in the case where the
image is further switched will be described. FIG. 10 is a diagram conceptually showing an
example of processing in the case where the video is further switched in the information
processing system 1 according to the embodiment of the present invention. The description of
the processing substantially the same as the processing described above will be omitted.
[0089]
The media control unit 101 receives video operation information from the media reproduction
device 500 or the like (step S661), and transmits a video control message to the imaging device
400 based on the received video operation information (step S662). Specifically, the media
control unit 101 generates a video control message instructing setting of the zoom position and
the zoom ratio of 200% according to the received video operation information. Then, the media
control unit 101 transmits the generated video control message to the imaging device 400.
[0090]
The photographing apparatus 400 transmits the video information to the media control unit 101 based on the received video control message (step S663). Specifically, based on the received video control message, the imaging device 400 cuts out the video obtained by shooting, centered on the instructed zoom position, and enlarges the cut-out video to 200%. Then, the imaging device 400 transmits the video information obtained by this processing to the media control unit 101.
[0091]
Next, the media control unit 101 sets a target area (step S664). Specifically, the media control
unit 101 sets a target area based on a target in a video with a zoom ratio of 200%. Here, area E
continues to be set as the target area.
[0092]
Also, the media control unit 101 sets a near area (step S665). Specifically, since the target area is set, the media control unit 101 attempts to set the first near area. Here, however, because the zooming in of the video has left no area other than the area E whose area in the video is equal to or larger than the threshold, no first near area is set.
[0093]
Then, the media control unit 101 notifies the sound mixing unit 102 of the sound control message (step S666). Specifically, the media control unit 101 notifies the sound mixing unit 102 of a sound control message for generating mixed sound information from the wide-range sound information and the target area sound information. Here, since the area of the area E in the video exceeds the first threshold due to the zooming in of the video, pattern 3 shown in FIG. 6 is applied, and a sound control message indicating a volume amplification factor of 0.2 for audioW and a volume amplification factor of 1.8 for audioE is notified.
[0094]
The sound mixing unit 102 generates mixed sound information based on the sound control message (step S667). Specifically, based on the received sound control message, the sound mixing unit 102 mixes the sound information of audioW at a volume amplification factor of 0.2 and the sound information of audioE at a volume amplification factor of 1.8, thereby generating mixed sound information.
[0095]
Then, the photographing device 400 transmits the video information generated after the control based on the video operation information to the media reproduction device 500 (step S668), and the sound mixing unit 102 transmits the generated mixed sound information to the media reproduction device 500 (step S669). Then, the media reproduction device 500 reproduces the video and the sound based on the received video information and the mixed sound information (step S670).
[0096]
<1.4. Summary of One Embodiment of the Present Invention> As described above, according to an embodiment of the present invention, the information processing apparatus 100 acquires video information relating to a specific space and, according to information related to the display mode of a first area in the specific space within the video relating to the video information, generates control information related to a sound output based on first sound information related to the first area and second sound information related to a wider range than the first area in the specific space.
[0097]
In the past, a technique was used to make the sound in a specific area clearer by attenuating background noise other than the sound in the specific area. However, with this technique, only the sound of the specific area is heard, making it difficult to give the viewer a sense of reality.
[0098]
On the other hand, according to an embodiment of the present invention, the sound output based on the first sound information (area sound) and the second sound information (wide-range sound) is controlled according to the display mode of the first area in the video information. Therefore, not only the local area sound but also the wide-range sound covering a wider range than the area can be provided to the viewer in accordance with the video. The viewer can thus feel the atmosphere of a space wider than the area while listening to the area sound. It is therefore possible to provide the viewer with the clarified desired sound and, at the same time, to improve the sense of presence of the video and the sound provided to the viewer.
[0099]
Further, the information according to the display mode includes information indicating the size
of the first area. Therefore, by controlling the area sound in accordance with the size of the area
in the video, it is possible to present the relationship between the video and the sound to the
viewer. Therefore, it is possible to improve the sense of reality with respect to images and
sounds.
[0100]
The control information relating to the sound output includes control information relating to
mixing of sounds based on the first sound information and the second sound information.
Therefore, the area sound and the wide area sound can be presented to the viewer as an integral
sound. Therefore, it is possible to improve the sense of presence as compared with the case
where the area sound and the wide area sound are independently presented. As in the above
embodiment, a plurality of first sound information (the target area sound and the near area
sound) and the second sound information may be mixed.
[0101]
Further, the control information relating to the mixing of the sounds includes volume control information for the first sound information and the second sound information. Therefore, the degree of mixing of the area sound and the wide-range sound can be controlled in accordance with the display mode of the specific area in the video. For example, as the area of the target area in the video becomes larger, the volume amplification factor of the area sound can be raised while that of the wide-range sound is lowered, giving the viewer a sense of realism as if approaching the target area.
[0102]
The first area includes an area specified according to the information related to the display mode. Therefore, it is possible, for example, to set an area whose area in the video is larger than the threshold as the second near area and to mix the second near area sound. Accordingly, even if no target area is set, area sounds of areas virtually close to the viewer are mixed and reproduced, which makes it possible to improve the sense of reality with respect to the video and the sound.
[0103]
Further, the first area includes a designated area specified based on the designated operation
information. Therefore, the target area can be determined based on the user's operation.
Therefore, it is possible to more reliably reproduce the sound of the area that the user desires to
view.
[0104]
Further, the operation according to the designation operation information includes a selection operation of the first area or a selection operation of an object in the video according to the video information. For this reason, when the user recognizes the areas, errors in the setting of the target area can be suppressed by directly selecting an area. Also, when the user selects an object in the video, the user can clearly hear the area sound of the target area without recognizing the areas in advance.
[0105]
The first area includes an area specified according to the specified area. Therefore, for example,
by setting the peripheral area of the target area as the near area, it is possible to improve the
realism of the mixed sound.
[0106]
Further, control information relating to the sound output is generated according to the
specification of the designated area. Therefore, by controlling the mixed sound to be reproduced
according to the setting of the target area, it is possible to improve the responsiveness of the
mixed sound control to the user's operation. Therefore, it is possible to improve the user's
operation feeling.
[0107]
Further, control information relating to the sound output is generated according to a change in
information relating to the display mode. Therefore, by controlling the mixed sound according to
the change of the area close-up index, it is possible to suppress the decrease in the sense of
realism due to the switching of the image.
[0108]
The first sound information is generated by extraction from the second sound information. Since the area sound and the wide-range sound thus originate from the same collected sound, it is possible to suppress the viewer's discomfort with respect to the mixed sound.
[0109]
<2. Modifications> The embodiment of the present invention has been described above. In the following, some modifications of the embodiment will be described. Each modification described below may be applied to the embodiment of the present invention independently or in combination with other modifications. In addition, each modification may be applied instead of a configuration described in the embodiment of the present invention, or may be applied in addition to such a configuration.
[0110]
(2−1. First Modified Example) As a first modified example of the embodiment of the present invention, the information related to the display mode of the area in the video may be information other than the information indicating the size of the area. Specifically, the area close-up index may be zoom information related to the video information. For example, the media control unit 101 controls the mixing of sounds according to the zoom factor of the video. Processing of this modification will now be described with reference to FIG. 11. FIG. 11 is a view showing an example of a sound control pattern in the information processing system 1 according to the first modified example of the embodiment of the present invention.
[0111]
The media control unit 101 controls the volume ratio of the area sound and the wide-range sound according to the zoom magnification of the video. Specifically, the media control unit 101 increases the ratio of the wide-range sound (or decreases the ratio of the area sound) as the zoom magnification of the video decreases. For example, the media control unit 101 determines whether the zoom magnification of the video being reproduced is less than the first threshold. When it is determined that the zoom magnification is less than the threshold, the media control unit 101 generates sound control information for causing the sound mixing unit 102 to mix the wide-range sound, the near area sound, and the target area sound at the volume amplification factors shown in pattern 4 of FIG. 11. This is because the viewer is considered to feel the target area farther away as the zoom magnification is lower.
[0112]
In addition, the media control unit 101 increases the ratio of the target area sound (or decreases the ratio of the wide-range sound) as the zoom magnification of the video increases. For example, when it is determined that the zoom magnification of the video being reproduced is equal to or greater than the threshold, the media control unit 101 generates sound control information for causing the sound mixing unit 102 to mix the wide-range sound, the near area sound, and the target area sound at the volume amplification factors shown in pattern 5 of FIG. 11. This is because the viewer is considered to feel the target area closer as the zoom magnification is higher.
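The zoom-based switching of the two preceding paragraphs reduces to a single threshold test. In the sketch below, the 150% threshold and the gain values are placeholders; FIG. 11's actual amplification factors are not reproduced here.

```python
def zoom_pattern(zoom_percent, threshold=150):
    """Choose mixing gains from the zoom magnification alone:
    below the (hypothetical) threshold, emphasize the wide-range
    sound (pattern 4); at or above it, emphasize the target area
    sound (pattern 5). Gain values are illustrative placeholders."""
    if zoom_percent < threshold:
        return {"wide": 1.8, "near": 0.5, "target": 0.2}   # pattern 4
    return {"wide": 0.2, "near": 0.5, "target": 1.8}       # pattern 5
```

Because only the zoom magnification is consulted, no video analysis is needed, which matches the processing-load advantage noted for this modification.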
[0113]
Thus, according to the first modification, the information related to the display mode of the area in the video includes zoom information related to the video information. Therefore, the mixed sound can be controlled without analyzing the video, which makes it possible to reduce the processing load of the sound control processing and to improve its processing speed.
[0114]
(2−2. Second Modified Example) The area close-up index may be information indicating a positional relationship between a virtual viewing position of the video related to the video information and an area in the video. For example, the media control unit 101 controls the mixing of sounds according to the virtual distance between the virtual viewing position and the target area. FIG. 12 is a view showing another pattern example of the sound control in the information processing system 1 according to the second modified example of the embodiment of the present invention.
[0115]
The media control unit 101 controls the volume ratio of the area sound and the wide-range sound according to the virtual distance between the virtual viewing position and the target area. Specifically, the media control unit 101 increases the ratio of the wide-range sound as the virtual distance becomes longer. For example, the media control unit 101 first calculates the virtual distance between the shooting position of the video and the target area in the video. Next, the media control unit 101 determines whether the calculated virtual distance exceeds the first threshold. When it is determined that the virtual distance exceeds the threshold, the media control unit 101 generates sound control information for causing the sound mixing unit 102 to mix the wide-range sound, the near area sound, and the target area sound at the volume amplification factors shown in pattern 6 of FIG. 12. This is because the viewer is considered to feel the target area farther away as the virtual distance is longer.
[0116]
Also, the media control unit 101 increases the ratio of the target area sound as the virtual distance becomes shorter. For example, when it is determined that the calculated virtual distance is equal to or less than the first threshold, the media control unit 101 generates sound control information for causing the sound mixing unit 102 to mix the wide-range sound, the near area sound, and the target area sound at the volume amplification factors shown in pattern 7 of FIG. 12. This is because the viewer is considered to feel the target area closer as the virtual distance is shorter.
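The virtual-distance control of the two preceding paragraphs amounts to computing a distance and comparing it with the first threshold. The 2-D coordinate representation and the threshold value in this sketch are assumptions for illustration.

```python
import math

def distance_pattern(shooting_pos, target_area_pos, threshold=5.0):
    """Select a sound control pattern from the virtual distance
    between the shooting position and the target area: pattern 6
    when the distance exceeds the (hypothetical) first threshold,
    pattern 7 otherwise."""
    d = math.dist(shooting_pos, target_area_pos)  # Euclidean distance
    if d > threshold:
        return "pattern 6"   # far: raise the wide-range sound ratio
    return "pattern 7"       # near: raise the target area sound ratio
```

Compared with using only the size of the area in the video, a distance computed from positions can reflect the scene geometry more directly, as paragraph [0117] argues.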
[0117]
Thus, according to the second modification, the information related to the display mode of the area in the video includes information indicating the positional relationship between the virtual viewing position of the video according to the video information and the area in the video. For this reason, the mixed sound can be controlled more accurately than when only the size of the area in the video is used. Therefore, it is possible to further enhance the sense of reality given to the viewer.
[0118]
(2−3. Third Modified Example) As a third modified example of the embodiment of the
present invention, the mixed sound of the area sound and the wide-range sound may be controlled
in accordance with a change of the video information. Specifically, the media control unit 101
controls the sound output based on the area sound information and the wide-range sound
information according to a change of the provider of the video information. For example, when
the imaging device 400 that distributes the video information is switched, the media control unit
101 sets a target area or a nearby area based on the video information obtained from the
imaging device 400 of the switching destination. Then, the media control unit 101 performs the
mixing processing of the area sound information and the wide-range sound information according
to the set target area or nearby area. The processing of this modification will be described with
reference to FIG. 13, which conceptually shows an example of the processing of the information
processing system 1 according to the third modification of the embodiment of the present
invention. Description of processing substantially the same as the processing described above is
omitted.
[0119]
The media control unit 101 receives video operation information from the media playback device
500 or the like (step S701), and transmits video control messages to the imaging devices 400
based on the received video operation information (steps S702 and S703). Specifically, when
the received video operation information relates to an operation for switching the imaging
device 400 that distributes the video information, the media control unit 101 transmits a video
control message instructing the switching-source imaging device 400A to stop distribution, and
transmits a video control message instructing the switching-destination imaging device 400B to
start distribution.
[0120]
The imaging device 400 transmits video information to the media control unit 101 based on the
received video control message (step S704). Specifically, the imaging device 400B starts
shooting based on the received video control message and transmits the video information
obtained by the shooting to the media control unit 101. If shooting has already started, only the
distribution of the video is started. The imaging device 400A stops the distribution of the video
based on the received video control message, and may furthermore stop shooting.
[0121]
Next, the media control unit 101 sets a target area (step S705). Specifically, the media control
unit 101 resets the target area based on the video information and the designation operation
information received from the imaging device 400B. Here, the area E is set as the target area.
[0122]
Also, the media control unit 101 sets a near area (step S706). Specifically, since the target area
is set, the media control unit 101 attempts to set the first near area based on the video
information received from the imaging device 400B. Here, since the area whose size is equal to
or larger than the threshold is the area D, the area D is set as the first near area.
[0123]
Then, the media control unit 101 notifies the sound mixing unit 102 of the sound control
message (step S707), and the sound mixing unit 102 generates mixed sound information based
on the sound control message (step S708).
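The mixing step in S708 amounts to scaling each stream by the volume amplification factor carried in the sound control message and summing the samples. The sketch below illustrates this under assumed stream names and gain values; it is not the patented implementation.

```python
# Sketch of the mixing in S708: scale each sound stream by its volume
# amplification factor and sum sample-wise. Stream names ("wide",
# "near", "target") and the example gains are illustrative assumptions.

def mix(streams, gains):
    """streams: dict name -> list of samples; gains: dict name -> factor."""
    length = len(next(iter(streams.values())))
    mixed = [0.0] * length
    for name, samples in streams.items():
        g = gains.get(name, 0.0)  # streams absent from the message stay silent
        for i, s in enumerate(samples):
            mixed[i] += g * s
    return mixed


# Example with pattern-1-like gains: the wide-range sound dominates.
out = mix({"wide": [1.0, 1.0], "near": [1.0, 1.0], "target": [1.0, 1.0]},
          {"wide": 1.8, "near": 0.5, "target": 0.2})
```

Real mixers would additionally clip or normalize the summed signal; that step is omitted here for brevity.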
[0124]
Thereafter, the video information is transmitted to the media reproduction device 500 from the
imaging device 400B instead of the imaging device 400A (step S709), and the mixed sound
information is transmitted from the sound mixing unit 102 to the media reproduction device 500
(step S710). The media reproduction device 500 reproduces video and sound based on the
received video information and mixed sound information (step S711).
[0125]
Thus, according to the third modification, the sound control information is generated in response
to a change of the video information. Therefore, even when the video to be reproduced is
switched, mixed sound matching the switched video can be presented to the viewer, and the
sense of reality for the video and the sound can be maintained. When the video is switched, the
selection state of the target may be maintained or reset.
[0126]
(2−4. Fourth Modified Example) As a fourth modified example of the embodiment of the
present invention, the information processing apparatus 100 may perform the mixing processing
using wide-range sound information selected from a plurality of pieces of wide-range sound
information. Specifically, the media control unit 101 selects the wide-range sound information
used to generate the sound control information from the plurality of pieces of wide-range sound
information according to the area sound information used to generate the sound control
information. For example, when the target area is set, the media control unit 101 selects the
wide-range sound information generated by the sound collection device 200 disposed near the
target area, and uses the selected wide-range sound information for the mixing processing. When
the target area is not set, the media control unit 101 selects the wide-range sound information
generated by the sound collection device 200 disposed near the second near area.
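The selection rule above reduces to a nearest-device lookup, anchored at the target area when one is set and at the second near area otherwise. The device records, coordinates, and squared-distance metric in this sketch are illustrative assumptions.

```python
# Sketch of the fourth modification: choose the wide-range sound source
# whose sound collection device 200 is nearest the anchor position
# (target area if set, otherwise the second near area). Device tuples
# and 2-D positions are assumptions for illustration.

def select_wide_source(devices, anchor_pos):
    """devices: list of (device_id, (x, y)); anchor_pos: (x, y) of the
    target area or, when no target is set, of the second near area."""
    def dist2(pos):
        return (pos[0] - anchor_pos[0]) ** 2 + (pos[1] - anchor_pos[1]) ** 2
    # min by squared distance avoids an unnecessary sqrt
    return min(devices, key=lambda d: dist2(d[1]))[0]
```

The returned device id would then name the wide-range sound information fed into the mixing processing.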
[0127]
As described above, according to the fourth modification, the information processing apparatus
100 selects the wide-range sound information used to generate the sound control information
from the plurality of pieces of wide-range sound information according to the area sound
information used to generate the sound control information. By selecting a wide-range sound
that matches the area sound, it is possible to suppress the viewer's sense of discomfort with the
mixed sound and to improve the sense of reality. This configuration is particularly effective when
the space to which the information processing system 1 is applied is wide and a plurality of
sound collection devices 200 are installed. Note that the information processing apparatus 100
may instead use all acquired wide-range sound information for the mixing processing.
[0128]
<3. Application Example> The embodiment of the present invention has been described
above. In the embodiment described above, when the area of the target area in the video
information exceeds the first threshold, the mixing ratio of the target area sound information
and the wide-range sound information changes. In real space, however, the sound from a target
area gradually becomes louder as the user approaches it. Therefore, the application example
proposes a contrivance for making the user feel the change in the mixing ratio of the target area
sound information and the wide-range sound information as natural.
[0129]
(3−1. Sound Control Pattern According to Application Example) The media control unit 101
according to the application example selects one of a plurality of sound control patterns that
differ in how the amplification factor of the target area sound or of the wide-range sound
changes according to the expansion of the target area, and generates the sound control
information according to the selected sound control pattern. For example, the media control unit
101 may measure the degree of attention that the user of the media playback device 500 pays
to the target area, and select the sound control pattern based on that degree of attention.
Hereinafter, the plurality of sound control patterns according to the application example and a
specific example of pattern selection will be described with reference to FIG. 14.
[0130]
FIG. 14 is an explanatory view showing a sound control pattern according to the application
example. In FIG. 14, pattern 1, pattern 8 and pattern 9 are shown as a plurality of sound control
patterns. The media control unit 101 selects the pattern 1 when the target is not selected.
[0131]
In addition, when the target has been selected, the media control unit 101 selects a different
pattern according to the degree of attention to the target area. That is, the media control unit
101 selects pattern 8 when the degree of attention to the target area is small (less than a
threshold), and selects pattern 9 when the degree of attention to the target area is large (equal
to or more than the threshold).
[0132]
In patterns 8 and 9, the amplification factor of the wide-range sound information is "1.8 - (a
value corresponding to the increase in area of the target area)", and the amplification factor of
the target area sound information is "0.2 + (a value corresponding to the increase in area of the
target area)". That is, patterns 8 and 9 are common in that the amplification factor of the
wide-range sound information decreases and the amplification factor of the target area sound
information increases as the area of the target area increases. However, patterns 8 and 9 differ
in how sharply the amplification factor of the target area sound information rises with the
expansion of the target area. The differences between pattern 8 and pattern 9 will be
specifically described below with reference to FIGS. 15 and 16.
[0133]
FIG. 15 is an explanatory view showing the relationship between the area of the target area and
the amplification factors of the wide-range sound information and the target area sound
information according to pattern 8. As shown in FIG. 15, in pattern 8, the change characteristic
of the amplification factors of the target area sound information and the wide-range sound
information differs depending on whether the area of the target area is less than the second
threshold. Specifically, the change in the amplification factors of the target area sound
information and the wide-range sound information while the area of the target area is less than
the second threshold is smaller than the change while the area is equal to or greater than the
second threshold. That is, in pattern 8, the rise of the amplification factor of the target area
sound information with the expansion of the target area is gradual, and the fall of the
amplification factor of the wide-range sound information is likewise gradual.
[0134]
FIG. 16 is an explanatory view showing the relationship between the area of the target area and
the amplification factors of the wide-range sound information and the target area sound
information according to pattern 9. As shown in FIG. 16, in pattern 9 as well, the change
characteristic of the amplification factors of the target area sound information and the
wide-range sound information differs depending on whether the area of the target area is less
than the second threshold. The second threshold may differ between pattern 8 and pattern 9.
[0135]
In pattern 9, the change in the amplification factors of the target area sound information and the
wide-range sound information while the area of the target area is less than the second threshold
is larger than the change while the area is equal to or greater than the second threshold. That is,
in pattern 9, the rise of the amplification factor of the target area sound information with the
expansion of the target area is steep, and the fall of the amplification factor of the wide-range
sound information is also steep.
[0136]
The specific values of the changes in amplification factor according to patterns 8 and 9 are
arranged, for example, as shown in FIG. 17.
[0137]
According to pattern 8 described above, the amplification factors of the target area sound
information and the wide-range sound information do not change greatly until the area of the
target area reaches the second threshold, so the wide-range sound information remains
relatively easy to hear. On the other hand, after the area of the target area exceeds the second
threshold, the amplification factor of the target area sound information rises rapidly and the
amplification factor of the wide-range sound information falls rapidly, so the target area sound
information becomes easy to hear. That is, according to pattern 8, the user can hear the
wide-range sound for a relatively long time until the target area has been expanded to some
extent.
[0138]
Further, according to pattern 9 described above, the amplification factor of the target area
sound information rises rapidly and the amplification factor of the wide-range sound information
falls rapidly as the area of the target area increases, so the user can hear the target area sound
information early.
[0139]
Here, a user who is focusing on the target area is likely to want to hear the target area sound
information. Therefore, by selecting pattern 9 when the user's degree of attention to the target
area is "large", the media control unit 101 according to the application example can let a user
who is focusing on the target area hear the target area sound information early. On the other
hand, the attention of a user who is not focusing on the target area may be directed broadly.
Therefore, the media control unit 101 according to the application example may select pattern 8
when the user's degree of attention to the target area is "small"; in that case, the user can easily
hear the wide-range sound information.
[0140]
Although an example has been described above in which the change amount of the volume
amplification factor is determined according to the result of comparing the area of the target
area with the second threshold, the change amount of the volume amplification factor may
instead be determined continuously according to the area of the target area. For example, the
media control unit 101 may control the change amount of the volume amplification factor so
that it increases or decreases in proportion to the increase in the area of the target area.
[0141]
Moreover, the method by which the media control unit 101 measures the user's degree of
attention is not particularly limited. For example, the media control unit 101 may use the user's
gaze detection result from a gaze detection device, and measure the degree of attention based
on the time or the number of times that gaze directed at the target area is detected within a
recent specific period.
[0142]
(3−2. Processing of System) Next, a specific example of the processing of the system
according to the application example will be described. The overall flow is common between the
application example and the processing described with reference to FIGS. 7 to 10, but the
contents of the sound control message generated by the media control unit 101 differ.
Therefore, the following mainly describes examples of the sound control message generated in
the application example within the flow of FIGS. 7 to 10.
[0143]
(Target Not Selected) In the processing in the case where the target described with reference to
FIG. 7 is not selected, the area H and the area I are set as the second near area (S607), and the
media control unit 101, according to pattern 1 shown in FIG. 14, notifies the sound mixing unit
102 of a sound control message indicating a volume amplification factor of 1.8 for audioW and
of 0.5 for audioH and audioI (S608).
[0144]
(Target Selection) Thereafter, as shown in FIG. 8, when the target area is set to the area E
(S622), the area H and the area I are set as the first near area (S623). Then, the media control
unit 101 notifies the sound mixing unit 102 of a sound control message indicating, as initial
values when the target area is not enlarged, a volume amplification factor of 1.8 for audioW, of
0.5 for audioH and audioI, and of 0.2 for audioE (S624).
[0145]
(Enlargement of Video) Thereafter, when the area of the area E in the video information is
expanded to 120% as shown in FIG. 9 while the user's degree of attention to the area E (the
target area) is low (S643), the area H and the area I leave the first near area, and the area F is
newly set as the first near area (S645). Then, the media control unit 101 selects pattern 8 shown
in FIGS. 14 and 17 and generates a sound control message according to pattern 8. Here, when
the second threshold is 120%, the media control unit 101 generates a sound control message
indicating a volume amplification factor of 1.78 (1.8 - 0.02) for audioW, of 0.5 for audioF, and of
0.22 (0.2 + 0.02) for audioE, and notifies the sound mixing unit 102 of the sound control
message (S646).
[0146]
Furthermore, if the area of the area E in the video information is expanded to 170% while the
user's degree of attention to the area E remains low, the media control unit 101 continues to
generate sound control messages according to pattern 8 shown in FIGS. 14 and 17. Specifically,
the media control unit 101 notifies the sound mixing unit 102 of a sound control message
indicating a volume amplification factor of 0.78 (1.78 - 1.00) for audioW, of 0.5 for audioF, and
of 1.22 (0.22 + 1.00) for audioE.
[0147]
(Further Image Enlargement) Thereafter, as shown in FIG. 10, when the area of the area E is
expanded to 200% (S663), no area other than the area E has an area in the video information
equal to or larger than the threshold, so the first near area is not set. Therefore, the media
control unit 101, following pattern 8, generates a sound control message indicating a volume
amplification factor of 0.18 (0.78 - 0.6) for audioW and of 1.82 (1.22 + 0.6) for audioE, and
notifies the sound mixing unit 102 of the sound control message.
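The worked values in S643 to S663 (second threshold 120%; audioW 1.8, 1.78, 0.78, 0.18; audioE 0.2, 0.22, 1.22, 1.82) are consistent with a piecewise-linear gain curve that is gentle below the threshold and steep above it. The slopes in the sketch below are inferred from those example numbers; the text does not state them explicitly.

```python
# Reconstruction of the pattern 8 gains from the worked example:
# below the second threshold (120%) the gain shifts by 0.001 per
# percentage point of target-area size; above it, by 0.02 per point.
# These slopes are inferred from the S643-S663 numbers, not quoted.

def pattern8_gains(area_pct, second_threshold=120.0):
    """Return (audioW gain, audioE gain) for a target-area size in %,
    starting from the initial gains 1.8 / 0.2 at 100%."""
    if area_pct <= second_threshold:
        delta = 0.001 * (area_pct - 100.0)          # gentle segment
    else:
        delta = (0.001 * (second_threshold - 100.0)  # gentle part up to threshold
                 + 0.02 * (area_pct - second_threshold))  # steep part beyond
    return 1.8 - delta, 0.2 + delta
```

Evaluating it at 120%, 170%, and 200% reproduces the message values 1.78/0.22, 0.78/1.22, and 0.18/1.82 given above.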
[0148]
(3−3. Supplement) The above application example described, as an example of the
relationship between the expansion of the target area and the amplification factor of the target
area sound information or of the wide-range sound information, how those amplification factors
change with the expansion of the target area. As another example of this relationship, as
described in "2-1. First Modified Example" and "2-2. Second Modified Example", a sound control
pattern indicating the relationship between the zoom factor or virtual distance of the video and
the changes in the amplification factors of the target area sound information and the wide-range
sound information may be used.
[0149]
FIG. 18 is an explanatory view showing a specific example of a sound control pattern indicating
the relationship between the zoom factor of the video and the changes in the amplification
factors of the target area sound information and the wide-range sound information. In the
example shown in FIG. 18, pattern 10, in which the rise of the target area sound information is
gentle, is selected when the degree of attention to the target area is "small", and pattern 11, in
which the rise of the target area sound information is steep, is selected when the degree of
attention is "large".
[0150]
FIG. 19 is an explanatory view showing a specific example of a sound control pattern indicating
the relationship between the virtual distance (the distance between the virtual viewing position
and the target area) and the amplification factors of the target area sound information and the
wide-range sound information. In the example shown in FIG. 19, pattern 12, in which the rise of
the target area sound information is gentle, is selected when the degree of attention to the
target area is "small", and pattern 13, in which the rise of the target area sound information is
steep, is selected when the degree of attention is "large".
[0151]
Also, although an example in which the media control unit 101 selects the sound control pattern
according to the user's degree of attention to the target area has been described above, the
media control unit 101 can also select the sound control pattern by other methods.
[0152]
For example, the media control unit 101 may determine the presence or absence of conversation
in the target area, and select the sound control pattern based on that determination.
Specifically, when there is conversation in the target area, the media control unit 101 may select
a control pattern in which the rise of the target area sound information is steeper than in the
sound control pattern selected when there is no conversation. If there is conversation in the
target area, the user of the media playback device 500 may be interested in the conversation
and may wish to hear it quickly, so the above configuration makes it possible to realize the
operation desired by the user.
[0153]
In addition, the media control unit 101 may have a function of setting the image quality of the
entire video information, or of each area, according to an operation by the user of the media
playback device 500, and may select the sound control pattern based on the image quality of the
target area in the video information. Since a higher image quality may be set for the target area
as the user's attention to it increases, selecting the sound control pattern based on the image
quality of the target area in the video information, instead of on the user's degree of attention,
is also useful.
[0154]
In addition, the media control unit 101 may measure the number of people staying in the target
area, and select the sound control pattern based on that number. Specifically, the media control
unit 101 may select a control pattern in which the rise of the target area sound information is
steeper as the number of people staying in the target area is larger, and a control pattern in
which the rise is more gradual as that number is smaller. Since the user is more likely to be
interested in the target area as more people stay in it, the above configuration can realize the
operation desired by the user.
[0155]
Also, the media control unit 101 may measure the sound pressure of the sound information in
the target area, and select the sound control pattern based on that sound pressure. Specifically,
the media control unit 101 may select a control pattern in which the rise of the target area
sound information is steeper as the sound pressure of the sound information in the target area
is larger, and a control pattern in which the rise is more gradual as that sound pressure is
smaller. The sound information in the target area may be target area sound information emitted
from the target area, or may be sound information reaching the target area.
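The cues described in the last two paragraphs, people count and sound pressure, can both be folded into a single steepness score for choosing how sharply the target area sound should rise. The normalization bounds and the max-combination below are assumptions for illustration only.

```python
# Sketch combining the selection cues of [0154]-[0155]: normalize the
# number of people and the sound pressure in the target area, and take
# the stronger cue as a steepness factor. The bounds (20 people, 90 dB)
# and the max-combination are illustrative assumptions.

def rise_steepness(people_count=0, sound_pressure_db=0.0,
                   max_people=20, max_db=90.0):
    """Return a steepness factor in [0, 1]; larger means a steeper rise
    of the target area sound."""
    p = min(people_count / max_people, 1.0)
    s = min(sound_pressure_db / max_db, 1.0)
    return max(p, s)
```

A controller could then pick the pattern whose rise characteristic best matches the returned factor.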
[0156]
<4. Hardware Configuration> The embodiments and application examples of the present
invention have been described above. The processing of the information processing apparatus
100 described above can be realized by cooperation of software and hardware of the information
processing apparatus 100 described below.
[0157]
FIG. 20 is an explanatory diagram showing the hardware configuration of the information
processing apparatus 100 according to an embodiment of the present invention. As shown in
FIG. 20, the information processing apparatus 100 includes a central processing unit (CPU) 132,
a read only memory (ROM) 134, a random access memory (RAM) 136, an internal bus 138, and
an input / output interface 140. , An input device 142, an output device 144, an HDD (Hard Disk
Drive) 146, a network interface 148, and an external interface 150.
[0158]
The CPU 132 functions as an arithmetic processing unit and a control unit, and realizes the
operations of the media control unit 101 and the sound mixing unit 102 in the information
processing apparatus 100 in cooperation with various programs. また、CPU132は、マイク
ロプロセッサであってもよい。 The ROM 134 stores programs or operation parameters used by
the CPU 132. The RAM 136 temporarily stores a program used by the execution of the CPU 132
or parameters and the like appropriately changed in the execution. The ROM 134 and the RAM
136 implement part of the storage unit in the information processing apparatus 100. The CPU
132, the ROM 134, and the RAM 136 are mutually connected by an internal bus 138 including a
CPU bus and the like. The cooperation of the CPU 132, the ROM 134 and the RAM 136 with the
software can realize the functions of the media control unit 101 and the sound mixing unit 102.
[0159]
The input device 142 includes an input unit for inputting information such as a button, a
microphone, a switch and a lever, and an input control circuit which generates an input signal
based on the input and outputs the signal to the CPU 132. By operating the input device 142,
various data can be input to the information processing apparatus 100 and processing
operations can be instructed.
[0160]
The output device 144 performs output to a display device such as, for example, a liquid crystal
display (LCD) device, an OLED (Organic Light Emitting Diode) device, or a lamp. Furthermore,
the output device 144 may perform audio output to a speaker, headphones, and the like.
[0161]
The HDD 146 is a device for storing data. The HDD 146 may include a storage medium, a
recording device that records data in the storage medium, a reading device that reads data from
the storage medium, and a deletion device that deletes data recorded in the storage medium. The
HDD 146 stores programs executed by the CPU 132 and various data.
[0162]
The network interface 148 may be configured with a communication device for connecting to a
network. The network interface 148 may be a device that performs wired communication, or
may be a wireless local area network (LAN) compatible communication device or a 3G or LTE
compatible communication device.
[0163]
The external interface 150 is, for example, a bus for connecting to an external device or
peripheral device of the information processing apparatus 100. Also, the external interface 150
may be a USB (Universal Serial Bus).
[0164]
<5. Conclusion> As described above, according to the embodiment of the present invention,
not only local area sound but also wide-range sound covering a wider range than the area can be
provided to the viewer in accordance with the video. Thus, the viewer can feel the atmosphere of
a space wider than the area while listening to the area sound. Therefore, it is possible to provide
the viewer with clarified desired sound and, at the same time, to improve the sense of presence
of the video and sound provided to the viewer.
[0165]
Further, according to the application example of the embodiment of the present invention, it is
possible to make the user feel the change in the mixing ratio of the target area sound
information and the wide-range sound information accompanying the expansion of the target
area as natural. Furthermore, by selectively using a pattern in which the rise of the target area
sound information with the expansion of the target area is gradual and a pattern in which that
rise is steep, it is possible to let the user hear the sound information suited to the situation.
[0166]
Although the preferred embodiments of the present invention have been described in detail with
reference to the accompanying drawings, the present invention is not limited to such examples. It
is obvious that those skilled in the art to which the present invention belongs can conceive of
various changes or modifications within the scope of the technical idea described in the claims.
Of course, it is understood that these also fall within the technical scope of the present invention.
[0167]
For example, in the above embodiment, the information indicating the size of an area is its area,
but the present invention is not limited to this example. For example, the information indicating
the size of an area may be its total length, the length from its center to its edge, or its radius (in
the case of a circle).
[0168]
Further, in the above embodiment, an example in which the imaging device 400 is controlled by
the media control unit 101 has been described, but the imaging device 400 may be controlled by
a unit other than the media control unit 101. In that case, the control parameters of the imaging
device 400 may be notified to the media control unit 101.
[0169]
In the above embodiment, an example in which area sound information and wide-range sound
information are mixed has been described, but the area sound information and the wide-range
sound information may instead be distributed independently, and the area sound and the
wide-range sound may be output from separate sound output devices.
[0170]
In the above embodiment, the zoom position is determined by the user's operation. However, the
zoom position may be fixed.
For example, the zoom position may be fixed at the center of the image or some other specific
position.
[0171]
Further, in the above-described embodiment, an example is described in which the designation
operation information and the video operation information are generated based on the user's
operation, but the designation operation information and the video operation information may be
generated by a computer. For example, the designation operation information or the video
operation information may be generated at a preprogrammed timing in the video to be
reproduced and transmitted to the media control unit 101.
[0172]
Further, although an example has been described above in which the amplification factors of
both the target area sound information and the wide-range sound information change according
to the expansion of the target area, only one of the two amplification factors may change. For
example, even if the amplification factor of the target area sound information does not change,
the volume ratio of the target area sound information relatively increases when the
amplification factor of the wide-range sound information decreases, so the user can easily hear
the target area sound information. Similarly, even if the amplification factor of the wide-range
sound information does not change, the volume ratio of the target area sound information
relatively increases when the amplification factor of the target area sound information
increases, so the user can hear the target area sound information more easily.
[0173]
The steps shown in the processing diagrams of the above embodiment include not only
processing performed chronologically in the described order but also processing performed in
parallel or individually rather than necessarily chronologically. It goes without saying that even
for steps processed chronologically, the order can be changed appropriately in some cases.
[0174]
In addition, it is possible to create a computer program for causing hardware incorporated in the
information processing apparatus 100 to exhibit the same function as each functional
configuration of the information processing apparatus 100 described above. A storage medium
storing the computer program is also provided.
[0175]
DESCRIPTION OF SYMBOLS 100 information processing apparatus 101 media control unit 102
sound mixing unit 200 sound collection device 300 area sound generation device 400 imaging
device 500 media reproduction device