Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2008113164
The present invention provides a technique for efficiently transmitting the necessary video information within a limited communication bandwidth in data transfer between conference terminals. Kind Code: A1. There is provided a communication apparatus comprising: a plurality of web cameras 107, each set to image a different area, that generate video signals; a microphone array 106 that collects the voice produced by a sound source and generates audio signals; and a sound source direction information generation unit that generates, based on the audio signals from the microphone array 106, sound source direction information indicating the direction of the sound source. The web camera 107 whose imaging range is oriented in the direction indicated by the sound source direction information captures video of the person there, and the apparatus outputs the speaker's voice and video to a network. [Selected figure] Fig. 6(A)
Communication device
[0001]
The present invention relates to a communication device that transmits an image together with
voice.
[0002]
In recent years, a conference system for conducting a conference using a plurality of conference
terminals connected via a communication network has become widespread.
04-05-2019
1
Patent Document 1 discloses a technology for supporting the operation of a remote video
conference in which persons at remote places participate in each other. The system disclosed in
this document includes a plurality of video conference terminals, and a multipoint video
conference relay device that mediates exchange of audio information and video information in
each of the terminals. The relay device then superimposes, as appropriate, meeting operation information, such as the page of the materials currently referenced in the meeting and the time remaining until the end of the meeting, onto the audio information and video information passing through it. (Japanese Unexamined Patent Publication No. 05-145918)
[0003]
The video information that a participant needs about the other party varies with the state of the meeting. For example, immediately after the meeting starts, participants need to see the other side's entire conference room and the seating arrangement of its participants; during a discussion, they need to see who is speaking and with what expression. However, if all of this video information were transmitted in full detail using the technique of Patent Document 1, the amount of data would become enormous and could not be carried within the limited communication bandwidth.
[0004]
The present invention has been made in view of the above problems, and an object thereof is to provide a technique for efficiently transmitting the necessary video information in, for example, data transfer between conference terminals.
[0005]
A first embodiment of the communication apparatus according to the present invention comprises: a plurality of imaging means, each having a set imaging range and outputting a video signal representing the image within that range; audio signal generation means for generating an audio signal from collected sound; sound source direction specifying means for specifying the direction of the sound source based on the audio signal generated by the audio signal generation means and outputting sound source direction information indicating the specified direction; selection means for selecting the imaging means corresponding to the sound source direction information; and output means for outputting, to another terminal apparatus via a network, the video signal output by the imaging means selected by the selection means and the audio signal generated by the audio signal generation means.
[0006]
A second embodiment of the communication apparatus according to the present invention comprises: a plurality of imaging means, each having a set imaging range and outputting a video signal representing the image within that range; audio signal generation means for generating an audio signal from collected sound; sound source direction specifying means for specifying the direction of the sound source based on the audio signal generated by the audio signal generation means and outputting sound source direction information indicating the specified direction; selection means for selecting the imaging means corresponding to the sound source direction information; audio extraction means for extracting, from the audio signal generated by the audio signal generation means, the audio signal corresponding to the sound source direction information; and output means for outputting, to another terminal apparatus via a network, the video signal output by the imaging means selected by the selection means and the audio signal extracted by the audio extraction means.
[0007]
In a third embodiment of the communication apparatus according to the present invention, based on the second embodiment, the audio signal generation means has a microphone array in which a plurality of microphones are arranged; the audio extraction means has a sound collection direction controller that controls the sound collection direction of the array; and the sound source direction specifying means has a direction specifying device that specifies the sound source direction from the sound collection direction set by the controller and the volume level of the audio signal output from the microphone array.
[0008]
In a fourth embodiment of the communication apparatus according to the present invention, based on the first or second embodiment, the audio signal generation means has a plurality of microphones whose positions can be set individually. The sound source direction specifying means selects, from the volume levels of the audio signals output by the respective microphones, the microphone recognized as lying in the sound source direction, and outputs information identifying the selected microphone as the sound source direction information; the audio extraction means extracts the audio signal of the microphone selected by the sound source direction specifying means.
[0009]
In a fifth embodiment of the communication apparatus according to the present invention, based on any one of the first to fourth embodiments, the output means sets a specific area, among all the areas constituting one screen, as the display area for the video signal output by the imaging means selected by the selection means, and outputs a video signal synthesized so that other video signals are displayed in the remaining areas.
[0010]
In a sixth embodiment of the communication apparatus according to the present invention, based on the fifth embodiment, the apparatus further comprises a plurality of division data items, each defining how all the areas constituting one screen of the video signal output by the output means are divided and which of the divided areas is the specific area, and division data selection means for selecting a division data item. The output means divides the screen according to the division data selected by the division data selection means, and treats the area indicated by that division data as the specific area.
[0011]
In a seventh embodiment of the communication apparatus according to the present invention, based on any one of the first to sixth embodiments, the apparatus further comprises: a plurality of pattern images representing different arrangements of sound source positions in the area to be photographed; a table storing the correspondence between the sound source positions in each pattern image and the imaging range of each imaging means; pattern image selection means for selecting a pattern image; and imaging range control means that determines, by referring to the table, each imaging range corresponding to the selected pattern image and controls each imaging means so that its imaging range matches the determined range.
[0012]
ADVANTAGE OF THE INVENTION: The conference terminal according to the present invention can provide a technique for efficiently transmitting the necessary video information in data transfer between conference terminals.
[0013]
Hereinafter, a conference terminal which is an embodiment of the present invention will be
described with reference to the drawings.
(A: Configuration) FIG. 1 is a block diagram showing a configuration of a conference system 1
including a conference terminal according to an embodiment of the present invention.
The conference system 1 includes a conference terminal 10A, a conference terminal 10B, and a
communication network 20. The conference terminal 10A and the conference terminal 10B are
connected to the communication network 20 by wire.
The conference terminal 10A and the conference terminal 10B have the same configuration, and hereinafter, when there is no need to distinguish between them, both are generically referred to as the conference terminal 10. Note that although two conference terminals are illustrated here as being connected to the communication network 20, three or more conference terminals may be connected.
[0014]
In the present embodiment, each communication protocol described below is used as the
communication protocol.
That is, as a communication protocol of the application layer, Real-time Transport Protocol
(hereinafter, "RTP") is used for transfer of audio data and video data.
RTP is a communication protocol for providing a communication service for transmitting and
receiving voice data and video data in real time end-to-end, and the details thereof are defined in
RFC 1889.
In RTP, data are exchanged between communication terminals by generating and transmitting
and receiving RTP packets.
On the other hand, HTTP (Hypertext Transfer Protocol) is used to transfer image data.
Also, UDP is used as a communication protocol of the transport layer, and IP is used as a
communication protocol of the network layer.
An IP address is assigned to each of the conference terminal 10A and the conference terminal 10B described above, by which each is uniquely identified on the network. Note that HTTP, UDP, and IP are widely used communication protocols, and therefore their description is omitted.
[0015]
Next, the hardware configuration of the conference terminal 10 will be described with reference
to FIG. In the following description, when it is necessary to indicate which conference terminal 10 a component belongs to, an alphabetic suffix is appended to the reference numeral; for example, the control unit 101 of the conference terminal 10A is written as the control unit 101A.
[0016]
A control unit 101 shown in the figure is, for example, a CPU (Central Processing Unit), and
controls the operation of each unit of the conference terminal 10 by executing various programs
stored in a storage unit 103 described later.
[0017]
The storage unit 103 includes a read only memory (ROM) 103a and a random access memory (RAM) 103b.
The ROM 103a stores data and a control program for causing the control unit 101 to implement the functions characteristic of the present invention. Examples of the data include a web camera selection table, test data, and a transmission rate management table; these are described in detail later. The RAM 103b is used as a work area by the control unit 101 operating according to the various programs, and stores the audio data and video data received from the microphone array 106 and the web camera 107.
[0018]
The operation unit 104 includes, for example, a keyboard and a mouse. When the operator of the conference terminal 10 operates the operation unit 104 to perform an input operation, data representing the content of the operation is transmitted to the control unit 101.
[0019]
The microphone array 106 includes a plurality (eight in the present embodiment) of microphone units (not shown) and an analog/digital (hereinafter "A/D") converter.
The microphone array 106 functions as a directional microphone, and can pick up voice while scanning the azimuth from which it picks up. The control unit 101 analyzes the voice data generated from the voices arriving from the various directions and identifies the direction with the largest volume level as the direction of the sound source (that is, if the received voice is a human voice, its speaker). When a plurality of participants speak at the same time and voices arrive from several directions simultaneously, the control unit 101 compares the volume levels of the voices from those directions and takes the direction with the highest volume level as the direction of the sound source. FIG. 5 is a diagram showing an example of the relative arrangement of the microphone array 106 and the participants 2 in the conference room. Specifically, the sound source direction information identified via the microphone array 106 is generated as the direction of the sound source viewed from the center of the microphone array 106 (the declination θ in polar coordinates), and is written to the RAM 103b together with the audio data. In FIG. 5, the speaker is the participant 2a, and the sound source direction is π/6.
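The loudest-direction rule described above can be sketched as follows. This is a hypothetical Python illustration, not the patent's implementation; the `direction_levels` mapping stands in for the volume levels the microphone array measures while scanning azimuths.

```python
import math

def locate_sound_source(direction_levels):
    """Return the declination (radians) of the loudest direction.

    direction_levels: dict mapping a scan angle (radians) to the
    volume level measured from that direction. Returns None when
    no sound has been picked up.
    """
    if not direction_levels:
        return None
    # When several participants speak at once, the direction with the
    # highest volume level is taken as the sound source direction.
    return max(direction_levels, key=direction_levels.get)

# Participant 2a at declination pi/6 speaks loudest.
levels = {0.0: 0.2, math.pi / 6: 0.9, math.pi / 2: 0.4}
source_direction = locate_sound_source(levels)
```

The real array would of course derive these levels from the A/D-converted signals of its eight microphone units; only the selection step is modeled here.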
[0020]
The web camera 107 outputs the input from its CMOS image sensor as a Motion-JPEG moving image. Motion-JPEG is a moving image format in which each captured frame is JPEG-compressed and the frames are recorded in succession. The web camera 107 captures images at a predetermined image size and number of frames per unit time (fps; frames per second), compresses each image as JPEG, and outputs it to the RAM 103b. The resolution and the number of frames per unit time are controlled as appropriate by the control unit 101, starting from the values preset for the web camera. The compression ratio of the images can also be set under the control of the control unit 101, within the range supported by JPEG compression (compression ratio 1/5 to 1/60). In the present embodiment, five web cameras 107 are attached to the conference terminal 10, and the participants 2 can change their orientations manually.
[0021]
The data output from the web camera 107 is first written to the RAM 103b, and the control unit 101 generates RTP packets from the data written there. As shown in FIG. 2, an RTP packet consists of a payload portion with a header portion attached, just like the packet that is the data transfer unit in IP and the segment that is the data transfer unit in TCP.
[0022]
For example, in a data transmission message, audio or video data for a predetermined time (20 milliseconds in the present embodiment) is written in the payload portion; in an acknowledgment message, the sequence numbers of the received packets are written there. In the header portion, four fields are set: a time stamp, a payload type, a sequence number, and section information. The time stamp indicates the time elapsed since the start of voice communication was instructed. The payload type lets the destination of the message identify its type. Five message types are used in this embodiment: a speaker video data transmission message, a conference room video data transmission message, an audio data transmission message, a material data transmission message, and a reception notification message; the numerals "1", "2", "3", "4", and "5" are written in the payload type of those messages, respectively. The sequence number is an identifier that uniquely identifies each packet; for example, when one piece of voice data is divided into a plurality of RTP packets for transmission, sequence numbers 1, 2, 3, ... are assigned in order. The section information defines in which area of the video display unit 105, described later, the image contained in the RTP packet is displayed; for example, the position of the image is written in coordinates such as (0, 0)-(256, 192).
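The four header fields and the five payload-type codes above can be modeled as plain data. This is an illustrative sketch, not a wire-format encoder; the field names and the dict layout are assumptions for clarity.

```python
# Payload-type codes as described in the embodiment.
PAYLOAD_TYPES = {
    "speaker_video": 1,
    "room_video": 2,
    "audio": 3,
    "material": 4,
    "receipt_notice": 5,
}

def make_rtp_header(timestamp_ms, message_type, sequence_number, section):
    """Assemble the four header fields as a dict.

    `section` is the display region written as ((x1, y1), (x2, y2)),
    mirroring the "(0, 0)-(256, 192)" notation in the text.
    """
    return {
        "timestamp": timestamp_ms,          # time since voice communication started
        "payload_type": PAYLOAD_TYPES[message_type],
        "sequence_number": sequence_number, # unique per packet
        "section": section,                 # where the image is displayed
    }

header = make_rtp_header(20, "speaker_video", 1, ((0, 0), (256, 192)))
```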
[0023]
The communication IF unit 102 is, for example, a NIC (Network Interface Card) and is connected to the communication network 20. The communication IF unit 102 sends to the communication network 20 an IP packet obtained by successively encapsulating the RTP packet delivered from the control unit 101 according to the lower-layer communication protocols. Encapsulation here means generating a UDP segment with the RTP packet written in its payload section, and then generating an IP packet with the UDP segment written in its payload section. Conversely, the communication IF unit 102 receives IP packets from the communication network 20, performs the reverse of the encapsulation described above to extract the RTP packet contained in each IP packet, and hands it over to the control unit 101.
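The two-step wrapping and its reverse can be sketched as follows. The headers here are placeholder strings, a deliberate simplification: real UDP and IP headers carry ports, addresses, and checksums, none of which are modeled.

```python
def encapsulate(rtp_packet: bytes) -> dict:
    """Wrap an RTP packet in a UDP segment, then in an IP packet."""
    udp_segment = {"header": "UDP", "payload": rtp_packet}
    ip_packet = {"header": "IP", "payload": udp_segment}
    return ip_packet

def decapsulate(ip_packet: dict) -> bytes:
    """Reverse of encapsulate: recover the RTP packet from the IP packet."""
    return ip_packet["payload"]["payload"]

pkt = encapsulate(b"rtp-data")
```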
[0024]
The video display unit 105 is a monitor of 1024 pixels × 768 pixels. Images and data are
displayed based on the data received via the communication IF unit 102.
[0025]
The speaker 108 reproduces the sound represented by the audio data delivered from the control unit 101, and includes a speaker unit and a D/A converter. The D/A converter converts the audio data delivered from the control unit 101 into an analog audio signal and outputs it to the speaker unit, which then reproduces the audio according to that signal. The above is the hardware configuration of the conference terminal 10.
[0026]
Here, the way the conference terminal 10 is installed in a conference room is described. FIG. 6A is a diagram (corresponding to FIG. 5) showing the conference room in which the conference terminal 10 is installed. A desk 3 is placed in the conference room, and the participants 2a, 2b, 2c, and 2d taking part in the conference sit on chairs arranged around it. The conference terminal 10 is installed beside the desk, with the video display unit 105, the microphone array 106, the five web cameras 107, and the speakers 108 facing the participants. In the following, when the five web cameras 107 need to be distinguished, they are denoted as web cameras 107a, 107b, 107c, 107d, and 107e as shown in FIG. 6A.
[0027]
The video display unit 105 is placed where all participants can see it. The microphone array 106 is placed below the video display unit 105, with its microphone units arranged horizontally in a line. The speakers 108 are placed at two positions, to the left and right of the conference terminal 10, sandwiching the microphone array 106.
[0028]
The web cameras 107 are arranged horizontally in a line below the video display unit 105 and are connected to the conference terminal 10 via numbered lines: the web camera 107a to line 1, the web camera 107b to line 2, the web camera 107c to line 3, the web camera 107d to line 4, and the web camera 107e to line 5.
[0029]
FIG. 6(B) shows the conference room of FIG. 6(A) viewed from above, from the conference terminal 10 side. In FIG. 6B, areas a, b, c, d, and e indicate the areas photographed by the web cameras 107a, 107b, 107c, 107d, and 107e of FIG. 6A, respectively. Area e covers the entire conference room, while the imaging ranges of the web cameras for areas a, b, c, and d are set so as to include the participants 2a, 2b, 2c, and 2d, respectively.
[0030]
Next, characteristic functions of the conference terminal 10 will be described. They are roughly
classified into (1) video selection function, (2) available bandwidth measurement function, and
(3) data transmission rate control function. The above functions are realized by the control unit
101 executing a control program stored in the ROM 103a.
[0031]
First, the video selection function will be described. The video selection function selects the web camera 107 directed at the person currently speaking and causes the video taken by the selected web camera 107 to be displayed on the conference terminal 10 at the other end. Here, the web camera selection table stored in the ROM 103a will be described. FIG. 7 shows an example of the web camera selection table, which associates ranges of the declination, i.e. the sound source direction information, with the line number to which each web camera 107 is connected.
[0032]
The control unit 101 reads the sound source direction information written in the RAM 103b and identifies the line number by comparing the declination value, which is the sound source direction information, against the web camera selection table. For example, as shown in FIG. 5, when the participant 2a at declination π/6 speaks, this falls within the declination range 0 to π/4 in the web camera selection table, so the web camera 107 connected to line 1, that is, the web camera 107a, is selected. The control unit 101 then sends a signal instructing the web camera 107 connected to the selected line to generate and output video data. On receiving this signal, the web camera 107 generates video data and outputs it to the RAM 103b.
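The table lookup can be sketched as a range scan. Since FIG. 7 is not reproduced here, the ranges beyond the stated 0 to π/4 entry are illustrative assumptions following the same pattern.

```python
import math

# Assumed web camera selection table: (lower, upper) declination bound
# in radians -> line number. Only the first row is stated in the text;
# the rest follow the same quarter-turn pattern for illustration.
CAMERA_SELECTION_TABLE = [
    ((0.0, math.pi / 4), 1),              # web camera 107a
    ((math.pi / 4, math.pi / 2), 2),      # web camera 107b
    ((math.pi / 2, 3 * math.pi / 4), 3),  # web camera 107c
    ((3 * math.pi / 4, math.pi), 4),      # web camera 107d
]

def select_camera_line(declination):
    """Return the line number whose declination range contains the
    sound source direction, or None if no range matches."""
    for (low, high), line in CAMERA_SELECTION_TABLE:
        if low <= declination < high:
            return line
    return None

# Participant 2a speaks at declination pi/6 -> line 1 (web camera 107a).
line = select_camera_line(math.pi / 6)
```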
[0033]
The control unit 101 transmits the conference room video data generated by the web camera 107e to the opposite conference terminal 10 at the same time as the speaker video data captured as described above. In doing so, the control unit 101 writes "1" in the payload type of the RTP packets carrying the speaker video data and "(0, 0)-(512, 384)" in their section information, while writing "2" in the payload type of the video data generated by the web camera 107e and "(0, 384)-(512, 768)" in its section information. Material data is also transmitted in parallel with the video data; "4" is written in its payload type and "(512, 0)-(1024, 768)" in its section information.
[0034]
The opposite conference terminal 10 receives both sets of video data. The video display unit 105 reads the section information of the received data and displays the content of each data item in the corresponding area of the video display unit 105. As shown in FIG. 4, the display surface of the video display unit 105 uses coordinates whose origin is at the upper left of the drawing. For example, the section information "(0, 0)-(512, 384)" represents the region enclosed by the four points (0, 0), (512, 0), (512, 384), and (0, 384). Accordingly, the video data of the web camera 107a is displayed in the speaker display area 105a, the video data of the web camera 107e in the conference room display area 105b, and the material data in the material display area 105c.
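The mapping from section information to regions of the 1024 x 768 screen can be shown concretely. This is a small worked sketch of the layout in the embodiment; the field names are assumptions.

```python
def parse_section(section):
    """Expand section info ((x1, y1), (x2, y2)) into the pixel
    region it denotes on the 1024 x 768 display."""
    (x1, y1), (x2, y2) = section
    return {"left": x1, "top": y1, "width": x2 - x1, "height": y2 - y1}

# Layout used in the embodiment:
layout = {
    "speaker": parse_section(((0, 0), (512, 384))),           # area 105a
    "conference_room": parse_section(((0, 384), (512, 768))), # area 105b
    "material": parse_section(((512, 0), (1024, 768))),       # area 105c
}
```

Together the three regions tile the full screen: the left half is split between the speaker and the conference room view, and the right half carries the material data.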
[0035]
Next, the available bandwidth measurement function will be described with reference to the flowchart shown in FIG. The available bandwidth measurement function measures the maximum communication bandwidth usable on the communication network 20 when performing data communication with the opposite conference terminal via the communication network 20. First, the control unit 101 determines the transmission interval at which packets will be sent (step SC100); when the available bandwidth measurement process runs for the first time, a predetermined transmission interval is selected. Next, the control unit 101 generates a plurality of packets from the test data stored in the ROM 103a and transmits them to the opposite conference terminal at the selected transmission interval (step SC110). At this time, the control unit 101 writes the sequence number of each transmitted packet into the RAM 103b. The test data is Motion-JPEG video data prepared in advance, like the video data generated by the web camera 107; its video content may be anything.
[0036]
The control unit 101 of the opposite conference terminal 10 receives the test data, writes the sequence number of each received packet into a reception notification message, and returns the message to the transmitting conference terminal. The control unit 101 of the transmitting conference terminal 10 receives the reception notification message returned from the opposite conference terminal (step SC120) and calculates the packet loss rate for the transmission of the test data (number of packets not received / number of transmitted packets) (step SC130).
[0037]
When the test data is transmitted at the selected transmission interval and no packet loss occurs (step SC130; "No"), the control unit 101 performs the processing from step SC100 again. This time, in step SC100, a transmission interval shorter by a predetermined ratio than the one used in the previous pass through steps SC100 to SC130 is selected. As the transmission interval becomes shorter, the amount of data transmitted per unit time, that is, the transmission rate, becomes higher. It is generally known that while the transmission rate is small compared with the available bandwidth the packet loss rate is small, and that the packet loss rate increases as the transmission rate grows relative to the available bandwidth. Therefore, when packet loss occurs during the transmission of test data, the transmission rate used at that time exceeds the available communication bandwidth. The control unit 101 thus repeats steps SC100 to SC130 while successively shortening the packet transmission interval, and when packet loss occurs (step SC130; "Yes"), it takes the transmission rate of the immediately preceding transmission of test data (data amount of the test data / time taken for the transmission) as the available bandwidth at that time (unit: bytes/second) (step SC140). The above is the available bandwidth measurement process.
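The probe loop of steps SC100 to SC140 can be sketched as follows. The `send_probe` callback and the simulated lossy network are hypothetical stand-ins for the actual test-data exchange; packet size and shrink ratio are assumed values.

```python
def measure_available_bandwidth(send_probe, initial_interval_s,
                                shrink_ratio=0.9, packet_size_bytes=1000):
    """Shorten the probe interval until loss appears, then report the
    rate of the last loss-free probe in bytes per second.

    send_probe(interval_s) returns the number of lost packets for a
    burst sent at that interval (stand-in for steps SC110-SC130).
    """
    interval = initial_interval_s
    last_good_rate = None
    while True:
        lost = send_probe(interval)          # SC110-SC130
        rate = packet_size_bytes / interval  # bytes per second
        if lost > 0:
            # Loss means this rate exceeded the available bandwidth;
            # the previous loss-free rate is the estimate (SC140).
            return last_good_rate
        last_good_rate = rate
        interval *= shrink_ratio             # SC100: shorter interval

# Simulated network that loses packets above 200,000 bytes/s.
def fake_probe(interval):
    return 0 if 1000 / interval <= 200_000 else 3

bw = measure_available_bandwidth(fake_probe, initial_interval_s=0.02)
```

Because the interval shrinks by a fixed ratio each pass, the estimate lands just below the true capacity of the simulated link.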
[0038]
Next, the data transmission rate control function will be described. The data transmission rate
control function is a function that executes a process of controlling the data transmission rate of
audio data and video data to be output according to the available communication bandwidth
measured by the available bandwidth measurement function.
[0039]
FIG. 9 shows an example of the transmission rate management table. The transmission rate management table defines, for each available communication bandwidth, the number of frames per unit time (frames per second) to set on the web cameras 107 and the JPEG image compression ratio to use in the Motion-JPEG scheme. The control unit 101 compares the available communication bandwidth measured by the available bandwidth measurement function described above against the transmission rate management table stored in the ROM 103a, and selects the number of frames per unit time and the JPEG image compression ratio corresponding to that bandwidth.
[0040]
The control unit 101 sets the number of frames per unit time and the JPEG compression ratio on all the web cameras 107 to the selected values. When data communication starts, each web camera 107 generates video data at the set number of frames per unit time, and the control unit 101 compresses the generated video data at the selected JPEG compression ratio. The above is the data transmission rate control function.
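The table-driven selection (including the "largest entry not exceeding the measured bandwidth" rule used later in step SA110) can be sketched as follows. FIG. 9 is not reproduced here, so the bandwidth thresholds and parameter pairs are illustrative assumptions.

```python
# Assumed transmission rate management table, sorted by ascending
# available bandwidth (bytes/s) -> (frames per second, JPEG ratio).
RATE_TABLE = [
    (50_000,  (5,  1 / 60)),
    (100_000, (10, 1 / 30)),
    (200_000, (15, 1 / 15)),
    (400_000, (30, 1 / 5)),
]

def select_parameters(measured_bandwidth):
    """Pick the row with the largest bandwidth that does not exceed
    the measured value; None if even the smallest row is too big."""
    chosen = None
    for bandwidth, params in RATE_TABLE:
        if bandwidth <= measured_bandwidth:
            chosen = params
        else:
            break
    return chosen

fps, compression = select_parameters(250_000)
```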
[0041]
(B: Operation) Next, an operation performed by the conference terminal 10 when a participant on
the conference terminal 10A side and a participant of the conference terminal 10B conduct a
teleconference will be described. Note that, at the start of the remote conference, each of the web
cameras 107 is assumed to be installed so as to be able to capture an area shown in FIG. 6B.
Further, in the web camera selection table, as shown in FIG. 7, the direction of the sound source
and the line number to which each web camera 107 is connected are associated and written.
[0042]
First, the control unit 101A performs parameter adjustment processing immediately after the
start of data communication and at regular intervals. FIG. 10 is a flowchart showing a flow of
parameter adjustment processing performed by the side (conference terminal 10A) that
transmits data.
[0043]
The control unit 101A first performs the available bandwidth measurement process (step SA100). Next, the control unit 101A checks the measured available bandwidth against the transmission rate management table (see FIG. 9) stored in the ROM 103a, and selects the number of frames and the JPEG compression ratio associated with the largest available bandwidth in the table that does not exceed the measured value. The control unit 101A sets the number of frames captured per unit time on all the web cameras 107 to the selected value (step SA110) and sets the JPEG compression ratio to the selected value (step SA120). In the following step SA130, it is determined whether a predetermined time has elapsed since the parameter adjustment process started. While the determination result of step SA130 is "No", step SA130 is repeated until the predetermined time passes; when it has elapsed, the result becomes "Yes" and step SA140 is performed. At step SA140, the control unit 101A determines whether the data communication has ended. If the determination result of step SA140 is "No", the processing from step SA100 is performed again; if it is "Yes", the control unit 101A ends the parameter adjustment process. Through this process, the control unit 101 performs the available bandwidth measurement at regular intervals after data communication starts, and the various parameters related to data transmission are reset each time according to the measured available bandwidth. As a result, data can be transmitted efficiently and reliably, in accordance with the available communication bandwidth as it changes from moment to moment.
[0044]
The following describes the operation performed by the conference terminal 10A when the
participant 2a on the conference terminal 10A side speaks during the teleconference and the
participant on the conference terminal 10B listens to the speech. FIG. 11 is a flowchart showing a
flow of processing performed by the conference terminal 10 when data communication is
performed. When the participant 2a makes a speech, the microphone array 106A picks up the
voice and generates voice data (step SB100). The control unit 101A generates azimuth
information of the sound source based on the generated voice data (step SB110). In this case,
since the participant 2a is positioned as shown in FIG. 5, the sound source direction information
is π / 6, which is the direction (declination) of the participant 2a. Audio data and sound source
direction information generated by the microphone array 106A are written to the RAM 103bA.
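In the simplest far-field case, sound source direction information of this kind can be derived from the time difference of arrival between two microphones. The following sketch is illustrative only, since the embodiment does not specify the array's internal algorithm; the 343 m/s speed of sound and the two-microphone geometry are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def azimuth_from_delay(delay_s, mic_spacing_m):
    """Far-field estimate of the source declination (radians) from the
    time difference of arrival between two microphones."""
    x = (delay_s * SPEED_OF_SOUND) / mic_spacing_m
    x = max(-1.0, min(1.0, x))  # clamp rounding noise into asin's domain
    return math.asin(x)
```

For a source at declination pi/6 and 10 cm microphone spacing, the expected inter-microphone delay is spacing * sin(pi/6) / c.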
[0045]
The control unit 101A identifies the line number (line 1 in this case) to which the appropriate web camera 107 is connected by comparing the sound source direction information written in the RAM 103bA with the web camera management table (step SB120). Next, the control unit 101A
causes the Web camera 107aA specified in step SB120 to generate video data representing a
detailed video of the participant 2a (step SB130).
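The table lookup of step SB120 can be sketched as follows. The declination ranges assigned to each line in CAMERA_TABLE are hypothetical, since the embodiment's web camera management table is not reproduced here.

```python
import math

# Hypothetical web camera management table: line number -> half-open
# range [lo, hi) of sound source declinations (radians) covered by the
# camera connected to that line.
CAMERA_TABLE = {
    1: (0.0, math.pi / 3),
    2: (math.pi / 3, 2 * math.pi / 3),
    3: (2 * math.pi / 3, math.pi),
}

def select_camera_line(declination):
    """Return the line number whose range contains the sound source
    direction (step SB120), or None if no camera covers it."""
    for line, (lo, hi) in CAMERA_TABLE.items():
        if lo <= declination < hi:
            return line
    return None
```

With these example ranges, the declination pi/6 of participant 2a falls in the range of line 1, matching the behavior described in the text.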
[0046]
The control unit 101A transmits the speaker video data generated by the web camera 107aA, the
conference room video data generated by the web camera 107eA, the audio data, and the
material data to the conference terminal 10B (step SB140).
[0047]
The conference terminal 10B receives the video data representing the speaker video and the conference room video, together with the audio data and the material data.
Audio data is reproduced from the speaker 108B. For the video data and the document data, the
control unit 101B reads out the section information of each video data. In the section
information, the area of the video display unit 105B where each data is to be displayed is written
in coordinates. The video display unit 105B displays each data content in a predetermined area
based on the section information.
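Before rendering, the section information, one display rectangle per stream, can be checked for consistency. This is a sketch under the assumption that each section is an axis-aligned (x, y, width, height) rectangle in display coordinates; the embodiment does not specify the exact coordinate format.

```python
def rects_overlap(a, b):
    """True if two (x, y, w, h) rectangles intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def validate_layout(display_w, display_h, section_info):
    """Check that every section rectangle fits on the display surface and
    that no two sections overlap; returns True for a usable layout."""
    rects = list(section_info.values())
    for x, y, w, h in rects:
        if x < 0 or y < 0 or x + w > display_w or y + h > display_h:
            return False
    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if rects_overlap(rects[i], rects[j]):
                return False
    return True
```

The speaker, conference room, and material sections of FIG. 4 would be expressed as three non-overlapping rectangles of this form.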
[0048]
Through the above process, the participant using the counterpart conference terminal 10B hears the voice of the speaker at the conference terminal 10A from the speaker 108B while visually viewing the material data presented by the speaker and the speaker's expression.

(C: Modifications) Although an embodiment of the present invention has been described above, the present invention can also be implemented in the various forms described below.
[0049]
(1) In the above embodiment, the video selection function, the available bandwidth measurement function, and the data transmission rate control function are provided in the conference terminal, but the device carrying these functions is of course not limited to a conference terminal. For example, the present invention may be applied to a server device that generates video data and audio data and provides them to a client device.
[0050]
(2) In the above embodiment, the case has been described in which the characteristic functions of the conference terminal according to the present invention are realized by software modules. However, the conference terminal according to the present invention may instead be configured by combining hardware modules, each carrying one of the above functions.
[0051]
(3) In the embodiment described above, the case of using RTP as the communication protocol of
the application layer for communication of video data and audio data has been described, but it is
needless to say that other communication protocols may be used.
The point is that any communication protocol may be used, so long as video data or audio data generated over each predetermined time interval is written into the payload of a data block having a predetermined header and payload and then transmitted. In the
above-described embodiment, although the case of using UDP as the communication protocol of
the transport layer has been described, TCP may be used. Similarly, the communication protocol
of the network layer is not limited to IP.
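The "predetermined header plus payload" data block can be illustrated with a minimal RTP-like framing. This is a simplified sketch, not a complete RFC 3550 header (no SSRC, CSRC, marker, or extension fields), and the field layout shown is an assumption for illustration.

```python
import struct

def build_packet(payload_type, seq, timestamp, payload):
    """Prepend a minimal 8-byte header (version byte, payload type,
    sequence number, timestamp) to the payload bytes."""
    header = struct.pack("!BBHI", 0x80, payload_type & 0x7F,
                         seq & 0xFFFF, timestamp)
    return header + payload

def parse_packet(packet):
    """Split a packet back into (payload_type, seq, timestamp, payload)."""
    _version, payload_type, seq, timestamp = struct.unpack("!BBHI", packet[:8])
    return payload_type, seq, timestamp, packet[8:]
```

Any protocol with this header-plus-payload shape would satisfy the requirement stated in the modification; the transport underneath (UDP or TCP) is independent of the framing.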
[0052]
(4) In the embodiment described above, the case of transmitting and receiving video data, audio data, and material data in parallel was described, but the data types are not limited to these; for instance, the material data need not be transmitted and received.
[0053]
(5) In the above embodiment, the case where the conference terminal 10A and the conference
terminal 10B are connected by wire to the communication network 20 has been described. However, the communication network 20 may of course be a wireless packet communication network such as a wireless LAN (Local Area Network), with the conference terminal 10A and the conference terminal 10B connected to this wireless packet communication network. Further, although the
case where the communication network 20 is the Internet has been described in the above
embodiment, it goes without saying that the communication network 20 may be a LAN (Local
Area Network). The point is that any communication network may be used as long as it has a
function of mediating communication performed according to a predetermined communication
protocol.
[0054]
(6) In the above embodiment, the control program for causing the control unit 101 to realize the
functions characteristic of the communication apparatus according to the present invention has
been written in advance in the ROM 103a. The control program may be recorded and distributed
on a computer-readable recording medium, or the control program may be distributed by
downloading via a telecommunication line such as the Internet.
[0055]
(7) In the above embodiment, the video data of the entire conference room was controlled with the same resolution, number of frames per unit time, and JPEG image compression rate as the video data of the speaker. However, the video of the entire conference room may be sent with a lower number of frames per unit time or a higher JPEG compression rate as necessary, or still images may be sent at regular intervals instead.
[0056]
(8) In the above embodiment, the case where the video data of the speaker is a moving image has
been described. However, a still image may be transmitted as needed or according to the line
condition of the communication network.
[0057]
(9) In the above embodiment, the case of specifying the speaker by using the microphone array
has been described.
However, the speaker identification method is not limited to the microphone array.
For example, a plurality of microphones (preferably equal in number to the participants) may be installed in the conference room, one in front of each participant. Among the sounds collected by these microphones, the audio data with the highest volume level is selected, the participant corresponding to the microphone that produced that audio data is determined to be the speaker, and the camera directed at that speaker is selected.
Further, the operation unit 104 may be provided with as many buttons as there are participants, with each participant pressing a button before speaking. In that case, the control unit 101 may receive the signal output from the pressed button, identify the speaker, and select the web camera 107 directed at that speaker.
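The loudest-microphone method of this modification can be sketched as follows; the microphone and camera identifiers are hypothetical illustration values.

```python
def pick_speaker(volume_by_mic):
    """Return the id of the microphone whose collected audio has the
    highest volume level; its participant is taken to be the speaker."""
    return max(volume_by_mic, key=volume_by_mic.get)

def select_camera(volume_by_mic, camera_by_mic):
    """Select the web camera directed at the identified speaker."""
    return camera_by_mic[pick_speaker(volume_by_mic)]
```

The button variant reduces to the same selection step, with the pressed button's signal taking the place of the volume comparison.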
[0058]
(10) In the above embodiment, the web camera generates video data by the Motion-JPEG
method. However, the video recording method is not limited to Motion-JPEG, and other methods
such as MPEG may be used.
[0059]
(11) Although the case where five web cameras 107 are installed has been described in the
above embodiment, the number of installed web cameras 107 is not limited to five. It may be
determined appropriately according to the number of participants and the like. The same applies
to the number of lines (terminals) connected to the web camera 107 provided in the conference
terminal 10, as long as the number is larger than the number of web cameras 107 installed.
[0060]
(12) In the above embodiment, the case where the participant manually adjusts the imaging area
of the web camera 107 in advance has been described. However, the adjustment of the imaging
area of the web camera 107 may be performed under the control of the control unit 101. In that
case, an embodiment as described below is possible. The conference terminal 10 is provided with
drive means for adjusting the imaging area by rotating the web camera horizontally and
vertically, and the control unit 101 can control the drive means. Also, the "arrangement
template" is stored in advance in the ROM 103a. FIG. 12 is a diagram showing an example of the
arrangement template. A plurality of patterns A, B, and C are set in the arrangement template.
For each pattern, the position of the participant and the web camera shooting direction are
defined in association with each other. Here, "position of the participant" is a diagram schematically showing the relative position of each participant with reference to the microphone array 106, while "shooting direction" is the direction in which the web camera 107 connected to each line is pointed. Note that the shooting direction is represented as a declination in the same polar coordinates used when the microphone array generates the sound source direction information. When the meeting starts, the video display unit 105 displays the template images written in the item "Position of Participant" of the arrangement template. A participant who uses the conference terminal selects one of the plurality of
patterns displayed on the video display unit 105 that matches the number of participants in the
conference and the seat position. When a specific pattern is selected, the control unit 101 drives the drive means to point each web camera 107 connected to each line in the shooting direction defined in the selected pattern. According to this embodiment, once a template matching the installation position has been created after the conference terminal 10 is installed in the conference room, the participants are spared the time and effort of manually setting the positions of the web cameras every time a subsequent meeting is held.
[0061]
(13) In the above embodiment, the case where the display surface of the video display unit 105 is
divided as shown in FIG. 4 has been described. However, the section of the video display unit 105
is not limited to the above-described aspect. For example, the section template shown in FIG. 13
is stored in the ROM 103a. In the section template, a plurality of templates schematically
showing the section method of the display surface of the video display unit 105 are defined for
the number of people who participate in the conference. In each template, a section in which a number is written displays the video (that participant's video) from the web camera 107 connected to the line with that number. For example, in a section labeled 1/2/3, an
image of any one of the web cameras 107 connected to the line numbers 1 to 3 is displayed. In
the section where the "conference room" is written, the video of the web camera 107e connected
to the line 5 is displayed as in the conference room display unit 105b in the above embodiment.
The section in which "data" is written is the same as the data display unit 105c in the above
embodiment. Now, when the conference starts, the participant selects a specific template
according to the number of participants in the conference. When a specific template is selected,
the video display unit 105 is partitioned as defined by the selected template, and the video and
data of each web camera 107 are displayed in each partition as defined by the template. By
doing as described above, it is possible to display the information required by the participant on
the video display unit 105 in a layout adapted to the situation at that time. Also, after the default template has been selected as described above, the template may be switched temporarily when a speaker speaks. That is, a speaker-switching template is selected separately, and when the speaker changes, the control unit 101 switches the partitioning of the video display unit 105 from the default template to the speaker-switching template for a predetermined time, returning to the default template once that time has elapsed. For example, suppose three participants select template A as the default template and template C as the speaker-switching template. When participant A starts speaking, the video of participant A, the speaker, is displayed large at the center of the video display unit 105 as shown in template C; afterwards, the videos of all the participants, the video of the entire conference room, and the material are displayed as shown in template A.
Similarly, when Participant B makes a speech, the image of Participant B is displayed large at the
center of the screen for a predetermined time after the start of the speech. As described above, by switching the template during the meeting as well, it is possible to signal that the speaker has changed and to show the participants which participant has started speaking.
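The timed switch between the default template and the speaker-switching template can be sketched as a small state holder. The class name and the injected clock are illustrative choices, not part of the embodiment.

```python
class TemplateSwitcher:
    """Shows the speaker-switching template for hold_s seconds after a
    speaker change, then falls back to the default template."""

    def __init__(self, default, on_switch, hold_s, now):
        self.default = default
        self.on_switch = on_switch
        self.hold_s = hold_s
        self.now = now  # clock injected as a callable, for testability
        self.until = None

    def speaker_changed(self):
        """Record a speaker change; the switch template becomes active."""
        self.until = self.now() + self.hold_s

    def current_template(self):
        """Template the video display unit should currently use."""
        if self.until is not None and self.now() < self.until:
            return self.on_switch
        return self.default
```

With templates A and C from the example above, a speaker change shows template C for the hold period and then reverts to template A.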
[0062]
(14) In the above embodiment, the case where the video display unit 105 displays the speaker /
conference room video data and the material data in a divided manner as shown in FIG. 4 has
been described. However, the division of the video display unit 105 is not limited to the above-described aspect; other aspects, such as the size and positional relationship of the display area of each data item, may also be varied.
[0063]
(15) In the above embodiment, the case has been described in which the area in which each
video data is displayed on the video display unit 105 is specified by including the section
information in the header portion of the RTP packet. However, the conference terminal 10 on the
receiving side of the video data may control which area of the video display unit 105 is to display
each video data. In that case, the other party conference terminal 10 divides the video display
unit 105 into a plurality of areas in advance, and associates the types of video (speaker video,
conference room video) with the respective sections. When the video data is received, the
payload type of the packet may be read out, the type of the video may be determined from the
payload type, and may be allocated to the corresponding video display unit 105 and output.
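The receiver-side assignment by payload type can be sketched as a simple mapping. The payload type numbers 96 to 98 are hypothetical dynamic-range values, not specified by the text.

```python
# Hypothetical mapping from packet payload type to display section, used
# when the receiving terminal decides the layout itself.
PAYLOAD_TO_SECTION = {
    96: "speaker",          # speaker video
    97: "conference_room",  # whole-room video
    98: "material",         # document data
}

def route_packet(payload_type):
    """Return the display section a packet's content belongs in, or
    'ignore' for an unknown payload type."""
    return PAYLOAD_TO_SECTION.get(payload_type, "ignore")
```

This shifts layout control entirely to the receiver, so the sender no longer needs to carry section coordinates in the packet header.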
[0064]
(16) In the embodiment described above, the display surface of the video display unit 105 is divided to display a plurality of data items. However, when, for example, the video of the speaker is to be displayed on the entire display surface, the display surface of the video display unit 105 need not be divided.
[0065]
A block diagram showing the configuration of a conference system including a conference terminal according to the present invention.
A diagram showing the structure of an RTP packet.
FIG. 2 is a block diagram showing the configuration of the conference terminal 10.
A diagram showing an example of the video display unit.
A diagram explaining detection of a sound source direction by a microphone array.
A diagram showing the positional relationship between the conference terminal and the participants in a conference room.
A diagram showing the positional relationship of the participants as seen from the conference terminal side.
A diagram showing an example of a web camera selection table.
A flowchart showing the flow of the available bandwidth measurement process.
A diagram showing an example of a transmission rate management table.
A flowchart showing the flow of the parameter adjustment process.
A flowchart showing the flow of data transfer.
A diagram showing an example of an arrangement template.
A diagram showing an example of a section template.
Explanation of Reference Numerals
[0066]
DESCRIPTION OF SYMBOLS: 1 ... conference system; 2 ... participant; 3 ... desk; 10, 10A, 10B ... conference terminal; 20 ... communication network; 101 ... control unit; 102 ... communication IF unit; 103 ... storage unit (103a: ROM, 103b: RAM); 104 ... operation unit; 105 ... video display unit (105a: speaker display unit, 105b: conference room display unit, 105c: material display unit); 106 ... microphone array; 107 ... web camera; 108 ... speaker; 109 ... bus