Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2004130427
The present invention provides a robot apparatus capable of performing more natural operations toward a target object, thereby improving its entertainment properties, and an action control method for the robot apparatus. A robot device 1 includes a CCD camera 22, a microphone 24, a moving body detection module 32 for detecting a moving body from image data, a face detection module 33 for detecting a human face from the image data, a sound source direction estimation module 34 for estimating a sound source direction from voice data, and control means for controlling movement toward any one of the moving body direction based on the moving body detection result, the face direction based on the face detection result, and the estimated sound source direction. The control means controls the robot device to move in the face direction when a face is detected while walking in the moving body direction or the estimated sound source direction, and to stop walking when the target object of the face detection is approached within a predetermined range. [Selected figure] Figure 9
Robot device and motion control method of robot device
The present invention relates to a robot apparatus that moves on two legs, four legs, or the like and operates autonomously, and to an operation control method for the robot apparatus, and more particularly to a robot apparatus that autonomously operates in response to a human face, a call, a movement, or the like, and to an operation control method thereof. [0002] A mechanical device that performs movements resembling those of a human (or other organism) using electrical or magnetic action is called a "robot". Robots began to spread in Japan at the end of the 1960s, but most of them were industrial robots, such as manipulators and transport robots, intended for automation and unmanned production in factories. [0003] Recently, practical robots have been developed to support life as human partners, that is, to support human activities in various situations of daily life such as the living environment. Unlike industrial robots, such practical robots have the ability to learn, by themselves, ways of adapting to different human beings or different environments in the various aspects of the human living environment. For example, legged mobile robots are already coming into practical use, such as "pet-type" robots that model the body mechanism and movements of four-legged animals such as dogs and cats, and "human-type" or "humanoid" robots that model the body mechanism and movements of animals that walk upright on two legs. These legged mobile robots are sometimes referred to as entertainment robots because, compared to industrial robots, they can perform a variety of operations with an emphasis on entertainment. A legged mobile robot has an external shape as close as possible to the appearance of an animal or a human being, and is designed to perform operations as close as possible to the motions of an animal or a human being. For example, the four-legged "pet-type" robot mentioned above has an appearance similar to that of a dog or cat kept in an ordinary household, and acts autonomously in response to actions from the user (owner), such as being "hit" or "stroked", and to the surrounding environment. As autonomous actions, it behaves like an actual animal, for example "growling" or "sleeping". In such entertainment-type robot apparatuses, a robot apparatus whose entertainment property is improved by detecting a moving body in an image is disclosed in Patent Document 1 below.
The robot apparatus described in Patent Document 1 includes imaging means for imaging the outside, first detection means for detecting the movement of the entire image based on image information from the imaging means, and second detection means for detecting motion within the image by a predetermined motion detection process that takes the detection result of the first detection means into account, and the detection result of the second detection means is reflected in the robot's actions. The first detection means calculates a matching score between divided images of the current frame based on the image information and divided images at the corresponding positions of the previous frame, and detects the amount of movement of the entire image based on the calculation result. The second detection means raises the sensitivity of motion detection in the image when the detected amount of movement is small, so that even small motions can be detected, and lowers the sensitivity when the amount of movement is large, so as to reduce erroneous detection, and the resulting moving-object detection result is reflected in the robot's own movement. That is, for example, when a large movement is suddenly detected, the "surprise" parameter in the action generation module of the robot apparatus rises and an action expressing the emotion of "surprise" is determined, so that the entertainment property can be improved. Patent Document 1: Japanese Patent Application Laid-Open No. 14-251615. However, there are cases where such an entertainment-type robot apparatus should respond only to the movement of a specific object such as a human being. In such cases, if only the motion of the detected moving body is used, as in the robot apparatus described in Patent Document 1, it is difficult to elicit an action when, for example, there are a plurality of moving bodies in the captured image, or when the target object is not moving. If, for example, the robot apparatus could detect a target object even when that object is not moving, and could act in response to the target person's call, movement, and the like, its pet-like and animal-like qualities would be expressed and its entertainment property further enhanced. The present invention has been proposed in view of such conventional circumstances, and is intended to provide a robot apparatus capable of performing more natural operations toward a target object to improve its entertainment property, and an action control method for the robot apparatus. SUMMARY OF THE INVENTION In order to achieve the above-mentioned object, a robot apparatus according to the present invention is a robot apparatus that executes actions in accordance with external working and/or autonomous actions based on an internal state, comprising sound detection means, sound source direction estimation means for estimating a sound source direction from sound data detected by the sound detection means, and control means for controlling movement in the sound source direction estimated by the sound source direction estimation means, wherein the sound source direction estimation means estimates the current sound source direction based on a history of information on sound source directions estimated in the past.
In the present invention, when the sound source direction is estimated based on voice data, it may not be possible to estimate the sound source direction from the current voice data alone; in that case, the current sound source direction can be estimated by referring to the history of sound source directions estimated in the past. Further, the voice detection means is provided on a head rotatably connected to the body portion, and when there is no history of information on sound source directions estimated in the past, the control means controls the head to rotate and the sound source direction estimation means estimates the sound source direction from the sound data detected before and after the rotation; thus, even when there is no past history of sound source directions, the sound source direction can be estimated from the sound data detected before and after the rotation of the head. A robot apparatus according to the present invention is also a robot apparatus that executes actions in accordance with external working and/or autonomous actions based on an internal state, comprising imaging means, voice detection means for detecting a voice, moving object detection means for detecting a moving object from image data captured by the imaging means, face detection means for detecting a human face from the image data, sound source direction estimation means for estimating a sound source direction from the voice data detected by the voice detection means, and control means for controlling so that, among at least the moving object detection, the face detection, and the sound source direction estimation, the face detection is given priority. In the present invention, when the robot apparatus is performing a plurality of processes such as face detection, moving object detection, and voice detection in parallel, it preferentially performs face detection and reflects the result in its actions, thereby further improving the target identification rate. BEST MODE FOR CARRYING OUT THE INVENTION Hereinafter,
specific embodiments to which the present invention is applied will be described in detail with
reference to the drawings. This embodiment is an application of the present invention to an
autonomous robot apparatus that acts autonomously according to the surrounding environment
(or external stimulus) or the internal state. In the present embodiment, first, the configuration of
the robot apparatus will be described, and then the application portion of the present invention
in the robot apparatus will be described in detail. (1) Configuration of Robot Device According to the Present Embodiment As shown in FIG. 1, the robot device 1 according to the present embodiment is a so-called pet robot modeled on an animal such as a "dog"; leg units 3A, 3B, 3C, and 3D are connected to the front, rear, left, and right of a body unit 2, and a head unit 4 is connected to the front end of the body unit 2.
As shown in FIG. 2, the body unit 2 accommodates a control unit 16, formed by interconnecting a central processing unit (CPU) 10, a dynamic random access memory (DRAM) 11, a read only memory (ROM) 12, a personal computer (PC) card interface circuit 13, and a signal processing circuit 14 via an internal bus 15, and a battery 17 serving as the power source of the robot apparatus 1. Further, an angular velocity sensor 18 and an acceleration sensor 19 for detecting the orientation and the acceleration of movement of the robot device 1 are accommodated in the body unit 2. In the body unit 2, a speaker 20 for outputting a voice or a melody, such as a cry, is disposed at a predetermined position as shown in FIG. 1. The tail 5 of the body unit 2 is provided with an operation switch 21 as a detection mechanism for detecting an operation input from the user. The operation switch 21 is a switch that can detect the type of operation performed by the user, and the robot device 1 recognizes, for example, whether it is being praised or scolded according to the type of operation detected by the operation switch 21. The head unit 4 is provided, at predetermined positions, with a CCD (Charge Coupled Device) camera 22 corresponding to the "eyes" of the robot apparatus 1 for capturing external conditions such as the color, shape, and motion of objects, a distance sensor 23 for measuring the distance to an object located ahead, microphones 24 corresponding to the left and right "ears" of the robot apparatus 1 for collecting external sound, and a light emitting unit 25 composed of, for example, LEDs (Light Emitting Diodes), as shown in FIG. 1. The light emitting unit 25 is referred to as the LED 25 as needed in the description of the configuration and the like. Further, although not shown in FIG. 1, a head switch 26 is provided inside the head unit 4 as a detection mechanism for indirectly detecting a touch of the user on the head unit 4. The head switch 26 is, for example, a switch that can detect the tilt direction when the head is moved by the user's contact, and the robot device 1 recognizes whether it is being praised or hit in accordance with the tilt direction of the head detected by the head switch 26. At the joint portions of the leg units 3A to 3D, the connection portions between the leg units 3A to 3D and the body unit 2, and the connection portion between the head unit 4 and the body unit 2, actuators 281 to 28n and potentiometers 291 to 29n are disposed corresponding to the respective degrees of freedom.
The actuators 281 to 28n include, for example, servomotors. By driving the servomotors, the leg units 3A to 3D are controlled so as to shift to a target posture or motion. At the position corresponding to the paw pad at the tip of each of the leg units 3A to 3D, paw switches 27A to 27D are provided as detection mechanisms mainly for detecting a touch from the user, so that a touch by the user can be detected. In addition, although not shown here, the robot apparatus 1 may also be appropriately provided, at appropriate places, with a light emitting unit for representing an operation state (operation mode) distinct from the internal state of the robot apparatus 1, and a status lamp or the like indicating the status of the internal power supply, such as charging, start-up, and stop. In the robot device 1, the various switches such as the operation switch 21, the head switch 26, and the paw switches 27, the various sensors such as the angular velocity sensor 18, the acceleration sensor 19, and the distance sensor 23, the speaker 20, the microphones 24, the light emitting unit 25, the actuators 281 to 28n, and the potentiometers 291 to 29n are connected to the signal processing circuit 14 of the control unit 16 via corresponding hubs 301 to 30n. On the other hand, the
CCD camera 22 and the battery 17 are directly connected to the signal processing circuit 14
respectively. The signal processing circuit 14 sequentially takes in the switch data supplied from the various switches described above, the sensor data supplied from the various sensors, the image data, and the audio data, and stores them sequentially at predetermined positions in the DRAM 11 via the internal bus 15. Further, the signal processing circuit 14 also sequentially takes in battery residual amount data representing the remaining battery capacity supplied from the battery 17, and stores it at a predetermined position in the DRAM 11. The switch data, sensor data, image data, voice data, and battery residual amount data stored in the DRAM 11 in this manner are used when the CPU 10 controls the operation of the robot apparatus 1. At the initial stage when the power of the robot apparatus 1 is turned on, the CPU 10 reads out the control program stored in the flash ROM 12 and stores it in the DRAM 11. Alternatively, the CPU 10 reads out the control program stored in a semiconductor memory device, for example the memory card 31, mounted in a PC card slot (not shown in FIG. 1) of the body unit 2, via the PC card interface circuit 13, and stores it in the DRAM 11. Then, based on the sensor data, image data, voice data, and battery residual amount data sequentially stored in the DRAM 11 from the signal processing circuit 14 as described above, the CPU 10 judges its own situation and the surrounding situation, as well as the presence or absence of instructions and actions from the user. Furthermore, the CPU 10 determines the subsequent action based on this judgment result and the control program stored in the DRAM 11, and drives the necessary actuators 281 to 28n based on the determination result, thereby swinging the head unit 4 and driving the leg units 3A to 3D so as to walk. At this time, the CPU 10 also generates audio data as necessary and supplies it to the speaker 20 as an audio signal through the signal processing circuit 14 to output sound based on the audio signal to the outside, and generates signals instructing lighting and extinguishing of the LEDs of the light emitting unit 25 described above to turn the light emitting unit 25 on and off. Thus, the robot apparatus 1 can act autonomously according to its own situation and surroundings and to instructions and actions from the user. (2) Software Configuration of Control Program Here, the software configuration of the above-described control program in the robot apparatus 1 is as shown in
FIG. 3. In FIG. 3, a device driver layer 40 is located at the lowest layer of the control program and comprises a device driver set 41 consisting of a plurality of device drivers. In this case, each device driver is an object permitted to directly access hardware used in an ordinary computer, such as the CCD camera 22 (FIG. 2) or a timer, and performs processing in response to an interrupt from the corresponding hardware. A robotic server object 42 is located in the lowest layer above the device driver layer 40 and is composed of a virtual robot 43 consisting of a software group that provides an interface for accessing hardware such as the various sensors and actuators 281 to 28n described above, a power manager 44 consisting of a software group that manages power switching and the like, a device driver manager 45 consisting of a software group that manages various other device drivers, and a designed robot 46 consisting of a software group that manages the mechanism of the robot apparatus 1.
The manager object 47 is composed of an object manager 48 and a service manager 49. The object manager 48 is a software group that manages the activation and termination of each software group included in the robotic server object 42, the middleware layer 50, and the application layer 51, and the service manager 49 is a software group that manages the connections between objects based on connection information between objects described in a connection file stored in the memory card 31 (FIG. 2). The middleware layer 50 is located in the layer above the robotic server object 42 and is composed of a software group that provides basic functions of the robot apparatus 1 such as image processing and audio processing. The application layer 51 is located in the layer above the middleware layer 50 and is composed of a software group that determines the actions of the robot apparatus 1 based on the processing results produced by the software groups constituting the middleware layer 50. The specific software configurations of the middleware layer 50 and the application layer 51 are shown in FIG. 4. As shown in FIG. 4, the middleware layer 50 comprises a recognition system 71 having signal processing modules 60 to 69 for noise detection, temperature detection, brightness detection, scale recognition, distance detection, attitude detection, contact detection, operation input detection, motion detection, and color recognition, together with an input semantics converter module 70, and an output system 80 having an output semantics converter module 79 and signal processing modules 72 to 78 for attitude management, tracking, motion reproduction, walking, fall recovery, LED lighting, and sound reproduction. The signal processing modules 60 to 69 of the recognition system 71 each take in the corresponding sensor data, image data, and voice data read out from the DRAM 11 (FIG. 2) by the virtual robot 43 of the robotic server object 42, perform predetermined processing based on that data, and provide the processing results to the input semantics converter module 70. Here, the virtual robot 43 is configured, for example, as a part that exchanges or converts signals according to a predetermined communication protocol.
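To make the data flow just described more concrete, the following sketch shows, in Python, how recognition-system modules feeding an input semantics converter might be organized. It is an illustrative assumption only: the class names, label strings, and callback interface are invented for the sketch and are not the actual middleware software of the robot apparatus.

```python
# Illustrative sketch of the recognition pipeline: signal processing modules
# (like modules 60-69) produce low-level results, an input-semantics-converter
# stand-in turns them into symbolic recognition labels for the application layer.
# All names here are assumptions, not the actual software described in the patent.

class SignalProcessingModule:
    """One recognition-system module (e.g. color recognition or contact detection)."""
    def __init__(self, name, process_fn):
        self.name = name
        self.process_fn = process_fn          # module-specific detection routine

    def process(self, sensor_data):
        return self.process_fn(sensor_data)   # a label such as "detected a ball", or None


class InputSemanticsConverter:
    """Stand-in for module 70: forwards symbolic labels to the application layer."""
    def __init__(self, application_layer):
        self.application_layer = application_layer

    def convert(self, module_results):
        for label in module_results:
            if label is not None:
                self.application_layer.on_recognition(label)


class ApplicationLayerStub:
    def on_recognition(self, label):
        print("recognized:", label)


# One recognition cycle with made-up sensor data.
modules = [
    SignalProcessingModule("color", lambda d: "detected a ball" if d.get("pink_blob") else None),
    SignalProcessingModule("contact", lambda d: "was hit" if d.get("head_pressed") else None),
]
converter = InputSemanticsConverter(ApplicationLayerStub())
converter.convert(m.process({"pink_blob": True, "head_pressed": False}) for m in modules)
```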
Based on the processing results provided from the signal processing modules 60 to 69, the input semantics converter module 70 recognizes self and surrounding conditions such as "noisy", "hot", "bright", "heard the musical scale of do-mi-so", "detected an obstacle", "detected a fall", "was hit", "was stroked", "detected a moving object", or "detected a ball", as well as commands and actions from the user, and outputs the recognition results to the application layer 51 (FIG. 5). As shown in FIG. 5, the application layer 51
is composed of five modules: a behavior model library 90, a behavior switching module 91, a learning module 92, an emotion model 93, and an instinct model 94. In the behavior model library 90, as shown in FIG. 6, independent behavior models 901 to 90n are provided corresponding to several preselected condition items, such as "when the remaining battery level is low", "when recovering from a fall", "when avoiding an obstacle", "when expressing an emotion", and "when a ball is detected". When a recognition result is given from the input semantics converter module 70, or when a predetermined time has elapsed since the last recognition result was given, these behavior models 901 to 90n determine the subsequent action, referring as necessary to the parameter value of the corresponding emotion held in the emotion model 93 and the parameter value of the corresponding desire held in the instinct model 94 as described later, and output the determination result to the behavior switching module 91. In the case of this embodiment, each behavior model 901 to 90n uses, as a method for determining the next action, an algorithm called a finite probability automaton, which probabilistically determines to which of the nodes (states) NODE0 to NODEn a transition is made from a given node, based on transition probabilities P1 to Pn set for the arcs ARC1 to ARCn connecting the nodes NODE0 to NODEn, as shown in FIG. 7. Specifically, each behavior model 901 to 90n holds, for each of the nodes NODE0 to NODEn forming its own behavior model, a state transition table 100 as shown in FIG. 8.
In this state transition table 100, input events (recognition results) serving as transition conditions at the nodes NODE0 to NODEn are listed in order of priority in the "input event name" column, and further conditions on those transition conditions are described in the corresponding rows of the "data name" and "data range" columns. Therefore, at the node NODE100 represented by the state transition table 100 of FIG. 8, when a recognition result of "ball detected (BALL)" is given, the condition for transitioning to another node is that the "size (SIZE)" of the ball given together with the recognition result is in the range of "0 to 1000", and when a recognition result of "obstacle detected (OBSTACLE)" is given, the condition is that the "distance (DISTANCE)" to the obstacle given together with the recognition result is in the range of "0 to 100". Further, at this node NODE100, even when no recognition result is input, a transition to another node is possible when, among the parameter values of the emotions and desires held in the emotion model 93 and the instinct model 94 that the behavior models 901 to 90n periodically refer to, the parameter value of any of "joy (JOY)", "surprise (SURPRISE)", or "sadness (SADNESS)" held in the emotion model 93 is in the range of "50 to 100". Further, in the state transition table 100, the names of the nodes to which a transition can be made from the nodes NODE0 to NODEn are listed in the "transition destination node" row of the "transition probability to other nodes" column, the transition probability to each of the other nodes NODE0 to NODEn to which a transition can be made when all the conditions described in the "input event name", "data name", and "data range" columns are satisfied is described in the corresponding part of the "transition probability to other nodes" column, and the action to be output when transitioning to that node NODE0 to NODEn is described in the "output action" row of the "transition probability to other nodes" column. The sum of the probabilities in each row of the "transition probability to other nodes" column is 100 [%]. Therefore, at the node NODE100 represented by the state transition table 100 of FIG. 8, for example, when a recognition result that "a ball was detected (BALL)" and that the "size (SIZE)" of the ball is in the range of "0 to 1000" is given, a transition to "node NODE120 (node 120)" can be made with a probability of "30 [%]", and at that time the action "ACTION 1" is output.
Each behavior model 901 to 90n is configured by connecting the nodes NODE0 to NODEn, each described by such a state transition table 100, and when a recognition result is given from the input semantics converter module 70, it probabilistically determines the next action using the state transition table of the corresponding node NODE0 to NODEn and outputs the determination result to the behavior switching module 91. The behavior switching module 91 shown in FIG. 5 selects, from among the actions respectively output from the behavior models 901 to 90n of the behavior model library 90, the action output from the behavior model 901 to 90n with the highest predetermined priority, and sends a command to execute that action (hereinafter referred to as an action command) to the output semantics converter module 79 of the middleware layer 50. In this embodiment, a higher priority is set for the behavior models 901 to 90n depicted lower in FIG. 6. The behavior switching module 91 also notifies the learning module 92, the emotion model 93, and the instinct model 94 of the completion of an action, based on the action completion information given from the output semantics converter module 79 after the action is completed. On the other hand, the learning module 92 receives, from among the recognition results given from the input semantics converter module 70, the recognition results of teaching received as actions from the user, such as "scolded" or "praised". Then, based on the recognition result and the notification from the behavior switching module 91, the learning module 92 changes the corresponding transition probabilities of the corresponding behavior models 901 to 90n in the behavior model library 90 so as to lower the occurrence probability of an action when the robot is "scolded" and to raise the occurrence probability of an action when it is "praised". On the other hand, the emotion model 93 holds a total of six emotions: "joy", "sadness", "anger", "surprise", "disgust", and "fear", and each emotion has a parameter representing its strength. The emotion model 93 periodically updates the parameter values of these emotions based on specific recognition results such as "hit" and "stroked" given from the input semantics converter module 70, the elapsed time, and the notification from the behavior switching module 91.
Specifically, the emotion model 93 calculates, by a predetermined arithmetic expression, the variation amount ΔE[t] of an emotion at a given time based on the recognition result given from the input semantics converter module 70, the action of the robot apparatus 1 at that time, the elapsed time since the previous update, and the like. Letting E[t] be the current parameter value of the emotion and ke a coefficient representing the sensitivity of the emotion, the parameter value E[t + 1] of the emotion in the next cycle is calculated by the following equation (1), and the parameter value of the emotion is updated by replacing the current parameter value E[t] with it. The emotion model 93 updates the parameter values of all the emotions in the same manner.

E[t + 1] = E[t] + ke × ΔE[t] ... (1)

Note that the degree to which each recognition result and the notification from the output semantics converter module 79 influence the variation amount ΔE[t] of the parameter value of each emotion is determined in advance; for example, a recognition result such as "hit" has a large influence on the variation amount ΔE[t] of the parameter value of the emotion "anger", and a recognition result such as "stroked" has a large influence on the variation amount ΔE[t] of the parameter value of the emotion "joy". Here, the notification from the output semantics converter module 79 is so-called action feedback information (action completion information), that is, information on the result of the appearance of an action, and the emotion model 93 also changes the emotions based on such information; for example, an action such as "barking" lowers the emotion level of anger. The notification from the output semantics converter module 79 is also input to the above-described learning module 92, and the learning module 92 changes the corresponding transition probabilities of the behavior models 901 to 90n based on the notification. The feedback of the action result may also be made via the output of the behavior switching module 91 (the action to which the emotion is added). On the other hand, the instinct model 94 holds four mutually independent desires: "exercise", "affection", "appetite", and "curiosity", and each desire has a parameter representing its strength.
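The emotion parameters, and the desire parameters described next, are both updated by the same simple recurrence: E[t + 1] = E[t] + ke × ΔE[t] (equation (1)) and I[k + 1] = I[k] + ki × ΔI[k] (equation (2)). The sketch below only illustrates that update: the coefficient values and the variation amounts are invented, and the clamping helper reflects the statement later in the text that the parameter values are regulated to the range 0 to 100.

```python
def update_parameter(current, delta, sensitivity):
    """E[t+1] = E[t] + ke * dE[t] (equation (1)); the desires use the same form,
    I[k+1] = I[k] + ki * dI[k] (equation (2)). Values are clamped to 0..100,
    the range stated elsewhere in the text."""
    return max(0.0, min(100.0, current + sensitivity * delta))

# Illustrative values only: ke and the variation amounts are not given in the text.
emotions = {"joy": 50.0, "anger": 20.0}
ke = {"joy": 0.8, "anger": 1.2}

# A "hit" recognition result is said to strongly affect "anger",
# a "stroked" result to strongly affect "joy"; the magnitudes below are assumed.
delta_from_recognition = {"anger": +15.0, "joy": -5.0}

for name, delta in delta_from_recognition.items():
    emotions[name] = update_parameter(emotions[name], delta, ke[name])

print(emotions)   # {'joy': 46.0, 'anger': 38.0}
```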
Then, the instinct model 94 periodically updates the parameter values of these desires based on the recognition results given from the input semantics converter module 70, the elapsed time, the notification from the behavior switching module 91, and the like. More specifically, for "exercise", "affection", and "curiosity", the instinct model 94 calculates, by a predetermined arithmetic expression, the variation amount ΔI[k] of the desire at a given time based on the recognition results, the elapsed time, the notification from the output semantics converter module 79, and the like. Letting I[k] be the current parameter value of the desire and ki a coefficient representing the sensitivity of the desire, the parameter value I[k + 1] of the desire in the next cycle is calculated by the following equation (2), and the parameter value of the desire is updated by replacing the current parameter value I[k] with the calculation result. The instinct model 94 updates the parameter values of the respective desires other than "appetite" in the same manner.

I[k + 1] = I[k] + ki × ΔI[k] ... (2)

Note that the degree to which the recognition results and the notification from the output semantics converter module 79 influence the variation amount ΔI[k] of the parameter value of each desire is determined in advance; for example, the notification from the output semantics converter module 79 has a large influence on the variation amount ΔI[k] of the parameter value of "fatigue". In the present embodiment, the parameter values of each emotion and each desire (instinct) are regulated so as to vary in the range of 0 to 100, and the values of the coefficients ke and ki are also set individually for each emotion and each desire. On the other hand, as shown in FIG. 4, the output semantics converter module 79 of the middleware layer 50 receives abstract action commands such as "advance", "express joy", "speak", or "tracking (follow the ball)" given from the behavior switching module 91 of the application layer 51 as described above, and gives these action commands to the corresponding signal processing modules 72 to 78
of the output system 80. When given an action command, these signal processing modules 72 to 78 generate, based on the action command, the servo command values to be given to the corresponding actuators 281 to 28n (FIG. 2) in order to perform the action, the voice data of sound to be output from the speaker 20 (FIG. 2), and/or the drive data to be given to the LEDs of the light emitting unit 25 (FIG. 2), and sequentially send these data to the corresponding actuators 281 to 28n, the speaker 20, or the light emitting unit 25 via the virtual robot 43 of the robotic server object 42 and the signal processing circuit 14 (FIG. 2).
In this manner, the robot apparatus 1 can, based on the control program, act autonomously according to its own (internal) situation, the surrounding (external) situation, and instructions and actions from the user. (3) Behavior Control Method in the Robot Device Here, the action control method of the robot device 1 having the above-described configuration, in which the voice, face, movement, and the like of a target person are detected and actions are executed based on the detection results, will be described.
apparatus in the present embodiment detects the voice of the object person by means of the
microphone 24 which is the voice detection means shown in FIG. 1, and estimates the sound
source direction based on the voice data. Further, the face of the object is detected based on the
image data acquired by the CCD camera 22 by the imaging means shown in FIG. Furthermore,
moving object detection is performed based on the image data. Then, the robot apparatus itself
starts to move in one of the directions of the estimated sound source direction, the face direction
based on the face detection result, and the moving body direction based on the moving body
detection result. The condition is to stop the movement. Here, in the present embodiment, when a
voice, a face, and a moving body are detected, the face detection result is preferentially used to
be reflected in the action. This is because face detection can be detected with the highest degree
of accuracy, but by further using the voice detection and moving object detection results, the
detection efficiency is improved, and the detection results are reflected on the operation of the
robot apparatus. It is intended to improve entertainment. FIG. 9 is a block diagram showing
components necessary for controlling the action of the robot apparatus shown in FIGS. 2 to 6 by
voice, face, and moving object detection. As shown in FIG. 9, the image data captured by the CCD
camera 22 and the audio data detected by the microphone 24 are stored in a predetermined
04-05-2019
10
location of the DRAM 11 and are stored in the virtual robot 43 in the robotic server object 42.
Supplied. The virtual robot 43 reads the image data from the DRAM 11, supplies the image data
to the moving object detection module 32 and the face detection module 33 in the middleware
layer 50, reads the sound data, and supplies the sound data to the sound source direction
estimation module 34. In each module, a moving object detection process, a face detection
process, and a sound source direction estimation process described later are performed, and the
detection process result is supplied to the behavior model library 90 in the application layer 51.
The behavior model library 90 determines the subsequent behavior by referring to the parameter
value of emotion and the parameter value of desire as necessary, and gives the determination
result to the behavior switching module 91. The behavior switching module 91 then sends an action command based on the determination result to the tracking signal processing module 73 and the walking module 75 in the output system 80 of the middleware layer 50. When given the action command, the tracking signal processing module 73 and the walking module 75 generate the servo command values to be given to the corresponding actuators 281 to 28n in order to perform the action based on the action command, and sequentially send these data to the corresponding actuators 281 to 28n through the virtual robot 43 of the robotic server object 42 and the signal processing circuit 14 (FIG. 2). As a result, the action of the robot apparatus 1 is controlled, and, for example, an action such as approaching a target object appears.
First, the face detection process in the face detection module 33 will be described in detail. The face detection module 33 can perform face detection by using, for example, an average frontal face template image and determining the correlation between the input image and the template image. The face detection module 33 comprises a template matching unit (not shown) that takes as an input image a frame image obtained by imaging with imaging means such as the CCD camera and determines the correlation between the input image and a template image of a predetermined size representing an average face image, a determination unit (not shown) that determines whether or not a face image is included in the input image based on the correlation, and a face extraction unit (not shown) that extracts the face image when it is determined that a face image is included. In order to match the size of the face to that in the prepared template image, the input image supplied to the template matching unit is, for example, an image cut out to a predetermined size after converting the frame image into a plurality of scales, and the template matching unit performs matching on the input image at each scale. As the template image, for example, an average face image obtained by averaging about 100 persons can be used. The determination unit determines that a face image is included in the input image when the template matching in the template matching unit yields a correlation value equal to or greater than a predetermined threshold, and the face extraction unit extracts the corresponding face area.
Here, if all of the matching results in the determination unit are less than the predetermined threshold, it is determined that the input image does not include the face represented by the template image, and the determination result is returned to the template matching unit. When it is determined that the input image does not include a face image, the matching unit performs matching with the image at the next scale, and the determination unit determines, based on the matching result between that scale image and the template image, whether a face image is included in the scale image; as described above, when the correlation value is equal to or greater than the predetermined threshold, it is determined that a face image is included. Matching is performed with all the scale images, and if no face is detected, the next frame image is processed. In addition, since the template matching is generally performed using an average face image taken from the front, a face that is not facing the front (hereinafter referred to as a non-frontal face) is difficult to detect. For example, if the CCD camera for acquiring images is mounted on, for example, the face of the robot apparatus, then when a user looks into the robot apparatus while it is lying on its back, the captured face image becomes a non-frontal face oriented oppositely to a normal frontal face, that is, a frontal face rotated by approximately 180 degrees. Therefore, in order to enable face detection even when such a non-frontal face is captured, the template image of the frontal face is used first, and when no face is detected even using the frontal-face template image, the template image is rotated by a predetermined angle and used; when a face is then detected, the template image at the rotation angle at the time of detection is used for matching with the next input image. In this case as well, the face detection process may be accelerated by storing the previous rotation angle. Thus, the face detection module detects a face from the image data, and based on the detection result the robot apparatus can perform actions such as approaching the direction of the detected face, turning toward the face direction, or tracking it. Next, moving object detection in the moving object detection module 32 will be described in detail. In the moving object detection process, the moving object detection module in the recognition system 71 of the middleware layer 50 shown in FIG. 4 detects a moving object in the image data captured by the CCD camera 22 (FIG. 2), and the robot takes actions such as facing or tracking in the detected moving object direction.
For example, a difference image between frames can be generated, and a moving object can be detected from this difference image. Since the inter-frame difference value becomes 0 for a region with no motion, the difference corresponding to a moving body vanishes when its motion stops. For example, as shown in FIG. 10, when difference image data D1 to D3 are generated from image data P1 to P4 obtained by imaging a person at times t1 to t4, respectively, if the face is stationary between times t3 and t4, the difference data of the face disappears from the difference image data D3. That is, the disappearance of the moving body from the difference image data does not mean that the moving body has left that spot, but rather that the moving body is present at the place where it disappeared. Therefore, the robot apparatus 1 can detect the position of the moving body by detecting the moment when this difference becomes zero. With such moving object detection, for example, by orienting the CCD camera 22 toward the center-of-gravity position in the immediately preceding difference image, the robot can face or approach that center-of-gravity position. That is, as shown in the flowchart of FIG. 11, first, in step S1, the moving body is detected by calculating the center of gravity of the difference image data, and in step S2 it is determined whether the detected moving body has disappeared from the difference image data. If the moving body has not disappeared in step S2 (No), the process returns to step S1. On the other hand, if the moving body has disappeared in step S2 (Yes), the process proceeds to step S3, and the robot turns toward, or approaches, the direction of the lost position, that is, the direction of the center-of-gravity position in the immediately preceding difference image. The moving object also disappears from the difference image when it moves out of the visual range of the robot device 1; in this case as well, by facing the direction of the center-of-gravity position detected last in step S3 described above, the robot can turn approximately toward the direction of the moving body. As described above, the robot device 1 detects that the moving object has disappeared from the difference image data because it has stopped within the visual range, and by facing the direction of its center-of-gravity position it can realize, for example, an autonomous interaction of sensing the presence of the moving body and turning in its direction. In addition, by detecting that the moving object has left the visual range and disappeared from the difference image data and facing the direction of the center-of-gravity position detected last, the robot can turn approximately toward the direction of the moving object. Further, the robot apparatus 1 can not only detect the disappearance of the moving object from the difference image data, but can also track the moving body by facing the center-of-gravity direction detected at predetermined time intervals or whenever the center-of-gravity position of the moving body is about to leave the visual range.
That is, as shown in the flowchart of FIG. 12, first, in step S10, the moving object is detected by calculating the center of gravity of the difference image data, and in step S11 the robot faces the direction of the center-of-gravity position detected at predetermined time intervals or whenever the moving object is about to go out of the visual range. Here, in addition to the case where the moving object disappears from the difference image data as described above, when the movement of the robot device 1 itself in step S11 is large, the robot device 1 can no longer distinguish, even with motion compensation, the apparent motion caused by its own movement from the motion of the moving object, and loses sight of the moving object. Therefore, in step S12 it is determined whether the moving body has been lost. If the moving body has not been lost in step S12 (No), the process returns to step S10. On the other hand, if the moving object has been lost in step S12 (Yes), the process proceeds to step S13 and the robot faces the direction of the center of gravity detected last. As described above, the robot device 1 faces the detected center-of-gravity direction at predetermined time intervals or whenever the moving body is about to go out of the visual range, and faces the direction of the center-of-gravity position detected last when it loses sight of the moving body; in this way it becomes possible, by a simple method, to detect and track the moving body in the image captured by the CCD camera 22 provided in the head unit 4. In this moving object detection processing, first, the virtual robot 43 of the robotic server object 42 shown in FIG. 9 reads out frame-by-frame image data captured by the CCD camera 22 from the DRAM 11 and sends this image data to the moving object detection module 32 included in the recognition system 71 of the middleware layer 50. Then, each time image data is input, the moving object detection module 32 takes the difference from the image data of the immediately preceding frame to generate difference
image data. For example, when the difference image data D2 between the image data P2 and the image data P3 described above is generated, the luminance value D2(i, j) of the difference image data D2 at the position (i, j) is obtained by subtracting the luminance value P2(i, j) of the image data P2 at the position (i, j) from the luminance value P3(i, j) of the image data P3 at the same position. The same calculation is performed for all the pixels to generate the difference image data D2. Then, the barycentric position G(x, y) is calculated for the portion of the difference image data in which the luminance value is larger than the threshold value Th.
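The inter-frame difference and the centroid computation just described can be sketched as follows. Since equations (3) and (4) are not reproduced in this translation, the unweighted mean of the above-threshold pixel coordinates is assumed here for the barycentric position G(x, y); the threshold value is also illustrative.

```python
import numpy as np

def difference_image(prev_frame, curr_frame):
    """D(i, j) = P_curr(i, j) - P_prev(i, j), as described for D2 = P3 - P2."""
    return curr_frame.astype(np.int16) - prev_frame.astype(np.int16)

def centroid_above_threshold(diff, th=30):
    """Barycentric position G(x, y) of the pixels whose absolute difference exceeds Th.
    An unweighted mean of the coordinates is assumed, since equations (3) and (4)
    are not reproduced in the translation. Returns None when nothing moved."""
    ii, jj = np.nonzero(np.abs(diff) > th)
    if ii.size == 0:
        return None                                  # the moving body has stopped
    return float(jj.mean()), float(ii.mean())        # (x, y) in image coordinates

# Usage: something moves between P2 and P3, then stays still between P3 and P4,
# so the difference vanishes and the last centroid gives the direction to face.
p2 = np.zeros((60, 80), dtype=np.uint8)
p3 = p2.copy(); p3[20:30, 40:50] = 200
p4 = p3.copy()

last_centroid = centroid_above_threshold(difference_image(p2, p3))
if centroid_above_threshold(difference_image(p3, p4)) is None and last_centroid:
    print("turn toward", last_centroid)              # centre of gravity detected last
```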
Here, x and y are calculated using equations (3) and (4), respectively. Thus, as shown in FIG. 13, for example, the center-of-gravity position G2 is obtained from the difference image data D2 between the image data P2 and the image data P3 described above. The data of the determined center-of-gravity position is sent to the behavior model library 90 of
the application layer 51. As described above, the behavior model library 90 determines the
subsequent behavior with reference to the parameter value of emotion and the parameter value
of desire as necessary, and gives the determination result to the behavior switching module 91.
For example, when the moving object disappears from the difference image data, an action that
points or approaches the center of gravity position detected immediately before is determined,
and the determination result is given to the action switching module 91. Further, in the case of
tracking a moving object at predetermined time intervals, an action that points or approaches the
detected center of gravity position at each time interval is determined, and the determination
result is given to the action switching module 91. Then, the action switching module 91 sends an
action command based on the determination result to the tracking signal processing module 73
in the output system 80 of the middleware layer 50. When given the action command, the tracking signal processing module 73 generates the servo command values to be given to the corresponding actuators 281 to 28n in order to perform the action based on the action command, and this data is sequentially transmitted to the corresponding actuators 281 to 28n through the virtual robot 43 of the robotic server object 42 and the signal processing circuit 14 (FIG. 2). As a result, for example, when the moving body disappears from the difference image data, the behavior model library 90 determines an action of facing or approaching the center-of-gravity position detected immediately before, and the behavior switching module 91 generates an action command for performing that action. When tracking a moving object at predetermined time intervals, the behavior model library 90 determines an action of facing or approaching the center of gravity detected at each time interval, and the behavior switching module 91 generates an action command for causing that action to appear.
Then, when this action command is given to the tracking signal processing module 73, the tracking signal processing module 73 sends servo command values based on the action command to the corresponding actuators 281 to 28n, and an action appears in which the robot apparatus 1 takes an interest in the moving body, turning its head toward it or approaching its direction. Next, the sound source estimation process in the sound source direction estimation module 34 will be described in detail. As described above, the head unit 4 of the robot apparatus 1 is provided with the microphones 24 corresponding to the left and right "ears", and the robot apparatus 1 can estimate the sound source direction using the microphones 24. Specifically, for example, as described in "Oga, Yamazaki, Kanada, Acoustic Systems and Digital Processing" (The Institute of Electronics, Information and Communication Engineers), p. 197, the sound source direction can be estimated using the one-to-one relationship between the sound source direction and the time difference between the signals received by multiple microphones. That is, as shown in FIG. 14, when a plane wave arriving from the direction θs is received by two microphones M1 and M2 placed a distance d apart, the received sound signals (audio data) x1(t) and x2(t) of the microphones M1 and M2 satisfy the relationships of the following equations (5) and (6), where c is the speed of sound and τs is the time difference between the signals received by the two microphones M1 and M2:

x2(t) = x1(t − τs) ... (5)

τs = (d sin θs) / c ... (6)

Therefore, if the time difference τs between the received sound signals x1(t) and x2(t) is known, the arrival direction of the sound wave, that is, the sound source direction, can be determined by the following equation (7):

θs = sin⁻¹(c τs / d) ... (7)

Here, the time difference τs can be obtained from the cross-correlation function φ12(τ) between the received sound signals x1(t) and x2(t), given by the following equation (8), where E[·] denotes the expected value:

φ12(τ) = E[x1(t) · x2(t + τ)] ... (8)

From the above equations (5) and (8), the cross-correlation function φ12(τ) is expressed as the following equation (9), where φ11(τ) is the autocorrelation function of the received sound signal x1(t):

φ12(τ) = E[x1(t) · x1(t + τ − τs)] = φ11(τ − τs) ... (9)

Since the autocorrelation function φ11(τ) is known to take its maximum value at τ = 0, it follows from equation (9) that the cross-correlation function φ12(τ) takes its maximum value at τ = τs. Therefore, by calculating the cross-correlation function φ12(τ) and finding the τ that gives its maximum value, τs is obtained, and by substituting it into the above equation (7) the arrival direction of the sound wave, that is, the sound source direction, can be obtained. Then, the difference between the direction in which the robot apparatus 1 is currently facing and the direction of the sound source is calculated, and the relative angle of the sound source direction with respect to the direction of the trunk is obtained. Here, as shown in FIG. 15(a), the robot apparatus 1 estimates the position of a sound source A from the difference in the distances between the sound source and the microphones 24R/L provided at different positions on the head unit, that is, from the time difference between the received sound signals. However, for a certain point A, consider a point B that is line-symmetrical to it with respect to the straight line 101 connecting the left and right microphones 24L and 24R, that is, points A and B whose distances LA and LB from the straight line 101 are equal. Since the distances between the point A and the left and right microphones 24R/L are equal to the corresponding distances between the point B and the microphones 24R/L, the time differences between the received sound signals are also equal. Therefore, from the time difference between the received sound signals alone, the direction of the sound source cannot be specified.
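A hedged sketch of the cross-correlation estimate of τs and of equation (7) follows. The sampling rate, microphone spacing, and synthetic signals are illustrative assumptions (in the apparatus the two signals come from the left and right microphones 24L/24R), and the comments note the left-right symmetry ambiguity just described.

```python
import numpy as np

def estimate_direction(x1, x2, fs, d, c=343.0):
    """Estimate the sound source direction from two received signals.
    tau_s is the lag maximizing the cross-correlation phi_12(tau) (equations (8)-(9));
    theta_s = arcsin(c * tau_s / d) (equation (7)). A source and its mirror image
    about the line joining the microphones give the same time difference, which is
    the ambiguity resolved in the text by the history and by rotating the head."""
    corr = np.correlate(x2, x1, mode="full")          # phi_12 over all integer lags
    lags = np.arange(-len(x1) + 1, len(x2))
    tau_s = lags[np.argmax(corr)] / fs                # lag converted to seconds
    arg = np.clip(c * tau_s / d, -1.0, 1.0)
    return float(np.degrees(np.arcsin(arg)))

# Usage with a synthetic delayed copy (assumed: 16 kHz sampling, microphones 0.1 m
# apart, true delay of 3 samples, i.e. x2(t) = x1(t - tau_s)).
fs, d, delay = 16000, 0.10, 3
x1 = np.random.default_rng(0).standard_normal(2048)
x2 = np.concatenate([np.zeros(delay), x1[:-delay]])
print(estimate_direction(x1, x2, fs, d))              # about 40 degrees
```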
Therefore, in the present embodiment, the sound source direction is stored as a history of information on the sound source direction, whether it is the single sound source direction specified last time or the two estimated sound source directions, and when the sound source direction cannot be identified, the previous history is referred to. If the previous sound source direction was specified as a single direction, then, since the target does not move very far in a short time, it can be assumed that the sound source of the current sound data is highly likely to lie in the same direction as the sound source of the previous sound data. If the previous sound source direction was not specified as a single direction, or if there is no history at all, the head unit on which the microphones are mounted is rotated as described below, and the sound source direction can be identified from the audio data obtained before and after the rotation. That is, in FIG. 15(a), when the point A on the upper left side as seen from the robot apparatus is actually the sound source, the right microphone 24R comes closer to the sound source point when the head unit 4 is turned to the left.
That is, the sound source direction can be specified by the time difference between the left and
right received sound signals before and after the rotation. Similarly, when point B at the lower
left side of the screen is a sound source, the left microphone 24L is closer to point B when the
head unit 4 is turned to the left, so that it can be identified that the sound source direction is
point B. Thus, whether the actual sound source is the point A or the point B can be specified from
the sound data before and after the rotation. Thus, the sound source direction is estimated by the
sound source direction estimation module 34, and based on the result of the sound source
direction estimation, the robot device can express an operation such as pointing toward the
estimated sound source direction or approaching the sound source. Next, a control method for
controlling the action of the robot apparatus based on the face detection result, the moving
object detection result, and the sound source direction estimation result will be described. In the
present embodiment, when the face detection module 33 of the robot apparatus 1 detects a face,
the robot apparatus 1 is controlled to start walking in the face direction and perform an
operation approaching the face detection target. Here, the face direction indicates, for example, a
direction in which the center of gravity of the face area substantially overlaps the vertical line
passing through the center of the screen. In addition, when a moving object is detected by the
moving object detection module 32, an operation approaching the moving object is performed by
starting walking in the gravity center position direction (moving object direction) in the
difference image, and a sound source estimation direction is detected. It is controlled to start
04-05-2019
16
walking in the direction and perform an operation to approach the sound source. Here, when
face detection, moving object detection, and sound source direction estimation are
simultaneously performed, control is performed such that the face detection result is
preferentially used. That is, for example, when the estimated sound source direction and the
detected face direction are different, it is controlled to move in the face direction. FIG. 16 is a
flowchart showing an action control method of the robot apparatus in the present embodiment.
As shown in FIG. 16, first, the robot apparatus stands by while swinging at constant intervals
(step S21). Next, voice detection, moving object detection, and face detection are sequentially
determined (steps S22 to S24). If none of them is detected, the process returns to step S21 again
to enter a standby state. On the other hand, when a face is detected in step S24, the face position is identified, for example by rotating the head unit so that the center of gravity of the detected face area lies on the vertical line passing through the center of the screen (step S31), and walking toward the face is started (step S32).
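As an illustration of step S31, the head pan needed to bring the detected face onto the vertical center line can be sketched as follows; the field-of-view value and the linear pixel-to-angle mapping are assumptions of this sketch only.

```python
def pan_angle_to_center_face(face_cx, image_width, horizontal_fov_deg=55.0):
    """Head pan (degrees) that brings the face-area centroid onto the vertical
    line through the screen centre (step S31).

    face_cx is the x coordinate (pixels) of the centre of gravity of the
    detected face area; the field-of-view value is an assumed example.
    """
    offset = face_cx - image_width / 2.0       # pixel offset from the centre line
    return (offset / image_width) * horizontal_fov_deg
```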
During walking, walking is continued until one of the ending conditions described later is satisfied. When a voice is detected in step S22, the direction of the sound source is specified. As described above, when it is difficult to specify the sound source direction uniquely, that is, when two candidate sound source directions are calculated, it is checked whether a history of sound source directions exists (step S25). If there is a history, the current sound source direction is specified by referring to it, and walking toward that sound source direction is started (step S29). If no history is found in step S25, the robot apparatus 1 rotates (swings) the head unit carrying the voice detection means, stores the two calculated sound source directions (step S26), and returns to the standby state. When voice data is detected again, the sound source direction can then be specified, since the candidate directions have been stored in step S26. Although step S26 here stores the history of the two calculated candidate directions, it is also possible to store only the single sound source direction specified last time; that is, when step S25 determines that there is no history, the head unit is rotated, one sound source direction is calculated from the voice data before and after the rotation, and that direction is stored as the history. When a moving object is detected in step S23, the head unit is rotated so as to track the movement of the moving object (step S27), and the position at which the moving object comes to rest is detected (step S28). Tracking is continued until the moving object is found to be still, and walking is then started, for example, toward the center of gravity in the difference data of the preceding and following image data (step S29). When walking has been started in step S29 on the basis of voice detection or moving object detection, it is periodically determined whether a face is detected (step S30); if a face is detected even while walking, face detection processing is performed. That is, the face position is identified, for example by rotating the head so that the face area comes to the center of the screen (step S31), and walking toward the face is started (step S32). Next, the end determination in step S32 will be described.
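Before turning to the end determination, the use of the sound source direction history in steps S25 and S26 can be illustrated with a minimal sketch; the data representation and function name are assumptions of this sketch, not part of the embodiment.

```python
def resolve_sound_direction(candidates, history):
    """Resolve candidate sound source directions using the stored history
    (steps S25 and S26 of FIG. 16). Returns a direction in degrees, or None
    when the head unit must first be rotated and new voice data collected.
    Angle wrap-around is ignored for brevity."""
    if len(candidates) == 1:
        return candidates[0]
    if history is not None:
        # an object does not move far in a short time: keep the candidate
        # closest to the previously specified direction
        return min(candidates, key=lambda c: abs(c - history))
    return None  # no history: swing the head, store both candidates, re-detect
```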
In the present embodiment, walking is started by face detection or the like, and walking is stopped when a predetermined ending condition is satisfied. The conditions for the end determination include the following: (i) the face direction in which the face is detected is in front of the robot apparatus and the distance to the face detection target is equal to or less than a predetermined distance; (ii) the distance to the target object is equal to or less than a predetermined distance; (iii) a predetermined spoken word is detected; and (iv) the contact sensor detects a touch. Walking is stopped when any one of these termination conditions is met.
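A minimal sketch of such a termination check is shown below; the 22.5° and 40 cm thresholds are the example values given in the following paragraphs, and the argument format is an assumption of this sketch.

```python
def should_stop_walking(face_angle_deg, distance_cm, heard_stop_word, touched):
    """Return True when any of the termination conditions (i)-(iv) is met.

    face_angle_deg: horizontal angle of the detected face from the robot's front,
                    or None when no face is currently detected.
    distance_cm:    measured or estimated distance to the target, or None.
    """
    face_in_front = face_angle_deg is not None and abs(face_angle_deg) <= 22.5
    close_enough = distance_cm is not None and distance_cm <= 40.0
    if face_in_front and close_enough:   # condition (i)
        return True
    if close_enough:                     # condition (ii)
        return True
    if heard_stop_word:                  # condition (iii)
        return True
    if touched:                          # condition (iv)
        return True
    return False
```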
Whether the face direction is in front of the robot apparatus is judged as shown in FIG. 17: with θ1 denoting the vertical viewing angle of the robot apparatus 1, the moving direction is controlled so that the face 301 of the object 300, seen from the front of the robot apparatus 1, stays within a range of, for example, ±22.5° in the horizontal direction of the screen. The distance H to the face detection target or the target object 300 is detected by, for example, a PSD (Position Sensitive Device) or another distance sensor, or estimated from the size of the face area in the image, the size of the target object, and the like, and stop control can be performed when the distance becomes, for example, 40 cm or less. When the vertical viewing angle of the robot apparatus 1 is, for example, 55°, the face of the object may no longer be detectable at the specified distance; in such a case, only the distance data to the object can be used as the stop condition. Walking is also stopped when a predetermined spoken word, for example a command to sit, is detected. In addition, when the user touches the robot's head or the robot touches an obstacle, the touch sensor detects the contact and walking is stopped. When a touch is detected by a touch sensor other than the one on the head, it can be determined that the robot has most likely touched an obstacle, and an action may be generated on the spot to perform operation control that avoids the obstacle. FIG. 18 is a view schematically showing a
walking path when the robot apparatus approaches an object. As shown in FIG. 18, when the robot apparatus 1, swinging its head, detects the face of the object 300 at an angle θ2 from its own posture direction C, the movement looks more natural if, rather than walking straight at the object 300, the robot turns through θ2 while following the arc D.
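A minimal sketch of such an arc-shaped approach, which keeps a forward speed while the heading turns through θ2, is shown below; the speed and duration values are illustrative only and are not those of the embodiment.

```python
import math

def arc_approach_command(theta2_deg, forward_speed=0.1, approach_time=4.0):
    """Velocity command for approaching along an arc (FIG. 18).

    Instead of rotating on the spot and then walking straight, the robot keeps a
    forward speed while its heading changes by theta2 over the approach, which
    traces the arc D. Returns (linear speed in m/s, angular speed in rad/s).
    """
    angular_speed = math.radians(theta2_deg) / approach_time
    return forward_speed, angular_speed
```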
By controlling the movement in this way, the visual effect can be improved. Next, the operation of the robot apparatus approaching the object will be described in more detail. As described above, the robot apparatus 1 starts walking in a predetermined direction upon face detection, voice detection, or moving object detection; at this time, making the vertical angle of the head unit differ according to the object can further improve the entertainment value. FIGS. 19(a) and 19(b) are views showing the robot apparatus in a walking state, and are side views of the robot apparatus according to the present embodiment and according to the prior art, respectively. As shown in FIG. 19(a), when detecting a human face and closely following a person, for example, tilting the field of view upward compared with the conventional walking posture shown in FIG. 19(b) improves the ability to detect and follow human faces. Even when moving in the direction of a voice or a moving object, walking with the head tilted upward gives the impression that the robot apparatus 1 is looking up at the person who spoke to it, which produces a favorable visual effect. Furthermore, by changing the vertical angle of the head unit 4, and hence the field of view, depending on whether the walking target is a human or an object such as a ball, that is, by changing the pattern of the face position, the behavior becomes closer to that of an animal and the entertainment value is improved. In the present embodiment, by using the voice and moving object detection results together with the face detection result to control the operation of approaching a target, erroneous operation can be reduced. Moreover, when voice data is detected and the sound source direction is estimated, the estimation rate of the sound source direction can be improved by using the history of past sound source direction information. In addition, the approaching motion is stopped when the robot has come within a predetermined distance or when a predetermined call is detected, and the robot does not simply head straight for the face direction, the sound source direction, or the moving object direction but walks along an arc and keeps its field of view upward so as to look at the user's face; the resulting motion is close to that of a real animal, and the entertainment value of the pet-type robot apparatus can be improved. The present invention is not limited to the above-described embodiment, and it goes without saying that various modifications can be made without departing from the scope of the present invention.
For example, in the above embodiment, face detection, voice detection, and moving object detection are all used to realize the operation of approaching an object, but it is also possible to control the approaching operation using only the voice detection result. In voice detection, the history of previously estimated sound source directions is used, and if there is no history the head unit is rotated to specify the sound source direction, so the sound source direction can be estimated accurately and reflected in the action. Further, although the above embodiment has been described in terms of a software configuration, the present invention is not limited to this, and at least a part of it can be realized by hardware. As described above in detail, the
robot apparatus according to the present invention is a robot apparatus that executes operations in response to external actions and/or autonomous operations based on an internal state, and comprises voice detection means for detecting voice, sound source direction estimation means for estimating the sound source direction from the voice data detected by the voice detection means, and control means for controlling movement toward the sound source direction estimated by the sound source direction estimation means, the sound source direction estimation means estimating the current sound source direction on the basis of a history of information on previously estimated sound source directions. When the sound source direction is estimated from voice data, it may not be possible to estimate it from the current voice data alone; in that case the current sound source direction can be estimated by referring to the history of past sound source directions, and the result can be reflected in operations such as approaching the sound source direction, so that the entertainment value can be improved. A robot apparatus according to the present invention is also a robot apparatus that executes operations in response to external actions and/or autonomous operations based on an internal state, and comprises imaging means, voice detection means for detecting voice, moving object detection means for detecting a moving object from the image data captured by the imaging means, face detection means for detecting a human face from the image data, sound source direction estimation means for estimating the sound source direction from the voice data detected by the voice detection means, and control means for controlling at least the moving object detection, the face detection, and the sound source direction estimation so that the face detection is performed preferentially. When a plurality of processes such as face detection, moving object detection, and voice detection are performed in parallel, face detection is given priority, whereby the identification rate of the target object is improved and the result can be reflected in actions such as moving toward the target object, so that the entertainment value can be improved.
BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a perspective view showing an appearance
configuration of a robot apparatus according to an embodiment of the present invention. FIG. 2 is
a block diagram showing a circuit configuration of the robot apparatus. FIG. 3 is a block diagram
showing a software configuration of the robot apparatus. FIG. 4 is a block diagram showing the
configuration of a middle wear layer in the software configuration of the robot apparatus. FIG. 5
is a block diagram showing the configuration of an application layer in the software
configuration of the robot device. FIG. 6 is a block diagram showing a configuration of a behavior
model library of the application layer. FIG. 7 is a diagram used to explain a finite probability
automaton serving as information for determining the action of the robot apparatus. FIG. 8 is a
diagram showing a state transition table prepared for each node of the finite probability
automaton. FIG. 9 is a block diagram showing components necessary for face detection, voice
detection, moving object detection and control of an action thereof in the robot apparatus shown
in FIGS. 2 to 6; FIG. 10 is a schematic view showing image data P1 to P4 and difference image
data D1 to D3 generated therefrom. FIG. 11 shows a method of performing an operation of
pointing toward the direction of the position of the center of gravity or an operation of
approaching the direction of the position of the center of gravity in the differential image when
the moving object disappears in the moving object detection module of the robot apparatus
according to the embodiment of the present invention It is a flowchart. 12 is a moving body
detection module of a robot apparatus according to an embodiment of the present invention, the
motion or moving body is directed at the center of gravity detected at predetermined time
intervals or every time the position of the center of gravity of the moving body deviates from the
visual range. Is a flowchart showing a method of performing an operation of tracking FIG. 13 is a
schematic diagram for explaining a gravity center position G2 obtained from difference image
data D2 between the image data P2 and the image data P3. FIG. 14 is a schematic diagram for
explaining the principle of estimating the sound source direction in the sound source direction
estimation module of the robot device according to the embodiment of the present invention; FIG.
15 is a schematic view illustrating a method of specifying a sound source direction. FIG. 16 is a
flowchart showing an action control method of the robot device according to the embodiment of
the present invention. FIG. 17 is a schematic view for explaining an example of the walking stop
condition of the robot apparatus according to the embodiment of the present invention. FIG. 18 is
a view schematically showing a walking path when the robot apparatus according to the
embodiment of the present invention approaches an object.
FIGS. 19(a) and 19(b) are diagrams showing the robot apparatus in a walking state, and are side views of the robot apparatus according to the embodiment of the present invention and according to the prior art, respectively. [Explanation of the code] 1 robot apparatus, 10 CPU, 11 DRAM, 14 signal processing circuit, 22 CCD camera, 281 to 28n actuators, 33 face detection module, 42 robotic server object, 43 virtual robot, 50 middleware layer, 51 application layer, 68 motion detection signal processing module, 70 input semantics converter module, 71 recognition system, 73 tracking signal processing module, 75 walking module, 79 output semantics converter module, 80 output system, 90 behavior model library, 91 behavior switching module
direction estimation means, sound source direction estimation means for estimating a sound
source direction from sound data detected by the sound detection means; The sound source
direction estimation means estimates the current sound source direction based on the history of
information on the sound source direction estimated in the past.
In the present invention, when estimating the sound source direction based on the voice data, it
may not be possible to estimate the sound source direction only with the current voice data, but
at that time, the history of the sound source direction estimated in the past is By referring to it, it
is possible to estimate the current sound source direction. Further, the voice detection means is
provided on a head rotatably connected to the body portion, and the control means is configured
to have no history of information on the sound source direction estimated in the past. The sound
source direction estimation means can estimate the sound source direction from the sound data
detected before and after the rotation, and the head can be controlled even if there is no history
of the sound source direction in the past. The sound source direction can be estimated from the
sound data before and after the rotation detected by the rotation. A robot apparatus according to
the present invention is a robot apparatus that executes an operation according to an external
operation and / or an autonomous operation based on an internal state, comprising: an imaging
unit; a voice detection unit that detects a voice; Moving object detection means for detecting a
moving object from image data captured by the imaging means, face detection means for
detecting a human face from the image data, and estimation of sound source direction from voice
data detected by the voice detection means A sound source direction estimation means, and a
control means for controlling to perform the face detection with priority among at least the
moving object detection, the face detection, and the sound source direction estimation, are
characterized. In the present invention, in the case where the robot apparatus is performing a
plurality of processes such as face detection, moving object detection, voice detection and the
like in parallel, it preferentially performs face detection and reflects it in action. , Improve the
target identification rate more. BEST MODE FOR CARRYING OUT THE INVENTION Hereinafter,
specific embodiments to which the present invention is applied will be described in detail with
reference to the drawings. This embodiment is an application of the present invention to an
autonomous robot apparatus that acts autonomously according to the surrounding environment
(or external stimulus) or the internal state. In the present embodiment, first, the configuration of
the robot apparatus will be described, and then the application portion of the present invention
in the robot apparatus will be described in detail. (1) Configuration of Robot Device According to
the Present Embodiment As shown in FIG. 1, the robot device 1 according to the present
embodiment is a so-called pet robot that simulates an animal such as a ?dog?, The leg units 3A,
3B, 3C, 3D are connected to the front, rear, left and right of the body unit 2, and the head unit 4
is connected to the front end of the body unit 2.
As shown in FIG. 2, the trunk unit 2 includes a central processing unit (CPU) 10, a dynamic
random access memory (DRAM) 11, a read only memory (ROM) 12, and a personal computer
(PC) card interface. A control unit 16 formed by mutually connecting the circuit 13 and the
signal processing circuit 14 via the internal bus 15 and a battery 17 as a power source of the
robot apparatus 1 are accommodated. Further, an angular velocity sensor 18 and an acceleration
sensor 19 for detecting the acceleration of the direction and movement of the robot device 1 are
accommodated in the body unit 2. In the body unit 2, a speaker 20 for outputting a voice or
melody such as a cry is disposed at a predetermined position as shown in FIG. The tail 5 of the
body unit 2 is provided with an operation switch 21 as a detection mechanism for detecting an
operation input from the user. The operation switch 21 is a switch that can detect the type of
operation performed by the user, and the robot device 1 may, for example, be praised or scolded
according to the type of operation detected by the operation switch 21. Recognize. The head unit
4 corresponds to the ?eye? of the robot apparatus 1 and includes a CCD (Charge Coupled
Device) camera 22 for capturing an external condition, an object color, a shape, a motion, and the
like, A distance sensor 23 for measuring the distance to an object located in the microphone, a
microphone 24 corresponding to the left and right ?ears? of the robot apparatus 1 for
collecting external sound, for example, an LED (Light Emitting Diode) As shown in FIG. 1, the
light emission part 25 grade | etc., Provided with each is each arrange | positioned in a
predetermined position. However, the light emitting unit 25 is referred to as an LED 25 as
needed in the description of the configuration and the like. Further, although not shown in FIG. 1,
a head switch 26 is provided inside the head unit 4 as a detection mechanism for indirectly
detecting a touch of the user on the head unit 4. The head switch 26 is, for example, a switch that
can detect the tilt direction when the head is moved by the contact of the user, and the robot
device 1 detects the tilt direction of the head detected by the head switch 26. In accordance with,
they recognize whether they are praised or beaten. The joint portion of each leg unit 3A to 3D,
the connection portion between each leg unit 3A to 3D and the body unit 2, and the connection
portion between the head unit 4 and the body unit 2 have a number of degrees of freedom The
actuators 281 to 28 n and the potentiometers 291 to 29 n are respectively disposed.
The actuators 281 to 28 n include, for example, servomotors. By driving the servomotors, the leg
units 3A to 3D are controlled to shift to the target attitude or operation. At the position
corresponding to the "meat ball" at the tip of each of the leg units 3A to 3D, meat ball switches
27A to 27D as a detection mechanism for mainly detecting a touch from the user are provided,
and the touch by the user can be detected It is supposed to be. In addition to this, although not
shown here, the robot apparatus 1 also emits a light emitting unit for representing an operation
state (operation mode) different from the internal state of the robot apparatus 1, charging,
starting up, A status lamp or the like indicating the status of the internal power supply, such as
start and stop, may be appropriately provided at an appropriate place. In the robot device 1,
various switches such as the operation switch 21, the head switch 26 and the flesh and ball
switch 27, various sensors such as the angular velocity sensor 18, the acceleration sensor 19,
and the distance sensor 23, the speaker 20, and the microphone The light emitting unit 25, the
actuators 281 to 28n, and the potentiometers 291 to 29n are connected to the signal processing
circuit 14 of the control unit 16 via the corresponding hubs 301 to 30n. On the other hand, the
CCD camera 22 and the battery 17 are directly connected to the signal processing circuit 14
respectively. The signal processing circuit 14 sequentially takes in the switch data supplied from
the various switches described above, the sensor data supplied from the various sensors, the
image data, and the audio data, and sequentially stores them in the DRAM 11 via the internal bus
15. Store sequentially at a predetermined position. Further, the signal processing circuit 14
sequentially fetches battery residual amount data representing the battery residual amount
supplied from the battery 17 together with this, and stores it at a predetermined position in the
DRAM 11. The switch data, the sensor data, the image data, the voice data, and the battery
remaining amount data stored in the DRAM 11 in this manner are used when the CPU 10
controls the operation of the robot apparatus 1. The CPU 10 reads out the control program
stored in the flash ROM 12 and stores it in the DRAM 11 at the initial stage when the power of
the robot apparatus 1 is turned on. Alternatively, the CPU 10 reads the control program stored in
the semiconductor memory device mounted in the PC card slot of the body unit 2 (not shown in
FIG. 1), for example, the memory card 31 via the PC card interface circuit 13 to the DRAM 11
Store.
As described above, the CPU 10 is based on each sensor data, image data, voice data, and battery
residual amount data sequentially stored in the DRAM 11 from the signal processing circuit 14
as described above. It judges the presence or absence of instructions and action. Furthermore,
the CPU 10 determines the subsequent action based on the determination result and the control
program stored in the DRAM 11, and drives the necessary actuators 281 to 28 n based on the
determination result, thereby the head unit 4 And swing the leg units 3A to 3D to cause them to
walk. Further, at this time, the CPU 10 generates audio data as necessary, and supplies the audio
data to the speaker 20 as an audio signal through the signal processing circuit 14 to output
audio based on the audio signal to the outside. The signal which instructs lighting and
extinguishing of the LED in the light emitting unit 25 described above is generated, and the light
emitting unit 25 is turned on and off. Thus, in this robot apparatus 1, it is possible to act
autonomously according to the situation of oneself and the surroundings, and instructions and
actions from the user. (2) Software Configuration of Control Program Here, the software
configuration of the above-described control program in the robot apparatus 1 is as shown in
FIG. In FIG. 3, the device driver layer 40 is located at the lowest layer of the control program, and
comprises a device driver set 41 consisting of a plurality of device drivers. In this case, each
device driver is an object permitted to directly access hardware used in a normal computer such
as a CCD camera 22 (FIG. 2) or a timer, and receives an interrupt from the corresponding
hardware. Do the processing. Also, the robotic server object 42 is located at the lowest layer of
the device driver layer 40, and provides an interface for accessing hardware such as the various
sensors and actuators 281 to 28 n described above. A virtual robot 43 consisting of a group of
software, a power manager 44 consisting of a software group managing power switching and the
like, a device driver manager 45 consisting of a software group managing various other device
drivers, a robot A designed robot 46 is composed of a software group that manages the
mechanism of the device 1.
The manager object 47 is composed of an object manager 48 and a service manager 49. The
object manager 48 is a software group that manages activation and termination of each software
group included in the robotic server object 42, the middleware layer 50, and the application
layer 51, and the service manager 49 It is a software group that manages the connection of each
object based on the connection information between the objects described in the connection file
stored in the memory card 31 (FIG. 2). The middle wear layer 50 is located on the upper layer of
the robotic server object 42, and is composed of a group of software that provides basic
functions of the robot apparatus 1 such as image processing and audio processing. There is.
Further, the application layer 51 is located in the upper layer of the middleware layer 50, and
determines the action of the robot apparatus 1 based on the processing result processed by each
software group configuring the middleware layer 50. It consists of a group of software to The
specific software configurations of the middleware layer 50 and the application layer 51 are
shown in FIG. As shown in FIG. 4, the middle wear layer 50 is for noise detection, temperature
detection, brightness detection, scale recognition, distance detection, attitude detection, contact
detection, operation input detection. Recognition system 71 having signal processing modules 60
to 69 for motion detection and color recognition, an input semantics converter module 70 and
the like, an output semantics converter module 79 and attitude management, tracking, motion
reproduction, and walking And an output system 80 having signal processing modules 72 to 78
for fall recovery, LED lighting, and sound reproduction. The respective signal processing modules
60 to 69 of the recognition system 71 correspond to respective sensor data, image data and
voice data read out from the DRAM 11 (FIG. 2) by the virtual robot 43 of the robotic server
object 42. Data to be processed, and performs predetermined processing based on the data, and
provides the processing result to the input semantics converter module 70. Here, for example,
the virtual robot 43 is configured as a part that exchanges or converts signals according to a
predetermined communication protocol.
The input semantics converter module 70 determines ?noisy?, ?hot?, ?bright?, ?heard the
scale of the dormant?, ?disturbance? based on the processing result provided from each of
the signal processing modules 60 to 69. Self and surrounding conditions such as ?detected
object?, ?detected a fall?, ?hited?, ?comemed?, ?detected a moving object? or
?detected a ball?, and the user And recognize the command and the action from and output the
recognition result to the application layer 51 (FIG. 2). As shown in FIG. 5, the application layer 51
is composed of five modules of a behavior model library 90, a behavior switching module 91, a
learning module 92, an emotion model 93 and an instinct model 94. In the behavior model
library 90, as shown in FIG. 6, "when the remaining amount of battery is low", "fall over," "when
avoiding obstacles", "when expressing emotions" Independent behavior models 901 to 90 n are
provided respectively corresponding to several preselected condition items such as ?when a ball
is detected?. Then, these behavioral models 901 to 90 n can be used as needed, for example,
when a recognition result is given from the input semantics converter module 71 or when a
predetermined time has elapsed since the last recognition result was given. The action to follow
is determined by referring to the parameter value of the corresponding emotion held in the
emotion model 93 and the parameter value of the corresponding desire held in the instinct
model 94 as described later, and the determination result is an action It outputs to the switching
module 91. In the case of this embodiment, each action model 901 to 90 n is a method for
determining the next action, from one node (state) NODE 0 to NODE n as shown in FIG. An
algorithm called a finite probability automaton is used which probabilistically determines
transition probabilities to NODE n based on transition probabilities P 1 to P n respectively set for
the arcs ARC 1 to ARC n connecting the nodes NODE 0 to NODE n. Specifically, each behavior
model 901 to 90 n corresponds to each of the nodes NODE 0 to NODE n which form its own
behavior model 901 to 90 n, and the state as shown in FIG. 8 for each of these nodes NODE 0 to
NODE n A transition table 100 is included.
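For illustration, the probabilistic action selection over such a state transition table can be sketched as follows; the table layout, the node and action names, and the probabilities are assumptions of this sketch (they mirror the NODE100 example discussed below), not the actual contents of FIG. 8.

```python
import random

# Minimal sketch of one node of a state transition table.
NODE_100 = {
    "conditions": {"BALL": lambda size: 0 <= size <= 1000},
    "transitions": [("NODE_120", 0.30, "ACTION_1"),
                    ("NODE_100", 0.70, "ACTION_0")],  # probabilities sum to 100 [%]
}

def step_node(node, event, value):
    """Probabilistically choose the next node and output action when the
    input event satisfies the node's transition condition."""
    check = node["conditions"].get(event)
    if check is None or not check(value):
        return None                      # transition condition not met
    r, acc = random.random(), 0.0
    for next_node, prob, action in node["transitions"]:
        acc += prob
        if r < acc:
            return next_node, action
    last = node["transitions"][-1]
    return last[0], last[2]
```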
In this state transition table 100, input events (recognition results) to be transition conditions in
the nodes NODE0 to NODEn are listed in order of priority in the ?input event name? column,
and further conditions for the transition conditions are ?data It is described in the
corresponding row of the "Name" and "Data range" columns. Therefore, in node NODE 100
represented by state transition table 100 in FIG. 8, when a recognition result of ?detect ball
(BALL)? is given, ?size of the ball given together with the recognition result? (SIZE) is in the
range of ?0 to 1000?, and when a recognition result of ?detect an obstacle (OBSTACLE)? is
given, ?the distance (distance) to the obstacle given along with the recognition result It is a
condition for transitioning to another node that ?DISTANCE)? is in the range of ?0 to 100?.
Further, in this node NODE 100, even when there is no input of recognition results, parameters
of each emotion and each desire held in the emotion model 93 and the instinct model 94 to
which the behavior models 901 to 90n periodically refer. Among the values, when the parameter
value of either ?JOY?, ?Surprise? or ?SadNESS? held in the emotion model 93 is in the
range of ?50 to 100? It is possible to transition to a node. Further, in the state transition table
100, in the ?transition destination node? row in the ?transition probability to another node?
column, node names that can be transited from the nodes NODE0 to NODEn are listed, and
?input event The transition probability to each of the other nodes NODE 0 to NODE n that can
be transited when all the conditions described in the ?Name?, ?Data value? and ?Data
range? columns are aligned is ?Transition probability to other nodes? The action to be output
when transitioning to that node NODE 0 to NODE n is described in the ?output action? row in
the ?transition probability to another node? column, which is described in the corresponding
part in the column. In addition, the sum of the probability of each row in the "transition
probability to another node" column is 100 [%]. Therefore, in the node NODE 100 represented by
the state transition table 100 of FIG. 8, for example, ?ball is detected (BALL)?, and ?SIZE
(size)? of the ball is in the range of ?0 to 1000?. If a recognition result that there is a given is
given, it is possible to make a transition to the "node NODE 120 (node 120)" with a probability of
"30 [%]", and at this time the action of "ACTION 1" will be output.
Each behavior model 901 to 90 n is configured such that nodes NODE 0 to NODE n described as
such a state transition table 100 are connected to each other, and the recognition result is given
from input semantics converter module 71. When, for example, the next action is determined
probabilistically using the state transition table of the corresponding nodes NODE0 to NODEn,
the determination result is output to the action switching module 91. The action switching
module 91 shown in FIG. 5 is outputted from the action models 901 to 90 n having a
predetermined priority among the actions respectively output from the action models 901 to 90
n of the action model library 90. A command to select an action and execute the action
(hereinafter referred to as an action command). ) To the output semantics converter module 79
of the middleware layer 50. In this embodiment, the priority is set higher for the behavior models
901 to 90 n depicted on the lower side in FIG. The action switching module 91 also notifies the
learning module 92, the emotion model 93 and the instinct model 94 that the action is
completed based on the action completion information given from the output semantics
converter module 79 after the action completion. . On the other hand, the learning module 92
inputs the recognition result of the teaching received as an action from the user, such as ?sent?
or ?praised? among the recognition results given from the input semantics converter module
71. Do. Then, based on the recognition result and the notification from the action switching
module 91, the learning module 92 lowers the occurrence probability of the action when it is
"fallen" and increases the occurrence probability of the action when "comed". The corresponding
transition probabilities of the corresponding behavior models 901 to 90 n in the behavior model
library 90 are changed. On the other hand, the emotion model 93 includes ?joy?, ?sadness?,
?anger?, ?surprise?, ?disgust? and ?fear?. For a total of six emotions "," each emotion
has a parameter representing the strength of the emotion. Then, the emotion model 93
determines the parameter value of each of these emotions from the specific recognition results
such as ?scored? and ?come? given from the input semantics converter module 71, and the
elapsed time and behavior switching module 91. Update periodically based on the notification
from.
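The periodic update just described, detailed as equation (1) in the following paragraph as E[t+1] = E[t] + ke × ΔE[t], can be sketched as follows; the clipping to the 0–100 range reflects the regulation of parameter values mentioned later, and the function name is illustrative only.

```python
def update_emotion(E_t, delta_E, ke, lower=0.0, upper=100.0):
    """One periodic update of an emotion parameter: E[t+1] = E[t] + ke * dE[t],
    kept within the 0-100 range used for all emotion and desire parameters."""
    return max(lower, min(upper, E_t + ke * delta_E))
```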
Specifically, the emotion model 93 performs a predetermined operation based on the recognition
result given from the input semantics converter module 71, the action of the robot apparatus 1
at that time, the elapsed time since the previous update, and the like. Assuming that the variation
amount of the emotion at that time calculated by the equation is ?E [t], the parameter value of
the current emotion is E [t], and the coefficient representing the sensitivity of the emotion is ke,
the following equation (1) The parameter value E [t + 1] of the emotion in the cycle of is
calculated, and the parameter value of the emotion is updated by replacing it with the parameter
value E [t] of the current emotion. Also, the emotion model 73 updates parameter values of all
emotions in the same manner. <Img class = "EMIRef" id = "197936346-00003" /> Note that each
recognition result and the notification from the output semantics converter module 79 indicate
the variation amount of the parameter value of each emotion. The degree to which E [t] is
influenced is determined in advance, and the recognition result such as "hit" has a great influence
on the variation value ?E [t] of the parameter value of the emotion of "anger". The recognition
result such as "boiled" has a great influence on the variation amount ?E [t] of the parameter
value of the emotion of "joy". Here, the notification from the output semantics converter module
79 is so-called action feedback information (action completion information), is information on the
appearance result of the action, and the emotion model 93 is also based on such information.
Change your emotions. This is, for example, an action such as "barking" to lower the emotional
level of anger. The notification from the output semantics converter module 79 is also input to
the above-described learning module 92, and the learning module 92 changes the corresponding
transition probability of the behavior models 901 to 90n based on the notification. The feedback
of the action result may be performed by the output of the action switching module 91 (the
action to which the emotion is added). On the other hand, the instinct model 94 has four
independent desires of ?exercise?, ?affection?, ?appetite? and ?curiosity?. Each desire
has a parameter that represents the strength of that desire.
Then, the instinct model 94 periodically updates the parameter values of these desires based on
the recognition result given from the input semantics converter module 71, the elapsed time, the
notification from the behavior switching module 91, and the like. More specifically, the instinct
model 94 has predetermined values for ?motion desire?, ?loving desire? and ?curiosity?
based on recognition results, elapsed time, notification from the output semantics converter
module 68, etc. The amount of fluctuation of the desire calculated by the arithmetic expression is
?I [k], the parameter value of the present desire is I [k], the coefficient ki representing the
sensitivity of the desire, and the equation (2) Is used to calculate the parameter value I [k + 1] of
the desire in the next cycle, and the parameter value of the desire is updated by replacing the
calculation result with the current parameter value I [k] of the desire. Also, the instinct model 94
updates the parameter values of the respective desires excluding ?appetite? in the same
manner. <Img class = "EMIRef" id = "197936346-00004" /> Note that the recognition result and
the notification from the output semantics converter module 79, etc. The extent to which I [k] is
affected is determined in advance. For example, the notification from the output semantics
converter module 79 has a great influence on the variation value ?I [k] of the parameter value
of ?fatigue?. It has become. In the present embodiment, parameter values of each emotion and
each desire (instinct) are regulated so as to vary in the range of 0 to 100, respectively, and the
values of the coefficients ke and ki are also set. It is set individually for each emotion and each
desire. On the other hand, as shown in FIG. 4, the output semantics converter module 79 of the
middle wear layer 50 receives ?advance? and ?joy? provided from the action switching
module 91 of the application layer 51 as described above. Abstract action commands such as
"speak" or "tracking (following the ball)" to the corresponding signal processing modules 72-78
of the output system 80. Then, when given an action command, these signal processing modules
72 to 78 are servo command values to be given to the corresponding actuators 281 to 28 n (FIG.
2) to perform the action based on the action command, The voice data of the sound output from
the speaker 20 (FIG. 2) and / or the drive data given to the LED of the light emitting unit 25 (FIG.
2) are generated, and these data are generated by the virtual robot 43 of the robotic server
object 42 and The signal processing circuit 14 (FIG. 2) sequentially sends out to the
corresponding actuators 281 to 28 n, the speaker 20 or the light emitting unit 25 sequentially.
In this manner, in the robot apparatus 1, based on the control program, it is possible to perform
an autonomous action according to the situation of one's (internal) and the surroundings
(external) and instructions and actions from the user. It is made to be able. (3) Behavior Control
Method in Robot Device Here, in the robot device 1 having the above-described structure, the
voice, face, movement, etc. of the subject are detected, and the behavior is executed based on the
detection results. The action control method of the robot apparatus will be described. The robot
apparatus in the present embodiment detects the voice of the object person by means of the
microphone 24 which is the voice detection means shown in FIG. 1, and estimates the sound
source direction based on the voice data. Further, the face of the object is detected based on the
image data acquired by the CCD camera 22 by the imaging means shown in FIG. Furthermore,
moving object detection is performed based on the image data. Then, the robot apparatus itself
starts to move in one of the directions of the estimated sound source direction, the face direction
based on the face detection result, and the moving body direction based on the moving body
detection result. The condition is to stop the movement. Here, in the present embodiment, when a
voice, a face, and a moving body are detected, the face detection result is preferentially used to
be reflected in the action. This is because face detection can be detected with the highest degree
of accuracy, but by further using the voice detection and moving object detection results, the
detection efficiency is improved, and the detection results are reflected on the operation of the
robot apparatus. It is intended to improve entertainment. FIG. 9 is a block diagram showing
components necessary for controlling the action of the robot apparatus shown in FIGS. 2 to 6 by
voice, face, and moving object detection. As shown in FIG. 9, the image data captured by the CCD
camera 22 and the audio data detected by the microphone 24 are stored in a predetermined
location of the DRAM 11 and are stored in the virtual robot 43 in the robotic server object 42.
Supplied. The virtual robot 43 reads the image data from the DRAM 11, supplies the image data
to the moving object detection module 32 and the face detection module 33 in the middleware
layer 50, reads the sound data, and supplies the sound data to the sound source direction
estimation module 34. In each module, a moving object detection process, a face detection
process, and a sound source direction estimation process described later are performed, and the
detection process result is supplied to the behavior model library 90 in the application layer 51.
The behavior model library 90 determines the subsequent behavior by referring to the parameter
value of emotion and the parameter value of desire as necessary, and gives the determination
result to the behavior switching module 91. Then, the action switching module 91 sends an
action command based on the determination result to the tracking signal processing module 73
and the walking module 75 in the output system 80 of the middle wear layer 50. The tracking
signal processing module 73 and the walking module 75 generate servo command values to be
given to the corresponding actuators 281 to 28 n to perform the action based on the action
command, when the action command is given, This data is sequentially sent out to the
corresponding actuators 281 to 28 n sequentially through the virtual robot 43 of the robotic
server object 42 and the signal processing circuit 14 (FIG. 2). As a result, the action of the robot
apparatus 1 is controlled, and for example, an action such as approaching an object appears.
First, the face detection process in the face detection module 33 will be specifically described.
The face detection module 33 can perform face detection by using, for example, an average front
face template image and determining the correlation between the input image and the template
image. The face detection module 33 takes a frame image obtained as a result of imaging by an
imaging unit such as a CCD camera as an input image, and obtains a template for determining the
correlation between the input image and a template image of a predetermined size indicating an
average face image. A matching unit (not shown), a determination unit (not shown) that
determines whether or not a face image is included in the input image based on correlation, and
a case where it is determined that a face image is included; It is comprised from the face
extraction part (not shown) which extracts a face image. In order to match the size of the face in
the prepared template image, the input image supplied to the template matching unit is, for
example, an image cut out into a predetermined size after converting the frame image into a
plurality of scales. The template matching unit performs matching on the input image for each
scale. As a template image, for example, an average face image composed of an average of about
100 persons can be used. The determination unit determines that a face image is included in the
input image when a correlation value equal to or greater than a predetermined threshold value is
indicated in template matching in the template matching unit, and the face extraction unit
determines the corresponding face area Extract.
Here, if any of the matching results is less than the predetermined threshold in the determination
unit, it is determined that the input image does not include the face indicated by the template
image, and the determination result is sent to the template matching unit. return. When it is
determined that the input image does not include a face image, the matching unit performs
matching with the next scale image. Then, based on the matching result between the next scale
image and the template image, the determination unit determines whether a face image is
included in the scale image. Then, as described above, when the correlation value is equal to or
more than the predetermined threshold value, it is determined that the face image is included.
Matching with all scale images is performed, and if no face is detected, the next frame image is
processed. In addition, since the average face used in the template matching is generally
performed using a general one taken from the front, it is called a non-front face (hereinafter
referred to as a non-front face). ) Difficult to detect. For example, in a robot apparatus, if a CCD
camera for acquiring an image is mounted on, for example, the face of the robot apparatus, the
robot apparatus is photographed when a user etc. looks into the robot apparatus which has
turned over and turned over. The face image becomes a non-front face in a direction opposite to
that of the normal front face, that is, the front face is rotated approximately 180 degrees.
Therefore, in order to enable face detection even when such a non-frontal face is photographed, a
template image of the frontal face is used, and a face is not detected even if a template image of
the frontal face is used Is used by rotating the template image by a predetermined angle, and
when a face is detected, the template image of the rotation angle at the time of detection is used
to perform matching with the next input image. Even in this case, the face detection process may
be accelerated by storing the previous rotation angle as well as being detectable. Thus, in the
face detection module, the face is detected from the image data, and the robot apparatus can
perform actions such as approaching the direction of the detected face, pointing in the face
direction, or tracking based on the detection result. . Next, moving object detection in the moving
object detection module 32 will be specifically described. In the moving object detection process,
the moving object detection module in the recognition system 71 of the middleware layer 50
shown in FIG. 4 detects the moving object in the image data captured by the CCD camera 22 (FIG.
2) and detects the detected moving object direction. Take action such as facing or tracking.
For example, a difference image between frames can be generated, and a moving object can be
detected from this difference image. For example, in a still image, the difference value between
frames becomes 0 when the motion of the moving body stops. For example, as shown in FIG. 10,
when difference image data D1 to D3 are generated for image data P1 to P4 obtained by imaging
a human at times t1 to t4, respectively, if the face is stationary between times t3 and t4, The
difference data of the face disappears from the difference image data D3. That is, the
disappearance of the moving body from the difference image data means that the moving body
exists in the place where the moving body has disappeared, not from the spot. Therefore, the
robot apparatus 1 can detect the position of the moving body by detecting the time when this
difference becomes zero. With such moving object detection, for example, by orienting the CCD
camera 22 in the direction of the center-of-gravity position in the immediately preceding
differential image, the center-of-gravity position can be oriented or approached. That is, as shown
in the flowchart of FIG. 11, first, at step S1, the moving body is detected by calculating the center
of gravity of the difference image data, and at step S2 whether the detected moving body has
disappeared from the difference image data It is determined. If the moving body has not
disappeared in step S2 (No), the process returns to step S1. On the other hand, if the moving
body disappears in step S2 (Yes), the process proceeds to step S3, and the direction of the lost
position, that is, the direction of the position of the center of gravity in the previous difference
image is directed or approaches the direction of the center of gravity. Although the moving
object disappears from the difference image even when the detected moving object deviates from
the visual range of the robot device 1, also in this case, it faces the direction of the center of
gravity position detected last in step S3 described above. In this case, the direction of the moving
body can be almost turned. As described above, the robot device 1 detects that the moving object
has disappeared from the difference image data due to the moving object being stopped within
the visual range, and by pointing the direction of the center of gravity position, for example, It is
possible to realize an autonomous interaction of feeling the sign of the moving body and turning
in that direction. In addition, it is possible to substantially direct the direction of the moving
object by detecting that the moving object has deviated from the visual range and disappearing
from the difference image data and pointing it in the direction of the center of gravity position
detected last. Further, the robot apparatus 1 not only detects the moving object disappearing
from the difference image data, but also detects the gravity center direction detected every
predetermined time interval or every time when the gravity center position of the moving body is
out of the visual range. It is also possible to track the moving body.
That is, as shown in the flowchart of FIG. 12, first, at step S10, the moving object is detected by
calculating the center of gravity of the difference image data, and at step S11, the moving object
is likely to go out of the visual range at predetermined time intervals. The direction of the
position of the center of gravity detected every time Here, in addition to the case where the
moving object disappears from the difference image data as described above, when the
movement of the robot device 1 in step S11 is large, the robot device 1 performs its motion and
movement by motion compensation. It becomes impossible to distinguish from movement and
loses motion. Therefore, in step S12, it is determined whether or not the moving body has been
lost. If the moving body is not lost in step S12 (No), the process returns to step S10. On the other
hand, if the moving object is lost in step S12 (Yes), the process proceeds to step S13 and faces
the direction of the center of gravity detected last. As described above, the robot device 1 faces
the detected barycentric direction every predetermined time interval or every time the moving
body is about to go out of the visual range, and the gravity center detected last when the moving
body loses sight By directing the direction of the position, it becomes possible to detect and track
the moving body in the image captured by the CCD camera 22 provided in the head unit 4 by a
simple method. In such moving object detection processing, first, the virtual robot 43 of the
robotic server object 42 shown in FIG. 9 reads out image data of a frame unit captured by the
CCD camera 22 from the DRAM 11, and this image data Are sent to the moving object detection
module 32 included in the recognition system 71 of the middle wear layer 50. Then, each time
the moving object detection module 32 inputs image data, the moving object detection module
32 obtains difference from the image data of the adjacent previous frame to generate difference
image data. For example, when the difference image data D2 between the image data P2 and the
image data P3 described above is generated, the luminance value D2 (i, j) of the difference image
data D2 at the position (i, j) is the position (i, j) It is obtained by subtracting the luminance value
P2 (i, j) of the image data P2 at the same position from the luminance value P3 (i, j) of the image
data P3 in. Then, the same calculation is performed for all the pixels to generate difference image
data D2, and this difference image data D2 is generated. Then, the barycentric position G (x, y) is
calculated for the portion of the difference image data in which the luminance value is larger
than the threshold value Th.
Here, x and y are calculated using the following equations (3) and (4), respectively. Thus, as
shown in FIG. 13, for example, a difference image between the image data P2 and the image data
P3 described above is obtained. The center-of-gravity position G2 is obtained from the data D2.
The data of the determined center-of-gravity position is sent to the behavior model library 90 of
the application layer 51. As described above, the behavior model library 90 determines the
subsequent behavior with reference to the parameter value of emotion and the parameter value
of desire as necessary, and gives the determination result to the behavior switching module 91.
For example, when the moving object disappears from the difference image data, an action that
points or approaches the center of gravity position detected immediately before is determined,
and the determination result is given to the action switching module 91. Further, in the case of
tracking a moving object at predetermined time intervals, an action that points or approaches the
detected center of gravity position at each time interval is determined, and the determination
result is given to the action switching module 91. Then, the action switching module 91 sends an action command based on the determination result to the tracking signal processing module 73 in the output system 80 of the middleware layer 50. When given the action command, the tracking signal processing module 73 generates servo command values to be given to the corresponding actuators 281 to 28n so as to perform the action based on the action command, and this data is transmitted sequentially to the corresponding actuators 281 to 28n via the virtual robot 43 of the robotic server object 42 and the signal processing circuit 14 (FIG. 2). As a result, for example, when the moving body disappears from the difference image data, the behavior model library 90 determines an action that points toward or approaches the center-of-gravity position detected immediately before, and the behavior switching module 91 generates an action command for performing that action. When tracking a moving object at predetermined time intervals, the behavior model library 90 determines an action that points toward or approaches the center of gravity detected at each time interval, and the behavior switching module 91 generates action commands to cause that action to be performed.
Then, when this action command is given to the tracking signal processing module 73, the tracking signal processing module 73 sends out servo command values based on the action command to the corresponding actuators 281 to 28n. In this way, the robot apparatus 1 shows interest in the moving body and expresses an action of pointing its head in that direction or approaching the direction of the moving body.
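The module chain described above (behavior decision, action switching, tracking signal processing) can be pictured roughly as follows. This is a simplified, hypothetical rendering of that flow; the class names echo the modules in the text, but the interfaces, image size, and field-of-view values are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class ActionCommand:
    kind: str            # e.g. "point_head" or "approach"
    centroid_xy: tuple   # center of gravity of the moving region (image coordinates)

def decide_action(centroid_xy, moving_body_visible: bool) -> ActionCommand:
    """Behavior-library-style decision: point at the last centroid when the
    moving body has disappeared, otherwise keep approaching it."""
    return ActionCommand("approach" if moving_body_visible else "point_head", centroid_xy)

class TrackingSignalProcessor:
    """Turns an abstract action command into head pan/tilt servo targets."""
    def __init__(self, image_size=(176, 144), fov_deg=(55.0, 45.0)):  # assumed values
        self.image_size, self.fov_deg = image_size, fov_deg

    def execute(self, cmd: ActionCommand):
        x, y = cmd.centroid_xy
        w, h = self.image_size
        pan = ((x / w) - 0.5) * self.fov_deg[0]    # left/right angle toward the centroid
        tilt = (0.5 - (y / h)) * self.fov_deg[1]   # up/down angle toward the centroid
        return cmd.kind, pan, tilt                 # simplified servo command values

class ActionSwitchingModule:
    """Relays the decided action to the tracking signal processor."""
    def __init__(self, processor: TrackingSignalProcessor):
        self.processor = processor

    def issue(self, cmd: ActionCommand):
        return self.processor.execute(cmd)
```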
Next, the sound source estimation process in the sound source direction estimation module 34 will be specifically described. As described above, the head unit 4 of the robot apparatus 1 is provided with the microphones 24 corresponding to the left and right "ears", and the robot apparatus 1 can estimate the sound source direction using these microphones 24. Specifically, for example, as described in Oga, Yamazaki, Kanada, "Acoustic system and digital processing" (The Institute of Electronics, Information and Communication Engineers), p. 197, the sound source direction can be estimated by using the one-to-one relationship between the sound source direction and the time difference between the signals received by multiple microphones.
That is, as shown in FIG. 14, when plane waves arriving from the θS direction are received by the two microphones M1 and M2 set apart by a distance d, the relationships shown in the following formulas (5) and (6) hold between the sound reception signals (audio data) x1(t) and x2(t) of the microphones M1 and M2:

x2(t) = x1(t − τS)  (5)

τS = (d · sin θS) / c  (6)

Here, c is the speed of sound and τS is the time difference between the signals received by the two microphones M1 and M2. Therefore, if the time difference τS between the sound reception signals x1(t) and x2(t) is known, the arrival direction of the sound wave, that is, the sound source direction, can be determined by the following equation (7):

θS = arcsin(c · τS / d)  (7)

The time difference τS can be obtained from the cross-correlation function φ12(τ) between the sound reception signals x1(t) and x2(t), given by the following equation (8):

φ12(τ) = E[x1(t) · x2(t + τ)]  (8)

where E[·] denotes the expected value. From equations (5) and (8), the cross-correlation function φ12(τ) is expressed as the following equation (9):

φ12(τ) = φ11(τ − τS)  (9)

where φ11(τ) is the autocorrelation function of the sound reception signal x1(t). It is known that this autocorrelation function φ11(τ) takes its maximum value at τ = 0, so according to equation (9) the cross-correlation function φ12(τ) takes its maximum value at τ = τS. Consequently, by calculating the cross-correlation function φ12(τ) and finding the τ that gives the maximum value, τS is obtained, and by substituting it into the above equation (7), the arrival direction of the sound wave, that is, the sound source direction, can be determined. Then, the difference between the direction in which the robot apparatus 1 is currently facing and the sound source direction is calculated, and the relative angle of the sound source direction with respect to the trunk direction is obtained.
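As an illustration of this estimation, the sketch below finds τS from the peak of the cross-correlation of the two microphone signals and converts it to an angle with equation (7). The sampling rate, microphone spacing, and function name are assumed values for the example, not parameters taken from the robot apparatus.

```python
import numpy as np

def estimate_source_angle(x1: np.ndarray, x2: np.ndarray,
                          fs: float = 16000.0,  # sampling rate in Hz (assumed)
                          d: float = 0.10,      # microphone spacing in m (assumed)
                          c: float = 343.0):    # speed of sound in m/s
    """Estimate the sound source direction thetaS in degrees from two frames
    of microphone data, using the cross-correlation peak as tauS."""
    # Cross-correlation phi12(tau) over the finite frame; the peak lag is tauS.
    corr = np.correlate(x2, x1, mode="full")
    lag_samples = np.argmax(corr) - (len(x1) - 1)
    tau_s = lag_samples / fs

    # Equation (7): thetaS = arcsin(c * tauS / d), clipped to the valid range.
    s = np.clip(c * tau_s / d, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))
```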
Here, as shown in FIG. 15(a), the robot apparatus 1 estimates the position of the sound source based on the difference in distance between the sound source A and the microphones 24R/L provided at different positions on the head unit, that is, based on the time difference between the reception signals. However, for a certain point A, consider a point B that is line-symmetrical to A with respect to the straight line 101 connecting the left and right microphones 24L and 24R, that is, points A and B whose distances LA and LB from the straight line 101 are equal. Since the distances between point A and the left and right microphones 24R/L are equal to the distances between point B and the microphones 24R/L, the time differences between the sound reception signals are also equal, and therefore the direction of the sound source cannot be specified from the time difference between the sound reception signals alone. For this reason, in the present embodiment, information related to the sound source direction is stored as a history, whether it is the single sound source direction specified last time or the two estimated sound source directions, and when the direction cannot be identified, the previous history is referred to. If the previous sound source direction was specified as a single direction, the fact that an object does not move much in a short time can be used, since the sound source of the current sound data is then highly likely to lie in the same direction as the sound source of the previous sound data. On the other hand, if the previous sound source direction was not specified as a single direction, or if there is no history at all, the head unit equipped with the microphones is rotated as described below, and the sound source direction can be identified from the audio data obtained before and after the rotation. That is, in FIG. 15(a), when the point A on the upper left side of the figure is actually the sound source, turning the head unit 4 to the left brings the right microphone 24R closer to that point, so the sound source direction can be specified from the time differences between the left and right sound reception signals before and after the rotation. Similarly, when the point B on the lower left side is the sound source, the left microphone 24L becomes closer to point B when the head unit 4 is turned to the left, so it can be identified that the sound source direction is point B. Thus, whether the actual sound source is point A or point B can be specified from the sound data before and after the rotation.
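The sketch below illustrates this disambiguation. It is illustrative only: it reduces the geometry to two mirror-image candidate angles and either picks the one closest to the stored history or the one whose predicted time difference matches the measurement taken after a known head rotation; all names and parameter values are hypothetical.

```python
import numpy as np

def candidate_angles(tau_s: float, d: float = 0.10, c: float = 343.0):
    """Two directions consistent with one time difference: a point and its
    mirror image across the line connecting the two microphones."""
    theta = np.degrees(np.arcsin(np.clip(c * tau_s / d, -1.0, 1.0)))
    return theta, 180.0 - theta           # e.g. front-left versus back-left

def resolve_direction(tau_before: float, tau_after: float, head_turn_deg: float,
                      d: float = 0.10, c: float = 343.0, history_deg=None):
    """Pick the candidate that best matches either the stored history or the
    measurement taken after turning the head by head_turn_deg."""
    cands = candidate_angles(tau_before, d, c)
    if history_deg is not None:
        # The source is unlikely to have moved far since the previous sound,
        # so prefer the candidate closest to the stored direction.
        return min(cands, key=lambda a: abs(a - history_deg))

    # Otherwise predict the time difference each candidate would produce after
    # the head rotation and keep the one that explains the new measurement.
    def predicted_tau(angle_deg):
        return d * np.sin(np.radians(angle_deg - head_turn_deg)) / c

    return min(cands, key=lambda a: abs(predicted_tau(a) - tau_after))
```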
Thus, the sound source direction is estimated by the sound source direction estimation module 34, and based on the estimation result the robot apparatus can express an operation such as pointing toward the estimated sound source direction or approaching the sound source. Next, a control method for controlling the action of the robot apparatus based on the face detection result, the moving object detection result, and the sound source direction estimation result will be described. In the present embodiment, when the face detection module 33 of the robot apparatus 1 detects a face, the robot apparatus 1 is controlled to start walking in the face direction and to perform an operation of approaching the face detection target. Here, the face direction indicates, for example, the direction in which the center of gravity of the face area substantially overlaps the vertical line passing through the center of the screen. In addition, when a moving object is detected by the moving object detection module 32, the robot starts walking in the direction of the center-of-gravity position in the difference image (the moving object direction) to approach the moving object, and when a sound source direction is estimated, it is controlled to start
walking in the estimated direction and to perform an operation of approaching the sound source. Here, when face detection, moving object detection, and sound source direction estimation occur simultaneously, control is performed such that the face detection result is used preferentially. That is, for example, when the estimated sound source direction and the detected face direction differ, the robot is controlled to move in the face direction. FIG. 16 is a flowchart showing the action control method of the robot apparatus in the present embodiment.
As shown in FIG. 16, first, the robot apparatus stands by while swinging at constant intervals
(step S21). Next, voice detection, moving object detection, and face detection are sequentially
determined (steps S22 to S24). If none of them is detected, the process returns to step S21 again
to enter a standby state. On the other hand, when a face is detected in step S24, the face position is identified, for example, by rotating the head unit so that the center of gravity of the detected face area lies on the vertical line passing through the center of the screen (step S31), and walking in the face direction is started (step S32).
During walking, walking is continued until a predetermined ending condition described later is
satisfied. Also, when a voice is detected in step S22, the direction of the sound source is specified. As described above, when it is difficult to specify the sound source direction, that is, when two candidate sound source directions are calculated, it is checked whether there is a history of sound source directions (step S25). If there is a sound source direction history, the current sound source direction is specified with reference to that history, and walking in the sound source direction is started (step S29). On the other hand, if no history is found in step S25, the robot apparatus 1 rotates (swings) the head unit carrying the voice detection means, stores the two calculated sound source directions, and enters the standby state again. Then, when audio data is detected again, the sound source direction can be specified because the sound source directions have been stored in the above-mentioned step S26. Here, in step S26 the history of the two calculated sound source directions is stored, but it is also possible to store the single sound source direction specified last time; that is, when it is determined in step S25 that there is no history, the head unit is rotated, a single sound source direction is calculated from the sound data before and after the rotation, and this direction may be stored as the history. If a moving body is detected in step S23, the head unit is rotated so as to track the movement of the moving body (step S27), and the position at which the moving body comes to rest is detected (step S28). The tracking of the moving object is continued as long as the moving object keeps moving. Then, when the stillness of the moving body is detected, walking is started, for example, in the direction of the center of gravity obtained from the difference data of the preceding and following image data (step S29).
In step S29, when walking is started based on the result of voice detection or moving object detection, it is periodically determined whether a face is detected (step S30). If a face is detected even while walking, face detection processing is performed in the same way: the face position is identified, for example, by rotating the head so that the face area comes to the center of the screen (step S31), and walking in the face direction is started (step S32).
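Summarizing the flowchart, a face always takes priority over sound- or motion-driven walking. The following is an illustrative reconstruction of that loop; the step numbers follow FIG. 16, but the method names are placeholders rather than the actual module interfaces.

```python
def control_step(robot):
    """One pass of the behavior loop of FIG. 16 (illustrative sketch only)."""
    if robot.face_detected():                      # steps S24 / S30: face has top priority
        robot.center_face_and_identify()           # step S31
        robot.walk_towards_face()                  # step S32
    elif robot.voice_detected():                   # step S22
        direction = robot.estimate_sound_direction()
        if direction is None:                      # two mirror candidates and no history (S25)
            robot.store_candidates_and_wait()      # step S26, then back to standby
        else:
            robot.walk_towards(direction)          # step S29
    elif robot.moving_body_detected():             # step S23
        robot.track_until_still()                  # steps S27 and S28
        robot.walk_towards(robot.last_centroid_direction())   # step S29
    else:
        robot.stand_by_swinging()                  # step S21
```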
Next, the end determination in step S32 will be described.
In the present embodiment, walking is started by face detection or the like, and walking is stopped when a predetermined ending condition is satisfied. The conditions for the end determination include the following: the detected face direction is in front of the robot apparatus and the distance to the face detection target is equal to or less than a predetermined distance; the distance to the target object is equal to or less than a predetermined distance; a predetermined spoken word is detected; or the contact sensor detects a touch. Walking is stopped when any one of these termination conditions is met.
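A condensed version of this end determination might look like the sketch below; the 40 cm threshold and the ±22.5° window are the example values given in the following paragraph, while the accessor names are hypothetical.

```python
def should_stop_walking(robot,
                        stop_distance_m: float = 0.40,    # "40 cm or less" (example value)
                        front_window_deg: float = 22.5):  # "within +/-22.5 degrees" (example value)
    """Return True if any of the termination conditions is satisfied."""
    face = robot.detected_face()                   # None if no face in the current frame
    if (face is not None and abs(face.bearing_deg) <= front_window_deg
            and face.distance_m <= stop_distance_m):
        return True                                # face is in front and close enough
    distance = robot.target_distance_m()           # distance sensor, or size-based estimate
    if distance is not None and distance <= stop_distance_m:
        return True                                # target object itself is close enough
    if robot.heard_stop_word():
        return True                                # a predetermined spoken word was detected
    if robot.touch_sensor_pressed():
        return True                                # touched by the user or bumped into an obstacle
    return False
```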
Whether the face direction is in front of the robot apparatus is determined as shown in FIG. 17: when the vertical viewing angle of the robot apparatus 1 is θ1, the moving direction is controlled so that the face 301 of the object 300 falls, for example, within ±22.5° of the front of the robot apparatus 1 in the horizontal direction of the screen. Further, the distance H to the face detection target or the target object 300 is detected by, for example, a PSD (Position Sensor Device) or another distance sensor, or is estimated from the size of the face area in the image, the size of the target object, and the like, and stop control can be performed when the distance becomes, for example, 40 cm or less. Here, when the vertical viewing angle of the robot apparatus 1 is, for example, 55°, the face of the object may not be detectable at the specified distance; in such a case, only the distance data to the object can be used as the stop condition. In addition, walking is also stopped when a predetermined spoken word is detected, for example a command telling the robot to sit or stop. Furthermore, when the user touches the robot's head or the robot contacts an obstacle, the touch sensor detects the touch and walking is stopped. For example, when a touch is detected by a touch sensor other than the one on the head, it can be determined that the robot has most likely contacted an obstacle, and an action may be generated on the spot to bypass the obstacle.
FIG. 18 is a view schematically showing the walking path when the robot apparatus approaches an object. When the robot apparatus 1 detects a face while swinging its head, as shown in FIG. 18, and the face of the object 300 is detected at an angle θ2 from its own posture direction C, the movement looks more natural if the robot turns through θ2 while walking along the arc D rather than starting to walk straight toward the object 300. By controlling the movement in this manner, the visual effect can be improved.
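One simple way to realize such an arc-shaped approach is to command a forward velocity together with a turning rate that absorbs the heading error θ2 gradually over the approach instead of pivoting on the spot first. The sketch below illustrates this heuristic; the speed value and function name are invented for the example.

```python
import math

def arc_approach_command(theta2_deg: float, distance_m: float,
                         forward_speed: float = 0.10):  # m/s, illustrative value
    """Return (forward_speed, turn_rate) that spreads the heading correction
    theta2 over the whole approach, producing an arc-like path D instead of a
    turn-in-place followed by a straight walk."""
    if distance_m <= 0.0:
        return 0.0, 0.0
    time_to_target = distance_m / forward_speed            # rough approach time
    turn_rate = math.radians(theta2_deg) / time_to_target  # rad/s over the approach
    return forward_speed, turn_rate
```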
Next, the operation of the robot apparatus approaching the object will be described in more detail. As described above, the robot apparatus 1 starts walking in a predetermined direction upon face detection, voice detection, or moving object detection; at this time, making the vertical angle of the head unit different depending on the target can further improve entertainment. FIGS. 19(a) and 19(b) are side views of the robot apparatus in a walking state according to the present embodiment and to the prior art, respectively. As shown in FIG. 19(a), for example, when a human face is detected and the robot performs an operation of closely following the human, directing the field of view further upward than in the conventional walking posture shown in FIG. 19(b) improves the ability to detect and follow human faces. In addition, even when moving in the direction of a voice or a moving body, directing the field of view upward gives the impression that the robot apparatus 1 is looking up at, for example, the person who spoke to it, which produces a visual effect. Also, by changing the vertical angle of the head unit 4 depending on whether the walking target is a human or an object such as a ball, that is, by changing the field of view and the pattern of the face position, the behavior becomes closer to that of an animal and entertainment is improved.
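As a tiny illustration of this idea, the head pitch used while walking could be selected from the kind of target being approached; the angle values below are invented placeholders, not figures from the specification.

```python
def head_pitch_for_target(target_kind: str) -> float:
    """Choose an upward head-pitch angle in degrees for the walking posture,
    depending on whether the robot is approaching a person or a floor object."""
    pitch_by_target = {
        "human": 30.0,   # look up toward the face of a standing person
        "ball": -10.0,   # look down toward an object on the floor
    }
    return pitch_by_target.get(target_kind, 0.0)  # neutral pitch otherwise
```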
In the present embodiment, by using the voice and moving object detection results together with the face detection result to control the operation of approaching the target, malfunctions can be reduced. Moreover, when voice data is detected and the sound source direction is estimated, the estimation rate of the sound source direction can be improved by using the history of past information on the sound source direction. In addition, the approaching motion is stopped when the robot has come within a predetermined distance or when a predetermined call or the like is detected, and the robot not only approaches the face direction, the sound source direction, or the moving object detection direction but also walks along an arc and keeps its field of view directed upward so as to look at the face of the user. These controls make the motion closer to that of an actual animal and can improve entertainment as a pet-type robot apparatus. The present invention is not limited to the above-described embodiment, and it goes without saying that various modifications can be made without departing from the scope of the present invention.
For example, in the above embodiment, face detection, voice detection, and moving object
detection ar