3rd International Conference on Advanced Technologies
for Signal and Image Processing - ATSIP'2017,
May 22-24, 2017, Fez, Morocco.
Smart multi-agent traffic coordinator for autonomous
vehicles at intersections
Imad LAMOUIK, Ali YAHYAOUY, My Abdelouahed SABRI
University of Sidi Mohamed Ben Abdellah
Faculty of Sciences Dhar El Mahraz Fez
Department of Computer Science
Abstract—Intersections are not only the scene of daily car accidents (rear-end collisions, side impacts, etc.) but also a major source of anger and frustration for many drivers, making the driving task difficult and dangerous. In this research, we propose a smart multi-agent system that coordinates traffic at intersections for autonomous vehicles using reinforcement learning and deep neural networks. This system offers the possibility of a safe and fast passage through intersections without the need for human control.
Keywords—traffic control, intersection, multi-agent, deep reinforcement learning, deep neural network.
Autonomous vehicles spent years as just a part of science-fiction books and movies, promising a future where we could travel with a single push of a button and enjoy the ride without worrying about accidents. In recent years, however, this technology has made a major transition to reality in the form of small but important steps such as parallel parking assist systems (PPAS) [1], autonomous cruise control systems, and lane departure warning systems [2][3]. It is no longer a question of if but when a fully autonomous vehicle will hit the road.
Intersections have become, due to the fast growth and overpopulation of cities, a prime location for automobile accidents. Every year, the U.S. Federal Highway Administration reports approximately 2.5 million intersection accidents, most of them involving left turns. It is also reported that 40% of all crashes and 20% of all fatal accidents involve intersections. Furthermore, an estimated 165,000 accidents occurring annually at intersections are caused by red-light runners; this alone leads to 700 to 800 deaths a year [4].
Intersections are also a main cause of traffic congestion, making drivers feel stressed and frustrated while idly waiting at red lights or stop signs. Congestion also leads to wasted fuel and increased pollution.
In this research, we propose a decentralized multi-agent system for autonomous-vehicle coordination at intersections using deep reinforcement learning. This system provides a way to coordinate traffic at intersections in order to eliminate collisions, reduce congestion, and provide a safe and fast transportation experience.
978-1-5386-0551-6/17/$31.00 2017
Due to the difficulty and complexity of handling traffic in dense urban areas, many attempts have been made to find a better solution for managing vehicles at intersections. Some focused on optimizing traditional traffic-light signals using swarm intelligence (discrete harmony search) [5], or on transmitting the intersection state back to the drivers and giving them the option to change route to avoid congested traffic [6]. Other solutions focused on scheduling [7] vehicle passage through intersections using techniques such as first-in-first-out, shortest remaining time, round robin [8], fixed priorities [9], and multi-level queue scheduling [10], or on a reservation-based approach [11]. However, all these solutions are static and must be re-implemented for each intersection based on its lane configuration. Smarter solutions have also been studied, such as traffic coordination based on each vehicle's trajectory rather than on reserving the whole intersection [12].
Other attempts focused mainly on changing the road architecture by building bridges instead of intersections. These techniques work well but are costly to build and maintain, and may not be feasible in some areas.
Multi-agent systems (MAS) are defined in [13] as "the subfield of artificial intelligence that aims to provide both principles for construction of complex systems involving multiple agents and mechanisms for coordination of independent agents' behaviors". A multi-agent system consists of multiple autonomous agents that together can solve bigger problems than any of them could solve individually. The key advantage of MAS is that each agent can be very simple to build and can be implemented independently, as long as its specification is respected to ensure correct interaction with the other agents. Another important aspect of MAS is scalability: adding a new agent to a MAS is easier than changing a large monolithic system.
Each agent in a MAS can interact with its environment and take actions depending on the state of the environment, receiving feedback (a reward) as a result, as shown in Fig. 1.
Fig. 1: An abstract view of an agent's interaction with the environment

In this research, we created a system composed of two classes of agents: a vehicle agent (VA) installed in each vehicle, and an intersection agent (IA) installed at every intersection (Fig. 2). The VAs communicate with the IA, and this communication is what leads to the coordination of traffic between the VAs present at a specific intersection. Each class of agents has its own architecture and responsibilities, defined as follows:
Fig. 2: Overview of the components of the proposed system
The exchange of messages between the intersection agent
and the vehicle agent passes through multiple steps modeled
as shown in Fig.3.
A. Vehicle agent (VA)
The vehicle agent is a system installed in each vehicle passing through the intersection. It is composed of a communication system to receive and transmit messages from and to the IA. This communication could use radio signals, GSM [14], Wi-Fi, or 4G [15]; the only criterion for choosing the communication method is that it must work reliably and precisely with agents moving at high speed. The VA can use different techniques, such as sensors or GPS, to detect that it is approaching an intersection and initiate the connection. In this research, we chose a GPS-based method to detect intersections; using a different method would lead to similar results.
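The GPS-based detection can be sketched as a simple proximity test: open the connection once the vehicle's GPS fix comes within the predetermined distance d of the intersection's known coordinates. This is an illustrative sketch, not the paper's implementation; the function names and the threshold value are assumptions.

```python
import math

# haversine() gives the great-circle distance in metres between two
# GPS fixes (latitude, longitude in degrees).
def haversine(lat1, lon1, lat2, lon2):
    R = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def approaching(vehicle_fix, intersection_fix, d=150.0):
    """True once the vehicle is within d metres of the intersection.
    d = 150 m is an illustrative value, not taken from the paper."""
    return haversine(*vehicle_fix, *intersection_fix) <= d
```

The VA would poll this test on each GPS update and initiate the connection on the first True result.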
The vehicle agent also has a mechanism to control the vehicle's speed and brakes based on the responses and instructions of the intersection agent.
B. The intersection agent (IA)
The intersection agent is a smart agent, pre-trained to coordinate traffic for autonomous vehicles inside intersections. It is responsible for deciding the best action (accelerate, decelerate, or keep the same speed) that will allow the vehicles to pass through the intersection at the fastest possible speed and without collision with other vehicles.
The base knowledge contained in the IA is shared among IAs with a similar architecture (number and geometry of lanes). This eliminates the need to re-train every new IA, and the agents can share new experiences.
The IA also takes into consideration the nature of the vehicles, in order to prioritize special vehicles such as ambulances and police cars.

Fig. 3: Proposed system algorithm
1) When a vehicle approaches an intersection, the vehicle agent, at a predetermined constant distance d, initiates a connection with the IA and listens for incoming instructions. The connection stays open until the vehicle exits the intersection. In this system, we assume that the VA follows the exact commands from the IA.
2) The intersection agent receives the request and establishes a connection with the VA. The IA then starts generating commands that will allow the vehicle to pass through the intersection without colliding with other vehicles. If a passage is not possible at that moment, the IA instructs the VA to decelerate until it stops and waits for further instructions.
3) When the connection with the intersection agent is established, the vehicle grants full control of the steering wheel, throttle, and brakes to the IA.
4) After the vehicle passes the intersection, the IA releases control of the vehicle and the driver takes back full control.
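The four-step hand-off above can be sketched as a small state machine. This is a minimal illustration of the protocol's phases, with hypothetical class and method names; the real agents would exchange network messages rather than call each other directly.

```python
from enum import Enum

class Phase(Enum):
    APPROACHING = 1   # within distance d, connection opened
    CONTROLLED = 2    # IA holds throttle, brakes and steering
    RELEASED = 3      # vehicle has exited, driver back in control

class VehicleAgent:
    def __init__(self):
        self.phase = None
        self.ia = None

    def connect(self, ia):
        # Step 1: open the connection and listen for instructions.
        self.ia = ia
        self.phase = Phase.APPROACHING

    def grant_control(self):
        # Step 3: hand the throttle, brakes and steering to the IA.
        self.phase = Phase.CONTROLLED

    def release(self):
        # Step 4: the IA releases control after the crossing.
        self.phase = Phase.RELEASED

class IntersectionAgent:
    def accept(self, va):
        # Step 2: accept the request and start issuing commands.
        va.grant_control()

va, ia = VehicleAgent(), IntersectionAgent()
va.connect(ia)   # step 1
ia.accept(va)    # steps 2-3
va.release()     # step 4
```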
The request message contains the id, position, destination (turn left, turn right, or go straight ahead), current speed, and dimensions (width and length) of the vehicle sending the request, in addition to its priority in passing the intersection (normal, police car, ambulance, etc.).
If a vehicle sends a request and does not receive a response within a time t, the vehicle cannot advance into the intersection and must resend the request until it receives a response.
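The request fields and the resend-until-answered rule can be sketched as follows. The field names, the timeout t, and the retry cap are illustrative assumptions, not the paper's exact message format.

```python
import time

def make_request(vid, pos, dest, speed, width, length, priority="normal"):
    # The fields listed in the text: id, position, destination,
    # speed, dimensions, and passage priority.
    return {"id": vid, "position": pos, "destination": dest,
            "speed": speed, "width": width, "length": length,
            "priority": priority}

def send_until_answered(send, request, t=0.5, max_tries=10):
    """Resend the request every t seconds until a response arrives;
    until then the vehicle must not advance into the intersection."""
    for _ in range(max_tries):
        response = send(request)
        if response is not None:
            return response
        time.sleep(t)
    return None  # still no response: keep waiting outside the intersection
```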
In this research, we focused on making the intersection agent smart rather than controlling its behavior with static policies; this way the agent is able to adapt to new, unexpected scenarios that were not present during the development of the IA. To accomplish that, deep reinforcement learning is used with deep neural networks as function approximators. The IA uses a trained deep Q-network to estimate the best action a for each vehicle in the intersection, depending on the state s of the other vehicles already in the intersection, at every time increment t.
The messages exchanged between the VA and the IA contain a minimum number of fields, to minimize the message size and, in consequence, the transfer delay and data usage. The communication process is shown in Fig. 4.

Fig. 4: Communication process between the intersection agent and the vehicle agent

Request message:

    <?xml version="1.0" encoding="UTF-8"?>
    <xml>
      <type value="Request" />
      <destination value="left" />
      ...
    </xml>

Response message:

    <?xml version="1.0" encoding="UTF-8"?>
    <xml>
      <type value="Response" />
      ...
    </xml>

A. Deep reinforcement learning
Deep neural networks are general-purpose function approximators that have long been used in supervised learning. In recent years, taking advantage of advances in computational power, they have been applied to reinforcement learning problems [16][17], giving rise to the field of deep reinforcement learning [18]. This field seeks to combine advances in deep neural networks with reinforcement learning algorithms to create agents capable of acting intelligently in complex environments without human intervention. In this research, we use deep Q-learning as the reinforcement learning algorithm.
In deep Q-learning [19], a value function Q(s, a) is estimated over the learning process. Its value represents the expected sum of discounted future rewards r_t that the agent expects to receive by executing an action a from a state s [20]:

Q(s_t, a_t) = E[r_{t+1} + γ r_{t+2} + γ² r_{t+3} + … | s, a]

Value functions decompose into a Bellman equation:

Q(s, a) = E_{s'}[r + γ Q(s', a') | s, a]
The deep Q-learning algorithm uses a deep neural network as a function approximator for the Q-value. In training, we observe a transition from a state s to a state s' by choosing the best action a and receiving a reward r, and use it to estimate the error:

Error = [r + γ max_{a'} Q(s', a') − Q(s, a)]²
Given a transition ⟨s, a, r, s'⟩, the Q action-value calculation proceeds as follows:
1) Feed-forward the current state s to get predicted Q-values for all actions.
2) Choose an action a using ε-greedy exploration to get into the next state s'.
3) Feed-forward the state s' and calculate the maximum over the network outputs, max_{a'} Q(s', a').
4) Set the Q-value target for the chosen action to the new value:
   Q(s, a) = r + γ max_{a'} Q(s', a')
5) Update the network weights using back-propagation.
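The five steps above can be made concrete with a toy update. This sketch uses a random linear model as a stand-in for the deep Q-network (an assumption made purely to keep the example short); the step comments map onto the list above, and a real DQN would back-propagate the squared TD error in step 5.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "network": Q(s, .) = s @ W, standing in for the deep net.
n_features, n_actions = 4, 3
W = rng.normal(size=(n_features, n_actions))

def q_values(state):
    return state @ W

gamma, epsilon = 0.99, 0.1  # illustrative hyperparameter values
s = rng.normal(size=n_features)

# Step 1: predicted Q-values for all actions in state s.
q_s = q_values(s)

# Step 2: epsilon-greedy action selection.
if rng.random() < epsilon:
    a = int(rng.integers(n_actions))
else:
    a = int(np.argmax(q_s))

# Environment transition: reward r and next state s'.
r, s_next = 1.0, rng.normal(size=n_features)

# Steps 3-4: bootstrap target r + gamma * max_a' Q(s', a').
target = r + gamma * np.max(q_values(s_next))

# Step 5 (here only the squared TD error; a real DQN back-propagates it).
td_error = (target - q_s[a]) ** 2
```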
B. System inputs
The IA takes five continuous features (position x, position y, speed, width, and length) and two discrete features (destination and priority) of the vehicles (the current vehicle v_c and the vehicles already present in the intersection) as input, to make a decision about the output to return to the current vehicle v_c.
These features are arranged in an m × 5 matrix, where m is the maximum number of vehicles that can be managed at a time. Because a neural network has a fixed number of input nodes, the remaining rows are filled with zeros when the number of current vehicles is less than m.
The matrix is then flattened to a 1-D array so that each cell corresponds to one node of the input layer; this process is described in Fig. 5.
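The zero-padding and flattening can be sketched as follows; m = 4 and the feature values are illustrative, and only the five continuous features per row are shown, matching the m × 5 matrix described above.

```python
import numpy as np

m, n_features = 4, 5  # capacity m is an illustrative choice

def build_input(vehicles):
    """Each row holds one vehicle's (x, y, speed, width, length);
    unused rows stay zero, then the matrix is flattened to a vector."""
    mat = np.zeros((m, n_features))
    for i, v in enumerate(vehicles[:m]):
        mat[i] = v
    return mat.flatten()

# Two vehicles present, so rows 2 and 3 remain zero-padded.
x = build_input([(10.0, 2.5, 8.0, 1.8, 4.3),
                 (40.0, 7.5, 0.0, 2.0, 5.1)])
```

The resulting vector always has m × 5 entries, so the network's input layer size stays fixed regardless of how many vehicles are currently in the intersection.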
Fig. 6: Vehicles arrangement to input to the deep neural network

r = { speed × priority,  if speed > 0 and collision = false
      0.1,               if speed = 0 and collision = false
      …,                 if collision = true
To test how well the proposed system works, we created a simulation environment in Python. The simulator coordinates traffic in a demo intersection using the method described above.

Fig. 5: Vehicles arrangement to input to the deep neural network

In the simulator we only control the actions of one vehicle (green) moving through the intersection from left to right (Fig. 9); the movements of the other vehicles (red) are generated randomly to simulate real-world behavior.
C. Deep neural network architecture
The deep neural network has twenty fully connected hidden layers with the Rectified Linear Unit (ReLU [21]) as activation function (Eq. 5). Each layer has 300 nodes.

f(x) = max(0, x)
D. System output
The deep neural network action space is three actions
(keep same speed, accelerate, decelerate). At each time
incrementation t the intersection must instruct the vehicle agent
to take one of these actions, the output with the highest Q-value
is chosen Fig.6.
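Putting the architecture and output sections together, the forward pass can be sketched as below. To keep the sketch fast, two hidden layers of 300 ReLU units stand in for the paper's twenty; the weights are random placeholders rather than trained values.

```python
import numpy as np

rng = np.random.default_rng(1)

ACTIONS = ["keep_speed", "accelerate", "decelerate"]

def relu(x):
    return np.maximum(0.0, x)          # Eq. 5: f(x) = max(0, x)

# Input size 20 matches the m x 5 flattened matrix with m = 4;
# 2 hidden layers of 300 stand in for the paper's 20 hidden layers.
layer_sizes = [20, 300, 300, 3]
weights = [rng.normal(scale=0.05, size=(a, b))
           for a, b in zip(layer_sizes, layer_sizes[1:])]

def best_action(x):
    for W in weights[:-1]:
        x = relu(x @ W)                # hidden layers with ReLU
    q = x @ weights[-1]                # linear output layer: Q-values
    return ACTIONS[int(np.argmax(q))]  # pick the action with max Q
```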
E. Reward function
During the training of the agent the system need a
mechanism to evaluate each action and maximize its expected
discounted future rewards for state-action pairs. In designing
the reward function, we emphasized the importance of passing
with the highest speed and passing the vehicle with higher
priority first.
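A reward consistent with these design goals can be sketched as below: speed weighted by priority while moving, a small reward for waiting safely, and a penalty on collision. The collision penalty value of -1.0 is an assumption, since the paper's exact value did not survive extraction.

```python
def reward(speed, priority, collision):
    """Sketch of the piecewise reward; -1.0 is an assumed penalty."""
    if collision:
        return -1.0          # collision: assumed negative penalty
    if speed > 0:
        return speed * priority  # reward fast passage, scaled by priority
    return 0.1               # stopped but safe: small positive reward
```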
Fig. 7: Estimated max Q-value

While running the simulation program, we observed the evolution of the estimated Q-value and of the reward received by the agent (Fig. 7, Fig. 8).
This system uses a deep neural network and reinforcement learning to estimate the best action for each vehicle. This approach gives the intersection the ability to adapt quickly to any new intersection architecture without the need to re-implement it. Moreover, the proposed solution offers the possibility of a safe and fast passage through the intersection, without the need for human control.
However, this system still needs some improvements to take pedestrian crosswalks and non-autonomous vehicles into consideration. In addition, other improvements may be added, such as predictive vehicle routing based on destination, and communication between intersection systems to route vehicles to the intersection with fewer vehicles.
Fig. 8: Received reward
[1] S. H. Jeong, C. G. Choi, J. N. Oh, P. J. Yoon, B. S. Kim, M. Kim, and K. H. Lee, "Low cost design of parallel parking assist system based on an ultrasonic sensor," International Journal of Automotive Technology, vol. 11, no. 3, pp. 409–416, 2010.
[2] W.-w. Zhang, X.-l. Song, and G.-x. Zhang, "Real-time lane departure warning system based on principal component analysis of grayscale distribution and risk evaluation model," Journal of Central South University, vol. 21, no. 4, pp. 1633–1642, 2014.
[3] N. Balaji, K. Babulu, and M. Hema, Development of a Real-Time Lane Departure Warning System for Driver Assistance, pp. 463–471. Cham: Springer International Publishing, 2015.
[4] Federal Highway Administration (FHWA), "Intersection safety," 2012.
[5] K. Gao, Y. Zhang, A. Sadollah, and R. Su, "Optimizing urban traffic light scheduling problem using harmony search with ensemble of local search," Applied Soft Computing, vol. 48, pp. 359–372, Nov. 2016.
[6] R. Sumner, "Cell messaging process for an in-vehicle traffic congestion information system," Jan. 26, 1993. US Patent 5,182,555.
[7] Q. Jin, G. Wu, K. Boriboonsomsin, and M. Barth, "Multi-agent intersection management for connected vehicles using an optimal scheduling approach," in International Conference on Connected Vehicles and Expo (ICCVE), pp. 185–190, 2012.
[8] A. Silberschatz, P. B. Galvin, and G. Gagne, Operating System Concepts, 8th ed. (Round Robin Scheduling).
[9] K. Tindell and H. Hansson, "Real time systems by fixed priority scheduling," Master's thesis, Department of Computer Systems, Uppsala University, 1997.
[10] H. S. Behera, R. K. Naik, and S. Parida, "Improved multilevel feedback queue scheduling using dynamic time quantum and its performance analysis," International Journal of Computer Science and Information Technologies, vol. 3, pp. 3801–3807, 2012.
[11] K. Dresner and P. Stone, "A multiagent approach to autonomous intersection management," Journal of Artificial Intelligence Research, vol. 31, pp. 591–656, March 2008.
[12] M. A. S. Kamal, J. Imura, T. Hayakawa, A. Ohata, and K. Aihara, "A vehicle-intersection coordination scheme for smooth flows of traffic without using traffic lights," IEEE Transactions on Intelligent Transportation Systems, vol. 16, pp. 1136–1147, June 2015.
[13] P. Stone and M. Veloso, "Multiagent systems: A survey from a machine learning perspective," Autonomous Robots, vol. 8, no. 3, pp. 345–383, 2000.
[14] S. Y. Willassen, "Forensics and the GSM mobile telephone system," IJDE, vol. 2, no. 1, 2003.
[15] J. Govil and J. Govil, "4G mobile communication systems: Turns, trends and transition," in Proceedings of the 2007 International Conference on Convergence Information Technology, ICCIT '07, pp. 13–18, IEEE Computer Society, 2007.
[16] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. A. Riedmiller, "Playing Atari with deep reinforcement learning," CoRR, vol. abs/1312.5602, 2013.
Fig. 9: Simulation screenshot
In this research, we described some of the current solutions to the problem of traffic congestion at intersections, and we concluded that many of them are either static or must be redesigned for each intersection architecture, depending on the number and width of lanes.
To solve this problem, we created a multi-agent system to coordinate traffic for autonomous vehicles at intersections.
[17] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, "Human-level control through deep reinforcement learning," Nature, vol. 518, pp. 529–533, Feb. 2015.
[18] Y. Li, “Deep reinforcement learning: An overview,” CoRR,
vol. abs/1701.07274, 2017.
[19] C. J. Watkins and P. Dayan, “Technical note: Q-learning,” Machine
Learning, vol. 8, no. 3, pp. 279–292, 1992.
[20] C. J. C. H. Watkins and P. Dayan, “Q-learning,” in Machine Learning,
pp. 279–292, 1992.
[21] R. Arora, A. Basu, P. Mianjy, and A. Mukherjee, "Understanding deep neural networks with rectified linear units," CoRR, vol. abs/1611.01491, 2016.