Affiliations: Neural Networks Research Centre, Helsinki University
of Technology, P.O. Box 5400, FI-02015 HUT, Finland. Tel.: +358 9 451 5024;
Fax: +358 9 451 3277; E-mail: [email protected]
Abstract: This paper introduces a gradient-based method for both symmetric and asymmetric multiagent
reinforcement learning. Symmetric multiagent reinforcement learning addresses
the case in which all agents involved in the learning task have equal
information states. In asymmetric multiagent reinforcement learning, by contrast,
the information states are not equal: agents with more information (leaders)
try to encourage agents with less information (followers) to select actions
that lead to improved overall utility values for the leaders. In both cases
there is a large number of parameters to learn, so parametric function
approximation methods are needed to represent the value functions of the
agents. The method proposed in this paper is based on the VAPS framework,
which is extended to utilize the theory of Markov games, a natural basis
for multiagent reinforcement learning.
Keywords: multiagent reinforcement learning, Markov games, Nash equilibrium, Stackelberg equilibrium, value function approximation
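As an illustration of the asymmetric (leader-follower) setting mentioned in the abstract, the following is a minimal sketch and not the authors' algorithm: it assumes a toy two-agent Markov game with one-hot features, a linear approximation of each agent's value function, a Stackelberg solution of the stage game in which the follower best-responds to the leader's announced action, and a stochastic gradient step on a sampled Bellman residual. All names, dimensions, and the random transition and reward model are illustrative assumptions.

```python
import numpy as np

# Toy two-agent Markov game with discrete states and actions, linear
# value-function approximation, and a gradient update on a sampled Bellman
# residual. The stage game is solved in the Stackelberg (leader-follower)
# sense: the follower best-responds to the leader's announced action.
# Everything below is an illustrative assumption, not the paper's method.

rng = np.random.default_rng(0)

N_STATES, N_A1, N_A2 = 4, 3, 3        # states, leader actions, follower actions
N_FEATURES = N_STATES * N_A1 * N_A2   # one-hot features over (s, a1, a2)
GAMMA, ALPHA = 0.95, 0.05             # discount factor, learning rate


def features(s, a1, a2):
    """One-hot feature vector for a state and joint action."""
    phi = np.zeros(N_FEATURES)
    phi[(s * N_A1 + a1) * N_A2 + a2] = 1.0
    return phi


def q_value(theta, s, a1, a2):
    """Linear approximation Q(s, a1, a2) = theta . phi(s, a1, a2)."""
    return theta @ features(s, a1, a2)


def stackelberg_actions(theta1, theta2, s):
    """Stage-game Stackelberg point: the follower best-responds to each
    leader action; the leader picks the action with the best induced value."""
    best_a1, best_v1 = 0, -np.inf
    for a1 in range(N_A1):
        a2 = int(np.argmax([q_value(theta2, s, a1, b) for b in range(N_A2)]))
        v1 = q_value(theta1, s, a1, a2)
        if v1 > best_v1:
            best_a1, best_v1 = a1, v1
    a2 = int(np.argmax([q_value(theta2, s, best_a1, b) for b in range(N_A2)]))
    return best_a1, a2


def gradient_step(theta, s, a1, a2, reward, s_next, a1_next, a2_next):
    """One stochastic gradient step on the squared Bellman residual."""
    target = reward + GAMMA * q_value(theta, s_next, a1_next, a2_next)
    delta = q_value(theta, s, a1, a2) - target
    return theta - ALPHA * delta * features(s, a1, a2)


# Illustrative learning loop on randomly generated transitions and rewards.
theta1 = np.zeros(N_FEATURES)   # leader's value-function parameters
theta2 = np.zeros(N_FEATURES)   # follower's value-function parameters
s = 0
for _ in range(1000):
    a1, a2 = stackelberg_actions(theta1, theta2, s)
    r1, r2 = rng.normal(), rng.normal()     # placeholder rewards
    s_next = rng.integers(N_STATES)         # placeholder transition
    a1n, a2n = stackelberg_actions(theta1, theta2, s_next)
    theta1 = gradient_step(theta1, s, a1, a2, r1, s_next, a1n, a2n)
    theta2 = gradient_step(theta2, s, a1, a2, r2, s_next, a1n, a2n)
    s = s_next
```

In the symmetric case the same parameter update would apply, but the stage game would be solved for a Nash equilibrium rather than a Stackelberg point; that computation is omitted from this sketch.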