Affiliations: [a] DEIB, Politecnico di Milano, Milan, Italy
| [b] SequeL, Inria Lille – Nord Europe, Villeneuve d’Ascq, France
Correspondence:
[*] Corresponding author: Alberto Maria Metelli, DEIB, Politecnico di Milano, Via Ponzio 34/5, 20133 Milan, Italy. E-mail: [email protected].
Abstract: Reinforcement Learning (RL) is an effective approach to solving sequential decision-making problems when the environment is equipped with a reward function to evaluate the agent’s actions. However, there are several domains in which a reward function is not available and is difficult to estimate. When demonstrations from an expert agent are available, Inverse Reinforcement Learning (IRL) allows recovering a reward function that explains the demonstrated behavior. Most classic IRL methods, in addition to the expert’s demonstrations, require sampling the environment to evaluate each candidate reward function, which, in turn, is built from a set of engineered features. This paper presents a novel model-free IRL approach that does not require specifying a function space in which to search for the expert’s reward function. Leveraging the fact that the policy gradient must be zero for an optimal policy, the algorithm generates an approximation space for the reward function, from which a reward is singled out by means of a second-order criterion. After introducing our approach for finite domains, we extend it to continuous ones. The empirical results, on both finite and continuous domains, show that the reward function recovered by our algorithm allows learning policies that outperform, in terms of learning speed, those obtained with the true reward function.
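As an illustrative sketch of the first-order condition mentioned above (the symbols $J$, $\theta^E$, $\pi_\theta$, and $R$ are assumed notation, not taken from the abstract): if the expert’s policy $\pi_{\theta^E}$ is optimal for some reward $R$, then $\theta^E$ must be a stationary point of the expected return $J$ computed under $R$, so the policy gradient vanishes there:
\[
  \nabla_{\theta} J(\theta^E; R)
  \;=\;
  \mathbb{E}_{\tau \sim \pi_{\theta^E}}\!\big[\, \nabla_{\theta} \log p_{\theta^E}(\tau)\, R(\tau) \,\big]
  \;=\; 0,
\]
where $p_{\theta}(\tau)$ denotes the trajectory distribution induced by the policy $\pi_{\theta}$ and $R(\tau)$ the cumulative reward along trajectory $\tau$. This condition constrains the set of reward functions compatible with the expert’s behavior; the abstract’s second-order criterion is then used to single out one reward within that set.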