A New Architecture for Learning Classifier Systems to Solve POMDP Problems

Hamzeh, Ali; Rahmani, Adel

Article type: Research Article

Affiliations: Computer Engineering Department, Iran University of Science and Technology, Teheran, Iran.

Note: Address for correspondence: Computer Engineering Department, Iran University of Science and Technology, Narmak, Teheran, Iran

Abstract: Reinforcement learning (RL) is a learning paradigm in which an agent learns to act optimally in an unknown environment through trial and error. An RL-based agent senses the state of its environment, proposes an action, and applies it to the environment. The environment then returns a reinforcement signal, called the reward, to the agent. The agent is expected to learn, through its internal mechanisms, how to maximize the overall reward it receives. One of the most challenging issues in RL arises from the agent's sensory ability: the agent may be unable to sense its current environmental state completely. Such environments are called partially observable environments. In them, the agent may fail to distinguish the actual environmental state and so may fail to propose the optimal action in particular states. An additional mechanism must therefore be added to the agent's architecture to enable it to perform optimally in these environments. One of the most widely used approaches to reinforcement learning is evolutionary learning, and one of the most widely used techniques in this family is the learning classifier system. Learning classifier systems evolve state-action-reward mappings to model their current environment through trial and error. In this paper we propose a new architecture for learning classifier systems that is able to perform optimally in partially observable environments. The architecture uses a novel method to detect aliased states in the environment and disambiguates them through multiple instances of classifier systems that interact with the environment in parallel. The model is applied to several well-known benchmark problems and compared with some of the best classifier systems proposed for these environments. Our results and detailed discussion show that our approach is among the best-performing learning classifier systems in partially observable environments.
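The aliasing problem described in the abstract can be made concrete with a toy example. The sketch below is not the architecture proposed in the paper and is not one of its benchmark problems; the five-cell corridor, the observation labels, and all function names are hypothetical, chosen only to illustrate the idea. Cells 1 and 3 produce the same observation but require opposite actions, so no deterministic policy that maps observations directly to actions reaches the goal from every start cell, whereas a policy with access to the true state does.

```python
from itertools import product

# Hypothetical 5-cell corridor: cells 0..4, goal in the middle cell.
N_CELLS = 5
GOAL = 2
STARTS = (0, 1, 3, 4)
ACTIONS = (-1, +1)  # step left, step right
OBSERVATIONS = ("wall-left", "open", "wall-right")


def observe(pos):
    """Partial observation: cells 1 and 3 are aliased (both look 'open')."""
    if pos == 0:
        return "wall-left"
    if pos == N_CELLS - 1:
        return "wall-right"
    return "open"


def true_state(pos):
    """Full observation: the agent senses its exact cell."""
    return pos


def steps_to_goal(policy, sense, start, max_steps=20):
    """Follow policy[sense(pos)] from start; return step count or None."""
    pos = start
    for t in range(max_steps + 1):
        if pos == GOAL:
            return t
        move = policy[sense(pos)]
        pos = min(max(pos + move, 0), N_CELLS - 1)  # walls block movement
    return None


# Enumerate all 2**3 deterministic observation -> action policies.
reactive_policies = [
    dict(zip(OBSERVATIONS, choice))
    for choice in product(ACTIONS, repeat=len(OBSERVATIONS))
]

# Keep only those that reach the goal from every start cell.
reactive_solutions = [
    p for p in reactive_policies
    if all(steps_to_goal(p, observe, s) is not None for s in STARTS)
]

# A policy over true states can send cells 1 and 3 in opposite directions.
full_policy = {0: +1, 1: +1, 3: -1, 4: -1}
full_steps = [steps_to_goal(full_policy, true_state, s) for s in STARTS]

print("reactive policies solving all starts:", len(reactive_solutions))
print("steps with full state information:", full_steps)
```

Any mechanism that separates the two aliased cells, for example extra internal state or, as in the paper, several interacting classifier systems, has to supply the information that `observe` discards here.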

Keywords: Learning Classifier Systems, POMDP Environments, XCS

Journal: Fundamenta Informaticae, vol. 84, no. 3-4, pp. 329-351, 2008

Received 5 September 2008

Accepted 5 September 2008

Published: 2008
