Profit Sharing Incorporating an Incomplete-Perception Judgment Method (不完全知覚判定法を導入した Profit Sharing)

Transactions of the Japanese Society for Artificial Intelligence 19:379-388 (2004)

Abstract

To apply reinforcement learning to difficult problem classes such as learning in real environments, we need a method that is robust to the perceptual aliasing problem. Exploitation-oriented methods such as Profit Sharing can cope with perceptual aliasing to a certain extent. However, when the agent must select different actions for the same sensory input, learning efficiency deteriorates. To overcome this problem, several state-partitioning methods that use the history of state-action pairs have been proposed. These methods try to convert a POMDP environment into an MDP environment, and they are sometimes very useful, but their computational cost is very high, especially in large state spaces. In contrast, memory-less approaches try to escape from aliased states by selecting actions stochastically. However, these methods select actions stochastically even in unaliased states, so their learning efficiency is poor. If we want to guarantee rationality in POMDPs, it is efficient to select actions stochastically only in aliased states and to select a single action deterministically in unaliased states. To discriminate between aliased and unaliased states, Miyazaki et al. proposed using the χ² goodness-of-fit test. They point out that, in aliased states, the distribution of state transitions observed under random search differs from the distribution observed under a particular policy. Non-deterministic action outcomes alone do not produce this difference. Hence, if the agent can collect enough samples to run the test, it can distinguish aliased states from unaliased states well. However, such a test needs a large amount of data, and it is unclear how the agent can collect the samples without harming learning efficiency. If the agent uses random search during learning, learning efficiency deteriorates, especially in unaliased states.
Therefore, in this research, we propose a new method called Extended On-line Profit Sharing with Judgement (EOPSwJ), which detects important incomplete perceptions without requiring a large computational cost or numerous samples. We use two criteria to detect the incomplete perceptions that matter for accomplishing a task: one is the rate of transitions to each state, and the other is the deterministic rate of actions. We confirm the effectiveness of EOPSwJ in two simulations.
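The χ² goodness-of-fit idea mentioned in the abstract can be illustrated with a small sketch. This is not the procedure of Miyazaki et al. or of EOPSwJ, whose details the abstract does not give; it only assumes that, for one fixed sensory input and action, we have recorded the next sensory inputs seen under random search and under a particular policy. The function names (`chi2_statistic`, `looks_aliased`), the 5% significance level, and the sample data are all hypothetical choices made for illustration.

```python
from collections import Counter

# Upper 5% critical values of the chi-squared distribution,
# keyed by degrees of freedom (standard table values).
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi2_statistic(observed, expected_probs):
    """Pearson chi-squared statistic of observed counts against expected proportions."""
    n = sum(observed.values())
    stat = 0.0
    for state, p in expected_probs.items():
        expected = n * p
        stat += (observed.get(state, 0) - expected) ** 2 / expected
    return stat


def looks_aliased(random_next_states, policy_next_states):
    """Flag a sensory input as possibly aliased (hypothetical helper).

    Both arguments are lists of next sensory inputs recorded after taking
    the same action at the same sensory input: one list under random
    search, one under a particular policy.
    """
    random_counts = Counter(random_next_states)
    total = sum(random_counts.values())
    # Transition proportions under random search act as the reference distribution.
    expected_probs = {s: c / total for s, c in random_counts.items()}
    observed = Counter(policy_next_states)

    # A successor never seen under random search cannot be scored by the
    # statistic; this sketch simply treats it as evidence of a difference.
    if any(s not in expected_probs for s in observed):
        return True

    df = len(expected_probs) - 1
    if df < 1:
        return False  # only one possible successor: nothing to compare
    if df not in CHI2_CRIT_05:
        raise ValueError("this sketch only tabulates up to 5 degrees of freedom")
    return chi2_statistic(observed, expected_probs) > CHI2_CRIT_05[df]


# Made-up samples: successors "A" and "B" after one (input, action) pair.
random_samples = ["A"] * 50 + ["B"] * 50
skewed_policy_samples = ["A"] * 90 + ["B"] * 10
similar_policy_samples = ["A"] * 52 + ["B"] * 48

print(looks_aliased(random_samples, skewed_policy_samples))
print(looks_aliased(random_samples, similar_policy_samples))
```

The sketch also makes the abstract's objection concrete: the reference distribution has to be gathered by random search for every sensory input and action, which is exactly the sampling cost that EOPSwJ is said to avoid.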

Similar books and articles

How to profit from profit sharing. J. Bell & D. Wray - 1989 - Business and Society Review 68:57-60.
Profit-sharing and industrial peace. Arthur O. Lovejoy - 1921 - International Journal of Ethics 31 (3):241-263.
A Study on the Reinforcement Function in the Profit Sharing Method (Profit Sharing 法における強化関数に関する一考察). Tatsumi Shoji Uemura Wataru - 2004 - Transactions of the Japanese Society for Artificial Intelligence 19:197-203.
A Profit Sharing Method That Does Not Cling to Experience (経験に固執しない Profit Sharing 法). Ueno Atsushi Uemura Wataru - 2006 - Transactions of the Japanese Society for Artificial Intelligence 21:81-93.
Extending Profit Sharing to Environments with Incomplete Perception: Proposal and Evaluation of PS-r^* (Profit Sharing の不完全知覚環境下への拡張: PS-r^* の提案と評価). Kobayashi Shigenobu Miyazaki Kazuteru - 2003 - Transactions of the Japanese Society for Artificial Intelligence 18:286-296.
Profit: Some moral reflections. Paul F. Camenisch - 1987 - Journal of Business Ethics 6 (3):225-231.
The self-dual serial cost-sharing rule. M. J. Albizuri - 2010 - Theory and Decision 69 (4):555-567.
Insights from Ifaluk: Food sharing among cooperative fishers. Richard Sosis - 2004 - Behavioral and Brain Sciences 27 (4):568-569.
What Should Be the Data Sharing Policy of Cognitive Science? Mark A. Pitt & Yun Tang - 2013 - Topics in Cognitive Science 5 (1):214-221.