経験に固執しない Profit Sharing 法 (A Profit Sharing Method That Does Not Cling to Past Experiences)

Transactions of the Japanese Society for Artificial Intelligence 21:81-93 (2006)

Abstract

Profit Sharing is a reinforcement learning method. An agent, as a learner, selects actions according to state-action values and receives a reward when it reaches a goal state; the reward is then distributed back among the state-action values of the rules fired during the episode. This paper discusses how to set the initial state-action values. The distribution function f(x) is called the reinforcement function: under Profit Sharing, an agent learns a policy by distributing rewards with this function. On Markov Decision Processes (MDPs) the reinforcement function f(x) = 1/L^x is useful, and on Partially Observable Markov Decision Processes (POMDPs) f(x) = 1/L^W is useful, where L is the sufficient number of rules at each state and W is the length of an episode. If episodes are always long, the values of the reinforcement function become very small, so the differences between rule values stay small and an agent that uses roulette selection for action selection learns slowly. We call this the Learning Speed Problem. Conversely, if the value of the reinforcement function for an action is much higher than that action's state-action value, the agent will hardly ever select any other action, which is harmful when that action is not optimal. We call this the Past Experiences Problem. This paper shows that both the Learning Speed Problem and the Past Experiences Problem are caused by a bad match between the initial state-action values and the values of the reinforcement function, and we propose how to set the initial state-action values at each state. Experiments show that an agent can learn correctly even when episodes are long, and demonstrate the method's effectiveness on both MDPs and POMDPs. Because the proposed method concerns only the initialization of state-action values, it places no restriction on the reinforcement function and can be applied with any reinforcement function.
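A minimal sketch of the credit-assignment scheme the abstract describes, assuming the geometrically decaying reinforcement function f(x) = 1/L^x and roulette (value-proportional) action selection. All names, the choice of L, and the initial values are illustrative, not taken from the paper, and this does not implement the paper's proposed initialization method.

```python
import random

L = 2  # assumed sufficient number of rules (actions) per state

def f(x):
    """Reinforcement function f(x) = 1/L^x, where x counts steps
    back from the goal (x = 0 is the rule that reached the goal)."""
    return (1.0 / L) ** x

def roulette(q):
    """Roulette selection: choose an action with probability
    proportional to its state-action value. q maps action -> value."""
    total = sum(q.values())
    r = random.uniform(0.0, total)
    acc = 0.0
    for a, v in q.items():
        acc += v
        if r <= acc:
            return a
    return a  # guard against floating-point rounding at the end

def reinforce(Q, episode, reward):
    """Distribute the goal reward back along the episode.
    episode is a list of (state, action) pairs ending at the goal;
    Q maps state -> {action: value}."""
    for x, (s, a) in enumerate(reversed(episode)):
        Q[s][a] += reward * f(x)
```

With this f, a rule W steps from the goal receives only reward/L^W, which illustrates the Learning Speed Problem: for long episodes the increments become negligible relative to the initial values, so roulette selection barely changes its behavior.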


Similar books and articles

A Consideration of the Reinforcement Function in the Profit Sharing Method.Shoji Tatsumi & Wataru Uemura - 2004 - Transactions of the Japanese Society for Artificial Intelligence 19:197-203.
How to profit from profit sharing.J. Bell & D. Wray - 1989 - Business and Society Review 68:57-60.
Profit-sharing and industrial peace.Arthur O. Lovejoy - 1921 - International Journal of Ethics 31 (3):241-263.
Profit Sharing with a Method for Detecting Partial Observability.Shiro Masuda & Ken Saito - 2004 - Transactions of the Japanese Society for Artificial Intelligence 19:379-388.
Extending Profit Sharing to Partially Observable Environments: Proposal and Evaluation of PS-r^*.Shigenobu Kobayashi & Kazuteru Miyazaki - 2003 - Transactions of the Japanese Society for Artificial Intelligence 18:286-296.
Profit: Some moral reflections.Paul F. Camenisch - 1987 - Journal of Business Ethics 6 (3):225 - 231.
The self-dual serial cost-sharing rule.M. J. Albizuri - 2010 - Theory and Decision 69 (4):555-567.
Insights from ifaluk: Food sharing among cooperative fishers.Richard Sosis - 2004 - Behavioral and Brain Sciences 27 (4):568-569.
What Should Be the Data Sharing Policy of Cognitive Science?Mark A. Pitt & Yun Tang - 2013 - Topics in Cognitive Science 5 (1):214-221.

Analytics

Added to PP
2014-03-19


Citations of this work

No citations found.


References found in this work

No references found.
