経験に固執しない Profit Sharing 法 (A Profit Sharing Method That Does Not Cling to Past Experiences)

Transactions of the Japanese Society for Artificial Intelligence 21:81-93 (2006)

Abstract

Profit Sharing is a reinforcement learning method. An agent, as a learner, selects actions according to state-action values and receives a reward when it reaches a goal state; the reward is then distributed back among the state-action values of the rules fired during the episode. This paper discusses how to set the initial state-action values. The distribution function f(x) is called the reinforcement function: under Profit Sharing, an agent learns a policy by distributing rewards with this function. On Markov Decision Processes (MDPs) the reinforcement function f(x) = 1/L^x is useful, and on Partially Observable Markov Decision Processes (POMDPs) f(x) = 1/L^W is useful, where L is the sufficient number of rules at each state and W is the length of an episode. If episodes are always long, the values of the reinforcement function become very small, so the differences between rule values stay small and an agent that uses roulette selection for action selection learns slowly. We call this the Learning Speed Problem. Conversely, if the value of the reinforcement function for an action is much higher than that action's state-action value, the agent will hardly ever select any other action, which is harmful when that action is not optimal. We call this the Past Experiences Problem. This paper shows that both the Learning Speed Problem and the Past Experiences Problem are caused by a bad match between the initial state-action values and the values of the reinforcement function, and we propose how to set the initial state-action values at each state. Experiments show that an agent can learn correctly even when episodes are long, and demonstrate the method's effectiveness on both MDPs and POMDPs. Because the proposed method concerns only the initialization of state-action values, it places no restriction on the reinforcement function and can be applied with any reinforcement function.
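A minimal sketch of the credit-assignment scheme the abstract describes, assuming the geometrically decaying reinforcement function f(x) = 1/L^x and roulette (value-proportional) action selection. All names, the choice of L, and the initial values are illustrative, not taken from the paper, and this does not implement the paper's proposed initialization method.

```python
import random

L = 2  # assumed sufficient number of rules (actions) per state

def f(x):
    """Reinforcement function f(x) = 1/L^x, where x counts steps
    back from the goal (x = 0 is the rule that reached the goal)."""
    return (1.0 / L) ** x

def roulette(q):
    """Roulette selection: choose an action with probability
    proportional to its state-action value. q maps action -> value."""
    total = sum(q.values())
    r = random.uniform(0.0, total)
    acc = 0.0
    for a, v in q.items():
        acc += v
        if r <= acc:
            return a
    return a  # guard against floating-point rounding at the end

def reinforce(Q, episode, reward):
    """Distribute the goal reward back along the episode.
    episode is a list of (state, action) pairs ending at the goal;
    Q maps state -> {action: value}."""
    for x, (s, a) in enumerate(reversed(episode)):
        Q[s][a] += reward * f(x)
```

With this f, a rule W steps from the goal receives only reward/L^W, which illustrates the Learning Speed Problem: for long episodes the increments become negligible relative to the initial values, so roulette selection barely changes its behavior.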


Similar books and articles

A Consideration of the Reinforcement Function in the Profit Sharing Method.Shoji Tatsumi & Wataru Uemura - 2004 - Transactions of the Japanese Society for Artificial Intelligence 19:197-203.
How to profit from profit sharing.J. Bell & D. Wray - 1989 - Business and Society Review 68:57-60.
Profit-sharing and industrial peace.Arthur O. Lovejoy - 1921 - International Journal of Ethics 31 (3):241-263.
Profit Sharing with a Method for Detecting Partial Observability.Shiro Masuda & Ken Saito - 2004 - Transactions of the Japanese Society for Artificial Intelligence 19:379-388.
Extending Profit Sharing to Partially Observable Environments: Proposal and Evaluation of PS-r^*.Shigenobu Kobayashi & Kazuteru Miyazaki - 2003 - Transactions of the Japanese Society for Artificial Intelligence 18:286-296.
Profit: Some moral reflections.Paul F. Camenisch - 1987 - Journal of Business Ethics 6 (3):225 - 231.
The self-dual serial cost-sharing rule.M. J. Albizuri - 2010 - Theory and Decision 69 (4):555-567.
Insights from ifaluk: Food sharing among cooperative fishers.Richard Sosis - 2004 - Behavioral and Brain Sciences 27 (4):568-569.
What Should Be the Data Sharing Policy of Cognitive Science?Mark A. Pitt & Yun Tang - 2013 - Topics in Cognitive Science 5 (1):214-221.

Analytics

Added to PP
2014-03-19


Citations of this work

No citations found.


References found in this work

No references found.
