An Efficient Exploration Method for MDP Environments Using the k-Certainty Exploration Method and Dynamic Programming

Transactions of the Japanese Society for Artificial Intelligence 16:11-19 (2001)

Abstract

A common problem in reinforcement learning systems (e.g. Q-learning) is reducing the number of trials needed to converge to an optimal policy. The k-certainty exploration method was proposed as one solution to this problem. Miyazaki reported that this method can determine an optimal policy faster than Q-learning in Markov decision processes (MDPs). Although the method is already efficient, we propose an improvement that makes it more efficient. In the k-certainty exploration method, when there is no k-uncertainty rule (a rule that has not yet been selected k times) in the current state, the agent sometimes walks randomly until it finds a state where it can select a k-uncertainty rule. We regard this behavior as inefficient. To reduce this useless behavior, we propose combining the k-certainty exploration method with dynamic programming (DP). Miyazaki's system applies DP only after all rules have been executed at least k times, whereas our method uses the k-certainty exploration method together with DP during learning. Our method takes one of two kinds of action. When the agent can select k-uncertainty rules, one of them is selected at random, as in the k-certainty exploration method. When no k-uncertainty rule is available, the agent behaves differently from the k-certainty exploration method: it uses DP to compute an optimal policy for moving from the current state to a state in which some k-uncertainty rules remain. The model for DP is constructed from known states only. The outline is as follows. First, the agent builds a map consisting only of known states. In this map, the goals are states that still contain k-uncertainty rules, and arbitrary state values are assigned to them. Note that the map is not given from outside; it is built solely from experience. Next, the state values of the states in the map that contain only k-certainty rules are computed by DP (we used Policy Iteration). Finally, the agent keeps selecting greedy actions until it reaches a state in which it can select a k-uncertainty rule. With this improvement, we expect the agent to determine an optimal policy faster than with the k-certainty exploration method, and we have verified by computer simulation that our exploration method does determine an optimal policy faster than the k-certainty exploration method.
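To make the two action-selection cases concrete, the sketch below shows one possible reading of the procedure described in the abstract. It is not the paper's implementation: the names (KCertaintyDPExplorer, K, GAMMA), the assumption of deterministic transitions, and the use of a simple value-iteration-style sweep in place of the paper's Policy Iteration are all illustrative choices.

```python
import random
from collections import defaultdict

K = 3          # certainty threshold (assumed value, not from the paper)
GAMMA = 0.95   # discount factor for the planning step (assumed)

class KCertaintyDPExplorer:
    """Illustrative sketch of k-certainty exploration combined with DP.

    The experience map `model[s][a]` stores the successor observed for (s, a);
    as in the abstract, the map is built only from experience, never given
    from outside. Deterministic transitions are assumed for simplicity.
    """

    def __init__(self, actions):
        self.actions = actions
        self.counts = defaultdict(lambda: defaultdict(int))  # s -> a -> times tried
        self.model = defaultdict(dict)                        # s -> a -> observed s'

    def uncertain_actions(self, s):
        # "k-uncertainty rules": actions tried fewer than K times in state s.
        return [a for a in self.actions if self.counts[s][a] < K]

    def select_action(self, s):
        uncertain = self.uncertain_actions(s)
        if uncertain:
            # Case 1: behave like the plain k-certainty exploration method.
            return random.choice(uncertain)
        # Case 2: plan over known states toward any state that still has
        # k-uncertainty rules, then act greedily on the resulting values.
        values = self._plan()
        return max(self.model[s], key=lambda a: values.get(self.model[s][a], 0.0))

    def _plan(self):
        # States with remaining k-uncertainty rules are goals with a fixed
        # (arbitrary) value; other known states are backed up by a simple
        # value-iteration-style sweep (the paper uses Policy Iteration).
        values = {s: (1.0 if self.uncertain_actions(s) else 0.0) for s in self.model}
        for _ in range(100):
            for s in self.model:
                if self.uncertain_actions(s):
                    continue  # goal states keep their fixed value
                values[s] = GAMMA * max(
                    values.get(s2, 0.0) for s2 in self.model[s].values()
                )
        return values

    def update(self, s, a, s_next):
        # Record the transition so the experience map grows during learning.
        self.counts[s][a] += 1
        self.model[s][a] = s_next
```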
