Improvement of the Penalty Avoiding Policy Making Algorithm and Its Application to the Othello Game

Transactions of the Japanese Society for Artificial Intelligence 17:548-556 (2002)

Abstract

In general, the purpose of reinforcement learning is to learn an optimal policy. However, in two-player games such as Othello, it is important to acquire a penalty avoiding policy. In this paper, we focus on the formation of a penalty avoiding policy based on the Penalty Avoiding Rational Policy Making algorithm [Miyazaki 01]. In applying it to large-scale problems, we are confronted with the curse of dimensionality. We introduce several ideas and heuristics to overcome the combinatorial explosion in large-scale problems. First, we propose an algorithm that saves memory by calculating state transitions. Second, we describe how to restrict exploration using two types of knowledge: a KIFU (game record) database and an evaluation function. We show that our learning player can always defeat the well-known Othello program KITTY.
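
The abstract does not spell out how a penalty avoiding policy is formed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a rule (state, action) is a "penalty rule" if it directly received a penalty or can lead to a "penalty state", and that a penalty state is one in which every known action is a penalty rule. The function name `find_penalty_rules`, the data layout, and the toy states are all hypothetical.

```python
from collections import defaultdict


def find_penalty_rules(transitions, direct_penalty):
    """Propagate penalties backwards through observed transitions.

    transitions: dict mapping (state, action) -> set of observed next states
                 (a set, because the opponent's reply makes outcomes uncertain).
    direct_penalty: set of (state, action) rules that directly received a penalty.
    Returns (penalty_rules, penalty_states).
    Assumption: these definitions are illustrative, not taken from the abstract.
    """
    # Collect the known actions for every state.
    actions_of = defaultdict(set)
    for state, action in list(transitions) + list(direct_penalty):
        actions_of[state].add(action)

    penalty_rules = set(direct_penalty)
    penalty_states = set()
    changed = True
    while changed:  # repeat until no new penalty rule or state is found
        changed = False
        # A state is a penalty state if all of its known actions are penalty rules.
        for state, actions in actions_of.items():
            if state not in penalty_states and all(
                (state, a) in penalty_rules for a in actions
            ):
                penalty_states.add(state)
                changed = True
        # A rule is a penalty rule if it may lead to a penalty state.
        for rule, next_states in transitions.items():
            if rule not in penalty_rules and next_states & penalty_states:
                penalty_rules.add(rule)
                changed = True
    return penalty_rules, penalty_states


# Hypothetical toy problem: taking "x" in s1 loses the game.
transitions = {
    ("s0", "a"): {"s1"},
    ("s0", "b"): {"s2"},
    ("s1", "x"): {"lose"},
    ("s2", "y"): {"s0"},
}
rules, states = find_penalty_rules(transitions, {("s1", "x")})
safe_in_s0 = sorted(a for (s, a) in transitions if s == "s0" and (s, a) not in rules)
```

In this toy problem, s1 becomes a penalty state because its only action is penalized, so action "a" in s0 is marked as a penalty rule and only "b" remains selectable. A table such as `transitions` grows with the number of visited states, which is the kind of memory cost the abstract says the proposed algorithm reduces by calculating state transitions instead of storing them.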


Similar books and articles

Learning Rational Policies That Avoid Penalties. 坪井 創吾 宮崎 和光 - 2001 - Transactions of the Japanese Society for Artificial Intelligence 16 (2):185-192.
Extension of the Rational Policy Making Algorithm to Continuous-Valued Input. 木村 元 宮崎 和光 - 2007 - Transactions of the Japanese Society for Artificial Intelligence 22 (3):332-341.
Extension of Profit Sharing to Environments with Incomplete Perception: Proposal and Evaluation of PS-r^*. Kobayashi Shigenobu Miyazaki Kazuteru - 2003 - Transactions of the Japanese Society for Artificial Intelligence 18:286-296.
Optimization by Independent Constraint Satisfaction and Its Application to Water Supply Control. 青木 圭 池田 心 - 2004 - Transactions of the Japanese Society for Artificial Intelligence 19:38-46.
A Reinforcement Learning Method Using a Temperature Distribution Based on Likelihood Information. 鈴木 健嗣 小堀 訓成 - 2005 - Transactions of the Japanese Society for Artificial Intelligence 20:297-305.
Acquisition of Walking Motion for a Multi-Legged Robot by Qdsega. Matsuno Fumitoshi Ito Kazuyuki - 2002 - Transactions of the Japanese Society for Artificial Intelligence 17:363-372.
Q-Learning with Dynamic Generation of the Search Space by GA. Matsuno Fumitoshi Ito Kazuyuki - 2001 - Transactions of the Japanese Society for Artificial Intelligence 16:510-520.

