Results for 'reinforcement learning, reward and penalty, penalty avoiding rational policy making, the othello game, KITTY'

967 found
  1.  22
    Improvement of the Penalty Avoiding Policy Making Algorithm and Its Application to the Othello Game.坪井 創吾 宮崎 和光 - 2002 - Transactions of the Japanese Society for Artificial Intelligence 17:548-556.
    In general, the purpose of reinforcement learning is to learn an optimal policy. However, in two-player games such as Othello, it is important to acquire a penalty avoiding policy. In this paper, we focus on the formation of a penalty avoiding policy based on the Penalty Avoiding Rational Policy Making algorithm [Miyazaki 01]. In applying it to large-scale problems, we are confronted with the curse of dimensionality. (...)
  2.  15
    Learning a Rational Policy That Avoids Penalties.坪井 創吾 宮崎 和光 - 2001 - Transactions of the Japanese Society for Artificial Intelligence 16 (2):185-192.
    Reinforcement learning is a kind of machine learning. It aims to adapt an agent to a given environment using rewards as a clue. In general, the purpose of a reinforcement learning system is to acquire an optimal policy that maximizes the expected reward per action. However, this is not always what matters in every environment. In particular, if we apply a reinforcement learning system to engineering environments, we expect the agent to avoid all penalties. In Markov Decision Processes, (...)
    1 citation
  3.  23
    Extension of the Rational Policy Making Algorithm to Continuous-Valued Inputs.木村 元 宮崎 和光 - 2007 - Transactions of the Japanese Society for Artificial Intelligence 22 (3):332-341.
    Reinforcement learning is a kind of machine learning. Profit Sharing, the Rational Policy Making algorithm, the Penalty Avoiding Rational Policy Making algorithm, and PS-r* are known to guarantee rationality in a typical class of Partially Observable Markov Decision Processes. However, they cannot treat continuous state spaces. In this paper, we present a solution to adapt them to continuous state spaces. We give RPM a mechanism to treat continuous state spaces in the (...)
  4.  18
    Predictive Movements and Human Reinforcement Learning of Sequential Action.Roy Kleijn, George Kachergis & Bernhard Hommel - 2018 - Cognitive Science 42 (S3):783-808.
    Sequential action makes up the bulk of human daily activity, and yet much remains unknown about how people learn such actions. In one motor learning paradigm, the serial reaction time (SRT) task, people are taught a consistent sequence of button presses by cueing them with the next target response. However, the SRT task only records keypress response times to a cued target, and thus it cannot reveal the full time‐course of motion, including predictive movements. This paper describes a mouse movement (...)
    1 citation
  5.  28
    Predictive Movements and Human Reinforcement Learning of Sequential Action.Roy de Kleijn, George Kachergis & Bernhard Hommel - 2018 - Cognitive Science 42 (S3):783-808.
    Sequential action makes up the bulk of human daily activity, and yet much remains unknown about how people learn such actions. In one motor learning paradigm, the serial reaction time (SRT) task, people are taught a consistent sequence of button presses by cueing them with the next target response. However, the SRT task only records keypress response times to a cued target, and thus it cannot reveal the full time‐course of motion, including predictive movements. This paper describes a mouse movement (...)
    1 citation
  6.  10
    Reinforcement Learning-Based Collision Avoidance Guidance Algorithm for Fixed-Wing UAVs.Yu Zhao, Jifeng Guo, Chengchao Bai & Hongxing Zheng - 2021 - Complexity 2021:1-12.
    A deep reinforcement learning-based computational guidance method is presented, which is used to identify and resolve the problem of collision avoidance for a variable number of fixed-wing UAVs in limited airspace. The cooperative guidance process is first analyzed for multiple aircraft by formulating flight scenarios using multiagent Markov game theory and solving them with a machine learning algorithm. Furthermore, a self-learning framework is established by using the actor-critic model, which is proposed to train collision avoidance decision-making neural networks. To achieve (...)
  7. An Analysis of the Interaction Between Intelligent Software Agents and Human Users.Christopher Burr, Nello Cristianini & James Ladyman - 2018 - Minds and Machines 28 (4):735-774.
    Interactions between an intelligent software agent and a human user are ubiquitous in everyday situations such as access to information, entertainment, and purchases. In such interactions, the ISA mediates the user’s access to the content, or controls some other aspect of the user experience, and is not designed to be neutral about outcomes of user choices. Like human users, ISAs are driven by goals, make autonomous decisions, and can learn from experience. Using ideas from bounded rationality, we frame these interactions (...)
    37 citations
  8.  16
    Enforcing ethical goals over reinforcement-learning policies.Guido Governatori, Agata Ciabattoni, Ezio Bartocci & Emery A. Neufeld - 2022 - Ethics and Information Technology 24 (4):1-19.
    Recent years have yielded many discussions on how to endow autonomous agents with the ability to make ethical decisions, and the need for explicit ethical reasoning and transparency is a persistent theme in this literature. We present a modular and transparent approach to equip autonomous agents with the ability to comply with ethical prescriptions, while still enacting pre-learned optimal behaviour. Our approach relies on a normative supervisor module, that integrates a theorem prover for defeasible deontic logic within the control loop (...)
    1 citation
  9.  77
    Breve storia dell'etica.Sergio Cremaschi - 2012 - Roma RM, Italia: Carocci.
    The book reconstructs the history of Western ethics. The approach chosen focuses on the endless dialectic of moral codes, or different kinds of ethos, moral doctrines that are preached in order to bring about a reform of existing ethos, and ethical theories that have taken shape in the context of controversies about the ethos and moral doctrines as means of justifying or reforming moral doctrines. Such dialectic is what is meant here by the phrase ‘moral traditions’, taken as a name for (...)
    7 citations
  10.  18
    Avoidant decision making in social anxiety: the interaction of angry faces and emotional responses.Andre Pittig, Mirko Pawlikowski, Michelle G. Craske & Georg W. Alpers - 2014 - Frontiers in Psychology 5:100591.
    Recent research indicates that angry facial expressions are preferentially processed and may facilitate automatic avoidance response, especially in socially anxious individuals. However, few studies have examined whether this bias also expresses itself in more complex cognitive processes and behavior such as decision making. We recently introduced a variation of the Iowa Gambling Task which allowed us to document the influence of task-irrelevant emotional cues on rational decision making. The present study used a modified gambling task to investigate the impact (...)
    1 citation
  11.  63
    Novelty and Inductive Generalization in Human Reinforcement Learning.Samuel J. Gershman & Yael Niv - 2015 - Topics in Cognitive Science 7 (3):391-415.
    In reinforcement learning, a decision maker searching for the most rewarding option is often faced with the question: What is the value of an option that has never been tried before? One way to frame this question is as an inductive problem: How can I generalize my previous experience with one set of options to a novel option? We show how hierarchical Bayesian inference can be used to solve this problem, and we describe an equivalence between the Bayesian model (...)
    2 citations
  12.  49
    Games of Competition in a Stochastic Environment.Judith Avrahami, Werner Güth & Yaakov Kareev - 2005 - Theory and Decision 59 (4):255-294.
    The paper presents a set of games of competition between two or three players in which reward is jointly determined by a stochastic biased mechanism and players’ choices. More specifically, a resource can be found with unequal probabilities in one of two locations. The first agent is rewarded only if it finds the resource and avoids being found by the next agent in line; the latter is rewarded only if it finds the former. Five benchmarks, based on different psychological (...)
    1 citation
  13.  22
    Action control, forward models and expected rewards: representations in reinforcement learning.Jami Pekkanen, Jesse Kuokkanen, Otto Lappi & Anna-Mari Rusanen - 2021 - Synthese 199 (5-6):14017-14033.
    The fundamental cognitive problem for active organisms is to decide what to do next in a changing environment. In this article, we analyze motor and action control in computational models that utilize reinforcement learning (RL) algorithms. In reinforcement learning, action control is governed by an action selection policy that maximizes the expected future reward in light of a predictive world model. In this paper we argue that RL provides a way to explicate the so-called action-oriented views (...)
  14.  14
    Construction of a Learning Agent That Manipulates Its Own Rewards According to Environmental Conditions.沼尾 正行 森山 甲一 - 2002 - Transactions of the Japanese Society for Artificial Intelligence 17:676-683.
    The authors aim to construct an agent that learns appropriate actions in a multi-agent environment with and without social dilemmas. To this end, the agent must have a nonrationality that makes it give up its own profit when it should. Since there are many studies on rational learning that brings more and more profit, it is desirable to utilize them in constructing the agent. Therefore, we use a reward-handling manner that makes internal evaluation from the agent's rewards, (...)
  15.  19
    Good games and penalty shoot-outs.Emily Ryall - 2015 - Sport, Ethics and Philosophy 9 (2):205-213.
    This paper considers the concept of a good game in terms of its relation to the fair testing of relevant skills and their aesthetic values. As such, it will consider what makes football ‘the beautiful game’ and what part penalty shoot-outs play, or should play, within it. It begins by outlining and refuting Kretchmar’s proposal that games which end following the elapsing of a set amount of time, such as football, are structurally, morally and aesthetically inferior to games which (...)
    2 citations
  16. The Archimedean trap: Why traditional reinforcement learning will probably not yield AGI.Samuel Allen Alexander - 2020 - Journal of Artificial General Intelligence 11 (1):70-85.
    After generalizing the Archimedean property of real numbers in such a way as to make it adaptable to non-numeric structures, we demonstrate that the real numbers cannot be used to accurately measure non-Archimedean structures. We argue that, since an agent with Artificial General Intelligence (AGI) should have no problem engaging in tasks that inherently involve non-Archimedean rewards, and since traditional reinforcement learning rewards are real numbers, therefore traditional reinforcement learning probably will not lead to AGI. We indicate two (...)
    1 citation
  17.  16
    Ethics, Rationality, and Economic Behaviour.Francesco Farina, Frank Hahn & Stefano Vannucci (eds.) - 1996 - New York: Oxford University Press UK.
    The connection between economics and ethics is as old as economics itself, and central to both disciplines. It is an issue that has recently attracted much interest from economists and philosophers. The connection is, in part, a result of the desire of economists to make policy prescriptions, which clearly require some normative criteria. More deeply, much economic theory is founded on the assumption of utility maximization, thereby creating an immediate connection between the foundations of economics and the philosophical literature (...)
    4 citations
  18.  7
    Is educational policy making rational — and what would that mean, anyway?Eric Bredo - 2009 - Educational Theory 59 (5):533-547.
    In Moderating the Debate: Rationality and the Promise of American Education, Michael Feuer raises concerns about the consequences of basing educational policy on the model of rational choice drawn from economics. Policy making would be better and more realistic, he suggests, if it were based on a newer procedural model drawn from cognitive science. In this essay Eric Bredo builds on Feuer's analysis by offering a more systematic critique of the traditional model of rationality that Feuer criticizes, (...)
    1 citation
  19.  41
    Neural game theory and the search for rational agents in the brain.Gregory S. Berns - 2003 - Behavioral and Brain Sciences 26 (2):155-156.
    The advent of functional brain imaging has revolutionized the ability to understand the biological mechanisms underlying decision-making. Although it has been amply demonstrated that assumptions of rationality often break down in experimental games, there has not been an overarching theory of why this happens. I describe recent advances in functional brain imaging and suggest a framework for considering the function of the human reward system as a discrete agent.
    1 citation
  20.  29
    Rationality and Knavery.Daniel Hausman - 1998 - Vienna Circle Institute Yearbook 5:67-79.
    This paper makes a modest point. Suppose one wants to evaluate alternative policies, institutions or even constitutions on the basis of their consequences. To do so, one needs to evaluate their consequences and one needs to know what their consequences are. Let us suppose that the role of economic theories and game theory in particular is mainly to help us to use information we already possess or that we can acquire at a reasonable cost to judge what the consequences will (...)
    4 citations
  21.  6
    Reinforcement Learning with Probabilistic Boolean Network Models of Smart Grid Devices.Pedro Juan Rivera Torres, Carlos Gershenson García, María Fernanda Sánchez Puig & Samir Kanaan Izquierdo - 2022 - Complexity 2022:1-15.
    The area of smart power grids needs to constantly improve its efficiency and resilience, to provide high quality electrical power in a resilient grid, while managing faults and avoiding failures. Achieving this requires high component reliability, adequate maintenance, and a studied failure occurrence. Correct system operation involves those activities and novel methodologies to detect, classify, and isolate faults and failures and model and simulate processes with predictive algorithms and analytics. In this paper, we showcase the application of a complex-adaptive, (...)
  22.  49
    Reinforcement learning: A brief guide for philosophers of mind.Julia Haas - 2022 - Philosophy Compass 17 (9):e12865.
    In this opinionated review, I draw attention to some of the contributions reinforcement learning can make to questions in the philosophy of mind. In particular, I highlight reinforcement learning's foundational emphasis on the role of reward in agent learning, and canvass two ways in which the framework may advance our understanding of perception and motivation.
    2 citations
  23.  91
    Making the Social World: The Structure of Human Civilization.John R. Searle - 2010 - , US: Oxford University Press UK.
    The renowned philosopher John Searle reveals the fundamental nature of social reality. What kinds of things are money, property, governments, nations, marriages, cocktail parties, and football games? Searle explains the key role played by language in the creation, constitution, and maintenance of social reality. We make statements about social facts that are completely objective, for example: Barack Obama is President of the United States, the piece of paper in my hand is a twenty-dollar bill, I got married in London, etc. (...)
    390 citations
  24.  13
    Deep Reinforcement Learning for UAV Intelligent Mission Planning.Longfei Yue, Rennong Yang, Ying Zhang, Lixin Yu & Zhuangzhuang Wang - 2022 - Complexity 2022:1-13.
    Rapid and precise air operation mission planning is a key technology in unmanned aerial vehicles autonomous combat in battles. In this paper, an end-to-end UAV intelligent mission planning method based on deep reinforcement learning is proposed to solve the shortcomings of the traditional intelligent optimization algorithm, such as relying on simple, static, low-dimensional scenarios, and poor scalability. Specifically, the suppression of enemy air defense mission planning is described as a sequential decision-making problem and formalized as a Markov decision process. (...)
  25.  53
    SAwSu: An Integrated Model of Associative and Reinforcement Learning.Vladislav D. Veksler, Christopher W. Myers & Kevin A. Gluck - 2014 - Cognitive Science 38 (3):580-598.
    Successfully explaining and replicating the complexity and generality of human and animal learning will require the integration of a variety of learning mechanisms. Here, we introduce a computational model which integrates associative learning (AL) and reinforcement learning (RL). We contrast the integrated model with standalone AL and RL models in three simulation studies. First, a synthetic grid‐navigation task is employed to highlight performance advantages for the integrated model in an environment where the reward structure is both diverse and (...)
    1 citation
  26. Toxic Speech: Inoculations and Antidotes.Lynne Tirrell - 2018 - Southern Journal of Philosophy 56 (S1):116-144.
    Toxic speech inflicts individual and group harm, damaging the social fabric upon which we all depend. To understand and combat the harms of toxic speech, philosophers can learn from epidemiology, while epidemiologists can benefit from lessons of philosophy of language. In medicine and public health, research into remedies for toxins pushes in two directions: individual protections (personal actions, avoidances, preventive or reparative tonics) and collective action (specific policies or widespread “inoculations” through which we seek herd immunity). This paper is the (...)
    14 citations
  27. Passive avoidance learning in individuals with psychopathy: modulation by reward but not by punishment.R. J. R. Blair, D. G. V. Mitchell, A. Leonard, S. Budhani, K. S. Peschardt & C. Newman - 2004 - Personality and Individual Differences 37:1179–1192.
    This study investigates the ability of individuals with psychopathy to perform passive avoidance learning and whether this ability is modulated by level of reinforcement/punishment. Nineteen psychopathic and 21 comparison individuals, as defined by the Hare Psychopathy Checklist Revised (Hare, 1991), were given a passive avoidance task with a graded reinforcement schedule. Response to each rewarding number gained a point reward specific to that number (i.e., 1, 700, 1400 or 2000 points). Response to each punishing number lost a (...)
     
    17 citations
  28.  32
    Processing speed enhances model-based over model-free reinforcement learning in the presence of high working memory functioning.Daniel J. Schad, Elisabeth Jünger, Miriam Sebold, Maria Garbusow, Nadine Bernhardt, Amir-Homayoun Javadi, Ulrich S. Zimmermann, Michael N. Smolka, Andreas Heinz, Michael A. Rapp & Quentin J. M. Huys - 2014 - Frontiers in Psychology 5:117016.
    Theories of decision-making and its neural substrates have long assumed the existence of two distinct and competing valuation systems, variously described as goal-directed vs. habitual, or, more recently and based on statistical arguments, as model-free vs. model-based reinforcement-learning. Though both have been shown to control choices, the cognitive abilities associated with these systems are under ongoing investigation. Here we examine the link to cognitive abilities, and find that individual differences in processing speed covary with a shift from model-free to (...)
    3 citations
  29.  12
    Primary Care Groups and NHS Rationing: Implications of the Child B Case.Susan Pickard & Rod Sheaff - 1999 - Health Care Analysis 7 (1):37-56.
    Implementing The new NHS and the 1997 NHS (Primary Care) Act will gradually extend cash-limiting into primary health care, especially general practice. UK policy-makers have avoided providing clear, unambivalent direction about how to 'ration' NHS resources. The 'Child B' case became an epitome of public debate about NHS rationing. Among many other decision-making processes which occurred, Cambridge and Huntingdon Health Authority applied an ethical code to this rationing decision. Using new data this paper analyses the rationing criteria NHS managers (...)
  30.  8
    The Evolution of Vagueness.Cailin O’Connor - 2014 - Erkenntnis 79 (Suppl 4):707-727.
    Vague predicates, those that exhibit borderline cases, pose a persistent problem for philosophers and logicians. Although they are ubiquitous in natural language, when used in a logical context, vague predicates lead to contradiction. This paper will address a question that is intimately related to this problem. Given their inherent imprecision, why do vague predicates arise in the first place? I discuss a variation of the signaling game where the state space is treated as contiguous, i.e., endowed with a metric that (...)
    30 citations
  31. The Evolution of Vagueness.Cailin O'Connor - 2013 - Erkenntnis (S4):1-21.
    Vague predicates, those that exhibit borderline cases, pose a persistent problem for philosophers and logicians. Although they are ubiquitous in natural language, when used in a logical context, vague predicates lead to contradiction. This paper will address a question that is intimately related to this problem. Given their inherent imprecision, why do vague predicates arise in the first place? I discuss a variation of the signaling game where the state space is treated as contiguous, i.e., endowed with a metric that (...)
    29 citations
  32.  36
    Stigmatization and Denormalization as Public Health Policies: Some Kantian Thoughts.Richard Dean - 2013 - Bioethics 28 (8):414-419.
    The stigmatization of some groups of people, whether for some characteristic they possess or some behavior they engage in, will initially strike most of us as wrong. For many years, academic work in public health, which focused mainly on the stigmatization of HIV-positive individuals, reinforced this natural reaction to stigmatization, by pointing out the negative health effects of stigmatization. But more recently, the apparent success of anti-smoking campaigns which employ stigmatization of smokers has raised questions about whether stigmatization may sometimes (...)
    1 citation
  33.  63
    Bidirectional Optimization from Reasoning and Learning in Games.Michael Franke & Gerhard Jäger - 2012 - Journal of Logic, Language and Information 21 (1):117-139.
    We reopen the investigation into the formal and conceptual relationship between bidirectional optimality theory (Blutner in J Semant 15(2):115–162, 1998, J Semant 17(3):189–216, 2000) and game theory. Unlike a likeminded previous endeavor by Dekker and van Rooij (J Semant 17:217–242, 2000), we consider signaling games not strategic games, and seek to ground bidirectional optimization once in a model of rational step-by-step reasoning and once in a model of reinforcement learning. We give sufficient conditions for equivalence (...)
    11 citations
  34. Superintelligence: Fears, Promises and Potentials.Ben Goertzel - 2015 - Journal of Evolution and Technology 25 (2):55-87.
    Oxford philosopher Nick Bostrom, in his recent and celebrated book Superintelligence, argues that advanced AI poses a potentially major existential risk to humanity, and that advanced AI development should be heavily regulated and perhaps even restricted to a small set of government-approved researchers. Bostrom’s ideas and arguments are reviewed and explored in detail, and compared with the thinking of three other current thinkers on the nature and implications of AI: Eliezer Yudkowsky of the Machine Intelligence Research Institute, and David (...)
    4 citations
  35.  99
    On salience and signaling in sender–receiver games: partial pooling, learning, and focal points.Travis LaCroix - 2020 - Synthese 197 (4):1725-1747.
    I introduce an extension of the Lewis-Skyrms signaling game, analysed from a dynamical perspective via simple reinforcement learning. In Lewis’ (Convention, Blackwell, Oxford, 1969) conception of a signaling game, salience is offered as an explanation for how individuals may come to agree upon a linguistic convention. Skyrms (Signals: evolution, learning & information, Oxford University Press, Oxford, 2010a) offers a dynamic explanation of how signaling conventions might arise presupposing no salience whatsoever. The extension of the atomic signaling game examined here—which (...)
    6 citations
  36.  6
    Can Model-Free Learning Explain Deontological Moral Judgments?Alisabeth Ayars - 2016 - Cognition 150 (C):232-242.
    Dual-systems frameworks propose that moral judgments are derived from both an immediate emotional response, and controlled/rational cognition. Recently Cushman (2013) proposed a new dual-system theory based on model-free and model-based reinforcement learning. Model-free learning attaches values to actions based on their history of reward and punishment, and explains some deontological, non-utilitarian judgments. Model-based learning involves the construction of a causal model of the world and allows for far-sighted planning; this form of learning fits well with utilitarian considerations (...)
  37.  25
    Decolonizing Universality: Postcolonial Theory and the Quandary of Ethical Agency.Esha Niyogi De - 2002 - Diacritics 32 (2):42-59.
    Living in colonial India, the Bengali thinker and creative writer Rabindranath Tagore (1861-1941) often meditated on ways that "concord" (milan) and "harmony" (sāmanjasya) could be established between persons and cultures [BIC 450-51]. Noting that "ruptures in balance and harmony" (bhār sāmanjasyer abhāv) that once were more localized now affected the whole world, he maintained that these reinforced the (...)
  38.  10
    Averaged Soft Actor-Critic for Deep Reinforcement Learning.Feng Ding, Guanfeng Ma, Zhikui Chen, Jing Gao & Peng Li - 2021 - Complexity 2021:1-16.
    With the advent of the era of artificial intelligence, deep reinforcement learning has achieved unprecedented success in high-dimensional and large-scale artificial intelligence tasks. However, the insecurity and instability of the DRL algorithm have an important impact on its performance. The Soft Actor-Critic algorithm uses advanced functions to update the policy and value network to alleviate some of these problems. However, SAC still has some problems. In order to reduce the error caused by the overestimation of SAC, we propose (...)
  39. Risk, Rationality and (Information) Resistance: De-rationalizing Elite-group Ignorance.Xin Hui Yong - 2023 - Erkenntnis:1-17.
    There has been a movement aiming to teach agents about their privilege by making the information about their privilege as costless as possible. However, some argue that in risk-sensitive frameworks, such as Lara Buchak’s (2013), it can be rational for privileged agents to shield themselves from learning about their privilege, even if the information is costless and relevant. This threatens the efficacy of these information-access efforts in alleviating the problem of elite-group ignorance. In response, I show that even within (...)
    2 citations
  40.  16
    Math Anxiety: Making Room to Breathe.Valerie Allen & Todd Stambaugh - 2023 - Substance 52 (1):217-225.
    "Don't do that to me, Professor," the student said, and everybody laughed, for by this late in the semester, the atmosphere was relaxed. The instructor in question had just reached the point in a worked problem when they could move from reasoning about specific numbers to stating a general principle: x≤y≤z, meaning that y, the value we sought, was always going (...)
  41. Worst-Case Planning: Political Decision Making in the West.S. M. Amadae - 2020 - In Thomas Grossboelting & Stefan Lehr (eds.), Politisches Entscheiden im Kalten Krieg. pp. 249-271.
    The goal of this essay is to explore "the highly contested nature of [decision-making through adopting] a historically comparative and interdisciplinary approach." Internalist history of game theory treats decision theory as a science of making choices to maximize expected gain. Game theory is applied to nuclear deterrence and military strategy, building markets and designing institutions, analyzing collective action, developing jurisprudence, and addressing crime and punishment. This essay draws on recent historiography of Cold War decision-making to draw into focus the constructive (...)
  42.  52
    Learning to signal: Analysis of a micro-level reinforcement model.Brian Skyrms, Raffaele Argiento, Robin Pemantle & Stanislav Volkov - manuscript
    We consider the following signaling game. Nature plays first from the set {1, 2}. Player 1 (the Sender) sees this and plays from the set {A, B}. Player 2 (the Receiver) sees only Player 1’s play and plays from the set {1, 2}. Both players win if Player 2’s play equals Nature’s play and lose otherwise. Players are told whether they have won or lost, and the game is repeated. An urn scheme for learning coordination in this game is as (...)
    19 citations
  43.  20
    Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective.Tom Everitt, Marcus Hutter, Ramana Kumar & Victoria Krakovna - 2021 - Synthese 198 (Suppl 27):6435-6467.
    Can humans get arbitrarily capable reinforcement learning agents to do their bidding? Or will sufficiently capable RL agents always find ways to bypass their intended objectives by shortcutting their reward signal? This question impacts how far RL can be scaled, and whether alternative paradigms must be developed in order to build safe artificial general intelligence. In this paper, we study when an RL agent has an instrumental goal to tamper with its reward process, and describe design principles (...)
    4 citations
  44.  51
    The Handbook of Rationality.Markus Knauff & Wolfgang Spohn (eds.) - 2021 - London: MIT Press.
    The first reference on rationality that integrates accounts from psychology and philosophy, covering descriptive and normative theories from both disciplines. Both analytic philosophy and cognitive psychology have made dramatic advances in understanding rationality, but there has been little interaction between the disciplines. This volume offers the first integrated overview of the state of the art in the psychology and philosophy of rationality. Written by leading experts from both disciplines, The Handbook of Rationality covers the main normative and descriptive theories of (...)
    No categories
    4 citations
  45.  71
    Learning with neighbours: Emergence of convention in a society of learning agents.Roland Mühlenbernd - 2011 - Synthese 183 (S1):87-109.
    I present a game-theoretical multi-agent system to simulate the evolutionary process responsible for the pragmatic phenomenon division of pragmatic labour (DOPL), a linguistic convention emerging from evolutionary forces. Each agent is positioned on a toroid lattice and communicates via signaling games, where the choice of an interlocutor depends on the Manhattan distance between them. In this framework I compare two learning dynamics: reinforcement learning (RL) and belief learning (BL). An agent’s experiences from previous plays influence his communication behaviour, (...)
    8 citations
  46. When, What, and How Much to Reward in Reinforcement Learning-Based Models of Cognition.Christian P. Janssen & Wayne D. Gray - 2012 - Cognitive Science 36 (2):333-358.
    Reinforcement learning approaches to cognitive modeling represent task acquisition as learning to choose the sequence of steps that accomplishes the task while maximizing a reward. However, an apparently unrecognized problem for modelers is choosing when, what, and how much to reward; that is, when (the moment: end of trial, subtask, or some other interval of task performance), what (the objective function: e.g., performance time or performance accuracy), and how much (the magnitude: with binary, categorical, or continuous values). (...)
    6 citations
  47. Integrating reinforcement learning, bidding and genetic algorithms.Ron Sun - unknown
    This paper presents a GA-based multi-agent reinforcement learning bidding approach (GMARLB) for performing multi-agent reinforcement learning. GMARLB integrates reinforcement learning, bidding and genetic algorithms. The general idea of our multi-agent systems is as follows: There are a number of individual agents in a team, each agent of the team has two modules: Q module and CQ module. Each agent can select actions to be performed at each step, which are done by the Q module. (...)
     
  48.  41
    Interrogating Feature Learning Models to Discover Insights Into the Development of Human Expertise in a Real‐Time, Dynamic Decision‐Making Task.Catherine Sibert, Wayne D. Gray & John K. Lindstedt - 2017 - Topics in Cognitive Science 9 (2):374-394.
    Tetris provides a difficult, dynamic task environment within which some people are novices and others, after years of work and practice, become extreme experts. Here we study two core skills; namely, choosing the goal or objective function that will maximize performance and a feature-based analysis of the current game board to determine where to place the currently falling zoid so as to maximize the goal. In Study 1, we build cross-entropy reinforcement learning models to determine whether different goals result (...)
    7 citations
  49.  18
    Effort Games and the Price of Myopia.Yoram Bachrach, Michael Zuckerman & Jeffrey S. Rosenschein - 2009 - Mathematical Logic Quarterly 55 (4):377-396.
    We consider Effort Games, a game-theoretic model of cooperation in open environments, which is a variant of the principal-agent problem from economic theory. In our multiagent domain, a common project depends on various tasks; carrying out certain subsets of the tasks completes the project successfully, while carrying out other subsets does not. The probability of carrying out a task is higher when the agent in charge of it exerts effort, at a certain cost for that agent. A central authority, called (...)
    No categories
    1 citation
  50.  33
    Implicit learning of (boundedly) rational behaviour.Daniel John Zizzo - 2000 - Behavioral and Brain Sciences 23 (5):700-701.
    Stanovich & West's target article undervalues the power of implicit learning (particularly reinforcement learning). Implicit learning may allow the learning of more rational responses–and sometimes even generalisation of knowledge–in contexts where explicit, abstract knowledge proves only of limited value, such as for economic decision-making. Four other comments are made.
    1 citation