Animals routinely adapt to changes in the environment in order to survive. Though reinforcement learning may play a role in such adaptation, it is not clear that it is the only mechanism involved, as it is not well suited to producing rapid, relatively immediate changes in strategies in response to environmental changes. This research proposes that counterfactual reasoning might be an additional mechanism that facilitates change detection. An experiment is conducted in which a task state changes over time and participants must detect the changes in order to perform well and gain monetary rewards. A cognitive model is constructed that combines reinforcement learning with counterfactual reasoning to quickly adjust the utility of task strategies in response to changes. The results show that the model can accurately explain the human data and that counterfactual reasoning is key to reproducing the various effects observed in this change-detection paradigm.
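The core mechanism described above can be sketched as a simple utility update in which "counterfactual reasoning" means updating not only the chosen strategy but also the foregone ones. All names, the update rule, and the learning rate are illustrative, not taken from the paper's cognitive model:

```python
def update_utilities(utilities, chosen, outcomes, alpha=0.3, counterfactual=True):
    """Delta-rule update of strategy utilities.

    With counterfactual=False only the chosen strategy learns from its
    outcome; with counterfactual=True the unchosen strategies are also
    updated from the outcomes they *would* have produced, so a change in
    the environment shifts the utilities of all strategies at once.
    (A sketch with assumed names; the paper's model may differ.)
    """
    for strategy, outcome in outcomes.items():
        if strategy == chosen or counterfactual:
            utilities[strategy] += alpha * (outcome - utilities[strategy])
    return utilities
```

After an environmental change, the counterfactual variant lowers the now-poor strategy and raises the now-good one in the same step, which is the kind of rapid adjustment the abstract attributes to counterfactual reasoning.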
This paper presents a GA-based multi-agent reinforcement learning bidding approach (GMARLB) for performing multi-agent reinforcement learning. GMARLB integrates reinforcement learning, bidding, and genetic algorithms. The general idea of our multi-agent system is as follows: there are a number of individual agents in a team, and each agent has two modules, a Q module and a CQ module. At each step the Q module selects the action to be performed, while the CQ module determines whether the agent should continue or relinquish control. Once an agent relinquishes control, a new agent is selected by a bidding algorithm. We applied GMARLB to the game of Backgammon. The experimental results show that GMARLB can achieve a superior level of game-playing performance, outperforming PubEval, while using zero built-in knowledge.
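The control-passing scheme can be sketched as follows. The Q, CQ, and bidding modules are stubbed with random placeholders; in GMARLB they are learned components tuned by the genetic algorithm, so everything below is illustrative structure only:

```python
import random

class Agent:
    """Team member with a Q module (action choice) and a CQ module
    (continue/relinquish decision). Both are random stubs here."""
    def __init__(self, name):
        self.name = name

    def q_action(self, state):          # Q module: choose this step's action
        return random.choice(["left", "right"])

    def cq_continue(self, state):       # CQ module: keep control next step?
        return random.random() < 0.7

    def bid(self, state):               # bid to take over control
        return random.random()

def run_episode(agents, steps=10):
    """Control passes by bidding whenever the controller relinquishes."""
    controller = max(agents, key=lambda a: a.bid(None))
    history = []
    for t in range(steps):
        history.append((controller.name, controller.q_action(t)))
        if not controller.cq_continue(t):
            controller = max(agents, key=lambda a: a.bid(t))
    return history
```

The episode history records which agent controlled each step, which is the trace a credit-assignment mechanism would then learn from.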
Philosophers of science have offered different accounts of what it means for one scientific theory to reduce to another. I propose a more or less friendly amendment to Kenneth Schaffner’s “General Reduction-Replacement” model of scientific unification. Schaffner interprets scientific unification broadly in terms of a continuum from theory reduction to theory replacement. As such, his account leaves no place on its continuum for type irreducible and irreplaceable theories. The same is true for other accounts that incorporate Schaffner's continuum, for example, those developed by Paul Churchland, Clifford Hooker, and John Bickle. Yet I believe a more general account of scientific unification should include type irreducible and irreplaceable theories in an account of their partial reduction, specifically, when there is a reduction of their tokens. Thus I propose a “Reduction-Reception-Replacement” model wherein type irreducible and irreplaceable theories are accepted or received for the purpose of unifying domains of particulars. I also suggest a link between this kind of token reduction and mechanistic explanation.
A deep reinforcement learning-based computational guidance method is presented, which is used to identify and resolve the problem of collision avoidance for a variable number of fixed-wing UAVs in limited airspace. The cooperative guidance process is first analyzed for multiple aircraft by formulating flight scenarios as a multiagent Markov game and solving it with a machine learning algorithm. Furthermore, a self-learning framework is established using the actor-critic model, which is proposed to train collision-avoidance decision-making neural networks. To achieve higher scalability, the neural network is customized to incorporate long short-term memory networks, and a coordination strategy is given. Additionally, a simulator suitable for multiagent high-density route scenes is designed for validation, in which all UAVs run the proposed algorithm onboard. Simulated experiment results from several case studies show that the real-time guidance algorithm can effectively reduce the collision probability of multiple UAVs in flight, even with a large number of aircraft.
Many philosophers are building a solid case in favour of the knowledge account of assertion (KAA). According to KAA, if one asserts that P one represents oneself as knowing that P. KAA has recently received support from linguistic data about prompting challenges, parenthetical positioning and predictions. In this article, I add another argument to this rapidly growing list: an argument from what I will call ‘reinforcing parenthesis’.
Rachlin rightly highlights behavioural reinforcement, conditional cooperation, and framing. However, genes may explain part of the variance in altruistic behaviour. Framing cannot be used to support his theory of altruism. Reinforcement of acts is not identical to reinforcement of patterns of acts. Further, many patterns of acts could be reinforced, and Rachlin's altruism is not the most likely candidate.
I argue for the role of reinforcement learning in the philosophy of mind. To start, I make several assumptions about the nature of reinforcement learning and its instantiation in minds like ours. I then review some of the contributions reinforcement learning methods have made across the so-called 'decision sciences'. Finally, I show how principles from reinforcement learning can shape philosophical debates regarding the nature of perception and characterisations of desire.
This paper addresses weighting and partitioning in complex reinforcement learning tasks, with the aim of facilitating learning. The paper presents some ideas regarding weighting of multiple agents and extends them into partitioning an input/state space into multiple regions with differential weighting in these regions, to exploit differential characteristics of regions and differential characteristics of agents, to reduce the learning complexity of agents (and their function approximators) and thus to facilitate learning overall. It analyzes, in reinforcement learning tasks, different ways of partitioning a task and using agents selectively based on partitioning. Based on the analysis, some heuristic methods are described and experimentally tested. We find that some off-line heuristic methods performed the best, significantly better than single-agent models.
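The region-based weighting idea can be illustrated with a minimal sketch. The partition boundaries, weights, and function names here are hypothetical; the paper's heuristics for choosing them are its actual contribution:

```python
def region_of(state, boundaries):
    """Index of the region a 1-D state falls into, given sorted boundaries."""
    for i, b in enumerate(boundaries):
        if state < b:
            return i
    return len(boundaries)

def combined_q(state, action, agents, region_weights, boundaries):
    """Blend the agents' Q-estimates with region-specific weights, so that
    different agents (and their function approximators) specialize in
    different parts of the state space."""
    r = region_of(state, boundaries)
    return sum(weights[r] * q(state, action)
               for q, weights in zip(agents, region_weights))
```

With one-hot region weights this reduces to selecting a single agent per region; softer weights let agents share responsibility near region boundaries.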
We investigate a simple stochastic model of social network formation by the process of reinforcement learning with discounting of the past. In the limit, for any value of the discounting parameter, small, stable cliques are formed. However, the time it takes to reach the limiting state in which cliques have formed is very sensitive to the discounting parameter. Depending on this value, the limiting result may or may not be a good predictor for realistic observation times.
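A minimal version of such a model can be simulated as follows. The parameter values are illustrative; the qualitative behaviour, not these numbers, is what the abstract describes:

```python
import random

def simulate(n_agents=5, steps=2000, discount=0.9, reward=1.0, seed=1):
    """Reinforcement with discounting of the past: each agent keeps a
    weight for every other agent, visits one in proportion to those
    weights, both parties reinforce the interaction, and all weights
    decay each round so recent interactions dominate."""
    rng = random.Random(seed)
    w = [[1.0] * n_agents for _ in range(n_agents)]  # w[i][j]: i's weight for j
    for _ in range(steps):
        for i in range(n_agents):
            others = [j for j in range(n_agents) if j != i]
            j = rng.choices(others, weights=[w[i][k] for k in others])[0]
            w[i][j] += reward                        # mutual reinforcement
            w[j][i] += reward
        for i in range(n_agents):                    # discounting of the past
            for j in range(n_agents):
                w[i][j] *= discount
    return w
```

With heavy discounting, early random fluctuations get locked in quickly and small, stable cliques emerge; with mild discounting the same limit is approached far more slowly, matching the sensitivity to the discounting parameter noted above.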
In this paper I examine how the constituent elements of a firm's organizational structure affect the ethical behavior of workers. The formal features of organizations I examine are the compensation practices, performance and evaluation systems, and decision-making assignments. I argue that the formal organizational structure, which is distinguished from corporate culture, is necessary, though not sufficient, in solving ethical problems within firms. At best the formal structure should not undermine the ethical actions of workers. When combined with a strong culture, however, the organizational structure may be sufficient in promoting ethical conduct. While helpful, ethics training and corporate codes are neither necessary nor sufficient in promoting ethical behavior within firms.
Behaving ethically depends on the ability to recognize that ethical issues exist, to see from an ethical point of view. This ability to see and respond ethically may be related more to attributes of corporate culture than to attributes of individual employees. Efforts to increase ethical standards and decrease pressure to behave unethically should therefore concentrate on the organization and its culture. The purpose of this paper is to discuss how total quality (TQ) techniques can facilitate the development of a cooperative corporate culture that promotes and encourages ethical behavior throughout an organization.
In this opinionated review, I draw attention to some of the contributions reinforcement learning can make to questions in the philosophy of mind. In particular, I highlight reinforcement learning's foundational emphasis on the role of reward in agent learning, and canvass two ways in which the framework may advance our understanding of perception and motivation.
The relational database management system (RDBMS) is the most popular type of database system. It is important to maintain data security against information leakage and data corruption. An RDBMS can be attacked by an outsider or an insider. It is difficult to detect an insider attack because its patterns are constantly changing and evolving. In this paper, we propose an adaptive database intrusion detection system that can be resistant to potential insider misuse using evolutionary reinforcement learning, which combines reinforcement learning and evolutionary learning. The model consists of two neural networks, an evaluation network and an action network. The action network detects the intrusion, and the evaluation network provides feedback on the action network's detections. Evolutionary learning is effective for dynamic and atypical patterns, and reinforcement learning enables online learning. Experimental results show that the performance for detecting abnormal queries improves as the proposed model learns the intrusions adaptively, using virtual query data based on the Transaction Processing Performance Council (TPC-E) scenario. The proposed method achieves the highest performance at 94.86%, and we demonstrate its usefulness by performing 5-fold cross-validation.
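A drastically simplified stand-in for the action/evaluation pair looks like this: the "action network" is reduced to a linear detector, and the evaluation feedback is reduced to whether its detection matched the ground-truth label. In the paper both components are neural networks and are also evolved, so everything below is illustrative only:

```python
import random

def train(queries, labels, n_features=4, epochs=20, lr=0.1, seed=0):
    """Toy action/evaluation loop: a linear 'action network' classifies
    each query feature vector, and a stand-in evaluation signal (did the
    detection match the label?) drives a perceptron-style correction."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
    for _ in range(epochs):
        for x, label in zip(queries, labels):   # label: +1 intrusion, -1 normal
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
            if pred != label:                   # negative feedback: adjust weights
                w = [wi + lr * label * xi for wi, xi in zip(w, x)]
    return w

def detect(w, x):
    """Flag a query feature vector as intrusive."""
    return sum(wi * xi for wi, xi in zip(w, x)) > 0
```

The online, feedback-driven update is what lets such a detector track intrusion patterns that change over time, which is the motivation the abstract gives for combining reinforcement and evolutionary learning.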
Identity politics has been critiqued in various ways. One central problem—the Reinforcement Problem—claims that identity politics reinforces groups rooted in oppression thereby undermining its own liberatory aims. Here I consider two versions of the problem—one psychological and one metaphysical. I defang the first by drawing on work in social psychology. I then argue that careful consideration of the metaphysics of social groups and of the practice of identity politics provides resources to dissolve the second version. Identity politics involves the creation or transformation of groups in ways that do not succumb to the metaphysical Reinforcement Problem.
The concept of ‘human dignity’ sits at the heart of international human rights law and a growing number of national constitutions and yet its meaning is heavily contested and contingent. I aim to supplement the theoretical literature on dignity by providing an empirical study of how the concept is used in the specific context of legal discourse on sex work. I will analyse jurisprudence in which commercial sex was declared as incompatible with human dignity, focussing on the South African Constitutional Court case of S v Jordan and the Indian Supreme Court case of Budhadev Karmaskar v State of West Bengal. I will consider how these courts conceptualise dignity and argue that their conclusions on the undignified nature of sex work are predicated on particular sexual norms that privilege emotional and relational intimacy. In light of the stigma faced by sex workers I will explore how a discourse, proclaiming sex work as beneath human dignity, may impact on the way that sex workers are perceived and represented culturally, arguing that it reinforces stigma. I will go on to examine how sex workers subvert the notion that commercial sex is undignified, and resist stigma, by campaigning for the right to sell sex with dignity. I will demonstrate that an alternative legal approach to dignity and sex work is possible, where the two are not considered as inherently incompatible, concluding with thoughts on the risks and benefits of using ‘dignity talk’ in activism and campaigns for sex work law reform.
After generalizing the Archimedean property of real numbers in such a way as to make it adaptable to non-numeric structures, we demonstrate that the real numbers cannot be used to accurately measure non-Archimedean structures. We argue that, since an agent with Artificial General Intelligence (AGI) should have no problem engaging in tasks that inherently involve non-Archimedean rewards, and since traditional reinforcement learning rewards are real numbers, traditional reinforcement learning probably will not lead to AGI. We indicate two possible ways traditional reinforcement learning could be altered to remove this roadblock.
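The failure of the Archimedean property can be made concrete with lexicographically ordered reward pairs, a standard textbook example rather than the paper's own formalism: no finite number of copies of the "small" reward ever exceeds the "large" one, so no order-preserving real-valued reward function can exist for such a structure.

```python
from functools import total_ordering

@total_ordering
class LexReward:
    """Reward pairs (primary, secondary) compared lexicographically:
    the secondary component only breaks ties in the primary one."""
    def __init__(self, primary, secondary):
        self.v = (primary, secondary)

    def __eq__(self, other):
        return self.v == other.v

    def __lt__(self, other):
        return self.v < other.v

    def __add__(self, other):
        return LexReward(self.v[0] + other.v[0], self.v[1] + other.v[1])

def times(r, n):
    """n copies of reward r added together."""
    total = LexReward(0, 0)
    for _ in range(n):
        total = total + r
    return total
```

An agent that must trade off a hard constraint (the primary component) against a mere preference (the secondary one) faces exactly this order type, which is why collapsing it into a single real-valued reward loses information.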