Elicitation and Planning in Markov Decision Processes with Unknown Rewards

ISBN-10 : OCLC:1022562936

Book Synopsis: Elicitation and Planning in Markov Decision Processes with Unknown Rewards, by Pegah Alizadeh

Download or read book Elicitation and Planning in Markov Decision Processes with Unknown Rewards, written by Pegah Alizadeh and released in 2016. Available in PDF, EPUB and Kindle. Book excerpt:

Markov decision processes (MDPs) are models for solving sequential decision problems in which a user interacts with the environment and adapts her policy by taking numerical reward signals into account. Solving an MDP amounts to formulating the user's behavior in the environment as a policy function that specifies which action to choose in each situation. In many real-world decision problems, users have varying preferences, so the gain of an action in a state differs from user to user and must be re-derived for each of them. In this dissertation, we are interested in solving MDPs for users with different preferences.

We use a model named vector-valued MDP (VMDP), whose rewards are vectors. We propose a propagation-search algorithm that assigns a vector-valued function to each policy and identifies each user with a preference vector, over the existing set of preferences, that satisfies the user's priorities. Since the user's preference vector is not known, we present several methods for solving VMDPs while approximating it.

We introduce two algorithms that reduce the number of queries needed to find a user's optimal policy: 1) a propagation-search algorithm, in which we propagate a set of possibly optimal policies for the given MDP without knowing the user's preferences; 2) an interactive value iteration (IVI) algorithm on VMDPs, namely the Advantage-based Value Iteration (ABVI) algorithm, which clusters and regroups advantages. We also demonstrate how the ABVI algorithm behaves for two different types of users: confident and uncertain.

Finally, we work on minimax regret approximation as a method for finding the optimal policy with respect to limited information about the user's preferences. Each objective in the system is only bounded between a lower and an upper bound, while the system is unaware of the user's preferences among them. We propose a heuristic minimax-regret approximation method for solving MDPs with unknown rewards that is faster and less complex than existing methods in the literature.
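The VMDP idea in the synopsis can be illustrated with a minimal sketch: once a user's preference vector is known (or approximated), the vector rewards are scalarized by a dot product and standard value iteration applies. The toy MDP below (states, transitions, reward vectors, weights) is entirely hypothetical and not taken from the dissertation; it only shows the scalarize-then-iterate step, not the elicitation algorithms themselves.

```python
import numpy as np

# Illustrative toy VMDP (not from the dissertation): 2 states, 2 actions,
# and 2-dimensional vector rewards, one component per objective.
gamma = 0.9
n_states, n_actions = 2, 2

# P[s, a, s'] : transition probabilities
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.9, 0.1]]])

# R[s, a] : reward vector with one entry per objective
R = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.5, 0.5], [0.2, 0.8]]])

def solve_for_preference(w, n_iter=500):
    """Scalarize the vector rewards with preference weights w,
    then run standard value iteration on the resulting MDP."""
    r = R @ w                          # r[s, a] = w . R[s, a]
    V = np.zeros(n_states)
    for _ in range(n_iter):
        Q = r + gamma * (P @ V)        # Q[s, a] = r[s, a] + gamma * E[V(s')]
        V = Q.max(axis=1)
    return Q.argmax(axis=1), V         # greedy policy and its value

# A user who weights the first objective more heavily:
policy, V = solve_for_preference(np.array([0.7, 0.3]))
print("policy:", policy, "value:", V)
```

Different preference vectors generally yield different greedy policies, which is exactly why the dissertation's elicitation methods aim to approximate the user's weights with as few queries as possible.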
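The minimax-regret criterion from the final paragraph can also be sketched on a toy problem. Since a policy's regret is convex in the reward, its worst case over interval-bounded rewards is attained at a vertex of the reward hypercube, so a brute-force version enumerates all corner reward matrices and all deterministic policies. All numbers below are hypothetical, and this enumeration is exponential; it is only a didactic baseline, not the faster heuristic the dissertation proposes.

```python
import itertools
import numpy as np

# Toy MDP with interval-bounded, unknown rewards; all numbers illustrative.
gamma = 0.9
n_states, n_actions = 2, 2
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.9, 0.1]]])
R_lo = np.zeros((n_states, n_actions))          # lower reward bounds
R_hi = np.array([[1.0, 0.4], [0.3, 1.0]])       # upper reward bounds

def policy_value(pi, r):
    """Exact value of deterministic policy pi under reward matrix r."""
    P_pi = P[np.arange(n_states), pi]           # (S, S) induced chain
    r_pi = r[np.arange(n_states), pi]           # (S,) induced reward
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

def optimal_value(r, n_iter=500):
    """Optimal value function under reward matrix r, via value iteration."""
    V = np.zeros(n_states)
    for _ in range(n_iter):
        V = (r + gamma * (P @ V)).max(axis=1)
    return V

policies = list(itertools.product(range(n_actions), repeat=n_states))

# Worst-case regret is attained at a corner of the reward hypercube,
# so enumerate every corner (feasible only at toy sizes).
corners = [np.where(np.array(mask).reshape(n_states, n_actions), R_hi, R_lo)
           for mask in itertools.product([0, 1], repeat=n_states * n_actions)]

def max_regret(pi):
    """Largest loss of pi versus the optimal policy over all corner rewards."""
    return max((optimal_value(r) - policy_value(np.array(pi), r)).max()
               for r in corners)

best = min(policies, key=max_regret)
print("minimax-regret policy:", best)
```

The heuristic methods discussed in the dissertation exist precisely because this vertex enumeration blows up with the number of state-action pairs.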

Elicitation and Planning in Markov Decision Processes with Unknown Rewards Related Books

Regret-based Reward Elicitation for Markov Decision Processes
Language: en
Pages:
Authors: Kevin Regan
Categories:
Type: BOOK - Published: 2014 - Publisher:


Cognitive Electronic Warfare: An Artificial Intelligence Approach
Language: en
Pages: 288
Authors: Karen Haigh
Categories: Technology & Engineering
Type: BOOK - Published: 2021-07-31 - Publisher: Artech House


This comprehensive book gives an overview of how cognitive systems and artificial intelligence (AI) can be used in electronic warfare (EW). Readers will learn h
Algorithmic Decision Theory
Language: en
Pages: 593
Authors: Toby Walsh
Categories: Computers
Type: BOOK - Published: 2015-08-27 - Publisher: Springer


This book constitutes the thoroughly refereed conference proceedings of the 4th International Conference on Algorithmic Decision Theory, ADT 2015, held in Sept
Planning with Markov Decision Processes
Language: en
Pages: 212
Authors: Mausam
Categories: Computers
Type: BOOK - Published: 2012-06-01 - Publisher: Morgan & Claypool Publishers


Markov Decision Processes (MDPs) are widely popular in Artificial Intelligence for modeling sequential decision-making scenarios with probabilistic dynamics. Th