Discretized Approximations for POMDP with Average Cost

2022-03-13

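Abstract (translated):
In this paper, we propose a new lower approximation scheme for POMDPs under the discounted and average cost criteria. The approximating functions are determined by their values at a finite number of belief points and can be computed efficiently with value iteration algorithms for finite-state MDPs. Several lower approximation schemes have been proposed earlier for discounted problems, whereas ours appears to be the first for average cost problems. We focus mainly on the average cost case and show that the corresponding approximation can be computed efficiently with multichain algorithms for finite-state MDPs. We give a preliminary analysis showing that, whether or not the optimal average cost J exists in the POMDP, the resulting approximation is a lower bound on the liminf optimal average cost function. It can also be used to compute an upper bound on the limsup optimal average cost function, as well as bounds on the cost of executing the stationary policy associated with the approximation. Convergence of the cost approximation is shown when the optimal average cost is constant and the optimal differential cost is continuous.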
---
English title:
Discretized Approximations for POMDP with Average Cost
---
Authors:
Huizhen Yu, Dimitri Bertsekas
---
Year of latest submission:
2012
---
Classification:

Primary category: Computer Science
Subcategory: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
--
Primary category: Computer Science
Subcategory: Systems and Control
Category description: cs.SY is an alias for eess.SY. This section includes theoretical and experimental research covering all facets of automatic control systems. The section is focused on methods of control system analysis and design using tools of modeling, simulation and optimization. Specific areas of research include nonlinear, distributed, adaptive, stochastic and robust control in addition to hybrid and discrete event systems. Application areas include automotive and aerospace control systems, network control, biological systems, multiagent and cooperative control, robotics, reinforcement learning, sensor networks, control of cyber-physical and energy-related systems, and control of computing systems.
--
Primary category: Mathematics
Subcategory: Optimization and Control
Category description: Operations research, linear programming, control theory, systems theory, optimal control, game theory
--

---
Abstract:
In this paper, we propose a new lower approximation scheme for POMDP with discounted and average cost criterion. The approximating functions are determined by their values at a finite number of belief points, and can be computed efficiently using value iteration algorithms for finite-state MDP. While for discounted problems several lower approximation schemes have been proposed earlier, ours seems the first of its kind for average cost problems. We focus primarily on the average cost case, and we show that the corresponding approximation can be computed efficiently using multi-chain algorithms for finite-state MDP. We give a preliminary analysis showing that regardless of the existence of the optimal average cost J in the POMDP, the approximation obtained is a lower bound of the liminf optimal average cost function, and can also be used to calculate an upper bound on the limsup optimal average cost function, as well as bounds on the cost of executing the stationary policy associated with the approximation. We show the convergence of the cost approximation, when the optimal average cost is constant and the optimal differential cost is continuous.
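
The abstract refers to value iteration on a finite-state MDP as the computational workhorse. As a rough illustration only, here is a minimal sketch of standard discounted-cost value iteration on a tiny finite MDP. It is not the paper's construction: how the approximating MDP is built from belief points is not described on this page, and the function name `value_iteration`, the arrays `P` and `g`, and the two-state example are all hypothetical.

```python
def value_iteration(P, g, alpha, tol=1e-10, max_iter=10_000):
    """Discounted-cost value iteration for a finite MDP (illustrative sketch).

    P[u][s][t] : probability of moving from state s to state t under action u
    g[s][u]    : one-stage cost of taking action u in state s
    alpha      : discount factor, assumed to lie in (0, 1)
    """
    n_states = len(g)
    n_actions = len(P)
    J = [0.0] * n_states
    for _ in range(max_iter):
        # Bellman backup: minimize stage cost plus discounted expected cost-to-go.
        J_new = [
            min(
                g[s][u] + alpha * sum(P[u][s][t] * J[t] for t in range(n_states))
                for u in range(n_actions)
            )
            for s in range(n_states)
        ]
        # Stop once successive iterates agree to within the tolerance.
        if max(abs(a - b) for a, b in zip(J_new, J)) < tol:
            return J_new
        J = J_new
    return J


# Hypothetical two-state example: state 1 is absorbing and cost-free.
# In state 0, action 0 stays put at cost 1; action 1 jumps to state 1 at cost 2.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0
    [[0.0, 1.0], [0.0, 1.0]],  # action 1
]
g = [[1.0, 2.0], [0.0, 0.0]]
J = value_iteration(P, g, alpha=0.9)
```

In this toy problem, staying in state 0 forever would cost 1 / (1 - 0.9) = 10, so paying 2 once to reach the cost-free state is cheaper and the iteration settles on that choice. The average cost case emphasized in the abstract calls for multichain algorithms, which this sketch does not cover.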
---
PDF link:
https://arxiv.org/pdf/1207.4154
