摘要翻译:
研究了由大量对象组成的马尔可夫决策过程对常微分方程优化问题的收敛性。基于Markov决策过程的平均场逼近,我们证明了满足Bellman方程的Markov决策过程的最优报酬收敛于连续Hamilton-Jacobi-Bellman(HJB)方程的解。给出了奖励差的界,并给出了由HJB方程解导出马尔可夫决策过程近似解的构造性算法。我们分别在投资策略、种群动态控制和排队调度三个例子中说明了该方法。它们被用来说明和证明受控ODE的结构,以及通过求解一个连续的HJB方程而不是一个大的离散Bellman方程获得的增益。
---
英文标题:
《Mean field for Markov Decision Processes: from Discrete to Continuous
Optimization》
---
作者:
Nicolas Gast (INRIA Grenoble Rh\^one-Alpes / LIG laboratoire
d'Informatique de Grenoble, EPFL), Bruno Gaujal (INRIA Grenoble Rh\^one-Alpes
/ LIG laboratoire d'Informatique de Grenoble), Jean-Yves Le Boudec (EPFL)
---
最新提交年份:
2011
---
分类信息:
一级分类:Computer Science 计算机科学
二级分类:Artificial Intelligence
人工智能
分类描述:Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
涵盖了人工智能的所有领域,除了视觉、机器人、机器学习、多智能体系统以及计算和语言(自然语言处理),这些领域有独立的学科领域。特别地,包括专家系统,定理证明(尽管这可能与计算机科学中的逻辑重叠),知识表示,规划,和人工智能中的不确定性。大致包括ACM学科类I.2.0、I.2.1、I.2.3、I.2.4、I.2.8和I.2.11中的材料。
--
一级分类:Computer Science 计算机科学
二级分类:Performance 性能
分类描述:Covers performance measurement and evaluation, queueing, and simulation. Roughly includes material in ACM Subject Classes D.4.8 and K.6.2.
涵盖性能测量和评估、排队和模拟。大致包括ACM主题课程D.4.8和K.6.2中的材料。
--
一级分类:Computer Science 计算机科学
二级分类:Systems and Control 系统与控制
分类描述:cs.SY is an alias for eess.SY. This section includes theoretical and experimental research covering all facets of automatic control systems. The section is focused on methods of control system analysis and design using tools of modeling, simulation and optimization. Specific areas of research include nonlinear, distributed, adaptive, stochastic and robust control in addition to hybrid and discrete event systems. Application areas include automotive and aerospace control systems, network control, biological systems, multiagent and cooperative control, robotics, reinforcement learning, sensor networks, control of cyber-physical and energy-related systems, and control of computing systems.
cs.sy是eess.sy的别名。本部分包括理论和实验研究,涵盖了自动控制系统的各个方面。本节主要介绍利用建模、仿真和优化工具进行控制系统分析和设计的方法。具体研究领域包括非线性、分布式、自适应、随机和鲁棒控制,以及混合和离散事件系统。应用领域包括汽车和航空航天控制系统、网络控制、生物系统、多智能体和协作控制、机器人学、强化学习、传感器网络、信息物理和能源相关系统的控制以及计算系统的控制。
--
一级分类:Mathematics 数学
二级分类:Optimization and Control 优化与控制
分类描述:Operations research, linear programming, control theory, systems theory, optimal control, game theory
运筹学,线性规划,控制论,系统论,最优控制,博弈论
--
一级分类:Mathematics 数学
二级分类:Probability 概率
分类描述:Theory and applications of probability and stochastic processes: e.g. central limit theorems, large deviations, stochastic differential equations, models from statistical mechanics, queuing theory
概率论与随机过程的理论与应用:例如中心极限定理,大偏差,随机微分方程,统计力学模型,排队论
--
---
英文摘要:
We study the convergence of Markov Decision Processes made of a large number of objects to optimization problems on ordinary differential equations (ODE). We show that the optimal reward of such a Markov Decision Process, satisfying a Bellman equation, converges to the solution of a continuous Hamilton-Jacobi-Bellman (HJB) equation based on the mean field approximation of the Markov Decision Process. We give bounds on the difference of the rewards, and a constructive algorithm for deriving an approximating solution to the Markov Decision Process from a solution of the HJB equations. We illustrate the method on three examples pertaining respectively to investment strategies, population dynamics control and scheduling in queues are developed. They are used to illustrate and justify the construction of the controlled ODE and to show the gain obtained by solving a continuous HJB equation rather than a large discrete Bellman equation.
---
PDF链接:
https://arxiv.org/pdf/1004.2342