Multi-Agent Spectrum Sharing Strategy Design Based on a Dynamic Multi-Armed Bandit Game

2022-03-03

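Abstract (translated):
For a wireless avionics communication system, a multi-armed bandit game is formulated in terms of channel states, strategies, and rewards. The simple case, in which only two agents share the spectrum, is studied in full, with the objective of maximizing the cumulative reward over a finite time horizon. An Upper Confidence Bound (UCB) algorithm is used to obtain the optimal solution of the stochastic multi-armed bandit (MAB) problem. The MAB problem can also be approached from the perspective of a Markov game framework. Thompson Sampling (TS) serves as the benchmark for evaluating the performance of the proposed approach. Numerical results are given on minimizing the expected regret and on choosing the best parameter for the upper confidence bound.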
---
English title:
Dynamic Multi-Arm Bandit Game Based Multi-Agents Spectrum Sharing Strategy Design
---
Authors:
Jingyang Lu, Lun Li, Dan Shen, Genshe Chen, Bin Jia, Erik Blasch, Khanh Pham
---
Year of latest submission:
2017
---
Classification:

Primary category: Electrical Engineering and Systems Science
Subcategory: Signal Processing
Category description: Theory, algorithms, performance analysis and applications of signal and data analysis, including physical modeling, processing, detection and parameter estimation, learning, mining, retrieval, and information extraction. The term "signal" includes speech, audio, sonar, radar, geophysical, physiological, (bio-) medical, image, video, and multimodal natural and man-made signals, including communication signals and data. Topics of interest include: statistical signal processing, spectral estimation and system identification; filter design, adaptive filtering / stochastic learning; (compressive) sampling, sensing, and transform-domain methods including fast algorithms; signal processing for machine learning and machine learning for signal processing applications; in-network and graph signal processing; convex and nonconvex optimization methods for signal processing applications; radar, sonar, and sensor array beamforming and direction finding; communications signal processing; low power, multi-core and system-on-chip signal processing; sensing, communication, analysis and optimization for cyber-physical systems such as power grids and the Internet of Things.
---
Original English abstract:
  For a wireless avionics communication system, a Multi-arm bandit game is mathematically formulated, which includes channel states, strategies, and rewards. The simple case includes only two agents sharing the spectrum which is fully studied in terms of maximizing the cumulative reward over a finite time horizon. An Upper Confidence Bound (UCB) algorithm is used to achieve the optimal solutions for the stochastic Multi-Arm Bandit (MAB) problem. Also, the MAB problem can also be solved from the Markov game framework perspective. Meanwhile, Thompson Sampling (TS) is also used as benchmark to evaluate the proposed approach performance. Numerical results are also provided regarding minimizing the expectation of the regret and choosing the best parameter for the upper confidence bound.
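
The abstract names a UCB index policy, a Thompson Sampling benchmark, expected regret, and a tunable confidence-bound parameter. As a rough illustration of how these pieces fit together, here is a minimal single-agent sketch on Bernoulli arms. It is not the paper's two-agent game or its Markov game formulation: the arm means (standing in for per-channel success probabilities), the exploration constant `c`, the Beta(1, 1) prior, and the function names are all assumptions made for this example.

```python
import math
import random


def ucb_regret(means, horizon, c, rng):
    """Run a UCB index policy on Bernoulli arms; return cumulative pseudo-regret.

    `c` is the (assumed) exploration constant inside the confidence bonus;
    c = 2 gives the classic UCB1 index.
    """
    k = len(means)
    counts = [0] * k    # number of pulls per arm
    sums = [0.0] * k    # total observed reward per arm
    best = max(means)
    regret = 0.0

    def index(a, t):
        # Empirical mean plus confidence bonus.
        return sums[a] / counts[a] + math.sqrt(c * math.log(t) / counts[a])

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once before using the index
        else:
            arm = max(range(k), key=lambda a: index(a, t))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret


def thompson_regret(means, horizon, rng):
    """Run Beta-Bernoulli Thompson Sampling; return cumulative pseudo-regret."""
    k = len(means)
    wins = [1] * k      # Beta(1, 1) prior: one pseudo-success per arm
    losses = [1] * k    # and one pseudo-failure per arm
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        # Sample a plausible mean for each arm and play the largest sample.
        arm = max(range(k), key=lambda a: rng.betavariate(wins[a], losses[a]))
        if rng.random() < means[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
        regret += best - means[arm]
    return regret


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.3, 0.5, 0.7]  # hypothetical channel success probabilities
    for c in (0.5, 2.0, 8.0):
        print("UCB c =", c, "regret:", ucb_regret(means, 2000, c, rng))
    print("Thompson Sampling regret:", thompson_regret(means, 2000, rng))
```

Sweeping `c` as in the usage lines is one simple way to see how the confidence-bound parameter trades exploration against exploitation. Averaging the returned regret over many seeds would approximate the expected regret that the abstract refers to.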
---
PDF link:
https://arxiv.org/pdf/1711.04365
