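Abstract (translated):
This paper proposes a spectrum sensing policy for cognitive radio networks based on recency-based exploration. We formulate the problem of finding a spectrum sensing policy for multi-band dynamic spectrum access as a stochastic restless multi-armed bandit problem with stationary, unknown reward distributions. In cognitive radio networks, the multi-armed bandit problem arises when deciding where in the radio spectrum to look for idle frequencies that can be exploited efficiently for data transmission. We consider two models for the dynamics of the frequency bands: 1) the independent model, in which the state of a band evolves randomly and independently of the past, and 2) the Gilbert-Elliot model, in which the state evolves according to a 2-state Markov chain. It is shown that under these conditions the proposed sensing policy attains asymptotically logarithmic weak regret. The proposed policy is an index policy in which the index of a frequency band consists of a sample mean term and a recency-based exploration bonus term. The sample mean promotes spectrum exploitation, whereas the exploration bonus encourages further exploration for idle bands that provide high data rates. The recency-based approach makes it straightforward to construct the exploration bonus so that the time interval between consecutive sensing instants of a suboptimal band grows exponentially, which leads to logarithmically increasing weak regret. Simulation results confirm the logarithmic weak regret and show that the proposed policy often provides improved performance at low complexity compared with other state-of-the-art policies in the literature.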
---
English title:
An order optimal policy for exploiting idle spectrum in cognitive radio networks
---
Authors:
Jan Oksanen and Visa Koivunen
---
Latest submission year:
2017
---
Classification:
Primary category: Electrical Engineering and Systems Science
Secondary category: Signal Processing
Category description: Theory, algorithms, performance analysis and applications of signal and data analysis, including physical modeling, processing, detection and parameter estimation, learning, mining, retrieval, and information extraction. The term "signal" includes speech, audio, sonar, radar, geophysical, physiological, (bio-) medical, image, video, and multimodal natural and man-made signals, including communication signals and data. Topics of interest include: statistical signal processing, spectral estimation and system identification; filter design, adaptive filtering / stochastic learning; (compressive) sampling, sensing, and transform-domain methods including fast algorithms; signal processing for machine learning and machine learning for signal processing applications; in-network and graph signal processing; convex and nonconvex optimization methods for signal processing applications; radar, sonar, and sensor array beamforming and direction finding; communications signal processing; low power, multi-core and system-on-chip signal processing; sensing, communication, analysis and optimization for cyber-physical systems such as power grids and the Internet of Things.
--
Primary category: Computer Science
Secondary category: Information Theory
Category description: Covers theoretical and experimental aspects of information theory and coding. Includes material in ACM Subject Class E.4 and intersects with H.1.1.
--
Primary category: Mathematics
Secondary category: Information Theory
Category description: math.IT is an alias for cs.IT. Covers theoretical and experimental aspects of information theory and coding.
--
---
English abstract:
In this paper, a spectrum sensing policy employing recency-based exploration is proposed for cognitive radio networks. We formulate the problem of finding a spectrum sensing policy for multi-band dynamic spectrum access as a stochastic restless multi-armed bandit problem with stationary unknown reward distributions. In cognitive radio networks, the multi-armed bandit problem arises when deciding where in the radio spectrum to look for idle frequencies that could be efficiently exploited for data transmission. We consider two models for the dynamics of the frequency bands: 1) the independent model, where the state of the band evolves randomly and independently of the past, and 2) the Gilbert-Elliot model, where the states evolve according to a 2-state Markov chain. It is shown that under these conditions the proposed sensing policy attains asymptotically logarithmic weak regret. The policy proposed in this paper is an index policy, in which the index of a frequency band is composed of a sample mean term and a recency-based exploration bonus term. The sample mean promotes spectrum exploitation, whereas the exploration bonus encourages further exploration for idle bands providing high data rates. The proposed recency-based approach readily allows constructing the exploration bonus such that the time interval between consecutive sensing time instants of a suboptimal band grows exponentially, which then leads to logarithmically increasing weak regret. Simulation results confirming logarithmic weak regret are presented, and it is found that the proposed policy often provides improved performance at low complexity over other state-of-the-art policies in the literature.
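The index policy described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact form of the exploration bonus, so the bonus used here, sqrt(ln(t / last_sensed)), is an assumption chosen only because it is zero right after a band is sensed and grows slowly with the time since. The function names, the reward definition (1 if the sensed band is idle, 0 otherwise), and the Markov transition parameters are likewise hypothetical.

```python
import math
import random


def rbe_index(mean, t, last_sensed):
    """Sample mean plus a recency-based bonus (bonus form is an assumption)."""
    # The bonus is small right after sensing and grows with t / last_sensed.
    return mean + math.sqrt(math.log(t / last_sensed))


def gilbert_elliot_step(state, p01, p10, rng):
    """One step of a 2-state Markov chain: 0 = occupied, 1 = idle."""
    if state == 0:
        return 1 if rng.random() < p01 else 0
    return 0 if rng.random() < p10 else 1


def run_policy(idle_probs, horizon, seed=0, markov=False):
    """Sense one band per slot and return how often each band was sensed."""
    rng = random.Random(seed)
    k = len(idle_probs)
    counts = [0] * k    # number of times each band has been sensed
    means = [0.0] * k   # sample mean reward of each band
    last = [0] * k      # time slot at which each band was last sensed
    states = [1 if rng.random() < p else 0 for p in idle_probs]
    for t in range(1, horizon + 1):
        # Restless bands: every band evolves whether or not it is sensed.
        for b, p in enumerate(idle_probs):
            if markov:
                # Hypothetical transition probabilities whose stationary
                # idle probability equals p.
                states[b] = gilbert_elliot_step(
                    states[b], 0.2 * p, 0.2 * (1 - p), rng)
            else:
                states[b] = 1 if rng.random() < p else 0
        if t <= k:
            band = t - 1  # initialization: sense each band once
        else:
            band = max(range(k),
                       key=lambda b: rbe_index(means[b], t, last[b]))
        reward = states[band]
        counts[band] += 1
        means[band] += (reward - means[band]) / counts[band]
        last[band] = t
    return counts


if __name__ == "__main__":
    print(run_policy([0.2, 0.5, 0.9], horizon=5000))
    print(run_policy([0.2, 0.5, 0.9], horizon=5000, markov=True))
```

With this bonus, a suboptimal band is revisited only once t / last_sensed is large enough to offset its gap in sample mean, so the intervals between its sensing instants grow roughly geometrically. That is the behaviour the abstract associates with logarithmic weak regret, although this sketch makes no claim to reproduce the paper's analysis or results.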
---
PDF link:
https://arxiv.org/pdf/1709.00237