Optimal regret algorithm for Pseudo-1d Bandit Convex Optimization
Aadirupa Saha 1 Nagarajan Natarajan 2 Praneeth Netrapalli 2 3 Prateek Jain 2 3
Abstract
We study online learning with bandit feedback (i.e. learner has access to only zeroth-or ...

the problem has a "pseudo-1d" structure in the loss functions $f_t(w) = \ell_t(g_t(w; x_t))$ where $g_t : \mathbb{R}^d \to \mathbb{R}$ is a one-dimensional function.
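
As a minimal sketch of the pseudo-1d structure, the snippet below composes a scalar loss with a one-dimensional prediction and exposes only the resulting loss value to the learner. The linear form of `g`, the squared loss in `make_loss`, and all function names are assumptions chosen for illustration; they are not taken from the paper.

```python
def g(w, x):
    """One-dimensional prediction g_t(w; x_t).
    A linear form <w, x> is assumed here purely for illustration."""
    return sum(wi * xi for wi, xi in zip(w, x))


def make_loss(y):
    """Hypothetical scalar loss ell_t(z) = (z - y)^2, unknown to the learner."""
    return lambda z: (z - y) ** 2


def bandit_feedback(w, x, ell):
    """Return only the scalar f_t(w) = ell_t(g_t(w; x_t)).
    No gradient or other information about ell is revealed."""
    return ell(g(w, x))


# The learner plays w in R^3 and observes a single number.
w = [1.0, 2.0, 0.0]
x = [0.5, 0.25, 4.0]
ell = make_loss(y=2.0)
value = bandit_feedback(w, x, ell)
```

Although `w` lives in a d-dimensional space, the loss depends on it only through the scalar `g(w, x)`, which is the structure the "pseudo-1d" name refers to.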