Scaling Submodular Maximization via Pruned Submodularity Graphs
Tianyi Zhou+, Hua Ouyang, Yi Chang, Jeff Bilmes§, Carlos Guestrin+
+Computer Science & Engineering, §Electrical Engineering, University of Washington, Seattle
Yahoo Research, Sunnyvale
{tianyizh, bilmes, guestrin}@uw.edu, {houyang, yichang}@yahoo-inc.com
Abstract
We propose a new random pruning method (called "submodular sparsification (SS)") to reduce the cost of submodular maximization. The pruning is applied via a "submodularity graph" over the n ground elements, where each directed edge is associated with a pairwise dependency defined by the submodular function. In each step, SS prunes a 1 − 1/√c (for c > 1) fraction of the nodes using weights on edges computed from only a small number (O(log n)) of randomly sampled nodes. The algorithm requires log_√c n steps, each with a small and highly parallelizable per-step computation. An accuracy-speed tradeoff parameter c, set as c = 8, leads to a fast shrink rate √2/4 and small iteration complexity log_{2√2} n. Analysis shows that w.h.p., the greedy algorithm on the pruned set of size O(log² n) can achieve a guarantee similar to that of processing the original dataset. In news and video summarization tasks, SS is able to substantially reduce both computational costs and memory usage, while maintaining (or even slightly exceeding) the quality of the original (and much more costly) greedy algorithm.
1 Introduction
Machine learning applications benefit from the existence of large volumes of data. The recent explosive growth of data, however, poses serious challenges both to humans and machines. One of the primary goals of a summarization process is to select a representative subset that reduces redundancy but preserves fidelity to the original data [19]. Any further processing on only a summary (a small representative set) by either a human or machine thus reduces computation, memory requirements, and overall effort. Summarization has many applications, such as news digesting, photo stream presenting, data subset selection, and video thumbnailing. A summarization algorithm, however, involves challenging combinatorial optimization problems, whose quality and speed heavily depend on the objective that assigns quality scores to candidate summaries.
Submodular functions [11, 19] are broadly applied as objectives for summarization, since they naturally capture redundancy amongst groups of data elements. A submodular function is a set function f : 2^V → ℝ with a diminishing returns property, i.e., given a finite "ground" set V, any A ⊆ B ⊆ V, and a v ∉ B, we have:

f(v ∪ A) − f(A) ≥ f(v ∪ B) − f(B). (1)

This implies v is more important to the smaller set A than to the larger set B. The increase f(v ∪ A) − f(A) reflects the importance of v to A and is called the "marginal gain" f(v|A) of v conditioned on A. The objective f(·) can be chosen from a large family of functions (e.g., including but not limited to facility location and set cover functions). Usually one requires a small summary, so a cardinality-based budget is used. Hence, a summarization task can be cast as the following:
max_{S ⊆ V, |S| ≤ k} f(S). (2)
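As a concrete illustration (the data here is made up, not from the paper), the set cover function mentioned above is submodular: f(S) counts the universe items covered by the sets chosen in S. A minimal sketch checking the diminishing-returns inequality (1) numerically:

```python
def coverage(S, universe_sets):
    """f(S): number of universe items covered by the sets indexed by S."""
    covered = set()
    for i in S:
        covered |= universe_sets[i]
    return len(covered)

# Ground set V = {0, 1, 2, 3}; each element covers some universe items.
sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]

A = {0}          # A is a subset of B
B = {0, 1, 2}
v = 3            # v is not in B

gain_A = coverage(A | {v}, sets) - coverage(A, sets)  # marginal gain f(v|A)
gain_B = coverage(B | {v}, sets) - coverage(B, sets)  # marginal gain f(v|B)
assert gain_A >= gain_B  # inequality (1): diminishing returns
```

Here v adds item 6 to the small set A (gain 1) but nothing new to the larger B, which already covers everything (gain 0), matching the intuition below equation (1).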
Knapsack and matroid constraints are also often used. In this paper, however, we are primarily concerned with cardinality constraints, though our methods generalize to other constraints as well.
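The baseline that SS accelerates is the greedy algorithm for problem (2), which repeatedly adds the element of largest marginal gain f(v|S) and, for monotone submodular f, achieves the classical (1 − 1/e) approximation guarantee. A minimal sketch using the illustrative coverage objective above (the data is an assumption, not from the paper):

```python
def greedy(V, f, k):
    """Greedy for max f(S) over S in V with |S| <= k: at each step,
    add the element with the largest marginal gain f(S | {v}) - f(S)."""
    S = set()
    for _ in range(k):
        gains = {v: f(S | {v}) - f(S) for v in V - S}
        v_best = max(gains, key=gains.get)
        if gains[v_best] <= 0:
            break  # no remaining element improves f; stop early
        S.add(v_best)
    return S

# Illustrative coverage objective over four candidate sets.
sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
f = lambda S: len(set().union(*(sets[i] for i in S)))
summary = greedy(set(range(len(sets))), f, k=2)
```

Each greedy step scans all remaining elements, so the full algorithm costs O(nk) function evaluations on the n-element ground set; the pruning proposed in this paper reduces the set the greedy step must scan.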
arXiv:1606.00399v1 [cs.LG] 1 Jun 2016