全部版块 我的主页
论坛 提问 悬赏 求职 新闻 读书 功能一区 藏经阁
102 0
2020-11-19
The VLDB Journal (2009) 18:407–427
DOI 10.1007/s00778-008-0127-9
SPECIAL ISSUE PAPER
Anytime measures for top-k algorithms on exact
and fuzzy data sets
Benjamin Arai · Gautam Das · Dimitrios Gunopulos ·
Nick Koudas
Received: 28 February 2008 / Revised: 30 September 2008 / Accepted: 1 December 2008 / Published online: 23 January 2009
(c) Springer-Verlag 2009
Abstract Top-k queries on large multi-attribute data sets
are fundamentaloperations ininformationretrievalandrank-
ing applications. In this article, we initiate research on the
anytime behavior of top-k algorithms on exact and fuzzy
data. In particular, given specif i c top-k algorithms (TA and
TA-Sorted) we are interested in studying their progress
toward identif i cation ofthe correct result at any point during
the algorithms’ execution. We adopt aprobabilistic approach
where we seekto report at any point ofoperation ofthe algo-
rithm the conf i dence that the top-k result has been identif i ed.
Suchafunctionalitycanbe avaluable assetwhenone isinter-
ested in reducing the runtime costoftop-k computations. We
present a thorough experimental evaluation to validate our
techniques using both synthetic and real data sets.
Keywords Approximate query · Anytime · Top-k ·
Fuzzy data
1 Introduction
Top-k queries on large multi-attribute databases are com-
monplace. Such queries report the k highest ranking results
B. Arai ( B )
University of California, Riverside, USA
e-mail: barai@cs.ucr.edu
G. Das
University of Texas, Arlington, USA
e-mail: gdas@cse.uta.edu
D. Gunopulos
University of Athens, Athens, Greece
e-mail: dg@di.uoa.gr
N. Koudas
University of Toronto, Toronto, Canada
e-mail: koudas@cs.toronto.edu
based on similarity scores of attribute values and specif i c
score aggregation functions. Such queries are very frequent
in a multitude of applications including (a) multimedia
similarity search(onimages, audio, etc.), (b) preference que-
ries expressedonattributes ofassorteddatatypes, (c) internet
searches on scores based on word occurrence statistics and
diverse combining functions, and (d) sensor network appli-
cations over streams ofsensor measurements.
Several algorithms have been introduced in literature to
eff i cientlyperformtop-k computations. Amongthemostsuc-
cessful is the TA algorithm discovered independently by
Fagin et al. [16], Guntzer et al. [20] and Nepal et al. [30].
In this algorithm, each value of an attribute can be accessed
independently via an index in descending order of its score.
Such a score is computed with a specif i c query condition.
Numerous algorithms for performing top-k computations
have been proposed [2–4,14,15,18,24,27,29] depending on
themodelofdataaccess, stoppingconditions, etc. Themajor-
ity of such computations however can be exhaustive. The
algorithms come to a stop only when there is absolute cer-
tainty that the correct top-k result has been identif i ed.
An anytime algorithm is an algorithm whose quality of
results improves gradually as computation time increases
[21]. Although several types of such algorithms have been
proposed, interruptible anytime algorithms are highly popu-
larand useful. An interruptible anytime algorithm is an algo-
rithm whose runtime is not determined in advance but at any
time during execution can be interrupted and return a result.
Moreover, interruptible algorithms have an associated per-
formance prof i le which returns result quality (for suitably
def i ned notions of quality) as a function of time (relative
to execution) for a problem instance. Such algorithms are
valuable since at any point during the execution a user can
obtain feedback regarding the result quality at that point. If
one is satisf i ed with the current feedback one may bring the
12 3

Anytime measures for top-.pdf
大小:(12.15 MB)

只需: 50 个论坛币  马上下载


二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

栏目导航
热门文章
推荐文章

说点什么

分享

扫码加好友,拉您进群
各岗位、行业、专业交流群