Citation for "Bad n_distinct estimation; hacks suggested?"
| От | Gurmeet Manku |
|---|---|
| Тема | Citation for "Bad n_distinct estimation; hacks suggested?" |
| Дата | |
| Msg-id | Pine.LNX.4.44.0505020858580.28631-100000@xenon.Stanford.EDU обсуждение исходный текст |
| Ответ на | Re: [PERFORM] Bad n_distinct estimation; hacks suggested? (Simon Riggs <simon@2ndquadrant.com>) |
| Список | pgsql-hackers |
Actually, the earliest paper that solves the distinct_n estimation
problem in 1 pass is the following:
"Estimating simple functions on the union of data streams"
by Gibbons and Tirthapura, SPAA 2001.
http://home.eng.iastate.edu/~snt/research/streaming.pdf
The above paper addresses a more difficult problem (1 pass
_and_ a distributed setting).
Gibbon's followup paper in VLDB 2001 limits the problem to a
single machine and contains primarily experimental results (for
a database audience). The algorithmic breakthrough had already been
accomplished in the SPAA paper.
Gurmeet
--
----------------------------------------------------
Gurmeet Singh Manku Google Inc.
http://www.cs.stanford.edu/~manku (650) 967 1890
----------------------------------------------------
В списке pgsql-hackers по дате отправления: