Обсуждение: Top-k optimizations?
Folks, As this came up in a work situation, I was wondering a little bit about the top-k issue. Right now, top-k is implemented (most easily, I think) via a SELECT with a LIMIT and no OFFSET. 3 questions arise from this. 1. Are there currently any optimizations specific to top-k in PostgreSQL? If so, what are they? 2. What kinds of top-k optimizations *can't* be part of PostgreSQL (things that would break MVCC, e.g.)? 3. What kinds of top-k optimizations might (eventually) be included in PostgreSQL? Hoping this stimulates some friendly & informative discussion... :) Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Remember to vote!
On Thu, 13 Jan 2005, David Fetter wrote: > 3. What kinds of top-k optimizations might (eventually) be included > in PostgreSQL? > See the TODO item: Allow ORDER BY ... LIMIT 1 to select high/low value without sort or index using a sequential scan for highest/lowest values If only one value is needed, there is no need to sort the entire table. Instead a sequential scan could get the matching value. There is some discussion of this on the -general list here: http://groups-beta.google.com/group/comp.databases.postgresql.general/messages/08c615cc2cbdf143,fe626a7cc9021d12,4f1d0575be60c26f,5c44463d8ef0e1ef,ceff42f0dae09272,dc9da98adcb6142c,7f34133e99b38825,28b43c5e79924da6,98be76099ea6513f,5d3f19a69e3b5a93?thread_id=7d35d3eb00ffd0e8&mode=thread&noheader=1&q=oleg+limit+sort#doc_7f34133e99b38825 Kris Jurka
David Fetter wrote: > Folks, > > As this came up in a work situation, I was wondering a little bit > about the top-k issue. Right now, top-k is implemented (most easily, > I think) via a SELECT with a LIMIT and no OFFSET. 3 questions arise > from this. I think the simplest LIMIT query doesn't make it easy to show ties; but if you don't want the equivalent of MSSQL's "with ties" clause I think LIMIT works well. > 1. Are there currently any optimizations specific to top-k in > PostgreSQL? If so, what are they? Well, when I do queries like "select * from customers order by dollarsspent desc limit 3" it happily uses an index on dollarsspent. > 3. What kinds of top-k optimizations might (eventually) be included > in PostgreSQL? I think a slightly related topic is whether syntactically it'd be nice if postgresql had the SQL 2003 optional olap features to specify ways of doing top-k queries as described here: http://troels.arvin.dk/db/rdbms/#select-top-n SELECT * FROM ( SELECT RANK() OVER (ORDER BY age ASC) AS ranking, person_id, person_name, age FROM person ) AS foo WHERE ranking <= 3 Seems IBM and Oracle support that syntax or something very similar. Yeah, I know it's probably an orthogonal question to the optimizations one, but might make porting nicer.