Re: Seqscan in MAX(index_column)

Поиск
Список
Период
Сортировка
От Christopher Browne
Тема Re: Seqscan in MAX(index_column)
Дата
Msg-id m31xuw586t.fsf@chvatal.cbbrowne.com
обсуждение исходный текст
Ответ на Re: Seqscan in MAX(index_column)  (Bruce Momjian <pgman@candle.pha.pa.us>)
Ответы Re: Seqscan in MAX(index_column)  (Bruce Momjian <pgman@candle.pha.pa.us>)
Список pgsql-hackers
The world rejoiced as pgadmin@pse-consulting.de (Andreas Pflug) wrote:
> Bruce Momjian wrote:
>
>>Greg Stark wrote:
>>
>>
>>>It has nothing to do with MVCC. It has to do with implementing this is hard in
>>>the general case.
>>>
>>>Think of examples like:
>>>
>>>select max(foo) group by bar;
>>>
>>>or
>>>
>>>select max(foo) where xyz = z;
>>>
>>>To do it properly max/min have to be special-cased and tightly integrated with
>>>other code to handle index scans and aggregates. As it currently stands
>>>they're implemented the same way as any other aggregate, which means they get
>>>to see all the records in the grouping.
>>>
>>>This is a frequently asked question, I'm surprised you didn't find stuff
>>>searching with google. There have been numerous long discussions on this topic
>>>not long ago. People are still trying to think about how to handle this
>>>better.
>>>
>>>
>>
>>The FAQ does have the example of using ORDER BY LIMIT 1 for MAX().  What
>>we don't have a workaround for is COUNT(*).  I think that will require
>>some cached value that obeys MVCC rules of visibility.
>>
>>
> IMHO portability is an important point. People are used to MAX() and
> COUNT(*), and will be surprised that they need some special
> treatment. While the reasons for this are perfectly explainable,
> speeding up these aggregates with some extra effort would make porting
> a bit easier.

The availability of cleverness with MAX()/MIN() is no grand surprise;
it would be very nice to get some expansion of that to "SELECT VALUE
FROM TABLE WHERE (CRITERIA) ORDER BY VALUE DESCENDING LIMIT 1;"

But I'm _very_ curious as to what the anticipated treatment to collect
COUNT() more efficiently would be.  I would expect that it would only
be able to get tuned much more if there's NO "where" clause, so that
it could use some ("magically-kept-up-to-date") stats on table size.

I don't see any way to optimize COUNT when numbers of rows can
continually vary.  Storing stats somewhere will just make updates more
expensive.  And if those stats are for the table, that doesn't help me
if I want "COUNT(*) FROM TABLE WHERE UPDATED_ON BETWEEN NOW() - '1
day' and NOW()".
-- 
(format nil "~S@~S" "aa454" "freenet.carleton.ca")
http://cbbrowne.com/info/linuxdistributions.html
"Recursion is the root of computation since it trades description for time."
-- Alan Perlis


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: TCP/IP with 7.4 beta2 broken?
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: Seqscan in MAX(index_column)