Re: Strange count(*) implementation?

Поиск
Список
Период
Сортировка
От Tino Wildenhain
Тема Re: Strange count(*) implementation?
Дата
Msg-id 1098794754.21062.422.camel@sabrina.peacock.de
обсуждение исходный текст
Ответ на Re: Strange count(*) implementation?  (Henk Ernst Blok <h.e.blok@utwente.nl>)
Ответы Re: Strange count(*) implementation?  (Henk Ernst Blok <h.e.blok@utwente.nl>)
Список pgsql-general
On Tue, 2004-10-26 at 13:56, Henk Ernst Blok wrote:
> Hi,
>
> My question was more of a fundamental nature as this count by scan
> seemed to contradict the theory about how to optimize it.

It is hard or next to impossible to optimize count() or more generally
aggregates in a MVCC environment. Note also all the cases where
you have a qualified select to calculate the aggregate.

> I assume(d) the more expensive statistics (e.g., value distribution
> info) are updated only when outdated too much or on request (manual
> vacuum). Usually, other/cheap statistics can easily be maintained
> incrementally and thus reflect actual table state after each update.

I remember some discussion about this. But also here the MVCC and
the general call for performance leads to the current solution
where statistics are only updated on vacuum. With PG8.0 you have
vacuum strategies with better performance which can run more often
as I understand - so while not giving you exact figures your
count() could be estimated at least.

>  Of course, the MVCC principle seems to make things a bit more
> complicated I understand now. But tracking whether statistics are
> dirty has to be in the system anyway. How does it otherwise decide
> when to do its own statistics updates? So  if explain can get the most
> recent count, why not use it in the count as well if you know the
> statistics are still acurate?

The point is: you dont know it. There is curently no: mark statistics
dirty if table has new tuples or tuples removed.

> By the way, a count(*) without any where does occur very frequently if
> you are dealing with an OLAP load, which is the case in my setting.
> So, I indeed already 'fixed' the performance problem by precomputing
> all counts I can predict to be of any use.

I'm not familar with OLAP specifics, so what is the meaning of the
count() here? What is done with this information?

> Anyway, I understood this issue has been subject to discusion before I
> was on the list (searching the archive/website was/is not very
> effective, so I didn't know until someone told me so, sorry). So, I
> leave it to the developers what to do with this topic.

Yes, very very often I can tell you :-)
Really this is an FAQ :-)

Regards
Tino


В списке pgsql-general по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: Strange count(*) implementation?
Следующее
От: Mike Mascari
Дата:
Сообщение: Any plans on allowing user-defined triggers to be deferrable?