Re: count(*) performance improvement ideas

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: count(*) performance improvement ideas
Дата
Msg-id 12841.1205813890@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: count(*) performance improvement ideas  ("Stephen Denne" <Stephen.Denne@datamail.co.nz>)
Ответы Re: count(*) performance improvement ideas  (Gregory Stark <stark@enterprisedb.com>)
Re: count(*) performance improvement ideas  ("Stephen Denne" <Stephen.Denne@datamail.co.nz>)
Список pgsql-hackers
"Stephen Denne" <Stephen.Denne@datamail.co.nz> writes:
> I initially thought that what you meant by "having transactions enter "delta" entries" was that I have a trigger that
wouldcreate a row each time it was called, writing how many records where inserted or deleted. I didn't understand how
thiswould be much of an improvement, as each of my rows would contain either +1 or -1.
 

Well, ideally you'd aggregate all the deltas caused by a particular
transaction into one entry in the counting table.  Whether or not that
happens, though, the point of the concept is that some background task
aggregates all the deltas from long-gone transactions into just one base
row, and then deletes the old delta entries.  To get a valid value of
COUNT(*), what onlookers must do is SUM() the base row and delta records
from all transactions that they can "see" under MVCC rules.  The amount
of work involved is proportional to the number of recent updates, not
the total size of the underlying table.

> However I'm not after a fast count(*) from table, but more like a fast
>     select grouping_id, count(*) from my_table group by grouping_id

You could apply the same technique across each group id, though this
certainly is getting beyond what any built-in feature might offer.

> Can you clarify the lack of MVCC problems?

The point there is that the "right answer" actually depends on the
observer, since each observer might have a different snapshot and
therefore be able to "see" a different set of committed rows in the
underlying table.  The multiple-delta table handles this automatically,
because you can "see" a delta entry if and only if you could "see"
the underlying-table changes it represents.

> Does this idea apply with the same efficiency in pre 8.3, non-HOT implementations?

I didn't claim it was amazingly efficient in any implementation ;-).
HOT in particular is nearly useless since most rows in the count
table will never be updated, only inserted and eventually deleted.
You might get some mileage on the base row, but that'd be about it.
The count table will need frequent vacuums as well as frequent
aggregation scans.

It should beat scanning a large underlying table, but it's hardly
gonna be free.
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Gregory Stark
Дата:
Сообщение: Re: New style of hash join proposal
Следующее
От: tomas@tuxteam.de
Дата:
Сообщение: Re: Rewriting Free Space Map