Re: count(*) performance improvement ideas

Поиск
Список
Период
Сортировка
От Stephen Denne
Тема Re: count(*) performance improvement ideas
Дата
Msg-id F0238EBA67824444BC1CB4700960CB4804EAC60E@dmpeints002.isotach.com
обсуждение исходный текст
Ответ на Re: count(*) performance improvement ideas  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: count(*) performance improvement ideas  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
(Sorry for the repeat email Tom, I forgot the cc's)

Tom Lane wrote:
> "Stephen Denne" <Stephen.Denne@datamail.co.nz> writes:
> > I initially thought that what you meant by "having
> > transactions enter "delta" entries" was that I have a trigger
> > that would create a row each time it was called, writing how
> > many records where inserted or deleted. I didn't understand
> > how this would be much of an improvement, as each of my rows
> > would contain either +1 or -1.
>
> Well, ideally you'd aggregate all the deltas caused by a particular
> transaction into one entry in the counting table.

Yes, that's what I was attempting to do, but without changing the application code.

Using txid_current() can do that, so each of a large number of individual inserts or deletes within a transaction
updatesthe same delta row for that transaction. I haven't found any references to this being a solution, and thought it
wasworth recording. 

> > However I'm not after a fast count(*) from table, but more
> like a fast
> >     select grouping_id, count(*) from my_table group by grouping_id
>
> You could apply the same technique across each group id, though this
> certainly is getting beyond what any built-in feature might offer.

Agreed. I've tried it out now, and am fairly happy with what I've got.

> > Can you clarify the lack of MVCC problems?
>
> The point there is that the "right answer" actually depends on the
> observer, since each observer might have a different snapshot and
> therefore be able to "see" a different set of committed rows in the
> underlying table.  The multiple-delta table handles this
> automatically,
> because you can "see" a delta entry if and only if you could "see"
> the underlying-table changes it represents.
>
> > Does this idea apply with the same efficiency in pre 8.3,
> non-HOT implementations?
>
> I didn't claim it was amazingly efficient in any implementation ;-).
> HOT in particular is nearly useless since most rows in the count
> table will never be updated, only inserted and eventually deleted.
> You might get some mileage on the base row, but that'd be about it.
> The count table will need frequent vacuums as well as frequent
> aggregation scans.
>
> It should beat scanning a large underlying table, but it's hardly
> gonna be free.

What I was asking about with those questions, is if a single row is inserted in transaction A, and updated 1000 times
stillwithin transaction A, then transaction A commits... does a single row version get written, or 1001, 1000 of which
areremovable? 

I tested this with a small summary table freshly vacuum full'ed.

10 transactions, one after the other, each transaction inserted three delta rows, and updates one of them three times,
andthe other two five times each. 
So 3 inserts and 13 updates per transaction.
The updates only affect non-indexed fields in rows created in the same transaction.

The next vacuum full found 52 removable row versions.

I repeated the test, and got 13 removable row versions.

I repeated the test again, and got 13 removable row versions.

I repeated just one of the ten transactions, 13 removable row versions.

All inserts and updates are probably in the one page that has a fair bit of free space.

Is it possible to update the HOT code to re-use row versions on the same page if they were created in the same
transaction?

Conclusion: making use of txid_current(), I can get single delta rows with deltas of 10000, but doing so creates 10000
deadrow versions. 

Regards,
Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any
attachmentsis confidential and may be subject to legal privilege.  If it is not intended for you please advise by reply
immediately,destroy it and do not copy, disclose or use it in any way. 
__________________________________________________________________ This email has been scanned by the DMZGlobal
BusinessQuality             Electronic Messaging Suite. 
Please see http://www.dmzglobal.com/dmzmessaging.htm for details.
__________________________________________________________________




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: pgindent issue with EXEC_BACKEND-only typedefs
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: broken head?