Re: why does count take so long?

From: Jean-Luc Lachance
Subject: Re: why does count take so long?
Date:
Msg-id: 3F5F406B.9E8B5138@nsd.ca
In reply to: why does count take so long?  (Joseph Shraibman <jks@selectacast.net>)
List: pgsql-general
Christopher,

You did not quite understand.

The counts in question are the actual counts (deltas) for the
transactions.

The tuple should be something like:
  xid, reltype, insert_count, update_count, delete_count

When a COUNT(*) is issued, the committed tuples are totaled up and
stored as xid 0 (or whatever), and the committed tuples are then
deleted (all this with the appropriate locks).

I am not sure if there is a need for the update count.
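
Roughly, just to sketch the idea (table and column names are made up here):

  create table pg_tx_counts (
    xid          integer,   -- transaction id (not really an integer, as Christopher notes below)
    reltype      oid,       -- table the deltas apply to
    insert_count integer,
    update_count integer,
    delete_count integer
  );

  -- Each committing transaction appends one row of deltas per table it
  -- touched.  COUNT(*) for table 12345 (committed rows only) is then:
  select sum(insert_count) - sum(delete_count)
    from pg_tx_counts
   where reltype = 12345;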

JLL


Christopher Browne wrote:
>
> In the last exciting episode, jllachan@nsd.ca (Jean-Luc Lachance) wrote:
> > How about keeping counts of inserts, deletes and updates per table per
> > transaction as part of the live statistics?
>
> Aye, there's the rub.
>
> That's exactly what _won't_ work, and that's exactly the case that is
> somewhat pathological under MVCC.
>
> With MVCC, data "appears as if by magic" when its transaction COMMITs,
> thereby revealing the rows to the world.
>
> Unfortunately, there's no simple way of making updates to counts
> "simply appear" when they take effect, not without turning the updates
> into a concurrency bottleneck.
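>
> (To make the bottleneck concrete, imagine a made-up single-counter scheme:)
>
>   create table pg_rowcount (reltype oid primary key, count integer);
>
>   -- every transaction that touches table 12345 would have to do something like:
>   update pg_rowcount set count = count + 5 where reltype = 12345;
>
> Every writer updates the same row, so each one has to wait for the
> previous writer's transaction to COMMIT or ROLLBACK before its own
> UPDATE can proceed; the row lock serializes them.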
>
> Here's a bit of a wild thought...
>
> Assume a table with schema as follows:
> create table pg_count (
>   xid integer,   ---  Actually, it's not an integer, right, Andrew? :-(
>   reltype oid,
>   count integer
> );
>
> Every transaction, "xid," affects some number of tuples.  So, for a
> transaction #2134151 that adds 5 rows to the table with oid 12345 and
> deletes 4 rows from the table with oid 45678, part of the transaction
> would include inserting these rows:
>
>   insert into pg_count (xid, reltype, count) values (2134151, 12345, 5);
>   insert into pg_count (xid, reltype, count) values (2134151, 45678, -4);
>
> In order to get the count for table 12345, you could then go to
> pg_count and request:
>   select sum(count) from pg_count where reltype = 12345;
>
> The size of this table would increase every time a transaction gets
> committed, so presumably part of VACUUM TABLE would be a
> collection/summarization, thus...
>
>   -- Collect existing stats into 1 row
>   insert into pg_count(xid, reltype, count) values (currxid,
>     currtable, (select sum(count) from pg_count where reltype =
>     currtable));
>
>   -- Delete the old stats
>   delete from pg_count where reltype = currtable and xid <> currxid;
>
> This will cope with concurrency reasonably well (modulo having to make
> sure that the "collect/delete" transaction is a serialized one).
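>
> (Sketch of that step, with currxid/currtable standing in for real values:)
>
>   begin;
>   set transaction isolation level serializable;
>   insert into pg_count (xid, reltype, count)
>     values (currxid, currtable,
>             (select sum(count) from pg_count where reltype = currtable));
>   delete from pg_count where reltype = currtable and xid <> currxid;
>   commit;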
>
> Unfortunately, if a table is updated frequently, the summary
>   select sum(count) from pg_count where reltype = 12345;
> will have to collect together quite a large number of entries, which
> makes this "less cheap."
>
> That suggests an optimization; any time the COUNT is selected, the old
> stats can and should be collected into 1 row and the old data deleted.
> --
> wm(X,Y):-write(X),write('@'),write(Y). wm('aa454','freenet.carleton.ca').
> http://www3.sympatico.ca/cbbrowne/nonrdbms.html
> Sturgeon's Law: 90% of *EVERYTHING* is crud.
