Re: why does count take so long?

Поиск
Список
Период
Сортировка
От Christopher Browne
Тема Re: why does count take so long?
Дата
Msg-id m3ekypvn5i.fsf@chvatal.cbbrowne.com
обсуждение исходный текст
Ответ на why does count take so long?  (Joseph Shraibman <jks@selectacast.net>)
Список pgsql-general
In the last exciting episode, jllachan@nsd.ca (Jean-Luc Lachance) wrote:
> How about keeping counts of inserts, deletes and updates per table per
> transaction as part of the live statistics?

Aye, there's the rub.

That's exactly what _won't_ work, and that's exactly the case that is
somewhat pathological under MVCC.

With MVCC, data "appears as if by magic" when its transaction COMMITs,
thereby revealing the rows to the world.

Unfortunately, there's no simple way of making updates to counts
"simply appear" when they take effect, not without turning the updates
into a concurrency bottleneck.

Here's a bit of a wild thought...

Assume a table with schema as follows:
create table pg_count (
  xid integer,   ---  Actually, it's not an integer, right, Andrew? :-(
  reltype oid,
  count integer
);

Every transaction, "xid," affects some number of tuples.  So that for
a transaction, #2134151 that adds 5 rows to table with oid 12345 and deletes 4
rows from table with 45678, part of the transaction would include
inserting these rows:

  insert into pg_count (xid, reltype, count) values (2134151, 12345, 5);
  insert into pg_count (xid, reltype, count) values (2134151, 45678, -4);

In order to get the count for table 12345, you could then go to
pg_count and request:
  select sum(count) from pg_count where reltype = 12345;

The size of this table would increase every time a transaction gets
committed, so presumably part of VACUUM TABLE would be a
collection/summarization, thus...

  -- Collect existing stats into 1 row
  insert into pg_count(xid, reltype, count) values (currxid,
    currtable, select sum(count) from pg_count where reltype =
    currtable);

  -- Delete the old stats
  delete from pg_count where reltype = currtable and xid <> currxid;

This will cope with concurrency reasonably well (modulo having to make
sure that the "collect/delete" transaction is a serialized one).

Unfortunately, if a table is updated frequently, the summary
  select sum(count) from pg_count where reltype = 12345;
will have to collect together quite a large number of entries, which
makes this "less cheap."

That suggests an optimization; any time the COUNT is selected, the old
stats can and should be collected into 1 row and the old data deleted.
--
wm(X,Y):-write(X),write('@'),write(Y). wm('aa454','freenet.carleton.ca').
http://www3.sympatico.ca/cbbrowne/nonrdbms.html
Sturgeon's Law: 90% of *EVERYTHING* is crud.

В списке pgsql-general по дате отправления:

Предыдущее
От: Daniel Schuchardt
Дата:
Сообщение: Kill -9 Postmaster and Temporary Tables
Следующее
От: "Marc G. Fournier"
Дата:
Сообщение: Re: Help needed in Replicating pgsql