Re: Seqscan in MAX(index_column)

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Seqscan in MAX(index_column)
Дата
Msg-id 11010.1062771161@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Seqscan in MAX(index_column)  (Christopher Browne <cbbrowne@acm.org>)
Список pgsql-hackers
Christopher Browne <cbbrowne@acm.org> writes:
> Wouldn't this more or less be the same thing as having a trigger that
> does, upon each insert/delete "update pg_counts set count = count + 1
> where reltable = 45232;"?  (... where 1 would be -1 for deletes, and where
> 45232 is the OID of the table...)

I think that's exactly what Bruce was suggesting.  A slightly more
efficient variant is for each transaction to save up its net deltas,
and issue a single UPDATE for each table it's touched just before it
commits.  But that just reduces the volume of update traffic, it doesn't
fundamentally alter the concept.

> Technically, it seems _feasible_, albeit with the problem that it
> turns pg_counts into a pretty horrid bottleneck.

Not to mention a likely source of deadlocks.  And it still doesn't solve
the fundamental objection that you can't get an MVCC-correct answer by
examining the table.

An idea I was toying with is to do something similar to what was just
suggested to David Skoll for his stats problem: instead of using
UPDATEs, use INSERTs of delta records.  That is, every time a
transaction is about to commit, it INSERTs into the counts table a row
like "45232 +1" ("I inserted one row") or "45232 -10" ("I deleted ten
rows").  Assume that we somehow initialized the counts table with an
entry "45232 total-rows" for each table.  Then, a COUNT(*) on table
45232 is equivalent to "SELECT SUM(deltas) FROM counts WHERE reltable =
45232".  As long as the number of rows you have to look at to compute
this sum is smaller than the number of rows in the original table,
it's a win.

The cool thing about this approach is that it is actually MVCC-correct.
If some transaction has committed, but is uncommitted according to your
worldview, your SUM will automatically ignore its delta row.  Another
cool thing is that the INSERTs don't conflict with each other, so
there's no contention or deadlock risk.

You would periodically (perhaps during VACUUM) update the counts table
with operations that are conceptually like
    BEGIN;    INSERT INTO counts        SELECT reltable, SUM(deltas) FROM counts        WHERE xid < GLOBALXMIN
GROUPBY reltable;    DELETE FROM counts WHERE xid < GLOBALXMIN;    COMMIT;
 

to sweep together the past deltas from transactions that are so old no
one cares about their individual effects anymore (GLOBALXMIN is the same
cutoff used by VACUUM to decide it can remove a committed-dead tuple).
This prevents the number of delta rows from growing indefinitely large
over time.

> And if I have tables where I insert lots of data, but couldn't care
> less how many rows they have, this effort is wasted.

Yes, this mechanism would be hugely expensive in any case.  I can't see
enabling it overall, it would have to be turned on only for specific
tables by user command.  It'd be interesting to try to code it as a
contrib module that's fired by triggers on the tables you want to track.
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: Stats Collector Error 7.4beta1 and 7.4beta2
Следующее
От: Rod Taylor
Дата:
Сообщение: Re: [PATCHES] Warning for missing createlang