Re: MVCC catalog access

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: MVCC catalog access
Дата
Msg-id 20130617121255.GF5875@alap2.anarazel.de
обсуждение исходный текст
Ответ на Re: MVCC catalog access  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: MVCC catalog access  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On 2013-06-03 14:57:12 -0400, Robert Haas wrote:
> On Thu, May 30, 2013 at 1:39 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
> > +1.
> 
> Here's a more serious patch for MVCC catalog access.  This one
> involves more data copying than the last one, I think, because the
> previous version did not register the snapshots it took, which I think
> is not safe.  So this needs to be re-tested for performance, which I
> have so far made no attempt to do.

Ok, I am starting to take a bit more serious look.

Minor issues I noticed:
* index.c:index_constraint_create()s - comments need to get updated
* index.c:IndexCheckExclusion() - why do we still use a SnapshotNow? I'd rather not use *Now if it isn't necessary.
* the * CONCURRENTLY infrastructure should be simplified once this has been applied, but I think it makes sense to keep
thatseparate.
 
* index.c:reindex_index() - SnapshotNow comment should be updated

I still think that renaming SnapshotNow to something like
SnapshotPerTuple to force everyone to reavaluate their usage would be
good.

So, the biggest issue with the patch seems to be performance worries. I
tried to create a worst case scenario:
postgres (patched and HEAD) running with:
-c shared_buffers=4GB \
-c max_connections=2000 \
-c maintenance_work_mem=2GB \
-c checkpoint_segments=300 \
-c wal_buffers=64MB \
-c synchronous_commit=off \
-c autovacuum=off \
-p 5440

With one background pgbench running:
pgbench -p 5440 -h /tmp -f /tmp/readonly-busy.sql -c 1000 -j 10 -T 100 postgres
readonly-busy.sql:
BEGIN;
SELECT txid_current();
SELECT pg_sleep(0.0001);
COMMIT;

I measured the performance of one other pgbench:
pgbench -h /tmp -p 5440 postgres -T 10 -c 100 -j 100 -n -f /tmp/simplequery.sql -C
simplequery.sql:
SELECT * FROM af1, af2 WHERE af1.x = af2.x;
tables:
create table af1 (x) as select g from generate_series(1,4) g;
create table af2 (x) as select g from generate_series(4,7) g;

With that setup one can create quite a noticeable overhead for the mvcc
patch (best of 5):

master-optimize:
tps = 1261.629474 (including connections establishing)
tps = 15121.648834 (excluding connections establishing)

dev-optimize:
tps = 773.719637 (including connections establishing)
tps = 2804.239979 (excluding connections establishing)

Most of the time in both, patched and unpatched is by far spent in
GetSnapshotData. I think the reason this shows a far higher overhead
than what you previously measured is that a) in your test the other
backends were idle, in mine they actually modify PGXACT which causes
noticeable cacheline bouncing b) I have higher numer of connections &
#max_connections

A quick test shows that even with max_connection=600, 400 background,
and 100 foreground pgbenches there's noticeable overhead:
master-optimize:
tps = 2221.226711 (including connections establishing)
tps = 31203.259472 (excluding connections establishing)
dev-optimize:
tps = 1629.734352 (including connections establishing)
tps = 4754.449726 (excluding connections establishing)

Now I grant that's a somewhat harsh test for postgres, but I don't
think it's entirely unreasonable and the performance impact is quite
stark.

> It strikes me as rather unfortunate that the snapshot interface is
> designed in such a way as to require so much data copying.  It seems
> we always take a snapshot by copying from PGXACT/PGPROC into
> CurrentSnapshotData or SecondarySnapshotData, and then copying data a
> second time from there to someplace more permanent.  It would be nice
> to avoid that, at least in common cases.

Sounds doable. But let's do one thing at a atime ;). That copy wasn't
visible in the rather extreme workload from above btw...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Szymon Guz
Дата:
Сообщение: Re: Add regression tests for SET xxx
Следующее
От: Kevin Grittner
Дата:
Сообщение: Re: refresh materialized view concurrently