Re: Reducing the size of BufferTag & remodeling forks

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Reducing the size of BufferTag & remodeling forks
Дата
Msg-id 20150703171603.GK3291@awork2.anarazel.de
обсуждение исходный текст
Ответ на Re: Reducing the size of BufferTag & remodeling forks  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Список pgsql-hackers
On 2015-07-03 13:59:07 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> 
> > 2) Replace relation forks, with the exception of the init fork which is
> >    special anyway, with separate relfilenodes. Stored in seperate
> >    columns in pg_class.
> 
> Different AMs have different fork needs; for heaps you want one main
> fork, one VM, one fsm.  But for indexes, the VM fork is not necessary,
> and some AMs might want different ones.  For instance, GIN would benefit
> from having separate forks to store the internal indexes, and BRIN would
> benefit from a separate fork for the revmap.
> 
> What I'm saying is that I'm not sure it's okay to store the forks in
> pg_class columns

Right. Part of the point of this design is that you could easily add
further forks without system wide knowledge and that it is *not*
required anymore that all relfilenodes are in one column. I think it'd
probably make sense to have at least _vm and _fsm in pg_class, but we
could easily add further ones elsewhere.

> instead perhaps we should have a separate catalog on
> which each relation can have many forks, or perhaps have the pg_class
> entry store an array (ick).  Or perhaps rather than "relvmfork" (the
> pg_class column for the relfilenode for the VM fork) we could store
> relfilenode1, relfilenode2 where the value for each N fork is
> AM-specific. (so for heaps 1 is main, 2 is FSM, 3 is VM; for BRIN 1 is
> main, 2 is revmap; and so forth).

None of these sound particularly pretty to me. An array of relfilenodes
would probably be the least ugly one.

> FWIW the whole idea seems reasonable to me.  I worry about concurrent
> traffic into the pg_relfilenode shared table -- if temp table creation
> is common across many databases, is it going to become a contention
> point?

I don't think it'll be. It's essentially just inserts into a tiny table
with a single index, right? We can do a bootload of inserts into one of
these.

In an *assert enabled* build:
CREATE TABLE pg_relfilenode() WITH OIDS;
ALTER TABLE pg_relfilenode ADD CONSTRAINT pg_relfilenode_pkey PRIMARY KEY(oid);

which is pretty much how pg_relfilenode would look like. Although we'd
not go through the whole executor for the inserts.

pgbench -h localhost -p 5440 postgres -n -f /tmp/pg_relfilenode.sql -j 16 -c 16 -T 20 -P 1
cat /tmp/pg_relfilenode.sql:
INSERT INTO pg_relfilenode DEFAULT VALUES;
progress: 5.0 s, 32168.4 tps, lat 0.495 ms stddev 1.728
progress: 6.0 s, 33719.6 tps, lat 0.473 ms stddev 0.773

andres@awork2:~/src/postgresql$ pgbench -h localhost -p 5440 postgres -n -f /tmp/temptable.sql -j 16 -c 16 -T 20 -P 1
CREATE TEMPORARY TABLE blarg() ON COMMIT DROP;
progress: 6.0 s, 5018.2 tps, lat 3.185 ms stddev 3.671
progress: 7.0 s, 4890.9 tps, lat 3.272 ms stddev 4.346

and that's with zero actual columns. If you instead add some:
CREATE TEMPORARY TABLE blarg(id serial primary key, data text, blarg int) ON COMMIT DROP;
progress: 7.0 s, 974.1 tps, lat 16.462 ms stddev 9.058
progress: 8.0 s, 999.9 tps, lat 16.045 ms stddev 7.011

So no, I don't think this'll be a relevant problem. We do so many
inserts for a single temp table that a single insert into another one
completely vanishes in comparison.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Martijn van Oosterhout
Дата:
Сообщение: Re: WAL logging problem in 9.4.3?
Следующее
От: Andres Freund
Дата:
Сообщение: Re: WAL logging problem in 9.4.3?