Re: proposal: multiple read-write masters in a cluster with wal-streaming synchronization

From: Andres Freund
Subject: Re: proposal: multiple read-write masters in a cluster with wal-streaming synchronization
Msg-id: 20140102191643.GA2542@awork2.anarazel.de
In reply to: Re: proposal: multiple read-write masters in a cluster with wal-streaming synchronization  (Mark Dilger <markdilger@yahoo.com>)
Responses: Re: proposal: multiple read-write masters in a cluster with wal-streaming synchronization  (Mark Dilger <markdilger@yahoo.com>)
List: pgsql-hackers

On 2014-01-02 10:18:52 -0800, Mark Dilger wrote:
> I anticipated that my proposal would require partitioning the catalogs.
> For instance, autovacuum could only run on locally owned tables, and
> would need to store the analyze stats data in a catalog partition belonging
> to the local server, but that doesn't seem like a fundamental barrier to
> it working.

It would make every catalog lookup noticeably more expensive.

>  The partitioned catalog tables would get replicated like
> everything else.  The code that needs to open catalogs and look things
> up could open the specific catalog partition needed if it already knew the
> Oid of the table/index/whatever that it was interested in, as the catalog
> partition desired would have the same modulus as the Oid of the object
> being researched. 

Far, far, far from every lookup is by oid. The most prominent example is
lookups by the names of database objects. Those will have to scan every
catalog partition. Not fun.
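
To illustrate the asymmetry, here is a minimal sketch in Python. It is a toy
model, not PostgreSQL code: the class PartitionedCatalog and its methods are
invented for this example, and it assumes a partition is chosen as
oid % n_partitions, as in the quoted proposal.

```python
from typing import Dict, List, Optional, Tuple


class PartitionedCatalog:
    """Toy model of a catalog split into partitions by oid modulus."""

    def __init__(self, n_partitions: int) -> None:
        self.n_partitions = n_partitions
        # One dict per partition, mapping oid -> object name.
        self.partitions: List[Dict[int, str]] = [{} for _ in range(n_partitions)]

    def insert(self, oid: int, name: str) -> None:
        self.partitions[oid % self.n_partitions][oid] = name

    def lookup_by_oid(self, oid: int) -> Tuple[Optional[str], int]:
        """Return (name, partitions_visited). The oid picks the partition."""
        return self.partitions[oid % self.n_partitions].get(oid), 1

    def lookup_by_name(self, name: str) -> Tuple[Optional[int], int]:
        """Return (oid, partitions_visited). A name gives no partition hint."""
        visited = 0
        for part in self.partitions:
            visited += 1
            for oid, candidate in part.items():
                if candidate == name:
                    return oid, visited
        return None, visited


catalog = PartitionedCatalog(n_partitions=4)
catalog.insert(16384, "accounts")  # 16384 % 4 == 0
catalog.insert(16387, "orders")    # 16387 % 4 == 3

print(catalog.lookup_by_oid(16387))       # ('orders', 1)
print(catalog.lookup_by_name("orders"))   # (16387, 4)
print(catalog.lookup_by_name("missing"))  # (None, 4)
```

A real catalog would probe an index in each partition rather than scan a
dict, but the count is the point: a by-name lookup, and especially a miss,
touches every partition, while a by-oid lookup touches one.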

> Your point about increasing the runtime of pg_upgrade is taken.  I will
> need to think about that some more.

It's not about increasing the runtime, it's about simply breaking
it. pg_upgrade relies on binary compatibility of user relations' files
and you're breaking that if you change the width of datatypes.
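
A small Python sketch of why a width change breaks binary compatibility.
The row layout here is hypothetical (an oid-typed column followed by a
4-byte integer, little-endian, no padding) and is not PostgreSQL's actual
heap tuple format; it only shows that bytes written with a 4-byte oid
cannot be reinterpreted with an 8-byte one.

```python
import struct

# Hypothetical row layouts: "I" is a 4-byte unsigned int, "Q" an 8-byte one,
# "i" a 4-byte signed int; "<" means little-endian without padding.
OLD_LAYOUT = struct.Struct("<Ii")  # 4-byte oid + int4
NEW_LAYOUT = struct.Struct("<Qi")  # 8-byte oid + int4

row_on_disk = OLD_LAYOUT.pack(16384, 42)

print(OLD_LAYOUT.size, NEW_LAYOUT.size)  # 8 12
print(OLD_LAYOUT.unpack(row_on_disk))    # (16384, 42)

# A single old row is too short for the new layout.
try:
    NEW_LAYOUT.unpack(row_on_disk)
except struct.error as exc:
    print("cannot read old row with new layout:", exc)

# With more bytes available the read "succeeds" but mixes up neighbouring
# fields, so the values no longer match what was written.
two_rows = OLD_LAYOUT.pack(16384, 42) + OLD_LAYOUT.pack(16385, 43)
misread = NEW_LAYOUT.unpack_from(two_rows, 0)
print(misread != (16384, 42))  # True
```

Files written under the old width would therefore have to be rewritten
rather than reused in place, which is exactly what pg_upgrade avoids doing.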

> Your claim that what I describe is not multi-master is at least partially
> correct, depending on how you think about the word "master".  Certainly
> every server is the master of its own chunk.

Well, you're essentially just describing a sharded system - that's not
usually called multimaster.

> Your claim that BDR doesn't have to be much slower than what I am
> proposing is quite interesting, as if that is true I can ditch this idea and
> use BDR instead.  It is hard to empirically test, though, as I don't have
> the alternate implementation on hand.

Well, I can tell you that for the changeset extraction stuff (which is the
basis for BDR) the biggest bottleneck so far seems to be the CRC
computation when reading the WAL - and that's something plain WAL apply
has to do as well. And it is optimizable.
When actually testing decoding & apply, for workloads fitting into
memory I had to try very hard to construct situations where apply was a
big bottleneck. It is easier for seek-bound workloads where the standby
is less powerful than the primary, since logical apply needs more random
reads there, while for plain WAL apply full page writes remove the need
for reads in many cases.
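
As a rough illustration of the CRC point above, here is a Python sketch of
a log reader that verifies a checksum on every record before handing it on.
The record framing (length, CRC, payload) is made up for this example, and
zlib.crc32 is only a stand-in; it is not the CRC routine PostgreSQL uses
for WAL.

```python
import struct
import zlib

# Made-up framing: 4-byte payload length, then 4-byte CRC of the payload.
HEADER = struct.Struct("<II")


def write_record(payload: bytes) -> bytes:
    """Frame a payload with its length and checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def read_records(stream: bytes):
    """Yield payloads, verifying the checksum of each one before use."""
    offset = 0
    while offset < len(stream):
        length, expected_crc = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        payload = stream[offset:offset + length]
        offset += length
        # Every consumer of the log pays for this check, whether it goes on
        # to replay the record physically or decode it logically.
        if zlib.crc32(payload) != expected_crc:
            raise ValueError("checksum mismatch, stop reading here")
        yield payload


log = write_record(b"insert row 1") + write_record(b"update row 1")
print(list(read_records(log)))  # [b'insert row 1', b'update row 1']
```

Since the checksum pass runs over every byte read, its cost is shared by
both approaches, and speeding it up helps both.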

> I think the expectation that performance will be harmed if postgres
> uses 8 byte Oids is not quite correct.
>
> Several years ago I ported postgresql sources to use 64bit everything.
> Oids, varlena headers, variables tracking offsets, etc.  It was a fair
> amount of work, but all the doom and gloom predictions that I have
> heard over the years about how 8-byte varlena headers would kill
> performance, 8-byte Oids would kill performance, etc, turned out to
> be quite inaccurate.

Well, it can increase the size of the database, turning a system where
the hot set fits into memory into one where it doesn't anymore. But
really, the performance concerns were more about the catalog lookups.

Fundamentally, I see nothing preventing such a scheme from being
implemented - but I think there's about zero chance of it ever getting
integrated; it's just far too invasive, with very high costs in
scenarios where it's not used, for not all that much gain. Not to
mention the amount of engineering it would require to implement.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


