Re: Proposal for CSN based snapshots

From Heikki Linnakangas
Subject Re: Proposal for CSN based snapshots
Date
Msg-id 5388CFAC.8020700@vmware.com
In reply to Re: Proposal for CSN based snapshots (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On 05/30/2014 06:27 PM, Andres Freund wrote:
> On 2014-05-30 17:59:23 +0300, Heikki Linnakangas wrote:
>> One thorny issue came up in discussions with other hackers on this at PGCon:
>>
>> When a transaction is committed asynchronously, it becomes visible to other
>> backends before the commit WAL record is flushed. With CSN-based snapshots,
>> the order that transactions become visible is always based on the LSNs of
>> the WAL records. This is a problem when there is a mix of synchronous and
>> asynchronous commits:
>>
>> If transaction A commits synchronously with commit LSN 1, and transaction B
>> commits asynchronously with commit LSN 2, B cannot become visible before A.
>> And we cannot acknowledge B as committed to the client until it's visible to
>> other transactions. That means that B will have to wait for A's commit
>> record to be flushed to disk, before it can return, even though it was an
>> asynchronous commit.
>
>> I personally think that's annoying, but we can live with it. The most common
>> usage of synchronous_commit=off is to run a lot of transactions in that
>> mode, setting it in postgresql.conf. And it wouldn't completely defeat the
>> purpose of mixing synchronous and asynchronous commits either: an
>> asynchronous commit still only needs to wait for any already-logged
>> synchronous commits to be flushed to disk, not the commit record of the
>> asynchronous transaction itself.
>
> I have a hard time believing that users won't hate us for such a
> regression. It's pretty common to mix both sorts of transactions and
> this will - by my guesstimate - dramatically reduce throughput for the
> async backends.

Yeah, it probably would. Not sure how many people would care.

For an asynchronous commit, we could store the current WAL flush 
location as the commit LSN, instead of the location of the commit 
record. That would break the property that LSN == commit order, but that 
property is fundamentally incompatible with having async commits become 
visible without flushing previous transactions. Or we could even make it
configurable; it would be fairly easy to support both behaviors.
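
A minimal sketch of the alternative described above. The names WalPtr, CommitMode and choose_commit_lsn are hypothetical and are not PostgreSQL APIs. A synchronous commit is stamped with the location of its own commit record, while an asynchronous commit is stamped with the current WAL flush location:

```c
#include <stdint.h>

/* Hypothetical stand-in for a WAL position; not PostgreSQL's XLogRecPtr. */
typedef uint64_t WalPtr;

typedef enum { COMMIT_SYNC, COMMIT_ASYNC } CommitMode;

/*
 * Pick the LSN to stamp on a committing transaction (illustrative only).
 *
 * commit_record_lsn: where this transaction's commit record sits in the WAL.
 * flushed_upto:      how far the WAL is known to be flushed to disk.
 */
WalPtr
choose_commit_lsn(CommitMode mode, WalPtr commit_record_lsn, WalPtr flushed_upto)
{
    if (mode == COMMIT_ASYNC)
        return flushed_upto;    /* no need to wait for earlier records */

    return commit_record_lsn;   /* sync commit: visible once this is flushed */
}
```

With the example from the quoted text (A synchronous at LSN 1, B asynchronous at LSN 2, nothing flushed yet), B would be stamped with 0 and would not have to wait for A's flush. The price is the one named above: commit LSNs no longer reflect commit order.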

>> * Logical decoding is broken. I hacked on it enough that it looks roughly
>> sane and it compiles, but didn't spend more time to debug.
>
> I think we can live with it not working for the first few
> iterations. I'll look into it once the patch has stabilized a bit.

Thanks!

>> * I expanded pg_clog to 64-bits per XID, but people suggested keeping
>> pg_clog as is, with two bits per commit, and adding a new SLRU for the
>> commit LSNs beside it. Probably will need to do something like that to avoid
>> bloating the clog.
>
> It also influences how on-disk compatibility is dealt with. So: How are
> you planning to deal with on-disk compatibility?
>
>> * Add some kind of backend-private caching of clog, to make it faster to
>> access. The visibility checks are now hitting the clog a lot more heavily
>> than before, as you need to check the clog even if the hint bits are set, if
>> the XID falls between xmin and xmax of the snapshot.
>
> That'll hurt a lot in concurrent scenarios :/. Have you measured how
> 'wide' xmax-xmin usually is?

That depends entirely on the workload. The worst case is a mix of a
long-running transaction and a lot of short transactions. It could grow
to millions of transactions or more in that case.
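
To illustrate why the width of that window matters, here is a rough sketch of a CSN-style visibility check. All names (CsnSnapshot, xid_visible, demo_lookup) are hypothetical, XID wraparound is ignored, and the "commit CSN <= snapshot CSN" rule is an assumption made for the example, not the patch's actual code. Outside the xmin..xmax window a set hint bit is enough; inside it, every check costs a clog lookup:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;           /* hypothetical; wraparound is ignored */
typedef uint64_t Csn;

#define CSN_NOT_COMMITTED 0     /* assumed marker: in progress or aborted */

typedef struct {
    Xid xmin;           /* XIDs below this finished before the snapshot */
    Xid xmax;           /* XIDs at or above this started after it */
    Csn snapshot_csn;   /* assumed rule: commit CSN <= this is visible */
} CsnSnapshot;

/* Hypothetical clog lookup: commit CSN of xid, or CSN_NOT_COMMITTED. */
typedef Csn (*ClogLookup)(Xid xid);

/* Tiny in-memory stand-in for the clog, counting how often it is hit. */
Csn demo_clog[8] = {0, 10, 20, 0, 40, 0, 0, 0};
unsigned clog_lookups = 0;

Csn
demo_lookup(Xid xid)
{
    clog_lookups++;
    return demo_clog[xid % 8];
}

bool
xid_visible(const CsnSnapshot *snap, Xid xid, bool hint_committed,
            ClogLookup lookup)
{
    if (xid >= snap->xmax)
        return false;           /* started after the snapshot was taken */

    if (xid < snap->xmin && hint_committed)
        return true;            /* hint bit alone decides, no clog access */

    /* Inside the window the hint bit is not enough: compare CSNs. */
    Csn csn = lookup(xid);
    return csn != CSN_NOT_COMMITTED && csn <= snap->snapshot_csn;
}
```

In this sketch, a snapshot with xmin = 2, xmax = 6 and snapshot_csn = 25 answers for XID 1 from the hint bit alone, but has to consult the clog for XIDs 2 through 5 even when their hint bits are set.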

> I wonder if we could just copy a range of
> values from the clog when we start scanning....

I don't think that's practical, if the xmin-xmax gap is wide.

Perhaps we should take the bull by the horns and make clog faster to
look up. If we e.g. mmapped the clog file into backend-private address
space, we could avoid all the locking overhead of an SLRU. On platforms with
atomic 64-bit instructions, you could read the clog with just a memory
barrier. Even on other architectures, you'd only need a spinlock.
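
A rough sketch of the lock-free read path, assuming C11 atomics and one 64-bit slot per XID. ClogMap, clog_set and clog_get are hypothetical names, and an ordinary array stands in for the mmapped region; this is not how PostgreSQL's SLRU code works today:

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical view of a clog region with one 64-bit commit LSN per XID.
 * In the idea above, 'slots' would point into an mmapped clog file.
 */
typedef struct {
    _Atomic uint64_t *slots;
    uint32_t          nslots;   /* caller must keep xid < nslots */
} ClogMap;

/* Writer: publish the commit LSN for xid with release ordering. */
void
clog_set(ClogMap *m, uint32_t xid, uint64_t commit_lsn)
{
    atomic_store_explicit(&m->slots[xid], commit_lsn, memory_order_release);
}

/*
 * Reader: a single atomic 64-bit load with acquire ordering, so no lock
 * is taken. A value of 0 is assumed here to mean "not committed".
 */
uint64_t
clog_get(const ClogMap *m, uint32_t xid)
{
    return atomic_load_explicit(&m->slots[xid], memory_order_acquire);
}
```

On a platform without native 64-bit atomics, the same two functions would have to wrap the access in a spinlock instead, as noted above.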

- Heikki


