Re: sequences vs. synchronous replication

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: sequences vs. synchronous replication
Дата
Msg-id 59b88954-cc72-7bc4-9e83-63096597fa55@enterprisedb.com
обсуждение исходный текст
Ответ на Re: sequences vs. synchronous replication  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Ответы Re: sequences vs. synchronous replication  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
Список pgsql-hackers
On 1/15/22 06:12, Fujii Masao wrote:
> 
> 
> On 2022/01/12 1:07, Tomas Vondra wrote:
>> I explored the idea of using page LSN a bit
> 
> Many thanks!
> 
> 
>> The patch from 22/12 simply checks if the change should/would wait for
>> sync replica, and if yes it WAL-logs the sequence increment. There's a
>> couple problems with this, unfortunately:
> 
> Yes, you're right.
> 
> 
>> So I think this approach is not really an improvement over WAL-logging
>> every increment. But there's a better way, I think - we don't need to
>> generate WAL, we just need to ensure we wait for it to be flushed at
>> transaction end in RecordTransactionCommit().
>>
>> That is, instead of generating more WAL, simply update XactLastRecEnd
>> and then ensure RecordTransactionCommit flushes/waits etc. Attached is a
>> patch doing that - the changes in sequence.c are trivial, changes in
>> RecordTransactionCommit simply ensure we flush/wait even without XID
>> (this actually raises some additional questions that I'll discuss in a
>> separate message in this thread).
> 
> This approach (and also my previous proposal) seems to assume that the 
> value returned from nextval() should not be used until the transaction 
> executing that nextval() has been committed successfully. But I'm not 
> sure how many applications follow this assumption. Some application 
> might use the return value of nextval() instantly before issuing commit 
> command. Some might use the return value of nextval() executed in 
> rollbacked transaction.
> 

IMO any application that assumes data from uncommitted transactions is 
outright broken and we should not try to fix that because it's quite 
futile (and likely will affect well-behaving applications).

The issue I'm trying to fix in this thread is much narrower - we don't 
actually meet the guarantees for committed transactions (that only did 
nextval without generating any WAL).

> If we want to avoid duplicate sequence value even in those cases, ISTM 
> that the transaction needs to wait for WAL flush and sync rep before 
> nextval() returns the value. Of course, this might cause other issues 
> like performance decrease, though.
> 

Right, something like that. But that'd hurt well-behaving applications, 
because by doing the wait earlier (in nextval, not at commit) it 
increases the probability of waiting.

FWIW I'm not against improvements in this direction, but it goes way 
beyong fixing the original issue.

> 
>> On btrfs, it looks like this (the numbers next to nextval are the cache
>> size, with 1 being the default):
>>
>>    client  test         master   log-all  page-lsn   log-all  page-lsn
>>    -------------------------------------------------------------------
>>         1  insert          829       807       802       97%       97%
>>            nextval/1     16491       814     16465        5%      100%
>>            nextval/32    24487     16462     24632       67%      101%
>>            nextval/64    24516     24918     24671      102%      101%
>>            nextval/128   32337     33178     32863      103%      102%
>>
>>    client  test         master   log-all  page-lsn   log-all  page-lsn
>>    -------------------------------------------------------------------
>>         4  insert         1577      1590      1546      101%       98%
>>            nextval/1     45607      1579     21220        3%       47%
>>            nextval/32    68453     49141     51170       72%       75%
>>            nextval/64    66928     65534     66408       98%       99%
>>            nextval/128   83502     81835     82576       98%       99%
>>
>> The results seem clearly better, I think.
> 
> Thanks for benchmarking this! I agree that page-lsn is obviously better 
> than log-all.
> 
> 
>> For "insert" there's no drop at all (same as before), because as soon as
>> a transaction generates any WAL, it has to flush/wait anyway.
>>
>> And for "nextval" there's a drop, but only with 4 clients, and it's much
>> smaller (53% instead of 97%). And increasing the cache size eliminates
>> even that.
> 
> Yes, but 53% drop would be critial for some applications that don't want 
> to increase the cache size for some reasons. So IMO it's better to 
> provide the option to enable/disable that page-lsn approach.
> 

I disagree. This drop applies only to extremely simple transactions - 
once the transaction does any WAL write, it disappears. Even if the 
transaction does only a couple reads, it'll go away. I find it hard to 
believe there's any serious application doing this.

So I think we should get it reliable (to not lose data after commit) 
first and then maybe see if we can improve this.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: extended stats on partitioned tables
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Isn't wait_for_catchup slightly broken?