Re: sequences vs. synchronous replication

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: sequences vs. synchronous replication
Дата
Msg-id ad83e515-44f6-4691-8fe0-503b94499ed5@enterprisedb.com
обсуждение исходный текст
Ответ на Re: sequences vs. synchronous replication  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Ответы Re: sequences vs. synchronous replication  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Список pgsql-hackers

On 12/24/21 06:37, Kyotaro Horiguchi wrote:
> At Thu, 23 Dec 2021 19:50:22 +0100, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote in
>> On 12/23/21 15:42, Fujii Masao wrote:
>>> On 2021/12/23 3:49, Tomas Vondra wrote:
>>>> Attached is a patch tweaking WAL logging - in wal_level=minimal we do
>>>> the same thing as now, in higher levels we log every sequence fetch.
>>> Thanks for the patch!
>>> With the patch, I found that the regression test for sequences failed.
>>> +            fetch = log = fetch;
>>> This should be "log = fetch"?
>>> On second thought, originally a sequence doesn't guarantee that the
>>> value already returned by nextval() will never be returned by
>>> subsequent nextval() after the server crash recovery. That is,
>>> nextval() may return the same value across crash recovery. Is this
>>> understanding right? For example, this case can happen if the server
>>> crashes after nextval() returned the value but before WAL for the
>>> sequence was flushed to the permanent storage.
>>
>> I think the important step is commit. We don't guarantee anything for
>> changes in uncommitted transactions. If you do nextval in a
>> transaction and the server crashes before the WAL gets flushed before
>> COMMIT, then yes, nextval may generate the same nextval again. But
>> after commit that is not OK - it must not happen.
> 
> I don't mean to stand on Fujii-san's side particularly, but it seems
> to me sequences of RDBSs are not rolled back generally.  Some googling
> told me that at least Oracle (documented), MySQL, DB2 and MS-SQL
> server doesn't rewind sequences at rollback, that is, sequences are
> incremented independtly from transaction control.  It seems common to
> think that two nextval() calls for the same sequence must not return
> the same value in any context.
> 

Yes, sequences are not rolled back on abort generally. That would 
require much stricter locking, and that'd go against using sequences in 
concurrent sessions.

But we're not talking about sequence rollback - we're talking about data 
loss, caused by failure to flush WAL for a sequence. But that affects 
the *current* code too, and to much greater extent.

Consider this:

BEGIN;
SELECT nextval('s') FROM generate_series(1,1000) s(i);
ROLLBACK; -- or crash of a different backend

BEGIN;
SELECT nextval('s');
COMMIT;

With the current code, this may easily lose the WAL, and we'll generate 
duplicate values from the sequence. We pretty much ignore the COMMIT.

With the proposed change to WAL logging, that is not possible. The 
COMMIT flushes enough WAL to prevent this issue.

So this actually makes this issue less severe.

Maybe I'm missing some important detail, though. Can you show an example 
where the proposed changes make the issue worse?

>>> So it's not a bug that sync standby may return the same value as
>>> already returned in the primary because the corresponding WAL has not
>>> been replicated yet, isn't it?
>>>
>>
>> No, I don't think so. Once the COMMIT happens (and gets confirmed by
>> the sync standby), it should be possible to failover to the sync
>> replica without losing any data in committed transaction. Generating
>> duplicate values is a clear violation of that.
> 
> So, strictly speaking, that is a violation of the constraint I
> mentioned regardless whether the transaction is committed or
> not. However we have technical limitations as below.
> 

I don't follow. What violates what?

If the transaction commits (and gets a confirmation from sync replica), 
the modified WAL logging prevents duplicate values. It does nothing for 
uncommitted transactions. Seems like an improvement to me.

>> IMHO the fact that we allow a transaction to commit (even just
>> locally) without flushing all the WAL it depends on is clearly a data
>> loss bug.
>>
>>> BTW, if the returned value is stored in database, the same value is
>>> guaranteed not to be returned again after the server crash or by sync
>>> standby. Because in that case the WAL of the transaction storing that
>>> value is flushed and replicated.
>>>
>>
>> True, assuming the table is WAL-logged etc. I agree the issue may be
>> affecting a fairly small fraction of workloads, because most people
>> use sequences to generate data for inserts etc.
> 
> It seems to me, from the fact that sequences are designed explicitly
> untransactional and that behavior is widely adopted, the discussion
> might be missing some significant use-cases.  But there's a
> possibility that the spec of sequence came from some technical
> limitation in the past, but I'm not sure..
> 

No idea. IMHO from the correctness / behavior point of view, the 
modified logging is an improvement. The only issue is the additional 
overhead, and I think the cache addresses that quite well.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kyotaro Horiguchi
Дата:
Сообщение: Re: An obsolete comment of pg_stat_statements
Следующее
От: Fujii Masao
Дата:
Сообщение: Re: Allow escape in application_name