Re: MultiXact\SLRU buffers configuration

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: MultiXact\SLRU buffers configuration
Дата
Msg-id 20200514.151659.1224056315528271804.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Re: MultiXact\SLRU buffers configuration  ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
Ответы Re: MultiXact\SLRU buffers configuration  ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
Список pgsql-hackers
At Thu, 14 May 2020 10:19:42 +0500, "Andrey M. Borodin" <x4mmm@yandex-team.ru> wrote in 
> >> I'm looking more at MultiXact and it seems to me that we have a race condition there.
> >> 
> >> When we create a new MultiXact we do:
> >> 1. Generate new MultiXactId under MultiXactGenLock
> >> 2. Record new mxid with members and offset to WAL
> >> 3. Write offset to SLRU under MultiXactOffsetControlLock
> >> 4. Write members to SLRU under MultiXactMemberControlLock
> > 
> > But, don't we hold exclusive lock on the buffer through all the steps
> > above?
> Yes...Unless MultiXact is observed on StandBy. This could lead to observing inconsistent snapshot: one of lockers
committedtuple delete, but standby sees it as alive.
 

Ah, right. I looked from GetNewMultiXactId. Actually
XLOG_MULTIXACT_CREATE_ID is not protected from concurrent reference to
the creating mxact id. And GetMultiXactIdMembers is considering that
case.

> >> When we read MultiXact we do:
> >> 1. Retrieve offset by mxid from SLRU under MultiXactOffsetControlLock
> >> 2. If offset is 0 - it's not filled in at step 4 of previous algorithm, we sleep and goto 1
> >> 3. Retrieve members from SLRU under MultiXactMemberControlLock
> >> 4. ..... what we do if there are just zeroes because step 4 is not executed yet? Nothing, return empty members
list.
> > 
> > So transactions never see such incomplete mxids, I believe.
> I've observed sleep in step 2. I believe it's possible to observe special effects of step 4 too.
> Maybe we could add lock on standby to dismiss this 1000us wait? Sometimes it hits hard on Standbys: if someone is
lockingwhole table on primary - all seq scans on standbys follow him with MultiXactOffsetControlLock contention.
 

GetMultiXactIdMembers believes that 4 is successfully done if 2
returned valid offset, but actually that is not obvious.

If we add a single giant lock just to isolate ,say,
GetMultiXactIdMember and RecordNewMultiXact, it reduces concurrency
unnecessarily.  Perhaps we need finer-grained locking-key for standby
that works similary to buffer lock on primary, that doesn't cause
confilicts between irrelevant mxids.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Parallel copy