Re: [HACKERS] Issues with logical replication

Поиск
Список
Период
Сортировка
От Petr Jelinek
Тема Re: [HACKERS] Issues with logical replication
Дата
Msg-id 68879b3a-46c0-0aab-f6a2-d374010b4e71@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Issues with logical replication  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: [HACKERS] Issues with logical replication
Список pgsql-hackers
Hi,

(sorry for not being active here, I am still catching up after being
away for some family issues)

On 16/11/17 21:12, Robert Haas wrote:
> On Thu, Nov 16, 2017 at 2:41 PM, Andres Freund <andres@anarazel.de> wrote:
>>> To me, it seems like SnapBuildWaitSnapshot() is fundamentally
>>> misdesigned
>>
>> Maybe I'm confused, but why is it fundamentally misdesigned? It's not
>> such an absurd idea to wait for an xid in a WAL record.  I get that
>> there's a race condition here, which obviously bad, but I don't really
>> see as evidence of the above claim.
>>
>> I actually think this code used to be safe because ProcArrayLock used to
>> be held while generating and logging the running snapshots record.  That
>> was removed when fixing some other bug, but perhaps that shouldn't have
>> been done...
> 
> OK.  Well, I might be overstating the case.  My comment about
> fundamental misdesign was really just based on the assumption that
> XactLockTableWait() could be used to wait for an XID the instant it
> was generated.  That was never gonna work and there's no obvious clean
> workaround for the problem.  Getting snapshot building to work
> properly seems to be Hard (TM).
> 

No kidding, we've been at it since 9.4.

But anyway, while sure we have race condition because
XactLockTableWait() finishes too early, all that should mean is we call
LogStandbySnapshot() too early and as a result taking snapshot (and
hence slot creation) will take longer as we'll wait for next call of
LogStandbySnapshot() from some other caller (because the tx we care
about will still be running). The whole SnapBuildWaitSnapshot() is just
optimization to make slot creation take less time (and also to be able
to write tests).

What I don't understand is how it leads to crash (and I could not
reproduce it using the pgbench file attached in this thread either) and
moreover how it leads to 0 xid being logged. The only explanation I can
come up is that some kind of similar race has to be in
LogStandbySnapshot() but we explicitly check for 0 xid value there.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Stephen Frost
Дата:
Сообщение: Re: using index or check in ALTER TABLE SET NOT NULL
Следующее
От: Tom Lane
Дата:
Сообщение: Re: using index or check in ALTER TABLE SET NOT NULL