Re: BUG #17055: Logical replication worker crashes when applying update of row that dose not exist in target partiti

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: BUG #17055: Logical replication worker crashes when applying update of row that dose not exist in target partiti
Дата
Msg-id 2138662.1623460441@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: BUG #17055: Logical replication worker crashes when applying update of row that dose not exist in target partiti  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: BUG #17055: Logical replication worker crashes when applying update of row that dose not exist in target partiti
Список pgsql-bugs
Michael Paquier <michael@paquier.xyz> writes:
> On Fri, Jun 11, 2021 at 08:31:30PM -0400, Tom Lane wrote:
>> AFAICS, none of the other callers of sub reload are doing anything
>> special to wait for it to take effect.  Not sure why this one
>> would need more protection.  If we do need a fix for that, likely
>> sub reload itself had better do it.

> Perhaps.

I looked around a bit, and I think that we don't need to sweat
about this.  Certainly we've not identified any buildfarm failures
related to reload not happening fast enough.  I think the sequence
of events is:

* pg_ctl will deliver the SIGHUP signal to the postmaster before
returning.

* Although surely the postmaster might not process that signal
instantly, we can expect that it will do so before it accepts
another connection request.  Since the tests cannot detect
whether the reload has happened without issuing a new command
(which, in all these tests, involves launching psql or pgbench
or pg_basebackup or what-have-you, which needs a new connection),
the reload will appear to have happened so far as the postmaster
itself is concerned, and therefore also in any new backend it
spawns.

* This does seem to leave in question whether a pre-existing
background process such as a logrep worker will get the word fast
enough.  However, I think the same principle applies there.  The
postmaster will have turned around and signaled the worker process
while it processes the SIGHUP for itself.  While that again doesn't
ensure instantaneous update of the worker's state, we can expect
that it will notice the SIGHUP before noticing any newly-arrived
replication data, which is Good Enough.  This argument does assume
that we know the worker is idle when we issue the reload ... but
we already waited for it to be idle.

Perhaps the buildfarm will prove me wrong, but I think we don't
have to do anything.  And if we did want to do something, what
would it be?  There is no feedback mechanism to let us observe
when the worker processed the signal.

>> This coding is also stolen verbatim from other test scripts.
>> What is your concern about it exactly?

> My main concern are future changes in this test suite that could cause
> more queries to generate logs that overlap with this sequence of
> tests, falsifying what this test is checking.

Oh, I see, you wonder if there could already be entries like these
in the file.  But since we only turned the log level up for the
duration of the specific test, it doesn't seem like a problem.
(I did think of this to the extent of making sure the second test
group in 013_partition.pl looks for different messages than the
first group does.)

            regards, tom lane



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: BUG #17055: Logical replication worker crashes when applying update of row that dose not exist in target partiti
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: BUG #17055: Logical replication worker crashes when applying update of row that dose not exist in target partiti