Re: 001_rep_changes.pl fails due to publisher stuck on shutdown

Поиск
Список
Период
Сортировка
От Peter Smith
Тема Re: 001_rep_changes.pl fails due to publisher stuck on shutdown
Дата
Msg-id CAHut+PtZk8Q3k_gymTqkiBueB=BLAXBuhRfvvbc3wstXg7bzUA@mail.gmail.com
обсуждение исходный текст
Ответ на 001_rep_changes.pl fails due to publisher stuck on shutdown  (Alexander Lakhin <exclusion@gmail.com>)
Ответы Re: 001_rep_changes.pl fails due to publisher stuck on shutdown
Список pgsql-hackers
Hi, I have reproduced this multiple times now.

I confirmed the initial post/steps from Alexander. i.e. The test
script provided [1] gets itself into a state where function
ReadPageInternal (called by XLogDecodeNextRecord and commented "Wait
for the next page to become available") constantly returns
XLREAD_FAIL. Ultimately the test times out because WalSndLoop() loops
forever, since it never calls WalSndDone() to exit the walsender
process.

~~~

I've made a patch to inject lots of logging, and when the test script
fails a cycle of function failures can be seen. I don't know how to
fix it yet, so I'm attaching my log results, hoping the information
may be useful for anyone familiar with this area of the code.

~~~

Attachment #1 "v1-0001-DEBUG-LOGGING.patch" -- Patch to inject some
logging. Be careful if you apply this because the resulting log files
can be huge (e.g. 3G)

Attachment #2 "bad8_logs_last500lines.txt" -- This is the last 500
lines of a 3G logfile from a failing test run.

Attachment #3 "bad8_logs_last500lines-simple.txt" -- Same log file as
above, but it's a simplified extract in which I showed the CYCLES of
failure more clearly.

Attachment #4 "bad8_digram"-- Same execution patch information as from
the log files, but in diagram form (just to help me visualise the
logic more easily).

~~~

Just so you know, the test script does not always cause the problem.
Sometimes it happens after just 20 script iterations. Or, sometimes it
takes a very long time and multiple runs (e.g. 400-500 script
iterations). Either way, when the problem eventually occurs the CYCLES
of the ReadPageInternal() failures always have the the same pattern
shown in these attached logs.

======
[1] OP - https://www.postgresql.org/message-id/f15d665f-4cd1-4894-037c-afdbe369287e%40gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: [multithreading] extension compatibility
Следующее
От: "Hayato Kuroda (Fujitsu)"
Дата:
Сообщение: RE: Pgoutput not capturing the generated columns