Re: WIP: WAL prefetch (another approach)

From Tom Lane
Subject Re: WIP: WAL prefetch (another approach)
Msg-id 3967044.1620157661@sss.pgh.pa.us
In reply to Re: WIP: WAL prefetch (another approach)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: WIP: WAL prefetch (another approach)  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
I wrote:
> I suppose that if we're unable to reproduce it on at least one other box,
> we have to write it off as hardware flakiness.

BTW, that conclusion shouldn't distract us from the very real bug
that Andres identified.  I was just scraping the buildfarm logs
for recent failures, and I found several cases that match the
symptom he reported:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2021-04-23%2022%3A27%3A41
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2021-04-21%2005%3A15%3A24
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2021-04-20%2002%3A03%3A08
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2021-05-04%2004%3A07%3A41
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2021-04-20%2021%3A08%3A59

They all show the standby in recovery/019_replslot_limit.pl failing
with symptoms like

2021-05-04 07:42:00.968 UTC [24707406:1] LOG:  database system was shut down in recovery at 2021-05-04 07:41:39 UTC
2021-05-04 07:42:00.968 UTC [24707406:2] LOG:  entering standby mode
2021-05-04 07:42:01.050 UTC [24707406:3] LOG:  redo starts at 0/1C000D8
2021-05-04 07:42:01.079 UTC [24707406:4] LOG:  consistent recovery state reached at 0/1D00000
2021-05-04 07:42:01.079 UTC [24707406:5] FATAL:  invalid memory alloc request size 1476397045
2021-05-04 07:42:01.080 UTC [13238274:3] LOG:  database system is ready to accept read only connections
2021-05-04 07:42:01.082 UTC [13238274:4] LOG:  startup process (PID 24707406) exited with exit code 1

(BTW, the behavior seen here where the failure occurs *immediately*
after reporting "consistent recovery state reached" is seen in the
other reports as well, including Andres' version.  I wonder if that
means anything.)
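
For anyone who wants to check other logs for the same pattern, the following is a minimal Python sketch, assuming the log line format shown in the excerpt above. The names find_alloc_failures, SAMPLE_LOG and MAX_ALLOC_SIZE are hypothetical helpers introduced here for illustration. The constant mirrors PostgreSQL's MaxAllocSize limit (0x3fffffff) as I recall it from memutils.h, so it should be verified against the source tree. The sketch reports each "invalid memory alloc request size" failure and whether the previous message from the same process was "consistent recovery state reached".

```python
import re

# Assumption: PostgreSQL's MaxAllocSize (1 GB - 1); verify against memutils.h.
MAX_ALLOC_SIZE = 0x3FFFFFFF

# Matches lines of the form "date time TZ [pid:seq] LEVEL:  message".
LINE_RE = re.compile(
    r"^\S+ \S+ \S+ \[(?P<pid>\d+):(?P<seq>\d+)\] (?P<level>[A-Z]+):\s+(?P<msg>.*)$"
)

ALLOC_MSG = "invalid memory alloc request size"
CONSISTENT_MSG = "consistent recovery state reached"

# Lines copied from the excerpt above.
SAMPLE_LOG = """\
2021-05-04 07:42:01.050 UTC [24707406:3] LOG:  redo starts at 0/1C000D8
2021-05-04 07:42:01.079 UTC [24707406:4] LOG:  consistent recovery state reached at 0/1D00000
2021-05-04 07:42:01.079 UTC [24707406:5] FATAL:  invalid memory alloc request size 1476397045
2021-05-04 07:42:01.080 UTC [13238274:3] LOG:  database system is ready to accept read only connections
"""


def find_alloc_failures(log_text):
    """Return a list of (pid, size, right_after_consistency) tuples."""
    last_msg_by_pid = {}  # most recent message seen from each process
    hits = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that are not in the expected format
        pid, msg = m["pid"], m["msg"]
        if m["level"] == "FATAL" and msg.startswith(ALLOC_MSG):
            size = int(msg.rsplit(" ", 1)[1])
            previous = last_msg_by_pid.get(pid, "")
            hits.append((pid, size, previous.startswith(CONSISTENT_MSG)))
        last_msg_by_pid[pid] = msg
    return hits


if __name__ == "__main__":
    for pid, size, after_consistency in find_alloc_failures(SAMPLE_LOG):
        print(pid, hex(size), size > MAX_ALLOC_SIZE, after_consistency)
```

This only classifies the symptom; it says nothing about why the bogus length was read.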

            regards, tom lane


