Re: How abnormal server shutdown could be detected by tests?

Поиск
Список
Период
Сортировка
От shveta malik
Тема Re: How abnormal server shutdown could be detected by tests?
Дата
Msg-id CAJpy0uAqpyWKSvYiCRUyVMJBT+4-MKX6CtA6D=-_jjaxTXwvhA@mail.gmail.com
обсуждение исходный текст
Ответ на How abnormal server shutdown could be detected by tests?  (Alexander Lakhin <exclusion@gmail.com>)
Ответы Re: How abnormal server shutdown could be detected by tests?  (Alexander Lakhin <exclusion@gmail.com>)
Список pgsql-hackers
On Sat, Dec 9, 2023 at 9:30 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>
> Hello hackers,
>
> While studying bug #18158, I've come to the conclusion that the existing
> testing infrastructure is unable to detect abnormal situations. of some
> kind.
>
> Just a simple example:
> With Assert(0) injected in walsender (say:
> sed "s/WalSndDone(send_data)/Assert(0)/" -i src/backend/replication/walsender.c
> ), I observe the following:
> $ make -s check -C src/test/recovery PROVE_TESTS="t/012*"
>
> t/012_subtransactions.pl .. ok
> All tests successful.
>
> (The same with meson:
> 1/1 postgresql:recovery / recovery/012_subtransactions OK                3.24s   12 subtests passed)
>
> But:
> $ grep TRAP src/test/recovery/tmp_check/log/*
> src/test/recovery/tmp_check/log/012_subtransactions_primary.log:TRAP: failed Assert("0"), File: "walsender.c", Line:
> 2528, PID: 373729
> src/test/recovery/tmp_check/log/012_subtransactions_primary.log:TRAP: failed Assert("0"), File: "walsender.c", Line:
> 2528, PID: 373750
> src/test/recovery/tmp_check/log/012_subtransactions_primary.log:TRAP: failed Assert("0"), File: "walsender.c", Line:
> 2528, PID: 373821
> src/test/recovery/tmp_check/log/012_subtransactions_standby.log:TRAP: failed Assert("0"), File: "walsender.c", Line:
> 2528, PID: 373786
>
> src/test/recovery/tmp_check/log/012_subtransactions_primary.log contains:
> ...
> 2023-12-09 03:23:20.210 UTC [375933] LOG:  server process (PID 375975) was terminated by signal 6: Aborted
> 2023-12-09 03:23:20.210 UTC [375933] DETAIL:  Failed process was running: START_REPLICATION 0/3000000 TIMELINE 3
> 2023-12-09 03:23:20.210 UTC [375933] LOG:  terminating any other active server processes
> 2023-12-09 03:23:20.210 UTC [375933] LOG:  abnormal database system shutdown
> 2023-12-09 03:23:20.211 UTC [375933] LOG:  database system is shut down
> ...
>
> So the shutdown definitely was considered abnormal, but we missed that.
>
> With log_min_messages = DEBUG3, I can see in the log:
> 2023-12-09 03:26:50.145 UTC [377844] LOG:  abnormal database system shutdown
> 2023-12-09 03:26:50.145 UTC [377844] DEBUG:  shmem_exit(1): 0 before_shmem_exit callbacks to make
> 2023-12-09 03:26:50.145 UTC [377844] DEBUG:  shmem_exit(1): 5 on_shmem_exit callbacks to make
> 2023-12-09 03:26:50.145 UTC [377844] DEBUG:  cleaning up orphaned dynamic shared memory with ID 2898643884
> 2023-12-09 03:26:50.145 UTC [377844] DEBUG:  cleaning up dynamic shared memory control segment with ID 3446966170
> 2023-12-09 03:26:50.146 UTC [377844] DEBUG:  proc_exit(1): 2 callbacks to make
> 2023-12-09 03:26:50.146 UTC [377844] LOG:  database system is shut down
> 2023-12-09 03:26:50.146 UTC [377844] DEBUG:  exit(1)
> 2023-12-09 03:26:50.146 UTC [377844] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
> 2023-12-09 03:26:50.146 UTC [377844] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
> 2023-12-09 03:26:50.146 UTC [377844] DEBUG:  proc_exit(-1): 0 callbacks to make
>
> The postmaster process exits with exit code 1, but pg_ctl can't get the
> code and just reports that stop was completed successfully.
>

For what it's worth, there is another thread which stated the similar problem:
https://www.postgresql.org/message-id/flat/2366244.1651681550%40sss.pgh.pa.us

> Should this be improved? And if yes, how it can be done?
> Maybe postmaster shouldn't remove it's postmaster.pid when it exits
> abnormally to let pg_ctl know of it?
>


thanks
Shveta



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Nicolas Lutic
Дата:
Сообщение: planner chooses incremental but not the best one
Следующее
От: shveta malik
Дата:
Сообщение: Re: Synchronizing slots from primary to standby