Re: backup manifests and contemporaneous buildfarm failures

From Tom Lane
Subject Re: backup manifests and contemporaneous buildfarm failures
Msg-id 26044.1585954081@sss.pgh.pa.us
In response to Re: backup manifests and contemporaneous buildfarm failures  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: backup manifests and contemporaneous buildfarm failures
Re: backup manifests and contemporaneous buildfarm failures
List pgsql-hackers

Thomas Munro <thomas.munro@gmail.com> writes:
> Same here, on elver.  I see pg_subtrans has been chmod(0)'d,
> presumably by the perl subroutine mutilate_open_directory_fails.  I
> see this in my inbox (the build farm wrote it to stderr or stdout
> rather than the log file):

> cannot chdir to child for
> pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans:
> Permission denied at ./run_build.pl line 1013.
> cannot remove directory for
> pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails:
> Directory not empty at ./run_build.pl line 1013.

I'm guessing that we're looking at a platform-specific difference in
whether "rm -rf" fails outright on an unreadable subdirectory, or
just tries to carry on by unlinking it anyway.
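
As an illustration only, a cleanup step can avoid depending on how a given platform's "rm -rf" treats unreadable directories by restoring owner permissions on every directory before deleting the tree. The sketch below is in Python rather than the Perl used by the test suite and the buildfarm client, and the names make_tree_removable and force_rmtree are hypothetical; it is not what run_build.pl does.

```python
import os
import shutil
import stat


def make_tree_removable(root):
    """Give the owner rwx on every directory under root (hypothetical helper).

    A directory must be readable and searchable to be listed, and writable
    for its entries to be unlinked, so permissions are fixed top-down
    before each directory is scanned.
    """
    os.chmod(root, stat.S_IRWXU)
    with os.scandir(root) as entries:
        for entry in entries:
            # Do not follow symlinks: only real subdirectories are descended into.
            if entry.is_dir(follow_symlinks=False):
                make_tree_removable(entry.path)


def force_rmtree(root):
    """Remove root even if some subdirectory was chmod(0)'d (hypothetical helper)."""
    make_tree_removable(root)
    shutil.rmtree(root)
```

This only helps when the cleanup runs as the owner of the directories, since chmod is otherwise refused.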

A partial fix would be to have the test script put back normal
permissions on that directory before it exits ... but any failure
partway through the script would leave a time bomb requiring manual
cleanup.
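
A minimal sketch of that partial fix, again in Python rather than the test's Perl and with a hypothetical helper name (temporarily_unreadable): the permissions are removed only inside a guarded block, and the original mode is put back even if the code in the block raises.

```python
import contextlib
import os
import stat


@contextlib.contextmanager
def temporarily_unreadable(path):
    """Set path to mode 0 for the duration of the block, then restore it.

    Hypothetical helper for illustration; not part of the PostgreSQL tests.
    """
    original_mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, 0)
    try:
        yield path
    finally:
        # Runs on normal exit and on exceptions, but not if the process
        # is killed outright (for example by SIGKILL or a power loss).
        os.chmod(path, original_mode)
```

The comment in the finally clause is the weak point: a hard kill between the two chmod calls still leaves the directory at mode 0, which is the time bomb described above.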

On the whole, I'd argue that testing that behavior is not valuable
enough to take risks of periodically breaking buildfarm members
in a way that will require manual recovery --- to say nothing of
annoying developers who trip over it.  So my vote is to remove
that part of the test and be satisfied with checking the behavior
for an unreadable file.

This doesn't directly explain the failure-at-next-configure behavior
that we're seeing in the buildfarm, but it wouldn't be too surprising
if it ends up being that the buildfarm client script doesn't manage
to fully recover from the situation.

            regards, tom lane


