[HACKERS] Still another race condition in recovery TAP tests

From: Tom Lane
Subject: [HACKERS] Still another race condition in recovery TAP tests
Date:
Msg-id: 24621.1504924323@sss.pgh.pa.us
Replies: Re: [HACKERS] Still another race condition in recovery TAP tests (Michael Paquier <michael.paquier@gmail.com>)
         Re: [HACKERS] Still another race condition in recovery TAP tests (Noah Misch <noah@leadboat.com>)
List: pgsql-hackers
In a moment of idleness I tried to run the TAP tests on pademelon,
which is a mighty old and slow machine.  Behold,
src/test/recovery/t/010_logical_decoding_timelines.pl fell over,
with the relevant section of its log contents being:

# testing logical timeline following with a filesystem-level copy
# Taking filesystem backup b1 from node "master"
# pg_start_backup: 0/2000028
could not opendir(/home/postgres/pgsql/src/test/recovery/tmp_check/t_010_logical_decoding_timelines_master_data/pgdata/pg_wal/archive_status/000000010000000000000001.ready): No such file or directory at ../../../src/test/perl//RecursiveCopy.pm line 115.
### Stopping node "master" using mode immediate

The postmaster log has this relevant entry:

2017-09-08 22:03:22.917 EDT [19160] DEBUG:  archived write-ahead log file "000000010000000000000001"

It looks to me like the archiver removed 000000010000000000000001.ready
between where RecursiveCopy.pm checks that $srcpath is a regular file
or directory (line 95) and where it rechecks whether it's a regular
file (line 105).  Then the "-f" test on line 105 fails, allowing it to
fall through to the its-a-directory path, and unsurprisingly the opendir
at line 115 fails with the above symptom.
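To make the check-then-act window concrete, here is a deterministic reproduction. RecursiveCopy.pm itself is Perl; what follows is only a hypothetical Python transliteration of the control flow described above (first "file or directory?" check, then "regular file?", else treat as a directory), not the module's real code. The names `naive_copy`, `between_checks` and `demo` are invented for illustration, and the hook simply calls `os.remove` to play the archiver's part inside the window.

```python
import os
import shutil
import tempfile


def naive_copy(src, dst, between_checks=None):
    """Hypothetical rendering of the check-then-act pattern described above;
    not the actual RecursiveCopy.pm code."""
    # Check 1: the path must be a regular file or a directory (cf. line 95).
    if not (os.path.isfile(src) or os.path.isdir(src)):
        raise ValueError(f"{src}: not a regular file or directory")
    if between_checks is not None:
        between_checks(src)  # test hook: simulate the archiver acting in the window
    # Check 2: is it a regular file? (cf. line 105)
    if os.path.isfile(src):
        shutil.copyfile(src, dst)
        return
    # Anything else is assumed to be a directory (cf. line 115).
    os.mkdir(dst)
    for name in os.listdir(src):  # raises FileNotFoundError if src has vanished
        naive_copy(os.path.join(src, name), os.path.join(dst, name))


def demo():
    """Remove the .ready file between the two checks, as the archiver did."""
    with tempfile.TemporaryDirectory() as tmp:
        ready = os.path.join(tmp, "000000010000000000000001.ready")
        open(ready, "w").close()
        try:
            naive_copy(ready, os.path.join(tmp, "copy"), between_checks=os.remove)
        except FileNotFoundError:
            return "listdir failed on a path that used to be a file"
    return "copied"
```

Calling `demo()` takes the "directory" branch for a path that was a file a moment earlier, which is the Python analogue of the opendir failure in the log.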

In short, RecursiveCopy.pm is woefully unprepared for the idea that the
source directory tree might be changing underneath it.

I'm not real sure if the appropriate answer to this is "we need to fix
RecursiveCopy" or "we need to fix the callers to not call RecursiveCopy
until the source directory is stable".  Thoughts?
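If one went with the first option, one possible shape for a race-tolerant copy is sketched below, again in Python and under stated assumptions: it is not a patch to RecursiveCopy.pm, and `tolerant_copy` and `demo` are hypothetical names. The idea is to take a single `lstat()` snapshot of each entry's type instead of asking "file?" and "directory?" separately, and then to treat ENOENT from the subsequent `open()` or `listdir()` as "the entry vanished, nothing to copy".

```python
import os
import shutil
import stat
import tempfile


def tolerant_copy(src, dst):
    """Recursively copy src to dst, treating entries that vanish mid-copy
    (ENOENT) as nothing to copy. Returns True if src was copied, False if
    it was already gone."""
    try:
        # One snapshot of the entry's type, rather than two separate tests.
        st = os.lstat(src)
    except FileNotFoundError:
        return False
    if stat.S_ISDIR(st.st_mode):
        try:
            names = os.listdir(src)
        except FileNotFoundError:
            return False  # directory removed after lstat()
        os.makedirs(dst, exist_ok=True)
        for name in sorted(names):
            # Children that disappear are skipped by the recursive call.
            tolerant_copy(os.path.join(src, name), os.path.join(dst, name))
        return True
    if stat.S_ISREG(st.st_mode):
        try:
            fin = open(src, "rb")
        except FileNotFoundError:
            return False  # file removed after lstat(), e.g. an archived .ready file
        # On POSIX, an already-open file stays readable even if unlinked now.
        with fin, open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        return True
    # Symlinks, sockets, FIFOs, ...: refuse rather than guess.
    raise ValueError(f"{src}: not a regular file or directory")


def demo():
    """Copy a small tree, then try to copy a path that no longer exists."""
    with tempfile.TemporaryDirectory() as tmp:
        status = os.path.join(tmp, "src", "archive_status")
        os.makedirs(status)
        with open(os.path.join(status, "000000010000000000000001.ready"), "w") as f:
            f.write("x")
        copied = tolerant_copy(os.path.join(tmp, "src"), os.path.join(tmp, "dst"))
        copied_names = os.listdir(os.path.join(tmp, "dst", "archive_status"))
        gone = tolerant_copy(os.path.join(tmp, "missing"), os.path.join(tmp, "dst2"))
        dst2_created = os.path.exists(os.path.join(tmp, "dst2"))
    return copied, copied_names, gone, dst2_created
```

The trade-off is that silently skipping vanished entries also hides cases where a file disappears for a bad reason, which is the argument for the second option of making callers wait until the tree is stable.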

(I do kinda wonder why we rolled our own RecursiveCopy; surely there's
a better implementation in CPAN?)
        regards, tom lane


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
