Re: BUG #16161: pg_ctl stop fails sometimes (on Windows)

Поиск
Список
Период
Сортировка
От Alexander Lakhin
Тема Re: BUG #16161: pg_ctl stop fails sometimes (on Windows)
Дата
Msg-id b2b8cc4c-b7d7-0d1b-f6ee-a02bbef5bdd0@gmail.com
обсуждение исходный текст
Ответ на Re: BUG #16161: pg_ctl stop fails sometimes (on Windows)  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Ответы Re: BUG #16161: pg_ctl stop fails sometimes (on Windows)  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
Hello,
18.12.2019 6:25, Kyotaro Horiguchi wrote:
> At Tue, 17 Dec 2019 18:50:26 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
>>
>> Now, it's mighty suspicious that this has a timeout of 30 sec
>> which is exactly how long we made pgwin32_open retry for ---
>> but how is it that this test is causing an ERROR_ACCESS_DENIED
>> failure?  And if it was, how did we get past that before?
Not exactly, we set timeout for 1 sec. It seems we have this timeout
occurred many times.
>> It does look suspiciously like this wait is triggering on these
>> machines and somehow getting masked in most other test cases.
>> If you compare the per-step runtimes in jacana's last successful
>> run,
>>
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2019-12-16%2013%3A01%3A26
>>
>> with those after this patch went in,
>>
>> https://buildfarm.postgresql.org/cgi-bin/shocw_log.pl?nm=jacana&dt=2019-12-17%2004%3A16%3A40
>>
>> there's clearly something very wrong.  fairywren likewise.
> The immediate cause of the failure seems to be a slowdown of startup
> or recovery. It took only 8 minutes in the successful case but 25
> minutes even though stopped on the way in the failure case.
>
> In the failure case, it took 30 seconds to move only 26.1MB from
> 15E8C48 to 30013A0. just around 0.3 seconds for a move of the same
> size (15E9C48->3000DE8 (26.1MB)) in the successful case (*1). In the
> sucessful case, the time took from archiving 00000002.history to
> removing old segments is 0.15 s, but about 3.5 s in the failure case
> (*2).
>
> Seemingly the immediate cause of the timeout looks like just an
> extreme slowdown of file read, but it took 2 seconds to move within a
> page from 3000C18 to 3000C50. So I suspect that 6d7547c219 might slow
> down readTimeLineHistory() significantly in the case.
>
> regards.
>
>
Yes, some tests are really slowed down (namely, test-decoding-check,
pg_basebackup-check, ...).
ProcessMonitor shows:

"8:11:57.1428800

AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","SUCCESS",...
"8:11:57.1429851

AM","postgres.exe","2640","SetRenameInformationFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot\state.tmp","SUCCESS",...
"8:11:57.1432041

AM","postgres.exe","2640","CloseFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","SUCCESS",""
"8:11:57.1432964

AM","postgres.exe","2640","CloseFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot\state","SUCCESS",""
"8:11:57.1435137

AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot\state","SUCCESS",...
"8:11:57.1435709

AM","postgres.exe","2640","CloseFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot\state","SUCCESS",""
"8:11:57.1437251
AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS
DIRECTORY",...
"8:11:57.2451100
AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS
DIRECTORY",...
"8:11:57.3571077
AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS
DIRECTORY",...
"8:11:57.4643527
AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS
DIRECTORY",...
"8:11:57.5778052
AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS
DIRECTORY",...
"8:11:57.6825679
AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS
DIRECTORY",...
"8:11:57.7921704
AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS
DIRECTORY",...
"8:11:57.9018507
AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS
DIRECTORY",...
"8:11:58.0109597
AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS
DIRECTORY",...
"8:11:58.1204329
AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS
DIRECTORY",...
"8:11:58.2300829
AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS
DIRECTORY",...
"8:11:58.2302124
AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS
DIRECTORY",...
"8:11:58.3393345
AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS
DIRECTORY",...
"8:11:58.4484310
AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS
DIRECTORY",...
"8:11:58.5584079
AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS
DIRECTORY",...
"8:11:58.6683586
AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS
DIRECTORY",...
"8:11:58.7773396
AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS
DIRECTORY",...
"8:11:58.8858091
AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS
DIRECTORY",...
"8:11:58.9955264
AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS
DIRECTORY",...
"8:11:59.1050665
AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS
DIRECTORY",...
"8:11:59.2137752
AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS
DIRECTORY",...
"8:11:59.3234641
AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS
DIRECTORY",...
"8:11:59.3261361
AM","postgres.exe","2640","QueryOpen","C:\Windows\System32\kernel.appcore.dll","FAST
IO DISALLOWED",""

https://web.archive.org/web/20150317121919/http://support.microsoft.com/en-us/kb/113996
tells that STATUS_FILE_IS_A_DIRECTORY is mapped to ERROR_ACCESS_DENIED too.
It seems that the cause of the issue is in fsync_fname_ext:
    fd = OpenTransientFile(fname, flags);

    /*
     * Some OSs don't allow us to open directories at all (Windows returns
     * EACCES), just ignore the error in that case.  If desired also
silently
     * ignoring errors about unreadable files. Log others.
     */
    if (fd < 0 && isdir && (errno == EISDIR || errno == EACCES))
        return 0;
Maybe we need to take isdir into account before calling
OpenTransientFile on Windows...
Please look at the simple patch that makes the recoverycheck test pass.

Best regards,
Alexander

Вложения

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Kaijiang Chen
Дата:
Сообщение: Fwd: weird long time query
Следующее
От: Roman Cervenak
Дата:
Сообщение: Re: Memory leak (possibly connected to postgis) leading to server crash