Re: BUG #16161: pg_ctl stop fails sometimes (on Windows)

Поиск
Список
Период
Сортировка
От Alexander Lakhin
Тема Re: BUG #16161: pg_ctl stop fails sometimes (on Windows)
Дата
Msg-id b2cc98b1-0ba0-1358-572f-8a1a01a3880e@gmail.com
обсуждение исходный текст
Ответ на Re: BUG #16161: pg_ctl stop fails sometimes (on Windows)  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: BUG #16161: pg_ctl stop fails sometimes (on Windows)  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
18.12.2019 19:03, Tom Lane wrote:
Alexander Lakhin <exclusion@gmail.com> writes:
18.12.2019 6:25, Kyotaro Horiguchi wrote:
At Tue, 17 Dec 2019 18:50:26 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
It does look suspiciously like this wait is triggering on these
machines and somehow getting masked in most other test cases.
https://web.archive.org/web/20150317121919/http://support.microsoft.com/en-us/kb/113996
tells that STATUS_FILE_IS_A_DIRECTORY is mapped to ERROR_ACCESS_DENIED too.
Ugh.  I wondered if we could really get away with slowing down all
ERROR_ACCESS_DENIED cases.

It seems that the cause of the issue is in fsync_fname_ext:
I think you're blaming the messenger.  We cannot suddenly impose a
rule that people mustn't try to open() files that might be directories.
Even if fsync_fname_ext() happens to be the only place that does that
today (which I doubt) it's going to bite us on the rear regularly in
the future, because there's no way to enforce it, and most developers
aren't going to notice the problem in testing.
Regarding fsync_fname_ext(), I thought that it's OK to avoid performing a call that is known to fail. And even if someone will try to open() a directory, he will need to add the same check for EACCES, otherwise the code will fail on Windows. In fact, I see the same pattern in pre_sync_fname, fsync_fname in file_utils.c.
It's possible that we could deal with this specific case by having
pgwin32_open() try to stat the file and see if it's a directory.  But
I think that's messy.  Also, the list of kernel-level conditions that
that webpage says are mapped to ERROR_ACCESS_DENIED is scarily long:

STATUS_THREAD_IS_TERMINATING            ERROR_ACCESS_DENIED
STATUS_PROCESS_IS_TERMINATING           ERROR_ACCESS_DENIED
STATUS_INVALID_LOCK_SEQUENCE            ERROR_ACCESS_DENIED
STATUS_INVALID_VIEW_SIZE                ERROR_ACCESS_DENIED
STATUS_ALREADY_COMMITTED                ERROR_ACCESS_DENIED
STATUS_ACCESS_DENIED                    ERROR_ACCESS_DENIED
STATUS_FILE_IS_A_DIRECTORY              ERROR_ACCESS_DENIED
STATUS_CANNOT_DELETE                    ERROR_ACCESS_DENIED
STATUS_FILE_DELETED                     ERROR_ACCESS_DENIED
STATUS_FILE_RENAMED                     ERROR_ACCESS_DENIED
STATUS_DELETE_PENDING                   ERROR_ACCESS_DENIED
STATUS_PORT_CONNECTION_REFUSED          ERROR_ACCESS_DENIED
...
STATUS_ENCRYPTION_FAILED                ERROR_ACCESS_DENIED
STATUS_DECRYPTION_FAILED                ERROR_ACCESS_DENIED
STATUS_NO_RECOVERY_POLICY               ERROR_ACCESS_DENIED
STATUS_NO_EFS                           ERROR_ACCESS_DENIED
STATUS_WRONG_EFS                        ERROR_ACCESS_DENIED
STATUS_NO_USER_KEYS                     ERROR_ACCESS_DENIED
...
SEC_E_MESSAGE_ALTERED                   ERROR_ACCESS_DENIED   
SEC_E_OUT_OF_SEQUENCE                   ERROR_ACCESS_DENIED   

Maybe most of these can't occur during fopen, but I'd
rather not assume that
I see your point about future uncertainty, but I don't think that receiving e.g. STATUS_FILE_DELETED or STATUS_NO_EFS error every few seconds and ignoring it is a normal practice.
Is there a way to get the original kernel error code?
It'd be a lot better if we could be sure that the condition
is STATUS_DELETE_PENDING before looping.
Unfortunately no, there is no known way  to get the kernel error code. As stated in https://stackoverflow.com/questions/3764072, there is the GetFileInformationByHandleEx function, that can return DeletePending state, but to use it we should open the file first (to get a file handle).

So I see three ways now:
1) Revert the pgwin32_open change and choose the previously rejected approach: Unlink postmaster.pid using rename operation (adopt the solution from https://stackoverflow.com/questions/3764072/).
2) Add a check for directory in pgwin32_open, but I think then we're getting close to checking there just for postmaster.pid.
3) Find out how often we get ERROR_ACCESS_DENIED during all regression tests and if such error count is near zero, then add the check for isdir in fsync_fname_ext and friends, and get done.

I'm still thinking that we shouldn't perform doomed calls, but I'm ready to change the course.

Best regards,
Alexander

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Fwd: weird long time query
Следующее
От: Tom Lane
Дата:
Сообщение: Re: BUG #16161: pg_ctl stop fails sometimes (on Windows)