Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()
Дата
Msg-id 21575.1511831661@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()  (Michael Paquier <michael.paquier@gmail.com>)
Список pgsql-bugs
Michael Paquier <michael.paquier@gmail.com> writes:
> On Tue, Nov 28, 2017 at 12:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 1. twophase.c does this:
>>     cldir = AllocateDir(TWOPHASE_DIR);
>>     LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
>>     while ((clde = ReadDir(cldir, TWOPHASE_DIR)) != NULL)
>> which is flat out wrong because LWLockAcquire might well clobber errno.

> I have not checked if it actually updates errno or not, but relying on
> the fact that it may do it sucks.

I think it might only change errno in some code paths, for example
LWLockAcquire -> wait needed -> PGSemaphoreLock -> sem_wait gets
interrupted -> EINTR.  Which is the worst of all worlds, because
the report would usually be right and only occasionally mislead you.

>> There might be some other problems I missed in a quick scan.

> I have spotted more problems.

Ugh.

> I think that this requires its own new thread with a more extended
> analysis on -hackers to attract attention,

Agreed.

One thing I was thinking about is that there are places such as
DeleteAllExportedSnapshotFiles and RelationCacheInitFileRemove that
want the error message to come out only at LOG level, not ERROR,
because they're running in the postmaster.  But not only are they
issuing a mostly-duplicative ereport call, but they're really not
protecting themselves fully, because you'd probably want failures
inside ReadDir to be reported at LOG level as well.  So what this
leads me to think is that we should export ReadDirExtended (currently
that's only accessible inside fd.c) and then the use-case would be
solvable with
dir = AllocateDir(path);while ((dirent = ReadDirExtended(dir, path, LOG)) != NULL)    process dirent;FreeDir(dir);

This line of thinking also says that FreeDir needs to be tweaked to
do nothing if dir == NULL, assuming that somebody should have logged
the directory-open failure already.

The other thing that really could use discussion, as you mentioned
upthread, is how useful are custom error messages for AllocateDir
failures.  I'm not real sure for instance thatcould not open write-ahead log directory "pg_wal": ...
is a useful improvement over a generic messagecould not open directory "pg_wal": ...
Experts will know what pg_wal is anyway, and non-experts may not
gain much from being told it's a write-ahead log directory.

I'm prepared to listen to arguments that these custom messages
(or at least some of them) are worth keeping, but I think someone
ought to make that case, rather than letting them stand just because
they're there now.
        regards, tom lane


В списке pgsql-bugs по дате отправления:

Предыдущее
От: Amit Langote
Дата:
Сообщение: Re: BUG #14928: Unchecked SearchSysCacheCopy1() return value
Следующее
От: Tom Lane
Дата:
Сообщение: Re: BUG #14931: Unchecked attnum value in ATExecAlterColumnType()