[PATCH] Fix for infinite signal loop in parallel scan

Поиск
Список
Период
Сортировка
От Chris Travers
Тема [PATCH] Fix for infinite signal loop in parallel scan
Дата
Msg-id CAN-RpxB-oeZve_J3SM_6=HXPmvEG=HX+9V9pi8g2YR7YW0rBBg@mail.gmail.com
обсуждение исходный текст
Ответы Re: [PATCH] Fix for infinite signal loop in parallel scan  (Andres Freund <andres@anarazel.de>)
Re: [PATCH] Fix for infinite signal loop in parallel scan  (Thomas Munro <thomas.munro@enterprisedb.com>)
Re: [PATCH] Fix for infinite signal loop in parallel scan  (Oleksii Kliukin <alexk@hintbits.com>)
Список pgsql-hackers
Hi;

Attached is the patch we are fully testing at Adjust.  There are a few non-obvious aspects of the code around where the patch hits.    I have run make check on Linux and MacOS, and make check-world on Linux (check-world fails on MacOS on all versions and all branches due to ecpg failures).  Automatic tests are difficult because it is to a rare race condition which is difficult (or possibly impossible) to automatically create.  Our current approach testing is to force the issue using a debugger.  Any ideas on how to reproduce automatically are appreciated but even on our heavily loaded systems this seems to be a very small portion of queries that hit this case (meaning the issue happens about once a week for us).

The main problem is that under certain rare circumstances we see recovery conflicts going into loops sending sigusr1 to the parallel query which retries a posix_falloc call.  The call gets interrupted by the signal perpetually and the replication cannot proceed.

The patch attempts to handle cases where we are trying to cancel the query or terminate the process in the same way we would handle an ENOSPC in the resize operation, namely to break off the loop, do relevant cleanup, and then throw relevant exceptions.

There is a very serious complication in this area, however, which is that dsm_impl_op takes an elevel parameter which determines whether or not it is safe to throw errors.  This elevel is then passed to ereport inside the function, and this function also calls the resize operation itself.  Since this is not safe with CHECK_FOR_INTERRUPTS(), I conditionally do this only if elevel is greater or equal to ERROR.  

Currently all codepaths here use elevel ERROR when reaching this path but given that the calling function supports semantics where this could be lower, and where a caller might have additional cleanup to do, I don't think one can just add CHECK_FOR_INTERRUPTS() there even though that would work for now since this could create all kinds of subtle bugs in the future.

So what the patch does is check for whether we are trying to end the query or the backend and does not retry the resize operation if either of those conditions are true.  In those cases it will set errno to EINTR and return the same.

The only caller then, if the resize operation failed, checks for an elevel greater or equal to ERROR, and whether the errno is set to EINTR.  If so it checks for signals.  If these do not abort the query, we then fall through and pass the ereport with the supplied elevel as we would have otherwise, and return false to the caller.

In current calls to this function, this means that interrupts are handled right after cleanup of the dynamic shared memory and these then abort the query or exit the backend.  Future calls could specify a WARNING elevel if they have extra cleanup to do, in which case signals would not be checked, and the same warning found today would found in the log.  At the next CHECK_FOR_INTERRUPTS() call, the appropriate errors would be raised etc.

In the past it had been suggested we do PG_TRY(); and PG_CATCH(), but given that it is not consistent whether we can raise an error or whether we MUST raise an error, I don't see how this approach can work.  As far as I can see, we MUST raise an error in the appropriate spot if and only if elevel is set to a sufficient level.

Backporting this is pretty trivial.  We expect to confirm this ourselves on both master and 10.  I can prepare back ports fairly quickly.

Is there any feedback on this approach before I add it to the next commitfest?

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fabien COELHO
Дата:
Сообщение: Re: Removing useless \. at the end of copy in pgbench
Следующее
От: Maksim Milyutin
Дата:
Сообщение: Re: Hint to set owner for tablespace directory