Re: PQexec() hangs on OOM

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: PQexec() hangs on OOM
Дата
Msg-id 559AB74A.6070602@iki.fi
обсуждение исходный текст
Ответ на Re: PQexec() hangs on OOM  (Michael Paquier <michael.paquier@gmail.com>)
Ответы Re: PQexec() hangs on OOM  (Michael Paquier <michael.paquier@gmail.com>)
Список pgsql-bugs
On 07/04/2015 03:40 PM, Michael Paquier wrote:
> On Sat, Jul 4, 2015 at 1:32 AM, Heikki Linnakangas wrote:
>> In short, the idea of returning a status code from parseInput(), instead of
>> just dealing with the error, was a bad one. Sorry I didn't have this
>> epiphany earlier...
>>
>> I came up with the attached. It fixes the few cases where we were currently
>> returning "need more data" when OOM happened, to deal with the OOM error
>> instead, by setting conn->errorMessage. How does this look to you?
>
> So this relies on the fact that when errorMessage is set subsequent
> messages are ignored, right? This looks neat.
>
> At the bottom of getAnotherTuple() in fe-protocol2.c if
> PQmakeEmptyPGresult returns NULL, shouldn't the error message be
> enforced to "out of memory"? It is an error code path, so an error
> will be set, but perhaps the message is incorrect.
>
> -       if (!res->errMsg)
> -               goto failure;
> +       if (res)
> +       {
> +               res->resultStatus = isError ? PGRES_FATAL_ERROR :
> PGRES_NONFATAL_ERROR;
> +               res->errMsg = pqResultStrdup(res, workBuf.data);
> +       }
> If res->errMsg is NULL, we may have a crash later on when calling
> appendPQExpBufferStr on this field. I think that we should add an
> additional check on it.

Yeah, added.

I tested the various error paths this patch modifies with a debugger,
and found out that the getCopyStart changes were not doing much good.
The caller still waits for the COPY-IN result, so it still gets stuck.
So I left out that change.

The getParamDescriptions() changes were slightly broken. It didn't read
the whole input message with pqGetInt() etc., so pqParseInput3() threw
the "message contents do not agree with length in message type" error. I
started fixing that, by changing the error handling in that function to
be more like that in getRowDescriptions(), but then I realized that all
the EOF return cases in the pqParseInput3() subroutines are actually
dead code. pqParseInput3() always reads the whole message, before
passing it on to the right subroutine. That was documented for
getRowDescriptions() and getAnotherTuple(), but the rest of the
functions were more like the protocol version 2 code, prepared to deal
with incomplete messages. I think it would be good to refactor that,
removing the EOF return cases altogether. So I left out that change for
now as well.

That left me with the attached patch. It doesn't handle the
getParamDescription() case, nor the getCopyStart() case, but it's a
start. We don't necessarily need to fix everything in one go. Does this
look correct to you, as far as it goes?

- Heikki


Вложения

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: BUG #13488: Wrong netmask calculation
Следующее
От: olivier.gosseaume@free.fr
Дата:
Сообщение: Re: BUG #13484: Performance problem with logical decoding