Обсуждение: [HACKERS] Getting rid of "unknown error" in dblink and postgres_fdw

Поиск
Список
Период
Сортировка

[HACKERS] Getting rid of "unknown error" in dblink and postgres_fdw

От
Tom Lane
Дата:
We have a report in pgsql-general of a dblink query failing withERROR: unknown error
This is, to say the least, unhelpful.  And it violates our error
message style guidelines.

Where that is coming from is a situation where we've failed to extract
any primary message from a remote error.  (I theorize that today's report
is triggered by an out-of-memory situation, but that's only an unsupported
guess at the moment.)

I propose that we should change that string to "could not obtain message
string for error on connection "foo"", or something along that line.

postgres_fdw has the same disease.  It wouldn't have the notion of a named
connection, but maybe we could insert the foreign server name instead.

A possible objection is that if we really are on the hairy edge of OOM,
trying to construct such error strings might push us over the edge and
then you get "out of memory" instead.  But IMO that's not worse; it
could be argued to be a more useful description of the problem.

Comments?
        regards, tom lane



Re: [HACKERS] Getting rid of "unknown error" in dblink and postgres_fdw

От
"David G. Johnston"
Дата:
On Wed, Dec 21, 2016 at 9:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
A possible objection is that if we really are on the hairy edge of OOM,
trying to construct such error strings might push us over the edge

What am I missing here?  Constructing said string occurs on the local end of the connection but the memory problem exists on the remote end.

David J.

Re: [HACKERS] Getting rid of "unknown error" in dblink and postgres_fdw

От
Tom Lane
Дата:
I wrote:
> We have a report in pgsql-general of a dblink query failing with
>     ERROR: unknown error

Er, fingers faster than brain this morning.  Actually that report is in
pgsql-bugs, specifically bug #14471:
https://www.postgresql.org/message-id/20161221094443.25614.47807%40wrigleys.postgresql.org
        regards, tom lane



Re: [HACKERS] Getting rid of "unknown error" in dblink and postgres_fdw

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Wed, Dec 21, 2016 at 9:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> A possible objection is that if we really are on the hairy edge of OOM,
>> trying to construct such error strings might push us over the edge

> What am I missing here?  Constructing said string occurs on the local end
> of the connection but the memory problem exists on the remote end.

Um, that's not clear, in fact it's not likely.  The backend usually
manages to tell you so if it's out of memory --- or, worst case, it
crashes trying, in which case the local libpq ought to gin up a report
about loss of connection.  So the root cause I'm suspecting is that
the local libpq was unable to obtain memory for a PGresult or something
like that.  That theory has some holes of its own, because libpq also
keeps some cards up its sleeve that usually let it report out-of-memory
successfully, but it's the best I can do with the info at hand.

In any case, the point of the error style guidelines is that it's *always*
possible to do better than "unknown error"; now that it's been proven
that this case is reachable in the field, we should try harder.
        regards, tom lane



Re: [HACKERS] Getting rid of "unknown error" in dblink andpostgres_fdw

От
Joe Conway
Дата:
On 12/21/2016 08:49 AM, Tom Lane wrote:
> We have a report in pgsql-general of a dblink query failing with
>     ERROR: unknown error
> This is, to say the least, unhelpful.  And it violates our error
> message style guidelines.
>
> Where that is coming from is a situation where we've failed to extract
> any primary message from a remote error.  (I theorize that today's report
> is triggered by an out-of-memory situation, but that's only an unsupported
> guess at the moment.)
>
> I propose that we should change that string to "could not obtain message
> string for error on connection "foo"", or something along that line.
>
> postgres_fdw has the same disease.  It wouldn't have the notion of a named
> connection, but maybe we could insert the foreign server name instead.
>
> A possible objection is that if we really are on the hairy edge of OOM,
> trying to construct such error strings might push us over the edge and
> then you get "out of memory" instead.  But IMO that's not worse; it
> could be argued to be a more useful description of the problem.
>
> Comments?

Seems reasonable to me. I can work on it if you'd like. Do you think
this should be backpatched?

Joe


--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Re: [HACKERS] Getting rid of "unknown error" in dblink and postgres_fdw

От
Tom Lane
Дата:
Joe Conway <mail@joeconway.com> writes:
> On 12/21/2016 08:49 AM, Tom Lane wrote:
>> I propose that we should change that string to "could not obtain message
>> string for error on connection "foo"", or something along that line.
>>
>> postgres_fdw has the same disease.  It wouldn't have the notion of a named
>> connection, but maybe we could insert the foreign server name instead.

> Seems reasonable to me. I can work on it if you'd like. Do you think
> this should be backpatched?

If you have time for it, please do, I have lots on my plate already.

I'd vote for back-patching; the benefits of a clearer error message
are obvious, and it hardly seems likely that any existing applications
are depending on this specific error string.
        regards, tom lane



Re: [HACKERS] Getting rid of "unknown error" in dblink andpostgres_fdw

От
Joe Conway
Дата:
On 12/21/2016 09:27 AM, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> On 12/21/2016 08:49 AM, Tom Lane wrote:
>>> I propose that we should change that string to "could not obtain message
>>> string for error on connection "foo"", or something along that line.
>>>
>>> postgres_fdw has the same disease.  It wouldn't have the notion of a named
>>> connection, but maybe we could insert the foreign server name instead.
>
>> Seems reasonable to me. I can work on it if you'd like. Do you think
>> this should be backpatched?
>
> If you have time for it, please do, I have lots on my plate already.

Ok, will do.

> I'd vote for back-patching; the benefits of a clearer error message
> are obvious, and it hardly seems likely that any existing applications
> are depending on this specific error string.

Agreed.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Re: [HACKERS] Getting rid of "unknown error" in dblink and postgres_fdw

От
Tom Lane
Дата:
I wrote:
>>> I propose that we should change that string to "could not obtain message
>>> string for error on connection "foo"", or something along that line.

BTW, looking closer, I notice that the dblink case already has
         errcontext("Error occurred on dblink connection named \"%s\": %s.",                    dblink_context_conname,
dblink_context_msg)));

so we probably don't need the connection name in the primary error
message.  Now I think "could not obtain message string for remote error"
would be a sufficient message.

In the postgres_fdw case, I'd be inclined to use the same replacement
primary message.  Maybe we should think about adding the server name
to the errcontext there, but that seems like an independent improvement.
        regards, tom lane



Re: [HACKERS] Getting rid of "unknown error" in dblink andpostgres_fdw

От
Joe Conway
Дата:
On 12/21/2016 10:08 AM, Tom Lane wrote:
> I wrote:
>>>> I propose that we should change that string to "could not obtain message
>>>> string for error on connection "foo"", or something along that line.
>
> BTW, looking closer, I notice that the dblink case already has
>
>           errcontext("Error occurred on dblink connection named \"%s\": %s.",
>                      dblink_context_conname, dblink_context_msg)));
>
> so we probably don't need the connection name in the primary error
> message.  Now I think "could not obtain message string for remote error"
> would be a sufficient message.
>
> In the postgres_fdw case, I'd be inclined to use the same replacement
> primary message.  Maybe we should think about adding the server name
> to the errcontext there, but that seems like an independent improvement.

Committed that way.

I did notice that postgres_fdw has the following stanza that I don't see
in dblink:

8<------------------
/** If we don't get a message from the PGresult, try the PGconn.  This* is needed because for connection-level
failures,PQexec may just* return NULL, not a PGresult at all.*/ 
if (message_primary == NULL)message_primary = PQerrorMessage(conn);
8<------------------

I wonder if the original issue on pgsql-bugs was a connection-level
failure rather than OOM? Seems like dblink ought to do the same.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Re: [HACKERS] Getting rid of "unknown error" in dblink and postgres_fdw

От
Tom Lane
Дата:
Joe Conway <mail@joeconway.com> writes:
> I did notice that postgres_fdw has the following stanza that I don't see
> in dblink:

> 8<------------------
> /*
>  * If we don't get a message from the PGresult, try the PGconn.  This
>  * is needed because for connection-level failures, PQexec may just
>  * return NULL, not a PGresult at all.
>  */
> if (message_primary == NULL)
>     message_primary = PQerrorMessage(conn);
> 8<------------------

> I wonder if the original issue on pgsql-bugs was a connection-level
> failure rather than OOM? Seems like dblink ought to do the same.

Oooh ... I had thought that code was in both, which was why I was having
a hard time explaining the OP's failure.  But I see you're right,
which provides a very straightforward explanation for the report.
I believe that if libpq is unable to malloc a PGresult, it will return
NULL but put an "out of memory" message into the PGconn's error buffer.
I had supposed that we'd capture and report the latter, but as the
dblink code stands, it won't.

In short, yes, please copy that bit into dblink.
        regards, tom lane



Re: [HACKERS] Getting rid of "unknown error" in dblink andpostgres_fdw

От
Joe Conway
Дата:
On 12/21/2016 04:22 PM, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> I did notice that postgres_fdw has the following stanza that I don't see
>> in dblink:
>
>> 8<------------------
>> /*
>>  * If we don't get a message from the PGresult, try the PGconn.  This
>>  * is needed because for connection-level failures, PQexec may just
>>  * return NULL, not a PGresult at all.
>>  */
>> if (message_primary == NULL)
>>     message_primary = PQerrorMessage(conn);
>> 8<------------------
>
>> I wonder if the original issue on pgsql-bugs was a connection-level
>> failure rather than OOM? Seems like dblink ought to do the same.
>
> Oooh ... I had thought that code was in both, which was why I was having
> a hard time explaining the OP's failure.  But I see you're right,
> which provides a very straightforward explanation for the report.
> I believe that if libpq is unable to malloc a PGresult, it will return
> NULL but put an "out of memory" message into the PGconn's error buffer.
> I had supposed that we'd capture and report the latter, but as the
> dblink code stands, it won't.
>
> In short, yes, please copy that bit into dblink.

The attached should do the trick I think. You think it is reasonable to
backpatch this part too?

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Вложения

Re: [HACKERS] Getting rid of "unknown error" in dblink and postgres_fdw

От
Tom Lane
Дата:
Joe Conway <mail@joeconway.com> writes:
> On 12/21/2016 04:22 PM, Tom Lane wrote:
>> In short, yes, please copy that bit into dblink.

> The attached should do the trick I think.

I see that you need to pass the PGconn into dblink_res_error() now, but
what's the point of the new "bool fail" parameter?

> You think it is reasonable to backpatch this part too?

Yes, definitely.
        regards, tom lane



Re: [HACKERS] Getting rid of "unknown error" in dblink andpostgres_fdw

От
Joe Conway
Дата:
On 12/21/2016 09:20 PM, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> On 12/21/2016 04:22 PM, Tom Lane wrote:
>>> In short, yes, please copy that bit into dblink.
>
>> The attached should do the trick I think.
>
> I see that you need to pass the PGconn into dblink_res_error() now, but
> what's the point of the new "bool fail" parameter?

That part isn't new -- we added it sometime prior to 9.2:

8<--------------if (fail)    level = ERROR;else    level = NOTICE;
8<--------------

It allows dblink to throw a NOTICE on remote errors rather than an
actual ERROR, e.g. for an autonomous transaction.

From the docs (9.2 in this case)
8<--------------
fail_on_error
   If true (the default when omitted) then an error thrown on the
remote side of the connection causes an error to also be thrown locally.
If false, the remote error is locally reported as a NOTICE, and the
function returns no rows.
8<--------------

>> You think it is reasonable to backpatch this part too?
>
> Yes, definitely.

Ok, will do.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Re: [HACKERS] Getting rid of "unknown error" in dblink and postgres_fdw

От
Tom Lane
Дата:
Joe Conway <mail@joeconway.com> writes:
> On 12/21/2016 09:20 PM, Tom Lane wrote:
>> I see that you need to pass the PGconn into dblink_res_error() now, but
>> what's the point of the new "bool fail" parameter?

> That part isn't new -- we added it sometime prior to 9.2:

Oh!  I misread the patch --- something about an unluckily-placed line
wrap and not looking very closely :-(.  Yeah, it's fine as is.
Sorry for the noise.
        regards, tom lane



Re: [HACKERS] Getting rid of "unknown error" in dblink andpostgres_fdw

От
Joe Conway
Дата:
On 12/22/2016 06:55 AM, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> On 12/21/2016 09:20 PM, Tom Lane wrote:
>>> I see that you need to pass the PGconn into dblink_res_error() now, but
>>> what's the point of the new "bool fail" parameter?
>
>> That part isn't new -- we added it sometime prior to 9.2:
>
> Oh!  I misread the patch --- something about an unluckily-placed line
> wrap and not looking very closely :-(.  Yeah, it's fine as is.
> Sorry for the noise.

Thanks -- committed/backpatched to 9.2

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development