Обсуждение: log bind parameter values on error

Поиск
Список
Период
Сортировка

log bind parameter values on error

От
Alexey Bashtanov
Дата:
Hello,

I'd like to propose a patch to log bind parameter values not only when 
logging duration,
but also on error (timeout in particular) or in whatever situation the 
statement normally gets logged.
This mostly could be useful when the request originator doesn't log them 
either, so it's hard
to reproduce the problem.

Unfortunately, when enabled, the feature comes with some memory and CPU 
overhead,
as we cannot convert certain values to text when in aborted transaction.

We potentially could do the trick with built-in types, but it would need 
cautious work with composite types,
and also require more computation on the logging stage, which is a risk 
of cascading errors.
Custom types still wouldn't be loggable, even as passed by client, which 
would be not great.

So I decided to cache textual representations on bind stage,
which is especially easy if the client uses text protocol.

Best,
   Alexey

Вложения

Re: log bind parameter values on error

От
Peter Eisentraut
Дата:
On 15/12/2018 00:04, Alexey Bashtanov wrote:
> I'd like to propose a patch to log bind parameter values not only when 
> logging duration,
> but also on error (timeout in particular) or in whatever situation the 
> statement normally gets logged.
> This mostly could be useful when the request originator doesn't log them 
> either, so it's hard
> to reproduce the problem.

That's a reasonable problem to solve.

> So I decided to cache textual representations on bind stage,
> which is especially easy if the client uses text protocol.

That sounds like a plausible approach.  Have you done any performance
measurements?

In your patch, I would organize the changes to the portal API a bit
differently.  Don't change the signature of CreatePortal().  Get rid of
PortalSetCurrentTop() and PortalClearCurrentTop().  Just have a global
variable CurrentPortal and set it directly.  And then change
GetCurrentPortalBindParameters() to take a Portal as argument.  This can
all happen inside postgres.c without changing the Portal APIs.

In fact, maybe don't use the Portal structure at all and just store the
saved textualized values inside postgres.c in a static variable.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: log bind parameter values on error

От
Alexey Bashtanov
Дата:
> That sounds like a plausible approach.  Have you done any performance
> measurements?

No I haven't yet

> In your patch, I would organize the changes to the portal API a bit
> differently.  Don't change the signature of CreatePortal().
okay
> Get rid of PortalSetCurrentTop() and PortalClearCurrentTop().

I'll remove them from Portal API, but possibly have them in postgres.c
if you don't mind, just to avoid code duplication.
Renamed and maybe even inlined.

> Just have a global
> variable CurrentPortal and set it directly.  And then change
> GetCurrentPortalBindParameters() to take a Portal as argument.  This can
> all happen inside postgres.c without changing the Portal APIs.
Okay, will do
> In fact, maybe don't use the Portal structure at all and just store the
> saved textualized values inside postgres.c in a static variable.

Unlike SQL, parameters may spend much more memory, so I'd have them
in portal memory context to make sure the memory is released earlier 
rather than later.
Tracking individual variable lifetime like we do with debug_query_string
sounds doable but a bit non-straightforward to me,
see e.g. the tricks we do with transaction commands.

Also, I'd like to avoid early forming of the error string, as we may 
need to combine
them differently in future, e.g. when logging in various logging formats 
or languages.
One-by-one pfree-ing doesn't look tempting either.

Do you think it would be acceptable to leave them cached in parameters 
structure?

Best,
   Alex



Re: log bind parameter values on error

От
Peter Eisentraut
Дата:
On 02/01/2019 23:53, Alexey Bashtanov wrote:
>> In fact, maybe don't use the Portal structure at all and just store the
>> saved textualized values inside postgres.c in a static variable.
> 
> Unlike SQL, parameters may spend much more memory, so I'd have them
> in portal memory context to make sure the memory is released earlier 
> rather than later.

Having them in the portal structure is different from having it in the
portal memory context.  Although there is perhaps value in keeping them
together.

> Do you think it would be acceptable to leave them cached in parameters 
> structure?

Let's see how it looks.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: log bind parameter values on error

От
Alexey Bashtanov
Дата:
Hello Peter,

>> Unlike SQL, parameters may spend much more memory, so I'd have them
>> in portal memory context to make sure the memory is released earlier
>> rather than later.
> Having them in the portal structure is different from having it in the
> portal memory context.  Although there is perhaps value in keeping them
> together.
yeah, to avoid pointers to deallocated areas

> Let's see how it looks.
>
please see attached

Best,
   Alex

Вложения

Re: log bind parameter values on error

От
Alexey Bashtanov
Дата:
> please see attached
sorry, some unintended changes sneaked in, please see the corrected patch

Вложения

Re: log bind parameter values on error

От
Peter Eisentraut
Дата:
There appears to be a problem with how this affects current logging
behavior.

I'm using

    pgbench -M extended -S -T 10 bench

to test the extended protocol.

Unpatched, with log_statement=all, you get something like

LOG:  execute <unnamed>: SELECT abalance FROM pgbench_accounts WHERE aid
= $1;
DETAIL:  parameters: $1 = '30223'

With your patch, with log_statement=all and log_parameters=on, you get
the same, but with log_statement=all and log_parameters=off you get

LOG:  execute <unnamed>: SELECT abalance FROM pgbench_accounts WHERE aid
= $1;
DETAIL:  parameters: $1 = UNKNOWN TYPE

We should probably keep the existing parameter logging working as before.

This also raises the question of the new parameter name.  Parameters are
already logged.  So the name should perhaps be more like
log_parameters_on_error.

Some API notes on your patch:  I think you can change
get_portal_bind_parameters() to take a ParamListInfo, since you're not
doing anything with the Portal other than grab the parameters.  And that
would allow you to keep the signature of errdetail_params() unchanged.

I did some performance tests using the commands shown above and didn't
find any problems.  Obviously the default pgbench stuff isn't very
parameter-intensive.  Do you have tests with more and larger parameter
values?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: log bind parameter values on error

От
Alexey Bashtanov
Дата:
Hi Peter,

> With your patch, with log_statement=all and log_parameters=on, you get
> the same, but with log_statement=all and log_parameters=off you get
>
> LOG:  execute <unnamed>: SELECT abalance FROM pgbench_accounts WHERE aid
> = $1;
> DETAIL:  parameters: $1 = UNKNOWN TYPE
Thanks for spotting this, I've fixed it, see the new patch attached.
> This also raises the question of the new parameter name.  Parameters are
> already logged.  So the name should perhaps be more like
> log_parameters_on_error.
Done
> Some API notes on your patch:  I think you can change
> get_portal_bind_parameters() to take a ParamListInfo, since you're not
> doing anything with the Portal other than grab the parameters.  And that
> would allow you to keep the signature of errdetail_params() unchanged.
Done
> I did some performance tests using the commands shown above and didn't
> find any problems.  Obviously the default pgbench stuff isn't very
> parameter-intensive.  Do you have tests with more and larger parameter
> values?
>

I've done some tests, but they are not very reproducible:
the difference between runs is more than the difference between master 
vs feature branch
and log_parameters_on_error on vs off.

I was using a small java app, it tested the speed using only a single 
connection.
See its code and the results attached.

I'm sorry for the delay, feel free to move it to next commitfest if needed.

Best,
   Alex

Вложения

Re: log bind parameter values on error

От
Michael Paquier
Дата:
On Mon, Jan 28, 2019 at 12:15:51AM +0000, Alexey Bashtanov wrote:
> I'm sorry for the delay, feel free to move it to next commitfest if
> needed.

Done.
--
Michael

Вложения

Re: log bind parameter values on error

От
Peter Eisentraut
Дата:
This piece in your patch probably doesn't belong:

+                       elog(WARNING, "params->hasTextValues=%d,
IsAbortedTransactionBlockState()=%d",
+                                          params->hasTextValues &&
IsAbortedTransactionBlockState());

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: log bind parameter values on error

От
Andres Freund
Дата:
Hi,

On 2018-12-14 23:04:26 +0000, Alexey Bashtanov wrote:
> Unfortunately, when enabled, the feature comes with some memory and CPU
> overhead,
> as we cannot convert certain values to text when in aborted transaction.

Have you analyzed how invasive it'd be to delay that till we actually
can safely start a [sub]transaction to do that logging? Probably too
expensive, but it seems like something that ought to be analyzed.

- Andres


Re: log bind parameter values on error

От
Andres Freund
Дата:
Hi,

tiny scroll-through review.

On 2019-01-28 00:15:51 +0000, Alexey Bashtanov wrote:
> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
> index b6f5822..997e6e8 100644
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -6274,6 +6274,23 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
>        </listitem>
>       </varlistentry>
>  
> +     <varlistentry id="guc-log-parameters-on-error" xreflabel="log_parameters_on_error">
> +      <term><varname>log_parameters_on_error</varname> (<type>boolean</type>)
> +      <indexterm>
> +       <primary><varname>log_parameters_on_error</varname> configuration parameter</primary>
> +      </indexterm>
> +      </term>
> +      <listitem>
> +       <para>
> +        Controls whether the statement is logged with bind parameter values. 
> +        It adds some overhead, as postgres will cache textual
> +        representations of parameter values in memory for all statements,
> +        even if they eventually do not get logged. The default is
> +        <literal>off</literal>. Only superusers can change this setting.
> +       </para>
> +      </listitem>
> +     </varlistentry>

This needs a bit of language polishing.



> @@ -31,6 +31,8 @@
>   * set of parameter values.  If dynamic parameter hooks are present, we
>   * intentionally do not copy them into the result.  Rather, we forcibly
>   * instantiate all available parameter values and copy the datum values.
> + *
> + * We don't copy textual representations here.
>   */

That probably needs to be expanded on. Including a comment about what
guarantees that there are no memory lifetime issues.


> -                /* Free result of encoding conversion, if any */
> -                if (pstring && pstring != pbuf.data)
> -                    pfree(pstring);
> +                if (pstring)
> +                {
> +                    if (need_text_values)
> +                    {
> +                        if (pstring == pbuf.data)
> +                        {
> +                            /*
> +                             * Copy textual representation to portal context.
> +                             */
> +                            params->params[paramno].textValue =
> +                                                            pstrdup(pstring);
> +                        }
> +                        else
> +                        {
> +                            /* Reuse the result of encoding conversion for it */
> +                            params->params[paramno].textValue = pstring;
> +                        }
> +                    }
> +                    else
> +                    {
> +                        /* Free result of encoding conversion */
> +                        if (pstring != pbuf.data)
> +                            pfree(pstring);
> +                    }
> +                }

So the parameters we log here haven't actually gone through the output
function? Isn't that an issue? I mean that'll cause the contents to
differ from the non-error case, no? And differs from the binary protocol
case?
>      else
>      {
> +        /*
> +         * We do it for non-xact commands only, as an xact command
> +         * 1) shouldn't have any parameters to log;
> +         * 2) may have the portal dropped early.
> +         */
> +        Assert(current_top_portal == NULL);
> +        current_top_portal = portal;
> +        portalParams = NULL;
> +

"it"? ISTM the comment doesn't really stand on its own?



> +/*
> + * get_portal_bind_parameters
> + *         Get the string containing parameters data, is used for logging.
> + *
> + * Can return NULL if there are no parameters in the portal
> + * or the portal is not valid, or the text representations of the parameters are
> + * not available. If returning a non-NULL value, it allocates memory
> + * for the returned string in the current context, and it's the caller's
> + * responsibility to pfree() it if needed.
> + */
> +char *
> +get_portal_bind_parameters(ParamListInfo params)
> +{
> +    StringInfoData param_str;
> +
> +    /* No parameters to format */
> +    if (!params || params->numParams == 0)
> +        return NULL;
> +
> +            elog(WARNING, "params->hasTextValues=%d, IsAbortedTransactionBlockState()=%d",
> +                           params->hasTextValues && IsAbortedTransactionBlockState());

Err, huh?


Greetings,

Andres Freund


Re: log bind parameter values on error

От
Alexey Bashtanov
Дата:
Hello Anders and Peter,

Thanks for your messages.
Please see the new patch version attached.

 > Have you analyzed how invasive it'd be to delay that till we actually
 > can safely start a [sub]transaction to do that logging? Probably too
 > expensive, but it seems like something that ought to be analyzed.

The fundamental problem here is that the output functions for the types
of the values to be logged may be present only in the transaction
that has just been aborted.

Also I don't like the idea of doing complex operations in the error handler,
as it increases the chances of cascading errors.

I thought of pre-calculating types' output functions instead of their 
results,
but that would work for certain built-in types only.

>> +     <varlistentry id="guc-log-parameters-on-error" xreflabel="log_parameters_on_error">
>> +      <term><varname>log_parameters_on_error</varname> (<type>boolean</type>)
>> +      <indexterm>
>> +       <primary><varname>log_parameters_on_error</varname> configuration parameter</primary>
>> +      </indexterm>
>> +      </term>
>> +      <listitem>
>> +       <para>
>> +        Controls whether the statement is logged with bind parameter values.
>> +        It adds some overhead, as postgres will cache textual
>> +        representations of parameter values in memory for all statements,
>> +        even if they eventually do not get logged. The default is
>> +        <literal>off</literal>. Only superusers can change this setting.
>> +       </para>
>> +      </listitem>
>> +     </varlistentry>
> This needs a bit of language polishing.

I would appreciate if you had any suggestions, as my English isn't great.

>> @@ -31,6 +31,8 @@
>>    * set of parameter values.  If dynamic parameter hooks are present, we
>>    * intentionally do not copy them into the result.  Rather, we forcibly
>>    * instantiate all available parameter values and copy the datum values.
>> + *
>> + * We don't copy textual representations here.
>>    */
> That probably needs to be expanded on. Including a comment about what
> guarantees that there are no memory lifetime issues.

What kind of memory lifetime issues do you mean?
We're not copying textual representations, so the worst can happen
is they don't get logged when appropriate. Luckily, we never use this
function when copying to a portal we use for logging, I added this to
the comment. Do you think it's any better?

>
> So the parameters we log here haven't actually gone through the output
> function? Isn't that an issue? I mean that'll cause the contents to
> differ from the non-error case, no? And differs from the binary protocol
> case?
I don't think it's much of a problem. Text input and output functions are
meant to match, but the CREATE TYPE spec isn't too specific about what 
it means.
Of course it does not mean that typoutput(typinput(foo)) is always 
exactly foo.
However, I really hope that at least typinput(typoutput(foo)) = foo,
where "=" is the correspondent operator registered in postgres.

If a cheeky client passes '007' as a numeric value I don't mind it being
sometimes logged as '007' and sometimes as '7', depending on the settings.
It anyway denotes the same number, and we'll know what to pass to 
reproduce the problem.
For binary protocol it'll be '7' as well, as it'll undergo the typrecv 
and then typoutput procedures.

>>       else
>>       {
>> +        /*
>> +         * We do it for non-xact commands only, as an xact command
>> +         * 1) shouldn't have any parameters to log;
>> +         * 2) may have the portal dropped early.
>> +         */
>> +        Assert(current_top_portal == NULL);
>> +        current_top_portal = portal;
>> +        portalParams = NULL;
>> +
> "it"? ISTM the comment doesn't really stand on its own?
Thanks, I fixed the comment and some code around it.

>> +/*
>> + * get_portal_bind_parameters
>> + *         Get the string containing parameters data, is used for logging.
>> + *
>> + * Can return NULL if there are no parameters in the portal
>> + * or the portal is not valid, or the text representations of the parameters are
>> + * not available. If returning a non-NULL value, it allocates memory
>> + * for the returned string in the current context, and it's the caller's
>> + * responsibility to pfree() it if needed.
>> + */
>> +char *
>> +get_portal_bind_parameters(ParamListInfo params)
>> +{
>> +    StringInfoData param_str;
>> +
>> +    /* No parameters to format */
>> +    if (!params || params->numParams == 0)
>> +        return NULL;
>> +
>> +            elog(WARNING, "params->hasTextValues=%d, IsAbortedTransactionBlockState()=%d",
>> +                           params->hasTextValues && IsAbortedTransactionBlockState());
> Err, huh?
This was some debugging, I threw it away now.

Best,
  Alex


Вложения

Re: log bind parameter values on error

От
Peter Eisentraut
Дата:
On 2019-02-15 15:02, Alexey Bashtanov wrote:
> Hello Anders and Peter,
> 
> Thanks for your messages.
> Please see the new patch version attached.

In my testing, I couldn't get this patch to do anything.  Could you
please share your testing steps?

I did

postgres -D data --log-parameters-on-error=on

pgbench -i bench

alter table pgbench_accounts alter column aid type smallint;

pgbench -M extended -S -T 10 bench

This will then error out on type overflows, but I don't see any
parameters being logged:

ERROR:  value "62812" is out of range for type smallint
STATEMENT:  SELECT abalance FROM pgbench_accounts WHERE aid = $1;

(In this case the error message contains the parameter value, so it's
not a very practical case, but it should work, it seems.)

Meanwhile, I have committed a patch that refactors the ParamListInfo
initialization a bit, so you don't have to initialize hasTextValues in a
bunch of places unrelated to your core code.  So please rebase your
patch on that.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Re: log bind parameter values on error

От
David Steele
Дата:
Hi Alexey,

On 3/14/19 12:38 PM, Peter Eisentraut wrote:
> 
> Meanwhile, I have committed a patch that refactors the ParamListInfo
> initialization a bit, so you don't have to initialize hasTextValues in a
> bunch of places unrelated to your core code.  So please rebase your
> patch on that.

It's been two weeks with no new patch or answers to Peter's questions. 
Since we are nearly at the end of the CF I'll target this patch for PG13 
and mark it Returned with Feedback at the end of the CF if there's no 
new patch by then.

Regards,
-- 
-David
david@pgmasters.net



Re: log bind parameter values on error

От
Alexey Bashtanov
Дата:
Hello and sorry for weeks of silence.

Hello Anders and Peter,

Thanks for your messages.
Please see the new patch version attached.
In my testing, I couldn't get this patch to do anything.  Could you
please share your testing steps?

Sure. Provided you're in the postgres checkout and you've run make in src/test/examples/ this should work

CREATE SCHEMA testlibpq3;
SET search_path = testlibpq3;
CREATE TABLE test1_(i int4, t text, b bytea);
INSERT INTO test1_ VALUES(0, '', '');
CREATE VIEW test1 AS SELECT 1/i i, t, b FROM test1_;

-- will log only statement
\! ./src/test/examples/testlibpq3

ALTER SYSTEM SET log_parameters_on_error TO on;
SELECT pg_reload_conf();

-- will log statement with parameters
\! ./src/test/examples/testlibpq3

I did

postgres -D data --log-parameters-on-error=on

pgbench -i bench

alter table pgbench_accounts alter column aid type smallint;

pgbench -M extended -S -T 10 bench

This will then error out on type overflows, but I don't see any
parameters being logged:

ERROR:  value "62812" is out of range for type smallint
STATEMENT:  SELECT abalance FROM pgbench_accounts WHERE aid = $1;

(In this case the error message contains the parameter value, so it's
not a very practical case, but it should work, it seems.)
I guess this error occurred while binding, so the parameters probably weren't yet all bound by the time of error reporting.
That's why the error message came without parameters.

Meanwhile, I have committed a patch that refactors the ParamListInfo
initialization a bit, so you don't have to initialize hasTextValues in a
bunch of places unrelated to your core code.  So please rebase your
patch on that.

Please find rebased patch attached.

I apologize for no reply before: I first thought my patch was really faulty and knew I wouldn't have time to fix it these days, but it seems to me it actually works.

Anyway, I don't suppose you manage to review it within the remaining few days, so I'll rebase it again in the beginning of the next CF.

Best regards,
  Alexey
Вложения

Re: log bind parameter values on error

От
Peter Eisentraut
Дата:
On 2019-03-29 15:55, Alexey Bashtanov wrote:
>> ERROR:  value "62812" is out of range for type smallint
>> STATEMENT:  SELECT abalance FROM pgbench_accounts WHERE aid = $1;
>>
>> (In this case the error message contains the parameter value, so it's
>> not a very practical case, but it should work, it seems.)
> I guess this error occurred /while/ binding, so the parameters probably
> weren't yet all bound by the time of error reporting.
> That's why the error message came without parameters.

I see.  But I think that could be fixed.  Change exec_bind_message() to
loop over the parameters twice: once to save them away, once to actually
process them.  I think the case of a faulty input value is probably very
common, so it would be confusing if that didn't work.

I think this patch needs some tests.  Manually testing it is cumbersome,
and as we are seeing now, it is not quite clear which cases it is
supposed to cover.  There are also additional cases for binary
parameters, and there are additions to the CSV output format.  We need
tests for all that so the behavior explains itself and doesn't have to
be rediscovered every time someone wants to look at this again.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: log bind parameter values on error

От
Alexey Bashtanov
Дата:
On 05/04/2019 12:23, Peter Eisentraut wrote:
> On 2019-03-29 15:55, Alexey Bashtanov wrote:
>>> ERROR:  value "62812" is out of range for type smallint
>>> STATEMENT:  SELECT abalance FROM pgbench_accounts WHERE aid = $1;
>>>
>>> (In this case the error message contains the parameter value, so it's
>>> not a very practical case, but it should work, it seems.)
>> I guess this error occurred /while/ binding, so the parameters probably
>> weren't yet all bound by the time of error reporting.
>> That's why the error message came without parameters.
> I see.  But I think that could be fixed.  Change exec_bind_message() to
> loop over the parameters twice: once to save them away, once to actually
> process them.  I think the case of a faulty input value is probably very
> common, so it would be confusing if that didn't work.
I understand this may sound lazy of me, however let me take this risk
and try to explain why I think logging them here in the same fashion
would be inconsistent.

The original feature is intended to log textual representations of
the bind parameter values, whereas the problem suggested to be solved
together with it is verbose logging of the part of the bind message that
failed to get converted to a Datum.

These are equivalent only under the following conditions:
1) the rest of the message was correctly formatted, so we can extract
something for each of the parameter values
2) the values have been passed in text mode
(all of them, not only the one failed to process)
3) the values have been passed in the server encoding

I think it would be a bit of inconsistent to log the parts of the message
only when we are lucky to satisfy the 3 conditions above.

If we are to log bind parameters message parsing and processing problems
more verbosely, what do you think of rather wrapping calls to
OidInputFunctionCall, OidReceiveFunctionCall and pg_client_to_server
into PG_TRY blocks and logging their arguments individually?

In case it doesn't work for any reason, I have an alternative idea on
how to log half-processed parameters.
Instead of hasTextValues store the number of parameters having
their textual representations successfully saved.
This way we will be able, in case of text mode,
save the original value and increment the counter
before the call to OidInputFunctionCall. When logging and not having values
for some parameters, we can print an ellipsis in the log to indicate there's
some more of them missing.

> I think this patch needs some tests.  Manually testing it is cumbersome,
> and as we are seeing now, it is not quite clear which cases it is
> supposed to cover.  There are also additional cases for binary
> parameters, and there are additions to the CSV output format.  We need
> tests for all that so the behavior explains itself and doesn't have to
> be rediscovered every time someone wants to look at this again.
>

I have added a section into src/test/examples/testlibpq3.c,
please see attached.

As far as I could see these tests never run on "make check" or "make 
installcheck",
hence the README change. Please correct me if I'm wrong.

I also failed to find an automatic way to test what actually gets logged
to the server log, at least not in the installcheck case.
I would appreciate if you had any suggestions about it.

Best regards,
   Alex

Вложения

Re: log bind parameter values on error

От
Alexey Bashtanov
Дата:
Please find the rebased patch attached.

Tested like the following.
Provided you're in the postgres checkout and you've run make in 
src/test/examples/ and connected to db=postgres:

CREATE SCHEMA testlibpq3;
SET search_path = testlibpq3;
CREATE TABLE test1_(i int4, t text, b bytea);
INSERT INTO test1_ VALUES(0, '', '');
CREATE VIEW test1 AS SELECT 1/i i, t, b FROM test1_;

-- will log only statement
\! ./src/test/examples/testlibpq3

ALTER SYSTEM SET log_parameters_on_error TO on;
SELECT pg_reload_conf();

-- will log statement with parameters
\! ./src/test/examples/testlibpq3

Best regards,
   Alexey

Вложения

Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
Nice patch, thanks.

I didn't like abusing testlibpq3.c for your new stuff, so I moved it off
to a new file testlibpq5.c.  I cleaned up a few other cosmetics things
about this -- v10 attached.  I eventually noticed that this patch fails
to initialize each param's textValue to NULL, which probably explains
why you have to be so careful about only setting hasTextValues after the
whole loop.  That seems a bit too trusting; I think it would be better
to set all these to NULL in makeParamList instead of leaving the memory
undefined.  One way would be to have a for() look in makeParamList that
nullifies the member; another would be to use palloc0().

A third possibility is to inspect each caller of makeParamList and have
them all set textValue to NULL to each parameter.

I'm marking this waiting on author.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: log bind parameter values on error

От
Andres Freund
Дата:
Hi,

On 2019-09-18 17:58:53 -0300, Alvaro Herrera wrote:
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -6414,6 +6414,23 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
>        </listitem>
>       </varlistentry>
>  
> +     <varlistentry id="guc-log-parameters-on-error" xreflabel="log_parameters_on_error">
> +      <term><varname>log_parameters_on_error</varname> (<type>boolean</type>)
> +      <indexterm>
> +       <primary><varname>log_parameters_on_error</varname> configuration parameter</primary>
> +      </indexterm>
> +      </term>
> +      <listitem>
> +       <para>
> +        Controls whether the statement is logged with bind parameter values. 

Trailing whitespace.

I find the "the statement" formulation a bit odd, but can't quite put my
finger on why.

I think it might be worthwhile to cross-reference
log_min_error_statement, as log_parameters_on_error will only take effect when the
statement is logged due to log_min_error_statement.


> +        It adds some overhead, as postgres will cache textual
> +        representations of parameter values in memory for all statements,
> +        even if they eventually do not get logged. The default is
> +        <literal>off</literal>. Only superusers can change this setting.

I don't think "cache" is the right descriptor. Usually caching implies
trying to make repeated tasks faster, which isn't the case here.


> +/*
> + * The top-level portal that the client is immediately working with:
> + * creating, binding, executing, or all at one using simple protocol
> + */
> +Portal current_top_portal = NULL;
> +

This strikes me as decidedly not nice. For one this means that we'll not
be able to use this infrastructure for binds that are done serverside,
e.g. as part of plpgsql.  I'm basically inclined to think that
integrating this at the postges.c level is the wrong thing.

It also adds new error handling complexity, which is already quite
substantial. And spreads knowledge of portals to elog.c, which imo
shouldn't have to know about them.

It seems to me that this would really need to be tracked inside the
portal infrastructure. To avoid unnecessary overhead, we could continue
to set the text values in exec_bin_message() in the pformat == 0 case,
using hasTextValues somewhere in the portal code to determine whether
the text representation has to be computed (for other input format, and
internal queries as e.g. generated by plpgsql).

And then the PG_CATCHes in pquery.c can add the errdetail() in the error
cases using an econtext callback.



> +/*
> + * get_portal_bind_params
> + *         Get a string containing parameters data -- used for logging.
> + *
> + * Can return NULL if there are no parameters in the portal or the portal is
> + * not valid, or the text representations of the parameters are not available.
> + * If returning a non-NULL value, it allocates memory for the returned string
> + * in the current context, and it's the caller's responsibility to pfree() it.
> + */
> +char *
> +get_portal_bind_params(ParamListInfo params)
> +{
> +    StringInfoData param_str;
> +
> +    /* No parameters to format */
> +    if (!params || params->numParams == 0)
> +        return NULL;
> +
> +    /*
> +     * We either need textual representation of parameters pre-calculated, or
> +     * call potentially user-defined I/O functions to convert the internal
> +     * representation into text, which cannot be done in an aborted xact
> +     */
> +    if (!params->hasTextValues && IsAbortedTransactionBlockState())
> +        return NULL;

Maybe I'm naive, but what is the point of keeping the separate
parameters, allocated separately, when all we're doing is building a
string containing them all at error time? Seems better just directly
form the full string when decideding to keep the text parameters - for
one it'll often end up being more efficient. But more importantly it
also makes it a lot less likely to run out of memory while handling the
error. The individual text parameters can be large, and this will always
additionally need at least the combined size of all parameters from
within the error context. That's not great.


> +        appendStringInfoCharMacro(¶m_str, '\'');
> +        for (p = pstring; *p; p++)
> +        {
> +            if (*p == '\'') /* double single quotes */
> +                appendStringInfoCharMacro(¶m_str, *p);
> +            appendStringInfoCharMacro(¶m_str, *p);
> +        }
> +        appendStringInfoCharMacro(¶m_str, '\'');

I know this is old code, but: This is really inefficient. Will cause a
lot of unnecessary memory-reallocations for large text outputs, because
we don't immediately grow to at least close to the required size.


Greetings,

Andres Freund



Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
Hi, thanks for looking.

On 2019-Sep-20, Andres Freund wrote:

> On 2019-09-18 17:58:53 -0300, Alvaro Herrera wrote:

> > +     <varlistentry id="guc-log-parameters-on-error" xreflabel="log_parameters_on_error">
> > +      <term><varname>log_parameters_on_error</varname> (<type>boolean</type>)
> > +      <indexterm>
> > +       <primary><varname>log_parameters_on_error</varname> configuration parameter</primary>
> > +      </indexterm>
> > +      </term>
> > +      <listitem>
> > +       <para>
> > +        Controls whether the statement is logged with bind parameter values. 
> 
> Trailing whitespace.
> 
> I find the "the statement" formulation a bit odd, but can't quite put my
> finger on why.

Yeah, I think that wording is pretty confusing.  I would use "Controls
whether bind parameters are logged when a statement is logged."

> > +/*
> > + * The top-level portal that the client is immediately working with:
> > + * creating, binding, executing, or all at one using simple protocol
> > + */
> > +Portal current_top_portal = NULL;
> > +
> 
> This strikes me as decidedly not nice. For one this means that we'll not
> be able to use this infrastructure for binds that are done serverside,
> e.g. as part of plpgsql.  I'm basically inclined to think that
> integrating this at the postges.c level is the wrong thing.
[...]
> It seems to me that this would really need to be tracked inside the
> portal infrastructure.

I think that's how this was done at first, then Peter E drove him away
from that into the current design.

> It also adds new error handling complexity, which is already quite
> substantial. And spreads knowledge of portals to elog.c, which imo
> shouldn't have to know about them.

Makes sense.

> > +        appendStringInfoCharMacro(¶m_str, '\'');
> > +        for (p = pstring; *p; p++)
> > +        {
> > +            if (*p == '\'') /* double single quotes */
> > +                appendStringInfoCharMacro(¶m_str, *p);
> > +            appendStringInfoCharMacro(¶m_str, *p);
> > +        }
> > +        appendStringInfoCharMacro(¶m_str, '\'');
> 
> I know this is old code, but: This is really inefficient. Will cause a
> lot of unnecessary memory-reallocations for large text outputs, because
> we don't immediately grow to at least close to the required size.

Agreed, but we can't blame a patch because it moves around some old
crufty code.  If Alexey wants to include another patch to change this to
a better formulation, I'm happy to push that before or after his main
patch.  And if he doesn't want to, that's fine with me too.

(Is doing a strlen first to enlarge the stringinfo an overall better
approach?)  (I wonder if it would make sense to strchr each ' and memcpy
the intervening bytes ... if only that didn't break the SI abstraction
completely ...)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: log bind parameter values on error

От
Andres Freund
Дата:
Hi,

On 2019-09-20 16:56:57 -0300, Alvaro Herrera wrote:
> > > +/*
> > > + * The top-level portal that the client is immediately working with:
> > > + * creating, binding, executing, or all at one using simple protocol
> > > + */
> > > +Portal current_top_portal = NULL;
> > > +
> > 
> > This strikes me as decidedly not nice. For one this means that we'll not
> > be able to use this infrastructure for binds that are done serverside,
> > e.g. as part of plpgsql.  I'm basically inclined to think that
> > integrating this at the postges.c level is the wrong thing.
> [...]
> > It seems to me that this would really need to be tracked inside the
> > portal infrastructure.
> 
> I think that's how this was done at first, then Peter E drove him away
> from that into the current design.

I don't think it really was the way I am suggesting. There were a bunch
of helper functions managing current_top_portal, but it otherwise is
(and afaict was in all versions) still all postgres.c
controlled. Whereas I think it should be nearly exclusively be handled
by pquery.c, with the possible exception of an efficiency hack to reuse
client input string when they're in text format.

What I'm suggesting is that PortalStart() would build a string
representation out of the parameter list (using
ParamExternData.textValue if set, calling the output function
otherwise), and stash that in the portal.

At the start of all the already existing PG_TRY blocks in pquery.c we
push an element onto the error_context_stack that adds the errdetail
with the parameters to the error.  Alternatively we could also add them
in in the PG_CATCH blocks, but that'd only work for elevel == ERROR
(i.e. neither FATAL nor non throwing log levels would be able to enrich
the error).


Thinking about this: I think the current approach doesn't actually
handle recursive errors correctly. Even if we fail to emit the error
message due to the parameter details requiring a lot of memory, we'd
again do so (and again fail) when handling that OOM error, because
current_top_portal afaict would still be set.  At the very least this'd
need to integrate with the recursion_depth logic in elog.c.


> > > +        appendStringInfoCharMacro(¶m_str, '\'');
> > > +        for (p = pstring; *p; p++)
> > > +        {
> > > +            if (*p == '\'') /* double single quotes */
> > > +                appendStringInfoCharMacro(¶m_str, *p);
> > > +            appendStringInfoCharMacro(¶m_str, *p);
> > > +        }
> > > +        appendStringInfoCharMacro(¶m_str, '\'');
> > 
> > I know this is old code, but: This is really inefficient. Will cause a
> > lot of unnecessary memory-reallocations for large text outputs, because
> > we don't immediately grow to at least close to the required size.
> 
> Agreed, but we can't blame a patch because it moves around some old
> crufty code.  If Alexey wants to include another patch to change this to
> a better formulation, I'm happy to push that before or after his main
> patch.  And if he doesn't want to, that's fine with me too.

Well, this patch makes it potentially a considerably hotter path, so I
think there's some justification for pushing a bit. But I'd not require
it either.

As I said, I think we cannot generate this string at error time, because
it makes it much much more likely to exhaust the error context - a bad
thing.


> (Is doing a strlen first to enlarge the stringinfo an overall better
> approach?)

Yes, it'd be better.


> (I wonder if it would make sense to strchr each ' and memcpy the
> intervening bytes ... if only that didn't break the SI abstraction
> completely ...)

I'd probably just count the ' in one pass, enlarge the stringinfo to the
required size, and then put the string directly into he stringbuffer.
Possibly just putting the necessary code into stringinfo.c. We already
have multiple copies of this inefficient logic...

But even if not, I don't think writing data into the stringbuf directly
is that ugly, we do that in enough places that I'd argue that that's
basically part of how it's expected to be used.

In HEAD there's at least
- postgres.c:errdetail_params(),
- pl_exec.c:format_preparedparamsdata()
  pl_exec.c:format_expr_params()
and
- dblink.c:escape_param_str()
- deparse.c:deparseStringLiteral()
- xlog.c:do_pg_start_backup() (after "Add the escape character"),
- tsearchcmds.c:serialize_deflist()
- ruleutils.c:simple_quote_literal()

are nearly the same.

Greetings,

Andres Freund



Re: log bind parameter values on error

От
Alexey Bashtanov
Дата:
Hello Alvaro,

> I didn't like abusing testlibpq3.c for your new stuff, so I moved it off
> to a new file testlibpq5.c.  I cleaned up a few other cosmetics things
> about this -- v10 attached.
Thanks for doing this.

> I eventually noticed that this patch fails
> to initialize each param's textValue to NULL,
I've added the changes to set textValue to NULL for each parameter after 
calling makeParamList.
I think it's the best option, as it's natural to populate text 
representations (albeit with NULLs)
in the same loop as we write the other parameters attributes.

It still requires us to set hasTextValues afterwards, as it doesn't seem 
practical to me to make
two loops, first null them all and next populate them: if there's a 
parsing or converting problem
midway it's out of the feature scope and already being logged elsewhere.

Attaching v11, marking as waiting for review.

Best, Alex

Вложения

Re: log bind parameter values on error

От
Alexey Bashtanov
Дата:
Hi Anders and Alvaro,

I must have missed this conversation branch when sending in v011.
Sorry about that.

 > I think it might be worthwhile to cross-reference
 > log_min_error_statement, as log_parameters_on_error will only take 
effect when the
 > statement is logged due to log_min_error_statement.

Agreed, added some clarification.

 > I don't think "cache" is the right descriptor. Usually caching implies
 > trying to make repeated tasks faster, which isn't the case here.

Well, potentially it can, if we set log_min_error_statement to something
lower than error and emit tons of warnings. But it's not the primary use
case, so I just replaced it by the word "save".


> What I'm suggesting is that PortalStart() would build a string
> representation out of the parameter list (using
> ParamExternData.textValue if set, calling the output function
> otherwise), and stash that in the portal.
>
> At the start of all the already existing PG_TRY blocks in pquery.c we
> push an element onto the error_context_stack that adds the errdetail
> with the parameters to the error.  Alternatively we could also add them
> in in the PG_CATCH blocks, but that'd only work for elevel == ERROR
> (i.e. neither FATAL nor non throwing log levels would be able to enrich
> the error).

I'm a bit worried about going this way, as it makes us log
the query and its parameters too far apart in the code,
and it's trickier to make sure we never log parameters without the query.

I think logging the parameters should not be part of error_context_stack,
but rather a primary part of logging facility itself, like statement.
That's because whether we want to log parameters depends
on print_stmt in elog.c. With print_stmt being computed upon edata,
I'm not sure how to work it out nicely.

> Thinking about this: I think the current approach doesn't actually
> handle recursive errors correctly. Even if we fail to emit the error
> message due to the parameter details requiring a lot of memory, we'd
> again do so (and again fail) when handling that OOM error, because
> current_top_portal afaict would still be set.  At the very least this'd
> need to integrate with the recursion_depth logic in elog.c.
We handle it the same way as we do it for debug_query_string itself:

         if (in_error_recursion_trouble())
         {
             error_context_stack = NULL;
             debug_query_string = NULL;
             current_top_portal = NULL;
         }


I'm attaching the patch with docs fixes.

Best regards,
   Alexey


Вложения

Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
So, if some parameters are large (they can be up to 1 GB-1, remember)
then we can bloat the log file severely.  I think we need to place an
upper limit on the strings that we're going to log -- as inspiration,
callers of ExecBuildValueDescription uses 64 chars per value maximum.
Something like that seems reasonable.  So I think you need to add some
pg_mbcliplen() calls in a couple of places in exec_bind_message.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
On 2019-Nov-07, Alvaro Herrera wrote:

> So, if some parameters are large (they can be up to 1 GB-1, remember)
> then we can bloat the log file severely.  I think we need to place an
> upper limit on the strings that we're going to log -- as inspiration,
> callers of ExecBuildValueDescription uses 64 chars per value maximum.
> Something like that seems reasonable.  So I think you need to add some
> pg_mbcliplen() calls in a couple of places in exec_bind_message.

(BTW it looks like plpgsql_param_fetch() is creating ParamExternData
values and not setting textValue for them.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: log bind parameter values on error

От
Andres Freund
Дата:
Hi,

On 2019-11-05 12:07:50 +0000, Alexey Bashtanov wrote:
> > What I'm suggesting is that PortalStart() would build a string
> > representation out of the parameter list (using
> > ParamExternData.textValue if set, calling the output function
> > otherwise), and stash that in the portal.
> > 
> > At the start of all the already existing PG_TRY blocks in pquery.c we
> > push an element onto the error_context_stack that adds the errdetail
> > with the parameters to the error.  Alternatively we could also add them
> > in in the PG_CATCH blocks, but that'd only work for elevel == ERROR
> > (i.e. neither FATAL nor non throwing log levels would be able to enrich
> > the error).
> 
> I'm a bit worried about going this way, as it makes us log
> the query and its parameters too far apart in the code,
> and it's trickier to make sure we never log parameters without the query.

The way you do it you need to do it in numerous places, and I'm pretty
sure you're missing some already. E.g. this will not work to log
parameters for parametrized statements generated on the server side,
e.g. for foreign key queries. I don't think that's the right direction
to go. You can maybe argue that we don't need support for logging server
side generated parameters in the initial version, but the approach
should be compatible with adding that support.


> I think logging the parameters should not be part of error_context_stack,
> but rather a primary part of logging facility itself, like statement.
> That's because whether we want to log parameters depends
> on print_stmt in elog.c. With print_stmt being computed upon edata,
> I'm not sure how to work it out nicely.

I don't think those really are contradictions. You can continue to have
an errdetail_params(), and but call it from the error context callback
set up in the portal code.

That'd also get rid of the two different copies of the logic for
deciding whether to log bind parameters. Right now your approach
e.g. doesn't work for emit_log_hook. Nor is all that clear to me that
we'd never want to send this information to the client.

Even leaving that aside, I'm *STRONGLY* against entangling elog.c with
query execution details like ParamListInfo. We should work to make
elog.c know less about different parts of the system, not more.


> > Thinking about this: I think the current approach doesn't actually
> > handle recursive errors correctly. Even if we fail to emit the error
> > message due to the parameter details requiring a lot of memory, we'd
> > again do so (and again fail) when handling that OOM error, because
> > current_top_portal afaict would still be set.  At the very least this'd
> > need to integrate with the recursion_depth logic in elog.c.

> We handle it the same way as we do it for debug_query_string itself:
> 
>         if (in_error_recursion_trouble())
>         {
>             error_context_stack = NULL;
>             debug_query_string = NULL;
>             current_top_portal = NULL;
>         }
> 

Sorry, had missed that, it wasn't apparent in the diff. I personally
just always use a larger context size to make it easier to recognize...

Greetings,

Andres Freund



Re: log bind parameter values on error

От
Michael Paquier
Дата:
On Thu, Nov 07, 2019 at 03:41:04PM -0800, Andres Freund wrote:
> The way you do it you need to do it in numerous places, and I'm pretty
> sure you're missing some already. E.g. this will not work to log
> parameters for parametrized statements generated on the server side,
> e.g. for foreign key queries. I don't think that's the right direction
> to go. You can maybe argue that we don't need support for logging server
> side generated parameters in the initial version, but the approach
> should be compatible with adding that support.

This patch had a review from two committers, with no updates from the
author, so marked as returned with feedback.
--
Michael

Вложения

Re: log bind parameter values on error

От
Alexey Bashtanov
Дата:
Hi,

I'm sorry for replying so late.

> I don't think those really are contradictions. You can continue to have
> an errdetail_params(), and but call it from the error context callback
> set up in the portal code
...
> Even leaving that aside, I'm *STRONGLY* against entangling elog.c with
> query execution details like ParamListInfo. We should work to make
> elog.c know less about different parts of the system, not more.
I've implemented it using error contexts, please could you have a look 
at the patch attached?

I did not use the PG_TRY blocks in portal code because they are not wide 
enough.
Otherwise, we wouldn't catch errors that can happen in GetCachedPlan.
Instead I added an item to error context chain in postgres.c 
exec_*_message routines.
I can't find why it could be unsafe, but I'm still a bit worried about 
it as there's no
other error context changes that early in the call stack.

I've also made string formatting more efficient, and pre-calculate
the whole string instead of individual values as suggested. Unfortunately
it still copies the error text when reporting -- I'm not sure how I 
change it
without substantial change of error logging infrastructure.

One more concern I'd like to ask for opinions. I'm using errdetail_log
for parameters logging, as it's the only way to make it appear in the 
log only
but not get sent to the client. It looks a bit awkward, as it appears like
ERROR
DETAIL
CONTEXT
STATEMENT,
where CONTEXT may in fact be something inner nested than the parameters
that appear in DETAILS.
With this regards, I'm thinking of adding some special arrangements into 
elog.c.
One idea is a econtext_log to add a log-only data into CONTEXT, however 
it'll still log
params above the statement, although directly above.
Another option is a special estatementparams (or perhaps 
efinalcontext_log?) to log
below the statement.
What do you think?

2 Alvaro:
> So, if some parameters are large (they can be up to 1 GB-1, remember)
> then we can bloat the log file severely.  I think we need to place an
> upper limit on the strings that we're going to log -- as inspiration,
> callers of ExecBuildValueDescription uses 64 chars per value maximum.
> Something like that seems reasonable.  So I think you need to add some
> pg_mbcliplen() calls in a couple of places in exec_bind_message.

I like the idea, but I don't think it's directly related to the change
proposed. We are already logging parameters in certain situations
with no limits on their lenghts. Given the time it takes to get the change
through (not least because of me addressing reviewers' comments
really slowly) I'd not include it in the patch. Do you find it acceptable?

Best, Alex

Вложения

Re: log bind parameter values on error

От
Alexey Bashtanov
Дата:
Patch file name is not great, changing it.

On 30/11/2019 12:35, Alexey Bashtanov wrote:
> I've implemented it using error contexts, please could you have a look 
> at the patch attached?


Вложения

Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
On 2019-Sep-20, Andres Freund wrote:

> > > > +        appendStringInfoCharMacro(¶m_str, '\'');
> > > > +        for (p = pstring; *p; p++)
> > > > +        {
> > > > +            if (*p == '\'') /* double single quotes */
> > > > +                appendStringInfoCharMacro(¶m_str, *p);
> > > > +            appendStringInfoCharMacro(¶m_str, *p);
> > > > +        }
> > > > +        appendStringInfoCharMacro(¶m_str, '\'');
> > > 
> > > I know this is old code, but: This is really inefficient. Will cause a
> > > lot of unnecessary memory-reallocations for large text outputs, because
> > > we don't immediately grow to at least close to the required size.

> I'd probably just count the ' in one pass, enlarge the stringinfo to the
> required size, and then put the string directly into he stringbuffer.
> Possibly just putting the necessary code into stringinfo.c. We already
> have multiple copies of this inefficient logic...

I stole Alexey's code for this from downthread and tried to apply to all
these places.  However, I did not put it to use in all these places you
mention, because each of them has slightly different requirements;
details below.

Now, I have to say that this doesn't make me terribly happy, because I
would like the additional ability to limit the printed values to N
bytes.  This means the new function would have to have an additional
argument to indicate the maximum length (pass 0 to print all args
whole) ... and the logic then because more involved because we need
logic to stop printing early.

I think the current callers should all use the early termination logic;
having plpgsql potentially produce 1 MB of log output because of a long
argument seems silly.  So I see no use for this function *without* the
length-limiting logic.

> But even if not, I don't think writing data into the stringbuf directly
> is that ugly, we do that in enough places that I'd argue that that's
> basically part of how it's expected to be used.
> 
> In HEAD there's at least
> - postgres.c:errdetail_params(),
> - pl_exec.c:format_preparedparamsdata()
>   pl_exec.c:format_expr_params()

These are changed in the patch; they're all alike.

> - dblink.c:escape_param_str()

This one wants to use \' and \\ instead of just doubling each '.
The resulting string is used as libpq conninfo, so using doubled ' does
not work.

> - deparse.c:deparseStringLiteral()

This one wants to use E''-style strings that ignore std_conforming_strings;
the logic would need to detect two chars ' and \ instead of just ' so
we'd need to use strcspn instead of strchr(); that seems a lot slower.

> - xlog.c:do_pg_start_backup() (after "Add the escape character"),

This one wants to escape \n chars.

> - tsearchcmds.c:serialize_deflist()

This wants E''-style strings, like deparse.c.

> - ruleutils.c:simple_quote_literal()

Produce an escaped string according to the prevailing
std_conforming_strings.


I think it's possible to produce a function that would satisfy all of
these, but it would be another function similar to the one I'm writing
here, not the same one.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
ggPostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
On 2019-Dec-03, Alvaro Herrera wrote:

> Now, I have to say that this doesn't make me terribly happy, because I
> would like the additional ability to limit the printed values to N
> bytes.  This means the new function would have to have an additional
> argument to indicate the maximum length (pass 0 to print all args
> whole) ... and the logic then because more involved because we need
> logic to stop printing early.

So after actually writing this, it seems that it's better to have a
separate path for this; otherwise you need to scan the whole input
string (strchr or friends), which might be long.  I tried to see how to
make strnlen() work for us, but I found no simple way to print the "..."
at the end.  So this just runs once per char.  I think that's okay,
since the intended usage is to print a few dozen bytes, like
ExecBuildSlotValueDescription does (64 bytes is what all its callers
do).

Another point: this implementation is not strict about how many
input characters it prints.  It counts up to maxlen *output* characters,
which means if it duplicates a ', that's one less input char printed.
(Given the usage of this function, this seems a feature rather than a
bug.  The first implementation did the opposite, and on the whole it
seemed worse.)

If anyone can suggest a smarter implementation, I'm all ears.

(Maybe do strnlen(maxlen), then count strnlen(1) starting at that point
-- so if that returns >=1, print the "..."?)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
> (Maybe do strnlen(maxlen), then count strnlen(1) starting at that point
> -- so if that returns >=1, print the "..."?)

So I found that I can make the code more reasonable with this simple
coding,

    if (maxlen > 0)
    {
        s = pnstrdup(s, maxlen);
        ellipsis = strnlen(s, maxlen + 1) > maxlen;
        /* enlarge while we can do so cheaply */
        enlargeStringInfo(str, maxlen);
    }

... but the problem is that we now compile stringinfo.c for frontend
environments also, and there's no pnstrdup() in frontends.  And to
introduce it, we'd need a configure check (because GNU libc has it) and
a src/port naive implementation and a fe_memutils.c addition.

Sigh.

Still, it's not that much code, so I'll just go do that and open a
separate thread for it.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
On 2019-Dec-04, Alvaro Herrera wrote:

> 
> > (Maybe do strnlen(maxlen), then count strnlen(1) starting at that point
> > -- so if that returns >=1, print the "..."?)
> 
> So I found that I can make the code more reasonable with this simple
> coding,

With strndup.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
I made a number of changes on the proposed patch and ended up with the
attached.  (I have not included the business to limit number of
characters, yet).

What this does is set an error callback, which adds the parameter values
in the DETAIL line.  This is at odds with all existing error callbacks:
they only add stuff to the CONTEXT string.  The implication is that
if an error site has a direct errdetail_log() and is run under this
error stack, that site's detail is going to be lost and replaced by the
"params" one.  Currently there aren't many errdetail_log() lines, so it
doesn't seem like a terrible problem.  However, it's something to keep
in mind.

This is not final form; there are a couple of XXX comments, and a number
of other small infelicities still to fix.  But the overall idea should
be final.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: log bind parameter values on error

От
Andres Freund
Дата:
Hi,

On 2019-12-05 14:19:45 -0300, Alvaro Herrera wrote:
> What this does is set an error callback, which adds the parameter values
> in the DETAIL line.  This is at odds with all existing error callbacks:
> they only add stuff to the CONTEXT string.  The implication is that
> if an error site has a direct errdetail_log() and is run under this
> error stack, that site's detail is going to be lost and replaced by the
> "params" one.  Currently there aren't many errdetail_log() lines, so it
> doesn't seem like a terrible problem.  However, it's something to keep
> in mind.

It'd not be too hard avoiding overwriting if we wanted. Makes me wonder
though, if we ought to invent something similar to the errdata fields we
have for schema, table, etc...

Haven't yet had time to catch up to the discussion in this thread of the
last two weeks...

Greetings,

Andres Freund



Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
Hello

On 2019-Dec-05, Andres Freund wrote:

> On 2019-12-05 14:19:45 -0300, Alvaro Herrera wrote:
> > What this does is set an error callback, which adds the parameter values
> > in the DETAIL line.  This is at odds with all existing error callbacks:
> > they only add stuff to the CONTEXT string.  The implication is that
> > if an error site has a direct errdetail_log() and is run under this
> > error stack, that site's detail is going to be lost and replaced by the
> > "params" one.  Currently there aren't many errdetail_log() lines, so it
> > doesn't seem like a terrible problem.  However, it's something to keep
> > in mind.
> 
> It'd not be too hard avoiding overwriting if we wanted.

Hmm, maybe errdetail_log() ought to do that.  However: if some place has
errdetail() and then this introduces errdetail_log(), then the former is
not included in the server log (only in the client log).

> Makes me wonder though, if we ought to invent something similar to the
> errdata fields we have for schema, table, etc...

Hmm, perhaps we should do that.  It avoids the problem altogether.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
On 2019-Dec-05, Alvaro Herrera wrote:

> > Makes me wonder though, if we ought to invent something similar to the
> > errdata fields we have for schema, table, etc...
> 
> Hmm, perhaps we should do that.  It avoids the problem altogether.

... then again, I'm not up for writing everything that would be required
to do that.  If somebody wants to propose that separately, I volunteer
to review it.  But let's not have that block this patch, since this is
already a clear improvement.

So here's v16.  I got rid of all that strlen() business of the output
values; instead, preallocate a reasonable guess at the final length (64
bytes for each of the first 128 parameters, plus 16 bytes for for
parameter from 129 to 1024).  This boils down to exactly the initial
size of a stringinfo (1024 bytes) when there are 8 parameters or less.
64 bytes avg of guessed length should be plenty (and it will be even
more so when I add the patch to limit the output length to 64+epsilon
bytes per param)  I admit there are perhaps too many magical numbers in
those three lines, though.

I don't understand how we ended up with the params put in
errdetail_log() -- seems to have been introduced silently between v13
and v14.  What's the reason for that?  I put it back as errdetail().
Alexey added some discussion about using context/detail when he sent
that version, but I think that's an issue we can leave for the
hypothetical errparameters() patch.


One problem I noticed is that we don't log parameters when
exec_bind_message fetches the parameter values.  So the very first
example problem in testlibpq5 fails to print the values of any
parameters previously seen.  I don't think this is a real problem in
practice.  You still get the unparseable value in the error message from
the input function, so not having the errdetail() does not seem very
important.

If anybody would like to review it once more, now's the time.  I'm
aiming at getting it pushed tomorrow (including the length-limited
appendStringInfoStringQuoted stuff).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: log bind parameter values on error

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> If anybody would like to review it once more, now's the time.  I'm
> aiming at getting it pushed tomorrow (including the length-limited
> appendStringInfoStringQuoted stuff).

Um, surely this bit is unacceptable?

+    /*
+     * XXX this clobbers any other DETAIL line that the error callsite could
+     * have had.  Do we care?
+     */
+    errdetail("parameters: %s", params->paramValuesStr);

Even if the scope of the clobber were limited to the parameter input
conversion step, this seems like a pretty bad compromise.  But AFAICS
this'll clobber errdetail for *any* error occurring during query
execution in extended query mode.  Not only does that render useless
a whole lot of sweat we've put into detail messages over the years,
but it means that you get fundamentally different (and largely worse)
error reporting in extended query mode than you do in simple query mode.

I think this issue has to be fixed before committing, not after.

The most straightforward fix would be to make the parameter values
a brand new component of error reports.  However, I see the point
that that'd require a huge amount of additional work, and cooperation
of client-side libraries that might be a long time coming.

Possibly a workable compromise is to emit the info as an error
context line, appending it to whatever context exists today.
The result might look like, say,

ERROR:  whatever
CONTEXT: SQL function "foo"
extended query with parameters x, y, ...

For extra credit maybe we could include the query's statement or
portal name?

    errcontext("extended query \"%s\" with parameters %s", ...);

An independent point: it seems like just wishful thinking to imagine that
src/test/examples/ can serve as a regression test for this.  Nor is the
proposed program very useful as an example.  I'd drop that and find a
way to have an actual, routinely-exercised-by-check-world regression
test.  If memory serves, pgbench can be cajoled into executing stuff
in extended query mode --- maybe we could create a test case using
that?

            regards, tom lane



Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
On 2019-Dec-05, Tom Lane wrote:

> Possibly a workable compromise is to emit the info as an error
> context line, appending it to whatever context exists today.
> The result might look like, say,
> 
> ERROR:  whatever
> CONTEXT: SQL function "foo"
> extended query with parameters x, y, ...

This is easy to accomodate -- just change the errdetail() to errcontext.
That makes the params be reported different than in the non-error case
(for duration).

> For extra credit maybe we could include the query's statement or
> portal name?
> 
>     errcontext("extended query \"%s\" with parameters %s", ...);

Sure.  Done in the attached.

> An independent point: it seems like just wishful thinking to imagine that
> src/test/examples/ can serve as a regression test for this.  Nor is the
> proposed program very useful as an example.  I'd drop that and find a
> way to have an actual, routinely-exercised-by-check-world regression
> test.  If memory serves, pgbench can be cajoled into executing stuff
> in extended query mode --- maybe we could create a test case using
> that?

I tried

pgbench -c1 -t1 -n -M prepared -f errparameters.bnch
with this file

  select '{ invalid ' as value \gset
  select column1::jsonb from (values (:value)) as q;

and got these lines:

2019-12-06 13:49:36.388 -03 [31028] ERROR:  invalid input syntax for type json
2019-12-06 13:49:36.388 -03 [31028] DETAIL:  Token "invalid" is invalid.
2019-12-06 13:49:36.388 -03 [31028] CONTEXT:  JSON data, line 1: { invalid...
        extended query with parameters: $1 = '{ invalid '
2019-12-06 13:49:36.388 -03 [31028] STATEMENT:  select column1::jsonb from (values ($1)) as q;

With this file,
  select '1' as one \gset
  SELECT 1 / (random() / 2)::int, :one::int, :two::int;

using the same pgbench invocation as above, I get this in the log:

2019-12-06 14:50:59.181 -03 [6187] ERROR:  division by zero
2019-12-06 14:50:59.181 -03 [6187] CONTEXT:  extended query with parameters: $1 = '1', $2 = NULL
2019-12-06 14:50:59.181 -03 [6187] STATEMENT:  SELECT 1 / (random() / 2)::int, $1::int, $2::int;

(While testing this I noticed that failing to strdup the text repr of
the datum when it's given as a text-format parameter results in bogus
output.  So I added a pstrdup there.)

(It seems a bit weird that if I assign NULL to :two pgbench stores the
empty string, instead of the NULL that I get as in the above which is
what happens when the variable is not defined at all.  That's probably a
bug in \gset ...)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: log bind parameter values on error

От
Pavel Stehule
Дата:


pá 6. 12. 2019 v 18:57 odesílatel Alvaro Herrera <alvherre@2ndquadrant.com> napsal:
On 2019-Dec-05, Tom Lane wrote:

> Possibly a workable compromise is to emit the info as an error
> context line, appending it to whatever context exists today.
> The result might look like, say,
>
> ERROR:  whatever
> CONTEXT: SQL function "foo"
> extended query with parameters x, y, ...

This is easy to accomodate -- just change the errdetail() to errcontext.
That makes the params be reported different than in the non-error case
(for duration).

> For extra credit maybe we could include the query's statement or
> portal name?
>
>       errcontext("extended query \"%s\" with parameters %s", ...);

Sure.  Done in the attached.

> An independent point: it seems like just wishful thinking to imagine that
> src/test/examples/ can serve as a regression test for this.  Nor is the
> proposed program very useful as an example.  I'd drop that and find a
> way to have an actual, routinely-exercised-by-check-world regression
> test.  If memory serves, pgbench can be cajoled into executing stuff
> in extended query mode --- maybe we could create a test case using
> that?

I tried

pgbench -c1 -t1 -n -M prepared -f errparameters.bnch
with this file

  select '{ invalid ' as value \gset
  select column1::jsonb from (values (:value)) as q;

and got these lines:

2019-12-06 13:49:36.388 -03 [31028] ERROR:  invalid input syntax for type json
2019-12-06 13:49:36.388 -03 [31028] DETAIL:  Token "invalid" is invalid.
2019-12-06 13:49:36.388 -03 [31028] CONTEXT:  JSON data, line 1: { invalid...
        extended query with parameters: $1 = '{ invalid '
2019-12-06 13:49:36.388 -03 [31028] STATEMENT:  select column1::jsonb from (values ($1)) as q;

With this file,
  select '1' as one \gset
  SELECT 1 / (random() / 2)::int, :one::int, :two::int;

using the same pgbench invocation as above, I get this in the log:

2019-12-06 14:50:59.181 -03 [6187] ERROR:  division by zero
2019-12-06 14:50:59.181 -03 [6187] CONTEXT:  extended query with parameters: $1 = '1', $2 = NULL
2019-12-06 14:50:59.181 -03 [6187] STATEMENT:  SELECT 1 / (random() / 2)::int, $1::int, $2::int;

(While testing this I noticed that failing to strdup the text repr of
the datum when it's given as a text-format parameter results in bogus
output.  So I added a pstrdup there.)

(It seems a bit weird that if I assign NULL to :two pgbench stores the
empty string, instead of the NULL that I get as in the above which is
what happens when the variable is not defined at all.  That's probably a
bug in \gset ...)

psql session variables cannot to hold NULL

Regards

Pavel

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
I added some tests to the pgbench suite in the attached.  I also finally
put it the business to limit the length of parameters reported.

I'd probably drop testlibpq5.c, since it seems a bit pointless now ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
On 2019-Dec-06, Pavel Stehule wrote:

> pá 6. 12. 2019 v 18:57 odesílatel Alvaro Herrera <alvherre@2ndquadrant.com>
> napsal:

> > (It seems a bit weird that if I assign NULL to :two pgbench stores the
> > empty string, instead of the NULL that I get as in the above which is
> > what happens when the variable is not defined at all.  That's probably a
> > bug in \gset ...)
> 
> psql session variables cannot to hold NULL

I also just learned that the :foo expansion in pgbench behaves crazily
when in normal query mode, particularly re. quotes in variables.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: log bind parameter values on error

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I added some tests to the pgbench suite in the attached.  I also finally
> put it the business to limit the length of parameters reported.
> I'd probably drop testlibpq5.c, since it seems a bit pointless now ...

I took a look through the v17 patches.

0001:

The two header comments for appendStringInfoStringQuoted really ought to
define the maxlen parameter, rather than making readers reverse-engineer
what that does.

I'm not very sold on the enlargeStringInfo call in the maxlen>0 code path,
mainly because its calculation of the needed space seems entirely wrong:
(a) it's considering maxlen not the actual length, and (b) on the other
hand it's not accounting for quotes and ellipses.  Forcing an inadequate
enlargement is probably worse than doing nothing, so I'd be inclined to
just drop that.

It is a very bad idea that this is truncating text without regard to
multibyte character boundaries.

0002:

Seems like BuildParamLogString's "valueLen" parameter ought to be called
"maxlen", for consistency with 0001 and because "valueLen" is at best
misleading about what the parameter means.

I'd toss the enlargeStringInfo call here too, as it seems overly
complicated and underly correct or useful.

Probably doing MemoryContextReset after each parameter (even the last one!)
is a net loss compared to just letting it go till the end.  Although
I'd be inclined to use ALLOCSET_DEFAULT_SIZES not SMALL_SIZES if you
do it like that.

Please do not throw away the existing comment "/* Free result of encoding
conversion, if any */" in exec_bind_message.

As above, this risks generating partial multibyte characters.  You might
be able to get away with letting appendStringInfoStringQuoted do the
multibyte-aware truncation, but you probably have to copy more than just
one more extra byte first.

I have zero faith in this:

+    params_errcxt.arg = (void *) &((ParamsErrorCbData)
+                                   { portal->name, params });

How do you know where the compiler is putting that value, ie what
its lifespan is going to be?  Better to use an explicit variable.

I concur with dropping testlibpq5.

            regards, tom lane



Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
On 2019-Dec-07, Tom Lane wrote:

> 0001:

> It is a very bad idea that this is truncating text without regard to
> multibyte character boundaries.

Hmm.  So it turns out that we recently made stringinfo.c exported to
frontends in libpgcommon.  Clipping strings at a given character size
means we need some facility like pg_encoding_mbcliplen.

However, 1) pgcommon does not have wchar.c (libpq does have it), and 2)
mbcliplen is not in wchar.c anyway.

I see four possible ways forward, with nuances.  In order of preference:

1. change enough of the build system so that pg_encoding_mbcliplen is
   available.  (Offhand I see no reason why we couldn't move the
   function from mbutils.c to wchar.c, but I haven't tried.)

2. in frontend, ignore the specified length; always copy the whole
   string.

3. don't allow length clipping in frontend.  (We have no use for it
   in the near future.)
3a. Throw a run-time error if it's requested.
3b. throw a compile-time error if requested

4. don't provide this new function #if FRONTEND

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
On 2019-Dec-09, Alvaro Herrera wrote:

> I see four possible ways forward, with nuances.  In order of preference:
> 
> 1. change enough of the build system so that pg_encoding_mbcliplen is
>    available.  (Offhand I see no reason why we couldn't move the
>    function from mbutils.c to wchar.c, but I haven't tried.)

Ah, this seems to require putting encnames.o and wchar.o in src/common's
Makefile and moving pg_encoding_mbcliplen() from mbutils.c to wchar.c.
Fairly simple.

(There's a little bit of additional fallout because libpq also has
encnames/wchar in its roster, but I think this means we can just remove
them from there and let them be acquired via libpgcommon.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: log bind parameter values on error

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2019-Dec-07, Tom Lane wrote:
>> It is a very bad idea that this is truncating text without regard to
>> multibyte character boundaries.

> I see four possible ways forward, with nuances.  In order of preference:

> 1. change enough of the build system so that pg_encoding_mbcliplen is
>    available.  (Offhand I see no reason why we couldn't move the
>    function from mbutils.c to wchar.c, but I haven't tried.)

I'd be in favor of this if it doesn't turn out to require nasty
contortions.  My gut feeling (like yours, without having looked) is that
the incremental amount of code to be moved into wchar.c wouldn't be much.

BTW, not really the fault of this patch, but I wonder if we shouldn't
make an effort to push the FRONTEND-available parts of wchar.c into
src/common, just to make the system structure and build rules less
confusing.  I think the current situation only exists because this
code predates our invention of src/common.

            regards, tom lane



Re: log bind parameter values on error

От
Tom Lane
Дата:
I wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> 1. change enough of the build system so that pg_encoding_mbcliplen is
>> available.  (Offhand I see no reason why we couldn't move the
>> function from mbutils.c to wchar.c, but I haven't tried.)

> I'd be in favor of this if it doesn't turn out to require nasty
> contortions.  My gut feeling (like yours, without having looked) is that
> the incremental amount of code to be moved into wchar.c wouldn't be much.

Actually, on second thought --- the issue here is not so much how much new
code shows up in libpgcommon, and more that executables using stringinfo.o
will now find themselves pulling in all of wchar.o, where before they
might've pulled in none of it.  We need to look at how much code bloat
ensues.  In the end it might be smart to put this new functionality in
some separate source file instead of dropping it into stringinfo.c.
(It could also be that all the executables that need stringinfo.o are
already pulling in wchar functionality for other reasons, but we should
check the code-size implications before assuming this is fine.)

            regards, tom lane



Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
On 2019-Dec-09, Tom Lane wrote:

> I wrote:
> > Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> >> 1. change enough of the build system so that pg_encoding_mbcliplen is
> >> available.  (Offhand I see no reason why we couldn't move the
> >> function from mbutils.c to wchar.c, but I haven't tried.)
> 
> > I'd be in favor of this if it doesn't turn out to require nasty
> > contortions.  My gut feeling (like yours, without having looked) is that
> > the incremental amount of code to be moved into wchar.c wouldn't be much.
> 
> Actually, on second thought --- the issue here is not so much how much new
> code shows up in libpgcommon, and more that executables using stringinfo.o
> will now find themselves pulling in all of wchar.o, where before they
> might've pulled in none of it.  We need to look at how much code bloat
> ensues.  In the end it might be smart to put this new functionality in
> some separate source file instead of dropping it into stringinfo.c.
> (It could also be that all the executables that need stringinfo.o are
> already pulling in wchar functionality for other reasons, but we should
> check the code-size implications before assuming this is fine.)

Hmm, that's a good thought.  On the one hand, nothing in the frontend
needs this new function ... the only current use is pg_waldump which
likely doesn't need to quote anything, and I just looked at Andres'
programmatically- written node support code which is why we have
StringInfo in frontend in the first place -- it also doesn't care.

So rather than mess with stringinfo.c at all I could just create
stringinfo_server.c and put this function there, compiled only for
backend ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: log bind parameter values on error

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> So rather than mess with stringinfo.c at all I could just create
> stringinfo_server.c and put this function there, compiled only for
> backend ...

Good point: if we make a separate source file then we don't have
to solve any of the code-movement issues till somebody wants this
functionality in frontend.  But we should expect that that day might
come eventually, so I don't much like "stringinfo_server.c" as the
file name.  It'll look pretty silly once we start compiling it for
frontend.  Perhaps "appendquoted.c" or some such?

            regards, tom lane



Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
On 2019-Dec-09, Tom Lane wrote:

> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > So rather than mess with stringinfo.c at all I could just create
> > stringinfo_server.c and put this function there, compiled only for
> > backend ...
> 
> Good point: if we make a separate source file then we don't have
> to solve any of the code-movement issues till somebody wants this
> functionality in frontend.  But we should expect that that day might
> come eventually, so I don't much like "stringinfo_server.c" as the
> file name.  It'll look pretty silly once we start compiling it for
> frontend.  Perhaps "appendquoted.c" or some such?

Okay, so here are two patches.  I had already used the name
stringinfo_mb.c, so that's what's v19.  (I'm fine with renaming it to
appendquoted.c but we might gain other such functions in the future.
Other opinions?)

The other patch (v18) puts the new function together with moving wchar
to pgcommon.

The API is different in each case: if we want it available in frontend,
we need to pass the encoding as a parameter rather than use
GetDatabaseEncoding().

This is pg_waldump with the first patch (src/utils/mb/stringinfo_mb.c):

$ nm --print-size  /pgsql/install/master/bin/pg_waldump | grep -i stringinfo
000000000000a8f0 0000000000000060 T appendBinaryStringInfo
000000000000a980 0000000000000051 T appendBinaryStringInfoNT
000000000000a770 00000000000000d7 T appendStringInfo
000000000000a850 0000000000000050 T appendStringInfoChar
000000000000a8a0 000000000000004d T appendStringInfoSpaces
000000000000a950 0000000000000027 T appendStringInfoString
000000000000a630 0000000000000083 T appendStringInfoVA
000000000000a6c0 00000000000000af T enlargeStringInfo
000000000000a5e0 000000000000002b T initStringInfo
000000000000a5a0 0000000000000038 T makeStringInfo
000000000000a610 0000000000000015 T resetStringInfo

$ ls -l  /pgsql/install/master/bin/pg_waldump 
-rwxr-xr-x 1 alvherre alvherre 647576 dic  9 16:23 /pgsql/install/master/bin/pg_waldump*


This is with v18:

$ nm --print-size  /pgsql/install/master/bin/pg_waldump | grep -i stringinfo
000000000000c8f0 0000000000000060 T appendBinaryStringInfo
000000000000c980 0000000000000051 T appendBinaryStringInfoNT
000000000000c770 00000000000000d7 T appendStringInfo
000000000000c850 0000000000000050 T appendStringInfoChar
000000000000c8a0 000000000000004d T appendStringInfoSpaces
000000000000c950 0000000000000027 T appendStringInfoString
000000000000c9e0 00000000000001b1 T appendStringInfoStringQuoted
000000000000c630 0000000000000083 T appendStringInfoVA
000000000000c6c0 00000000000000af T enlargeStringInfo
000000000000c5e0 000000000000002b T initStringInfo
000000000000c5a0 0000000000000038 T makeStringInfo
000000000000c610 0000000000000015 T resetStringInfo

$ ls -l  /pgsql/install/master/bin/pg_waldump 
-rwxr-xr-x 1 alvherre alvherre 704808 dic  9 16:29 /pgsql/install/master/bin/pg_waldump*

While the function itself is tiny (though it's the largest of all
stringinfo functions, hmm), it has led to a rather huge bloating of the
binary.  That's because the new binary contains a number of functions
from wchar.c, like you said, such as

$ nm --print-size  /pgsql/install/master/bin/pg_waldump | grep dsplen
000000000000d820 0000000000000026 t pg_ascii_dsplen
000000000000e200 0000000000000005 t pg_big5_dsplen
000000000000eb30 0000000000000030 T pg_encoding_dsplen
000000000000dac0 000000000000002e t pg_euccn_dsplen
000000000000d960 0000000000000036 t pg_eucjp_dsplen
000000000000d9b0 000000000000002e t pg_euckr_dsplen
000000000000dc10 000000000000002e t pg_euctw_dsplen
000000000000e230 0000000000000005 t pg_gb18030_dsplen
000000000000e210 0000000000000005 t pg_gbk_dsplen
000000000000dd10 0000000000000005 t pg_johab_dsplen
000000000000e170 0000000000000026 t pg_latin1_dsplen
000000000000e0f0 0000000000000034 t pg_mule_dsplen
000000000000e1c0 0000000000000036 t pg_sjis_dsplen
000000000000e220 0000000000000005 t pg_uhc_dsplen
000000000000e870 000000000000013e t pg_utf_dsplen

and some others.

All in all, it seems like v19 is the way to go.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
Also:

* Thoughts from Andres, who was busily messing about with stringinfo.c
  on his patch series for programmatic out/read node functions?

* Thoughts on changing the current usage of the ...Quoted() function in
  postgres.c and pl_exec.c so that they print N characters (64) instead
  of the current default of printing everything?  I'm up for changing,
  but have got no +1s on that.  I think bloating log files with enormous
  parameters is pointless.

* v18 and v19 now alwys do a "strlen(s)", i.e. they scan the whole input
  string -- pointless when maxlen is given.  We could avoid that for
  very large input strings by doing strnlen(maxlen + MAX_MULTIBYTE_CHAR_LEN)
  so that we capture our input string plus one additional multibyte
  char.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: log bind parameter values on error

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2019-Dec-09, Tom Lane wrote:
>> Good point: if we make a separate source file then we don't have
>> to solve any of the code-movement issues till somebody wants this
>> functionality in frontend.  But we should expect that that day might
>> come eventually, so I don't much like "stringinfo_server.c" as the
>> file name.  It'll look pretty silly once we start compiling it for
>> frontend.  Perhaps "appendquoted.c" or some such?

> Okay, so here are two patches.  I had already used the name
> stringinfo_mb.c, so that's what's v19.  (I'm fine with renaming it to
> appendquoted.c but we might gain other such functions in the future.
> Other opinions?)

"stringinfo_mb.c" is fine with me.

            regards, tom lane



Re: log bind parameter values on error

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> * Thoughts on changing the current usage of the ...Quoted() function in
>   postgres.c and pl_exec.c so that they print N characters (64) instead
>   of the current default of printing everything?  I'm up for changing,
>   but have got no +1s on that.  I think bloating log files with enormous
>   parameters is pointless.

Meh ... people will inevitably complain that they needed to see the
whole value, and we'll end up having to add another configuration
variable.  Let's not go there just yet.

            regards, tom lane



Re: log bind parameter values on error

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Also:
> * v18 and v19 now alwys do a "strlen(s)", i.e. they scan the whole input
>   string -- pointless when maxlen is given.  We could avoid that for
>   very large input strings by doing strnlen(maxlen + MAX_MULTIBYTE_CHAR_LEN)
>   so that we capture our input string plus one additional multibyte
>   char.

BTW, as far as that goes, it seems to me this patch is already suffering
from a lot of premature micro-optimization.  Is there even any evidence
to justify that complicated chunk-at-a-time copying strategy, rather than
doing quote-doubling the same way we do it everywhere else?  The fact that
that effectively scans the input string twice means that it's not an
ironclad win compared to the naive way, and it seems like it could lose
significantly for a case with lots of quote marks.  Moreover, for the
lengths of strings I expect we're mostly dealing with here, it would be
impossible to measure any savings even assuming there is some.  If I were
the committer I think I'd just flush that and do it the same way as we
do it in existing code.

            regards, tom lane



Re: log bind parameter values on error

От
Andres Freund
Дата:
Hi,

On 2019-12-09 17:05:31 -0300, Alvaro Herrera wrote:
> * Thoughts from Andres, who was busily messing about with stringinfo.c
>   on his patch series for programmatic out/read node functions?

I don't immediately see a need for mb functionality there. But to me it
seems pretty clear that we're going to need mb functionality in more
frontend programs over time. The only frontend part needing stringinfo
there is the metadata generation tool, and that doesn't care about
anything like this. The node functions themselves are obviously still
backend only.

I'm not even sure that it's worth caring over the binary size this way,
tbh. If somebody is actually concerned about that, they ought to use LTO
+ -Os, leading to unreferenced functions (and even unreferenced *parts*
of functions) getting completely thrown out.

I'd also assume that size number you cited earlier contains debug
information etc? Because after stripping / without debug information, I
get like 110k/97k for assert/optimize builds of waldump from master?
130k/113k for v18. And 110k/97k for v19.

Greetings,

Andres Freund



Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
On 2019-Dec-09, Tom Lane wrote:

> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Also:
> > * v18 and v19 now alwys do a "strlen(s)", i.e. they scan the whole input
> >   string -- pointless when maxlen is given.  We could avoid that for
> >   very large input strings by doing strnlen(maxlen + MAX_MULTIBYTE_CHAR_LEN)
> >   so that we capture our input string plus one additional multibyte
> >   char.
> 
> BTW, as far as that goes, it seems to me this patch is already suffering
> from a lot of premature micro-optimization.  Is there even any evidence
> to justify that complicated chunk-at-a-time copying strategy, rather than
> doing quote-doubling the same way we do it everywhere else?  The fact that
> that effectively scans the input string twice means that it's not an
> ironclad win compared to the naive way, and it seems like it could lose
> significantly for a case with lots of quote marks.  Moreover, for the
> lengths of strings I expect we're mostly dealing with here, it would be
> impossible to measure any savings even assuming there is some.  If I were
> the committer I think I'd just flush that and do it the same way as we
> do it in existing code.

It's been us committers (Andres and I) that have derailed this patch
with some apparently overcomplicated logic.  There was nothing of that
stuff in the original.  See Andres review comments
https://postgr.es/m/20190920203905.xkv5udsd5dxfs6tr@alap3.anarazel.de
There's also a comment nearby about growing the stringinfo to a
reasonable guess of the final size, rather than letting it grow
normally, to avoid multiple reallocs.  That's why we had all that stuff
there.  He has looked at execution profiles much more than I have, so I
take his word from it.

The stuff about truncating to N chars was of my own devising.  If we
don't want truncation in log_parameters_on_error either, we could do
away with the stringinfo changes altogether.  (I stand by my opinion
that we should truncate, but if there are contrary votes I'm happy to
stand down and just get the suggested feature pushed.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: log bind parameter values on error

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> The API is different in each case: if we want it available in frontend,
> we need to pass the encoding as a parameter rather than use
> GetDatabaseEncoding().

Hm, yeah, so that's another reason we aren't going to be sharing this
code exactly with frontend.  Given that, dropping it under utils/mb
seems fine for now.  (I'm still wondering how much of utils/mb could
be pushed over to src/common, but that seems like an independent
project.)

Some quick review of v19:

* If it's going into utils/mb, then you should update utils/mb/README.
(Or else we should get rid of that file's explanation of all the .c
files in its directory.)

* The file header comment for stringinfo_mb.c should perhaps explain
why we didn't just drop this into stringinfo.c, along the lines
of "This is separate from stringinfo.c so that frontend users of that
file need not pull in a bunch of multibyte encoding support."

* The file header comment for stringinfo_mb.c gives its path incorrectly.

* What's the need for including utils/memutils.h here?

* Still no documentation of maxlen in the .c file, and the docs added
to the .h file are confused (what "given encoding"? also it should say
"bytes" not "characters").

* This patch shouldn't be touching stringinfo.c at all anymore.

* I'm kind of dubious that stringinfo.h is where to put the extern
either; with where we're at now, perhaps pg_wchar.h is better.

            regards, tom lane



Re: log bind parameter values on error

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> The stuff about truncating to N chars was of my own devising.  If we
> don't want truncation in log_parameters_on_error either, we could do
> away with the stringinfo changes altogether.  (I stand by my opinion
> that we should truncate, but if there are contrary votes I'm happy to
> stand down and just get the suggested feature pushed.)

I think it's a reasonable thing to do, modulo my fear that people
are going to insist on having a way to configure the limit.

            regards, tom lane



Re: log bind parameter values on error

От
Greg Stark
Дата:
On Mon, 9 Dec 2019 at 15:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Meh ... people will inevitably complain that they needed to see the
> whole value, and we'll end up having to add another configuration
> variable.  Let's not go there just yet.

I haven't been following this whole thread but my initial reaction is
that this particular configuration parameter would actually carry it's
weight.

While some people have applications where they know the expected size
of the data and can safely log the entire data to the logs other
people deal with user-supplied data that can be very large. It would
suck to have something like Gitlab dump entire merge diffs to the log
and those aren't even very large, probably under a megabyte. Some
users will have individual data that are up to 1GB....

-- 
greg



Re: log bind parameter values on error

От
Tom Lane
Дата:
Greg Stark <stark@mit.edu> writes:
> On Mon, 9 Dec 2019 at 15:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Meh ... people will inevitably complain that they needed to see the
>> whole value, and we'll end up having to add another configuration
>> variable.  Let's not go there just yet.

> I haven't been following this whole thread but my initial reaction is
> that this particular configuration parameter would actually carry it's
> weight.

Possibly so.  I was mainly objecting to changing existing behaviors
without offering any configuration recourse for getting back the
existing behavior.

Although it would be sensible to allow logging of parameter values
to be controlled by a new GUC, it's less clear to me that the same GUC
ought to control what plpgsql's language feature print_strict_params
does.  So there would be room for discussion about that even if we
agreed on making this configurable.

            regards, tom lane



Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
On 2019-Dec-09, Tom Lane wrote:

> Some quick review of v19:

Here's v20 with all these comments hopefully addressed.


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
On 2019-Dec-10, Alvaro Herrera wrote:

> On 2019-Dec-09, Tom Lane wrote:
> 
> > Some quick review of v19:
> 
> Here's v20 with all these comments hopefully addressed.

Grr, I had forgotten to git add the stringinfo.h -> pg_wchar.h changes,
so the prototype isn't anywhere in v20.  However, after looking at it
again, I'm not very happy with how that turned out, because pg_wchar.h
is a frontend-exposed header.  So I instead propose to put it in a
separate file src/include/mb/stringinfo_mb.h.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
On 2019-Dec-07, Tom Lane wrote:

> 0002:

Here's a version of this part with fixes for these comments.  It applies
on top of the stringinfo_mb.c patch sent elsewhere in the thread.

(If we were to add a "log_parameters_on_error_max_length" GUC to decide
the length to log, we would get rid of the remaining magical numbers in
this code).


> Seems like BuildParamLogString's "valueLen" parameter ought to be called
> "maxlen", for consistency with 0001 and because "valueLen" is at best
> misleading about what the parameter means.
> 
> I'd toss the enlargeStringInfo call here too, as it seems overly
> complicated and underly correct or useful.
> 
> Probably doing MemoryContextReset after each parameter (even the last one!)
> is a net loss compared to just letting it go till the end.  Although
> I'd be inclined to use ALLOCSET_DEFAULT_SIZES not SMALL_SIZES if you
> do it like that.
> 
> Please do not throw away the existing comment "/* Free result of encoding
> conversion, if any */" in exec_bind_message.
> 
> As above, this risks generating partial multibyte characters.  You might
> be able to get away with letting appendStringInfoStringQuoted do the
> multibyte-aware truncation, but you probably have to copy more than just
> one more extra byte first.
> 
> I have zero faith in this:
> 
> +    params_errcxt.arg = (void *) &((ParamsErrorCbData)
> +                                   { portal->name, params });
> 
> How do you know where the compiler is putting that value, ie what
> its lifespan is going to be?  Better to use an explicit variable.
> 
> I concur with dropping testlibpq5.
> 
>             regards, tom lane


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
On 2019-Dec-10, Alvaro Herrera wrote:

> On 2019-Dec-10, Alvaro Herrera wrote:
> 
> > On 2019-Dec-09, Tom Lane wrote:
> > 
> > > Some quick review of v19:
> > 
> > Here's v20 with all these comments hopefully addressed.
> 
> Grr, I had forgotten to git add the stringinfo.h -> pg_wchar.h changes,
> so the prototype isn't anywhere in v20.  However, after looking at it
> again, I'm not very happy with how that turned out, because pg_wchar.h
> is a frontend-exposed header.  So I instead propose to put it in a
> separate file src/include/mb/stringinfo_mb.h.

Pushed 0001.

Thanks for all the reviews

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
Here's a curious thing that happens with this patch.  If you have
log_duration set so that parameters are logged during the bind phase,
and then an error occurs during the execution phase but you don't have
log_parameters_on_error set true, the second error will log the
parameters nonetheless ... because they were saved in the ParamListInfo
struct by the errdetail_params() call in the check_log_durations block
during bind.

I'm not sure this is a terrible problem, but does anybody think we
*not* save params->paramValuesStr across a log_duration when
log_parameters_on_error is not set?

(I think this is somewhat equivalent to the "was_logged" case in
check_log_duration itself.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
Alexey: I would appreciate it if you give this patch a spin.  Let me
know if it does what you wanted it to do.

On 2019-Dec-10, Alvaro Herrera wrote:

> Here's a curious thing that happens with this patch.  If you have
> log_duration set so that parameters are logged during the bind phase,
> and then an error occurs during the execution phase but you don't have
> log_parameters_on_error set true, the second error will log the
> parameters nonetheless ... because they were saved in the ParamListInfo
> struct by the errdetail_params() call in the check_log_durations block
> during bind.

AFAICS this has a simple solution, which is to stop saving the parameter
string in BuildParamLogString; instead, just *return* the string.
Caller can then assign it into params->paramValuesStr if appropriate
(which, in the case of errdetail_params(), it is not.)  v22 does it like
that.

There is still a much smaller issue that if you have both log_durations
set to log the params during bind, and log_parameters_on_error to true,
the parameters will appear twice.  But I think that's correct by
definition.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: log bind parameter values on error

От
Andres Freund
Дата:
Hi,

I see that you pushed this patch. I'm unfortunately unhappy with the
approach taken.  As previously said, handling a lot of this from exec_*
is a mistake in my opinion. Pretty much the same problem exists for
parametrized query execution from other contexts, e.g. for queries
executed inside plpgsql. By putting the responsibility to manage error
contexts etc from with exec_*, we'd need to add a copy of that to pretty
much every place executing queries. This should all be in the portal
code, with an optimization for computing the parameter values from
strings inside postgres.c, when the input is text.

Greetings,

Andres Freund



Re: log bind parameter values on error

От
Alvaro Herrera
Дата:
Hello

On 2019-Dec-12, Andres Freund wrote:

> I see that you pushed this patch.

Yeah, a version of it -- significantly different from what Alexey
submitted last.

> I'm unfortunately unhappy with the
> approach taken.  As previously said, handling a lot of this from exec_*
> is a mistake in my opinion. Pretty much the same problem exists for
> parametrized query execution from other contexts, e.g. for queries
> executed inside plpgsql. By putting the responsibility to manage error
> contexts etc from with exec_*, we'd need to add a copy of that to pretty
> much every place executing queries. This should all be in the portal
> code, with an optimization for computing the parameter values from
> strings inside postgres.c, when the input is text.

Hmm.  I think there are two pieces of interest for this feature.  One
saves a text version of the params; the other is setting the error
context that reports them.  I agree that possibly it would work to have
PortalStart be in charge of the former.

However, AFAICS the error context must be set up in each place that's
executing the query; I don't think it would work to try to hack it in
SPI_cursor_open_internal, for example which is what plpgsql uses; rather
it has to be in plpgsql/pl_exec.c itself.  (For example, you cannot even
put it directly in exec_dynquery_with_params -- it has to be in the
callers of that routine, and surround all code until after
exec_for_query.)

I think the current state of affairs is already a good improvement over
what was there before.  Can it be improved?  Sure.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services