Обсуждение: log bind parameter values on error
Hello, I'd like to propose a patch to log bind parameter values not only when logging duration, but also on error (timeout in particular) or in whatever situation the statement normally gets logged. This mostly could be useful when the request originator doesn't log them either, so it's hard to reproduce the problem. Unfortunately, when enabled, the feature comes with some memory and CPU overhead, as we cannot convert certain values to text when in aborted transaction. We potentially could do the trick with built-in types, but it would need cautious work with composite types, and also require more computation on the logging stage, which is a risk of cascading errors. Custom types still wouldn't be loggable, even as passed by client, which would be not great. So I decided to cache textual representations on bind stage, which is especially easy if the client uses text protocol. Best, Alexey
Вложения
On 15/12/2018 00:04, Alexey Bashtanov wrote: > I'd like to propose a patch to log bind parameter values not only when > logging duration, > but also on error (timeout in particular) or in whatever situation the > statement normally gets logged. > This mostly could be useful when the request originator doesn't log them > either, so it's hard > to reproduce the problem. That's a reasonable problem to solve. > So I decided to cache textual representations on bind stage, > which is especially easy if the client uses text protocol. That sounds like a plausible approach. Have you done any performance measurements? In your patch, I would organize the changes to the portal API a bit differently. Don't change the signature of CreatePortal(). Get rid of PortalSetCurrentTop() and PortalClearCurrentTop(). Just have a global variable CurrentPortal and set it directly. And then change GetCurrentPortalBindParameters() to take a Portal as argument. This can all happen inside postgres.c without changing the Portal APIs. In fact, maybe don't use the Portal structure at all and just store the saved textualized values inside postgres.c in a static variable. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> That sounds like a plausible approach. Have you done any performance > measurements? No I haven't yet > In your patch, I would organize the changes to the portal API a bit > differently. Don't change the signature of CreatePortal(). okay > Get rid of PortalSetCurrentTop() and PortalClearCurrentTop(). I'll remove them from Portal API, but possibly have them in postgres.c if you don't mind, just to avoid code duplication. Renamed and maybe even inlined. > Just have a global > variable CurrentPortal and set it directly. And then change > GetCurrentPortalBindParameters() to take a Portal as argument. This can > all happen inside postgres.c without changing the Portal APIs. Okay, will do > In fact, maybe don't use the Portal structure at all and just store the > saved textualized values inside postgres.c in a static variable. Unlike SQL, parameters may spend much more memory, so I'd have them in portal memory context to make sure the memory is released earlier rather than later. Tracking individual variable lifetime like we do with debug_query_string sounds doable but a bit non-straightforward to me, see e.g. the tricks we do with transaction commands. Also, I'd like to avoid early forming of the error string, as we may need to combine them differently in future, e.g. when logging in various logging formats or languages. One-by-one pfree-ing doesn't look tempting either. Do you think it would be acceptable to leave them cached in parameters structure? Best, Alex
On 02/01/2019 23:53, Alexey Bashtanov wrote: >> In fact, maybe don't use the Portal structure at all and just store the >> saved textualized values inside postgres.c in a static variable. > > Unlike SQL, parameters may spend much more memory, so I'd have them > in portal memory context to make sure the memory is released earlier > rather than later. Having them in the portal structure is different from having it in the portal memory context. Although there is perhaps value in keeping them together. > Do you think it would be acceptable to leave them cached in parameters > structure? Let's see how it looks. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Peter, >> Unlike SQL, parameters may spend much more memory, so I'd have them >> in portal memory context to make sure the memory is released earlier >> rather than later. > Having them in the portal structure is different from having it in the > portal memory context. Although there is perhaps value in keeping them > together. yeah, to avoid pointers to deallocated areas > Let's see how it looks. > please see attached Best, Alex
Вложения
> please see attached sorry, some unintended changes sneaked in, please see the corrected patch
Вложения
There appears to be a problem with how this affects current logging behavior. I'm using pgbench -M extended -S -T 10 bench to test the extended protocol. Unpatched, with log_statement=all, you get something like LOG: execute <unnamed>: SELECT abalance FROM pgbench_accounts WHERE aid = $1; DETAIL: parameters: $1 = '30223' With your patch, with log_statement=all and log_parameters=on, you get the same, but with log_statement=all and log_parameters=off you get LOG: execute <unnamed>: SELECT abalance FROM pgbench_accounts WHERE aid = $1; DETAIL: parameters: $1 = UNKNOWN TYPE We should probably keep the existing parameter logging working as before. This also raises the question of the new parameter name. Parameters are already logged. So the name should perhaps be more like log_parameters_on_error. Some API notes on your patch: I think you can change get_portal_bind_parameters() to take a ParamListInfo, since you're not doing anything with the Portal other than grab the parameters. And that would allow you to keep the signature of errdetail_params() unchanged. I did some performance tests using the commands shown above and didn't find any problems. Obviously the default pgbench stuff isn't very parameter-intensive. Do you have tests with more and larger parameter values? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Peter, > With your patch, with log_statement=all and log_parameters=on, you get > the same, but with log_statement=all and log_parameters=off you get > > LOG: execute <unnamed>: SELECT abalance FROM pgbench_accounts WHERE aid > = $1; > DETAIL: parameters: $1 = UNKNOWN TYPE Thanks for spotting this, I've fixed it, see the new patch attached. > This also raises the question of the new parameter name. Parameters are > already logged. So the name should perhaps be more like > log_parameters_on_error. Done > Some API notes on your patch: I think you can change > get_portal_bind_parameters() to take a ParamListInfo, since you're not > doing anything with the Portal other than grab the parameters. And that > would allow you to keep the signature of errdetail_params() unchanged. Done > I did some performance tests using the commands shown above and didn't > find any problems. Obviously the default pgbench stuff isn't very > parameter-intensive. Do you have tests with more and larger parameter > values? > I've done some tests, but they are not very reproducible: the difference between runs is more than the difference between master vs feature branch and log_parameters_on_error on vs off. I was using a small java app, it tested the speed using only a single connection. See its code and the results attached. I'm sorry for the delay, feel free to move it to next commitfest if needed. Best, Alex
Вложения
On Mon, Jan 28, 2019 at 12:15:51AM +0000, Alexey Bashtanov wrote: > I'm sorry for the delay, feel free to move it to next commitfest if > needed. Done. -- Michael
Вложения
This piece in your patch probably doesn't belong: + elog(WARNING, "params->hasTextValues=%d, IsAbortedTransactionBlockState()=%d", + params->hasTextValues && IsAbortedTransactionBlockState()); -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2018-12-14 23:04:26 +0000, Alexey Bashtanov wrote: > Unfortunately, when enabled, the feature comes with some memory and CPU > overhead, > as we cannot convert certain values to text when in aborted transaction. Have you analyzed how invasive it'd be to delay that till we actually can safely start a [sub]transaction to do that logging? Probably too expensive, but it seems like something that ought to be analyzed. - Andres
Hi, tiny scroll-through review. On 2019-01-28 00:15:51 +0000, Alexey Bashtanov wrote: > diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml > index b6f5822..997e6e8 100644 > --- a/doc/src/sgml/config.sgml > +++ b/doc/src/sgml/config.sgml > @@ -6274,6 +6274,23 @@ log_line_prefix = '%m [%p] %q%u@%d/%a ' > </listitem> > </varlistentry> > > + <varlistentry id="guc-log-parameters-on-error" xreflabel="log_parameters_on_error"> > + <term><varname>log_parameters_on_error</varname> (<type>boolean</type>) > + <indexterm> > + <primary><varname>log_parameters_on_error</varname> configuration parameter</primary> > + </indexterm> > + </term> > + <listitem> > + <para> > + Controls whether the statement is logged with bind parameter values. > + It adds some overhead, as postgres will cache textual > + representations of parameter values in memory for all statements, > + even if they eventually do not get logged. The default is > + <literal>off</literal>. Only superusers can change this setting. > + </para> > + </listitem> > + </varlistentry> This needs a bit of language polishing. > @@ -31,6 +31,8 @@ > * set of parameter values. If dynamic parameter hooks are present, we > * intentionally do not copy them into the result. Rather, we forcibly > * instantiate all available parameter values and copy the datum values. > + * > + * We don't copy textual representations here. > */ That probably needs to be expanded on. Including a comment about what guarantees that there are no memory lifetime issues. > - /* Free result of encoding conversion, if any */ > - if (pstring && pstring != pbuf.data) > - pfree(pstring); > + if (pstring) > + { > + if (need_text_values) > + { > + if (pstring == pbuf.data) > + { > + /* > + * Copy textual representation to portal context. > + */ > + params->params[paramno].textValue = > + pstrdup(pstring); > + } > + else > + { > + /* Reuse the result of encoding conversion for it */ > + params->params[paramno].textValue = pstring; > + } > + } > + else > + { > + /* Free result of encoding conversion */ > + if (pstring != pbuf.data) > + pfree(pstring); > + } > + } So the parameters we log here haven't actually gone through the output function? Isn't that an issue? I mean that'll cause the contents to differ from the non-error case, no? And differs from the binary protocol case? > else > { > + /* > + * We do it for non-xact commands only, as an xact command > + * 1) shouldn't have any parameters to log; > + * 2) may have the portal dropped early. > + */ > + Assert(current_top_portal == NULL); > + current_top_portal = portal; > + portalParams = NULL; > + "it"? ISTM the comment doesn't really stand on its own? > +/* > + * get_portal_bind_parameters > + * Get the string containing parameters data, is used for logging. > + * > + * Can return NULL if there are no parameters in the portal > + * or the portal is not valid, or the text representations of the parameters are > + * not available. If returning a non-NULL value, it allocates memory > + * for the returned string in the current context, and it's the caller's > + * responsibility to pfree() it if needed. > + */ > +char * > +get_portal_bind_parameters(ParamListInfo params) > +{ > + StringInfoData param_str; > + > + /* No parameters to format */ > + if (!params || params->numParams == 0) > + return NULL; > + > + elog(WARNING, "params->hasTextValues=%d, IsAbortedTransactionBlockState()=%d", > + params->hasTextValues && IsAbortedTransactionBlockState()); Err, huh? Greetings, Andres Freund
Hello Anders and Peter, Thanks for your messages. Please see the new patch version attached. > Have you analyzed how invasive it'd be to delay that till we actually > can safely start a [sub]transaction to do that logging? Probably too > expensive, but it seems like something that ought to be analyzed. The fundamental problem here is that the output functions for the types of the values to be logged may be present only in the transaction that has just been aborted. Also I don't like the idea of doing complex operations in the error handler, as it increases the chances of cascading errors. I thought of pre-calculating types' output functions instead of their results, but that would work for certain built-in types only. >> + <varlistentry id="guc-log-parameters-on-error" xreflabel="log_parameters_on_error"> >> + <term><varname>log_parameters_on_error</varname> (<type>boolean</type>) >> + <indexterm> >> + <primary><varname>log_parameters_on_error</varname> configuration parameter</primary> >> + </indexterm> >> + </term> >> + <listitem> >> + <para> >> + Controls whether the statement is logged with bind parameter values. >> + It adds some overhead, as postgres will cache textual >> + representations of parameter values in memory for all statements, >> + even if they eventually do not get logged. The default is >> + <literal>off</literal>. Only superusers can change this setting. >> + </para> >> + </listitem> >> + </varlistentry> > This needs a bit of language polishing. I would appreciate if you had any suggestions, as my English isn't great. >> @@ -31,6 +31,8 @@ >> * set of parameter values. If dynamic parameter hooks are present, we >> * intentionally do not copy them into the result. Rather, we forcibly >> * instantiate all available parameter values and copy the datum values. >> + * >> + * We don't copy textual representations here. >> */ > That probably needs to be expanded on. Including a comment about what > guarantees that there are no memory lifetime issues. What kind of memory lifetime issues do you mean? We're not copying textual representations, so the worst can happen is they don't get logged when appropriate. Luckily, we never use this function when copying to a portal we use for logging, I added this to the comment. Do you think it's any better? > > So the parameters we log here haven't actually gone through the output > function? Isn't that an issue? I mean that'll cause the contents to > differ from the non-error case, no? And differs from the binary protocol > case? I don't think it's much of a problem. Text input and output functions are meant to match, but the CREATE TYPE spec isn't too specific about what it means. Of course it does not mean that typoutput(typinput(foo)) is always exactly foo. However, I really hope that at least typinput(typoutput(foo)) = foo, where "=" is the correspondent operator registered in postgres. If a cheeky client passes '007' as a numeric value I don't mind it being sometimes logged as '007' and sometimes as '7', depending on the settings. It anyway denotes the same number, and we'll know what to pass to reproduce the problem. For binary protocol it'll be '7' as well, as it'll undergo the typrecv and then typoutput procedures. >> else >> { >> + /* >> + * We do it for non-xact commands only, as an xact command >> + * 1) shouldn't have any parameters to log; >> + * 2) may have the portal dropped early. >> + */ >> + Assert(current_top_portal == NULL); >> + current_top_portal = portal; >> + portalParams = NULL; >> + > "it"? ISTM the comment doesn't really stand on its own? Thanks, I fixed the comment and some code around it. >> +/* >> + * get_portal_bind_parameters >> + * Get the string containing parameters data, is used for logging. >> + * >> + * Can return NULL if there are no parameters in the portal >> + * or the portal is not valid, or the text representations of the parameters are >> + * not available. If returning a non-NULL value, it allocates memory >> + * for the returned string in the current context, and it's the caller's >> + * responsibility to pfree() it if needed. >> + */ >> +char * >> +get_portal_bind_parameters(ParamListInfo params) >> +{ >> + StringInfoData param_str; >> + >> + /* No parameters to format */ >> + if (!params || params->numParams == 0) >> + return NULL; >> + >> + elog(WARNING, "params->hasTextValues=%d, IsAbortedTransactionBlockState()=%d", >> + params->hasTextValues && IsAbortedTransactionBlockState()); > Err, huh? This was some debugging, I threw it away now. Best, Alex
Вложения
On 2019-02-15 15:02, Alexey Bashtanov wrote: > Hello Anders and Peter, > > Thanks for your messages. > Please see the new patch version attached. In my testing, I couldn't get this patch to do anything. Could you please share your testing steps? I did postgres -D data --log-parameters-on-error=on pgbench -i bench alter table pgbench_accounts alter column aid type smallint; pgbench -M extended -S -T 10 bench This will then error out on type overflows, but I don't see any parameters being logged: ERROR: value "62812" is out of range for type smallint STATEMENT: SELECT abalance FROM pgbench_accounts WHERE aid = $1; (In this case the error message contains the parameter value, so it's not a very practical case, but it should work, it seems.) Meanwhile, I have committed a patch that refactors the ParamListInfo initialization a bit, so you don't have to initialize hasTextValues in a bunch of places unrelated to your core code. So please rebase your patch on that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Alexey, On 3/14/19 12:38 PM, Peter Eisentraut wrote: > > Meanwhile, I have committed a patch that refactors the ParamListInfo > initialization a bit, so you don't have to initialize hasTextValues in a > bunch of places unrelated to your core code. So please rebase your > patch on that. It's been two weeks with no new patch or answers to Peter's questions. Since we are nearly at the end of the CF I'll target this patch for PG13 and mark it Returned with Feedback at the end of the CF if there's no new patch by then. Regards, -- -David david@pgmasters.net
Hello and sorry for weeks of silence.
Sure. Provided you're in the postgres checkout and you've run make in src/test/examples/ this should work
CREATE SCHEMA testlibpq3;
SET search_path = testlibpq3;
CREATE TABLE test1_(i int4, t text, b bytea);
INSERT INTO test1_ VALUES(0, '', '');
CREATE VIEW test1 AS SELECT 1/i i, t, b FROM test1_;
-- will log only statement
\! ./src/test/examples/testlibpq3
ALTER SYSTEM SET log_parameters_on_error TO on;
SELECT pg_reload_conf();
-- will log statement with parameters
\! ./src/test/examples/testlibpq3
That's why the error message came without parameters.
Please find rebased patch attached.
I apologize for no reply before: I first thought my patch was really faulty and knew I wouldn't have time to fix it these days, but it seems to me it actually works.
Anyway, I don't suppose you manage to review it within the remaining few days, so I'll rebase it again in the beginning of the next CF.
Best regards,
Alexey
Hello Anders and Peter, Thanks for your messages. Please see the new patch version attached.In my testing, I couldn't get this patch to do anything. Could you please share your testing steps?
Sure. Provided you're in the postgres checkout and you've run make in src/test/examples/ this should work
CREATE SCHEMA testlibpq3;
SET search_path = testlibpq3;
CREATE TABLE test1_(i int4, t text, b bytea);
INSERT INTO test1_ VALUES(0, '', '');
CREATE VIEW test1 AS SELECT 1/i i, t, b FROM test1_;
-- will log only statement
\! ./src/test/examples/testlibpq3
ALTER SYSTEM SET log_parameters_on_error TO on;
SELECT pg_reload_conf();
-- will log statement with parameters
\! ./src/test/examples/testlibpq3
I guess this error occurred while binding, so the parameters probably weren't yet all bound by the time of error reporting.I did postgres -D data --log-parameters-on-error=on pgbench -i bench alter table pgbench_accounts alter column aid type smallint; pgbench -M extended -S -T 10 bench This will then error out on type overflows, but I don't see any parameters being logged: ERROR: value "62812" is out of range for type smallint STATEMENT: SELECT abalance FROM pgbench_accounts WHERE aid = $1; (In this case the error message contains the parameter value, so it's not a very practical case, but it should work, it seems.)
That's why the error message came without parameters.
Meanwhile, I have committed a patch that refactors the ParamListInfo initialization a bit, so you don't have to initialize hasTextValues in a bunch of places unrelated to your core code. So please rebase your patch on that.
Please find rebased patch attached.
I apologize for no reply before: I first thought my patch was really faulty and knew I wouldn't have time to fix it these days, but it seems to me it actually works.
Anyway, I don't suppose you manage to review it within the remaining few days, so I'll rebase it again in the beginning of the next CF.
Best regards,
Alexey
Вложения
On 2019-03-29 15:55, Alexey Bashtanov wrote: >> ERROR: value "62812" is out of range for type smallint >> STATEMENT: SELECT abalance FROM pgbench_accounts WHERE aid = $1; >> >> (In this case the error message contains the parameter value, so it's >> not a very practical case, but it should work, it seems.) > I guess this error occurred /while/ binding, so the parameters probably > weren't yet all bound by the time of error reporting. > That's why the error message came without parameters. I see. But I think that could be fixed. Change exec_bind_message() to loop over the parameters twice: once to save them away, once to actually process them. I think the case of a faulty input value is probably very common, so it would be confusing if that didn't work. I think this patch needs some tests. Manually testing it is cumbersome, and as we are seeing now, it is not quite clear which cases it is supposed to cover. There are also additional cases for binary parameters, and there are additions to the CSV output format. We need tests for all that so the behavior explains itself and doesn't have to be rediscovered every time someone wants to look at this again. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 05/04/2019 12:23, Peter Eisentraut wrote: > On 2019-03-29 15:55, Alexey Bashtanov wrote: >>> ERROR: value "62812" is out of range for type smallint >>> STATEMENT: SELECT abalance FROM pgbench_accounts WHERE aid = $1; >>> >>> (In this case the error message contains the parameter value, so it's >>> not a very practical case, but it should work, it seems.) >> I guess this error occurred /while/ binding, so the parameters probably >> weren't yet all bound by the time of error reporting. >> That's why the error message came without parameters. > I see. But I think that could be fixed. Change exec_bind_message() to > loop over the parameters twice: once to save them away, once to actually > process them. I think the case of a faulty input value is probably very > common, so it would be confusing if that didn't work. I understand this may sound lazy of me, however let me take this risk and try to explain why I think logging them here in the same fashion would be inconsistent. The original feature is intended to log textual representations of the bind parameter values, whereas the problem suggested to be solved together with it is verbose logging of the part of the bind message that failed to get converted to a Datum. These are equivalent only under the following conditions: 1) the rest of the message was correctly formatted, so we can extract something for each of the parameter values 2) the values have been passed in text mode (all of them, not only the one failed to process) 3) the values have been passed in the server encoding I think it would be a bit of inconsistent to log the parts of the message only when we are lucky to satisfy the 3 conditions above. If we are to log bind parameters message parsing and processing problems more verbosely, what do you think of rather wrapping calls to OidInputFunctionCall, OidReceiveFunctionCall and pg_client_to_server into PG_TRY blocks and logging their arguments individually? In case it doesn't work for any reason, I have an alternative idea on how to log half-processed parameters. Instead of hasTextValues store the number of parameters having their textual representations successfully saved. This way we will be able, in case of text mode, save the original value and increment the counter before the call to OidInputFunctionCall. When logging and not having values for some parameters, we can print an ellipsis in the log to indicate there's some more of them missing. > I think this patch needs some tests. Manually testing it is cumbersome, > and as we are seeing now, it is not quite clear which cases it is > supposed to cover. There are also additional cases for binary > parameters, and there are additions to the CSV output format. We need > tests for all that so the behavior explains itself and doesn't have to > be rediscovered every time someone wants to look at this again. > I have added a section into src/test/examples/testlibpq3.c, please see attached. As far as I could see these tests never run on "make check" or "make installcheck", hence the README change. Please correct me if I'm wrong. I also failed to find an automatic way to test what actually gets logged to the server log, at least not in the installcheck case. I would appreciate if you had any suggestions about it. Best regards, Alex
Вложения
Please find the rebased patch attached. Tested like the following. Provided you're in the postgres checkout and you've run make in src/test/examples/ and connected to db=postgres: CREATE SCHEMA testlibpq3; SET search_path = testlibpq3; CREATE TABLE test1_(i int4, t text, b bytea); INSERT INTO test1_ VALUES(0, '', ''); CREATE VIEW test1 AS SELECT 1/i i, t, b FROM test1_; -- will log only statement \! ./src/test/examples/testlibpq3 ALTER SYSTEM SET log_parameters_on_error TO on; SELECT pg_reload_conf(); -- will log statement with parameters \! ./src/test/examples/testlibpq3 Best regards, Alexey
Вложения
Nice patch, thanks. I didn't like abusing testlibpq3.c for your new stuff, so I moved it off to a new file testlibpq5.c. I cleaned up a few other cosmetics things about this -- v10 attached. I eventually noticed that this patch fails to initialize each param's textValue to NULL, which probably explains why you have to be so careful about only setting hasTextValues after the whole loop. That seems a bit too trusting; I think it would be better to set all these to NULL in makeParamList instead of leaving the memory undefined. One way would be to have a for() look in makeParamList that nullifies the member; another would be to use palloc0(). A third possibility is to inspect each caller of makeParamList and have them all set textValue to NULL to each parameter. I'm marking this waiting on author. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
Hi, On 2019-09-18 17:58:53 -0300, Alvaro Herrera wrote: > --- a/doc/src/sgml/config.sgml > +++ b/doc/src/sgml/config.sgml > @@ -6414,6 +6414,23 @@ log_line_prefix = '%m [%p] %q%u@%d/%a ' > </listitem> > </varlistentry> > > + <varlistentry id="guc-log-parameters-on-error" xreflabel="log_parameters_on_error"> > + <term><varname>log_parameters_on_error</varname> (<type>boolean</type>) > + <indexterm> > + <primary><varname>log_parameters_on_error</varname> configuration parameter</primary> > + </indexterm> > + </term> > + <listitem> > + <para> > + Controls whether the statement is logged with bind parameter values. Trailing whitespace. I find the "the statement" formulation a bit odd, but can't quite put my finger on why. I think it might be worthwhile to cross-reference log_min_error_statement, as log_parameters_on_error will only take effect when the statement is logged due to log_min_error_statement. > + It adds some overhead, as postgres will cache textual > + representations of parameter values in memory for all statements, > + even if they eventually do not get logged. The default is > + <literal>off</literal>. Only superusers can change this setting. I don't think "cache" is the right descriptor. Usually caching implies trying to make repeated tasks faster, which isn't the case here. > +/* > + * The top-level portal that the client is immediately working with: > + * creating, binding, executing, or all at one using simple protocol > + */ > +Portal current_top_portal = NULL; > + This strikes me as decidedly not nice. For one this means that we'll not be able to use this infrastructure for binds that are done serverside, e.g. as part of plpgsql. I'm basically inclined to think that integrating this at the postges.c level is the wrong thing. It also adds new error handling complexity, which is already quite substantial. And spreads knowledge of portals to elog.c, which imo shouldn't have to know about them. It seems to me that this would really need to be tracked inside the portal infrastructure. To avoid unnecessary overhead, we could continue to set the text values in exec_bin_message() in the pformat == 0 case, using hasTextValues somewhere in the portal code to determine whether the text representation has to be computed (for other input format, and internal queries as e.g. generated by plpgsql). And then the PG_CATCHes in pquery.c can add the errdetail() in the error cases using an econtext callback. > +/* > + * get_portal_bind_params > + * Get a string containing parameters data -- used for logging. > + * > + * Can return NULL if there are no parameters in the portal or the portal is > + * not valid, or the text representations of the parameters are not available. > + * If returning a non-NULL value, it allocates memory for the returned string > + * in the current context, and it's the caller's responsibility to pfree() it. > + */ > +char * > +get_portal_bind_params(ParamListInfo params) > +{ > + StringInfoData param_str; > + > + /* No parameters to format */ > + if (!params || params->numParams == 0) > + return NULL; > + > + /* > + * We either need textual representation of parameters pre-calculated, or > + * call potentially user-defined I/O functions to convert the internal > + * representation into text, which cannot be done in an aborted xact > + */ > + if (!params->hasTextValues && IsAbortedTransactionBlockState()) > + return NULL; Maybe I'm naive, but what is the point of keeping the separate parameters, allocated separately, when all we're doing is building a string containing them all at error time? Seems better just directly form the full string when decideding to keep the text parameters - for one it'll often end up being more efficient. But more importantly it also makes it a lot less likely to run out of memory while handling the error. The individual text parameters can be large, and this will always additionally need at least the combined size of all parameters from within the error context. That's not great. > + appendStringInfoCharMacro(¶m_str, '\''); > + for (p = pstring; *p; p++) > + { > + if (*p == '\'') /* double single quotes */ > + appendStringInfoCharMacro(¶m_str, *p); > + appendStringInfoCharMacro(¶m_str, *p); > + } > + appendStringInfoCharMacro(¶m_str, '\''); I know this is old code, but: This is really inefficient. Will cause a lot of unnecessary memory-reallocations for large text outputs, because we don't immediately grow to at least close to the required size. Greetings, Andres Freund
Hi, thanks for looking. On 2019-Sep-20, Andres Freund wrote: > On 2019-09-18 17:58:53 -0300, Alvaro Herrera wrote: > > + <varlistentry id="guc-log-parameters-on-error" xreflabel="log_parameters_on_error"> > > + <term><varname>log_parameters_on_error</varname> (<type>boolean</type>) > > + <indexterm> > > + <primary><varname>log_parameters_on_error</varname> configuration parameter</primary> > > + </indexterm> > > + </term> > > + <listitem> > > + <para> > > + Controls whether the statement is logged with bind parameter values. > > Trailing whitespace. > > I find the "the statement" formulation a bit odd, but can't quite put my > finger on why. Yeah, I think that wording is pretty confusing. I would use "Controls whether bind parameters are logged when a statement is logged." > > +/* > > + * The top-level portal that the client is immediately working with: > > + * creating, binding, executing, or all at one using simple protocol > > + */ > > +Portal current_top_portal = NULL; > > + > > This strikes me as decidedly not nice. For one this means that we'll not > be able to use this infrastructure for binds that are done serverside, > e.g. as part of plpgsql. I'm basically inclined to think that > integrating this at the postges.c level is the wrong thing. [...] > It seems to me that this would really need to be tracked inside the > portal infrastructure. I think that's how this was done at first, then Peter E drove him away from that into the current design. > It also adds new error handling complexity, which is already quite > substantial. And spreads knowledge of portals to elog.c, which imo > shouldn't have to know about them. Makes sense. > > + appendStringInfoCharMacro(¶m_str, '\''); > > + for (p = pstring; *p; p++) > > + { > > + if (*p == '\'') /* double single quotes */ > > + appendStringInfoCharMacro(¶m_str, *p); > > + appendStringInfoCharMacro(¶m_str, *p); > > + } > > + appendStringInfoCharMacro(¶m_str, '\''); > > I know this is old code, but: This is really inefficient. Will cause a > lot of unnecessary memory-reallocations for large text outputs, because > we don't immediately grow to at least close to the required size. Agreed, but we can't blame a patch because it moves around some old crufty code. If Alexey wants to include another patch to change this to a better formulation, I'm happy to push that before or after his main patch. And if he doesn't want to, that's fine with me too. (Is doing a strlen first to enlarge the stringinfo an overall better approach?) (I wonder if it would make sense to strchr each ' and memcpy the intervening bytes ... if only that didn't break the SI abstraction completely ...) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2019-09-20 16:56:57 -0300, Alvaro Herrera wrote: > > > +/* > > > + * The top-level portal that the client is immediately working with: > > > + * creating, binding, executing, or all at one using simple protocol > > > + */ > > > +Portal current_top_portal = NULL; > > > + > > > > This strikes me as decidedly not nice. For one this means that we'll not > > be able to use this infrastructure for binds that are done serverside, > > e.g. as part of plpgsql. I'm basically inclined to think that > > integrating this at the postges.c level is the wrong thing. > [...] > > It seems to me that this would really need to be tracked inside the > > portal infrastructure. > > I think that's how this was done at first, then Peter E drove him away > from that into the current design. I don't think it really was the way I am suggesting. There were a bunch of helper functions managing current_top_portal, but it otherwise is (and afaict was in all versions) still all postgres.c controlled. Whereas I think it should be nearly exclusively be handled by pquery.c, with the possible exception of an efficiency hack to reuse client input string when they're in text format. What I'm suggesting is that PortalStart() would build a string representation out of the parameter list (using ParamExternData.textValue if set, calling the output function otherwise), and stash that in the portal. At the start of all the already existing PG_TRY blocks in pquery.c we push an element onto the error_context_stack that adds the errdetail with the parameters to the error. Alternatively we could also add them in in the PG_CATCH blocks, but that'd only work for elevel == ERROR (i.e. neither FATAL nor non throwing log levels would be able to enrich the error). Thinking about this: I think the current approach doesn't actually handle recursive errors correctly. Even if we fail to emit the error message due to the parameter details requiring a lot of memory, we'd again do so (and again fail) when handling that OOM error, because current_top_portal afaict would still be set. At the very least this'd need to integrate with the recursion_depth logic in elog.c. > > > + appendStringInfoCharMacro(¶m_str, '\''); > > > + for (p = pstring; *p; p++) > > > + { > > > + if (*p == '\'') /* double single quotes */ > > > + appendStringInfoCharMacro(¶m_str, *p); > > > + appendStringInfoCharMacro(¶m_str, *p); > > > + } > > > + appendStringInfoCharMacro(¶m_str, '\''); > > > > I know this is old code, but: This is really inefficient. Will cause a > > lot of unnecessary memory-reallocations for large text outputs, because > > we don't immediately grow to at least close to the required size. > > Agreed, but we can't blame a patch because it moves around some old > crufty code. If Alexey wants to include another patch to change this to > a better formulation, I'm happy to push that before or after his main > patch. And if he doesn't want to, that's fine with me too. Well, this patch makes it potentially a considerably hotter path, so I think there's some justification for pushing a bit. But I'd not require it either. As I said, I think we cannot generate this string at error time, because it makes it much much more likely to exhaust the error context - a bad thing. > (Is doing a strlen first to enlarge the stringinfo an overall better > approach?) Yes, it'd be better. > (I wonder if it would make sense to strchr each ' and memcpy the > intervening bytes ... if only that didn't break the SI abstraction > completely ...) I'd probably just count the ' in one pass, enlarge the stringinfo to the required size, and then put the string directly into he stringbuffer. Possibly just putting the necessary code into stringinfo.c. We already have multiple copies of this inefficient logic... But even if not, I don't think writing data into the stringbuf directly is that ugly, we do that in enough places that I'd argue that that's basically part of how it's expected to be used. In HEAD there's at least - postgres.c:errdetail_params(), - pl_exec.c:format_preparedparamsdata() pl_exec.c:format_expr_params() and - dblink.c:escape_param_str() - deparse.c:deparseStringLiteral() - xlog.c:do_pg_start_backup() (after "Add the escape character"), - tsearchcmds.c:serialize_deflist() - ruleutils.c:simple_quote_literal() are nearly the same. Greetings, Andres Freund
Hello Alvaro, > I didn't like abusing testlibpq3.c for your new stuff, so I moved it off > to a new file testlibpq5.c. I cleaned up a few other cosmetics things > about this -- v10 attached. Thanks for doing this. > I eventually noticed that this patch fails > to initialize each param's textValue to NULL, I've added the changes to set textValue to NULL for each parameter after calling makeParamList. I think it's the best option, as it's natural to populate text representations (albeit with NULLs) in the same loop as we write the other parameters attributes. It still requires us to set hasTextValues afterwards, as it doesn't seem practical to me to make two loops, first null them all and next populate them: if there's a parsing or converting problem midway it's out of the feature scope and already being logged elsewhere. Attaching v11, marking as waiting for review. Best, Alex
Вложения
Hi Anders and Alvaro, I must have missed this conversation branch when sending in v011. Sorry about that. > I think it might be worthwhile to cross-reference > log_min_error_statement, as log_parameters_on_error will only take effect when the > statement is logged due to log_min_error_statement. Agreed, added some clarification. > I don't think "cache" is the right descriptor. Usually caching implies > trying to make repeated tasks faster, which isn't the case here. Well, potentially it can, if we set log_min_error_statement to something lower than error and emit tons of warnings. But it's not the primary use case, so I just replaced it by the word "save". > What I'm suggesting is that PortalStart() would build a string > representation out of the parameter list (using > ParamExternData.textValue if set, calling the output function > otherwise), and stash that in the portal. > > At the start of all the already existing PG_TRY blocks in pquery.c we > push an element onto the error_context_stack that adds the errdetail > with the parameters to the error. Alternatively we could also add them > in in the PG_CATCH blocks, but that'd only work for elevel == ERROR > (i.e. neither FATAL nor non throwing log levels would be able to enrich > the error). I'm a bit worried about going this way, as it makes us log the query and its parameters too far apart in the code, and it's trickier to make sure we never log parameters without the query. I think logging the parameters should not be part of error_context_stack, but rather a primary part of logging facility itself, like statement. That's because whether we want to log parameters depends on print_stmt in elog.c. With print_stmt being computed upon edata, I'm not sure how to work it out nicely. > Thinking about this: I think the current approach doesn't actually > handle recursive errors correctly. Even if we fail to emit the error > message due to the parameter details requiring a lot of memory, we'd > again do so (and again fail) when handling that OOM error, because > current_top_portal afaict would still be set. At the very least this'd > need to integrate with the recursion_depth logic in elog.c. We handle it the same way as we do it for debug_query_string itself: if (in_error_recursion_trouble()) { error_context_stack = NULL; debug_query_string = NULL; current_top_portal = NULL; } I'm attaching the patch with docs fixes. Best regards, Alexey
Вложения
So, if some parameters are large (they can be up to 1 GB-1, remember) then we can bloat the log file severely. I think we need to place an upper limit on the strings that we're going to log -- as inspiration, callers of ExecBuildValueDescription uses 64 chars per value maximum. Something like that seems reasonable. So I think you need to add some pg_mbcliplen() calls in a couple of places in exec_bind_message. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-Nov-07, Alvaro Herrera wrote: > So, if some parameters are large (they can be up to 1 GB-1, remember) > then we can bloat the log file severely. I think we need to place an > upper limit on the strings that we're going to log -- as inspiration, > callers of ExecBuildValueDescription uses 64 chars per value maximum. > Something like that seems reasonable. So I think you need to add some > pg_mbcliplen() calls in a couple of places in exec_bind_message. (BTW it looks like plpgsql_param_fetch() is creating ParamExternData values and not setting textValue for them.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2019-11-05 12:07:50 +0000, Alexey Bashtanov wrote: > > What I'm suggesting is that PortalStart() would build a string > > representation out of the parameter list (using > > ParamExternData.textValue if set, calling the output function > > otherwise), and stash that in the portal. > > > > At the start of all the already existing PG_TRY blocks in pquery.c we > > push an element onto the error_context_stack that adds the errdetail > > with the parameters to the error. Alternatively we could also add them > > in in the PG_CATCH blocks, but that'd only work for elevel == ERROR > > (i.e. neither FATAL nor non throwing log levels would be able to enrich > > the error). > > I'm a bit worried about going this way, as it makes us log > the query and its parameters too far apart in the code, > and it's trickier to make sure we never log parameters without the query. The way you do it you need to do it in numerous places, and I'm pretty sure you're missing some already. E.g. this will not work to log parameters for parametrized statements generated on the server side, e.g. for foreign key queries. I don't think that's the right direction to go. You can maybe argue that we don't need support for logging server side generated parameters in the initial version, but the approach should be compatible with adding that support. > I think logging the parameters should not be part of error_context_stack, > but rather a primary part of logging facility itself, like statement. > That's because whether we want to log parameters depends > on print_stmt in elog.c. With print_stmt being computed upon edata, > I'm not sure how to work it out nicely. I don't think those really are contradictions. You can continue to have an errdetail_params(), and but call it from the error context callback set up in the portal code. That'd also get rid of the two different copies of the logic for deciding whether to log bind parameters. Right now your approach e.g. doesn't work for emit_log_hook. Nor is all that clear to me that we'd never want to send this information to the client. Even leaving that aside, I'm *STRONGLY* against entangling elog.c with query execution details like ParamListInfo. We should work to make elog.c know less about different parts of the system, not more. > > Thinking about this: I think the current approach doesn't actually > > handle recursive errors correctly. Even if we fail to emit the error > > message due to the parameter details requiring a lot of memory, we'd > > again do so (and again fail) when handling that OOM error, because > > current_top_portal afaict would still be set. At the very least this'd > > need to integrate with the recursion_depth logic in elog.c. > We handle it the same way as we do it for debug_query_string itself: > > if (in_error_recursion_trouble()) > { > error_context_stack = NULL; > debug_query_string = NULL; > current_top_portal = NULL; > } > Sorry, had missed that, it wasn't apparent in the diff. I personally just always use a larger context size to make it easier to recognize... Greetings, Andres Freund
On Thu, Nov 07, 2019 at 03:41:04PM -0800, Andres Freund wrote: > The way you do it you need to do it in numerous places, and I'm pretty > sure you're missing some already. E.g. this will not work to log > parameters for parametrized statements generated on the server side, > e.g. for foreign key queries. I don't think that's the right direction > to go. You can maybe argue that we don't need support for logging server > side generated parameters in the initial version, but the approach > should be compatible with adding that support. This patch had a review from two committers, with no updates from the author, so marked as returned with feedback. -- Michael
Вложения
Hi, I'm sorry for replying so late. > I don't think those really are contradictions. You can continue to have > an errdetail_params(), and but call it from the error context callback > set up in the portal code ... > Even leaving that aside, I'm *STRONGLY* against entangling elog.c with > query execution details like ParamListInfo. We should work to make > elog.c know less about different parts of the system, not more. I've implemented it using error contexts, please could you have a look at the patch attached? I did not use the PG_TRY blocks in portal code because they are not wide enough. Otherwise, we wouldn't catch errors that can happen in GetCachedPlan. Instead I added an item to error context chain in postgres.c exec_*_message routines. I can't find why it could be unsafe, but I'm still a bit worried about it as there's no other error context changes that early in the call stack. I've also made string formatting more efficient, and pre-calculate the whole string instead of individual values as suggested. Unfortunately it still copies the error text when reporting -- I'm not sure how I change it without substantial change of error logging infrastructure. One more concern I'd like to ask for opinions. I'm using errdetail_log for parameters logging, as it's the only way to make it appear in the log only but not get sent to the client. It looks a bit awkward, as it appears like ERROR DETAIL CONTEXT STATEMENT, where CONTEXT may in fact be something inner nested than the parameters that appear in DETAILS. With this regards, I'm thinking of adding some special arrangements into elog.c. One idea is a econtext_log to add a log-only data into CONTEXT, however it'll still log params above the statement, although directly above. Another option is a special estatementparams (or perhaps efinalcontext_log?) to log below the statement. What do you think? 2 Alvaro: > So, if some parameters are large (they can be up to 1 GB-1, remember) > then we can bloat the log file severely. I think we need to place an > upper limit on the strings that we're going to log -- as inspiration, > callers of ExecBuildValueDescription uses 64 chars per value maximum. > Something like that seems reasonable. So I think you need to add some > pg_mbcliplen() calls in a couple of places in exec_bind_message. I like the idea, but I don't think it's directly related to the change proposed. We are already logging parameters in certain situations with no limits on their lenghts. Given the time it takes to get the change through (not least because of me addressing reviewers' comments really slowly) I'd not include it in the patch. Do you find it acceptable? Best, Alex
Вложения
Patch file name is not great, changing it. On 30/11/2019 12:35, Alexey Bashtanov wrote: > I've implemented it using error contexts, please could you have a look > at the patch attached?
Вложения
On 2019-Sep-20, Andres Freund wrote: > > > > + appendStringInfoCharMacro(¶m_str, '\''); > > > > + for (p = pstring; *p; p++) > > > > + { > > > > + if (*p == '\'') /* double single quotes */ > > > > + appendStringInfoCharMacro(¶m_str, *p); > > > > + appendStringInfoCharMacro(¶m_str, *p); > > > > + } > > > > + appendStringInfoCharMacro(¶m_str, '\''); > > > > > > I know this is old code, but: This is really inefficient. Will cause a > > > lot of unnecessary memory-reallocations for large text outputs, because > > > we don't immediately grow to at least close to the required size. > I'd probably just count the ' in one pass, enlarge the stringinfo to the > required size, and then put the string directly into he stringbuffer. > Possibly just putting the necessary code into stringinfo.c. We already > have multiple copies of this inefficient logic... I stole Alexey's code for this from downthread and tried to apply to all these places. However, I did not put it to use in all these places you mention, because each of them has slightly different requirements; details below. Now, I have to say that this doesn't make me terribly happy, because I would like the additional ability to limit the printed values to N bytes. This means the new function would have to have an additional argument to indicate the maximum length (pass 0 to print all args whole) ... and the logic then because more involved because we need logic to stop printing early. I think the current callers should all use the early termination logic; having plpgsql potentially produce 1 MB of log output because of a long argument seems silly. So I see no use for this function *without* the length-limiting logic. > But even if not, I don't think writing data into the stringbuf directly > is that ugly, we do that in enough places that I'd argue that that's > basically part of how it's expected to be used. > > In HEAD there's at least > - postgres.c:errdetail_params(), > - pl_exec.c:format_preparedparamsdata() > pl_exec.c:format_expr_params() These are changed in the patch; they're all alike. > - dblink.c:escape_param_str() This one wants to use \' and \\ instead of just doubling each '. The resulting string is used as libpq conninfo, so using doubled ' does not work. > - deparse.c:deparseStringLiteral() This one wants to use E''-style strings that ignore std_conforming_strings; the logic would need to detect two chars ' and \ instead of just ' so we'd need to use strcspn instead of strchr(); that seems a lot slower. > - xlog.c:do_pg_start_backup() (after "Add the escape character"), This one wants to escape \n chars. > - tsearchcmds.c:serialize_deflist() This wants E''-style strings, like deparse.c. > - ruleutils.c:simple_quote_literal() Produce an escaped string according to the prevailing std_conforming_strings. I think it's possible to produce a function that would satisfy all of these, but it would be another function similar to the one I'm writing here, not the same one. -- Álvaro Herrera https://www.2ndQuadrant.com/ ggPostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
On 2019-Dec-03, Alvaro Herrera wrote: > Now, I have to say that this doesn't make me terribly happy, because I > would like the additional ability to limit the printed values to N > bytes. This means the new function would have to have an additional > argument to indicate the maximum length (pass 0 to print all args > whole) ... and the logic then because more involved because we need > logic to stop printing early. So after actually writing this, it seems that it's better to have a separate path for this; otherwise you need to scan the whole input string (strchr or friends), which might be long. I tried to see how to make strnlen() work for us, but I found no simple way to print the "..." at the end. So this just runs once per char. I think that's okay, since the intended usage is to print a few dozen bytes, like ExecBuildSlotValueDescription does (64 bytes is what all its callers do). Another point: this implementation is not strict about how many input characters it prints. It counts up to maxlen *output* characters, which means if it duplicates a ', that's one less input char printed. (Given the usage of this function, this seems a feature rather than a bug. The first implementation did the opposite, and on the whole it seemed worse.) If anyone can suggest a smarter implementation, I'm all ears. (Maybe do strnlen(maxlen), then count strnlen(1) starting at that point -- so if that returns >=1, print the "..."?) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> (Maybe do strnlen(maxlen), then count strnlen(1) starting at that point > -- so if that returns >=1, print the "..."?) So I found that I can make the code more reasonable with this simple coding, if (maxlen > 0) { s = pnstrdup(s, maxlen); ellipsis = strnlen(s, maxlen + 1) > maxlen; /* enlarge while we can do so cheaply */ enlargeStringInfo(str, maxlen); } ... but the problem is that we now compile stringinfo.c for frontend environments also, and there's no pnstrdup() in frontends. And to introduce it, we'd need a configure check (because GNU libc has it) and a src/port naive implementation and a fe_memutils.c addition. Sigh. Still, it's not that much code, so I'll just go do that and open a separate thread for it. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-Dec-04, Alvaro Herrera wrote: > > > (Maybe do strnlen(maxlen), then count strnlen(1) starting at that point > > -- so if that returns >=1, print the "..."?) > > So I found that I can make the code more reasonable with this simple > coding, With strndup. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
I made a number of changes on the proposed patch and ended up with the attached. (I have not included the business to limit number of characters, yet). What this does is set an error callback, which adds the parameter values in the DETAIL line. This is at odds with all existing error callbacks: they only add stuff to the CONTEXT string. The implication is that if an error site has a direct errdetail_log() and is run under this error stack, that site's detail is going to be lost and replaced by the "params" one. Currently there aren't many errdetail_log() lines, so it doesn't seem like a terrible problem. However, it's something to keep in mind. This is not final form; there are a couple of XXX comments, and a number of other small infelicities still to fix. But the overall idea should be final. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
Hi, On 2019-12-05 14:19:45 -0300, Alvaro Herrera wrote: > What this does is set an error callback, which adds the parameter values > in the DETAIL line. This is at odds with all existing error callbacks: > they only add stuff to the CONTEXT string. The implication is that > if an error site has a direct errdetail_log() and is run under this > error stack, that site's detail is going to be lost and replaced by the > "params" one. Currently there aren't many errdetail_log() lines, so it > doesn't seem like a terrible problem. However, it's something to keep > in mind. It'd not be too hard avoiding overwriting if we wanted. Makes me wonder though, if we ought to invent something similar to the errdata fields we have for schema, table, etc... Haven't yet had time to catch up to the discussion in this thread of the last two weeks... Greetings, Andres Freund
Hello On 2019-Dec-05, Andres Freund wrote: > On 2019-12-05 14:19:45 -0300, Alvaro Herrera wrote: > > What this does is set an error callback, which adds the parameter values > > in the DETAIL line. This is at odds with all existing error callbacks: > > they only add stuff to the CONTEXT string. The implication is that > > if an error site has a direct errdetail_log() and is run under this > > error stack, that site's detail is going to be lost and replaced by the > > "params" one. Currently there aren't many errdetail_log() lines, so it > > doesn't seem like a terrible problem. However, it's something to keep > > in mind. > > It'd not be too hard avoiding overwriting if we wanted. Hmm, maybe errdetail_log() ought to do that. However: if some place has errdetail() and then this introduces errdetail_log(), then the former is not included in the server log (only in the client log). > Makes me wonder though, if we ought to invent something similar to the > errdata fields we have for schema, table, etc... Hmm, perhaps we should do that. It avoids the problem altogether. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-Dec-05, Alvaro Herrera wrote: > > Makes me wonder though, if we ought to invent something similar to the > > errdata fields we have for schema, table, etc... > > Hmm, perhaps we should do that. It avoids the problem altogether. ... then again, I'm not up for writing everything that would be required to do that. If somebody wants to propose that separately, I volunteer to review it. But let's not have that block this patch, since this is already a clear improvement. So here's v16. I got rid of all that strlen() business of the output values; instead, preallocate a reasonable guess at the final length (64 bytes for each of the first 128 parameters, plus 16 bytes for for parameter from 129 to 1024). This boils down to exactly the initial size of a stringinfo (1024 bytes) when there are 8 parameters or less. 64 bytes avg of guessed length should be plenty (and it will be even more so when I add the patch to limit the output length to 64+epsilon bytes per param) I admit there are perhaps too many magical numbers in those three lines, though. I don't understand how we ended up with the params put in errdetail_log() -- seems to have been introduced silently between v13 and v14. What's the reason for that? I put it back as errdetail(). Alexey added some discussion about using context/detail when he sent that version, but I think that's an issue we can leave for the hypothetical errparameters() patch. One problem I noticed is that we don't log parameters when exec_bind_message fetches the parameter values. So the very first example problem in testlibpq5 fails to print the values of any parameters previously seen. I don't think this is a real problem in practice. You still get the unparseable value in the error message from the input function, so not having the errdetail() does not seem very important. If anybody would like to review it once more, now's the time. I'm aiming at getting it pushed tomorrow (including the length-limited appendStringInfoStringQuoted stuff). -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > If anybody would like to review it once more, now's the time. I'm > aiming at getting it pushed tomorrow (including the length-limited > appendStringInfoStringQuoted stuff). Um, surely this bit is unacceptable? + /* + * XXX this clobbers any other DETAIL line that the error callsite could + * have had. Do we care? + */ + errdetail("parameters: %s", params->paramValuesStr); Even if the scope of the clobber were limited to the parameter input conversion step, this seems like a pretty bad compromise. But AFAICS this'll clobber errdetail for *any* error occurring during query execution in extended query mode. Not only does that render useless a whole lot of sweat we've put into detail messages over the years, but it means that you get fundamentally different (and largely worse) error reporting in extended query mode than you do in simple query mode. I think this issue has to be fixed before committing, not after. The most straightforward fix would be to make the parameter values a brand new component of error reports. However, I see the point that that'd require a huge amount of additional work, and cooperation of client-side libraries that might be a long time coming. Possibly a workable compromise is to emit the info as an error context line, appending it to whatever context exists today. The result might look like, say, ERROR: whatever CONTEXT: SQL function "foo" extended query with parameters x, y, ... For extra credit maybe we could include the query's statement or portal name? errcontext("extended query \"%s\" with parameters %s", ...); An independent point: it seems like just wishful thinking to imagine that src/test/examples/ can serve as a regression test for this. Nor is the proposed program very useful as an example. I'd drop that and find a way to have an actual, routinely-exercised-by-check-world regression test. If memory serves, pgbench can be cajoled into executing stuff in extended query mode --- maybe we could create a test case using that? regards, tom lane
On 2019-Dec-05, Tom Lane wrote: > Possibly a workable compromise is to emit the info as an error > context line, appending it to whatever context exists today. > The result might look like, say, > > ERROR: whatever > CONTEXT: SQL function "foo" > extended query with parameters x, y, ... This is easy to accomodate -- just change the errdetail() to errcontext. That makes the params be reported different than in the non-error case (for duration). > For extra credit maybe we could include the query's statement or > portal name? > > errcontext("extended query \"%s\" with parameters %s", ...); Sure. Done in the attached. > An independent point: it seems like just wishful thinking to imagine that > src/test/examples/ can serve as a regression test for this. Nor is the > proposed program very useful as an example. I'd drop that and find a > way to have an actual, routinely-exercised-by-check-world regression > test. If memory serves, pgbench can be cajoled into executing stuff > in extended query mode --- maybe we could create a test case using > that? I tried pgbench -c1 -t1 -n -M prepared -f errparameters.bnch with this file select '{ invalid ' as value \gset select column1::jsonb from (values (:value)) as q; and got these lines: 2019-12-06 13:49:36.388 -03 [31028] ERROR: invalid input syntax for type json 2019-12-06 13:49:36.388 -03 [31028] DETAIL: Token "invalid" is invalid. 2019-12-06 13:49:36.388 -03 [31028] CONTEXT: JSON data, line 1: { invalid... extended query with parameters: $1 = '{ invalid ' 2019-12-06 13:49:36.388 -03 [31028] STATEMENT: select column1::jsonb from (values ($1)) as q; With this file, select '1' as one \gset SELECT 1 / (random() / 2)::int, :one::int, :two::int; using the same pgbench invocation as above, I get this in the log: 2019-12-06 14:50:59.181 -03 [6187] ERROR: division by zero 2019-12-06 14:50:59.181 -03 [6187] CONTEXT: extended query with parameters: $1 = '1', $2 = NULL 2019-12-06 14:50:59.181 -03 [6187] STATEMENT: SELECT 1 / (random() / 2)::int, $1::int, $2::int; (While testing this I noticed that failing to strdup the text repr of the datum when it's given as a text-format parameter results in bogus output. So I added a pstrdup there.) (It seems a bit weird that if I assign NULL to :two pgbench stores the empty string, instead of the NULL that I get as in the above which is what happens when the variable is not defined at all. That's probably a bug in \gset ...) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
pá 6. 12. 2019 v 18:57 odesílatel Alvaro Herrera <alvherre@2ndquadrant.com> napsal:
On 2019-Dec-05, Tom Lane wrote:
> Possibly a workable compromise is to emit the info as an error
> context line, appending it to whatever context exists today.
> The result might look like, say,
>
> ERROR: whatever
> CONTEXT: SQL function "foo"
> extended query with parameters x, y, ...
This is easy to accomodate -- just change the errdetail() to errcontext.
That makes the params be reported different than in the non-error case
(for duration).
> For extra credit maybe we could include the query's statement or
> portal name?
>
> errcontext("extended query \"%s\" with parameters %s", ...);
Sure. Done in the attached.
> An independent point: it seems like just wishful thinking to imagine that
> src/test/examples/ can serve as a regression test for this. Nor is the
> proposed program very useful as an example. I'd drop that and find a
> way to have an actual, routinely-exercised-by-check-world regression
> test. If memory serves, pgbench can be cajoled into executing stuff
> in extended query mode --- maybe we could create a test case using
> that?
I tried
pgbench -c1 -t1 -n -M prepared -f errparameters.bnch
with this file
select '{ invalid ' as value \gset
select column1::jsonb from (values (:value)) as q;
and got these lines:
2019-12-06 13:49:36.388 -03 [31028] ERROR: invalid input syntax for type json
2019-12-06 13:49:36.388 -03 [31028] DETAIL: Token "invalid" is invalid.
2019-12-06 13:49:36.388 -03 [31028] CONTEXT: JSON data, line 1: { invalid...
extended query with parameters: $1 = '{ invalid '
2019-12-06 13:49:36.388 -03 [31028] STATEMENT: select column1::jsonb from (values ($1)) as q;
With this file,
select '1' as one \gset
SELECT 1 / (random() / 2)::int, :one::int, :two::int;
using the same pgbench invocation as above, I get this in the log:
2019-12-06 14:50:59.181 -03 [6187] ERROR: division by zero
2019-12-06 14:50:59.181 -03 [6187] CONTEXT: extended query with parameters: $1 = '1', $2 = NULL
2019-12-06 14:50:59.181 -03 [6187] STATEMENT: SELECT 1 / (random() / 2)::int, $1::int, $2::int;
(While testing this I noticed that failing to strdup the text repr of
the datum when it's given as a text-format parameter results in bogus
output. So I added a pstrdup there.)
(It seems a bit weird that if I assign NULL to :two pgbench stores the
empty string, instead of the NULL that I get as in the above which is
what happens when the variable is not defined at all. That's probably a
bug in \gset ...)
psql session variables cannot to hold NULL
Regards
Pavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I added some tests to the pgbench suite in the attached. I also finally put it the business to limit the length of parameters reported. I'd probably drop testlibpq5.c, since it seems a bit pointless now ... -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
On 2019-Dec-06, Pavel Stehule wrote: > pá 6. 12. 2019 v 18:57 odesílatel Alvaro Herrera <alvherre@2ndquadrant.com> > napsal: > > (It seems a bit weird that if I assign NULL to :two pgbench stores the > > empty string, instead of the NULL that I get as in the above which is > > what happens when the variable is not defined at all. That's probably a > > bug in \gset ...) > > psql session variables cannot to hold NULL I also just learned that the :foo expansion in pgbench behaves crazily when in normal query mode, particularly re. quotes in variables. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > I added some tests to the pgbench suite in the attached. I also finally > put it the business to limit the length of parameters reported. > I'd probably drop testlibpq5.c, since it seems a bit pointless now ... I took a look through the v17 patches. 0001: The two header comments for appendStringInfoStringQuoted really ought to define the maxlen parameter, rather than making readers reverse-engineer what that does. I'm not very sold on the enlargeStringInfo call in the maxlen>0 code path, mainly because its calculation of the needed space seems entirely wrong: (a) it's considering maxlen not the actual length, and (b) on the other hand it's not accounting for quotes and ellipses. Forcing an inadequate enlargement is probably worse than doing nothing, so I'd be inclined to just drop that. It is a very bad idea that this is truncating text without regard to multibyte character boundaries. 0002: Seems like BuildParamLogString's "valueLen" parameter ought to be called "maxlen", for consistency with 0001 and because "valueLen" is at best misleading about what the parameter means. I'd toss the enlargeStringInfo call here too, as it seems overly complicated and underly correct or useful. Probably doing MemoryContextReset after each parameter (even the last one!) is a net loss compared to just letting it go till the end. Although I'd be inclined to use ALLOCSET_DEFAULT_SIZES not SMALL_SIZES if you do it like that. Please do not throw away the existing comment "/* Free result of encoding conversion, if any */" in exec_bind_message. As above, this risks generating partial multibyte characters. You might be able to get away with letting appendStringInfoStringQuoted do the multibyte-aware truncation, but you probably have to copy more than just one more extra byte first. I have zero faith in this: + params_errcxt.arg = (void *) &((ParamsErrorCbData) + { portal->name, params }); How do you know where the compiler is putting that value, ie what its lifespan is going to be? Better to use an explicit variable. I concur with dropping testlibpq5. regards, tom lane
On 2019-Dec-07, Tom Lane wrote: > 0001: > It is a very bad idea that this is truncating text without regard to > multibyte character boundaries. Hmm. So it turns out that we recently made stringinfo.c exported to frontends in libpgcommon. Clipping strings at a given character size means we need some facility like pg_encoding_mbcliplen. However, 1) pgcommon does not have wchar.c (libpq does have it), and 2) mbcliplen is not in wchar.c anyway. I see four possible ways forward, with nuances. In order of preference: 1. change enough of the build system so that pg_encoding_mbcliplen is available. (Offhand I see no reason why we couldn't move the function from mbutils.c to wchar.c, but I haven't tried.) 2. in frontend, ignore the specified length; always copy the whole string. 3. don't allow length clipping in frontend. (We have no use for it in the near future.) 3a. Throw a run-time error if it's requested. 3b. throw a compile-time error if requested 4. don't provide this new function #if FRONTEND -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-Dec-09, Alvaro Herrera wrote: > I see four possible ways forward, with nuances. In order of preference: > > 1. change enough of the build system so that pg_encoding_mbcliplen is > available. (Offhand I see no reason why we couldn't move the > function from mbutils.c to wchar.c, but I haven't tried.) Ah, this seems to require putting encnames.o and wchar.o in src/common's Makefile and moving pg_encoding_mbcliplen() from mbutils.c to wchar.c. Fairly simple. (There's a little bit of additional fallout because libpq also has encnames/wchar in its roster, but I think this means we can just remove them from there and let them be acquired via libpgcommon.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2019-Dec-07, Tom Lane wrote: >> It is a very bad idea that this is truncating text without regard to >> multibyte character boundaries. > I see four possible ways forward, with nuances. In order of preference: > 1. change enough of the build system so that pg_encoding_mbcliplen is > available. (Offhand I see no reason why we couldn't move the > function from mbutils.c to wchar.c, but I haven't tried.) I'd be in favor of this if it doesn't turn out to require nasty contortions. My gut feeling (like yours, without having looked) is that the incremental amount of code to be moved into wchar.c wouldn't be much. BTW, not really the fault of this patch, but I wonder if we shouldn't make an effort to push the FRONTEND-available parts of wchar.c into src/common, just to make the system structure and build rules less confusing. I think the current situation only exists because this code predates our invention of src/common. regards, tom lane
I wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> 1. change enough of the build system so that pg_encoding_mbcliplen is >> available. (Offhand I see no reason why we couldn't move the >> function from mbutils.c to wchar.c, but I haven't tried.) > I'd be in favor of this if it doesn't turn out to require nasty > contortions. My gut feeling (like yours, without having looked) is that > the incremental amount of code to be moved into wchar.c wouldn't be much. Actually, on second thought --- the issue here is not so much how much new code shows up in libpgcommon, and more that executables using stringinfo.o will now find themselves pulling in all of wchar.o, where before they might've pulled in none of it. We need to look at how much code bloat ensues. In the end it might be smart to put this new functionality in some separate source file instead of dropping it into stringinfo.c. (It could also be that all the executables that need stringinfo.o are already pulling in wchar functionality for other reasons, but we should check the code-size implications before assuming this is fine.) regards, tom lane
On 2019-Dec-09, Tom Lane wrote: > I wrote: > > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > >> 1. change enough of the build system so that pg_encoding_mbcliplen is > >> available. (Offhand I see no reason why we couldn't move the > >> function from mbutils.c to wchar.c, but I haven't tried.) > > > I'd be in favor of this if it doesn't turn out to require nasty > > contortions. My gut feeling (like yours, without having looked) is that > > the incremental amount of code to be moved into wchar.c wouldn't be much. > > Actually, on second thought --- the issue here is not so much how much new > code shows up in libpgcommon, and more that executables using stringinfo.o > will now find themselves pulling in all of wchar.o, where before they > might've pulled in none of it. We need to look at how much code bloat > ensues. In the end it might be smart to put this new functionality in > some separate source file instead of dropping it into stringinfo.c. > (It could also be that all the executables that need stringinfo.o are > already pulling in wchar functionality for other reasons, but we should > check the code-size implications before assuming this is fine.) Hmm, that's a good thought. On the one hand, nothing in the frontend needs this new function ... the only current use is pg_waldump which likely doesn't need to quote anything, and I just looked at Andres' programmatically- written node support code which is why we have StringInfo in frontend in the first place -- it also doesn't care. So rather than mess with stringinfo.c at all I could just create stringinfo_server.c and put this function there, compiled only for backend ... -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > So rather than mess with stringinfo.c at all I could just create > stringinfo_server.c and put this function there, compiled only for > backend ... Good point: if we make a separate source file then we don't have to solve any of the code-movement issues till somebody wants this functionality in frontend. But we should expect that that day might come eventually, so I don't much like "stringinfo_server.c" as the file name. It'll look pretty silly once we start compiling it for frontend. Perhaps "appendquoted.c" or some such? regards, tom lane
On 2019-Dec-09, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > So rather than mess with stringinfo.c at all I could just create > > stringinfo_server.c and put this function there, compiled only for > > backend ... > > Good point: if we make a separate source file then we don't have > to solve any of the code-movement issues till somebody wants this > functionality in frontend. But we should expect that that day might > come eventually, so I don't much like "stringinfo_server.c" as the > file name. It'll look pretty silly once we start compiling it for > frontend. Perhaps "appendquoted.c" or some such? Okay, so here are two patches. I had already used the name stringinfo_mb.c, so that's what's v19. (I'm fine with renaming it to appendquoted.c but we might gain other such functions in the future. Other opinions?) The other patch (v18) puts the new function together with moving wchar to pgcommon. The API is different in each case: if we want it available in frontend, we need to pass the encoding as a parameter rather than use GetDatabaseEncoding(). This is pg_waldump with the first patch (src/utils/mb/stringinfo_mb.c): $ nm --print-size /pgsql/install/master/bin/pg_waldump | grep -i stringinfo 000000000000a8f0 0000000000000060 T appendBinaryStringInfo 000000000000a980 0000000000000051 T appendBinaryStringInfoNT 000000000000a770 00000000000000d7 T appendStringInfo 000000000000a850 0000000000000050 T appendStringInfoChar 000000000000a8a0 000000000000004d T appendStringInfoSpaces 000000000000a950 0000000000000027 T appendStringInfoString 000000000000a630 0000000000000083 T appendStringInfoVA 000000000000a6c0 00000000000000af T enlargeStringInfo 000000000000a5e0 000000000000002b T initStringInfo 000000000000a5a0 0000000000000038 T makeStringInfo 000000000000a610 0000000000000015 T resetStringInfo $ ls -l /pgsql/install/master/bin/pg_waldump -rwxr-xr-x 1 alvherre alvherre 647576 dic 9 16:23 /pgsql/install/master/bin/pg_waldump* This is with v18: $ nm --print-size /pgsql/install/master/bin/pg_waldump | grep -i stringinfo 000000000000c8f0 0000000000000060 T appendBinaryStringInfo 000000000000c980 0000000000000051 T appendBinaryStringInfoNT 000000000000c770 00000000000000d7 T appendStringInfo 000000000000c850 0000000000000050 T appendStringInfoChar 000000000000c8a0 000000000000004d T appendStringInfoSpaces 000000000000c950 0000000000000027 T appendStringInfoString 000000000000c9e0 00000000000001b1 T appendStringInfoStringQuoted 000000000000c630 0000000000000083 T appendStringInfoVA 000000000000c6c0 00000000000000af T enlargeStringInfo 000000000000c5e0 000000000000002b T initStringInfo 000000000000c5a0 0000000000000038 T makeStringInfo 000000000000c610 0000000000000015 T resetStringInfo $ ls -l /pgsql/install/master/bin/pg_waldump -rwxr-xr-x 1 alvherre alvherre 704808 dic 9 16:29 /pgsql/install/master/bin/pg_waldump* While the function itself is tiny (though it's the largest of all stringinfo functions, hmm), it has led to a rather huge bloating of the binary. That's because the new binary contains a number of functions from wchar.c, like you said, such as $ nm --print-size /pgsql/install/master/bin/pg_waldump | grep dsplen 000000000000d820 0000000000000026 t pg_ascii_dsplen 000000000000e200 0000000000000005 t pg_big5_dsplen 000000000000eb30 0000000000000030 T pg_encoding_dsplen 000000000000dac0 000000000000002e t pg_euccn_dsplen 000000000000d960 0000000000000036 t pg_eucjp_dsplen 000000000000d9b0 000000000000002e t pg_euckr_dsplen 000000000000dc10 000000000000002e t pg_euctw_dsplen 000000000000e230 0000000000000005 t pg_gb18030_dsplen 000000000000e210 0000000000000005 t pg_gbk_dsplen 000000000000dd10 0000000000000005 t pg_johab_dsplen 000000000000e170 0000000000000026 t pg_latin1_dsplen 000000000000e0f0 0000000000000034 t pg_mule_dsplen 000000000000e1c0 0000000000000036 t pg_sjis_dsplen 000000000000e220 0000000000000005 t pg_uhc_dsplen 000000000000e870 000000000000013e t pg_utf_dsplen and some others. All in all, it seems like v19 is the way to go. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
Also: * Thoughts from Andres, who was busily messing about with stringinfo.c on his patch series for programmatic out/read node functions? * Thoughts on changing the current usage of the ...Quoted() function in postgres.c and pl_exec.c so that they print N characters (64) instead of the current default of printing everything? I'm up for changing, but have got no +1s on that. I think bloating log files with enormous parameters is pointless. * v18 and v19 now alwys do a "strlen(s)", i.e. they scan the whole input string -- pointless when maxlen is given. We could avoid that for very large input strings by doing strnlen(maxlen + MAX_MULTIBYTE_CHAR_LEN) so that we capture our input string plus one additional multibyte char. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2019-Dec-09, Tom Lane wrote: >> Good point: if we make a separate source file then we don't have >> to solve any of the code-movement issues till somebody wants this >> functionality in frontend. But we should expect that that day might >> come eventually, so I don't much like "stringinfo_server.c" as the >> file name. It'll look pretty silly once we start compiling it for >> frontend. Perhaps "appendquoted.c" or some such? > Okay, so here are two patches. I had already used the name > stringinfo_mb.c, so that's what's v19. (I'm fine with renaming it to > appendquoted.c but we might gain other such functions in the future. > Other opinions?) "stringinfo_mb.c" is fine with me. regards, tom lane
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > * Thoughts on changing the current usage of the ...Quoted() function in > postgres.c and pl_exec.c so that they print N characters (64) instead > of the current default of printing everything? I'm up for changing, > but have got no +1s on that. I think bloating log files with enormous > parameters is pointless. Meh ... people will inevitably complain that they needed to see the whole value, and we'll end up having to add another configuration variable. Let's not go there just yet. regards, tom lane
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Also: > * v18 and v19 now alwys do a "strlen(s)", i.e. they scan the whole input > string -- pointless when maxlen is given. We could avoid that for > very large input strings by doing strnlen(maxlen + MAX_MULTIBYTE_CHAR_LEN) > so that we capture our input string plus one additional multibyte > char. BTW, as far as that goes, it seems to me this patch is already suffering from a lot of premature micro-optimization. Is there even any evidence to justify that complicated chunk-at-a-time copying strategy, rather than doing quote-doubling the same way we do it everywhere else? The fact that that effectively scans the input string twice means that it's not an ironclad win compared to the naive way, and it seems like it could lose significantly for a case with lots of quote marks. Moreover, for the lengths of strings I expect we're mostly dealing with here, it would be impossible to measure any savings even assuming there is some. If I were the committer I think I'd just flush that and do it the same way as we do it in existing code. regards, tom lane
Hi, On 2019-12-09 17:05:31 -0300, Alvaro Herrera wrote: > * Thoughts from Andres, who was busily messing about with stringinfo.c > on his patch series for programmatic out/read node functions? I don't immediately see a need for mb functionality there. But to me it seems pretty clear that we're going to need mb functionality in more frontend programs over time. The only frontend part needing stringinfo there is the metadata generation tool, and that doesn't care about anything like this. The node functions themselves are obviously still backend only. I'm not even sure that it's worth caring over the binary size this way, tbh. If somebody is actually concerned about that, they ought to use LTO + -Os, leading to unreferenced functions (and even unreferenced *parts* of functions) getting completely thrown out. I'd also assume that size number you cited earlier contains debug information etc? Because after stripping / without debug information, I get like 110k/97k for assert/optimize builds of waldump from master? 130k/113k for v18. And 110k/97k for v19. Greetings, Andres Freund
On 2019-Dec-09, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > Also: > > * v18 and v19 now alwys do a "strlen(s)", i.e. they scan the whole input > > string -- pointless when maxlen is given. We could avoid that for > > very large input strings by doing strnlen(maxlen + MAX_MULTIBYTE_CHAR_LEN) > > so that we capture our input string plus one additional multibyte > > char. > > BTW, as far as that goes, it seems to me this patch is already suffering > from a lot of premature micro-optimization. Is there even any evidence > to justify that complicated chunk-at-a-time copying strategy, rather than > doing quote-doubling the same way we do it everywhere else? The fact that > that effectively scans the input string twice means that it's not an > ironclad win compared to the naive way, and it seems like it could lose > significantly for a case with lots of quote marks. Moreover, for the > lengths of strings I expect we're mostly dealing with here, it would be > impossible to measure any savings even assuming there is some. If I were > the committer I think I'd just flush that and do it the same way as we > do it in existing code. It's been us committers (Andres and I) that have derailed this patch with some apparently overcomplicated logic. There was nothing of that stuff in the original. See Andres review comments https://postgr.es/m/20190920203905.xkv5udsd5dxfs6tr@alap3.anarazel.de There's also a comment nearby about growing the stringinfo to a reasonable guess of the final size, rather than letting it grow normally, to avoid multiple reallocs. That's why we had all that stuff there. He has looked at execution profiles much more than I have, so I take his word from it. The stuff about truncating to N chars was of my own devising. If we don't want truncation in log_parameters_on_error either, we could do away with the stringinfo changes altogether. (I stand by my opinion that we should truncate, but if there are contrary votes I'm happy to stand down and just get the suggested feature pushed.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > The API is different in each case: if we want it available in frontend, > we need to pass the encoding as a parameter rather than use > GetDatabaseEncoding(). Hm, yeah, so that's another reason we aren't going to be sharing this code exactly with frontend. Given that, dropping it under utils/mb seems fine for now. (I'm still wondering how much of utils/mb could be pushed over to src/common, but that seems like an independent project.) Some quick review of v19: * If it's going into utils/mb, then you should update utils/mb/README. (Or else we should get rid of that file's explanation of all the .c files in its directory.) * The file header comment for stringinfo_mb.c should perhaps explain why we didn't just drop this into stringinfo.c, along the lines of "This is separate from stringinfo.c so that frontend users of that file need not pull in a bunch of multibyte encoding support." * The file header comment for stringinfo_mb.c gives its path incorrectly. * What's the need for including utils/memutils.h here? * Still no documentation of maxlen in the .c file, and the docs added to the .h file are confused (what "given encoding"? also it should say "bytes" not "characters"). * This patch shouldn't be touching stringinfo.c at all anymore. * I'm kind of dubious that stringinfo.h is where to put the extern either; with where we're at now, perhaps pg_wchar.h is better. regards, tom lane
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > The stuff about truncating to N chars was of my own devising. If we > don't want truncation in log_parameters_on_error either, we could do > away with the stringinfo changes altogether. (I stand by my opinion > that we should truncate, but if there are contrary votes I'm happy to > stand down and just get the suggested feature pushed.) I think it's a reasonable thing to do, modulo my fear that people are going to insist on having a way to configure the limit. regards, tom lane
On Mon, 9 Dec 2019 at 15:17, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Meh ... people will inevitably complain that they needed to see the > whole value, and we'll end up having to add another configuration > variable. Let's not go there just yet. I haven't been following this whole thread but my initial reaction is that this particular configuration parameter would actually carry it's weight. While some people have applications where they know the expected size of the data and can safely log the entire data to the logs other people deal with user-supplied data that can be very large. It would suck to have something like Gitlab dump entire merge diffs to the log and those aren't even very large, probably under a megabyte. Some users will have individual data that are up to 1GB.... -- greg
Greg Stark <stark@mit.edu> writes: > On Mon, 9 Dec 2019 at 15:17, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Meh ... people will inevitably complain that they needed to see the >> whole value, and we'll end up having to add another configuration >> variable. Let's not go there just yet. > I haven't been following this whole thread but my initial reaction is > that this particular configuration parameter would actually carry it's > weight. Possibly so. I was mainly objecting to changing existing behaviors without offering any configuration recourse for getting back the existing behavior. Although it would be sensible to allow logging of parameter values to be controlled by a new GUC, it's less clear to me that the same GUC ought to control what plpgsql's language feature print_strict_params does. So there would be room for discussion about that even if we agreed on making this configurable. regards, tom lane
On 2019-Dec-09, Tom Lane wrote: > Some quick review of v19: Here's v20 with all these comments hopefully addressed. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
On 2019-Dec-10, Alvaro Herrera wrote: > On 2019-Dec-09, Tom Lane wrote: > > > Some quick review of v19: > > Here's v20 with all these comments hopefully addressed. Grr, I had forgotten to git add the stringinfo.h -> pg_wchar.h changes, so the prototype isn't anywhere in v20. However, after looking at it again, I'm not very happy with how that turned out, because pg_wchar.h is a frontend-exposed header. So I instead propose to put it in a separate file src/include/mb/stringinfo_mb.h. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
On 2019-Dec-07, Tom Lane wrote: > 0002: Here's a version of this part with fixes for these comments. It applies on top of the stringinfo_mb.c patch sent elsewhere in the thread. (If we were to add a "log_parameters_on_error_max_length" GUC to decide the length to log, we would get rid of the remaining magical numbers in this code). > Seems like BuildParamLogString's "valueLen" parameter ought to be called > "maxlen", for consistency with 0001 and because "valueLen" is at best > misleading about what the parameter means. > > I'd toss the enlargeStringInfo call here too, as it seems overly > complicated and underly correct or useful. > > Probably doing MemoryContextReset after each parameter (even the last one!) > is a net loss compared to just letting it go till the end. Although > I'd be inclined to use ALLOCSET_DEFAULT_SIZES not SMALL_SIZES if you > do it like that. > > Please do not throw away the existing comment "/* Free result of encoding > conversion, if any */" in exec_bind_message. > > As above, this risks generating partial multibyte characters. You might > be able to get away with letting appendStringInfoStringQuoted do the > multibyte-aware truncation, but you probably have to copy more than just > one more extra byte first. > > I have zero faith in this: > > + params_errcxt.arg = (void *) &((ParamsErrorCbData) > + { portal->name, params }); > > How do you know where the compiler is putting that value, ie what > its lifespan is going to be? Better to use an explicit variable. > > I concur with dropping testlibpq5. > > regards, tom lane -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
On 2019-Dec-10, Alvaro Herrera wrote: > On 2019-Dec-10, Alvaro Herrera wrote: > > > On 2019-Dec-09, Tom Lane wrote: > > > > > Some quick review of v19: > > > > Here's v20 with all these comments hopefully addressed. > > Grr, I had forgotten to git add the stringinfo.h -> pg_wchar.h changes, > so the prototype isn't anywhere in v20. However, after looking at it > again, I'm not very happy with how that turned out, because pg_wchar.h > is a frontend-exposed header. So I instead propose to put it in a > separate file src/include/mb/stringinfo_mb.h. Pushed 0001. Thanks for all the reviews -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Here's a curious thing that happens with this patch. If you have log_duration set so that parameters are logged during the bind phase, and then an error occurs during the execution phase but you don't have log_parameters_on_error set true, the second error will log the parameters nonetheless ... because they were saved in the ParamListInfo struct by the errdetail_params() call in the check_log_durations block during bind. I'm not sure this is a terrible problem, but does anybody think we *not* save params->paramValuesStr across a log_duration when log_parameters_on_error is not set? (I think this is somewhat equivalent to the "was_logged" case in check_log_duration itself.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alexey: I would appreciate it if you give this patch a spin. Let me know if it does what you wanted it to do. On 2019-Dec-10, Alvaro Herrera wrote: > Here's a curious thing that happens with this patch. If you have > log_duration set so that parameters are logged during the bind phase, > and then an error occurs during the execution phase but you don't have > log_parameters_on_error set true, the second error will log the > parameters nonetheless ... because they were saved in the ParamListInfo > struct by the errdetail_params() call in the check_log_durations block > during bind. AFAICS this has a simple solution, which is to stop saving the parameter string in BuildParamLogString; instead, just *return* the string. Caller can then assign it into params->paramValuesStr if appropriate (which, in the case of errdetail_params(), it is not.) v22 does it like that. There is still a much smaller issue that if you have both log_durations set to log the params during bind, and log_parameters_on_error to true, the parameters will appear twice. But I think that's correct by definition. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
Hi, I see that you pushed this patch. I'm unfortunately unhappy with the approach taken. As previously said, handling a lot of this from exec_* is a mistake in my opinion. Pretty much the same problem exists for parametrized query execution from other contexts, e.g. for queries executed inside plpgsql. By putting the responsibility to manage error contexts etc from with exec_*, we'd need to add a copy of that to pretty much every place executing queries. This should all be in the portal code, with an optimization for computing the parameter values from strings inside postgres.c, when the input is text. Greetings, Andres Freund
Hello On 2019-Dec-12, Andres Freund wrote: > I see that you pushed this patch. Yeah, a version of it -- significantly different from what Alexey submitted last. > I'm unfortunately unhappy with the > approach taken. As previously said, handling a lot of this from exec_* > is a mistake in my opinion. Pretty much the same problem exists for > parametrized query execution from other contexts, e.g. for queries > executed inside plpgsql. By putting the responsibility to manage error > contexts etc from with exec_*, we'd need to add a copy of that to pretty > much every place executing queries. This should all be in the portal > code, with an optimization for computing the parameter values from > strings inside postgres.c, when the input is text. Hmm. I think there are two pieces of interest for this feature. One saves a text version of the params; the other is setting the error context that reports them. I agree that possibly it would work to have PortalStart be in charge of the former. However, AFAICS the error context must be set up in each place that's executing the query; I don't think it would work to try to hack it in SPI_cursor_open_internal, for example which is what plpgsql uses; rather it has to be in plpgsql/pl_exec.c itself. (For example, you cannot even put it directly in exec_dynquery_with_params -- it has to be in the callers of that routine, and surround all code until after exec_for_query.) I think the current state of affairs is already a good improvement over what was there before. Can it be improved? Sure. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services