Thread: Bogus duplicate command issued in pg_dump

Bogus duplicate command issued in pg_dump

From: Tom Lane
Date:
While investigating for 353708e1f, I happened to notice that pg_dump's
binary_upgrade_set_type_oids_by_type_oid() contains

    PQExpBuffer upgrade_query = createPQExpBuffer();
    ...
    appendPQExpBuffer(upgrade_query,
                      "SELECT typarray "
    ...
    res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
    ...
            appendPQExpBuffer(upgrade_query,
                              "SELECT t.oid, t.typarray "
    ...
            res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);

How's that work?  It turns out that the second ExecuteSqlQueryForSingleRow
is sending a string like "SELECT typarray ...;SELECT t.oid, t.typarray ..."
which the backend happily interprets as multiple commands.  It sends
back multiple query results, and then PQexec discards all but the last
one.  So, very accidentally, there's no observable bug, just some wasted
cycles.

I will go fix this, but I wondered if there are any other similar
errors, or what we might do to prevent the same type of mistake
in future.  I experimented with replacing most of pg_dump's PQexec
calls with PQexecParams, as in the attached quick hack (NOT meant
for commit).  That did not turn up any additional cases, but of
course I have limited faith in the code coverage of check-world.

We could consider a more global change to get rid of using
appendPQExpBuffer where it's not absolutely necessary, so that
there are fewer bad examples to copy.  Another idea is to deem
it an anti-pattern to end a query with a semicolon.  But I don't
have a lot of faith in people following those coding rules in
future, either.  It'd also be a lot of code churn for what is
in the end a relatively harmless bug.

Thoughts?

            regards, tom lane

diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index bc5251be82..b48a679faf 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -1320,7 +1320,9 @@ lockTableForWorker(ArchiveHandle *AH, TocEntry *te)
     appendPQExpBuffer(query, "LOCK TABLE %s IN ACCESS SHARE MODE NOWAIT",
                       qualId);

-    res = PQexec(AH->connection, query->data);
+    res = PQexecParams(AH->connection, query->data,
+                       0, NULL, NULL, NULL,
+                       NULL, 0);

     if (!res || PQresultStatus(res) != PGRES_COMMAND_OK)
         fatal("could not obtain lock on relation \"%s\"\n"
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 49bf0907cd..85e6fe6523 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -3122,7 +3122,9 @@ _doSetSessionAuth(ArchiveHandle *AH, const char *user)
     {
         PGresult   *res;

-        res = PQexec(AH->connection, cmd->data);
+        res = PQexecParams(AH->connection, cmd->data,
+                           0, NULL, NULL, NULL,
+                           NULL, 0);

         if (!res || PQresultStatus(res) != PGRES_COMMAND_OK)
             /* NOT warn_or_exit_horribly... use -O instead to skip this. */
@@ -3259,7 +3261,9 @@ _selectOutputSchema(ArchiveHandle *AH, const char *schemaName)
     {
         PGresult   *res;

-        res = PQexec(AH->connection, qry->data);
+        res = PQexecParams(AH->connection, qry->data,
+                           0, NULL, NULL, NULL,
+                           NULL, 0);

         if (!res || PQresultStatus(res) != PGRES_COMMAND_OK)
             warn_or_exit_horribly(AH,
@@ -3321,7 +3325,9 @@ _selectTablespace(ArchiveHandle *AH, const char *tablespace)
     {
         PGresult   *res;

-        res = PQexec(AH->connection, qry->data);
+        res = PQexecParams(AH->connection, qry->data,
+                           0, NULL, NULL, NULL,
+                           NULL, 0);

         if (!res || PQresultStatus(res) != PGRES_COMMAND_OK)
             warn_or_exit_horribly(AH,
@@ -3371,7 +3377,9 @@ _selectTableAccessMethod(ArchiveHandle *AH, const char *tableam)
     {
         PGresult   *res;

-        res = PQexec(AH->connection, cmd->data);
+        res = PQexecParams(AH->connection, cmd->data,
+                           0, NULL, NULL, NULL,
+                           NULL, 0);

         if (!res || PQresultStatus(res) != PGRES_COMMAND_OK)
             warn_or_exit_horribly(AH,
diff --git a/src/bin/pg_dump/pg_backup_db.c b/src/bin/pg_dump/pg_backup_db.c
index 3184eda3e7..b621f43933 100644
--- a/src/bin/pg_dump/pg_backup_db.c
+++ b/src/bin/pg_dump/pg_backup_db.c
@@ -279,7 +279,9 @@ ExecuteSqlStatement(Archive *AHX, const char *query)
     ArchiveHandle *AH = (ArchiveHandle *) AHX;
     PGresult   *res;

-    res = PQexec(AH->connection, query);
+    res = PQexecParams(AH->connection, query,
+                       0, NULL, NULL, NULL,
+                       NULL, 0);
     if (PQresultStatus(res) != PGRES_COMMAND_OK)
         die_on_query_failure(AH, query);
     PQclear(res);
@@ -291,7 +293,9 @@ ExecuteSqlQuery(Archive *AHX, const char *query, ExecStatusType status)
     ArchiveHandle *AH = (ArchiveHandle *) AHX;
     PGresult   *res;

-    res = PQexec(AH->connection, query);
+    res = PQexecParams(AH->connection, query,
+                       0, NULL, NULL, NULL,
+                       NULL, 0);
     if (PQresultStatus(res) != status)
         die_on_query_failure(AH, query);
     return res;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 7c0e396ce1..f1e8b0b5c2 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -4574,7 +4574,7 @@ binary_upgrade_set_type_oids_by_type_oid(Archive *fout,
     {
         if (fout->remoteVersion >= 140000)
         {
-            appendPQExpBuffer(upgrade_query,
+            printfPQExpBuffer(upgrade_query,
                               "SELECT t.oid, t.typarray "
                               "FROM pg_catalog.pg_type t "
                               "JOIN pg_catalog.pg_range r "
diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c
index 10383c713f..cfe97bf530 100644
--- a/src/bin/pg_dump/pg_dumpall.c
+++ b/src/bin/pg_dump/pg_dumpall.c
@@ -1667,7 +1667,9 @@ executeQuery(PGconn *conn, const char *query)

     pg_log_info("executing %s", query);

-    res = PQexec(conn, query);
+    res = PQexecParams(conn, query,
+                       0, NULL, NULL, NULL,
+                       NULL, 0);
     if (!res ||
         PQresultStatus(res) != PGRES_TUPLES_OK)
     {
@@ -1690,7 +1692,9 @@ executeCommand(PGconn *conn, const char *query)

     pg_log_info("executing %s", query);

-    res = PQexec(conn, query);
+    res = PQexecParams(conn, query,
+                       0, NULL, NULL, NULL,
+                       NULL, 0);
     if (!res ||
         PQresultStatus(res) != PGRES_COMMAND_OK)
     {

Re: Bogus duplicate command issued in pg_dump

From: "David G. Johnston"
Date:
On Sun, Jan 23, 2022 at 11:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

    res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
    ...
            appendPQExpBuffer(upgrade_query,
                              "SELECT t.oid, t.typarray "
    ...
            res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);

How's that work?

I just spent 10 minutes thinking you were wrong because I confused the upgrade_query and upgrade_buffer variables in that function.

You might just as well have fixed the first upgrade_query command to be print instead of append.  And, better yet, renamed its variable to "array_oid_query" and added a new PQExpBuffer variable "range_oid_query".  Because, is double-purposing a variable here, with a badly chosen generic name, really worth saving one createPQExpBuffer call?  If it is, a name like "oid_query" would be better than leading with "upgrade".  Though I am looking at this function in isolation...

We could consider a more global change to get rid of using
appendPQExpBuffer where it's not absolutely necessary, so that
there are fewer bad examples to copy.  Another idea is to deem
it an anti-pattern to end a query with a semicolon.  But I don't
have a lot of faith in people following those coding rules in
future, either.  It'd also be a lot of code churn for what is
in the end a relatively harmless bug.

Thoughts?


I would avoid overreacting.  The biggest issue would be a case where the previous query used to execute in some situations but an incorrect append prevents that prior execution.  I don't think that is likely to get past review and be committed in practice.  Here it is all new code and, while it has some quality concerns as I noted above, it did work correctly when committed, which isn't surprising.  I don't see enough benefit to warrant refactoring here.

I think a contributing factor here is that upgrade_buffer is designed around using appendPQExpBuffer.  This kind of typo is easy to make given that in most cases it will still produce valid results.  But that design also seems to restrict our ability to do something globally.

David J.

Re: Bogus duplicate command issued in pg_dump

From: Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> I just spent 10 minutes thinking you were wrong because I confused the
> upgrade_query and upgrade_buffer variables in that function.

> You might just as well have fixed the first upgrade_query command to be
> print instead of append.  And, better yet, renamed its variable to
> "array_oid_query" then added a new PQExpBuffer variable "range_oid_query".
> Because, is double-purposing a variable here, with a badly chosen generic
> name, really worth saving a create buffer call?  If it is, naming is
> something like "oid_query" would be better than leading with "upgrade".

Yeah, I was not terribly impressed with the naming choices in that
code either, but I didn't feel like getting into cosmetic changes.
If I were renaming things in that area, I'd start with the function
name --- binary_upgrade_set_type_oids_by_type_oid is long enough
to aggravate carpal-tunnel problems, and yet it's as clear as mud;
what does "by" mean in this context?  Don't expect the function's
comment to tell you, because there is none, another way in which
this code is subpar.

The bigger issue to me is that the behavior of PQexec masks what
seems like a pretty easy mistake to make.  I don't like that,
but I'm not seeing any non-invasive way to improve things.
And, as you say, an invasive change seems like overreaction.

            regards, tom lane



Re: Bogus duplicate command issued in pg_dump

From: Michael Paquier
Date:
On Sun, Jan 23, 2022 at 01:31:03PM -0500, Tom Lane wrote:
> We could consider a more global change to get rid of using
> appendPQExpBuffer where it's not absolutely necessary, so that
> there are fewer bad examples to copy.  Another idea is to deem
> it an anti-pattern to end a query with a semicolon.  But I don't
> have a lot of faith in people following those coding rules in
> future, either.  It'd also be a lot of code churn for what is
> in the end a relatively harmless bug.

Could a backend-side, run-time configurable developer GUC
potentially help here?  This could look for multiple queries in code
paths where we don't want any, once you combine it with a specific
compilation flag à-la-ENFORCE_REGRESSION_TEST.
--
Michael

Attachments

Re: Bogus duplicate command issued in pg_dump

From: "David G. Johnston"
Date:
On Sun, Jan 23, 2022 at 7:25 PM Michael Paquier <michael@paquier.xyz> wrote:
On Sun, Jan 23, 2022 at 01:31:03PM -0500, Tom Lane wrote:
> We could consider a more global change to get rid of using
> appendPQExpBuffer where it's not absolutely necessary, so that
> there are fewer bad examples to copy.  Another idea is to deem
> it an anti-pattern to end a query with a semicolon.  But I don't
> have a lot of faith in people following those coding rules in
> future, either.  It'd also be a lot of code churn for what is
> in the end a relatively harmless bug.

Could a backend-side, run-time configurable developer GUC
potentially help here?  This could look for multiple queries in code
paths where we don't want any, once you combine it with a specific
compilation flag à-la-ENFORCE_REGRESSION_TEST.


I don't see how this helps unless you change the system (under the compilation flag) so that multiple commands are disallowed under the simple query protocol unless the user has indicated they will be sending them, by enabling an allow_multiple_commands_in_simple_query_protocol GUC at the start of their possibly multi-transaction query (I don't even want to consider them toggling it).  Forcing an author to specify when they don't want multiple commands is just going to be ignored, and since no errors will be raised (even under the compiler flag), we will effectively remain at the status quo.

I do not see that alternative mode being practical, let alone a net positive.

Is there some trick in C where you can avoid the buffer?  That is what basically caused the issue.  If one could write:
res = ExecuteSqlQueryForSingleRow(fout, "SELECT typname FROM " ...);
directly, the decision to print or append the buffer would not be necessary, and I would expect one-shot queries to then be written this way, avoiding the observed issue.

I could see having the executor operate in a mode where it logs a warning if a query result is discarded.  But that cannot be unconditional: "SELECT perform_function(); SELECT * FROM table_the_function_just_populated;" discards a result, but because functions must be executed via SELECT, this situation should be ignored.  In short, the setup seems easy enough (I'd hope we can figure out when we've discarded a query result because a new one came after), but defining the exceptions to the rule seems much trickier (we'd then probably want the GUC to suppress false positives that cannot be added to the exceptions).  And in order to catch existing bugs you still have to have confidence in check-world.  But if you have that, then a behavioral bug introduced by this kind of error is sufficiently likely to be caught anyway that the need for this decreases substantially.

So, it seems the time would probably be better spent doing organized code exploration and improving test coverage, if there is a real concern that more bugs of this ilk are out there causing behavioral or meaningful performance issues.  At least for pg_dump we can ostensibly test that the most important outcome isn't violated: that what was dumped gets restored.  I presume we do that, and it seems unlikely we would miss capturing the outcome of a SELECT command.

David J.