Обсуждение: COPY TO returning empty result with parallel ALTER TABLE
Hi all,
we experienced what seems to be a bug in the COPY TO implementation. When a
table is being rewritten by an ALTER TABLE statement, a parallel COPY TO
results in an empty result.
Consider the following table data:
  CREATE TABLE test (id INTEGER NOT NULL, PRIMARY KEY (id));
  INSERT INTO test (id) SELECT generate_series(1, 10000000);
One session does:
  ALTER TABLE test ADD COLUMN dummy BOOLEAN NOT NULL DEFAULT FALSE;
This acquires an exclusive lock to the table.
Another session now performs parallel:
  COPY test TO STDOUT;
This blocks on the exclusive lock held by the ALTER statement. When the ALTER
staement is finished, the COPY statement returns with an empty result. Same
goes for COPY (SELECT ...) TO, whereas the same SELECT executed without COPY
blocks and returns the correct result as expected.
This is my analysis of this issue:
The backend opens the rewritten data files, but it ignores the complete
content, which indicates the data is being ignored because of XIDs. For direct
SELECT statements the top-level query parsing acquires locks on involved tables
and creates a new snapshot, but for COPY statements the parsing and locking is
done later in COPY code. After locking the tables in COPY code, the data
is read with an old snapshot, effectively ignoring all data from the rewritten
table.
I've check git master and 9.x and all show the same behaviour. I came up with
the patch below, which is against curent git master. The patch modifies the
COPY TO code to create a new snapshot, after acquiring the necessary locks on
the source tables, so that it sees any modification commited by other backends.
Despite thinking this is the correct solution, another solution or
optimization would be to have ALTER TABLE, which holds the highest lock level,
set the XID of rewritten tuples to the FrozenXID, as no other backend should
access the table before the ALTER TABLE is committed.
Sven
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6b83576..fe2d157 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1344,6 +1344,13 @@ BeginCopy(bool is_from,
                     (errcode(ERRCODE_UNDEFINED_COLUMN),
                      errmsg("table \"%s\" does not have OIDs",
                             RelationGetRelationName(cstate->rel))));
+
+        /*
+         * Use a new snapshot to ensure this query sees
+         * results of any previously executed queries.
+         */
+        if (!is_from)
+            PushActiveSnapshot(GetTransactionSnapshot());
     }
     else
     {
@@ -1394,11 +1401,10 @@ BeginCopy(bool is_from,
         plan = planner(query, 0, NULL);
         /*
-         * Use a snapshot with an updated command ID to ensure this query sees
+         * Use a new snapshot to ensure this query sees
          * results of any previously executed queries.
          */
-        PushCopiedSnapshot(GetActiveSnapshot());
-        UpdateActiveSnapshotCommandId();
+        PushActiveSnapshot(GetTransactionSnapshot());
         /* Create dest receiver for COPY OUT */
         dest = CreateDestReceiver(DestCopyOut);
@@ -1741,9 +1747,11 @@ EndCopyTo(CopyState cstate)
         ExecutorFinish(cstate->queryDesc);
         ExecutorEnd(cstate->queryDesc);
         FreeQueryDesc(cstate->queryDesc);
-        PopActiveSnapshot();
     }
+    /* Discard snapshot */
+    PopActiveSnapshot();
+
     /* Clean up storage */
     EndCopy(cstate);
 }
			
		
--On 3. November 2014 18:15:04 +0100 Sven Wegener
<sven.wegener@stealer.net> wrote:
> I've check git master and 9.x and all show the same behaviour. I came up
> with the patch below, which is against curent git master. The patch
> modifies the COPY TO code to create a new snapshot, after acquiring the
> necessary locks on the source tables, so that it sees any modification
> commited by other backends.
Well, i have the feeling that there's nothing wrong with it. The ALTER
TABLE command has rewritten all tuples with its own XID, thus the current
snapshot does not "see" these tuples anymore. I suppose that in
SERIALIZABLE or REPEATABLE READ transaction isolation your proposal still
doesn't return the tuples you'd like to see.
--
Thanks
    Bernd
			
		Bernd Helmle <mailings@oopsware.de> writes:
> --On 3. November 2014 18:15:04 +0100 Sven Wegener
> <sven.wegener@stealer.net> wrote:
>> I've check git master and 9.x and all show the same behaviour. I came up
>> with the patch below, which is against curent git master. The patch
>> modifies the COPY TO code to create a new snapshot, after acquiring the
>> necessary locks on the source tables, so that it sees any modification
>> commited by other backends.
> Well, i have the feeling that there's nothing wrong with it. The ALTER
> TABLE command has rewritten all tuples with its own XID, thus the current
> snapshot does not "see" these tuples anymore. I suppose that in
> SERIALIZABLE or REPEATABLE READ transaction isolation your proposal still
> doesn't return the tuples you'd like to see.
Not sure.  The OP's point is that in a SELECT, you do get unsurprising
results, because a SELECT will acquire its execution snapshot after it's
gotten AccessShareLock on the table.  Arguably COPY should behave likewise.
Or to be even more concrete, COPY (SELECT * FROM tab) TO ... probably
already acts like he wants, so why isn't plain COPY equivalent to that?
What concerns me more is whether there are other utility statements that
have comparable issues.  We should not fix just COPY if there are more
cases.
            regards, tom lane
			
		On 11/04/2014 01:51 PM, Tom Lane wrote: > Bernd Helmle <mailings@oopsware.de> writes: >> --On 3. November 2014 18:15:04 +0100 Sven Wegener >> <sven.wegener@stealer.net> wrote: >>> I've check git master and 9.x and all show the same behaviour. I came up >>> with the patch below, which is against curent git master. The patch >>> modifies the COPY TO code to create a new snapshot, after acquiring the >>> necessary locks on the source tables, so that it sees any modification >>> commited by other backends. >> Well, i have the feeling that there's nothing wrong with it. The ALTER >> TABLE command has rewritten all tuples with its own XID, thus the current >> snapshot does not "see" these tuples anymore. I suppose that in >> SERIALIZABLE or REPEATABLE READ transaction isolation your proposal still >> doesn't return the tuples you'd like to see. > Not sure. The OP's point is that in a SELECT, you do get unsurprising > results, because a SELECT will acquire its execution snapshot after it's > gotten AccessShareLock on the table. Arguably COPY should behave likewise. > Or to be even more concrete, COPY (SELECT * FROM tab) TO ... probably > already acts like he wants, so why isn't plain COPY equivalent to that? Yes, that seems like an outright bug. cheers andrew
On Tue, 4 Nov 2014, Bernd Helmle wrote: > --On 3. November 2014 18:15:04 +0100 Sven Wegener <sven.wegener@stealer.net> > wrote: > > > I've check git master and 9.x and all show the same behaviour. I came up > > with the patch below, which is against curent git master. The patch > > modifies the COPY TO code to create a new snapshot, after acquiring the > > necessary locks on the source tables, so that it sees any modification > > commited by other backends. > > Well, i have the feeling that there's nothing wrong with it. The ALTER TABLE > command has rewritten all tuples with its own XID, thus the current snapshot > does not "see" these tuples anymore. I suppose that in SERIALIZABLE or > REPEATABLE READ transaction isolation your proposal still doesn't return the > tuples you'd like to see. No, but with REPEATABLE READ and SERIALIZABLE the plain SELECT case is currently broken too. The issue I'm fixing is that COPY (SELECT ...) TO and SELECT behave differently. So, how does an ALTER TABLE should behave transaction-wise? PostgreSQL claims transactional DDL support. As mentioned above a parallel SELECT with SERIALIZABLE returns an empty result, but it also sees the schema change, at least the change shows up in the result tuple description. The change doesn't show up in pg_catalog, so that bit is handled correctly. The schema change transaction can be rolled back the way it is now, so it's kind of transactional, but other transactions seeing the schema change in query results is broken. The empty result might be fixed by just keeping the old XID of rewritten tuples, as the exclusive lock ALTER TABLE helds should be enough to make sure nobody is actively accessing the rewritten table. But I'm pondering if there is a solution for the visible schema change that doesn't involve keeping the old data files around or to just raise an serialization error. Coming back on my mentioned solution of setting the XID of rewritten tuples to the FrozenXID. That's broken as in the REPEATABLE READ/SERIALIZABLE case there might be tuples that should not be seen by older transactions and moving the tuples to the FrozenXID would make them visible. Sven
On Tue, 4 Nov 2014, Tom Lane wrote: > Bernd Helmle <mailings@oopsware.de> writes: > > --On 3. November 2014 18:15:04 +0100 Sven Wegener > > <sven.wegener@stealer.net> wrote: > >> I've check git master and 9.x and all show the same behaviour. I came up > >> with the patch below, which is against curent git master. The patch > >> modifies the COPY TO code to create a new snapshot, after acquiring the > >> necessary locks on the source tables, so that it sees any modification > >> commited by other backends. > > > Well, i have the feeling that there's nothing wrong with it. The ALTER > > TABLE command has rewritten all tuples with its own XID, thus the current > > snapshot does not "see" these tuples anymore. I suppose that in > > SERIALIZABLE or REPEATABLE READ transaction isolation your proposal still > > doesn't return the tuples you'd like to see. > > Not sure. The OP's point is that in a SELECT, you do get unsurprising > results, because a SELECT will acquire its execution snapshot after it's > gotten AccessShareLock on the table. Arguably COPY should behave likewise. > Or to be even more concrete, COPY (SELECT * FROM tab) TO ... probably > already acts like he wants, so why isn't plain COPY equivalent to that? No, both plain COPY and COPY (SELECT ...) TO are broken. Sven
On 2014-11-04 13:51:23 -0500, Tom Lane wrote: > Bernd Helmle <mailings@oopsware.de> writes: > > --On 3. November 2014 18:15:04 +0100 Sven Wegener > > <sven.wegener@stealer.net> wrote: > >> I've check git master and 9.x and all show the same behaviour. I came up > >> with the patch below, which is against curent git master. The patch > >> modifies the COPY TO code to create a new snapshot, after acquiring the > >> necessary locks on the source tables, so that it sees any modification > >> commited by other backends. > > > Well, i have the feeling that there's nothing wrong with it. The ALTER > > TABLE command has rewritten all tuples with its own XID, thus the current > > snapshot does not "see" these tuples anymore. I suppose that in > > SERIALIZABLE or REPEATABLE READ transaction isolation your proposal still > > doesn't return the tuples you'd like to see. > > Not sure. The OP's point is that in a SELECT, you do get unsurprising > results, because a SELECT will acquire its execution snapshot after it's > gotten AccessShareLock on the table. Arguably COPY should behave likewise. > Or to be even more concrete, COPY (SELECT * FROM tab) TO ... probably > already acts like he wants, so why isn't plain COPY equivalent to that? Even a plain SELECT essentially acts that way if I recall correctly if you use REPEATABLE READ+ and force a snapshot to be acquired beforehand. It's imo not very surprising. All ALTER TABLE rewrites just disregard visibility of existing tuples. Only CLUSTER/VACUUM FULL do the full hangups to keep all the necessary tuples + ctid chains around. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-11-04 13:51:23 -0500, Tom Lane wrote:
>> Not sure.  The OP's point is that in a SELECT, you do get unsurprising
>> results, because a SELECT will acquire its execution snapshot after it's
>> gotten AccessShareLock on the table.  Arguably COPY should behave likewise.
>> Or to be even more concrete, COPY (SELECT * FROM tab) TO ... probably
>> already acts like he wants, so why isn't plain COPY equivalent to that?
> Even a plain SELECT essentially acts that way if I recall correctly if
> you use REPEATABLE READ+ and force a snapshot to be acquired
> beforehand. It's imo not very surprising.
"It doesn't fail in a non-default isolation mode" is hardly much of an
argument for this being okay in READ COMMITTED.
> All ALTER TABLE rewrites just disregard visibility of existing
> tuples. Only CLUSTER/VACUUM FULL do the full hangups to keep all the
> necessary tuples + ctid chains around.
Yeah, and I think that it's entirely reasonable for rewriting ALTER TABLEs
to update the xmin of the rewritten tuples; after all, the output data
could be arbitrarily different from what the previous transactions put
into the table.  But that is not the question here.  If the COPY blocks
until the ALTER completes --- as it must --- why is its execution snapshot
not taken *after* the lock is acquired?
            regards, tom lane
			
		Re: [GENERAL] Re: [HACKERS] COPY TO returning empty result with parallel ALTER TABLE
От
 
		    	Bernd Helmle
		    Дата:
		        
--On 4. November 2014 17:18:14 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah, and I think that it's entirely reasonable for rewriting ALTER TABLEs
> to update the xmin of the rewritten tuples; after all, the output data
> could be arbitrarily different from what the previous transactions put
> into the table.  But that is not the question here.  If the COPY blocks
> until the ALTER completes --- as it must --- why is its execution snapshot
> not taken *after* the lock is acquired?
COPY waits in DoCopy() but its snapshot gets acquired in PortalRunUtility()
earlier. SELECT has it's lock already during transform/analyse phase and
its snapshot is taken much later.  Looks like we need something that
analyses a utility statement to get its lock before executing.
--
Thanks
    Bernd