Thread: BUG #11867: Strange behaviour with composite types after resetting database tablespace

From: marc@bloodnok.com
Date:
The following bug has been logged on the website:

Bug reference:      11867
Logged by:          Marc Munro
Email address:      marc@bloodnok.com
PostgreSQL version: 9.3.5
Operating system:   Linux Debian 7.6 (Wheezy) on x86_64
Description:

I have a script (below) that sets and resets the tablespace for a database
and drops and recreates a composite type.  The script fails when trying to
re-create the previously dropped composite type because the type has
apparently been magically resuscitated by changing the tablespace for the
database.  I assume this is a bug.  I have reproduced this on postgres 8.4
and 9.3.

This is not mission-critical for me but I'd be grateful for suggestions
for work-arounds.

I am running Debian Wheezy (release 7.6) on x86_64, with the following
postgres versions:

PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (Debian
4.7.2-5) 4.7.2, 64-bit
PostgreSQL 8.4.22 on x86_64-unknown-linux-gnu, compiled by GCC gcc (Debian
4.7.2-5) 4.7.2, 64-bit

---Shell script follows---

TBS3="/var/lib/postgresql/8.4/main/tbs3"
#TBS3="/var/lib/postgresql/9.3/main/tbs3"

psql -d postgres -v tbs3=$TBS3 <<'CLUSTEREOF'
\set tbs3dir '''':tbs3''''
create tablespace "tbs3" location :tbs3dir;
create database "regressdb" with encoding 'UTF8'
  tablespace "tbs3" connection limit = -1;
CLUSTEREOF

psql -d regressdb <<'DBEOF'
create type "public"."vv2_t" as (
  "name"   "pg_catalog"."text",
  "type"   "pg_catalog"."text",
  "shared" "pg_catalog"."bool"
);
DBEOF

psql -d postgres <<'CLUSTEREOF'
alter database regressdb set tablespace pg_default;
CLUSTEREOF

psql -d regressdb <<'DBEOF'
drop type public.vv2_t;
DBEOF

psql -d postgres <<'CLUSTEREOF'
alter database regressdb set tablespace tbs3;
CLUSTEREOF

psql -d regressdb <<'DBEOF'
create type public.vv2_t as (
  name   pg_catalog.text,
  type   pg_catalog.text,
  shared pg_catalog.bool);
DBEOF

---Shell script ends---

__
Marc
marc@bloodnok.com writes:
> I have a script (below) that sets and resets the tablespace for a database
> and drops and recreates a composite type.  The script fails when trying to
> re-create the previously dropped composite type because the type has
> apparently been magically resuscitated by changing the tablespace for the
> database.  I assume this is a bug.

<spock>Fascinating.</spock>

I believe what is happening is:

* You create the composite type.  This results in entries in the database's
pg_class and pg_type tables.  These entries are in blocks of those tables
that are in shared buffers, with hashtag RelFileNodes that mention the
database's original tablespace (tbs3).

* You move the database to another tablespace.  We faithfully flush the
dirty catalog blocks to disk and then copy them somewhere else on disk.

* You drop the composite type.  This entails fetching in relevant blocks
of the database's pg_class and pg_type tables from their new tablespace
and marking the catalog rows deleted.  This happens in shared buffers that
have hashtags mentioning the database's new tablespace (pg_default, in
this example).

* You move the database back to its original tablespace.  We faithfully
flush the dirty catalog blocks to disk and then copy them back to the
original on-disk location.

However: at no point in this sequence did we flush the original catalog
blocks out of shared buffers.  Therefore, a read of the database's
pg_class or pg_type at this point is likely to find the previous states of
those pages (pre-composite-type-drop) as valid, matching entries, so that
we skip reading the up-to-date page contents back from disk.

A quick fix for this would be to issue DropDatabaseBuffers during
movedb(), though I'm not sure offhand where in that sequence is the safest
place for it.  We could be more restrictive and only drop buffers that
belong to the source tablespace, but I'm not sure it's worth any extra
code to do so.
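[Editor's note] The four-step sequence above can be sketched as a toy model of a
buffer cache whose lookup key includes the tablespace.  This is purely
illustrative Python, not PostgreSQL's actual structures; the names
(`buffers`, `disk`, `move_db`, etc.) are invented for the sketch:

```python
buffers = {}   # (tablespace, relation, blockno) -> cached page
disk = {}      # same key -> authoritative on-disk page

def read_page(tag):
    # Buffer-cache lookup: a tag hit means no disk read happens at all.
    if tag not in buffers:
        buffers[tag] = disk[tag]
    return buffers[tag]

def write_page(tag, data):
    buffers[tag] = data
    disk[tag] = data                # pretend every write is flushed at once

def move_db(src_tbs, dst_tbs):
    # movedb() sketch: copy the on-disk files to the new tablespace,
    # but (pre-fix) leave the old tablespace's buffers alone.
    for (tbs, rel, blk) in list(disk):
        if tbs == src_tbs:
            disk[(dst_tbs, rel, blk)] = disk.pop((tbs, rel, blk))

# 1. create the type while the database lives in tbs3
write_page(('tbs3', 'pg_type', 0), {'vv2_t': 'exists'})
# 2. move to pg_default, then 3. drop the type there
move_db('tbs3', 'pg_default')
write_page(('pg_default', 'pg_type', 0), {})
# 4. move back: the disk now says the type is gone ...
move_db('pg_default', 'tbs3')
assert disk[('tbs3', 'pg_type', 0)] == {}
# ... but the stale tbs3-tagged buffer from step 1 still wins:
stale = read_page(('tbs3', 'pg_type', 0))
print(stale)                        # {'vv2_t': 'exists'} -- resurrected

# The proposed fix amounts to a DropDatabaseBuffers() inside movedb():
buffers.clear()
print(read_page(('tbs3', 'pg_type', 0)))   # {} -- the type really is gone
```

Dropping the buffers forces the next catalog read to go back to disk,
which is why the stale pre-drop page can no longer be found.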
> This is not mission-critical for me but I'd be grateful for suggestions
> for work-arounds.

I don't see any workaround that's much easier than fixing the bug.
But what's your use case that involves flipping databases from one
tablespace to another and then back again within the life expectancy of
unused shared-buffers pages?  It doesn't seem terribly surprising that
nobody noticed this before ...

			regards, tom lane
On Mon, 2014-11-03 at 22:08 -0500, Tom Lane wrote:
> <spock>Fascinating.</spock>

:-)

> I believe what is happening is: [ . . . ]

> > This is not mission-critical for me but I'd be grateful for suggestions
> > for work-arounds.
>
> I don't see any workaround that's much easier than fixing the bug.
> But what's your use case that involves flipping databases from one
> tablespace to another and then back again within the life expectancy of
> unused shared-buffers pages?  It doesn't seem terribly surprising that
> nobody noticed this before ...

It's a completely unreal use case.  I am working on a diff tool, and this
was found as part of my test suite: build db x, drop db x, build db y,
apply diff y->x, compare with original x, apply diff x->y, compare with
original y.

So this is a fairly minor inconvenience for me.

__
Marc
I wrote:
> However: at no point in this sequence did we flush the original catalog
> blocks out of shared buffers.  Therefore, a read of the database's
> pg_class or pg_type at this point is likely to find the previous states of
> those pages (pre-composite-type-drop) as valid, matching entries, so that
> we skip reading the up-to-date page contents back from disk.

> A quick fix for this would be to issue DropDatabaseBuffers during
> movedb()

BTW, I wondered whether the exact same hazard didn't exist for ALTER TABLE
SET TABLESPACE.  At first glance it looks bad, because ATExecSetTableSpace
doesn't forcibly drop old shared buffers for the moved table either.
Closer examination says we're safe though:

* The buffers will be dropped during smgrdounlink at end of transaction,
so you could only be at risk if you moved a table, modified it, and moved
it back in a single transaction.

* A new relfilenode will be assigned during each movement.

* When assigning a relfilenode during the move-back, we'd be certain to
choose a relfilenode different from the original one, because the old
relation still exists on-disk at this point.

So it's safe as things stand; but this seems a bit, um, rickety.  Should
we insert a DropRelFileNodesAllBuffers call into ATExecSetTableSpace to
make it safer?  It's kind of annoying to have to scan the buffer pool
twice, but I'm afraid that sometime in the future somebody will try to get
clever about optimizing away the end-of-transaction buffer pool scan, and
break this case.  Or maybe I'm just worrying too much.

			regards, tom lane
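[Editor's note] The safety argument above — each move assigns a fresh relfilenode,
and the move-back cannot reuse the original node because its file still
exists on disk until commit — can be sketched in miniature.  Again this is
an illustrative toy model, not PostgreSQL code; every name in it is
invented for the sketch:

```python
import itertools

# Relfilenodes whose files are still present on disk.  A moved-away
# relation's old file lingers until end of transaction, which is exactly
# what keeps its node number from being handed out again.
on_disk = set()
_candidates = itertools.count(16384)    # arbitrary starting node number

def new_relfilenode():
    # Allocation loop in the spirit of the real allocator: keep trying
    # candidates, skipping any that still have a file on disk.
    for node in _candidates:
        if node not in on_disk:
            return node

def set_tablespace(old_node):
    # Move sketch: copy the data into a file under a fresh node; the file
    # for `old_node` is only unlinked at commit, so it stays in on_disk
    # and continues to block reuse of that number.
    new_node = new_relfilenode()
    on_disk.add(new_node)
    return new_node

orig = new_relfilenode(); on_disk.add(orig)   # CREATE TABLE
moved = set_tablespace(orig)                  # move away ...
back = set_tablespace(moved)                  # ... and back, same transaction

# Stale buffers from the original incarnation are tagged with `orig`,
# and the move-back can never pick `orig` while its file still exists:
assert back not in (orig, moved)
print(orig, moved, back)
```

Because the buffer tags of the table's old pages all carry `orig`, and no
new incarnation can be assigned `orig` within the transaction, the stale
pages can never match a lookup — which is the crux of why ALTER TABLE SET
TABLESPACE dodges the bug that movedb() had.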
marc@bloodnok.com writes:
> I have a script (below) that sets and resets the tablespace for a database
> and drops and recreates a composite type.  The script fails when trying to
> re-create the previously dropped composite type because the type has
> apparently been magically resuscitated by changing the tablespace for the
> database.  I assume this is a bug.

I've confirmed that this patch fixes the described behavior:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=33f80f8480434f02beb940b8f0627d534f3fd3af

			regards, tom lane
On 11/4/14, 10:45 AM, Tom Lane wrote:
> So it's safe as things stand; but this seems a bit, um, rickety.  Should
> we insert a DropRelFileNodesAllBuffers call into ATExecSetTableSpace to
> make it safer?  It's kind of annoying to have to scan the buffer pool
> twice, but I'm afraid that sometime in the future somebody will try to get
> clever about optimizing away the end-of-transaction buffer pool scan, and
> break this case.  Or maybe I'm just worrying too much.

We could possibly remember what relations have been moved to different
tablespaces in a transaction and avoid the call in that case, but that
seems rather silly.

If there's any non-trivial actual data involved then presumably the
buffer scan won't be noticed amidst all the IO, so this would only be an
issue if you're playing games with mostly empty tables, and only then if
you've got really large shared buffers.  I can't actually come up with
any real use case for that.

So +1 for doing the safe thing...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On Tue, Nov 4, 2014 at 11:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> However: at no point in this sequence did we flush the original catalog
>> blocks out of shared buffers.  Therefore, a read of the database's
>> pg_class or pg_type at this point is likely to find the previous states of
>> those pages (pre-composite-type-drop) as valid, matching entries, so that
>> we skip reading the up-to-date page contents back from disk.
>
>> A quick fix for this would be to issue DropDatabaseBuffers during
>> movedb()
>
> BTW, I wondered whether the exact same hazard didn't exist for ALTER TABLE
> SET TABLESPACE.  At first glance it looks bad, because ATExecSetTableSpace
> doesn't forcibly drop old shared buffers for the moved table either.
> Closer examination says we're safe though:
>
> * The buffers will be dropped during smgrdounlink at end of transaction,
> so you could only be at risk if you moved a table, modified it, and moved
> it back in a single transaction.
>
> * A new relfilenode will be assigned during each movement.
>
> * When assigning a relfilenode during the move-back, we'd be certain to
> choose a relfilenode different from the original one, because the old
> relation still exists on-disk at this point.
>
> So it's safe as things stand; but this seems a bit, um, rickety.  Should
> we insert a DropRelFileNodesAllBuffers call into ATExecSetTableSpace to
> make it safer?  It's kind of annoying to have to scan the buffer pool
> twice, but I'm afraid that sometime in the future somebody will try to get
> clever about optimizing away the end-of-transaction buffer pool scan, and
> break this case.  Or maybe I'm just worrying too much.

I think you're worrying too much.  Adding a comment to the code that
drives the end-of-transaction buffer pool scan to warn future optimizers
not to be too clever might be warranted; adding run-time overhead to
avoid a bug we don't have seems excessive.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company