Обсуждение: Weird behaviour with ALTER TABLE ... SET TABLESPACE ... statement
Hi, I just found a weird behaviour with this statement. Here is a complete log of my session with a 8.3(.4) server: guillaume@laptop$ mkdir /home/guillaume/ts1 guillaume@laptop$ createdb db1 guillaume@laptop$ LANG=C psql db1 Welcome to psql 8.3.4, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help with psql commands \g orterminate with semicolon to execute query \q to quit db1=# create table t1 (id integer); CREATE TABLE db1=# create tablespace ts1 location '/home/guillaume/ts1'; CREATE TABLESPACE db1=# insert into t1 values (1); INSERT 0 1 db1=# select pg_database.oid as db_folder, relfilenode as table_filename db1-# from pg_database, pg_class where datname='db1' and relname='t1';db_folder | table_filename -----------+---------------- 74472 | 74475 (1 row) db1=# \! ls -l /opt/postgresql-8.3/data/base/74472/74475 -rw------- 1 guillaume guillaume 8192 Oct 6 10:59 /opt/postgresql-8.3/data/base/74472/74475 Till now, everything seems good. I have one 8k file for my table t1. db1=# alter table t1 set tablespace ts1; ALTER TABLE /opt/postgresql-8.3/data/base/74472/74475 db1=# \! ls -l /home/guillaume/ts1/74472/74475 -rw------- 1 guillaume guillaume 8192 Oct 6 11:00 /home/guillaume/ts1/74472/74475 My table moved to my own tablespace. db1=# \! ls -l /opt/postgresql-8.3/data/base/74472/74475 -rw------- 1 guillaume guillaume 0 Oct 6 11:00 This seems weird. I expected to have no file 74475 in the pg_default tablespace after the ALTER TABLE. Of course, now, I can't get my table back on the previous tablespace. db1=# alter table t1 set tablespace pg_default; ERROR: could not create relation 1663/74472/74475: File exists db1=# alter table t1 set tablespace pg_default; ERROR: could not create relation 1663/74472/74475: File exists I finally discovered that a CHECKPOINT resolves my issue. db1=# checkpoint; CHECKPOINT And I can move my table back to the previous tablespace. db1=# alter table t1 set tablespace pg_default; ALTER TABLE db1=# \! ls -l /opt/postgresql-8.3/data/base/74472/74475 -rw------- 1 guillaume guillaume 8192 Oct 6 11:01 /opt/postgresql-8.3/data/base/74472/74475 db1=# \! ls -l /home/guillaume/ts1/74472/74475 -rw------- 1 guillaume guillaume 0 Oct 6 11:01 /home/guillaume/ts1/74472/74475 This behaviour happens on the REL8_3_STABLE and HEAD branches. In REL8_2_STABLE, it works as I expected it to work : guillaume@laptop$ mkdir /home/guillaume/ts2 guillaume@laptop$ createdb db2 CREATE DATABASE guillaume@laptop$ psql db2 Bienvenue dans psql 8.2.10, l'interface interactive de PostgreSQL. Saisissez: \copyright pour les termes de distribution \h pour l'aide-mémoire des commandes SQL \? pour l'aide-mémoiredes commandes psql \g ou point-virgule en fin d'instruction pour exécuter la requête \q pour quitter db2=# create table t2 (id integer); CREATE TABLE db2=# create tablespace ts2 location '/home/guillaume/ts2'; CREATE TABLESPACE db2=# insert into t2 values (1); INSERT 0 1 db2=# select pg_database.oid as db_folder, relfilenode as table_filename from pg_database, pg_class where datname='db2' and relname='t2';db_folder | table_filename -----------+---------------- 47944 | 47945 (1 ligne) db2=# \! ls -l /opt/postgresql-8.2/data/base/47944/47945 -rw------- 1 guillaume guillaume 8192 2008-10-06 11:12 /opt/postgresql-8.2/data/base/47944/47945 db2=# alter table t2 set tablespace ts2; ALTER TABLE db2=# \! ls -l /opt/postgresql-8.2/data/base/47944/47945 ls: cannot access /opt/postgresql-8.2/data/base/47944/47945: No such file or directory db2=# \! ls -l /home/guillaume/ts2/47944/47945 -rw------- 1 guillaume guillaume 8192 2008-10-06 11:13 /home/guillaume/ts2/47944/47945 db2=# alter table t2 set tablespace pg_default; ALTER TABLE db2=# \! ls -l /home/guillaume/ts2/47944/47945 ls: cannot access /home/guillaume/ts2/47944/47945: No such file or directory db2=# \! ls -l /opt/postgresql-8.2/data/base/47944/47945 -rw------- 1 guillaume guillaume 8192 2008-10-06 11:13 /opt/postgresql-8.2/data/base/47944/47945 I don't need a checkpoint to be able to move my table back. It doesn't seem a big issue because checkpoints are issued frequently but it deserves to get fixed. I looked a bit at the source code. The old file gets in a queue of to-be-removed files (see smgrscheduleunlink() function in storage/smgr/smgr.c). But I failed to see where it really gets deleted. I would welcome any pointer. Regards. -- Guillaume.http://www.postgresqlfr.orghttp://dalibo.com
Guillaume Lelarge wrote: > db1=# alter table t1 set tablespace ts1; > ALTER TABLE > /opt/postgresql-8.3/data/base/74472/74475 > db1=# \! ls -l /home/guillaume/ts1/74472/74475 > -rw------- 1 guillaume guillaume 8192 Oct 6 11:00 > /home/guillaume/ts1/74472/74475 > > My table moved to my own tablespace. > > db1=# \! ls -l /opt/postgresql-8.3/data/base/74472/74475 > -rw------- 1 guillaume guillaume 0 Oct 6 11:00 > > This seems weird. I expected to have no file 74475 in the pg_default > tablespace after the ALTER TABLE. > > Of course, now, I can't get my table back on the previous tablespace. > > db1=# alter table t1 set tablespace pg_default; > ERROR: could not create relation 1663/74472/74475: File exists > db1=# alter table t1 set tablespace pg_default; > ERROR: could not create relation 1663/74472/74475: File exists > > I finally discovered that a CHECKPOINT resolves my issue. Hmm. We force a checkpoint in dropdb() for similar reasons. > It doesn't seem a big issue because checkpoints are issued frequently > but it deserves to get fixed. I looked a bit at the source code. The old > file gets in a queue of to-be-removed files (see smgrscheduleunlink() > function in storage/smgr/smgr.c). But I failed to see where it really > gets deleted. I would welcome any pointer. In mdpostckpt(). The trivial fix is to just force a checkpoint in ALTER TABLE SET TABLESPACE. Can we do better than that? Perhaps only force a checkpoint when we find that the file already exists. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > The trivial fix is to just force a checkpoint in ALTER TABLE SET > TABLESPACE. Can we do better than that? Perhaps only force a checkpoint > when we find that the file already exists. If ALTER TABLE SET TABLESPACE is assuming that it can always use the same relfilenode number in the new space as in the old, it's just plain broken. We need to fix that assumption. regards, tom lane
Tom Lane a écrit : > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> The trivial fix is to just force a checkpoint in ALTER TABLE SET >> TABLESPACE. Can we do better than that? Perhaps only force a checkpoint >> when we find that the file already exists. > > If ALTER TABLE SET TABLESPACE is assuming that it can always use the > same relfilenode number in the new space as in the old, it's just plain > broken. We need to fix that assumption. > Do you mean the backend should change the relfilenode number whenever we use the ALTER TABLE SET TABLESPACE ? I can't think of a way to get twice the same relfilenode. Perhaps a better question would be : how the next relfilenode is computed? -- Guillaume.http://www.postgresqlfr.orghttp://dalibo.com
Guillaume Lelarge wrote: > Tom Lane a écrit : >> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >>> The trivial fix is to just force a checkpoint in ALTER TABLE SET >>> TABLESPACE. Can we do better than that? Perhaps only force a checkpoint >>> when we find that the file already exists. >> If ALTER TABLE SET TABLESPACE is assuming that it can always use the >> same relfilenode number in the new space as in the old, it's just plain >> broken. We need to fix that assumption. > > Do you mean the backend should change the relfilenode number whenever we > use the ALTER TABLE SET TABLESPACE ? I can't think of a way to get twice > the same relfilenode. > > Perhaps a better question would be : how the next relfilenode is computed? 1. Get the next OID from the counter. 2. Check if a file with that name exists. If it does, goto 1. Yeah, seems like we need to allocate a new relfilenode in the new tablespace. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > Yeah, seems like we need to allocate a new relfilenode in the new > tablespace. I looked into tablecmds.c and verified that ATExecSetTableSpace doesn't worry about selecting a new relfilenode. I'm also noticing a number of permissions-type checks that seem like they'd better be done in ATPrepSetTableSpace, because we don't go through ATExecSetTableSpace if the table requires rewriting for other reasons. All in all this code seems to need more careful review than it's gotten so far. You want to do it? regards, tom lane
Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> Yeah, seems like we need to allocate a new relfilenode in the new >> tablespace. > > I looked into tablecmds.c and verified that ATExecSetTableSpace doesn't > worry about selecting a new relfilenode. I'm also noticing a number of > permissions-type checks that seem like they'd better be done in > ATPrepSetTableSpace, because we don't go through ATExecSetTableSpace > if the table requires rewriting for other reasons. All in all this > code seems to need more careful review than it's gotten so far. > You want to do it? I'll give it a shot. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> Yeah, seems like we need to allocate a new relfilenode in the new >> tablespace. > > I looked into tablecmds.c and verified that ATExecSetTableSpace doesn't > worry about selecting a new relfilenode. I'm also noticing a number of > permissions-type checks that seem like they'd better be done in > ATPrepSetTableSpace, because we don't go through ATExecSetTableSpace > if the table requires rewriting for other reasons. The same tests are performed in the rewriting code path in ATRewriteTables() and in heap_create_with_catalog(). I fixed the relfilenode allocation in 8.1-HEAD. Doesn't seem worth fixing in 8.0, because GetNewRelFileNode() didn't exist before 8.1, so we couldn't check for collisions anyway. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas a écrit : > Tom Lane wrote: >> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >>> Yeah, seems like we need to allocate a new relfilenode in the new >>> tablespace. >> >> I looked into tablecmds.c and verified that ATExecSetTableSpace doesn't >> worry about selecting a new relfilenode. I'm also noticing a number of >> permissions-type checks that seem like they'd better be done in >> ATPrepSetTableSpace, because we don't go through ATExecSetTableSpace >> if the table requires rewriting for other reasons. > > The same tests are performed in the rewriting code path in > ATRewriteTables() and in heap_create_with_catalog(). > > I fixed the relfilenode allocation in 8.1-HEAD. Doesn't seem worth > fixing in 8.0, because GetNewRelFileNode() didn't exist before 8.1, so > we couldn't check for collisions anyway. > Thanks. It works great! -- Guillaume.http://www.postgresqlfr.orghttp://dalibo.com