RE: "unexpected duplicate for tablespace" problem in logical replication
From | osumi.takamichi@fujitsu.com |
---|---|
Subject | RE: "unexpected duplicate for tablespace" problem in logical replication |
Date | |
Msg-id | TYCPR01MB8373616AF8BA3C819535B998EDEA9@TYCPR01MB8373.jpnprd01.prod.outlook.com |
In response to | RE: "unexpected duplicate for tablespace" problem in logical replication ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>) |
Responses | Re: "unexpected duplicate for tablespace" problem in logical replication ("wangsh.fnst@fujitsu.com" <wangsh.fnst@fujitsu.com>) |
List | pgsql-bugs |
On Friday, April 8, 2022 6:44 PM I wrote:
> On Wednesday, April 6, 2022 11:14 AM wangsh.fnst@fujitsu.com
> <wangsh.fnst@fujitsu.com> wrote:
> > I met a problem while using logical replication in PG11 and I think
> > all PG versions have this problem.
> >
> > The log looks like:
> > > ERROR: unexpected duplicate for tablespace 0, relfilenode xxxxxxx
> > Someone also reported this problem in [1], but no one has responded to it.
> >
> > I did some investigation, and found a way to reproduce this problem.
> > The steps are:
> >
> > 1. create a table (call it tableX) and truncate it.
> >
> > 2. cycle through 2^32 OIDs.
> >
> > 3. restart the database to clear all the cache.
> >
> > 4. create a temp table such that the temp table's OID equals
> > tableX's relfilenode, and insert any data into tableX.
> >
> > The attachment (run.sh) can reproduce this problem in PG10 and PG11 with
> > the help of option 'WITH OIDS'. I didn't find any way to cycle the OIDs
> > quickly in branch master, but I used gdb to reproduce this problem there too.
> >
> > Now, function GetNewRelFileNode() only checks:
> >
> > 1. duplicated OIDs in pg_class.
> >
> > 2. whether relpath(rnode) exists on disk.
> >
> > However, the results of relpath(temp table) and relpath(non-temp table)
> > are different: a temp table's relpath() has a prefix "t%d". That means,
> > if there is a table whose relfilenode is 20000 (but whose oid
> > isn't 20000), it's possible to create a temp table whose
> > relfilenode is also 20000.
> >
> > I think function GetNewRelFileNode() should always check for a
> > duplicated relfilenode, see the patch (a simple way to fix this
> > problem in the master branch).
> >
> > Any comment?
> Hi, thank you for your report.
>
> It seems correct that there's room for a wrapped-around OID to be used for a temp
> table, so that we get a duplicate result when we look it up and hit the error.
>
> I reproduced your issue with HEAD and gdb, by replacing rnode.node.relNode
> with an existing relfilenode in GetNewRelFileNode(), immediately before the
> call of relpath().

One thing I forgot to note is that this bug is not unique to logical replication.
There are other paths that hit it, for example pg_filenode_relation, following the
same procedure with gdb.

In the output below, I created the table tempa with the same filenode using gdb,
without setting up a logical replication pair, and got the same error you reported.

postgres=# select oid, relname, relfilenode, reltablespace from pg_class where relname in ('c', 'tempa');
  oid  | relname | relfilenode | reltablespace
-------+---------+-------------+---------------
 16387 | c       |       16390 |             0
 16390 | tempa   |       16390 |             0
(2 rows)

postgres=# select pg_filenode_relation(0, 16390);
ERROR: unexpected duplicate for tablespace 0, relfilenode 16390

Best Regards,
	Takamichi Osumi
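
For illustration only, the collision discussed in this thread can be modelled outside PostgreSQL. The following is a minimal Python sketch, not PostgreSQL code: the helper names (`candidate_is_accepted`, `filenode_relation`), the database OID 5, and the backend id 3 are all hypothetical. It assumes only what the report states: the new-relfilenode check looks at pg_class OIDs and at the on-disk path, and a temp table's path carries a "t%d" prefix.

```python
from collections import namedtuple

# Toy catalog row; backend is None for a non-temp relation (hypothetical model).
Rel = namedtuple("Rel", "oid relname relfilenode reltablespace backend")

def relpath(db_oid, relfilenode, backend=None):
    """Mimic the path layout: temp relations get a t<backend>_ prefix."""
    if backend is None:
        return f"base/{db_oid}/{relfilenode}"
    return f"base/{db_oid}/t{backend}_{relfilenode}"

def candidate_is_accepted(catalog, files, db_oid, candidate, backend=None):
    """Apply the two checks named in the report: OID clash, then file on disk."""
    if any(r.oid == candidate for r in catalog):
        return False
    return relpath(db_oid, candidate, backend) not in files

def filenode_relation(catalog, tablespace, relfilenode):
    """Look up a relation by (tablespace, relfilenode); two hits are an error."""
    hits = [r for r in catalog
            if r.reltablespace == tablespace and r.relfilenode == relfilenode]
    if len(hits) > 1:
        raise RuntimeError(
            f"unexpected duplicate for tablespace {tablespace}, "
            f"relfilenode {relfilenode}")
    return hits[0].relname if hits else None

# Table c was truncated, so its relfilenode (16390) differs from its OID (16387).
catalog = [Rel(16387, "c", 16390, 0, None)]
files = {relpath(5, 16390)}

# After OID wraparound the counter hands out 16390 again.
perm_ok = candidate_is_accepted(catalog, files, 5, 16390)             # file exists
temp_ok = candidate_is_accepted(catalog, files, 5, 16390, backend=3)  # path differs

# The temp table is created with the colliding value.
catalog.append(Rel(16390, "tempa", 16390, 0, 3))
files.add(relpath(5, 16390, backend=3))
```

In this model a non-temp relation is refused the value 16390 because `base/5/16390` already exists, while the temp relation is accepted because its path is `base/5/t3_16390`. A later call to `filenode_relation(catalog, 0, 16390)` then raises the duplicate error, mirroring the pg_filenode_relation output above.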