RE: "unexpected duplicate for tablespace" problem in logical replication

From osumi.takamichi@fujitsu.com
Subject RE: "unexpected duplicate for tablespace" problem in logical replication
Date
Msg-id TYCPR01MB8373616AF8BA3C819535B998EDEA9@TYCPR01MB8373.jpnprd01.prod.outlook.com
In response to RE: "unexpected duplicate for tablespace" problem in logical replication  ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>)
Responses Re: "unexpected duplicate for tablespace" problem in logical replication
List pgsql-bugs
On Friday, April 8, 2022 6:44 PM I wrote:
> On Wednesday, April 6, 2022 11:14 AM wangsh.fnst@fujitsu.com
> <wangsh.fnst@fujitsu.com> wrote:
> > I ran into a problem while using logical replication in PG11, and I
> > think all PG versions have this problem.
> >
> >
> > The log looks like:
> > > ERROR: unexpected duplicate for tablespace 0, relfilenode xxxxxxx
> > Someone also reported this problem in [1], but no one has responded to it.
> >
> >
> >
> > I did some investigation, and found a way to reproduce this problem.
> > The steps are:
> >
> >
> > 1. create a table (call it tableX) and truncate it.
> >
> >
> > 2. cycle through 2^32 OIDs.
> >
> >
> > 3. restart the database to clear all the cache.
> >
> >
> > 4. create a temp table whose OID equals tableX's relfilenode, and
> > insert any data into tableX.
> >
> >
> > The attachment (run.sh) can reproduce this problem in PG10 and PG11
> > with the help of the option 'WITH OIDS'. I didn't find any way to cycle
> > the OIDs quickly on the master branch, but I reproduced the problem
> > there too using gdb.
> >
> >
> >
> > Now, function GetNewRelFileNode() only checks:
> >
> >
> > 1. duplicated OIDs in pg_class.
> >
> >
> > 2. whether relpath(rnode) exists on disk.
> >
> >
> > However, relpath() gives different results for a temp table and a
> > non-temp table: a temp table's relpath() has a prefix "t%d". That
> > means that if there is a table whose relfilenode is 20000 (but whose
> > oid isn't 20000), it's possible to create a temp table whose
> > relfilenode is also 20000.
> >
> >
> > I think the function GetNewRelFileNode() should always check for a
> > duplicated relfilenode; see the patch (a simple way to fix this
> > problem on the master branch).
> >
> >
> > Any comment?
> Hi, thank you for your report.
> 
> 
> It seems correct that an OID reused after wraparound can be assigned to a temp
> table, and that we then get a duplicate result when we look it up and hit the error.
> 
> I reproduced your issue with HEAD and gdb, by replacing rnode.node.relNode
> with an existing relfilenode in GetNewRelFileNode(), immediately before the
> call of relpath().
One thing I forgot to note is that this bug is not specific to logical replication.
There are other paths that hit it, for example pg_filenode_relation(),
after following the same procedure with gdb.

In the output below, I used gdb to create a table, tempa, with the same relfilenode
as table c, without setting up a logical replication pair, and got the same error you reported.

postgres=# select oid, relname, relfilenode, reltablespace from pg_class where relname in ('c', 'tempa');
  oid  | relname | relfilenode | reltablespace
-------+---------+-------------+---------------
 16387 | c       |       16390 |             0
 16390 | tempa   |       16390 |             0
(2 rows)

postgres=# select pg_filenode_relation(0, 16390);
ERROR:  unexpected duplicate for tablespace 0, relfilenode 16390
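
To illustrate why the on-disk check does not catch this, here is a toy Python
model of the situation above. It is only a sketch, not PostgreSQL code: the
helper names (toy_relpath, matches), the database OID 5 and the backend id 3
are all made up for illustration, and the "t<backend>_<relfilenode>" file name
is my understanding of how relpath() names temp relation files.

```python
from collections import namedtuple

# Toy model only; the field names mirror the pg_class columns shown above,
# plus a hypothetical is_temp flag.
Rel = namedtuple("Rel", "oid relname relfilenode reltablespace is_temp")


def toy_relpath(db_oid, relfilenode, is_temp, backend_id=3):
    """Build a path in the style of relpath(); temp files get a 't<backend>_' prefix."""
    name = f"t{backend_id}_{relfilenode}" if is_temp else str(relfilenode)
    return f"base/{db_oid}/{name}"


def matches(catalog, tablespace, relfilenode):
    """Return every catalog row with the given (reltablespace, relfilenode)."""
    return [r for r in catalog
            if r.reltablespace == tablespace and r.relfilenode == relfilenode]


# The two rows from the pg_class output above.
catalog = [
    Rel(16387, "c", 16390, 0, False),
    Rel(16390, "tempa", 16390, 0, True),
]

# The file paths differ, so a "does this file already exist?" check passes ...
paths = [toy_relpath(5, r.relfilenode, r.is_temp) for r in catalog]
# ... yet a lookup by (tablespace, relfilenode) finds more than one row.
dupes = matches(catalog, 0, 16390)

print(paths)
print([r.relname for r in dupes])
```

In this model the two paths are distinct while the lookup returns both rows,
which is the same shape as the duplicate that pg_filenode_relation() complains about.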


Best Regards,
    Takamichi Osumi

