Re: recovering from "found xmin ... from before relfrozenxid ..."

Поиск
Список
Период
Сортировка
От Ashutosh Sharma
Тема Re: recovering from "found xmin ... from before relfrozenxid ..."
Дата
Msg-id CAE9k0PnRhymKiyaxUE8u=6_4=K_1ktKjOVb51K_H2eGmL81f1g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: recovering from "found xmin ... from before relfrozenxid ..."  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: recovering from "found xmin ... from before relfrozenxid ..."  ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
Re: recovering from "found xmin ... from before relfrozenxid ..."  (MBeena Emerson <mbeena.emerson@gmail.com>)
Список pgsql-hackers
Hi All,

Attached is the patch that adds heap_force_kill(regclass, tid[]) and heap_force_freeze(regclass, tid[]) functions which Robert mentioned in the first email in this thread. The patch basically adds an extension named pg_surgery that contains these functions.  Please have a look and let me know your feedback. Thank you.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


On Thu, Jul 16, 2020 at 9:44 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jul 16, 2020 at 10:00 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I see your point, though: the tuple has to be able to survive
> HOT-pruning in order to cause a problem when we check whether it needs
> freezing.

Here's an example where the new sanity checks fail on an invisible
tuple without any concurrent transactions:

$ initdb
$ pg_ctl start -l ~/logfile
$ createdb
$ psql

create table simpsons (a int, b text);
vacuum freeze;

$ cat > txid.sql
select txid_current();
$ pgbench -t 131072 -c 8 -j 8 -n -f txid.sql
$ psql

insert into simpsons values (1, 'homer');

$ pg_ctl stop
$ pg_resetwal -x 1000 $PGDATA
$ pg_ctl start -l ~/logfile
$ psql

update pg_class set relfrozenxid = (relfrozenxid::text::integer +
2000000)::text::xid where relname = 'simpsons';

rhaas=# select * from simpsons;
 a | b
---+---
(0 rows)

rhaas=# vacuum simpsons;
ERROR:  found xmin 1049082 from before relfrozenxid 2000506
CONTEXT:  while scanning block 0 of relation "public.simpsons"

This is a fairly insane situation, because we should have relfrozenxid
< tuple xid < xid counter, but instead we have xid counter < tuple xid
< relfrozenxid, but it demonstrates that it's possible to have a
database which is sufficiently corrupt that you can't escape from the
new sanity checks using only INSERT, UPDATE, and DELETE.

Now, an even easier way to create a table with a tuple that prevents
vacuuming and also can't just be deleted is to simply remove a
required pg_clog file (and maybe restart the server to clear out any
cached data in the SLRUs). What we typically do with customers who
need to recover from that situation today is give them a script to
fabricate a bogus CLOG file that shows all transactions as committed
(or, perhaps, aborted). But I think that the tools proposed on this
thread might be a better approach in certain cases. If the problem is
that a pg_clog file vanished, then recreating it with whatever content
you think is closest to what was probably there before is likely the
best you can do. But if you've got some individual tuples with crazy
xmin values, you don't really want to drop matching files in pg_clog;
it's better to fix the tuples.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: Default setting for enable_hashagg_disk
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: renaming configure.in to configure.ac