Re: [ADMIN] Missing Chunk Error when doing a VACUUM FULL operation -DB Corruption?

Поиск
Список
Период
Сортировка
От Arjun Ranade
Тема Re: [ADMIN] Missing Chunk Error when doing a VACUUM FULL operation -DB Corruption?
Дата
Msg-id CANrrCRxzFYtJLmZSkpUVFh_2ZUcGnsO2nh3Cg78t2OLQEwfzGg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [ADMIN] Missing Chunk Error when doing a VACUUM FULL operation - DB Corruption?  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: [ADMIN] Missing Chunk Error when doing a VACUUM FULL operation -DB Corruption?
Список pgsql-admin
Hi Tom,

I am positive that there are no prepared transactions, certainly none that would persist past a DB shutdown and the single user mode work I did.

However, your point about the replication slots seems to be valid.  Also, a leaking replication slot seems more likely since it was a replication slot issue with pglogical that caused the DB to crash before it was restarted a few months ago.

Below are the items I got from querying the replication view.  It seems that you are correct, https://www.postgresql.org/docs/9.4/static/catalog-pg-replication-slots.html says that 'catalog_xmin' is "The oldest transaction affecting the system catalogs that this slot needs the database to retain. VACUUM cannot remove catalog tuples deleted by any later transaction."

db1=# select * from pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | active | xmin | catalog_xmin | restart_lsn
-----------------------------------------------+------------------+-----------+--------+----------+--------+-----------+--------------+--------------
repmgr_slot_3 | | physical | | | t | 252861646 | | 157/E1755E90
repmgr_slot_4 | | physical | | | t | 324101780 | 162630233 | 157/E1755E90
repmgr_slot_2 | | physical | | | t | | | 157/E1755E90
pgl_tcs_dadfbc8d5_tcspg01_large_tab_ch356c67c | pglogical_output | logical | 21705 | prod | t | | 324101794 | 157/E16D62D8
pgl_tcs_dadfbc8d5_tcspg01_all_tables | pglogical_output | logical | 21705 | prod | t | | 324101794 | 157/E16D62D8
(5 rows)

db1=# SELECT datname, datfrozenxid FROM pg_database;
datname | datfrozenxid
-----------+--------------
template1 | 162630233
template0 | 162630233
repmgr_db | 138378717
postgres | 138378717
rundeck | 138378717
db1 | 162630233
(6 rows)




On Wed, Nov 1, 2017 at 10:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Arjun Ranade <ranade@nodalexchange.com> writes:
> So we had a maintenance window scheduled today and I restarted the DB.
> When it came back up, it reported the same toast error when vacuum'ing
> pg_statistic.  It also reported the "oldest xmin is far in the past" error.

> I even brought the DB into single user mode and tried to manually vacuum
> pg_statistic but got the same error.  I also tried to TRUNCATE pg_statistic
> in single user mode which failed (see below).

> At this point, I am not sure what to make of the issue.  The DB is still
> complaining about old XMIN even when there are no users connected to it
> after a fresh restart:

Clearly, something is holding back the xmin horizon, and if the behavior
persisted past a DB restart then it surely isn't a regular transaction.

Are you *really* sure there are no prepared transactions?

It now occurs to me that there's one other mechanism that could hold back
the xmin horizon, and that is replication slots.  Looking into the
pg_replication_slots view should show you what's up there.  I'm not
an expert on replication slots, but I do know that they were new as of
9.4.  I am thinking maybe your replication setup has "leaked" a slot
that's not being used but is still preventing global xmin from advancing.

                        regards, tom lane

В списке pgsql-admin по дате отправления:

Предыдущее
От: Stephen Frost
Дата:
Сообщение: Re: [ADMIN] Bad recovery: no pg_xlog/RECOVERYXLOG
Следующее
От: Arjun Ranade
Дата:
Сообщение: Re: [ADMIN] Missing Chunk Error when doing a VACUUM FULL operation -DB Corruption?