Re: "PANIC: could not open critical system index 2662" - twice

From: Dilip Kumar
Subject: Re: "PANIC: could not open critical system index 2662" - twice
Message-ID: CAFiTN-uKc46MGgkCB9Dim14_Xq23NQznZXSLRdZ-hgjxVBGRYA@mail.gmail.com
In reply to: Re: "PANIC: could not open critical system index 2662" - twice  (Michael Paquier <michael@paquier.xyz>)
Responses: Re: "PANIC: could not open critical system index 2662" - twice
List: pgsql-general

On Mon, May 8, 2023 at 7:55 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Sun, May 07, 2023 at 10:30:52PM +1200, Thomas Munro wrote:
> > Bug-in-PostgreSQL explanations could include that we forgot it was
> > dirty, or some backend wrote it out to the wrong file; but if we were
> > forgetting something like permanent or dirty, would there be a more
> > systematic failure?  Oh, it could require special rare timing if it is
> > similar to 8a8661828's confusion about permanence level or otherwise
> > somehow not setting BM_PERMANENT, but in the target blocks, so I think
> > that'd require a checkpoint AND a crash.  It doesn't reproduce for me,
> > but perhaps more unlucky ingredients are needed.
> >
> > Bug-in-OS/FS explanations could include that a whole lot of writes
> > were mysteriously lost in some time window, so all those files still
> > contain the zeroes we write first in smgrextend().  I guess this
> > previously rare (previously limited to hash indexes?) use of sparse
> > file hole-punching could be a factor in an it's-all-ZFS's-fault
> > explanation:
>
> Yes, you would need a bit of all that.
>
> I can reproduce the same backtrace here.  That's just my usual laptop
> with ext4, so this would be a Postgres bug.  First, here are the four
> things running in parallel so as I can get a failure in loading a
> critical index when connecting:
> 1) Create and drop a database with WAL_LOG as strategy and the
> regression database as template:
> while true; do
>   createdb --template=regression --strategy=wal_log testdb;
>   dropdb testdb;
> done
> 2) Feeding more data to pg_class in the middle, while testing the
> connection to the database created:
> while true;
>   do psql -c 'create table popo as select 1 as a;' regression > /dev/null 2>&1 ;
>   psql testdb -c "select 1" > /dev/null 2>&1 ;
>   psql -c 'drop table popo' regression > /dev/null 2>&1 ;
>   psql testdb -c "select 1" > /dev/null 2>&1 ;
> done;
> 3) Force some checkpoints:
> while true; do psql -c 'checkpoint' > /dev/null 2>&1; sleep 4; done
> 4) Force a few crashes and recoveries:
> while true ; do pg_ctl stop -m immediate ; pg_ctl start ; sleep 4 ; done
>

I am able to reproduce this using the steps given above, and I am
analyzing it further.  I will send an update once I have a clue.
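
For anyone who wants to run the four quoted loops from a single
terminal, they can be combined roughly as follows.  This is only a
minimal sketch, not the exact procedure used by anyone in this thread.
It assumes bash, that PGDATA points at a disposable cluster, that the
server is running, and that a "regression" database exists.  The names
step_*, forever, run_all, SLEEP_SECS and REPRO_RUN are made up for
illustration; the commands themselves are the ones quoted above.

```shell
#!/usr/bin/env bash
# Sketch: run the four quoted loops in parallel for a bounded time.
# Assumption: PGDATA is set and a "regression" database exists.

SLEEP_SECS=${SLEEP_SECS:-4}   # pause used by loops 3 and 4, as in the quoted steps

# 1) Create and drop a database copied with the WAL_LOG strategy.
step_createdrop() {
  createdb --template=regression --strategy=wal_log testdb
  dropdb testdb
}

# 2) Add and remove a pg_class entry in the template, probing testdb in between.
step_pgclass() {
  psql -c 'create table popo as select 1 as a;' regression >/dev/null 2>&1
  psql testdb -c 'select 1' >/dev/null 2>&1
  psql -c 'drop table popo' regression >/dev/null 2>&1
  psql testdb -c 'select 1' >/dev/null 2>&1
}

# 3) Force a checkpoint.
step_checkpoint() {
  psql -c 'checkpoint' >/dev/null 2>&1
  sleep "$SLEEP_SECS"
}

# 4) Crash the server and let it recover.
step_crash() {
  pg_ctl stop -m immediate
  pg_ctl start
  sleep "$SLEEP_SECS"
}

# Repeat one step until the process is killed.
forever() { while true; do "$@"; done; }

# Start all four loops in the background and stop them after $1 seconds.
run_all() {
  local pids=() step
  for step in step_createdrop step_pgclass step_checkpoint step_crash; do
    forever "$step" &
    pids+=("$!")
  done
  sleep "$1"
  kill "${pids[@]}" 2>/dev/null
  wait "${pids[@]}" 2>/dev/null
}

# Nothing runs unless explicitly requested (hypothetical switch).
if [ "${REPRO_RUN:-0}" = "1" ]; then
  run_all "${1:-60}"
fi
```

Saved under a hypothetical name such as repro.sh, it would be started
with "REPRO_RUN=1 ./repro.sh 300" to run for about five minutes.  The
server may be left stopped or mid-recovery when the loops are killed,
so check the server log for the PANIC afterwards and restart by hand if
needed.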

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


