Re: "PANIC: could not open critical system index 2662" - twice

From: Thomas Munro
Subject: Re: "PANIC: could not open critical system index 2662" - twice
Msg-id: CA+hUKGJDfBEfqatSsV16XosCkr=3k2vb6X1mcbL_kdod_wHQwQ@mail.gmail.com
In reply to: Re: "PANIC: could not open critical system index 2662" - twice (Evgeny Morozov <postgresql3@realityexists.net>)
Responses: Re: "PANIC: could not open critical system index 2662" - twice
Re: "PANIC: could not open critical system index 2662" - twice
List: pgsql-general
On Sun, May 7, 2023 at 12:29 AM Evgeny Morozov
<postgresql3@realityexists.net> wrote:
> On 6/05/2023 12:34 pm, Thomas Munro wrote:
> > So it does indeed look like something unknown has replaced 32KB of
> > data with 32KB of zeroes underneath us.  Are there more non-empty
> > files that are all-zeroes?  Something like this might find them:
> >
> > for F in base/1414389/*
> > do
> >   # print non-empty files whose hex dump contains nothing but 00 bytes
> >   if [ -s "$F" ] && ! xxd -p "$F" | grep -qEv '^(00)*$'
> >   then
> >     echo "$F"
> >   fi
> > done
>
> Yes, a total of 309 files are all-zeroes (and 52 files are not).
>
> I also checked the other DB that reports the same "unexpected zero page
> at block 0" error, "test_behavior_638186280406544656" (OID 1414967) -
> similar story there. I uploaded the lists of zeroed and non-zeroed files
> and the ls -la output for both as
> https://objective.realityexists.net/temp/pgstuff3.zip
>
> I then searched recursively for such all-zeroes files in $PGDATA/base and
> did not find any outside those two directories (base/1414389 and
> base/1414967). None in $PGDATA/global, either.
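
The recursive search described above can be sketched as a small POSIX shell function. This is an illustration, not the script actually used in the thread: it swaps `xxd`/`grep` for `tr -d '\000'` (a file is all zeroes if nothing is left after deleting its NUL bytes), the function names `is_all_zeroes` and `find_zeroed_files` are hypothetical, and it assumes file names contain no newlines, which holds for PostgreSQL's numeric relfilenode names.

```shell
#!/bin/sh
# Sketch: list non-empty regular files under a directory tree whose contents
# are entirely 0x00 bytes. Uses tr instead of xxd (an assumption, not the
# thread's script). Assumes no newlines in file names.

# is_all_zeroes FILE -> exit status 0 if FILE is non-empty and all NUL bytes
is_all_zeroes() {
  [ -s "$1" ] || return 1
  # Delete every NUL byte and count what is left; 0 means all zeroes.
  [ "$(tr -d '\000' < "$1" | wc -c)" -eq 0 ]
}

# find_zeroed_files DIR -> matching paths, one per line, byte-order sorted
find_zeroed_files() {
  find "$1" -type f | while IFS= read -r f; do
    if is_all_zeroes "$f"; then
      printf '%s\n' "$f"
    fi
  done | LC_ALL=C sort
}
```

For example, `find_zeroed_files "$PGDATA/base"` and `find_zeroed_files "$PGDATA/global"` would cover the two directories checked above. Because `tr` reads every file to the end, expect this to be I/O-heavy on a large cluster.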

So "diff -u zeroed-files-1414967.txt zeroed-files-1414389.txt" shows
that both databases have the same broken files in the range cloned from
the template database by CREATE DATABASE STRATEGY=WAL_LOG, and it looks
like it's *all* the cloned catalogs.  Beyond that, they have some
non-matching relfilenodes > 1400000, presumably stuff you created
directly in the new database (I can't say for sure whether those
files are broken without knowing what they are).
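
To line up the two databases' lists by relfilenode rather than by full path, one option is to strip the `base/<oid>/` prefix and intersect the bare file names. A sketch, assuming each list holds one path (or bare file name) per line, since the exact format of the zeroed-files-*.txt lists isn't shown here; the helper names are hypothetical, and it uses `sort | uniq` set operations rather than `diff -u`.

```shell
#!/bin/sh
# Sketch: compare two lists of zeroed files by file name (relfilenode),
# ignoring the base/<db-oid>/ directory part of each path.
# Assumes one path or bare file name per line; adjust sed if yours differ.

# relfilenodes LIST -> bare file names (e.g. 1259, 2662_fsm), sorted, unique
relfilenodes() {
  sed 's#.*/##' "$1" | LC_ALL=C sort -u
}

# common_relfilenodes LIST_A LIST_B -> names present in both lists
common_relfilenodes() {
  # Each side is de-duplicated, so a name that appears twice is in both.
  { relfilenodes "$1"; relfilenodes "$2"; } | LC_ALL=C sort | LC_ALL=C uniq -d
}

# only_in_first LIST_A LIST_B -> names present in LIST_A but not LIST_B
only_in_first() {
  # LIST_B names are emitted twice, so any name from LIST_B occurs at least
  # twice and uniq -u keeps only the names unique to LIST_A.
  { relfilenodes "$1"; relfilenodes "$2"; relfilenodes "$2"; } |
    LC_ALL=C sort | LC_ALL=C uniq -u
}
```

For instance, `common_relfilenodes zeroed-files-1414389.txt zeroed-files-1414967.txt` would list the names zeroed in both databases, and `only_in_first` (with the arguments in either order) would surface the non-matching ones, such as the relfilenodes above 1400000.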

Did you previously run this same workload on versions < 15 and never
see any problem?  15 gained a new feature CREATE DATABASE ...
STRATEGY=WAL_LOG, which is also the default.  I wonder if there is a
bug somewhere near that, though I have no specific idea.  If you
explicitly added STRATEGY=FILE_COPY to your CREATE DATABASE commands,
you'll get the traditional behaviour.  It seems like you have some
kind of high frequency testing workload that creates and tests
databases all day long, and just occasionally detects this corruption.
Would you like to try requesting FILE_COPY for a while and see if it
eventually happens like that too?
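
For a harness that generates its own CREATE DATABASE statements, opting into FILE_COPY amounts to appending the STRATEGY clause (available in PostgreSQL 15 and later). A hypothetical helper, assuming each test database is cloned from a named template; identifiers are double-quoted, with embedded double quotes doubled as SQL requires.

```shell
#!/bin/sh
# Sketch: build a CREATE DATABASE statement that explicitly requests the
# traditional FILE_COPY strategy instead of the PostgreSQL 15 default, WAL_LOG.
# Function names and the TEMPLATE clause are illustrative assumptions.

# quote_ident NAME -> NAME as a double-quoted SQL identifier ("" escapes ")
quote_ident() {
  printf '"%s"' "$(printf '%s' "$1" | sed 's/"/""/g')"
}

# create_db_sql NEW_DB TEMPLATE_DB -> SQL text on stdout
create_db_sql() {
  printf 'CREATE DATABASE %s TEMPLATE %s STRATEGY = FILE_COPY;\n' \
    "$(quote_ident "$1")" "$(quote_ident "$2")"
}
```

Usage might look like `create_db_sql test_behavior_638186280406544656 my_template | psql -d postgres`, where `my_template` is a placeholder. Piping into psql in its default autocommit mode matters here, because CREATE DATABASE cannot run inside a transaction block.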

My spidey sense is leaning away from filesystem bugs.  We've found
plenty of filesystem bugs on these mailing lists over the years and of
course it's not impossible, but I dunno... it seems quite suspicious
that all the system catalogs have apparently been wiped during or
moments after the creation of a new database that's running new
PostgreSQL 15 code...


