Re: BUG #16833: postgresql 13.1 process crash every hour

From Alex F
Subject Re: BUG #16833: postgresql 13.1 process crash every hour
Date
Msg-id CAGbr_zUVuWp51Q2KOQ3YEm78Z=Q_+XWCQP3i7UMGuCXYYfCr2w@mail.gmail.com
In reply to Re: BUG #16833: postgresql 13.1 process crash every hour  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: BUG #16833: postgresql 13.1 process crash every hour  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-bugs
Dear Peter,
Honestly, I don't know whether you expected a reply with the amcheck results, but I will paste them here anyway:

DEBUG:  verifying that tuples from index "price_model_product_id_latest_idx" are present in "price_model"
DEBUG:  finished verifying presence of 5598051 tuples from table "price_model" with bitset 48.61% set
DEBUG:  verifying consistency of tree structure for index "name_original_idx_s" with cross-level checks
DEBUG:  verifying level 3 (true root level)
DEBUG:  verifying level 2
ERROR:  down-link lower bound invariant violated for index "name_original_idx_s"
DETAIL:  Parent block=64 child index tid=(868,3) parent page lsn=1D2F/14483F28.

Anyway, I will wait for v13.4 and try to re-test this crash case.

Thanks for your support!


On Fri, 14 May 2021 at 20:48, Peter Geoghegan <pg@bowt.ie> wrote:
On Fri, May 14, 2021 at 7:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmm, looks like it's time to rope Peter Geoghegan in on this discussion.

I think that this is likely to be a fairly generic symptom of index
corruption. Ockham's razor does not seem to point to a software bug
because posting list splits are just not that complicated, and are
fairly common in the grand scheme of things. Docker is the kind of
thing that I wouldn't necessarily trust to not do something fishy with
LVM snapshotting -- I tend to suspect that that is a factor.

There was a very similar bug report and stack trace back in March.
That case was tied back to generic index corruption using amcheck,
with indexes corrupted that weren't implicated in the hard crash.

There is a real problem for me to fix here in any case:
_bt_swap_posting() is unnecessarily trusting of the state of the
posting list tuple (compared to _bt_split(), say). I still plan on
adding hardening to _bt_swap_posting() to avoid a hard crash.
Unfortunately I missed the opportunity to get that into 13.3, but I'll
get it into 13.4.

Alex should probably run amcheck to see what that throws up. It should
be possible to run amcheck on your database, which will detect corrupt
posting list tuples on Postgres 13. It's a contrib extension, so you
must first run "CREATE EXTENSION amcheck;". From there, you can run a
query like the following (you may want to customize this):

SELECT bt_index_parent_check(index => c.oid, heapallindexed => true),
c.relname,
c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree'
-- Don't check temp tables, which may be from another session:
AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC;

If this query takes too long to complete, you may find it useful to add
something to limit the indexes checked, such as: AND n.nspname =
'public' -- that change to the SQL will make the query test only
indexes from the public schema.
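
For illustration, the schema-restricted variant described above might look like the sketch below. It is the same query with the one extra predicate added; the schema name 'public' is only the example given in the text and would need adjusting for other layouts.

```sql
-- Sketch: same amcheck query, limited to B-Tree indexes in one schema.
-- Assumes the amcheck extension is already installed in this database.
SELECT bt_index_parent_check(index => c.oid, heapallindexed => true),
       c.relname,
       c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree'
  -- Added predicate: only check indexes from the public schema.
  AND n.nspname = 'public'
  -- Don't check temp tables, which may be from another session:
  AND c.relpersistence != 't'
  -- Function may throw an error when this is omitted:
  AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC;
```

Note that the query stops at the first index that fails verification, so indexes sorted after it are not checked in that run.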

Do "SET client_min_messages=DEBUG1" to get a kind of rudimentary
progress indicator, if that seems useful to you.
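
As a rough sketch of how the progress output could be combined with a check of a single index, the statements below raise the message level for the session, verify one index, and then restore the default. The index name is taken from the amcheck output earlier in this thread and is assumed to be resolvable on the current search_path; substitute a schema-qualified name if it is not.

```sql
-- Sketch: emit DEBUG1 progress messages while checking one index.
SET client_min_messages = DEBUG1;

-- bt_index_parent_check takes stronger locks than bt_index_check,
-- so this is best run when the table can tolerate blocked writes.
SELECT bt_index_parent_check(index => 'name_original_idx_s'::regclass,
                             heapallindexed => true);

-- Return to the configured default message level.
RESET client_min_messages;
```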

The docs have further information on what this bt_index_parent_check
function does, should you need it:
https://www.postgresql.org/docs/13/amcheck.html

--
Peter Geoghegan
