Re: new heapcheck contrib module

From	Robert Haas
Subject	Re: new heapcheck contrib module
Date	May 14, 2020, 18:32:53
Msg-id	CA+TgmoYTDcf5MJrSBCSB6iLnGzh4pE7nCBBVBYGP-7D0CwzuHw@mail.gmail.com
In reply to	Re: new heapcheck contrib module (Peter Geoghegan <pg@bowt.ie>)
Responses	Re: new heapcheck contrib module
List	pgsql-hackers

On Wed, May 13, 2020 at 5:33 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Do you recall seeing corruption resulting in segfaults in production?

I have seen that, I believe. I think it's more common to fail with
errors about not being able to palloc more than 1GB, not being able to
look up an xid or mxid, etc., but I am pretty sure I've seen multiple cases
involving seg faults, too. Unfortunately for my credibility, I can't
remember the details right now.

> I personally don't recall seeing that. If it happened, the segfaults
> themselves probably wouldn't be the main concern.

I don't really agree. Hypothetically speaking, suppose you corrupt
your only copy of a critical table in such a way that every time you
select from it, the system seg faults. A user in this situation might
ask questions like:

1. How did my table get corrupted?
2. Why do I only have one copy of it?
3. How do I retrieve the non-corrupted portion of my data from that
table and get back up and running?

In the grand scheme of things, #1 and #2 are the most important
questions, but when something like this actually happens, #3 tends to
be the most urgent question, and it's a lot harder to get the
uncorrupted data out if the system keeps crashing.

Also, a seg fault tends to lead customers to think that the database
has a bug, rather than that the database is corrupted.

Slightly off-topic here, but I think our error reporting in this area
is pretty lame. I've learned over the years that when a customer
reports that they get a complaint about a too-large memory allocation
every time they access a table, they've probably got a corrupted
varlena header. However, that's extremely non-obvious to a typical
user. We should try to report errors indicative of corruption in a way
that gives the user some clue that corruption has happened. Peter made
a stab at improving things there by adding
errcode(ERRCODE_DATA_CORRUPTED) in a bunch of places, but a lot of
users will never see the error code, only the message, and a lot of
corruption still produces errors that weren't changed by that
commit.
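
To make the varlena point concrete, here is a rough sketch of the kind of
check that could turn an implausible length header into a message that
mentions corruption. It is a toy model, not PostgreSQL code: the layout (a
plain 4-byte length prefix that counts itself), the names toy_check_length,
TOY_MAX_ALLOC and toy_status, and the message wording are all assumptions
made for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical allocation cap, modeled on the roughly 1GB limit discussed above. */
#define TOY_MAX_ALLOC ((uint32_t) 0x3fffffff)

/* Hypothetical outcome codes; these are not PostgreSQL error codes. */
typedef enum
{
    TOY_OK,
    TOY_SUSPECT_CORRUPTION
} toy_status;

/*
 * Read a 4-byte length prefix (assumed to include the prefix itself) and
 * decide whether it is plausible given how many bytes are really available.
 * On failure, write a message that points at corruption instead of at a
 * failed memory allocation.
 */
static toy_status
toy_check_length(const unsigned char *datum, size_t bytes_available,
                 char *msg, size_t msglen)
{
    uint32_t    claimed;

    if (bytes_available < sizeof(claimed))
    {
        snprintf(msg, msglen,
                 "value ends before its length header; data may be corrupted");
        return TOY_SUSPECT_CORRUPTION;
    }

    /* memcpy avoids an unaligned read of the header */
    memcpy(&claimed, datum, sizeof(claimed));

    if (claimed < sizeof(claimed) ||
        claimed > TOY_MAX_ALLOC ||
        claimed > bytes_available)
    {
        snprintf(msg, msglen,
                 "length header claims %lu bytes but %lu are available; "
                 "data may be corrupted",
                 (unsigned long) claimed, (unsigned long) bytes_available);
        return TOY_SUSPECT_CORRUPTION;
    }

    snprintf(msg, msglen, "ok");
    return TOY_OK;
}
```

A caller would run a check like this before allocating, so that a header
overwritten with garbage is reported as likely corruption rather than
surfacing as a too-large allocation request.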

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
