Re: reporting TID/table with corruption error

From	Andrey Borodin
Subject	Re: reporting TID/table with corruption error
Date	10 January 2022, 12:10:47
Msg-id	B8AD9AE4-F533-4769-8B1A-B8A1DC099281@yandex-team.ru
In response to	reporting TID/table with corruption error (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses	Re: reporting TID/table with corruption error (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List	pgsql-hackers


> On 19 Aug 2021, at 21:37, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> A customer recently hit this error message:
>
> ERROR:  t_xmin is uncommitted in tuple to be updated

Hi!

I'm currently observing this on one of our production clusters. The problem occurs at random points in time, seems to
be covered by retries on the client's side, and so far has not done any harm (except for woken engineers).

A few facts:
0. PostgreSQL 12.9 (with some unrelated patches)
1. amcheck, heapcheck, and pg_visibility never suspected the cluster and remain silent
2. I observe the problem about once a day
3. The tuple seems to be updated in a trigger function under high contention; autovacuum kicks in ~20-30 seconds
after the message appears in the logs

[ 2022-01-10 09:07:17.671 MSK [unknown],????,????_????s,310759,XX001 ]:ERROR:  t_xmin 696079792 is uncommitted in tuple (1419011,109) to be updated in table "????s_statistics"
[ 2022-01-10 09:07:17.671 MSK [unknown],????,????_????s,310759,XX001 ]:CONTEXT:  SQL statement "UPDATE ????_????s.????s_statistics os
             SET ????_????_found_ts = COALESCE(os.????_????_found_ts, NOW()),
                 last_????_found_ts = NOW(),
                 num_????s = os.num_????s + 1
             WHERE ????_id = NEW.????_id"
        PL/pgSQL function statistics__update_from_new_????() line 3 at SQL statement
[ 2022-01-10 09:07:17.671 MSK [unknown],????,????_????s,310759,XX001 ]:STATEMENT:
        INSERT INTO ????_????s.????s_????s AS ????s

4. t_xmin is relatively new, not ancient
5. pageinspect shows a dead tuple after some time
6. no suspicious activity in the logs nearby
7. vacuum (disable_page_skipping) and repacking the indexes did not change anything
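
To look at the reported tuple with pageinspect (fact 5), a small helper along these lines could turn the log message into a query. This is only a sketch: it assumes the message format shown in the log above, `parse_error` and `pageinspect_query` are hypothetical names, and the relation has to be passed in by hand because the message prints the table name without its schema.

```python
import re

# Assumption: the error text has the shape seen in the log above, i.e.
#   t_xmin <xid> is uncommitted in tuple (<block>,<offset>) to be updated in table "<name>"
# The optional space before "to" tolerates the line-wrapped form of the message.
PATTERN = re.compile(
    r't_xmin (?P<xmin>\d+) is uncommitted in tuple '
    r'\((?P<block>\d+),(?P<offset>\d+)\) ?to be updated in table "(?P<table>[^"]+)"'
)


def parse_error(message):
    """Return xmin, block, offset and table name from the message, or None."""
    m = PATTERN.search(message)
    if m is None:
        return None
    return {
        "xmin": int(m.group("xmin")),
        "block": int(m.group("block")),
        "offset": int(m.group("offset")),
        "table": m.group("table"),
    }


def pageinspect_query(info, relation):
    """Build a pageinspect query for the line pointer named in the message.

    `relation` is supplied by the caller (schema-qualified if needed);
    it is interpolated as-is, so only pass trusted values.
    """
    return (
        "SELECT lp, t_xmin, t_xmax, t_ctid, t_infomask, t_infomask2 "
        f"FROM heap_page_items(get_raw_page('{relation}', {info['block']})) "
        f"WHERE lp = {info['offset']};"
    )
```

Running the generated query right after the error and again after autovacuum has passed would show whether the tuple header changes between the two points in time.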


I suspect this may be some relatively new concurrency stuff. At least I never saw this before on clusters with clean amcheck
and heapcheck results.

Alvaro, did you observe this on binaries from the August 13 minor release, or on older ones?

Thanks!

Best regards, Andrey Borodin.
