Re: reporting TID/table with corruption error

From Andrey Borodin
Subject Re: reporting TID/table with corruption error
Date
Msg-id B8AD9AE4-F533-4769-8B1A-B8A1DC099281@yandex-team.ru
In response to reporting TID/table with corruption error  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: reporting TID/table with corruption error  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-hackers

> On 19 Aug 2021, at 21:37, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> A customer recently hit this error message:
>
> ERROR:  t_xmin is uncommitted in tuple to be updated

Hi!

I'm currently observing this on one of our production clusters. The problem occurs at random points in time, seems to
be covered by retries on the client side, and so far has not done any harm (apart from waking up engineers).

A few facts:
0. PostgreSQL 12.9 (with some unrelated patches)
1. amcheck, heapcheck and pg_visibility have never flagged anything on the cluster and remain silent
2. I observe the problem about once a day
3. The tuple seems to be updated in a high-contention trigger function; autovacuum kicks in ~20-30 seconds
after the message appears in the logs

[ 2022-01-10 09:07:17.671 MSK [unknown],????,????_????s,310759,XX001 ]:ERROR:  t_xmin 696079792 is uncommitted in tuple (1419011,109) to be updated in table "????s_statistics"
[ 2022-01-10 09:07:17.671 MSK [unknown],????,????_????s,310759,XX001 ]:CONTEXT:  SQL statement "UPDATE ????_????s.????s_statistics os
             SET ????_????_found_ts = COALESCE(os.????_????_found_ts, NOW()),
                 last_????_found_ts = NOW(),
                 num_????s = os.num_????s + 1
             WHERE ????_id = NEW.????_id"
        PL/pgSQL function statistics__update_from_new_????() line 3 at SQL statement
[ 2022-01-10 09:07:17.671 MSK [unknown],????,????_????s,310759,XX001 ]:STATEMENT:
        INSERT INTO ????_????s.????s_????s AS ????s

4. t_xmin is relatively new, not ancient
5. pageinspect shows a dead tuple after some time
6. no suspicious activity in logs nearby
7. vacuum (disable_page_skipping) and repack of indexes did not change anything
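
A minimal sketch of how the TID reported in such a message could be turned into a pageinspect query for the affected line pointer. It assumes the message format shown in the log above; the helper names (parse_corruption_error, pageinspect_query) and the table name in the usage line are hypothetical, since the real names are redacted. heap_page_items() and get_raw_page() are the pageinspect functions.

```python
import re

# Matches the error text as it appears in the log excerpt above.
# Whitespace around the TID is optional so wrapped log lines still match.
LOG_RE = re.compile(
    r't_xmin (?P<xmin>\d+) is uncommitted in tuple\s*'
    r'\((?P<block>\d+),(?P<offset>\d+)\)\s*'
    r'to be updated in table "(?P<table>[^"]+)"'
)


def parse_corruption_error(line):
    """Return xmin, block, offset and table from the error line, or None."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    return {
        "xmin": int(m.group("xmin")),
        "block": int(m.group("block")),
        "offset": int(m.group("offset")),
        "table": m.group("table"),
    }


def pageinspect_query(info, qualified_table):
    """Build a pageinspect query for the reported line pointer.

    qualified_table is supplied by the caller (hypothetical name here),
    because the log only carries the unqualified table name.
    """
    return (
        "SELECT lp, t_xmin, t_xmax, t_ctid, t_infomask, t_infomask2 "
        f"FROM heap_page_items(get_raw_page('{qualified_table}', {info['block']})) "
        f"WHERE lp = {info['offset']};"
    )


if __name__ == "__main__":
    sample = ('ERROR:  t_xmin 696079792 is uncommitted in tuple '
              '(1419011,109) to be updated in table "stats"')
    info = parse_corruption_error(sample)
    print(pageinspect_query(info, "public.stats"))
```

Running the generated query shortly after the error and again after autovacuum has passed would show whether the tuple header (t_xmin, t_xmax, infomask bits) changes in between, which is the observation point 5 refers to.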


I suspect this may be a relatively new concurrency issue. At least, I have never seen this before on clusters with clean amcheck
and heapcheck results.

Alvaro, did you observe this on binaries from the August 13 minor release or on older ones?

Thanks!

Best regards, Andrey Borodin.

