Обсуждение: reporting TID/table with corruption error

Поиск
Список
Период
Сортировка

reporting TID/table with corruption error

От
Alvaro Herrera
Дата:
A customer recently hit this error message:

ERROR:  t_xmin is uncommitted in tuple to be updated

This was not very informative, so without any clues, we had to let it
go.  But it did occur to me that we can improve this message so that we
know details such as the TID and the relation that caused the issue, so
that if it ever occurs again we can at least look at the WAL stream for
anything affecting the tuple, maybe it'd help to understand the problem.

So I propose the following patch.

-- 
Álvaro Herrera

Вложения

Re: reporting TID/table with corruption error

От
Peter Geoghegan
Дата:
On Thu, Aug 19, 2021 at 9:38 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> A customer recently hit this error message:
>
> ERROR:  t_xmin is uncommitted in tuple to be updated
>
> This was not very informative, so without any clues, we had to let it
> go.  But it did occur to me that we can improve this message so that we
> know details such as the TID and the relation that caused the issue, so
> that if it ever occurs again we can at least look at the WAL stream for
> anything affecting the tuple, maybe it'd help to understand the problem.

I think that this is a very good idea. Ideally this stuff would be
more standardized.

-- 
Peter Geoghegan



Re: reporting TID/table with corruption error

От
Andrey Borodin
Дата:

> 19 авг. 2021 г., в 21:44, Peter Geoghegan <pg@bowt.ie> написал(а):
>
> I think that this is a very good idea. Ideally this stuff would be
> more standardized.

+1
It would be great to see relation, block, offset, xmin\xmax and, probably, flags whenever
ERRCODE_DATA_CORRUPTED\ERRCODE_INDEX_CORRUPTEDis used. Iif it's possible to extract this information, of cause. This is
neededespecially in amcheck functions. 

Best regards, Andrey Borodin.


Re: reporting TID/table with corruption error

От
Andrey Borodin
Дата:

> 19 авг. 2021 г., в 21:37, Alvaro Herrera <alvherre@alvh.no-ip.org> написал(а):
>
> A customer recently hit this error message:
>
> ERROR:  t_xmin is uncommitted in tuple to be updated

Sorry for one more message, it occurred to me only after sending previous one that I know this exact message.

We encountered this error twice:
1. When dealing with broken WAL-delta backups in WAL-G (a kind of fast incremental backup, the bug was fixed back in
2019)The page simply was of an old version. 
2. As a result of broken page cache in patched linux kernel (clog was fsynced succesfully, but the page fsync was lost
beforerestart) 

Best regards, Andrey Borodin.


Re: reporting TID/table with corruption error

От
Dagfinn Ilmari Mannsåker
Дата:
Peter Geoghegan <pg@bowt.ie> writes:

> On Thu, Aug 19, 2021 at 9:38 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> A customer recently hit this error message:
>>
>> ERROR:  t_xmin is uncommitted in tuple to be updated
>>
>> This was not very informative, so without any clues, we had to let it
>> go.  But it did occur to me that we can improve this message so that we
>> know details such as the TID and the relation that caused the issue, so
>> that if it ever occurs again we can at least look at the WAL stream for
>> anything affecting the tuple, maybe it'd help to understand the problem.
>
> I think that this is a very good idea. Ideally this stuff would be
> more standardized.

There are various functions for adding auxilliary fields to the error
report, e.g. errtable() for the schema and table names, but that only
seems to be reported to the client, not logged on the server.

Should we be logging auxiliary error fields?  If not by default, maybe
by a higher log_error_verbosity level (possibly a new one between
default and verbose).

- ilmari



Re: reporting TID/table with corruption error

От
Mark Dilger
Дата:

> On Aug 19, 2021, at 10:17 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> It would be great to see relation, block, offset, xmin\xmax and, probably, flags whenever
ERRCODE_DATA_CORRUPTED\ERRCODE_INDEX_CORRUPTEDis used. Iif it's possible to extract this information, of cause. This is
neededespecially in amcheck functions. 

blockno, offnum and attnum are already included in every result for amcheck functions over heap relations, though
attnummay be null if the corruption is not specific to any particular column. 

xmin, xmax and various flags may occur in the corruption message if they are relevant, but they are not always present.

There was some disagreement during the development of verify_heapam on this point.  We went with the idea that the user
couldfind and inspect the corrupt data with another tool if they had the (blockno, offnum, attnum) information.  As
such,it wasn't necessary to include all the data in the output. 

It shouldn't be too complicated to have a second interface that returns all of the 23 byte main table tuple header
informationand also the 23 byte toast tuple header (when relevant) along with the corruption message.  The guts of the
corruptioncheck would be shared between the two interfaces.  I haven't tried writing a patch yet, but it seems the
patchshouldn't be terribly complicated. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: reporting TID/table with corruption error

От
Peter Geoghegan
Дата:
On Thu, Aug 19, 2021 at 10:49 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> There was some disagreement during the development of verify_heapam on this point.  We went with the idea that the
usercould find and inspect the corrupt data with another tool if they had the (blockno, offnum, attnum) information.
Assuch, it wasn't necessary to include all the data in the output. 

I find that kind of dubious. By definition you're never supposed to
see any verify_heapam errors. If you do end up seeing one, it
should be exceptional -- something that you'll learn from. High
verbosity makes a lot of sense here.


--
Peter Geoghegan



Re: reporting TID/table with corruption error

От
Mark Dilger
Дата:

> On Aug 19, 2021, at 10:57 AM, Peter Geoghegan <pg@bowt.ie> wrote:
>
> High
> verbosity makes a lot of sense here.

Works for me.  We could create another function, "verify_heapam_full" perhaps, that returns additional columns matching
thosefrom pageinspect's heap_page_items(): 

    OUT lp smallint,
    OUT lp_off smallint,
    OUT lp_flags smallint,
    OUT lp_len smallint,
    OUT t_xmin xid,
    OUT t_xmax xid,
    OUT t_field3 int4,
    OUT t_ctid tid,
    OUT t_infomask2 integer,
    OUT t_infomask integer,
    OUT t_hoff smallint,
    OUT t_bits text,
    OUT t_oid oid,
    OUT t_data bytea

Should it also return the full page?  That would be quite verbose (an extra 8k per row), but it could be fed into any
ofpageinspect's functions for further analysis. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: reporting TID/table with corruption error

От
Andrey Borodin
Дата:

> 19 авг. 2021 г., в 23:19, Mark Dilger <mark.dilger@enterprisedb.com> написал(а):
>
>
>
>> On Aug 19, 2021, at 10:57 AM, Peter Geoghegan <pg@bowt.ie> wrote:
>>
>> High
>> verbosity makes a lot of sense here.
>
> Works for me.  We could create another function, "verify_heapam_full" perhaps, that returns additional columns
matchingthose from pageinspect's heap_page_items(): 

Currently I'm mostly interested in index functions to investigate CIC bug.
I see 4 different cases for corruption checks:
1. Developer tackling a bug
2. Backup smoke test
3. DBA recovering corrupted data
4. Running system detects anomaly

In case 1 you want to find corruption and trace back events that lead to it. You need all the bits that can connect
currentstate with events in the past. 

In case 2 you want succinct check, that in case of fire provides initial information for case 3. Ideally you want a
checkthat is symbiosis of "all indexed" check and heap check. Meanwhile, it's preferred that you can share heap scan
betweenmany index checks. 

In case 3 you want to collect all corrupted data (find everything with same xmin\xmax, or on the same page, or with
nearxmin\xmax). In this case returning heap page right away would be quite useful. 
Sometimes you want to create backup copy of the page to try some surgery. (create table backup_pages as select from
verify_heapam_full())

In case 4 you want to alarm DBA and provide all the necessary information to start 3. Adding standardised corruption
infoto all ERRCODE_DATA_CORRUPTED\ERRCODE_INDEX_CORRUPTED would suffice. Also, when monitoring wakes you at night you
wantto know: 
- How many tuples are corrupted?
- How long ago data was corrupted? Is corrupted data within PITR window yet?
- Where to seek a manual for recovery?
But I don't think we can have this logged in case of "ERROR:  t_xmin is uncommitted in tuple to be updated"

Thanks!

Best regards, Andrey Borodin.


Re: reporting TID/table with corruption error

От
Alvaro Herrera
Дата:
Pushed to all branches, thanks.

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/



Re: reporting TID/table with corruption error

От
Andrey Borodin
Дата:

> 19 авг. 2021 г., в 21:37, Alvaro Herrera <alvherre@alvh.no-ip.org> написал(а):
>
> A customer recently hit this error message:
>
> ERROR:  t_xmin is uncommitted in tuple to be updated

Hi!

Currently I'm observing this on one of our production clusters. The problem occurs at random points in time, seems to
becovered by retries on client's side and so far did not inflict any harm (except woken engineers). 

Few facts:
0. PostgreSQL 12.9 (with some unrelated patches)
1. amcheck\heapcheck\pg_visibility never suspected the cluster and remain silent
2. I observe the problem ~once a day
3. The tuple seems to be updated in a high-contention concurrency trigger function, autovacuum keeks in ~20-30 seconds
afterthe message in logs 

[ 2022-01-10 09:07:17.671 MSK [unknown],????,????_????s,310759,XX001 ]:ERROR:  t_xmin 696079792 is uncommitted in tuple
(1419011,109)to be updated in table "????s_statistics" 
[ 2022-01-10 09:07:17.671 MSK [unknown],????,????_????s,310759,XX001 ]:CONTEXT:  SQL statement "UPDATE
????_????s.????s_statisticsos 
             SET ????_????_found_ts = COALESCE(os.????_????_found_ts, NOW()),
                 last_????_found_ts = NOW(),
                 num_????s = os.num_????s + 1
             WHERE ????_id = NEW.????_id"
        PL/pgSQL function statistics__update_from_new_????() line 3 at SQL statement
[ 2022-01-10 09:07:17.671 MSK [unknown],????,????_????s,310759,XX001 ]:STATEMENT:
        INSERT INTO ????_????s.????s_????s AS ????s

4. t_xmin is relatevely new, not ancient
5. pageinspect shows dead tuple after some time
6. no suspicious activity in logs nearby
7. vacuum (disable_page_skipping) and repack of indexes did not change anything


I suspect this can be relatively new concurrency stuff. At least I never saw this before on clusters with clean amcheck
andheapcheck results. 

Alvaro, did you observe this on binaries from August 13 minor release or older?

Thanks!

Best regards, Andrey Borodin.


Re: reporting TID/table with corruption error

От
Alvaro Herrera
Дата:
On 2022-Jan-10, Andrey Borodin wrote:

> 3. The tuple seems to be updated in a high-contention concurrency
> trigger function, autovacuum keeks in ~20-30 seconds after the message
> in logs

Hmm, I bet this is related.

> [ 2022-01-10 09:07:17.671 MSK [unknown],????,????_????s,310759,XX001 ]:ERROR:  t_xmin 696079792 is uncommitted in
tuple(1419011,109) to be updated in table "????s_statistics"
 
> [ 2022-01-10 09:07:17.671 MSK [unknown],????,????_????s,310759,XX001 ]:CONTEXT:  SQL statement "UPDATE
????_????s.????s_statisticsos
 
>              SET ????_????_found_ts = COALESCE(os.????_????_found_ts, NOW()),
>                  last_????_found_ts = NOW(),
>                  num_????s = os.num_????s + 1
>              WHERE ????_id = NEW.????_id"
>         PL/pgSQL function statistics__update_from_new_????() line 3 at SQL statement
> [ 2022-01-10 09:07:17.671 MSK [unknown],????,????_????s,310759,XX001 ]:STATEMENT:  
>         INSERT INTO ????_????s.????s_????s AS ????s 
> 
> 4. t_xmin is relatevely new, not ancient
> 5. pageinspect shows dead tuple after some time
> 6. no suspicious activity in logs nearby
> 7. vacuum (disable_page_skipping) and repack of indexes did not change anything
> 
> 
> I suspect this can be relatively new concurrency stuff. At least I never saw this before on clusters with clean
amcheckand heapcheck results.
 

Ah.  I've been thinking that it'd be some very old tuple that is in
trouble, but that seems to be proven false.  I think we should examine
the affected tuples more closely while they're in the state that causes
the problem.  Can you set up things so that pageinspect's
heap_page_items() is run on the broken page, before the problem
disappears?  Maybe extract the page number from the log line, have
get_raw_page() store the page in a separate table, so that we can run
heap_page_items() at leisure later.  I would also grep through
pg_waldump output to see what changes have been done to that tuple.

Maybe the reason the problem fixes itself is that something else deletes
the tuple.

> Alvaro, did you observe this on binaries from August 13 minor release or older?

Well, the only reports I have of this problem are with the original
error message that didn't give any clues at to what the problem was or
where to look for it, so I don't know if the xmin was recent or not.

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/