Re: logical decoding bug: segfault in ReorderBufferToastReplace()

From: Drouvot, Bertrand
Subject: Re: logical decoding bug: segfault in ReorderBufferToastReplace()
Message-ID: EEB686D3-F8A7-4371-9A96-5DF3B72A7734@amazon.com
In reply to: Re: logical decoding bug: segfault in ReorderBufferToastReplace() (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Replies: Re: logical decoding bug: segfault in ReorderBufferToastReplace() (Robert Haas <robertmhaas@gmail.com>)
         Re: logical decoding bug: segfault in ReorderBufferToastReplace() (Robert Haas <robertmhaas@gmail.com>)
         Re: logical decoding bug: segfault in ReorderBufferToastReplace() (Andres Freund <andres@anarazel.de>)
         Re: logical decoding bug: segfault in ReorderBufferToastReplace() (Andres Freund <andres@anarazel.de>)
List: pgsql-bugs
On 12/9/19, 10:10 AM, "Tomas Vondra" <tomas.vondra@2ndquadrant.com> wrote:
    >On Wed, Dec 04, 2019 at 05:36:16PM -0800, Jeremy Schneider wrote:
    >>On 9/8/19 14:01, Tom Lane wrote:
    >>> Fix RelationIdGetRelation calls that weren't bothering with error checks.
    >>>
    >>> ...
    >>>
    >>> Details
    >>> -------
    >>> https://git.postgresql.org/pg/commitdiff/69f883fef14a3fc5849126799278abcc43f40f56
    >>
    >>We had two different databases this week (with the same schema) both
    >>independently hit the condition of this recent commit from Tom. It's on
    >>11.5 so we're actually segfaulting and restarting rather than just
    >>causing the walsender process to ERROR, but regardless there's still
    >>some underlying bug here.
    >>
    >>We have core files and we're still working to see if we can figure out
    >>what's going on, but I thought I'd report now in case anyone has extra
    >>ideas or suggestions.  The segfault is on line 3034 of reorderbuffer.c.
    >>
    >>https://github.com/postgres/postgres/blob/REL_11_5/src/backend/replication/logical/reorderbuffer.c#L3034
    >>
    >>3033     toast_rel = RelationIdGetRelation(relation->rd_rel->reltoastrelid);
    >>3034     toast_desc = RelationGetDescr(toast_rel);
    >>
    >>We'll keep looking; let me know any feedback! Would love to track down
    >>whatever bug is in the logical decoding code, if that's what it is.
    >>
    >>==========
    >>
    >>backtrace showing the call stack...
    >>
    >>Core was generated by `postgres: walsender <NAME-REDACTED>
    >><DNS-REDACTED>(31712)'.
    >>Program terminated with signal 11, Segmentation fault.
    >>#0  ReorderBufferToastReplace (rb=0x3086af0, txn=0x3094a78,
    >>relation=0x2b79177249c8, relation=0x2b79177249c8, change=0x30ac938)
    >>    at reorderbuffer.c:3034
    >>3034    reorderbuffer.c: No such file or directory.
    >>...
    >>(gdb) #0  ReorderBufferToastReplace (rb=0x3086af0, txn=0x3094a78,
    >>relation=0x2b79177249c8, relation=0x2b79177249c8, change=0x30ac938)
    >>    at reorderbuffer.c:3034
    >>#1  ReorderBufferCommit (rb=0x3086af0, xid=xid@entry=1358809,
    >>commit_lsn=9430473346032, end_lsn=<optimized out>,
    >>    commit_time=commit_time@entry=628712466364268,
    >>origin_id=origin_id@entry=0, origin_lsn=origin_lsn@entry=0) at
    >>reorderbuffer.c:1584
    >>#2  0x0000000000716248 in DecodeCommit (xid=1358809,
    >>parsed=0x7ffc4ce123f0, buf=0x7ffc4ce125b0, ctx=0x3068f70) at decode.c:637
    >>#3  DecodeXactOp (ctx=0x3068f70, buf=buf@entry=0x7ffc4ce125b0) at
    >>decode.c:245
    >>#4  0x000000000071655a in LogicalDecodingProcessRecord (ctx=0x3068f70,
    >>record=0x3069208) at decode.c:117
    >>#5  0x0000000000727150 in XLogSendLogical () at walsender.c:2886
    >>#6  0x0000000000729192 in WalSndLoop (send_data=send_data@entry=0x7270f0
    >><XLogSendLogical>) at walsender.c:2249
    >>#7  0x0000000000729f91 in StartLogicalReplication (cmd=0x30485a0) at
    >>walsender.c:1111
    >>#8  exec_replication_command (
    >>    cmd_string=cmd_string@entry=0x2f968b0 "START_REPLICATION SLOT
    >>\"<NAME-REDACTED>\" LOGICAL 893/38002B98 (proto_version '1',
    >>publication_names '\"<NAME-REDACTED>\"')") at walsender.c:1628
    >>#9  0x000000000076e939 in PostgresMain (argc=<optimized out>,
    >>argv=argv@entry=0x2fea168, dbname=0x2fea020 "<NAME-REDACTED>",
    >>    username=<optimized out>) at postgres.c:4182
    >>#10 0x00000000004bdcb5 in BackendRun (port=0x2fdec50) at postmaster.c:4410
    >>#11 BackendStartup (port=0x2fdec50) at postmaster.c:4082
    >>#12 ServerLoop () at postmaster.c:1759
    >>#13 0x00000000007062f9 in PostmasterMain (argc=argc@entry=7,
    >>argv=argv@entry=0x2f92540) at postmaster.c:1432
    >>#14 0x00000000004be73b in main (argc=7, argv=0x2f92540) at main.c:228
    >>
    >>==========
    >>
    >>Some additional context...
    >>
    >># select * from pg_publication_rel;
    >> prpubid | prrelid
    >>---------+---------
    >>   71417 |   16453
    >>   71417 |   54949
    >>(2 rows)
    >>
    >>(gdb) print toast_rel
    >>$4 = (struct RelationData *) 0x0
    >>
    >>(gdb) print *relation->rd_rel
    >>$11 = {relname = {data = "<NAME-REDACTED>", '\000' <repeats 44 times>},
    >>relnamespace = 16402, reltype = 16430, reloftype = 0,
    >>relowner = 16393, relam = 0, relfilenode = 16428, reltablespace = 0,
    >>relpages = 0, reltuples = 0, relallvisible = 0, reltoastrelid = 0,
    
    >Hmmm, so reltoastrelid = 0, i.e. the relation does not have a TOAST
    >relation. Yet we're calling ReorderBufferToastReplace on the decoded
    >record ... interesting.
    >
    >Can you share structure of the relation causing the issue?
    
   Here it is:

\d+ rel_having_issue
                                                             Table "public.rel_having_issue"
     Column     |           Type           | Collation | Nullable |                   Default                    | Storage  | Stats target | Description
----------------+--------------------------+-----------+----------+----------------------------------------------+----------+--------------+-------------
 id             | integer                  |           | not null | nextval('rel_having_issue_id_seq'::regclass) | plain    |              |
 field1         | character varying(255)   |           |          |                                              | extended |              |
 field2         | integer                  |           |          |                                              | plain    |              |
 field3         | timestamp with time zone |           |          |                                              | plain    |              |
Indexes:
    "rel_having_issue_pkey" PRIMARY KEY, btree (id)

select relname,relfilenode,reltoastrelid from pg_class where relname='rel_having_issue';
       relname       | relfilenode | reltoastrelid
---------------------+-------------+---------------
 rel_having_issue |       16428 |             0

Bertrand
