Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication

From Andres Freund
Subject Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication
Msg-id 20191118222416.dkn5cdmbxmtcemaf@alap3.anarazel.de
In response to Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-bugs
Hi,

On 2019-11-18 21:58:16 +0100, Tomas Vondra wrote:
> and the ReorderBufferToastReplace does this:
> 
>     newtup = change->data.tp.newtuple;
> 
>     heap_deform_tuple(&newtup->tuple, desc, attrs, isnull);
> 
> but that fails, because the tuple pointer happens to be 0x8, which is
> clearly bogus. Not sure where that comes from, I don't recognize that as
> a typical pattern.

It indicates that change->data.tp.newtuple is NULL,
afaict. &newtup->tuple boils down to
((char *) newtup) + offsetof(ReorderBufferTupleBuf, tuple)
and offsetof(ReorderBufferTupleBuf, tuple) is 0x8.
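
To illustrate the arithmetic, here is a minimal stand-alone sketch. The Mock* types are hypothetical stand-ins, not the real PostgreSQL definitions; the only assumption is that the struct starts with a single list-node pointer followed by the tuple header, which on a typical 64-bit platform puts the tuple member at offset 8:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only, NOT the PostgreSQL structs. */
typedef struct MockListNode
{
    struct MockListNode *next;      /* one pointer, like a singly linked list node */
} MockListNode;

typedef struct MockHeapTupleData
{
    uint32_t    t_len;              /* placeholder field */
    void       *t_data;             /* placeholder field */
} MockHeapTupleData;

typedef struct MockTupleBuf
{
    MockListNode      node;         /* occupies the first sizeof(void *) bytes */
    MockHeapTupleData tuple;        /* the member whose address gets taken */
} MockTupleBuf;

/*
 * The address that &buf->tuple evaluates to in practice, computed from the
 * base address alone, without dereferencing anything.
 */
static uintptr_t
member_address(uintptr_t buf_addr)
{
    return buf_addr + offsetof(MockTupleBuf, tuple);
}

/* Sanity check: the arithmetic agrees with the compiler on a real object. */
static void
check_against_real_object(void)
{
    MockTupleBuf buf = {0};

    assert((uintptr_t) &buf.tuple == member_address((uintptr_t) &buf));
}
```

With a base address of 0 (i.e. a NULL newtup), member_address() returns just the member offset, which is consistent with the 0x8 pointer seen in the backtrace.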


> Can you create a core dump (see [1]), and print 'change' and 'txn' in
> frame #2? I wonder if some of the other fields are bogus too (but it can't
> be entirely true ...), and if the transaction got serialized.

Please print both change and *change.

I suspect what's happening is that a change that shouldn't have
toast changes - e.g. a DELETE - somehow has them. That then
triggers a failure in ReorderBufferToastReplace(), which expects
newtuple to be valid.

It's probably worthwhile to add an elog(ERROR) check for this, even if
this does not turn out to be the case.
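
Roughly the shape of check I have in mind, as a stand-alone sketch rather than a patch. MockChange, MockTupleBuf and toast_replace_checked() are hypothetical names for illustration; in the backend the check would sit in ReorderBufferToastReplace() and report the problem via elog(ERROR) instead of returning a status code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration only, NOT the PostgreSQL structs. */
typedef struct MockTupleBuf
{
    int         placeholder;
} MockTupleBuf;

typedef struct MockChange
{
    MockTupleBuf *newtuple;         /* NULL for changes without a new tuple, e.g. DELETE */
} MockChange;

/*
 * Returns 0 if the change carries a new tuple and could be processed,
 * -1 if newtuple is NULL. The point is to fail with a clear message
 * instead of crashing on a bogus pointer later on.
 */
static int
toast_replace_checked(const MockChange *change)
{
    assert(change != NULL);         /* caller must pass a change */

    if (change->newtuple == NULL)
    {
        fprintf(stderr, "unexpected toast change without a new tuple\n");
        return -1;
    }

    /* ... the real code would deform and rewrite newtuple here ... */
    return 0;
}
```

That would turn the segfault into an ordinary error, which is easier to diagnose even if the root cause is elsewhere.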



> > This behaviour does not depend on the data in the tables, because we see it
> > in different databases with different sets of tables in publications.
> 
> I'm not sure I really believe that. Surely there has to be something
> special about your schema, or possibly access pattern that triggers this
> bug in your environment and not elsewhere.

Yea.  Are there any C triggers present? Any unusual extensions? Users of
the transaction hook, for example?


> > Looks like a real issue in logical replication.
> > I will be happy to provide additional information about that issue, but I
> > need to know what else to collect to help solve this
> > problem.
> > 
> 
> Well, if you can create a reproducer, that'd be the best option, because
> then we can investigate locally instead of the ping-pong here.
> 
> But if that's not possible, let's start with the schema and the
> additional information from the core file.
> 
> I'd also like to see the contents of the WAL, particularly for the XID
> triggering this issue. Please run pg_waldump and see how much data is
> there for XID 1667601527. It does commit at 25EE/D6DE6EB8, not sure
> where it starts. It may have subtransactions, so don't do just grep.

Yea, that'd be helpful.

Greetings,

Andres Freund


