Re: logical decoding bug: segfault in ReorderBufferToastReplace()

From: Jeremy Schneider
Subject: Re: logical decoding bug: segfault in ReorderBufferToastReplace()
Date:
Msg-id: 187dfed1-7d97-4a8c-2932-b8a3d4dce697@amazon.com
In reply to: Re: logical decoding bug: segfault in ReorderBufferToastReplace()  (Andres Freund <andres@anarazel.de>)
List: pgsql-bugs
On 12/13/19 16:25, Andres Freund wrote:
> On 2019-12-13 16:13:35 -0800, Jeremy Schneider wrote:
>> On 12/11/19 08:35, Andres Freund wrote:
>>> I think we need to see pg_waldump output for the preceding records. That
>>> might allow us to see why there's a toast record that's being associated
>>> with this table, despite there not being a toast table.
>> Unfortunately the WAL logs are no longer available at this time.  :(
>>
>> I did a little poking around in the core file and searching source code
>> but didn't find anything yet.  Is there any memory structure that would
>> have the preceding/following records cached in memory?  If so then I
>> might be able to extract this from the core dumps.
> 
> Well, not the records directly, but the changes could be, depending on
> the size of the changes. That'd already help. It depends a bit on
> whether there are subtransactions or not (txn->nsubtxns will tell
> you). Within one transaction, the currently loaded (i.e. not changes
> that are spilled to disk, and haven't currently been restored - see
> txn->serialized) changes are in ReorderBufferTXN->changes.

I did include the txn in the original post to this thread; there are 357
changes in the transaction and they are all in memory (none spilled to
disk a.k.a. serialized).  No subtransactions.  However I do see that
"txn.has_catalog_changes = true" which makes me wonder if that's related
to the bug.

So... now I know... walking a dlist in gdb and dumping all the changes
is not exactly a walk in the park!  It needs some Python magic, like Tomas
Vondra's script that decodes Nodes.  I didn't manage to figure out how to
do this today, so the changes are there in the core dump but I can't get
at them yet.  :)

I also dug around in the ReorderBufferIterTXNState a little bit, but found
nothing that isn't already in the original post.

If anyone has a trick for walking a dlist in gdb, that would be awesome...
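
For reference, here is a minimal C sketch of the pointer arithmetic such a walk involves. It assumes simplified copies of the `dlist_head`/`dlist_node` types and the `dlist_container` macro from PostgreSQL's `lib/ilist.h`, redeclared so the sketch compiles on its own. `ToyChange`, its `action` field, and `walk_changes` are hypothetical stand-ins; the real `ReorderBufferChange` is much larger. Because the list link is embedded inside each change, every node pointer has to be offset back to the start of the enclosing struct. The walk ends when it gets back to the list head, not at a NULL pointer.

```c
#include <stddef.h>

/* Simplified re-declarations of the intrusive list types from PostgreSQL's
 * src/include/lib/ilist.h, so this sketch is self-contained. */
typedef struct dlist_node
{
    struct dlist_node *prev;
    struct dlist_node *next;
} dlist_node;

typedef struct dlist_head
{
    dlist_node head;            /* sentinel: head.next is the first element */
} dlist_head;

/* Recover the enclosing struct from a pointer to its embedded dlist_node. */
#define dlist_container(type, membername, ptr) \
    ((type *) ((char *) (ptr) - offsetof(type, membername)))

/* Hypothetical stand-in for ReorderBufferChange. The list link is NOT the
 * first member, so a node pointer cannot simply be cast to the struct. */
typedef struct ToyChange
{
    int         action;         /* hypothetical payload for illustration */
    dlist_node  node;           /* link into the transaction's change list */
} ToyChange;

static void
dlist_init(dlist_head *h)
{
    h->head.next = h->head.prev = &h->head;
}

static void
dlist_push_tail(dlist_head *h, dlist_node *n)
{
    n->next = &h->head;
    n->prev = h->head.prev;
    n->prev->next = n;
    h->head.prev = n;
}

/* Walk the list the way one would by hand in gdb: start at head.next and
 * stop on returning to the sentinel. A zeroed (never initialized) head is
 * treated as empty. Copies up to max payloads into out; returns the count. */
static int
walk_changes(dlist_head *changes, int *out, int max)
{
    int         count = 0;

    if (changes->head.next == NULL)
        return 0;

    for (dlist_node *cur = changes->head.next;
         cur != &changes->head;
         cur = cur->next)
    {
        ToyChange  *c = dlist_container(ToyChange, node, cur);

        if (count < max)
            out[count] = c->action;
        count++;
    }
    return count;
}
```

The same arithmetic should carry over to gdb by hand. Start from `txn->changes.head.next` and convert each node with `(ReorderBufferChange *) ((char *) node - (long) &((ReorderBufferChange *) 0)->node)`. Then follow `node->next`, and stop once the node equals `&txn->changes.head`.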

I'm off for holidays and won't be working on this for a couple weeks;
not sure whether it'll be possible to get to the bottom of it. But I
hope there's enough info in this thread to at least get a head start if
someone hits it again in the future.


> Well, I've heard mutterings that plain RDS postgres had some efficiency
> improvements around snapshots (in the GetSnapshotData() sense) - and
> that's an area where slightly wrong changes could quite plausibly
> cause a bug like this.

Definitely no changes around snapshots. I've never even heard anyone
talk about making changes like that in RDS PostgreSQL - feels to me like
people at AWS want it to be as close as possible to postgresql.org code.

Aurora is different; it feels to me like the engineering org has more
license to make changes. For example, they rewrote the subtransaction
subsystem. No changes to GetSnapshotData, though.

-Jeremy


-- 
Jeremy Schneider
Database Engineer
Amazon Web Services


