Обсуждение: uninitialized values in revised prepared xact code

Поиск
Список
Период
Сортировка

uninitialized values in revised prepared xact code

От
Andres Freund
Дата:
Hi,

I've just rerun valgrind for the first time in a while and saw the
following splat. My guess is it exists since bb38fb0d43c, but that's
blindly guessing:

==2049== Use of uninitialised value of size 8
==2049==    at 0x4FE66D: EndPrepare (twophase.c:1063)
==2049==    by 0x4F231B: PrepareTransaction (xact.c:2217)
==2049==    by 0x4F2A38: CommitTransactionCommand (xact.c:2676)
==2049==    by 0x79013E: finish_xact_command (postgres.c:2408)
==2049==    by 0x78DE97: exec_simple_query (postgres.c:1062)
==2049==    by 0x791FDD: PostgresMain (postgres.c:4010)
==2049==    by 0x71B13B: BackendRun (postmaster.c:4113)
==2049==    by 0x71A86D: BackendStartup (postmaster.c:3787)
==2049==    by 0x71714C: ServerLoop (postmaster.c:1566)
==2049==    by 0x716804: PostmasterMain (postmaster.c:1219)
==2049==    by 0x679405: main (main.c:219)
==2049==  Uninitialised value was created by a stack allocation
==2049==    at 0x4FE16C: StartPrepare (twophase.c:942)
==2049==
==2049== Syscall param write(buf) points to uninitialised byte(s)
==2049==    at 0x5C69640: __write_nocancel (syscall-template.S:81)
==2049==    by 0x4FE6AE: EndPrepare (twophase.c:1064)
==2049==    by 0x4F231B: PrepareTransaction (xact.c:2217)
==2049==    by 0x4F2A38: CommitTransactionCommand (xact.c:2676)
==2049==    by 0x79013E: finish_xact_command (postgres.c:2408)
==2049==    by 0x78DE97: exec_simple_query (postgres.c:1062)
==2049==    by 0x791FDD: PostgresMain (postgres.c:4010)
==2049==    by 0x71B13B: BackendRun (postmaster.c:4113)
==2049==    by 0x71A86D: BackendStartup (postmaster.c:3787)
==2049==    by 0x71714C: ServerLoop (postmaster.c:1566)
==2049==    by 0x716804: PostmasterMain (postmaster.c:1219)
==2049==    by 0x679405: main (main.c:219)
==2049==  Address 0x64694ed is 1,389 bytes inside a block of size 8,192 alloc'd
==2049==    at 0x4C27B8F: malloc (vg_replace_malloc.c:298)
==2049==    by 0x8E766E: AllocSetAlloc (aset.c:853)
==2049==    by 0x8E8E04: MemoryContextAllocZero (mcxt.c:627)
==2049==    by 0x8A54D3: AtStart_Inval (inval.c:704)
==2049==    by 0x4F1DFC: StartTransaction (xact.c:1841)
==2049==    by 0x4F28D1: StartTransactionCommand (xact.c:2529)
==2049==    by 0x7900A7: start_xact_command (postgres.c:2383)
==2049==    by 0x78DAF4: exec_simple_query (postgres.c:860)
==2049==    by 0x791FDD: PostgresMain (postgres.c:4010)
==2049==    by 0x71B13B: BackendRun (postmaster.c:4113)
==2049==    by 0x71A86D: BackendStartup (postmaster.c:3787)
==2049==    by 0x71714C: ServerLoop (postmaster.c:1566)
==2049==  Uninitialised value was created by a stack allocation
==2049==    at 0x4FE16C: StartPrepare (twophase.c:942)

It's probably just padding - twophase.c:1063 is the CRC32 computation of
the record data.

Greetings,

Andres Freund

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: uninitialized values in revised prepared xact code

От
Bruce Momjian
Дата:
Uh, was this fixed.  I see a cleanup commit for this C file, but this
report is from June:
commit 07a4a93a0e35a778c77ffbbbc18de29e859e18f0Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>Date:   Fri May 16
09:47:502014 +0300    Initialize tsId and dbId fields in WAL record of COMMIT PREPARED.    Commit dd428c79 added dbId
andtsId to the xl_xact_commit struct but missed    that prepared transaction commits reuse that struct. Fix that.
Becausethose fields were left unitialized, replaying a commit prepared WAL    record in a hot standby node would fail
toremove the relcache init file.    That can lead to "could not open file" errors on the standby. Relcache init    file
onlyneeds to be removed when a system table/index is rewritten in the    transaction using two phase commit, so that
shouldbe rare in practice. In    HEAD, the incorrect dbId/tsId values are also used for filtering in logical
replicationcode, causing the transaction to always be filtered out.    Analysis and fix by Andres Freund. Backpatch to
9.0where hot standby was    introduced.
 

---------------------------------------------------------------------------

On Mon, Jun 30, 2014 at 11:58:59AM +0200, Andres Freund wrote:
> Hi,
> 
> I've just rerun valgrind for the first time in a while and saw the
> following splat. My guess is it exists since bb38fb0d43c, but that's
> blindly guessing:
> 
> ==2049== Use of uninitialised value of size 8
> ==2049==    at 0x4FE66D: EndPrepare (twophase.c:1063)
> ==2049==    by 0x4F231B: PrepareTransaction (xact.c:2217)
> ==2049==    by 0x4F2A38: CommitTransactionCommand (xact.c:2676)
> ==2049==    by 0x79013E: finish_xact_command (postgres.c:2408)
> ==2049==    by 0x78DE97: exec_simple_query (postgres.c:1062)
> ==2049==    by 0x791FDD: PostgresMain (postgres.c:4010)
> ==2049==    by 0x71B13B: BackendRun (postmaster.c:4113)
> ==2049==    by 0x71A86D: BackendStartup (postmaster.c:3787)
> ==2049==    by 0x71714C: ServerLoop (postmaster.c:1566)
> ==2049==    by 0x716804: PostmasterMain (postmaster.c:1219)
> ==2049==    by 0x679405: main (main.c:219)
> ==2049==  Uninitialised value was created by a stack allocation
> ==2049==    at 0x4FE16C: StartPrepare (twophase.c:942)
> ==2049==
> ==2049== Syscall param write(buf) points to uninitialised byte(s)
> ==2049==    at 0x5C69640: __write_nocancel (syscall-template.S:81)
> ==2049==    by 0x4FE6AE: EndPrepare (twophase.c:1064)
> ==2049==    by 0x4F231B: PrepareTransaction (xact.c:2217)
> ==2049==    by 0x4F2A38: CommitTransactionCommand (xact.c:2676)
> ==2049==    by 0x79013E: finish_xact_command (postgres.c:2408)
> ==2049==    by 0x78DE97: exec_simple_query (postgres.c:1062)
> ==2049==    by 0x791FDD: PostgresMain (postgres.c:4010)
> ==2049==    by 0x71B13B: BackendRun (postmaster.c:4113)
> ==2049==    by 0x71A86D: BackendStartup (postmaster.c:3787)
> ==2049==    by 0x71714C: ServerLoop (postmaster.c:1566)
> ==2049==    by 0x716804: PostmasterMain (postmaster.c:1219)
> ==2049==    by 0x679405: main (main.c:219)
> ==2049==  Address 0x64694ed is 1,389 bytes inside a block of size 8,192 alloc'd
> ==2049==    at 0x4C27B8F: malloc (vg_replace_malloc.c:298)
> ==2049==    by 0x8E766E: AllocSetAlloc (aset.c:853)
> ==2049==    by 0x8E8E04: MemoryContextAllocZero (mcxt.c:627)
> ==2049==    by 0x8A54D3: AtStart_Inval (inval.c:704)
> ==2049==    by 0x4F1DFC: StartTransaction (xact.c:1841)
> ==2049==    by 0x4F28D1: StartTransactionCommand (xact.c:2529)
> ==2049==    by 0x7900A7: start_xact_command (postgres.c:2383)
> ==2049==    by 0x78DAF4: exec_simple_query (postgres.c:860)
> ==2049==    by 0x791FDD: PostgresMain (postgres.c:4010)
> ==2049==    by 0x71B13B: BackendRun (postmaster.c:4113)
> ==2049==    by 0x71A86D: BackendStartup (postmaster.c:3787)
> ==2049==    by 0x71714C: ServerLoop (postmaster.c:1566)
> ==2049==  Uninitialised value was created by a stack allocation
> ==2049==    at 0x4FE16C: StartPrepare (twophase.c:942)
> 
> It's probably just padding - twophase.c:1063 is the CRC32 computation of
> the record data.
> 
> Greetings,
> 
> Andres Freund
> 
> --
>  Andres Freund                       http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: uninitialized values in revised prepared xact code

От
Heikki Linnakangas
Дата:
On 10/11/2014 10:20 PM, Bruce Momjian wrote:
>
> Uh, was this fixed.  I see a cleanup commit for this C file, but this
> report is from June:
>
>     commit 07a4a93a0e35a778c77ffbbbc18de29e859e18f0
>     Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>     Date:   Fri May 16 09:47:50 2014 +0300
>     
>         Initialize tsId and dbId fields in WAL record of COMMIT PREPARED.
>     
>         Commit dd428c79 added dbId and tsId to the xl_xact_commit struct but missed
>         that prepared transaction commits reuse that struct. Fix that.
>     
>         Because those fields were left unitialized, replaying a commit prepared WAL
>         record in a hot standby node would fail to remove the relcache init file.
>         That can lead to "could not open file" errors on the standby. Relcache init
>         file only needs to be removed when a system table/index is rewritten in the
>         transaction using two phase commit, so that should be rare in practice. In
>         HEAD, the incorrect dbId/tsId values are also used for filtering in logical
>         replication code, causing the transaction to always be filtered out.
>     
>         Analysis and fix by Andres Freund. Backpatch to 9.0 where hot standby was
>         introduced.

No, that was a different issue.

(more below)

> On Mon, Jun 30, 2014 at 11:58:59AM +0200, Andres Freund wrote:
>> Hi,
>>
>> I've just rerun valgrind for the first time in a while and saw the
>> following splat. My guess is it exists since bb38fb0d43c, but that's
>> blindly guessing:
>>
>> ==2049== Use of uninitialised value of size 8
>> ==2049==    at 0x4FE66D: EndPrepare (twophase.c:1063)
>> ==2049==    by 0x4F231B: PrepareTransaction (xact.c:2217)
>> ==2049==    by 0x4F2A38: CommitTransactionCommand (xact.c:2676)
>> ==2049==    by 0x79013E: finish_xact_command (postgres.c:2408)
>> ==2049==    by 0x78DE97: exec_simple_query (postgres.c:1062)
>> ==2049==    by 0x791FDD: PostgresMain (postgres.c:4010)
>> ==2049==    by 0x71B13B: BackendRun (postmaster.c:4113)
>> ==2049==    by 0x71A86D: BackendStartup (postmaster.c:3787)
>> ==2049==    by 0x71714C: ServerLoop (postmaster.c:1566)
>> ==2049==    by 0x716804: PostmasterMain (postmaster.c:1219)
>> ==2049==    by 0x679405: main (main.c:219)
>> ==2049==  Uninitialised value was created by a stack allocation
>> ==2049==    at 0x4FE16C: StartPrepare (twophase.c:942)
>> ==2049==
>> ==2049== Syscall param write(buf) points to uninitialised byte(s)
>> ==2049==    at 0x5C69640: __write_nocancel (syscall-template.S:81)
>> ==2049==    by 0x4FE6AE: EndPrepare (twophase.c:1064)
>> ==2049==    by 0x4F231B: PrepareTransaction (xact.c:2217)
>> ==2049==    by 0x4F2A38: CommitTransactionCommand (xact.c:2676)
>> ==2049==    by 0x79013E: finish_xact_command (postgres.c:2408)
>> ==2049==    by 0x78DE97: exec_simple_query (postgres.c:1062)
>> ==2049==    by 0x791FDD: PostgresMain (postgres.c:4010)
>> ==2049==    by 0x71B13B: BackendRun (postmaster.c:4113)
>> ==2049==    by 0x71A86D: BackendStartup (postmaster.c:3787)
>> ==2049==    by 0x71714C: ServerLoop (postmaster.c:1566)
>> ==2049==    by 0x716804: PostmasterMain (postmaster.c:1219)
>> ==2049==    by 0x679405: main (main.c:219)
>> ==2049==  Address 0x64694ed is 1,389 bytes inside a block of size 8,192 alloc'd
>> ==2049==    at 0x4C27B8F: malloc (vg_replace_malloc.c:298)
>> ==2049==    by 0x8E766E: AllocSetAlloc (aset.c:853)
>> ==2049==    by 0x8E8E04: MemoryContextAllocZero (mcxt.c:627)
>> ==2049==    by 0x8A54D3: AtStart_Inval (inval.c:704)
>> ==2049==    by 0x4F1DFC: StartTransaction (xact.c:1841)
>> ==2049==    by 0x4F28D1: StartTransactionCommand (xact.c:2529)
>> ==2049==    by 0x7900A7: start_xact_command (postgres.c:2383)
>> ==2049==    by 0x78DAF4: exec_simple_query (postgres.c:860)
>> ==2049==    by 0x791FDD: PostgresMain (postgres.c:4010)
>> ==2049==    by 0x71B13B: BackendRun (postmaster.c:4113)
>> ==2049==    by 0x71A86D: BackendStartup (postmaster.c:3787)
>> ==2049==    by 0x71714C: ServerLoop (postmaster.c:1566)
>> ==2049==  Uninitialised value was created by a stack allocation
>> ==2049==    at 0x4FE16C: StartPrepare (twophase.c:942)
>>
>> It's probably just padding - twophase.c:1063 is the CRC32 computation of
>> the record data.

Yeah. The padding bytes in TwoPhaseFileHeader were not initialized.

That's simple enough to fix, but when I run valgrind, I get a lot whole 
bunch of similar messages. A few are from pgstat: the padding bytes in 
the pgstat messages are not initialized. One comes from 
write_relcache_init_file(); again I believe it's padding bytes being 
uninitialized (in FormData_pg_attribute). And one from the XLogInsert 
from heap_insert; there's an uninitialized padding byte in 
xl_heap_insert. And so forth.. Is it worthwhile to hunt down all of 
these? If there aren't many more than these, it probably is worth it, 
but I fear this might be an endless effort. Have we been clean of these 
warnings at any point in the past?

- Heikki




Re: uninitialized values in revised prepared xact code

От
Andres Freund
Дата:
On 2014-10-24 17:13:49 +0300, Heikki Linnakangas wrote:
> Yeah. The padding bytes in TwoPhaseFileHeader were not initialized.
> 
> That's simple enough to fix, but when I run valgrind, I get a lot whole
> bunch of similar messages. A few are from pgstat: the padding bytes in the
> pgstat messages are not initialized. One comes from
> write_relcache_init_file(); again I believe it's padding bytes being
> uninitialized (in FormData_pg_attribute). And one from the XLogInsert from
> heap_insert; there's an uninitialized padding byte in xl_heap_insert. And so
> forth.. Is it worthwhile to hunt down all of these? If there aren't many
> more than these, it probably is worth it, but I fear this might be an
> endless effort. Have we been clean of these warnings at any point in the
> past?

Did you use the valgrind suppression file in src/tools? It suppresses
some "known harmless" cases.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: uninitialized values in revised prepared xact code

От
Heikki Linnakangas
Дата:
On 10/24/2014 05:40 PM, Andres Freund wrote:
> On 2014-10-24 17:13:49 +0300, Heikki Linnakangas wrote:
>> Yeah. The padding bytes in TwoPhaseFileHeader were not initialized.
>>
>> That's simple enough to fix, but when I run valgrind, I get a lot whole
>> bunch of similar messages. A few are from pgstat: the padding bytes in the
>> pgstat messages are not initialized. One comes from
>> write_relcache_init_file(); again I believe it's padding bytes being
>> uninitialized (in FormData_pg_attribute). And one from the XLogInsert from
>> heap_insert; there's an uninitialized padding byte in xl_heap_insert. And so
>> forth.. Is it worthwhile to hunt down all of these? If there aren't many
>> more than these, it probably is worth it, but I fear this might be an
>> endless effort. Have we been clean of these warnings at any point in the
>> past?
>
> Did you use the valgrind suppression file in src/tools? It suppresses
> some "known harmless" cases.

Ah, I did not. With the file, the original warning that started this 
thread is gone; you added a suppression for it in commit 
9a0a12f683235d3e10b873baba974f6414297a7e.

- Heikki