Обсуждение: BUG #15943: Valgrind-detected error in SlruPhysicalWritePage
The following bug has been logged on the website: Bug reference: 15943 Logged by: Alexander Lakhin Email address: exclusion@gmail.com PostgreSQL version: 12beta2 Operating system: Ubuntu 18.04 Description: When running under valgrind (sitting on the master branch): printf "SELECT pg_notify('notify_async1',NULL); %.0s" `seq 2048` | psql I get the following error: ==00:00:00:12.885 19648== Syscall param write(buf) points to uninitialised byte(s) ==00:00:00:12.885 19648== at 0x5054281: write (write.c:27) ==00:00:00:12.885 19648== by 0x2687F0: SlruPhysicalWritePage (slru.c:851) ==00:00:00:12.885 19648== by 0x268E95: SlruInternalWritePage (slru.c:543) ==00:00:00:12.885 19648== by 0x2690F1: SlruSelectLRUPage (slru.c:1095) ==00:00:00:12.885 19648== by 0x269777: SimpleLruZeroPage (slru.c:269) ==00:00:00:12.885 19648== by 0x31DABA: asyncQueueAddEntries (async.c:1379) ==00:00:00:12.885 19648== by 0x31F130: PreCommit_Notify (async.c:845) ==00:00:00:12.885 19648== by 0x273C19: CommitTransaction (xact.c:2128) ==00:00:00:12.885 19648== by 0x274A10: CommitTransactionCommand (xact.c:2921) ==00:00:00:12.885 19648== by 0x53BD7D: finish_xact_command (postgres.c:2597) ==00:00:00:12.885 19648== by 0x53E2F3: exec_simple_query (postgres.c:1256) ==00:00:00:12.885 19648== by 0x540545: PostgresMain (postgres.c:4256) ==00:00:00:12.885 19648== Address 0xf4f95bf is in a rw- anonymous segment ==00:00:00:12.885 19648== Uninitialised value was created by a stack allocation ==00:00:00:12.885 19648== at 0x31D967: asyncQueueAddEntries (async.c:1306) Increasing the count to 8192 produces another one, slightly different stack: ==00:00:00:29.999 19648== Syscall param write(buf) points to uninitialised byte(s) ==00:00:00:29.999 19648== at 0x5054281: write (write.c:27) ==00:00:00:29.999 19648== by 0x2687F0: SlruPhysicalWritePage (slru.c:851) ==00:00:00:29.999 19648== by 0x268E95: SlruInternalWritePage (slru.c:543) ==00:00:00:29.999 19648== by 0x26A1BD: SimpleLruTruncate (slru.c:1234) ==00:00:00:29.999 19648== by 0x31D5A5: asyncQueueAdvanceTail (async.c:2022) ==00:00:00:29.999 19648== by 0x31F231: ProcessCompletedNotifies (async.c:1132) ==00:00:00:29.999 19648== by 0x5403D1: PostgresMain (postgres.c:4176) ==00:00:00:29.999 19648== by 0x4B16D9: BackendRun (postmaster.c:4431) ==00:00:00:29.999 19648== by 0x4B481B: BackendStartup (postmaster.c:4122) ==00:00:00:29.999 19648== by 0x4B4B32: ServerLoop (postmaster.c:1704) ==00:00:00:29.999 19648== by 0x4B5F70: PostmasterMain (postmaster.c:1377) ==00:00:00:29.999 19648== by 0x406428: main (main.c:228) ==00:00:00:29.999 19648== Address 0xf4fb5bf is in a rw- anonymous segment ==00:00:00:29.999 19648== Uninitialised value was created by a stack allocation ==00:00:00:29.999 19648== at 0x31D967: asyncQueueAddEntries (async.c:1306) (It's also reproduced on REL_12_BETA3, REL9_6_15.)
Hi, On August 8, 2019 12:44:43 AM EDT, PG Bug reporting form <noreply@postgresql.org> wrote: >The following bug has been logged on the website: > >Bug reference: 15943 >Logged by: Alexander Lakhin >Email address: exclusion@gmail.com >PostgreSQL version: 12beta2 >Operating system: Ubuntu 18.04 >Description: > >When running under valgrind (sitting on the master branch): >printf "SELECT pg_notify('notify_async1',NULL); %.0s" `seq 2048` | psql > >I get the following error: >==00:00:00:12.885 19648== Syscall param write(buf) points to >uninitialised >byte(s) >==00:00:00:12.885 19648== at 0x5054281: write (write.c:27) >==00:00:00:12.885 19648== by 0x2687F0: SlruPhysicalWritePage >(slru.c:851) >==00:00:00:12.885 19648== by 0x268E95: SlruInternalWritePage >(slru.c:543) >==00:00:00:12.885 19648== by 0x2690F1: SlruSelectLRUPage >(slru.c:1095) >==00:00:00:12.885 19648== by 0x269777: SimpleLruZeroPage >(slru.c:269) >==00:00:00:12.885 19648== by 0x31DABA: asyncQueueAddEntries >(async.c:1379) >==00:00:00:12.885 19648== by 0x31F130: PreCommit_Notify >(async.c:845) >==00:00:00:12.885 19648== by 0x273C19: CommitTransaction >(xact.c:2128) >==00:00:00:12.885 19648== by 0x274A10: CommitTransactionCommand >(xact.c:2921) >==00:00:00:12.885 19648== by 0x53BD7D: finish_xact_command >(postgres.c:2597) >==00:00:00:12.885 19648== by 0x53E2F3: exec_simple_query >(postgres.c:1256) >==00:00:00:12.885 19648== by 0x540545: PostgresMain >(postgres.c:4256) >==00:00:00:12.885 19648== Address 0xf4f95bf is in a rw- anonymous >segment >==00:00:00:12.885 19648== Uninitialised value was created by a stack >allocation >==00:00:00:12.885 19648== at 0x31D967: asyncQueueAddEntries >(async.c:1306) > >Increasing the count to 8192 produces another one, slightly different >stack: >==00:00:00:29.999 19648== Syscall param write(buf) points to >uninitialised >byte(s) >==00:00:00:29.999 19648== at 0x5054281: write (write.c:27) >==00:00:00:29.999 19648== by 0x2687F0: SlruPhysicalWritePage >(slru.c:851) >==00:00:00:29.999 19648== by 0x268E95: SlruInternalWritePage >(slru.c:543) >==00:00:00:29.999 19648== by 0x26A1BD: SimpleLruTruncate >(slru.c:1234) >==00:00:00:29.999 19648== by 0x31D5A5: asyncQueueAdvanceTail >(async.c:2022) >==00:00:00:29.999 19648== by 0x31F231: ProcessCompletedNotifies >(async.c:1132) >==00:00:00:29.999 19648== by 0x5403D1: PostgresMain >(postgres.c:4176) >==00:00:00:29.999 19648== by 0x4B16D9: BackendRun >(postmaster.c:4431) >==00:00:00:29.999 19648== by 0x4B481B: BackendStartup >(postmaster.c:4122) >==00:00:00:29.999 19648== by 0x4B4B32: ServerLoop >(postmaster.c:1704) >==00:00:00:29.999 19648== by 0x4B5F70: PostmasterMain >(postmaster.c:1377) >==00:00:00:29.999 19648== by 0x406428: main (main.c:228) >==00:00:00:29.999 19648== Address 0xf4fb5bf is in a rw- anonymous >segment >==00:00:00:29.999 19648== Uninitialised value was created by a stack >allocation >==00:00:00:29.999 19648== at 0x31D967: asyncQueueAddEntries >(async.c:1306) > >(It's also reproduced on REL_12_BETA3, REL9_6_15.) That seems likely to be padding. Probably easier to see if you enable read-var-info and track-origins. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Hello Andres, 08.08.2019 13:33, Andres Freund wrote: > Hi, > > On August 8, 2019 12:44:43 AM EDT, PG Bug reporting form <noreply@postgresql.org> wrote: >> The following bug has been logged on the website: >> >> Bug reference: 15943 >> Logged by: Alexander Lakhin >> Email address: exclusion@gmail.com >> PostgreSQL version: 12beta2 >> Operating system: Ubuntu 18.04 >> Description: >> >> When running under valgrind (sitting on the master branch): >> printf "SELECT pg_notify('notify_async1',NULL); %.0s" `seq 2048` | psql >> > That seems likely to be padding. Probably easier to see if you enable read-var-info and track-origins. > With "--track-origins=yes --read-var-info=yes" I get: ==00:00:00:30.800 24110== Syscall param write(buf) points to uninitialised byte(s) ==00:00:00:30.801 24110== at 0x5054281: write (write.c:27) ==00:00:00:30.801 24110== by 0x2687F0: SlruPhysicalWritePage (slru.c:851) ==00:00:00:30.801 24110== by 0x268E95: SlruInternalWritePage (slru.c:543) ==00:00:00:30.801 24110== by 0x2690F1: SlruSelectLRUPage (slru.c:1095) ==00:00:00:30.801 24110== by 0x269777: SimpleLruZeroPage (slru.c:269) ==00:00:00:30.801 24110== by 0x31DABA: asyncQueueAddEntries (async.c:1381) ==00:00:00:30.801 24110== by 0x31F130: PreCommit_Notify (async.c:845) ==00:00:00:30.801 24110== by 0x273C19: CommitTransaction (xact.c:2128) ==00:00:00:30.801 24110== by 0x274A10: CommitTransactionCommand (xact.c:2921) ==00:00:00:30.801 24110== by 0x53BD7D: finish_xact_command (postgres.c:2597) ==00:00:00:30.801 24110== by 0x53E2F3: exec_simple_query (postgres.c:1256) ==00:00:00:30.801 24110== by 0x540545: PostgresMain (postgres.c:4256) ==00:00:00:30.801 24110== Address 0xf4f95bf is in a rw- anonymous segment ==00:00:00:30.801 24110== Uninitialised value was created by a stack allocation ==00:00:00:30.801 24110== at 0x31D967: asyncQueueAddEntries (async.c:1306) and ==00:00:00:48.177 24110== Syscall param write(buf) points to uninitialised byte(s) ==00:00:00:48.177 24110== at 0x5054281: write (write.c:27) ==00:00:00:48.177 24110== by 0x2687F0: SlruPhysicalWritePage (slru.c:851) ==00:00:00:48.177 24110== by 0x268E95: SlruInternalWritePage (slru.c:543) ==00:00:00:48.177 24110== by 0x26A1BD: SimpleLruTruncate (slru.c:1234) ==00:00:00:48.177 24110== by 0x31D5A5: asyncQueueAdvanceTail (async.c:2024) ==00:00:00:48.177 24110== by 0x31F231: ProcessCompletedNotifies (async.c:1132) ==00:00:00:48.177 24110== by 0x5403D1: PostgresMain (postgres.c:4176) ==00:00:00:48.177 24110== by 0x4B16D9: BackendRun (postmaster.c:4431) ==00:00:00:48.177 24110== by 0x4B481B: BackendStartup (postmaster.c:4122) ==00:00:00:48.177 24110== by 0x4B4B32: ServerLoop (postmaster.c:1704) ==00:00:00:48.178 24110== by 0x4B5F70: PostmasterMain (postmaster.c:1377) ==00:00:00:48.178 24110== by 0x406428: main (main.c:228) ==00:00:00:48.178 24110== Address 0xf4fb5bf is in a rw- anonymous segment ==00:00:00:48.178 24110== Uninitialised value was created by a stack allocation ==00:00:00:48.178 24110== at 0x31D967: asyncQueueAddEntries (async.c:1306) Indeed, "memset(&qe, 0, sizeof(qe));" added in the beginning of asyncQueueAddEntries fixes the issue. More interestingly, "memset(qe.data, 0, sizeof(qe.data));" in the same place fixes it too (the patch is attached). So it seems, the padding in AsyncQueueEntry has an alibi. Best regards, Alexander
Вложения
On 2019-Aug-08, Alexander Lakhin wrote: > Indeed, "memset(&qe, 0, sizeof(qe));" added in the beginning of > asyncQueueAddEntries fixes the issue. > More interestingly, "memset(qe.data, 0, sizeof(qe.data));" in the same > place fixes it too (the patch is attached). Maybe VALGRIND_MAKE_MEM_DEFINED? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On August 8, 2019 12:43:32 PM EDT, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >On 2019-Aug-08, Alexander Lakhin wrote: > >> Indeed, "memset(&qe, 0, sizeof(qe));" added in the beginning of >> asyncQueueAddEntries fixes the issue. >> More interestingly, "memset(qe.data, 0, sizeof(qe.data));" in the >same >> place fixes it too (the patch is attached). > >Maybe VALGRIND_MAKE_MEM_DEFINED? Seems there's no reason to not zero initialize memory here? But perhaps just by initializing the stack variable? -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Andres Freund <andres@anarazel.de> writes: > Seems there's no reason to not zero initialize memory here? But perhaps just by initializing the stack variable? I think all we need here is another suppression in src/tools/valgrind.supp. We have not insisted on zeroing pad bytes that get sent to disk in the places already enumerated in valgrind.supp, so why would we apply a different rule to async.c? regards, tom lane
Hi, On August 8, 2019 2:41:42 PM EDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: >Andres Freund <andres@anarazel.de> writes: >> Seems there's no reason to not zero initialize memory here? But >perhaps just by initializing the stack variable? > >I think all we need here is another suppression in >src/tools/valgrind.supp. We have not insisted on zeroing >pad bytes that get sent to disk in the places already >enumerated in valgrind.supp, so why would we apply a >different rule to async.c? Well, with a lot of the other case there's a lot of sources for such uninitialized data. Here there probably is just one?If it weren't such a game of whack a mole, I'd even consider properly initializing the other places. And initializingthe stack data here ought to be cheap in this case? Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Hello hackers, > 08.08.2019 13:33, Andres Freund wrote: >> On August 8, 2019 12:44:43 AM EDT, PG Bug reporting form <noreply@postgresql.org> wrote: >>> The following bug has been logged on the website: >>> >>> Bug reference: 15943 >>> Logged by: Alexander Lakhin >>> Email address: exclusion@gmail.com >>> PostgreSQL version: 12beta2 >>> Operating system: Ubuntu 18.04 >>> Description: >>> >>> When running under valgrind (sitting on the master branch): >>> printf "SELECT pg_notify('notify_async1',NULL); %.0s" `seq 2048` | psql >>> >> That seems likely to be padding. Probably easier to see if you enable read-var-info and track-origins. > Indeed, "memset(&qe, 0, sizeof(qe));" added in the beginning of > asyncQueueAddEntries fixes the issue. > More interestingly, "memset(qe.data, 0, sizeof(qe.data));" in the same > place fixes it too (the patch is attached). > So it seems, the padding in AsyncQueueEntry has an alibi. Andres, you're right. The padding is found guilty (inside the qe.data itself). Please look at the patch addressing the cause of the problem (inspired by btree_bit.c). But if a valgrind suppression is preferred, look at the alternate patch. Best regards, Alexander