Re: skink's test_decoding failures in 9.4 branch

From Tom Lane
Subject Re: skink's test_decoding failures in 9.4 branch
Date
Msg-id 21842.1469033104@sss.pgh.pa.us
In response to skink's test_decoding failures in 9.4 branch  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: skink's test_decoding failures in 9.4 branch  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
I wrote:
> I've still had no luck reproducing it here, though.

Hah --- I take that back.  On about the fourth or fifth trial:

==00:00:00:34.291 21525== Invalid read of size 1
==00:00:00:34.291 21525==    at 0x4A08DEC: memcpy (mc_replace_strmem.c:882)
==00:00:00:34.291 21525==    by 0x66FA54: DecodeXLogTuple (decode.c:899)
==00:00:00:34.291 21525==    by 0x670561: LogicalDecodingProcessRecord (decode.c:711)
==00:00:00:34.291 21525==    by 0x671BC3: pg_logical_slot_get_changes_guts (logicalfuncs.c:440)
==00:00:00:34.291 21525==    by 0x5C0B6B: ExecMakeTableFunctionResult (execQual.c:2196)
==00:00:00:34.291 21525==    by 0x5D4131: FunctionNext (nodeFunctionscan.c:95)
==00:00:00:34.291 21525==    by 0x5C170D: ExecScan (execScan.c:82)
==00:00:00:34.291 21525==    by 0x5BA007: ExecProcNode (execProcnode.c:426)
==00:00:00:34.291 21525==    by 0x5B8A61: standard_ExecutorRun (execMain.c:1490)
==00:00:00:34.291 21525==    by 0x6BFE36: PortalRunSelect (pquery.c:942)
==00:00:00:34.291 21525==    by 0x6C11EF: PortalRun (pquery.c:786)
==00:00:00:34.291 21525==    by 0x6BD7E3: exec_simple_query (postgres.c:1072)
==00:00:00:34.291 21525==  Address 0xe5311d6 is 6 bytes after a block of size 8,192 alloc'd
==00:00:00:34.291 21525==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==00:00:00:34.291 21525==    by 0x4ED399: XLogReaderAllocate (xlogreader.c:83)
==00:00:00:34.291 21525==    by 0x6710B3: StartupDecodingContext (logical.c:161)
==00:00:00:34.291 21525==    by 0x671303: CreateDecodingContext (logical.c:413)
==00:00:00:34.291 21525==    by 0x671AF7: pg_logical_slot_get_changes_guts (logicalfuncs.c:394)
==00:00:00:34.291 21525==    by 0x5C0B6B: ExecMakeTableFunctionResult (execQual.c:2196)
==00:00:00:34.291 21525==    by 0x5D4131: FunctionNext (nodeFunctionscan.c:95)
==00:00:00:34.291 21525==    by 0x5C170D: ExecScan (execScan.c:82)
==00:00:00:34.291 21525==    by 0x5BA007: ExecProcNode (execProcnode.c:426)
==00:00:00:34.291 21525==    by 0x5B8A61: standard_ExecutorRun (execMain.c:1490)
==00:00:00:34.291 21525==    by 0x6BFE36: PortalRunSelect (pquery.c:942)
==00:00:00:34.291 21525==    by 0x6C11EF: PortalRun (pquery.c:786)
==00:00:00:34.291 21525== 
...
...
==00:00:00:35.011 21525== Invalid read of size 1
==00:00:00:35.011 21525==    at 0x4A08CCA: memcpy (mc_replace_strmem.c:882)
==00:00:00:35.011 21525==    by 0x66FA54: DecodeXLogTuple (decode.c:899)
==00:00:00:35.011 21525==    by 0x670561: LogicalDecodingProcessRecord (decode.c:711)
==00:00:00:35.011 21525==    by 0x671BC3: pg_logical_slot_get_changes_guts (logicalfuncs.c:440)
==00:00:00:35.011 21525==    by 0x5C0B6B: ExecMakeTableFunctionResult (execQual.c:2196)
==00:00:00:35.011 21525==    by 0x5D4131: FunctionNext (nodeFunctionscan.c:95)
==00:00:00:35.011 21525==    by 0x5C170D: ExecScan (execScan.c:82)
==00:00:00:35.011 21525==    by 0x5BA007: ExecProcNode (execProcnode.c:426)
==00:00:00:35.012 21525==    by 0x5B8A61: standard_ExecutorRun (execMain.c:1490)
==00:00:00:35.012 21525==    by 0x6BFE36: PortalRunSelect (pquery.c:942)
==00:00:00:35.012 21525==    by 0x6C11EF: PortalRun (pquery.c:786)
==00:00:00:35.012 21525==    by 0x6BD7E3: exec_simple_query (postgres.c:1072)
==00:00:00:35.012 21525==  Address 0x4ff2450 is 0 bytes after a block of size 8,192 alloc'd
==00:00:00:35.012 21525==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==00:00:00:35.012 21525==    by 0x4ED399: XLogReaderAllocate (xlogreader.c:83)
==00:00:00:35.012 21525==    by 0x6710B3: StartupDecodingContext (logical.c:161)
==00:00:00:35.012 21525==    by 0x671303: CreateDecodingContext (logical.c:413)
==00:00:00:35.012 21525==    by 0x671AF7: pg_logical_slot_get_changes_guts (logicalfuncs.c:394)
==00:00:00:35.012 21525==    by 0x5C0B6B: ExecMakeTableFunctionResult (execQual.c:2196)
==00:00:00:35.012 21525==    by 0x5D4131: FunctionNext (nodeFunctionscan.c:95)
==00:00:00:35.012 21525==    by 0x5C170D: ExecScan (execScan.c:82)
==00:00:00:35.012 21525==    by 0x5BA007: ExecProcNode (execProcnode.c:426)
==00:00:00:35.012 21525==    by 0x5B8A61: standard_ExecutorRun (execMain.c:1490)
==00:00:00:35.012 21525==    by 0x6BFE36: PortalRunSelect (pquery.c:942)
==00:00:00:35.012 21525==    by 0x6C11EF: PortalRun (pquery.c:786)
==00:00:00:35.012 21525== 
==00:00:00:35.012 21525== Invalid read of size 1
==00:00:00:35.012 21525==    at 0x4A08CB8: memcpy (mc_replace_strmem.c:882)
==00:00:00:35.012 21525==    by 0x66FA54: DecodeXLogTuple (decode.c:899)
==00:00:00:35.012 21525==    by 0x670561: LogicalDecodingProcessRecord (decode.c:711)
==00:00:00:35.012 21525==    by 0x671BC3: pg_logical_slot_get_changes_guts (logicalfuncs.c:440)
==00:00:00:35.012 21525==    by 0x5C0B6B: ExecMakeTableFunctionResult (execQual.c:2196)
==00:00:00:35.012 21525==    by 0x5D4131: FunctionNext (nodeFunctionscan.c:95)
==00:00:00:35.012 21525==    by 0x5C170D: ExecScan (execScan.c:82)
==00:00:00:35.012 21525==    by 0x5BA007: ExecProcNode (execProcnode.c:426)
==00:00:00:35.012 21525==    by 0x5B8A61: standard_ExecutorRun (execMain.c:1490)
==00:00:00:35.012 21525==    by 0x6BFE36: PortalRunSelect (pquery.c:942)
==00:00:00:35.012 21525==    by 0x6C11EF: PortalRun (pquery.c:786)
==00:00:00:35.012 21525==    by 0x6BD7E3: exec_simple_query (postgres.c:1072)
==00:00:00:35.012 21525==  Address 0x4ff2451 is 1 bytes after a block of size 8,192 alloc'd
==00:00:00:35.012 21525==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==00:00:00:35.012 21525==    by 0x4ED399: XLogReaderAllocate (xlogreader.c:83)
==00:00:00:35.012 21525==    by 0x6710B3: StartupDecodingContext (logical.c:161)
==00:00:00:35.012 21525==    by 0x671303: CreateDecodingContext (logical.c:413)
==00:00:00:35.012 21525==    by 0x671AF7: pg_logical_slot_get_changes_guts (logicalfuncs.c:394)
==00:00:00:35.012 21525==    by 0x5C0B6B: ExecMakeTableFunctionResult (execQual.c:2196)
==00:00:00:35.012 21525==    by 0x5D4131: FunctionNext (nodeFunctionscan.c:95)
==00:00:00:35.012 21525==    by 0x5C170D: ExecScan (execScan.c:82)
==00:00:00:35.012 21525==    by 0x5BA007: ExecProcNode (execProcnode.c:426)
==00:00:00:35.012 21525==    by 0x5B8A61: standard_ExecutorRun (execMain.c:1490)
==00:00:00:35.012 21525==    by 0x6BFE36: PortalRunSelect (pquery.c:942)
==00:00:00:35.012 21525==    by 0x6C11EF: PortalRun (pquery.c:786)
==00:00:00:35.012 21525== 
...
...
LOG:  server process (PID 21525) exited with exit code 128

This is rather interesting because I do not recall that any of skink's
failures have shown an access more than 1 byte past the end of the buffer.
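
To compare how far past the end of the buffer each read lands, the
"Address ... is N bytes after a block" lines can be pulled out of the
valgrind log with a short script. What follows is only a minimal sketch:
the function name overrun_offsets is hypothetical, the sample text is
copied from the three reports above, and none of it comes from the
PostgreSQL tree.

```python
import re

# Matches valgrind memcheck lines such as:
#   "Address 0xe5311d6 is 6 bytes after a block of size 8,192 alloc'd"
OVERRUN_RE = re.compile(
    r"Address (0x[0-9a-fA-F]+) is (\d+) bytes after a block of size ([\d,]+) alloc'd"
)


def overrun_offsets(log_text):
    """Return (address, bytes_past_end, block_size) for each overrun report."""
    results = []
    for match in OVERRUN_RE.finditer(log_text):
        address, offset, size = match.groups()
        # valgrind prints sizes with thousands separators, e.g. "8,192"
        results.append((address, int(offset), int(size.replace(",", ""))))
    return results


# The three "Address" lines from the reports quoted in this message.
sample = """\
==00:00:00:34.291 21525==  Address 0xe5311d6 is 6 bytes after a block of size 8,192 alloc'd
==00:00:00:35.012 21525==  Address 0x4ff2450 is 0 bytes after a block of size 8,192 alloc'd
==00:00:00:35.012 21525==  Address 0x4ff2451 is 1 bytes after a block of size 8,192 alloc'd
"""

for address, offset, size in overrun_offsets(sample):
    print(f"{address}: {offset} byte(s) past a {size}-byte block")
```

Run over a complete log (for instance one of skink's), the largest
offset shows whether any other run read more than 1 byte past the
block.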

Any suggestions how to debug this?
        regards, tom lane


