Re: Segmentation Fault in logical decoding get/peek API

From Jeremy Finzel
Subject Re: Segmentation Fault in logical decoding get/peek API
Date
Msg-id CAMa1XUjoL_DhD4AWz+MRZYYqjXt=vNcUhuy=JPK2LKS3boVqGQ@mail.gmail.com
In response to Re: Segmentation Fault in logical decoding get/peek API  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Segmentation Fault in logical decoding get/peek API  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
> Well, as Peter said, "git bisect" and trying to reproduce the problem
> at each step would be the way to prove it definitively.  Seems mighty
> tedious though.  Possibly you could shave some time off the process
> by assuming it must have been one of the commits that touched
> reorderbuffer.c ... a quick check says there have been ten of those
> in the v10 branch since 10.3.
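
For reference, a minimal sketch of that narrowed bisect. It assumes the release tags are named REL_10_3 and REL_10_5, that 10.3 is the last known-good build, and that the file lives at src/backend/replication/logical/reorderbuffer.c; the helper names are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helpers; tag names and the file path are assumptions.

# Print the commits in the range good..bad that touched the given path.
candidates() {
    local repo=$1 good=$2 bad=$3 path=$4
    git -C "$repo" log --oneline "$good..$bad" -- "$path"
}

# Start a bisect limited to commits that touched the given path.
# After each step: rebuild, rerun the failing peek_changes call, then run
# "git bisect good" or "git bisect bad"; finish with "git bisect reset".
start_bisect() {
    local repo=$1 good=$2 bad=$3 path=$4
    git -C "$repo" bisect start "$bad" "$good" -- "$path"
}

# Example (not run here):
#   f=src/backend/replication/logical/reorderbuffer.c
#   candidates   ~/postgres_source/postgres REL_10_3 REL_10_5 "$f"
#   start_bisect ~/postgres_source/postgres REL_10_3 REL_10_5 "$f"
```

Passing the path to git bisect keeps the number of rebuilds close to log2 of the commits that touched that file, rather than of every commit in the range.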

Update:
  • I definitely got the same segfault on a commit after 10.4 (0bb28ca).
  • I am now getting a different segfault on 10.5, but I need another set of eyes to verify that I am not compiling it wrong.
After decoding successfully for a while, I now get an immediate segfault upon peek_changes.  First of all, here is the backtrace:

$ sudo -u postgres gdb -q -c /san/<cluster>/pgdata/core /usr/lib/postgresql/10.5/bin/postgres
Reading symbols from /usr/lib/postgresql/10.5/bin/postgres...done.
[New LWP 22699]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: <cluster>: jfinzel foo_db 10.7.111.37(52316) FETCH'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007eff42d54428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007eff42d54428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007eff42d5602a in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x0000000000a45f9c in ExceptionalCondition (conditionName=0xc2d688 "!(prev_first_lsn < cur_txn->first_lsn)", errorType=0xc2d404 "FailedAssertion", fileName=0xc2d478 "reorderbuffer.c", lineNumber=688) at assert.c:54
#3  0x000000000084b0ac in AssertTXNLsnOrder (rb=0x28ed790) at reorderbuffer.c:688
#4  0x000000000084ab97 in ReorderBufferTXNByXid (rb=0x28ed790, xid=319299822, create=1 '\001', is_new=0x0, lsn=9888781386112, create_as_top=1 '\001') at reorderbuffer.c:567
#5  0x000000000084d86c in ReorderBufferAddNewTupleCids (rb=0x28ed790, xid=319299822, lsn=9888781386112, node=..., tid=..., cmin=2, cmax=4294967295, combocid=4294967295) at reorderbuffer.c:2053
#6  0x00000000008522b6 in SnapBuildProcessNewCid (builder=0x28f57c0, xid=319299827, lsn=9888781386112, xlrec=0x2821c08) at snapbuild.c:780
#7  0x000000000083f280 in DecodeHeap2Op (ctx=0x28dd720, buf=0x7ffc5b73e2d0) at decode.c:371
#8  0x000000000083ebb1 in LogicalDecodingProcessRecord (ctx=0x28dd720, record=0x28dd9e0) at decode.c:121
#9  0x0000000000844f86 in pg_logical_slot_get_changes_guts (fcinfo=0x7ffc5b73e600, confirm=0 '\000', binary=0 '\000') at logicalfuncs.c:308
#10 0x000000000084514d in pg_logical_slot_peek_changes (fcinfo=0x7ffc5b73e600) at logicalfuncs.c:381
#11 0x00000000006f7973 in ExecMakeTableFunctionResult (setexpr=0x28265b8, econtext=0x28262b0, argContext=0x28b4af0, expectedDesc=0x28d1d20, randomAccess=4 '\004') at execSRF.c:231
#12 0x000000000070a870 in FunctionNext (node=0x2826198) at nodeFunctionscan.c:94
#13 0x00000000006f6f6e in ExecScanFetch (node=0x2826198, accessMtd=0x70a7b9 <FunctionNext>, recheckMtd=0x70aba1 <FunctionRecheck>) at execScan.c:97
#14 0x00000000006f6fdd in ExecScan (node=0x2826198, accessMtd=0x70a7b9 <FunctionNext>, recheckMtd=0x70aba1 <FunctionRecheck>) at execScan.c:147
#15 0x000000000070abef in ExecFunctionScan (pstate=0x2826198) at nodeFunctionscan.c:270
#16 0x00000000006f541a in ExecProcNodeFirst (node=0x2826198) at execProcnode.c:430
#17 0x00000000006ed5af in ExecProcNode (node=0x2826198) at ../../../src/include/executor/executor.h:250
#18 0x00000000006effaf in ExecutePlan (estate=0x2825f80, planstate=0x2826198, use_parallel_mode=0 '\000', operation=CMD_SELECT, sendTuples=1 '\001', numberTuples=2000, direction=ForwardScanDirection, dest=0x27ffc78, execute_once=0 '\000') at execMain.c:1722
#19 0x00000000006edbc4 in standard_ExecutorRun (queryDesc=0x2825130, direction=ForwardScanDirection, count=2000, execute_once=0 '\000') at execMain.c:363
#20 0x00000000006ed9de in ExecutorRun (queryDesc=0x2825130, direction=ForwardScanDirection, count=2000, execute_once=0 '\000') at execMain.c:306
#21 0x00000000008d0dd7 in PortalRunSelect (portal=0x27f70a8, forward=1 '\001', count=2000, dest=0x27ffc78) at pquery.c:932
#22 0x00000000008d200f in DoPortalRunFetch (portal=0x27f70a8, fdirection=FETCH_FORWARD, count=2000, dest=0x27ffc78) at pquery.c:1675
#23 0x00000000008d19df in PortalRunFetch (portal=0x27f70a8, fdirection=FETCH_FORWARD, count=2000, dest=0x27ffc78) at pquery.c:1434
#24 0x00000000006833bb in PerformPortalFetch (stmt=0x2888570, dest=0x27ffc78, completionTag=0x7ffc5b73f0f0 "") at portalcmds.c:199
#25 0x00000000008d2ab6 in standard_ProcessUtility (pstmt=0x28888d0, queryString=0x2887b30 "FETCH FORWARD 2000 FROM crash_dude;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x27ffc78, completionTag=0x7ffc5b73f0f0 "") at utility.c:527
#26 0x00007eff42829eb6 in pglogical_ProcessUtility (pstmt=0x28888d0, queryString=0x2887b30 "FETCH FORWARD 2000 FROM crash_dude;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x27ffc78, completionTag=0x7ffc5b73f0f0 "") at pglogical_executor.c:279
#27 0x00000000008d2547 in ProcessUtility (pstmt=0x28888d0, queryString=0x2887b30 "FETCH FORWARD 2000 FROM crash_dude;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x27ffc78, completionTag=0x7ffc5b73f0f0 "") at utility.c:353
#28 0x00000000008d141b in PortalRunUtility (portal=0x27f6f90, pstmt=0x28888d0, isTopLevel=1 '\001', setHoldSnapshot=1 '\001', dest=0x27ffc78, completionTag=0x7ffc5b73f0f0 "") at pquery.c:1178
#29 0x00000000008d1119 in FillPortalStore (portal=0x27f6f90, isTopLevel=1 '\001') at pquery.c:1038
#30 0x00000000008d09a1 in PortalRun (portal=0x27f6f90, count=9223372036854775807, isTopLevel=1 '\001', run_once=1 '\001', dest=0x28889c8, altdest=0x28889c8, completionTag=0x7ffc5b73f350 "") at pquery.c:768
#31 0x00000000008c9f67 in exec_simple_query (query_string=0x2887b30 "FETCH FORWARD 2000 FROM crash_dude;") at postgres.c:1099
#32 0x00000000008cea3c in PostgresMain (argc=1, argv=0x2804e50, dbname=0x2804e28 "<foo_db>", username=0x2804e08 "jfinzel") at postgres.c:4088
#33 0x000000000082369b in BackendRun (port=0x2801170) at postmaster.c:4405
#34 0x0000000000822d02 in BackendStartup (port=0x2801170) at postmaster.c:4077
#35 0x000000000081ee31 in ServerLoop () at postmaster.c:1755
#36 0x000000000081e2d9 in PostmasterMain (argc=3, argv=0x27d79a0) at postmaster.c:1363
#37 0x0000000000751669 in main (argc=3, argv=0x27d79a0) at main.c:228
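
gdb prints the lsn arguments in frames #4 through #6 as plain decimal integers. A small sketch for converting such a value to the usual hi/lo hexadecimal LSN notation, which may help when comparing it with a slot's restart_lsn or with pg_waldump output (the function name is made up for illustration):

```shell
#!/bin/bash
# Convert a decimal 64-bit LSN, as gdb prints it, to PostgreSQL's
# "high 32 bits / low 32 bits" hexadecimal notation.
lsn_to_text() {
    local lsn=$1
    printf '%X/%X\n' $(( lsn >> 32 )) $(( lsn & 0xFFFFFFFF ))
}

lsn_to_text 9888781386112   # prints 8FE/694D4180
```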

Here is the script I used to compile 10.5 (at commit 4191e37a9a1fb598267c445c717914012d9bc423), followed by how I ran it.  The cluster with this issue also uses the extensions compiled in the script:

$ cat make_postgres
#!/bin/bash

set -eu

dirname=$1

instdir=/usr/lib/postgresql/$dirname

# Install Postgres
export PATH=$instdir/bin:/home/jfinzel/bin:/home/jfinzel/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
sudo mkdir $instdir

# This is my directory with source code from git commit 4191e37a9a1fb598267c445c717914012d9bc423
cd ~/postgres_source/postgres
./configure --prefix=$instdir --enable-cassert --enable-debug CFLAGS="-ggdb -g3 -fno-omit-frame-pointer -fPIC"
make
sudo "PATH=$PATH" make install

# Contrib
cd contrib/btree_gist/
sudo "PATH=$PATH" make install
cd ../test_decoding/
sudo "PATH=$PATH" make install

# Install Pglogical
cd /usr/src/pglogical-2.2.1
sudo "PATH=$PATH" make clean
sudo "PATH=$PATH" make install

# Install Extensions
cd $HOME/pgl_ddl_deploy
make clean
sudo "PATH=$PATH" make install
cd $HOME/pglogical_ticker
make clean
sudo "PATH=$PATH" make install
cd $HOME/pg_fact_loader
make clean
sudo "PATH=$PATH" make install

$ ./make_postgres 10.5
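
One way to double-check the resulting install is to compare the flags recorded by pg_config against the ones passed to configure above. This is only a sketch: the helper name is made up, and it does a plain substring match on the output of pg_config --configure.

```shell
#!/bin/bash
# Return success only if every requested flag appears in the given
# configure string (for example, the output of "pg_config --configure").
has_configure_flags() {
    local configured=$1
    shift
    local flag
    for flag in "$@"; do
        case "$configured" in
            *"$flag"*) ;;                                  # flag present
            *) echo "missing: $flag" >&2; return 1 ;;      # flag absent
        esac
    done
}

# Example (not run here):
#   has_configure_flags "$(/usr/lib/postgresql/10.5/bin/pg_config --configure)" \
#       --enable-cassert --enable-debug
```

Running the same pg_config with --version additionally shows which release the installed binaries report.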

Thanks!
Jeremy
