weird hang while running in HS mode

Поиск
Список
Период
Сортировка
От Robert Haas
Тема weird hang while running in HS mode
Дата
Msg-id AANLkTikn06AkL7v_gh0zssmqoO_unmCCyfg5cDSzddK-@mail.gmail.com
обсуждение исходный текст
Ответы Re: weird hang while running in HS mode  (Fujii Masao <masao.fujii@gmail.com>)
Список pgsql-hackers
While fooling around with Hot Standby today, I did this on the master:

rhaas=# begin work;
BEGIN
rhaas=# lock table pgbench_accounts;
LOCK TABLE

Then on slave I did this:

rhaas=# select * from pgbench_accounts;
ERROR:  canceling statement due to conflict with recovery
DETAIL:  User query might have needed to see row versions that must be removed.

The error message appeared immediately, although max_standby_delay is
set to 30s and both servers are running on the same machine.  IMHO,
that's a good example of why max_standby_delay in its current form is
not a good thing, but that's a topic for another thread.  I then did
this on the master:

rhaas=# rollback;
ROLLBACK

So at this point, one would think that there are no locks hanging
around anywhere.  Back to the standby:

rhaas=# select * from pgbench_accounts;
<really long hang>

I did this from another session on the standby:

rhaas=# select * from pg_stat_activity;datid | datname | procpid | usesysid | usename | application_name |
client_addr | client_port |         backend_start         |
xact_start           |          query_start          | waiting |  current_query

-------+---------+---------+----------+---------+------------------+-------------+-------------+-------------------------------+-------------------------------+-------------------------------+---------+---------------------------------16384
|rhaas   |    6706 |       10 | rhaas   | psql             |         |          -1 | 2010-05-12 14:07:21.73227-04  |
2010-05-12
14:08:02.657664-04 | 2010-05-12 14:08:02.657664-04 | t       | select
* from pgbench_accounts;16384 | rhaas   |    6719 |       10 | rhaas   | psql             |         |          -1 |
2010-05-1214:10:15.916115-04 | 2010-05-12
 
14:10:24.18013-04  | 2010-05-12 14:10:24.18013-04  | f       | select
* from pg_stat_activity;
(2 rows)

OK, so it's blocked on a lock.  Let's find out more.

[rhaas ~]$ gdb -p 6709
GNU gdb 6.3.50-20050815 (Apple version gdb-1461.2) (Fri Mar  5
04:43:10 UTC 2010)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin".
/Users/rhaas/6709: No such file or directory
Unable to access task for process-id 6709: (os/kern) failure.
(gdb) q
[rhaas ~]$ gdb -p 6706
GNU gdb 6.3.50-20050815 (Apple version gdb-1461.2) (Fri Mar  5
04:43:10 UTC 2010)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin".
/Users/rhaas/6706: No such file or directory
Attaching to process 6706.
Reading symbols for shared libraries . done
Reading symbols for shared libraries ................. done
0x00007fff838d014e in semop ()
(gdb) bt
#0  0x00007fff838d014e in semop ()
#1  0x00000001001ba1af in PGSemaphoreLock (sema=0x100d4db08,
interruptOK=1 '\001') at pg_sema.c:420
#2  0x00000001001f8782 in ProcSleep (locallock=0x1010d9568,
lockMethodTable=<value temporarily unavailable, due to optimizations>)
at proc.c:971
#3  0x00000001001f4c31 in WaitOnLock (locallock=0x1010d9568,
owner=0x101060628) at lock.c:1217
#4  0x00000001001f6254 in LockAcquireExtended (locktag=0x7fff5fbfdf00,
lockmode=1, sessionLock=<value temporarily unavailable, due to
optimizations>, dontWait=0 '\0', reportMemoryError=1 '\001') at
lock.c:836
#5  0x00000001001f36bf in LockRelationOid (relid=16420, lockmode=1) at lmgr.c:79
#6  0x00000001000242ba in relation_open (relationId=16420, lockmode=1)
at heapam.c:899
#7  0x00000001000244ac in try_heap_openrv (relation=<value temporarily
unavailable, due to optimizations>, lockmode=<value temporarily
unavailable, due to optimizations>) at heapam.c:1127
#8  0x00000001000c7f14 in parserOpenTable (pstate=0x101041c68,
relation=0x101041a38, lockmode=1) at parse_relation.c:829
#9  0x00000001000c8141 in addRangeTableEntry (pstate=0x101041c68,
relation=0x101041a38, alias=0x0, inh=1 '\001', inFromCl=1 '\001') at
parse_relation.c:895
#10 0x00000001000b8927 in transformTableEntry [inlined] () at
/Users/rhaas/pgsql/src/backend/parser/parse_clause.c:444
#11 0x00000001000b8927 in transformFromClauseItem (pstate=0x101041c68,
n=0x101041a38, top_rte=0x7fff5fbfe1b0, top_rti=0x7fff5fbfe1bc,
relnamespace=0x7fff5fbfe1a8, containedRels=0x7fff5fbfe1a0) at
parse_clause.c:676
#12 0x00000001000b939a in transformFromClause (pstate=0x101041c68,
frmList=<value temporarily unavailable, due to optimizations>) at
parse_clause.c:128
#13 0x000000010009eccb in transformSelectStmt [inlined] () at
/Users/rhaas/pgsql/src/backend/parser/analyze.c:800
#14 0x000000010009eccb in transformStmt (pstate=<value temporarily
unavailable, due to optimizations>, parseTree=0x101041af0) at
analyze.c:185
#15 0x00000001000a01a3 in parse_analyze (parseTree=0x101041af0,
sourceText=0x101040e38 "select * from pgbench_accounts;",
paramTypes=0x0, numParams=0) at analyze.c:93
#16 0x0000000100202cb4 in pg_analyze_and_rewrite
(parsetree=0x101041af0, query_string=0x101040e38 "select * from
pgbench_accounts;", paramTypes=0x0, numParams=0) at postgres.c:619
#17 0x0000000100203535 in exec_simple_query (query_string=0x101040e38
"select * from pgbench_accounts;") at postgres.c:977
#18 0x0000000100204241 in PostgresMain (argc=2, argv=<value
temporarily unavailable, due to optimizations>, username=<value
temporarily unavailable, due to optimizations>) at postgres.c:3854
#19 0x00000001001c653f in ServerLoop () at postmaster.c:3528
#20 0x00000001001c7432 in PostmasterMain (argc=3, argv=0x100808ac0) at
postmaster.c:1092
#21 0x0000000100162055 in main (argc=3, argv=0x100808ac0) at main.c:188
(gdb) bt full
#0  0x00007fff838d014e in semop ()
No symbol table info available.
#1  0x00000001001ba1af in PGSemaphoreLock (sema=0x100d4db08,
interruptOK=1 '\001') at pg_sema.c:420errStatus = <value temporarily unavailable, due to optimizations>sops = { sem_num
=3, sem_op = -1, sem_flg = 0
 
}
#2  0x00000001001f8782 in ProcSleep (locallock=0x1010d9568,
lockMethodTable=<value temporarily unavailable, due to optimizations>)
at proc.c:971lockmode = 1lock = (LOCK *) 0x100d2aad0proclock = (PROCLOCK *) 0x100d4a560hashcode = <value temporarily
unavailable,due to optimizations>partitionLock = 49myHeldLocks = <value temporarily unavailable, due to
optimizations>early_deadlock= <value temporarily unavailable, due to optimizations>allow_autovacuum_cancel = 1
'\001'myWaitStatus= 13806288proc = (PGPROC *) 0x100d2aaf8i = <value temporarily unavailable, due to optimizations>
 
#3  0x00000001001f4c31 in WaitOnLock (locallock=0x1010d9568,
owner=0x101060628) at lock.c:1217save_exception_stack = (sigjmp_buf *) 0x7fff5fbfe4a0save_context_stack =
(ErrorContextCallback*) 0x7fff5fbfdf90local_sigjmp_buf = {1606417070, 32767, 1606409760, 32767, 1606409536,
 
32767, 17172008, 1, 17667432, 1, 1, 0, 1606409984, 32767, 2051077, 1,
530, 0, 8064, 895, 16, 0, 2, 0, 0, 0, 17170328, 1, 0, 0, 128, 0,
1606409712, 32767, 13935968, 1, 1606409728, 32767}lockMethodTable = (LockMethod) 0x1004a9fc0new_status = 0x101041fb0
"SELECT"
#4  0x00000001001f6254 in LockAcquireExtended (locktag=0x7fff5fbfdf00,
lockmode=1, sessionLock=<value temporarily unavailable, due to
optimizations>, dontWait=0 '\0', reportMemoryError=1 '\001') at
lock.c:836lockmethodid = <value temporarily unavailable, due to optimizations>lockMethodTable = (LockMethod)
0x1004a9fc0localtag= { lock = {   locktag_field1 = 16384,   locktag_field2 = 16420,   locktag_field3 = 0,
locktag_field4= 0,   locktag_type = 0 '\0',   locktag_lockmethodid = 1 '\001' }, mode = 1 
}locallock = (LOCALLOCK *) 0x1010d9568lock = <value temporarily unavailable, due to optimizations>proclock = <value
temporarilyunavailable, due to optimizations>proclocktag = { myLock = 0x100d2aad0, myProc = 0x100d4daf8
 
}found = 0 '\0'owner = (ResourceOwner) 0x101060628hashcode = 534311893proclock_hashcode = 311778389partitionLock =
49status= <value temporarily unavailable, due to optimizations>
 
#5  0x00000001001f36bf in LockRelationOid (relid=16420, lockmode=1) at lmgr.c:79tag = { locktag_field1 = 16384,
locktag_field2= 16420, locktag_field3 = 0, locktag_field4 = 0, locktag_type = 0 '\0', locktag_lockmethodid = 1 '\001'
 
}res = <value temporarily unavailable, due to optimizations>
#6  0x00000001000242ba in relation_open (relationId=16420, lockmode=1)
at heapam.c:899r = <value temporarily unavailable, due to optimizations>
#7  0x00000001000244ac in try_heap_openrv (relation=<value temporarily
unavailable, due to optimizations>, lockmode=<value temporarily
unavailable, due to optimizations>) at heapam.c:1127r = <value temporarily unavailable, due to optimizations>
#8  0x00000001000c7f14 in parserOpenTable (pstate=0x101041c68,
relation=0x101041a38, lockmode=1) at parse_relation.c:829rel = <value temporarily unavailable, due to
optimizations>pcbstate= { pstate = 0x101041c68, location = 14, errcontext = {   previous = 0x0,   callback =
0x1000c4810<pcb_error_callback>,   arg = 0x7fff5fbfdf80 }
 
}
#9  0x00000001000c8141 in addRangeTableEntry (pstate=0x101041c68,
relation=0x101041a38, alias=0x0, inh=1 '\001', inFromCl=1 '\001') at
parse_relation.c:895rte = <value temporarily unavailable, due to optimizations>refname = 0x101041a00
"pgbench_accounts"lockmode= <value temporarily unavailable, due to optimizations>rel = <value temporarily unavailable,
dueto optimizations>
 
#10 0x00000001000b8927 in transformTableEntry [inlined] () at
/Users/rhaas/pgsql/src/backend/parser/parse_clause.c:444r = (RangeVar *) Cannot access memory at address 0x0

When I detached gdb, the query immediately returned results, even
though it had been hung for a long time before that.

I have no idea what is going on here and have not checked to see
whether this is reproducible, but thought it was worth reporting
anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Magnus Hagander
Дата:
Сообщение: Re: Tags missing from GIT mirror?
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: primary/secondary/master/slave/standby