[Help needed] Update query hangs forever on high load

Поиск
Список
Период
Сортировка
От Rural Hunter
Тема [Help needed] Update query hangs forever on high load
Дата
Msg-id 53DB89FE.6040601@gmail.com
обсуждение исходный текст
Список pgsql-admin
Hi,

I'm running postgresql 9.2.4 on Ubuntu 12.04 server. The server has 32
cores and 372G memory. pg is configured with shared_buffers = 16GB. This
server has been running well for quite a long time. Recently, because of
the data volume increasing on my main table, I partitioned it to about
60 children tables. The partition key is simple of int values of one
column. Most of the queries are with partition key. But there are still
some frequent concurrent update sqls run without partition key. The
update sql uses conditions on unique index on children tables.

Though I expected the performance of those updates will would decrease a
bit after the partition as I tested, I encounterred an unexpected and
extremely severe problem. The main update sql normally is quite quick
and could complete in 1 seconds since it use the unique index on
children table. But sometimes it seem hanging forever if I don't take
any action. Since there are many same concurrent update sqls on
different or same records, the update sql will finally accumulate and
blocks of all other queries. When the hanging happens, the server load
increases from less than 20 normally to several hundreds. All the cpus
are utilized by those hanging queries.

pg_cancel_backend and pg_terminate_backend don't work on the hanging
update. I have to take some very abnormal action to recover the service,
such as kill -9 the hanging process(yes, that kills the whole pg
instance). I ran gdb to trace those hanging update sql and they all show
like this:
(gdb) bt
#0  0x00007f8cea310db7 in semop () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00000000005f97d3 in PGSemaphoreLock ()
#2  0x0000000000638153 in LWLockAcquire ()
#3  0x00000000004a90d0 in ginTraverseLock ()
#4  0x00000000004a9d0b in ginFindLeafPage ()
#5  0x00000000004a8377 in ginInsertItemPointers ()
#6  0x00000000004a4548 in ginEntryInsert ()
#7  0x00000000004ae687 in ginInsertCleanup ()
#8  0x00000000004af3d6 in ginHeapTupleFastInsert ()
#9  0x00000000004a4ab1 in gininsert ()
#10 0x0000000000709b15 in FunctionCall6Coll ()
#11 0x000000000047b6b7 in index_insert ()
#12 0x000000000057f475 in ExecInsertIndexTuples ()
#13 0x000000000058bf07 in ExecModifyTable ()
#14 0x00000000005766e3 in ExecProcNode ()
#15 0x0000000000575ad4 in standard_ExecutorRun ()
#16 0x000000000064718f in ProcessQuery ()
#17 0x00000000006473b7 in PortalRunMulti ()
#18 0x0000000000647e8a in PortalRun ()
#19 0x0000000000645160 in PostgresMain ()
#20 0x000000000060459e in ServerLoop ()
#21 0x00000000006053bc in PostmasterMain ()
#22 0x00000000005a686b in main ()

Since this never happened before I partitioned my main table, I suspect
it is caused by the high load caused by those performance down-graded
updates. The server load was around 10 before the partition and now it
is around 20. I already tried to update the pgsql to the latest version
9.2.9 but it didn't help. I have posted this problem with progress to
the performance mailing list but didn't recieve much useful responses:
http://postgresql.1045698.n5.nabble.com/Very-slow-planning-performance-on-partition-table-td5812511.html

Please help me to troubleshoot this severe problem. I will feedback any
required information. Thanks in advance.





В списке pgsql-admin по дате отправления:

Предыдущее
От: jesper@krogh.cc
Дата:
Сообщение: Re: Autovacuum not keeping up. (PG 9.2.9)
Следующее
От: Techie
Дата:
Сообщение: PITR WAL Restore and configuration