[Help needed] Update query hangs forever on high load
От | Rural Hunter |
---|---|
Тема | [Help needed] Update query hangs forever on high load |
Дата | |
Msg-id | 53DB89FE.6040601@gmail.com обсуждение исходный текст |
Список | pgsql-admin |
Hi, I'm running postgresql 9.2.4 on Ubuntu 12.04 server. The server has 32 cores and 372G memory. pg is configured with shared_buffers = 16GB. This server has been running well for quite a long time. Recently, because of the data volume increasing on my main table, I partitioned it to about 60 children tables. The partition key is simple of int values of one column. Most of the queries are with partition key. But there are still some frequent concurrent update sqls run without partition key. The update sql uses conditions on unique index on children tables. Though I expected the performance of those updates will would decrease a bit after the partition as I tested, I encounterred an unexpected and extremely severe problem. The main update sql normally is quite quick and could complete in 1 seconds since it use the unique index on children table. But sometimes it seem hanging forever if I don't take any action. Since there are many same concurrent update sqls on different or same records, the update sql will finally accumulate and blocks of all other queries. When the hanging happens, the server load increases from less than 20 normally to several hundreds. All the cpus are utilized by those hanging queries. pg_cancel_backend and pg_terminate_backend don't work on the hanging update. I have to take some very abnormal action to recover the service, such as kill -9 the hanging process(yes, that kills the whole pg instance). I ran gdb to trace those hanging update sql and they all show like this: (gdb) bt #0 0x00007f8cea310db7 in semop () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00000000005f97d3 in PGSemaphoreLock () #2 0x0000000000638153 in LWLockAcquire () #3 0x00000000004a90d0 in ginTraverseLock () #4 0x00000000004a9d0b in ginFindLeafPage () #5 0x00000000004a8377 in ginInsertItemPointers () #6 0x00000000004a4548 in ginEntryInsert () #7 0x00000000004ae687 in ginInsertCleanup () #8 0x00000000004af3d6 in ginHeapTupleFastInsert () #9 0x00000000004a4ab1 in gininsert () #10 0x0000000000709b15 in FunctionCall6Coll () #11 0x000000000047b6b7 in index_insert () #12 0x000000000057f475 in ExecInsertIndexTuples () #13 0x000000000058bf07 in ExecModifyTable () #14 0x00000000005766e3 in ExecProcNode () #15 0x0000000000575ad4 in standard_ExecutorRun () #16 0x000000000064718f in ProcessQuery () #17 0x00000000006473b7 in PortalRunMulti () #18 0x0000000000647e8a in PortalRun () #19 0x0000000000645160 in PostgresMain () #20 0x000000000060459e in ServerLoop () #21 0x00000000006053bc in PostmasterMain () #22 0x00000000005a686b in main () Since this never happened before I partitioned my main table, I suspect it is caused by the high load caused by those performance down-graded updates. The server load was around 10 before the partition and now it is around 20. I already tried to update the pgsql to the latest version 9.2.9 but it didn't help. I have posted this problem with progress to the performance mailing list but didn't recieve much useful responses: http://postgresql.1045698.n5.nabble.com/Very-slow-planning-performance-on-partition-table-td5812511.html Please help me to troubleshoot this severe problem. I will feedback any required information. Thanks in advance.
В списке pgsql-admin по дате отправления: