Re: Waiting on ExclusiveLock on extension

Поиск
Список
Период
Сортировка
От Andomar
Тема Re: Waiting on ExclusiveLock on extension
Дата
Msg-id 55317984.30607@aule.net
обсуждение исходный текст
Ответ на Re: Waiting on ExclusiveLock on extension  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Ответы Re: Waiting on ExclusiveLock on extension  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Список pgsql-general
> Yes, but did you have the same workload when you upgraded to 9.3 as
> you do today?

The workload is very similar. We upgraded from 9.1 to 9.3 only two
months ago, and our usage statistics have not changed much. There were
no "remaining connection slots are reserved for non-replication
superuser connections" messages in the weeks after the 9.3 upgrade.

> BTW, something else to be aware of: because you essentially re-loaded
> all your data in a single transaction, that means that a: you'll be
> doing a lot of hint bit updates until all data has been read a second
> time, and b: a lot of this data will come up for freezing at around
> the same time, creating a big chunk of work for autovacuum. That's
> caused problems for me in the past, though that was on a database that
> had a pretty high workload.

We run "VACUUM ALL" every night and although that gave a bit more CPU
and I/O but nothing major.

When the problem occurs, the symptoms are:
- spike to a very high load average (= CPU usage)
- messages about long waits on ExclusiveLock on extension appear
- remaining connection slots are reserved (with corresponding end user
impact)

An example of a message with 1 second wait duration and a lot of waiting
queries:

process 667 still waiting for ExclusiveLock on extension of relation
1249 of database 16406 after 1005.226 ms","Process holding the lock:
36279. Wait queue: 36725, 36405, 36511, 36721, 36280, 36048, 36566,
36636, 36466, 36734, 36723, 36621, 36423, 36931, 36720, 36429, 35500,
36735, 37015, 36717, 36938, 36870, 36732, 36587, 36869, 36285, 36573,
37101, 36937, 36414, 36834, 37105, 36867, 36724, 36991, 37102, 36882,
36802, 37163, 39964, 39723, 40044, 39821, 40150, 40218, 40203, 40054,
40060, 40173, 40091, 40174, 40058, 40658, 40370, 40177, 40920, 41085,
41103, 41117, 41154, 41161, 41066, 41053, 41380, 40661, 40632, 40698,
41242, 40681, 41174, 41328, 41075, 41245, 41326, 41523, 41153, 41170,
40543, 41314, 41526, 41490, 41157, 41353, 41472, 41730, 41546, 45087,
41535, 41474, 41362, 41450, 41948, 41929, 41459, 41508, 42117, 42127,
41950, 41922, 42414, 41939, 42565, 42643, 42242, 41796, 42324, 42358,
42411, 42487, 41758, 42120, 42570, 41820, 41925, 43356, 43381, 43360,
43351, 43364, 42336, 42871, 43007, 42455, 43363, 42287, 43336, 42652,
42803, 43567, 43706, 43795, 43630, 43716, 43810, 43596, 43061, 43954,
44014, 43377, 43366, 43825, 43454, 43840, 43582, 43839, 44009, 43842,
43693, 44320, 43824, 43456, 43852, 43863, 44708, 43848, 44255, 44587,
44936, 44915, 44759, 44700, 44948, 45051, 44808, 45189, 45137, 45037,
45303, 45294, 45710, 45711, 45755, 45660, 45120, 45576, 46221, 46125,
46703, 46512, 46399, 46684, 46762, 46373, 46929, 46443, 46817, 46858,
47017, 46886, 46805, 46890, 47593, 47548, 47272, 47454, 47906, 47462,
47801, 47939, 47994, 48098, 48363, 47815, 48393, 48250, 48213, 48470,
48408, 48509, 48499, 48115, 48448, 48877, 461, 560, 637, 48902, 641,
49019, 667, 877, 1306, 1070, 1265.",,,,"SQL statement ""CREATE TEMPORARY
TABLE

An example of a message with 140 second wait duration:

process 27173 acquired ExclusiveLock on extension of relation 16787 of
database 16406 after 138522.699 ms",,,,,"SQL statement ""INSERT INTO

There are about 50 queries a second, so a 10 second lock will exhaust
the number of available connections.

We keep sar logs, and they show an unusual amount of pgscand/s and
pgsteal/s around the time of the problem:

07:25:01 PM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s
pgscand/s pgsteal/s    %vmeff
08:25:01 PM     29.93   1069.13 393727.33      0.00 170655.03 0.00
90.20     90.20    100.00
08:26:01 PM     20.14   1325.55 388751.51      0.00 157188.40 0.00
315.81    315.81    100.00
08:27:02 PM     59.33   1215.97 612453.98      0.00 271483.71 0.00
180.40    180.40    100.00
08:28:01 PM     43.28   1881.71 473067.50      0.00 236778.10 0.00
114.87    114.87    100.00
08:29:01 PM     23.49    665.73 406865.61      0.00 213178.17 0.00
293.11    293.11    100.00
08:30:01 PM     31.39   1317.51 448565.03      0.02 193320.76 0.00
225.88    225.83     99.98
-- Problem starts here
08:31:02 PM     21.16   1499.32 468467.44      0.10 211767.21 0.00
4311.87   4311.84    100.00
08:32:01 PM     10.19    648.87 261067.16      0.05 167231.84 0.00
1071.01   1070.98    100.00
08:33:01 PM     63.59    950.94 422101.19      0.22 242284.12 0.00
1243.37   1243.34    100.00
-- Problem ends here
08:34:01 PM     24.97   1321.61 412294.87      0.00 273734.03 0.00
0.00      0.00      0.00
08:35:01 PM     11.60   1094.22 353741.41      0.00 238541.73 0.00
0.00      0.00      0.00
08:36:01 PM     39.22    976.60 368450.80      0.10 240632.57 0.00
0.00      0.00      0.00
08:37:01 PM     19.83    967.31 320415.39      0.00 217557.42 0.00
0.00      0.00      0.00
08:38:01 PM     15.68   1884.09 301785.58      0.00 200274.51 0.00
0.00      0.00      0.00
08:39:01 PM     62.61    858.31 487099.01      0.00 330130.65 0.00
0.00      0.00      0.0

pgscand/s = Number of pages scanned directly per second.
pgsteal/s = Number of pages the system has reclaimed from cache
(pagecache and swapcache) per second to satisfy its memory demands.

Could the pgscand and pgsteal numbers provide a hint?  They're sometimes
zero for more than half an hour, so they don't seem related to checkpoints.

Kind regards,
Andomar



В списке pgsql-general по дате отправления:

Предыдущее
От: William Dunn
Дата:
Сообщение: Re: PL\pgSQL 'ERROR: invalid input syntax for type oid:' [PostgreSQL 9.3.6 and 9.4]
Следующее
От: Andreas Joseph Krogh
Дата:
Сообщение: Re: How to keep pg_largeobject from growing endlessly