Re: Problem with ExclusiveLock on inserts

Поиск
Список
Период
Сортировка
От Ilya Kosmodemiansky
Тема Re: Problem with ExclusiveLock on inserts
Дата
Msg-id CABvo6pQLdz+0JB0GT+Oxh2XW0yz7ga-P7AyiYa5NJXeezcF+8A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Problem with ExclusiveLock on inserts  (Бородин Владимир <root@simply.name>)
Ответы Re: Problem with ExclusiveLock on inserts
Список pgsql-performance
On Wed, Feb 12, 2014 at 8:57 PM, Бородин Владимир <root@simply.name> wrote:
>
> Yes, this is legacy, I will fix it. We had lots of inactive connections but right now we use pgbouncer for this. When
theworkload is normal we have some kind of 80-120 backends. Less than 10 of them are in active state. Having problem
withlocks we get lots of sessions (sometimes more than 1000 of them are in active state). According to vmstat the
numberof context switches is not so big (less than 20k), so I don't think it is the main reason. Yes, it can aggravate
theproblem, but imho not create it. 


I'am afraid that is the problem. More than 1000 backends, most of them
are simply waiting.

>
>
> I don't understand the correlation of shared buffers size and synchronous_commit. Could you please explain your
statement?


You need to fsync your huge shared buffers any time your database
performs checkpoint. By default it usually happens too often because
checkpoint_timeout is 5min by default. Without bbu, on software raid
that leads to io spike and you commit waits for wal.


>
> 12.02.2014, в 23:37, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
>
> another thing which is arguable - concurrency degree. How many of your max_connections = 4000 are actually running?
4000definitely looks like an overkill and they could be a serious source of concurrency, especially then you have had
barrierenabled and software raid. 
>
> Plus for 32Gb of shared buffers with synchronous_commit = on especially on heavy workload one should definitely have
bbu,otherwise performance will be poor. 
>
>
> On Wed, Feb 12, 2014 at 8:20 PM, Бородин Владимир <root@simply.name> wrote:
>>
>> Oh, I haven't thought about barriers, sorry. Although I use soft raid without batteries I have turned barriers off
onone cluster shard to try. 
>>
>> root@rpopdb01e ~ # mount | fgrep data
>> /dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime)
>> root@rpopdb01e ~ # mount -o remount,nobarrier /dev/md2
>> root@rpopdb01e ~ # mount | fgrep data
>> /dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime,nobarrier)
>> root@rpopdb01e ~ #
>>
>> 12.02.2014, в 21:56, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
>>
>> My question was actually about barrier option, by default it is enabled on RHEL6/ext4 and could cause serious
bottleneckon io before disks are actually involved. What says mount without arguments? 
>>
>> On Feb 12, 2014, at 18:43, Бородин Владимир <root@simply.name> wrote:
>>
>> root@rpopdb01e ~ # fgrep data /etc/fstab
>> UUID=f815fd3f-e4e4-43a6-a6a1-bce1203db3e0 /var/lib/pgsql/9.3/data ext4 noatime,nodiratime 0 1
>> root@rpopdb01e ~ #
>>
>> According to iostat the disks are not the bottleneck.
>>
>> 12.02.2014, в 21:30, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
>>
>> Hi Vladimir,
>>
>> Just in case: how is your ext4 mount?
>>
>> Best regards,
>> Ilya
>>
>> On Feb 12, 2014, at 17:59, Бородин Владимир <root@simply.name> wrote:
>>
>> Hi all.
>>
>> Today I have started getting errors like below in logs (seems that I have not changed anything for last week). When
ithappens the db gets lots of connections in state active, eats 100% cpu and clients get errors (due to timeout). 
>>
>> 2014-02-12 15:44:24.562 MSK,"rpop","rpopdb_p6",30061,"localhost:58350",52fb5e53.756d,1,"SELECT waiting",2014-02-12
15:43:15MSK,143/264877,1002850566,LOG,00000,"process 30061 still waiting for ExclusiveLock on extension of relation
26118of database 24590 after 1000.082 ms",,,,,"SQL statement ""insert into rpop.rpop_imap_uidls (folder_id, uidl)
values(i_folder_id, i_uidl)"" 
>>
>> I have read several topics [1, 2, 3, 4] with similar problems but haven't find a good solution. Below is some more
diagnostics.
>>
>> I am running PostgreSQL 9.3.2 installed from RPM packages on RHEL 6.4. Host is running with the following CPU (32
cores)and memory: 
>>
>> root@rpopdb01e ~ # fgrep -m1 'model name' /proc/cpuinfo
>> model name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
>> root@rpopdb01e ~ # free -m
>>              total       used       free     shared    buffers     cached
>> Mem:        129028     123558       5469          0        135     119504
>> -/+ buffers/cache:       3918     125110
>> Swap:        16378          0      16378
>> root@rpopdb01e ~ #
>>
>> PGDATA lives on RAID6 array of 8 ssd-disks with ext4, iostat and atop say the disks are really free. Right now
PGDATAtakes only 95G. 
>> The settings changed in postgresql.conf are here [5].
>>
>> When it happens the last query from here [6] shows that almost all queries are waiting for ExclusiveLock, but they
doa simple insert. 
>>
>>  (extend,26647,26825,,,,,,,) |        5459 | ExclusiveLock |     1 | (extend,26647,26825,,,,,,,) | 8053 |
ExclusiveLock| 5459,8053 
>>  (extend,26647,26828,,,,,,,) |        5567 | ExclusiveLock |     1 | (extend,26647,26828,,,,,,,) | 5490 |
ExclusiveLock| 5567,5490 
>>  (extend,24584,25626,,,,,,,) |        5611 | ExclusiveLock |     1 | (extend,24584,25626,,,,,,,) | 3963 |
ExclusiveLock| 5611,3963 
>>
>> I have several databases running on one host with one postmaster process and ExclusiveLock is being waited by many
oids.I suppose the only common thing for all of them is that they are bigger than others and they almost do not get
updatesand deletes (only inserts and reads). Some more info about one of such tables is here [7]. 
>>
>> I have tried to look at the source code (src/backend/access/heap/hio.c) to understand when the exclusive lock can be
taken,but I could only read comments :) I have also examined FSM for this tables and their indexes and found that for
mostof them there are free pages but there are, for example, such cases: 
>>
>> rpopdb_p0=# select count(*) from pg_freespace('rpop.rpop_uidl') where avail != 0;
>>  count
>> --------
>>  115953
>> (1 row)
>>
>> rpopdb_p0=# select count(*) from pg_freespace('rpop.pk_rpop_uidl') where avail != 0;
>>  count
>> -------
>>      0
>> (1 row)
>>
>> rpopdb_p0=# \dS+ rpop.rpop_uidl
>>                                Table "rpop.rpop_uidl"
>>  Column |          Type          | Modifiers | Storage  | Stats target | Description
>> --------+------------------------+-----------+----------+--------------+-------------
>>  popid  | bigint                 | not null  | plain    |              |
>>  uidl   | character varying(200) | not null  | extended |              |
>> Indexes:
>>     "pk_rpop_uidl" PRIMARY KEY, btree (popid, uidl)
>> Has OIDs: no
>>
>> rpopdb_p0=#
>>
>>
>> My questions are:
>> 1. Do we consume 100% cpu (in system) trying to get page from FSM? Or does it happen during exclusive lock
acquiring?How can I dig it? 
>> 2. How much space do we extend to the relation when we get exclusive lock on it?
>> 3. Why extended page is not visible for other backends?
>> 4. Is there any possibility of situation where backend A got exclusive lock on some relation to extend it. Then OS
CPUscheduler made a context switch to backend B while backend B is waiting for exclusive lock on the same relation. And
soon for many backends. 
>> 5. (and the main question) what can I do to get rid of such situations? It is a production cluster and I do not have
anyideas what to do with this situation :( Any help would be really appropriate. 
>>
>> [1] http://www.postgresql.org/message-id/8bca3aa10906011613l8ac2423h8153bbd2513dc159@mail.gmail.com
>> [2] http://pgsql.performance.narkive.com/IrkPbl3f/postgresql-9-2-3-performance-problem-caused-exclusive-locks
>> [3] http://www.postgresql.org/message-id/50A2C93E.9070607@dalibo.com
>> [4] http://www.postgresql.org/message-id/CAL_0b1sypYeOyNkYNV95nNV2d+4jXTug3HkKF6FahfW7Gvgb_Q@mail.gmail.com
>> [5] http://pastebin.com/raw.php?i=Bd40Vn6h
>> [6] http://wiki.postgresql.org/wiki/Lock_dependency_information
>> [7 http://pastebin.com/raw.php?i=eGrtG524]
>>
>> --
>> Vladimir
>>
>>
>>
>>
>>
>>
>> --
>> Vladimir
>>
>>
>>
>>
>>
>>
>> --
>> Да пребудет с вами сила...
>> http://simply.name
>>
>>
>>
>>
>>
>
>
>
> --
> Vladimir
>
>
>
>


В списке pgsql-performance по дате отправления:

Предыдущее
От: Бородин Владимир
Дата:
Сообщение: Re: Problem with ExclusiveLock on inserts
Следующее
От: Бородин Владимир
Дата:
Сообщение: Re: Problem with ExclusiveLock on inserts