Discussion: Recursive query performance issue
Version:
-----------------------------------------------------------------------------------------------
PostgreSQL 9.1.14 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 64-bit
Query Plan
http://explain.depesz.com/s/4s37
Normally, this query takes around 200-300 ms to execute.
However, when several queries are run concurrently, query performance drops to 30-60 seconds.
On Tue, Oct 20, 2015 at 12:34 PM, Jamie Koceniak <jkoceniak@mediamath.com> wrote:
> Normally, this query takes around 200-300 ms to execute.
> However, when several queries are run concurrently, query performance drops to 30-60 seconds.

Please define 'several'. The essential information here is a capture of 'top' and possibly 'perf top'. If the problem is storage-related, iostat can also be very useful (or vmstat in a pinch).

FYI, you can use pgbench with -f mode to measure the concurrency performance of any query.

The very first thing to rule out is a storage bottleneck via measured iowait. Once that is ruled out, this problem is interesting if:
*) scaling is much worse than it should be
*) you can confirm it on a more modern postgres (interesting problems are only interesting if they are unsolved)

merlin
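For example, the exact query can be replayed under concurrency like this (query.sql, the client count, and yourdb are placeholders):

# save the recursive query as query.sql, then run 16 concurrent clients for 60 seconds
pgbench -n -f query.sql -c 16 -j 4 -T 60 yourdb

Watching a parallel 'vmstat 1' during the run shows whether iowait climbs with the client count.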
Hi Pavel,
Thanks for the reply.
1. The queries aren't waiting on any locks.
The query has a recursive join that uses a table with only 80k records, and that table is not updated often.
2. The I/O load was not high, but CPU utilization and system load were both very high.
We have a large effective_cache_size = 512GB (25% of total memory).
Thanks,
Jamie
From: Pavel Stehule [mailto:pavel.stehule@gmail.com]
Sent: Wednesday, October 21, 2015 12:04 AM
To: Jamie Koceniak
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Recursive query performance issue
Hi
2015-10-20 19:34 GMT+02:00 Jamie Koceniak <jkoceniak@mediamath.com>:
> Normally, this query takes around 200-300 ms to execute.
> However, when several queries are run concurrently, query performance drops to 30-60 seconds.
there can be a few reasons:
1. locking - are you sure your queries aren't waiting on locks?
2. issues with cache stability - is there a high I/O load? You can try to increase effective_cache_size (or decrease it if you don't have enough memory)
Regards
Pavel
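For reference, lock waits on 9.1 can be checked directly in pg_stat_activity (a minimal sketch; procpid, waiting, and current_query are the pre-9.2 column names):

-- any rows returned are backends currently blocked on a lock
SELECT procpid, waiting, current_query
FROM pg_stat_activity
WHERE waiting;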
Hi Pavel,
Or were you referring to SHMMAX?
Thanks
From: Jamie Koceniak
Sent: Wednesday, October 21, 2015 11:40 AM
To: 'Pavel Stehule'
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Recursive query performance issue
Ok
df -h /dev/shm
Filesystem Size Used Avail Use% Mounted on
tmpfs 406G 0 406G 0% /run/shm
Ok I will try lowering it.
From: Pavel Stehule [mailto:pavel.stehule@gmail.com]
Sent: Wednesday, October 21, 2015 11:24 AM
To: Jamie Koceniak
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Recursive query performance issue
2015-10-21 19:55 GMT+02:00 Jamie Koceniak <jkoceniak@mediamath.com>:
> 1. The queries aren't waiting on any locks.
> 2. The I/O load was not high, but CPU utilization and system load were both very high.
> We have a large effective_cache_size = 512GB (25% of total memory).
so your server has 2TB of RAM? That is not a usual server, so this issue can be pretty strange :(
What is the size of shared memory? It is probably significantly lower than effective_cache_size. Try to reduce effective_cache_size to be lower than shared_buffers.
Regards
Pavel
adama_prod=# SHOW shared_buffers;
shared_buffers
----------------
64GB
From: Pavel Stehule [mailto:pavel.stehule@gmail.com]
Sent: Wednesday, October 21, 2015 12:26 PM
To: Jamie Koceniak
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Recursive query performance issue
2015-10-21 20:51 GMT+02:00 Jamie Koceniak <jkoceniak@mediamath.com>:
Hi Pavel,
Or were you referring to SHMMAX?
the value of shared_buffers - run the SQL statement SHOW shared_buffers;
Regards
Pavel
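Both cache-related settings can also be read in one statement, for example:

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'effective_cache_size');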
On Wed, Oct 21, 2015 at 2:45 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2015-10-21 21:32 GMT+02:00 Jamie Koceniak <jkoceniak@mediamath.com>:
>> adama_prod=# SHOW shared_buffers;
>> shared_buffers
>> ----------------
>> 64GB
>
> can you try to increase shared_buffers to 200GB and decrease effective
> cache size to 180GB? If that is possible - I am not sure this setting is
> good for production usage, but the result can be interesting for
> bottleneck identification.

we need to see a snapshot from
*) top
*) perf top

merlin
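For reference, on 9.1 both settings live in postgresql.conf (a sketch using the values suggested above; changing shared_buffers requires a restart, and a value this large may also require raising the kernel's SHMMAX):

# postgresql.conf
shared_buffers = 200GB         # was 64GB; needs a server restart
effective_cache_size = 180GB   # was 512GB; planner hint only, a reload suffices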
Is the concurrency the cause or the result of the slowdown?

> Version:
> PostgreSQL 9.1.14 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 64-bit
> Query Plan
> http://explain.depesz.com/s/4s37
> Normally, this query takes around 200-300 ms to execute.
> However, when several queries are run concurrently, query performance drops to 30-60 seconds.

Are you executing the same query with the same parameters, or do the parameters differ, perhaps making PostgreSQL choose a different query plan?

http://www.postgresql.org/message-id/20131206095629.GI7814@awork2.anarazel.de
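One way to check is to capture the actual plan for a fast and a slow parameter value and compare them (a sketch; the categories table and the starting id are hypothetical stand-ins for the real recursive query):

EXPLAIN (ANALYZE, BUFFERS)
WITH RECURSIVE tree AS (
    SELECT id, parent_id FROM categories WHERE id = 42
    UNION ALL
    SELECT c.id, c.parent_id
    FROM categories c
    JOIN tree t ON c.parent_id = t.id
)
SELECT count(*) FROM tree;

If the plan shape differs between parameter values, the slowdown may be a planning issue rather than a concurrency one.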
Hi,
We just had the performance problem again today.
Here is some of the top output. Unfortunately, we don't have perf top installed.
top - 16:22:16 up 29 days, 13:00, 2 users, load average: 164.63, 158.62, 148.52
Tasks: 1369 total, 181 running, 1188 sleeping, 0 stopped, 0 zombie
%Cpu(s): 6.2 us, 0.7 sy, 0.0 ni, 93.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem: 2068265 total, 433141 used, 1635124 free, 586 buffers
MiB Swap: 7812 total, 0 used, 7812 free, 412641 cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
81745 postgres 20 0 65.7g 51m 34m R 101 0.0 0:09.20 postgres: user1 db 0.0.0.2(52307) SELECT
81782 postgres 20 0 65.7g 51m 34m R 101 0.0 0:08.50 postgres: user1 db 0.0.0.3(44630) SELECT
81797 postgres 20 0 65.7g 51m 34m R 101 0.0 0:08.03 postgres: user1 db 0.0.0.6(60752) SELECT
67103 postgres 20 0 65.7g 81m 56m R 97 0.0 2:01.89 postgres: user1 db 0.0.0.4(46337) SELECT
82527 postgres 20 0 65.7g 25m 20m R 93 0.0 0:02.35 postgres: user1 db 0.0.0.2(52490) SELECT
82559 postgres 20 0 65.7g 25m 20m R 93 0.0 0:02.17 postgres: user1 db 0.0.0.2(52496) SELECT
82728 postgres 20 0 65.7g 80m 76m R 93 0.0 0:00.60 postgres: user1 db 0.0.0.6(60957) SELECT
65588 postgres 20 0 65.7g 76m 56m R 89 0.0 2:12.27 postgres: user1 db 0.0.0.6(57195) SELECT
80594 postgres 20 0 65.7g 34m 28m R 89 0.0 0:22.81 postgres: user1 db 0.0.0.2(52071) SELECT
25176 postgres 20 0 65.7g 74m 57m R 85 0.0 7:24.42 postgres: user1 db 0.0.0.2(39410) SELECT
82182 postgres 20 0 65.7g 513m 502m R 85 0.0 0:04.85 postgres: user1 db 0.0.0.4(49789) SELECT
82034 postgres 20 0 65.7g 523m 510m R 81 0.0 0:05.79 postgres: user1 db 0.0.0.3(44683) SELECT
82439 postgres 20 0 65.7g 262m 258m R 81 0.0 0:02.64 postgres: user1 db 0.0.0.6(60887) SELECT
82624 postgres 20 0 65.7g 148m 143m R 81 0.0 0:01.20 postgres: user1 db 0.0.0.4(49888) SELECT
82637 postgres 20 0 65.7g 139m 134m R 81 0.0 0:01.17 postgres: user1 db 0.0.0.3(44805) SELECT
82669 postgres 20 0 65.7g 119m 114m R 81 0.0 0:00.97 postgres: user1 db 0.0.0.6(60939) SELECT
82723 postgres 20 0 65.7g 79m 75m R 81 0.0 0:00.56 postgres: user1 db 0.0.0.4(49907) SELECT
29160 postgres 20 0 65.7g 79m 54m R 77 0.0 6:52.13 postgres: user1 db 0.0.0.6(48802) SELECT
51095 postgres 20 0 65.7g 81m 57m R 77 0.0 4:01.51 postgres: user1 db 0.0.0.4(42914) SELECT
81833 postgres 20 0 65.7g 528m 515m R 77 0.0 0:07.23 postgres: user1 db 0.0.0.3(44644) SELECT
81978 postgres 20 0 65.7g 528m 515m R 77 0.0 0:06.05 postgres: user1 db 0.0.0.2(52364) SELECT
82099 postgres 20 0 65.7g 523m 510m R 77 0.0 0:05.18 postgres: user1 db 0.0.0.3(44692) SELECT
82111 postgres 20 0 65.7g 523m 510m R 77 0.0 0:05.14 postgres: user1 db 0.0.0.4(49773) SELECT
82242 postgres 20 0 65.7g 433m 429m R 77 0.0 0:04.27 postgres: user1 db 0.0.0.2(52428) SELECT
82292 postgres 20 0 65.7g 407m 402m R 77 0.0 0:04.10 postgres: user1 db 0.0.0.2(52440) SELECT
82408 postgres 20 0 65.7g 292m 288m R 77 0.0 0:02.98 postgres: user1 db 0.0.0.4(49835) SELECT
82542 postgres 20 0 65.7g 207m 202m R 77 0.0 0:01.98 postgres: user1 db 0.0.0.4(49868) SELECT
63638 postgres 20 0 65.7g 80m 56m R 73 0.0 2:30.10 postgres: user1 db 0.0.0.2(48699) SELECT
71572 postgres 20 0 65.7g 80m 56m R 73 0.0 1:31.13 postgres: user1 db 0.0.0.2(50223) SELECT
80580 postgres 20 0 65.7g 34m 28m R 73 0.0 0:22.93 postgres: user1 db 0.0.0.2(52065) SELECT
81650 postgres 20 0 65.8g 622m 555m R 73 0.0 0:08.84 postgres: user1 db 0.0.0.2(52290) SELECT
81728 postgres 20 0 65.7g 523m 510m R 73 0.0 0:08.28 postgres: user1 db 0.0.0.4(49684) SELECT
81942 postgres 20 0 65.7g 528m 515m R 73 0.0 0:06.46 postgres: user1 db 0.0.0.2(52355) SELECT
81958 postgres 20 0 65.7g 528m 514m R 73 0.0 0:06.48 postgres: user1 db 0.0.0.4(49744) SELECT
81980 postgres 20 0 65.7g 528m 515m R 73 0.0 0:06.02 postgres: user1 db 0.0.0.3(44671) SELECT
82007 postgres 20 0 65.7g 523m 510m R 73 0.0 0:06.27 postgres: user1 db 0.0.0.3(44676) SELECT
82374 postgres 20 0 65.7g 367m 362m R 73 0.0 0:03.48 postgres: user1 db 0.0.0.6(60873) SELECT
82385 postgres 20 0 65.7g 310m 306m R 73 0.0 0:03.03 postgres: user1 db 0.0.0.6(60876) SELECT
82520 postgres 20 0 65.7g 220m 215m R 73 0.0 0:02.00 postgres: user1 db 0.0.0.3(44785) SELECT
82676 postgres 20 0 65.7g 116m 111m R 73 0.0 0:00.90 postgres: user1 db 0.0.0.2(52531) SELECT
18471 postgres 20 0 65.7g 73m 56m R 69 0.0 8:14.08 postgres: user1 db 0.0.0.6(46144) SELECT
43890 postgres 20 0 65.7g 76m 56m R 69 0.0 5:04.46 postgres: user1 db 0.0.0.3(36697) SELECT
46130 postgres 20 0 65.7g 70m 57m R 69 0.0 4:46.56 postgres: user1 db 0.0.0.4(41871) SELECT
55604 postgres 20 0 65.7g 81m 57m R 69 0.0 3:27.67 postgres: user1 db 0.0.0.3(39292) SELECT
59139 postgres 20 0 65.7g 81m 57m R 69 0.0 3:01.18 postgres: user1 db 0.0.0.2(47670) SELECT
63523 postgres 20 0 65.7g 80m 56m R 69 0.0 2:28.04 postgres: user1 db 0.0.0.2(48680) SELECT
81707 postgres 20 0 65.7g 528m 515m S 69 0.0 0:08.44 postgres: user1 db 0.0.0.6(60737) SELECT
81830 postgres 20 0 65.7g 523m 510m R 69 0.0 0:07.60 postgres: user1 db 0.0.0.4(49707) SELECT
81932 postgres 20 0 65.7g 528m 515m R 69 0.0 0:06.65 postgres: user1 db 0.0.0.2(52352) SELECT
81950 postgres 20 0 65.7g 528m 515m R 69 0.0 0:05.92 postgres: user1 db 0.0.0.6(60783) SELECT
81973 postgres 20 0 65.7g 522m 510m R 69 0.0 0:06.18 postgres: user1 db 0.0.0.6(60789) SELECT
82193 postgres 20 0 65.7g 487m 479m R 69 0.0 0:04.61 postgres: user1 db 0.0.0.2(52415) SELECT
82358 postgres 20 0 65.7g 299m 295m R 69 0.0 0:03.11 postgres: user1 db 0.0.0.2(52453) SELECT
82372 postgres 20 0 65.7g 318m 313m R 69 0.0 0:03.22 postgres: user1 db 0.0.0.4(49827) SELECT
82381 postgres 20 0 65.7g 331m 326m R 69 0.0 0:03.30 postgres: user1 db 0.0.0.3(44757) SELECT
82404 postgres 20 0 65.7g 294m 289m R 69 0.0 0:02.86 postgres: user1 db 0.0.0.3(44761) SELECT
82415 postgres 20 0 65.7g 270m 266m R 69 0.0 0:02.80 postgres: user1 db 0.0.0.3(44767) SELECT
82521 postgres 20 0 65.7g 209m 205m R 69 0.0 0:02.00 postgres: user1 db 0.0.0.3(44786) SELECT
82526 postgres 20 0 65.7g 35m 29m R 69 0.0 0:01.20 postgres: user1 db 0.0.0.6(60906) SELECT
82550 postgres 20 0 65.7g 188m 184m R 69 0.0 0:01.72 postgres: user1 db 0.0.0.4(49870) SELECT
82587 postgres 20 0 65.7g 183m 178m R 69 0.0 0:01.64 postgres: user1 db 0.0.0.4(49882) SELECT
82683 postgres 20 0 65.7g 97m 93m R 69 0.0 0:00.77 postgres: user1 db 0.0.0.4(49899) SELECT
82685 postgres 20 0 65.7g 103m 99m R 69 0.0 0:00.84 postgres: user1 db 0.0.0.2(52532) SELECT
82687 postgres 20 0 65.7g 109m 104m R 69 0.0 0:00.85 postgres: user1 db 0.0.0.3(44809) SELECT
82712 postgres 20 0 65.7g 68m 64m R 69 0.0 0:00.55 postgres: user1 db 0.0.0.3(44814) SELECT
82715 postgres 20 0 65.7g 75m 70m R 69 0.0 0:00.58 postgres: user1 db 0.0.0.4(49905) SELECT
19548 postgres 20 0 65.7g 79m 56m R 65 0.0 8:02.44 postgres: user1 db 0.0.0.2(37887) SELECT
36714 postgres 20 0 65.7g 80m 56m R 65 0.0 5:56.08 postgres: user1 db 0.0.0.3(35177) SELECT
43599 postgres 20 0 65.7g 80m 56m R 65 0.0 5:05.03 postgres: user1 db 0.0.0.3(36638) SELECT
-----Original Message-----
From: Merlin Moncure [mailto:mmoncure@gmail.com]
Sent: Wednesday, October 21, 2015 12:50 PM
To: Pavel Stehule
Cc: Jamie Koceniak; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Recursive query performance issue
On Wed, Oct 21, 2015 at 2:45 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2015-10-21 21:32 GMT+02:00 Jamie Koceniak <jkoceniak@mediamath.com>:
>>
>> adama_prod=# SHOW shared_buffers;
>>
>> shared_buffers
>>
>> ----------------
>>
>> 64GB
>
>
> can you try to increase shared_buffers to 200GB and decrease effective
> cache size to 180GB? If that is possible - I am not sure this
> setting is good for production usage, but the result can be
> interesting for bottleneck identification.
we need to see a snapshot from
*) top
*) perf top
merlin
On Fri, Oct 23, 2015 at 12:45 PM, Jamie Koceniak <jkoceniak@mediamath.com> wrote:
> Hi,
>
> We just had the performance problem again today.
> Here is some of the top output. Unfortunately, we don't have perf top installed.
>
> top - 16:22:16 up 29 days, 13:00, 2 users, load average: 164.63, 158.62, 148.52
> Tasks: 1369 total, 181 running, 1188 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 6.2 us, 0.7 sy, 0.0 ni, 93.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> MiB Mem: 2068265 total, 433141 used, 1635124 free, 586 buffers
> MiB Swap: 7812 total, 0 used, 7812 free, 412641 cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 81745 postgres 20 0 65.7g 51m 34m R 101 0.0 0:09.20 postgres: user1 db 0.0.0.2(52307) SELECT
> 81782 postgres 20 0 65.7g 51m 34m R 101 0.0 0:08.50 postgres: user1 db 0.0.0.3(44630) SELECT
> 81797 postgres 20 0 65.7g 51m 34m R 101 0.0 0:08.03 postgres: user1 db 0.0.0.6(60752) SELECT
<snip>

ok, this rules out iowait. load is 160+, but the system is reporting 6.2% user, 93.1% idle, 0 iowait. This is very odd.

*) how many processors do you have?
*) Can we have more details about the hardware platform?
*) Is this system virtualized? If so, what solution?

we need a perf top and a capture of 'vmstat 1' for context switches

merlin
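A capture during the next incident could be as simple as (assuming a perf package matching the running kernel is installed; the durations are arbitrary):

vmstat 1 30 > vmstat.out   # context switches are the 'cs' column
top -b -n 3 > top.out      # batch mode, three snapshots
perf top                   # interactive; note the hottest symbols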
Had the issue again today.

Here is vmstat:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b swpd       free   buff      cache  si so bi   bo    in    cs us sy id wa
24  0    0 1591718656 605656 499370336   0  0  0  371     0     0  7  1 93  0
25  0    0 1591701376 605656 499371936   0  0  0  600 13975 20168 20  1 79  0
26  0    0 1591654784 605656 499372064   0  0  0 5892 12725 14627 20  1 79  0
25  0    0 1591614336 605656 499372128   0  0  0  600 11665 12642 21  1 78  0
27  0    0 1591549952 605656 499372192   0  0  0  408 16939 23387 23  1 76  0
29  0    0 1591675392 605656 499372288   0  0  0  836 15380 22564 23  1 76  0
27  0    0 1591608704 605656 499372352   0  0  0  456 17593 27955 23  1 76  0
34  0    0 1591524608 605656 499372480   0  0  0 5904 18963 30915 23  1 75  0
23  0    0 1591632384 605656 499372576   0  0  0  704 18190 31002 22  1 77  0
25  0    0 1591551360 605656 499372640   0  0  0  944 12532 14095 21  1 78  0
24  0    0 1591613568 605656 499372704   0  0  0  416 11183 12553 20  1 79  0
23  0    0 1591531520 605656 499372768   0  0  0  400 12648 15540 19  1 80  0
22  0    0 1591510528 605656 499372800   0  0  0 6024 14670 21993 19  1 80  0
31  0    0 1591388800 605656 499372896   0  0  0  472 20605 28242 20  1 79  0

We have a 120 CPU server :)

processor  : 119
vendor_id  : GenuineIntel
cpu family : 6
model      : 62
model name : Intel(R) Xeon(R) CPU E7-4880 v2 @ 2.50GHz

-----Original Message-----
From: Merlin Moncure [mailto:mmoncure@gmail.com]
Sent: Monday, October 26, 2015 8:04 AM
To: Jamie Koceniak
Cc: Pavel Stehule; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Recursive query performance issue
<snip>
On Sat, Nov 14, 2015 at 12:58 AM, Jamie Koceniak <jkoceniak@mediamath.com> wrote:
> Had the issue again today.
>
> Here is vmstat:
<snip>
> We have a 120 CPU server :)
>
> model name : Intel(R) Xeon(R) CPU E7-4880 v2 @ 2.50GHz

Per the numbers above, this server is very healthy. Something is not adding up here: I would really have liked to see a snapshot from 'top' and 'perf top' taken at the same time. Via top we could have seen whether some of the processors were completely loaded down while others were not being utilized at all, which would suggest a problem with the operating system, likely NUMA related.

*) Are you counting hyperthreading to get to the 120 cpu count?
*) Is this server virtualized?
*) What is the output of: lscpu | grep NUMA
*) Do you have 'taskset' installed? Can we check affinity via: taskset -c -p <pid>, where <pid> is the pid of a few randomly sampled postgres processes at work?
*) Can you report the exact kernel version?
*) What is the output of:
   cat /sys/kernel/mm/transparent_hugepage/enabled
   cat /sys/kernel/mm/transparent_hugepage/defrag
*) Is installing a newer postgres an option?

Configuring highly SMP systems for reliable scaling may require some progressive thinking.

merlin
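Most of that can be gathered in one pass with something like the following (a sketch; pgrep and taskset ship in Debian's procps and util-linux packages):

uname -r
lscpu | grep NUMA
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag
# CPU affinity of a few sampled postgres backends
for p in $(pgrep -u postgres | head -5); do taskset -c -p $p; done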