Discussion: Recursive query performance issue
Version:
-----------------------------------------------------------------------------------------------
PostgreSQL 9.1.14 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 64-bit
Query Plan
http://explain.depesz.com/s/4s37
Normally, this query takes around 200-300 ms to execute.
However, when several queries are run concurrently, query performance drops to 30-60 seconds.
On Tue, Oct 20, 2015 at 12:34 PM, Jamie Koceniak <jkoceniak@mediamath.com> wrote:
> Normally, this query takes around 200-300 ms to execute.
> However, when several queries are run concurrently, query performance drops to 30-60 seconds.

Please define 'several'. The essential information here is a capture of 'top' and possibly 'perf top'. If the problem is storage-related, iostat can also be very useful (or vmstat in a pinch).

FYI, you can use pgbench with -f mode to measure the concurrency performance of any query.

The very first thing to rule out is a storage bottleneck via measured iowait. Once that is ruled out, this problem is interesting if:
*) scaling is much worse than it should be
*) you can confirm it on a more modern postgres (interesting problems are only interesting if they are unsolved)

merlin
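For example, the exact query can be replayed under concurrency like this (query.sql, the client count, and yourdb are placeholders):

# save the recursive query as query.sql, then run 16 concurrent clients for 60 seconds
pgbench -n -f query.sql -c 16 -j 4 -T 60 yourdb

Watching a parallel 'vmstat 1' during the run shows whether iowait climbs with the client count.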
Hi Pavel,
Thanks for the reply.
1. The queries aren't waiting on any locks.
The query has a recursive join that uses a table with only 80k records, and that table is not updated often.
2. The I/O load was not high, but CPU utilization and system load were both very high.
We have a large effective_cache_size = 512GB (25% of total memory).
Thanks,
Jamie
From: Pavel Stehule [mailto:pavel.stehule@gmail.com]
Sent: Wednesday, October 21, 2015 12:04 AM
To: Jamie Koceniak
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Recursive query performance issue
Hi
2015-10-20 19:34 GMT+02:00 Jamie Koceniak <jkoceniak@mediamath.com>:
> Normally, this query takes around 200-300 ms to execute.
> However, when several queries are run concurrently, query performance drops to 30-60 seconds.
there can be a few reasons:
1. locking - are you sure your queries aren't waiting on locks?
2. issues with cache stability - is there a high I/O load? You can try to increase effective_cache_size (or decrease it if you don't have enough memory)
Regards
Pavel
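For reference, lock waits on 9.1 can be checked directly in pg_stat_activity (a minimal sketch; procpid, waiting, and current_query are the pre-9.2 column names):

-- any rows returned are backends currently blocked on a lock
SELECT procpid, waiting, current_query
FROM pg_stat_activity
WHERE waiting;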
Hi Pavel,
Or were you referring to SHMMAX?
Thanks
From: Jamie Koceniak
Sent: Wednesday, October 21, 2015 11:40 AM
To: 'Pavel Stehule'
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Recursive query performance issue
Ok
df -h /dev/shm
Filesystem Size Used Avail Use% Mounted on
tmpfs 406G 0 406G 0% /run/shm
Ok I will try lowering it.
From: Pavel Stehule [mailto:pavel.stehule@gmail.com]
Sent: Wednesday, October 21, 2015 11:24 AM
To: Jamie Koceniak
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Recursive query performance issue
2015-10-21 19:55 GMT+02:00 Jamie Koceniak <jkoceniak@mediamath.com>:
> 1. The queries aren't waiting on any locks.
> 2. The I/O load was not high, but CPU utilization and system load were both very high.
> We have a large effective_cache_size = 512GB (25% of total memory).
so your server has 2TB of RAM? That is not a usual server, so this issue can be pretty strange :(
What is the size of shared memory? It is probably significantly lower than effective_cache_size. Try to reduce effective_cache_size to be lower than shared_buffers.
Regards
Pavel
adama_prod=# SHOW shared_buffers;
shared_buffers
----------------
64GB
From: Pavel Stehule [mailto:pavel.stehule@gmail.com]
Sent: Wednesday, October 21, 2015 12:26 PM
To: Jamie Koceniak
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Recursive query performance issue
2015-10-21 20:51 GMT+02:00 Jamie Koceniak <jkoceniak@mediamath.com>:
Hi Pavel,
Or were you referring to SHMMAX?
the value of shared_buffers - run the SQL statement SHOW shared_buffers;
Regards
Pavel
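Both cache-related settings can also be read in one statement, for example:

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'effective_cache_size');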
On Wed, Oct 21, 2015 at 2:45 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2015-10-21 21:32 GMT+02:00 Jamie Koceniak <jkoceniak@mediamath.com>:
>> adama_prod=# SHOW shared_buffers;
>> shared_buffers
>> ----------------
>> 64GB
>
> can you try to increase shared_buffers to 200GB and decrease effective
> cache size to 180GB? If that is possible - I am not sure this setting is
> good for production usage, but the result can be interesting for
> bottleneck identification.

we need to see a snapshot from
*) top
*) perf top

merlin
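For reference, on 9.1 both settings live in postgresql.conf (a sketch using the values suggested above; changing shared_buffers requires a restart, and a value this large may also require raising the kernel's SHMMAX):

# postgresql.conf
shared_buffers = 200GB         # was 64GB; needs a server restart
effective_cache_size = 180GB   # was 512GB; planner hint only, a reload suffices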
Is the concurrency the cause or the result of the slowdown?

> Version:
> PostgreSQL 9.1.14 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 64-bit
> Query Plan
> http://explain.depesz.com/s/4s37
> Normally, this query takes around 200-300 ms to execute.
> However, when several queries are run concurrently, query performance drops to 30-60 seconds.

Are you executing the same query with the same parameters, or do the parameters differ, perhaps making PostgreSQL choose a different query plan?

http://www.postgresql.org/message-id/20131206095629.GI7814@awork2.anarazel.de
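One way to check is to capture the actual plan for a fast and a slow parameter value and compare them (a sketch; the categories table and the starting id are hypothetical stand-ins for the real recursive query):

EXPLAIN (ANALYZE, BUFFERS)
WITH RECURSIVE tree AS (
    SELECT id, parent_id FROM categories WHERE id = 42
    UNION ALL
    SELECT c.id, c.parent_id
    FROM categories c
    JOIN tree t ON c.parent_id = t.id
)
SELECT count(*) FROM tree;

If the plan shape differs between parameter values, the slowdown may be a planning issue rather than a concurrency one.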
Hi,
We just had the performance problem again today.
Here is some of the top output. Unfortunately, we don't have perf top installed.
top - 16:22:16 up 29 days, 13:00, 2 users, load average: 164.63, 158.62, 148.52
Tasks: 1369 total, 181 running, 1188 sleeping, 0 stopped, 0 zombie
%Cpu(s): 6.2 us, 0.7 sy, 0.0 ni, 93.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem: 2068265 total, 433141 used, 1635124 free, 586 buffers
MiB Swap: 7812 total, 0 used, 7812 free, 412641 cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
81745 postgres 20 0 65.7g 51m 34m R 101 0.0 0:09.20 postgres: user1 db 0.0.0.2(52307) SELECT
81782 postgres 20 0 65.7g 51m 34m R 101 0.0 0:08.50 postgres: user1 db 0.0.0.3(44630) SELECT
81797 postgres 20 0 65.7g 51m 34m R 101 0.0 0:08.03 postgres: user1 db 0.0.0.6(60752) SELECT
67103 postgres 20 0 65.7g 81m 56m R 97 0.0 2:01.89 postgres: user1 db 0.0.0.4(46337) SELECT
82527 postgres 20 0 65.7g 25m 20m R 93 0.0 0:02.35 postgres: user1 db 0.0.0.2(52490) SELECT
82559 postgres 20 0 65.7g 25m 20m R 93 0.0 0:02.17 postgres: user1 db 0.0.0.2(52496) SELECT
82728 postgres 20 0 65.7g 80m 76m R 93 0.0 0:00.60 postgres: user1 db 0.0.0.6(60957) SELECT
65588 postgres 20 0 65.7g 76m 56m R 89 0.0 2:12.27 postgres: user1 db 0.0.0.6(57195) SELECT
80594 postgres 20 0 65.7g 34m 28m R 89 0.0 0:22.81 postgres: user1 db 0.0.0.2(52071) SELECT
25176 postgres 20 0 65.7g 74m 57m R 85 0.0 7:24.42 postgres: user1 db 0.0.0.2(39410) SELECT
82182 postgres 20 0 65.7g 513m 502m R 85 0.0 0:04.85 postgres: user1 db 0.0.0.4(49789) SELECT
82034 postgres 20 0 65.7g 523m 510m R 81 0.0 0:05.79 postgres: user1 db 0.0.0.3(44683) SELECT
82439 postgres 20 0 65.7g 262m 258m R 81 0.0 0:02.64 postgres: user1 db 0.0.0.6(60887) SELECT
82624 postgres 20 0 65.7g 148m 143m R 81 0.0 0:01.20 postgres: user1 db 0.0.0.4(49888) SELECT
82637 postgres 20 0 65.7g 139m 134m R 81 0.0 0:01.17 postgres: user1 db 0.0.0.3(44805) SELECT
82669 postgres 20 0 65.7g 119m 114m R 81 0.0 0:00.97 postgres: user1 db 0.0.0.6(60939) SELECT
82723 postgres 20 0 65.7g 79m 75m R 81 0.0 0:00.56 postgres: user1 db 0.0.0.4(49907) SELECT
29160 postgres 20 0 65.7g 79m 54m R 77 0.0 6:52.13 postgres: user1 db 0.0.0.6(48802) SELECT
51095 postgres 20 0 65.7g 81m 57m R 77 0.0 4:01.51 postgres: user1 db 0.0.0.4(42914) SELECT
81833 postgres 20 0 65.7g 528m 515m R 77 0.0 0:07.23 postgres: user1 db 0.0.0.3(44644) SELECT
81978 postgres 20 0 65.7g 528m 515m R 77 0.0 0:06.05 postgres: user1 db 0.0.0.2(52364) SELECT
82099 postgres 20 0 65.7g 523m 510m R 77 0.0 0:05.18 postgres: user1 db 0.0.0.3(44692) SELECT
82111 postgres 20 0 65.7g 523m 510m R 77 0.0 0:05.14 postgres: user1 db 0.0.0.4(49773) SELECT
82242 postgres 20 0 65.7g 433m 429m R 77 0.0 0:04.27 postgres: user1 db 0.0.0.2(52428) SELECT
82292 postgres 20 0 65.7g 407m 402m R 77 0.0 0:04.10 postgres: user1 db 0.0.0.2(52440) SELECT
82408 postgres 20 0 65.7g 292m 288m R 77 0.0 0:02.98 postgres: user1 db 0.0.0.4(49835) SELECT
82542 postgres 20 0 65.7g 207m 202m R 77 0.0 0:01.98 postgres: user1 db 0.0.0.4(49868) SELECT
63638 postgres 20 0 65.7g 80m 56m R 73 0.0 2:30.10 postgres: user1 db 0.0.0.2(48699) SELECT
71572 postgres 20 0 65.7g 80m 56m R 73 0.0 1:31.13 postgres: user1 db 0.0.0.2(50223) SELECT
80580 postgres 20 0 65.7g 34m 28m R 73 0.0 0:22.93 postgres: user1 db 0.0.0.2(52065) SELECT
81650 postgres 20 0 65.8g 622m 555m R 73 0.0 0:08.84 postgres: user1 db 0.0.0.2(52290) SELECT
81728 postgres 20 0 65.7g 523m 510m R 73 0.0 0:08.28 postgres: user1 db 0.0.0.4(49684) SELECT
81942 postgres 20 0 65.7g 528m 515m R 73 0.0 0:06.46 postgres: user1 db 0.0.0.2(52355) SELECT
81958 postgres 20 0 65.7g 528m 514m R 73 0.0 0:06.48 postgres: user1 db 0.0.0.4(49744) SELECT
81980 postgres 20 0 65.7g 528m 515m R 73 0.0 0:06.02 postgres: user1 db 0.0.0.3(44671) SELECT
82007 postgres 20 0 65.7g 523m 510m R 73 0.0 0:06.27 postgres: user1 db 0.0.0.3(44676) SELECT
82374 postgres 20 0 65.7g 367m 362m R 73 0.0 0:03.48 postgres: user1 db 0.0.0.6(60873) SELECT
82385 postgres 20 0 65.7g 310m 306m R 73 0.0 0:03.03 postgres: user1 db 0.0.0.6(60876) SELECT
82520 postgres 20 0 65.7g 220m 215m R 73 0.0 0:02.00 postgres: user1 db 0.0.0.3(44785) SELECT
82676 postgres 20 0 65.7g 116m 111m R 73 0.0 0:00.90 postgres: user1 db 0.0.0.2(52531) SELECT
18471 postgres 20 0 65.7g 73m 56m R 69 0.0 8:14.08 postgres: user1 db 0.0.0.6(46144) SELECT
43890 postgres 20 0 65.7g 76m 56m R 69 0.0 5:04.46 postgres: user1 db 0.0.0.3(36697) SELECT
46130 postgres 20 0 65.7g 70m 57m R 69 0.0 4:46.56 postgres: user1 db 0.0.0.4(41871) SELECT
55604 postgres 20 0 65.7g 81m 57m R 69 0.0 3:27.67 postgres: user1 db 0.0.0.3(39292) SELECT
59139 postgres 20 0 65.7g 81m 57m R 69 0.0 3:01.18 postgres: user1 db 0.0.0.2(47670) SELECT
63523 postgres 20 0 65.7g 80m 56m R 69 0.0 2:28.04 postgres: user1 db 0.0.0.2(48680) SELECT
81707 postgres 20 0 65.7g 528m 515m S 69 0.0 0:08.44 postgres: user1 db 0.0.0.6(60737) SELECT
81830 postgres 20 0 65.7g 523m 510m R 69 0.0 0:07.60 postgres: user1 db 0.0.0.4(49707) SELECT
81932 postgres 20 0 65.7g 528m 515m R 69 0.0 0:06.65 postgres: user1 db 0.0.0.2(52352) SELECT
81950 postgres 20 0 65.7g 528m 515m R 69 0.0 0:05.92 postgres: user1 db 0.0.0.6(60783) SELECT
81973 postgres 20 0 65.7g 522m 510m R 69 0.0 0:06.18 postgres: user1 db 0.0.0.6(60789) SELECT
82193 postgres 20 0 65.7g 487m 479m R 69 0.0 0:04.61 postgres: user1 db 0.0.0.2(52415) SELECT
82358 postgres 20 0 65.7g 299m 295m R 69 0.0 0:03.11 postgres: user1 db 0.0.0.2(52453) SELECT
82372 postgres 20 0 65.7g 318m 313m R 69 0.0 0:03.22 postgres: user1 db 0.0.0.4(49827) SELECT
82381 postgres 20 0 65.7g 331m 326m R 69 0.0 0:03.30 postgres: user1 db 0.0.0.3(44757) SELECT
82404 postgres 20 0 65.7g 294m 289m R 69 0.0 0:02.86 postgres: user1 db 0.0.0.3(44761) SELECT
82415 postgres 20 0 65.7g 270m 266m R 69 0.0 0:02.80 postgres: user1 db 0.0.0.3(44767) SELECT
82521 postgres 20 0 65.7g 209m 205m R 69 0.0 0:02.00 postgres: user1 db 0.0.0.3(44786) SELECT
82526 postgres 20 0 65.7g 35m 29m R 69 0.0 0:01.20 postgres: user1 db 0.0.0.6(60906) SELECT
82550 postgres 20 0 65.7g 188m 184m R 69 0.0 0:01.72 postgres: user1 db 0.0.0.4(49870) SELECT
82587 postgres 20 0 65.7g 183m 178m R 69 0.0 0:01.64 postgres: user1 db 0.0.0.4(49882) SELECT
82683 postgres 20 0 65.7g 97m 93m R 69 0.0 0:00.77 postgres: user1 db 0.0.0.4(49899) SELECT
82685 postgres 20 0 65.7g 103m 99m R 69 0.0 0:00.84 postgres: user1 db 0.0.0.2(52532) SELECT
82687 postgres 20 0 65.7g 109m 104m R 69 0.0 0:00.85 postgres: user1 db 0.0.0.3(44809) SELECT
82712 postgres 20 0 65.7g 68m 64m R 69 0.0 0:00.55 postgres: user1 db 0.0.0.3(44814) SELECT
82715 postgres 20 0 65.7g 75m 70m R 69 0.0 0:00.58 postgres: user1 db 0.0.0.4(49905) SELECT
19548 postgres 20 0 65.7g 79m 56m R 65 0.0 8:02.44 postgres: user1 db 0.0.0.2(37887) SELECT
36714 postgres 20 0 65.7g 80m 56m R 65 0.0 5:56.08 postgres: user1 db 0.0.0.3(35177) SELECT
43599 postgres 20 0 65.7g 80m 56m R 65 0.0 5:05.03 postgres: user1 db 0.0.0.3(36638) SELECT
-----Original Message-----
From: Merlin Moncure [mailto:mmoncure@gmail.com]
Sent: Wednesday, October 21, 2015 12:50 PM
To: Pavel Stehule
Cc: Jamie Koceniak; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Recursive query performance issue
On Wed, Oct 21, 2015 at 2:45 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2015-10-21 21:32 GMT+02:00 Jamie Koceniak <jkoceniak@mediamath.com>:
>>
>> adama_prod=# SHOW shared_buffers;
>>
>> shared_buffers
>>
>> ----------------
>>
>> 64GB
>
>
> can you try to increase shared_buffers to 200GB and decrease effective
> cache size to 180GB? If that is possible - I am not sure this
> setting is good for production usage, but the result can be
> interesting for bottleneck identification.
we need to see a snapshot from
*) top
*) perf top
merlin
On Fri, Oct 23, 2015 at 12:45 PM, Jamie Koceniak <jkoceniak@mediamath.com> wrote:
> Hi,
>
> We just had the performance problem again today.
> Here is some of the top output. Unfortunately, we don't have perf top installed.
>
> top - 16:22:16 up 29 days, 13:00, 2 users, load average: 164.63, 158.62, 148.52
> Tasks: 1369 total, 181 running, 1188 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 6.2 us, 0.7 sy, 0.0 ni, 93.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> MiB Mem: 2068265 total, 433141 used, 1635124 free, 586 buffers
> MiB Swap: 7812 total, 0 used, 7812 free, 412641 cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 81745 postgres 20 0 65.7g 51m 34m R 101 0.0 0:09.20 postgres: user1 db 0.0.0.2(52307) SELECT
> 81782 postgres 20 0 65.7g 51m 34m R 101 0.0 0:08.50 postgres: user1 db 0.0.0.3(44630) SELECT
> 81797 postgres 20 0 65.7g 51m 34m R 101 0.0 0:08.03 postgres: user1 db 0.0.0.6(60752) SELECT
<snip>

ok, this rules out iowait. load is 160+, but the system is reporting 6.2% user, 93.1% idle, 0 iowait. This is very odd.

*) how many processors do you have?
*) Can we have more details about the hardware platform?
*) Is this system virtualized? If so, what solution?

we need a perf top and a capture of 'vmstat 1' for context switches

merlin
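A capture during the next incident could be as simple as (assuming a perf package matching the running kernel is installed; the durations are arbitrary):

vmstat 1 30 > vmstat.out   # context switches are the 'cs' column
top -b -n 3 > top.out      # batch mode, three snapshots
perf top                   # interactive; note the hottest symbols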
Had the issue again today.

Here is vmstat:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b swpd       free   buff      cache  si so bi   bo    in    cs us sy id wa
24  0    0 1591718656 605656 499370336   0  0  0  371     0     0  7  1 93  0
25  0    0 1591701376 605656 499371936   0  0  0  600 13975 20168 20  1 79  0
26  0    0 1591654784 605656 499372064   0  0  0 5892 12725 14627 20  1 79  0
25  0    0 1591614336 605656 499372128   0  0  0  600 11665 12642 21  1 78  0
27  0    0 1591549952 605656 499372192   0  0  0  408 16939 23387 23  1 76  0
29  0    0 1591675392 605656 499372288   0  0  0  836 15380 22564 23  1 76  0
27  0    0 1591608704 605656 499372352   0  0  0  456 17593 27955 23  1 76  0
34  0    0 1591524608 605656 499372480   0  0  0 5904 18963 30915 23  1 75  0
23  0    0 1591632384 605656 499372576   0  0  0  704 18190 31002 22  1 77  0
25  0    0 1591551360 605656 499372640   0  0  0  944 12532 14095 21  1 78  0
24  0    0 1591613568 605656 499372704   0  0  0  416 11183 12553 20  1 79  0
23  0    0 1591531520 605656 499372768   0  0  0  400 12648 15540 19  1 80  0
22  0    0 1591510528 605656 499372800   0  0  0 6024 14670 21993 19  1 80  0
31  0    0 1591388800 605656 499372896   0  0  0  472 20605 28242 20  1 79  0

We have a 120 CPU server :)

processor  : 119
vendor_id  : GenuineIntel
cpu family : 6
model      : 62
model name : Intel(R) Xeon(R) CPU E7-4880 v2 @ 2.50GHz

-----Original Message-----
From: Merlin Moncure [mailto:mmoncure@gmail.com]
Sent: Monday, October 26, 2015 8:04 AM
To: Jamie Koceniak
Cc: Pavel Stehule; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Recursive query performance issue
<snip>
On Sat, Nov 14, 2015 at 12:58 AM, Jamie Koceniak <jkoceniak@mediamath.com> wrote:
> Had the issue again today.
>
> Here is vmstat:
<snip>
> We have a 120 CPU server :)
>
> model name : Intel(R) Xeon(R) CPU E7-4880 v2 @ 2.50GHz

Per the numbers above, this server is very healthy. Something is not adding up here: I would really have liked to see a snapshot from 'top' and 'perf top' taken at the same time. Via top we could have seen whether some of the processors were completely loaded down while others were not being utilized at all, which would suggest a problem with the operating system, likely NUMA related.

*) Are you counting hyperthreading to get to the 120 cpu count?
*) Is this server virtualized?
*) What is the output of: lscpu | grep NUMA
*) Do you have 'taskset' installed? Can we check affinity via: taskset -c -p <pid>, where <pid> is the pid of a few randomly sampled postgres processes at work?
*) Can you report the exact kernel version?
*) What is the output of:
   cat /sys/kernel/mm/transparent_hugepage/enabled
   cat /sys/kernel/mm/transparent_hugepage/defrag
*) Is installing a newer postgres an option?

Configuring highly SMP systems for reliable scaling may require some progressive thinking.

merlin
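Most of that can be gathered in one pass with something like the following (a sketch; pgrep and taskset ship in Debian's procps and util-linux packages):

uname -r
lscpu | grep NUMA
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag
# CPU affinity of a few sampled postgres backends
for p in $(pgrep -u postgres | head -5); do taskset -c -p $p; done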