Обсуждение: 9.3 performance issues, lots of bind and parse log entries

Поиск
Список
Период
Сортировка

9.3 performance issues, lots of bind and parse log entries

От
Tory M Blue
Дата:
Not sure what is going on but other than upgrading to 9.3.4 from 9.2.4, i'm seeing major slowness in basic queries and seeing a ton of the bind and parse in my logs. These are standard lookups and should take micro seconds. I'm logging all queries that take over a second and this seems to be getting worse, seems like it's snowballing.


2014-11-04 08:54:52 PST clsdb cls 216.0.0.50(33569) 14857 2014-11-04 08:54:52.476 PSTLOG:  duration: 2206.070 ms  parse dbdpg_p12768_2: SELECT contact_seq_id, status FROM cls.contacts
WHERE
cust_seq_id = $1
AND contact_id = $2

2014-11-04 08:54:52 PST clsdb cls 216.20.0.50(48450) 14882 2014-11-04 08:54:52.394 PSTLOG:  duration: 1624.847 ms  bind dbdpg_p21610_2: SELECT contact_seq_id, status FROM cls.contacts
WHERE
cust_seq_id = $1
AND contact_id = $2


CentOS 6.x
Postgres:  9.3.4
256GB Mem
32Core

I'm tearing up my system to see what is happening, the postgres is the only change, the processes or the data is not. I tried tuning postgresql.conf and modifed effective cache etc, but I've reverted it all, so again just looking for some assistance.

Again no I/O have plenty of memory etc. The server is running hotter than it used to as well.

Thanks
Tory

Re: 9.3 performance issues, lots of bind and parse log entries

От
Tory M Blue
Дата:


On Tue, Nov 4, 2014 at 9:01 AM, Tory M Blue <tmblue@gmail.com> wrote:
Not sure what is going on but other than upgrading to 9.3.4 from 9.2.4, i'm seeing major slowness in basic queries and seeing a ton of the bind and parse in my logs. These are standard lookups and should take micro seconds. I'm logging all queries that take over a second and this seems to be getting worse, seems like it's snowballing.


2014-11-04 08:54:52 PST clsdb cls 216.0.0.50(33569) 14857 2014-11-04 08:54:52.476 PSTLOG:  duration: 2206.070 ms  parse dbdpg_p12768_2: SELECT contact_seq_id, status FROM cls.contacts
WHERE
cust_seq_id = $1
AND contact_id = $2

2014-11-04 08:54:52 PST clsdb cls 216.20.0.50(48450) 14882 2014-11-04 08:54:52.394 PSTLOG:  duration: 1624.847 ms  bind dbdpg_p21610_2: SELECT contact_seq_id, status FROM cls.contacts
WHERE
cust_seq_id = $1
AND contact_id = $2


CentOS 6.x
Postgres:  9.3.4
256GB Mem
32Core

I'm tearing up my system to see what is happening, the postgres is the only change, the processes or the data is not. I tried tuning postgresql.conf and modifed effective cache etc, but I've reverted it all, so again just looking for some assistance.

Again no I/O have plenty of memory etc. The server is running hotter than it used to as well.

Thanks
Tory

Well after fighting this all day and dealing with a really sluggish db where even my slon processes were taking several seconds, I reduced my shared_buffers back to 2GB from 10GB and my work_mem from 7.5GB to 2GB. i actually undid all my changes, including dropping my effective_cache back to 7GB  and restarted.

I have 300 connections configured, we will use around 87 normally with some spikes, but I'm wondering if the 10GB shared memory caused me some grief, I don't believe it was the work_mem and don't believe it was the effective cache, but something caused my DB to run into issues with basic queries, same queries after restart are finishing in milliseconds instead of 2-3 seconds. No disk issues seen,.


So if this is not a 9.3 issue, it's an issue with me upping my config params to a level I thought would give a nice bump..

CentOS 6.x
Postgres:  9.3.4
256GB Mem
32Core

Re: 9.3 performance issues, lots of bind and parse log entries

От
Tomas Vondra
Дата:
Hi Tory,

On 4.11.2014 21:07, Tory M Blue wrote:
> Well after fighting this all day and dealing with a really sluggish db
> where even my slon processes were taking several seconds, I reduced my
> shared_buffers back to 2GB from 10GB and my work_mem from 7.5GB to 2GB.
> i actually undid all my changes, including dropping my effective_cache
> back to 7GB  and restarted.

Have you been using the same parameter values on 9.2, or have you bumped
them up only on the new 9.3? I'm wondering whether 9.2 was performing
better with the values?

> I have 300 connections configured, we will use around 87 normally
> with some spikes, but I'm wondering if the 10GB shared memory caused
> me some grief, I don't believe it was the work_mem and don't believe
> it was the effective cache, but something caused my DB to run into
> issues with basic queries, same queries after restart are finishing
> in milliseconds instead of 2-3 seconds. No disk issues seen,.

I assume only some of the connections will be active (running queries)
at the same time. If you expect >> 32 active queries at the same time,
you're only increasing latency.

Based on your description I assume you're CPU bound (otherwise the
machine would not get "hotter", and planning is not about I/O).

I'm not sure if this is a production machine or how much you can
experiment with it, but it'd be helpful if you could provide some
profiling information

   $ iostat -x -k 1
   $ vmstat 1

and such data. A perf profile would be even better, but to get the most
useful info it may be necessary to recompile the postgres with debug
info and '-fno-omit-frame-pointer'. Then this should do the trick:

  perf record -a -g (for a few seconds, then Ctrl-C)
  perf report

or just "perf top" to see what functions are at the top.


> So if this is not a 9.3 issue, it's an issue with me upping my config
> params to a level I thought would give a nice bump..
>
> CentOS 6.x
> Postgres:  9.3.4
> 256GB Mem
> 32Core

What kernel version are you using? I assume 6.x means 6.5, or are you
using an older CentOS version?

Are you using transparent huge pages, NUMA or similar features?
Althought, that'd probably impact 9.2 too.

Also, what package is this? Is it coming from the CentOS repository,
yum.postgresql.org or some other repository?

regards
Tomas


Re: 9.3 performance issues, lots of bind and parse log entries

От
Tory M Blue
Дата:

Thanks Thomas, 

On 4.11.2014 21:07, Tory M Blue wrote:
> Well after fighting this all day and dealing with a really sluggish db
> where even my slon processes were taking several seconds, I reduced my
> shared_buffers back to 2GB from 10GB and my work_mem from 7.5GB to 2GB.
> i actually undid all my changes, including dropping my effective_cache
> back to 7GB  and restarted.

Have you been using the same parameter values on 9.2, or have you bumped
them up only on the new 9.3? I'm wondering whether 9.2 was performing
better with the values?


Things seem to have been running better on 9.2 at this point I'm using the same config file from 9.2 and I'm still experiencing slowness under heavier write access. And my disk subsystem has not changed. Hardware has not changed, heck i'm even running the old version of slony (have not upgraded it yet).

But since the upgrade to 9.3 even calls to my sl_log tables which are tiny can take:

2014-11-04 02:58:40 PST clsdb postgres 10.13.200.242(52022) 21642 2014-11-04 02:58:40.515 PSTLOG:  duration: 1627.019 ms  statement: fetch 500 from LOG;  (log had 145K items).

> I have 300 connections configured, we will use around 87 normally
> with some spikes, but I'm wondering if the 10GB shared memory caused
> me some grief, I don't believe it was the work_mem and don't believe
> it was the effective cache, but something caused my DB to run into
> issues with basic queries, same queries after restart are finishing
> in milliseconds instead of 2-3 seconds. No disk issues seen,.

I assume only some of the connections will be active (running queries)
at the same time. If you expect >> 32 active queries at the same time,
you're only increasing latency.

Based on your description I assume you're CPU bound (otherwise the
machine would not get "hotter", and planning is not about I/O).

I'm not sure if this is a production machine or how much you can
experiment with it, but it'd be helpful if you could provide some
profiling information

   $ iostat -x -k 1
   $ vmstat 1

and such data. A perf profile would be even better, but to get the most
useful info it may be necessary to recompile the postgres with debug
info and '-fno-omit-frame-pointer'. Then this should do the trick:

  perf record -a -g (for a few seconds, then Ctrl-C)
  perf report

or just "perf top" to see what functions are at the top.


This is a production server, but it was not really CPU bound with 9.2 so something is odd and I'm starting to stress, because it is a production environment :)

Connections correct, I have less than 20 or so active requests at a time, but i would say active queries are in the handful. I was was not seeing IO, but was seeing load increase as queries started taking longer, but nothing in iostat or vmstat/free showed any contention. Heck even Top while showed some cores as busy, nothing was sitting at over 60% utilized. And we are talking a load of 12-14 here on a 32 core system, when it's normally asleep!

This is  my master slon insert server, so I can run commands, tweak configs but any type of rebuild or restart of postgres is a scheduled affair.

These work loads that seem to be creating the issues run between midnight and now almost 6am, prior to 9.3 it was taking maybe 4 hours, now it's taking 6. So tomorrow AM , I'll grab some stats when I see that it's struggling.

But even now with almost no connections or really any major access i'm seeing the sl_log grab 500 rows take 1-3 seconds, which is just plain silly (but it's not a constant, so I may see 1 of these alerts every hour)
 

> So if this is not a 9.3 issue, it's an issue with me upping my config
> params to a level I thought would give a nice bump..
>
> CentOS 6.x
> Postgres:  9.3.4
> 256GB Mem
> 32Core

What kernel version are you using? I assume 6.x means 6.5, or are you
using an older CentOS version?

6.5  yes sir.. 2.6.32-431.5.1.el6.x86_64 


Are you using transparent huge pages, NUMA or similar features?
Althought, that'd probably impact 9.2 too.

ya nothing here. No difference from the 9.2 to 9.3 roll. My sysctl.conf is pretty boring. 

Also, what package is this? Is it coming from the CentOS repository,
yum.postgresql.org or some other repository?


It's a self spun RPM. follows the same procedures since earlier 7.x, with adding required includes as it went along. We spin this rpm with our slon package together. 

Thanks Tomas 

Tory Blue

Re: 9.3 performance issues, lots of bind and parse log entries

От
Tomas Vondra
Дата:
On 5.11.2014 20:16, Tory M Blue wrote:
>
> Thanks Thomas,
>
>
>     On 4.11.2014 21:07, Tory M Blue wrote:
>     > Well after fighting this all day and dealing with a really sluggish db
>     > where even my slon processes were taking several seconds, I reduced my
>     > shared_buffers back to 2GB from 10GB and my work_mem from 7.5GB to
>     2GB.
>     > i actually undid all my changes, including dropping my effective_cache
>     > back to 7GB  and restarted.
>
>     Have you been using the same parameter values on 9.2, or have you bumped
>     them up only on the new 9.3? I'm wondering whether 9.2 was performing
>     better with the values?
>
>
> Things seem to have been running better on 9.2 at this point I'm using
> the same config file from 9.2 and I'm still experiencing slowness under
> heavier write access. And my disk subsystem has not changed. Hardware
> has not changed, heck i'm even running the old version of slony (have
> not upgraded it yet).

So with shared_buffers=10GB and work_mem=7.5GB you saw significant
slowdown both for read and write queries, and after reverting to lower
values the read queries are OK but writes still take much longer?

> But since the upgrade to 9.3 even calls to my sl_log tables which
> are tiny can take:
>
> 2014-11-04 02:58:40 PST clsdb postgres 10.13.200.242(52022) 21642
> 2014-11-04 02:58:40.515 PSTLOG:  duration: 1627.019 ms  statement: fetch
> 500 from LOG;  (log had 145K items).
>
>     > I have 300 connections configured, we will use around 87 normally
>     > with some spikes, but I'm wondering if the 10GB shared memory caused
>     > me some grief, I don't believe it was the work_mem and don't believe
>     > it was the effective cache, but something caused my DB to run into
>     > issues with basic queries, same queries after restart are finishing
>     > in milliseconds instead of 2-3 seconds. No disk issues seen,.
>
>     I assume only some of the connections will be active (running queries)
>     at the same time. If you expect >> 32 active queries at the same time,
>     you're only increasing latency.
>
>     Based on your description I assume you're CPU bound (otherwise the
>     machine would not get "hotter", and planning is not about I/O).
>
>     I'm not sure if this is a production machine or how much you can
>     experiment with it, but it'd be helpful if you could provide some
>     profiling information
>
>        $ iostat -x -k 1
>        $ vmstat 1
>
>     and such data. A perf profile would be even better, but to get the most
>     useful info it may be necessary to recompile the postgres with debug
>     info and '-fno-omit-frame-pointer'. Then this should do the trick:
>
>       perf record -a -g (for a few seconds, then Ctrl-C)
>       perf report
>
>     or just "perf top" to see what functions are at the top.
>
>
> This is a production server, but it was not really CPU bound with 9.2
> so something is odd and I'm starting to stress, because it is a
> production environment :)

Yeah, I was talking about the 9.3 - that's clearly CPU bound.

> Connections correct, I have less than 20 or so active requests at a
> time, but i would say active queries are in the handful. I was was not
> seeing IO, but was seeing load increase as queries started taking
> longer, but nothing in iostat or vmstat/free showed any contention. Heck
> even Top while showed some cores as busy, nothing was sitting at over
> 60% utilized. And we are talking a load of 12-14 here on a 32 core
> system, when it's normally asleep!

Right. That's consistent with being CPU bound.

> This is my master slon insert server, so I can run commands, tweak
> configs but any type of rebuild or restart of postgres is a
> scheduled affair.

OK, understood. That however mostly rules out recompiling with debug
info and frame pointers, as that might make it significantly slower.
That's not something you'd like to do on production.

> These work loads that seem to be creating the issues run between
> midnight and now almost 6am, prior to 9.3 it was taking maybe 4
> hours, now it's taking 6. So tomorrow AM , I'll grab some stats when
> I see that it's struggling.
>
> But even now with almost no connections or really any major access
> i'm seeing the sl_log grab 500 rows take 1-3 seconds, which is just
> plain silly (but it's not a constant, so I may see 1 of these alerts
> every hour)

Is that plain "SELECT * FROM sl_log" or something more complex? When you
do explain analyze on the query, what you see?

Tomas


Re: 9.3 performance issues, lots of bind and parse log entries

От
Josh Berkus
Дата:
Tory,

Do you know if your workload involves a lot of lock-blocking,
particularly blocking on locks related to FKs?  I'm tracing down a
problem which sounds similar to yours.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com