Обсуждение: Queries intermittently slow

Поиск
Список
Период
Сортировка

Queries intermittently slow

От
Scott Rankin
Дата:
Hi all,

I’m trying to track down why some queries on my database system are intermittently much slower than usual.  I have some queries that run, on average, 2-3ms, and they run at a rate of about 10-20 queries/second.  However, every 3-5 seconds, one of the queries will be 500-100ms.  This is making the average query time turn out to be closer to 20ms, with a very large standard deviation.  

This happens to a number of otherwise very fast queries, and I’m trying to trace the reason.  I’ve turned on lock logging and checkpoint logging, and this behavior happens whether or not a checkpoint is occurring.  There are no lock waits happening in the system either.  

Here’s my info: 

- PostgreSQL 9.2.9 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit
- Installed from Postgresql Yum repo
- Changed Settings:

"application_name";"pgAdmin III - Query Tool";"client"
"archive_command";"/usr/crsinc/bin/wal_processor.sh %f %p";"configuration file"
"archive_mode";"on";"configuration file"
"autovacuum_analyze_scale_factor";"0.1";"configuration file"
"autovacuum_analyze_threshold";"500";"configuration file"
"autovacuum_naptime";"10min";"configuration file"
"autovacuum_vacuum_cost_delay";"50ms";"configuration file"
"autovacuum_vacuum_cost_limit";"250";"configuration file"
"autovacuum_vacuum_scale_factor";"0.1";"configuration file"
"autovacuum_vacuum_threshold";"1000";"configuration file"
"bgwriter_delay";"100ms";"configuration file"
"bgwriter_lru_maxpages";"1000";"configuration file"
"bytea_output";"escape";"session"
"checkpoint_completion_target";"0.7";"configuration file"
"checkpoint_segments";"128";"configuration file"
"checkpoint_timeout";"30min";"configuration file"
"checkpoint_warning";"895s";"configuration file"
"client_encoding";"UNICODE";"session"
"client_min_messages";"notice";"session"
"cpu_tuple_cost";"0.001";"configuration file"
"DateStyle";"ISO, MDY";"session"
"default_text_search_config";"pg_catalog.english";"configuration file"
"effective_cache_size";"94GB";"configuration file"
"effective_io_concurrency";"100";"configuration file"
"hot_standby";"off";"configuration file"
"hot_standby_feedback";"off";"configuration file"
"lc_messages";"en_US.UTF-8";"configuration file"
"lc_monetary";"en_US.UTF-8";"configuration file"
"lc_numeric";"en_US.UTF-8";"configuration file"
"lc_time";"en_US.UTF-8";"configuration file"
"listen_addresses";"*";"configuration file"
"log_checkpoints";"on";"configuration file"
"log_destination";"stderr";"configuration file"
"log_directory";"pg_log";"configuration file"
"log_filename";"postgresql-%a.log";"configuration file"
"log_line_prefix";"< user=%u db=%d host=%h time=%t pid=%p xid=%x>";"configuration file"
"log_lock_waits";"on";"configuration file"
"log_min_duration_statement";"100ms";"configuration file"
"log_rotation_age";"1d";"configuration file"
"log_rotation_size";"0";"configuration file"
"log_temp_files";"0";"configuration file"
"log_truncate_on_rotation";"on";"configuration file"
"logging_collector";"on";"configuration file"
"maintenance_work_mem";"6047MB";"configuration file"
"max_connections";"1300";"configuration file"
"max_stack_depth";"2MB";"environment variable"
"max_wal_senders";"4";"configuration file"
"port";"5432";"command line"
"random_page_cost";"1.9";"configuration file"
"shared_buffers";"8GB";"configuration file"
"ssl";"on";"configuration file"
"ssl_renegotiation_limit";"0";"configuration file"
"standard_conforming_strings";"off";"configuration file"
"superuser_reserved_connections";"10";"configuration file"
"TimeZone";"US/Pacific";"configuration file"
"wal_keep_segments";"64";"configuration file"
"wal_level";"hot_standby";"configuration file"
"work_mem";"188MB";"configuration file"

OS: CentOS release 6.7 (Linux databasep1.crsinc.com 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 06:48:29 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux). Our server has 24 cores and 128 GB of RAM.  

This happens via psql, JDBC, PDO connections.  

Thanks in advance for any help!
Scott

This email message contains information that Motus, LLC considers confidential and/or proprietary, or may later designate as confidential and proprietary. It is intended only for use of the individual or entity named above and should not be forwarded to any other persons or entities without the express consent of Motus, LLC, nor should it be used for any purpose other than in the course of any potential or actual business relationship with Motus, LLC. If the reader of this message is not the intended recipient, or the employee or agent responsible to deliver it to the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this communication in error, please notify sender immediately and destroy the original message.

Internal Revenue Service regulations require that certain types of written advice include a disclaimer. To the extent the preceding message contains advice relating to a Federal tax issue, unless expressly stated otherwise the advice is not intended or written to be used, and it cannot be used by the recipient or any other taxpayer, for the purpose of avoiding Federal tax penalties, and was not written to support the promotion or marketing of any transaction or matter discussed herein.

Re: Queries intermittently slow

От
Tom Lane
Дата:
Scott Rankin <srankin@motus.com> writes:
> I’m trying to track down why some queries on my database system are intermittently much slower than usual.  I have
somequeries that run, on average, 2-3ms, and they run at a rate of about 10-20 queries/second.  However, every 3-5
seconds,one of the queries will be 500-100ms.  This is making the average query time turn out to be closer to 20ms,
witha very large standard deviation. 

> This happens to a number of otherwise very fast queries, and I’m trying to trace the reason.  I’ve turned on lock
loggingand checkpoint logging, and this behavior happens whether or not a checkpoint is occurring.  There are no lock
waitshappening in the system either. 

I doubt you've proved that --- log_lock_waits will only report on waits
longer than deadlock_timeout, which you don't appear to have changed from
its default of 1 sec.  If you're trying to capture events that last a few
hundred msec, you're going to need to reduce deadlock_timeout to maybe
100ms.

It would help to know more about what the queries are, too.  The cause
might be something like GIN index pending-list cleanup but we can't tell
on the basis of this much info.

            regards, tom lane


Re: Queries intermittently slow

От
Scott Rankin
Дата:
On 1/6/16, 10:19 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:


>Scott Rankin <srankin@motus.com> writes:
>> I’m trying to track down why some queries on my database system are intermittently much slower than usual.  I have
somequeries that run, on average, 2-3ms, and they run at a rate of about 10-20 queries/second.  However, every 3-5
seconds,one of the queries will be 500-100ms.  This is making the average query time turn out to be closer to 20ms,
witha very large standard deviation.
 
>
>> This happens to a number of otherwise very fast queries, and I’m trying to trace the reason.  I’ve turned on lock
loggingand checkpoint logging, and this behavior happens whether or not a checkpoint is occurring.  There are no lock
waitshappening in the system either.
 
>
>I doubt you've proved that --- log_lock_waits will only report on waits
>longer than deadlock_timeout, which you don't appear to have changed from
>its default of 1 sec.  If you're trying to capture events that last a few
>hundred msec, you're going to need to reduce deadlock_timeout to maybe
>100ms.
>
>It would help to know more about what the queries are, too.  The cause
>might be something like GIN index pending-list cleanup but we can't tell
>on the basis of this much info.
>
>regards, tom lane

Hi Tom,

Thanks for the update. The query in question is a pretty simple one - it joins 3 tables, all of which are static - they
don’thave any writes being done against them.  They have very few rows, and the query plan for them indicates that they
areall sequential scans.  When doing an EXPLAIN ANALYZE, the delay is not consistently on one table, it can vary
betweenthe three tables involved in the query.
 

Here is the query plan for a fast run:

"Nested Loop  (cost=0.00..4.20 rows=1 width=185) (actual time=0.069..0.069 rows=0 loops=1)"
"  Join Filter: (be.branding_id = b.branding_id)"
"  Rows Removed by Join Filter: 1"
"  ->  Nested Loop  (cost=0.00..3.19 rows=1 width=189) (actual time=0.040..0.057 rows=1 loops=1)"
"        Join Filter: (s.setting_id = be.setting_id)"
"        Rows Removed by Join Filter: 41"
"        ->  Seq Scan on branding_extension be  (cost=0.00..1.00 rows=1 width=8) (actual time=0.008..0.008 rows=1
loops=1)"
"        ->  Seq Scan on setting s  (cost=0.00..2.04 rows=42 width=185) (actual time=0.004..0.018 rows=42 loops=1)"
"  ->  Seq Scan on branding b  (cost=0.00..1.01 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)"
"        Filter: ((name)::text = 'crs'::text)"
"        Rows Removed by Filter: 1"
"Total runtime: 0.150 ms"

And for a slow one:


"Nested Loop  (cost=0.00..4.20 rows=1 width=185) (actual time=383.862..383.862 rows=0 loops=1)"
"  Join Filter: (be.branding_id = b.branding_id)"
"  Rows Removed by Join Filter: 1"
"  ->  Nested Loop  (cost=0.00..3.19 rows=1 width=189) (actual time=383.815..383.849 rows=1 loops=1)"
"        Join Filter: (s.setting_id = be.setting_id)"
"        Rows Removed by Join Filter: 41"
"        ->  Seq Scan on branding_extension be  (cost=0.00..1.00 rows=1 width=8) (actual time=383.776..383.776 rows=1
loops=1)"
"        ->  Seq Scan on setting s  (cost=0.00..2.04 rows=42 width=185) (actual time=0.005..0.037 rows=42 loops=1)"
"  ->  Seq Scan on branding b  (cost=0.00..1.01 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)"
"        Filter: ((name)::text = 'crs'::text)"
"        Rows Removed by Filter: 1"
"Total runtime: 383.920 ms"

I will look at changing the deadlock_timeout, but that might have to wait for the weekend since this is a production
system.


Thanks,
Scott
This email message contains information that Motus, LLC considers confidential and/or proprietary, or may later
designateas confidential and proprietary. It is intended only for use of the individual or entity named above and
shouldnot be forwarded to any other persons or entities without the express consent of Motus, LLC, nor should it be
usedfor any purpose other than in the course of any potential or actual business relationship with Motus, LLC. If the
readerof this message is not the intended recipient, or the employee or agent responsible to deliver it to the intended
recipient,you are hereby notified that any dissemination, distribution, or copying of this communication is strictly
prohibited.If you have received this communication in error, please notify sender immediately and destroy the original
message.

Internal Revenue Service regulations require that certain types of written advice include a disclaimer. To the extent
thepreceding message contains advice relating to a Federal tax issue, unless expressly stated otherwise the advice is
notintended or written to be used, and it cannot be used by the recipient or any other taxpayer, for the purpose of
avoidingFederal tax penalties, and was not written to support the promotion or marketing of any transaction or matter
discussedherein.
 

Re: Queries intermittently slow

От
Tom Lane
Дата:
Scott Rankin <srankin@motus.com> writes:
> Thanks for the update. The query in question is a pretty simple one - it joins 3 tables, all of which are static -
theydon’t have any writes being done against them.  They have very few rows, and the query plan for them indicates that
theyare all sequential scans.  When doing an EXPLAIN ANALYZE, the delay is not consistently on one table, it can vary
betweenthe three tables involved in the query. 

No writes at all?  That's pretty odd then.  All the likely explanations
involve autovacuum or other forms of deferred maintenance.

A possible theory is that the slow cases represent times when the desired
page is not in cache, but you'd have to have a seriously overloaded disk
subsystem for a disk fetch to take hundreds of ms.  Unless maybe this is
running on some cloud service with totally unspecified I/O bandwidth?

> I will look at changing the deadlock_timeout, but that might have to wait for the weekend since this is a production
system.

You needn't restart the server for that, just edit postgresql.conf and
SIGHUP the postmaster.

            regards, tom lane


Re: Queries intermittently slow

От
Scott Rankin
Дата:
On 1/6/16, 10:38 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

>
>A possible theory is that the slow cases represent times when the desired
>page is not in cache, but you'd have to have a seriously overloaded disk
>subsystem for a disk fetch to take hundreds of ms.  Unless maybe this is
>running on some cloud service with totally unspecified I/O bandwidth?

This intrigues me. We are running on a, shall we say, less than name-brand cloud provider at the moment (transitioning
toAWS later this month).  Is there a reasonably straightforward way of confirming this hypothesis? We have had many
performanceissues with this vendor in the past, so I wouldn’t be surprised.
 

>
>> I will look at changing the deadlock_timeout, but that might have to wait for the weekend since this is a production
system.
>
>You needn't restart the server for that, just edit postgresql.conf and
>SIGHUP the postmaster.

Yep, we just try to limit any database changes until off-hours unless it’s an emergency.

>
>regards, tom lane

Thanks,
Scott
This email message contains information that Motus, LLC considers confidential and/or proprietary, or may later
designateas confidential and proprietary. It is intended only for use of the individual or entity named above and
shouldnot be forwarded to any other persons or entities without the express consent of Motus, LLC, nor should it be
usedfor any purpose other than in the course of any potential or actual business relationship with Motus, LLC. If the
readerof this message is not the intended recipient, or the employee or agent responsible to deliver it to the intended
recipient,you are hereby notified that any dissemination, distribution, or copying of this communication is strictly
prohibited.If you have received this communication in error, please notify sender immediately and destroy the original
message.

Internal Revenue Service regulations require that certain types of written advice include a disclaimer. To the extent
thepreceding message contains advice relating to a Federal tax issue, unless expressly stated otherwise the advice is
notintended or written to be used, and it cannot be used by the recipient or any other taxpayer, for the purpose of
avoidingFederal tax penalties, and was not written to support the promotion or marketing of any transaction or matter
discussedherein.
 

Re: Queries intermittently slow

От
Tom Lane
Дата:
Scott Rankin <srankin@motus.com> writes:
> On 1/6/16, 10:38 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> A possible theory is that the slow cases represent times when the desired
>> page is not in cache, but you'd have to have a seriously overloaded disk
>> subsystem for a disk fetch to take hundreds of ms.  Unless maybe this is
>> running on some cloud service with totally unspecified I/O bandwidth?

> This intrigues me. We are running on a, shall we say, less than name-brand cloud provider at the moment
(transitioningto AWS later this month).  Is there a reasonably straightforward way of confirming this hypothesis? We
havehad many performance issues with this vendor in the past, so I wouldn’t be surprised. 

Hm, well, given that you are able to capture instances of the behavior
in EXPLAIN ANALYZE, I'd suggest trying EXPLAIN (ANALYZE,BUFFERS).
That will tell you the number of pages it found in shared buffers vs.
having to read them.  Now, a "read" just means we had to ask the kernel,
not necessarily that the page came all the way from disk; if it's in
the kernel's disk cache that won't be very much slower than a shared-
buffers hit.  Still, if the slowdowns are reliably seen only when a read
occurred, I'd say that's good evidence.

            regards, tom lane


Re: Queries intermittently slow

От
Scott Rankin
Дата:
>
>Hm, well, given that you are able to capture instances of the behavior
>in EXPLAIN ANALYZE, I'd suggest trying EXPLAIN (ANALYZE,BUFFERS).
>That will tell you the number of pages it found in shared buffers vs.
>having to read them.  Now, a "read" just means we had to ask the kernel,
>not necessarily that the page came all the way from disk; if it's in
>the kernel's disk cache that won't be very much slower than a shared-
>buffers hit.  Still, if the slowdowns are reliably seen only when a read
>occurred, I'd say that's good evidence.

Well, that was a good thought - but:

Nested Loop  (cost=0.00..4.20 rows=1 width=185) (actual time=567.113..567.113 rows=0 loops=1)
  Join Filter: (be.branding_id = b.branding_id)
  Rows Removed by Join Filter: 1
  Buffers: shared hit=4
  ->  Nested Loop  (cost=0.00..3.19 rows=1 width=189) (actual time=567.081..567.092 rows=1 loops=1)
        Join Filter: (s.setting_id = be.setting_id)
        Rows Removed by Join Filter: 41
        Buffers: shared hit=3
        ->  Seq Scan on branding_extension be  (cost=0.00..1.00 rows=1 width=8) (actual time=0.011..0.012 rows=1
loops=1)
              Buffers: shared hit=1
        ->  Seq Scan on setting s  (cost=0.00..2.04 rows=42 width=185) (actual time=567.041..567.049 rows=42 loops=1)
              Buffers: shared hit=2
  ->  Seq Scan on branding b  (cost=0.00..1.01 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)
        Filter: ((name)::text = 'crs'::text)
        Rows Removed by Filter: 1
        Buffers: shared hit=1
Total runtime: 567.185 ms


I ran it several times and there were no reads, just shared hits, even on slow executions.

It looks like it didn’t go out to the kernel for the data. I guess next steps would be to adjust the deadlock_timeout?
IfI’m reading the docs correctly, this doesn’t change how Postgres determines a deadlock, it just makes it check for
themsooner?
 

Thanks,
Scott
This email message contains information that Motus, LLC considers confidential and/or proprietary, or may later
designateas confidential and proprietary. It is intended only for use of the individual or entity named above and
shouldnot be forwarded to any other persons or entities without the express consent of Motus, LLC, nor should it be
usedfor any purpose other than in the course of any potential or actual business relationship with Motus, LLC. If the
readerof this message is not the intended recipient, or the employee or agent responsible to deliver it to the intended
recipient,you are hereby notified that any dissemination, distribution, or copying of this communication is strictly
prohibited.If you have received this communication in error, please notify sender immediately and destroy the original
message.

Internal Revenue Service regulations require that certain types of written advice include a disclaimer. To the extent
thepreceding message contains advice relating to a Federal tax issue, unless expressly stated otherwise the advice is
notintended or written to be used, and it cannot be used by the recipient or any other taxpayer, for the purpose of
avoidingFederal tax penalties, and was not written to support the promotion or marketing of any transaction or matter
discussedherein.
 

Re: Queries intermittently slow

От
Tom Lane
Дата:
I wrote:
> A possible theory is that the slow cases represent times when the desired
> page is not in cache, but you'd have to have a seriously overloaded disk
> subsystem for a disk fetch to take hundreds of ms.  Unless maybe this is
> running on some cloud service with totally unspecified I/O bandwidth?

BTW, a glaring flaw in that theory is that if this query is touching only
about four pages worth of data, and you are running it ten times a second,
how in heck would that data ever fall out of shared buffer cache at all?
Your working set across your whole DB would have to be enormously more
than your 8GB shared_buffers setting for that to possibly happen.

So what seems more likely after more thought is that the pages are staying
in our shared buffer arena just fine, but the kernel is randomly choosing
to swap out parts of the arena, and the delays correspond to swap-in
waits.  (There would still have to be a mighty crummy disk subsystem
underlying things for swap-in to take so long, but this is a more
plausible theory for exactly what's invoking the disk read.)

Postgres can't directly see when this is happening, but you could try
watching "iostat 1" and noticing whether swap-in events seem to be
correlated with the slow queries.

If this is the problem, then the answer is to reduce the pressure on
system memory so that swap-outs are less likely.  You might find that
a smaller shared_buffer arena is a good thing (so that all of it stays
"hot" and unswappable from the kernel's perspective).  Or reduce the
number of active backend processes.

            regards, tom lane


Re: Queries intermittently slow

От
Scott Rankin
Дата:
>
>So what seems more likely after more thought is that the pages are staying
>in our shared buffer arena just fine, but the kernel is randomly choosing
>to swap out parts of the arena, and the delays correspond to swap-in
>waits.  (There would still have to be a mighty crummy disk subsystem
>underlying things for swap-in to take so long, but this is a more
>plausible theory for exactly what's invoking the disk read.)
>
>Postgres can't directly see when this is happening, but you could try
>watching "iostat 1" and noticing whether swap-in events seem to be
>correlated with the slow queries.
>
>If this is the problem, then the answer is to reduce the pressure on
>system memory so that swap-outs are less likely.  You might find that
>a smaller shared_buffer arena is a good thing (so that all of it stays
>"hot" and unswappable from the kernel's perspective).  Or reduce the
>number of active backend processes.
>
>regards, tom lane

One change to my setup that my server guy pointed out - this database server is *not* a cloud server, but in fact a
physicalDell box with SSDs.  So unless something is majorly wrong with our disk setup, disk IO should not be a factor.
 

I also checked and the box is not swapping at all.

I guess we’re back to lock contention?

Thanks,
Scott
This email message contains information that Motus, LLC considers confidential and/or proprietary, or may later
designateas confidential and proprietary. It is intended only for use of the individual or entity named above and
shouldnot be forwarded to any other persons or entities without the express consent of Motus, LLC, nor should it be
usedfor any purpose other than in the course of any potential or actual business relationship with Motus, LLC. If the
readerof this message is not the intended recipient, or the employee or agent responsible to deliver it to the intended
recipient,you are hereby notified that any dissemination, distribution, or copying of this communication is strictly
prohibited.If you have received this communication in error, please notify sender immediately and destroy the original
message.

Internal Revenue Service regulations require that certain types of written advice include a disclaimer. To the extent
thepreceding message contains advice relating to a Federal tax issue, unless expressly stated otherwise the advice is
notintended or written to be used, and it cannot be used by the recipient or any other taxpayer, for the purpose of
avoidingFederal tax penalties, and was not written to support the promotion or marketing of any transaction or matter
discussedherein.
 

Re: Queries intermittently slow

От
Jim Nasby
Дата:
On 1/6/16 12:01 PM, Scott Rankin wrote:
> I guess we’re back to lock contention?

Is there by chance an anti-wraparound vacuum happening on that table?

Actually, for that matter... if autovacuum is hitting that table it's
locking could be causing problems, and it won't release it's locks until
the deadlock detector kicks in.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: Queries intermittently slow

От
Scott Rankin
Дата:
On 1/6/16, 11:14 PM, "Jim Nasby" <Jim.Nasby@BlueTreble.com> wrote:


>On 1/6/16 12:01 PM, Scott Rankin wrote:
>> I guess we’re back to lock contention?
>
>Is there by chance an anti-wraparound vacuum happening on that table?
>
>Actually, for that matter... if autovacuum is hitting that table it's
>locking could be causing problems, and it won't release it's locks until
>the deadlock detector kicks in.
>--
>Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
>Experts in Analytics, Data Architecture and PostgreSQL
>Data in Trouble? Get it in Treble! http://BlueTreble.com

Hi Jim, Tom,

Thanks both of you for your help, by the way.

I lowered the deadlock_timeout this morning to 100ms and ran for a little while, but I did not see any lock waits even
whenthe statements ran slowly.
 

Jim, to your point, I enabled auto vacuum logging (log_autovacuum_min_duration=0) but no vacuums are taking place even
whenI see slow queries.
 

Any other ideas?

Thanks,
Scott
This email message contains information that Motus, LLC considers confidential and/or proprietary, or may later
designateas confidential and proprietary. It is intended only for use of the individual or entity named above and
shouldnot be forwarded to any other persons or entities without the express consent of Motus, LLC, nor should it be
usedfor any purpose other than in the course of any potential or actual business relationship with Motus, LLC. If the
readerof this message is not the intended recipient, or the employee or agent responsible to deliver it to the intended
recipient,you are hereby notified that any dissemination, distribution, or copying of this communication is strictly
prohibited.If you have received this communication in error, please notify sender immediately and destroy the original
message.

Internal Revenue Service regulations require that certain types of written advice include a disclaimer. To the extent
thepreceding message contains advice relating to a Federal tax issue, unless expressly stated otherwise the advice is
notintended or written to be used, and it cannot be used by the recipient or any other taxpayer, for the purpose of
avoidingFederal tax penalties, and was not written to support the promotion or marketing of any transaction or matter
discussedherein.
 

Re: Queries intermittently slow

От
Tom Lane
Дата:
Scott Rankin <srankin@motus.com> writes:
> Any other ideas?

If there's no lock waits, and the tables are static, it's difficult to
arrive at any other conclusion but that the process is simply losing the
CPU for long intervals.  What else is going on on this box?  Have you
made any attempt to correlate the slow queries with spikes in load
average, swap activity, etc?

            regards, tom lane


Re: Queries intermittently slow

От
Scott Rankin
Дата:
On 1/7/16, 10:19 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

>Scott Rankin <srankin@motus.com> writes:
>> Any other ideas?
>
>If there's no lock waits, and the tables are static, it's difficult to
>arrive at any other conclusion but that the process is simply losing the
>CPU for long intervals.  What else is going on on this box?  Have you
>made any attempt to correlate the slow queries with spikes in load
>average, swap activity, etc?
>
>regards, tom lane

This box is a dedicated postgresql box, the load is running around 2-3 (on a 24-core box).  No swapping.

I’ll keep digging!

Thanks,
Scott
This email message contains information that Motus, LLC considers confidential and/or proprietary, or may later
designateas confidential and proprietary. It is intended only for use of the individual or entity named above and
shouldnot be forwarded to any other persons or entities without the express consent of Motus, LLC, nor should it be
usedfor any purpose other than in the course of any potential or actual business relationship with Motus, LLC. If the
readerof this message is not the intended recipient, or the employee or agent responsible to deliver it to the intended
recipient,you are hereby notified that any dissemination, distribution, or copying of this communication is strictly
prohibited.If you have received this communication in error, please notify sender immediately and destroy the original
message.

Internal Revenue Service regulations require that certain types of written advice include a disclaimer. To the extent
thepreceding message contains advice relating to a Federal tax issue, unless expressly stated otherwise the advice is
notintended or written to be used, and it cannot be used by the recipient or any other taxpayer, for the purpose of
avoidingFederal tax penalties, and was not written to support the promotion or marketing of any transaction or matter
discussedherein.
 

Re: Queries intermittently slow

От
Scott Rankin
Дата:
Winner!  Both of those settings were set to always, and as soon as I turned them off, the query times smoothed right out. 

Thank you all so much for your help! 

- Scott

From: Bob Lunney <bob_lunney@yahoo.com>
Date: Thursday, January 7, 2016 at 12:13 PM
To: Scott Rankin <srankin@motus.com>
Cc: "pgsql-performance@postgresql.org" <pgsql-performance@postgresql.org>
Subject: Re: [PERFORM] Queries intermittently slow

Scott, 

Is transparent huge pages turned off along with defrag?  Check 

/sys/kernel/mm/transparent_hugepage/enabled
and
/sys/kernel/mm/transparent_hugepage/defrag
Both should be configured with "never".
Bob Lunney

This email message contains information that Motus, LLC considers confidential and/or proprietary, or may later designate as confidential and proprietary. It is intended only for use of the individual or entity named above and should not be forwarded to any other persons or entities without the express consent of Motus, LLC, nor should it be used for any purpose other than in the course of any potential or actual business relationship with Motus, LLC. If the reader of this message is not the intended recipient, or the employee or agent responsible to deliver it to the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this communication in error, please notify sender immediately and destroy the original message.

Internal Revenue Service regulations require that certain types of written advice include a disclaimer. To the extent the preceding message contains advice relating to a Federal tax issue, unless expressly stated otherwise the advice is not intended or written to be used, and it cannot be used by the recipient or any other taxpayer, for the purpose of avoiding Federal tax penalties, and was not written to support the promotion or marketing of any transaction or matter discussed herein.

Re: Queries intermittently slow

От
Rick Otten
Дата:
Is this generally true for all PostgreSQL systems on Linux, or only for specific use cases?

On Thu, Jan 7, 2016 at 12:51 PM, Scott Rankin <srankin@motus.com> wrote:
Winner!  Both of those settings were set to always, and as soon as I turned them off, the query times smoothed right out. 

Thank you all so much for your help! 

- Scott

From: Bob Lunney <bob_lunney@yahoo.com>
Date: Thursday, January 7, 2016 at 12:13 PM
To: Scott Rankin <srankin@motus.com>
Cc: "pgsql-performance@postgresql.org" <pgsql-performance@postgresql.org>
Subject: Re: [PERFORM] Queries intermittently slow

Scott, 

Is transparent huge pages turned off along with defrag?  Check 

/sys/kernel/mm/transparent_hugepage/enabled
and
/sys/kernel/mm/transparent_hugepage/defrag
Both should be configured with "never".
Bob Lunney

This email message contains information that Motus, LLC considers confidential and/or proprietary, or may later designate as confidential and proprietary. It is intended only for use of the individual or entity named above and should not be forwarded to any other persons or entities without the express consent of Motus, LLC, nor should it be used for any purpose other than in the course of any potential or actual business relationship with Motus, LLC. If the reader of this message is not the intended recipient, or the employee or agent responsible to deliver it to the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this communication in error, please notify sender immediately and destroy the original message.

Internal Revenue Service regulations require that certain types of written advice include a disclaimer. To the extent the preceding message contains advice relating to a Federal tax issue, unless expressly stated otherwise the advice is not intended or written to be used, and it cannot be used by the recipient or any other taxpayer, for the purpose of avoiding Federal tax penalties, and was not written to support the promotion or marketing of any transaction or matter discussed herein.


Re: Queries intermittently slow

От
Tom Lane
Дата:
Rick Otten <rottenwindfish@gmail.com> writes:
> On Thu, Jan 7, 2016 at 12:51 PM, Scott Rankin <srankin@motus.com> wrote:
>> Winner!  Both of those settings were set to always, and as soon as I
>> turned them off, the query times smoothed right out.

> Is this generally true for all PostgreSQL systems on Linux, or only for
> specific use cases?

It's fairly well established that the implementation of transparent
huge pages in Linux kernels from the 2.6-or-so era sucks, and you're
best off turning it off if you care about consistency of performance.
I am not sure whether modern kernels have improved this area.

I think you can get an idea of how big a problem you have by noting
the accumulated runtime of the khugepaged daemon.

(BTW, it would likely be a good thing to collect some current wisdom
in this area and add it to section 17.4 of our docs.)

            regards, tom lane


Re: Queries intermittently slow

От
Andres Freund
Дата:
On 2016-01-07 13:34:51 -0500, Tom Lane wrote:
> It's fairly well established that the implementation of transparent
> huge pages in Linux kernels from the 2.6-or-so era sucks, and you're
> best off turning it off if you care about consistency of performance.

I think the feature wasn't introduced in original 2.6 kernels (3.2 or
so?), but red hat had backported it to their 2.6.32 kernel.


> I am not sure whether modern kernels have improved this area.

I think the problem has largely been solved around 3.16. Around 4.1 I
could still  reproduce problems, but the regressions were only in the
sub 10% range in my test workload.


Andres


Re: Queries intermittently slow

От
Jim Nasby
Дата:
On 1/7/16 1:47 PM, Andres Freund wrote:
> On 2016-01-07 13:34:51 -0500, Tom Lane wrote:
>> It's fairly well established that the implementation of transparent
>> huge pages in Linux kernels from the 2.6-or-so era sucks, and you're
>> best off turning it off if you care about consistency of performance.
>
> I think the feature wasn't introduced in original 2.6 kernels (3.2 or
> so?), but red hat had backported it to their 2.6.32 kernel.
>
>
>> I am not sure whether modern kernels have improved this area.
>
> I think the problem has largely been solved around 3.16. Around 4.1 I
> could still  reproduce problems, but the regressions were only in the
> sub 10% range in my test workload.

BTW, looks like Scott blogged about this along with some nice graphs:
https://sdwr98.wordpress.com/2016/01/07/transparent-huge-pages-or-why-dbas-are-great/
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com