Discussion: Query help
All,
Not sure what's wrong with the execution plan below, but at times the query takes 5 minutes to complete, and after a while it runs within a second or two.
Here is the EXPLAIN ANALYZE output of the query.
SELECT OBJECTS.ID,OBJECTS.NAME,OBJECTS.TYPE,OBJECTS.STATUS,OBJECTS.ALTNAME,OBJECTS.DOMAINID,OBJECTS.ASSIGNEDTO,OBJECTS.USER1,OBJECTS.USER2,
OBJECTS.KEY1,OBJECTS.KEY2,OBJECTS.KEY3,OBJECTS.OUI,OBJECTS.PRODCLASS,OBJECTS.STATUS2,OBJECTS.LASTMODIFIED,OBJECTS.LONGDATA,OBJECTS.DATA0,
OBJECTS.DATA1
FROM OBJECTS
WHERE OBJECTS.DOMAINID IN ('HY3XGEzC0E9JxRwoXLOLbjNsghEA','3330000000000000000000000000')
AND OBJECTS.TYPE IN ('cpe')
ORDER BY OBJECTS.LASTMODIFIED DESC LIMIT 501
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..9235.11 rows=501 width=912) (actual time=0.396..2741.803 rows=501 loops=1)
-> Index Scan Backward using ix_objects_type_lastmodified on objects (cost=0.00..428372.71 rows=23239 width=912) (actual time=0.394..2741.608 rows=501 loops=1)
Index Cond: (("type")::text = 'cpe'::text)
Filter: ((domainid)::text = ANY (('{HY3XGEzC0E9JxRwoXLOLbjNsghEA,3330000000000000000000000000}'::character varying[])::text[]))
Total runtime: 2742.126 ms
The table is autovacuumed regularly. I have set log_min_messages to debug2, but nothing stands out during the times when the query took 5+ minutes. Is a rebuild of the index necessary here?
Thanks in Advance,
Stalin
Pg 8.2.7, Sol10.
"Subbiah Stalin-XCGF84" <SSubbiah@Motorola.com> wrote:

> Not sure what's wrong in below execution plan but at times the query
> runs for 5 minutes to complete and after a while it runs within a
> second or two.

The plan doesn't look entirely unreasonable for the given query, although it's hard to be sure of that without seeing the table definitions. Given the plan, the times look to be about what I'd expect for uncached and cached timings. (That is, on subsequent runs, the data is sitting in RAM, so you don't need to access the hard drives.)

If the initial run time is unacceptable for your environment, and there's no way to have the cache "primed" when it matters, please give more details on your table layouts, and perhaps someone can make a useful suggestion.

> Pg 8.2.7, Sol10.

One quick suggestion -- upgrade your PostgreSQL version if at all possible. The latest bug-fix version of 8.2 is currently 8.2.13, and there are significant performance improvements in 8.3 and the newly released 8.4.

-Kevin
Sure, I can provide those details. I have seen this query running 5+ minutes for different values of domainID too. It just happens at random and gets fixed within a few minutes.

Shared buffers = 8G, effective cache size = 4G. Optimizer/autovacuum settings are defaults.

           relname            | relpages | reltuples
------------------------------+----------+-----------
 ct_objects_id_u1             |    11906 |    671919
 ix_objects_altname           |    13327 |    671919
 ix_objects_domainid_name     |    24714 |    671919
 ix_objects_key3              |     9891 |    671919
 ix_objects_name              |    11807 |    671919
 ix_objects_type_lastmodified |    38640 |    671919
 ix_objects_user1             |    20796 |    671919
 ix_objects_user2             |    20842 |    671919
 objects                      |   111873 |    671919

This database resides on RAID 1+0 storage with 10 disks (5+5). Let me know if you need any other information.

Thanks Kevin.

Stalin
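For reference, a listing like the one above can be produced with a catalog query along these lines (a sketch; how the listing was actually generated is not shown in the thread):

```sql
-- Pages (8 KB each) and estimated row counts for the objects table
-- and its indexes, as tracked in pg_class.
SELECT c.relname, c.relpages, c.reltuples
FROM pg_class c
WHERE c.oid = 'objects'::regclass
   OR c.oid IN (SELECT indexrelid FROM pg_index
                WHERE indrelid = 'objects'::regclass)
ORDER BY c.relname;
```

Comparing relpages for an index here against a freshly rebuilt copy is a quick way to gauge bloat.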
"Subbiah Stalin-XCGF84" <SSubbiah@Motorola.com> wrote:

> Shared buffer=8G, effective cache size=4G.

That is odd; if your shared buffers are at 8G, you must have more than 4G of cache. How much RAM is used for cache at the OS level? Normally you would add that to the shared buffers to get your effective cache size, or at least take the larger of the two. How much RAM is on this machine in total? Do you have any other processes which use a lot of RAM or might access a lot of disk from time to time?

> Let me know if you need any other information.

The \d output for the objects table, or the CREATE for it and its indexes, would be good.

Since it's getting through the random reads by the current plan at the rate of about one every 5ms, I'd say your drive array is OK. If you want to make this query faster you've either got to have the data in cache or it has to have reason to believe that a different plan is faster. One thing which might help is to boost your work_mem setting to somewhere in the 32MB to 64MB range, provided that won't drive you into swapping. You could also try dropping the random_page_cost to maybe 2 to see if that gets you a different plan.

You can do a quick check on what plans these generate by changing them on a given connection and then requesting just an EXPLAIN of the plan, to see if it's different. (This doesn't actually run the query, so it's fast.)

-Kevin
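Kevin's quick check can be sketched like this; the settings apply only to the current session, and plain EXPLAIN never executes the query:

```sql
-- Session-local experiment: do these settings change the chosen plan?
SET work_mem = '64MB';
SET random_page_cost = 2;

EXPLAIN
SELECT id, name, type, status, lastmodified   -- abbreviated column list
FROM objects
WHERE domainid IN ('HY3XGEzC0E9JxRwoXLOLbjNsghEA',
                   '3330000000000000000000000000')
  AND type IN ('cpe')
ORDER BY lastmodified DESC
LIMIT 501;

-- Restore defaults afterwards.
RESET work_mem;
RESET random_page_cost;
```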
Server has 32G memory and it's dedicated to running PG; no other application is sharing this database. I have checked checkpoints and they don't occur during those slow query runtimes. checkpoint_segments is set to 128. Here is a quick snap from vmstat:

# vmstat 5 5
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr 1m 1m 1m m1   in   sy   cs us sy id
 0 0 0 56466032 25908072 59 94 516 13 13 0 0 10 3 59 1 480 443 500 1 1 98
 0 0 0 51377520 20294328 6 8 0 32 32 0 0 0 4 1 0 368 185 361 0 1 99
 0 0 0 56466032 25908064 59 94 516 13 13 0 0 1 10 3 59 480 443 500 1 1 98
 0 0 0 51376984 20294168 57 427 0 16 16 0 0 0 0 1 0 380 781 396 1 1 98
 0 0 0 51376792 20294208 112 1131 2 50 50 0 0 0 0 5 2 398 2210 541 4 3 92

\d output:

              Table "public.objects"
    Column    |            Type             | Modifiers
--------------+-----------------------------+-----------
 id           | character varying(28)       | not null
 name         | character varying(50)       | not null
 altname      | character varying(50)       |
 type         | character varying(3)        |
 domainid     | character varying(28)       | not null
 status       | smallint                    |
 dbver        | integer                     |
 created      | timestamp without time zone |
 lastmodified | timestamp without time zone |
 assignedto   | character varying(28)       |
 status2      | smallint                    |
 key1         | character varying(25)       |
 key2         | character varying(25)       |
 key3         | character varying(64)       |
 oui          | character varying(6)        |
 prodclass    | character varying(64)       |
 user1        | character varying(50)       |
 user2        | character varying(50)       |
 data0        | character varying(2000)     |
 data1        | character varying(2000)     |
 longdata     | character varying(1)        |
Indexes:
    "ct_objects_id_u1" PRIMARY KEY, btree (id), tablespace "nbbs_index_data"
    "ix_objects_altname" btree (altname), tablespace "nbbs_index_data"
    "ix_objects_domainid_name" btree (domainid, upper(name::text)), tablespace "nbbs_index_data"
    "ix_objects_key3" btree (upper(key3::text)), tablespace "nbbs_index_data"
    "ix_objects_name" btree (upper(name::text) varchar_pattern_ops), tablespace "nbbs_index_data"
    "ix_objects_type_lastmodified" btree ("type", lastmodified), tablespace "nbbs_index_data"
    "ix_objects_user1" btree (upper(user1::text)), tablespace "nbbs_index_data"
    "ix_objects_user2" btree (upper(user2::text)), tablespace "nbbs_index_data"

work_mem=64MB and random_page_cost=2 on the session gave a similar execution plan, with only the costs differing due to the changed random_page_cost:

                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..5456.11 rows=501 width=912)
   ->  Index Scan Backward using ix_objects_type_lastmodified on objects  (cost=0.00..253083.03 rows=23239 width=912)
         Index Cond: (("type")::text = 'cpe'::text)
         Filter: ((domainid)::text = ANY (('{HY3XGEzC0E9JxRwoXLOLbjNsghEA,3330000000000000000000000000}'::character varying[])::text[]))
(4 rows)

Given the nature of the ix_objects_type_lastmodified index, I wonder if it requires a rebuild. I tested rebuilding it in another db, and it came to 2500 pages as opposed to 38640 pages. The puzzle is why the same query with the same filters runs fast most of the time, but at times runs 5+ minutes and then switches back to fast mode. If it had used a different execution plan than the above, how do I list all execution plans executed for a given SQL?

Thanks,
Stalin
"Subbiah Stalin-XCGF84" <SSubbiah@Motorola.com> wrote:

> Server has 32G memory and it's a dedicated to run PG and no other
> application is sharing this database.

It's not likely to help with this particular problem, but it's generally best to start from a position of letting the optimizer know what it's really got for resources. An effective cache size of somewhere around 30GB would probably be best here.

> Given the nature of the ix_objects_type_lastmodified index,
> wondering if the index requires rebuilt. I tested rebuilding it in
> another db, and it came to 2500 pages as opposed to 38640 pages.

That's pretty serious bloat. Any idea how that happened? Have you had long-running database transactions which might have prevented normal maintenance from working? If not, you may need more aggressive settings for autovacuum. Anyway, sure, try this with the index rebuilt. If you don't want downtime, use CREATE INDEX CONCURRENTLY and then drop the old index. (You could then rename the new index to match the old, if needed.)

> The puzzle being why the same query with same filters, runs most of
> times faster but at times runs 5+ minutes and it switches back to
> fast mode.

It is likely either that something has pushed the relevant data out of cache before the slow runs, or there is blocking. How big is this database? Can you get a list of pg_stat_activity and pg_locks during an episode of slow run time?

> If it had used a different execution plan than the above, how do I
> list all execution plans executed for a given SQL.

It's unlikely that the slow runs are because of a different plan being chosen. I was wondering if a better plan might be available, but this one looks pretty good with your current indexes. I can think of an indexing change or two which *might* cause the optimizer to pick a different plan, but that is far from certain, and without knowing the cause of the occasional slow runs, it's hard to be sure that the new plan wouldn't get stalled for the same reasons.

If it's possible to gather more data during an episode of a slow run, particularly the pg_stat_activity and pg_locks lists, run as the database superuser, it would help pin down the cause. A vmstat during such an episode, to compare to a "normal" one, might also be instructive.

-Kevin
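During a slow episode, the two lists Kevin asks for can be captured with something like this (run as superuser; the column names match the 8.2-era catalogs — procpid and current_query were renamed in later releases):

```sql
-- Who is running what, and since when.
SELECT procpid, usename, query_start, current_query
FROM pg_stat_activity
ORDER BY query_start;

-- Locks held or awaited; ungranted locks indicate blocking.
SELECT locktype, relation::regclass AS relation, mode, granted, pid
FROM pg_locks
ORDER BY granted, pid;
```

Taking a vmstat sample at the same moment makes it possible to tell cache misses apart from lock waits.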
Thanks for the response, Kevin. DB size is about 30G. The bloat could have been due to recent load testing; autovacuum wasn't aggressive enough to keep up with it. I will rebuild those indexes and, if possible, reload the table itself, as it is bloated too. Sure, I will collect the necessary stats on the next occurrence of the slow query.

Stalin
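The no-downtime rebuild Kevin suggested would look roughly like this for the bloated index (the temporary index name is an illustration):

```sql
-- Build a fresh index without blocking writers (available since 8.2).
CREATE INDEX CONCURRENTLY ix_objects_type_lastmodified_new
    ON objects ("type", lastmodified)
    TABLESPACE "nbbs_index_data";

-- Once it has built successfully, swap it in.
DROP INDEX ix_objects_type_lastmodified;
ALTER INDEX ix_objects_type_lastmodified_new
    RENAME TO ix_objects_type_lastmodified;
```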
We have found the problem. Apparently there was a query doing a count on a 45-million-row table, run prior to the episode of the slow query; that definitely pushed the cached data out of memory. Is there a way to assign a portion of memory for recycling purposes, like in Oracle, so the cached data doesn't get affected by queries like these?

Stalin
On 8/5/09 12:16 PM, "Subbiah Stalin-XCGF84" <SSubbiah@Motorola.com> wrote:

> We have found the problem. Apparently there was a query doing count on
> 45 million rows table run prior to the episode of slow query. Definitely
> cached data is pushed out the memory. Is there way to assign portion of
> memory to recycling purposes like in oracle, so the cached data doesn't
> get affected by queries like these.
>
> Stalin

In Postgres 8.3 and above, large sequential scans don't evict other things from shared_buffers. But they can push things out of the OS page cache.
"Subbiah Stalin-XCGF84" <SSubbiah@Motorola.com> wrote:

> We have found the problem.

Great news!

> Apparently there was a query doing count on 45 million rows table
> run prior to the episode of slow query. Definitely cached data is
> pushed out the memory.

Yeah, that would completely explain your symptoms.

> Is there way to assign portion of memory to recycling purposes like
> in oracle, so the cached data doesn't get affected by queries like
> these.

Right now you have 8GB in shared buffers and 22GB in OS cache. You could try playing with that ratio, but benefit there would be iffy. To solve this particular problem, you might want to see if they actually need an exact count, versus a reasonable estimate. The estimated tuple count from the latest statistics run is often close enough (and a lot faster).

-Kevin
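Kevin's closing suggestion, an estimate instead of an exact count, can be sketched as follows; big_table is a stand-in, since the thread never names the 45-million-row table:

```sql
-- Exact count: scans every row of the table.
-- SELECT count(*) FROM big_table;

-- Planner's estimate from the last ANALYZE: nearly free, but only as
-- fresh as the statistics.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'big_table';
```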