Discussion: Query help
All,
Not sure what's wrong with the execution plan below, but at times the query takes 5 minutes to complete, and after a while it runs within a second or two.
Here is the EXPLAIN ANALYZE output of the query.
SELECT OBJECTS.ID,OBJECTS.NAME,OBJECTS.TYPE,OBJECTS.STATUS,OBJECTS.ALTNAME,OBJECTS.DOMAINID,OBJECTS.ASSIGNEDTO,OBJECTS.USER1,OBJECTS.USER2,
OBJECTS.KEY1,OBJECTS.KEY2,OBJECTS.KEY3,OBJECTS.OUI,OBJECTS.PRODCLASS,OBJECTS.STATUS2,OBJECTS.LASTMODIFIED,OBJECTS.LONGDATA,OBJECTS.DATA0,
OBJECTS.DATA1
FROM OBJECTS
WHERE OBJECTS.DOMAINID IN ('HY3XGEzC0E9JxRwoXLOLbjNsghEA','3330000000000000000000000000')
AND OBJECTS.TYPE IN ('cpe')
ORDER BY OBJECTS.LASTMODIFIED DESC LIMIT 501
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..9235.11 rows=501 width=912) (actual time=0.396..2741.803 rows=501 loops=1)
-> Index Scan Backward using ix_objects_type_lastmodified on objects (cost=0.00..428372.71 rows=23239 width=912) (actual time=0.394..2741.608 rows=501 loops=1)
Index Cond: (("type")::text = 'cpe'::text)
Filter: ((domainid)::text = ANY (('{HY3XGEzC0E9JxRwoXLOLbjNsghEA,3330000000000000000000000000}'::character varying[])::text[]))
Total runtime: 2742.126 ms
The table is autovacuumed regularly. I have set log_min_messages to debug2, but nothing stands out during the times when the query took 5+ minutes. Is a rebuild of the index necessary here?
Thanks in Advance,
Stalin
Pg 8.2.7, Sol10.
"Subbiah Stalin-XCGF84" <SSubbiah@Motorola.com> wrote:

> Not sure what's wrong in below execution plan but at times the query
> runs for 5 minutes to complete and after a while it runs within a
> second or two.

The plan doesn't look entirely unreasonable for the given query, although it's hard to be sure of that without seeing the table definitions. Given the plan, the times look to be about what I'd expect for uncached and cached timings. (That is, on subsequent runs, the data is sitting in RAM, so you don't need to access the hard drives.)

If the initial run time is unacceptable for your environment, and there's no way to have the cache "primed" when it matters, please give more details on your table layouts, and perhaps someone can make a useful suggestion.

> Pg 8.2.7, Sol10.

One quick suggestion -- upgrade your PostgreSQL version if at all possible. The latest bug-fix version of 8.2 is currently 8.2.13, and there are significant performance improvements in 8.3 and the newly released 8.4.

-Kevin
Sure, I can provide those details. I have seen this query running 5+ minutes for different values of domainID too. It just happens at random and gets fixed within a few minutes.

Shared buffers = 8G, effective cache size = 4G. Optimizer/autovacuum settings are defaults.

           relname            | relpages | reltuples
------------------------------+----------+-----------
 ct_objects_id_u1             |    11906 |    671919
 ix_objects_altname           |    13327 |    671919
 ix_objects_domainid_name     |    24714 |    671919
 ix_objects_key3              |     9891 |    671919
 ix_objects_name              |    11807 |    671919
 ix_objects_type_lastmodified |    38640 |    671919
 ix_objects_user1             |    20796 |    671919
 ix_objects_user2             |    20842 |    671919
 objects                      |   111873 |    671919

This database resides on RAID 1+0 storage with 10 disks (5+5). Let me know if you need any other information.

Thanks Kevin.

Stalin
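For reference, a listing like the one above can be produced with a catalog query along these lines (a sketch; how the listing was actually generated is not shown in the thread):

```sql
-- Pages (8 KB each) and estimated row counts for the objects table
-- and its indexes, as tracked in pg_class.
SELECT c.relname, c.relpages, c.reltuples
FROM pg_class c
WHERE c.oid = 'objects'::regclass
   OR c.oid IN (SELECT indexrelid FROM pg_index
                WHERE indrelid = 'objects'::regclass)
ORDER BY c.relname;
```

Comparing relpages for an index here against a freshly rebuilt copy is a quick way to gauge bloat.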
"Subbiah Stalin-XCGF84" <SSubbiah@Motorola.com> wrote:

> Shared buffer=8G, effective cache size=4G.

That is odd; if your shared buffers are at 8G, you must have more than 4G of cache. How much RAM is used for cache at the OS level? Normally you would add that to the shared buffers to get your effective cache size, or at least take the larger of the two. How much RAM is on this machine in total? Do you have any other processes which use a lot of RAM or might access a lot of disk from time to time?

> Let me know if you need any other information.

The \d output for the objects table, or the CREATE for it and its indexes, would be good.

Since it's getting through the random reads by the current plan at the rate of about one every 5ms, I'd say your drive array is OK. If you want to make this query faster you've either got to have the data in cache or it has to have reason to believe that a different plan is faster. One thing which might help is to boost your work_mem setting to somewhere in the 32MB to 64MB range, provided that won't drive you into swapping. You could also try dropping the random_page_cost to maybe 2 to see if that gets you a different plan.

You can do a quick check on what plans these generate by changing them on a given connection and then requesting just an EXPLAIN of the plan, to see if it's different. (This doesn't actually run the query, so it's fast.)

-Kevin
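Kevin's quick check can be sketched like this; the settings apply only to the current session, and plain EXPLAIN never executes the query:

```sql
-- Session-local experiment: do these settings change the chosen plan?
SET work_mem = '64MB';
SET random_page_cost = 2;

EXPLAIN
SELECT id, name, type, status, lastmodified   -- abbreviated column list
FROM objects
WHERE domainid IN ('HY3XGEzC0E9JxRwoXLOLbjNsghEA',
                   '3330000000000000000000000000')
  AND type IN ('cpe')
ORDER BY lastmodified DESC
LIMIT 501;

-- Restore defaults afterwards.
RESET work_mem;
RESET random_page_cost;
```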
Server has 32G memory and it's dedicated to running PG; no other application is sharing this database. I have checked checkpoints and they don't occur during those slow query runtimes. checkpoint_segments is set to 128. Here is a quick snap from vmstat:

# vmstat 5 5
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr 1m 1m 1m m1   in   sy   cs us sy id
 0 0 0 56466032 25908072 59 94 516 13 13 0 0 10 3 59 1 480 443 500 1 1 98
 0 0 0 51377520 20294328 6 8 0 32 32 0 0 0 4 1 0 368 185 361 0 1 99
 0 0 0 56466032 25908064 59 94 516 13 13 0 0 1 10 3 59 480 443 500 1 1 98
 0 0 0 51376984 20294168 57 427 0 16 16 0 0 0 0 1 0 380 781 396 1 1 98
 0 0 0 51376792 20294208 112 1131 2 50 50 0 0 0 0 5 2 398 2210 541 4 3 92

\d output:

              Table "public.objects"
    Column    |            Type             | Modifiers
--------------+-----------------------------+-----------
 id           | character varying(28)       | not null
 name         | character varying(50)       | not null
 altname      | character varying(50)       |
 type         | character varying(3)        |
 domainid     | character varying(28)       | not null
 status       | smallint                    |
 dbver        | integer                     |
 created      | timestamp without time zone |
 lastmodified | timestamp without time zone |
 assignedto   | character varying(28)       |
 status2      | smallint                    |
 key1         | character varying(25)       |
 key2         | character varying(25)       |
 key3         | character varying(64)       |
 oui          | character varying(6)        |
 prodclass    | character varying(64)       |
 user1        | character varying(50)       |
 user2        | character varying(50)       |
 data0        | character varying(2000)     |
 data1        | character varying(2000)     |
 longdata     | character varying(1)        |
Indexes:
    "ct_objects_id_u1" PRIMARY KEY, btree (id), tablespace "nbbs_index_data"
    "ix_objects_altname" btree (altname), tablespace "nbbs_index_data"
    "ix_objects_domainid_name" btree (domainid, upper(name::text)), tablespace "nbbs_index_data"
    "ix_objects_key3" btree (upper(key3::text)), tablespace "nbbs_index_data"
    "ix_objects_name" btree (upper(name::text) varchar_pattern_ops), tablespace "nbbs_index_data"
    "ix_objects_type_lastmodified" btree ("type", lastmodified), tablespace "nbbs_index_data"
    "ix_objects_user1" btree (upper(user1::text)), tablespace "nbbs_index_data"
    "ix_objects_user2" btree (upper(user2::text)), tablespace "nbbs_index_data"

work_mem=64MB and random_page_cost=2 on the session gave a similar execution plan, with only the costs differing due to the changed random_page_cost:

                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..5456.11 rows=501 width=912)
   ->  Index Scan Backward using ix_objects_type_lastmodified on objects  (cost=0.00..253083.03 rows=23239 width=912)
         Index Cond: (("type")::text = 'cpe'::text)
         Filter: ((domainid)::text = ANY (('{HY3XGEzC0E9JxRwoXLOLbjNsghEA,3330000000000000000000000000}'::character varying[])::text[]))
(4 rows)

Given the nature of the ix_objects_type_lastmodified index, I wonder if it requires a rebuild. I tested rebuilding it in another db, and it came to 2500 pages as opposed to 38640 pages. The puzzle is why the same query with the same filters runs fast most of the time, but at times runs 5+ minutes and then switches back to fast mode. If it had used a different execution plan than the above, how do I list all execution plans executed for a given SQL?

Thanks,
Stalin
"Subbiah Stalin-XCGF84" <SSubbiah@Motorola.com> wrote:

> Server has 32G memory and it's a dedicated to run PG and no other
> application is sharing this database.

It's not likely to help with this particular problem, but it's generally best to start from a position of letting the optimizer know what it's really got for resources. An effective cache size of somewhere around 30GB would probably be best here.

> Given the nature of the ix_objects_type_lastmodified index,
> wondering if the index requires rebuilt. I tested rebuilding it in
> another db, and it came to 2500 pages as opposed to 38640 pages.

That's pretty serious bloat. Any idea how that happened? Have you had long-running database transactions which might have prevented normal maintenance from working? If not, you may need more aggressive settings for autovacuum. Anyway, sure, try this with the index rebuilt. If you don't want downtime, use CREATE INDEX CONCURRENTLY and then drop the old index. (You could then rename the new index to match the old, if needed.)

> The puzzle being why the same query with same filters, runs most of
> times faster but at times runs 5+ minutes and it switches back to
> fast mode.

It is likely either that something has pushed the relevant data out of cache before the slow runs, or there is blocking. How big is this database? Can you get a list of pg_stat_activity and pg_locks during an episode of slow run time?

> If it had used a different execution plan than the above, how do I
> list all execution plans executed for a given SQL.

It's unlikely that the slow runs are because of a different plan being chosen. I was wondering if a better plan might be available, but this one looks pretty good with your current indexes. I can think of an indexing change or two which *might* cause the optimizer to pick a different plan, but that is far from certain, and without knowing the cause of the occasional slow runs, it's hard to be sure that the new plan wouldn't get stalled for the same reasons.

If it's possible to gather more data during an episode of a slow run, particularly the pg_stat_activity and pg_locks lists, run as the database superuser, it would help pin down the cause. A vmstat during such an episode, to compare to a "normal" one, might also be instructive.

-Kevin
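During a slow episode, the two lists Kevin asks for can be captured with something like this (run as superuser; the column names match the 8.2-era catalogs — procpid and current_query were renamed in later releases):

```sql
-- Who is running what, and since when.
SELECT procpid, usename, query_start, current_query
FROM pg_stat_activity
ORDER BY query_start;

-- Locks held or awaited; ungranted locks indicate blocking.
SELECT locktype, relation::regclass AS relation, mode, granted, pid
FROM pg_locks
ORDER BY granted, pid;
```

Taking a vmstat sample at the same moment makes it possible to tell cache misses apart from lock waits.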
Thanks for the response, Kevin. DB size is about 30G. The bloat could have been due to recent load testing; autovacuum wasn't aggressive enough to keep up with it. I will rebuild those indexes and, if possible, reload the table itself, as it is bloated too. Sure, I will collect the necessary stats on the next occurrence of the slow query.

Stalin
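The no-downtime rebuild Kevin suggested would look roughly like this for the bloated index (the temporary index name is an illustration):

```sql
-- Build a fresh index without blocking writers (available since 8.2).
CREATE INDEX CONCURRENTLY ix_objects_type_lastmodified_new
    ON objects ("type", lastmodified)
    TABLESPACE "nbbs_index_data";

-- Once it has built successfully, swap it in.
DROP INDEX ix_objects_type_lastmodified;
ALTER INDEX ix_objects_type_lastmodified_new
    RENAME TO ix_objects_type_lastmodified;
```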
We have found the problem. Apparently there was a query doing a count on a 45-million-row table, run prior to the episode of the slow query; that definitely pushed the cached data out of memory. Is there a way to assign a portion of memory for recycling purposes, like in Oracle, so the cached data doesn't get affected by queries like these?

Stalin
On 8/5/09 12:16 PM, "Subbiah Stalin-XCGF84" <SSubbiah@Motorola.com> wrote:

> We have found the problem. Apparently there was a query doing count on
> 45 million rows table run prior to the episode of slow query. Definitely
> cached data is pushed out the memory. Is there way to assign portion of
> memory to recycling purposes like in oracle, so the cached data doesn't
> get affected by queries like these.
>
> Stalin

In Postgres 8.3 and above, large sequential scans don't evict other things from shared_buffers. But they can push things out of the OS page cache.
"Subbiah Stalin-XCGF84" <SSubbiah@Motorola.com> wrote:

> We have found the problem.

Great news!

> Apparently there was a query doing count on 45 million rows table
> run prior to the episode of slow query. Definitely cached data is
> pushed out the memory.

Yeah, that would completely explain your symptoms.

> Is there way to assign portion of memory to recycling purposes like
> in oracle, so the cached data doesn't get affected by queries like
> these.

Right now you have 8GB in shared buffers and 22GB in OS cache. You could try playing with that ratio, but benefit there would be iffy. To solve this particular problem, you might want to see if they actually need an exact count, versus a reasonable estimate. The estimated tuple count from the latest statistics run is often close enough (and a lot faster).

-Kevin
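Kevin's closing suggestion, an estimate instead of an exact count, can be sketched as follows; big_table is a stand-in, since the thread never names the 45-million-row table:

```sql
-- Exact count: scans every row of the table.
-- SELECT count(*) FROM big_table;

-- Planner's estimate from the last ANALYZE: nearly free, but only as
-- fresh as the statistics.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'big_table';
```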