Обсуждение: Specific query performance problem help requested - postgresql 7.4

Поиск
Список
Период
Сортировка

Specific query performance problem help requested - postgresql 7.4

От
"Brad Might"
Дата:
 I am seeing vastly different performance characteristics for almost the
exact same query.
Can someone help me break this down and figure out why the one query
takes so much longer than the other?

Looking at the explain analyze output, I see that the loops value on the
innermost index scan when bucket = 3 is way out of wack with the others.


Here's the query...the only thing that changes from run to run is the
bucket number.

For some strange reason the id and bucket types are bigint although they
do not need to be.

Shared buffers is 48000
sort_mem is 32767

This is on 7.4.2 I'm seeing the same thing on 7.4.7 as well.


explain analyze
 select
t0.filename,
t2.filename as parentname,
t0.st_atime,
t0.size,
t0.ownernameid,
t0.filetypeid,
t0.groupnameid,
t0.groupnameid,
t0.id,
t0.filename
from Nodes_215335885080_1114059806 as t0 inner join
fileftypebkt_215335885080_1114059806 as t1 on t0.id=t1.fileid inner join
dirs_215335885080_1114059806 as t2 on t0.parentnameid=t2.filenameid
where t1.bucket=3 order by t0.filename asc offset 0 limit 25


Here's the bucket distribution..i have clustered the index on the bucket
value.

 bucket |  count
--------+---------
      9 |   13420
      8 |  274053
      7 | 2187261
      6 |    1395
      5 |   45570
      4 | 2218830
      3 |   16940
      2 |  818405
      1 |    4092
(9 rows)


And the explain analyzes for bucket values of 3 7 and 8


QUERY PLAN

------------------------------------------------------------------------
------------------------------------------------------------------------
--------------------------------------------------------------------
 Limit  (cost=0.00..18730.19 rows=25 width=112) (actual
time=89995.190..400863.350 rows=25 loops=1)
   ->  Nested Loop  (cost=0.00..48333634.41 rows=64513 width=112)
(actual time=89995.172..400863.043 rows=25 loops=1)
         ->  Nested Loop  (cost=0.00..47944899.32 rows=64513 width=69)
(actual time=89971.894..400484.701 rows=25 loops=1)
               ->  Index Scan using
xnodes_215335885080_1114059806_filename on nodes_215335885080_1114059806
t0  (cost=0.00..19090075.03 rows=4790475 width=69) (actual
time=0.074..319084.540 rows=713193 loops=1)
               ->  Index Scan using
xfileftypebkt_215335885080_1114059806_fileid on
fileftypebkt_215335885080_1114059806 t1  (cost=0.00..6.01 rows=1
width=8) (actual time=0.101..0.101 rows=0 loops=713193)
                     Index Cond: ("outer".id = t1.fileid)
                     Filter: (bucket = 3)
         ->  Index Scan using xdirs_215335885080_1114059806_filenameid
on dirs_215335885080_1114059806 t2  (cost=0.00..6.01 rows=1 width=59)
(actual time=15.096..15.103 rows=1 loops=25)
               Index Cond: ("outer".parentnameid = t2.filenameid)  Total
runtime: 400863.747 ms (10 rows)



QUERY PLAN

------------------------------------------------------------------------
------------------------------------------------------------------------
--------------------------------------------------------------
 Limit  (cost=0.00..785.15 rows=25 width=112) (actual
time=173.935..552.075 rows=25 loops=1)
   ->  Nested Loop  (cost=0.00..59327691.44 rows=1889045 width=112)
(actual time=173.917..551.763 rows=25 loops=1)
         ->  Nested Loop  (cost=0.00..47944899.32 rows=1889045 width=69)
(actual time=151.198..303.463 rows=25 loops=1)
               ->  Index Scan using
xnodes_215335885080_1114059806_filename on nodes_215335885080_1114059806
t0  (cost=0.00..19090075.03 rows=4790475 width=69) (actual
time=0.225..82.328 rows=6930 loops=1)
               ->  Index Scan using
xfileftypebkt_215335885080_1114059806_fileid on
fileftypebkt_215335885080_1114059806 t1  (cost=0.00..6.01 rows=1
width=8) (actual time=0.019..0.019 rows=0 loops=6930)
                     Index Cond: ("outer".id = t1.fileid)
                     Filter: (bucket = 7)
         ->  Index Scan using xdirs_215335885080_1114059806_filenameid
on dirs_215335885080_1114059806 t2  (cost=0.00..6.01 rows=1 width=59)
(actual time=9.894..9.901 rows=1 loops=25)
               Index Cond: ("outer".parentnameid = t2.filenameid)  Total
runtime: 552.519 ms (10 rows)



QUERY PLAN

------------------------------------------------------------------------
------------------------------------------------------------------------
-----------------------------------------------------------
 Limit  (cost=0.00..18730.19 rows=25 width=112) (actual
time=81.271..330.404 rows=25 loops=1)
   ->  Nested Loop  (cost=0.00..48333634.41 rows=64513 width=112)
(actual time=81.254..330.107 rows=25 loops=1)
         ->  Nested Loop  (cost=0.00..47944899.32 rows=64513 width=69)
(actual time=4.863..8.164 rows=25 loops=1)
               ->  Index Scan using
xnodes_215335885080_1114059806_filename on nodes_215335885080_1114059806
t0  (cost=0.00..19090075.03 rows=4790475 width=69) (actual
time=0.204..2.576 rows=75 loops=1)
               ->  Index Scan using
xfileftypebkt_215335885080_1114059806_fileid on
fileftypebkt_215335885080_1114059806 t1  (cost=0.00..6.01 rows=1
width=8) (actual time=0.054..0.057 rows=0 loops=75)
                     Index Cond: ("outer".id = t1.fileid)
                     Filter: (bucket = 8)
         ->  Index Scan using xdirs_215335885080_1114059806_filenameid
on dirs_215335885080_1114059806 t2  (cost=0.00..6.01 rows=1 width=59)
(actual time=12.841..12.847 rows=1 loops=25)
               Index Cond: ("outer".parentnameid = t2.filenameid)  Total
runtime: 330.835 ms (10 rows)


Thanks,

brad

Re: Specific query performance problem help requested - postgresql 7.4

От
Tom Lane
Дата:
"Brad Might" <bmight@storediq.com> writes:
> Can someone help me break this down and figure out why the one query
> takes so much longer than the other?

It looks to me like there's a correlation between filename and bucket,
such that the indexscan in filename order takes much longer to run
across the first 25 rows with bucket = 3 than it does to run across
the first 25 with bucket = 7 or bucket = 8.  It's not just a matter of
there being fewer rows with bucket = 3 ... the cost differential is much
larger than is explained by the count ratios.  The bucket = 3 rows have
to be lurking further to the back of the filename order than the others.

> Here's the bucket distribution..i have clustered the index on the bucket
> value.

If you have an index on bucket, it's not doing you any good here anyway,
since you wrote the constraint as a crosstype operator ("3" is int4 not
int8).  It might help to explicitly cast the constant to int8.

            regards, tom lane

Re: Specific query performance problem help requested - postgresql 7.4

От
"Brad Might"
Дата:
Looks like I modified that constraint since the original has '3' and
explaining that shows the one I ended up running and posting has 3.
Whn I explain on the original version it shows filter: (bucket =
3::bigint)

Can you elaborate on what you mean by:
> The bucket = 3 rows have to be lurking further to the back of the
filename order than the others

How does this apply to the index on filename?

It is possible that the data values are skewed, is there any way I can
gracefully handle this condition?
This query is being used to extract data for interactive display and the
time for bucket 3 is so out of
whack with all the others (I've run this across all buckets and only
bucket 3 has the horrendous excecution
times)


Any suggestions for working around this problem to speed up execution?


Thanks for the help

brad


-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Thursday, May 26, 2005 1:32 PM
To: Brad Might
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Specific query performance problem help requested
- postgresql 7.4

"Brad Might" <bmight@storediq.com> writes:
> Can someone help me break this down and figure out why the one query
> takes so much longer than the other?

It looks to me like there's a correlation between filename and bucket,
such that the indexscan in filename order takes much longer to run
across the first 25 rows with bucket = 3 than it does to run across the
first 25 with bucket = 7 or bucket = 8.  It's not just a matter of there
being fewer rows with bucket = 3 ... the cost differential is much
larger than is explained by the count ratios.  The bucket = 3 rows have
to be lurking further to the back of the filename order than the others.

> Here's the bucket distribution..i have clustered the index on the
> bucket value.

If you have an index on bucket, it's not doing you any good here anyway,
since you wrote the constraint as a crosstype operator ("3" is int4 not
int8).  It might help to explicitly cast the constant to int8.

            regards, tom lane