Обсуждение: Bottleneck?
I have a database (699221). It contains of 1.8GB data (707710). I am doing a complex query. Which required to load a 80MB index (732287). I restarted Postgresql so the cache is empty and it has to read the table and index from disk. Which I understand is an expensive process. But what I don't understand is even I split the index into a different tablespace located on a completely separate disk (mounted on /hdd2) there is still a very long I/O wait time. That index is only thing exist on that disk. Any idea why? Or any way I can find out what it is waiting for? Thanks. (running DTrace tool kit iofile.d script to show I/O wait time by filename and process) bash-3.00# ./iofile.d Tracing... Hit Ctrl-C to end. ^C PID CMD TIME FILE 2379 postgres 23273 /export/space/postgres8.3/lib/plpgsql.so 2224 metacity 24403 /lib/libm.so.2 2379 postgres 32345 /export/space/pg_data/pg_data/data/base/699221/2619 2379 postgres 40992 /export/space/pg_data/pg_data/data/base/699221/2691 0 sched 82205 <none> 2379 postgres 273205 /export/space/postgres8.3/bin/postgres 2379 postgres 1092140 <none> 2379 postgres 59461649 /hdd2/indexes/699221/732287 (running DTrace tool kit iofildb.d script to show I/O bytes by filename and process) bash-3.00# ./iofileb.d Tracing... Hit Ctrl-C to end. ^C PID CMD KB FILE 2379 postgres 8256 /export/space/pg_data/pg_data/data/base/699221/699473 2379 postgres 87760 /hdd2/indexes/699221/732287 2379 postgres 832472 /export/space/pg_data/pg_data/data/base/699221/707710.1 2379 postgres 1048576 /export/space/pg_data/pg_data/data/base/699221/707710 -- John
On Wed, Aug 5, 2009 at 8:50 PM, Ip Wing Kin John<wkipjohn@gmail.com> wrote: > I have a database (699221). It contains of 1.8GB data (707710). I am > doing a complex query. Which required to load a 80MB index (732287). > > I restarted Postgresql so the cache is empty and it has to read the > table and index from disk. Which I understand is an expensive process. > But what I don't understand is even I split the index into a different > tablespace located on a completely separate disk (mounted on /hdd2) > there is still a very long I/O wait time. That index is only thing > exist on that disk. Any idea why? Or any way I can find out what it is > waiting for? Thanks. OK before DTrace, did you run explain analyze on the query? I think the output of that would be interesting. Looking at the DTrace output it looks to me like you're reading at least one > 1GB table. since you're accessing a file with a .1 on it.
Hi Scott,
Yes I did that. And you are right because I restart my machine, so the postgres cache is empty. And I think postgresql is reading all 1.8GB of data back into the cache when it does a seq scan on the status table.
QUERY PLAN
-------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------
Sort (cost=390162.53..390162.54 rows=3 width=567) (actual time=106045.453..106 078.238 rows=80963 loops=1)
Sort Key: rec.startdatetime, rec.id
Sort Method: quicksort Memory: 43163kB
-> Nested Loop IN Join (cost=360360.86..390162.51 rows=3 width=567) (actual time=41205.683..105523.303 rows=80963 loops=1)
Join Filter: ((rec.parentguid)::text = (resolve.resolve)::text)
-> Nested Loop (cost=360360.86..389999.01 rows=3 width=567) (actual time=41127.859..105256.069 rows=80963 loops=1)
Join Filter: ((rec.guid)::text = (status.guid)::text)
-> HashAggregate (cost=360360.86..360405.96 rows=3608 width=16) (actual time=41089.852..41177.137 rows=80000 loops=1)
-> Hash Join (cost=335135.05..354817.67 rows=1108637 widt h=16) (actual time=36401.247..38505.042 rows=4000000 loops=1)
Hash Cond: ((getcurrentguids.getcurrentguids)::text = (status.guid)::text)
-> Function Scan on getcurrentguids (cost=0.00..260 .00 rows=1000 width=32) (actual time=1009.161..1029.849 rows=80000 loops=1)
-> Hash (cost=285135.53..285135.53 rows=3999962 wid th=16) (actual time=35391.697..35391.697 rows=4000000 loops=1)
-> Seq Scan on status (cost=0.00..2 85135.53 rows=3999962 width=16) (actual time=5.095..32820.746 rows=4000000 loops =1)
Filter: (startdatetime <= 1249281281666:: bigint)
-> Index Scan using index_status_startdatetime on status rec (cost=0.00..8.15 rows=3 width=414) (actual time=0.796..0.797 r ows=1 loops=80000)
Index Cond: (rec.startdatetime = (max(status.startdatetime)))
-> Function Scan on resolve (cost=0.00..260.00 rows=1000 width=32) (a ctual time=0.001..0.001 rows=1 loops=80963)
Total runtime: 106227.356 ms
(18 rows)
On Aug 6, 2009 2:19pm, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Wed, Aug 5, 2009 at 8:50 PM, Ip Wing Kin Johnwkipjohn@gmail.com> wrote:
>
> > I have a database (699221). It contains of 1.8GB data (707710). I am
>
> > doing a complex query. Which required to load a 80MB index (732287).
>
> >
>
> > I restarted Postgresql so the cache is empty and it has to read the
>
> > table and index from disk. Which I understand is an expensive process.
>
> > But what I don't understand is even I split the index into a different
>
> > tablespace located on a completely separate disk (mounted on /hdd2)
>
> > there is still a very long I/O wait time. That index is only thing
>
> > exist on that disk. Any idea why? Or any way I can find out what it is
>
> > waiting for? Thanks.
>
>
>
> OK before DTrace, did you run explain analyze on the query? I think
>
> the output of that would be interesting.
>
>
>
> Looking at the DTrace output it looks to me like you're reading at
>
> least one > 1GB table. since you're accessing a file with a .1 on it.
>
Yes I did that. And you are right because I restart my machine, so the postgres cache is empty. And I think postgresql is reading all 1.8GB of data back into the cache when it does a seq scan on the status table.
QUERY PLAN
-------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------
Sort (cost=390162.53..390162.54 rows=3 width=567) (actual time=106045.453..106 078.238 rows=80963 loops=1)
Sort Key: rec.startdatetime, rec.id
Sort Method: quicksort Memory: 43163kB
-> Nested Loop IN Join (cost=360360.86..390162.51 rows=3 width=567) (actual time=41205.683..105523.303 rows=80963 loops=1)
Join Filter: ((rec.parentguid)::text = (resolve.resolve)::text)
-> Nested Loop (cost=360360.86..389999.01 rows=3 width=567) (actual time=41127.859..105256.069 rows=80963 loops=1)
Join Filter: ((rec.guid)::text = (status.guid)::text)
-> HashAggregate (cost=360360.86..360405.96 rows=3608 width=16) (actual time=41089.852..41177.137 rows=80000 loops=1)
-> Hash Join (cost=335135.05..354817.67 rows=1108637 widt h=16) (actual time=36401.247..38505.042 rows=4000000 loops=1)
Hash Cond: ((getcurrentguids.getcurrentguids)::text = (status.guid)::text)
-> Function Scan on getcurrentguids (cost=0.00..260 .00 rows=1000 width=32) (actual time=1009.161..1029.849 rows=80000 loops=1)
-> Hash (cost=285135.53..285135.53 rows=3999962 wid th=16) (actual time=35391.697..35391.697 rows=4000000 loops=1)
-> Seq Scan on status (cost=0.00..2 85135.53 rows=3999962 width=16) (actual time=5.095..32820.746 rows=4000000 loops =1)
Filter: (startdatetime <= 1249281281666:: bigint)
-> Index Scan using index_status_startdatetime on status rec (cost=0.00..8.15 rows=3 width=414) (actual time=0.796..0.797 r ows=1 loops=80000)
Index Cond: (rec.startdatetime = (max(status.startdatetime)))
-> Function Scan on resolve (cost=0.00..260.00 rows=1000 width=32) (a ctual time=0.001..0.001 rows=1 loops=80963)
Total runtime: 106227.356 ms
(18 rows)
On Aug 6, 2009 2:19pm, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Wed, Aug 5, 2009 at 8:50 PM, Ip Wing Kin Johnwkipjohn@gmail.com> wrote:
>
> > I have a database (699221). It contains of 1.8GB data (707710). I am
>
> > doing a complex query. Which required to load a 80MB index (732287).
>
> >
>
> > I restarted Postgresql so the cache is empty and it has to read the
>
> > table and index from disk. Which I understand is an expensive process.
>
> > But what I don't understand is even I split the index into a different
>
> > tablespace located on a completely separate disk (mounted on /hdd2)
>
> > there is still a very long I/O wait time. That index is only thing
>
> > exist on that disk. Any idea why? Or any way I can find out what it is
>
> > waiting for? Thanks.
>
>
>
> OK before DTrace, did you run explain analyze on the query? I think
>
> the output of that would be interesting.
>
>
>
> Looking at the DTrace output it looks to me like you're reading at
>
> least one > 1GB table. since you're accessing a file with a .1 on it.
>
Could you possibly attach that in plain text format? Your email client seems to have eaten any text formatting / indentation.
Is this alright?
> QUERY PLAN
> -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------
> Sort (cost=390162.53..390162.54 rows=3 width=567) (actual time=106045.453..106 078.238 rows=80963 loops=1)
> Sort Key: rec.startdatetime, rec.id
> Sort Method: quicksort Memory: 43163kB
> -> Nested Loop IN Join (cost=360360.86..390162.51 rows=3 width=567) (actual time=41205.683..105523.303 rows=80963 loops=1)
> Join Filter: ((rec.parentguid)::text = (resolve.resolve)::text)
> -> Nested Loop (cost=360360.86..389999.01 rows=3 width=567) (actual time=41127.859..105256.069 rows=80963 loops=1)
> Join Filter: ((rec.guid)::text = (status.guid)::text)
> -> HashAggregate (cost=360360.86..360405.96 rows=3608 width=16) (actual time=41089.852..41177.137 rows=80000 loops=1)
> -> Hash Join (cost=335135.05..354817.67 rows=1108637 widt h=16) (actual time=36401.247..38505.042 rows=4000000 loops=1)
> Hash Cond: ((getcurrentguids.getcurrentguids)::text = (status.guid)::text)
> -> Function Scan on getcurrentguids (cost=0.00..260 .00 rows=1000 width=32) (actual time=1009.161..1029.849 rows=80000 loops=1)
> -> Hash (cost=285135.53..285135.53 rows=3999962 wid th=16) (actual time=35391.697..35391.697 rows=4000000 loops=1)
> -> Seq Scan on status (cost=0.00..2 85135.53 rows=3999962 width=16) (actual time=5.095..32820.746 rows=4000000 loops =1)
> Filter: (startdatetime -> Index Scan using index_status_startdatetime on status rec (cost=0.00..8.15 rows=3 width=414) (actual time=0.796..0.797 r ows=1 loops=80000)
> Index Cond: (rec.startdatetime = (max(status.startdatetime)))
> -> Function Scan on resolve (cost=0.00..260.00 rows=1000 width=32) (a ctual time=0.001..0.001 rows=1 loops=80963)
> Total runtime: 106227.356 ms
> (18 rows)
>
On Aug 6, 2009 2:30pm, wkipjohn@gmail.com wrote:
> Hi Scott,
>
> Yes I did that. And you are right because I restart my machine, so the postgres cache is empty. And I think postgresql is reading all 1.8GB of data back into the cache when it does a seq scan on the status table.
>
> QUERY PLAN
> -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------
> Sort (cost=390162.53..390162.54 rows=3 width=567) (actual time=106045.453..106 078.238 rows=80963 loops=1)
> Sort Key: rec.startdatetime, rec.id
> Sort Method: quicksort Memory: 43163kB
> -> Nested Loop IN Join (cost=360360.86..390162.51 rows=3 width=567) (actual time=41205.683..105523.303 rows=80963 loops=1)
> Join Filter: ((rec.parentguid)::text = (resolve.resolve)::text)
> -> Nested Loop (cost=360360.86..389999.01 rows=3 width=567) (actual time=41127.859..105256.069 rows=80963 loops=1)
> Join Filter: ((rec.guid)::text = (status.guid)::text)
> -> HashAggregate (cost=360360.86..360405.96 rows=3608 width=16) (actual time=41089.852..41177.137 rows=80000 loops=1)
> -> Hash Join (cost=335135.05..354817.67 rows=1108637 widt h=16) (actual time=36401.247..38505.042 rows=4000000 loops=1)
> Hash Cond: ((getcurrentguids.getcurrentguids)::text = (status.guid)::text)
> -> Function Scan on getcurrentguids (cost=0.00..260 .00 rows=1000 width=32) (actual time=1009.161..1029.849 rows=80000 loops=1)
> -> Hash (cost=285135.53..285135.53 rows=3999962 wid th=16) (actual time=35391.697..35391.697 rows=4000000 loops=1)
> -> Seq Scan on status (cost=0.00..2 85135.53 rows=3999962 width=16) (actual time=5.095..32820.746 rows=4000000 loops =1)
> Filter: (startdatetime -> Index Scan using index_status_startdatetime on status rec (cost=0.00..8.15 rows=3 width=414) (actual time=0.796..0.797 r ows=1 loops=80000)
> Index Cond: (rec.startdatetime = (max(status.startdatetime)))
> -> Function Scan on resolve (cost=0.00..260.00 rows=1000 width=32) (a ctual time=0.001..0.001 rows=1 loops=80963)
> Total runtime: 106227.356 ms
> (18 rows)
>
>
>
> On Aug 6, 2009 2:19pm, Scott Marlowe scott.marlowe@gmail.com> wrote:
> > On Wed, Aug 5, 2009 at 8:50 PM, Ip Wing Kin Johnwkipjohn@gmail.com> wrote:
> >
> > > I have a database (699221). It contains of 1.8GB data (707710). I am
> >
> > > doing a complex query. Which required to load a 80MB index (732287).
> >
> > >
> >
> > > I restarted Postgresql so the cache is empty and it has to read the
> >
> > > table and index from disk. Which I understand is an expensive process.
> >
> > > But what I don't understand is even I split the index into a different
> >
> > > tablespace located on a completely separate disk (mounted on /hdd2)
> >
> > > there is still a very long I/O wait time. That index is only thing
> >
> > > exist on that disk. Any idea why? Or any way I can find out what it is
> >
> > > waiting for? Thanks.
> >
> >
> >
> > OK before DTrace, did you run explain analyze on the query? I think
> >
> > the output of that would be interesting.
> >
> >
> >
> > Looking at the DTrace output it looks to me like you're reading at
> >
> > least one > 1GB table. since you're accessing a file with a .1 on it.
> >
> QUERY PLAN
> -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------
> Sort (cost=390162.53..390162.54 rows=3 width=567) (actual time=106045.453..106 078.238 rows=80963 loops=1)
> Sort Key: rec.startdatetime, rec.id
> Sort Method: quicksort Memory: 43163kB
> -> Nested Loop IN Join (cost=360360.86..390162.51 rows=3 width=567) (actual time=41205.683..105523.303 rows=80963 loops=1)
> Join Filter: ((rec.parentguid)::text = (resolve.resolve)::text)
> -> Nested Loop (cost=360360.86..389999.01 rows=3 width=567) (actual time=41127.859..105256.069 rows=80963 loops=1)
> Join Filter: ((rec.guid)::text = (status.guid)::text)
> -> HashAggregate (cost=360360.86..360405.96 rows=3608 width=16) (actual time=41089.852..41177.137 rows=80000 loops=1)
> -> Hash Join (cost=335135.05..354817.67 rows=1108637 widt h=16) (actual time=36401.247..38505.042 rows=4000000 loops=1)
> Hash Cond: ((getcurrentguids.getcurrentguids)::text = (status.guid)::text)
> -> Function Scan on getcurrentguids (cost=0.00..260 .00 rows=1000 width=32) (actual time=1009.161..1029.849 rows=80000 loops=1)
> -> Hash (cost=285135.53..285135.53 rows=3999962 wid th=16) (actual time=35391.697..35391.697 rows=4000000 loops=1)
> -> Seq Scan on status (cost=0.00..2 85135.53 rows=3999962 width=16) (actual time=5.095..32820.746 rows=4000000 loops =1)
> Filter: (startdatetime -> Index Scan using index_status_startdatetime on status rec (cost=0.00..8.15 rows=3 width=414) (actual time=0.796..0.797 r ows=1 loops=80000)
> Index Cond: (rec.startdatetime = (max(status.startdatetime)))
> -> Function Scan on resolve (cost=0.00..260.00 rows=1000 width=32) (a ctual time=0.001..0.001 rows=1 loops=80963)
> Total runtime: 106227.356 ms
> (18 rows)
>
On Aug 6, 2009 2:30pm, wkipjohn@gmail.com wrote:
> Hi Scott,
>
> Yes I did that. And you are right because I restart my machine, so the postgres cache is empty. And I think postgresql is reading all 1.8GB of data back into the cache when it does a seq scan on the status table.
>
> QUERY PLAN
> -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------
> Sort (cost=390162.53..390162.54 rows=3 width=567) (actual time=106045.453..106 078.238 rows=80963 loops=1)
> Sort Key: rec.startdatetime, rec.id
> Sort Method: quicksort Memory: 43163kB
> -> Nested Loop IN Join (cost=360360.86..390162.51 rows=3 width=567) (actual time=41205.683..105523.303 rows=80963 loops=1)
> Join Filter: ((rec.parentguid)::text = (resolve.resolve)::text)
> -> Nested Loop (cost=360360.86..389999.01 rows=3 width=567) (actual time=41127.859..105256.069 rows=80963 loops=1)
> Join Filter: ((rec.guid)::text = (status.guid)::text)
> -> HashAggregate (cost=360360.86..360405.96 rows=3608 width=16) (actual time=41089.852..41177.137 rows=80000 loops=1)
> -> Hash Join (cost=335135.05..354817.67 rows=1108637 widt h=16) (actual time=36401.247..38505.042 rows=4000000 loops=1)
> Hash Cond: ((getcurrentguids.getcurrentguids)::text = (status.guid)::text)
> -> Function Scan on getcurrentguids (cost=0.00..260 .00 rows=1000 width=32) (actual time=1009.161..1029.849 rows=80000 loops=1)
> -> Hash (cost=285135.53..285135.53 rows=3999962 wid th=16) (actual time=35391.697..35391.697 rows=4000000 loops=1)
> -> Seq Scan on status (cost=0.00..2 85135.53 rows=3999962 width=16) (actual time=5.095..32820.746 rows=4000000 loops =1)
> Filter: (startdatetime -> Index Scan using index_status_startdatetime on status rec (cost=0.00..8.15 rows=3 width=414) (actual time=0.796..0.797 r ows=1 loops=80000)
> Index Cond: (rec.startdatetime = (max(status.startdatetime)))
> -> Function Scan on resolve (cost=0.00..260.00 rows=1000 width=32) (a ctual time=0.001..0.001 rows=1 loops=80963)
> Total runtime: 106227.356 ms
> (18 rows)
>
>
>
> On Aug 6, 2009 2:19pm, Scott Marlowe scott.marlowe@gmail.com> wrote:
> > On Wed, Aug 5, 2009 at 8:50 PM, Ip Wing Kin Johnwkipjohn@gmail.com> wrote:
> >
> > > I have a database (699221). It contains of 1.8GB data (707710). I am
> >
> > > doing a complex query. Which required to load a 80MB index (732287).
> >
> > >
> >
> > > I restarted Postgresql so the cache is empty and it has to read the
> >
> > > table and index from disk. Which I understand is an expensive process.
> >
> > > But what I don't understand is even I split the index into a different
> >
> > > tablespace located on a completely separate disk (mounted on /hdd2)
> >
> > > there is still a very long I/O wait time. That index is only thing
> >
> > > exist on that disk. Any idea why? Or any way I can find out what it is
> >
> > > waiting for? Thanks.
> >
> >
> >
> > OK before DTrace, did you run explain analyze on the query? I think
> >
> > the output of that would be interesting.
> >
> >
> >
> > Looking at the DTrace output it looks to me like you're reading at
> >
> > least one > 1GB table. since you're accessing a file with a .1 on it.
> >
Sorry post again.
QUERY PLAN
-------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------
Sort (cost=390162.53..390162.54 rows=3 width=567) (actual time=105726.803..105 756.743 rows=80963 loops=1)
Sort Key: rec.startdatetime, rec.id
Sort Method: quicksort Memory: 43163kB
-> Nested Loop IN Join (cost=360360.86..390162.51 rows=3 width=567) (actual time=41332.430..105220.859 rows=80963 loops=1)
Join Filter: ((rec.acsguid)::text = (resolve.resolve)::text)
-> Nested Loop (cost=360360.86..389999.01 rows=3 width=567) (actual t ime=41252.145..104952.438 rows=80963 loops=1)
Join Filter: ((rec.volumeguid)::text = (dummymediastatus.volumegu id)::text)
-> HashAggregate (cost=360360.86..360405.96 rows=3608 width=16) (actual time=41212.903..41299.709 rows=80000 loops=1)
-> Hash Join (cost=335135.05..354817.67 rows=1108637 widt h=16) (actual time=36360.938..38540.426 rows=4000000 loops=1)
Hash Cond: ((getcurrentguids.getcurrentguids)::text = (dummymediastatus.volumeguid)::text)
-> Function Scan on getcurrentguids (cost=0.00..260 .00 rows=1000 width=32) (actual time=977.013..997.404 rows=80000 loops=1)
-> Hash (cost=285135.53..285135.53 rows=3999962 wid th=16) (actual time=35383.529..35383.529 rows=4000000 loops=1)
-> Seq Scan on dummymediastatus (cost=0.00..2 85135.53 rows=3999962 width=16) (actual time=5.081..32821.253 rows=4000000 loops =1)
Filter: (startdatetime <= 1249281281666:: bigint)
-> Index Scan using index_dummymediastatus_startdatetime on dumm ymediastatus rec (cost=0.00..8.15 rows=3 width=414) (actual time=0.791..0.792 r ows=1 loops=80000)
Index Cond: (rec.startdatetime = (max(dummymediastatus.star tdatetime)))
-> Function Scan on resolve (cost=0.00..260.00 rows=1000 width=32) (a ctual time=0.001..0.001 rows=1 loops=80963)
Total runtime: 105906.467 ms
(18 rows)
QUERY PLAN
-------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------
Sort (cost=390162.53..390162.54 rows=3 width=567) (actual time=105726.803..105 756.743 rows=80963 loops=1)
Sort Key: rec.startdatetime, rec.id
Sort Method: quicksort Memory: 43163kB
-> Nested Loop IN Join (cost=360360.86..390162.51 rows=3 width=567) (actual time=41332.430..105220.859 rows=80963 loops=1)
Join Filter: ((rec.acsguid)::text = (resolve.resolve)::text)
-> Nested Loop (cost=360360.86..389999.01 rows=3 width=567) (actual t ime=41252.145..104952.438 rows=80963 loops=1)
Join Filter: ((rec.volumeguid)::text = (dummymediastatus.volumegu id)::text)
-> HashAggregate (cost=360360.86..360405.96 rows=3608 width=16) (actual time=41212.903..41299.709 rows=80000 loops=1)
-> Hash Join (cost=335135.05..354817.67 rows=1108637 widt h=16) (actual time=36360.938..38540.426 rows=4000000 loops=1)
Hash Cond: ((getcurrentguids.getcurrentguids)::text = (dummymediastatus.volumeguid)::text)
-> Function Scan on getcurrentguids (cost=0.00..260 .00 rows=1000 width=32) (actual time=977.013..997.404 rows=80000 loops=1)
-> Hash (cost=285135.53..285135.53 rows=3999962 wid th=16) (actual time=35383.529..35383.529 rows=4000000 loops=1)
-> Seq Scan on dummymediastatus (cost=0.00..2 85135.53 rows=3999962 width=16) (actual time=5.081..32821.253 rows=4000000 loops =1)
Filter: (startdatetime <= 1249281281666:: bigint)
-> Index Scan using index_dummymediastatus_startdatetime on dumm ymediastatus rec (cost=0.00..8.15 rows=3 width=414) (actual time=0.791..0.792 r ows=1 loops=80000)
Index Cond: (rec.startdatetime = (max(dummymediastatus.star tdatetime)))
-> Function Scan on resolve (cost=0.00..260.00 rows=1000 width=32) (a ctual time=0.001..0.001 rows=1 loops=80963)
Total runtime: 105906.467 ms
(18 rows)
On Wed, Aug 5, 2009 at 11:21 PM, <wkipjohn@gmail.com> wrote: > Sorry post again. Nope, still mangled. Can you attach it?
Hi scott I attached the query plan with this email. The top one is the first run after I restarted my machine. And the bottom one is the second run. I am using PostgreSQL 8.3 on Solaris 10. cheers On Thu, Aug 6, 2009 at 4:15 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote: > On Wed, Aug 5, 2009 at 11:21 PM, <wkipjohn@gmail.com> wrote: >> Sorry post again. > > Nope, still mangled. Can you attach it? > -- John
OK, two things. First the row estimate starts going way off around the time it gets to the hash aggregate / nested loop which seems to be making the planner use a bad plan for this many rows. You can try issuing set enable_nestloop = off; before running the query and see if that makes it any faster. Secondly, the first time you run this query you are reading the 1.8G table sequentially, and at about 55MB/s, which isn't gonna get faster without more / faster drives under your machine. On Thu, Aug 6, 2009 at 12:50 AM, Ip Wing Kin John<wkipjohn@gmail.com> wrote: > Here u go. Both in the same file. > > On Thu, Aug 6, 2009 at 4:48 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote: >> Much better... Looks like I got the second one... >> >> Can I get the first one too? Thx. >> >> On Thu, Aug 6, 2009 at 12:46 AM, Ip Wing Kin John<wkipjohn@gmail.com> wrote: >>> Hope you can get it this time. >>> >>> John >>> >>> On Thu, Aug 6, 2009 at 4:34 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote: >>>> Sorry man, it's not coming through. Try it this time addressed just to me. >>>> >>>> On Thu, Aug 6, 2009 at 12:23 AM, Ip Wing Kin John<wkipjohn@gmail.com> wrote: >>>>> Hi scott >>>>> >>>>> I attached the query plan with this email. The top one is the first >>>>> run after I restarted my machine. And the bottom one is the second >>>>> run. >>>>> >>>>> I am using PostgreSQL 8.3 on Solaris 10. >>>>> >>>>> cheers >>>>> >>>>> On Thu, Aug 6, 2009 at 4:15 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote: >>>>>> On Wed, Aug 5, 2009 at 11:21 PM, <wkipjohn@gmail.com> wrote: >>>>>>> Sorry post again. >>>>>> >>>>>> Nope, still mangled. Can you attach it? >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> John >>>>> >>>> >>>> >>>> >>>> -- >>>> When fascism comes to America, it will be intolerance sold as diversity. >>>> >>> >>> >>> >>> -- >>> John >>> >> >> >> >> -- >> When fascism comes to America, it will be intolerance sold as diversity. >> > > > > -- > John > -- When fascism comes to America, it will be intolerance sold as diversity.
On Thu, Aug 06, 2009 at 12:50:51PM +1000, Ip Wing Kin John wrote: > (running DTrace tool kit iofile.d script to show I/O wait time by > filename and process) Is the dtrace toolkit a viable product for a linux environment or is it strickly Sun/Oracle?
Ray Stell <stellr@cns.vt.edu> writes: > On Thu, Aug 06, 2009 at 12:50:51PM +1000, Ip Wing Kin John wrote: >> (running DTrace tool kit iofile.d script to show I/O wait time by >> filename and process) > Is the dtrace toolkit a viable product for a linux environment or > is it strickly Sun/Oracle? dtrace is available on Solaris and Mac OS X and probably a couple other platforms, but not Linux. For Linux there is SystemTap, which does largely the same kinds of things but has a different scripting syntax ... regards, tom lane
On Thu, Aug 06, 2009 at 11:01:52AM -0400, Tom Lane wrote: > > dtrace is available on Solaris and Mac OS X and probably a couple > other platforms, but not Linux. I wondered if anyone had given this a go: http://amitksaha.blogspot.com/2009/03/dtrace-on-linux.html
I wasn't able to compile dtrace on either CentOS 5.3 or Fedora 11. But the author is responsive and the problem doesn't look hard to fix. It sits in my inbox awaiting some hacking time... Ken On Thu, 06 Aug 2009 11:43:05 -0400, Ray Stell <stellr@cns.vt.edu> wrote: > On Thu, Aug 06, 2009 at 11:01:52AM -0400, Tom Lane wrote: >> >> dtrace is available on Solaris and Mac OS X and probably a couple >> other platforms, but not Linux. > > I wondered if anyone had given this a go: > > http://amitksaha.blogspot.com/2009/03/dtrace-on-linux.html > -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
On Thu, 2009-08-06 at 11:57 -0400, Kenneth Cox wrote: > I wasn't able to compile dtrace on either CentOS 5.3 or Fedora 11. But > the author is responsive and the problem doesn't look hard to fix. It > sits in my inbox awaiting some hacking time... Why aren't you using systemtap again? As I recall it uses the same interface as dtrace. The front end is just different. Joshua D. Drake -- PostgreSQL - XMPP: jdrake@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
On Thu, Aug 06, 2009 at 09:12:22AM -0700, Joshua D. Drake wrote: > Why aren't you using systemtap again? 1. significant solaris responsibilites 2. significant linux responsibilities 3. tool consolidation delusions Can you drive dtace toolkit via systemtap?
On Thu, 2009-08-06 at 12:38 -0400, Ray Stell wrote: > On Thu, Aug 06, 2009 at 09:12:22AM -0700, Joshua D. Drake wrote: > > Why aren't you using systemtap again? > > 1. significant solaris responsibilites There is your problem right there ;) > 2. significant linux responsibilities > 3. tool consolidation delusions Hah! I know this one. > > Can you drive dtace toolkit via systemtap? > I don't know. Tom? Joshua D. Drake -- PostgreSQL - XMPP: jdrake@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
On Thu, 2009-08-06 at 11:57 -0400, Kenneth Cox wrote: > I wasn't able to compile dtrace on either CentOS 5.3 or Fedora 11. But > the author is responsive and the problem doesn't look hard to fix. It > sits in my inbox awaiting some hacking time... Why aren't you using systemtap again? As I recall it uses the same interface as dtrace. The front end is just different. Joshua D. Drake -- PostgreSQL - XMPP: jdrake@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
On Thu, 2009-08-06 at 12:38 -0400, Ray Stell wrote: > On Thu, Aug 06, 2009 at 09:12:22AM -0700, Joshua D. Drake wrote: > > Why aren't you using systemtap again? > > 1. significant solaris responsibilites There is your problem right there ;) > 2. significant linux responsibilities > 3. tool consolidation delusions Hah! I know this one. > > Can you drive dtace toolkit via systemtap? > I don't know. Tom? Joshua D. Drake -- PostgreSQL - XMPP: jdrake@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
Hi Scott, Thanks for you suggestion. I have follow your suggestion by disable nestloop and have a substantial improvement. Takes 51s now. I have attached the new query plan in another file. What I want to ask is, is there any other way to hint the planner to choose to use merge join rather than nested loop by modifying my SQL? I did try to sort my second inner join by the join condition, but the planner still prefer to use nested loop. As I am afraid changing the system wide configuration will have some side effect on my other queries. Here is my SQL. select * from dummymediastatus rec INNER JOIN ( SELECT volumeGUID , MAX(startDatetime) AS msdt FROM dummymediastatus INNER JOIN ( select * from getcurrentguids(1249281281666,'hardware.volume',null,null) ) AS cfg ON ( cfg.getcurrentguids = volumeGUID) WHERE startDatetime <= 1249281281666 GROUP BY volumeGUID ) AS rec2 ON ( rec.volumeGUID = rec2.volumeGUID AND rec.startDatetime = rec2.msdt ) where ( ( 1>0 and 1>0 ) and rec.acsGUID in ( SELECT * FROM resolve('acs0') ) ) order by rec.startDatetime DESC,rec.id DESC; thanks On Thu, Aug 6, 2009 at 5:03 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote: > OK, two things. First the row estimate starts going way off around > the time it gets to the hash aggregate / nested loop which seems to be > making the planner use a bad plan for this many rows. You can try > issuing > > set enable_nestloop = off; > > before running the query and see if that makes it any faster. > > Secondly, the first time you run this query you are reading the 1.8G > table sequentially, and at about 55MB/s, which isn't gonna get faster > without more / faster drives under your machine. > > On Thu, Aug 6, 2009 at 12:50 AM, Ip Wing Kin John<wkipjohn@gmail.com> wrote: >> Here u go. Both in the same file. >> >> On Thu, Aug 6, 2009 at 4:48 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote: >>> Much better... Looks like I got the second one... >>> >>> Can I get the first one too? Thx. >>> >>> On Thu, Aug 6, 2009 at 12:46 AM, Ip Wing Kin John<wkipjohn@gmail.com> wrote: >>>> Hope you can get it this time. >>>> >>>> John >>>> >>>> On Thu, Aug 6, 2009 at 4:34 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote: >>>>> Sorry man, it's not coming through. Try it this time addressed just to me. >>>>> >>>>> On Thu, Aug 6, 2009 at 12:23 AM, Ip Wing Kin John<wkipjohn@gmail.com> wrote: >>>>>> Hi scott >>>>>> >>>>>> I attached the query plan with this email. The top one is the first >>>>>> run after I restarted my machine. And the bottom one is the second >>>>>> run. >>>>>> >>>>>> I am using PostgreSQL 8.3 on Solaris 10. >>>>>> >>>>>> cheers >>>>>> >>>>>> On Thu, Aug 6, 2009 at 4:15 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote: >>>>>>> On Wed, Aug 5, 2009 at 11:21 PM, <wkipjohn@gmail.com> wrote: >>>>>>>> Sorry post again. >>>>>>> >>>>>>> Nope, still mangled. Can you attach it? >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> John >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> When fascism comes to America, it will be intolerance sold as diversity. >>>>> >>>> >>>> >>>> >>>> -- >>>> John >>>> >>> >>> >>> >>> -- >>> When fascism comes to America, it will be intolerance sold as diversity. >>> >> >> >> >> -- >> John >> > > > > -- > When fascism comes to America, it will be intolerance sold as diversity. > -- John
Вложения
On Mon, Aug 10, 2009 at 12:22 AM, Ip Wing Kin John<wkipjohn@gmail.com> wrote: > Hi Scott, > > Thanks for you suggestion. I have follow your suggestion by disable > nestloop and have a substantial improvement. Takes 51s now. I have > attached the new query plan in another file. > > What I want to ask is, is there any other way to hint the planner to > choose to use merge join rather than nested loop by modifying my SQL? > I did try to sort my second inner join by the join condition, but the > planner still prefer to use nested loop. > > As I am afraid changing the system wide configuration will have some > side effect on my other queries. Yeah, that's more of a troubleshooting procedure than something you'd want to institute system wide. If you must set it for this query, you can do so just before you run it in your connection, then turn it back on for the rest of your queries. I.e.: set enable_nestloop=off; select ....; set enable_nestloop=on; I've had one or two big queries in the past that no amount of tuning and setting stats target higher and analyzing could force to choose the right plan. If you haven't already, try setting the default statistic target higher and re-analyzing to see if that helps. After that you can play around a bit with the cost parameters to see what helps. Note that just like setting enable_nestloop on or off, you can do so for the current connection only and not globally, especially while just testing.
On Mon, Aug 10, 2009 at 2:22 AM, Ip Wing Kin John<wkipjohn@gmail.com> wrote: > Hi Scott, > > Thanks for you suggestion. I have follow your suggestion by disable > nestloop and have a substantial improvement. Takes 51s now. I have > attached the new query plan in another file. > > What I want to ask is, is there any other way to hint the planner to > choose to use merge join rather than nested loop by modifying my SQL? > I did try to sort my second inner join by the join condition, but the > planner still prefer to use nested loop. > > As I am afraid changing the system wide configuration will have some > side effect on my other queries. > > Here is my SQL. > > select * from dummymediastatus rec INNER JOIN ( SELECT volumeGUID , > MAX(startDatetime) AS msdt FROM dummymediastatus INNER JOIN ( select * > from getcurrentguids(1249281281666,'hardware.volume',null,null) ) AS > cfg ON ( cfg.getcurrentguids = volumeGUID) WHERE startDatetime <= > 1249281281666 GROUP BY volumeGUID ) AS rec2 ON ( rec.volumeGUID = > rec2.volumeGUID AND rec.startDatetime = rec2.msdt ) where ( ( 1>0 > and 1>0 ) and rec.acsGUID in ( SELECT * FROM resolve('acs0') ) ) > order by rec.startDatetime DESC,rec.id DESC; It looks to me like a big chunk of your problem is here: -> Function Scan on getcurrentguids (cost=0.00..260 .00 rows=1000 width=32) (actual time=977.013..997.404 rows=80000 loops=1) The planner's estimate of the number of rows is off by a factor of 80 here. You should probably think about inlining the SQL contained inside that function, if possible. You might also want to look at the "rows" setting of CREATE OR REPLACE FUNCTION. As tempting as it is to encapsulate some of your logic into a set-returning function of some sort, as you've done here, I've found that it tends to suck. Even if you fix the row estimate, the planner will still estimate join selectivity etc. poorly for those rows because, of course, there are no statistics. ...Robert