Discussion: Slow query performance on large table

Slow query performance on large table

From: "Paul McKay"
Date:

Hi,

I am executing a query on a table:

           Table "public.measurement"
   Column   |         Type          | Modifiers
------------+-----------------------+-----------
 assessment | integer               |
 time       | integer               |
 value      | character varying(50) |
Indexes: idx_measurement_assessment btree (assessment),
         idx_measurement_time btree ("time")

The primary key of the table is a combination of assessment and time, and there are indexes on both assessment and time.

The query I am executing is

Select time,value
From measurement
Where assessment = ?
And time between ? and ?

This used to run like a rocket before my database got a little larger. There are now around 15 million rows in the table, and queries that return a fair number of rows (c. 300) are taking a long time to execute.

The database is VACUUM ANALYZEd regularly, and I've upped the shared buffers to a significant amount.

I've tried it on various machine configurations now: a dual-processor Linux/Intel machine with 1 GB of memory (0.5 GB shared buffers), a single-processor Linux/Intel machine (0.25 GB shared buffers), and a Solaris machine (0.25 GB shared buffers). I'm getting similar performance on all of them.

Anybody see anything I've obviously done wrong? Any ways of improving the performance of this query?

Thanks in advance.

Paul McKay.

======================================

Paul Mckay

Consultant Partner

Servicing Division

Clearwater-IT

e:paul_mckay@clearwater-it.co.uk

t:0161 877 6090

m: 07713 510946

======================================

 

Re: Slow query performance on large table

From: Tomasz Myrta
Date:
Paul McKay wrote:
> Hi,
>
> I am executing a query on a table:
>
>            Table "public.measurement"
>    Column   |         Type          | Modifiers
> ------------+-----------------------+-----------
>  assessment | integer               |
>  time       | integer               |
>  value      | character varying(50) |
> Indexes: idx_measurement_assessment btree (assessment),
>          idx_measurement_time btree ("time")
>
> The primary key of the table is a combination of assessment and time,
> and there are indexes on both assessment and time.
>
> The query I am executing is
>
> Select time,value
> From measurement
> Where assessment = ?
> And time between ? and ?
Changing the two single-column indexes into one two-column index should
improve performance considerably:

create index ind_meas on measurement (assessment, time);
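
A sketch of the cleanup that could go with it (my assumption, not something Paul asked about: once the composite index exists, the single-column index on assessment is largely redundant, because assessment is the leading column of the new index; idx_measurement_time is still needed for queries on time alone):

-- optional: drop the now-redundant single-column index
drop index idx_measurement_assessment;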

Regards,
Tomasz Myrta



Re: Slow query performance on large table

From: Tom Lane
Date:
"Paul McKay" <paul_mckay@clearwater-it.co.uk> writes:
> The query I am executing is
> Select time,value
> From measurement
> Where assessment = ?
> And time between ? and ?

EXPLAIN ANALYZE would help you investigate this.  Is it using an
indexscan?  On which index?  Does forcing use of the other index
(by temporarily dropping the preferred one) improve matters?
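
For instance, a sketch of that experiment (assuming DDL runs inside the transaction here, so the ROLLBACK puts the dropped index back; the literal values are hypothetical):

begin;
drop index idx_measurement_assessment;  -- hide the planner's preferred index
explain analyze
select time, value
from measurement
where assessment = 42                   -- hypothetical values
and time between 1000000000 and 1000100000;
rollback;                               -- the dropped index returns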

Possibly a two-column index on both assessment and time would be
an improvement, but it's hard to guess without knowing anything
about the selectivity of the two WHERE clauses.

            regards, tom lane

Re: Slow query performance on large table

From: Andrew Sullivan
Date:
On Tue, Mar 04, 2003 at 02:45:18PM -0000, Paul McKay wrote:
> Select time,value
> From measurement
> Where assessment = ?
> And time between ? and ?

Please run this with EXPLAIN ANALYSE, using values that slow the query
down.  My bet is that you have an index whose column needs a wider
statistics setting to be useful, but without the output from
EXPLAIN ANALYSE it'll be hard to tell.
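
Something along these lines (a sketch; the column and the target value are guesses on my part, 10 being the default):

alter table measurement alter column assessment set statistics 100;
analyze measurement;  -- re-collect statistics at the wider setting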

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Slow query performance on large table

From: "Paul McKay"
Date:
The results were

clearview=# explain analyse
clearview-# select assessment,time
clearview-# from measurement
clearview-# where assessment = 53661
clearview-# and time between 1046184261 and 1046335461;

NOTICE:  QUERY PLAN:

Index Scan using idx_measurement_assessment on measurement
(cost=0.00..34668.61 rows=261 width=8) (actual time=26128.07..220584.69
rows=503 loops=1)
Total runtime: 220587.06 msec

EXPLAIN

After adding the index kindly suggested by yourself and Tomasz I get,

clearview=# explain analyse
clearview-# select assessment,time
clearview-# from measurement
clearview-# where assessment = 53661
clearview-# and time between 1046184261 and 1046335461;
NOTICE:  QUERY PLAN:

Index Scan using ind_measurement_ass_time on measurement
(cost=0.00..1026.92 rows=261 width=8) (actual time=15.37..350.46
rows=503 loops=1)
Total runtime: 350.82 msec

EXPLAIN


I vaguely recall doing a bit of a reorganization on this database a while
back, and it looks like I lost the primary key index. No wonder it was
going slow.

Thanks a lot for your help.

Paul Mckay.

======================================
Paul Mckay
Consultant Partner
Servicing Division
Clearwater-IT
e:paul_mckay@clearwater-it.co.uk
t:0161 877 6090
m: 07713 510946
======================================

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: 04 March 2003 15:13
To: Paul McKay
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow query performance on large table

"Paul McKay" <paul_mckay@clearwater-it.co.uk> writes:
> The query I am executing is
> Select time,value
> From measurement
> Where assessment = ?
> And time between ? and ?

EXPLAIN ANALYZE would help you investigate this.  Is it using an
indexscan?  On which index?  Does forcing use of the other index
(by temporarily dropping the preferred one) improve matters?

Possibly a two-column index on both assessment and time would be
an improvement, but it's hard to guess without knowing anything
about the selectivity of the two WHERE clauses.

            regards, tom lane


Re: Slow query performance on large table

From: Andreas Pflug
Date:
Tom Lane wrote:

>"Paul McKay" <paul_mckay@clearwater-it.co.uk> writes:
>
>
>>The query I am executing is
>>Select time,value
>>From measurement
>>Where assessment = ?
>>And time between ? and ?
>
>EXPLAIN ANALYZE would help you investigate this.  Is it using an
>indexscan?  On which index?  Does forcing use of the other index
>(by temporarily dropping the preferred one) improve matters?
>
>Possibly a two-column index on both assessment and time would be
>an improvement, but it's hard to guess without knowing anything
>about the selectivity of the two WHERE clauses.
>
>            regards, tom lane

Tom,

does this mean that a primary key alone might not be enough? As far as I
understood Paul, the PK looks just like the newly created index, so
"create index ind_meas on measurement (assessment,time)" should perform
the same as "... primary key(assessment,time)".
Do possibly non-optimal indices (only assessment, only time, as Paul
described earlier) confuse the optimizer into ignoring the better option
of using the PK? Obviously, the index used should be a combination of
(assessment,time), but IMHO a PK should be enough.

regards,

Andreas


Re: Slow query performance on large table

From: Robert Treat
Date:
On Tue, 2003-03-04 at 11:11, Paul McKay wrote:
> The results were
>
> clearview=# explain analyse
> clearview-# select assessment,time
> clearview-# from measurement
> clearview-# where assessment = 53661
> clearview-# and time between 1046184261 and 1046335461;
>
> NOTICE:  QUERY PLAN:
>
> Index Scan using idx_measurement_assessment on measurement
> (cost=0.00..34668.61 rows=261 width=8) (actual time=26128.07..220584.69
> rows=503 loops=1)
> Total runtime: 220587.06 msec
>
> EXPLAIN
>
> After adding the index kindly suggested by yourself and Tomasz I get,
>
> clearview=# explain analyse
> clearview-# select assessment,time
> clearview-# from measurement
> clearview-# where assessment = 53661
> clearview-# and time between 1046184261 and 1046335461;
> NOTICE:  QUERY PLAN:
>
> Index Scan using ind_measurement_ass_time on measurement
> (cost=0.00..1026.92 rows=261 width=8) (actual time=15.37..350.46
> rows=503 loops=1)
> Total runtime: 350.82 msec
>
> EXPLAIN
>
>
> I vaguely recall doing a bit of a reorganization on this database a while
> back, and it looks like I lost the primary key index. No wonder it was
> going slow.
>

Maybe it's just me, but I get the feeling you need to work some regular
reindexing into your maintenance schedule. Given that your query is using
between, I don't think it would use the index on the time field anyway
(and explain analyze seems to support this). Rewrite it so that
you have "and time > foo and time < bar" and I think you'll see a
difference. With that in mind, I think your speedier query results are
due more to having a freshly created, non-bloated index than to the fact
that it is a dual-column index.

Robert Treat




Re: Slow query performance on large table

From: Josh Berkus
Date:
Paul,

> Index Scan using idx_measurement_assessment on measurement
> (cost=0.00..34668.61 rows=261 width=8) (actual time=26128.07..220584.69
> rows=503 loops=1)
> Total runtime: 220587.06 msec

These query results say to me that you need to do both a VACUUM FULL and a
REINDEX on this table.  The 26-second delay before returning the first row
says "table/index with lots of dead pages" to me.

For the future, you should consider dramatically increasing your FSM settings
and working a regular VACUUM FULL and REINDEX into your maintenance jobs.
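
A sketch of what that might look like (the FSM numbers are placeholders; size them to your update and delete volume):

VACUUM FULL ANALYZE measurement;  -- reclaim dead space and refresh statistics
REINDEX TABLE measurement;        -- rebuild the bloated indexes

-- in postgresql.conf (takes effect on restart):
-- max_fsm_relations = 1000
-- max_fsm_pages     = 1000000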

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Slow query performance on large table

From: "Paul McKay"
Date:
I used BETWEEN ... AND in a vain attempt to improve performance!
Running with < and > improves the performance again, by about 10 times.

The explains below were run on a test server I was using (not the live
server) where I had recreated the database in order to investigate
matters, so all the indexes were newly created anyway. The dual-column
index was the key (literally).


======================================
Paul Mckay
Consultant Partner
Servicing Division
Clearwater-IT
e:paul_mckay@clearwater-it.co.uk
t:0161 877 6090
m: 07713 510946
======================================

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Robert
Treat
Sent: 04 March 2003 17:02
To: Paul McKay
Cc: 'Tom Lane'; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow query performance on large table

On Tue, 2003-03-04 at 11:11, Paul McKay wrote:
> The results were
>
> clearview=# explain analyse
> clearview-# select assessment,time
> clearview-# from measurement
> clearview-# where assessment = 53661
> clearview-# and time between 1046184261 and 1046335461;
>
> NOTICE:  QUERY PLAN:
>
> Index Scan using idx_measurement_assessment on measurement
> (cost=0.00..34668.61 rows=261 width=8) (actual time=26128.07..220584.69
> rows=503 loops=1)
> Total runtime: 220587.06 msec
>
> EXPLAIN
>
> After adding the index kindly suggested by yourself and Tomasz I get,
>
> clearview=# explain analyse
> clearview-# select assessment,time
> clearview-# from measurement
> clearview-# where assessment = 53661
> clearview-# and time between 1046184261 and 1046335461;
> NOTICE:  QUERY PLAN:
>
> Index Scan using ind_measurement_ass_time on measurement
> (cost=0.00..1026.92 rows=261 width=8) (actual time=15.37..350.46
> rows=503 loops=1)
> Total runtime: 350.82 msec
>
> EXPLAIN
>
>
> I vaguely recall doing a bit of a reorganization on this database a
> while back, and it looks like I lost the primary key index. No wonder
> it was going slow.
>

Maybe it's just me, but I get the feeling you need to work some regular
reindexing into your maintenance schedule. Given that your query is using
between, I don't think it would use the index on the time field anyway
(and explain analyze seems to support this). Rewrite it so that
you have "and time > foo and time < bar" and I think you'll see a
difference. With that in mind, I think your speedier query results are
due more to having a freshly created, non-bloated index than to the fact
that it is a dual-column index.

Robert Treat






Re: Slow query performance on large table

From: Tom Lane
Date:
Robert Treat <xzilla@users.sourceforge.net> writes:
> Maybe it's just me, but I get the feeling you need to work some regular
> reindexing into your maintenance schedule.

Or at least, more vacuuming...

> Given that your query is using
> between, I don't think it would use the index on the time field anyway
> (and explain analyze seems to support this). Rewrite it so that
> you have "and time > foo and time < bar" and I think you'll see a
> difference.

No, you won't, because that's exactly what BETWEEN is.

            regards, tom lane

Re: Slow query performance on large table

From: Tomasz Myrta
Date:
Andreas Pflug wrote:

> Tom,
>
> does this mean that a primary key alone might not be enough? As far as I
> understood Paul, the PK looks just like the newly created index, so
> "create index ind_meas on measurement (assessment,time)" should perform
> the same as "... primary key(assessment,time)".
> Do possibly non-optimal indices (only assessment, only time, as Paul
> described earlier) confuse the optimizer into ignoring the better option
> of using the PK? Obviously, the index used should be a combination of
> (assessment,time), but IMHO a PK should be enough.
>
> regards,
>
> Andreas
You are right - primary key should be ok, but Paul lost it. psql \d
shows primary key indexes, but in this case there was no such primary key.
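
If the constraint is wanted back, something like this would restore it (a sketch, not from the thread; the constraint name is arbitrary):

alter table measurement
add constraint measurement_pkey primary key (assessment, "time");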

Regards,
Tomasz Myrta


Re: Slow query performance on large table

From: Tomasz Myrta
Date:
Robert Treat wrote:

> Maybe it's just me, but I get the feeling you need to work some regular
> reindexing into your maintenance schedule. Given that your query is using
> between, I don't think it would use the index on the time field anyway
> (and explain analyze seems to support this). Rewrite it so that
> you have "and time > foo and time < bar" and I think you'll see a
> difference. With that in mind, I think your speedier query results are
> due more to having a freshly created, non-bloated index than to the fact
> that it is a dual-column index.
>
> Robert Treat
Do you know something about BETWEEN that we should know?
I made some tests, and there was no noticeable difference between them:

pvwatch=# EXPLAIN analyze SELECT * from stats where hostid=1 and stp between 1
and 2;
                                                   QUERY PLAN

----------------------------------------------------------------------------------------------------------------
  Index Scan using ind_stats on stats  (cost=0.00..6.01 rows=1 width=28)
(actual time=0.00..0.00 rows=0 loops=1)
    Index Cond: ((hostid = 1) AND (stp >= 1) AND (stp <= 2))
  Total runtime: 0.00 msec
(3 rows)

pvwatch=# EXPLAIN analyze SELECT * from stats where hostid=1 and stp> 1
and stp<2;
                                                    QUERY PLAN

----------------------------------------------------------------------------------------------------------------
  Index Scan using ind_stats on stats  (cost=0.00..6.01 rows=1 width=28)
(actual time=0.00..0.00 rows=0 loops=1)
    Index Cond: ((hostid = 1) AND (stp > 1) AND (stp < 2))
  Total runtime: 0.00 msec
(3 rows)


Regards,
Tomasz Myrta


Re: Slow query performance on large table

From: Tom Lane
Date:
Andreas Pflug <Andreas.Pflug@web.de> writes:
> "create index ind_meas on measurement (assessment,time)"  should perform
> the same as "... primary key(assessment,time)".

Sure.

> Do possibly non-optimal indices (only assessment, only time, as Paul
> described earlier) confuse the optimizer into ignoring the better option
> of using the PK?

One would like to think the optimizer will make the right choice.  But
using a two-column index just because it's there isn't necessarily the
right choice.  The two-column index will certainly be bulkier and more
expensive to scan, so if there's a one-column index that's nearly as
selective, it might be a better choice.
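
One way to get a feel for that selectivity (a sketch using the pg_stats view, which ANALYZE populates):

select attname, n_distinct, correlation
from pg_stats
where tablename = 'measurement';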

            regards, tom lane

Re: Slow query performance on large table

From: Andreas Pflug
Date:
Tomasz Myrta wrote:

> You are right - primary key should be ok, but Paul lost it. psql \d
> shows primary key indexes, but in this case there was no such primary
> key.
>
> Regards,
> Tomasz Myrta
>
Ok,

then my view of the world is all right again.

Re Tom Lane

> One would like to think the optimizer will make the right choice.  But
> using a two-column index just because it's there isn't necessarily the
> right choice.  The two-column index will certainly be bulkier and more
> expensive to scan, so if there's a one-column index that's nearly as
> selective, it might be a better choice.


If I know that the access pattern of my app looks as if it will need a
multipart index, I should create it. If the optimizer finds that a
simpler one fits better, all right, it knows best (if properly
VACUUMed :-). But it's still good practice to offer complete indices.
Will pgsql use a multipart index as efficiently for simpler queries as a
shorter one covering only the first columns? In this example, the
(assessment, time) index could replace the (assessment) index, but
certainly not the (time) index. I tend to design longer indices with
hopefully valuable columns.
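
A quick way to check that first question would be something like (a hypothetical session, with only the composite index left in place):

explain
select * from measurement where assessment = 42;  -- does this still use ind_meas?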

In this context: from MSSQL I know "covering indices". Imagine a table t
with many columns, and an index on (a,b,c).
In MSSQL, SELECT c FROM t WHERE (a ... AND b ...) will use that index to
retrieve the c column value without touching the row data at all. In a
sense, the index is used as an alternative table. Does pgsql profit from
this kind of index too?

Regards,

Andreas


Re: Slow query performance on large table

From: "Paul McKay"
Date:
Hopefully you guys can help me with another query I've got that's
running slow.

This time it's across two tables I have

clearview=# \d panconversation
      Table "panconversation"
   Column    |  Type   | Modifiers
-------------+---------+-----------
 assessment  | integer | not null
 interface   | integer |
 source      | integer |
 destination | integer |
 protocol    | integer |
Indexes: idx_panconversation_destination,
         idx_panconversation_interface,
         idx_panconversation_protocol,
         idx_panconversation_source
Primary key: panconversation_pkey
Unique keys: unq_panconversation
Triggers: RI_ConstraintTrigger_52186648,
          RI_ConstraintTrigger_52186654,
          RI_ConstraintTrigger_52186660,
          RI_ConstraintTrigger_52186666

Primary key is assessment

Along with the table I was dealing with before, with the index I'd
mislaid now put back in:

clearview=# \d measurement
              Table "measurement"
   Column   |         Type          | Modifiers
------------+-----------------------+-----------
 assessment | integer               |
 time       | integer               |
 value      | character varying(50) |
Indexes: idx_measurement_assessment,
         idx_measurement_time,
         ind_measurement_ass_time

The 'explain analyse' of the query I am running is rather evil.

clearview=# explain analyse select source,value
clearview-#            from measurement, PANConversation
clearview-#            where PANConversation.assessment =
measurement.assessment
clearview-#            and Interface = 11
clearview-#            and Time > 1046184261 and Time < 1046335461
clearview-# ;
NOTICE:  QUERY PLAN:

Hash Join  (cost=1532.83..345460.73 rows=75115 width=23) (actual
time=1769.84..66687.11 rows=16094 loops=1)
  ->  Seq Scan on measurement  (cost=0.00..336706.07 rows=418859
width=15) (actual time=1280.11..59985.47 rows=455788 loops=1)
  ->  Hash  (cost=1498.21..1498.21 rows=13848 width=8) (actual
time=253.49..253.49 rows=0 loops=1)
        ->  Seq Scan on panconversation  (cost=0.00..1498.21 rows=13848
width=8) (actual time=15.64..223.18 rows=13475 loops=1)
Total runtime: 66694.82 msec

EXPLAIN

Can anybody shed any light on why the indexes I created aren't being
used, and why I have these nasty sequential scans?

Thanks in advance,

Paul.
======================================
Paul Mckay
Consultant Partner
Servicing Division
Clearwater-IT
e:paul_mckay@clearwater-it.co.uk
t:0161 877 6090
m: 07713 510946
======================================

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Tomasz
Myrta
Sent: 04 March 2003 17:21
To: Andreas Pflug
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow query performance on large table

Andreas Pflug wrote:

> Tom,
>
> does this mean that a primary key alone might not be enough? As far as I
> understood Paul, the PK looks just like the newly created index, so
> "create index ind_meas on measurement (assessment,time)" should perform
> the same as "... primary key(assessment,time)".
> Do possibly non-optimal indices (only assessment, only time, as Paul
> described earlier) confuse the optimizer into ignoring the better option
> of using the PK? Obviously, the index used should be a combination of
> (assessment,time), but IMHO a PK should be enough.
>
> regards,
>
> Andreas
You are right - primary key should be ok, but Paul lost it. psql \d
shows primary key indexes, but in this case there was no such primary
key.

Regards,
Tomasz Myrta




Re: Slow query performance on large table

From: Tomasz Myrta
Date:
Paul McKay wrote:
> Hopefully you guys can help me with another query I've got that's
> running slow.
>
> This time it's across two tables I have
>
> clearview=# \d panconversation
>       Table "panconversation"
>    Column    |  Type   | Modifiers
> -------------+---------+-----------
>  assessment  | integer | not null
>  interface   | integer |
>  source      | integer |
>  destination | integer |
>  protocol    | integer |
> Indexes: idx_panconversation_destination,
>          idx_panconversation_interface,
>          idx_panconversation_protocol,
>          idx_panconversation_source
> Primary key: panconversation_pkey
> Unique keys: unq_panconversation
> Triggers: RI_ConstraintTrigger_52186648,
>           RI_ConstraintTrigger_52186654,
>           RI_ConstraintTrigger_52186660,
>           RI_ConstraintTrigger_52186666
>
> Primary key is assessment
>
> Along with the table I was dealing with before, with the index I'd
> mislaid put back in
>
> clearview=# \d measurement
>               Table "measurement"
>    Column   |         Type          | Modifiers
> ------------+-----------------------+-----------
>  assessment | integer               |
>  time       | integer               |
>  value      | character varying(50) |
> Indexes: idx_measurement_assessment,
>          idx_measurement_time,
>          ind_measurement_ass_time
>
> The 'explain analyse' of the query I am running is rather evil.
>
> clearview=# explain analyse select source,value
> clearview-#            from measurement, PANConversation
> clearview-#            where PANConversation.assessment =
> measurement.assessment
> clearview-#            and Interface = 11
> clearview-#            and Time > 1046184261 and Time < 1046335461
> clearview-# ;
> NOTICE:  QUERY PLAN:
>
> Hash Join  (cost=1532.83..345460.73 rows=75115 width=23) (actual
> time=1769.84..66687.11 rows=16094 loops=1)
>   ->  Seq Scan on measurement  (cost=0.00..336706.07 rows=418859
> width=15) (actual time=1280.11..59985.47 rows=455788 loops=1)
>   ->  Hash  (cost=1498.21..1498.21 rows=13848 width=8) (actual
> time=253.49..253.49 rows=0 loops=1)
>         ->  Seq Scan on panconversation  (cost=0.00..1498.21 rows=13848
> width=8) (actual time=15.64..223.18 rows=13475 loops=1)
> Total runtime: 66694.82 msec
>
> EXPLAIN
>
> Can anybody shed any light on why the indexes I created aren't being
> used, and why I have these nasty sequential scans?

Measurement is sequentially scanned, probably because the join against
the "interface = 11" conversations matches a lot of records.

Please check how many rows you have:
- all rows in measurement/panconversation,
- rows in panconversation with interface = 11,
- rows in measurement within your time range.

Regards,
Tomasz Myrta



Re: Slow query performance on large table

From: "Paul McKay"
Date:

clearview=# select count(*) from measurement;
  count
----------
 15302138
(1 row)

clearview=# select count(*) from panconversation;
 count
-------
 77217
(1 row)

clearview=# select count(*) from panconversation where interface = 11;
 count
-------
 13475
(1 row)

clearview=# select count(*) from measurement where time > 1046184261 and
time < 1046335461;
 count
--------
 455788
(1 row)

======================================
Paul Mckay
Consultant Partner
Servicing Division
Clearwater-IT
e:paul_mckay@clearwater-it.co.uk
t:0161 877 6090
m: 07713 510946
======================================

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Tomasz
Myrta
Sent: 05 March 2003 10:05
To: Paul McKay
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow query performance on large table

Paul McKay wrote:
> Hopefully you guys can help me with another query I've got that's
> running slow.
>
> This time it's across two tables I have
>
> clearview=# \d panconversation
>       Table "panconversation"
>    Column    |  Type   | Modifiers
> -------------+---------+-----------
>  assessment  | integer | not null
>  interface   | integer |
>  source      | integer |
>  destination | integer |
>  protocol    | integer |
> Indexes: idx_panconversation_destination,
>          idx_panconversation_interface,
>          idx_panconversation_protocol,
>          idx_panconversation_source
> Primary key: panconversation_pkey
> Unique keys: unq_panconversation
> Triggers: RI_ConstraintTrigger_52186648,
>           RI_ConstraintTrigger_52186654,
>           RI_ConstraintTrigger_52186660,
>           RI_ConstraintTrigger_52186666
>
> Primary key is assessment
>
> Along with the table I was dealing with before, with the index I'd
> mislaid put back in
>
> clearview=# \d measurement
>               Table "measurement"
>    Column   |         Type          | Modifiers
> ------------+-----------------------+-----------
>  assessment | integer               |
>  time       | integer               |
>  value      | character varying(50) |
> Indexes: idx_measurement_assessment,
>          idx_measurement_time,
>          ind_measurement_ass_time
>
> The 'explain analyse' of the query I am running is rather evil.
>
> clearview=# explain analyse select source,value
> clearview-#            from measurement, PANConversation
> clearview-#            where PANConversation.assessment =
> measurement.assessment
> clearview-#            and Interface = 11
> clearview-#            and Time > 1046184261 and Time < 1046335461
> clearview-# ;
> NOTICE:  QUERY PLAN:
>
> Hash Join  (cost=1532.83..345460.73 rows=75115 width=23) (actual
> time=1769.84..66687.11 rows=16094 loops=1)
>   ->  Seq Scan on measurement  (cost=0.00..336706.07 rows=418859
> width=15) (actual time=1280.11..59985.47 rows=455788 loops=1)
>   ->  Hash  (cost=1498.21..1498.21 rows=13848 width=8) (actual
> time=253.49..253.49 rows=0 loops=1)
>   ->  Seq Scan on panconversation  (cost=0.00..1498.21 rows=13848
> width=8) (actual time=15.64..223.18 rows=13475 loops=1)
> Total runtime: 66694.82 msec
>
> EXPLAIN
>
> Can anybody shed any light on why the indexes I created aren't being
> used, and why I have these nasty sequential scans?

Measurement is sequentially scanned, probably because the join against
the "interface = 11" conversations matches a lot of records.

Please check how many rows you have:
- all rows in measurement/panconversation,
- rows in panconversation with interface = 11,
- rows in measurement within your time range.

Regards,
Tomasz Myrta





Re: Slow query performance on large table

From: Manfred Koizar
Date:
On Wed, 5 Mar 2003 09:47:51 -0000, "Paul McKay"
<paul_mckay@clearwater-it.co.uk> wrote:
>Hash Join  (cost=1532.83..345460.73 rows=75115 width=23) (actual
>time=1769.84..66687.11 rows=16094 loops=1)
>  ->  Seq Scan on measurement  (cost=0.00..336706.07 rows=418859
>width=15) (actual time=1280.11..59985.47 rows=455788 loops=1)
>  ->  Hash  (cost=1498.21..1498.21 rows=13848 width=8) (actual
>time=253.49..253.49 rows=0 loops=1)
>        ->  Seq Scan on panconversation  (cost=0.00..1498.21 rows=13848
>width=8) (actual time=15.64..223.18 rows=13475 loops=1)
>Total runtime: 66694.82 msec

|clearview=# select count(*) from measurement;
| 15302138
|clearview=# select count(*) from panconversation;
| 77217
Paul,

you seem to have a lot of dead tuples in your tables.

    VACUUM FULL VERBOSE ANALYZE panconversation;
    VACUUM FULL VERBOSE ANALYZE measurement;

This should cut your query time to ca. one third.  If you could
migrate to 7.3 and create your tables WITHOUT OIDS, I'd expect a
further speed increase of ~ 15%.
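
For the latter, a sketch of the table definition after a dump and reload into 7.3 (the WITHOUT OIDS clause is the only change):

create table measurement (
    assessment integer,
    "time"     integer,
    value      varchar(50)
) without oids;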

Servus
 Manfred