Обсуждение: Improving performance of a query

Поиск
Список
Период
Сортировка

Improving performance of a query

От
Carlos Benkendorf
Дата:
Hi,
 
Is there a way to improve the performance of the following query?
 
SELECT * FROM SSIRRA where
(YEAR = 2004 and CUSTOMER = 0000000004 and CODE = 00 and PART >= 00) or
(YEAR = 2004 and CUSTOMER = 0000000004 and CODE > 00) or
(YEAR = 2004 and CUSTOMER > 0000000004) or
(YEAR > 2004)
 
Thanks in advance!
 
Benkendorf

__________________________________________________
Converse com seus amigos em tempo real com o Yahoo! Messenger
http://br.download.yahoo.com/messenger/

Re: Improving performance of a query

От
Michael Fuhr
Дата:
On Sat, Sep 03, 2005 at 09:02:27PM +0000, Carlos Benkendorf wrote:
> Is there a way to improve the performance of the following query?
>
> SELECT * FROM SSIRRA where
> (YEAR = 2004 and CUSTOMER = 0000000004 and CODE = 00 and PART >= 00) or
> (YEAR = 2004 and CUSTOMER = 0000000004 and CODE > 00) or
> (YEAR = 2004 and CUSTOMER > 0000000004) or
> (YEAR > 2004)

Could you post the EXPLAIN ANALYZE output of the query?  It might
also be useful to see the EXPLAIN ANALYZE output for each of those
WHERE conditions individually.  Also, what are the table and index
definitions?  How many rows are in the table?  What version of
PostgreSQL are you using?  Do you vacuum and analyze regularly?

In simple tests in 8.0.3 with random data -- which almost certainly
has a different distribution than yours -- I see about a 10%
improvement with a multi-column index on (year, customer, code,
part) over using single-column indexes on each of those columns.
Various multi-column indexes on two or three of the columns gave
worse performance than single-column indexes.  Your results will
probably vary, however.

In my tests, 8.1beta1 with its bitmap scans was about 20-25% faster
than 8.0.3 with single-column indexes and about 35% faster with a
four-column index.  8.1beta1's use of a four-column index was about
45% faster than 8.0.3's use of single-column indexes.  Don't trust
these numbers too much, though -- I simply inserted 20,000 random
records into a table and ran the above query.

--
Michael Fuhr

Re: Improving performance of a query

От
"Merlin Moncure"
Дата:
Carlos wrote:
SELECT * FROM SSIRRA where
(YEAR = 2004 and CUSTOMER = 0000000004 and CODE = 00 and PART >= 00) or
(YEAR = 2004 and CUSTOMER = 0000000004 and CODE > 00) or
(YEAR = 2004 and CUSTOMER > 0000000004) or
(YEAR > 2004)
[snip]

ah, the positional query.  You can always rewrite this query in the
following form:

(YEAR >= 2004) and
(YEAR = 2004 or CUSTOMER >= 0000000004) and
(YEAR = 2004 or CUSTOMER = 0000000004 or CODE >= 00) and
(YEAR = 2004 or CUSTOMER = 0000000004 or CODE = 00 or PART > 00)

This is better because it will index scan using 'year' (not customer or
part though).  The true answer is to lobby for/develop proper row
constructor support so you can just

SELECT * FROM SSIRRA where (YEAR, CUSTOMER, CODE, PART) > (2004,
0000000004, 00, 00)

this is designed to do what you are trying to do but currently doesn't
work quite right.

note: in all these queries, 'order by YEAR, CUSTOMER, CODE, PART' should
probably be on the query.

Other solution: use cursor/fetch or some type of materialized solution.

Merlin

Re: Improving performance of a query

От
Stephan Szabo
Дата:
On Tue, 6 Sep 2005, Merlin Moncure wrote:

> Carlos wrote:
> SELECT * FROM SSIRRA where
> (YEAR = 2004 and CUSTOMER = 0000000004 and CODE = 00 and PART >= 00) or
> (YEAR = 2004 and CUSTOMER = 0000000004 and CODE > 00) or
> (YEAR = 2004 and CUSTOMER > 0000000004) or
> (YEAR > 2004)
> [snip]
>
> ah, the positional query.  You can always rewrite this query in the
> following form:
>
> (YEAR >= 2004) and
> (YEAR = 2004 or CUSTOMER >= 0000000004) and
> (YEAR = 2004 or CUSTOMER = 0000000004 or CODE >= 00) and
> (YEAR = 2004 or CUSTOMER = 0000000004 or CODE = 00 or PART > 00)

Unless I'm not seeing something, I don't think that's a correct
reformulation in general. If customer < 4 and year > 2004 the original
clause would return true but the reformulation would return false since
(year=2004 or customer >= 4) would be false.

Re: Improving performance of a query

От
"Merlin Moncure"
Дата:
> > Carlos wrote:
> > SELECT * FROM SSIRRA where
> > (YEAR = 2004 and CUSTOMER = 0000000004 and CODE = 00 and PART >= 00)
or
> > (YEAR = 2004 and CUSTOMER = 0000000004 and CODE > 00) or
> > (YEAR = 2004 and CUSTOMER > 0000000004) or
> > (YEAR > 2004)
> > [snip]
> >
> > ah, the positional query.  You can always rewrite this query in the
> > following form:
> >
> > (YEAR >= 2004) and
> > (YEAR = 2004 or CUSTOMER >= 0000000004) and
> > (YEAR = 2004 or CUSTOMER = 0000000004 or CODE >= 00) and
> > (YEAR = 2004 or CUSTOMER = 0000000004 or CODE = 00 or PART > 00)
>
> Unless I'm not seeing something, I don't think that's a correct
> reformulation in general. If customer < 4 and year > 2004 the original
> clause would return true but the reformulation would return false
since
> (year=2004 or customer >= 4) would be false.

You are correct, you also have to exchange '=' with '>' to exchange
'and' with 'or'.

Correct answer is:
> > (YEAR >= 2004) and
> > (YEAR > 2004 or CUSTOMER >= 0000000004) and
> > (YEAR > 2004 or CUSTOMER > 0000000004 or CODE >= 00) and
> > (YEAR > 2004 or CUSTOMER > 0000000004 or CODE > 00 or PART > 00)

It's easy to get tripped up here: the basic problem is how to get the
next record based on a multi part key.  My ISAM bridge can write them
either way but the 'and' major form is always faster ;).

MErlin