Обсуждение: join-performance problem

Поиск
Список
Период
Сортировка

join-performance problem

От
Wolfgang.Fuertbauer@ebewe.com
Дата:
Hi,

I have problem using joins: there are indexes which could be used, but
sequential-scan takes place;
see:

>explain SELECT b.Kdnr, date_part('year', b.Datum), d.Name, e.Name, a.Menge
,
>     a.Rabproz, a.Netto - (a.netto * a.rabproz / 100),
>     a.Netto * b.kurs - ((a.netto * b.kurs) * a.rabproz / 100), 'RG'
>FROM Faktzeilen a, Fakt b, Artikel c, Kollektion d, ArGruppen e
>where a.Fanr = b.nr
>     and c.nr = a.Arnr
>     and c.Kollektion = d.Nr
>     and (c.Gruppe = e.nr or c.gruppe = 0)
>     and b.kdnr = 49736;

>NOTICE:  QUERY PLAN:996" 9L, 376C written
>
>Nested Loop  (cost=17.08..265.50 rows=30 width=152)
>  ->  Nested Loop  (cost=17.08..90.50 rows=5 width=115)
>        ->  Nested Loop  (cost=17.08..66.32 rows=5 width=78)
>              ->  Hash Join  (cost=17.08..42.14 rows=5 width=51)
>                    ->  Seq Scan on faktzeilen a  (cost=0.00..20.00
rows=1000 width=35)
>                    ->  Hash  (cost=17.07..17.07 rows=5 width=16)
>                          ->  Index Scan using fakt_tbkundentbfakt_key on
fakt b  (cost=0.00..17.07 rows=5 width=16)
>              ->  Index Scan using artikel_pkey on artikel c
(cost=0.00..4.82 rows=1 width=27)
>        ->  Index Scan using kollektion_pkey on kollektion d
(cost=0.00..4.82 rows=1 width=37)
>  ->  Seq Scan on argruppen e  (cost=0.00..20.00 rows=1000 width=37)
>
>EXPLAIN

my problem is the sequential scan for tabel faktzeilen (which has 250000
rows);
the following indexes exist:

primary index: fanr, znr
index2: fanr
index3: fanr, arnr, arname, pknr

can one tell me why index2 is not used for the join?

Thanks in advance

Wolfgang



Re: join-performance problem

От
Stephan Szabo
Дата:
On Tue, 30 Apr 2002 Wolfgang.Fuertbauer@ebewe.com wrote:

> Hi,
>
> I have problem using joins: there are indexes which could be used, but
> sequential-scan takes place;

Have you run vacuum analyze on the database? Some of those statistics
down there look suspiciously like the default statistics (1000 rows in
faktzeilen)
> see:
>
> >explain SELECT b.Kdnr, date_part('year', b.Datum), d.Name, e.Name, a.Menge
> ,
> >     a.Rabproz, a.Netto - (a.netto * a.rabproz / 100),
> >     a.Netto * b.kurs - ((a.netto * b.kurs) * a.rabproz / 100), 'RG'
> >FROM Faktzeilen a, Fakt b, Artikel c, Kollektion d, ArGruppen e
> >where a.Fanr = b.nr
> >     and c.nr = a.Arnr
> >     and c.Kollektion = d.Nr
> >     and (c.Gruppe = e.nr or c.gruppe = 0)
> >     and b.kdnr = 49736;
>
> >NOTICE:  QUERY PLAN:996" 9L, 376C written
> >
> >Nested Loop  (cost=17.08..265.50 rows=30 width=152)
> >  ->  Nested Loop  (cost=17.08..90.50 rows=5 width=115)
> >        ->  Nested Loop  (cost=17.08..66.32 rows=5 width=78)
> >              ->  Hash Join  (cost=17.08..42.14 rows=5 width=51)
> >                    ->  Seq Scan on faktzeilen a  (cost=0.00..20.00
> rows=1000 width=35)
> >                    ->  Hash  (cost=17.07..17.07 rows=5 width=16)
> >                          ->  Index Scan using fakt_tbkundentbfakt_key on
> fakt b  (cost=0.00..17.07 rows=5 width=16)
> >              ->  Index Scan using artikel_pkey on artikel c
> (cost=0.00..4.82 rows=1 width=27)
> >        ->  Index Scan using kollektion_pkey on kollektion d
> (cost=0.00..4.82 rows=1 width=37)
> >  ->  Seq Scan on argruppen e  (cost=0.00..20.00 rows=1000 width=37)
> >
> >EXPLAIN
>
> my problem is the sequential scan for tabel faktzeilen (which has 250000
> rows);
> the following indexes exist:
>
> primary index: fanr, znr
> index2: fanr
> index3: fanr, arnr, arname, pknr
>
> can one tell me why index2 is not used for the join?
>
> Thanks in advance
>
> Wolfgang
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>



Re: join-performance problem

От
Tom Lane
Дата:
Wolfgang.Fuertbauer@ebewe.com writes:
>> explain SELECT b.Kdnr, date_part('year', b.Datum), d.Name, e.Name, a.Menge,
>> a.Rabproz, a.Netto - (a.netto * a.rabproz / 100),
>> a.Netto * b.kurs - ((a.netto * b.kurs) * a.rabproz / 100), 'RG'
>> FROM Faktzeilen a, Fakt b, Artikel c, Kollektion d, ArGruppen e
>> where a.Fanr = b.nr
>> and c.nr = a.Arnr
>> and c.Kollektion = d.Nr
>> and (c.Gruppe = e.nr or c.gruppe = 0)
>> and b.kdnr = 49736;

Anyplace that c.gruppe is 0, this is an unconstrained join to e --- ie,
you'll get a row out for *every* row of e.  Somehow I doubt that's the
behavior you really want.

I concur with Stephan's observation that you haven't analyzed.  But
even when you have, this query doesn't give much traction for the use
of indexes on a --- the only constraint that might possibly be used to
avoid a complete scan of a is the "b.kdnr = 49736", and that's not even
on a.  The only hope I can see is if you create an index on b.kdnr;
then (if there aren't very many rows matching b.kdnr = 49736), it might
be able to pick those up with an indexscan on b and then do an inner
indexscan join to a using a.Fanr = b.nr.  Your secondary indexes on a
look like wastes of space (at least for this query).
        regards, tom lane


Re: join-performance problem

От
Wolfgang.Fuertbauer@ebewe.com
Дата:
On 30.04.2002 18:44:19 pgsql-sql-owner wrote:
>Wolfgang.Fuertbauer@ebewe.com writes:
>>> explain SELECT b.Kdnr, date_part('year', b.Datum), d.Name, e.Name,
a.Menge,
>>> a.Rabproz, a.Netto - (a.netto * a.rabproz / 100),
>>> a.Netto * b.kurs - ((a.netto * b.kurs) * a.rabproz / 100), 'RG'
>>> FROM Faktzeilen a, Fakt b, Artikel c, Kollektion d, ArGruppen e
>>> where a.Fanr = b.nr
>>> and c.nr = a.Arnr
>>> and c.Kollektion = d.Nr
>>> and (c.Gruppe = e.nr or c.gruppe = 0)
>>> and b.kdnr = 49736;
>
>Anyplace that c.gruppe is 0, this is an unconstrained join to e --- ie,
>you'll get a row out for *every* row of e.  Somehow I doubt that's the
>behavior you really want.

you're absolutly right! I fixed that (by having an record in c with nr 0 -
what
is ok for the application)

>I concur with Stephan's observation that you haven't analyzed.  But
>even when you have, this query doesn't give much traction for the use
>of indexes on a --- the only constraint that might possibly be used to
>avoid a complete scan of a is the "b.kdnr = 49736", and that's not even
>on a.  The only hope I can see is if you create an index on b.kdnr;
>then (if there aren't very many rows matching b.kdnr = 49736), it might
>be able to pick those up with an indexscan on b and then do an inner
>indexscan join to a using a.Fanr = b.nr.

OK;
created an index on b.kdnr (fakt_kunde_key) and here is the analyses:

Hash Join  (cost=482.82..704.62 rows=80 width=93) ->  Hash Join  (cost=481.78..702.17 rows=80 width=79)       ->  Hash
Join (cost=480.44..699.43 rows=80 width=59)             ->  Seq Scan on artikel c  (cost=0.00..155.77 rows=4977
 
width=18)             ->  Hash  (cost=480.24..480.24 rows=80 width=41)                   ->  Nested Loop
(cost=0.00..480.24rows=80 width=41)                         ->  Index Scan using fakt_kunde_key on fakt b
 
(cost=0.00..55.11 rows=13 width=16)                         ->  Index Scan using faktzeilen_pkey on
faktzeilen a  (cost=0.00..31.82 rows=10 width=25)       ->  Hash  (cost=1.27..1.27 rows=27 width=20)             ->
SeqScan on argruppen e  (cost=0.00..1.27 rows=27
 
width=20) ->  Hash  (cost=1.03..1.03 rows=3 width=14)       ->  Seq Scan on kollektion d  (cost=0.00..1.03 rows=3
width=14)

EXPLAIN

can you pleas explain me why c is now scanned sequential?

>Your secondary indexes on a
>look like wastes of space (at least for this query).

they are relevant for other queries; this is only a part from a *large*
ordering / invoicing-system I want to port from Access.

Thanks and Best regards
Wolfgang



Re: join-performance problem

От
Tom Lane
Дата:
Wolfgang.Fuertbauer@ebewe.com writes:
> Hash Join  (cost=482.82..704.62 rows=80 width=93)
>   ->  Hash Join  (cost=481.78..702.17 rows=80 width=79)
>         ->  Hash Join  (cost=480.44..699.43 rows=80 width=59)
>               ->  Seq Scan on artikel c  (cost=0.00..155.77 rows=4977
> width=18)
>               ->  Hash  (cost=480.24..480.24 rows=80 width=41)
>                     ->  Nested Loop  (cost=0.00..480.24 rows=80 width=41)
>                           ->  Index Scan using fakt_kunde_key on fakt b
> (cost=0.00..55.11 rows=13 width=16)
>                           ->  Index Scan using faktzeilen_pkey on
> faktzeilen a  (cost=0.00..31.82 rows=10 width=25)
>         ->  Hash  (cost=1.27..1.27 rows=27 width=20)
>               ->  Seq Scan on argruppen e  (cost=0.00..1.27 rows=27
> width=20)
>   ->  Hash  (cost=1.03..1.03 rows=3 width=14)
>         ->  Seq Scan on kollektion d  (cost=0.00..1.03 rows=3 width=14)

> can you pleas explain me why c is now scanned sequential?

Looks like a reasonable choice to me.  80 indexscan probes would
probably cost more than the one seqscan, given that c is so small.

The alternative of putting c on the inside of the hashjoin won't
be chosen as long as c is the larger relation...
        regards, tom lane