Обсуждение: join-performance problem
Hi, I have problem using joins: there are indexes which could be used, but sequential-scan takes place; see: >explain SELECT b.Kdnr, date_part('year', b.Datum), d.Name, e.Name, a.Menge , > a.Rabproz, a.Netto - (a.netto * a.rabproz / 100), > a.Netto * b.kurs - ((a.netto * b.kurs) * a.rabproz / 100), 'RG' >FROM Faktzeilen a, Fakt b, Artikel c, Kollektion d, ArGruppen e >where a.Fanr = b.nr > and c.nr = a.Arnr > and c.Kollektion = d.Nr > and (c.Gruppe = e.nr or c.gruppe = 0) > and b.kdnr = 49736; >NOTICE: QUERY PLAN:996" 9L, 376C written > >Nested Loop (cost=17.08..265.50 rows=30 width=152) > -> Nested Loop (cost=17.08..90.50 rows=5 width=115) > -> Nested Loop (cost=17.08..66.32 rows=5 width=78) > -> Hash Join (cost=17.08..42.14 rows=5 width=51) > -> Seq Scan on faktzeilen a (cost=0.00..20.00 rows=1000 width=35) > -> Hash (cost=17.07..17.07 rows=5 width=16) > -> Index Scan using fakt_tbkundentbfakt_key on fakt b (cost=0.00..17.07 rows=5 width=16) > -> Index Scan using artikel_pkey on artikel c (cost=0.00..4.82 rows=1 width=27) > -> Index Scan using kollektion_pkey on kollektion d (cost=0.00..4.82 rows=1 width=37) > -> Seq Scan on argruppen e (cost=0.00..20.00 rows=1000 width=37) > >EXPLAIN my problem is the sequential scan for tabel faktzeilen (which has 250000 rows); the following indexes exist: primary index: fanr, znr index2: fanr index3: fanr, arnr, arname, pknr can one tell me why index2 is not used for the join? Thanks in advance Wolfgang
On Tue, 30 Apr 2002 Wolfgang.Fuertbauer@ebewe.com wrote: > Hi, > > I have problem using joins: there are indexes which could be used, but > sequential-scan takes place; Have you run vacuum analyze on the database? Some of those statistics down there look suspiciously like the default statistics (1000 rows in faktzeilen) > see: > > >explain SELECT b.Kdnr, date_part('year', b.Datum), d.Name, e.Name, a.Menge > , > > a.Rabproz, a.Netto - (a.netto * a.rabproz / 100), > > a.Netto * b.kurs - ((a.netto * b.kurs) * a.rabproz / 100), 'RG' > >FROM Faktzeilen a, Fakt b, Artikel c, Kollektion d, ArGruppen e > >where a.Fanr = b.nr > > and c.nr = a.Arnr > > and c.Kollektion = d.Nr > > and (c.Gruppe = e.nr or c.gruppe = 0) > > and b.kdnr = 49736; > > >NOTICE: QUERY PLAN:996" 9L, 376C written > > > >Nested Loop (cost=17.08..265.50 rows=30 width=152) > > -> Nested Loop (cost=17.08..90.50 rows=5 width=115) > > -> Nested Loop (cost=17.08..66.32 rows=5 width=78) > > -> Hash Join (cost=17.08..42.14 rows=5 width=51) > > -> Seq Scan on faktzeilen a (cost=0.00..20.00 > rows=1000 width=35) > > -> Hash (cost=17.07..17.07 rows=5 width=16) > > -> Index Scan using fakt_tbkundentbfakt_key on > fakt b (cost=0.00..17.07 rows=5 width=16) > > -> Index Scan using artikel_pkey on artikel c > (cost=0.00..4.82 rows=1 width=27) > > -> Index Scan using kollektion_pkey on kollektion d > (cost=0.00..4.82 rows=1 width=37) > > -> Seq Scan on argruppen e (cost=0.00..20.00 rows=1000 width=37) > > > >EXPLAIN > > my problem is the sequential scan for tabel faktzeilen (which has 250000 > rows); > the following indexes exist: > > primary index: fanr, znr > index2: fanr > index3: fanr, arnr, arname, pknr > > can one tell me why index2 is not used for the join? > > Thanks in advance > > Wolfgang > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster >
Wolfgang.Fuertbauer@ebewe.com writes: >> explain SELECT b.Kdnr, date_part('year', b.Datum), d.Name, e.Name, a.Menge, >> a.Rabproz, a.Netto - (a.netto * a.rabproz / 100), >> a.Netto * b.kurs - ((a.netto * b.kurs) * a.rabproz / 100), 'RG' >> FROM Faktzeilen a, Fakt b, Artikel c, Kollektion d, ArGruppen e >> where a.Fanr = b.nr >> and c.nr = a.Arnr >> and c.Kollektion = d.Nr >> and (c.Gruppe = e.nr or c.gruppe = 0) >> and b.kdnr = 49736; Anyplace that c.gruppe is 0, this is an unconstrained join to e --- ie, you'll get a row out for *every* row of e. Somehow I doubt that's the behavior you really want. I concur with Stephan's observation that you haven't analyzed. But even when you have, this query doesn't give much traction for the use of indexes on a --- the only constraint that might possibly be used to avoid a complete scan of a is the "b.kdnr = 49736", and that's not even on a. The only hope I can see is if you create an index on b.kdnr; then (if there aren't very many rows matching b.kdnr = 49736), it might be able to pick those up with an indexscan on b and then do an inner indexscan join to a using a.Fanr = b.nr. Your secondary indexes on a look like wastes of space (at least for this query). regards, tom lane
On 30.04.2002 18:44:19 pgsql-sql-owner wrote: >Wolfgang.Fuertbauer@ebewe.com writes: >>> explain SELECT b.Kdnr, date_part('year', b.Datum), d.Name, e.Name, a.Menge, >>> a.Rabproz, a.Netto - (a.netto * a.rabproz / 100), >>> a.Netto * b.kurs - ((a.netto * b.kurs) * a.rabproz / 100), 'RG' >>> FROM Faktzeilen a, Fakt b, Artikel c, Kollektion d, ArGruppen e >>> where a.Fanr = b.nr >>> and c.nr = a.Arnr >>> and c.Kollektion = d.Nr >>> and (c.Gruppe = e.nr or c.gruppe = 0) >>> and b.kdnr = 49736; > >Anyplace that c.gruppe is 0, this is an unconstrained join to e --- ie, >you'll get a row out for *every* row of e. Somehow I doubt that's the >behavior you really want. you're absolutly right! I fixed that (by having an record in c with nr 0 - what is ok for the application) >I concur with Stephan's observation that you haven't analyzed. But >even when you have, this query doesn't give much traction for the use >of indexes on a --- the only constraint that might possibly be used to >avoid a complete scan of a is the "b.kdnr = 49736", and that's not even >on a. The only hope I can see is if you create an index on b.kdnr; >then (if there aren't very many rows matching b.kdnr = 49736), it might >be able to pick those up with an indexscan on b and then do an inner >indexscan join to a using a.Fanr = b.nr. OK; created an index on b.kdnr (fakt_kunde_key) and here is the analyses: Hash Join (cost=482.82..704.62 rows=80 width=93) -> Hash Join (cost=481.78..702.17 rows=80 width=79) -> Hash Join (cost=480.44..699.43 rows=80 width=59) -> Seq Scan on artikel c (cost=0.00..155.77 rows=4977 width=18) -> Hash (cost=480.24..480.24 rows=80 width=41) -> Nested Loop (cost=0.00..480.24rows=80 width=41) -> Index Scan using fakt_kunde_key on fakt b (cost=0.00..55.11 rows=13 width=16) -> Index Scan using faktzeilen_pkey on faktzeilen a (cost=0.00..31.82 rows=10 width=25) -> Hash (cost=1.27..1.27 rows=27 width=20) -> SeqScan on argruppen e (cost=0.00..1.27 rows=27 width=20) -> Hash (cost=1.03..1.03 rows=3 width=14) -> Seq Scan on kollektion d (cost=0.00..1.03 rows=3 width=14) EXPLAIN can you pleas explain me why c is now scanned sequential? >Your secondary indexes on a >look like wastes of space (at least for this query). they are relevant for other queries; this is only a part from a *large* ordering / invoicing-system I want to port from Access. Thanks and Best regards Wolfgang
Wolfgang.Fuertbauer@ebewe.com writes: > Hash Join (cost=482.82..704.62 rows=80 width=93) > -> Hash Join (cost=481.78..702.17 rows=80 width=79) > -> Hash Join (cost=480.44..699.43 rows=80 width=59) > -> Seq Scan on artikel c (cost=0.00..155.77 rows=4977 > width=18) > -> Hash (cost=480.24..480.24 rows=80 width=41) > -> Nested Loop (cost=0.00..480.24 rows=80 width=41) > -> Index Scan using fakt_kunde_key on fakt b > (cost=0.00..55.11 rows=13 width=16) > -> Index Scan using faktzeilen_pkey on > faktzeilen a (cost=0.00..31.82 rows=10 width=25) > -> Hash (cost=1.27..1.27 rows=27 width=20) > -> Seq Scan on argruppen e (cost=0.00..1.27 rows=27 > width=20) > -> Hash (cost=1.03..1.03 rows=3 width=14) > -> Seq Scan on kollektion d (cost=0.00..1.03 rows=3 width=14) > can you pleas explain me why c is now scanned sequential? Looks like a reasonable choice to me. 80 indexscan probes would probably cost more than the one seqscan, given that c is so small. The alternative of putting c on the inside of the hashjoin won't be chosen as long as c is the larger relation... regards, tom lane