Обсуждение: Query Plan far worse in 7.3.2 than 7.2.1

Поиск
Список
Период
Сортировка

Query Plan far worse in 7.3.2 than 7.2.1

От
"SZŰCS Gábor"
Дата:
Dear Gurus,

A nasty query and its EXPLAINs are here. Read on at your own risk :)

ABSTRACT:

Search for the strings "looping index scan" and "loops=2310" in this
message.

DETAILS:

Yesteray, I spent two hours to optimize a view in postgresql 7.2.1. My
problem was that one of the index scans executed 2358 times, which is (as
far as I can consider) equal to 2x1179, where 2 is the #rows in a subquery,
and 1179 is the total #rows in the table of the index scan ("arfolyam").

Finally I managed to put it as deep in the query as possible, to reduce the
loops to the number of query result rows (12)

However, when I tried the same query in 7.3.2, it first complained about a
missing FROM clause and a missing GROUP BY for a field. I managed to
eliminate both without affecting 7.2.1 performance, but in 7.3.2, there are
still those 2300+ loops of the index scan.

I ask your kind help, explanation or references to documented similar cases.

Below is the query and the two explains. Please forgive the raw format.
If you need further info (such as table defs), I'd be glad to help.

Note that the two databases are not exactly the same but very similar (they
where the same 2 weeks ago, but data changes occured in both, independently)
and I don't think these differences should affect the planner.

Yours,
G.
--
while (!asleep()) sheep++;

---------------------------- QUERY -----------------------------------
SELECT *,
    kerekit_penznem (netto_ertek*(afa_szazalek/100), penznem) AS afa_ertek,
    kerekit_penznem (netto_ertek*(1+afa_szazalek/100), penznem) AS
brutto_ertek
FROM (
  SELECT szamla,
    kerekit_penznem(
      elsodl_netto_ertek*COALESCE(deka,1)/COALESCE(dekb,1), penznem)
      AS netto_ertek,
    konyvelesi_tetelcsoport, afa, afa_szazalek, penznem
  FROM
    (SELECT szt.szamla,
 kerekit_penznem(sum(szt.netto_egysegar * szt.mennyiseg), penznem)
   AS elsodl_netto_ertek,
 konyvelesi_tetelcsoport(szt.szamla, szt.tetelszam)
   AS konyvelesi_tetelcsoport,
 szt.afa, afa.ertek AS afa_szazalek,
 szamla.penznem AS sz_penznem,
 szamla.teljesites AS sz_teljesites,
 arf_a.deviza_kozeparfolyam as deka,
 arf_b.deviza_kozeparfolyam as dekb,
 foo_valuta AS penznem
     FROM szamla_tetele szt
 LEFT JOIN szamla ON (szamla = szamla.az)
 LEFT JOIN afa ON (afa.az = szt.afa)
 LEFT JOIN arfolyam arf_a
 ON (arf_a.ervenyes =
     (SELECT ervenyes FROM arfolyam
      WHERE ervenyes<=szamla.teljesites AND valuta = szamla.penznem
      ORDER BY 1 DESC LIMIT 1)
     AND szamla.penznem=arf_a.valuta)
 JOIN
 (SELECT az AS foo_valuta FROM valuta) AS valuta ON (true)
 LEFT JOIN arfolyam arf_b
 ON (arf_b.valuta=foo_valuta AND
     arf_b.ervenyes =
-- this is the looping index scan --
     (SELECT ervenyes FROM arfolyam
      WHERE ervenyes<=szamla.teljesites AND valuta = foo_valuta
      ORDER BY 1 DESC LIMIT 1)
-- end of looping index scan --
    )
     WHERE (NOT szt.archiv) AND
       (foo_valuta = 4 or arf_b.valuta notnull
       )
     GROUP BY szt.szamla, konyvelesi_tetelcsoport, szt.afa,
       sz_penznem, sz_teljesites, afa.ertek,
       arf_a.deviza_kozeparfolyam, arf_b.deviza_kozeparfolyam, foo_valuta,
penznem
    ) foo
) bar
WHERE szamla=2380;

---------------------------- 7.2.1 PLAN ------------------------------
Subquery Scan foo  (cost=488.97..490.94 rows=8 width=104) (actual
time=94.77..109.10 rows=12 loops=1)
->  Aggregate  (cost=488.97..490.94 rows=8 width=104) (actual
time=89.29..92.05 rows=12 loops=1)
    ->  Group  (cost=488.97..490.74 rows=79 width=104) (actual
time=88.13..88.59 rows=12 loops=1)
 ->  Sort  (cost=488.97..488.97 rows=79 width=104) (actual time=88.09..88.13
rows=12 loops=1)
     ->  Nested Loop  (cost=1.05..486.50 rows=79 width=104) (actual
time=28.23..86.20 rows=12 loops=1)
  ->  Nested Loop  (cost=1.05..150.19 rows=79 width=84) (actual
time=12.68..25.41 rows=12 loops=1)
      ->  Nested Loop  (cost=1.05..135.52 rows=13 width=80) (actual
time=12.60..24.80 rows=2 loops=1)
   ->  Hash Join  (cost=1.05..79.59 rows=13 width=60) (actual
time=0.55..0.80 rows=2 loops=1)
       ->  Nested Loop  (cost=0.00..78.31 rows=13 width=46) (actual
time=0.23..0.42 rows=2 loops=1)
    ->  Index Scan using szml_ttl_szml on szamla_tetele szt
(cost=0.00..3.51 rows=13 width=34) (actual time=0.11..0.16 rows=2 loops=1)
    ->  Index Scan using szamla_az_key on szamla  (cost=0.00..5.70 rows=1
width=12) (actual time=0.07..0.09 rows=1 loops=2)
       ->  Hash  (cost=1.04..1.04 rows=4 width=14) (actual time=0.14..0.14
rows=0 loops=1)
    ->  Seq Scan on afa  (cost=0.00..1.04 rows=4 width=14) (actual
time=0.08..0.11 rows=4 loops=1)
   ->  Index Scan using arfolyam_ervenyes on arfolyam arf_a
(cost=0.00..3.63 rows=3 width=20) (actual time=0.01..0.01 rows=0 loops=2)
       SubPlan
       ->  Limit  (cost=0.00..0.17 rows=1 width=4) (actual time=11.92..11.92
rows=0 loops=2)
    ->  Index Scan Backward using arfolyam_ervenyes on arfolyam
(cost=0.00..13.30 rows=79 width=4) (actual time=11.90..11.90 rows=0 loops=2)
       ->  Limit  (cost=0.00..0.17 rows=1 width=4)
    ->  Index Scan Backward using arfolyam_ervenyes on arfolyam
(cost=0.00..13.30 rows=79 width=4)
      ->  Seq Scan on valuta  (cost=0.00..1.06 rows=6 width=4) (actual
time=0.02..0.11 rows=6 loops=2)
  ->  Index Scan using arfolyam_ervenyes on arfolyam arf_b  (cost=0.00..3.63
rows=3 width=20) (actual time=0.04..0.10 rows=3 loops=12)
      SubPlan
      ->  Limit  (cost=0.00..0.17 rows=1 width=4) (actual time=4.35..4.39
rows=1 loops=12)
   ->  Index Scan Backward using arfolyam_ervenyes on arfolyam
(cost=0.00..13.30 rows=79 width=4) (actual time=4.33..4.37 rows=2 loops=12)
      ->  Limit  (cost=0.00..0.17 rows=1 width=4)
   ->  Index Scan Backward using arfolyam_ervenyes on arfolyam
(cost=0.00..13.30 rows=79 width=4)
Total runtime: 111.48 msec

---------------------------- 7.3.2 PLAN ------------------------------
 Subquery Scan foo  (cost=14542.01..15448.46 rows=3022 width=123) (actual
time=2264.36..2282.17 rows=12 loops=1)
   ->  Aggregate  (cost=14542.01..15448.46 rows=3022 width=123) (actual
time=2257.70..2261.08 rows=12 loops=1)
         ->  Group  (cost=14542.01..15372.92 rows=30215 width=123) (actual
time=2256.31..2256.84 rows=12 loops=1)
               ->  Sort  (cost=14542.01..14617.55 rows=30215 width=123)
(actual time=2256.27..2256.31 rows=12 loops=1)
                     Sort Key: szt.szamla,
konyvelesi_tetelcsoport(szt.szamla, szt.tetelszam), szt.afa, szamla.penznem,
szamla.teljesites, afa.ertek, arf_a.deviza_kozeparfolyam,
arf_b.deviza_kozeparfolyam, public.valuta.az
                     ->  Merge Join  (cost=4755.50..11038.79 rows=30215
width=123) (actual time=80.88..2254.96 rows=12 loops=1)
                           Merge Cond: ("outer".az = "inner".valuta)
                           Join Filter: ("inner".ervenyes = (subplan))
                           Filter: (("outer".az = 4) OR ("inner".valuta IS
NOT NULL))
                           ->  Sort  (cost=4676.19..4751.73 rows=30215
width=103) (actual time=56.41..56.44 rows=12 loops=1)
                                 Sort Key: public.valuta.az
                                 ->  Nested Loop  (cost=433.35..1375.92
rows=30215 width=103) (actual time=55.78..56.18 rows=12 loops=1)
                                       ->  Merge Join  (cost=433.35..469.47
rows=30 width=99) (actual time=55.68..55.72 rows=2 loops=1)
                                             Merge Cond: ("outer".penznem =
"inner".valuta)
                                             Join Filter: ("inner".ervenyes
= (subplan))
                                             ->  Sort  (cost=354.05..354.12
rows=30 width=79) (actual time=32.63..32.64 rows=2 loops=1)
                                                   Sort Key: szamla.penznem
                                                   ->  Hash Join
(cost=121.67..353.30 rows=30 width=79) (actual time=32.37..32.49 rows=2
loops=1)
                                                         Hash Cond:
("outer".afa = "inner".az)
                                                         ->  Merge Join
(cost=120.62..352.09 rows=30 width=58) (actual time=32.07..32.16 rows=2
loops=1)
                                                               Merge Cond:
("outer".az = "inner".szamla)
                                                               ->  Sort
(cost=120.62..123.85 rows=1291 width=12) (actual time=25.62..27.62 rows=1285
loops=1)
                                                                     Sort
Key: szamla.az
                                                                     ->  Seq
Scan on szamla  (cost=0.00..53.91 rows=1291 width=12) (actual
time=0.04..16.46 rows=1314 loops=1)
                                                               ->  Index
Scan using szamla_tetele_pkey on szamla_tetele szt  (cost=0.00..218.88
rows=30 width=46) (actual time=0.13..0.18 rows=2 loops=1)
                                                                     Index
Cond: (szamla = 2380)
                                                                     Filter:
(NOT archiv)
                                                         ->  Hash
(cost=1.04..1.04 rows=4 width=21) (actual time=0.10..0.10 rows=0 loops=1)
                                                               ->  Seq Scan
on afa  (cost=0.00..1.04 rows=4 width=21) (actual time=0.04..0.07 rows=4
loops=1)
                                             ->  Sort  (cost=79.30..82.19
rows=1155 width=20) (actual time=22.93..22.93 rows=1 loops=1)
                                                   Sort Key: arf_a.valuta
                                                   ->  Seq Scan on arfolyam
arf_a  (cost=0.00..20.55 rows=1155 width=20) (actual time=0.03..9.66
rows=1155 loops=1)
                                             SubPlan
                                               ->  Limit  (cost=0.00..0.16
rows=1 width=4) (never executed)
                                                     ->  Index Scan Backward
using arfolyam_ervenyes on arfolyam  (cost=0.00..12.17 rows=77 width=4)
(never executed)
                                                           Index Cond:
(ervenyes <= $0)
                                                           Filter: (valuta =
$1)
                                       ->  Seq Scan on valuta
(cost=0.00..20.00 rows=1000 width=4) (actual time=0.02..0.06 rows=6 loops=2)
                           ->  Sort  (cost=79.30..82.19 rows=1155 width=20)
(actual time=21.98..27.54 rows=2309 loops=1)
                                 Sort Key: arf_b.valuta
                                 ->  Seq Scan on arfolyam arf_b
(cost=0.00..20.55 rows=1155 width=20) (actual time=0.03..9.89 rows=1155
loops=1)
                           SubPlan
                             ->  Limit  (cost=0.00..0.16 rows=1 width=4)
(actual time=0.88..0.91 rows=1 loops=2310)
                                   ->  Index Scan Backward using
arfolyam_ervenyes on arfolyam  (cost=0.00..12.17 rows=77 width=4) (actual
time=0.87..0.90 rows=2 loops=2310)
                                         Index Cond: (ervenyes <= $0)
                                         Filter: (valuta = $2)
 Total runtime: 2287.30 msec


Re: Query Plan far worse in 7.3.2 than 7.2.1

От
Tom Lane
Дата:
"=?iso-8859-2?B?U1rbQ1MgR+Fib3I=?=" <surrano@mailbox.hu> writes:
> A nasty query and its EXPLAINs are here. Read on at your own risk :)

It's pretty much unreadable because of the way your mailer folded,
spindled, and mutilated the EXPLAIN output :-(

Could you resend in a more legible format?  Maybe append the explain
output as an attachment, if you can't get the mailer to leave its
formatting alone otherwise.

            regards, tom lane


Re: Query Plan far worse in 7.3.2 than 7.2.1

От
"SZŰCS Gábor"
Дата:
Sure, thanks for your interest :)

hope these help.

G.
--
while (!asleep()) sheep++;

---------------------------- cut here ------------------------------
----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
Sent: Tuesday, April 29, 2003 4:53 PM


> "=?iso-8859-2?B?U1rbQ1MgR+Fib3I=?=" <surrano@mailbox.hu> writes:
> > A nasty query and its EXPLAINs are here. Read on at your own risk :)
>
> It's pretty much unreadable because of the way your mailer folded,
> spindled, and mutilated the EXPLAIN output :-(
>
> Could you resend in a more legible format?  Maybe append the explain
> output as an attachment, if you can't get the mailer to leave its
> formatting alone otherwise.

Вложения

Re: Query Plan far worse in 7.3.2 than 7.2.1

От
Tom Lane
Дата:
"=?iso-8859-2?B?U1rbQ1MgR+Fib3I=?=" <surrano@mailbox.hu> writes:
> ---------------------------- 7.2.1 PLAN ---------------------------------
>             ->  Seq Scan on valuta  (cost=0.00..1.06 rows=6 width=4) (actual time=0.02..0.11 rows=6 loops=2)
>
> ---------------------------- 7.3.2 PLAN ---------------------------------
>                                        ->  Seq Scan on valuta  (cost=0.00..20.00 rows=1000 width=4) (actual
time=0.02..0.06rows=6 loops=2) 

Ah, there's the problem.  You never vacuumed or analyzed "valuta", so
the 7.3 planner didn't know it had only six rows, and chose a plan that
was more appropriate for a larger table.  The thousand-row estimate is
the tipoff, because that's the default assumption when there are no
stats.

            regards, tom lane


Re: Query Plan far worse in 7.3.2 than 7.2.1

От
"SZŰCS Gábor"
Дата:
----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
Sent: Wednesday, April 30, 2003 1:05 AM


> "=?iso-8859-2?B?U1rbQ1MgR+Fib3I=?=" <surrano@mailbox.hu> writes:
> > ---------------------------- 7.2.1
PLAN ---------------------------------
> >     ->  Seq Scan on valuta  (cost=0.00..1.06 rows=6 width=4) (actual
time=0.02..0.11 rows=6 loops=2)
> >
> > ---------------------------- 7.3.2
PLAN ---------------------------------
> >                                        ->  Seq Scan on valuta
(cost=0.00..20.00 rows=1000 width=4) (actual time=0.02..0.06 rows=6 loops=2)
>
> Ah, there's the problem.  You never vacuumed or analyzed "valuta", so
> the 7.3 planner didn't know it had only six rows, and chose a plan that
> was more appropriate for a larger table.  The thousand-row estimate is
> the tipoff, because that's the default assumption when there are no
> stats.
>
> regards, tom lane

Thanks!

VACUUM ANALYZE really worked and I learned something new.

The strange part is, that I think I issued a "VACUUM ANALYZE;" (that should
do all the tables, right?) a couple of weeks before because of another
problem (it didn't help that time, tho)

G.
--
while (!asleep()) sheep++;

---------------------------- cut here ------------------------------


Query Plan far worse in 7.3.2 than 7.2.1

От
"Peter Darley"
Дата:
Friends,
    I've got a query that has stopped using an index scan between 7.2.1 or RH
7.1 and 7.3.2 or RH 8.0, and I can't figure out why.  I've come up with a
replacement query which is a whole lot faster, but again, I can't tell why.

The original query (condensed to remove the uninteresting bits) is:

SELECT COUNT(*) FROM Border_Shop_List  WHERE NOT EXISTS (SELECT Foreign_Key
FROM Sample WHERE Foreign_Key='Quantum_' || Border_Shop_List.Assignment_ID
|| '_' || Assignment_Year || '_' || Evaluation_ID)

This runs in 667055.79 msec

The new one is:

SELECT COUNT(*) FROM Border_Shop_List  WHERE 'Quantum_' ||
Border_Shop_List.Assignment_ID || '_' || Border_Shop_List.Assignment_Year ||
'_' || Border_Shop_List.Evaluation_ID NOT IN  (SELECT Foreign_Key FROM
Sample WHERE Foreign_Key IS NOT NULL)

This runs in 16500.83 msec (~1/40th the time)

    Again, my immediate problem is solved, but I'm trying to understand why
there is such a speed difference.

    I've attached explains for the two querys in both versions.

    The schemas for the two databases are identical.  If there's more info
people need, just let me know.

Thanks,
Peter Darley

Вложения

Re: Query Plan far worse in 7.3.2 than 7.2.1

От
Tom Lane
Дата:
"Peter Darley" <pdarley@kinesis-cem.com> writes:
> SELECT COUNT(*) FROM Border_Shop_List  WHERE NOT EXISTS (SELECT Foreign_Key=
>  FROM Sample WHERE Foreign_Key=3D'Quantum_' || Border_Shop_List.Assignment_=
> ID || '_' || Assignment_Year || '_' || Evaluation_ID)

What's the datatype of Foreign_Key?

I'm betting that it's varchar(n) or char(n).  The result of the ||
expression is text, and so the comparison can't use a varchar index
unless you explicitly cast it to varchar:
    WHERE Foreign_Key = ('Quantum_' || ... || Evaluation_ID)::varchar

I think 7.2 had some kluge in it that would allow a varchar index to be
used anyway, but we took out the kluge because it was semantically wrong
(it would also allow use of a char(n) index in place of a text
comparison, which alters the semantics...)

            regards, tom lane


Re: Query Plan far worse in 7.3.2 than 7.2.1

От
"Peter Darley"
Дата:
Tom,
    You hit the nail on the head, foreign_key is a varchar(250).  I'll re-write
the queries with explicit casts.
    I'm hesitant to say anything, because I'm really not in a position to
contribute, but... It seems like there are getting to be lots of typing
issues (this one, 2 isn't an int8, etc.)  I think that people have said that
things are like this to support user defined data types.  I would happily
get rid of user defined data types if it would help with the type conversion
issues.  Just my 2c, for what it's worth.
Thanks,
Peter Darley

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, April 30, 2003 7:54 AM
To: Peter Darley
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query Plan far worse in 7.3.2 than 7.2.1


"Peter Darley" <pdarley@kinesis-cem.com> writes:
> SELECT COUNT(*) FROM Border_Shop_List  WHERE NOT EXISTS (SELECT
Foreign_Key=
>  FROM Sample WHERE Foreign_Key=3D'Quantum_' ||
Border_Shop_List.Assignment_=
> ID || '_' || Assignment_Year || '_' || Evaluation_ID)

What's the datatype of Foreign_Key?

I'm betting that it's varchar(n) or char(n).  The result of the ||
expression is text, and so the comparison can't use a varchar index
unless you explicitly cast it to varchar:
    WHERE Foreign_Key = ('Quantum_' || ... || Evaluation_ID)::varchar

I think 7.2 had some kluge in it that would allow a varchar index to be
used anyway, but we took out the kluge because it was semantically wrong
(it would also allow use of a char(n) index in place of a text
comparison, which alters the semantics...)

            regards, tom lane