Re: Performance issue with nested loop

Поиск
Список
Период
Сортировка
От Decibel!
Тема Re: Performance issue with nested loop
Дата
Msg-id 2C72737C-9DE5-4780-9FB0-677C0B29B783@decibel.org
обсуждение исходный текст
Ответ на Performance issue with nested loop  ("Jens Reufsteck" <jens.reufsteck@hobsons.de>)
Список pgsql-general
On Aug 29, 2007, at 5:15 AM, Jens Reufsteck wrote:
> I'm having a strange performance issue with two almost similar
> queries, the
> one running as expected, the other one taking far more time. The only
> difference is that I have "uniid in (10)" in the normally running
> query and
> "uniid in (9,10)" in the other one. The number of rows resulting
> from the
> respective table differs not very much being 406 for the first and
> 511 for
> the second query.
>
> This is the full query - the "uniid in (9,10)" is in the last
> subquery:
>
>
> SELECT 'Anzahl' AS column1, count(DISTINCT sid) AS column2
> FROM (
>     SELECT sid
>     FROM stud
>     WHERE stud.status > 0
>     AND length(stud.vname) > 1
>     AND length(stud.nname) > 1
> ) AS qur_filter_1 INNER JOIN (
>     SELECT DISTINCT sid
>     FROM stud_vera
>     INNER JOIN phon USING (sid)
>     WHERE veraid = 22
>     AND stud_vera.status > 0
>     AND (
>         (
>         veraid IN (2, 3, 22, 24, 36)
>         AND phontyp = 5
>         AND phon.typ = 1
>         AND phon.status > 0
>         ) OR (
>         veraid NOT IN (2, 3, 22, 24, 36)
>         )
>     )
> ) AS qur_filter_2 USING (sid) INNER JOIN (
>     SELECT DISTINCT sid
>     FROM ausb
>     INNER JOIN uni USING (uniid)
>     WHERE uni.uniort IN ('Augsburg')
>     AND ausb.overview = 1
>     AND ausb.zweitstudium != 2
>     AND ausb.status > 0
> ) AS qur_filter_3 USING (sid) INNER JOIN (
>     SELECT DISTINCT sid
>     FROM ausb
>     WHERE uniid IN (9, 10)
>     AND ausb.overview = 1
>     AND ausb.zweitstudium != 2
>     AND ausb.status > 0
> ) AS qur_filter_4 USING (sid)
>
>
>
> These are the query-plans for both queries, first the problematic one:
>
>
>
> Aggregate  (cost=78785.78..78785.79 rows=1 width=4) (actual
> time=698777.890..698777.891 rows=1 loops=1)
>
>   ->  Nested Loop  (cost=65462.58..78785.78 rows=1 width=4) (actual
> time=6743.856..698776.957 rows=250 loops=1)
>
>         Join Filter: ("outer".sid = "inner".sid)
>
>         ->  Merge Join  (cost=11031.79..11883.12 rows=1 width=12)
> (actual
> time=387.837..433.612 rows=494 loops=1)
>
>               Merge Cond: ("outer".sid = "inner".sid)
>
>               ->  Nested Loop  (cost=5643.11..6490.17 rows=19 width=8)
> (actual time=114.323..154.043 rows=494 loops=1)
>
>                     ->  Unique  (cost=5643.11..5645.35 rows=180
> width=4)
> (actual time=114.202..116.002 rows=511 loops=1)
>
>                           ->  Sort  (cost=5643.11..5644.23 rows=448
> width=4)
> (actual time=114.199..114.717 rows=511 loops=1)
>
>                                 Sort Key: public.ausb.sid
>
>                                 ->  Seq Scan on ausb
> (cost=0.00..5623.38
> rows=448 width=4) (actual time=0.351..112.459 rows=511 loops=1)
>
>                                       Filter: (((uniid = 9) OR
> (uniid = 10))
> AND (overview = 1) AND (zweitstudium <> 2) AND (status > 0))
>
>                     ->  Index Scan using stud_pkey on stud
> (cost=0.00..4.67
> rows=1 width=4) (actual time=0.062..0.067 rows=1 loops=511)
>
>                           Index Cond: (stud.sid = "outer".sid)
>
>                           Filter: ((status > 0) AND (length
> ((vname)::text) >
> 1) AND (length((nname)::text) > 1))
>
>               ->  Materialize  (cost=5388.68..5392.05 rows=337
> width=4)
> (actual time=273.506..276.785 rows=511 loops=1)
>
>                     ->  Unique  (cost=5383.29..5384.98 rows=337
> width=4)
> (actual time=273.501..275.421 rows=511 loops=1)
>
>                           ->  Sort  (cost=5383.29..5384.13 rows=337
> width=4)
> (actual time=273.499..274.091 rows=511 loops=1)
>
>                                 Sort Key: public.ausb.sid
>
>                                 ->  Hash Join  (cost=17.61..5369.14
> rows=337
> width=4) (actual time=1.139..272.465 rows=511 loops=1)
>
>                                       Hash Cond: ("outer".uniid =
> "inner".uniid)
>
>                                       ->  Seq Scan on ausb
> (cost=0.00..4827.30 rows=104174 width=8) (actual time=0.026..200.111
> rows=103593 loops=1)
>
>                                             Filter: ((overview = 1)
> AND
> (zweitstudium <> 2) AND (status > 0))
>
>                                       ->  Hash  (cost=17.60..17.60
> rows=2
> width=4) (actual time=0.435..0.435 rows=2 loops=1)
>
>                                             ->  Seq Scan on uni
> (cost=0.00..17.60 rows=2 width=4) (actual time=0.412..0.424 rows=2
> loops=1)
>
>                                                   Filter:
> ((uniort)::text =
> 'Augsburg'::text)
>
>         ->  Unique  (cost=54430.79..66664.18 rows=10599 width=4)
> (actual
> time=6.851..1374.135 rows=40230 loops=494)
>
>               ->  Merge Join  (cost=54430.79..66319.65 rows=137811
> width=4)
> (actual time=6.849..1282.333 rows=40233 loops=494)
>
>                     Merge Cond: ("outer".sid = "inner".sid)
>
>                     Join Filter: (((("outer".veraid = 2) OR
> ("outer".veraid
> = 3) OR ("outer".veraid = 22) OR ("outer".veraid = 24) OR
> ("outer".veraid =
> 36)) AND ("inner".phontyp = 5) AND ("inner".typ = 1) AND
> ("inner".status >
> 0)) OR (("outer".veraid <> 2) AND ("outer".veraid <> 3) AND
> ("outer".veraid
> <> 22) AND ("outer".veraid <> 24) AND ("outer".veraid <> 36)))
>
>                     ->  Sort  (cost=11962.11..12098.59 rows=54593
> width=8)
> (actual time=0.547..46.482 rows=53354 loops=494)
>
>                           Sort Key: stud_vera.sid
>
>                           ->  Bitmap Heap Scan on stud_vera
> (cost=2239.14..7666.61 rows=54593 width=8) (actual
> time=43.096..165.300
> rows=53354 loops=1)
>
>                                 Recheck Cond: (veraid = 22)
>
>                                 Filter: (status > 0)
>
>                                 ->  Bitmap Index Scan on
> stud_vera_sid_veraid_idx  (cost=0.00..2239.14 rows=58765 width=0)
> (actual
> time=41.242..41.242 rows=61855 loops=1)
>
>                                       Index Cond: (veraid = 22)
>
>                     ->  Sort  (cost=42468.68..43407.53 rows=375539
> width=12)
> (actual time=6.297..533.711 rows=375539 loops=494)
>
>                           Sort Key: phon.sid
>
>                           ->  Seq Scan on phon  (cost=0.00..7696.39
> rows=375539 width=12) (actual time=0.048..544.999 rows=375539 loops=1)
>
>
>
>
> !!!! The query-plan for the normally running query: !!!!!
>
>
>
>
>
> Aggregate  (cost=77846.97..77846.98 rows=1 width=4) (actual
> time=5488.913..5488.913 rows=1 loops=1)
>
>   ->  Nested Loop  (cost=65471.70..77846.97 rows=1 width=4) (actual
> time=3913.839..5488.513 rows=208 loops=1)
>
>         Join Filter: ("outer".sid = "inner".sid)
>
>         ->  Merge Join  (cost=60088.41..72454.41 rows=1 width=12)
> (actual
> time=3598.105..4841.242 rows=208 loops=1)
>
>               Merge Cond: ("outer".sid = "inner".sid)
>
>               ->  Unique  (cost=54430.79..66664.18 rows=10599 width=4)
> (actual time=3479.029..4697.051 rows=40129 loops=1)
>
>                     ->  Merge Join  (cost=54430.79..66319.65
> rows=137811
> width=4) (actual time=3479.027..4616.245 rows=40132 loops=1)
>
>                           Merge Cond: ("outer".sid = "inner".sid)
>
>                           Join Filter: (((("outer".veraid = 2) OR
> ("outer".veraid = 3) OR ("outer".veraid = 22) OR ("outer".veraid =
> 24) OR
> ("outer".veraid = 36)) AND ("inner".phontyp = 5) AND ("inner".typ =
> 1) AND
> ("inner".status > 0)) OR (("outer".veraid <> 2) AND ("outer".veraid
> <> 3)
> AND ("outer".veraid <> 22) AND ("outer".veraid <> 24) AND
> ("outer".veraid <>
> 36)))
>
>                           ->  Sort  (cost=11962.11..12098.59
> rows=54593
> width=8) (actual time=274.248..315.052 rows=53252 loops=1)
>
>                                 Sort Key: stud_vera.sid
>
>                                 ->  Bitmap Heap Scan on stud_vera
> (cost=2239.14..7666.61 rows=54593 width=8) (actual
> time=46.669..167.599
> rows=53352 loops=1)
>
>                                       Recheck Cond: (veraid = 22)
>
>                                       Filter: (status > 0)
>
>                                       ->  Bitmap Index Scan on
> stud_vera_sid_veraid_idx  (cost=0.00..2239.14 rows=58765 width=0)
> (actual
> time=44.618..44.618 rows=61857 loops=1)
>
>                                             Index Cond: (veraid = 22)
>
>                           ->  Sort  (cost=42468.68..43407.53
> rows=375539
> width=12) (actual time=3204.729..3681.598 rows=375090 loops=1)
>
>                                 Sort Key: phon.sid
>
>                                 ->  Seq Scan on phon
> (cost=0.00..7696.39
> rows=375539 width=12) (actual time=0.052..628.838 rows=375539 loops=1)
>
>               ->  Materialize  (cost=5657.62..5657.71 rows=9 width=8)
> (actual time=91.290..105.557 rows=406 loops=1)
>
>                     ->  Nested Loop  (cost=5234.08..5657.61 rows=9
> width=8)
> (actual time=91.282..104.571 rows=406 loops=1)
>
>                           ->  Unique  (cost=5234.08..5235.20 rows=90
> width=4) (actual time=91.156..92.232 rows=420 loops=1)
>
>                                 ->  Sort  (cost=5234.08..5234.64
> rows=224
> width=4) (actual time=91.154..91.484 rows=420 loops=1)
>
>                                       Sort Key: public.ausb.sid
>
>                                       ->  Seq Scan on ausb
> (cost=0.00..5225.34 rows=224 width=4) (actual time=0.266..90.242
> rows=420
> loops=1)
>
>                                             Filter: ((uniid = 10) AND
> (overview = 1) AND (zweitstudium <> 2) AND (status > 0))
>
>                           ->  Index Scan using stud_pkey on stud
> (cost=0.00..4.67 rows=1 width=4) (actual time=0.024..0.026 rows=1
> loops=420)
>
>                                 Index Cond: (stud.sid = "outer".sid)
>
>                                 Filter: ((status > 0) AND
> (length((vname)::text) > 1) AND (length((nname)::text) > 1))
>
>         ->  Unique  (cost=5383.29..5384.98 rows=337 width=4) (actual
> time=1.520..2.686 rows=511 loops=208)
>
>               ->  Sort  (cost=5383.29..5384.13 rows=337 width=4)
> (actual
> time=1.519..1.871 rows=511 loops=208)
>
>                     Sort Key: public.ausb.sid
>
>                     ->  Hash Join  (cost=17.61..5369.14 rows=337
> width=4)
> (actual time=1.133..314.584 rows=511 loops=1)
>
>                           Hash Cond: ("outer".uniid = "inner".uniid)
>
>                           ->  Seq Scan on ausb  (cost=0.00..4827.30
> rows=104174 width=8) (actual time=0.030..226.532 rows=103593 loops=1)
>
>                                 Filter: ((overview = 1) AND
> (zweitstudium <>
> 2) AND (status > 0))
>
>                           ->  Hash  (cost=17.60..17.60 rows=2 width=4)
> (actual time=0.392..0.392 rows=2 loops=1)
>
>                                 ->  Seq Scan on uni  (cost=0.00..17.60
> rows=2 width=4) (actual time=0.369..0.381 rows=2 loops=1)
>
>                                       Filter: ((uniort)::text =
> 'Augsburg'::text)
>
>
>
> The estimated row numbers are not bad as long as one table is
> affected.
> They're much worse as soon as two or more tables are joined. Though
> the
> query plans are slightly different, the number of merged rows at
> different
> stages seems to be rather the same for both plans. The big
> difference in my
> eyes seems the cost for the first nested loop. This seems to be the
> point,
> where the long running query consumes most time. I've then set
> enable_nestloop to off, and actually the problem disappears.
>
>
> Other maybe relevant parameters:
> default_statistics_target = 100
> work_mem = 4096
> max_fsm_pages = 100000
>
> My questions:
>
> What could be the problem behind high amount of actually used time
> for the
> nested loop in the first query?
>
> If we decided to constantly turn off nested loops, what side
> effects would
> we have to expect?
>
> Are there more granular ways to tell the query planner when to use
> nested
> loops?
>
> Or just other ideas what to do? We'd be grateful for any hint!

Here's what's killing you:

   ->  Nested Loop  (cost=65462.58..78785.78 rows=1 width=4) (actual
time=6743.856..698776.957 rows=250 loops=1)

         Join Filter: ("outer".sid = "inner".sid)

         ->  Merge Join  (cost=11031.79..11883.12 rows=1 width=12)
(actual
time=387.837..433.612 rows=494 loops=1)

That merge thinks it's olny going to see 1 row, but it ends up with
494, which results in:

         ->  Unique  (cost=54430.79..66664.18 rows=10599 width=4)
(actual
time=6.851..1374.135 rows=40230 loops=494)

The miss-estimation is actually coming from lower in the query... I
see there's one place where it expects 180 rows and gets 511, which
is part of the problem. Try increasing the stats on ausb.sid.

Oh, and please don't line-wrap explain output.
--
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)



В списке pgsql-general по дате отправления:

Предыдущее
От: Ron Johnson
Дата:
Сообщение: Re: Bigtime scaling of Postgresql (cluster and stuff I suppose)
Следующее
От: "Idan Miller"
Дата:
Сообщение: Re: PostgreSQL with Kerberos and Active Directory