Обсуждение: Redundant sub query triggers slow nested loop left join

От:
"henk de wit"
Дата:

In investigating a slow query, I distiled the code below from a larger
query:

SELECT
    *
FROM
    /* SUBQUERY banners */ (
        SELECT
            *
        FROM
            /* SUBQUERY banners_links */ (
                SELECT
                    *
                FROM
                    banners_links
                WHERE
                    merchant_id = 5631
            ) as banners_links
        WHERE
            merchant_id = 5631 AND
            banners_links.status = 0
    ) AS banners

        LEFT OUTER JOIN

    /* SUBQUERY types */ (
        SELECT
            banner_types.id         AS type_id,
            banner_types.type         AS type,
            banners_banner_types.banner_id     AS id
        FROM
            banner_types,banners_banner_types
        WHERE
            banners_banner_types.banner_id IN /* SUBQUERY */ (
                SELECT
                    id
                FROM
                    banners_links
                WHERE
                    merchant_id = 5631
            ) AND
            banners_banner_types.type_id = banner_types.id
    ) AS types

    USING (id)

Obviously, the subquery "banners_links" is redundant. The query however is a
generated one, and this redundancy probably wasn't noted before. Logically
you would say it shouldn't hurt, but in fact it does. The above query
executes painfully slow. The left outer join is killing the performance, as
witnessed by the plan:

"Nested Loop Left Join  (cost=964.12..1480.67 rows=1 width=714) (actual
time=20.801..8233.410 rows=553 loops=1)"
"  Join Filter: (public.banners_links.id = banners_banner_types.banner_id)"
"  ->  Bitmap Heap Scan on banners_links  (cost=4.35..42.12 rows=1
width=671) (actual time=0.127..0.690 rows=359 loops=1)"
"        Recheck Cond: ((merchant_id = 5631) AND (merchant_id = 5631))"
"        Filter: ((status)::text = '0'::text)"
"        ->  Bitmap Index Scan on banners_links_merchant_id_idx
(cost=0.00..4.35 rows=10 width=0) (actual time=0.092..0.092 rows=424
loops=1)"
"              Index Cond: ((merchant_id = 5631) AND (merchant_id = 5631))"
"  ->  Hash Join  (cost=959.77..1432.13 rows=514 width=51) (actual
time=0.896..22.611 rows=658 loops=359)"
"        Hash Cond: (banners_banner_types.type_id = banner_types.id)"
"        ->  Hash IN Join  (cost=957.32..1422.52 rows=540 width=16) (actual
time=0.894..21.878 rows=658 loops=359)"
"              Hash Cond: (banners_banner_types.banner_id =
public.banners_links.id)"
"              ->  Seq Scan on banners_banner_types  (cost=0.00..376.40
rows=22240 width=16) (actual time=0.003..10.149 rows=22240 loops=359)"
"              ->  Hash  (cost=952.02..952.02 rows=424 width=8) (actual
time=0.779..0.779 rows=424 loops=1)"
"                    ->  Bitmap Heap Scan on banners_links
(cost=11.54..952.02 rows=424 width=8) (actual time=0.108..0.513 rows=424
loops=1)"
"                          Recheck Cond: (merchant_id = 5631)"
"                          ->  Bitmap Index Scan on
banners_links_merchant_id_idx  (cost=0.00..11.43 rows=424 width=0) (actual
time=0.078..0.078 rows=424 loops=1)"
"                                Index Cond: (merchant_id = 5631)"
"        ->  Hash  (cost=2.20..2.20 rows=20 width=43) (actual
time=0.033..0.033 rows=20 loops=1)"
"              ->  Seq Scan on banner_types  (cost=0.00..2.20 rows=20
width=43) (actual time=0.004..0.017 rows=20 loops=1)"
"Total runtime: 8233.710 ms"

I noticed that the recheck condition looks a bit weird:

Recheck Cond: ((merchant_id = 5631) AND (merchant_id = 5631))

You would think that PG (8.2.3) would be smart enough to optimize this away.
Also the estimate of the nested loop left join and the actual results are
way off. I tried increasing the statistics of both public.banners_links.id
and banners_banner_types.banner_id (to the highest value 1000), analyzed,
vacuum analyzed and did a vacuum full, but without any improvements.

Anyway, when I remove the redundant sub query the code becomes:

SELECT
    *
FROM
    /* SUBQUERY banners */ (
        SELECT
            *
        FROM
            banners_links
        WHERE
            merchant_id = 5631 AND
            banners_links.status = 0
    ) AS banners

        LEFT OUTER JOIN

    /* SUBQUERY types */ (
        SELECT
            banner_types.id         AS type_id,
            banner_types.type         AS type,
            banners_banner_types.banner_id     AS id
        FROM
            banner_types,banners_banner_types
        WHERE
            banners_banner_types.banner_id IN /* SUBQUERY */ (
                SELECT
                    id
                FROM
                    banners_links
                WHERE
                    merchant_id = 5631
            ) AND
            banners_banner_types.type_id = banner_types.id
    ) AS types

    USING (id)

With this query, the execution time drops from 8 seconds to a mere 297 ms!
The plan now looks as follows:

"Hash Left Join  (cost=1449.99..2392.68 rows=2 width=714) (actual
time=24.257..25.292 rows=553 loops=1)"
"  Hash Cond: (public.banners_links.id = banners_banner_types.banner_id)"
"  ->  Bitmap Heap Scan on banners_links  (cost=11.43..954.03 rows=2
width=671) (actual time=0.122..0.563 rows=359 loops=1)"
"        Recheck Cond: (merchant_id = 5631)"
"        Filter: ((status)::text = '0'::text)"
"        ->  Bitmap Index Scan on banners_links_merchant_id_idx
(cost=0.00..11.43 rows=424 width=0) (actual time=0.086..0.086 rows=424
loops=1)"
"              Index Cond: (merchant_id = 5631)"
"  ->  Hash  (cost=1432.13..1432.13 rows=514 width=51) (actual
time=24.128..24.128 rows=658 loops=1)"
"        ->  Hash Join  (cost=959.77..1432.13 rows=514 width=51) (actual
time=1.714..23.606 rows=658 loops=1)"
"              Hash Cond: (banners_banner_types.type_id = banner_types.id)"
"              ->  Hash IN Join  (cost=957.32..1422.52 rows=540 width=16)
(actual time=1.677..22.811 rows=658 loops=1)"
"                    Hash Cond: (banners_banner_types.banner_id =
public.banners_links.id)"
"                    ->  Seq Scan on banners_banner_types
(cost=0.00..376.40 rows=22240 width=16) (actual time=0.005..10.306
rows=22240 loops=1)"
"                    ->  Hash  (cost=952.02..952.02 rows=424 width=8)
(actual time=0.772..0.772 rows=424 loops=1)"
"                          ->  Bitmap Heap Scan on banners_links
(cost=11.54..952.02 rows=424 width=8) (actual time=0.105..0.510 rows=424
loops=1)"
"                                Recheck Cond: (merchant_id = 5631)"
"                                ->  Bitmap Index Scan on
banners_links_merchant_id_idx  (cost=0.00..11.43 rows=424 width=0) (actual
time=0.077..0.077 rows=424 loops=1)"
"                                      Index Cond: (merchant_id = 5631)"
"              ->  Hash  (cost=2.20..2.20 rows=20 width=43) (actual
time=0.032..0.032 rows=20 loops=1)"
"                    ->  Seq Scan on banner_types  (cost=0.00..2.20 rows=20
width=43) (actual time=0.004..0.018 rows=20 loops=1)"
"Total runtime: 25.602 ms"

We see that instead of a nested loop left join the planner now chooses a
Hash Left Join. I'm not really an expert in this matter and would like some
more insight into what's happening  here. Naively I would say that a planner
would have to be smart enough to see this by itself?

Thanks in advance for all hints.

_________________________________________________________________
Talk with your online friends with Messenger
http://www.join.msn.com/messenger/overview


От:
Tom Lane
Дата:

"henk de wit" <> writes:
> Naively I would say that a planner
> would have to be smart enough to see this by itself?

We got rid of direct tests for redundant WHERE clauses a long time ago
(in 7.4, according to some quick tests I just made).  They took a lot
of cycles and almost never accomplished anything.

Since you have two redundant tests, the selectivity is being
double-counted, leading to a too-small rows estimate and a not very
appropriate choice of join plan.

FWIW, CVS HEAD does get rid of the duplicate conditions for the common
case of mergejoinable equality operators --- but it's not explicitly
looking for duplicate conditions, rather this is falling out of a new
method for making transitive equality deductions.

            regards, tom lane

От:
"henk de wit"
Дата:

>Since you have two redundant tests, the selectivity is being
>double-counted, leading to a too-small rows estimate and a not very
>appropriate choice of join plan.

I see, thanks for the explanation. I did notice though that in the second
case, with 1 redundant test removed, the estimate is still low:

"Hash Left Join (cost=1449.99..2392.68 rows=2 width=714) (actual
time=24.257..25.292 rows=553 loops=1)"

In that case the prediction is 2 rows, which is only 1 row more than in the
previous case. Yet the plan is much better and performance improved
dramatically. Is there a reason/explanation for that?

>FWIW, CVS HEAD does get rid of the duplicate conditions for the common
>case of mergejoinable equality operators --- but it's not explicitly
>looking for duplicate conditions, rather this is falling out of a new
>method for making transitive equality deductions.

This sounds very interesting Tom. Is there some documentation somewhere
where I can read about this new method?

_________________________________________________________________
Live Search, for accurate results! http://www.live.nl


От:
Tom Lane
Дата:

"henk de wit" <> writes:
> In that case the prediction is 2 rows, which is only 1 row more than in the
> previous case. Yet the plan is much better and performance improved
> dramatically. Is there a reason/explanation for that?

Well, it's just an estimated-cost comparison.  If there's only one row
then a nestloop join looks like the best way since it requires no extra
overhead.  But as soon as you get to two rows, the other side of the
join would have to be executed twice, and that's more expensive than
doing it once and setting up a hash table.  In the actual event, with
359 rows out of the scan, the nestloop way is just horrid because it
repeats the other side 359 times :-(

It strikes me that it might be interesting to use a minimum rowcount
estimate of two rows, not one, for any case where we can't actually
prove there is at most one row (ie, the query conditions match a unique
index).  That is probably enough to discourage this sort of brittle
behavior ... though no doubt there'd still be cases where it's the
wrong thing.  We do not actually have any code right now to make such
proofs, but there's been some discussion recently about adding such
logic in support of removing useless outer joins.

>> FWIW, CVS HEAD does get rid of the duplicate conditions for the common
>> case of mergejoinable equality operators --- but it's not explicitly
>> looking for duplicate conditions, rather this is falling out of a new
>> method for making transitive equality deductions.

> This sounds very interesting Tom. Is there some documentation somewhere
> where I can read about this new method?

Check the archives for mention of equivalence classes, notably these
two threads:
http://archives.postgresql.org/pgsql-hackers/2007-01/msg00568.php
http://archives.postgresql.org/pgsql-hackers/2007-01/msg00826.php

            regards, tom lane

От:
"henk de wit"
Дата:

>In the actual event, with
>359 rows out of the scan, the nestloop way is just horrid because it
>repeats the other side 359 times :-(

Indeed. :(

Btw, I tried to apply the removal of the redundant check in the larger query
(the one from which I extracted the part shown earlier) but it only performs
worse after that. The more redundant checks I remove, the slower the query
gets. I figure the original designer of the query inserted those checks to
quickly limit the number of rows involved in the nested loop. Of course, the
problem is probably not the number of rows involved, but the unfortunate
choice of the nested loop.

I spend a few hours today in trying to figure it all out, but I'm pretty
stuck at the moment.

For what its worth, this is the plan PG 8.2 comes up with right after I
remove the same check that made the isolated query in the openings post so
much faster:

Sort  (cost=6006.54..6006.55 rows=1 width=597) (actual
time=14561.499..14561.722 rows=553 loops=1)
  Sort Key: public.banners_links.id
  ->  Nested Loop Left Join  (cost=3917.68..6006.53 rows=1 width=597)
(actual time=64.723..14559.811 rows=553 loops=1)
        Join Filter: (public.banners_links.id =
public.fetch_banners.banners_links_id)
        ->  Nested Loop Left Join  (cost=3917.68..6002.54 rows=1 width=527)
(actual time=64.607..14509.291 rows=553 loops=1)
              Join Filter: (public.banners_links.id =
reward_ratings.banner_id)
              ->  Nested Loop Left Join  (cost=2960.36..4395.12 rows=1
width=519) (actual time=52.761..8562.575 rows=553 loops=1)
                    Join Filter: (public.banners_links.id =
banners_banner_types.banner_id)
                    ->  Nested Loop Left Join  (cost=2000.60..2956.57 rows=1
width=484) (actual time=32.026..304.700 rows=359 loops=1)
                          Join Filter: (public.banners_links.id =
ecpc_per_banner_link.banners_links_id)
                          ->  Nested Loop  (cost=124.58..1075.70 rows=1
width=468) (actual time=9.793..187.724 rows=359 loops=1)
                                ->  Nested Loop Left Join
(cost=124.58..1067.42 rows=1 width=89) (actual time=9.786..184.671 rows=359
loops=1)
                                      Join Filter: (public.banners_links.id
= users_banners_tot_sub.banner_id)
                                      ->  Hash Left Join
(cost=107.97..1050.78 rows=1 width=81) (actual time=6.119..7.605 rows=359
loops=1)
                                            Hash Cond:
(public.banners_links.id = special_deals.id)
                                            Filter:
(special_deals.special_deal IS NULL)
                                            ->  Bitmap Heap Scan on
banners_links  (cost=11.43..954.03 rows=2 width=73) (actual
time=0.128..1.069 rows=359 loops=1)
                                                  Recheck Cond: (merchant_id
= 5631)
                                                  Filter: ((status)::text =
'0'::text)
                                                  ->  Bitmap Index Scan on
banners_links_merchant_id_idx  (cost=0.00..11.43 rows=424 width=0) (actual
time=0.089..0.089 rows=424 loops=1)
                                                        Index Cond:
(merchant_id = 5631)
                                            ->  Hash  (cost=86.93..86.93
rows=769 width=16) (actual time=5.982..5.982 rows=780 loops=1)
                                                  ->  Subquery Scan
special_deals  (cost=69.62..86.93 rows=769 width=16) (actual
time=4.179..5.414 rows=780 loops=1)
                                                        ->  HashAggregate
(cost=69.62..79.24 rows=769 width=16) (actual time=4.179..4.702 rows=780
loops=1)
                                                              ->  Seq Scan
on banner_deals  (cost=0.00..53.75 rows=3175 width=16) (actual
time=0.006..1.480 rows=3175 loops=1)
                                      ->  HashAggregate  (cost=16.61..16.62
rows=1 width=24) (actual time=0.011..0.292 rows=424 loops=359)
                                            ->  Nested Loop
(cost=0.00..16.60 rows=1 width=24) (actual time=0.029..3.096 rows=424
loops=1)
                                                  ->  Index Scan using
users_banners_affiliate_id_idx on users_banners  (cost=0.00..8.30 rows=1
width=16) (actual time=0.021..0.523 rows=424 loops=1)
                                                        Index Cond:
((affiliate_id = 5631) AND (affiliate_id = 5631))
                                                        Filter:
((status)::text = '3'::text)
                                                  ->  Index Scan using
users_banners_id_idx on users_banners_rotation  (cost=0.00..8.29 rows=1
width=16) (actual time=0.003..0.004 rows=1 loops=424)
                                                        Index Cond:
(users_banners_rotation.users_banners_id = users_banners.id)
                                ->  Index Scan using
banners_org_id_banner.idx on banners_org  (cost=0.00..8.27 rows=1 width=387)
(actual time=0.005..0.006 rows=1 loops=359)
                                      Index Cond: (public.banners_links.id =
banners_org.id_banner)
                          ->  Sort  (cost=1876.01..1876.50 rows=194
width=30) (actual time=0.062..0.183 rows=290 loops=359)
                                Sort Key: CASE WHEN
(precalculated_stats_banners_links.clicks_total > 0) THEN
(((precalculated_stats_banners_links.revenue_total_affiliate /
(precalculated_stats_banners_links.clicks_total)::numeric))::double
precision / 1000::double precision) ELSE 0::double precision END
                                ->  Merge IN Join  (cost=1819.78..1868.64
rows=194 width=30) (actual time=16.701..21.797 rows=290 loops=1)
                                      Merge Cond:
(precalculated_stats_banners_links.banners_links_id =
public.banners_links.id)
                                      ->  Sort  (cost=849.26..869.24
rows=7993 width=30) (actual time=12.416..15.717 rows=7923 loops=1)
                                            Sort Key:
precalculated_stats_banners_links.banners_links_id
                                            ->  Index Scan using
pre_calc_banners_status on precalculated_stats_banners_links
(cost=0.00..331.13 rows=7993 width=30) (actual time=0.006..6.209 rows=7923
loops=1)
                                                  Index Cond: (status = 4)
                                      ->  Sort  (cost=970.52..971.58
rows=424 width=8) (actual time=0.885..1.049 rows=366 loops=1)
                                            Sort Key:
public.banners_links.id
                                            ->  Bitmap Heap Scan on
banners_links  (cost=11.54..952.02 rows=424 width=8) (actual
time=0.121..0.549 rows=424 loops=1)
                                                  Recheck Cond: (merchant_id
= 5631)
                                                  ->  Bitmap Index Scan on
banners_links_merchant_id_idx  (cost=0.00..11.43 rows=424 width=0) (actual
time=0.087..0.087 rows=424 loops=1)
                                                        Index Cond:
(merchant_id = 5631)
                    ->  Hash Join  (cost=959.77..1432.13 rows=514 width=43)
(actual time=0.900..22.684 rows=658 loops=359)
                          Hash Cond: (banners_banner_types.type_id =
banner_types.id)
                          ->  Hash IN Join  (cost=957.32..1422.52 rows=540
width=16) (actual time=0.898..21.944 rows=658 loops=359)
                                Hash Cond: (banners_banner_types.banner_id =
public.banners_links.id)
                                ->  Seq Scan on banners_banner_types
(cost=0.00..376.40 rows=22240 width=16) (actual time=0.004..10.184
rows=22240 loops=359)
                                ->  Hash  (cost=952.02..952.02 rows=424
width=8) (actual time=0.751..0.751 rows=424 loops=1)
                                      ->  Bitmap Heap Scan on banners_links
(cost=11.54..952.02 rows=424 width=8) (actual time=0.127..0.470 rows=424
loops=1)
                                            Recheck Cond: (merchant_id =
5631)
                                            ->  Bitmap Index Scan on
banners_links_merchant_id_idx  (cost=0.00..11.43 rows=424 width=0) (actual
time=0.086..0.086 rows=424 loops=1)
                                                  Index Cond: (merchant_id =
5631)
                          ->  Hash  (cost=2.20..2.20 rows=20 width=43)
(actual time=0.037..0.037 rows=20 loops=1)
                                ->  Seq Scan on banner_types
(cost=0.00..2.20 rows=20 width=43) (actual time=0.004..0.015 rows=20
loops=1)
              ->  Hash IN Join  (cost=957.32..1606.26 rows=93 width=16)
(actual time=10.751..10.751 rows=0 loops=553)
                    Hash Cond: (reward_ratings.banner_id =
public.banners_links.id)
                    ->  Seq Scan on reward_ratings  (cost=0.00..633.66
rows=3827 width=16) (actual time=0.007..8.770 rows=4067 loops=553)
                          Filter: ((now() >= period_start) AND (now() <=
period_end))
                    ->  Hash  (cost=952.02..952.02 rows=424 width=8) (actual
time=0.747..0.747 rows=424 loops=1)
                          ->  Bitmap Heap Scan on banners_links
(cost=11.54..952.02 rows=424 width=8) (actual time=0.120..0.472 rows=424
loops=1)
                                Recheck Cond: (merchant_id = 5631)
                                ->  Bitmap Index Scan on
banners_links_merchant_id_idx  (cost=0.00..11.43 rows=424 width=0) (actual
time=0.088..0.088 rows=424 loops=1)
                                      Index Cond: (merchant_id = 5631)
        ->  Seq Scan on fetch_banners  (cost=0.00..2.88 rows=88 width=78)
(actual time=0.003..0.042 rows=88 loops=553)
Total runtime: 14562.251 ms

The same check (merchant_id = 5631) still appears at 5 other places in the
query. If I remove one other, the query goes to 20 seconds, if I then remove
one other again it goes to 28 seconds, etc all the way to more than 40
seconds. I understand the above looks like a complicated mess, but would you
have any pointers of what I could possibly do next to force a better plan?

>Check the archives for mention of equivalence classes, notably these
>two threads:
>http://archives.postgresql.org/pgsql-hackers/2007-01/msg00568.php
>http://archives.postgresql.org/pgsql-hackers/2007-01/msg00826.php

I'm going to read those. Thanks for the references.

_________________________________________________________________
Play online games with your friends with Messenger
http://www.join.msn.com/messenger/overview


От:
Tom Lane
Дата:

"henk de wit" <> writes:
> I understand the above looks like a complicated mess, but would you
> have any pointers of what I could possibly do next to force a better plan?

Taking a closer look, it seems the problem is the underestimation of the
number of rows resulting from this relation scan:

>                  ->  Bitmap Heap Scan on
> banners_links  (cost=11.43..954.03 rows=2 width=73) (actual
> time=0.128..1.069 rows=359 loops=1)
>                        Recheck Cond: (merchant_id = 5631)
>                        Filter: ((status)::text = '0'::text)
>                        ->  Bitmap Index Scan on
> banners_links_merchant_id_idx  (cost=0.00..11.43 rows=424 width=0) (actual
> time=0.089..0.089 rows=424 loops=1)
>                              Index Cond: (merchant_id = 5631)

You might be able to improve matters by increasing the statistics target
for this table.  I have a bad feeling though that the problem may be
lack of cross-column statistics --- the thing is evidently assuming
that only about 1 in 200 rows have status = '0', which might be accurate
as a global average but not for this particular merchant.  What exactly
is the relationship between status and merchant_id, anyway?

            regards, tom lane

От:
"henk de wit"
Дата:

>You might be able to improve matters by increasing the statistics target
>for this table.

I have tried to increase the statistics for the status column to the maximum
of 1000. After that I performed an analyze, vacuum analyze and vacuum full
analyze on the table. Unfortunately this didn't seem to make any difference.

>I have a bad feeling though that the problem may be
>lack of cross-column statistics

I assume this isn't a thing that can be tweaked/increased in PG 8.2?

>--- the thing is evidently assuming
>that only about 1 in 200 rows have status = '0', which might be accurate
>as a global average but not for this particular merchant.  What exactly
>is the relationship between status and merchant_id, anyway?

The meaning is that a "banners_link" belongs to a merchant with the id
merchant_id. A "banners_link" can be disabled (status 1) or enabled (status
0). Globally about 1/3 of the banners_links have status 0 and 2/3 have
status 1. The 1 in 200 estimate is indeed way off.

For the merchant in question the numbers are a bit different though, but not
that much. Out of 424 rows total, 359 have status 0 and 65 have status 1.

_________________________________________________________________
FREE pop-up blocking with the new Windows Live Toolbar - get it now!
http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/


От:
"henk de wit"
Дата:

One interesting other thing to note; if I remove the banners_links.status =
0 condition from the query altogether the execution times improve
dramatically again. The results are not correct right now, but if worse
comes to worst I can always remove the unwanted rows in a procedural
language (it's a simple case of iterating a resultset and omitting rows with
status 1). Of course this would not really be a neat solution.

Anyway, the plan without the status = 0 condition now looks like this:

Sort  (cost=6058.87..6058.88 rows=2 width=597) (actual time=305.869..306.138
rows=658 loops=1)
  Sort Key: public.banners_links.id
  ->  Nested Loop Left Join  (cost=5051.23..6058.86 rows=2 width=597)
(actual time=69.956..304.259 rows=658 loops=1)
        Join Filter: (public.banners_links.id =
public.fetch_banners.banners_links_id)
        ->  Nested Loop Left Join  (cost=5048.26..6051.92 rows=2 width=527)
(actual time=69.715..249.122 rows=658 loops=1)
              Join Filter: (public.banners_links.id =
reward_ratings.banner_id)
              ->  Nested Loop Left Join  (cost=3441.91..4441.39 rows=2
width=519) (actual time=57.795..235.954 rows=658 loops=1)
                    Join Filter: (public.banners_links.id =
ecpc_per_banner_link.banners_links_id)
                    ->  Nested Loop  (cost=1563.28..2554.02 rows=2
width=503) (actual time=35.359..42.018 rows=658 loops=1)
                          ->  Hash Left Join  (cost=1563.28..2545.93 rows=2
width=124) (actual time=35.351..37.987 rows=658 loops=1)
                                Hash Cond: (public.banners_links.id =
users_banners_tot_sub.banner_id)
                                ->  Hash Left Join  (cost=1546.63..2529.27
rows=2 width=116) (actual time=30.757..32.552 rows=658 loops=1)
                                      Hash Cond: (public.banners_links.id =
banners_banner_types.banner_id)
                                      ->  Hash Left Join
(cost=108.08..1090.62 rows=2 width=81) (actual time=6.087..7.085 rows=424
loops=1)
                                            Hash Cond:
(public.banners_links.id = special_deals.id)
                                            Filter:
(special_deals.special_deal IS NULL)
                                            ->  Bitmap Heap Scan on
banners_links  (cost=11.54..952.02 rows=424 width=73) (actual
time=0.125..0.514 rows=424 loops=1)
                                                  Recheck Cond: (merchant_id
= 5631)
                                                  ->  Bitmap Index Scan on
banners_links_merchant_id_idx  (cost=0.00..11.43 rows=424 width=0) (actual
time=0.089..0.089 rows=424 loops=1)
                                                        Index Cond:
(merchant_id = 5631)
                                            ->  Hash  (cost=86.93..86.93
rows=769 width=16) (actual time=5.951..5.951 rows=780 loops=1)
                                                  ->  Subquery Scan
special_deals  (cost=69.62..86.93 rows=769 width=16) (actual
time=4.164..5.389 rows=780 loops=1)
                                                        ->  HashAggregate
(cost=69.62..79.24 rows=769 width=16) (actual time=4.164..4.670 rows=780
loops=1)
                                                              ->  Seq Scan
on banner_deals  (cost=0.00..53.75 rows=3175 width=16) (actual
time=0.005..1.496 rows=3175 loops=1)
                                      ->  Hash  (cost=1432.13..1432.13
rows=514 width=43) (actual time=24.661..24.661 rows=658 loops=1)
                                            ->  Hash Join
(cost=959.77..1432.13 rows=514 width=43) (actual time=1.780..24.147 rows=658
loops=1)
                                                  Hash Cond:
(banners_banner_types.type_id = banner_types.id)
                                                  ->  Hash IN Join
(cost=957.32..1422.52 rows=540 width=16) (actual time=1.738..23.332 rows=658
loops=1)
                                                        Hash Cond:
(banners_banner_types.banner_id = public.banners_links.id)
                                                        ->  Seq Scan on
banners_banner_types  (cost=0.00..376.40 rows=22240 width=16) (actual
time=0.005..10.355 rows=22240 loops=1)
                                                        ->  Hash
(cost=952.02..952.02 rows=424 width=8) (actual time=0.808..0.808 rows=424
loops=1)
                                                              ->  Bitmap
Heap Scan on banners_links  (cost=11.54..952.02 rows=424 width=8) (actual
time=0.114..0.515 rows=424 loops=1)
                                                                    Recheck
Cond: (merchant_id = 5631)
                                                                    ->
Bitmap Index Scan on banners_links_merchant_id_idx  (cost=0.00..11.43
rows=424 width=0) (actual time=0.085..0.085 rows=424 loops=1)

Index Cond: (merchant_id = 5631)
                                                  ->  Hash  (cost=2.20..2.20
rows=20 width=43) (actual time=0.034..0.034 rows=20 loops=1)
                                                        ->  Seq Scan on
banner_types  (cost=0.00..2.20 rows=20 width=43) (actual time=0.004..0.016
rows=20 loops=1)
                                ->  Hash  (cost=16.63..16.63 rows=1
width=24) (actual time=4.582..4.582 rows=424 loops=1)
                                      ->  Subquery Scan
users_banners_tot_sub  (cost=16.61..16.63 rows=1 width=24) (actual
time=3.548..4.235 rows=424 loops=1)
                                            ->  HashAggregate
(cost=16.61..16.62 rows=1 width=24) (actual time=3.547..3.850 rows=424
loops=1)
                                                  ->  Nested Loop
(cost=0.00..16.60 rows=1 width=24) (actual time=0.031..3.085 rows=424
loops=1)
                                                        ->  Index Scan using
users_banners_affiliate_id_idx on users_banners  (cost=0.00..8.30 rows=1
width=16) (actual time=0.021..0.516 rows=424 loops=1)
                                                              Index Cond:
((affiliate_id = 5631) AND (affiliate_id = 5631))
                                                              Filter:
((status)::text = '3'::text)
                                                        ->  Index Scan using
users_banners_id_idx on users_banners_rotation  (cost=0.00..8.29 rows=1
width=16) (actual time=0.003..0.004 rows=1 loops=424)
                                                              Index Cond:
(users_banners_rotation.users_banners_id = users_banners.id)
                          ->  Index Scan using banners_org_id_banner.idx on
banners_org  (cost=0.00..4.03 rows=1 width=387) (actual time=0.003..0.004
rows=1 loops=658)
                                Index Cond: (public.banners_links.id =
banners_org.id_banner)
                    ->  Materialize  (cost=1878.63..1880.57 rows=194
width=20) (actual time=0.034..0.153 rows=290 loops=658)
                          ->  Sort  (cost=1876.01..1876.50 rows=194
width=30) (actual time=22.105..22.230 rows=290 loops=1)
                                Sort Key: CASE WHEN
(precalculated_stats_banners_links.clicks_total > 0) THEN
(((precalculated_stats_banners_links.revenue_total_affiliate /
(precalculated_stats_banners_links.clicks_total)::numeric))::double
precision / 1000::double precision) ELSE 0::double precision END
                                ->  Merge IN Join  (cost=1819.78..1868.64
rows=194 width=30) (actual time=16.723..21.832 rows=290 loops=1)
                                      Merge Cond:
(precalculated_stats_banners_links.banners_links_id =
public.banners_links.id)
                                      ->  Sort  (cost=849.26..869.24
rows=7993 width=30) (actual time=12.474..15.725 rows=7923 loops=1)
                                            Sort Key:
precalculated_stats_banners_links.banners_links_id
                                            ->  Index Scan using
pre_calc_banners_status on precalculated_stats_banners_links
(cost=0.00..331.13 rows=7993 width=30) (actual time=0.007..6.220 rows=7923
loops=1)
                                                  Index Cond: (status = 4)
                                      ->  Sort  (cost=970.52..971.58
rows=424 width=8) (actual time=0.862..1.012 rows=366 loops=1)
                                            Sort Key:
public.banners_links.id
                                            ->  Bitmap Heap Scan on
banners_links  (cost=11.54..952.02 rows=424 width=8) (actual
time=0.121..0.490 rows=424 loops=1)
                                                  Recheck Cond: (merchant_id
= 5631)
                                                  ->  Bitmap Index Scan on
banners_links_merchant_id_idx  (cost=0.00..11.43 rows=424 width=0) (actual
time=0.087..0.087 rows=424 loops=1)
                                                        Index Cond:
(merchant_id = 5631)
              ->  Materialize  (cost=1606.35..1607.28 rows=93 width=16)
(actual time=0.019..0.019 rows=0 loops=658)
                    ->  Hash IN Join  (cost=957.32..1606.25 rows=93
width=16) (actual time=11.916..11.916 rows=0 loops=1)
                          Hash Cond: (reward_ratings.banner_id =
public.banners_links.id)
                          ->  Seq Scan on reward_ratings  (cost=0.00..633.66
rows=3826 width=16) (actual time=0.016..9.190 rows=4067 loops=1)
                                Filter: ((now() >= period_start) AND (now()
<= period_end))
                          ->  Hash  (cost=952.02..952.02 rows=424 width=8)
(actual time=0.738..0.738 rows=424 loops=1)
                                ->  Bitmap Heap Scan on banners_links
(cost=11.54..952.02 rows=424 width=8) (actual time=0.118..0.459 rows=424
loops=1)
                                      Recheck Cond: (merchant_id = 5631)
                                      ->  Bitmap Index Scan on
banners_links_merchant_id_idx  (cost=0.00..11.43 rows=424 width=0) (actual
time=0.086..0.086 rows=424 loops=1)
                                            Index Cond: (merchant_id = 5631)
        ->  Materialize  (cost=2.97..3.85 rows=88 width=78) (actual
time=0.000..0.037 rows=88 loops=658)
              ->  Seq Scan on fetch_banners  (cost=0.00..2.88 rows=88
width=78) (actual time=0.005..0.052 rows=88 loops=1)
Total runtime: 306.734 ms

_________________________________________________________________
Play online games with your friends with Messenger
http://www.join.msn.com/messenger/overview


От:
Tom Lane
Дата:

"henk de wit" <> writes:
>> --- the thing is evidently assuming
>> that only about 1 in 200 rows have status = '0', which might be accurate
>> as a global average but not for this particular merchant.  What exactly
>> is the relationship between status and merchant_id, anyway?

> The meaning is that a "banners_link" belongs to a merchant with the id
> merchant_id. A "banners_link" can be disabled (status 1) or enabled (status
> 0). Globally about 1/3 of the banners_links have status 0 and 2/3 have
> status 1. The 1 in 200 estimate is indeed way off.

Well, that's darn odd.  It should not be getting that so far wrong.
What's the datatype of the status column exactly (I'm guessing varchar
but maybe not)?  Would you show us the pg_stats row for the status column?

> One interesting other thing to note; if I remove the banners_links.status =
> 0 condition from the query altogether the execution times improve
> dramatically again.

Right, because it was dead on about how many merchant_id = 5631 rows
there are.  The estimation error is creeping in where it guesses how
selective the status filter is.  It should be using the global fraction
of status = 0 rows for that, but it seems to be using a default estimate
instead (1/200 is actually the default eqsel estimate now that I think
about it).  I'm not sure why, but I think it must have something to do
with the subquery structure of your query.  Were you showing us the
whole truth about your query, or were there details you left out?

            regards, tom lane

От:
"henk de wit"
Дата:

>Well, that's darn odd.  It should not be getting that so far wrong.
>What's the datatype of the status column exactly (I'm guessing varchar
>but maybe not)?  Would you show us the pg_stats row for the status column?

It has been created as a char(1) in fact. The pg_stats row for the status
column is:

public|banners_links|status|0|5|2|{0,1}|{0.626397,0.373603}||0.560611

>I'm not sure why, but I think it must have something to do
>with the subquery structure of your query.  Were you showing us the
>whole truth about your query, or were there details you left out?

The query I gave in the opening post was just a small part, the part that I
initially identified as the 'slow path'. The last plan I gave was from the
whole query, without any details left out. I didn't gave the SQL of that
yet, so here it is:

SELECT
    id,
    status,
    merchant_id,
    description,
    org_text,
    users_banners_id,
    banner_url,
    cookie_redirect,
    type,

    CASE WHEN special_deal IS null THEN
        ''
    ELSE
        'special deal'
    END AS special_deal,

    CASE WHEN url_of_banner IS null    THEN
        ''
    ELSE
        url_of_banner
    END AS url_of_banner,

    CASE WHEN period_end IS NULL THEN
        'not_active'
    ELSE
        'active'
    END AS active_not_active,

    CASE WHEN ecpc IS NULL THEN
        0.00
    ELSE
        ROUND(ecpc::numeric,2)
    END AS ecpc,

    CASE WHEN ecpc_merchant IS NULL THEN
        0.00
    ELSE
        ROUND(ecpc_merchant::numeric,2)
    END AS ecpc_merchant

FROM
    /* SUBQUERY grand_total_fetch_banners */ (
        /* SUBQUERY grand_total  */(
            /* SUBQUERY banners_special_deals */    (

                /* SUBQUERY banners */ (
                    SELECT
                        *
                    FROM
                        /* SUBQUERY banners_links */ (
                            SELECT
                                banners_links.id,
                                merchant_id,
                                banners_org.banner_text AS org_text,
                                description,
                                status,
                                banner_url,
                                ecpc,
                                ecpc_merchant,
                                COALESCE(cookie_redirect,0) AS cookie_redirect
                            FROM
                                /* SUBQUERY banners_links  */ (

                                    /* subselect tot join ecpc_per_banner_links on banners_links*/
                                    /* SUBQUERY banners_links */ (
                                        SELECT
                                            *
                                        FROM
                                            banners_links
                                        WHERE
                                            merchant_id = 5631
                                    ) AS banners_links

                                        LEFT OUTER JOIN

                                    /* SUBQUERY ecpc_per_banner_link */    (
                                        SELECT
                                            CASE WHEN clicks_total > 0 THEN
                                                (revenue_total_affiliate/clicks_total)::float/1000.0
                                            ELSE
                                                0.0
                                            END AS ecpc,
                                            CASE WHEN clicks_total > 0 THEN
                                                (revenue_total/clicks_total)::float/1000.0
                                            ELSE
                                                0.0
                                            END AS ecpc_merchant,
                                            banners_links_id
                                        FROM
                                            precalculated_stats_banners_links
                                        WHERE
                                            status = 4            AND
                                            banners_links_id IN /* SUBQUERY */ (
                                                SELECT
                                                    id
                                                FROM
                                                    banners_links
                                                WHERE
                                                    merchant_id = 5631
                                            )
                                        ORDER BY
                                            ecpc DESC
                                    ) AS ecpc_per_banner_link

                                        ON (banners_links.id = ecpc_per_banner_link.banners_links_id)
                                ) AS banners_links

                                    ,

                                banners_org

                            WHERE
                                merchant_id = 5631                            AND
                                banners_links.id = banners_org.id_banner            AND
                                (banners_links.id = -1 OR -1 = -1)    AND
                                (banners_links.status = 0 OR 0 = -1)
                        ) AS banners_links

                            LEFT OUTER JOIN

                        /* SUBQUERY users_banners_tot_sub */(
                            SELECT
                                MAX (users_banners_id) AS users_banners_id,
                                merchant_users_banners_id,
                                banner_id
                            FROM
                                /* SUBQUERY users_banners_rotations_sub */(
                                    SELECT
                                        affiliate_id         AS merchant_users_banners_id,
                                        users_banners.id     AS users_banners_id,
                                        users_banners_rotation.banner_id
                                    FROM
                                        users_banners, users_banners_rotation
                                    WHERE
                                        affiliate_id = 5631                                    AND
                                        users_banners_rotation.users_banners_id = users_banners.id    AND
                                        users_banners.status = 3
                                ) AS users_banners_rotations_sub
                            GROUP BY
                                merchant_users_banners_id,banner_id
                        ) AS users_banners_tot_sub

                            ON (
                                banners_links.id = users_banners_tot_sub.banner_id     AND
                                banners_links.merchant_id =
users_banners_tot_sub.merchant_users_banners_id
                            )
                    ) AS banners

                        LEFT OUTER JOIN

                    /* SUBQUERY special_deals */(
                        SELECT
                            banner_deals.banner_id     AS id,
                            MAX(affiliate_id)         AS special_deal
                        FROM
                            banner_deals
                        GROUP BY
                            banner_deals.banner_id
                    ) AS special_deals

                        USING (id)

            ) AS banners_special_deals

                LEFT OUTER JOIN

            /* SUBQUERY types */ (
                SELECT
                    banner_types.id                 AS type_id,
                    banner_types.type                 AS type,
                    banners_banner_types.banner_id     AS id
                FROM
                    banner_types,banners_banner_types
                WHERE
                    banners_banner_types.banner_id IN /* SUBQUERY */ (
                        SELECT
                            id
                        FROM
                            banners_links
                        WHERE
                            merchant_id = 5631
                    ) AND
                    banners_banner_types.type_id = banner_types.id
            ) AS types

                USING (id)

        )  as grand_total

            LEFT OUTER JOIN

        /* SUBQUERY fetch_banners */ (
            SELECT
                banners_links_id AS id,
                url_of_banner
            FROM
                fetch_banners
        ) AS fetch_banners

            USING (id)
    ) AS grand_total_fetch_banners

        LEFT OUTER JOIN

    /* SUBQUERY active_banners */ (
        SELECT
            banner_id AS id,
            period_end
        FROM
            reward_ratings
        WHERE
            now() BETWEEN period_start AND period_end
        AND
            banner_id IN /* SUBQUERY */ (
                SELECT
                    id
                FROM
                    banners_links
                WHERE
                    merchant_id = 5631
            )
    ) AS active_banners

        USING (id)

WHERE
    (type_id =  -1 OR -1 = -1 )    AND
    (special_deal IS null)

ORDER BY
    id DESC

This is the original query without even the earlier mentioned redundant
check removed. For this query, PG 8.2 creates the following plan:

Sort  (cost=5094.40..5094.41 rows=1 width=597) (actual
time=15282.503..15282.734 rows=553 loops=1)
  Sort Key: public.banners_links.id
  ->  Nested Loop Left Join  (cost=3883.68..5094.39 rows=1 width=597)
(actual time=64.066..15280.773 rows=553 loops=1)
        Join Filter: (public.banners_links.id = reward_ratings.banner_id)
        ->  Nested Loop Left Join  (cost=2926.37..3486.98 rows=1 width=589)
(actual time=51.992..9231.245 rows=553 loops=1)
              Join Filter: (public.banners_links.id =
public.fetch_banners.banners_links_id)
              ->  Nested Loop Left Join  (cost=2926.37..3483.00 rows=1
width=519) (actual time=51.898..9183.007 rows=553 loops=1)
                    Join Filter: (public.banners_links.id =
ecpc_per_banner_link.banners_links_id)
                    ->  Nested Loop  (cost=1050.35..1602.14 rows=1
width=503) (actual time=29.585..9015.077 rows=553 loops=1)
                          ->  Nested Loop Left Join  (cost=1050.35..1593.86
rows=1 width=124) (actual time=29.577..9010.273 rows=553 loops=1)
                                Join Filter: (public.banners_links.id =
users_banners_tot_sub.banner_id)
                                ->  Nested Loop Left Join
(cost=1033.74..1577.21 rows=1 width=116) (actual time=25.904..8738.006
rows=553 loops=1)
                                      Join Filter: (public.banners_links.id
= special_deals.id)
                                      Filter: (special_deals.special_deal IS
NULL)
                                      ->  Nested Loop Left Join
(cost=964.12..1480.67 rows=1 width=108) (actual time=20.905..8259.497
rows=553 loops=1)
                                            Join Filter:
(public.banners_links.id = banners_banner_types.banner_id)
                                            ->  Bitmap Heap Scan on
banners_links  (cost=4.35..42.12 rows=1 width=73) (actual time=0.160..1.122
rows=359 loops=1)
                                                  Recheck Cond:
((merchant_id = 5631) AND (merchant_id = 5631))
                                                  Filter: ((status)::text =
'0'::text)
                                                  ->  Bitmap Index Scan on
banners_links_merchant_id_idx  (cost=0.00..4.35 rows=10 width=0) (actual
time=0.123..0.123 rows=424 loops=1)
                                                        Index Cond:
((merchant_id = 5631) AND (merchant_id = 5631))
                                            ->  Hash Join
(cost=959.77..1432.13 rows=514 width=43) (actual time=0.899..22.685 rows=658
loops=359)
                                                  Hash Cond:
(banners_banner_types.type_id = banner_types.id)
                                                  ->  Hash IN Join
(cost=957.32..1422.52 rows=540 width=16) (actual time=0.897..21.946 rows=658
loops=359)
                                                        Hash Cond:
(banners_banner_types.banner_id = public.banners_links.id)
                                                        ->  Seq Scan on
banners_banner_types  (cost=0.00..376.40 rows=22240 width=16) (actual
time=0.004..10.164 rows=22240 loops=359)
                                                        ->  Hash
(cost=952.02..952.02 rows=424 width=8) (actual time=0.790..0.790 rows=424
loops=1)
                                                              ->  Bitmap
Heap Scan on banners_links  (cost=11.54..952.02 rows=424 width=8) (actual
time=0.108..0.503 rows=424 loops=1)
                                                                    Recheck
Cond: (merchant_id = 5631)
                                                                    ->
Bitmap Index Scan on banners_links_merchant_id_idx  (cost=0.00..11.43
rows=424 width=0) (actual time=0.078..0.078 rows=424 loops=1)

Index Cond: (merchant_id = 5631)
                                                  ->  Hash  (cost=2.20..2.20
rows=20 width=43) (actual time=0.033..0.033 rows=20 loops=1)
                                                        ->  Seq Scan on
banner_types  (cost=0.00..2.20 rows=20 width=43) (actual time=0.004..0.017
rows=20 loops=1)
                                      ->  HashAggregate  (cost=69.62..79.24
rows=769 width=16) (actual time=0.008..0.498 rows=780 loops=553)
                                            ->  Seq Scan on banner_deals
(cost=0.00..53.75 rows=3175 width=16) (actual time=0.004..1.454 rows=3175
loops=1)
                                ->  HashAggregate  (cost=16.61..16.62 rows=1
width=24) (actual time=0.007..0.291 rows=424 loops=553)
                                      ->  Nested Loop  (cost=0.00..16.60
rows=1 width=24) (actual time=0.056..3.123 rows=424 loops=1)
                                            ->  Index Scan using
users_banners_affiliate_id_idx on users_banners  (cost=0.00..8.30 rows=1
width=16) (actual time=0.046..0.555 rows=424 loops=1)
                                                  Index Cond: ((affiliate_id
= 5631) AND (affiliate_id = 5631))
                                                  Filter: ((status)::text =
'3'::text)
                                            ->  Index Scan using
users_banners_id_idx on users_banners_rotation  (cost=0.00..8.29 rows=1
width=16) (actual time=0.003..0.004 rows=1 loops=424)
                                                  Index Cond:
(users_banners_rotation.users_banners_id = users_banners.id)
                          ->  Index Scan using "banners_org_id_banner.idx"
on banners_org  (cost=0.00..8.27 rows=1 width=387) (actual time=0.005..0.006
rows=1 loops=553)
                                Index Cond: (public.banners_links.id =
banners_org.id_banner)
                    ->  Sort  (cost=1876.01..1876.50 rows=194 width=30)
(actual time=0.041..0.161 rows=290 loops=553)
                          Sort Key: CASE WHEN
(precalculated_stats_banners_links.clicks_total > 0) THEN
(((precalculated_stats_banners_links.revenue_total_affiliate /
(precalculated_stats_banners_links.clicks_total)::numeric))::double
precision / 1000::double precision) ELSE 0::double precision END
                          ->  Merge IN Join  (cost=1819.78..1868.64 rows=194
width=30) (actual time=16.769..21.879 rows=290 loops=1)
                                Merge Cond:
(precalculated_stats_banners_links.banners_links_id =
public.banners_links.id)
                                ->  Sort  (cost=849.26..869.24 rows=7993
width=30) (actual time=12.486..15.740 rows=7923 loops=1)
                                      Sort Key:
precalculated_stats_banners_links.banners_links_id
                                      ->  Index Scan using
pre_calc_banners_status on precalculated_stats_banners_links
(cost=0.00..331.13 rows=7993 width=30) (actual time=0.007..6.291 rows=7923
loops=1)
                                            Index Cond: (status = 4)
                                ->  Sort  (cost=970.52..971.58 rows=424
width=8) (actual time=0.879..1.023 rows=366 loops=1)
                                      Sort Key: public.banners_links.id
                                      ->  Bitmap Heap Scan on banners_links
(cost=11.54..952.02 rows=424 width=8) (actual time=0.123..0.509 rows=424
loops=1)
                                            Recheck Cond: (merchant_id =
5631)
                                            ->  Bitmap Index Scan on
banners_links_merchant_id_idx  (cost=0.00..11.43 rows=424 width=0) (actual
time=0.089..0.089 rows=424 loops=1)
                                                  Index Cond: (merchant_id =
5631)
              ->  Seq Scan on fetch_banners  (cost=0.00..2.88 rows=88
width=78) (actual time=0.003..0.042 rows=88 loops=553)
        ->  Hash IN Join  (cost=957.32..1606.24 rows=93 width=16) (actual
time=10.933..10.933 rows=0 loops=553)
              Hash Cond: (reward_ratings.banner_id =
public.banners_links.id)
              ->  Seq Scan on reward_ratings  (cost=0.00..633.66 rows=3822
width=16) (actual time=0.007..8.955 rows=4067 loops=553)
                    Filter: ((now() >= period_start) AND (now() <=
period_end))
              ->  Hash  (cost=952.02..952.02 rows=424 width=8) (actual
time=0.738..0.738 rows=424 loops=1)
                    ->  Bitmap Heap Scan on banners_links
(cost=11.54..952.02 rows=424 width=8) (actual time=0.118..0.475 rows=424
loops=1)
                          Recheck Cond: (merchant_id = 5631)
                          ->  Bitmap Index Scan on
banners_links_merchant_id_idx  (cost=0.00..11.43 rows=424 width=0) (actual
time=0.087..0.087 rows=424 loops=1)
                                Index Cond: (merchant_id = 5631)
Total runtime: 15283.225 ms

If I change 1 of the redundant checks:

/* SUBQUERY banners_links */ (
    SELECT
        *
    FROM
        banners_links
    WHERE
        merchant_id = 5631
) AS banners_links

into just banner_links, PG comes up with the (large) plan I posted earlier.

_________________________________________________________________
Live Search, for accurate results! http://www.live.nl


От:
"henk de wit"
Дата:

>Well, that's darn odd.  It should not be getting that so far wrong.

I've been puzzling on this for over a week now, but can't seem to find a
solution. Would you have some more hints of what I could possibly try next?
As far as I can see, the mentioned status column is just a simple column. Or
could it be related to the fact that its type is var(1)? Would that confuse
the planner?

_________________________________________________________________
Talk with your online friends with Messenger
http://www.join.msn.com/messenger/overview


От:
"henk de wit"
Дата:

Perhaps one other interesting observation; when I earlier removed the status
check for which the rows got so wrongly estimated, the query got
dramatically faster. However, once I also remove all redundant checks the
query gets slower again.

This is the query with both status and redundant check removed:

SELECT
    id,
    status,
    merchant_id,
    description,
    org_text,
    users_banners_id,
    banner_url,
    cookie_redirect,
    type,

    CASE WHEN special_deal IS null THEN
        ''
    ELSE
        'special deal'
    END AS special_deal,

    CASE WHEN url_of_banner IS null    THEN
        ''
    ELSE
        url_of_banner
    END AS url_of_banner,

    CASE WHEN period_end IS NULL THEN
        'not_active'
    ELSE
        'active'
    END AS active_not_active,

    CASE WHEN ecpc IS NULL THEN
        0.00
    ELSE
        ROUND(ecpc::numeric,2)
    END AS ecpc,

    CASE WHEN ecpc_merchant IS NULL THEN
        0.00
    ELSE
        ROUND(ecpc_merchant::numeric,2)
    END AS ecpc_merchant

FROM
    /* SUBQUERY grand_total_fetch_banners */ (
        /* SUBQUERY grand_total  */(
            /* SUBQUERY banners_special_deals */    (

                /* SUBQUERY banners */ (
                    SELECT
                        *
                    FROM
                        /* SUBQUERY banners_links */ (
                            SELECT
                                banners_links.id,
                                merchant_id,
                                banners_org.banner_text AS org_text,
                                description,
                                status,
                                banner_url,
                                ecpc,
                                ecpc_merchant,
                                COALESCE(cookie_redirect,0) AS cookie_redirect
                            FROM
                                /* SUBQUERY banners_links  */ (

                                    /* subselect tot join ecpc_per_banner_links on banners_links*/
                                    /* SUBQUERY banners_links */ (
                                        SELECT
                                            *
                                        FROM
                                            banners_links
                                        WHERE
                                            merchant_id = 217
                                    ) AS  banners_links

                                        LEFT OUTER JOIN

                                    /* SUBQUERY ecpc_per_banner_link */    (
                                        SELECT
                                            CASE WHEN clicks_total > 0 THEN
                                                (revenue_total_affiliate/clicks_total)::float/1000.0
                                            ELSE
                                                0.0
                                            END AS ecpc,
                                            CASE WHEN clicks_total > 0 THEN
                                                (revenue_total/clicks_total)::float/1000.0
                                            ELSE
                                                0.0
                                            END AS ecpc_merchant,

                                            banners_links_id
                                        FROM
                                            precalculated_stats_banners_links
                                        WHERE
                                            status = 4
                                    ) AS ecpc_per_banner_link

                                        ON (banners_links.id = ecpc_per_banner_link.banners_links_id)
                                ) AS banners_links

                                    ,

                                banners_org

                            WHERE
                                banners_links.id = banners_org.id_banner            AND
                                (banners_links.id = -1 OR -1 = -1)
                        ) AS banners_links

                            LEFT OUTER JOIN

                        /* SUBQUERY users_banners_tot_sub */(
                            SELECT
                                MAX (users_banners_id) AS users_banners_id,
                                merchant_users_banners_id,
                                banner_id
                            FROM
                                /* SUBQUERY users_banners_rotations_sub */(
                                    SELECT
                                        affiliate_id         AS merchant_users_banners_id,
                                        users_banners.id     AS users_banners_id,
                                        users_banners_rotation.banner_id
                                    FROM
                                        users_banners, users_banners_rotation
                                    WHERE
                                        users_banners_rotation.users_banners_id = users_banners.id    AND
                                        users_banners.status = 3
                                ) AS users_banners_rotations_sub
                            GROUP BY
                                merchant_users_banners_id,banner_id
                        ) AS users_banners_tot_sub

                            ON (
                                banners_links.id = users_banners_tot_sub.banner_id     AND
                                banners_links.merchant_id =
users_banners_tot_sub.merchant_users_banners_id
                            )
                    ) AS banners

                        LEFT OUTER JOIN

                    /* SUBQUERY special_deals */(
                        SELECT
                            banner_deals.banner_id     AS id,
                            MAX(affiliate_id)         AS special_deal
                        FROM
                            banner_deals
                        GROUP BY
                            banner_deals.banner_id
                    ) AS special_deals

                        USING (id)

            ) AS banners_special_deals

                LEFT OUTER JOIN

            /* SUBQUERY types */ (
                SELECT
                    banner_types.id                 AS type_id,
                    banner_types.type                 AS type,
                    banners_banner_types.banner_id     AS id
                FROM
                    banner_types,banners_banner_types
                WHERE

                    banners_banner_types.type_id = banner_types.id
            ) AS types

                USING (id)

        )  as grand_total

            LEFT OUTER JOIN

        /* SUBQUERY fetch_banners */ (
            SELECT
                banners_links_id AS id,
                url_of_banner
            FROM
                fetch_banners
        ) AS fetch_banners

            USING (id)
    ) AS grand_total_fetch_banners

        LEFT OUTER JOIN

    /* SUBQUERY active_banners */ (
        SELECT
            banner_id AS id,
            period_end
        FROM
            reward_ratings
        WHERE
            now() BETWEEN period_start AND period_end

    ) AS active_banners

        USING (id)
WHERE
    (type_id =  -1 OR -1 = -1 )    AND
    (special_deal IS null)

ORDER BY
    id DESC

For this query, PG comes up with the following plan:

Sort  (cost=3772.80..3772.81 rows=2 width=597) (actual
time=3203.143..3203.315 rows=436 loops=1)
  Sort Key: public.banners_links.id
  ->  Nested Loop Left Join  (cost=2345.33..3772.79 rows=2 width=597)
(actual time=108.926..3201.931 rows=436 loops=1)
        ->  Nested Loop Left Join  (cost=2341.06..3742.03 rows=2 width=589)
(actual time=108.902..3197.302 rows=436 loops=1)
              Join Filter: (public.banners_links.id =
ecpc_per_banner_link.banners_links_id)
              ->  Nested Loop  (cost=1722.18..2763.47 rows=2 width=573)
(actual time=68.228..78.611 rows=436 loops=1)
                    ->  Hash Left Join  (cost=1722.18..2754.88 rows=2
width=194) (actual time=68.219..75.916 rows=436 loops=1)
                          Hash Cond: (public.banners_links.id =
users_banners_tot_sub.banner_id)
                          ->  Nested Loop Left Join  (cost=1227.70..2260.38
rows=2 width=186) (actual time=61.822..68.891 rows=436 loops=1)
                                ->  Hash Left Join  (cost=1227.70..2259.73
rows=2 width=116) (actual time=61.811..67.321 rows=436 loops=1)
                                      Hash Cond: (public.banners_links.id =
banners_banner_types.banner_id)
                                      ->  Hash Left Join
(cost=103.40..946.54 rows=2 width=81) (actual time=6.135..7.009 rows=331
loops=1)
                                            Hash Cond:
(public.banners_links.id = special_deals.id)
                                            Filter:
(special_deals.special_deal IS NULL)
                                            ->  Bitmap Heap Scan on
banners_links  (cost=6.86..816.67 rows=336 width=73) (actual
time=0.111..0.496 rows=336 loops=1)
                                                  Recheck Cond: (merchant_id
= 217)
                                                  ->  Bitmap Index Scan on
banners_links_merchant_id_idx  (cost=0.00..6.77 rows=336 width=0) (actual
time=0.079..0.079 rows=336 loops=1)
                                                        Index Cond:
(merchant_id = 217)
                                            ->  Hash  (cost=86.93..86.93
rows=769 width=16) (actual time=6.012..6.012 rows=780 loops=1)
                                                  ->  Subquery Scan
special_deals  (cost=69.62..86.93 rows=769 width=16) (actual
time=4.240..5.451 rows=780 loops=1)
                                                        ->  HashAggregate
(cost=69.62..79.24 rows=769 width=16) (actual time=4.239..4.748 rows=780
loops=1)
                                                              ->  Seq Scan
on banner_deals  (cost=0.00..53.75 rows=3175 width=16) (actual
time=0.006..1.485 rows=3175 loops=1)
                                      ->  Hash  (cost=673.83..673.83
rows=21158 width=43) (actual time=55.659..55.659 rows=22112 loops=1)
                                            ->  Hash Join
(cost=2.45..673.83 rows=21158 width=43) (actual time=0.047..36.885
rows=22112 loops=1)
                                                  Hash Cond:
(banners_banner_types.type_id = banner_types.id)
                                                  ->  Seq Scan on
banners_banner_types  (cost=0.00..376.40 rows=22240 width=16) (actual
time=0.005..10.653 rows=22240 loops=1)
                                                  ->  Hash  (cost=2.20..2.20
rows=20 width=43) (actual time=0.034..0.034 rows=20 loops=1)
                                                        ->  Seq Scan on
banner_types  (cost=0.00..2.20 rows=20 width=43) (actual time=0.003..0.016
rows=20 loops=1)
                                ->  Index Scan using
fetch_banners_banners_links_id_idx on fetch_banners  (cost=0.00..0.32 rows=1
width=78) (actual time=0.002..0.002 rows=0 loops=436)
                                      Index Cond: (public.banners_links.id =
public.fetch_banners.banners_links_id)
                          ->  Hash  (cost=494.34..494.34 rows=11 width=24)
(actual time=6.378..6.378 rows=336 loops=1)
                                ->  Subquery Scan users_banners_tot_sub
(cost=494.09..494.34 rows=11 width=24) (actual time=5.588..6.124 rows=336
loops=1)
                                      ->  HashAggregate
(cost=494.09..494.23 rows=11 width=24) (actual time=5.586..5.810 rows=336
loops=1)
                                            ->  Nested Loop
(cost=360.46..494.01 rows=11 width=24) (actual time=2.876..5.232 rows=336
loops=1)
                                                  ->  Bitmap Heap Scan on
users_banners  (cost=360.46..402.65 rows=11 width=16) (actual
time=2.863..3.133 rows=336 loops=1)
                                                        Recheck Cond:
((affiliate_id = 217) AND ((status)::text = '3'::text))
                                                        ->  BitmapAnd
(cost=360.46..360.46 rows=11 width=0) (actual time=2.842..2.842 rows=0
loops=1)
                                                              ->  Bitmap
Index Scan on users_banners_affiliate_id_idx  (cost=0.00..5.31 rows=138
width=0) (actual time=0.072..0.072 rows=350 loops=1)
                                                                    Index
Cond: (affiliate_id = 217)
                                                              ->  Bitmap
Index Scan on users_banners_status_idx  (cost=0.00..354.90 rows=19016
width=0) (actual time=2.741..2.741 rows=17406 loops=1)
                                                                    Index
Cond: ((status)::text = '3'::text)
                                                  ->  Index Scan using
users_banners_id_idx on users_banners_rotation  (cost=0.00..8.29 rows=1
width=16) (actual time=0.004..0.004 rows=1 loops=336)
                                                        Index Cond:
(users_banners_rotation.users_banners_id = users_banners.id)
                    ->  Index Scan using banners_org_id_banner.idx on
banners_org  (cost=0.00..4.28 rows=1 width=387) (actual time=0.003..0.004
rows=1 loops=436)
                          Index Cond: (public.banners_links.id =
banners_org.id_banner)
              ->  Materialize  (cost=618.88..698.81 rows=7993 width=20)
(actual time=0.000..3.299 rows=7923 loops=436)
                    ->  Index Scan using pre_calc_banners_status on
precalculated_stats_banners_links  (cost=0.00..530.96 rows=7993 width=30)
(actual time=0.025..26.349 rows=7923 loops=1)
                          Index Cond: (status = 4)
        ->  Bitmap Heap Scan on reward_ratings  (cost=4.27..15.33 rows=3
width=16) (actual time=0.005..0.005 rows=0 loops=436)
              Recheck Cond: (public.banners_links.id =
reward_ratings.banner_id)
              Filter: ((now() >= period_start) AND (now() <= period_end))
              ->  Bitmap Index Scan on reward_ratings_banner_id_idx
(cost=0.00..4.27 rows=3 width=0) (actual time=0.003..0.003 rows=0 loops=436)
                    Index Cond: (public.banners_links.id =
reward_ratings.banner_id)
Total runtime: 3204.016 ms


For the "banners_links.id = ecpc_per_banner_link.banners_links_id" join, it
chooses the dreaded Nested loop left join again, which takes up the bulk of
the query execution time.

After some fiddling and experimentation with the query, I found that if I
only removed both case statements in the "ecpc_per_banner_link" subquery, it
becomes fast again:

Sort  (cost=2875.63..2875.64 rows=6 width=599) (actual time=107.824..109.456
rows=1780 loops=1)
  Sort Key: public.banners_links.id
  ->  Nested Loop Left Join  (cost=1726.45..2875.55 rows=6 width=599)
(actual time=68.243..98.013 rows=1780 loops=1)
        ->  Nested Loop Left Join  (cost=1722.18..2783.30 rows=6 width=591)
(actual time=68.220..84.351 rows=1780 loops=1)
              ->  Nested Loop  (cost=1722.18..2763.47 rows=2 width=573)
(actual time=68.210..78.427 rows=436 loops=1)
                    ->  Hash Left Join  (cost=1722.18..2754.88 rows=2
width=194) (actual time=68.196..75.592 rows=436 loops=1)
                          Hash Cond: (public.banners_links.id =
users_banners_tot_sub.banner_id)
                          ->  Nested Loop Left Join  (cost=1227.70..2260.38
rows=2 width=186) (actual time=61.870..68.654 rows=436 loops=1)
                                ->  Hash Left Join  (cost=1227.70..2259.73
rows=2 width=116) (actual time=61.859..67.140 rows=436 loops=1)
                                      Hash Cond: (public.banners_links.id =
banners_banner_types.banner_id)
                                      ->  Hash Left Join
(cost=103.40..946.54 rows=2 width=81) (actual time=6.099..6.944 rows=331
loops=1)
                                            Hash Cond:
(public.banners_links.id = special_deals.id)
                                            Filter:
(special_deals.special_deal IS NULL)
                                            ->  Bitmap Heap Scan on
banners_links  (cost=6.86..816.67 rows=336 width=73) (actual
time=0.105..0.451 rows=336 loops=1)
                                                  Recheck Cond: (merchant_id
= 217)
                                                  ->  Bitmap Index Scan on
banners_links_merchant_id_idx  (cost=0.00..6.77 rows=336 width=0) (actual
time=0.073..0.073 rows=336 loops=1)
                                                        Index Cond:
(merchant_id = 217)
                                            ->  Hash  (cost=86.93..86.93
rows=769 width=16) (actual time=5.989..5.989 rows=780 loops=1)
                                                  ->  Subquery Scan
special_deals  (cost=69.62..86.93 rows=769 width=16) (actual
time=4.225..5.445 rows=780 loops=1)
                                                        ->  HashAggregate
(cost=69.62..79.24 rows=769 width=16) (actual time=4.223..4.742 rows=780
loops=1)
                                                              ->  Seq Scan
on banner_deals  (cost=0.00..53.75 rows=3175 width=16) (actual
time=0.006..1.484 rows=3175 loops=1)
                                      ->  Hash  (cost=673.83..673.83
rows=21158 width=43) (actual time=55.750..55.750 rows=22112 loops=1)
                                            ->  Hash Join
(cost=2.45..673.83 rows=21158 width=43) (actual time=0.042..36.943
rows=22112 loops=1)
                                                  Hash Cond:
(banners_banner_types.type_id = banner_types.id)
                                                  ->  Seq Scan on
banners_banner_types  (cost=0.00..376.40 rows=22240 width=16) (actual
time=0.005..10.791 rows=22240 loops=1)
                                                  ->  Hash  (cost=2.20..2.20
rows=20 width=43) (actual time=0.032..0.032 rows=20 loops=1)
                                                        ->  Seq Scan on
banner_types  (cost=0.00..2.20 rows=20 width=43) (actual time=0.004..0.016
rows=20 loops=1)
                                ->  Index Scan using
fetch_banners_banners_links_id_idx on fetch_banners  (cost=0.00..0.32 rows=1
width=78) (actual time=0.002..0.002 rows=0 loops=436)
                                      Index Cond: (public.banners_links.id =
public.fetch_banners.banners_links_id)
                          ->  Hash  (cost=494.34..494.34 rows=11 width=24)
(actual time=6.312..6.312 rows=336 loops=1)
                                ->  Subquery Scan users_banners_tot_sub
(cost=494.09..494.34 rows=11 width=24) (actual time=5.519..6.056 rows=336
loops=1)
                                      ->  HashAggregate
(cost=494.09..494.23 rows=11 width=24) (actual time=5.518..5.747 rows=336
loops=1)
                                            ->  Nested Loop
(cost=360.46..494.01 rows=11 width=24) (actual time=2.814..5.166 rows=336
loops=1)
                                                  ->  Bitmap Heap Scan on
users_banners  (cost=360.46..402.65 rows=11 width=16) (actual
time=2.801..3.079 rows=336 loops=1)
                                                        Recheck Cond:
((affiliate_id = 217) AND ((status)::text = '3'::text))
                                                        ->  BitmapAnd
(cost=360.46..360.46 rows=11 width=0) (actual time=2.781..2.781 rows=0
loops=1)
                                                              ->  Bitmap
Index Scan on users_banners_affiliate_id_idx  (cost=0.00..5.31 rows=138
width=0) (actual time=0.088..0.088 rows=350 loops=1)
                                                                    Index
Cond: (affiliate_id = 217)
                                                              ->  Bitmap
Index Scan on users_banners_status_idx  (cost=0.00..354.90 rows=19016
width=0) (actual time=2.673..2.673 rows=17406 loops=1)
                                                                    Index
Cond: ((status)::text = '3'::text)
                                                  ->  Index Scan using
users_banners_id_idx on users_banners_rotation  (cost=0.00..8.29 rows=1
width=16) (actual time=0.004..0.004 rows=1 loops=336)
                                                        Index Cond:
(users_banners_rotation.users_banners_id = users_banners.id)
                    ->  Index Scan using banners_org_id_banner.idx on
banners_org  (cost=0.00..4.28 rows=1 width=387) (actual time=0.004..0.004
rows=1 loops=436)
                          Index Cond: (public.banners_links.id =
banners_org.id_banner)
              ->  Index Scan using pre_calc_banners_id on
precalculated_stats_banners_links  (cost=0.00..9.87 rows=4 width=22) (actual
time=0.004..0.008 rows=4 loops=436)
                    Index Cond: (public.banners_links.id =
precalculated_stats_banners_links.banners_links_id)
        ->  Bitmap Heap Scan on reward_ratings  (cost=4.27..15.33 rows=3
width=16) (actual time=0.004..0.004 rows=0 loops=1780)
              Recheck Cond: (public.banners_links.id =
reward_ratings.banner_id)
              Filter: ((now() >= period_start) AND (now() <= period_end))
              ->  Bitmap Index Scan on reward_ratings_banner_id_idx
(cost=0.00..4.27 rows=3 width=0) (actual time=0.003..0.003 rows=1
loops=1780)
                    Index Cond: (public.banners_links.id =
reward_ratings.banner_id)
Total runtime: 111.046 ms

I'm really having a hard time with this query. Every time when I change the
smallest of things, the query time dramatically jumps up or down.

I understand that some calculations just take time, but it seems to me that
its not the particular things I actually do that makes the difference, but
the fact that some actions or structure of the query 'just happen' to force
a bad plan while others just happen to force a good plan. With my limited
knowledge I absolutely see no connection between what actions influence what
and why.

Could someone shed some light on this issue?

_________________________________________________________________
Play online games with your friends with Messenger
http://www.join.msn.com/messenger/overview