Обсуждение: Odd performance / query plan with bitmasked field as opposed to equality

Поиск
Список
Период
Сортировка

Odd performance / query plan with bitmasked field as opposed to equality

От
Frank Joerdens
Дата:
I can't figure what is going on below; first of all, this count  which
returns 1.5 million from a ~2 million row table:

woome=# explain analyze SELECT COUNT(*) FROM "webapp_person" WHERE
"webapp_person"."permissionflags" =
B'0000000000001111111111111111111111111111'::"bit";
                                                           QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=125774.83..125774.84 rows=1 width=0) (actual
time=2976.405..2976.405 rows=1 loops=1)
   ->  Seq Scan on webapp_person  (cost=0.00..122041.10 rows=1493490
width=0) (actual time=0.019..2781.735 rows=1518635 loops=1)
         Filter: (permissionflags =
B'0000000000001111111111111111111111111111'::"bit")
 Total runtime: 2976.475 ms
(4 rows)

is slower than

woome=# explain analyze SELECT COUNT(*) FROM "webapp_person" WHERE
"webapp_person"."permissionflags" &
b'0000000000000000000000000000000000000100' =
b'0000000000000000000000000000000000000100';

            QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=49341.55..49341.56 rows=1 width=0) (actual
time=1035.226..1035.226 rows=1 loops=1)
   ->  Bitmap Heap Scan on webapp_person  (cost=26280.49..49316.11
rows=10174 width=0) (actual time=221.672..872.037 rows=1518630
loops=1)
         Recheck Cond: ((permissionflags &
B'0000000000000000000000000000000000000100'::"bit") =
B'0000000000000000000000000000000000000100'::"bit")
         ->  Bitmap Index Scan on
webapp_person_permissionflags_bitmasked0100_idx  (cost=0.00..26277.95
rows=10174 width=0) (actual time=186.596..186.596 rows=1574558
loops=1)
 Total runtime: 1035.328 ms
(5 rows)

with both a straight btree index on the permissionflags column, and a
conditional index that matches the where clause in the 2nd example.
Both queries return the same result because with current data, the
only one value in the table which matches the bitmask. How come the
more complicated one is 3x faster? Maybe the size of the conditional
index? But why does the first form not use the conditional index?

Now if I add a straight join with another ~300k row table:

woome=# explain analyze SELECT COUNT(*) FROM webapp_person join
webapp_report on webapp_person.id = webapp_report.reported_id WHERE
webapp_report.crime='IP_SPAM' and webapp_person.permissionflags =
b'0000000000001111111111111111111111111111';
                                                                 QUERY
PLAN

---------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=150870.81..150870.82 rows=1 width=0) (actual
time=4024.475..4024.475 rows=1 loops=1)
   ->  Hash Join  (cost=140740.92..150531.01 rows=135922 width=0)
(actual time=3601.576..4013.332 rows=91126 loops=1)
         Hash Cond: (webapp_report.reported_id = webapp_person.id)
         ->  Seq Scan on webapp_report  (cost=0.00..6579.06
rows=185180 width=4) (actual time=0.024..88.038 rows=183558 loops=1)
               Filter: ((crime)::text = 'IP_SPAM'::text)
         ->  Hash  (cost=122059.09..122059.09 rows=1494547 width=4)
(actual time=3600.877..3600.877 rows=1518724 loops=1)
               ->  Seq Scan on webapp_person  (cost=0.00..122059.09
rows=1494547 width=4) (actual time=0.011..2984.683 rows=1518724
loops=1)
                     Filter: (permissionflags =
B'0000000000001111111111111111111111111111'::"bit")
 Total runtime: 4043.415 ms
(9 rows)

it adds a couple of seconds but the execution time stays within the
same order of magnitude. However if I now replace the straight
equality comparison there with the bitmasked comparison, I get

woome=# explain analyze SELECT COUNT(*) FROM webapp_person join
webapp_report on webapp_person.id = webapp_report.reported_id WHERE
webapp_report.crime='IP_SPAM' and webapp_person.permissionflags &
b'0000000000000000000000000000000000000100' =
b'0000000000000000000000000000000000000100';

                  QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=58927.28..58927.29 rows=1 width=0) (actual
time=58004.762..58004.762 rows=1 loops=1)
   ->  Hash Join  (cost=49558.94..58924.96 rows=926 width=0) (actual
time=1505.944..57978.469 rows=91132 loops=1)
         Hash Cond: (webapp_report.reported_id = webapp_person.id)
         ->  Seq Scan on webapp_report  (cost=0.00..6579.06
rows=185180 width=4) (actual time=0.030..201.187 rows=183564 loops=1)
               Filter: ((crime)::text = 'IP_SPAM'::text)
         ->  Hash  (cost=49431.68..49431.68 rows=10181 width=4)
(actual time=1505.462..1505.462 rows=1518756 loops=1)
               ->  Bitmap Heap Scan on webapp_person
(cost=26383.65..49431.68 rows=10181 width=4) (actual
time=225.114..1058.692 rows=1518756 loops=1)
                     Recheck Cond: ((permissionflags &
B'0000000000000000000000000000000000000100'::"bit") =
B'0000000000000000000000000000000000000100'::"bit")
                     ->  Bitmap Index Scan on
webapp_person_permissionflags_bitmasked0100_idx  (cost=0.00..26381.10
rows=10181 width=0) (actual time=188.089..188.089 rows=1579945
loops=1)
 Total runtime: 58004.897 ms
(10 rows)

Time: 58024.124 ms

It takes almost a minute to run. My first question is, why is the
actual execution time for the hash join in the last example 15x higher
with the bitmasked condition than with the straight equality version
of the query above? This being even more puzzling since the
performance relationship between bitmask and equality is reversed if I
omit the join altogether, presumably because the conditional index
gets used ... however if I define a conditional index on the column
for the value that matches the bitmask, it still doesn't get used with
equality ...

I'm generally interested in the behaviour/performance of such bitmask
fields with and without indexes as we've started using it a lot. The
above seems to suggest there a things to keep in mind and to observe.

Regards,

Frank

Re: Odd performance / query plan with bitmasked field as opposed to equality

От
Robert Haas
Дата:
On Mon, Jul 13, 2009 at 4:46 PM, Frank Joerdens<frank@joerdens.de> wrote:
> I can't figure what is going on below; first of all, this count  which
> returns 1.5 million from a ~2 million row table:
>
> woome=# explain analyze SELECT COUNT(*) FROM "webapp_person" WHERE
> "webapp_person"."permissionflags" =
> B'0000000000001111111111111111111111111111'::"bit";
>                                                           QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=125774.83..125774.84 rows=1 width=0) (actual
> time=2976.405..2976.405 rows=1 loops=1)
>   ->  Seq Scan on webapp_person  (cost=0.00..122041.10 rows=1493490
> width=0) (actual time=0.019..2781.735 rows=1518635 loops=1)
>         Filter: (permissionflags =
> B'0000000000001111111111111111111111111111'::"bit")
>  Total runtime: 2976.475 ms
> (4 rows)

There are two possibilities here: the planner thinks it CAN'T use the
relevant index for this query, or it thinks that the index will be
slower than just seq-scaning the whole table.  To figure out which it
is, try EXPLAIN ANALYZE again with enable_seqscan set to false (note:
this is a bad idea in general, but useful for debugging).  If you
still get a seqscan anyway, then there's some reason why it thinks
that it can't use the index (which we can investigate).  If that makes
it switch to an index scan, then you can try adjusting your cost
parameters.  But the first thing is to figure out which kind of
problem you have.  In any case, send the output to the list.

Solving this problem will probably shed some light on the other things
in your original email, so I'm not going to specifically address each
one at this point.

...Robert