too complex query plan for not exists query and multicolumn indexes

From: Corin
Date:
Hi all!

While evaluating the pgsql query planner I found some weird behavior. I
think its plan is way too complex and the query could be much faster.

CREATE TABLE friends (
    id integer NOT NULL,
    user_id integer NOT NULL,
    ref_id integer NOT NULL
);

ALTER TABLE ONLY friends ADD CONSTRAINT friends_pkey PRIMARY KEY (id);
CREATE INDEX user_ref ON friends USING btree (user_id, ref_id);

I fill this table with around 2,800,000 random rows (values between 1
and 500,000 for user_id and ref_id).
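
A sketch of how such data can be generated (illustrative, not
necessarily the exact script used):

INSERT INTO friends (id, user_id, ref_id)
SELECT g,
       (1 + floor(random() * 500000))::int,  -- random user_id in 1..500000
       (1 + floor(random() * 500000))::int   -- random ref_id in 1..500000
FROM generate_series(1, 2800000) AS g;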

The intention of the query is to find rows with no "partner" row. The
offset and limit are just to ignore the time needed to send the result
to the client.

SELECT * FROM friends AS f1 WHERE NOT EXISTS (SELECT 1 FROM friends AS
f2 WHERE f1.user_id=f2.ref_id AND f1.ref_id=f2.user_id) OFFSET 1000000
LIMIT 1

MySQL uses this query plan:

id  select_type         table  type   possible_keys  key       key_len  ref                   rows     Extra
1   PRIMARY             f1     index  NULL           user_ref  8        NULL                  2818860  Using where; Using index
2   DEPENDENT SUBQUERY  f2     ref    user_ref       user_ref  8        f1.ref_id,f1.user_id  1        Using index

Time: 9.8s

PostgreSQL uses this query plan:
"Limit  (cost=66681.50..66681.50 rows=1 width=139) (actual
time=7413.489..7413.489 rows=1 loops=1)"
"  ->  Merge Anti Join  (cost=40520.17..66681.50 rows=367793 width=139)
(actual time=3705.078..7344.256 rows=1000001 loops=1)"
"        Merge Cond: ((f1.user_id = f2.ref_id) AND (f1.ref_id =
f2.user_id))"
"        ->  Index Scan using user_ref on friends f1
(cost=0.00..26097.86 rows=2818347 width=139) (actual
time=0.093..1222.592 rows=1917360 loops=1)"
"        ->  Materialize  (cost=40520.17..40555.40 rows=2818347 width=8)
(actual time=3704.977..5043.347 rows=1990148 loops=1)"
"              ->  Sort  (cost=40520.17..40527.21 rows=2818347 width=8)
(actual time=3704.970..4710.703 rows=1990148 loops=1)"
"                    Sort Key: f2.ref_id, f2.user_id"
"                    Sort Method:  external merge  Disk: 49576kB"
"                    ->  Seq Scan on friends f2  (cost=0.00..18143.18
rows=2818347 width=8) (actual time=0.015..508.797 rows=2818347 loops=1)"
"Total runtime: 7422.516 ms"

It's already faster, which is great, but I wonder why the query plan is
that complex.

I read in the pgsql docs that using a multicolumn key is almost never
needed and only a waste of CPU/space. So I dropped the multicolumn key
and added two separate keys instead:

CREATE INDEX ref1 ON friends USING btree (ref_id);
CREATE INDEX user1 ON friends USING btree (user_id);

New query plan:
"Limit  (cost=70345.04..70345.04 rows=1 width=139) (actual
time=43541.709..43541.709 rows=1 loops=1)"
"  ->  Merge Anti Join  (cost=40520.27..70345.04 rows=367793 width=139)
(actual time=3356.694..43467.818 rows=1000001 loops=1)"
"        Merge Cond: (f1.user_id = f2.ref_id)"
"        Join Filter: (f1.ref_id = f2.user_id)"
"        ->  Index Scan using user1 on friends f1  (cost=0.00..26059.79
rows=2818347 width=139) (actual time=0.031..1246.668 rows=1917365 loops=1)"
"        ->  Materialize  (cost=40520.17..40555.40 rows=2818347 width=8)
(actual time=3356.615..14941.405 rows=130503729 loops=1)"
"              ->  Sort  (cost=40520.17..40527.21 rows=2818347 width=8)
(actual time=3356.611..4127.435 rows=1990160 loops=1)"
"                    Sort Key: f2.ref_id"
"                    Sort Method:  external merge  Disk: 49560kB"
"                    ->  Seq Scan on friends f2  (cost=0.00..18143.18
rows=2818347 width=8) (actual time=0.012..496.174 rows=2818347 loops=1)"
"Total runtime: 43550.187 ms"

As one can see, it's much, much slower and only uses one key, not both.
I thought performance should be almost equal.

I also wonder why it makes a difference when adding a "LIMIT" clause to
the subselect of an EXISTS. Shouldn't pgsql always stop after finding a
row? In MySQL it makes no difference in speed; pgsql even gets slower
when adding a LIMIT to the EXISTS subselect (I hoped it would get
faster!).

SELECT * FROM friends AS f1 WHERE NOT EXISTS (SELECT 1 FROM friends AS
f2 WHERE f1.user_id=f2.ref_id AND f1.ref_id=f2.user_id LIMIT 1) OFFSET
1000000 LIMIT 1

"Limit  (cost=6389166.19..6389172.58 rows=1 width=139) (actual
time=54540.356..54540.356 rows=1 loops=1)"
"  ->  Seq Scan on friends f1  (cost=0.00..9003446.87 rows=1409174
width=139) (actual time=0.511..54460.006 rows=1000001 loops=1)"
"        Filter: (NOT (SubPlan 1))"
"        SubPlan 1"
"          ->  Limit  (cost=2.18..3.19 rows=1 width=0) (actual
time=0.029..0.029 rows=0 loops=1832284)"
"                ->  Bitmap Heap Scan on friends f2  (cost=2.18..3.19
rows=1 width=0) (actual time=0.028..0.028 rows=0 loops=1832284)"
"                      Recheck Cond: (($0 = ref_id) AND ($1 = user_id))"
"                      ->  BitmapAnd  (cost=2.18..2.18 rows=1 width=0)
(actual time=0.027..0.027 rows=0 loops=1832284)"
"                            ->  Bitmap Index Scan on ref1
(cost=0.00..1.09 rows=75 width=0) (actual time=0.011..0.011 rows=85
loops=1832284)"
"                                  Index Cond: ($0 = ref_id)"
"                            ->  Bitmap Index Scan on user1
(cost=0.00..1.09 rows=87 width=0) (actual time=0.011..0.011 rows=87
loops=1737236)"
"                                  Index Cond: ($1 = user_id)"
"Total runtime: 54540.431 ms"

As in my previous tests, this is only a testing environment: so all data
is in memory, no disk activity involved at all, no swap etc.

Thanks,
Corin

BTW: I'll respond to the answers to my previous post later.


Re: too complex query plan for not exists query and multicolumn indexes

From: "Kevin Grittner"
Date:
Corin <wakathane@gmail.com> wrote:

> It's already faster, which is great, but I wonder why the query
> plan is that complex.

Because that's the plan, out of all the ways the planner knows to
get the requested result set, which was estimated to cost the least.
If it isn't actually the fastest, that might suggest that you
should adjust your costing model.  Could you tell us more about the
machine?  Especially useful would be the amount of RAM, what else is
running on the machine, and what the disk system looks like.  The
default configuration is almost never optimal for serious production
-- it's designed to behave reasonably if someone installs on their
desktop PC to try it out.

> I read in the pgsql docs that using a multicolumn key is almost
> never needed and only a waste of CPU/space.

Where in the docs did you see that?

> As in my previous tests, this is only a testing environment: so
> all data is in memory, no disk activity involved at all, no swap
> etc.

Ah, that suggests possible configuration changes.  You can try these
out in the session to see the impact, and modify postgresql.conf if
they work out.

seq_page_cost = 0.01
random_page_cost = 0.01
effective_cache_size = <about 3/4 of your machine's RAM>
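
For example, to try these in the current session only (a sketch; the
effective_cache_size value assumes a hypothetical 16GB machine):

SET seq_page_cost = 0.01;
SET random_page_cost = 0.01;
SET effective_cache_size = '12GB';  -- ~3/4 of the assumed 16GB of RAM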

Also, make sure that you run VACUUM ANALYZE against the table after
initially populating it and before your benchmarks; otherwise you
might inadvertently include transient or one-time maintenance costs
in some benchmarks, or distort behavior by not yet having the
statistics present for sane optimizer choices.
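
For example, with the table from the original post:

VACUUM ANALYZE friends;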

-Kevin

Re: too complex query plan for not exists query and multicolumn indexes

From: Stephen Frost
Date:
Corin,

* Corin (wakathane@gmail.com) wrote:
> I fill this table with around 2,800,000 random rows (values between 1
> and 500,000 for user_id and ref_id).

Using random data really isn't a good test.

> The intention of the query is to find rows with no "partner" row. The
> offset and limit are just to ignore the time needed to send the result
> to the client.

Setting offset and/or limit changes the behaviour of the query.  What's
important are the queries your application will actually be doing.
These kind of arbitrary tests really aren't a good idea.

> SELECT * FROM friends AS f1 WHERE NOT EXISTS (SELECT 1 FROM friends AS
> f2 WHERE f1.user_id=f2.ref_id AND f1.ref_id=f2.user_id) OFFSET 1000000
> LIMIT 1
>
> MySQL uses this query plan:
> 1  PRIMARY             f1  index  NULL      user_ref  8  NULL                  2818860  Using where; Using index
> 2  DEPENDENT SUBQUERY  f2  ref    user_ref  user_ref  8  f1.ref_id,f1.user_id  1        Using index
> Time: 9.8s

Yeah, that query plan basically doesn't tell you diddly about what's
going on, if you ask me.  The impression I get from this is that it's
using the index to do an in-order traversal of the table (why, I'm not
sure...) and then using the index in the subquery to look up each record
in the table one-by-one.  This isn't terribly efficient, and PG manages
to beat it by being smarter, even with the handicap that it has to go to
an external on-disk sort (see later on for how to fix that).

> PostgreSQL uses this query plan:
> "Limit  (cost=66681.50..66681.50 rows=1 width=139) (actual
> time=7413.489..7413.489 rows=1 loops=1)"
> "  ->  Merge Anti Join  (cost=40520.17..66681.50 rows=367793 width=139)
> (actual time=3705.078..7344.256 rows=1000001 loops=1)"
> "        Merge Cond: ((f1.user_id = f2.ref_id) AND (f1.ref_id =
> f2.user_id))"
> "        ->  Index Scan using user_ref on friends f1
> (cost=0.00..26097.86 rows=2818347 width=139) (actual
> time=0.093..1222.592 rows=1917360 loops=1)"
> "        ->  Materialize  (cost=40520.17..40555.40 rows=2818347 width=8)
> (actual time=3704.977..5043.347 rows=1990148 loops=1)"
> "              ->  Sort  (cost=40520.17..40527.21 rows=2818347 width=8)
> (actual time=3704.970..4710.703 rows=1990148 loops=1)"
> "                    Sort Key: f2.ref_id, f2.user_id"
> "                    Sort Method:  external merge  Disk: 49576kB"
> "                    ->  Seq Scan on friends f2  (cost=0.00..18143.18
> rows=2818347 width=8) (actual time=0.015..508.797 rows=2818347 loops=1)"
> "Total runtime: 7422.516 ms"
>
> It's already faster, which is great, but I wonder why the query plan is
> that complex.

Uh, I'm not sure what you're looking at, but that simply isn't a very
complex query plan.  I think what's different is that PG is telling you
a lot more about what is going on than MySQL does.  What's happening here
is this:

Sort the table by ref_id and user_id

Then, do an in-order index traversal using the user_ref index, comparing
each row from the sorted output to each row of the index traversal.  It
does that using a merge anti-join (essentially, return records where
they don't match, which is what you want).  Then it limits the results,
since you asked it to.

If you had an index on ref_id,user_id (as well as the one on
user_id,ref_id), it'd probably be able to do in-order index traversals
on both and be really fast...  But then updates would be more expensive,
of course, since it'd have more indexes to maintain.
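
Something like this, perhaps (a sketch; the index name is just
illustrative):

CREATE INDEX ref_user ON friends USING btree (ref_id, user_id);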

> I read in the pgsql docs that using a multicolumn key is almost never
> needed and only a waste of CPU/space. So I dropped the multicolumn key
> and added two separate keys instead:

Uh, no, that's not always the case.  It's certainly not something you
can just generalize that simply.  It's true that PG can use bitmap index
scans now, which allow it to do individual lookups using multiple
indexes even when they're not a composite index, but that capability
doesn't allow in-order index traversals across multiple columns.

> CREATE INDEX ref1 ON friends USING btree (ref_id);
> CREATE INDEX user1 ON friends USING btree (user_id);
>
> New query plan:
> "Limit  (cost=70345.04..70345.04 rows=1 width=139) (actual
> time=43541.709..43541.709 rows=1 loops=1)"
> "  ->  Merge Anti Join  (cost=40520.27..70345.04 rows=367793 width=139)
> (actual time=3356.694..43467.818 rows=1000001 loops=1)"
> "        Merge Cond: (f1.user_id = f2.ref_id)"
> "        Join Filter: (f1.ref_id = f2.user_id)"
> "        ->  Index Scan using user1 on friends f1  (cost=0.00..26059.79
> rows=2818347 width=139) (actual time=0.031..1246.668 rows=1917365
> loops=1)"
> "        ->  Materialize  (cost=40520.17..40555.40 rows=2818347 width=8)
> (actual time=3356.615..14941.405 rows=130503729 loops=1)"
> "              ->  Sort  (cost=40520.17..40527.21 rows=2818347 width=8)
> (actual time=3356.611..4127.435 rows=1990160 loops=1)"
> "                    Sort Key: f2.ref_id"
> "                    Sort Method:  external merge  Disk: 49560kB"
> "                    ->  Seq Scan on friends f2  (cost=0.00..18143.18
> rows=2818347 width=8) (actual time=0.012..496.174 rows=2818347 loops=1)"
> "Total runtime: 43550.187 ms"
>
> As one can see, it's much, much slower and only uses one key, not both.
> I thought performance should be almost equal.

It only uses one key because it can't use two independent indexes to do
an in-order index traversal of the combination.  It's an interesting
question why it didn't use the ref_id index instead of sorting for that
half of the merge anti-join, though.  Are you sure that you ran
'analyze;' on the table after creating this set of indexes?

> I also wonder why it makes a difference when adding a "LIMIT" clause to
> the subselect of an EXISTS. Shouldn't pgsql always stop after finding a
> row? In MySQL it makes no difference in speed; pgsql even gets slower
> when adding a LIMIT to the EXISTS subselect (I hoped it would get
> faster!).

PG is smarter than you're giving it credit for.  It's not just going
through each row of table A and then doing a single-row lookup in table
B for that row.  You *force* it to do that when you put a 'limit 1'
inside the sub-select, hence the performance goes into the toilet.  You
can see that from the query plan.

> SELECT * FROM friends AS f1 WHERE NOT EXISTS (SELECT 1 FROM friends AS
> f2 WHERE f1.user_id=f2.ref_id AND f1.ref_id=f2.user_id LIMIT 1) OFFSET
> 1000000 LIMIT 1
>
> "Limit  (cost=6389166.19..6389172.58 rows=1 width=139) (actual
> time=54540.356..54540.356 rows=1 loops=1)"
> "  ->  Seq Scan on friends f1  (cost=0.00..9003446.87 rows=1409174
> width=139) (actual time=0.511..54460.006 rows=1000001 loops=1)"
> "        Filter: (NOT (SubPlan 1))"
> "        SubPlan 1"
> "          ->  Limit  (cost=2.18..3.19 rows=1 width=0) (actual
> time=0.029..0.029 rows=0 loops=1832284)"
> "                ->  Bitmap Heap Scan on friends f2  (cost=2.18..3.19
> rows=1 width=0) (actual time=0.028..0.028 rows=0 loops=1832284)"
> "                      Recheck Cond: (($0 = ref_id) AND ($1 = user_id))"
> "                      ->  BitmapAnd  (cost=2.18..2.18 rows=1 width=0)
> (actual time=0.027..0.027 rows=0 loops=1832284)"
> "                            ->  Bitmap Index Scan on ref1
> (cost=0.00..1.09 rows=75 width=0) (actual time=0.011..0.011 rows=85
> loops=1832284)"
> "                                  Index Cond: ($0 = ref_id)"
> "                            ->  Bitmap Index Scan on user1
> (cost=0.00..1.09 rows=87 width=0) (actual time=0.011..0.011 rows=87
> loops=1737236)"
> "                                  Index Cond: ($1 = user_id)"
> "Total runtime: 54540.431 ms"

This says "Ok, we're going to step through each row of friends, and then
run a subplan on each one of those records".  That's a *horrible* way to
implement this query, since it's basically asking for you to test all
the records in the table.  If there was some selectivity to this (eg: a
WHERE clause for a specific user_id or something), it'd be a lot more
sane to use this approach.  You'll note above that it does actually end
up using both of your indexes when it's called this way, combining them
using a BitmapAnd.  For one-off lookups with specific values (as you're
forcing to happen here), you can have independent indexes that both get
used.  That's what the hint you're talking about above was trying to
point out.
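
For instance, with an illustrative selective query like this (not from
the original post), the per-row subplan approach would be reasonable:

SELECT * FROM friends AS f1
WHERE f1.user_id = 12345  -- hypothetical specific user
  AND NOT EXISTS (SELECT 1 FROM friends AS f2
                  WHERE f1.user_id = f2.ref_id AND f1.ref_id = f2.user_id);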

Note also that, if I'm right, this is essentially the same query plan
that MySQL used above, but with a lot more specifics about what's
actually happening (it's not really any more complicated).

> As in my previous tests, this is only a testing environment: so all data
> is in memory, no disk activity involved at all, no swap etc.

Yeahhhh, system calls still aren't free.  I would recommend, if you care
about this query, bumping up your work_mem setting for it.  Right now,
PG is using an external sort (meaning on-disk), but the data set
appears to only be about 50MB (49560kB).  If you increased work_mem to,
say, 128MB (for this query, maybe or maybe not for the entire system),
it'd be able to do an in-memory sort (or maybe a hash or something else,
if it makes sense), which would be faster.
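
Something along these lines, session-local (a sketch using the value
suggested above):

SET work_mem = '128MB';
EXPLAIN ANALYZE
SELECT * FROM friends AS f1
WHERE NOT EXISTS (SELECT 1 FROM friends AS f2
                  WHERE f1.user_id = f2.ref_id AND f1.ref_id = f2.user_id)
OFFSET 1000000 LIMIT 1;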

I'd probably rewrite this as a left-join too, to be honest, but based on
what I'm saying, that'd probably get the same query plan as you had
first anyway (the merge anti-join), so it's probably not necessary.  I'd
love to hear how PG performs with work_mem bumped up to something
decent...

    Thanks,

        Stephen


Re: too complex query plan for not exists query and multicolumn indexes

From: Dave Crooke
Date:
K.I.S.S. here... the best way to do one of these in most DBs is typically an outer join and a test for null:

select f1.* from friends f1
   left outer join friends f2 on (f1.user_id=f2.ref_id and f1.ref_id=f2.user_id)
   where f2.id is null;

On Fri, Mar 19, 2010 at 7:26 AM, Corin <wakathane@gmail.com> wrote:
Hi all!

While evaluating the pgsql query planner I found some weird behavior. I think its plan is way too complex and the query could be much faster.

CREATE TABLE friends (
  id integer NOT NULL,
  user_id integer NOT NULL,
  ref_id integer NOT NULL
);

ALTER TABLE ONLY friends ADD CONSTRAINT friends_pkey PRIMARY KEY (id);
CREATE INDEX user_ref ON friends USING btree (user_id, ref_id);

I fill this table with around 2,800,000 random rows (values between 1 and 500,000 for user_id and ref_id).

The intention of the query is to find rows with no "partner" row. The offset and limit are just to ignore the time needed to send the result to the client.

SELECT * FROM friends AS f1 WHERE NOT EXISTS (SELECT 1 FROM friends AS f2 WHERE f1.user_id=f2.ref_id AND f1.ref_id=f2.user_id) OFFSET 1000000 LIMIT 1

<snip>

Re: too complex query plan for not exists query and multicolumn indexes

From: Matthew Wakeling
Date:
On Fri, 19 Mar 2010, Stephen Frost wrote:
> ...it has to go to an external on-disk sort (see later on for how to
> fix that).

This was covered on this list a few months ago, in
http://archives.postgresql.org/pgsql-performance/2009-08/msg00184.php and
http://archives.postgresql.org/pgsql-performance/2009-08/msg00189.php

There seemed to be some consensus that allowing a materialise in front of
an index scan might have been a good change. Was there any movement on
this front?

>> "Limit  (cost=66681.50..66681.50 rows=1 width=139) (actual
>> time=7413.489..7413.489 rows=1 loops=1)"
>> "  ->  Merge Anti Join  (cost=40520.17..66681.50 rows=367793 width=139)
>> (actual time=3705.078..7344.256 rows=1000001 loops=1)"
>> "        Merge Cond: ((f1.user_id = f2.ref_id) AND (f1.ref_id =
>> f2.user_id))"
>> "        ->  Index Scan using user_ref on friends f1
>> (cost=0.00..26097.86 rows=2818347 width=139) (actual
>> time=0.093..1222.592 rows=1917360 loops=1)"
>> "        ->  Materialize  (cost=40520.17..40555.40 rows=2818347 width=8)
>> (actual time=3704.977..5043.347 rows=1990148 loops=1)"
>> "              ->  Sort  (cost=40520.17..40527.21 rows=2818347 width=8)
>> (actual time=3704.970..4710.703 rows=1990148 loops=1)"
>> "                    Sort Key: f2.ref_id, f2.user_id"
>> "                    Sort Method:  external merge  Disk: 49576kB"
>> "                    ->  Seq Scan on friends f2  (cost=0.00..18143.18
>> rows=2818347 width=8) (actual time=0.015..508.797 rows=2818347 loops=1)"
>> "Total runtime: 7422.516 ms"

> If you had an index on ref_id,user_id (as well as the one on
> user_id,ref_id), it'd probably be able to do in-order index traversals
> on both and be really fast...  But then updates would be more expensive,
> of course, since it'd have more indexes to maintain.

That isn't necessarily so, until the issue referred to in the above linked
messages is resolved. It depends.

Matthew

--
 I've run DOOM more in the last few days than I have the last few
 months.  I just love debugging ;-)  -- Linus Torvalds

Re: too complex query plan for not exists query and multicolumn indexes

From: Tom Lane
Date:
Matthew Wakeling <matthew@flymine.org> writes:
> On Fri, 19 Mar 2010, Stephen Frost wrote:
>> ...it has to go to an external on-disk sort (see later on for how to
>> fix that).

> This was covered on this list a few months ago, in
> http://archives.postgresql.org/pgsql-performance/2009-08/msg00184.php and
> http://archives.postgresql.org/pgsql-performance/2009-08/msg00189.php

> There seemed to be some consensus that allowing a materialise in front of
> an index scan might have been a good change. Was there any movement on
> this front?

Yes, 9.0 will consider plans like

 Merge Join  (cost=0.00..14328.70 rows=1000000 width=488)
   Merge Cond: (a.four = b.hundred)
   ->  Index Scan using fouri on tenk1 a  (cost=0.00..1635.62 rows=10000 width=244)
   ->  Materialize  (cost=0.00..1727.16 rows=10000 width=244)
         ->  Index Scan using tenk1_hundred on tenk1 b  (cost=0.00..1702.16 rows=10000 width=244)

Some experimentation shows that it won't insert the materialize unless
quite a bit of re-fetching is predicted (ie neither side of the join is
unique).  We might need to tweak the cost parameters once we get some
field experience with it.

            regards, tom lane