Re: too complex query plan for not exists query and multicolumn indexes

От: Stephen Frost
Тема: Re: too complex query plan for not exists query and multicolumn indexes
Дата: ,
Msg-id: 20100319182750.GQ21875@tamriel.snowman.net
(см: обсуждение, исходный текст)
Ответ на: too complex query plan for not exists query and multicolumn indexes  (Corin)
Ответы: Re: too complex query plan for not exists query and multicolumn indexes  (Matthew Wakeling)
Список: pgsql-performance


Corin,

* Corin () wrote:
> I fill this table with around 2.800.000 random rows (values between 1
> and 500.000 for user_id, ref_id).

Using random data really isn't a good test.

> The intention of the query is to find rows with no "partner" row. The
> offset and limit are just to ignore the time needed to send the result
> to the client.

Setting offset and/or limit changes the behaviour of the query.  What's
important are the queries your application will actually be doing.
These kind of arbitrary tests really aren't a good idea.

> SELECT * FROM friends AS f1 WHERE NOT EXISTS (SELECT 1 FROM friends AS
> f2 WHERE f1.user_id=f2.ref_id AND f1.ref_id=f2.user_id) OFFSET 1000000
> LIMIT 1
>
> Mysql uses this query plan:
> 1     PRIMARY     f1     index      NULL     user_ref     8      NULL
> 2818860     Using where; Using index
> 2     DEPENDENT SUBQUERY     f2     ref     user_ref     user_ref     8
>  f1.ref_id,f1.user_id     1     Using index
> Time: 9.8s

Yeah, that query plan basically doesn't tell you diddly about what's
going on, if you ask me.  The impression I get from this is that it's
using the index to do an in-order traversal of the table (why I'm not
sure..) and then using the index in the subquery to look up each record
in the table one-by-one.  This isn't terribly efficient, and PG manages
to beat it by being smarter- even with the handicap that it has to go to
an external on-disk sort (see later on, and how to fix that).

> Postgre uses this query plan:
> "Limit  (cost=66681.50..66681.50 rows=1 width=139) (actual
> time=7413.489..7413.489 rows=1 loops=1)"
> "  ->  Merge Anti Join  (cost=40520.17..66681.50 rows=367793 width=139)
> (actual time=3705.078..7344.256 rows=1000001 loops=1)"
> "        Merge Cond: ((f1.user_id = f2.ref_id) AND (f1.ref_id =
> f2.user_id))"
> "        ->  Index Scan using user_ref on friends f1
> (cost=0.00..26097.86 rows=2818347 width=139) (actual
> time=0.093..1222.592 rows=1917360 loops=1)"
> "        ->  Materialize  (cost=40520.17..40555.40 rows=2818347 width=8)
> (actual time=3704.977..5043.347 rows=1990148 loops=1)"
> "              ->  Sort  (cost=40520.17..40527.21 rows=2818347 width=8)
> (actual time=3704.970..4710.703 rows=1990148 loops=1)"
> "                    Sort Key: f2.ref_id, f2.user_id"
> "                    Sort Method:  external merge  Disk: 49576kB"
> "                    ->  Seq Scan on friends f2  (cost=0.00..18143.18
> rows=2818347 width=8) (actual time=0.015..508.797 rows=2818347 loops=1)"
> "Total runtime: 7422.516 ms"
>
> It's already faster, which is great, but I wonder why the query plan is
> that complex.

Uh, I'm not sure what you're looking at, but that simply isn't a very
complex query plan.  I think what's different is that PG is telling you
alot more about what is going on than MySQL does.  What's happening here
is this:

Sort the table by ref_id and user_id

Then, do an in-order index traversal using the user_ref index, comparing
each row from the sorted output to each row of the index traversal.  It
does that using a merge anti-join (essentially, return records where
they don't match, which is what you want).  Then it limits the results,
since you asked it to.

If you had an index on ref_id,user_id (as well as the one on
user_id,ref_id), it'd probably be able to do in-order index traversals
on both and be really fast...  But then updates would be more expensive,
of course, since it'd have more indexes to maintain.

> I read in the pqsql docs that using a multicolumn key is almost never
> needed and only a waste of cpu/space. So I dropped the multicolumn key
> and added to separate keys instead:

Uh, no, that's not always the case.  It's certainly not something you
can just generalize that simply.  It's true that PG can use bitmap index
scans now, which allow it to do individual lookups using multiple
indexes even when they're not a composite index, but that capability
doesn't allow in-order index traversals across multiple columns.

> CREATE INDEX ref1 ON friends USING btree (ref_id);
> CREATE INDEX user1 ON friends USING btree (user_id);
>
> New query plan:
> "Limit  (cost=70345.04..70345.04 rows=1 width=139) (actual
> time=43541.709..43541.709 rows=1 loops=1)"
> "  ->  Merge Anti Join  (cost=40520.27..70345.04 rows=367793 width=139)
> (actual time=3356.694..43467.818 rows=1000001 loops=1)"
> "        Merge Cond: (f1.user_id = f2.ref_id)"
> "        Join Filter: (f1.ref_id = f2.user_id)"
> "        ->  Index Scan using user1 on friends f1  (cost=0.00..26059.79
> rows=2818347 width=139) (actual time=0.031..1246.668 rows=1917365
> loops=1)"
> "        ->  Materialize  (cost=40520.17..40555.40 rows=2818347 width=8)
> (actual time=3356.615..14941.405 rows=130503729 loops=1)"
> "              ->  Sort  (cost=40520.17..40527.21 rows=2818347 width=8)
> (actual time=3356.611..4127.435 rows=1990160 loops=1)"
> "                    Sort Key: f2.ref_id"
> "                    Sort Method:  external merge  Disk: 49560kB"
> "                    ->  Seq Scan on friends f2  (cost=0.00..18143.18
> rows=2818347 width=8) (actual time=0.012..496.174 rows=2818347 loops=1)"
> "Total runtime: 43550.187 ms"
>
> As one can see it's much much slower and only uses one key, not both. I
> thought performance should be almost equal.

It only uses 1 key because it can't use 2 independent indexes to do an
in-order index traversal of the combination.  It's an interesting
question as to why it didn't use the ref_id index instead of sorting
though, for that half of the merge anti-join, are you sure that you ran
'analyze;' on the table after creating this set of indexes?

> I also wonder why it makes a difference when adding a "LIMIT" clause to
> the subselect in an EXISTS subselect. Shouldn't pgsql always stop after
> finding the a row? In mysql is makes no difference in speed, pgsql even
> get's slower when adding a LIMIT to the EXISTS subselect (I hoped it
> would get faster?!).

PG is smarter than you're giving it credit for.  It's not just going
through each row of table A and then doing a single-row lookup in table
B for that row.  You *force* it to do that when you put a 'limit 1'
inside the sub-select, hence the performance goes into the toilet.  You
can see that from the query plan.

> SELECT * FROM friends AS f1 WHERE NOT EXISTS (SELECT 1 FROM friends AS
> f2 WHERE f1.user_id=f2.ref_id AND f1.ref_id=f2.user_id LIMIT 1) OFFSET
> 1000000 LIMIT 1
>
> "Limit  (cost=6389166.19..6389172.58 rows=1 width=139) (actual
> time=54540.356..54540.356 rows=1 loops=1)"
> "  ->  Seq Scan on friends f1  (cost=0.00..9003446.87 rows=1409174
> width=139) (actual time=0.511..54460.006 rows=1000001 loops=1)"
> "        Filter: (NOT (SubPlan 1))"
> "        SubPlan 1"
> "          ->  Limit  (cost=2.18..3.19 rows=1 width=0) (actual
> time=0.029..0.029 rows=0 loops=1832284)"
> "                ->  Bitmap Heap Scan on friends f2  (cost=2.18..3.19
> rows=1 width=0) (actual time=0.028..0.028 rows=0 loops=1832284)"
> "                      Recheck Cond: (($0 = ref_id) AND ($1 = user_id))"
> "                      ->  BitmapAnd  (cost=2.18..2.18 rows=1 width=0)
> (actual time=0.027..0.027 rows=0 loops=1832284)"
> "                            ->  Bitmap Index Scan on ref1
> (cost=0.00..1.09 rows=75 width=0) (actual time=0.011..0.011 rows=85
> loops=1832284)"
> "                                  Index Cond: ($0 = ref_id)"
> "                            ->  Bitmap Index Scan on user1
> (cost=0.00..1.09 rows=87 width=0) (actual time=0.011..0.011 rows=87
> loops=1737236)"
> "                                  Index Cond: ($1 = user_id)"
> "Total runtime: 54540.431 ms"

This says "Ok, we're going to step through each row of friends, and then
run a subplan on each one of those records".  That's a *horrible* way to
implement this query, since it's basically asking for you to test all
the records in the table.  If there was some selectivity to this (eg: a
WHERE clause for a specific user_id or something), it'd be alot more
sane to use this approach.  You'll note above that it does actually end
up using both of your indexes when it's called this way, combining them
using a BitmapAnd.  For one-off lookups with specific values (as you're
forcing to happen here), you can have independent indexes that both get
used.  That's what the hint you're talking about above was trying to
point out.

Note also that, if I'm right, this is essentially the same query plan
that MySQL used above, but with alot more specifics about what's
actually happening (it's not really any more complicated).

> As in my previous tests, this is only a testing environment: so all data
> is in memory, no disk activity involved at all, no swap etc.

Yeahhhh, system calls still aren't free.  I would recommend, if you care
about this query, bumping up your work_mem setting for it.  Right now,
PG is using an external sort (meaning- on-disk), but the data set
appears to only be like 50M (49560kB).  If you increased work_mem to,
say, 128MB (for this query, maybe or maybe not for the entire system),
it'd be able to do an in-memory sort (or maybe a hash or something else,
if it makes sense), which would be faster.

I'd probably rewrite this as a left-join too, to be honest, but based on
what I'm saying, that'd probably get the same query plan as you had
first anyway (the merge anti-join), so it's probably not necessary.  I'd
love to hear how PG performs with work_mem bumped up to something
decent...

    Thanks,

        Stephen

Вложения

В списке pgsql-performance по дате сообщения:

От: Kenneth Marshall
Дата:
Сообщение: Re: GiST index performance
От: Tom Lane
Дата:
Сообщение: Re: PG using index+filter instead only use index