Обсуждение: Subquery WHERE IN or WHERE EXISTS faster?

Поиск

Список

Период

Сортировка

Subquery WHERE IN or WHERE EXISTS faster?

От

Ulrich

Дата:

28 июня 2008 г., 15:22:49

Hi,
I am new to SQL and have two tables..., "processor" and
"users_processors". The first table contains Processors:

CREATE TABLE processor (
id SERIAL,
speed varchar(50) NOT NULL,
type int2 NOT NULL,
PRIMARY KEY (id)
);
CREATE UNIQUE INDEX processor_speed_index ON processors(lower(speed));

Example:
1 "100MHz" 0
2 "36GHz" 7
...


The second Table defines which processor one user has got:

CREATE TABLE users_processors (
userid int REFERENCES users ON UPDATE CASCADE ON DELETE CASCADE,
processorid int REFERENCES processors ON UPDATE CASCADE ON DELETE CASCADE,
PRIMARY KEY(userid, processorid)
);
CREATE INDEX users_processors_processorid_index ON
users_processors(processorid);
CREATE INDEX users_processors_processorid_index ON
users_processors(processorid);

Example:
1 2
1 3
1 4
...
2 1
2 2
...
(The user "1" own processors 2,3,4 and the user 2 owns processors 1,2)


__________________________________________________________

Now, I would like to list all processors user "1" has got. The following
query does that:
SELECT speed FROM processors WHERE id IN (SELECT processorid FROM
users_processors WHERE userid=1) ORDER BY speed ASC LIMIT 10 OFFSET 2;

This would return 10 processors beginning with number 3. I have read,
that this query is slow and can be faster. I analyzed it:
Limit  (cost=22.90..22.90 rows=1 width=118) (actual time=0.344..0.349
rows=9 loops=1)
   ->  Sort  (cost=22.90..22.90 rows=2 width=118) (actual
time=0.341..0.341 rows=11 loops=1)
         Sort Key: processors.speed
         Sort Method:  quicksort  Memory: 17kB
         ->  Nested Loop  (cost=15.03..22.89 rows=2 width=118) (actual
time=0.225..0.289 rows=11 loops=1)
               ->  HashAggregate  (cost=15.03..15.05 rows=2 width=4)
(actual time=0.207..0.214 rows=11 loops=1)
                     ->  Bitmap Heap Scan on users_processors
(cost=4.34..15.01 rows=11 width=4) (actual time=0.175..0.179 rows=11
loops=1)
                           Recheck Cond: (userid = 1)
                           ->  Bitmap Index Scan on
users_processors_userid_index  (cost=0.00..4.33 rows=11 width=0) (actual
time=0.159..0.159 rows=12 loops=1)
                                 Index Cond: (userid = 1)
               ->  Index Scan using processors_pkey on processors
(cost=0.00..3.90 rows=1 width=122) (actual time=0.004..0.004 rows=1
loops=11)
                     Index Cond: (processors.id =
users_processors.processorid)
 Total runtime: 0.478 ms
(13 rows)


__________________________________________________________


People say that this query is faster:
SELECT speed FROM processors WHERE EXISTS (SELECT 1 FROM
users_processors WHERE userid=1 AND processorid=processors.id) ORDER BY
speed ASC LIMIT 10 OFFSET 2;

Analyze returns:
 Limit  (cost=4404.52..4404.55 rows=10 width=118) (actual
time=0.179..0.184 rows=9 loops=1)
   ->  Sort  (cost=4404.52..4405.18 rows=265 width=118) (actual
time=0.176..0.177 rows=11 loops=1)
         Sort Key: processors.speed
         Sort Method:  quicksort  Memory: 17kB
         ->  Seq Scan on processors (cost=0.00..4398.44 rows=265
width=118) (actual time=0.056..0.118 rows=11 loops=1)
               Filter: (subplan)
               SubPlan
                 ->  Index Scan using users_processors_pkey on
users_processors  (cost=0.00..8.27 rows=1 width=0) (actual
time=0.006..0.006 rows=1 loops=11)
                       Index Cond: ((userid = 1) AND (processorid = $0))
 Total runtime: 0.267 ms
(10 rows)




The second query is faster, but I have only used a very small table with
less than 20 items. In real-world I will have tables with thousands of
entries. I wonder if the second query is also faster in cases where I
have big tables, because it does a "Seq Scan", for me this looks like a
complete table scan. This seams reasonable if we look at the query I do
not expect that it is possible to use an INDEX for the second query. So,
is it slower?

Which query would you use, the first or the second one?


I would also like to know the total number of processors one user has
got. I would use one of those queries and replace the "SELECT speed"
with "SELECT count(*)" and remove the LIMIT and OFFSET. Is this good? I
have read that count(*) is slow.

Kind regards
Ulrich

Re: Subquery WHERE IN or WHERE EXISTS faster?

От

Tom Lane

Дата:

28 июня 2008 г., 15:53:51

Ulrich <ulrich.mierendorff@gmx.net> writes:
> People say that [EXISTS is faster]

People who say that are not reliable authorities, at least as far as
Postgres is concerned.  But it is always a bad idea to extrapolate
results on toy tables to large tables --- quite aside from measurement
noise and caching issues, the planner might pick a different plan when
faced with large tables.  Load up a realistic amount of data and then
see what you get.

            regards, tom lane

Re: Subquery WHERE IN or WHERE EXISTS faster?

От

Ulrich

Дата:

28 июня 2008 г., 22:07:42

Hi,
I have added a bit of dummy Data, 100000 processors, 10000 users, each
user got around 12 processors.

I have tested both queries. First of all, I was surprised that it is
that fast :) Here are the results:


EXPLAIN ANALYZE SELECT speed FROM processors WHERE id IN (SELECT
processorid FROM users_processors WHERE userid=4040) ORDER BY speed ASC
LIMIT 10 OFFSET 1;

Limit  (cost=113.73..113.75 rows=7 width=5) (actual time=0.335..0.340
rows=10 loops=1)
   ->  Sort  (cost=113.73..113.75 rows=8 width=5) (actual
time=0.332..0.333 rows=11 loops=1)
         Sort Key: processors.speed
         Sort Method:  quicksort  Memory: 17kB
         ->  Nested Loop  (cost=47.22..113.61 rows=8 width=5) (actual
time=0.171..0.271 rows=13 loops=1)
               ->  HashAggregate  (cost=47.22..47.30 rows=8 width=4)
(actual time=0.148..0.154 rows=13 loops=1)
                     ->  Bitmap Heap Scan on users_processors
(cost=4.36..47.19 rows=12 width=4) (actual time=0.074..0.117 rows=13
loops=1)
                           Recheck Cond: (userid = 4040)
                           ->  Bitmap Index Scan on
users_processors_userid_index  (cost=0.00..4.35 rows=12 width=0) (actual
time=0.056..0.056 rows=13 loops=1)
                                 Index Cond: (userid = 4040)
               ->  Index Scan using processors_pkey on processors
(cost=0.00..8.28 rows=1 width=9) (actual time=0.006..0.007 rows=1 loops=13)
                     Index Cond: (processors.id =
users_processors.processorid)
 Total runtime: 0.471 ms
(13 rows)

___________

EXPLAIN ANALYZE SELECT speed FROM processors WHERE EXISTS (SELECT 1 FROM
users_processors WHERE userid=4040 AND processorid=processors.id) ORDER
BY speed ASC LIMIT 10 OFFSET 1;

 Limit  (cost=831413.86..831413.89 rows=10 width=5) (actual
time=762.475..762.482 rows=10 loops=1)
   ->  Sort  (cost=831413.86..831538.86 rows=50000 width=5) (actual
time=762.471..762.473 rows=11 loops=1)
         Sort Key: processors.speed
         Sort Method:  quicksort  Memory: 17kB
         ->  Seq Scan on processors  (cost=0.00..830299.00 rows=50000
width=5) (actual time=313.591..762.411 rows=13 loops=1)
               Filter: (subplan)
               SubPlan
                 ->  Index Scan using users_processors_pkey on
users_processors  (cost=0.00..8.29 rows=1 width=0) (actual
time=0.006..0.006 rows=0 loops=100000)
                       Index Cond: ((userid = 4040) AND (processorid = $0))
 Total runtime: 762.579 ms
(10 rows)




As you can see the second query is much slower. First I thought "Just a
difference of 0.3ms?", but then I realized that it was 762ms not 0.762 ;-).
Both queries return the same result, so I will use #1 and count(*) takes
just 0.478ms if I use query #1.

Kind Regards,
Ulrich

Tom Lane wrote:
> Ulrich <ulrich.mierendorff@gmx.net> writes:
>
>> People say that [EXISTS is faster]
>>
>
> People who say that are not reliable authorities, at least as far as
> Postgres is concerned.  But it is always a bad idea to extrapolate
> results on toy tables to large tables --- quite aside from measurement
> noise and caching issues, the planner might pick a different plan when
> faced with large tables.  Load up a realistic amount of data and then
> see what you get.
>
>             regards, tom lane
>
>

Re: Subquery WHERE IN or WHERE EXISTS faster?

От

Gregory Stark

Дата:

28 июня 2008 г., 23:01:17

"Ulrich" <ulrich.mierendorff@gmx.net> writes:

> EXPLAIN ANALYZE SELECT speed FROM processors WHERE id IN (SELECT processorid
> FROM users_processors WHERE userid=4040) ORDER BY speed ASC LIMIT 10 OFFSET 1;
>
> Limit  (cost=113.73..113.75 rows=7 width=5) (actual time=0.335..0.340 rows=10 loops=1)
>   ->  Sort  (cost=113.73..113.75 rows=8 width=5) (actual time=0.332..0.333 rows=11 loops=1)

                                                                                  ^^

>         Sort Key: processors.speed
>         Sort Method:  quicksort  Memory: 17kB
>         ->  Nested Loop  (cost=47.22..113.61 rows=8 width=5) (actual time=0.171..0.271 rows=13 loops=1)
>               ->  HashAggregate  (cost=47.22..47.30 rows=8 width=4) (actual time=0.148..0.154 rows=13 loops=1)
>                     ->  Bitmap Heap Scan on users_processors  (cost=4.36..47.19 rows=12 width=4) (actual
time=0.074..0.117rows=13 loops=1) 


         ^^ 

>                                 Index Cond: (userid = 4040)
>               ->  Index Scan using processors_pkey on processors (cost=0.00..8.28 rows=1 width=9) (actual
time=0.006..0.007rows=1 loops=13) 
>                     Index Cond: (processors.id = users_processors.processorid)


It looks to me like you have some processors which appear in
"users_processors" but not in "processors". I don't know your data model but
that sounds like broken referential integrity  to me.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

Re: Subquery WHERE IN or WHERE EXISTS faster?

От

Ulrich

Дата:

29 июня 2008 г., 10:15:37

Hi,
Yes that looks strange. But it is not possible that I have processors in
"users_processors" which do not appear in "processors", because
"users_processors" contains foreign keys to "processors".

If I remove the LIMIT 10 OFFSET 1 the line "Sort (cost=.... rows=11.."
disappears and the query return 13 correct processors from "processors".
Then, I have tested different values for OFFSET. If I set Offset to "2"
and LIMIT=10 the line is:
       Sort  (cost=113.73..113.75 rows=8 width=5) (actual
time=0.322..0.330 rows=12 loops=1)
If I set Offset to "3" and LIMIT=10 it is
       Sort  (cost=113.73..113.75 rows=8 width=5) (actual
time=0.321..0.328 rows=13 loops=1)

It looks like if this "row" is something like min(max_rows=13,
LIMIT+OFFSET). But I do not completely understand the Syntax... ;-)

Kind regards
Ulrich

Gregory Stark wrote:
> "Ulrich" <ulrich.mierendorff@gmx.net> writes:
>
>
>> EXPLAIN ANALYZE SELECT speed FROM processors WHERE id IN (SELECT processorid
>> FROM users_processors WHERE userid=4040) ORDER BY speed ASC LIMIT 10 OFFSET 1;
>>
>> Limit  (cost=113.73..113.75 rows=7 width=5) (actual time=0.335..0.340 rows=10 loops=1)
>>   ->  Sort  (cost=113.73..113.75 rows=8 width=5) (actual time=0.332..0.333 rows=11 loops=1)
>>
>
>                                                                                   ^^
>
>
>>         Sort Key: processors.speed
>>         Sort Method:  quicksort  Memory: 17kB
>>         ->  Nested Loop  (cost=47.22..113.61 rows=8 width=5) (actual time=0.171..0.271 rows=13 loops=1)
>>               ->  HashAggregate  (cost=47.22..47.30 rows=8 width=4) (actual time=0.148..0.154 rows=13 loops=1)
>>                     ->  Bitmap Heap Scan on users_processors  (cost=4.36..47.19 rows=12 width=4) (actual
time=0.074..0.117rows=13 loops=1) 
>>
>
>
           ^^ 
>
>
>>                                 Index Cond: (userid = 4040)
>>               ->  Index Scan using processors_pkey on processors (cost=0.00..8.28 rows=1 width=9) (actual
time=0.006..0.007rows=1 loops=13) 
>>                     Index Cond: (processors.id = users_processors.processorid)
>>
>
>
> It looks to me like you have some processors which appear in
> "users_processors" but not in "processors". I don't know your data model but
> that sounds like broken referential integrity  to me.
>
>

Re: Subquery WHERE IN or WHERE EXISTS faster?

От

Gregory Stark

Дата:

29 июня 2008 г., 12:09:06

"Ulrich" <ulrich.mierendorff@gmx.net> writes:

> Hi,
> Yes that looks strange. But it is not possible that I have processors in
> "users_processors" which do not appear in "processors", because
> "users_processors" contains foreign keys to "processors".
>
> If I remove the LIMIT 10 OFFSET 1 the line "Sort (cost=.... rows=11.."
> disappears and the query return 13 correct processors from "processors".

Oh, er, my bad. That makes perfect sense. The "actual" numbers can be affected
by what records are actually requested. The LIMIT prevents the records beyond
11 from ever being requested even though they exist.

While the bitmap heap scan has to fetch all the records even though they don't
all get used, the nested loop only fetches the records as requested.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

Re: Subquery WHERE IN or WHERE EXISTS faster?

От

"Jaime Casanova"

Дата:

30 июня 2008 г., 04:34:22

On Sat, Jun 28, 2008 at 10:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ulrich <ulrich.mierendorff@gmx.net> writes:
>> People say that [EXISTS is faster]
>
> People who say that are not reliable authorities, at least as far as
> Postgres is concerned.  But it is always a bad idea to extrapolate
> results on toy tables to large tables --- quite aside from measurement
> noise and caching issues, the planner might pick a different plan when
> faced with large tables.  Load up a realistic amount of data and then
> see what you get.
>

i've made some queries run faster using EXISTS instead of large IN
clauses... actually, it was NOT EXISTS replacing a NOT IN

while i'm not telling EXISTS is better i actually know in some cases is better

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Guayaquil - Ecuador
Cel. (593) 87171157

Re: Subquery WHERE IN or WHERE EXISTS faster?

От

Tom Lane

Дата:

30 июня 2008 г., 04:49:00

"Jaime Casanova" <jcasanov@systemguards.com.ec> writes:
> i've made some queries run faster using EXISTS instead of large IN
> clauses... actually, it was NOT EXISTS replacing a NOT IN

That's just about entirely unrelated ...

            regards, tom lane

Re: Subquery WHERE IN or WHERE EXISTS faster?

От

Rusty Conover

Дата:

30 июня 2008 г., 06:50:37


On Jun 28, 2008, at 4:07 PM, Ulrich wrote:

> Hi,
> I have added a bit of dummy Data, 100000 processors, 10000 users,
> each user got around 12 processors.
>
> I have tested both queries. First of all, I was surprised that it is
> that fast :) Here are the results:
>
>
> EXPLAIN ANALYZE SELECT speed FROM processors WHERE id IN (SELECT
> processorid FROM users_processors WHERE userid=4040) ORDER BY speed
> ASC LIMIT 10 OFFSET 1;
>
> Limit  (cost=113.73..113.75 rows=7 width=5) (actual
> time=0.335..0.340 rows=10 loops=1)
>  ->  Sort  (cost=113.73..113.75 rows=8 width=5) (actual
> time=0.332..0.333 rows=11 loops=1)
>        Sort Key: processors.speed
>        Sort Method:  quicksort  Memory: 17kB
>        ->  Nested Loop  (cost=47.22..113.61 rows=8 width=5) (actual
> time=0.171..0.271 rows=13 loops=1)
>              ->  HashAggregate  (cost=47.22..47.30 rows=8 width=4)
> (actual time=0.148..0.154 rows=13 loops=1)
>                    ->  Bitmap Heap Scan on users_processors
> (cost=4.36..47.19 rows=12 width=4) (actual time=0.074..0.117 rows=13
> loops=1)
>                          Recheck Cond: (userid = 4040)
>                          ->  Bitmap Index Scan on
> users_processors_userid_index  (cost=0.00..4.35 rows=12 width=0)
> (actual time=0.056..0.056 rows=13 loops=1)
>                                Index Cond: (userid = 4040)
>              ->  Index Scan using processors_pkey on processors
> (cost=0.00..8.28 rows=1 width=9) (actual time=0.006..0.007 rows=1
> loops=13)
>                    Index Cond: (processors.id =
> users_processors.processorid)
> Total runtime: 0.471 ms
> (13 rows)
>
> ___________
>
> EXPLAIN ANALYZE SELECT speed FROM processors WHERE EXISTS (SELECT 1
> FROM users_processors WHERE userid=4040 AND
> processorid=processors.id) ORDER BY speed ASC LIMIT 10 OFFSET 1;
>
> Limit  (cost=831413.86..831413.89 rows=10 width=5) (actual
> time=762.475..762.482 rows=10 loops=1)
>  ->  Sort  (cost=831413.86..831538.86 rows=50000 width=5) (actual
> time=762.471..762.473 rows=11 loops=1)
>        Sort Key: processors.speed
>        Sort Method:  quicksort  Memory: 17kB
>        ->  Seq Scan on processors  (cost=0.00..830299.00 rows=50000
> width=5) (actual time=313.591..762.411 rows=13 loops=1)
>              Filter: (subplan)
>              SubPlan
>                ->  Index Scan using users_processors_pkey on
> users_processors  (cost=0.00..8.29 rows=1 width=0) (actual
> time=0.006..0.006 rows=0 loops=100000)
>                      Index Cond: ((userid = 4040) AND (processorid =
> $0))
> Total runtime: 762.579 ms
> (10 rows)
>
>
>
>
> As you can see the second query is much slower. First I thought
> "Just a difference of 0.3ms?", but then I realized that it was 762ms
> not 0.762 ;-).
> Both queries return the same result, so I will use #1 and count(*)
> takes just 0.478ms if I use query #1.
>


This is what I've found with tables ranging in the millions of rows.

Using IN is better when you've got lots of rows to check against the
IN set and the IN set may be large and possibly complicated to
retrieve (i.e. lots of joins, or expensive functions).

Postgres will normally build a hash table of the IN set and just
search that hash table. It's especially fast if the entire hash table
that is built can fit into RAM.  The cpu/io cost of building the IN
set can be quite large because it needs to fetch every tuple to hash
it, but this can be faster then searching tuple by tuple through
possibly many indexes and tables like EXISTS does.  I like to increase
work_mem a lot (512mb and up) if I know I'm going to be doing a lot of
matches against a large IN set of rows because I'd prefer for that
hash table to never to be written to disk.

EXISTS is better when you're doing fewer matches because it will pull
the rows out one at a time from its query possibly using indexes, its
main advantage is that it doesn't pull all of the tuples before it
starts processing matches.

So in summary both are good to know how to use, but choosing which one
to use can really depend on your data set and resources.

Cheers,

Rusty
--
Rusty Conover
InfoGears Inc.
http://www.infogears.com

Re: Subquery WHERE IN or WHERE EXISTS faster?

От

Ulrich

Дата:

30 июня 2008 г., 07:29:16

I think it will be fast, because the "IN set", which is the result of
"SELECT processorid FROM users_processors WHERE userid=4040", is limited
to a maximum of ~500 processors which is not very big. Increasing
Postgres' RAM would be difficult for me, because I am only running a
very small server with 256MB RAM and the webserver also likes to use
some RAM.

Does Postgre cache the HASH-Table for later use? For example when the
user reloads the website.

Kind regards
Ulrich

Rusty Conover wrote:
> This is what I've found with tables ranging in the millions of rows.
>
> Using IN is better when you've got lots of rows to check against the
> IN set and the IN set may be large and possibly complicated to
> retrieve (i.e. lots of joins, or expensive functions).
>
> Postgres will normally build a hash table of the IN set and just
> search that hash table. It's especially fast if the entire hash table
> that is built can fit into RAM.  The cpu/io cost of building the IN
> set can be quite large because it needs to fetch every tuple to hash
> it, but this can be faster then searching tuple by tuple through
> possibly many indexes and tables like EXISTS does.  I like to increase
> work_mem a lot (512mb and up) if I know I'm going to be doing a lot of
> matches against a large IN set of rows because I'd prefer for that
> hash table to never to be written to disk.
>
> EXISTS is better when you're doing fewer matches because it will pull
> the rows out one at a time from its query possibly using indexes, its
> main advantage is that it doesn't pull all of the tuples before it
> starts processing matches.
>
> So in summary both are good to know how to use, but choosing which one
> to use can really depend on your data set and resources.
>
> Cheers,
>
> Rusty
> --
> Rusty Conover
> InfoGears Inc.
> http://www.infogears.com
>

Re: Subquery WHERE IN or WHERE EXISTS faster?

От

Rusty Conover

Дата:

30 июня 2008 г., 07:44:52

On Jun 30, 2008, at 1:29 AM, Ulrich wrote:

> I think it will be fast, because the "IN set", which is the result
> of "SELECT processorid FROM users_processors WHERE userid=4040", is
> limited to a maximum of ~500 processors which is not very big.
> Increasing Postgres' RAM would be difficult for me, because I am
> only running a very small server with 256MB RAM and the webserver
> also likes to use some RAM.
>
> Does Postgre cache the HASH-Table for later use? For example when
> the user reloads the website.
>

No the hash table only lives as long as the query is being executed.
If you're looking for generic caching, I'd suggest memcached may be
able to fill your needs.

Cheers,

Rusty
--
Rusty Conover
InfoGears Inc.
http://www.infogears.com

Re: Subquery WHERE IN or WHERE EXISTS faster?

От

"Sergio Gabriel Rodriguez"

Дата:

05 июля 2008 г., 12:02:25

Hi Ulrich, do you try with

SELECT p.speed FROM processor p
           INNER JOIN users_processors up ON p.id=up.processorid
           AND up.userid=1
?
Or your question is only about IN and EXIST?

regards,

Sergio Gabriel Rodriguez
Corrientes - Argentina
http://www.3trex.com.ar

On Mon, Jun 30, 2008 at 4:44 AM, Rusty Conover <rconover@infogears.com> wrote:
>
> On Jun 30, 2008, at 1:29 AM, Ulrich wrote:
>
>> I think it will be fast, because the "IN set", which is the result of
>> "SELECT processorid FROM users_processors WHERE userid=4040", is limited to
>> a maximum of ~500 processors which is not very big. Increasing Postgres' RAM
>> would be difficult for me, because I am only running a very small server
>> with 256MB RAM and the webserver also likes to use some RAM.
>>
>> Does Postgre cache the HASH-Table for later use? For example when the user
>> reloads the website.
>>
>
> No the hash table only lives as long as the query is being executed.  If
> you're looking for generic caching, I'd suggest memcached may be able to
> fill your needs.
>
> Cheers,
>
> Rusty
> --
> Rusty Conover
> InfoGears Inc.
> http://www.infogears.com
>
>
>
>
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

Re: Subquery WHERE IN or WHERE EXISTS faster?

От

"Sergio Gabriel Rodriguez"

Дата:

05 июля 2008 г., 12:14:13

Hi Ulrich, do you try with

SELECT p.speed FROM processor p
           INNER JOIN users_processors up ON p.id=up.processorid
           AND up.userid=1
?
Or your question is only about IN and EXIST?

regards,

Sergio Gabriel Rodriguez
Corrientes - Argentina
http://www.3trex.com.ar

On Sat, Jun 28, 2008 at 7:07 PM, Ulrich <ulrich.mierendorff@gmx.net> wrote:
> Hi,
> I have added a bit of dummy Data, 100000 processors, 10000 users, each user
> got around 12 processors.
>
> I have tested both queries. First of all, I was surprised that it is that
> fast :) Here are the results:
>
>
> EXPLAIN ANALYZE SELECT speed FROM processors WHERE id IN (SELECT processorid
> FROM users_processors WHERE userid=4040) ORDER BY speed ASC LIMIT 10 OFFSET
> 1;
>
> Limit  (cost=113.73..113.75 rows=7 width=5) (actual time=0.335..0.340
> rows=10 loops=1)
>  ->  Sort  (cost=113.73..113.75 rows=8 width=5) (actual time=0.332..0.333
> rows=11 loops=1)
>        Sort Key: processors.speed
>        Sort Method:  quicksort  Memory: 17kB
>        ->  Nested Loop  (cost=47.22..113.61 rows=8 width=5) (actual
> time=0.171..0.271 rows=13 loops=1)
>              ->  HashAggregate  (cost=47.22..47.30 rows=8 width=4) (actual
> time=0.148..0.154 rows=13 loops=1)
>                    ->  Bitmap Heap Scan on users_processors
>  (cost=4.36..47.19 rows=12 width=4) (actual time=0.074..0.117 rows=13
> loops=1)
>                          Recheck Cond: (userid = 4040)
>                          ->  Bitmap Index Scan on
> users_processors_userid_index  (cost=0.00..4.35 rows=12 width=0) (actual
> time=0.056..0.056 rows=13 loops=1)
>                                Index Cond: (userid = 4040)
>              ->  Index Scan using processors_pkey on processors
>  (cost=0.00..8.28 rows=1 width=9) (actual time=0.006..0.007 rows=1 loops=13)
>                    Index Cond: (processors.id =
> users_processors.processorid)
> Total runtime: 0.471 ms
> (13 rows)
>
> ___________
>
> EXPLAIN ANALYZE SELECT speed FROM processors WHERE EXISTS (SELECT 1 FROM
> users_processors WHERE userid=4040 AND processorid=processors.id) ORDER BY
> speed ASC LIMIT 10 OFFSET 1;
>
> Limit  (cost=831413.86..831413.89 rows=10 width=5) (actual
> time=762.475..762.482 rows=10 loops=1)
>  ->  Sort  (cost=831413.86..831538.86 rows=50000 width=5) (actual
> time=762.471..762.473 rows=11 loops=1)
>        Sort Key: processors.speed
>        Sort Method:  quicksort  Memory: 17kB
>        ->  Seq Scan on processors  (cost=0.00..830299.00 rows=50000 width=5)
> (actual time=313.591..762.411 rows=13 loops=1)
>              Filter: (subplan)
>              SubPlan
>                ->  Index Scan using users_processors_pkey on
> users_processors  (cost=0.00..8.29 rows=1 width=0) (actual time=0.006..0.006
> rows=0 loops=100000)
>                      Index Cond: ((userid = 4040) AND (processorid = $0))
> Total runtime: 762.579 ms
> (10 rows)
>
>
>
>
> As you can see the second query is much slower. First I thought "Just a
> difference of 0.3ms?", but then I realized that it was 762ms not 0.762 ;-).
> Both queries return the same result, so I will use #1 and count(*) takes
> just 0.478ms if I use query #1.
>
> Kind Regards,
> Ulrich
>
> Tom Lane wrote:
>>
>> Ulrich <ulrich.mierendorff@gmx.net> writes:
>>
>>>
>>> People say that [EXISTS is faster]
>>>
>>
>> People who say that are not reliable authorities, at least as far as
>> Postgres is concerned.  But it is always a bad idea to extrapolate
>> results on toy tables to large tables --- quite aside from measurement
>> noise and caching issues, the planner might pick a different plan when
>> faced with large tables.  Load up a realistic amount of data and then
>> see what you get.
>>
>>                        regards, tom lane
>>
>>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Subquery WHERE IN or WHERE EXISTS faster?