Обсуждение: UNION versus SUB SELECT

Поиск
Список
Период
Сортировка

UNION versus SUB SELECT

От
Robert DiFalco
Дата:
I have found this:

SELECT c.* 
FROM contacts c
WHERE c.id IN ( SELECT p.contact_id FROM phone p WHERE p.addr = ? )
OR c.id IN (SELECT e.contact_id FROM email e WHERE e.addr = ? );

To have a worse plan than:

SELECT * FROM contacts where id IN (
( SELECT c.id FROM contacts c 
JOIN phone p ON c.id = p.contact_id AND p.addr = ? 
UNION
SELECT c.id FROM contacts c 
JOIN email e ON c.id = e.contact_id AND e.addr = ? );

Maybe this is no surprise. But after discovering this my question is this, is there another option I dont' know about that is logically the same that can perform even better than the UNION?

Re: UNION versus SUB SELECT

От
desmodemone
Дата:
Hi Robert, could you try with "exists" ?

SELECT c.* 
FROM contacts c
WHERE  exists  ( SELECT  1 FROM phone p WHERE p.addr =? and  p.contact_id=c.id )
OR exists (SELECT  1 FROM email e WHERE e.addr = ? and  e.contact_id=c.id );





2013/11/21 Robert DiFalco <robert.difalco@gmail.com>
I have found this:

SELECT c.* 
FROM contacts c
WHERE c.id IN ( SELECT p.contact_id FROM phone p WHERE p.addr = ? )
OR c.id IN (SELECT e.contact_id FROM email e WHERE e.addr = ? );

To have a worse plan than:

SELECT * FROM contacts where id IN (
( SELECT c.id FROM contacts c 
JOIN phone p ON c.id = p.contact_id AND p.addr = ? 
UNION
SELECT c.id FROM contacts c 
JOIN email e ON c.id = e.contact_id AND e.addr = ? );

Maybe this is no surprise. But after discovering this my question is this, is there another option I dont' know about that is logically the same that can perform even better than the UNION?

Re: UNION versus SUB SELECT

От
Robert DiFalco
Дата:
UNION and subselect both performed better than EXISTS for this particular case.


On Thu, Nov 21, 2013 at 12:31 PM, desmodemone <desmodemone@gmail.com> wrote:
Hi Robert, could you try with "exists" ?

SELECT c.* 
FROM contacts c
WHERE  exists  ( SELECT  1 FROM phone p WHERE p.addr =? and  p.contact_id=c.id )
OR exists (SELECT  1 FROM email e WHERE e.addr = ? and  e.contact_id=c.id );





2013/11/21 Robert DiFalco <robert.difalco@gmail.com>
I have found this:

SELECT c.* 
FROM contacts c
WHERE c.id IN ( SELECT p.contact_id FROM phone p WHERE p.addr = ? )
OR c.id IN (SELECT e.contact_id FROM email e WHERE e.addr = ? );

To have a worse plan than:

SELECT * FROM contacts where id IN (
( SELECT c.id FROM contacts c 
JOIN phone p ON c.id = p.contact_id AND p.addr = ? 
UNION
SELECT c.id FROM contacts c 
JOIN email e ON c.id = e.contact_id AND e.addr = ? );

Maybe this is no surprise. But after discovering this my question is this, is there another option I dont' know about that is logically the same that can perform even better than the UNION?


Re: UNION versus SUB SELECT

От
desmodemone
Дата:
Could you please attache the plan with explain buffers verbose?

thank you


2013/11/21 Robert DiFalco <robert.difalco@gmail.com>
UNION and subselect both performed better than EXISTS for this particular case.


On Thu, Nov 21, 2013 at 12:31 PM, desmodemone <desmodemone@gmail.com> wrote:
Hi Robert, could you try with "exists" ?

SELECT c.* 
FROM contacts c
WHERE  exists  ( SELECT  1 FROM phone p WHERE p.addr =? and  p.contact_id=c.id )
OR exists (SELECT  1 FROM email e WHERE e.addr = ? and  e.contact_id=c.id );





2013/11/21 Robert DiFalco <robert.difalco@gmail.com>
I have found this:

SELECT c.* 
FROM contacts c
WHERE c.id IN ( SELECT p.contact_id FROM phone p WHERE p.addr = ? )
OR c.id IN (SELECT e.contact_id FROM email e WHERE e.addr = ? );

To have a worse plan than:

SELECT * FROM contacts where id IN (
( SELECT c.id FROM contacts c 
JOIN phone p ON c.id = p.contact_id AND p.addr = ? 
UNION
SELECT c.id FROM contacts c 
JOIN email e ON c.id = e.contact_id AND e.addr = ? );

Maybe this is no surprise. But after discovering this my question is this, is there another option I dont' know about that is logically the same that can perform even better than the UNION?



Re: UNION versus SUB SELECT

От
Robert DiFalco
Дата:
Sorry I couldn't get buffers to work but here is the explain analyze verbose:

dft1fjfv106r48=> explain analyze verbose select c.*                                                                                                                    from contacts c                                                                                                                                                        where c.id IN (                                                                                                                                                            select p.contact_id                                                                                                                                                    from phone_numbers p                                                                                                                                                   where (p.national = 5038904993 and p.e164 = '+15038904993'))                                                                                                       or c.id IN (                                                                                                                                                               select e.contact_id                                                                                                                                                    from email_addresses e                                                                                                                                                 where e.email = 'robert.xxxxx@gmail.com')                                                                                                                          ;
                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.contacts c  (cost=8.12..75.73 rows=1988 width=95) (actual time=0.410..0.410 rows=0 loops=1)
   Output: c.id, c.owner_id, c.user_id, c.device_id, c.last_call, c.record_id, c.dtype, c.blocked, c.details_hash, c.fname, c.lname, c.fb_id
   Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
   Rows Removed by Filter: 2849
   SubPlan 1
     ->  Index Scan using idx_phone_address on public.phone_numbers p  (cost=0.06..4.06 rows=1 width=8) (actual time=0.015..0.015 rows=0 loops=1)
           Output: p.contact_id
           Index Cond: ((p."national" = 5038904993::bigint) AND ((p.e164)::text = '+15038904993'::text))
   SubPlan 2
     ->  Index Scan using idx_email_address on public.email_addresses e  (cost=0.06..4.06 rows=1 width=8) (actual time=0.018..0.018 rows=0 loops=1)
           Output: e.contact_id
           Index Cond: ((e.email)::text = 'robert.xxxxx@gmail.com'::text)
 Total runtime: 0.489 ms
(13 rows)

dft1fjfv106r48=> explain analyze verbose select c.*                                                                                                                    from contacts c                                                                                                                                                        where exists(                                                                                                                                                              select 1                                                                                                                                                               from phone_numbers p                                                                                                                                                   where (p.national = 5038904993 and p.e164 = '+15038904993') and p.contact_id = c.id)                                                                               or EXISTS(                                                                                                                                                                 select 1                                                                                                                                                               from email_addresses e                                                                                                                                                 where e.email = 'robert.xxxxx@gmail.com' and e.contact_id = c.id)                                                                                                  ;
                                                                      QUERY PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.contacts c  (cost=0.00..21596.38 rows=1988 width=95) (actual time=0.479..0.479 rows=0 loops=1)
   Output: c.id, c.owner_id, c.user_id, c.device_id, c.last_call, c.record_id, c.dtype, c.blocked, c.details_hash, c.fname, c.lname, c.fb_id
   Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (alternatives: SubPlan 3 or hashed SubPlan 4))
   Rows Removed by Filter: 2849
   SubPlan 1
     ->  Index Scan using idx_phone_address on public.phone_numbers p  (cost=0.06..4.06 rows=1 width=0) (never executed)
           Index Cond: ((p."national" = 5038904993::bigint) AND ((p.e164)::text = '+15038904993'::text))
           Filter: (p.contact_id = c.id)
   SubPlan 2
     ->  Index Scan using idx_phone_address on public.phone_numbers p_1  (cost=0.06..4.06 rows=1 width=8) (actual time=0.010..0.010 rows=0 loops=1)
           Output: p_1.contact_id
           Index Cond: ((p_1."national" = 5038904993::bigint) AND ((p_1.e164)::text = '+15038904993'::text))
   SubPlan 3
     ->  Index Scan using idx_email_address on public.email_addresses e  (cost=0.06..4.06 rows=1 width=0) (never executed)
           Index Cond: ((e.email)::text = 'robert.xxxxx@gmail.com'::text)
           Filter: (e.contact_id = c.id)
   SubPlan 4
     ->  Index Scan using idx_email_address on public.email_addresses e_1  (cost=0.06..4.06 rows=1 width=8) (actual time=0.016..0.016 rows=0 loops=1)
           Output: e_1.contact_id
           Index Cond: ((e_1.email)::text = 'robert.xxxxx@gmail.com'::text)
 Total runtime: 0.559 ms
(21 rows)

dft1fjfv106r48=> explain analyze verbose select * from contacts where id IN (                                                                                          (select c.id                                                                                                                                                           from contacts c                                                                                                                                                        join phone_numbers p on c.id = p.contact_id and p.national = 5038904993 and p.e164 = '+15038904993')                                                                   union (select c.id                                                                                                                                                     from contacts c                                                                                                                                                        join email_addresses e on c.id = e.contact_id and e.email = 'robert.xxxxx@gmail.com'));                                                                                                                                                                                              QUERY PLAN                                                                                                               ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=16.31..24.39 rows=2 width=95) (actual time=0.060..0.060 rows=0 loops=1)
   Output: contacts.id, contacts.owner_id, contacts.user_id, contacts.device_id, contacts.last_call, contacts.record_id, contacts.dtype, contacts.blocked, contacts.details_hash, contacts.fname, contacts.lname, contacts.fb_id
   ->  Unique  (cost=16.26..16.26 rows=2 width=8) (actual time=0.057..0.057 rows=0 loops=1)
         Output: c.id
         ->  Sort  (cost=16.26..16.26 rows=2 width=8) (actual time=0.055..0.055 rows=0 loops=1)
               Output: c.id
               Sort Key: c.id
               Sort Method: quicksort  Memory: 25kB
               ->  Append  (cost=0.11..16.25 rows=2 width=8) (actual time=0.034..0.034 rows=0 loops=1)
                     ->  Nested Loop  (cost=0.11..8.12 rows=1 width=8) (actual time=0.013..0.013 rows=0 loops=1)
                           Output: c.id
                           ->  Index Scan using idx_phone_address on public.phone_numbers p  (cost=0.06..4.06 rows=1 width=8) (actual time=0.011..0.011 rows=0 loops=1)
                                 Output: p.id, p.contact_id, p."national", p.e164, p.raw_number
                                 Index Cond: ((p."national" = 5038904993::bigint) AND ((p.e164)::text = '+15038904993'::text))
                           ->  Index Only Scan using idx_contacts_pkey_owner on public.contacts c  (cost=0.06..4.06 rows=1 width=8) (never executed)
                                 Output: c.id, c.owner_id, c.user_id
                                 Index Cond: (c.id = p.contact_id)
                                 Heap Fetches: 0
                     ->  Nested Loop  (cost=0.11..8.12 rows=1 width=8) (actual time=0.018..0.018 rows=0 loops=1)
                           Output: c_1.id
                           ->  Index Scan using idx_email_address on public.email_addresses e  (cost=0.06..4.06 rows=1 width=8) (actual time=0.016..0.016 rows=0 loops=1)
                                 Output: e.id, e.contact_id, e.email
                                 Index Cond: ((e.email)::text = 'robert.xxxxx@gmail.com'::text)
                           ->  Index Only Scan using idx_contacts_pkey_owner on public.contacts c_1  (cost=0.06..4.06 rows=1 width=8) (never executed)
                                 Output: c_1.id, c_1.owner_id, c_1.user_id
                                 Index Cond: (c_1.id = e.contact_id)
                                 Heap Fetches: 0
   ->  Index Scan using idx_contacts_pkey_owner on public.contacts  (cost=0.06..4.06 rows=1 width=95) (never executed)
         Output: contacts.id, contacts.owner_id, contacts.user_id, contacts.device_id, contacts.last_call, contacts.record_id, contacts.dtype, contacts.blocked, contacts.details_hash, contacts.fname, contacts.lname, contacts.fb_id
         Index Cond: (contacts.id = c.id)
 Total runtime: 0.332 ms
(31 rows)




On Thu, Nov 21, 2013 at 12:38 PM, desmodemone <desmodemone@gmail.com> wrote:
Could you please attache the plan with explain buffers verbose?

thank you


2013/11/21 Robert DiFalco <robert.difalco@gmail.com>
UNION and subselect both performed better than EXISTS for this particular case.


On Thu, Nov 21, 2013 at 12:31 PM, desmodemone <desmodemone@gmail.com> wrote:
Hi Robert, could you try with "exists" ?

SELECT c.* 
FROM contacts c
WHERE  exists  ( SELECT  1 FROM phone p WHERE p.addr =? and  p.contact_id=c.id )
OR exists (SELECT  1 FROM email e WHERE e.addr = ? and  e.contact_id=c.id );





2013/11/21 Robert DiFalco <robert.difalco@gmail.com>
I have found this:

SELECT c.* 
FROM contacts c
WHERE c.id IN ( SELECT p.contact_id FROM phone p WHERE p.addr = ? )
OR c.id IN (SELECT e.contact_id FROM email e WHERE e.addr = ? );

To have a worse plan than:

SELECT * FROM contacts where id IN (
( SELECT c.id FROM contacts c 
JOIN phone p ON c.id = p.contact_id AND p.addr = ? 
UNION
SELECT c.id FROM contacts c 
JOIN email e ON c.id = e.contact_id AND e.addr = ? );

Maybe this is no surprise. But after discovering this my question is this, is there another option I dont' know about that is logically the same that can perform even better than the UNION?




Re: UNION versus SUB SELECT

От
Robert DiFalco
Дата:
Hmmmm...I'm not sure why the buffers option didn't work for me, maybe the heroku psql is out of date. No, the query gets slower with a high load of data and runs pretty often.

I just created a small test dataset for this. When I have a larger one I will post new explain plans but the timings seem pretty consistent regardless of the results returns (usually only 2-200) even when there are millions of records in "contacts", "phone_numbers", and "email_addresses".

In this case doesn't the correlated query have to do more work and access more columns than the subselect approach?


On Thu, Nov 21, 2013 at 1:22 PM, Elliot <yields.falsehood@gmail.com> wrote:
On 2013-11-21 16:12, Robert DiFalco wrote:
Sorry I couldn't get buffers to work but here is the explain analyze verbose:

dft1fjfv106r48=> explain analyze verbose select c.*                                                                                                                    from contacts c                                                                                                                                                        where c.id IN (                                                                                                                                                            select p.contact_id                                                                                                                                                    from phone_numbers p                                                                                                                                                   where (p.national = 5038904993 and p.e164 = '+15038904993'))                                                                                                       or c.id IN (                                                                                                                                                               select e.contact_id                                                                                                                                                    from email_addresses e                                                                                                                                                 where e.email = 'robert.xxxxx@gmail.com')                                                                                                                          ;
                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.contacts c  (cost=8.12..75.73 rows=1988 width=95) (actual time=0.410..0.410 rows=0 loops=1)
   Output: c.id, c.owner_id, c.user_id, c.device_id, c.last_call, c.record_id, c.dtype, c.blocked, c.details_hash, c.fname, c.lname, c.fb_id
   Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
   Rows Removed by Filter: 2849
   SubPlan 1
     ->  Index Scan using idx_phone_address on public.phone_numbers p  (cost=0.06..4.06 rows=1 width=8) (actual time=0.015..0.015 rows=0 loops=1)
           Output: p.contact_id
           Index Cond: ((p."national" = 5038904993::bigint) AND ((p.e164)::text = '+15038904993'::text))
   SubPlan 2
     ->  Index Scan using idx_email_address on public.email_addresses e  (cost=0.06..4.06 rows=1 width=8) (actual time=0.018..0.018 rows=0 loops=1)
           Output: e.contact_id
           Index Cond: ((e.email)::text = 'robert.xxxxx@gmail.com'::text)
 Total runtime: 0.489 ms
(13 rows)

dft1fjfv106r48=> explain analyze verbose select c.*                                                                                                                    from contacts c                                                                                                                                                        where exists(                                                                                                                                                              select 1                                                                                                                                                               from phone_numbers p                                                                                                                                                   where (p.national = 5038904993 and p.e164 = '+15038904993') and p.contact_id = c.id)                                                                               or EXISTS(                                                                                                                                                                 select 1                                                                                                                                                               from email_addresses e                                                                                                                                                 where e.email = 'robert.xxxxx@gmail.com' and e.contact_id = c.id)                                                                                                  ;
                                                                      QUERY PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.contacts c  (cost=0.00..21596.38 rows=1988 width=95) (actual time=0.479..0.479 rows=0 loops=1)
   Output: c.id, c.owner_id, c.user_id, c.device_id, c.last_call, c.record_id, c.dtype, c.blocked, c.details_hash, c.fname, c.lname, c.fb_id
   Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (alternatives: SubPlan 3 or hashed SubPlan 4))
   Rows Removed by Filter: 2849
   SubPlan 1
     ->  Index Scan using idx_phone_address on public.phone_numbers p  (cost=0.06..4.06 rows=1 width=0) (never executed)
           Index Cond: ((p."national" = 5038904993::bigint) AND ((p.e164)::text = '+15038904993'::text))
           Filter: (p.contact_id = c.id)
   SubPlan 2
     ->  Index Scan using idx_phone_address on public.phone_numbers p_1  (cost=0.06..4.06 rows=1 width=8) (actual time=0.010..0.010 rows=0 loops=1)
           Output: p_1.contact_id
           Index Cond: ((p_1."national" = 5038904993::bigint) AND ((p_1.e164)::text = '+15038904993'::text))
   SubPlan 3
     ->  Index Scan using idx_email_address on public.email_addresses e  (cost=0.06..4.06 rows=1 width=0) (never executed)
           Index Cond: ((e.email)::text = 'robert.xxxxx@gmail.com'::text)
           Filter: (e.contact_id = c.id)
   SubPlan 4
     ->  Index Scan using idx_email_address on public.email_addresses e_1  (cost=0.06..4.06 rows=1 width=8) (actual time=0.016..0.016 rows=0 loops=1)
           Output: e_1.contact_id
           Index Cond: ((e_1.email)::text = 'robert.xxxxx@gmail.com'::text)
 Total runtime: 0.559 ms
(21 rows)

dft1fjfv106r48=> explain analyze verbose select * from contacts where id IN (                                                                                          (select c.id                                                                                                                                                           from contacts c                                                                                                                                                        join phone_numbers p on c.id = p.contact_id and p.national = 5038904993 and p.e164 = '+15038904993')                                                                   union (select c.id                                                                                                                                                     from contacts c                                                                                                                                                        join email_addresses e on c.id = e.contact_id and e.email = 'robert.xxxxx@gmail.com'));                                                                                                                                                                                              QUERY PLAN                                                                                                               ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=16.31..24.39 rows=2 width=95) (actual time=0.060..0.060 rows=0 loops=1)
   Output: contacts.id, contacts.owner_id, contacts.user_id, contacts.device_id, contacts.last_call, contacts.record_id, contacts.dtype, contacts.blocked, contacts.details_hash, contacts.fname, contacts.lname, contacts.fb_id
   ->  Unique  (cost=16.26..16.26 rows=2 width=8) (actual time=0.057..0.057 rows=0 loops=1)
         Output: c.id
         ->  Sort  (cost=16.26..16.26 rows=2 width=8) (actual time=0.055..0.055 rows=0 loops=1)
               Output: c.id
               Sort Key: c.id
               Sort Method: quicksort  Memory: 25kB
               ->  Append  (cost=0.11..16.25 rows=2 width=8) (actual time=0.034..0.034 rows=0 loops=1)
                     ->  Nested Loop  (cost=0.11..8.12 rows=1 width=8) (actual time=0.013..0.013 rows=0 loops=1)
                           Output: c.id
                           ->  Index Scan using idx_phone_address on public.phone_numbers p  (cost=0.06..4.06 rows=1 width=8) (actual time=0.011..0.011 rows=0 loops=1)
                                 Output: p.id, p.contact_id, p."national", p.e164, p.raw_number
                                 Index Cond: ((p."national" = 5038904993::bigint) AND ((p.e164)::text = '+15038904993'::text))
                           ->  Index Only Scan using idx_contacts_pkey_owner on public.contacts c  (cost=0.06..4.06 rows=1 width=8) (never executed)
                                 Output: c.id, c.owner_id, c.user_id
                                 Index Cond: (c.id = p.contact_id)
                                 Heap Fetches: 0
                     ->  Nested Loop  (cost=0.11..8.12 rows=1 width=8) (actual time=0.018..0.018 rows=0 loops=1)
                           Output: c_1.id
                           ->  Index Scan using idx_email_address on public.email_addresses e  (cost=0.06..4.06 rows=1 width=8) (actual time=0.016..0.016 rows=0 loops=1)
                                 Output: e.id, e.contact_id, e.email
                                 Index Cond: ((e.email)::text = 'robert.xxxxx@gmail.com'::text)
                           ->  Index Only Scan using idx_contacts_pkey_owner on public.contacts c_1  (cost=0.06..4.06 rows=1 width=8) (never executed)
                                 Output: c_1.id, c_1.owner_id, c_1.user_id
                                 Index Cond: (c_1.id = e.contact_id)
                                 Heap Fetches: 0
   ->  Index Scan using idx_contacts_pkey_owner on public.contacts  (cost=0.06..4.06 rows=1 width=95) (never executed)
         Output: contacts.id, contacts.owner_id, contacts.user_id, contacts.device_id, contacts.last_call, contacts.record_id, contacts.dtype, contacts.blocked, contacts.details_hash, contacts.fname, contacts.lname, contacts.fb_id
         Index Cond: (contacts.id = c.id)
 Total runtime: 0.332 ms
(31 rows)



The buffers option is 9.0+ and is used like "explain (analyze, verbose, buffers) select 1".
To your original question, the union output there runs slightly faster than the "in" approach, although this may not be a good example - your inputs don't return any data, so this might not be realistic - and those numbers are so low that the difference might just be noise.
Are you tuning a <0.5ms-runtime query? Or is this just curiosity? FWIW I tend to write queries like this using an exists check first, then if that's still not good enough (all things like proper indexing taken in to account) I'll try an in check, then finally a union if that's still not good enough.


Re: UNION versus SUB SELECT

От
bricklen
Дата:
On Thu, Nov 21, 2013 at 2:04 PM, Robert DiFalco <robert.difalco@gmail.com> wrote:
Hmmmm...I'm not sure why the buffers option didn't work for me, maybe the heroku psql is out of date.

Did you enclose it in brackets? Eg. "EXPLAIN (ANALYZE, BUFFERS) SELECT ..."

Re: UNION versus SUB SELECT

От
bricklen
Дата:
On Thu, Nov 21, 2013 at 2:58 PM, bricklen <bricklen@gmail.com> wrote:

Did you enclose it in brackets? Eg. "EXPLAIN (ANALYZE, BUFFERS) SELECT ..."

Never mind, I see it further down. My apologies.

Re: UNION versus SUB SELECT

От
Robert DiFalco
Дата:
No I didn't, thank you.  I missed the parens.


On Thu, Nov 21, 2013 at 2:58 PM, bricklen <bricklen@gmail.com> wrote:
On Thu, Nov 21, 2013 at 2:04 PM, Robert DiFalco <robert.difalco@gmail.com> wrote:
Hmmmm...I'm not sure why the buffers option didn't work for me, maybe the heroku psql is out of date.

Did you enclose it in brackets? Eg. "EXPLAIN (ANALYZE, BUFFERS) SELECT ..."

Re: UNION versus SUB SELECT

От
Merlin Moncure
Дата:
On Thu, Nov 21, 2013 at 2:31 PM, desmodemone <desmodemone@gmail.com> wrote:
> Hi Robert, could you try with "exists" ?
>
> SELECT c.*
> FROM contacts c
> WHERE  exists  ( SELECT  1 FROM phone p WHERE p.addr =? and
> p.contact_id=c.id )
> OR exists (SELECT  1 FROM email e WHERE e.addr = ? and  e.contact_id=c.id );

hm, how about:
SELECT c.*
FROM contacts c
WHERE  exists  (
  SELECT  1
  FROM phone p
  WHERE p.addr =? AND p.contact_id=c.id
  UNION ALL
  SELECT  1 FROM email e
  WHERE e.addr = ? AND e.contact_id=c.id
);

merlin


Re: UNION versus SUB SELECT

От
Robert DiFalco
Дата:
So far that one was the worst in terms of cost and time. Here are all the plans with buffers, more records, and results being returned. At this point I have good enough performance with my UNION approach but I'm just trying to learn now. WHY is the union approach the fastest? I would have expected the EXISTS or IN approaches to be faster or at least have the SAME cost? At this point I just want to understand.

dft1fjfv106r48=> explain (analyze, buffers, verbose) 
select * 
from contacts c 
where EXISTS(
   (select 1 from phone_numbers p where c.id = p.contact_id and p.national = 5038904993 and p.e164 = '+15038904993')
   union 
   (select 1 id = e.contact_id and e.email = 'robert.xxxx@gmail.com'));
                                                                                QUERY PLAN                                                                                 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.contacts c  (cost=0.00..23238.90 rows=1425 width=95) (actual time=2.241..46.817 rows=7 loops=1)
   Output: c.id, c.owner_id, c.user_id, c.device_id, c.last_call, c.record_id, c.dtype, c.blocked, c.details_hash, c.fname, c.lname, c.fb_id
   Filter: (SubPlan 1)
   Rows Removed by Filter: 2843
   Buffers: shared hit=11497
   SubPlan 1
     ->  Unique  (cost=8.13..8.13 rows=2 width=0) (actual time=0.015..0.015 rows=0 loops=2850)
           Output: (1)
           Buffers: shared hit=11440
           ->  Sort  (cost=8.13..8.13 rows=2 width=0) (actual time=0.013..0.013 rows=0 loops=2850)
                 Output: (1)
                 Sort Key: (1)
                 Sort Method: quicksort  Memory: 25kB
                 Buffers: shared hit=11440
                 ->  Append  (cost=0.06..8.13 rows=2 width=0) (actual time=0.009..0.009 rows=0 loops=2850)
                       Buffers: shared hit=11440
                       ->  Index Only Scan using idx_phone on public.phone_numbers p  (cost=0.06..4.06 rows=1 width=0) (actual time=0.003..0.003 rows=0 loops=2850)
                             Output: 1
                             Index Cond: ((p.contact_id = c.id) AND (p."national" = 5038904993::bigint) AND (p.e164 = '+15038904993'::text))
                             Heap Fetches: 11
                             Buffers: shared hit=5721
                       ->  Index Only Scan using idx_email_full on public.email_addresses e  (cost=0.06..4.06 rows=1 width=0) (actual time=0.003..0.003 rows=0 loops=2850)
                             Output: 1
                             Index Cond: ((e.contact_id = c.id) AND (e.email = 'robert.xxxx@gmail.com'::text))
                             Heap Fetches: 5
                             Buffers: shared hit=5719
 Total runtime: 46.897 ms
(27 rows)

dft1fjfv106r48=> explain (analyze, buffers, verbose) 
select * 
from contacts where id IN (
  (select c.id
  from contacts c join phone_numbers p on c.id = p.contact_id and p.national = 5038904993 and p.e164 = '+15038904993')
  union 
  (select c.id from contacts c join email_addresses e on c.id = e.contact_id and e.email = 'robert.xxxx@gmail.com'));
                                                                                                              QUERY PLAN                                                                                                               
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=29.38..53.74 rows=6 width=95) (actual time=0.356..0.418 rows=7 loops=1)
   Output: contacts.id, contacts.owner_id, contacts.user_id, contacts.device_id, contacts.last_call, contacts.record_id, contacts.dtype, contacts.blocked, contacts.details_hash, contacts.fname, contacts.lname, contacts.fb_id
   Buffers: shared hit=87
   ->  HashAggregate  (cost=29.32..29.34 rows=6 width=8) (actual time=0.347..0.354 rows=7 loops=1)
         Output: c.id
         Buffers: shared hit=66
         ->  Append  (cost=0.11..29.32 rows=6 width=8) (actual time=0.047..0.316 rows=16 loops=1)
               Buffers: shared hit=66
               ->  Nested Loop  (cost=0.11..8.12 rows=1 width=8) (actual time=0.045..0.169 rows=11 loops=1)
                     Output: c.id
                     Buffers: shared hit=43
                     ->  Index Scan using idx_phone_address on public.phone_numbers p  (cost=0.06..4.06 rows=1 width=8) (actual time=0.027..0.047 rows=11 loops=1)
                           Output: p.id, p.contact_id, p."national", p.e164, p.raw_number
                           Index Cond: ((p."national" = 5038904993::bigint) AND ((p.e164)::text = '+15038904993'::text))
                           Buffers: shared hit=9
                     ->  Index Only Scan using idx_contacts_pkey_owner on public.contacts c  (cost=0.06..4.06 rows=1 width=8) (actual time=0.005..0.006 rows=1 loops=11)
                           Output: c.id, c.owner_id, c.user_id
                           Index Cond: (c.id = p.contact_id)
                           Heap Fetches: 11
                           Buffers: shared hit=34
               ->  Nested Loop  (cost=2.12..21.17 rows=5 width=8) (actual time=0.057..0.114 rows=5 loops=1)
                     Output: c_1.id
                     Buffers: shared hit=23
                     ->  Bitmap Heap Scan on public.email_addresses e  (cost=2.06..8.85 rows=5 width=8) (actual time=0.044..0.055 rows=5 loops=1)
                           Output: e.id, e.contact_id, e.email
                           Recheck Cond: ((e.email)::text = 'robert.xxxx@gmail.com'::text)
                           Buffers: shared hit=7
                           ->  Bitmap Index Scan on idx_email_address  (cost=0.00..2.06 rows=5 width=0) (actual time=0.031..0.031 rows=6 loops=1)
                                 Index Cond: ((e.email)::text = 'robert.xxxx@gmail.com'::text)
                                 Buffers: shared hit=2
                     ->  Index Only Scan using idx_contacts_pkey_owner on public.contacts c_1  (cost=0.06..2.46 rows=1 width=8) (actual time=0.005..0.006 rows=1 loops=5)
                           Output: c_1.id, c_1.owner_id, c_1.user_id
                           Index Cond: (c_1.id = e.contact_id)
                           Heap Fetches: 5
                           Buffers: shared hit=16
   ->  Index Scan using idx_contacts_pkey_owner on public.contacts  (cost=0.06..4.06 rows=1 width=95) (actual time=0.003..0.004 rows=1 loops=7)
         Output: contacts.id, contacts.owner_id, contacts.user_id, contacts.device_id, contacts.last_call, contacts.record_id, contacts.dtype, contacts.blocked, contacts.details_hash, contacts.fname, contacts.lname, contacts.fb_id
         Index Cond: (contacts.id = c.id)
         Buffers: shared hit=21
 Total runtime: 0.535 ms
(40 rows)

dft1fjfv106r48=> explain (analyze, buffers, verbose) 
select c.*
from contacts c
where EXISTS(select 1 from phone_numbers p where (p.national = 5038904993 and p.e164 = '+15038904993') and p.contact_id = c.id)
   or EXISTS(select 1 from email_addresses e where e.email = 'robert.xxxx@gmail.com' and e.contact_id = c.id);
                                                                     QUERY PLAN                                                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.contacts c  (cost=0.00..23213.25 rows=2138 width=95) (actual time=0.209..1.290 rows=7 loops=1)
   Output: c.id, c.owner_id, c.user_id, c.device_id, c.last_call, c.record_id, c.dtype, c.blocked, c.details_hash, c.fname, c.lname, c.fb_id
   Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (alternatives: SubPlan 3 or hashed SubPlan 4))
   Rows Removed by Filter: 2843
   Buffers: shared hit=73
   SubPlan 1
     ->  Index Only Scan using idx_phone on public.phone_numbers p  (cost=0.06..4.06 rows=1 width=0) (never executed)
           Index Cond: ((p.contact_id = c.id) AND (p."national" = 5038904993::bigint) AND (p.e164 = '+15038904993'::text))
           Heap Fetches: 0
   SubPlan 2
     ->  Index Scan using idx_phone_address on public.phone_numbers p_1  (cost=0.06..4.06 rows=1 width=8) (actual time=0.033..0.056 rows=11 loops=1)
           Output: p_1.contact_id
           Index Cond: ((p_1."national" = 5038904994::bigint) AND ((p_1.e164)::text = '+15038904993'::text))
           Buffers: shared hit=9
   SubPlan 3
     ->  Index Only Scan using idx_email_full on public.email_addresses e  (cost=0.06..4.06 rows=1 width=0) (never executed)
           Index Cond: ((e.contact_id = c.id) AND (e.email = 'robert.xxxx@gmail.com'::text))
           Heap Fetches: 0
   SubPlan 4
     ->  Bitmap Heap Scan on public.email_addresses e_1  (cost=2.06..8.85 rows=5 width=8) (actual time=0.040..0.050 rows=5 loops=1)
           Output: e_1.contact_id
           Recheck Cond: ((e_1.email)::text = 'robert.xxxx@gmail.com'::text)
           Buffers: shared hit=7
           ->  Bitmap Index Scan on idx_email_address  (cost=0.00..2.06 rows=5 width=0) (actual time=0.030..0.030 rows=6 loops=1)
                 Index Cond: ((e_1.email)::text = 'robert.xxxx@gmail.com'::text)
                 Buffers: shared hit=2
 Total runtime: 1.395 ms
(27 rows)

dft1fjfv106r48=> explain (analyze, buffers, verbose) 
select c.*
from contacts c
where c.id IN (
    select p.contact_id from phone_numbers p where (p.national = 5038904993 and p.e164 = '+15038904993'))
or c.id IN (
    select e.contact_id from email_addresses e where e.email = 'robert.xxxx@gmail.com');
                                                                    QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.contacts c  (cost=12.92..81.32 rows=2138 width=95) (actual time=0.208..1.283 rows=7 loops=1)
   Output: c.id, c.owner_id, c.user_id, c.device_id, c.last_call, c.record_id, c.dtype, c.blocked, c.details_hash, c.fname, c.lname, c.fb_id
   Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
   Rows Removed by Filter: 2843
   Buffers: shared hit=73
   SubPlan 1
     ->  Index Scan using idx_phone_address on public.phone_numbers p  (cost=0.06..4.06 rows=1 width=8) (actual time=0.032..0.054 rows=11 loops=1)
           Output: p.contact_id
           Index Cond: ((p."national" = 5038904993::bigint) AND ((p.e164)::text = '+15038904993'::text))
           Buffers: shared hit=9
   SubPlan 2
     ->  Bitmap Heap Scan on public.email_addresses e  (cost=2.06..8.85 rows=5 width=8) (actual time=0.040..0.049 rows=5 loops=1)
           Output: e.contact_id
           Recheck Cond: ((e.email)::text = 'robert.xxxx@gmail.com'::text)
           Buffers: shared hit=7
           ->  Bitmap Index Scan on idx_email_address  (cost=0.00..2.06 rows=5 width=0) (actual time=0.031..0.031 rows=6 loops=1)
                 Index Cond: ((e.email)::text = 'robert.xxxx@gmail.com'::text)
                 Buffers: shared hit=2
 Total runtime: 1.371 ms
(19 rows)

dft1fjfv106r48=> 


On Fri, Nov 22, 2013 at 7:54 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Thu, Nov 21, 2013 at 2:31 PM, desmodemone <desmodemone@gmail.com> wrote:
> Hi Robert, could you try with "exists" ?
>
> SELECT c.*
> FROM contacts c
> WHERE  exists  ( SELECT  1 FROM phone p WHERE p.addr =? and
> p.contact_id=c.id )
> OR exists (SELECT  1 FROM email e WHERE e.addr = ? and  e.contact_id=c.id );

hm, how about:
SELECT c.*
FROM contacts c
WHERE  exists  (
  SELECT  1
  FROM phone p
  WHERE p.addr =? AND p.contact_id=c.id
  UNION ALL
  SELECT  1 FROM email e
  WHERE e.addr = ? AND e.contact_id=c.id
);

merlin