Re: UNION versus SUB SELECT

Поиск

Список

Период

Сортировка

От	Robert DiFalco
Тема	Re: UNION versus SUB SELECT
Дата	22 ноября 2013 г. 01:04:10
Msg-id	CAAXGW-wz8Xu8TcxwFpopct2=L7MHU48Zfo__epcUE7js5bkfSw@mail.gmail.com обсуждение исходный текст
Ответ на	UNION versus SUB SELECT (Robert DiFalco <robert.difalco@gmail.com>)
Ответы	Re: UNION versus SUB SELECT
Список	pgsql-performance

Дерево обсуждения

Hmmmm...I'm not sure why the buffers option didn't work for me, maybe the heroku psql is out of date. No, the query gets slower with a high load of data and runs pretty often.

I just created a small test dataset for this. When I have a larger one I will post new explain plans but the timings seem pretty consistent regardless of the results returns (usually only 2-200) even when there are millions of records in "contacts", "phone_numbers", and "email_addresses".

In this case doesn't the correlated query have to do more work and access more columns than the subselect approach?

On Thu, Nov 21, 2013 at 1:22 PM, Elliot <yields.falsehood@gmail.com> wrote:

On 2013-11-21 16:12, Robert DiFalco wrote:
Sorry I couldn't get buffers to work but here is the explain analyze verbose:

dft1fjfv106r48=> explain analyze verbose select c.* from contacts c where c.id IN ( select p.contact_id from phone_numbers p where (p.national = 5038904993 and p.e164 = '+15038904993')) or c.id IN ( select e.contact_id from email_addresses e where e.email = 'robert.xxxxx@gmail.com') ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.contacts c (cost=8.12..75.73 rows=1988 width=95) (actual time=0.410..0.410 rows=0 loops=1)
Output: c.id, c.owner_id, c.user_id, c.device_id, c.last_call, c.record_id, c.dtype, c.blocked, c.details_hash, c.fname, c.lname, c.fb_id
Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
Rows Removed by Filter: 2849
SubPlan 1
-> Index Scan using idx_phone_address on public.phone_numbers p (cost=0.06..4.06 rows=1 width=8) (actual time=0.015..0.015 rows=0 loops=1)
Output: p.contact_id
Index Cond: ((p."national" = 5038904993::bigint) AND ((p.e164)::text = '+15038904993'::text))
SubPlan 2
-> Index Scan using idx_email_address on public.email_addresses e (cost=0.06..4.06 rows=1 width=8) (actual time=0.018..0.018 rows=0 loops=1)
Output: e.contact_id
Index Cond: ((e.email)::text = 'robert.xxxxx@gmail.com'::text)
Total runtime: 0.489 ms
(13 rows)

dft1fjfv106r48=> explain analyze verbose select c.* from contacts c where exists( select 1 from phone_numbers p where (p.national = 5038904993 and p.e164 = '+15038904993') and p.contact_id = c.id) or EXISTS( select 1 from email_addresses e where e.email = 'robert.xxxxx@gmail.com' and e.contact_id = c.id) ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.contacts c (cost=0.00..21596.38 rows=1988 width=95) (actual time=0.479..0.479 rows=0 loops=1)
Output: c.id, c.owner_id, c.user_id, c.device_id, c.last_call, c.record_id, c.dtype, c.blocked, c.details_hash, c.fname, c.lname, c.fb_id
Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (alternatives: SubPlan 3 or hashed SubPlan 4))
Rows Removed by Filter: 2849
SubPlan 1
-> Index Scan using idx_phone_address on public.phone_numbers p (cost=0.06..4.06 rows=1 width=0) (never executed)
Index Cond: ((p."national" = 5038904993::bigint) AND ((p.e164)::text = '+15038904993'::text))
Filter: (p.contact_id = c.id)
SubPlan 2
-> Index Scan using idx_phone_address on public.phone_numbers p_1 (cost=0.06..4.06 rows=1 width=8) (actual time=0.010..0.010 rows=0 loops=1)
Output: p_1.contact_id
Index Cond: ((p_1."national" = 5038904993::bigint) AND ((p_1.e164)::text = '+15038904993'::text))
SubPlan 3
-> Index Scan using idx_email_address on public.email_addresses e (cost=0.06..4.06 rows=1 width=0) (never executed)
Index Cond: ((e.email)::text = 'robert.xxxxx@gmail.com'::text)
Filter: (e.contact_id = c.id)
SubPlan 4
-> Index Scan using idx_email_address on public.email_addresses e_1 (cost=0.06..4.06 rows=1 width=8) (actual time=0.016..0.016 rows=0 loops=1)
Output: e_1.contact_id
Index Cond: ((e_1.email)::text = 'robert.xxxxx@gmail.com'::text)
Total runtime: 0.559 ms
(21 rows)

dft1fjfv106r48=> explain analyze verbose select * from contacts where id IN ( (select c.id from contacts c join phone_numbers p on c.id = p.contact_id and p.national = 5038904993 and p.e164 = '+15038904993') union (select c.id from contacts c join email_addresses e on c.id = e.contact_id and e.email = 'robert.xxxxx@gmail.com')); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=16.31..24.39 rows=2 width=95) (actual time=0.060..0.060 rows=0 loops=1)
Output: contacts.id, contacts.owner_id, contacts.user_id, contacts.device_id, contacts.last_call, contacts.record_id, contacts.dtype, contacts.blocked, contacts.details_hash, contacts.fname, contacts.lname, contacts.fb_id
-> Unique (cost=16.26..16.26 rows=2 width=8) (actual time=0.057..0.057 rows=0 loops=1)
Output: c.id
-> Sort (cost=16.26..16.26 rows=2 width=8) (actual time=0.055..0.055 rows=0 loops=1)
Output: c.id
Sort Key: c.id
Sort Method: quicksort Memory: 25kB
-> Append (cost=0.11..16.25 rows=2 width=8) (actual time=0.034..0.034 rows=0 loops=1)
-> Nested Loop (cost=0.11..8.12 rows=1 width=8) (actual time=0.013..0.013 rows=0 loops=1)
Output: c.id
-> Index Scan using idx_phone_address on public.phone_numbers p (cost=0.06..4.06 rows=1 width=8) (actual time=0.011..0.011 rows=0 loops=1)
Output: p.id, p.contact_id, p."national", p.e164, p.raw_number
Index Cond: ((p."national" = 5038904993::bigint) AND ((p.e164)::text = '+15038904993'::text))
-> Index Only Scan using idx_contacts_pkey_owner on public.contacts c (cost=0.06..4.06 rows=1 width=8) (never executed)
Output: c.id, c.owner_id, c.user_id
Index Cond: (c.id = p.contact_id)
Heap Fetches: 0
-> Nested Loop (cost=0.11..8.12 rows=1 width=8) (actual time=0.018..0.018 rows=0 loops=1)
Output: c_1.id
-> Index Scan using idx_email_address on public.email_addresses e (cost=0.06..4.06 rows=1 width=8) (actual time=0.016..0.016 rows=0 loops=1)
Output: e.id, e.contact_id, e.email
Index Cond: ((e.email)::text = 'robert.xxxxx@gmail.com'::text)
-> Index Only Scan using idx_contacts_pkey_owner on public.contacts c_1 (cost=0.06..4.06 rows=1 width=8) (never executed)
Output: c_1.id, c_1.owner_id, c_1.user_id
Index Cond: (c_1.id = e.contact_id)
Heap Fetches: 0
-> Index Scan using idx_contacts_pkey_owner on public.contacts (cost=0.06..4.06 rows=1 width=95) (never executed)
Output: contacts.id, contacts.owner_id, contacts.user_id, contacts.device_id, contacts.last_call, contacts.record_id, contacts.dtype, contacts.blocked, contacts.details_hash, contacts.fname, contacts.lname, contacts.fb_id
Index Cond: (contacts.id = c.id)
Total runtime: 0.332 ms
(31 rows)

The buffers option is 9.0+ and is used like "explain (analyze, verbose, buffers) select 1".
To your original question, the union output there runs slightly faster than the "in" approach, although this may not be a good example - your inputs don't return any data, so this might not be realistic - and those numbers are so low that the difference might just be noise.
Are you tuning a <0.5ms-runtime query? Or is this just curiosity? FWIW I tend to write queries like this using an exists check first, then if that's still not good enough (all things like proper indexing taken in to account) I'll try an in check, then finally a union if that's still not good enough.

В списке pgsql-performance по дате отправления:

Предыдущее

От: Robert DiFalco
Дата: 22 ноября 2013 г., 00:12:43
Сообщение: Re: UNION versus SUB SELECT

Следующее

От: bricklen
Дата: 22 ноября 2013 г., 01:59:00
Сообщение: Re: UNION versus SUB SELECT

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: UNION versus SUB SELECT

Предыдущее

Следующее