Re: Efficient Correlated Update

Поиск
Список
Период
Сортировка
От Robert DiFalco
Тема Re: Efficient Correlated Update
Дата
Msg-id CAAXGW-zbqSEqczWM0wr=HLyyM1x6rK7i6CGZOsBK00rOMzs=5Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Efficient Correlated Update  (Kevin Grittner <kgrittn@ymail.com>)
Список pgsql-performance
Well, heh I'm no SQL expert. I kinda piece things together the best I can from what I can read and this was really the only way I could make the UPDATE work correctly. But the plan looks complicated with a lot of hash conditions, hash joins, and scans. I'm worried it wont perform with a very large dataset.

Here's the plan:

Update on public.contacts  (cost=16.64..27.22 rows=42 width=163) (actual time=1.841..1.841 rows=0 loops=1)
   ->  Hash Join  (cost=16.64..27.22 rows=42 width=163) (actual time=1.837..1.837 rows=0 loops=1)
         Output: contacts.dtype, contacts.id, contacts.blocked, contacts.fname, contacts.last_call, contacts.lname, contacts.hash, contacts.record_id, contacts.fb_id, contacts.owner_id, u.id, contacts.device, contacts.ctid, u.ctid, e.ctid
         Hash Cond: ((u.phone_short)::text = (e.significant)::text)
         ->  Seq Scan on public.wai_users u  (cost=0.00..10.36 rows=120 width=46) (actual time=0.022..0.028 rows=6 loops=1)
               Output: u.id, u.ctid, u.phone_short
         ->  Hash  (cost=16.24..16.24 rows=116 width=157) (actual time=1.744..1.744 rows=87 loops=1)
               Output: contacts.dtype, contacts.id, contacts.blocked, contacts.fname, contacts.last_call, contacts.lname, contacts.hash, contacts.record_id, contacts.fb_id, contacts.owner_id, contacts.device, contacts.ctid, e.ctid, e.significant
               Buckets: 1024  Batches: 1  Memory Usage: 12kB
               ->  Hash Join  (cost=10.47..16.24 rows=116 width=157) (actual time=0.636..1.583 rows=87 loops=1)
                     Output: contacts.dtype, contacts.id, contacts.blocked, contacts.fname, contacts.last_call, contacts.lname, contacts.hash, contacts.record_id, contacts.fb_id, contacts.owner_id, contacts.device, contacts.ctid, e.ctid, e.significant
                     Hash Cond: (e.owner_id = contacts.id)
                     ->  Seq Scan on public.phone_numbers e  (cost=0.00..5.13 rows=378 width=22) (actual time=0.008..0.467 rows=378 loops=1)
                           Output: e.ctid, e.significant, e.owner_id
                     ->  Hash  (cost=9.89..9.89 rows=166 width=143) (actual time=0.578..0.578 rows=124 loops=1)
                           Output: contacts.dtype, contacts.id, contacts.blocked, contacts.fname, contacts.last_call, contacts.lname, contacts.hash, contacts.record_id, contacts.fb_id, contacts.owner_id, contacts.device, contacts.ctid
                           Buckets: 1024  Batches: 1  Memory Usage: 16kB
                           ->  Seq Scan on public.contacts  (cost=0.00..9.89 rows=166 width=143) (actual time=0.042..0.365 rows=124 loops=1)
                                 Output: contacts.dtype, contacts.id, contacts.blocked, contacts.fname, contacts.last_call, contacts.lname, contacts.hash, contacts.record_id, contacts.fb_id, contacts.owner_id, contacts.device, contacts.ctid
                                 Filter: ((contacts.user_id IS NULL) AND (contacts.owner_id = 7))
                                 Rows Removed by Filter: 290
 Total runtime: 2.094 ms
(22 rows)

If I wasn't having to update I could write a query like this which seems like it has a much better plan:

dfmvu2a0bvs93n=> explain analyze verbose SELECT c.id                                                                                                                                       FROM wai_users u                                                                                                                                                                          JOIN phone_numbers e ON u.phone_short = e.significant                                                                                                                                     JOIN contacts c ON c.id = e.owner_id                                                                                                                                                      WHERE c.owner_id = 5 AND c.user_id IS NULL                                                                                                                                               ;                                                                                                                                                                                                                                                            QUERY PLAN                                                                    -------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..7.18 rows=1 width=8) (actual time=0.091..0.091 rows=0 loops=1)
   Output: c.id
   ->  Nested Loop  (cost=0.00..7.06 rows=1 width=16) (actual time=0.089..0.089 rows=0 loops=1)
         Output: e.significant, c.id
         ->  Index Scan using idx_contact_owner on public.contacts c  (cost=0.00..3.00 rows=1 width=8) (actual time=0.086..0.086 rows=0 loops=1)
               Output: c.dtype, c.id, c.blocked, c.fname, c.last_call, c.lname, c.hash, c.record_id, c.fb_id, c.owner_id, c.user_id, c.device
               Index Cond: (c.owner_id = 5)
               Filter: (c.user_id IS NULL)
         ->  Index Scan using idx_phone_owner on public.phone_numbers e  (cost=0.00..4.06 rows=1 width=16) (never executed)
               Output: e.id, e.raw_number, e.significant, e.owner_id
               Index Cond: (e.owner_id = c.id)
   ->  Index Only Scan using idx_user_short_phone on public.wai_users u  (cost=0.00..0.12 rows=1 width=32) (never executed)
         Output: u.phone_short
         Index Cond: (u.phone_short = (e.significant)::text)
         Heap Fetches: 0
 Total runtime: 0.158 ms
(16 rows)




On Fri, Aug 9, 2013 at 8:44 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
Robert DiFalco <robert.difalco@gmail.com> wrote:

> In my system a user can have external contacts. When I am
> bringing in external contacts I want to correlate any other
> existing users in the system with those external contacts. A
> users external contacts may or may not be users in my system. I
> have a user_id field in "contacts" that is NULL if that contact
> is not a user in my system
>
> Currently I do something like this after reading in external
> contacts:
>
>   UPDATE contacts SET user_id = u.id
>   FROM my_users u
>   JOIN phone_numbers pn ON u.phone_significant = pn.significant
>   WHERE contacts.owner_id = 7
>   AND contacts.user_id IS NULL
>   AND contacts.id = pn.ref_contact_id;
>
> If any of the fields are not self explanatory let me know.
> "Significant" is just the right 7 most digits of a raw phone
> number.
>
> I'm more interested in possible improvements to my relational
> logic than the details of the "significant" condition. IOW, I'm
> start enough to optimize the "significant" query but not smart
> enough to know if this is the best approach for the overall
> correlated UPDATE query. :)
>
> So yeah, is this the best way to update a contact's user_id
> reference based on a contacts phone number matching the phone
> number of a user?
>
> One detail from the schema -- A contact can have many phone
> numbers but a user in my system will only ever have just one
> phone number. Hence the JOIN to "phone_numbers" versus the column
> in "my_users".
 
In looking it over, nothing jumped out at me as a problem.  Are you
having some problem with it, like poor performance or getting
results different from what you expected?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

В списке pgsql-performance по дате отправления:

Предыдущее
От: Klaus Ita
Дата:
Сообщение: Re: Efficient Correlated Update
Следующее
От: Igor Neyman
Дата:
Сообщение: Re: Efficient Correlated Update