Conditional query plans.
От | Michael Richards |
---|---|
Тема | Conditional query plans. |
Дата | |
Msg-id | 39EF613E.000005.22998@frodo.searchcanada.ca обсуждение исходный текст |
Ответы |
Re: Conditional query plans.
(Tom Lane <tgl@sss.pgh.pa.us>)
|
Список | pgsql-hackers |
Hi. This whole message might be a giant brain fart but I had an interesting idea today. I was confronted by an obscene query plan. I have a table of logins that shows when webmail accounts were created. So a spammer went and set up 20 or so spam accounts. So I got a list by his IP and the time when he set them up. Now to batch cancel them I hacked up a quick query: update users set enabled='f',disablereason='We do not allow our system to be used for SPAM.' where id in (select id from users where loginid in (select distinct loginid from logins where ip='123.123.12.12')); This is a horrible way to do it and the query plan is even worse: NOTICE: QUERY PLAN: Seq Scan on users (cost=0.00..612996782699.54 rows=18180 width=172)SubPlan -> Materialize (cost=33718194.83..33718194.83rows=18180 width=4) -> Seq Scan on users (cost=0.00..33718194.83 rows=18180 width=4) SubPlan -> Materialize (cost=1854.65..1854.65 rows=48 width=12) -> Unique (cost=1853.44..1854.65 rows=48 width=12) -> Sort (cost=1853.44..1853.44 rows=482 width=12) -> Index Scan using logins_ip_idx on logins (cost=0.00..1831.97 rows=482 width=12) Given that the first and second subplan actually return only 25 rows, there are 2 possibly distillations of this plan: update users set enabled='f',disablereason='We do not allow our system to be used for SPAM.' where id in (27082,27083,27084,27085,27086,27087,27088,27089,27090,27091,27092,270 97,27098,27099,27101,27102,27103,27104,27094,27096,27095,27106,27100,2 7105,27093); Which comes up with a plan: NOTICE: QUERY PLAN: Index Scan using users_pkey, users_pkey, users_pkey, users_pkey, users_pkey, users_pkey, users_pkey, users_pkey, users_pkey, users_pkey, users_pkey, users_pkey, users_pkey, users_pkey, users_pkey, users_pkey, users_pkey, users_pkey, users_pkey, users_pkey, users_pkey, users_pkey, users_pkey, users_pkey, users_pkey on users (cost=0.00..57.04 rows=2 width=172) Basically it's going through each of the 25 as though they were separate updates. The second and probably less optimal plan would be to create a hash of these 25 answers and do a sequential scan on users updating rows where id is found in that hash. For these 2 query plans, 1 would be optimal in the event there is a small list to update, and the other would be ideal in the event there is a large list to update. Why attempt to formulate a complete query plan at the outset. Could you not break the query into smaller parts and re-optimize after every subplan completes? This way you would have an exact number of rows provided from the subplans so more accurate choices could be made farther down the line? This becomes especially relevant on large joins and other complex queries. Maybe I just gave away an idea I could have sold to Oracle for millions, and maybe everyone is already doing this. Anyway, it's just thoughts and if anyone makes it this far it might be worthwhile for a little discussion. -Michael _________________________________________________________________ http://fastmail.ca/ - Fast Free Web Email for Canadians
В списке pgsql-hackers по дате отправления: